Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis
NASA Technical Reports Server (NTRS)
Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)
1979-01-01
The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first-generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.
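The abstract does not reproduce the basic equation, so the following is only a minimal sketch of the general approach it describes: an ordinary least-squares regression of historical yield on a trend term and weather variables. The variable names, the synthetic data, and the helpers `solve_linear_system` and `fit_ols` are illustrative assumptions, not the CCEA model itself.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic illustration only: a trend index plus two hypothetical weather anomalies.
years = [0, 1, 2, 3, 4, 5]
precip = [1.0, -2.0, 0.0, 3.0, -1.0, 2.0]
temp = [0.0, 1.0, -1.0, 2.0, 0.0, -2.0]
rows = [[1.0, t, p, c] for t, p, c in zip(years, precip, temp)]
yields = [10.0 + 0.5 * t + 0.2 * p - 0.3 * c for t, p, c in zip(years, precip, temp)]
coefficients = fit_ols(rows, yields)  # intercept, trend, precipitation, temperature
```

Because the synthetic yields are generated without noise, the fit recovers the generating coefficients; real historical data would of course leave residuals.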
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. The research examines the relationship between paddy yield and 20 topsoil variates analyzed prior to planting at standard fertilizer rates. The data come from the multi-location rice trials carried out by MARDI at major paddy granaries in Peninsular Malaysia from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of multiple linear regression and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed, without multicollinearity among the independent variables. Fuzzy c-means clusters the paddy yields into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of multiple linear regression and fuzzy c-means outperforms the multiple linear regression model alone, with a lower mean square error.
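The clustering step can be sketched with the standard fuzzy c-means updates applied to one-dimensional yield values. This is a minimal illustration under stated assumptions: the function name `fuzzy_c_means_1d`, the fuzzifier `m = 2`, the evenly spaced starting centers, and the fixed iteration count are all hypothetical choices, not details taken from the paper.

```python
def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=50):
    """Standard fuzzy c-means on scalar data (requires n_clusters >= 2).

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which values[k] belongs to cluster i; each row sums to 1.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centers evenly over the data range.
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            # Small floor avoids division by zero when a point sits on a center.
            dists = [max(abs(x - c), 1e-12) for c in centers]
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            memberships.append(row)
        # Each center is the mean of the data weighted by membership**m.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(n_clusters)
        ]
    return centers, memberships
```

In a hybrid scheme of the kind the abstract describes, each observation could then be assigned to its highest-membership cluster and a separate multiple linear regression fitted within each cluster; how the paper combines the two steps in detail is not stated here.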
Multivariate regression model for predicting yields of grade lumber from yellow birch sawlogs
Andrew F. Howard; Daniel A. Yaussy
1986-01-01
A multivariate regression model was developed to predict green board-foot yields for the common grades of factory lumber processed from yellow birch factory-grade logs. The model incorporates the standard log measurements of scaling diameter, length, proportion of scalable defects, and the assigned USDA Forest Service log grade. Differences in yields between band and...
Evaluation of trends in wheat yield models
NASA Technical Reports Server (NTRS)
Ferguson, M. C.
1982-01-01
Trend terms in models for wheat yield in the U.S. Great Plains for the years 1932 to 1976 are evaluated. The subset of meteorological variables yielding the largest adjusted R² is selected using the method of leaps and bounds. Latent root regression is used to eliminate multicollinearities, and generalized ridge regression is used to introduce bias to provide stability in the data matrix. The regression model used provides for two trends in each of two models: a dependent model in which the trend line is piecewise continuous, and an independent model in which the trend line is discontinuous at the year of the slope change. It was found that the trend lines best describing the wheat yields consisted of combinations of increasing, decreasing, and constant trend: four combinations for the dependent model and seven for the independent model.
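The difference between the two trend formulations can be made concrete by writing out the design-matrix columns each would contribute to a regression. This is a sketch under assumptions: the function names and the break year used in the usage note are hypothetical, and the report's actual parameterization is not given in the abstract.

```python
def continuous_trend_columns(year, start_year, break_year):
    """Dependent model: the trend line bends at break_year but does not jump."""
    t = year - start_year
    hinge = max(0, year - break_year)  # zero before the break, linear after it
    return [1.0, float(t), float(hinge)]


def discontinuous_trend_columns(year, start_year, break_year):
    """Independent model: a separate level shift and slope after break_year."""
    t = year - start_year
    after = 1.0 if year >= break_year else 0.0
    return [1.0, float(t), after, after * (year - break_year)]
```

For example, with a hypothetical break in 1955, `continuous_trend_columns(1960, 1932, 1955)` gives `[1.0, 28.0, 5.0]`, while the discontinuous version adds an indicator column that lets the fitted line jump at the break year.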
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. 
This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
Terziotti, Silvia; Capel, Paul D.; Tesoriero, Anthony J.; Hopple, Jessica A.; Kronholm, Scott C.
2018-03-07
The water quality of the Chesapeake Bay may be adversely affected by dissolved nitrate carried in groundwater discharge to streams. To estimate the concentrations, loads, and yields of nitrate from groundwater to streams for the Chesapeake Bay watershed, a regression model was developed based on measured nitrate concentrations from 156 small streams with watersheds less than 500 square miles (mi²) at baseflow. The regression model has three predictive variables: geologic unit, percent developed land, and percent agricultural land. Estimated and actual values within geologic units matched closely. The coefficient of determination (R²) for the model was 0.6906. The model was used to calculate baseflow nitrate concentrations at over 83,000 National Hydrography Dataset Plus Version 2 catchments and aggregated to 1,966 total 12-digit hydrologic units in the Chesapeake Bay watershed. The modeled output geospatial data layers provided estimated annual loads and yields of nitrate from groundwater into streams. The spatial distribution of annual nitrate yields from groundwater estimated by this method was compared to the total watershed yields of all sources estimated from a Chesapeake Bay SPAtially Referenced Regressions On Watershed attributes (SPARROW) water-quality model. The comparison showed similar spatial patterns. The regression model for groundwater contribution had similar but lower yields, suggesting that groundwater is an important source of nitrogen for streams in the Chesapeake Bay watershed.
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions on the form of the model, whereas nonparametric regression does not require such assumptions. Time series data are observations of a variable recorded over time, so to model a time series by regression, the response and predictor variables must be determined first. The response variable is the value at time t (y_t), while the predictor variables are the significant lags. One developing approach in nonparametric regression modeling is the Fourier series approach. One advantage of nonparametric regression using Fourier series is its ability to handle data with a trigonometric (periodic) pattern. Fourier series modeling requires the parameter K, which can be determined using the Generalized Cross Validation method. Modeling inflation for the transportation, communication, and financial services sector with Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
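For orientation, the selection of K can be written out with the usual definitions. The abstract does not state the exact form of the Fourier estimator, so the truncated series below is one common form given as an assumption; the GCV criterion is the standard definition for a linear smoother with hat matrix $H(K)$ and $n$ observations.

```latex
% One common truncated Fourier series estimator (assumed form, not quoted from the paper)
\hat{f}_K(x) = \frac{a_0}{2} + \sum_{k=1}^{K} \left( a_k \cos kx + b_k \sin kx \right)

% Generalized cross-validation score; the chosen K minimizes GCV(K)
\mathrm{GCV}(K) = \frac{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{f}_K(x_i) \right)^2}
                       {\left( 1 - \frac{1}{n} \operatorname{tr} H(K) \right)^2}
```

The denominator penalizes models that use many effective parameters, which is why GCV does not simply select the largest K.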
Evaluation of the CEAS model for barley yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1981-01-01
The CEAS yield model is based upon multiple regression analysis at the CRD and state levels. For the historical time series, yield is regressed on a set of variables derived from monthly mean temperature and monthly precipitation. Technological trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-79) demonstrated that biases are small and that performance, as indicated by the root mean square errors, is acceptable for the intended application; however, model response for individual years, particularly unusual years, is not very reliable and shows some large errors. The model is objective, adequate, timely, simple and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two levels ARIMAX and regression methods. Two levels ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as case study. In general, two levels of calendar variation model yields two models, namely the first model to reconstruct the sales pattern that already occurred, and the second model to forecast the effect of increasing sales due to Eid ul-Fitr that affected sales at the same and the previous months. The results show that the proposed two level calendar variation model based on ARIMAX and regression methods yields better forecast compared to the seasonal ARIMA model and Neural Networks.
Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S
2014-07-01
In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield. 
Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Montesinos-López, Abelardo; Montesinos-López, Osval A; Cuevas, Jaime; Mata-López, Walter A; Burgueño, Juan; Mondal, Sushismita; Huerta, Julio; Singh, Ravi; Autrique, Enrique; González-Pérez, Lorena; Crossa, José
2017-01-01
Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance data at discrete narrow bands in many environments. These bands often cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra. With the bands, vegetation indices are constructed for predicting agronomically important traits such as grain yield and biomass. However, since vegetation indices only use some wavelengths (referred to as bands), we propose using all bands simultaneously as predictor variables for the primary trait grain yield; results of several multi-environment maize (Aguate et al. in Crop Sci 57(5):1-8, 2017) and wheat (Montesinos-López et al. in Plant Methods 13(4):1-23, 2017) breeding trials indicated that using all bands produced better prediction accuracy than vegetation indices. However, until now, these prediction models have not accounted for the effects of genotype × environment (G × E) and band × environment (B × E) interactions incorporating genomic or pedigree information. In this study, we propose Bayesian functional regression models that take into account all available bands, genomic or pedigree information, the main effects of lines and environments, as well as G × E and B × E interaction effects. The data set used comprises 976 wheat lines evaluated for grain yield in three environments (Drought, Irrigated and Reduced Irrigation). The reflectance data were measured in 250 discrete narrow bands ranging from 392 to 851 nm. The proposed Bayesian functional regression models were implemented using two types of basis: B-splines and Fourier. Results of the proposed Bayesian functional regression models, including all the wavelengths for predicting grain yield, were compared with results from conventional models with and without bands. 
We observed that the models with B × E interaction terms were the most accurate models, whereas the functional regression models (with B-splines and Fourier basis) and the conventional models performed similarly in terms of prediction accuracy. However, the functional regression models are more parsimonious and computationally more efficient because the number of beta coefficients to be estimated is 21 (number of basis), rather than estimating the 250 regression coefficients for all bands. In this study adding pedigree or genomic information did not increase prediction accuracy.
Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G
2009-09-01
The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
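Random regression on Legendre polynomials of days in milk starts from a set of covariates evaluated at each test day. The sketch below shows one conventional way to build them: days in milk are rescaled to [-1, 1] and the polynomials are generated by the three-term recurrence and then normalized. The function name, the lactation limits of 5 and 305 days, and the normalization constant are assumptions for illustration; the abstract does not state the authors' exact settings, and authors differ on whether "order" counts the highest degree or the number of coefficients.

```python
import math


def legendre_covariates(dim, order, dim_min=5, dim_max=305):
    """Normalized Legendre polynomials of degree 0..order at a given day in milk."""
    # Rescale days in milk to the interval [-1, 1] on which the polynomials are orthogonal.
    x = -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)
    p = [1.0, x]
    for n in range(1, order):
        # Recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[: order + 1]
    # Normalization so that each polynomial has unit norm on [-1, 1].
    return [math.sqrt((2 * n + 1) / 2.0) * p[n] for n in range(order + 1)]
```

Each animal's additive genetic effect is then modeled as a linear combination of these covariates with animal-specific random coefficients, so the genetic (co)variance at any pair of days follows from the covariance matrix of those coefficients.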
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
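The two fit statistics quoted for each model have standard definitions, reproduced here for reference. The report's exact computation is not shown in the abstract, so this is the textbook form, with $n$ observations, $p$ predictors excluding the intercept, observed values $y_i$ and fitted values $\hat{y}_i$ (here in log-transformed yield units).

```latex
% Adjusted coefficient of determination
\bar{R}^2 = 1 - \left( 1 - R^2 \right) \frac{n - 1}{n - p - 1}

% Residual standard error
s = \sqrt{ \frac{ \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }{ n - p - 1 } }
```

Because the residual standard error is expressed on the log scale, it describes multiplicative rather than additive uncertainty in the back-transformed yields.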
Growth and yield in Eucalyptus globulus
James A. Rinehart; Richard B. Standiford
1983-01-01
A study of the major Eucalyptus globulus stands throughout California conducted by Woodbridge Metcalf in 1924 provides a complete and accurate data set for generating variable site-density yield models. Two models were developed using linear regression techniques. Model I depicts a linear relationship between age and yield best used for stands between five and fifteen...
Estimating the Depth of the Navy Recruiting Market
2016-09-01
...recommend that NRC make use of the Poisson regression model in order to determine high-yield ZIP codes for market depth... Thesis by Emilie M. Monaghan, September 2016. Thesis Advisor: Lyn R. Whitaker. Second Reader: Jonathan K. Alt.
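As a rough illustration of how a Poisson regression could be used to rank ZIP codes, the sketch below computes the fitted mean under the usual log link and sorts areas by it. The function names, the coefficient values, and the predictors are hypothetical; the excerpt does not say which covariates or response the thesis used.

```python
import math


def expected_count(intercept, coefs, features):
    """Poisson regression mean under the log link: exp(b0 + sum(b_j * x_j))."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, features)))


def poisson_log_pmf(count, mean):
    """log P(Y = count) for a Poisson variable with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def rank_by_expected_count(intercept, coefs, features_by_zip):
    """Return ZIP codes ordered from highest to lowest fitted mean."""
    return sorted(
        features_by_zip,
        key=lambda z: expected_count(intercept, coefs, features_by_zip[z]),
        reverse=True,
    )
```

The coefficients themselves would come from maximizing the sum of `poisson_log_pmf` over the observed counts, which a statistics package would normally handle.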
2018-01-01
Objective: The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. Methods: A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic, and permanent environment effects. Heritabilities, and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. Results: All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03, between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. Conclusion: These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins. PMID:28823122
Ben Zaabza, Hafedh; Ben Gara, Abderrahmen; Rekik, Boulbaba
2018-05-01
The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic, and permanent environment effects. Heritabilities, and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03, between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins.
Simple agrometeorological models for estimating Guineagrass yield in Southeast Brazil.
Pezzopane, José Ricardo Macedo; da Cruz, Pedro Gomes; Santos, Patricia Menezes; Bosi, Cristiam; de Araujo, Leandro Coelho
2014-09-01
The objective of this work was to develop and evaluate agrometeorological models to simulate the production of Guineagrass. For this purpose, we used forage yield from 54 growing periods between December 2004-January 2007 and April 2010-March 2012 in irrigated and non-irrigated pastures in São Carlos, São Paulo state, Brazil (latitude 21°57'42″ S, longitude 47°50'28″ W and altitude 860 m). Initially we performed linear regressions between the agrometeorological variables and the average dry matter accumulation rate for irrigated conditions. Then we determined the effect of soil water availability on the relative forage yield considering irrigated and non-irrigated pastures, by means of segmented linear regression among water balance and relative production variables (dry matter accumulation rates with and without irrigation). The models generated were evaluated with independent data related to 21 growing periods without irrigation in the same location, from eight growing periods in 2000 and 13 growing periods between December 2004-January 2007 and April 2010-March 2012. The results obtained show the satisfactory predictive capacity of the agrometeorological models under irrigated conditions based on univariate regression (mean temperature, minimum temperature and potential evapotranspiration or degree-days) or multivariate regression. The response of irrigation on production was well correlated with the climatological water balance variables (ratio between actual and potential evapotranspiration or between actual and maximum soil water storage). The models that performed best for estimating Guineagrass yield without irrigation were based on minimum temperature corrected by relative soil water storage, determined by the ratio between the actual soil water storage and the soil water holding capacity. ...irrigation in the same location, in 2000, 2010 and 2011. 
The results obtained show the satisfactory predictive capacity of the agrometeorological models under irrigated conditions based on univariate regression (mean temperature, potential evapotranspiration or degree-days) or multivariate regression. The response of irrigation on production was well correlated with the climatological water balance variables (ratio between actual and potential evapotranspiration or between actual and maximum soil water storage). The models that performed best for estimating Guineagrass yield without irrigation were based on degree-days corrected by the water deficit factor.
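To make the idea of degree-days corrected by a water deficit factor concrete, here is a minimal sketch. It assumes that degree-days are accumulated as the daily excess of mean temperature over a base temperature and that each day's contribution is scaled by a factor between 0 and 1 (for example, the ratio of actual to potential evapotranspiration). The function name, the base temperature, and the multiplicative form of the correction are assumptions; the abstract does not specify them.

```python
def corrected_degree_days(mean_temps, water_factors, base_temp):
    """Sum of daily degree-days, each scaled by that day's water factor (0 to 1)."""
    return sum(
        max(0.0, t - base_temp) * w  # days colder than the base contribute nothing
        for t, w in zip(mean_temps, water_factors)
    )
```

With a water factor of 1 on every day this reduces to ordinary degree-days, which corresponds to the irrigated case in which water is not limiting.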
Brügemann, K; Gernand, E; von Borstel, U U; König, S
2011-08-01
Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (d in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. 
However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Remotely sensed rice yield prediction using multi-temporal NDVI data derived from NOAA's-AVHRR.
Huang, Jingfeng; Wang, Xiuzhen; Li, Xinxing; Tian, Hanqin; Pan, Zhuokun
2013-01-01
Grain-yield prediction using remotely sensed data has been intensively studied in wheat and maize, but such information is limited in rice, barley, oats and soybeans. The present study proposes a new framework for rice-yield prediction, which eliminates the influence of technology development, fertilizer application, and management improvement and can be used for the development and implementation of provincial rice-yield predictions. The technique requires the collection of remotely sensed data over an adequate time frame and a corresponding record of the region's crop yields. Longer normalized-difference-vegetation-index (NDVI) time series are preferable to shorter ones for the purposes of rice-yield prediction because the well-contrasted seasons in a longer time series provide the opportunity to build regression models with a wide application range. A regression analysis of the yield versus the year indicated an annual gain in the rice yield of 50 to 128 kg ha⁻¹. Stepwise regression models for the remotely sensed rice-yield predictions have been developed for five typical rice-growing provinces in China. The prediction models for the remotely sensed rice yield indicated that the influences of the NDVIs on the rice yield were always positive. The association between the predicted and observed rice yields was highly significant without obvious outliers from 1982 to 2004. Independent validation found that the overall relative error is approximately 5.82%, and a majority of the relative errors were less than 5% in 2005 and 2006, depending on the study area. The proposed models can be used in an operational context to predict rice yields at the provincial level in China. The methodologies described in the present paper can be applied to any crop for which a sufficient time series of NDVI data and the corresponding historical yield information are available, as long as the historical yield increases significantly.
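Two quantities that recur in this abstract can be written down directly: NDVI, which has a standard definition in terms of near-infrared and red reflectance, and the relative error used in validation. The sketch below is illustrative only; the function names and the absolute-value form of the relative error are assumptions, since the abstract does not give the paper's formula.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


def relative_error_percent(predicted, observed):
    """Absolute prediction error as a percentage of the observed yield."""
    return abs(predicted - observed) / observed * 100.0
```

For instance, a predicted yield of 6,300 kg per hectare against an observed 6,000 kg per hectare corresponds to a relative error of about 5%.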
Remotely Sensed Rice Yield Prediction Using Multi-Temporal NDVI Data Derived from NOAA's-AVHRR
Huang, Jingfeng; Wang, Xiuzhen; Li, Xinxing; Tian, Hanqin; Pan, Zhuokun
2013-01-01
Grain-yield prediction using remotely sensed data has been intensively studied in wheat and maize, but such information is limited for rice, barley, oats and soybeans. The present study proposes a new framework for rice-yield prediction that eliminates the influence of technology development, fertilizer application, and management improvement and can be used for the development and implementation of provincial rice-yield predictions. The technique requires the collection of remotely sensed data over an adequate time frame and a corresponding record of the region's crop yields. Longer normalized-difference-vegetation-index (NDVI) time series are preferable to shorter ones for the purposes of rice-yield prediction because the well-contrasted seasons in a longer time series provide the opportunity to build regression models with a wide application range. A regression analysis of the yield versus the year indicated an annual gain in the rice yield of 50 to 128 kg ha−1. Stepwise regression models for the remotely sensed rice-yield predictions have been developed for five typical rice-growing provinces in China. The prediction models for the remotely sensed rice yield indicated that the influences of the NDVIs on the rice yield were always positive. The association between the predicted and observed rice yields was highly significant without obvious outliers from 1982 to 2004. Independent validation found that the overall relative error was approximately 5.82%, and a majority of the relative errors were less than 5% in 2005 and 2006, depending on the study area. The proposed models can be used in an operational context to predict rice yields at the provincial level in China. The methodologies described in the present paper can be applied to any crop for which a sufficient time series of NDVI data and the corresponding historical yield information are available, as long as the historical yield increases significantly. PMID:23967112
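The regression of yield on year mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names and the yield figures are made up for illustration, and a plain least-squares line is assumed as the trend model. The slope plays the role of the annual yield gain, and the residuals are what would remain for NDVI-based predictors to explain.

```python
def fit_linear_trend(years, yields):
    """Ordinary least squares fit of yield = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def detrend(years, yields):
    """Return the yield residuals after removing the fitted linear trend."""
    intercept, slope = fit_linear_trend(years, yields)
    return [y - (intercept + slope * x) for x, y in zip(years, yields)]


def relative_error_percent(predicted, observed):
    """Absolute relative error of a prediction, in percent."""
    return abs(predicted - observed) / observed * 100.0


# Made-up yields (kg/ha) that rise by exactly 100 kg/ha per year.
years = [1982, 1983, 1984, 1985, 1986]
yields = [5000.0, 5100.0, 5200.0, 5300.0, 5400.0]
intercept, slope = fit_linear_trend(years, yields)
residuals = detrend(years, yields)
```

With real data the residuals would not vanish; they carry the year-to-year variation that the trend cannot account for.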
Brazil soybean yield covariance model
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate soybean yields for the seven soybean-growing states of Brazil. The meteorological data of these seven states were pooled and the years 1975 to 1980 were used to model since there was no technological trend in the yields during these years. Predictor variables were derived from monthly total precipitation and monthly average temperature.
NASA Astrophysics Data System (ADS)
O'Connor, J. E.; Wise, D. R.; Mangano, J.; Jones, K.
2015-12-01
Empirical analyses of suspended sediment and bedload transport give estimates of sediment flux for western Oregon and northwestern California. The estimates of both bedload and suspended load are from regression models relating measured annual sediment yield to geologic, physiographic, and climatic properties of contributing basins. The best models include generalized geology and either slope or precipitation. The best-fit suspended-sediment model is based on basin geology, precipitation, and area of recent wildfire. It explains 65% of the variance for 68 suspended sediment measurement sites within the model area. Predicted suspended sediment yields range from no yield from the High Cascades geologic province to 200 tonnes/km2-yr in the northern Oregon Coast Range and 1000 tonnes/km2-yr in recently burned areas of the northern Klamath terrane. Bed-material yield is similarly estimated from a regression model based on 22 sites of measured bed-material transport, mostly from reservoir accumulation analyses but also from several bedload measurement programs. The resulting best-fit regression is based on basin slope and the presence/absence of the Klamath geologic terrane. For the Klamath terrane, bed-material yield is twice that of the other geologic provinces. This model explains more than 80% of the variance of the better-quality measurements. Predicted bed-material yields range up to 350 tonnes/km2-yr in steep areas of the Klamath terrane. Applying these regressions to small individual watersheds (mean size: 66 km2 for bed-material; 3 km2 for suspended sediment) and cumulating totals down the hydrologic network (but also decreasing the bed-material flux by experimentally determined attrition rates) gives spatially explicit estimates of both bed-material and suspended sediment flux. This enables assessment of several management issues, including the effects of dams on bedload transport, instream gravel mining, habitat formation processes, and water quality. The combined fluxes can also be compared to long-term rock uplift and cosmogenically determined landscape erosion rates.
Solving large test-day models by iteration on data and preconditioned conjugate gradient.
Lidauer, M; Strandén, I; Mäntysaari, E A; Pösö, J; Kettunen, A
1999-12-01
A preconditioned conjugate gradient method was implemented into an iteration on data program for estimation of breeding values, and its convergence characteristics were studied. An algorithm was used as a reference in which one fixed effect was solved by the Gauss-Seidel method, and other effects were solved by a second-order Jacobi method. Implementation of the preconditioned conjugate gradient required storing four vectors (size equal to number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Comparison of algorithms was based on solutions of mixed model equations obtained by a single-trait animal model and a single-trait, random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. Animal model data comprised 665,629 lactation milk yields and random regression test-day model data comprised 6,732,765 test-day milk yields. Both models included pedigree information of 1,099,622 animals. The animal model [random regression test-day model] required 122 [305] rounds of iteration to converge with the reference algorithm, but only 88 [149] were required with the preconditioned conjugate gradient. To solve the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
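A textbook preconditioned conjugate gradient loop can be sketched as follows. This is not the program described in the abstract: it assumes a small dense symmetric positive definite matrix held in memory rather than iteration on data, and it uses a plain diagonal (Jacobi) preconditioner in place of the diagonal blocks mentioned above. The function name and the 2 x 2 system are illustrative only.

```python
def preconditioned_conjugate_gradient(A, b, tol=1e-10, max_rounds=1000):
    """Solve A x = b for a symmetric positive definite A (list of lists).

    Uses the diagonal of A as preconditioner. Returns (x, rounds_used).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0                      # trivial system, x = 0

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    r = list(b)                          # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                          # search direction
    rz = dot(r, z)

    for rounds in range(1, max_rounds + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)          # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, rounds
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz               # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_rounds


# Illustrative system with exact solution x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution, rounds_used = preconditioned_conjugate_gradient(A, b)
```

In an iteration-on-data setting the product `A p` would be accumulated while reading the records, so the coefficient matrix is never stored; only the working vectors stay in memory.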
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
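For two variables, the major axis and reduced major axis lines have closed-form slopes in terms of the sums of squares and cross-products. The sketch below uses those standard formulas; the function names and the sample points are illustrative and are not taken from the article.

```python
import math


def _centered_sums(xs, ys):
    """Return means and centered sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def major_axis(xs, ys):
    """Orthogonal regression: minimizes squared perpendicular distances."""
    mean_x, mean_y, sxx, syy, sxy = _centered_sums(xs, ys)
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return mean_y - slope * mean_x, slope


def reduced_major_axis(xs, ys):
    """Reduced major axis: slope is the signed ratio of standard deviations."""
    mean_x, mean_y, sxx, syy, sxy = _centered_sums(xs, ys)
    if sxy == 0:
        raise ValueError("slope sign is undefined when the covariance is zero")
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return mean_y - slope * mean_x, slope


# Points lying exactly on y = 2x + 1, so both lines should recover it.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
ma_intercept, ma_slope = major_axis(xs, ys)
rma_intercept, rma_slope = reduced_major_axis(xs, ys)
```

On noisy data the two slopes differ from each other and from the ordinary least-squares slope, which minimizes vertical rather than perpendicular distances.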
Modeling maximum daily temperature using a varying coefficient regression model
Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith
2014-01-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
Food Crops Response to Climate Change
NASA Astrophysics Data System (ADS)
Butler, E.; Huybers, P.
2009-12-01
Projections of future climate show a warming world and heterogeneous changes in precipitation. Generally, warming temperatures indicate a decrease in crop yields where they are currently grown. However, warmer climate will also open up new areas at high latitudes for crop production. Thus, there is a question whether the warmer climate with decreased yields but potentially increased growing area will produce a net increase or decrease of overall food crop production. We explore this question through a multiple linear regression model linking temperature and precipitation to crop yield. Prior studies have emphasised temporal regressions, which indicate uniformly decreased yields but neglect the potentially increased area opened up for crop production. This study provides a complement to the prior work by exploring this spatial variation. We explore this subject with a multiple linear regression model from temperature, precipitation and crop yield data over the United States. The United States was chosen as the training region for the model because there are good crop data available over the same time frame as climate data and presumably the yield from crops in the United States is optimized with respect to potential yield. We study corn, soybeans, sorghum, hard red winter wheat and soft red winter wheat using monthly averages of temperature and precipitation from NCEP reanalysis and yearly yield data from the National Agriculture Statistics Service for 1948-2008. The use of monthly averaged temperature and precipitation, which neglects extreme events that can have a significant impact on crops, limits this study, as does the exclusive use of United States agricultural data. The GFDL 2.1 model under a 720 ppm CO2 scenario provides temperature and precipitation fields for 2040-2100, which are used to explore how the spatial regions available for crop production will change under these new conditions.
Berry, D P; Buckley, F; Dillon, P; Evans, R D; Rath, M; Veerkamp, R F
2003-11-01
Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields from 8725 multiparous Holstein-Friesian cows. A cubic random regression was sufficient to model the changing genetic variances for BCS, BW, and milk across different days in milk. The genetic correlations between BCS and fertility changed little over the lactation; genetic correlations between BCS and interval to first service and between BCS and pregnancy rate to first service varied from -0.47 to -0.31, and from 0.15 to 0.38, respectively. This suggests that maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in midlactation when the genetic variance for BCS is largest. Selection for increased BW resulted in shorter intervals to first service, but more services and poorer pregnancy rates; genetic correlations between BW and pregnancy rate to first service varied from -0.52 to -0.45. Genetic selection for higher lactation milk yield alone through selection on increased milk yield in early lactation is likely to have a more deleterious effect on genetic merit for fertility than selection on higher milk yield in late lactation.
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test-day milk yield. We analyzed 20,710 test-day milk yield records of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct polynomial fitting orders for the fixed (2-5), random genetic (1-7), and permanent environmental (1-7) curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. The best random regression model using Legendre orthogonal polynomials for genetic evaluation of test-day milk yield of Alpine goats considered a fixed curve of order 4, a curve of additive genetic effects of order 2, a curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance, because it was the most parsimonious model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, additive genetic and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.
Factor regression for interpreting genotype-environment interaction in bread-wheat trials.
Baril, C P
1992-05-01
The French INRA wheat (Triticum aestivum L. em Thell.) breeding program is based on multilocation trials to produce high-yielding, adapted lines for a wide range of environments. Differential genotypic responses to variable environmental conditions limit the accuracy of yield estimations. Factor regression was used to partition the genotype-environment (GE) interaction into four biologically interpretable terms. Yield data were analyzed from 34 wheat genotypes grown in four environments using 12 auxiliary agronomic traits as genotypic and environmental covariates. Most of the GE interaction (91%) was explained by the combination of only three traits: 1,000-kernel weight, lodging susceptibility and spike length. These traits are easily measured in breeding programs; therefore, the factor regression model can provide a convenient and useful method of predicting yield.
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood
NASA Astrophysics Data System (ADS)
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. In order to estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption of the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real-world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
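The two scores compared in this abstract can be written down for a single Gaussian forecast. The sketch below assumes a forecast distribution N(mu, sigma^2) and an observation y, and uses the well-known closed form of the Gaussian CRPS, sigma * [ z (2 Phi(z) - 1) + 2 phi(z) - 1/sqrt(pi) ] with z = (y - mu) / sigma. The function names are illustrative; in a non-homogeneous regression, mu and sigma would themselves be functions of ensemble statistics and regression coefficients, which is not shown here.

```python
import math


def _norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(mu, sigma, y):
    """CRPS of a N(mu, sigma^2) forecast for observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * _norm_cdf(z) - 1.0)
                    + 2.0 * _norm_pdf(z)
                    - 1.0 / math.sqrt(math.pi))


def neg_log_likelihood_gaussian(mu, sigma, y):
    """Negative log-density of y under N(mu, sigma^2) (lower is better)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


# A standard normal forecast verified by an observation at its mean.
crps_value = crps_gaussian(0.0, 1.0, 0.0)
nll_value = neg_log_likelihood_gaussian(0.0, 1.0, 0.0)
```

Either score can be averaged over a training sample and minimized with respect to the regression coefficients; the abstract's comparison concerns which of the two objectives is used for that minimization.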
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486, P < .001). Means of software machine-derived values differed significantly from actual PLT yield, 4.72 × 10^11 vs. 6.12 × 10^11, respectively (P < .001). The following equation was developed to adjust these values: actual PLT yield = 0.221 + (1.254 × theoretical platelet yield). ROC curve model showed an optimal apheresis device software prediction cut-off of 4.65 × 10^11 to obtain a DP, with a sensitivity of 82.2%, specificity of 93.3%, and an area under the curve (AUC) of 0.909. Trima Accel v6.0 software consistently underestimated PLT yields. Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
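The correction equation and cut-off reported in this abstract can be applied as in the short sketch below. The coefficients 0.221 and 1.254 and the cut-off 4.65 come from the abstract; the function names are hypothetical, yields are assumed to be expressed in units of 10^11 platelets, and treating a prediction exactly at the cut-off as a likely double product is an assumption.

```python
# Values reported in the abstract; yields are in units of 10^11 platelets.
INTERCEPT = 0.221
SLOPE = 1.254
DP_CUTOFF = 4.65  # software-predicted yield at the ROC-derived cut-off


def corrected_platelet_yield(software_predicted):
    """Adjust the software-predicted yield with the reported regression."""
    return INTERCEPT + SLOPE * software_predicted


def likely_double_product(software_predicted):
    """Flag collections at or above the cut-off (inclusive by assumption)."""
    return software_predicted >= DP_CUTOFF


# The mean software prediction reported in the abstract.
adjusted_mean = corrected_platelet_yield(4.72)
```

Applied to the reported mean prediction of 4.72, the equation gives about 6.14, close to the reported mean actual yield of 6.12.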
Brazil wheat yield covariance model
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate wheat yields for the wheat growing states of Rio Grande do Sul, Parana, and Santa Catarina in Brazil. The meteorological data of these three states were pooled and the years 1972 to 1979 were used to develop the model since there was no technological trend in the yields during these years. Predictor variables were derived from monthly total precipitation, average monthly mean temperature, and average monthly maximum temperature.
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate soybean yields for the country of Argentina. A meteorological data set was obtained for the country by averaging data for stations within the soybean growing area. Predictor variables for the model were derived from monthly total precipitation and monthly average temperature. A trend variable was included for the years 1969 to 1978 since an increasing trend in yields due to technology was observed between these years.
Geodesic least squares regression on information manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be
We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.
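For intuition about the distance being minimized, the Rao geodesic distance has a closed form between two univariate Gaussian distributions. The sketch below assumes that special case only; the abstract also covers non-Gaussian models, and the function name is illustrative rather than part of the published method.

```python
import math


def rao_distance_gaussian(mu1, sigma1, mu2, sigma2):
    """Rao geodesic (Fisher-Rao) distance between N(mu1, sigma1^2) and
    N(mu2, sigma2^2), with both standard deviations positive."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    numerator = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    denominator = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    delta = math.sqrt(numerator / denominator)   # always in [0, 1)
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)


# Equal means: the distance reduces to sqrt(2) * |ln(sigma2 / sigma1)|.
same = rao_distance_gaussian(0.0, 1.0, 0.0, 1.0)
scaled = rao_distance_gaussian(0.0, 1.0, 0.0, math.e)
```

Unlike a squared difference of means, this distance also penalizes a mismatch in spread, which is how uncertainty can enter the fit.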
Multivariate regression model for predicting lumber grade volumes of northern red oak sawlogs
Daniel A. Yaussy; Robert L. Brisbin
1983-01-01
A multivariate regression model was developed to predict green board-foot yields for the seven common factory lumber grades processed from northern red oak (Quercus rubra L.) factory grade logs. The model uses the standard log measurements of grade, scaling diameter, length, and percent defect. It was validated with an independent data set. The model...
Yield estimation of sugarcane based on agrometeorological-spectral models
NASA Technical Reports Server (NTRS)
Rudorff, Bernardo Friedrich Theodor; Batista, Getulio Teixeira
1990-01-01
The objective of this work is to assess the performance of a yield estimation model for sugarcane (Saccharum officinarum). The model uses orbitally gathered spectral data along with yield estimated from an agrometeorological model. The test site includes the sugarcane plantations of the Barra Grande Plant located in Lencois Paulista municipality in Sao Paulo State. Production data of four crop years were analyzed. Yield data observed in the first crop year (1983/84) were regressed against spectral and agrometeorological data of that same year. This provided the model to predict the yield for the following crop year, i.e., 1984/85. The models to predict the yield of subsequent years (up to 1987/88) were developed similarly, incorporating all previous years' data. The yield estimations obtained from these models explained 69, 54, and 50 percent of the yield variation in the 1984/85, 1985/86, and 1986/87 crop years, respectively. The accuracy of yield estimations based on spectral data only (vegetation index model) and on agrometeorological data only (agrometeorological model) was also investigated.
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
Five models based on multiple regression were developed to estimate wheat yields for the five wheat growing provinces of Argentina. Meteorological data sets were obtained for each province by averaging data for stations within each province. Predictor variables for the models were derived from monthly total precipitation, average monthly mean temperature, and average monthly maximum temperature. Buenos Aires was the only province for which a trend variable was included, because of an increasing trend in yield due to technology from 1950 to 1963.
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate corn yields for the country of Argentina. A meteorological data set was obtained for the country by averaging data for stations within the corn-growing area. Predictor variables for the model were derived from monthly total precipitation, average monthly mean temperature, and average monthly maximum temperature. A trend variable was included for the years 1965 to 1980 since an increasing trend in yields due to technology was observed between these years.
Salience Assignment for Multiple-Instance Data and Its Application to Crop Yield Prediction
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran
2010-01-01
An algorithm was developed to generate crop yield predictions from orbital remote sensing observations, by analyzing thousands of pixels per county and the associated historical crop yield data for those counties. The algorithm determines which pixels contain which crop. Since each known yield value is associated with thousands of individual pixels, this is a multiple instance learning problem. Because individual crop growth is related to the resulting yield, this relationship has been leveraged to identify pixels that are individually related to corn, wheat, cotton, and soybean yield. Those that have the strongest relationship to a given crop's yield values are most likely to contain fields with that crop. Remote sensing time series data (a new observation every 8 days) was examined for each pixel, which contains information for that pixel's growth curve, peak greenness, and other relevant features. An alternating-projection (AP) technique was used to first estimate the "salience" of each pixel, with respect to the given target (crop yield), and then those estimates were used to build a regression model that relates input data (remote sensing observations) to the target. This is achieved by constructing an exemplar for each crop in each county that is a weighted average of all the pixels within the county; the pixels are weighted according to the salience values. The new regression model estimate then informs the next estimate of the salience values. By iterating between these two steps, the algorithm converges to a stable estimate of both the salience of each pixel and the regression model. The salience values indicate which pixels are most relevant to each crop under consideration.
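The exemplar construction described in this abstract, a salience-weighted average of a county's pixels, can be sketched as below. Only that single step is shown; the salience update and the regression fit of the alternating procedure are not reproduced. The function name, the two-feature pixels, and the salience values are made up for illustration.

```python
def weighted_exemplar(pixel_features, salience):
    """Salience-weighted average of per-pixel feature vectors.

    pixel_features: list of equal-length feature lists, one per pixel.
    salience: non-negative weight per pixel, in the same order.
    """
    if len(pixel_features) != len(salience):
        raise ValueError("need exactly one salience value per pixel")
    total = sum(salience)
    if total <= 0:
        raise ValueError("salience values must sum to a positive number")
    n_features = len(pixel_features[0])
    return [
        sum(w * px[k] for w, px in zip(salience, pixel_features)) / total
        for k in range(n_features)
    ]


# Two made-up pixels with two features each; the second is three times
# as salient, so the exemplar sits closer to it.
pixels = [[0.2, 0.4], [0.6, 0.8]]
weights = [1.0, 3.0]
exemplar = weighted_exemplar(pixels, weights)
```

One exemplar per crop and county would then serve as the input vector for the regression on that county's yield.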
Sun, Jin; Rutkoski, Jessica E; Poland, Jesse A; Crossa, José; Jannink, Jean-Luc; Sorrells, Mark E
2017-07-01
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat ( L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved by 70%, on average, from multivariate pedigree and genomic models when including secondary traits in both training and test populations. Additionally, (i) predictive abilities slightly varied for MT, RR, or SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than SR and MT models under drought environment. Copyright © 2017 Crop Science Society of America.
Crop status evaluations and yield predictions
NASA Technical Reports Server (NTRS)
Haun, J. R.
1975-01-01
A model was developed for predicting the day on which 50 percent of the wheat crop is planted in North Dakota. This model incorporates location as an independent variable. The Julian date when 50 percent of the crop was planted for the nine divisions of North Dakota for seven years was regressed on the 49 variables through the step-down multiple regression procedure. This procedure begins with all of the independent variables and sequentially removes variables that are below a predetermined level of significance after each step. The prediction equation was tested on daily data. The accuracy of the model is considered satisfactory for finding the historic dates on which to initiate the yield prediction model. Growth prediction models were also developed for spring wheat.
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.
Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W
1992-03-01
The joint durum wheat (Triticum turgidum L. var. 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy for estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by the Additive Main effects and Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment recommended AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.
NASA Astrophysics Data System (ADS)
Yan, B.; Fang, N. F.; Zhang, P. C.; Shi, Z. H.
2013-03-01
Understanding how changes in individual land use types influence the dynamics of streamflow and sediment yield would greatly improve the predictability of the hydrological consequences of land use changes and could thus help stakeholders to make better decisions. Multivariate statistics are commonly used to compare individual land use types to control the dynamics of streamflow or sediment yields. However, one issue with the use of conventional statistical methods to address relationships between land use types and streamflow or sediment yield is multicollinearity. In this study, an integrated approach involving hydrological modelling and partial least squares regression (PLSR) was used to quantify the contributions of changes in individual land use types to changes in streamflow and sediment yield. In a case study, hydrological modelling was conducted using land use maps from four time periods (1978, 1987, 1999, and 2007) for the Upper Du watershed (8973 km2) in China using the Soil and Water Assessment Tool (SWAT). Changes in streamflow and sediment yield across the two simulations conducted using the land use maps from 2007 and 1978 were found to be related to land use changes according to a PLSR, which was used to quantify the effect of this influence at the sub-basin scale. The major land use changes that affected streamflow in the studied catchment areas were related to changes in the farmland, forest and urban areas between 1978 and 2007; the corresponding regression coefficients were 0.232, -0.147 and 1.256, respectively, and the Variable Influence on Projection (VIP) was greater than 1. The dominant first-order factors affecting the changes in sediment yield in our study were farmland (the VIP and regression coefficient were 1.762 and 14.343, respectively) and forest (the VIP and regression coefficient were 1.517 and -7.746, respectively). The PLSR methodology presented in this paper is beneficial and novel, as it partially eliminates the co-dependency of the variables and facilitates a more unbiased view of the contribution of the changes in individual land use types to changes in streamflow and sediment yield. This practicable and simple approach could be applied to a variety of other watersheds for which time-sequenced digital land use maps are available.
Comparison of random regression test-day models for Polish Black and White cattle.
Strabel, T; Szyda, J; Ptak, E; Jamrozik, J
2005-10-01
Test-day milk yields of first-lactation Black and White cows were used to select the model for routine genetic evaluation of dairy cattle in Poland. The population of Polish Black and White cows is characterized by small herd size, low level of production, and relatively early peak of lactation. Several random regression models for first-lactation milk yield were initially compared using the "percentage of squared bias" criterion and the correlations between true and predicted breeding values. Models with random herd-test-date effects, fixed age-season and herd-year curves, and random additive genetic and permanent environmental curves (Legendre polynomials of different orders were used for all regressions) were chosen for further studies. Additional comparisons included analyses of the residuals and shapes of variance curves in days in milk. The low production level and early peak of lactation of the breed required the use of Legendre polynomials of order 5 to describe age-season lactation curves. For the other curves, Legendre polynomials of order 3 satisfactorily described daily milk yield variation. Fitting third-order polynomials for the permanent environmental effect made it possible to adequately account for heterogeneous residual variance at different stages of lactation.
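Random regression test-day models of this kind use Legendre polynomials of standardized days in milk as covariates. The sketch below shows one common way to build them and is not taken from the paper: the lactation range of 5 to 305 days and the function names are assumptions, the polynomials are the unnormalized ones from Bonnet's recurrence (animal breeding software often rescales each by sqrt((2n + 1) / 2)), and "order" conventions differ between papers (highest degree versus number of coefficients).

```python
def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk onto [-1, 1], the domain of Legendre polynomials.

    The 5 to 305 day range is an assumed lactation window.
    """
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)


def legendre_values(t, max_degree):
    """Return [P_0(t), ..., P_max_degree(t)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t).
    """
    values = [1.0]
    if max_degree >= 1:
        values.append(t)
    for n in range(1, max_degree):
        next_value = ((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1)
        values.append(next_value)
    return values


def legendre_covariates(dim, max_degree, dim_min=5, dim_max=305):
    """Covariate row for one test-day record at the given days in milk."""
    return legendre_values(standardize_dim(dim, dim_min, dim_max), max_degree)


# Covariates up to degree 3 for a record taken on day 230 of lactation.
row = legendre_covariates(230, 3)
```

Each animal's genetic and permanent environmental curves are then linear combinations of these covariates, with the combination weights treated as random regression coefficients.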
Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan
2016-12-01
The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.
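The way heritability varies across lactation in a random regression model can be sketched as follows. The covariance matrices below are hypothetical placeholders, not estimates from the study, and the sketch assumes a single constant residual variance and equally sized genetic and permanent environmental matrices. The variance at a given stage is phi' K phi, where phi holds the normalized Legendre covariates for that day.

```python
import numpy as np
from numpy.polynomial import legendre

def heritability_curve(dim, K_a, K_pe, var_e, dim_min=5.0, dim_max=305.0):
    """Heritability at each day in milk from random regression (co)variances."""
    dim = np.asarray(dim, dtype=float)
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
    k = K_a.shape[0]                       # number of regression coefficients
    phi = legendre.legvander(x, k - 1) * np.sqrt((2 * np.arange(k) + 1) / 2.0)
    var_a = np.einsum("ij,jk,ik->i", phi, K_a, phi)    # additive genetic variance
    var_pe = np.einsum("ij,jk,ik->i", phi, K_pe, phi)  # permanent environmental variance
    return var_a / (var_a + var_pe + var_e)

# Hypothetical 2 x 2 covariance matrices (intercept and linear term only).
K_a = np.array([[2.0, 0.3], [0.3, 0.5]])
K_pe = np.array([[3.0, -0.2], [-0.2, 0.8]])
h2 = heritability_curve([5, 155, 305], K_a, K_pe, var_e=4.0)
print(h2)
```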
NASA Astrophysics Data System (ADS)
Jayakumar, M.; Rajavel, M.; Surendran, U.
2016-12-01
A study on the variability of coffee yield of both Coffea arabica and Coffea canephora as influenced by climate parameters (rainfall (RF), maximum temperature (Tmax), minimum temperature (Tmin), and mean relative humidity (RH)) was undertaken at Regional Coffee Research Station, Chundale, Wayanad, Kerala State, India. Analysis of 30 years of coffee yield data (1980 to 2009) revealed that coffee yield fluctuates with variations in climatic parameters. Among the species, productivity was higher for C. canephora than for C. arabica in most years. The maximum yield of C. canephora (2040 kg ha⁻¹) was recorded in 2003-2004, and a declining yield trend was noticed in recent years. Similarly, the maximum yield of C. arabica (1745 kg ha⁻¹) was recorded in 1988-1989, and decreased yield was noticed in the subsequent years until 1997-1998 due to year-to-year variability in climate. The highest correlation coefficient was found between the yield of C. arabica coffee and maximum temperature during January (0.7) and between C. arabica coffee yield and RH during July (0.4). Yield of C. canephora coffee had its highest correlation with maximum temperature, RH and rainfall during February. A statistical regression model between selected climatic parameters and the yield of C. arabica and C. canephora coffee was developed to forecast coffee yield in Wayanad district in Kerala. The model was validated for the years 2010, 2011, and 2012 against the coffee yield data obtained in those years, and the prediction was found to be good.
A spectral-spatial-dynamic hierarchical Bayesian (SSD-HB) model for estimating soybean yield
NASA Astrophysics Data System (ADS)
Kazama, Yoriko; Kujirai, Toshihiro
2014-10-01
A method called a "spectral-spatial-dynamic hierarchical-Bayesian (SSD-HB) model," which can deal with many parameters (such as spectral and weather information all together) by reducing the occurrence of multicollinearity, is proposed. Experiments conducted on soybean yields in Brazil fields with a RapidEye satellite image indicate that the proposed SSD-HB model can predict soybean yield with a higher degree of accuracy than other estimation methods commonly used in remote-sensing applications. In the case of the SSD-HB model, the mean absolute error between estimated yield of the target area and actual yield is 0.28 t/ha, compared to 0.34 t/ha when conventional PLS regression was applied, showing the potential effectiveness of the proposed model.
USDA-ARS's Scientific Manuscript database
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect s...
Comparison of methods for the analysis of relatively simple mediation models.
Rijnhart, Judith J M; Twisk, Jos W R; Chinapaw, Mai J M; de Boer, Michiel R; Heymans, Martijn W
2017-09-01
Statistical mediation analysis is an often used method in trials, to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least square (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. Secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework. OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction. Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.
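For the crude mediation model with a continuous mediator and outcome, the OLS decomposition can be sketched as below. The data are simulated (not the trial data), and the names exposure, mediator and outcome are placeholders. With OLS on the same sample, the total effect equals the direct effect plus the product-of-coefficients indirect effect.

```python
import numpy as np

def ols_coefficients(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] by least squares."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Simulated data: binary exposure, continuous mediator and outcome.
rng = np.random.default_rng(1)
n = 500
exposure = rng.integers(0, 2, size=n).astype(float)
mediator = 0.5 * exposure + rng.normal(size=n)
outcome = 0.3 * exposure + 0.7 * mediator + rng.normal(size=n)

a = ols_coefficients(mediator, exposure)[1]        # exposure -> mediator
total = ols_coefficients(outcome, exposure)[1]     # total effect (c)
_, direct, b = ols_coefficients(outcome, exposure, mediator)  # c' and b
indirect = a * b                                   # product-of-coefficients
proportion_mediated = indirect / total

print(total, direct + indirect)  # the two values agree up to rounding error
```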
Alexeeff, Stacey E.; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A.
2016-01-01
Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R² yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with greater than 0.9 out-of-sample R² yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the standard errors. Land use regression models performed better in chronic effects simulations. These results can help researchers when interpreting health effect estimates in these types of studies. PMID:24896768
Ran, Tao; Liu, Yong; Li, Hengzhi; Tang, Shaoxun; He, Zhixiong; Munteanu, Cristian R; González-Díaz, Humberto; Tan, Zhiliang; Zhou, Chuanshe
2016-07-27
The management of ruminant growth yield has economic importance. The current work presents a study of the spatiotemporal dynamic expression of Ghrelin and GHR at mRNA levels throughout the gastrointestinal tract (GIT) of kid goats under housing and grazing systems. The experiments show that the feeding system and age affected the expression of either Ghrelin or GHR with different mechanisms. Furthermore, the experimental data are used to build new Machine Learning models based on the Perturbation Theory, which can predict the effects of perturbations of Ghrelin and GHR mRNA expression on the growth yield. The models consider eight longitudinal GIT segments (rumen, abomasum, duodenum, jejunum, ileum, cecum, colon and rectum), seven time points (0, 7, 14, 28, 42, 56 and 70 d) and two feeding systems (Supplemental and Grazing feeding) as perturbations from the expected values of the growth yield. The best regression model was obtained using Random Forest, with a coefficient of determination R² of 0.781 for the test subset. The current results indicate that the non-linear regression model can accurately predict the growth yield and the key nodes during gastrointestinal development, which is helpful for optimizing feeding management strategies in ruminant production systems.
Pereira, R J; Bignardi, A B; El Faro, L; Verneque, R S; Vercesi Filho, A E; Albuquerque, L G
2013-01-01
Studies investigating the use of random regression models for genetic evaluation of milk production in Zebu cattle are scarce. In this study, 59,744 test-day milk yield records from 7,810 first lactations of purebred dairy Gyr (Bos indicus) and crossbred (dairy Gyr × Holstein) cows were used to compare random regression models in which additive genetic and permanent environmental effects were modeled using orthogonal Legendre polynomials or linear spline functions. Residual variances were modeled considering 1, 5, or 10 classes of days in milk. Five classes fitted the changes in residual variances over the lactation adequately and were used for model comparison. The model that fitted linear spline functions with 6 knots provided the lowest sum of residual variances across lactation. On the other hand, according to the deviance information criterion (DIC) and bayesian information criterion (BIC), a model using third-order and fourth-order Legendre polynomials for additive genetic and permanent environmental effects, respectively, provided the best fit. However, the high rank correlation (0.998) between this model and that applying third-order Legendre polynomials for additive genetic and permanent environmental effects, indicates that, in practice, the same bulls would be selected by both models. The last model, which is less parameterized, is a parsimonious option for fitting dairy Gyr breed test-day milk yield records. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Iavarone, Salvatore; Smith, Sean T.; Smith, Philip J.; ...
2017-06-03
Oxy-coal combustion is an emerging low-cost “clean coal” technology for emissions reduction and Carbon Capture and Sequestration (CCS). The use of Computational Fluid Dynamics (CFD) tools is crucial for the development of cost-effective oxy-fuel technologies and the minimization of environmental concerns at industrial scale. The coupling of detailed chemistry models and CFD simulations is still challenging, especially for large-scale plants, because of the high computational efforts required. The development of scale-bridging models is therefore necessary, to find a good compromise between computational efforts and the physical-chemical modeling precision. This paper presents a procedure for scale-bridging modeling of coal devolatilization, in the presence of experimental error, that puts emphasis on the thermodynamic aspect of devolatilization, namely the final volatile yield of coal, rather than kinetics. The procedure consists of an engineering approach based on dataset consistency and Bayesian methodology including Gaussian-Process Regression (GPR). Experimental data from devolatilization tests carried out in an oxy-coal entrained flow reactor were considered and CFD simulations of the reactor were performed. Jointly evaluating experiments and simulations, a novel yield model was validated against the data via consistency analysis. In parallel, a Gaussian-Process Regression was performed, to improve the understanding of the uncertainty associated with the devolatilization, based on the experimental measurements. Potential model forms that could predict yield during devolatilization were obtained. The set of model forms obtained via GPR includes the yield model that was proven to be consistent with the data. Finally, the overall procedure has resulted in a novel yield model for coal devolatilization and in a valuable evaluation of uncertainty in the data, in the model form, and in the model parameters.
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Veerkamp, R F; Koenen, E P; De Jong, G
2001-10-01
Twenty type classifiers scored body condition (BCS) of 91,738 first-parity cows from 601 sires and 5518 maternal grandsires. Fertility data during first lactation were extracted for 177,220 cows, of which 67,278 also had a BCS observation, and first-lactation 305-d milk, fat, and protein yields were added for 180,631 cows. Heritabilities and genetic correlations were estimated using a sire-maternal grandsire model. Heritability of BCS was 0.38. Heritabilities for fertility traits were low (0.01 to 0.07), but genetic standard deviations were substantial: 9 d for days to first service and calving interval, 0.25 for number of services, and 5% for first-service conception. Phenotypic correlations between fertility and yield or BCS were small (-0.15 to 0.20). Genetic correlations between yield and all fertility traits were unfavorable (0.37 to 0.74). Genetic correlations with BCS were between -0.4 and -0.6 for calving interval and days to first service. Random regression analysis (RR) showed that correlations changed with days in milk for BCS. Little agreement was found between variances and correlations from RR and from analyses including a single month (mo 1 to 10) of data for BCS, especially during early and late lactation. However, this was due to excluding data from the conventional analysis, rather than due to the polynomials used. RR and a conventional five-trait model in which BCS in mo 1, 4, 7, and 10 were treated as separate traits (plus yield or fertility) gave similar results. Thus a parsimonious random regression model gave more realistic estimates for the (co)variances than a series of bivariate analyses on subsets of the data for BCS. A higher genetic merit for yield has unfavorable effects on fertility, but the genetic correlation suggests that BCS (at some stages of lactation) might help to alleviate the unfavorable effect of selection for higher yield on fertility.
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using the regression analyses by Bayesian inference applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively; the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21 to 0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that the selection for any trait test day will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be due to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Raman spectroscopy-based screening of hepatitis C and associated molecular changes
NASA Astrophysics Data System (ADS)
Bilal, Maria; Bilal, M.; Saleem, M.; Khan, Saranjam; Ullah, Rahat; Fatima, Kiran; Ahmed, M.; Hayat, Abbas; Shahzada, Shaista; Ullah Khan, Ehsan
2017-09-01
This study presents the optical screening of hepatitis C and its associated molecular changes in human blood sera using a partial least-squares regression model based on their Raman spectra. In total, 152 samples were tested through enzyme-linked immunosorbent assay for confirmation. This model utilizes minor spectral variations in the Raman spectra of the positive and control groups. Regression coefficients of this model were analyzed with reference to the variations in concentration of associated molecules in these two groups. It was found that trehalose, chitin, ammonia, and cytokines are positively correlated while lipids, beta structures of proteins, and carbohydrate-binding proteins are negatively correlated with hepatitis C. The regression vector yielded by this model is utilized to predict hepatitis C in unknown samples. This model has been evaluated by a cross-validation method, which yielded a correlation coefficient of 0.91. Moreover, 30 unknown samples were screened for hepatitis C infection using this model to test its performance. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve from these predictions were found to be 93.3%, 100%, 96.7%, and 1, respectively.
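The reported screening figures follow from a standard confusion-matrix calculation, sketched below. The abstract does not state how the 30 unknown samples were split, so the counts used here (15 positive and 15 negative samples, with one missed positive) are an assumption chosen only because they reproduce the reported percentages.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Assumed counts (not given in the abstract): 14 of 15 positives detected,
# all 15 negatives correctly rejected.
sens, spec, acc = screening_metrics(tp=14, fn=1, tn=15, fp=0)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints "93.3% 100.0% 96.7%"
```

The area under the ROC curve is computed from the continuous model output rather than from one threshold, so it cannot be recovered from these counts alone.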
Human impact on sediment fluxes within the Blue Nile and Atbara River basins
NASA Astrophysics Data System (ADS)
Balthazar, Vincent; Vanacker, Veerle; Girma, Atkilt; Poesen, Jean; Golla, Semunesh
2013-01-01
A regional assessment of the spatial variability in sediment yields allows filling the gap between detailed, process-based understanding of erosion at field scale and empirical sediment flux models at global scale. In this paper, we focus on the intrabasin variability in sediment yield within the Blue Nile and Atbara basins as biophysical and anthropogenic factors are presumably acting together to accelerate soil erosion. The Blue Nile and Atbara River systems are characterized by an important spatial variability in sediment fluxes, with area-specific sediment yield (SSY) values ranging between 4 and 4935 t/km²/y. Statistical analyses show that 41% of the observed variation in SSY can be explained by remote sensing proxy data of surface vegetation cover, rainfall intensity, mean annual temperature, and human impact. The comparison of a locally adapted regression model with global predictive sediment flux models indicates that global flux models such as the ART and BQART models are less suited to capture the spatial variability in SSY, but they are very efficient at predicting absolute sediment yields (SY). We developed a modified version of the BQART model that estimates the human influence on sediment yield based on a high resolution composite measure of local human impact (human footprint index) instead of countrywide estimates of GNP/capita. Our modified version of the BQART is able to explain 80% of the observed variation in SY for the Blue Nile and Atbara basins and thereby performs only slightly worse than locally adapted regression models.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
NASA Technical Reports Server (NTRS)
McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Ren, Jianqiang; Chen, Zhongxin; Tang, Huajun
2006-12-01
Taking Jining City of Shandong Province, one of the most important winter wheat production regions in Huanghuaihai Plain, as an example, the winter wheat yield was estimated by using the 250 m MODIS-NDVI data smoothed by Savitzky-Golay filter. The NDVI values between 0.20 and 0.80 were selected, and the sum of NDVI value for each county was calculated to build its relation with winter wheat yield. By using stepwise regression method, the linear regression model between NDVI and winter wheat yield was established, with the precision validated by the ground survey data. The results showed that the relative error of predicted yield was between -3.6% and 3.9%, suggesting that the method was relatively accurate and feasible.
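The thresholded NDVI sum and the linear yield relation described above can be sketched as follows. This is an illustration under several assumptions: the NDVI values are taken as already smoothed, the 0.20 and 0.80 bounds are treated as inclusive, a plain least-squares line stands in for the stepwise regression, and the county NDVI values and yields are invented for demonstration.

```python
import numpy as np

def ndvi_sum(ndvi, low=0.20, high=0.80):
    """Sum the NDVI values that fall within [low, high]."""
    ndvi = np.asarray(ndvi, dtype=float)
    keep = (ndvi >= low) & (ndvi <= high)
    return float(ndvi[keep].sum())

# Hypothetical smoothed NDVI samples for four counties.
county_ndvi = [
    [0.15, 0.30, 0.60, 0.70],
    [0.25, 0.50, 0.82, 0.40],
    [0.20, 0.80, 0.75, 0.65],
    [0.35, 0.45, 0.10, 0.55],
]
yields = np.array([5.1, 4.4, 6.3, 4.7])  # hypothetical yields, t/ha

sums = np.array([ndvi_sum(values) for values in county_ndvi])
slope, intercept = np.polyfit(sums, yields, 1)   # yield = slope * sum + intercept
predicted = slope * sums + intercept
relative_error = (predicted - yields) / yields   # same measure as reported above
print(relative_error)
```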
Green lumber grade yields from factory grade logs of three oak species
Daniel A. Yaussy
1986-01-01
Multivariate regression models were developed to predict green board foot yields for the seven common factory lumber grades processed from white, black, and chestnut oak factory grade logs. These models use the standard log measurements of grade, scaling diameter, log length, and proportion of scaling defect. Any combination of lumber grades (such as 1 Common and...
Daniel A. Yaussy
1989-01-01
Multivariate regression models were developed to predict green board-foot yields (1 board ft. = 2.360 dm³) for the standard factory lumber grades processed from black cherry (Prunus serotina Ehrh.) and red maple (Acer rubrum L.) factory grade logs sawed at band and circular sawmills. The models use log...
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
Hoffman, Haydn; Lee, Sunghoon I; Garst, Jordan H; Lu, Derek S; Li, Charles H; Nagasawa, Daniel T; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C
2015-09-01
This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R²) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R² = 0.452; MAD = 0.0887; p = 1.17 × 10⁻³). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R² = 0.932; MAD = 0.0283; p = 5.73 × 10⁻¹²). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. Copyright © 2015 Elsevier Ltd. All rights reserved.
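The two agreement measures used to compare the models, the coefficient of determination and the mean absolute difference, can be computed as in the sketch below. The abstract does not say which definition of the coefficient of determination was used; the common 1 - SS_res/SS_tot form is assumed here, and the ODI scores are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot form."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_difference(observed, predicted):
    """Average absolute gap between observed and predicted scores."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical postoperative ODI scores on a 0-1 scale (not study data).
actual = [0.10, 0.24, 0.32, 0.18, 0.40]
model_a = [0.15, 0.20, 0.25, 0.22, 0.30]  # stand-in for one model's output
model_b = [0.11, 0.23, 0.30, 0.19, 0.38]  # stand-in for another model's output

scores = {
    name: (r_squared(actual, pred), mean_absolute_difference(actual, pred))
    for name, pred in (("model_a", model_a), ("model_b", model_b))
}
```

A higher coefficient of determination together with a lower mean absolute difference is the pattern the abstract reports for SVR relative to MLR.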
Bignardi, A B; El Faro, L; Torres Júnior, R A A; Cardoso, V L; Machado, P F; Albuquerque, L G
2011-10-31
We analyzed 152,145 test-day records from 7317 first lactations of Holstein cows recorded from 1995 to 2003. Our objective was to model variations in test-day milk yield during the first lactation of Holstein cows by random regression model (RRM), using various functions in order to obtain adequate and parsimonious models for the estimation of genetic parameters. Test-day milk yields were grouped into weekly classes of days in milk, ranging from 1 to 44 weeks. The contemporary groups were defined as herd-test-day. The analyses were performed using a single-trait RRM, including the direct additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. The mean trend of milk yield was modeled with a fourth-order orthogonal Legendre polynomial. The additive genetic and permanent environmental covariance functions were estimated by random regression on two parametric functions, Ali and Schaeffer and Wilmink, and on B-spline functions of days in milk. The covariance components and the genetic parameters were estimated by the restricted maximum likelihood method. Results from RRM parametric and B-spline functions were compared to RRM on Legendre polynomials and with a multi-trait analysis, using the same data set. Heritability estimates presented similar trends during mid-lactation (13 to 31 weeks) and between week 37 and the end of lactation, for all RRM. Heritabilities obtained by multi-trait analysis were of a lower magnitude than those estimated by RRM. The RRMs with a higher number of parameters were more useful to describe the genetic variation of test-day milk yield throughout the lactation. RRM using B-spline and Legendre polynomials as base functions appears to be the most adequate to describe the covariance structure of the data.
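The Legendre covariates used in random regression models of this kind can be generated as sketched below. Two points are assumptions rather than details from the abstract: "fourth-order" is read as polynomial degree 4, and the polynomials are scaled by sqrt((2k+1)/2), a normalization often used in test-day models. Function names are illustrative.

```python
import math


def standardize(week, first=1, last=44):
    """Map a week in milk from [first, last] onto [-1, 1]."""
    return 2.0 * (week - first) / (last - first) - 1.0


def legendre_covariates(x, order=4):
    """Normalized Legendre polynomials phi_0..phi_order evaluated at x."""
    p = [1.0, x]
    for n in range(1, order):
        # Bonnet recursion: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(order + 1)]


# One row of covariates per weekly class of days in milk (weeks 1 to 44).
design = [legendre_covariates(standardize(w)) for w in range(1, 45)]
```

Each row of `design` would multiply the regression coefficients (fixed for the mean trend, random for the additive genetic and permanent environmental effects) of the corresponding weekly class.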
Yamazaki, T; Hagiya, K; Takeda, H; Osawa, T; Yamaguchi, S; Nagamine, Y
2016-08-01
Pregnancy and calving are elements indispensable for dairy production, but the daily milk yield of cows declines as pregnancy progresses, especially during the late stages. Therefore, the effect of stage of pregnancy on daily milk yield must be clarified to accurately estimate the breeding values and lifetime productivity of cows. To improve the genetic evaluation model for daily milk yield and determine the effect of the timing of pregnancy on productivity, we used a test-day model to assess the effects of stage of pregnancy on variance component estimates, daily milk yields and 305-day milk yield during the first three lactations of Holstein cows. Data were 10 646 333 test-day records for the first lactation; 8 222 661 records for the second; and 5 513 039 records for the third. The data were analyzed within each lactation by using three single-trait random regression animal models: one model that did not account for the stage of pregnancy effect and two models that did. The effect of stage of pregnancy on test-day milk yield was included in the model by applying a regression on days pregnant or fitting a separate lactation curve for each days open (days from calving to pregnancy) class (eight levels). Stage of pregnancy did not affect the heritability estimates of daily milk yield, although the additive genetic and permanent environmental variances in late lactation were decreased by accounting for the stage of pregnancy effect. The effects of days pregnant on daily milk yield during late lactation were larger in the second and third lactations than in the first lactation. The rates of reduction of the 305-day milk yield of cows that conceived fewer than 90 days after the second or third calving were significantly (P<0.05) greater than that after the first calving. Therefore, we conclude that differences between the negative effects of early pregnancy in the first, compared with later, lactations should be included when determining the optimal number of days open to maximize lifetime productivity in dairy cows.
Categorical Variables in Multiple Regression: Some Cautions.
ERIC Educational Resources Information Center
O'Grady, Kevin E.; Medoff, Deborah R.
1988-01-01
Limitations of dummy coding and nonsense coding as methods of coding categorical variables for use as predictors in multiple regression analysis are discussed. The combination of these approaches often yields estimates and tests of significance that are not intended by researchers for inclusion in their models. (SLD)
Comparison of CEAS and Williams-type models for spring wheat yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1982-01-01
The CEAS and Williams-type yield models are both based on multiple regression analysis of historical time series data at CRD level. The CEAS model develops a separate relation for each CRD; the Williams-type model pools CRD data to regional level (groups of similar CRDs). Basic variables considered in the analyses are USDA yield, monthly mean temperature, monthly precipitation, and variables derived from these. The Williams-type model also used soil texture and topographic information. Technological trend is represented in both by piecewise linear functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test of each model (1970-1979) demonstrate that the models are very similar in performance in all respects. Both models are about equally objective, adequate, timely, simple, and inexpensive. Both consider scientific knowledge on a broad scale but not in detail. Neither provides a good current measure of modeled yield reliability. The CEAS model is considered very slightly preferable for AgRISTARS applications.
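A piecewise linear technological trend of the kind mentioned in this abstract is often encoded as one covariate per segment, each counting the years elapsed within that segment. The sketch below shows one such encoding; the start year, knot year, and coefficients are hypothetical, since the abstract does not give them.

```python
def piecewise_trend(year, start, knots):
    """Return one trend covariate per segment.

    Segment i runs from bounds[i] to bounds[i + 1]; its covariate is the
    number of years of that segment elapsed by `year` (never negative).
    """
    bounds = [start] + list(knots)
    terms = []
    for i, lower in enumerate(bounds):
        upper = bounds[i + 1] if i + 1 < len(bounds) else float("inf")
        terms.append(max(0, min(year, upper) - lower))
    return terms


def trend_yield(year, intercept, slopes, start, knots):
    """Trend component of yield: intercept plus slope times years per segment."""
    terms = piecewise_trend(year, start, knots)
    return intercept + sum(s * t for s, t in zip(slopes, terms))


# Hypothetical example: trend starts in 1950 and changes slope in 1960.
example = trend_yield(1975, intercept=10.0, slopes=[0.5, 0.2],
                      start=1950, knots=[1960])
```

In a full yield model these trend covariates would sit alongside the monthly temperature and precipitation variables in the multiple regression.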
NASA Astrophysics Data System (ADS)
Linard, J.; Leib, K.; Colorado Water Science Center
2010-12-01
Elevated levels of salinity and dissolved selenium can detrimentally affect the quality of water where anthropogenic and natural uses are concerned. In areas such as the lower Gunnison Basin of western Colorado, salinity and selenium are such a concern that control projects are implemented to limit their mobilization. To prioritize the locations in which control projects are implemented, multi-parameter regression models were developed to identify subbasins in the lower Gunnison River Basin that were most likely to have elevated salinity and dissolved selenium levels. The drainage area is about 5,900 mi² and is underlain by Cretaceous marine shale, which is the most common source of salinity and dissolved selenium. To characterize the complex hydrologic and chemical processes governing constituent mobilization, geospatial variables representing 70 different environmental characteristics were correlated to mean seasonal (irrigation and nonirrigation seasons) salinity and selenium yields estimated at 154 sampling sites. The variables generally represented characteristics of the physical basin, precipitation, soil, geology, land use, and irrigation water delivery systems. Irrigation and nonirrigation seasons were selected due to documented effects of irrigation on constituent mobilization. Following a stepwise approach, combinations of the geospatial variables were used to develop four multi-parameter regression models. These models predicted salinity and selenium yield, within a 95 percent confidence range, at individual points in the Lower Gunnison Basin for irrigation and non-irrigation seasons. The corresponding subbasins were ranked according to their potential to yield salinity and selenium and rankings were used to prioritize areas that would most benefit from control projects.
Evaluation of the Williams-type model for barley yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1981-01-01
The Williams-type yield model is based on multiple regression analysis of historical time series data at CRD level pooled to regional level (groups of similar CRDs). Basic variables considered in the analysis include USDA yield, monthly mean temperature, monthly precipitation, soil texture and topographic information, and variables derived from these. Technologic trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-1979) demonstrate that biases are small and performance based on root mean square appears to be acceptable for the intended AgRISTARS large area applications. The model is objective, adequate, timely, simple, and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.
Yamazaki, Takeshi; Takeda, Hisato; Hagiya, Koichi; Yamaguchi, Satoshi; Sasaki, Osamu
2018-03-13
Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivities after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficient from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a random regression model. We analyzed test-day milk records from 85690 Holstein cows in their first lactations and 131727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. The first-order Legendre polynomials were practical covariates of random regression for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Milk production after 305 DIM can be predicted by using the RR coefficient estimates of the AG effect during the first 305 DIM.
USDA-ARS's Scientific Manuscript database
A steam distillation extraction kinetics experiment was conducted to estimate essential oil yield, composition, antimalarial, and antioxidant capacity of cumin (Cuminum cyminum L.) seed (fruits). Furthermore, regression models were developed to predict essential oil yield and composition for a given...
Multiple-Instance Regression with Structured Data
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Boomer, Kathleen B; Weller, Donald E; Jordan, Thomas E
2008-01-01
The Universal Soil Loss Equation (USLE) and its derivatives are widely used for identifying watersheds with a high potential for degrading stream water quality. We compared sediment yields estimated from regional application of the USLE, the automated revised RUSLE2, and five sediment delivery ratio algorithms to measured annual average sediment delivery in 78 catchments of the Chesapeake Bay watershed. We did the same comparisons for another 23 catchments monitored by the USGS. Predictions exceeded observed sediment yields by more than 100% and were highly correlated with USLE erosion predictions (Pearson r range, 0.73-0.92; p < 0.001). RUSLE2-erosion estimates were highly correlated with USLE estimates (r = 0.87; p < 0.001), so the method of implementing the USLE model did not change the results. In ranked comparisons between observed and predicted sediment yields, the models failed to identify catchments with higher yields (r range, -0.28-0.00; p > 0.14). In a multiple regression analysis, soil erodibility, log (stream flow), basin shape (topographic relief ratio), the square-root transformed proportion of forest, and occurrence in the Appalachian Plateau province explained 55% of the observed variance in measured suspended sediment loads, but the model performed poorly (r² = 0.06) at predicting loads in the 23 USGS watersheds not used in fitting the model. The use of USLE or multiple regression models to predict sediment yields is not advisable despite their present widespread application. Integrated watershed models based on the USLE may also be unsuitable for making management decisions.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
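The remission criterion stated in this abstract can be written as a small predicate. The sketch below assumes that "a reduction of seven points" means a drop of at least seven points from baseline, which the abstract does not spell out; the scores are invented.

```python
def is_remission(baseline, score, threshold=13, min_drop=7):
    """True if the Y-BOCS score is below the threshold and has dropped enough.

    Assumption: the required reduction is read as "at least min_drop points".
    """
    return score < threshold and (baseline - score) >= min_drop


def remission_flags(baseline, scores):
    """Apply the criterion at every measurement occasion."""
    return [is_remission(baseline, s) for s in scores]


# Hypothetical patient: baseline 26, then scores at successive measurements.
flags = remission_flags(26, [22, 15, 12, 10, 14, 11])
# Remission can be reached, lost (relapse), and reached again, which is the
# kind of sequence a recurrent events model is designed to use in full.
```

A logistic regression would look at such flags at a single chosen time point, whereas the recurrent events approach uses the whole sequence.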
Effect of warming temperatures on US wheat yields.
Tack, Jesse; Barkley, Andrew; Nalley, Lawton Lanier
2015-06-02
Climate change is expected to increase future temperatures, potentially resulting in reduced crop production in many key production regions. Research quantifying the complex relationship between weather variables and wheat yields is rapidly growing, and recent advances have used a variety of model specifications that differ in how temperature data are included in the statistical yield equation. A unique data set that combines Kansas wheat variety field trial outcomes for 1985-2013 with location-specific weather data is used to analyze the effect of weather on wheat yield using regression analysis. Our results indicate that the effect of temperature exposure varies across the September-May growing season. The largest drivers of yield loss are freezing temperatures in the Fall and extreme heat events in the Spring. We also find that the overall effect of warming on yields is negative, even after accounting for the benefits of reduced exposure to freezing temperatures. Our analysis indicates that there exists a tradeoff between average (mean) yield and ability to resist extreme heat across varieties. More-recently released varieties are less able to resist heat than older lines. Our results also indicate that warming effects would be partially offset by increased rainfall in the Spring. Finally, we find that the method used to construct measures of temperature exposure matters for both the predictive performance of the regression model and the forecasted warming impacts on yields.
Poss, J A; Russell, W B; Grieve, C M
2006-01-01
In arid irrigated regions, the proportion of crop production under deficit irrigation with poorer quality water is increasing as demand for fresh water soars and efforts to prevent saline water table development occur. Remote sensing technology to quantify salinity and water stress effects on forage yield can be an important tool to address yield loss potential when deficit irrigating with poor water quality. Two important forages, alfalfa (Medicago sativa L.) and tall wheatgrass (Agropyron elongatum L.), were grown in a volumetric lysimeter facility where rootzone salinity and water content were varied and monitored. Ground-based hyperspectral canopy reflectance in the visible and near infrared (NIR) were related to forage yields from a broad range of salinity and water stress conditions. Canopy reflectance spectra were obtained in the 350- to 1000-nm region from two viewing angles (nadir view, 45 degrees from nadir). Nadir view vegetation indices (VI) were not as strongly correlated with leaf area index changes attributed to water and salinity stress treatments for both alfalfa and wheatgrass. From a list of 71 VIs, two were selected for a multiple linear-regression model that estimated yield under varying salinity and water stress conditions. With data obtained during the second harvest of a three-harvest 100-d growing period, regression coefficients for each crop were developed and then used with the model to estimate fresh weights for preceding and succeeding harvests during the same 100-d interval. The model accounted for 72% of the variation in yields in wheatgrass and 94% in yields of alfalfa within the same salinity and water stress treatment period. The model successfully predicted yield in three out of four cases when applied to the first and third harvest yields. Correlations between indices and yield increased as canopy development progressed. 
Growth reductions attributed to simultaneous salinity and water stress were well characterized, but the corrections for effects of varying tissue nitrogen (N) and very low leaf area index (LAI) are necessary.
Research in the application of spectral data to crop identification and assessment, volume 2
NASA Technical Reports Server (NTRS)
Daughtry, C. S. T. (Principal Investigator); Hixson, M. M.; Bauer, M. E.
1980-01-01
The development of spectrometry crop development stage models is discussed with emphasis on models for corn and soybeans. One photothermal and four thermal meteorological models are evaluated. Spectral data were investigated as a source of information for crop yield models. Intercepted solar radiation and soil productivity are identified as factors related to yield which can be estimated from spectral data. Several techniques for machine classification of remotely sensed data for crop inventory were evaluated. Early season estimation, training procedures, the relationship of scene characteristics to classification performance, and full frame classification methods were studied. The optimal level for combining area and yield estimates of corn and soybeans is assessed utilizing current technology: digital analysis of LANDSAT MSS data on sample segments to provide area estimates and regression models to provide yield estimates.
Using within-day hive weight changes to measure environmental effects on honey bee colonies
USDA-ARS's Scientific Manuscript database
Patterns in within-day hive weight data from two independent datasets in Arizona and California were modeled using piecewise regression, and analyzed with respect to honey bee colony behavior and landscape effects. The regression analysis yielded information on the start and finish of a colony’s dai...
Luukkonen, Carol L.; Holtschlag, David J.; Reeves, Howard W.; Hoard, Christopher J.; Fuller, Lori M.
2015-01-01
Monthly water yields from 105,829 catchments and corresponding flows in 107,691 stream segments were estimated for water years 1951–2012 in the Great Lakes Basin in the United States. Both sets of estimates were computed by using the Analysis of Flows In Networks of CHannels (AFINCH) application within the NHDPlus geospatial data framework. AFINCH provides an environment to develop constrained regression models to integrate monthly streamflow and water-use data with monthly climatic data and fixed basin characteristics data available within NHDPlus or supplied by the user. For this study, the U.S. Great Lakes Basin was partitioned into seven study areas by grouping selected hydrologic subregions and adjoining cataloguing units. This report documents the regression models and data used to estimate monthly water yields and flows in each study area. Estimates of monthly water yields and flows are presented in a Web-based mapper application. Monthly flow time series for individual stream segments can be retrieved from the Web application and used to approximate monthly flow-duration characteristics and to identify possible trends.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
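For readers unfamiliar with the pool-adjacent-violators algorithm, a minimal sketch of the plain weighted version follows. It illustrates only the standard PAVA for a non-decreasing fit with complete data; it is not the empirical likelihood-based estimator proposed in the paper, and the function name is illustrative.

```python
def pava(y, weights=None):
    """Weighted least-squares non-decreasing fit to y (responses ordered by covariate)."""
    if weights is None:
        weights = [1.0] * len(y)
    blocks = []  # each block holds [mean, total weight, number of points]
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Pool adjacent blocks while they violate the monotonicity constraint.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


fit = pava([1, 3, 2, 4])  # the violating pair (3, 2) is pooled to its mean
```

A non-increasing fit can be obtained by negating the responses, running the same routine, and negating the result.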
Harden, Stephen L.; Cuffney, Thomas F.; Terziotti, Silvia; Kolb, Katharine R.
2013-01-01
Data collected between 1997 and 2008 at 48 stream sites were used to characterize relations between watershed settings and stream nutrient yields throughout central and eastern North Carolina. The focus of the investigation was to identify environmental variables in watersheds that influence nutrient export for supporting the development and prioritization of management strategies for restoring nutrient-impaired streams. Nutrient concentration data and streamflow data compiled for the 1997 to 2008 study period were used to compute stream yields of nitrate, total nitrogen (N), and total phosphorus (P) for each study site. Compiled environmental data (including variables for land cover, hydrologic soil groups, base-flow index, streams, wastewater treatment facilities, and concentrated animal feeding operations) were used to characterize the watershed settings for the study sites. Data for the environmental variables were analyzed in combination with the stream nutrient yields to explore relations based on watershed characteristics and to evaluate whether particular variables were useful indicators of watersheds having relatively higher or lower potential for exporting nutrients. Data evaluations included an examination of median annual nutrient yields based on a watershed land-use classification scheme developed as part of the study. An initial examination of the data indicated that the highest median annual nutrient yields occurred at both agricultural and urban sites, especially for urban sites having large percentages of point-source flow contributions to the streams. The results of statistical testing identified significant differences in annual nutrient yields when sites were analyzed on the basis of watershed land-use category. 
When statistical differences in median annual yields were noted, the results for nitrate, total N, and total P were similar in that highly urbanized watersheds (greater than 30 percent developed land use) and (or) watersheds with greater than 10 percent point-source flow contributions to streamflow had higher yields relative to undeveloped watersheds (having less than 10 and 15 percent developed and agricultural land uses, respectively) and watersheds with relatively low agricultural land use (between 15 and 30 percent). The statistical tests further indicated that the median annual yields for total P were statistically higher for watersheds with high agricultural land use (greater than 30 percent) compared to the undeveloped watersheds and watersheds with low agricultural land use. The total P yields also were higher for watersheds with low urban land use (between 10 and 30 percent developed land) compared to the undeveloped watersheds. The study data indicate that grouping and examining stream nutrient yields based on the land-use classifications used in this report can be useful for characterizing relations between watershed settings and nutrient yields in streams located throughout central and eastern North Carolina. Compiled study data also were analyzed with four regression tree models as a means of determining which watershed environmental variables or combination of variables result in basins that are likely to have high or low nutrient yields. The regression tree analyses indicated that some of the environmental variables examined in this study were useful for predicting yields of nitrate, total N, and total P. When the median annual nutrient yields for all 48 sites were evaluated as a group (Model 1), annual point-source flow yields had the greatest influence on nitrate and total N yields observed in streams, and annual streamflow yields had the greatest influence on yields of total P. 
The Model 1 results indicated that watersheds with higher annual point-source flow yields had higher annual yields of nitrate and total N, and watersheds with higher annual streamflow yields had higher annual yields of total P. When sites with high point-source flows (greater than 10 percent of total streamflow) were excluded from the regression tree analyses (Models 2–4), the percentage of forested land in the watersheds was identified as the primary environmental variable influencing stream yields for both total N and total P. Models 2, 3 and 4 did not identify any watershed environmental variables that could adequately explain the observed variability in the nitrate yields among the set of sites examined by each of these models. The results for Models 2, 3, and 4 indicated that watersheds with higher percentages of forested land had lower annual total N and total P yields compared to watersheds with lower percentages of forested land, which had higher median annual total N and total P yields. Additional environmental variables determined to further influence the stream nutrient yields included median annual percentage of point-source flow contributions to the streams, variables of land cover (percentage of forested land, agricultural land, and (or) forested land plus wetlands) in the watershed and (or) in the stream buffer, and drainage area. The regression tree models can serve as a tool for relating differences in select watershed attributes to differences in stream yields of nitrate, total N, and total P, which can provide beneficial information for improving nutrient management in streams throughout North Carolina and for reducing nutrient loads to coastal waters.
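The core step of a regression tree, choosing the threshold on one environmental variable that best separates high-yield from low-yield sites, can be sketched as below. This is a single-split illustration with made-up numbers, not the models fitted in the report; real regression trees repeat the search over many variables and recurse on each side.

```python
def best_split(x, y):
    """Find the threshold on x minimizing the summed squared error of y
    around the group means on each side of the threshold."""

    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical sites: percent forested land and annual total N yield.
forest_pct = [10, 20, 60, 80]
total_n_yield = [8.0, 7.0, 2.0, 1.0]
threshold, cost = best_split(forest_pct, total_n_yield)
```

With these invented values the split falls between the low-forest and high-forest sites, mirroring the direction of the relation reported for total N and total P.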
Similar Estimates of Temperature Impacts on Global Wheat Yield by Three Independent Methods
NASA Technical Reports Server (NTRS)
Liu, Bing; Asseng, Senthold; Muller, Christoph; Ewart, Frank; Elliott, Joshua; Lobell, David B.; Martre, Pierre; Ruane, Alex C.; Wallach, Daniel; Jones, James W.;
2016-01-01
The potential impact of global temperature change on global crop yield has recently been assessed with different methods. Here we show that grid-based and point-based simulations and statistical regressions (from historic records), without deliberate adaptation or CO2 fertilization effects, produce similar estimates of temperature impact on wheat yields at global and national scales. With a 1 °C global temperature increase, global wheat yield is projected to decline between 4.1% and 6.4%. Projected relative temperature impacts from different methods were similar for major wheat-producing countries China, India, USA and France, but less so for Russia. Point-based and grid-based simulations, and to some extent the statistical regressions, were consistent in projecting that warmer regions are likely to suffer more yield loss with increasing temperature than cooler regions. By forming a multi-method ensemble, it was possible to quantify 'method uncertainty' in addition to model uncertainty. This significantly improves confidence in estimates of climate impacts on global food security.
Similar estimates of temperature impacts on global wheat yield by three independent methods
NASA Astrophysics Data System (ADS)
Liu, Bing; Asseng, Senthold; Müller, Christoph; Ewert, Frank; Elliott, Joshua; Lobell, David B.; Martre, Pierre; Ruane, Alex C.; Wallach, Daniel; Jones, James W.; Rosenzweig, Cynthia; Aggarwal, Pramod K.; Alderman, Phillip D.; Anothai, Jakarat; Basso, Bruno; Biernath, Christian; Cammarano, Davide; Challinor, Andy; Deryng, Delphine; Sanctis, Giacomo De; Doltra, Jordi; Fereres, Elias; Folberth, Christian; Garcia-Vila, Margarita; Gayler, Sebastian; Hoogenboom, Gerrit; Hunt, Leslie A.; Izaurralde, Roberto C.; Jabloun, Mohamed; Jones, Curtis D.; Kersebaum, Kurt C.; Kimball, Bruce A.; Koehler, Ann-Kristin; Kumar, Soora Naresh; Nendel, Claas; O'Leary, Garry J.; Olesen, Jørgen E.; Ottman, Michael J.; Palosuo, Taru; Prasad, P. V. Vara; Priesack, Eckart; Pugh, Thomas A. M.; Reynolds, Matthew; Rezaei, Ehsan E.; Rötter, Reimund P.; Schmid, Erwin; Semenov, Mikhail A.; Shcherbak, Iurii; Stehfest, Elke; Stöckle, Claudio O.; Stratonovitch, Pierre; Streck, Thilo; Supit, Iwan; Tao, Fulu; Thorburn, Peter; Waha, Katharina; Wall, Gerard W.; Wang, Enli; White, Jeffrey W.; Wolf, Joost; Zhao, Zhigan; Zhu, Yan
2016-12-01
The potential impact of global temperature change on global crop yield has recently been assessed with different methods. Here we show that grid-based and point-based simulations and statistical regressions (from historic records), without deliberate adaptation or CO2 fertilization effects, produce similar estimates of temperature impact on wheat yields at global and national scales. With a 1 °C global temperature increase, global wheat yield is projected to decline between 4.1% and 6.4%. Projected relative temperature impacts from different methods were similar for major wheat-producing countries China, India, USA and France, but less so for Russia. Point-based and grid-based simulations, and to some extent the statistical regressions, were consistent in projecting that warmer regions are likely to suffer more yield loss with increasing temperature than cooler regions. By forming a multi-method ensemble, it was possible to quantify 'method uncertainty' in addition to model uncertainty. This significantly improves confidence in estimates of climate impacts on global food security.
NASA Astrophysics Data System (ADS)
Molina, Armando; Govers, Gerard; Poesen, Jean; Van Hemelryck, Hendrik; De Bièvre, Bert; Vanacker, Veerle
2008-06-01
A large spatial variability in sediment yield was observed from small streams in the Ecuadorian Andes. The objective of this study was to analyze the environmental factors controlling these variations in sediment yield in the Paute basin, Ecuador. Sediment yield data were calculated based on sediment volumes accumulated behind checkdams for 37 small catchments. Mean annual specific sediment yield (SSY) shows a large spatial variability and ranges between 26 and 15,100 Mg km⁻² year⁻¹. Mean vegetation cover (C, fraction) in the catchment, i.e. the plant cover at or near the surface, exerts a first order control on sediment yield. The fractional vegetation cover alone explains 57% of the observed variance in ln(SSY). The negative exponential relation (SSY = a × e^(−bC)) which was found between vegetation cover and sediment yield at the catchment scale (10³–10⁹ m²), is very similar to the equations derived from splash, interrill and rill erosion experiments at the plot scale (1–10³ m²). This affirms the general character of an exponential decrease of sediment yield with increasing vegetation cover at a wide range of spatial scales, provided the distribution of cover can be considered to be essentially random. Lithology also significantly affects the sediment yield, and explains an additional 23% of the observed variance in ln(SSY). Based on these two catchment parameters, a multiple regression model was built. This empirical regression model already explains more than 75% of the total variance in the mean annual sediment yield. These results highlight the large potential of revegetation programs for controlling sediment yield. They show that a slight increase in the overall fractional vegetation cover of degraded land is likely to have a large effect on sediment production and delivery. Moreover, they point to the importance of detailed surface vegetation data for predicting and modeling sediment production rates.
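The negative exponential relation in this abstract becomes linear after a log transform, ln(SSY) = ln(a) − b·C, so a and b can be estimated by ordinary least squares on ln(SSY). The following is a minimal sketch of that idea, not the authors' procedure: the function name `fit_exponential_decay` and the cover and yield values are hypothetical and are not data from the study.

```python
import math

def fit_exponential_decay(cover, ssy):
    """Fit SSY = a * exp(-b * C) by least squares on ln(SSY).

    Returns (a, b, r_squared); r_squared refers to ln(SSY), not SSY.
    """
    y = [math.log(v) for v in ssy]
    n = len(cover)
    mean_c = sum(cover) / n
    mean_y = sum(y) / n
    # Slope and intercept of the straight line ln(SSY) = intercept + slope * C
    sxx = sum((c - mean_c) ** 2 for c in cover)
    sxy = sum((c - mean_c) * (v - mean_y) for c, v in zip(cover, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_c
    # Share of the variance in ln(SSY) explained by vegetation cover
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - (intercept + slope * c)) ** 2 for c, v in zip(cover, y))
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(intercept), -slope, r_squared

# Hypothetical, noise-free illustration (a = 5000, b = 6), not study data
cover = [0.1, 0.3, 0.5, 0.7, 0.9]
ssy = [5000.0 * math.exp(-6.0 * c) for c in cover]
a, b, r_squared = fit_exponential_decay(cover, ssy)
print(a, b, r_squared)
```

With real catchment data the returned R² would correspond to the "variance in ln(SSY)" figure quoted in the abstract, since the fit is made on the log scale.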
Canaza-Cayo, Ali William; Lopes, Paulo Sávio; da Silva, Marcos Vinicius Gualberto Barbosa; de Almeida Torres, Robledo; Martins, Marta Fonseca; Arbex, Wagner Antonio; Cobuci, Jaime Araujo
2015-01-01
A total of 32,817 test-day milk yield (TDMY) records of the first lactation of 4,056 Girolando cows, daughters of 276 sires, collected from 118 herds between 2000 and 2011, were utilized to estimate the genetic parameters for TDMY via random regression models (RRM) using Legendre polynomial functions whose orders varied from 3 to 5. In addition, nine measures of persistency in milk yield (PSi) and the genetic trend of 305-day milk yield (305MY) were evaluated. The fit quality criteria used indicated the RRM employing Legendre polynomials of orders 3 and 5 for fitting the genetic additive and permanent environment effects, respectively, as the best model. The heritability and genetic correlation for TDMY throughout the lactation, obtained with the best model, varied from 0.18 to 0.23 and from −0.03 to 1.00, respectively. The heritability and genetic correlation for persistency and 305MY varied from 0.10 to 0.33 and from −0.98 to 1.00, respectively. The use of PS7 would be the most suitable option for the evaluation of Girolando cattle. The estimated breeding values for 305MY of sires and cows showed significant and positive genetic trends. Thus, the use of selection indices would be indicated in the genetic evaluation of Girolando cattle for both traits. PMID:26323397
Plant, soil, and shadow reflectance components of row crops
NASA Technical Reports Server (NTRS)
Richardson, A. J.; Wiegand, C. L.; Gausman, H. W.; Cuellar, J. A.; Gerbermann, A. H.
1975-01-01
Data from the first Earth Resource Technology Satellite (LANDSAT-1) multispectral scanner (MSS) were used to develop three plant canopy models (Kubelka-Munk (K-M), regression, and combined K-M and regression models) for extracting plant, soil, and shadow reflectance components of cropped fields. The combined model gave the best correlation between MSS data and ground truth, by accounting for essentially all of the reflectance of plants, soil, and shadow between crop rows. The principles presented can be used to better forecast crop yield and to estimate acreage.
Crop weather models of barley and spring wheat yield for agrophysical units in North Dakota
NASA Technical Reports Server (NTRS)
Leduc, S. (Principal Investigator)
1982-01-01
Models based on multiple regression were developed to estimate barley yield and spring wheat yield from weather data for agrophysical units (APU) in North Dakota. The predictor variables are derived from monthly average temperature and monthly total precipitation data at meteorological stations in the cooperative network. The models are similar in form to the previous models developed for Crop Reporting Districts (CRD). The trends and derived variables were the same and the approach to select the significant predictors was similar to that used in developing the CRD models. The APU models show slight improvements in some of the statistics of the models, e.g., explained variation. These models are to be independently evaluated and compared to the previously evaluated CRD models. The comparison will indicate the preferred model area for this application, i.e., APU or CRD.
Modeling individual tree survival
Quang V. Cao
2016-01-01
Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...
Sills, Deborah L; Gossett, James M
2012-04-01
Fourier transform infrared, attenuated total reflectance (FTIR-ATR) spectroscopy, combined with partial least squares (PLS) regression, accurately predicted solubilization of plant cell wall constituents and NaOH consumption through pretreatment, and overall sugar productions from combined pretreatment and enzymatic hydrolysis. PLS regression models were constructed by correlating FTIR spectra of six raw biomasses (two switchgrass cultivars, big bluestem grass, a low-impact, high-diversity mixture of prairie biomasses, mixed hardwood, and corn stover), plus alkali loading in pretreatment, to nine dependent variables: glucose, xylose, lignin, and total solids solubilized in pretreatment; NaOH consumed in pretreatment; and overall glucose and xylose conversions and yields from combined pretreatment and enzymatic hydrolysis. PLS models predicted the dependent variables with the following values of coefficient of determination for cross-validation (Q²): 0.86 for glucose, 0.90 for xylose, 0.79 for lignin, and 0.85 for total solids solubilized in pretreatment; 0.83 for alkali consumption; 0.93 for glucose conversion, 0.94 for xylose conversion, and 0.88 for glucose and xylose yields. The sugar yield models are noteworthy for their ability to predict overall saccharification through combined pretreatment and enzymatic hydrolysis per mass dry untreated solids without a priori knowledge of the composition of solids. All wavenumbers with significant variable-important-for-projection (VIP) scores have been attributed to chemical features of lignocellulose, demonstrating the models were based on real chemical information. These models suggest that PLS regression can be applied to FTIR-ATR spectra of raw biomasses to rapidly predict effects of pretreatment on solids and on subsequent enzymatic hydrolysis. Copyright © 2011 Wiley Periodicals, Inc.
Evaluation of the Williams-type spring wheat model in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Leduc, S. (Principal Investigator)
1982-01-01
The Williams-type model, developed similarly to previous models of C.V.D. Williams, uses monthly temperature and precipitation data as well as soil and topological variables to predict the yield of the spring wheat crop. The models are statistically developed using the regression technique. Eight model characteristics are examined in the evaluation of the model. Evaluation is at the crop reporting district level, the state level and for the entire region. A ten year bootstrap test was the basis of the statistical evaluation. The accuracy and current indication of modeled yield reliability could show improvement. There is great variability in the bias measured over the districts, but there is a slight overall positive bias. The model estimates for the east central crop reporting district in Minnesota are not accurate. The yield estimates for 1974 were inaccurate for all of the models.
"Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A."
R. M. Rice; D. J. Furbish
1984-01-01
The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R²) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative differences in...
NASA Astrophysics Data System (ADS)
Rahayu, A. P.; Hartatik, T.; Purnomoadi, A.; Kurnianto, E.
2018-02-01
The aims of this study were to estimate 305 day first lactation milk yield of Indonesian Holstein cattle from cumulative monthly and bimonthly test day records and to analyze its accuracy. The first lactation records of 258 dairy cows from 2006 to 2014, consisting of 2571 monthly (MTDY) and 1281 bimonthly test day yield (BTDY) records, were used. Milk yields were estimated by the regression method. Correlation coefficients between actual and estimated milk yield by cumulative MTDY were 0.70, 0.78, 0.83, 0.86, 0.89, 0.92, 0.94 and 0.96 for 2-9 months, respectively, while those by cumulative BTDY were 0.69, 0.81, 0.87 and 0.92 for 2, 4, 6 and 8 months, respectively. The accuracy of the fitted regression models (R²) increased with the number of cumulative test days used. The use of 5 cumulative MTDY was considered sufficient for estimating 305 day first lactation milk yield with 80.6% accuracy and 7% error percentage of estimation. The estimated milk yield from MTDY was more accurate than that from BTDY, with 1.1 to 2% less error percentage over the same period.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha
Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data.
Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.
Santana, Mário L; Bignardi, Annaiza Braga; Pereira, Rodrigo Junqueira; Menéndez-Buxadera, Alberto; El Faro, Lenira
2016-02-01
The present study had the following objectives: to compare random regression models (RRM) considering the time-dependent (days in milk, DIM) and/or temperature × humidity-dependent (THI) covariate for genetic evaluation; to identify the effect of genotype by environment interaction (G×E) due to heat stress on milk yield; and to quantify the loss of milk yield due to heat stress across lactation of cows under tropical conditions. A total of 937,771 test-day records from 3603 first lactations of Brazilian Holstein cows obtained between 2007 and 2013 were analyzed. An important reduction in milk yield due to heat stress was observed for THI values above 66 (-0.23 kg/day/THI). Three phases of milk yield loss were identified during lactation, the most damaging one at the end of lactation (-0.27 kg/day/THI). Using the most complex RRM, the additive genetic variance could be altered simultaneously as a function of both DIM and THI values. This model could be recommended for the genetic evaluation taking into account the effect of G×E. The response to selection in the comfort zone (THI ≤ 66) is expected to be higher than that obtained in the heat stress zone (THI > 66) of the animals. The genetic correlations between milk yield in the comfort and heat stress zones were less than unity at opposite extremes of the environmental gradient. Thus, the best animals for milk yield in the comfort zone are not necessarily the best in the zone of heat stress and, therefore, G×E due to heat stress should not be neglected in the genetic evaluation.
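The reported overall slope can be read as a threshold (broken-line) response: no loss up to THI 66, then roughly 0.23 kg/day lost for each THI unit above it. The sketch below only illustrates that arithmetic, assuming the piecewise-linear reading holds over the range of interest; the function name `milk_yield_loss` and the example THI values are hypothetical, and the study's random regression models are considerably richer than this.

```python
def milk_yield_loss(thi, threshold=66.0, slope=0.23):
    """Approximate milk yield loss in kg/day at a given THI.

    Assumes zero loss in the comfort zone (THI <= threshold) and a
    constant loss per THI unit above the threshold.
    """
    return slope * max(0.0, thi - threshold)

# Hypothetical THI values spanning the comfort and heat stress zones
for thi in (60, 66, 72, 78):
    print(thi, round(milk_yield_loss(thi), 2))
```

Passing `slope=0.27` would apply the steeper end-of-lactation figure quoted in the abstract instead of the overall one.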
Panayi, Efstathios; Peters, Gareth W; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields.
Panayi, Efstathios; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254
Remontet, L; Bossard, N; Belot, A; Estève, J
2007-05-10
Relative survival provides a measure of the proportion of patients dying from the disease under study without requiring the knowledge of the cause of death. We propose an overall strategy based on regression models to estimate the relative survival and model the effects of potential prognostic factors. The baseline hazard was modelled until 10 years follow-up using parametric continuous functions. Six models including cubic regression splines were considered and the Akaike Information Criterion was used to select the final model. This approach yielded smooth and reliable estimates of mortality hazard and allowed us to deal with sparse data taking into account all the available information. Splines were also used to model simultaneously non-linear effects of continuous covariates and time-dependent hazard ratios. This led to a graphical representation of the hazard ratio that can be useful for clinical interpretation. Estimates of these models were obtained by likelihood maximization. We showed that these estimates could be also obtained using standard algorithms for Poisson regression. Copyright 2006 John Wiley & Sons, Ltd.
Hamada, Yuki; Ssegane, Herbert; Negri, Maria Cristina
2015-07-31
Biofuels are important alternatives for meeting our future energy needs. Successful bioenergy crop production requires maintaining environmental sustainability and minimum impacts on current net annual food, feed, and fiber production. The objectives of this study were to: (1) determine under-productive areas within an agricultural field in a watershed using single-date, high-resolution remote sensing and (2) examine impacts of growing bioenergy crops in the under-productive areas using hydrologic modeling in order to facilitate sustainable landscape design. Normalized difference indices (NDIs) were computed based on the ratio of all possible two-band combinations using the RapidEye and the National Agricultural Imagery Program images collected in summer 2011. A multiple regression analysis was performed using 10 NDIs and five RapidEye spectral bands. The regression analysis suggested that the red and near infrared bands and NDI using red-edge and near infrared that is known as the red-edge normalized difference vegetation index (RENDVI) had the highest correlation (R² = 0.524) with the reference yield. Although predictive yield map showed striking similarity to the reference yield map, the model had modest correlation; thus, further research is needed to improve predictive capability for absolute yields. Forecasted impact using the Soil and Water Assessment Tool model of growing switchgrass (Panicum virgatum) on under-productive areas based on corn yield thresholds of 3.1, 4.7, and 6.3 Mg·ha⁻¹ showed reduction of tile NO₃-N and sediment exports by 15.9%–25.9% and 25%–39%, respectively. Corresponding reductions in water yields ranged from 0.9% to 2.5%. While further research is warranted, the study demonstrated the integration of remote sensing and hydrologic modeling to quantify the multifunctional value of projected future landscape patterns in a context of sustainable bioenergy crop production.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamada, Yuki; Ssegane, Herbert; Negri, Maria Cristina
Biofuels are important alternatives for meeting our future energy needs. Successful bioenergy crop production requires maintaining environmental sustainability and minimum impacts on current net annual food, feed, and fiber production. The objectives of this study were to: (1) determine under-productive areas within an agricultural field in a watershed using single-date, high-resolution remote sensing and (2) examine impacts of growing bioenergy crops in the under-productive areas using hydrologic modeling in order to facilitate sustainable landscape design. Normalized difference indices (NDIs) were computed based on the ratio of all possible two-band combinations using the RapidEye and the National Agricultural Imagery Program images collected in summer 2011. A multiple regression analysis was performed using 10 NDIs and five RapidEye spectral bands. The regression analysis suggested that the red and near infrared bands and NDI using red-edge and near infrared that is known as the red-edge normalized difference vegetation index (RENDVI) had the highest correlation (R² = 0.524) with the reference yield. Although predictive yield map showed striking similarity to the reference yield map, the model had modest correlation; thus, further research is needed to improve predictive capability for absolute yields. Forecasted impact using the Soil and Water Assessment Tool model of growing switchgrass (Panicum virgatum) on under-productive areas based on corn yield thresholds of 3.1, 4.7, and 6.3 Mg·ha⁻¹ showed reduction of tile NO₃-N and sediment exports by 15.9%–25.9% and 25%–39%, respectively. Corresponding reductions in water yields ranged from 0.9% to 2.5%. While further research is warranted, the study demonstrated the integration of remote sensing and hydrologic modeling to quantify the multifunctional value of projected future landscape patterns in a context of sustainable bioenergy crop production.
Conduct urban agglomeration with the baton of transportation.
DOT National Transportation Integrated Search
2013-12-01
A key indicator of traffic activity patterns is commuting distance. Shorter commuting distances yield less traffic, fewer emissions, : and lower energy consumption. This study develops a spatial error seemingly unrelated regression model to investiga...
Hopkins, D L; Safari, E; Thompson, J M; Smith, C R
2004-06-01
A wide selection of lamb types of mixed sex (ewes and wethers) were slaughtered at a commercial abattoir and during this process images of 360 carcasses were obtained online using the VIAScan® system developed by Meat and Livestock Australia. Soft tissue depth at the GR site (thickness of tissue over the 12th rib 110 mm from the midline) was measured by an abattoir employee using the AUS-MEAT sheep probe (PGR). Another measure of this thickness was taken in the chiller using a GR knife (NGR). Each carcass was subsequently broken down to a range of trimmed boneless retail cuts and the lean meat yield determined. The current industry model for predicting meat yield uses hot carcass weight (HCW) and tissue depth at the GR site. A low level of accuracy and precision was found when HCW and PGR were used to predict lean meat yield (R² = 0.19, r.s.d. = 2.80%), which could be improved markedly when PGR was replaced by NGR (R² = 0.41, r.s.d. = 2.39%). If the GR measures were replaced by 8 VIAScan® measures then greater prediction accuracy could be achieved (R² = 0.52, r.s.d. = 2.17%). A similar result was achieved when the model was based on principal components (PCs) computed from the 8 VIAScan® measures (R² = 0.52, r.s.d. = 2.17%). The use of PCs also improved the stability of the model compared to a regression model based on HCW and NGR. The transportability of the models was tested by randomly dividing the data set and comparing coefficients and the level of accuracy and precision. Those models based on PCs were superior to those based on regression. It is demonstrated that with the appropriate modeling the VIAScan® system offers a workable method for predicting lean meat yield automatically.
NASA Astrophysics Data System (ADS)
Zhang, J.; Ives, A. R.; Turner, M. G.; Kucharik, C. J.
2017-12-01
Previous studies have identified global agricultural regions where "stagnation" of long-term crop yield increases has occurred. These studies have used a variety of simple statistical methods that often ignore important aspects of time series regression modeling. These methods can lead to differing and contradictory results, which creates uncertainty regarding food security given rapid global population growth. Here, we present a new statistical framework incorporating time series-based algorithms into standard regression models to quantify spatiotemporal yield trends of US maize, soybean, and winter wheat from 1970-2016. Our primary goal was to quantify spatial differences in yield trends for these three crops using USDA county level data. This information was used to identify regions experiencing the largest changes in the rate of yield increases over time, and to determine whether abrupt shifts in the rate of yield increases have occurred. Although crop yields continue to increase in most maize-, soybean-, and winter wheat-growing areas, yield increases have stagnated in some key agricultural regions during the most recent 15 to 16 years: some maize-growing areas, except for the northern Great Plains, have shown a significant trend towards smaller annual yield increases for maize; soybean has maintained consistent long-term yield gains in the northern Great Plains, the Midwest, and southeast US, but has experienced a shift to smaller annual increases in other regions; winter wheat maintained a moderate annual increase in eastern South Dakota and eastern US locations, but showed a decline in the magnitude of annual increases across the central Great Plains and western US regions. Our results suggest that there were abrupt shifts in the rate of annual yield increases in a variety of US regions among the three crops. The framework presented here can be broadly applied to additional yield trend analyses for different crops and regions of the Earth.
NASA Astrophysics Data System (ADS)
Chen, Pengfei; Jing, Qi
2017-02-01
This study proposed and tested the assumption that a non-linear method is more reasonable than a linear method when canopy reflectance is used to establish a yield prediction model. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), representing linear and non-linear analysis methods, respectively, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using the calibration data, cross-validation was applied during PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R², RMSE and RMSE% values of 0.61, 979 kg ha⁻¹, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R², RMSE, and RMSE% values of 0.39, 1211 kg ha⁻¹, and 12.84%, respectively. The non-linear method was therefore suggested as the better method for yield prediction.
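The three test statistics quoted in this abstract can be computed from paired observed and predicted yields. The sketch below makes two assumptions that the abstract does not state: R² is taken as the coefficient of determination, 1 − SS_res/SS_tot, and RMSE% is taken as RMSE relative to the mean observed yield (the reported RMSE and RMSE% pairs are at least consistent with that definition). The function name `evaluate_predictions` and the sample values are hypothetical, not data from the study.

```python
import math

def evaluate_predictions(observed, predicted):
    """Return (r_squared, rmse, rmse_percent) for paired values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares around the observed mean
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    r_squared = 1.0 - ss_res / ss_tot
    rmse_percent = 100.0 * rmse / mean_obs
    return r_squared, rmse, rmse_percent

# Hypothetical test-set yields in kg/ha
observed = [9000.0, 9500.0, 10000.0, 8500.0]
predicted = [9200.0, 9300.0, 9800.0, 8700.0]
r_squared, rmse, rmse_percent = evaluate_predictions(observed, predicted)
print(r_squared, rmse, rmse_percent)
```

If a study instead reports R² as the squared Pearson correlation between observed and predicted values, the first returned value would differ whenever the predictions are biased.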
Learning Models and Real-Time Speech Recognition.
ERIC Educational Resources Information Center
Danforth, Douglas G.; And Others
This report describes the construction and testing of two "psychological" learning models for the purpose of computer recognition of human speech over the telephone. One of the two models was found to be superior in all tests. A regression analysis yielded a 92.3% recognition rate for 14 subjects ranging in age from 6 to 13 years. Tests…
Monitoring interannual variation in global crop yield using long-term AVHRR and MODIS observations
NASA Astrophysics Data System (ADS)
Zhang, Xiaoyang; Zhang, Qingyuan
2016-04-01
Advanced Very High Resolution Radiometer (AVHRR) and Moderate Resolution Imaging Spectroradiometer (MODIS) data have been extensively applied for crop yield prediction because of their daily temporal resolution and global coverage. This study investigated global crop yield using the daily two-band Enhanced Vegetation Index (EVI2) derived from AVHRR (1981-1999) and MODIS (2000-2013) observations at a spatial resolution of 0.05° (∼5 km). Specifically, the EVI2 temporal trajectory of crop growth was simulated using a hybrid piecewise logistic model (HPLM) for individual pixels, which was used to detect crop phenological metrics. The derived crop phenology was then applied to calculate crop greenness, defined as EVI2 amplitude and EVI2 integration during annual crop growing seasons, which was further aggregated for croplands in each country. The interannual variations in EVI2 amplitude and EVI2 integration were combined and correlated with the variation in cereal yield from 1982-2012 for individual countries using a stepwise regression model. The results show that the confidence level of the established regression models was higher than 90% (P value < 0.1) in most countries in the northern hemisphere although it was relatively poor in the southern hemisphere (mainly in Africa). The error in the yield prediction was relatively smaller in America, Europe and East Asia than that in Africa. In the 10 countries with the largest cereal production across the world, the prediction error was less than 9% during the past three decades. This suggests that crop phenology-controlled greenness from coarse resolution satellite data has the capability of predicting national crop yield across the world, which could provide timely and reliable crop information for global agricultural trade and policymakers.
NASA Astrophysics Data System (ADS)
Jaafar, H. H.; Ahmad, F. A.
2015-04-01
In semi-arid areas within the MENA region, food insecurity is the main problem faced. Remote sensing can be a promising tool to diagnose food shortages early and thus protect the population from famine risks. This study is aimed at examining the possibility of forecasting yield before harvest from remotely sensed MODIS-derived Enhanced Vegetation Index (EVI), Net photosynthesis (net PSN), and Gross Primary Production (GPP) in semi-arid and arid irrigated agro-ecosystems within the conflict-affected country of Syria. Relationships between summer yield and remotely sensed indices were derived and analyzed. Simple spatially based regression models were developed to predict summer crop production. These models were validated during conflict years. A significant correlation (p<0.05) was found between summer crop yield and EVI, GPP and net PSN. Results indicate the efficiency of remote-sensing-based models in predicting summer yield, mostly for cotton and vegetables. The cumulative summer EVI-based model can predict summer crop yield during crisis periods, with a deviation of less than 20% where vegetables are the major yield. This approach permits an early assessment of food shortages and can lead to real-time management and decision making, especially in periods of crisis such as wars and drought.
Forecasting space weather: Can new econometric methods improve accuracy?
NASA Astrophysics Data System (ADS)
Reikard, Gordon
2011-06-01
Space weather forecasts are currently used in areas ranging from navigation and communication to electric power system operations. The relevant forecast horizons can range from as little as 24 h to several days. This paper analyzes the predictability of two major space weather measures using new time series methods, many of them derived from econometrics. The data sets are the Ap geomagnetic index and the solar radio flux at 10.7 cm. The methods tested include nonlinear regressions, neural networks, frequency domain algorithms, GARCH models (which utilize the residual variance), state transition models, and models that combine elements of several techniques. While combined models are complex, they can be programmed using modern statistical software. The data frequency is daily, and forecasting experiments are run over horizons ranging from 1 to 7 days. Two major conclusions stand out. First, the frequency domain method forecasts the Ap index more accurately than any time domain model, including both regressions and neural networks. This finding is very robust, and holds for all forecast horizons. Combining the frequency domain method with other techniques yields a further small improvement in accuracy. Second, the neural network forecasts the solar flux more accurately than any other method, although at short horizons (2 days or less) the regression and net yield similar results. The neural net does best when it includes measures of the long-term component in the data.
Yobbi, D.K.
2000-01-01
A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
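To make the idea of multiple linear regression concrete, here is a minimal sketch that fits an intercept and several slopes by solving the normal equations (X'X)b = X'y with Gauss-Jordan elimination. It is not taken from the article; the function name and the toy data are hypothetical, and a production analysis would normally use a statistics package that also reports confidence intervals.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... via the normal equations.

    X is a list of predictor tuples, y a list of outcomes.
    Returns [b0, b1, ...]. Perfectly collinear predictors make X'X singular
    and raise ZeroDivisionError, which is the extreme case of multicollinearity.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                factor = A[k][c] / A[c][c]
                A[k] = [a - factor * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (made-up numbers).
coefficients = fit_mlr([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)], [1, 3, -2, 0, 2])
```

Because the toy outcome is an exact linear function of the two predictors, the fitted coefficients recover the generating values up to rounding.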
NASA Astrophysics Data System (ADS)
Riddin, T. L.; Gericke, M.; Whiteley, C. G.
2006-07-01
Fusarium oxysporum fungal strain was screened and found to be successful for the inter- and extracellular production of platinum nanoparticles. Nanoparticle formation was visually observed, over time, by the colour of the extracellular solution and/or the fungal biomass turning from yellow to dark brown, and their concentration was determined from the amount of residual hexachloroplatinic acid measured from a standard curve at 456 nm. The extracellular nanoparticles were characterized by transmission electron microscopy. Nanoparticles of varying size (10-100 nm) and shape (hexagons, pentagons, circles, squares, rectangles) were produced at both extracellular and intercellular levels by the Fusarium oxysporum. The particles precipitate out of solution and bioaccumulate by nucleation either intercellularly, on the cell wall/membrane, or extracellularly in the surrounding medium. The importance of pH, temperature and hexachloroplatinic acid (H2PtCl6) concentration in nanoparticle formation was examined through the use of a statistical response surface methodology. Only the extracellular production of nanoparticles proved to be statistically significant, with a concentration yield of 4.85 mg l⁻¹ estimated by a first-order regression model. From a second-order polynomial regression, the predicted yield of nanoparticles increased to 5.66 mg l⁻¹ and, after a backward step, regression gave a final model with a yield of 6.59 mg l⁻¹.
Santellano-Estrada, E; Becerril-Pérez, C M; de Alba, J; Chang, Y M; Gianola, D; Torres-Hernández, G; Ramírez-Valverde, R
2008-11-01
This study inferred genetic and permanent environmental variation of milk yield in Tropical Milking Criollo cattle and compared 5 random regression test-day models using Wilmink's function and Legendre polynomials. Data consisted of 15,377 test-day records from 467 Tropical Milking Criollo cows that calved between 1974 and 2006 in the tropical lowlands of the Gulf Coast of Mexico and in southern Nicaragua. Estimated heritabilities of test-day milk yields ranged from 0.18 to 0.45, and repeatabilities ranged from 0.35 to 0.68 for the period spanning from 6 to 400 d in milk. Genetic correlation between days in milk 10 and 400 was around 0.50 but greater than 0.90 for most pairs of test days. The model that used first-order Legendre polynomials for additive genetic effects and second-order Legendre polynomials for permanent environmental effects gave the smallest residual variance and was also favored by the Akaike information criterion and likelihood ratio tests.
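Random regression test-day models of this kind typically use Legendre polynomials of standardized days in milk as covariates. The sketch below shows that covariate construction only, assuming the 6 to 400 d range mentioned in the abstract and unnormalized polynomials up to second order; the function name is hypothetical, many implementations apply an additional normalization constant, and the variance component estimation itself is not shown.

```python
def legendre_covariates(dim, dim_min=6, dim_max=400):
    """Legendre covariates P0, P1, P2 for a given day in milk (DIM).

    DIM is first mapped linearly onto [-1, 1], the interval on which
    Legendre polynomials are orthogonal. Unnormalized forms are used here.
    """
    t = -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)
    return [1.0, t, 0.5 * (3.0 * t * t - 1.0)]


# Covariates at the start, midpoint, and end of the assumed lactation window.
start = legendre_covariates(6)
middle = legendre_covariates(203)
end = legendre_covariates(400)
```

Under the favored model in the abstract, an animal's additive genetic effect would use the first two covariates and its permanent environmental effect all three, assuming "order" refers to the highest polynomial degree.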
Random Forests for Global and Regional Crop Yield Predictions.
Jeong, Jig Han; Resop, Jonathan P; Mueller, Nathaniel D; Fleisher, David H; Yun, Kyungdahm; Butler, Ethan E; Timlin, Dennis J; Shim, Kyo-Moon; Gerber, James S; Reddy, Vangimalla R; Kim, Soo-Hyung
2016-01-01
Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data.
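The comparison in this abstract is expressed as RMSE relative to the average observed yield. A minimal sketch of that statistic follows; the function name and the example numbers are made up, and the study's exact cross-validation setup is not reproduced.

```python
import math


def rmse_percent_of_mean(observed, predicted):
    """RMSE expressed as a percentage of the mean observed value."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean_observed = sum(observed) / n
    return 100.0 * math.sqrt(mse) / mean_observed


# Made-up yields (t/ha): every prediction is off by 1 around a mean of 10.
relative_error = rmse_percent_of_mean([10.0, 10.0], [9.0, 11.0])
```

Normalizing by the mean makes errors comparable across crops and regions with very different yield levels.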
Estimating standard errors in feature network models.
Frank, Laurence E; Heiser, Willem J
2007-05-01
Feature network models are graphical structures that represent proximity data in a discrete space while using the same formalism that is the basis of least squares methods employed in multidimensional scaling. Existing methods to derive a network model from empirical data only give the best-fitting network and yield no standard errors for the parameter estimates. The additivity properties of networks make it possible to consider the model as a univariate (multiple) linear regression problem with positivity restrictions on the parameters. In the present study, both theoretical and empirical standard errors are obtained for the constrained regression parameters of a network model with known features. The performance of both types of standard error is evaluated using Monte Carlo techniques.
Cao, Xueren; Luo, Yong; Zhou, Yilin; Fan, Jieru; Xu, Xiangming; West, Jonathan S.; Duan, Xiayu; Cheng, Dengfa
2015-01-01
To determine the influence of plant density and powdery mildew infection of winter wheat and to predict grain yield, hyperspectral canopy reflectance of winter wheat was measured for two plant densities at Feekes growth stage (GS) 10.5.3, 10.5.4, and 11.1 in the 2009–2010 and 2010–2011 seasons. Reflectance in near infrared (NIR) regions was significantly correlated with disease index at GS 10.5.3, 10.5.4, and 11.1 at two plant densities in both seasons. For the two plant densities, the area of the red edge peak (Σdr 680–760 nm), difference vegetation index (DVI), and triangular vegetation index (TVI) were significantly correlated negatively with disease index at three GSs in two seasons. Compared with other parameters, Σdr 680–760 nm was the most sensitive parameter for detecting powdery mildew. Linear regression models relating mildew severity to Σdr 680–760 nm were constructed at three GSs in two seasons for the two plant densities, demonstrating no significant difference in the slope estimates between the two plant densities at three GSs. Σdr 680–760 nm was correlated with grain yield at three GSs in two seasons. The accuracies of partial least square regression (PLSR) models were consistently higher than those of models based on Σdr 680–760 nm for disease index and grain yield. PLSR can, therefore, provide more accurate estimation of disease index of wheat powdery mildew and grain yield using canopy reflectance. PMID:25815468
Ding, Changfeng; Li, Xiaogang; Zhang, Taolin; Ma, Yibing; Wang, Xingxiang
2014-10-01
Soil environmental quality standards in respect of heavy metals for farmlands should be established considering both their effects on crop yield and their accumulation in the edible part. A greenhouse experiment was conducted to investigate the effects of chromium (Cr) on biomass production and Cr accumulation in carrot plants grown in a wide range of soils. The results revealed that carrot yield significantly decreased in 18 of the total 20 soils when Cr was added at the level of the soil environmental quality standard of China. The Cr content of carrot grown in the five soils with pH > 8.0 exceeded the maximum allowable level (0.5 mg kg⁻¹) according to the Chinese General Standard for Contaminants in Foods. The relationship between carrot Cr concentration and soil pH could be well fitted (R² = 0.70, P < 0.0001) by a linear-linear segmented regression model. The addition of Cr to soil affected carrot yield before it affected food quality. The major soil factors controlling Cr phytotoxicity and the prediction models were further identified and developed using path analysis and stepwise multiple linear regression analysis. Soil Cr thresholds for phytotoxicity that also ensure food safety were then derived on the condition of 10 percent yield reduction. Copyright © 2014 Elsevier Inc. All rights reserved.
Nishiura, Akiko; Sasaki, Osamu; Aihara, Mitsuo; Takeda, Hisato; Satoh, Masahiro
2015-12-01
We estimated the genetic parameters of fat-to-protein ratio (FPR) and the genetic correlations between FPR and milk yield or somatic cell score in the first three lactations in dairy cows. Data included 3,079,517 test-day records of 201,138 Holstein cows in Japan from 2006 to 2011. Genetic parameters were estimated with a multiple-trait random regression model in which the records within and between parities were treated as separate traits. The phenotypic values of FPR increased soon after parturition and peaked at 10 to 20 days in milk, then decreased slowly in mid- and late lactation. Heritability estimates for FPR yielded moderate values. Genetic correlations of FPR among parities were low in early lactation. Genetic correlations between FPR and milk yield were positive and low in early lactation, but only in the first lactation. Genetic correlations between FPR and somatic cell score were positive in early lactation and decreased to become negative in mid- to late lactation. By using these results for genetic evaluation it should be possible to improve energy balance in dairy cows. © 2015 Japanese Society of Animal Science.
[Predicting the impact of climate change in the next 40 years on the yield of maize in China].
Ma, Yu-ping; Sun, Lin-li; E, You-hao; Wu, Wei
2015-01-01
Climate change will significantly affect agricultural production in China. Combining the integral regression model with the latest climate projections may allow the impact of future climate change on crop yield to be assessed. In this paper, a correlation model of maize yield and meteorological factors was first established for different provinces in China using the integral regression method; the impact of climate change over the next 40 years on China's maize production was then evaluated using the latest climate projections, and the reasons were analyzed. The results showed that if the current rates of maize variety improvement and scientific and technological development remained constant, maize yield in China would mainly show a trend of increasing reduction over time in the next 40 years, generally within 5%. Under the A2 climate change scenario, the region with the greatest reduction of maize yield would be the Northeast except during 2021-2030, and the reduction would generally be in the range of 2.3%-4.2%. Maize yield reduction would also be high in the Northwest, the Southwest and the middle and lower reaches of the Yangtze River after 2031. Under the B2 scenario, the reduction of 5.3% in the Northeast in 2031-2040 would be the greatest across all regions. Other regions with considerable maize yield reduction would be mainly the Northwest and the Southwest. Reduction in maize yield in North China would be small, generally within 2%, under either scenario, and yield in South China would be almost unchanged. The reduction of maize yield in most regions would be greater under the A2 scenario than under the B2 scenario except for the period 2021-2030. The effect of ten-day precipitation on maize yield in northern China would be mostly positive. However, the effect of ten-day average temperature on maize yield in all regions would be generally negative. 
The main reason of maize yield reduction was temperature increase in most provinces but precipitation decrease in a few provinces. Assessments of the future change of maize yield in China based on the different methods were not consistent. Further evaluation needs to consider the change of maize variety and scientific and technological progress, and to enhance the reliability of evaluation models.
Use of Vegetation Health Data for Estimation of Aus Rice Yield in Bangladesh
Rahman, Atiqur; Roytman, Leonid; Krakauer, Nir Y.; Nizamuddin, Mohammad; Goldberg, Mitch
2009-01-01
Rice is a vital staple crop for Bangladesh and surrounding countries, with interannual variation in yields depending on climatic conditions. We compared Bangladesh yield of aus rice, one of the main varieties grown, from official agricultural statistics with Vegetation Health (VH) Indices [Vegetation Condition Index (VCI), Temperature Condition Index (TCI) and Vegetation Health Index (VHI)] computed from Advanced Very High Resolution Radiometer (AVHRR) data covering a period of 15 years (1991–2005). A strong correlation was found between aus rice yield and VCI and VHI during the critical period of aus rice development that occurs during March–April (weeks 8–13 of the year), several months in advance of the rice harvest. Stepwise principal component regression (PCR) was used to construct a model to predict yield as a function of critical-period VHI. The model reduced the yield prediction error variance by 62% compared with a prediction of average yield for each year. Remote sensing is a valuable tool for estimating rice yields well in advance of harvest and at a low cost. PMID:22574057
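The three indices named in this abstract are usually computed by scaling each week's NDVI and brightness temperature against their multi-year extremes. The sketch below uses the commonly cited definitions, which are an assumption here because the abstract does not state them: VCI = 100 (NDVI - NDVImin) / (NDVImax - NDVImin), TCI = 100 (BTmax - BT) / (BTmax - BTmin), and VHI as a weighted average with an equal weighting by default. Function and parameter names are hypothetical.

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation Condition Index: NDVI scaled to 0..100 against multi-year extremes."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt, bt_min, bt_max):
    """Temperature Condition Index: high brightness temperature gives a low score."""
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation Health Index as a weighted mean of VCI and TCI (alpha assumed 0.5)."""
    return alpha * vci_value + (1.0 - alpha) * tci_value


# Made-up weekly values for one location.
health = vhi(vci(0.5, 0.2, 0.8), tci(300.0, 290.0, 310.0))
```

Weekly VHI values from the critical March-April window would then serve as candidate predictors in the principal component regression.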
de Melo, C M R; Packer, I U; Costa, C N; Machado, P F
2007-03-01
Covariance components for test day milk yield using 263 390 first lactation records of 32 448 Holstein cows were estimated using random regression animal models by restricted maximum likelihood. Three functions were used to adjust the lactation curve: the five-parameter logarithmic Ali and Schaeffer function (AS), the three-parameter exponential Wilmink function in its standard form (W) and in a modified form (W*), by reducing the range of covariate, and the combination of Legendre polynomial and W (LEG+W). Heterogeneous residual variance (RV) for different classes (4 and 29) of days in milk was considered in adjusting the functions. Estimates of RV were quite similar, ranging from 4.15 to 5.29 kg². Heritability estimates for AS (0.29 to 0.42), LEG+W (0.28 to 0.42) and W* (0.33 to 0.40) were similar, but heritability estimates using W (0.25 to 0.65) were higher than those estimated by the other functions, particularly at the end of lactation. Genetic correlations between milk yield on consecutive test days were close to unity, but decreased as the interval between test days increased. The AS function with homogeneous RV model had the best fit among those evaluated.
Rouphail, Nagui M.
2011-01-01
This paper presents behavioral-based models for describing pedestrian gap acceptance at unsignalized crosswalks in a mixed-priority environment, where some drivers yield and some pedestrians cross in gaps. Logistic regression models are developed to predict the probability of pedestrian crossings as a function of vehicle dynamics, pedestrian assertiveness, and other factors. In combination with prior work on probabilistic yielding models, the results can be incorporated in a simulation environment, where they can more fully describe the interaction of these two modes. The approach is intended to supplement HCM analytical procedure for locations where significant interaction occurs between drivers and pedestrians, including modern roundabouts. PMID:21643488
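A logistic regression model of the kind described here maps a linear combination of predictors to a crossing probability through the logistic function. The sketch below shows only that mapping; the predictor names and coefficient values are hypothetical placeholders, not estimates from the paper.

```python
import math


def crossing_probability(intercept, coefficients, features):
    """P(cross) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical predictors: gap length in seconds and an assertiveness flag (0 or 1).
# The coefficients are invented for illustration only.
p_short_gap = crossing_probability(-4.0, [0.8, 1.0], [3.0, 0.0])
p_long_gap = crossing_probability(-4.0, [0.8, 1.0], [8.0, 0.0])
```

In a simulation, each pedestrian decision could be drawn as a Bernoulli trial with this probability.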
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary
NASA Astrophysics Data System (ADS)
Gillis, Nicolas; Luce, Robert
2018-01-01
A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Seaman, Shaun R; Hughes, Rachael A
2018-06-01
Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, full-conditional specification multiple imputation is shown to be potentially much more robust than joint model multiple imputation using the restricted general location model to misspecification of that model when there is substantial missingness in the outcome variable.
Antwi, Philip; Li, Jianzheng; Boadi, Portia Opoku; Meng, Jia; Shi, En; Deng, Kaiwen; Bondinuba, Francis Kwesi
2017-03-01
Three-layered feedforward backpropagation (BP) artificial neural networks (ANN) and multiple nonlinear regression (MnLR) models were developed to estimate biogas and methane yield in an upflow anaerobic sludge blanket (UASB) reactor treating potato starch processing wastewater (PSPW). Anaerobic process parameters were optimized to identify their importance on methanation. pH, total chemical oxygen demand, ammonium, alkalinity, total Kjeldahl nitrogen, total phosphorus, volatile fatty acids and hydraulic retention time, selected based on principal component analysis, were used as input variables, while biogas and methane yield were employed as target variables. Quasi-Newton method and conjugate gradient backpropagation algorithms were best among eleven training algorithms. The coefficient of determination (R²) of the BP-ANN reached 98.72% and 97.93%, while the MnLR model attained 93.9% and 91.08% for biogas and methane yield, respectively. Compared with the MnLR model, the BP-ANN model demonstrated significant performance, suggesting possible control of the anaerobic digestion process with the BP-ANN model. Copyright © 2016 Elsevier Ltd. All rights reserved.
Risk factors for displaced abomasum or ketosis in Swedish dairy herds.
Stengärde, L; Hultgren, J; Tråvén, M; Holtenius, K; Emanuelson, U
2012-03-01
Risk factors associated with high or low long-term incidence of displaced abomasum (DA) or clinical ketosis were studied in 60 Swedish dairy herds, using multivariable logistic regression modelling. Forty high-incidence herds were included as cases and 20 low-incidence herds as controls. Incidence rates were calculated based on veterinary records of clinical diagnoses. During the 3-year period preceding the herd classification, herds with a high incidence had a disease incidence of DA or clinical ketosis above the 3rd quartile in a national database for disease recordings. Control herds had no cows with DA or clinical ketosis. All herds were visited during the housing period and herdsmen were interviewed about management routines, housing, feeding, milk yield, and herd health. Target groups were heifers in late gestation, dry cows, and cows in early lactation. Univariable logistic regression was used to screen for factors associated with being a high-incidence herd. A multivariable logistic regression model was built using stepwise regression. A higher maximum daily milk yield in multiparous cows and a large herd size (p=0.054 and p=0.066, respectively) tended to be associated with being a high-incidence herd. Not cleaning the heifer feeding platform daily increased the odds of having a high-incidence herd twelvefold (p<0.01). Keeping cows in only one group in the dry period increased the odds of having a high incidence herd eightfold (p=0.03). Herd size was confounded with housing system. Housing system was therefore added to the final logistic regression model. In conclusion, a large herd size, a high maximum daily milk yield, keeping dry cows in one group, and not cleaning the feeding platform daily appear to be important risk factors for a high incidence of DA or clinical ketosis in Swedish dairy herds. 
These results confirm the importance of housing, management and feeding in the prevention of metabolic disorders in dairy cows around parturition and in early lactation. Copyright © 2011 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Shi, Z. H.
2014-12-01
There are strong ties between land use and sediment yield in watersheds. Many studies have used multivariate regression techniques to explore the response of sediment yield to land-use compositions and spatial configurations in watersheds. However, one issue with the use of conventional statistical methods to address relationships between land-use compositions and spatial configurations and sediment yield is multicollinearity. This paper examines the combined effects of land-use compositions and land-use spatial configurations of the watershed on the specific sediment yield of the Upper Du River watershed (8,973 km²) in China using the Soil and Water Assessment Tool (SWAT) and partial least-squares regression (PLSR). The land-use compositions and spatial configurations of the watershed were calculated at the sub-watershed scale. The sediment yields from the sub-watersheds were evaluated using the SWAT model. The first-order factors were identified by calculating the variable importance for the projection (VIP). The results revealed that the land-use compositions exerted the largest effects on the specific sediment yield and explained 61.2% of the variation in the specific sediment yield. Land-use spatial configurations were also found to have a large effect on the specific sediment yield and explained 21.7% of the observed variation in the specific sediment yield. The following are the dominant first-order factors of the specific sediment yield at the sub-watershed scale: the areal percentages of agriculture and forest, patch density, the value of Shannon's diversity index, and contagion. The VIP values suggested that Shannon's diversity index and contagion are important factors for sediment delivery.
Transfer Student Success: Educationally Purposeful Activities Predictive of Undergraduate GPA
ERIC Educational Resources Information Center
Fauria, Renee M.; Fuller, Matthew B.
2015-01-01
Researchers evaluated the effects of Educationally Purposeful Activities (EPAs) on transfer and nontransfer students' cumulative GPAs. Hierarchical, linear, and multiple regression models yielded seven statistically significant educationally purposeful items that influenced undergraduate student GPAs. Statistically significant positive EPAs for…
Ngwa, Julius S; Cabral, Howard J; Cheng, Debbie M; Pencina, Michael J; Gagnon, David R; LaValley, Michael P; Cupples, L Adrienne
2016-11-03
Typical survival studies follow individuals to an event and measure explanatory variables for that event, sometimes repeatedly over the course of follow up. The Cox regression model has been used widely in the analyses of time to diagnosis or death from disease. The associations between the survival outcome and time dependent measures may be biased unless they are modeled appropriately. In this paper we explore the Time Dependent Cox Regression Model (TDCM), which quantifies the effect of repeated measures of covariates in the analysis of time to event data. This model is commonly used in biomedical research but sometimes does not explicitly adjust for the times at which time dependent explanatory variables are measured. This approach can yield different estimates of association compared to a model that adjusts for these times. In order to address the question of how different these estimates are from a statistical perspective, we compare the TDCM to Pooled Logistic Regression (PLR) and Cross Sectional Pooling (CSP), considering models that adjust and do not adjust for time in PLR and CSP. In a series of simulations we found that time adjusted CSP provided identical results to the TDCM while the PLR showed larger parameter estimates compared to the time adjusted CSP and the TDCM in scenarios with high event rates. We also observed upwardly biased estimates in the unadjusted CSP and unadjusted PLR methods. The time adjusted PLR had a positive bias in the time dependent Age effect with reduced bias when the event rate is low. The PLR methods showed a negative bias in the Sex effect, a subject level covariate, when compared to the other methods. The Cox models yielded reliable estimates for the Sex effect in all scenarios considered. We conclude that survival analyses that explicitly account in the statistical model for the times at which time dependent covariates are measured provide more reliable estimates compared to unadjusted analyses. 
We present results from the Framingham Heart Study, in which lipid measurements and myocardial infarction events were collected over a period of 26 years.
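The pooled approaches compared above work on data with one row per subject per observation interval. The sketch below shows that restructuring step only. The function name, the field names, and the convention that a subject contributes rows up to and including the interval of the event are illustrative assumptions, not details taken from the paper.

```python
def person_period_rows(subject_id, measurements, event_interval=None):
    """Expand one subject into one row per observation interval.

    measurements: covariate value recorded at the start of each interval.
    event_interval: index of the interval in which the event occurred,
        or None if the subject was censored (assumed convention).
    """
    rows = []
    for k, x in enumerate(measurements):
        had_event = int(event_interval is not None and k == event_interval)
        # "interval" is the time index; a time-adjusted pooled model would
        # include it as a covariate, an unadjusted model would leave it out.
        rows.append({"id": subject_id, "interval": k, "x": x, "event": had_event})
        if had_event:
            break  # no further rows after the event
    return rows


# Hypothetical subject with three lipid measurements and an event in interval 1.
rows = person_period_rows("s1", [5.1, 5.4, 6.0], event_interval=1)
```

Whether the interval index enters the downstream model is the adjusted versus unadjusted distinction the abstract draws.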
Impact of a comprehensive population health management program on health care costs.
Grossmeier, Jessica; Seaverson, Erin L D; Mangen, David J; Wright, Steven; Dalal, Karl; Phalen, Chris; Gold, Daniel B
2013-06-01
Assess the influence of participation in a population health management (PHM) program on health care costs. A quasi-experimental study relied on logistic and ordinary least squares regression models to compare the costs of program participants with those of nonparticipants, while controlling for differences in health care costs and utilization, demographics, and health status. Propensity score models were developed and analyses were weighted by inverse propensity scores to control for selection bias. Study models yielded an estimated savings of $60.65 per wellness participant per month and $214.66 per disease management participant per month. Program savings were combined to yield an integrated return-on-investment of $3 in savings for every dollar invested. A PHM program yielded a positive return on investment after 2 years of wellness program and 1 year of integrated disease management program launch.
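As a rough illustration of weighting by inverse propensity scores, the sketch below compares weighted mean costs of participants and nonparticipants. It assumes the propensity scores have already been estimated. Normalizing the weights within each group is one common choice made here for illustration; the study's exact weighting scheme is not stated in the abstract.

```python
def ipw_mean_difference(costs, treated, propensity):
    """Weighted mean cost of participants minus that of nonparticipants.

    Participants get weight 1/p and nonparticipants 1/(1 - p), where p is the
    estimated probability of participating. Weights are normalized per group.
    """
    num_t = den_t = num_c = den_c = 0.0
    for y, t, p in zip(costs, treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            w = 1.0 / p
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - p)
            num_c += w * y
            den_c += w
    if den_t == 0.0 or den_c == 0.0:
        raise ValueError("both groups must be non-empty")
    return num_t / den_t - num_c / den_c


# Hypothetical monthly costs; a negative result means lower cost for participants.
diff = ipw_mean_difference([100.0, 100.0, 200.0, 200.0], [1, 1, 0, 0], [0.5] * 4)
```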
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Climate change and maize yield in Iowa
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Hong; Twine, Tracy E.; Girvetz, Evan
Climate is changing across the world, including the major maize-growing state of Iowa in the USA. To maintain crop yields, farmers will need a suite of adaptation strategies, and choice of strategy will depend on how the local to regional climate is expected to change. Here we predict how maize yield might change through the 21st century as compared with late 20th century yields across Iowa, USA, a region representing ideal climate and soils for maize production that contributes substantially to the global maize economy. To account for climate model uncertainty, we drive a dynamic ecosystem model with output from six climate models and two future climate forcing scenarios. Despite a wide range in the predicted amount of warming and change to summer precipitation, all simulations predict a decrease in maize yields from late 20th century to middle and late 21st century ranging from 15% to 50%. Linear regression of all models predicts a 6% state-averaged yield decrease for every 1°C increase in warm season average air temperature. When the influence of moisture stress on crop growth is removed from the model, yield decreases either remain the same or are reduced, depending on predicted changes in warm season precipitation. Lastly, our results suggest that even if maize were to receive all the water it needed, under the strongest climate forcing scenario yields will decline by 10-20% by the end of the 21st century.
Climate change and maize yield in Iowa
Xu, Hong; Twine, Tracy E.; Girvetz, Evan
2016-05-24
Climate is changing across the world, including the major maize-growing state of Iowa in the USA. To maintain crop yields, farmers will need a suite of adaptation strategies, and choice of strategy will depend on how the local to regional climate is expected to change. Here we predict how maize yield might change through the 21st century as compared with late 20th century yields across Iowa, USA, a region representing ideal climate and soils for maize production that contributes substantially to the global maize economy. To account for climate model uncertainty, we drive a dynamic ecosystem model with output from six climate models and two future climate forcing scenarios. Despite a wide range in the predicted amount of warming and change to summer precipitation, all simulations predict a decrease in maize yields from late 20th century to middle and late 21st century ranging from 15% to 50%. Linear regression of all models predicts a 6% state-averaged yield decrease for every 1°C increase in warm season average air temperature. When the influence of moisture stress on crop growth is removed from the model, yield decreases either remain the same or are reduced, depending on predicted changes in warm season precipitation. Lastly, our results suggest that even if maize were to receive all the water it needed, under the strongest climate forcing scenario yields will decline by 10-20% by the end of the 21st century.
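The "6% per 1°C" figure is the slope of a linear regression of yield change on warming. A minimal least-squares sketch of how such a slope is obtained follows. The data points are invented to lie exactly on a -6% per degree line and are not the study's simulation output.

```python
def ols_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical warming (deg C) and yield change (%) pairs, one per simulation.
warming = [1.0, 2.0, 3.0, 4.0]
yield_change = [-6.0, -12.0, -18.0, -24.0]
slope, intercept = ols_line(warming, yield_change)
```

With real simulation output the points would scatter around the line, and the slope would summarize the average sensitivity across climate models and scenarios.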
Aqua/Aura Updated Inclination Adjust Maneuver Performance Prediction Model
NASA Technical Reports Server (NTRS)
Boone, Spencer
2017-01-01
This presentation will discuss the updated Inclination Adjust Maneuver (IAM) performance prediction model that was developed for Aqua and Aura following the 2017 IAM series. This updated model uses statistical regression methods to identify potential long-term trends in maneuver parameters, yielding improved predictions when re-planning past maneuvers. The presentation has been reviewed and approved by Eric Moyer, ESMO Deputy Project Manager.
Robertson, Dale M.; Saad, David A.; Schwarz, Gregory E.
2014-01-01
Nitrogen (N) and phosphorus (P) loading from the Mississippi/Atchafalaya River Basin (MARB) has been linked to hypoxia in the Gulf of Mexico. With geospatial datasets for 2002, including inputs from wastewater treatment plants (WWTPs), and monitored loads throughout the MARB, SPAtially Referenced Regression On Watershed attributes (SPARROW) watershed models were constructed specifically for the MARB, which reduced simulation errors from previous models. Based on these models, N loads/yields were highest from the central part (centered over Iowa and Indiana) of the MARB (Corn Belt), and the highest P yields were scattered throughout the MARB. Spatial differences in yields from previous studies resulted from different descriptions of the dominant sources (N yields are highest with crop-oriented agriculture and P yields are highest with crop and animal agriculture and major WWTPs) and different descriptions of downstream transport. Delivered loads/yields from the MARB SPARROW models are used to rank subbasins, states, and eight-digit Hydrologic Unit Code basins (HUC8s) by N and P contributions and then rankings are compared with those from other studies. Changes in delivered yields result in an average absolute change of 1.3 (N) and 1.9 (P) places in state ranking and 41 (N) and 69 (P) places in HUC8 ranking from those made with previous national-scale SPARROW models. This information may help managers decide where efforts could have the largest effects (highest ranked areas) and thus reduce hypoxia in the Gulf of Mexico.
Genetic evaluation of lactation persistency for five breeds of dairy cattle.
Cole, J B; Null, D J
2009-05-01
Cows with high lactation persistency tend to produce less milk than expected at the beginning of lactation and more than expected at the end. Best prediction of lactation persistency is calculated as a function of trait-specific standard lactation curves and linear regressions of test-day deviations on days in milk. Because regression coefficients are deviations from a tipping point selected to make yield and lactation persistency phenotypically uncorrelated it should be possible to use 305-d actual yield and lactation persistency to predict yield for lactations with later endpoints. The objectives of this study were to calculate (co)variance components and breeding values for best predictions of lactation persistency of milk (PM), fat (PF), protein (PP), and somatic cell score (PSCS) in breeds other than Holstein, and to demonstrate the calculation of prediction equations for 400-d actual milk yield. Data included lactations from Ayrshire, Brown Swiss, Guernsey (GU), Jersey (JE), and Milking Shorthorn (MS) cows calving since 1997. The number of sires evaluated ranged from 86 (MS) to 3,192 (JE), and mean sire estimated breeding value for PM ranged from 0.001 (Ayrshire) to 0.10 (Brown Swiss); mean estimated breeding value for PSCS ranged from -0.01 (MS) to -0.043 (JE). Heritabilities were generally highest for PM (0.09 to 0.15) and lowest for PSCS (0.03 to 0.06), with PF and PP having intermediate values (0.07 to 0.13). Repeatabilities varied considerably between breeds, ranging from 0.08 (PSCS in GU, JE, and MS) to 0.28 (PM in GU). Genetic correlations of PM, PF, and PP with PSCS were moderate and favorable (negative), indicating that increasing lactation persistency of yield traits is associated with decreases in lactation persistency of SCS, as expected. Genetic correlations among yield and lactation persistency were low to moderate and ranged from -0.55 (PP in GU) to 0.40 (PP in MS). 
Prediction equations for 400-d milk yield were calculated for each breed from two regression models, one using 305-d yield alone and one using 305-d yield together with lactation persistency. Goodness-of-fit was very good for both models, but the addition of lactation persistency to the model significantly improved fit in all cases. Routine genetic evaluations for lactation persistency, as well as the development of prediction equations for several lactation end-points, may provide producers with tools to better manage their herds.
Greeven, Anja; van Balkom, Anton J L M; Spinhoven, Philip
2014-05-01
We aimed to investigate whether personality characteristics predict time to remission and psychiatric status. The follow-up was at most 6 years and was performed within the scope of a randomized controlled trial that investigated the efficacy of cognitive behavioral therapy, paroxetine, and placebo in hypochondriasis. The Life Chart Interview was administered to investigate for each year if remission had occurred. Personality was assessed at pretest by the Abbreviated Dutch Temperament and Character Inventory. Cox's regression models for recurrent events were compared with logistic regression models. Sixteen (36.4%) of 44 patients achieved remission during the follow-up period. Cox's regression yielded approximately the same results as the logistic regression. Being less harm avoidant and more cooperative were associated with a shorter time to remission and a remitted state after the follow-up period. Personality variables seem to be relevant for describing patients with a more chronic course of hypochondriacal complaints.
Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P
2014-06-26
To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
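Both methods discussed above use a log link, so an exponentiated coefficient is read as a risk ratio. In the simplest case of one binary covariate, the fitted coefficients can be written directly from the 2x2 counts, as sketched below. The counts are hypothetical, and this closed form covers only the single-binary-covariate case, not the simulation scenarios of the study.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    if risk_unexposed == 0.0:
        raise ValueError("risk in the unexposed group must be positive")
    return risk_exposed / risk_unexposed


def log_link_coefficients(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Intercept and slope of log(risk) = b0 + b1 * exposed for a binary exposure.

    Requires at least one event in each group so that both logs are defined.
    """
    b0 = math.log(events_unexposed / n_unexposed)
    b1 = math.log(risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed))
    return b0, b1


# Hypothetical common outcome: 30/100 events among exposed, 20/100 among unexposed.
b0, b1 = log_link_coefficients(30, 100, 20, 100)
rr = math.exp(b1)
```

With continuous covariates or outliers, the two estimators no longer coincide, which is the situation the simulation addresses.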
Above-ground biomass of mangrove species. I. Analysis of models
NASA Astrophysics Data System (ADS)
Soares, Mário Luiz Gomes; Schaeffer-Novelli, Yara
2005-10-01
This study analyzes the above-ground biomass of Rhizophora mangle and Laguncularia racemosa located in the mangroves of Bertioga (SP) and Guaratiba (RJ), Southeast Brazil. Its purpose is to determine the best regression model to estimate, indirectly, the total above-ground biomass and the biomass of each compartment (leaves, reproductive parts, twigs, branches, trunk and prop roots). To do this, we used structural measurements such as height, diameter at breast-height (DBH), and crown area. A combination of regression types with several compositions of independent variables generated 2,272 models that were later tested. Subsequent analysis of the models indicated that the biomass of reproductive parts, branches, and prop roots showed great variability, probably because of environmental factors and seasonality (in the case of reproductive parts). It also indicated the superiority of multiple regression for estimating above-ground biomass, as it allows researchers to consider several aspects that affect above-ground biomass, especially the influence of environmental factors. This was borne out by the models that estimated the biomass of crown compartments.
Jiang, Wei; Xu, Chao-Zhen; Jiang, Si-Zhi; Zhang, Tang-Duo; Wang, Shi-Zhen; Fang, Bai-Shan
2017-04-01
L-tert-Leucine (L-Tle) and its derivatives are extensively used as crucial building blocks for chiral auxiliaries, pharmaceutically active ingredients, and ligands. Combined with formate dehydrogenase (FDH) for regenerating the expensive coenzyme NADH, leucine dehydrogenase (LeuDH) is continually used for synthesizing L-Tle from α-keto acid. A multilevel factorial experimental design was executed for research of this system. In this work, an efficient optimization method for improving the productivity of L-Tle was developed. The mathematical model relating fermentation conditions to L-Tle yield was also determined in equation form by using uniform design and regression analysis. The multivariate regression equation was conveniently implemented in water, with a space-time yield of 505.9 g L⁻¹ day⁻¹ and an enantiomeric excess value of >99%. These results demonstrated that this method might become an ideal protocol for industrial production of chiral compounds and unnatural amino acids such as chiral drug intermediates.
Cho, C. I.; Alam, M.; Choi, T. J.; Choy, Y. H.; Choi, J. G.; Lee, S. S.; Cho, K. H.
2016-01-01
The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order were lower than those of the HET60 models in most cases. This implies that the orders of LP and types of residual variances affect the goodness of fit of the models. Also, the heterogeneity of residual variances should be considered for the test-day analysis.
The heritability estimates from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for the studied traits tended to decrease during the earlier stages of lactation, followed by increases in the middle and further decreases at the end of lactation. With regard to the fit of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea. PMID:26954184
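Two ingredients of the comparison above can be sketched briefly: the Legendre polynomial covariates evaluated at standardized days in milk, and the AIC/BIC formulas used to rank models. The 5 to 305 day range, the unnormalized polynomials, and reading "order" as polynomial degree are assumptions made for illustration; the abstract does not state these conventions.

```python
import math


def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk onto [-1, 1], the domain of Legendre polynomials."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0


def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence."""
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Covariates for a hypothetical test-day record taken at 155 days in milk.
covariates = legendre_basis(standardize_dim(155), degree=3)
```

Because the BIC penalty grows with log(n), extra residual variance classes need a larger likelihood gain to be preferred, which is consistent with HET15 often ranking ahead of HET60.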
Cho, C I; Alam, M; Choi, T J; Choy, Y H; Choi, J G; Lee, S S; Cho, K H
2016-05-01
The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3-L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order were lower than those of the HET60 models in most cases. This implies that the orders of LP and types of residual variances affect the goodness of fit of the models. Also, the heterogeneity of residual variances should be considered for the test-day analysis.
The heritability estimates from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for the studied traits tended to decrease during the earlier stages of lactation, followed by increases in the middle and further decreases at the end of lactation. With regard to the fit of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea.
Aspilcueta-Borquis, Rúsbel R; Araujo Neto, Francisco R; Baldi, Fernando; Santos, Daniel J A; Albuquerque, Lucia G; Tonhati, Humberto
2012-08-01
The test-day yields of milk, fat and protein were analysed from 1433 first lactations of buffaloes of the Murrah breed, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, born between 1985 and 2007. For the test-day yields, 10 monthly classes of lactation days were considered. The contemporary groups were defined as the herd-year-month of the test day. Random additive genetic, permanent environmental and residual effects were included in the model. The fixed effects considered were the contemporary group, number of milkings (1 or 2 milkings), linear and quadratic effects of the covariable cow age at calving and the mean lactation curve of the population (modelled by third-order Legendre orthogonal polynomials). The random additive genetic and permanent environmental effects were estimated by means of regression on third- to sixth-order Legendre orthogonal polynomials. The residual variances were modelled with a homogenous structure and various heterogeneous classes. According to the likelihood-ratio test, the best model for milk and fat production was that with four residual variance classes, while a third-order Legendre polynomial was best for the additive genetic effect for milk and fat yield, a fourth-order polynomial was best for the permanent environmental effect for milk production and a fifth-order polynomial was best for fat production. For protein yield, the best model was that with three residual variance classes and third- and fourth-order Legendre polynomials were best for the additive genetic and permanent environmental effects, respectively. The heritability estimates for the characteristics analysed were moderate, varying from 0·16±0·05 to 0·29±0·05 for milk yield, 0·20±0·05 to 0·30±0·08 for fat yield and 0·18±0·06 to 0·27±0·08 for protein yield. 
The estimates of the genetic correlations between the tests varied from 0·18±0·120 to 0·99±0·002; from 0·44±0·080 to 0·99±0·004; and from 0·41±0·080 to 0·99±0·004, for milk, fat and protein production, respectively, indicating that whatever the selection criterion used, indirect genetic gains can be expected throughout the lactation curve.
NASA Astrophysics Data System (ADS)
Gao, Ming; Li, Shiwei
2017-05-01
Based on experimental data of the soybean yield and quality from 30 sampling points, a quantitative structure-activity relationship model (2D-QSAR) was established using the soil quality (elements, pH, organic matter content and cation exchange capacity) as independent variables and soybean yield or quality as the dependent variable, with SPSS software. During the modeling, the full data set (30 and 14 compounds) was divided into a training set (24 and 11 compounds) for model generation and a test set (6 and 3 compounds) for model validation. The R² values of the resulting models and data were 0.826 and 0.808 for soybean yield and quality, respectively, and all regression coefficients were significant (P < 0.05). The correlation coefficients R²pred between observed and predicted values of soybean yield and soybean quality in the test set were 0.961 and 0.956, respectively, indicating that the models had good predictive ability. Moreover, the Mo, Se, K, N and organic matter contents and the cation exchange capacity of soil had a positive effect on soybean production, and the B, Mo, Se, K and N contents and cation exchange coefficient had a positive effect on soybean quality. The results are instructive for enhancing soils to improve the yield and quality of soybean, and this method can also be used to study other crops or regions, providing a theoretical basis for improving the yield and quality of crops.
Yu, Meijuan; Zhao, Mingxing; Huang, Zhenxing; Xi, Kezhong; Shi, Wansheng; Ruan, Wenquan
2018-02-01
A model based on a feature objects (FOs) aided strategy was used to evaluate the methane generation from food waste by anaerobic digestion. The kinetics of the feature objects were tested with the modified Gompertz model and the first-order kinetic model, and the first-order kinetic hydrolysis constants were used to estimate the reaction rate of homemade and actual food waste. The results showed that the methane yields of the four feature objects were significantly different. The anaerobic digestion of homemade food waste and actual food waste had various methane yields and kinetic constants due to the different contents of FOs in food waste. Combining the kinetic equations with the multiple linear regression equation could well express the methane yield of food waste, as the R² of food waste was more than 0.9. The predicted methane yields of the two actual food wastes were 528.22 mL g⁻¹ TS and 545.29 mL g⁻¹ TS with the model, while the experimental values were 527.47 mL g⁻¹ TS and 522.1 mL g⁻¹ TS, respectively. The relative errors between the experimental and predicted cumulative methane yields were both less than 5%.
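The two kinetic curves named above and the relative-error check can be sketched as follows. The functional forms are the ones commonly used for cumulative methane production; the abstract does not give the paper's exact parameterization or fitted constants, so the parameter names and example values are assumptions.

```python
import math


def first_order(t, p_max, k):
    """Cumulative methane yield at time t under first-order kinetics."""
    return p_max * (1.0 - math.exp(-k * t))


def modified_gompertz(t, p_max, r_max, lag):
    """Cumulative methane yield at time t under the modified Gompertz model.

    p_max: ultimate yield, r_max: maximum production rate, lag: lag time.
    """
    return p_max * math.exp(-math.exp(r_max * math.e / p_max * (lag - t) + 1.0))


def relative_error(predicted, observed):
    """Absolute relative error of a prediction against an observed value."""
    return abs(predicted - observed) / observed


# Predicted and experimental yields (mL/g TS) quoted in the abstract.
errors = [relative_error(528.22, 527.47), relative_error(545.29, 522.1)]

# Hypothetical parameters, only to show how a curve is evaluated.
yield_day_10 = first_order(10.0, p_max=500.0, k=0.2)
```

Both values in `errors` fall below 0.05, matching the "less than 5%" statement for the two food wastes.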
Meseret, S.; Tamir, B.; Gebreyohannes, G.; Lidauer, M.; Negussie, E.
2015-01-01
The development of effective genetic evaluations and selection of sires requires accurate estimates of genetic parameters for all economically important traits in the breeding goal. The main objective of this study was to assess the relative performance of the traditional lactation average model (LAM) against the random regression test-day model (RRM) in the estimation of genetic parameters and prediction of breeding values for Holstein Friesian herds in Ethiopia. The data used consisted of 6,500 test-day (TD) records from 800 first-lactation Holstein Friesian cows that calved between 1997 and 2013. Co-variance components were estimated using the average information restricted maximum likelihood method under single trait animal model. The estimate of heritability for first-lactation milk yield was 0.30 from LAM whilst estimates from the RRM model ranged from 0.17 to 0.29 for the different stages of lactation. Genetic correlations between different TDs in first-lactation Holstein Friesian ranged from 0.37 to 0.99. The observed genetic correlation was less than unity between milk yields at different TDs, which indicated that the assumption of LAM may not be optimal for accurate evaluation of the genetic merit of animals. A close look at estimated breeding values from both models showed that RRM had higher standard deviation compared to LAM indicating that the TD model makes efficient utilization of TD information. Correlations of breeding values between models ranged from 0.90 to 0.96 for different group of sires and cows and marked re-rankings were observed in top sires and cows in moving from the traditional LAM to RRM evaluations. PMID:26194217
Effect of pregnancy on the genetic evaluation of dairy cattle.
Pereira, R J; Santana, M L; Bignardi, A B; Verneque, R S; El Faro, L; Albuquerque, L G
2011-09-26
We investigated the effect of stage of pregnancy on estimates of breeding values for milk yield and milk persistency in Gyr and Holstein dairy cattle in Brazil. Test-day milk yield records were analyzed using random regression models with or without the effect of pregnancy. Models were compared using residual variances, heritabilities, rank correlations of estimated breeding values of bulls and cows, and number of nonpregnant cows in the top 200 for milk yield and milk persistency. The estimates of residual variance and heritabilities obtained with the models with or without the effect of pregnancy were similar for the two breeds. Inclusion of the effect of pregnancy in genetic evaluation models for these populations did not affect the ranking of cows and sires based on their predicted breeding values for 305-day cumulative milk yield. In contrast, when we examined persistency of milk yield, lack of adjustment for the effect of pregnancy overestimated breeding values of nonpregnant cows and cows with a long days open period and underestimated breeding values of cows with a short days open period. We recommend that models include the effect of days of pregnancy for estimation of adjustment factors for the effect of pregnancy in genetic evaluations of Dairy Gyr and Holstein cattle.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
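The IIA property described in point 2 follows from the form of the conditional logistic selection probability, as the sketch below illustrates. The single covariate, the coefficient value, and the habitat labels are hypothetical.

```python
import math


def selection_probabilities(covariates, beta):
    """Conditional-logit probability of selecting each available location.

    covariates: one row of habitat covariates per available location.
    beta: selection coefficients shared by all locations in the choice set.
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


beta = [0.7]  # hypothetical coefficient for one covariate
two_options = selection_probabilities([[1.0], [0.0]], beta)           # A, B
three_options = selection_probabilities([[1.0], [0.0], [2.0]], beta)  # A, B, C

# Under a fixed beta, the A:B preference ratio is the same in both choice sets.
ratio_two = two_options[0] / two_options[1]
ratio_three = three_options[0] / three_options[1]
```

In a mixed-effects version, `beta` would vary among individuals, so the population-level A:B ratio can shift when a third habitat becomes available, which is how such models accommodate departures from IIA.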
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where-akin to support vector classification-the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
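The observation that the partial likelihood depends on the covariates only through inner products can be made concrete with a small sketch. It evaluates the Cox partial log-likelihood for a kernel-expanded predictor and shows only the dense kernelized model, not either of the sparse methods proposed in the paper. Tied event times are handled naively, and all names are illustrative.

```python
import math


def linear_kernel(u, v):
    """Inner product of two covariate vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel_predictor(x, alpha, kernel=linear_kernel):
    """f_i = sum_j alpha_j * K(x_i, x_j): depends on every observation j."""
    return [sum(a * kernel(xi, xj) for a, xj in zip(alpha, x)) for xi in x]


def cox_partial_loglik(times, events, f):
    """Cox partial log-likelihood for predictor values f (no tie correction)."""
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations appear only in risk sets
        risk_sum = sum(math.exp(f[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += f[i] - math.log(risk_sum)
    return loglik


# Hypothetical data: three subjects, the last one censored.
x = [[1.0], [2.0], [0.5]]
alpha = [0.1, -0.2, 0.0]
value = cox_partial_loglik([1.0, 2.0, 3.0], [1, 1, 0], kernel_predictor(x, alpha))
```

A sparse solution in this notation is one in which most entries of `alpha` are exactly zero.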
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
Third molar development (TMD) has been widely used as a radiographic method for dental age estimation. Using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation with individual-method models and a combined model (TMD and TME), based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data were divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² increased in the linear regressions of the combined model compared with the individual models. An overall decrease in RMSE was detected in the combined model compared with TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, the combined model exhibited low adjusted R² and high RMSE, except in males. Dental age is better predicted using the combined model in multiple linear regression. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Modeling heat stress effect on Holstein cows under hot and dry conditions: selection tools.
Carabaño, M J; Bachagha, K; Ramón, M; Díaz, C
2014-12-01
Data from milk recording of Holstein-Friesian cows together with weather information from 2 regions in Southern Spain were used to define the models that can better describe heat stress response for production traits and somatic cell score (SCS). Two sets of analyses were performed, one aimed at defining the population phenotypic response and the other at studying the genetic components. The first involved 2,514,762 test-day records from up to 5 lactations of 128,112 cows. Two models, one fitting a comfort threshold for temperature and a slope of decay after the threshold, and the other a cubic Legendre polynomial (LP) model were tested. Average (TAVE) and maximum daily temperatures were alternatively considered as covariates. The LP model using TAVE as covariate showed the best goodness of fit for all traits. Estimated rates of decay from this model for production at 25 and 34°C were 36 and 170, 3.8 and 3.0, and 3.9 and 8.2g/d per degree Celsius for milk, fat, and protein yield, respectively. In the second set of analyses, a sample of 280,958 test-day records from first lactations of 29,114 cows was used. Random regression models including quadratic or cubic LP regressions (TEM_) on TAVE or a fixed threshold and an unknown slope (DUMMY), including or not cubic regressions on days in milk (DIM3_), were tested. For milk and SCS, the best models were the DIM3_ models. In contrast, for fat and protein yield, the best model was TEM3. The DIM3DUMMY models showed similar performance to DIM3TEM3. The estimated genetic correlations between the same trait under cold and hot temperatures (ρ) indicated the existence of a large genotype by environment interaction for fat (ρ=0.53 for model TEM3) and protein yield (ρ around 0.6 for DIM3TEM3) and for SCS (ρ=0.64 for model DIM3TEM3), and a small genotype by environment interaction for milk (ρ over 0.8). 
The eigendecomposition of the additive genetic covariance matrix from model TEM3 showed the existence of a dominant component, a constant term that is not affected by temperature, representing from 64% of the variation for SCS to 91% of the variation for milk. The second component, showing a flat pattern at intermediate temperatures and increasing or decreasing slopes for the extremes, gathered 15, 11, and 24% of the variation for fat and protein yield and SCS, respectively. This component could be further evaluated as a selection criterion for heat tolerance independently of the production level. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
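The comfort-threshold model mentioned in the abstract above (a flat response up to a temperature threshold, then a linear decay) can be sketched as a piecewise-linear function. This is only an illustration: the function name and all parameter values are hypothetical, since the abstract does not report the fitted threshold.

```python
def threshold_response(temp_c, baseline, threshold_c, slope):
    """Expected trait value under a comfort-threshold ("broken-line") model.

    Below `threshold_c` the trait stays at `baseline`; above it, the trait
    drops by `slope` units for each additional degree Celsius.
    All arguments are illustrative; none are taken from the paper.
    """
    excess = max(0.0, temp_c - threshold_c)  # degrees above the comfort zone
    return baseline - slope * excess


# Hypothetical cow: 30 kg/d of milk, comfort threshold at 25 degrees C,
# loss of 0.5 kg/d per degree above the threshold.
for t in (20, 25, 29):
    print(t, threshold_response(t, 30.0, 25, 0.5))
```

A Legendre polynomial model, the alternative tested in the study, replaces the kink with a smooth curve, so the rate of decay can differ at each temperature rather than being constant above the threshold.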
Reconstruction of missing daily streamflow data using dynamic regression models
NASA Astrophysics Data System (ADS)
Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault
2015-12-01
River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data gaps in this information can cause extremely different analysis outputs. Reconstructing the missing data of incomplete data sets is therefore an important and challenging step for the performance of environmental models, engineering, and research applications. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average (ARIMA) models, called a dynamic regression model. This model uses the linear relationship between neighboring, correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data sets for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
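The idea of a dynamic regression (a regression on a neighboring station whose residuals follow a time-series model) can be sketched in a few lines. This is a deliberately simplified stand-in, not the authors' procedure: it assumes a single neighbor station and an AR(1) residual structure instead of a full ARIMA model, and the function names are hypothetical.

```python
def ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fill_gaps(target, neighbor):
    """Fill None entries of `target` from `neighbor` plus AR(1) residuals."""
    pairs = [(x, y) for x, y in zip(neighbor, target) if y is not None]
    a, b = ols([p[0] for p in pairs], [p[1] for p in pairs])
    # Regression residuals on the days where the target was observed.
    resid = [None if y is None else y - (a + b * x)
             for x, y in zip(neighbor, target)]
    # Estimate the AR(1) coefficient from consecutive observed residuals.
    lagged = [(resid[t - 1], resid[t]) for t in range(1, len(resid))
              if resid[t - 1] is not None and resid[t] is not None]
    den = sum(r0 * r0 for r0, _ in lagged)
    phi = sum(r0 * r1 for r0, r1 in lagged) / den if den > 0 else 0.0
    phi = max(-0.99, min(0.99, phi))  # keep the residual process stationary
    filled, last = [], 0.0
    for x, y, r in zip(neighbor, target, resid):
        if y is None:
            last = phi * last  # AR(1) forecast of the missing residual
            filled.append(a + b * x + last)
        else:
            last = r
            filled.append(y)
    return filled


# Toy series: the target is missing on day 3.
print(fill_gaps([5, 8, None, 14, 17, 20], [1, 2, 3, 4, 5, 6]))
```

Carrying the last observed residual forward, damped by `phi`, is what distinguishes this from a plain regression fill: a station that was running above its regression line just before the gap is assumed to stay somewhat above it for the first missing days.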
NASA Astrophysics Data System (ADS)
Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried
2018-03-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data cover a period of 36 years between 1980 and 2015. Similar to the observed decrease (P < 0.001) in rice yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth, region-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations, as well as information that may help optimize planting dates for improved radiation use efficiency in the study area.
Cross-validation pitfalls when selecting and assessing regression and classification models.
Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon
2014-03-29
We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
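A minimal sketch of repeated grid-search V-fold cross-validation, in the spirit of the abstract above, might look like the following. It is an illustration under assumptions rather than the authors' algorithm: the function names, the squared-error loss, and the toy ridge model are all hypothetical choices.

```python
import random


def v_fold_splits(n, v, rng):
    """Shuffle indices 0..n-1 and deal them into v disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::v] for i in range(v)]


def repeated_grid_search_cv(x, y, grid, fit, predict, v=5, repeats=10, seed=0):
    """Return (best_param, mean_cv_error_per_param) over repeated V-fold CV."""
    rng = random.Random(seed)
    total = {p: 0.0 for p in grid}
    for _ in range(repeats):
        folds = v_fold_splits(len(x), v, rng)  # a fresh split per repeat
        for p in grid:  # every grid value sees the same folds
            sq_err = 0.0
            for held_out in folds:
                held = set(held_out)
                train = [i for i in range(len(x)) if i not in held]
                model = fit([x[i] for i in train], [y[i] for i in train], p)
                sq_err += sum((predict(model, x[i]) - y[i]) ** 2
                              for i in held_out)
            total[p] += sq_err / len(x)
    mean_err = {p: total[p] / repeats for p in grid}
    return min(mean_err, key=mean_err.get), mean_err


# Hypothetical toy model: ridge-penalised slope through the origin.
def fit_ridge(xs, ys, lam):
    return sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + lam)


def predict_ridge(slope, x_new):
    return slope * x_new


x = [float(i) for i in range(1, 11)]
y = [2.0 * xi for xi in x]
best, errors = repeated_grid_search_cv(x, y, [0.0, 1.0, 10.0],
                                       fit_ridge, predict_ridge)
print(best, errors)
```

Averaging over several random splits is the point of the exercise: a single V-fold split can favour one grid value by chance, and the spread across repeats gives a rough view of how much the choice depends on the split. Assessing the prediction error of the tuned model would need a further outer loop (nested cross-validation), which this sketch omits.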
Data error and highly parameterized groundwater models
Hill, M.C.
2008-01-01
Strengths and weaknesses of highly parameterized models, in which the number of parameters exceeds the number of observations, are demonstrated using a synthetic test case. Results suggest that the approach can yield close matches to observations but also serious errors in system representation. It is proposed that avoiding the difficulties of highly parameterized models requires close evaluation of: (1) model fit, (2) performance of the regression, and (3) estimated parameter distributions. Comparisons to hydrogeologic information are expected to be critical to obtaining credible models. Copyright © 2008 IAHS Press.
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy
2016-01-01
This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of upscaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.
A Clinical Decision Support System for Breast Cancer Patients
NASA Astrophysics Data System (ADS)
Fernandes, Ana S.; Alves, Pedro; Jarman, Ian H.; Etchells, Terence A.; Fonseca, José M.; Lisboa, Paulo J. G.
This paper proposes a Web-based clinical decision support system for clinical oncologists and for breast cancer patients making prognostic assessments, using the particular characteristics of the individual patient. The system comprises three different prognostic modelling methodologies: the clinically widely used Nottingham prognostic index (NPI), Cox regression modelling, and a partial logistic artificial neural network with automatic relevance determination (PLANN-ARD). Each of the three models yields a different prognostic index, and the indices can be analysed together to obtain a more accurate prognostic assessment of the patient. Missing data, a common issue in medical data, is handled in these models using multiple imputation techniques. Risk group assignments are also provided through a methodology based on regression trees, which yields Boolean rules expressed in terms of patient characteristics.
Liu, Huolong; Galbraith, S C; Ricart, Brendon; Stanton, Courtney; Smith-Goettler, Brandye; Verdi, Luke; O'Connor, Thomas; Lee, Sau; Yoon, Seongkyu
2017-06-15
In this study, the influence of key process variables (screw speed, throughput and liquid to solid (L/S) ratio) of a continuous twin screw wet granulation (TSWG) was investigated using a central composite face-centered (CCF) experimental design method. Regression models were developed to predict the process responses (motor torque, granule residence time), granule properties (size distribution, volume average diameter, yield, relative width, flowability) and tablet properties (tensile strength). The effects of the three key process variables were analyzed via contour and interaction plots. The experimental results have demonstrated that all the process responses, granule properties and tablet properties are influenced by changing the screw speed, throughput and L/S ratio. The TSWG process was optimized to produce granules with specific volume average diameter of 150μm and the yield of 95% based on the developed regression models. A design space (DS) was built based on volume average granule diameter between 90 and 200μm and the granule yield larger than 75% with a failure probability analysis using Monte Carlo simulations. Validation experiments successfully validated the robustness and accuracy of the DS generated using the CCF experimental design in optimizing a continuous TSWG process. Copyright © 2017 Elsevier B.V. All rights reserved.
A provisional effective evaluation when errors are present in independent variables
NASA Technical Reports Server (NTRS)
Gurin, L. S.
1983-01-01
Algorithms are examined for evaluating the parameters of a regression model when there are errors in the independent variables. The algorithms are fast and the estimates they yield are stable with respect to the correlation of errors and measurements of both the dependent variable and the independent variables.
USDA-ARS's Scientific Manuscript database
In the western Great Plains, climate dictates dryland wheat (Triticum aestivum, L) and corn (Zea mays, L.) production. Municipalities also use this region to recycle sewage biosolids. Will biosolids (from the Littleton/Englewood, CO Wastewater Treatment Plant) applications to western Great Plains ...
Merchantable sawlog and bole-length equations for the Northeastern United States
Daniel A. Yaussy; Martin E. Dale
1991-01-01
A modified Richards growth model is used to develop species-specific coefficients for equations estimating the merchantable sawlog and bole lengths of trees from 25 species groups common to the Northeastern United States. These regression coefficients have been incorporated into the growth-and-yield simulation software, NE-TWIGS.
Wheat yield dynamics: a structural econometric analysis.
Sahin, Afsin; Akdi, Yilmaz; Arslan, Fahrettin
2007-10-15
In this study we first explore the wheat situation in Turkey, which has a small open economy, and in the member countries of the European Union (EU). We observe that increasing wheat yield is fundamental to obtaining comparative advantage among countries by depressing domestic prices. The changing structure of support schemes in Turkey also makes it necessary to increase its wheat yield level. For this purpose, we used available data to determine the dynamics of wheat yield by ordinary least squares regression methods. To find out whether there is a linear relationship among these series, we checked whether the series are integrated of the same order. We conclude that fertilizer usage and precipitation level are substantial inputs for producing high wheat yield. Furthermore, according to our model, fertilizer usage affects wheat yield more than precipitation level.
Johnson, Henry M.; Black, Robert W.; Wise, Daniel R.
2013-01-01
The watershed model SPARROW (Spatially Related Regressions on Watershed attributes) was used to predict total nitrogen (TN) and total phosphorus (TP) loads and yields for the Middle Columbia River Basin in Idaho, Oregon, and Washington. The new models build on recently published models for the entire Pacific Northwest, and provide revised load predictions for the arid interior of the region by restricting the modeling domain and recalibrating the models. Results from the new TN and TP models are provided for the entire region, and discussed with special emphasis on the Yakima River Basin, Washington. In most catchments of the Yakima River Basin, the TN and TP in streams is from natural sources, specifically nitrogen fixation in forests (TN) and weathering and erosion of geologic materials (TP). The natural nutrient sources are overshadowed by anthropogenic sources of TN and TP in highly agricultural and urbanized catchments; downstream of the city of Yakima, most of the load in the Yakima River is derived from anthropogenic sources. Yields of TN and TP from catchments with nearly uniform land use were compared with other yield values and export coefficients published in the scientific literature, and generally were in agreement. The median yield of TN was greatest in catchments dominated by agricultural land and smallest in catchments dominated by grass and scrub land. The median yield of TP was greatest in catchments dominated by forest land, but the largest yields (90th percentile) of TP were from agricultural catchments. As with TN, the smallest TP yields were from catchments dominated by grass and scrub land.
[Winter wheat yield gap between field blocks based on comparative performance analysis].
Chen, Jian; Wang, Zhong-Yi; Li, Liang-Tao; Zhang, Ke-Feng; Yu, Zhen-Rong
2008-09-01
Based on two-year household survey data, the yield gap of winter wheat in Quzhou County of Hebei Province, China in 2003-2004 was studied through comparative performance analysis (CPA). The results showed a large yield gap (from 4.2 to 7.9 t·hm⁻²) between field blocks, with a coefficient of variation of 0.14. Through stepwise forward multiple linear regression, it was found that the yield model with 8 selected variables could explain 63% of the variability of winter wheat yield. Among the variables selected, soil salinity, soil fertility, and irrigation water quality were the most important limiting factors, accounting for 52% of the total yield gap. Crop variety was another important limiting factor, accounting for 14%, while planting date, fertilizer type, disease and pest, and water press accounted for 7%, 14%, 10%, and 3%, respectively. Therefore, besides soil and climate conditions, management practices accounted for the majority of yield variability in Quzhou County, suggesting that the yield gap could be reduced significantly through optimum field management.
Reisler, Ronald B; Gibbs, Paul H; Danner, Denise K; Boudreau, Ellen F
2012-11-26
We compared the effect on primary vaccination plaque-reduction neutralization 80% titers (PRNT80) responses of same-day administration (at different injection sites) of two similar investigational inactivated alphavirus vaccines, eastern equine encephalitis (EEE) vaccine (TSI-GSD 104) and western equine encephalitis (WEE) vaccine (TSI-GSD 210) to separate administration. Overall, primary response rate for EEE vaccine was 524/796 (66%) and overall primary response rate for WEE vaccine was 291/695 (42%). EEE vaccine same-day administration yielded a 59% response rate and a responder geometric mean titer (GMT)=89 while separate administration yielded a response rate of 69% and a responder GMT=119. WEE vaccine same-day administration yielded a 30% response rate and a responder GMT=53 while separate administration yielded a response rate of 54% and a responder GMT=79. EEE response rates for same-day administration (group A) vs. non-same-day administration (group B) were significantly affected by gender. A logistic regression model predicting response to EEE comparing group B to group A for females yielded an OR=4.10 (95% CL 1.97-8.55; p=.0002) and for males yielded an OR=1.25 (95% CL 0.76-2.07; p=.3768). WEE response rates for same-day administration vs. non-same-day administration were independent of gender. A logistic regression model predicting response to WEE comparing group B to group A yielded an OR=2.14 (95% CL 1.22-3.73; p=.0077). We report immune interference occurring with same-day administration of two completely separate formalin inactivated viral vaccines in humans. These findings combined with the findings of others regarding immune interference would argue for a renewed emphasis on studying the immunological mechanisms of induction of inactivated viral vaccine protection. Copyright © 2012. Published by Elsevier Ltd.
Yield Strength Testing in Human Cadaver Nasal Septal Cartilage and L-Strut Constructs.
Liu, Yuan F; Messinger, Kelton; Inman, Jared C
2017-01-01
To our knowledge, yield strength testing in human nasal septal cartilage has not been reported to date. An understanding of the basic mechanics of the nasal septum may help surgeons decide how much of an L-strut to preserve and how much grafting is needed. To determine the factors correlated with yield strength of the cartilaginous nasal septum and to explore the association between L-strut width and thickness in determining yield strength. In an anatomy laboratory, yield strength of rectangular pieces of fresh cadaver nasal septal cartilage was measured, and regression was performed to identify the factors correlated with yield strength. To measure yield strength in L-shaped models, 4 bonded paper L-struts models were constructed for every possible combination of the width and thickness, for a total of 240 models. Mathematical modeling using the resultant data with trend lines and surface fitting was performed to quantify the associations among L-strut width, thickness, and yield strength. The study dates were November 1, 2015, to April 1, 2016. The factors correlated with nasal cartilage yield strength and the associations among L-strut width, thickness, and yield strength in L-shaped models. Among 95 cartilage pieces from 12 human cadavers (mean [SD] age, 67.7 [12.6] years) and 240 constructed L-strut models, L-strut thickness was the only factor correlated with nasal septal cartilage yield strength (coefficient for thickness, 5.54; 95% CI, 4.08-7.00; P < .001), with an adjusted R2 correlation coefficient of 0.37. The mean (SD) yield strength R2 varied with L-strut thickness exponentially (0.93 [0.06]) for set widths, and it varied with L-strut width linearly (0.82 [0.11]) or logarithmically (0.85 [0.17]) for set thicknesses. A 3-dimensional surface model of yield strength with L-strut width and thickness as variables was created using a 2-dimensional gaussian function (adjusted R2 = 0.94). 
Estimated yield strengths were generated from the model to allow determination of the desired yield strength with different permutations of L-strut width and thickness. In this study of human cadaver nasal septal cartilage, L-strut thickness was significantly associated with yield strength. In a bonded paper L-strut model, L-strut thickness had a more important role in determining yield strength than L-strut width. Surgeons should consider the thickness of potential L-struts when determining the amount of cartilaginous septum to harvest and graft.
Effects of diurnal temperature range and drought on wheat yield in Spain
NASA Astrophysics Data System (ADS)
Hernandez-Barrera, S.; Rodriguez-Puebla, C.; Challinor, A. J.
2017-07-01
This study aims to provide new insight on the wheat yield historical response to climate processes throughout Spain by using statistical methods. Our data includes observed wheat yield, pseudo-observations E-OBS for the period 1979 to 2014, and outputs of general circulation models in phase 5 of the Coupled Models Inter-comparison Project (CMIP5) for the period 1901 to 2099. In investigating the relationship between climate and wheat variability, we have applied the approach known as the partial least-square regression, which captures the relevant climate drivers accounting for variations in wheat yield. We found that drought occurring in autumn and spring and the diurnal range of temperature experienced during the winter are major processes to characterize the wheat yield variability in Spain. These observable climate processes are used for an empirical model that is utilized in assessing the wheat yield trends in Spain under different climate conditions. To isolate the trend within the wheat time series, we implemented the adaptive approach known as Ensemble Empirical Mode Decomposition. Wheat yields in the twenty-first century are experiencing a downward trend that we claim is a consequence of widespread drought over the Iberian Peninsula and an increase in the diurnal range of temperature. These results are important to inform about the wheat vulnerability in this region to coming changes and to develop adaptation strategies.
Importance of spatial autocorrelation in modeling bird distributions at a continental scale
Bahn, V.; O'Connor, R.J.; Krohn, W.B.
2006-01-01
Spatial autocorrelation in species' distributions has been recognized as inflating the probability of a type I error in hypothesis tests, causing biases in variable selection, and violating the assumption of independence of error terms in models such as correlation or regression. However, it remains unclear whether these problems occur at all spatial resolutions and extents, and under which conditions spatially explicit modeling techniques are superior. Our goal was to determine whether spatial models were superior at large extents and across many different species. In addition, we investigated the importance of purely spatial effects in distribution patterns relative to the variation that could be explained through environmental conditions. We studied distribution patterns of 108 bird species in the conterminous United States using ten years of data from the Breeding Bird Survey. We compared the performance of spatially explicit regression models with non-spatial regression models using Akaike's information criterion. In addition, we partitioned the variance in species distributions into an environmental, a pure spatial and a shared component. The spatially explicit conditional autoregressive regression models strongly outperformed the ordinary least squares regression models. In addition, partialling out the spatial component underlying the species' distributions showed that an average of 17% of the explained variation could be attributed to purely spatial effects independent of the spatial autocorrelation induced by the underlying environmental variables. We concluded that location in the range and neighborhood play an important role in the distribution of species. Spatially explicit models are expected to yield better predictions especially for mobile species such as birds, even in coarse-grained models with a large extent. © Ecography.
Relationships between surface solar radiation and wheat yield in Spain
NASA Astrophysics Data System (ADS)
Hernandez-Barrera, Sara; Rodriguez-Puebla, Concepción
2017-04-01
Here we examine the role of solar radiation in describing wheat-yield variability in Spain. We used partial least squares regression to capture the modes of surface solar radiation that drive wheat-yield variability. We will show that surface solar radiation introduces the effects of teleconnection patterns on wheat yield and that it is also associated with drought and diurnal temperature range. We highlight the importance of surface solar radiation for obtaining models for wheat-yield projections, because it could reduce uncertainty relative to projections based on temperature and precipitation variables. In addition, the significance of the model based on surface solar radiation is greater than that of the previous one based on drought and diurnal temperature range (Hernandez-Barrera et al., 2016). According to our results, the increase of solar radiation over Spain for the 21st century could force a wheat-yield decrease (Hernandez-Barrera et al., 2017). Hernandez-Barrera S., Rodríguez-Puebla C. and Challinor A.J. 2016 Effects of diurnal temperature range and drought on wheat yield in Spain. Theoretical and Applied Climatology. DOI: 10.1007/s00704-016-1779-9 Hernandez-Barrera S., Rodríguez-Puebla C. 2017 Wheat yield in Spain and associated solar radiation patterns. International Journal of Climatology. DOI: 10.1002/joc.4975
Kitchenham, B A; Rowlands, G J; Shorbagi, H
1975-05-01
Regression analyses were performed on data from 48 Compton metabolic profile tests relating the concentrations of certain constituents in the blood of dairy cows to their milk yield, age and stage of lactation. The common partial regression coefficients for milk yield, age and stage of lactation were estimated for each blood constituent. The relationships of greatest statistical significance were between the concentrations of inorganic phosphate and globulin and age, and the concentration of albumin and milk yield.
Estuarine Sediment Deposition during Wetland Restoration: A GIS and Remote Sensing Modeling Approach
NASA Technical Reports Server (NTRS)
Newcomer, Michelle; Kuss, Amber; Kentron, Tyler; Remar, Alex; Choksi, Vivek; Skiles, J. W.
2011-01-01
Restoration of the industrial salt flats in the San Francisco Bay, California is an ongoing wetland rehabilitation project. Remote sensing maps of suspended sediment concentration, and other GIS predictor variables were used to model sediment deposition within these recently restored ponds. Suspended sediment concentrations were calibrated to reflectance values from Landsat TM 5 and ASTER using three statistical techniques -- linear regression, multivariate regression, and an Artificial Neural Network (ANN), to map suspended sediment concentrations. Multivariate and ANN regressions using ASTER proved to be the most accurate methods, yielding r2 values of 0.88 and 0.87, respectively. Predictor variables such as sediment grain size and tidal frequency were used in the Marsh Sedimentation (MARSED) model for predicting deposition rates for three years. MARSED results for a fully restored pond show a root mean square deviation (RMSD) of 66.8 mm (<1) between modeled and field observations. This model was further applied to a pond breached in November 2010 and indicated that the recently breached pond will reach equilibrium levels after 60 months of tidal inundation.
Yield of bedrock wells in the Nashoba terrane, central and eastern Massachusetts
DeSimone, Leslie A.; Barbaro, Jeffrey R.
2012-01-01
The yield of bedrock wells in the fractured-bedrock aquifers of the Nashoba terrane and surrounding area, central and eastern Massachusetts, was investigated with analyses of existing data. Reported well yield was compiled for 7,287 wells from Massachusetts Department of Environmental Protection and U.S. Geological Survey databases. Yield of these wells ranged from 0.04 to 625 gallons per minute. In a comparison with data from 103 supply wells, yield and specific capacity from aquifer tests were well correlated, indicating that reported well yield was a reasonable measure of aquifer characteristics in the study area. Statistically significant relations were determined between well yield and a number of cultural and hydrogeologic factors. Cultural variables included intended water use, well depth, year of construction, and method of yield measurement. Bedrock geology, topography, surficial geology, and proximity to surface waters were statistically significant hydrogeologic factors. Yield of wells was higher in areas of granites, mafic intrusive rocks, and amphibolites than in areas of schists and gneisses or pelitic rocks; higher in valleys and low-slope areas than on hills, ridges, or high slopes; higher in areas overlain by stratified glacial deposits than in areas overlain by till; and higher in close proximity to streams, ponds, and wetlands than at greater distances from these surface-water features. Proximity to mapped faults and to lineaments from aerial photographs also were related to well yield by some measures in three quadrangles in the study area. Although the statistical significance of these relations was high, their predictive power was low, and these relations explained little of the variability in the well-yield data. Similar results were determined from a multivariate regression analysis. 
Multivariate regression models for the Nashoba terrane and for a three-quadrangle subarea included, as significant variables, many of the cultural and hydrogeologic factors that were individually related to well yield, in ways that are consistent with conceptual understanding of their effects, but the models explained only 21 percent (regional model for the entire terrane) and 30 percent (quadrangle model) of the overall variance in yield. Moreover, most of the explained variance was due to well characteristics rather than hydrogeologic factors. Hydrogeologic factors such as topography and geology are likely important. However, the overall high variability in the well-yield data, which results from the high variability in aquifer hydraulic properties as well as from limitations of the dataset, would make it difficult to use hydrogeologic factors to predict well yield in the study area. Geostatistical analysis (variograms), on the other hand, indicated that, although highly variable, the well-yield data are spatially correlated. The spatial continuity appears greater in the northeast-southwest direction and less in the southeast-northwest direction, directions that are parallel and perpendicular, respectively, to the regional geologic structural trends. Geostatistical analysis (kriging), used to estimate yield values throughout the study area, identified regional-scale areas of higher and lower yield that may be related to regional structural features—in particular, to a northeast-southwest trending regional fault zone within the Nashoba terrane. It also would be difficult to use kriging to predict yield at specific locations, however, because of the spatial variability in yield, particularly at small scales. 
The regional-scale analyses in this study, both with hydrogeologic variables and geostatistics, provide a context for understanding the variability in well yield rather than a basis for precise predictions, and site-specific information would be needed to understand local conditions.
A test of inflated zeros for Poisson regression models.
He, Hua; Zhang, Hui; Ye, Peng; Tang, Wan
2017-01-01
Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, casting serious doubt on the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that, compared with the Vuong test, our approach is not only better at controlling the type I error rate but also yields more power.
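As a rough illustration of what "excessive zeros" means under a Poisson model, the sketch below compares the observed fraction of zeros with the fraction a Poisson distribution with the same mean would imply. This is only a descriptive check written for this listing; it is not the test proposed by the authors, whose statistic is not given in the abstract, and the function name `zero_excess_summary` is hypothetical.

```python
import numpy as np

def zero_excess_summary(counts):
    """Compare the observed zero fraction with the Poisson-implied one.

    Descriptive only: this is NOT the authors' test statistic.
    """
    counts = np.asarray(counts)
    rate = counts.mean()                      # Poisson maximum-likelihood rate
    observed = float(np.mean(counts == 0))    # share of zeros in the sample
    expected = float(np.exp(-rate))           # P(Y = 0) under Poisson(rate)
    return {
        "rate": float(rate),
        "observed_zero_frac": observed,
        "expected_zero_frac": expected,
        "excess": observed - expected,
    }

# Hypothetical sample: half the observations are zero, the rest equal 4.
summary = zero_excess_summary([0, 0, 0, 0, 4, 4, 4, 4])
print(summary)
```

A large positive `excess` suggests more zeros than a plain Poisson model would produce, but judging whether it is statistically significant requires a formal test such as the one the abstract describes.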
Odegård, J; Klemetsdal, G; Heringstad, B
2005-04-01
Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
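The two families of covariates compared above can be sketched as follows. This is a minimal illustration, assuming that "order 4" means polynomials up to degree 4 and that days in milk (DIM) are rescaled from 5-365 to [-1, 1]; the knot positions in the usage example are hypothetical, and the study's actual normalization and knot placement are not stated in the abstract.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(dim, order=4, dim_min=5, dim_max=365):
    """Matrix with columns P_0(x)..P_order(x) at standardized days in milk."""
    dim = np.asarray(dim, dtype=float)
    # Map DIM from [dim_min, dim_max] onto [-1, 1], the Legendre domain.
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
    return legendre.legvander(x, order)

def linear_spline_covariates(dim, knots):
    """Linear-spline ("hat function") covariates, one column per knot."""
    dim = np.asarray(dim, dtype=float)
    knots = np.asarray(knots, dtype=float)
    unit = np.eye(len(knots))
    # Interpolating each unit vector over the knots yields that knot's hat function.
    return np.column_stack(
        [np.interp(dim, knots, unit[j]) for j in range(len(knots))]
    )

# Hypothetical knot positions, for illustration only.
Z_leg = legendre_covariates([5, 185, 365])
Z_spl = linear_spline_covariates([5, 100, 365], knots=[5, 50, 150, 365])
print(Z_leg.shape, Z_spl.shape)
```

Each row of the spline matrix has at most two non-zero entries, which is one reason spline covariates tend to behave more locally than Legendre polynomials at the extremes of lactation.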
Mckay, Garrett; Huang, Wenxi; Romera-Castillo, Cristina; Crouch, Jenna E; Rosario-Ortiz, Fernando L; Jaffé, Rudolf
2017-05-16
The antioxidant capacity and formation of photochemically produced reactive intermediates (RI) were studied for water samples collected from the Florida Everglades with different spatial (marsh versus estuarine) and temporal (wet versus dry season) characteristics. Measured RI included triplet excited states of dissolved organic matter (³DOM*), singlet oxygen (¹O₂), and the hydroxyl radical (•OH). Single and multiple linear regression modeling were performed using a broad range of extrinsic (to predict RI formation rates, R_RI) and intrinsic (to predict RI quantum yields, Φ_RI) parameters. Multiple linear regression models consistently led to better predictions of R_RI and Φ_RI for our data set but poor prediction of Φ_RI for a previously published data set (ref. 1), probably because the predictors are intercorrelated (Pearson's r > 0.5). Single linear regression models were built with data compiled from previously published studies (n ≈ 120) in which E2:E3, S, and Φ_RI values were measured, which revealed a high degree of similarity between RI-optical property relationships across DOM samples of diverse sources. This study reveals that •OH formation is, in general, decoupled from ³DOM* and ¹O₂ formation, providing supporting evidence that ³DOM* is not a •OH precursor. Finally, Φ_RI for ¹O₂ and ³DOM* correlated negatively with antioxidant activity (a surrogate for electron donating capacity) for the collected samples, which is consistent with intramolecular oxidation of DOM moieties by ³DOM*.
Fuel model selection for BEHAVE in midwestern oak savannas
Grabner, K.W.; Dwyer, J.P.; Cutter, B.E.
2001-01-01
BEHAVE, a fire behavior prediction system, can be a useful tool for managing areas with prescribed fire. However, the proper choice of fuel models can be critical in developing management scenarios. BEHAVE predictions were evaluated using four standardized fuel models that partially described oak savanna fuel conditions: Fuel Model 1 (Short Grass), 2 (Timber and Grass), 3 (Tall Grass), and 9 (Hardwood Litter). Although all four models yielded regressions with R² in excess of 0.8, Fuel Model 2 produced the most reliable fire behavior predictions.
Sacks, Jason D; Ito, Kazuhiko; Wilson, William E; Neas, Lucas M
2012-10-01
With the advent of multicity studies, uniform statistical approaches have been developed to examine air pollution-mortality associations across cities. To assess the sensitivity of the air pollution-mortality association to different model specifications in a single and multipollutant context, the authors applied various regression models developed in previous multicity time-series studies of air pollution and mortality to data from Philadelphia, Pennsylvania (May 1992-September 1995). Single-pollutant analyses used daily cardiovascular mortality, fine particulate matter (particles with an aerodynamic diameter ≤2.5 µm; PM2.5), speciated PM2.5, and gaseous pollutant data, while multipollutant analyses used source factors identified through principal component analysis. In single-pollutant analyses, risk estimates were relatively consistent across models for most PM2.5 components and gaseous pollutants. However, risk estimates were inconsistent for ozone in all-year and warm-season analyses. Principal component analysis yielded factors with species associated with traffic, crustal material, residual oil, and coal. Risk estimates for these factors exhibited less sensitivity to alternative regression models compared with single-pollutant models. Factors associated with traffic and crustal material showed consistently positive associations in the warm season, while the coal combustion factor showed consistently positive associations in the cold season. Overall, mortality risk estimates examined using a source-oriented approach were more stable and precise than those from single-pollutant analyses.
Evaluation of the Carrying Capacity of Rectangular Steel-Concrete Columns
NASA Astrophysics Data System (ADS)
Vatulia, Glib; Rezunenko, Maryna; Petrenko, Dmytro; Rezunenko, Sergii
2018-06-01
Experimental studies of rectangular steel-concrete columns under centric compression with random eccentricity were conducted. The stress-strain state and the exhaustion of carrying capacity were assessed. A regression relationship is proposed to determine the maximum carrying capacity of such columns. The mathematical model takes into account the combined influence of the physical and geometric characteristics of the columns, such as their length, cross-sectional area, casing thickness, prism strength of concrete, yield strength of steel, and modulus of elasticity of both steel and concrete. The agreement of the obtained model with the experimental data, as well as the significance of the regression parameters, is confirmed by the Fisher and Student criteria.
Vanhove, Wouter; Maalsté, Nicole; Van Damme, Patrick
2017-07-01
Together, the Netherlands and Belgium are the largest indoor cannabis producing countries in Europe. In both countries, the legal prosecution of convicted illicit cannabis growers usually includes recovery of the profits gained. However, it is not easy to estimate these profits reliably, because of the wide range of factors that determine indoor cannabis yields and eventual selling prices. In the Netherlands, a reference model has been used since 2005 that assumes a constant yield (g) per plant for a given indoor cannabis plant density. Later, in 2011, a new model was developed in Belgium for yield estimation of Belgian indoor cannabis plantations that assumes a constant yield per m² of growth surface, provided that a number of growth conditions are met. Indoor cannabis plantations in the Netherlands and Belgium share similar technical characteristics. As a result, the two models should produce similar yield estimates for indoor cannabis plantations in both countries. By means of a real-case study from the Netherlands, we show that the reliability of both models is hampered by a number of flaws and unmet preconditions. The Dutch model is based on a regression equation that makes use of ill-defined plant development stages, assumes linear plant growth, does not discriminate between different plantation size categories and does not include other important yield determining factors (such as fertilization). The Belgian model addresses some of these shortcomings, but its applicability is constrained by a number of preconditions, including plantation size between 50 and 1000 plants; cultivation in individual pots with peat soil; 600 W (electrical power) assimilation lamps; constant temperature between 20°C and 30°C; adequate fertilizer application; and plants unaffected by pests and diseases.
The judiciary in both the Netherlands and Belgium requires robust indoor cannabis yield models for adequate legal prosecution of illicit indoor cannabis growth operations. To that end, the current models should be optimized, and the validity of their application should be examined case by case. Copyright © 2017 Elsevier B.V. All rights reserved.
Suspended-Sediment Loads and Yields in the North Santiam River Basin, Oregon, Water Years 1999-2004
Bragg, Heather M.; Sobieszczyk, Steven; Uhrich, Mark A.; Piatt, David R.
2007-01-01
The North Santiam River provides drinking water to the residents and businesses of the city of Salem, Oregon, and many surrounding communities. Since 1998, water-quality data, including turbidity, were collected continuously at monitoring stations throughout the basin as part of the North Santiam River Basin Turbidity and Suspended Sediment Study. In addition, sediment samples have been collected over a range of turbidity and streamflow values. Regression models were developed between the instream turbidity and suspended-sediment concentration from the samples collected from each monitoring station. The models were then used to estimate the daily and annual suspended-sediment loads and yields. For water years 1999-2004, suspended-sediment loads and yields were estimated for each station. Annual suspended-sediment loads and yields were highest during water years 1999 and 2000. A drought during water year 2001 resulted in the lowest suspended-sediment loads and yields for all monitoring stations. High-turbidity events that were unrelated or disproportional to increased streamflow occurred at several of the monitoring stations during the period of study. These events highlight the advantage of estimating suspended-sediment loads and yields from instream turbidity rather than from streamflow alone.
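The turbidity-based load estimation described above can be sketched as follows. This is a minimal illustration, assuming a log-log linear relation between turbidity and suspended-sediment concentration (SSC), SSC in mg/L and streamflow in m³/s; the report's actual model form, units and any retransformation bias correction are not given in the abstract, and the function names and sample values are hypothetical.

```python
import numpy as np

def fit_loglog(turbidity, ssc):
    """Least-squares fit of log10(SSC) = a + b * log10(turbidity)."""
    b, a = np.polyfit(np.log10(turbidity), np.log10(ssc), 1)
    return a, b

def daily_load_tonnes(turbidity, flow_m3s, a, b):
    """Estimate the daily suspended-sediment load in tonnes per day."""
    ssc_mg_l = 10.0 ** (a + b * np.log10(turbidity))
    # mg/L equals g/m^3, so SSC * flow is g/s; scale to a day and to tonnes.
    return ssc_mg_l * np.asarray(flow_m3s, dtype=float) * 86400.0 / 1e6

# Synthetic calibration samples following SSC = 2 * turbidity^1.1 exactly.
turb = np.array([1.0, 5.0, 20.0, 100.0])
ssc = 2.0 * turb ** 1.1
a, b = fit_loglog(turb, ssc)
print(daily_load_tonnes(10.0, 5.0, a, b))
```

A yield then follows by dividing the load by the drainage area upstream of the station. Because the fit is made in log space, a production workflow would normally also consider a bias correction when transforming back to concentration.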
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
Gaussian functional regression for output prediction: Model assimilation and experimental design
NASA Astrophysics Data System (ADS)
Nguyen, N. C.; Peraire, J.
2016-03-01
In this paper, we introduce a Gaussian functional regression (GFR) technique that integrates multi-fidelity models with model reduction to efficiently predict the input-output relationship of a high-fidelity model. The GFR method combines the high-fidelity model with a low-fidelity model to provide an estimate of the output of the high-fidelity model in the form of a posterior distribution that can characterize uncertainty in the prediction. A reduced basis approximation is constructed upon the low-fidelity model and incorporated into the GFR method to yield an inexpensive posterior distribution of the output estimate. As this posterior distribution depends crucially on a set of training inputs at which the high-fidelity models are simulated, we develop a greedy sampling algorithm to select the training inputs. Our approach results in an output prediction model that inherits the fidelity of the high-fidelity model and has the computational complexity of the reduced basis approximation. Numerical results are presented to demonstrate the proposed approach.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
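The Brier score used above to assess model performance is the mean squared difference between predicted probabilities and observed binary outcomes. The sketch below is a plain implementation of that standard definition; the function name and example values are illustrative and not taken from the study.

```python
import numpy as np

def brier_score(predicted_prob, outcome):
    """Mean squared difference between predicted risk and 0/1 outcome."""
    p = np.asarray(predicted_prob, dtype=float)
    y = np.asarray(outcome, dtype=float)
    return float(np.mean((p - y) ** 2))

# Hypothetical predictions from two competing modelling strategies.
y_obs = [0, 1, 1, 0]
print(brier_score([0.2, 0.8, 0.6, 0.1], y_obs))  # sharper, well-directed risks
print(brier_score([0.5, 0.5, 0.5, 0.5], y_obs))  # uninformative risks
```

Lower values are better. A score of 0 corresponds to perfect predictions, while always predicting 0.5 gives 0.25 regardless of the outcomes.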
Global Gridded Crop Model Evaluation: Benchmarking, Skills, Deficiencies and Implications.
NASA Technical Reports Server (NTRS)
Muller, Christoph; Elliott, Joshua; Chryssanthacopoulos, James; Arneth, Almut; Balkovic, Juraj; Ciais, Philippe; Deryng, Delphine; Folberth, Christian; Glotter, Michael; Hoek, Steven;
2017-01-01
Crop models are increasingly used to simulate crop yields at the global scale, but so far there is no general framework on how to assess model performance. Here we evaluate the simulation results of 14 global gridded crop modeling groups that have contributed historic crop yield simulations for maize, wheat, rice and soybean to the Global Gridded Crop Model Intercomparison (GGCMI) of the Agricultural Model Intercomparison and Improvement Project (AgMIP). Simulation results are compared to reference data at global, national and grid cell scales and we evaluate model performance with respect to time series correlation, spatial correlation and mean bias. We find that global gridded crop models (GGCMs) show mixed skill in reproducing time series correlations or spatial patterns at the different spatial scales. Generally, maize, wheat and soybean simulations of many GGCMs are capable of reproducing larger parts of observed temporal variability (time series correlation coefficients (r) of up to 0.888 for maize, 0.673 for wheat and 0.643 for soybean at the global scale) but rice yield variability cannot be well reproduced by most models. Yield variability can be well reproduced for most major producing countries by many GGCMs and for all countries by at least some. A comparison with gridded yield data and a statistical analysis of the effects of weather variability on yield variability shows that the ensemble of GGCMs can explain more of the yield variability than an ensemble of regression models for maize and soybean, but not for wheat and rice. We identify future research needs in global gridded crop modeling and for all individual crop modeling groups. In the absence of a purely observation-based benchmark for model evaluation, we propose that the best performing crop model per crop and region establishes the benchmark for all others, and modelers are encouraged to investigate how crop model performance can be increased. 
We make our evaluation system accessible to all crop modelers so that other modeling groups can also test their model performance against the reference data and the GGCMI benchmark.
Climatically driven yield variability of major crops in Khakassia (South Siberia)
NASA Astrophysics Data System (ADS)
Babushkina, Elena A.; Belokopytova, Liliana V.; Zhirnova, Dina F.; Shah, Santosh K.; Kostyakova, Tatiana V.
2018-06-01
We investigated the yield variability of the three main crops in the Khakassia Republic: spring wheat, spring barley, and oats. In terms of yield values, variability characteristics, and climatic response, the agricultural territory of Khakassia can be divided into three zones: (1) the Northern Zone, where crop yield has a strong positive response to May-July precipitation and a moderately negative one to the temperatures of the same period; (2) the Central Zone, where crop yield depends mainly on temperatures; and (3) the Southern Zone, where climate has the least pronounced impact on yield. The dominant pattern in crop yield is caused by water stress during periods of high temperatures and low moisture supply, with heat stress as an additional factor. Differences between zones are due to combinations of the latitudinal temperature gradient, the altitudinal precipitation gradient, and the presence of a well-developed hydrological network and irrigation system as moisture sources in the Central Zone. More detailed analysis shows differences in the climatic sensitivity of crops during the phases of vegetative growth and grain development and, to a lesser extent, during the harvesting period. Multifactor linear regression models were constructed to estimate climate- and autocorrelation-induced variability of crop yield. These models predict a possible yield decrease of at least 2-11% in the next decade due to increasing regional summer temperatures.
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
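The contrast between OLS and geometric-mean regression on the "variability" criterion can be illustrated with a single predictor. The sketch below uses the textbook single-predictor forms (OLS slope r·s_obs/s_fc, geometric-mean slope sign(r)·s_obs/s_fc); it is an assumption-laden toy on synthetic data, not the multi-predictor formulations or the Lorenz experiments of the paper, and `fit_correction` is a hypothetical name.

```python
import numpy as np

def fit_correction(forecast, obs, method="ols"):
    """Return (intercept, slope) of a linear forecast correction."""
    forecast = np.asarray(forecast, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(forecast, obs)[0, 1]
    ratio = obs.std() / forecast.std()
    # OLS shrinks the slope by the correlation; GM regression does not.
    slope = r * ratio if method == "ols" else np.sign(r) * ratio
    intercept = obs.mean() - slope * forecast.mean()
    return intercept, slope

# Synthetic data: a biased, over-dispersed, noisy forecast of the "truth".
rng = np.random.default_rng(0)
truth = rng.normal(size=500)
raw = 0.5 + 1.5 * truth + rng.normal(scale=1.0, size=500)

a_ols, b_ols = fit_correction(raw, truth, "ols")
a_gm, b_gm = fit_correction(raw, truth, "gm")
corrected_ols = a_ols + b_ols * raw
corrected_gm = a_gm + b_gm * raw
print(truth.std(), corrected_ols.std(), corrected_gm.std())
```

Both corrections remove the mean bias, but the OLS-corrected forecast has a smaller spread than the observations whenever the correlation is below one, whereas the geometric-mean correction reproduces the observed spread by construction.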
Extraction of anthocyanins from red cabbage using high pressure CO2.
Xu, Zhenzhen; Wu, Jihong; Zhang, Yan; Hu, Xiaosong; Liao, Xiaojun; Wang, Zhengfu
2010-09-01
The extraction kinetics of anthocyanins from red cabbage using high pressure CO₂ (HPCD) against conventional acidified water (CAW) was investigated. The HPCD time, temperature, pressure and volume ratio of solid-liquid mixture vs. pressurized CO₂ (R_(S+L)/G) played important roles in the extraction kinetics of anthocyanins. The extraction kinetics showed two phases: in the first phase the yield increased with time, and in the second phase the yield, defined as the steady-state yield (y*), was constant. The y* of anthocyanins using HPCD increased with higher temperature, higher pressure and lower R_(S+L)/G. The general mass transfer model, with higher regression coefficients (R² > 0.97), fitted the kinetic data better than the Fick's second law diffusion model. Compared with CAW, the time (t*) to reach the y* of anthocyanins using HPCD was reduced by half, while the corresponding overall volumetric mass transfer coefficient k_L·a from the general mass transfer model increased twofold. Copyright 2010 Elsevier Ltd. All rights reserved.
Electricity Load Forecasting Using Support Vector Regression with Memetic Algorithms
Hu, Zhongyi; Xiong, Tao
2013-01-01
Electricity load forecasting is an important issue that is widely explored in the literature on power systems operation and on commercial transactions in electricity markets. Among the existing forecasting models, support vector regression (SVR) has gained much attention. Because the performance of SVR depends heavily on its parameters, this study proposes a firefly algorithm (FA) based memetic algorithm (FA-MA) to determine the parameters of the SVR forecasting model appropriately. In the proposed FA-MA algorithm, the FA algorithm is applied to explore the solution space, and pattern search is used to conduct individual learning and thus enhance the exploitation of FA. Experimental results confirm that the proposed FA-MA based SVR model not only yields more accurate forecasting results than four other evolutionary-algorithm-based SVR models and three well-known forecasting models but also outperforms the hybrid algorithms in the related existing literature. PMID:24459425
Electricity load forecasting using support vector regression with memetic algorithms.
Hu, Zhongyi; Bao, Yukun; Xiong, Tao
2013-01-01
Electricity load forecasting is an important issue that is widely explored and examined in power systems operation literature and commercial transactions in electricity markets literature as well. Among the existing forecasting models, support vector regression (SVR) has gained much attention. Considering the performance of SVR highly depends on its parameters; this study proposed a firefly algorithm (FA) based memetic algorithm (FA-MA) to appropriately determine the parameters of SVR forecasting model. In the proposed FA-MA algorithm, the FA algorithm is applied to explore the solution space, and the pattern search is used to conduct individual learning and thus enhance the exploitation of FA. Experimental results confirm that the proposed FA-MA based SVR model can not only yield more accurate forecasting results than the other four evolutionary algorithms based SVR models and three well-known forecasting models but also outperform the hybrid algorithms in the related existing literature.
Ferrell, Gloria M.
2001-01-01
Transport rates for total solids, total nitrogen, total phosphorus, biochemical oxygen demand, chromium, copper, lead, nickel, and zinc during 1994–98 were computed for six stormwater-monitoring sites in Mecklenburg County, North Carolina. These six stormwater-monitoring sites were operated by the Mecklenburg County Department of Environmental Protection, in cooperation with the City of Charlotte, and are located near the mouths of major streams. Constituent transport at the six study sites generally was dominated by nonpoint sources, except for nitrogen and phosphorus at two sites located downstream from the outfalls of major municipal wastewater-treatment plants. To relate land use to constituent transport, regression equations to predict constituent yield were developed by using water-quality data from a previous study of nine stormwater-monitoring sites on small streams in Mecklenburg County. The drainage basins of these nine stormwater sites have relatively homogeneous land-use characteristics compared to the six study sites. Mean annual construction activity, based on building permit files, was estimated for all stormwater-monitoring sites and included as an explanatory variable in the regression equations. These regression equations were used to predict constituent yield for the six study sites. Predicted yields generally were in agreement with computed yields. In addition, yields were predicted by using regression equations derived from a national urban water-quality database. Yields predicted from the regional regression equations generally were about an order of magnitude lower than computed yields. Regression analysis indicated that construction activity was a major contributor to transport of the constituents evaluated in this study except for total nitrogen and biochemical oxygen demand. Transport of total nitrogen and biochemical oxygen demand was dominated by point-source contributions.
The two study basins that had the largest amounts of construction activity also had the highest total solids yields (1,300 and 1,500 tons per square mile per year). The highest total phosphorus yields (3.2 and 1.7 tons per square mile per year) attributable to nonpoint sources also occurred in these basins. Concentrations of chromium, copper, lead, nickel, and zinc were positively correlated with total solids concentrations at most of the study sites (Pearson product-moment correlation >0.50). The site having the highest median concentrations of chromium, copper, and nickel also was the site having the highest computed yield for total solids.
Yield of illicit indoor cannabis cultivation in the Netherlands.
Toonen, Marcel; Ribot, Simon; Thissen, Jac
2006-09-01
To obtain a reliable estimate of the yield of illicit indoor cannabis cultivation in The Netherlands, cannabis plants confiscated by the police were used to determine the yield of dried female flower buds. The developmental stage of the flower buds of the seized plants was described on a scale from 1 to 10, where a value of 10 indicates a fully developed flower bud ready for harvesting. Using eight additional characteristics describing the grow room and cultivation parameters, regression analysis with subset selection was carried out to develop two models for the yield of indoor cannabis cultivation. The median Dutch illicit grow room consists of 259 cannabis plants, has a plant density of 15 plants/m², and has 510 W of growth lamps per m². For the median Dutch grow room, the predicted yield of female flower buds at the harvestable developmental stage (stage 10) was 33.7 g/plant or 505 g/m².
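The per-area figure follows from the per-plant yield and the median plant density. The short check below uses only the numbers quoted in the abstract; the variable names are illustrative and not taken from the paper.

```python
# Median grow-room figures quoted in the abstract.
yield_per_plant_g = 33.7   # dried female flower buds per plant at stage 10 (g)
plants_per_m2 = 15         # median plant density (plants per square metre)

# Yield per unit area = yield per plant x plant density.
yield_per_m2_g = yield_per_plant_g * plants_per_m2
print(round(yield_per_m2_g, 1))
```

The product is within a gram of the reported 505 g/m²; the small difference is presumably rounding in the published figures.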
Bayesian Correction for Misclassification in Multilevel Count Data Models.
Nelson, Tyler; Song, Joon Jin; Chin, Yoo-Mi; Stamey, James D
2018-01-01
Covariate misclassification is well known to yield biased estimates in single level regression models. The impact on hierarchical count models has been less studied. A fully Bayesian approach to modeling both the misclassified covariate and the hierarchical response is proposed. Models with a single diagnostic test and with multiple diagnostic tests are considered. Simulation studies show the ability of the proposed model to appropriately account for the misclassification by reducing bias and improving performance of interval estimators. A real data example further demonstrated the consequences of ignoring the misclassification. Ignoring misclassification yielded a model that indicated there was a significant, positive impact on the number of children of females who observed spousal abuse between their parents. When the misclassification was accounted for, the relationship switched to negative, but not significant. Ignoring misclassification in standard linear and generalized linear models is well known to lead to biased results. We provide an approach to extend misclassification modeling to the important area of hierarchical generalized linear models.
Annual Corn Yield Estimation through Multi-temporal MODIS Data
NASA Astrophysics Data System (ADS)
Shao, Y.; Zheng, B.; Campbell, J. B.
2013-12-01
This research employed 13 years of Moderate Resolution Imaging Spectroradiometer (MODIS) data to estimate annual corn yield for the Midwest of the United States. The overall objective of this study was to examine whether annual corn yield could be accurately predicted using MODIS time-series NDVI (Normalized Difference Vegetation Index) and ancillary data such as monthly precipitation and temperature. MODIS-NDVI 16-day composite images were acquired from the USGS EROS Data Center for calendar years 2000 to 2012. For the same time period, county-level corn yield statistics were obtained from the National Agricultural Statistics Service (NASS). The monthly precipitation and temperature measures were derived from Precipitation-Elevation Regressions on Independent Slopes Model (PRISM) climate data. A cropland mask was derived using the 2006 National Land Cover Database. For each county and within the cropland mask, the MODIS-NDVI time-series data and PRISM climate data were spatially averaged at their respective time steps. We developed a random forest predictive model with the MODIS-NDVI and climate data as predictors and corn yield as response. To assess the model accuracy, we used twelve years of data for training and the remaining year as a hold-out test set. The training and testing procedures were repeated 13 times. The R² ranged from 0.72 to 0.83 for the testing years. It was also found that the inclusion of climate data did not improve the model's predictive performance. MODIS-NDVI time-series data alone might provide sufficient information for county-level corn yield prediction.
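The hold-out scheme described above (train on twelve years, test on the remaining one, repeat 13 times) can be sketched as follows. This is a minimal illustration on synthetic data: a one-predictor least-squares line stands in for the study's random forest, and the record layout, function names, and simulated NDVI-yield relation are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(observed, predicted):
    """Coefficient of determination on a hold-out set."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def leave_one_year_out(records):
    """records: list of (year, ndvi, crop_yield). Returns {year: test R^2}."""
    scores = {}
    for held_out in sorted({year for year, _, _ in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        a, b = fit_line([r[1] for r in train], [r[2] for r in train])
        predictions = [a + b * r[1] for r in test]
        scores[held_out] = r_squared([r[2] for r in test], predictions)
    return scores

# Synthetic county-year records (hypothetical values, not the study's data).
rng = random.Random(0)
records = []
for year in range(2000, 2013):          # 13 calendar years, 2000 to 2012
    for _ in range(20):                 # 20 pretend counties per year
        ndvi = rng.uniform(0.4, 0.9)
        records.append((year, ndvi, 2.0 + 10.0 * ndvi + rng.gauss(0, 0.3)))

scores = leave_one_year_out(records)
for year, score in scores.items():
    print(year, round(score, 3))
```

Holding out whole years, rather than random rows, keeps every test year unseen during training, which is what makes the per-year R² an honest estimate for predicting a new season.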
Peak-flow characteristics of Virginia streams
Austin, Samuel H.; Krstolic, Jennifer L.; Wiegand, Ute
2011-01-01
Peak-flow annual exceedance probabilities, also called probability-percent chance flow estimates, and regional regression equations are provided describing the peak-flow characteristics of Virginia streams. Statistical methods are used to evaluate peak-flow data. Analysis of Virginia peak-flow data collected from 1895 through 2007 is summarized. Methods are provided for estimating unregulated peak flow of gaged and ungaged streams. Station peak-flow characteristics identified by fitting the logarithms of annual peak flows to a Log Pearson Type III frequency distribution yield annual exceedance probabilities of 0.5, 0.4292, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, and 0.002 for 476 streamgaging stations. Stream basin characteristics computed using spatial data and a geographic information system are used as explanatory variables in regional regression model equations for six physiographic regions to estimate regional annual exceedance probabilities at gaged and ungaged sites. Weighted peak-flow values that combine annual exceedance probabilities computed from gaging station data and from regional regression equations provide improved peak-flow estimates. Text, figures, and lists are provided summarizing selected peak-flow sites, delineated physiographic regions, peak-flow estimates, basin characteristics, regional regression model equations, error estimates, definitions, data sources, and candidate regression model equations. This study supersedes previous studies of peak flows in Virginia.
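An annual exceedance probability p corresponds to an average recurrence interval of T = 1/p years, which is a standard relation rather than something stated in the abstract. The sketch below applies it to the probabilities listed above; the function name is illustrative.

```python
# Annual exceedance probabilities listed in the abstract.
exceedance_probabilities = [0.5, 0.4292, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, 0.002]

def recurrence_interval_years(aep):
    """Average recurrence interval T = 1 / p for an annual exceedance probability p."""
    if not 0.0 < aep <= 1.0:
        raise ValueError("annual exceedance probability must be in (0, 1]")
    return 1.0 / aep

for p in exceedance_probabilities:
    print(f"p = {p}: about {recurrence_interval_years(p):.2f} years")
```

Under this relation, 0.4292 maps to roughly 2.33 years and 0.01 to the familiar 100-year flood.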
Estimates of genetic parameters and eigenvector indices for milk production of Holstein cows.
Savegnago, R P; Rosa, G J M; Valente, B D; Herrera, L G G; Carneiro, R L R; Sesana, R C; El Faro, L; Munari, D P
2013-01-01
The objectives of the present study were to estimate genetic parameters of monthly test-day milk yield (TDMY) in the first lactation of Brazilian Holstein cows using random regression (RR), and to compare the genetic gains for milk production and persistency, derived from RR models, using eigenvector indices and selection indices that did not consider eigenvectors. The data set contained monthly TDMY of 3,543 first lactations of Brazilian Holstein cows calving between 1994 and 2011. The RR model included the fixed effect of the contemporary group (herd-month-year of test days), the covariate calving age (linear and quadratic effects), and a fourth-order regression on Legendre orthogonal polynomials of days in milk (DIM) to model the population-based mean curve. Additive genetic and nongenetic animal effects were fit as RR, with 4 classes of residual variance. Eigenvector indices based on the additive genetic RR covariance matrix were used to evaluate the genetic gains of milk yield and persistency compared with the traditional selection index (a selection index based on breeding values of milk yield until 305 DIM). The heritability estimates for monthly TDMY ranged from 0.12 ± 0.04 to 0.31 ± 0.04. The estimated correlations for additive genetic and nongenetic animal effects were close to 1 between adjacent monthly TDMY, with a tendency to diminish as the time between DIM classes increased. The first eigenvector was related to an increase in the genetic response of milk yield, and the second eigenvector was related to an increase in the genetic gains for persistency, although it contributed to a decrease in the genetic gains for total milk yield. Therefore, using this eigenvector to improve persistency will not help change the shape of the genetic curve.
If the breeding goal is to improve milk production and persistency, complete sequential eigenvector indices (selection indices composite with all eigenvectors) could be used with higher economic values for persistency. However, if the breeding goal is to improve only milk yield, the traditional selection index is indicated. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
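Random regression on Legendre polynomials needs each days-in-milk (DIM) value mapped to the interval [-1, 1] and then expanded into polynomial covariates. The sketch below shows that step only. The DIM range of 5 to 305, the normalisation factor sqrt((2k+1)/2) that is common in test-day models, and the reading of "fourth-order" as polynomial degree 4 are all assumptions; the abstract does not specify them.

```python
import math

def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk onto [-1, 1]; the 5-305 range is a hypothetical choice."""
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)

def legendre_covariates(x, degree):
    """Normalised Legendre polynomials phi_0..phi_degree evaluated at x in [-1, 1]."""
    # Bonnet recurrence: (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
    values = [1.0, x]
    for k in range(1, degree):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    values = values[: degree + 1]
    return [math.sqrt((2 * k + 1) / 2.0) * values[k] for k in range(degree + 1)]

# Covariates for a test day at 150 DIM, using a degree-4 expansion.
row = legendre_covariates(standardize_dim(150), degree=4)
print([round(v, 4) for v in row])
```

Each test-day record contributes one such row to the design matrix, and the animal-specific random regression coefficients multiply these covariates.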
NASA Astrophysics Data System (ADS)
Vergara-Diaz, O.; Obata, T., Sr.; Kefauver, S. C.; Fernie, A., Sr.; Araus, J. L.
2017-12-01
Advances in metabolomics have led to a better understanding of plant-environment interactions and of how the levels of specific metabolites may be used as indicators of plant performance. In cereals, the accumulation of certain metabolites, such as proline and sugars, has been related to water stress and drought tolerance or susceptibility, even revealing significant relationships with yield. On the other hand, recent studies relating plant biochemicals to spectral reflectance open the door to a deep assessment of plant status, which would have implications for plant breeding and ecosystem studies. In this study, we investigated in durum wheat the relationship between reflectance in the visible and near-infrared regions of the spectrum (400-2500 nm wavelength) at the flag leaf, ear, and canopy levels and the respective metabolite profiles, as well as its relationship with yield. To this aim, five durum wheat genotypes grown in four field environments were examined. PLS regression models indicated that yield was strongly determined by the spectrum of leaves, ears, or canopy. Additionally, grain yield was strongly predicted by the metabolite content of leaves and ears using multivariate regression analysis. Further preliminary results showed a promising performance of hyperspectral remote-proximal sensing for the calibration of plant metabolite content.
Ngo, Long H; Inouye, Sharon K; Jones, Richard N; Travison, Thomas G; Libermann, Towia A; Dillon, Simon T; Kuchel, George A; Vasunilashorn, Sarinnapha M; Alsop, David C; Marcantonio, Edward R
2017-06-06
The nested case-control (NCC) study design within a prospective cohort study is used when outcome data are available for all subjects, but the exposure of interest has not been collected and is difficult or prohibitively expensive to obtain for all subjects. An NCC analysis with good matching procedures yields estimates that are as efficient and unbiased as estimates from the full cohort study. We present methodological considerations in a matched NCC design and analysis, which include the choice of match algorithms, analysis methods to evaluate the association of exposures of interest with outcomes, and consideration of overmatching. We used a matched NCC design within a longitudinal observational prospective cohort study set in two academic hospitals. Study participants were patients aged over 70 years who underwent scheduled major non-cardiac surgery. The primary outcome was postoperative delirium, determined from in-hospital interviews and medical record review. The main exposure was IL-6 concentration (pg/ml) from blood sampled at three time points before delirium occurred. We used the nonparametric signed-rank test to test the median of the paired differences. We used conditional logistic regression to model the association of IL-6 with delirium incidence. Simulation was used to generate a sample of cohort data on which unconditional multivariable logistic regression was used, and the results were compared to those of the conditional logistic regression. Partial R-square was used to assess the level of overmatching. We found that the optimal match algorithm yielded more matched pairs than the greedy algorithm. The choice of analytic strategy, namely whether to consider measured cytokine levels as the predictor or the outcome, yielded inferences that have different clinical interpretations but similar levels of statistical significance.
Estimation results from NCC design using conditional logistic regression, and from simulated cohort design using unconditional logistic regression, were similar. We found minimal evidence for overmatching. Using a matched NCC approach introduces methodological challenges into the study design and data analysis. Nonetheless, with careful selection of the match algorithm, match factors, and analysis methods, this design is cost effective and, for our study, yields estimates that are similar to those from a prospective cohort study design.
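Why a greedy matcher can produce fewer pairs than an optimal one is easiest to see on a toy example. The sketch below matches cases to controls on a single variable within a caliper. The "optimal" routine here only maximises the number of pairs (via augmenting paths), which is a simplification of optimal matching as usually defined (minimum total distance); the ages, caliper, and function names are hypothetical and not taken from the study.

```python
def eligible(case_value, control_value, caliper):
    """A control is eligible if it lies within the caliper of the case."""
    return abs(case_value - control_value) <= caliper

def greedy_match(cases, controls, caliper):
    """Each case, in order, takes its nearest unused eligible control."""
    used, pairs = set(), []
    for i, case_value in enumerate(cases):
        candidates = [(abs(case_value - c), j) for j, c in enumerate(controls)
                      if j not in used and eligible(case_value, c, caliper)]
        if candidates:
            _, j = min(candidates)
            used.add(j)
            pairs.append((i, j))
    return pairs

def max_cardinality_match(cases, controls, caliper):
    """Maximum number of case-control pairs, found with augmenting paths."""
    case_of_control = {}

    def try_assign(i, seen):
        for j, c in enumerate(controls):
            if eligible(cases[i], c, caliper) and j not in seen:
                seen.add(j)
                # Take a free control, or re-route the case currently holding it.
                if j not in case_of_control or try_assign(case_of_control[j], seen):
                    case_of_control[j] = i
                    return True
        return False

    for i in range(len(cases)):
        try_assign(i, set())
    return sorted((i, j) for j, i in case_of_control.items())

case_ages = [72, 74]
control_ages = [73, 70]
print(greedy_match(case_ages, control_ages, caliper=2))           # [(0, 0)]
print(max_cardinality_match(case_ages, control_ages, caliper=2))  # [(0, 1), (1, 0)]
```

In this example the greedy pass gives the 73-year-old control to the first case, leaving the second case unmatched, whereas the re-routing step pairs both cases.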
Barrett, Bruce; Brown, Roger; Mundt, Marlon
2008-02-01
Evaluative health-related quality-of-life instruments used in clinical trials should be able to detect small but important changes in health status. Several approaches to minimal important difference (MID) and responsiveness have been developed. We aimed to compare anchor-based and distributional approaches to important difference and responsiveness for the Wisconsin Upper Respiratory Symptom Survey (WURSS), an illness-specific quality-of-life outcomes instrument. Participants with community-acquired colds self-reported daily using the WURSS-44. Distribution-based methods calculated standardized effect size (ES) and standard error of measurement (SEM). Anchor-based methods compared daily interval changes to global ratings of change, using: (1) standard MID methods based on correspondence to ratings of "a little better" or "somewhat better," and (2) two-level multivariate regression models. About 150 adults were monitored throughout their colds (1,681 sick days): 88% were white, 69% were women, and 50% had completed college. The mean age was 35.5 years (SD = 14.7). WURSS scores increased 2.2 points from the first to the second day, and then dropped by an average of 8.2 points per day from days 2 to 7. The SEM averaged 9.1 during these 7 days. Standard methods yielded a between-day MID of 22 points. Regression models of MID projected 11.3-point daily changes. Dividing these estimates of small-but-important difference by pooled SDs yielded coefficients of .425 for standard MID, .218 for the regression model, .177 for SEM, and .157 for ES. These imply per-group sample sizes of 870 using ES, 616 for SEM, 302 for the regression model, and 89 for standard MID, assuming alpha = .05, beta = .20 (80% power), and two-tailed testing. Distribution-based and anchor-based approaches provide somewhat different estimates of small but important difference, which in turn can have a substantial impact on trial design.
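How a standardized coefficient translates into a per-group sample size can be sketched with the textbook normal-approximation formula for comparing two means, n = 2(z_{1-alpha/2} + z_{1-beta})² / d². This is not necessarily the formula the authors used, so the outputs should not be expected to reproduce the abstract's sample sizes exactly; the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-tailed, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Standardized coefficients quoted in the abstract.
coefficients = {"standard MID": 0.425, "regression model": 0.218, "SEM": 0.177, "ES": 0.157}
for label, d in coefficients.items():
    print(label, n_per_group(d))
```

Because the required n scales with 1/d², the smallest coefficient (ES) demands the largest trial and the largest (standard MID) the smallest, which is the same ordering the abstract reports.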
NASA Astrophysics Data System (ADS)
Fujisawa, Mariko; Kanamaru, Hideki
2016-04-01
Agriculture is vulnerable to environmental changes, and climate change has been recognized as one of the most devastating factors. In many developing countries, however, few studies have focused on nation-wide assessment of crop yield and crop suitability in the future, and hence there is a large pressure on science to provide policy makers with solid predictions for major crops in the countries in support of climate risk management policies and programmes. FAO has developed the tool MOSAICC (Modelling System for Agricultural Impacts of Climate Change) where statistical climate downscaling is combined with crop yield projections under climate change scenarios. Three steps are required to get the results: 1. The historical meteorological data such as temperature and precipitation for about 30 years were collected, and future climates were statistically downscaled to the local scale, 2. The historical crop yield data were collected and regression functions were made to estimate the yield by using observed climatic data and water balance during the growing period for each crop, and 3. The yield changes in the future were estimated by using the future climate data, produced by the first step, as an input to the yield regression functions. The yield was first simulated at sub-national scale and aggregated to national scale, which is intended to provide national policies with adaptation options. The methodology considers future changes in characteristics of extreme weather events as the climate projections are on daily scale while crop simulations are on 10-daily scale. Yields were simulated with two greenhouse gas concentration pathways (RCPs) for three GCMs per crop to account for uncertainties in projections. The crop assessment constitutes a larger multi-disciplinary assessment of climate change impacts on agriculture and vulnerability of livelihoods in terms of food security (e.g. 
water resources, agriculture market, household-level food security from socio-economic perspective). In our presentation we will show the cases of Peru and the Philippines, and discuss the implications for agriculture policies and risk management.
NASA Technical Reports Server (NTRS)
French, V.
1983-01-01
A comparison was made among the CEAS crop reporting district (CRD), agrophysical unit (APU), and state level multiple regression yield models for corn and soybeans in Iowa and barley and spring wheat in North Dakota. The best predictions were made by the state model for North Dakota spring wheat, by the APU models for barley, by the CRD models for Iowa soybeans, and by APU covariance models for Iowa corn. Because of this lack of consistency of model performance, CRD models would be recommended due to the availability of the data.
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milkings by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using single-trait random regression models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and the linear and quadratic effects of cow age at calving were included as fixed effects. A fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability of milk flow estimated by the most parsimonious model was of moderate to high magnitude.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis in the Penang, Cameron, and Selangor areas of Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments, and land cover were constructed from the spatial datasets. Ten factors that influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database, and the logistic regression coefficient of each factor was computed. The landslide hazard was then analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases in which the logistic regression coefficients were applied in the same study area, Selangor based on the Selangor coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases in which coefficients were cross-applied to the other two areas, Selangor based on the Cameron coefficients showed the highest prediction accuracy (90%), whereas Penang based on the Selangor coefficients showed the lowest accuracy (79%).
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
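The cross-application scheme (three sets of coefficients, each applied to three areas, giving nine hazard maps) can be sketched as a pair of nested loops. Everything numeric below is made up for illustration: the intercepts, the two factors retained, and the grid-cell values are hypothetical and are not the paper's coefficients or data.

```python
import math

def landslide_probability(cell, intercept, coefficients):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(coefficients[name] * value for name, value in cell.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted models, one per study area: (intercept, coefficients).
models = {
    "Penang":   (-3.0, {"slope": 0.08, "ndvi": -1.5}),
    "Cameron":  (-2.5, {"slope": 0.06, "ndvi": -1.0}),
    "Selangor": (-3.5, {"slope": 0.10, "ndvi": -2.0}),
}

# Hypothetical grid cells per area (slope in degrees, NDVI unitless).
grids = {
    "Penang":   [{"slope": 35.0, "ndvi": 0.2}, {"slope": 5.0, "ndvi": 0.7}],
    "Cameron":  [{"slope": 40.0, "ndvi": 0.3}, {"slope": 10.0, "ndvi": 0.6}],
    "Selangor": [{"slope": 25.0, "ndvi": 0.1}, {"slope": 3.0, "ndvi": 0.8}],
}

# Apply every area's coefficients to every area's grid: 3 x 3 = 9 hazard maps.
hazard_maps = {}
for model_area, (intercept, coefficients) in models.items():
    for target_area, cells in grids.items():
        hazard_maps[(model_area, target_area)] = [
            landslide_probability(cell, intercept, coefficients) for cell in cells
        ]

print(len(hazard_maps))
```

The three maps where model and target area coincide correspond to the same-area cases in the abstract, and the remaining six are the cross-applied cases.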
NASA Astrophysics Data System (ADS)
Yan, Maoling; Liu, Pingzeng; Zhang, Chao; Zheng, Yong; Wang, Xizhi; Zhang, Yan; Chen, Weijie; Zhao, Rui
2018-01-01
Agroclimatological resources provide material and energy for agricultural production. This study quantitatively analyzes the impact of changes in selected climate factors on wheat yield over different growth periods by comparing two ways of dividing the wheat growth cycle: monthly empirical-statistical multiple regression models (from October to June of the following year) and growth-stage empirical-statistical multiple regression models (covering the sowing, seedling, tillering, overwintering, regreening, jointing, heading, and maturity stages). Both relate agrometeorological data and growth-stage records to winter wheat production in Yanzhou, Shandong Province, China. Correlation analysis (CA) was carried out for 35 years (1981 to 2015) between crop yield and the corresponding weather parameters, namely daily mean temperature, sunshine duration, and average daily precipitation, selected from 18 meteorological factors. The results show that precipitation during the overwintering period has the greatest impact on winter wheat yield in this area: each 1 mm increase in daily mean rainfall was associated with a 201.64 kg/hm² decrease in output. Moreover, temperature and sunshine duration during the heading and maturity stages also exert a significant influence on output: every 1 °C increase in daily mean temperature was associated with a 199.85 kg/hm² increase in output, and every 1 h increase in mean sunshine duration was associated with a 130.68 kg/hm² decrease in output. Compared with the models that use months as the step size, the models that use growth stages (farming periods) as the step size agreed better with the fluctuation in meteorological yield and offered a better explanation of the growth mechanism of wheat.
Overall, the results indicate that the three factors affect yield to different extents during different growth periods of wheat, and they provide a more specific reference for guiding agricultural production management in this area.
Multivariate Statistical Models for Predicting Sediment Yields from Southern California Watersheds
Gartner, Joseph E.; Cannon, Susan H.; Helsel, Dennis R.; Bandurraga, Mark
2009-01-01
Debris-retention basins in Southern California are frequently used to protect communities and infrastructure from the hazards of flooding and debris flow. Empirical models that predict sediment yields are used to determine the size of the basins. Such models have been developed using analyses of records of the amount of material removed from debris retention basins, associated rainfall amounts, measures of watershed characteristics, and wildfire extent and history. In this study we used multiple linear regression methods to develop two updated empirical models to predict sediment yields for watersheds located in Southern California. The models are based on both new and existing measures of volume of sediment removed from debris retention basins, measures of watershed morphology, and characterization of burn severity distributions for watersheds located in Ventura, Los Angeles, and San Bernardino Counties. The first model presented reflects conditions in watersheds located throughout the Transverse Ranges of Southern California and is based on volumes of sediment measured following single storm events with known rainfall conditions. The second model presented is specific to conditions in Ventura County watersheds and was developed using volumes of sediment measured following multiple storm events. To relate sediment volumes to triggering storm rainfall, a rainfall threshold was developed to identify storms likely to have caused sediment deposition. A measured volume of sediment deposited by numerous storms was parsed among the threshold-exceeding storms based on relative storm rainfall totals. The predictive strength of the two models developed here, and of previously-published models, was evaluated using a test dataset consisting of 65 volumes of sediment yields measured in Southern California. 
The evaluation indicated that the model developed using information from single storm events in the Transverse Ranges best predicted sediment yields for watersheds in San Bernardino, Los Angeles, and Ventura Counties. This model predicts sediment yield as a function of the peak 1-hour rainfall, the watershed area burned by the most recent fire (at all severities), the time since the most recent fire, watershed area, average gradient, and relief ratio. The model that reflects conditions specific to Ventura County watersheds consistently under-predicted sediment yields and is not recommended for application. Some previously-published models performed reasonably well, while others either under-predicted sediment yields or had a larger range of errors in the predicted sediment yields.
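The step in which a sediment volume deposited by several storms is parsed among the threshold-exceeding storms in proportion to their rainfall totals can be sketched as below. The threshold value, storm labels, and rainfall figures are hypothetical; the abstract gives the principle but not the numbers.

```python
def apportion_volume(total_volume, storm_rainfall, rainfall_threshold):
    """Split a measured sediment volume among storms whose rainfall exceeds a threshold,
    in proportion to each storm's share of the exceeding-storm rainfall total."""
    exceeding = {storm: rain for storm, rain in storm_rainfall.items()
                 if rain > rainfall_threshold}
    total_rain = sum(exceeding.values())
    if total_rain == 0:
        return {}
    return {storm: total_volume * rain / total_rain for storm, rain in exceeding.items()}

# Hypothetical example: 1000 cubic metres removed after three storms.
storms_mm = {"storm_A": 10.0, "storm_B": 30.0, "storm_C": 50.0}
shares = apportion_volume(1000.0, storms_mm, rainfall_threshold=20.0)
print(shares)  # {'storm_B': 375.0, 'storm_C': 625.0}
```

Storm A falls below the threshold and receives nothing, while the two remaining storms share the volume 30:50.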
Yield and yield gaps in central U.S. corn production systems
USDA-ARS's Scientific Manuscript database
The magnitude of yield gaps (YG) (potential yield – farmer yield) provides some indication of the prospects for increasing crop yield. Quantile regression analysis was applied to county maize (Zea mays L.) yields (1972 – 2011) from Kentucky, Iowa and Nebraska (irrigated) (total of 115 counties) to e...
Yield gaps and yield relationships in US soybean production systems
USDA-ARS's Scientific Manuscript database
The magnitude of yield gaps (YG) (potential yield – farmer yield) provides some indication of the prospects for increasing crop yield to meet the food demands of future populations. Quantile regression analysis was applied to county soybean [Glycine max (L.) Merrill] yields (1971 – 2011) from Kentuc...
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan; Clegg, Samuel M.; Frydenvang, Jens; Wiens, Roger C.; McLennan, Scott M.; Morris, Richard V.; Ehlmann, Bethany L.; Dyar, M. Darby
2017-01-01
Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. The sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
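One way to picture the "blending" of sub-models is sketched below. The abstract does not say how the blending is done, so the scheme here is an assumption: a full-range model gives a first guess, that guess selects the low-range or high-range sub-model, and inside an overlap window the two sub-model predictions are mixed linearly. The toy models are plain functions of a single number rather than PLS models on spectra.

```python
def blend_submodels(x, full_model, low_model, high_model, blend_start, blend_end):
    """Pick or linearly mix two range-limited sub-models using a full-range first guess."""
    first_guess = full_model(x)
    if first_guess <= blend_start:
        return low_model(x)
    if first_guess >= blend_end:
        return high_model(x)
    # Weight moves from 0 (all low-range model) to 1 (all high-range model).
    weight = (first_guess - blend_start) / (blend_end - blend_start)
    return (1.0 - weight) * low_model(x) + weight * high_model(x)

# Hypothetical stand-ins for trained regression models (composition in wt.%).
full_model = lambda x: 10 * x        # trained on the whole composition range
low_model = lambda x: 9 * x + 1      # trained on low-concentration targets
high_model = lambda x: 11 * x - 5    # trained on high-concentration targets

for signal in (2, 5, 8):
    print(signal, blend_submodels(signal, full_model, low_model, high_model, 40, 60))
```

With these toy models, a first guess of 20 uses only the low-range model, 80 uses only the high-range model, and 50 sits mid-window and averages the two.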
Quantifying the impacts of climatic trend and fluctuation on crop yields in northern China.
Qiao, Jianmin; Yu, Deyong; Liu, Yupeng
2017-10-01
Climate change plays a critical role in crop yield variations, which has attracted a great deal of concern worldwide. However, the mechanisms of how climatic trend and fluctuations affect crop yields are not well understood and need to be further investigated. Thus, using the GIS-based Environmental Policy Integrated Climate (EPIC) model, we simulated the yields of major crops (i.e., wheat, maize, and rice) and evaluated the impacts of climatic factors on crop yields in the Agro-Pastoral Transitional Zone (APTZ) of northern China between 1980 and 2010. The partial least squares regression model was used to assess the contribution rates of climatic factors (i.e., precipitation, photosynthetically active radiation (PAR), minimum temperature (Tmin), maximum temperature (Tmax)) to the variation of crop yields. The Breaks for Additive Season and Trend (BFAST) model was adopted to decompose the climate factors into trend and fluctuation components, and the relative contributions of climate trend and fluctuation were then evaluated. The results indicated that the contributions of climatic factors to yield variations of wheat, maize, and rice were 31.7, 37.7, and 23.1%, respectively. That is, climate change had larger impacts on maize than wheat and rice. More cultivated areas were significantly and positively correlated with precipitation than with other climatic factors due to the limited precipitation in the APTZ. Also, the climatic trend component had positive impacts on crop yields in the whole region, whereas the climate fluctuation was associated mainly with the areas where the crop yields decreased. This study helps improve our understanding of the mechanisms of climate change impacts on crop yields, and provides useful scientific information for designing regional-scale strategies of adaptation to climate change.
NASA Technical Reports Server (NTRS)
Maughan, P. M. (Principal Investigator)
1973-01-01
The author has identified the following significant results. Linear regression of secchi disc visibility against number of sets yielded significant results in a number of instances. The variability seen in the slope of the regression lines is due to the nonuniformity of sample size. The longer the period sampled, the larger the total number of attempts. Further, there is no reason to expect either the influence of transparency or of other variables to remain constant throughout the season. However, the fact that the data for the entire season, variable as it is, was significant at the 5% level, suggests its potential utility for predictive modeling. Thus, this regression equation will be considered representative and will be utilized for the first numerical model. Secchi disc visibility was also regressed against number of sets for the three-day period September 27-September 29, 1972 to determine if surface truth data supported the intense relationship between ERTS-1 identified turbidity and fishing effort previously discussed. A strongly negative correlation was found. These relationships lend additional credence to the hypothesis that ERTS imagery, when utilized as a source of visibility (turbidity) data, may be useful as a predictive tool.
USDA-ARS?s Scientific Manuscript database
Chamomile (Matricaria chamomilla L.) is one of the most widely spread and used medicinal and essential oil crop in the world. Chamomile essential oil is extracted via steam distillation of the inflorescences (flowers). In this study, distillation time (DT) was found to be a crucial determinant of yi...
NASA Astrophysics Data System (ADS)
Caimmi, R.
2011-08-01
Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. 
For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both heteroscedastic and homoscedastic data. Conversely, samples related to different methods produce discrepant results, due to the presence of (still undetected) systematic errors, which implies no definitive statement can be made at present. A comparison is also made between different expressions of regression line slope and intercept variance estimators, where fractional discrepancies are found to be not exceeding a few percent, which grows up to about 20% in the presence of large dispersion data. An extension of the formalism to structural models is left to a forthcoming paper.
Gunaseelan, Victor Nallathambi
2014-02-01
In this study, I investigated the chemical characteristics, biochemical methane potential, conversion kinetics and biodegradability of untreated and NaOH-treated Pongamia plant parts, and pod husk and press cake from the biodiesel industry to evaluate their suitability as an alternative feedstock for biogas production. The untreated Pongamia seeds exhibited the maximum CH4 yield of 473 ml g(-1) volatile solid (VS) added. Yellow, withered leaves gave a yield as low as 122 ml CH4 g(-1) VS added. There were significant variations in the CH4 production rate constants, which ranged from 0.02 to 0.15 d(-1), and biodegradability, which ranged from 0.25 to 0.98. NaOH treatment of leaf and pod husk, which were highly rich in fibers, increased the yields by 15-22% and CH4 production rate constants by 20-75%. Utilization of Pongamia wastes in biogas digesters not only influences the economics of biodiesel production but also yields CH4 fuel and protects the environment. The experimental data from this study were used to develop a multiple regression model, which could estimate biodegradability based on biochemical characteristics. The model predicted the biodegradability of previously published biomass wastes (r(2) = 0.88) from their biochemical composition. The theoretical CH4 yields estimated as 350 ml g(-1) chemical oxygen demand destroyed are much higher than the experimental yields as 100% biodegradability is assumed for each substrate. Upon correcting the theoretical CH4 yields with biodegradability data obtained from chemical analyses of substrates, their ultimate CH4 yields could be predicted rapidly.
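The correction described at the end of this abstract amounts to scaling a theoretical yield by a biodegradability fraction. A minimal sketch follows, assuming the correction is a simple product; the abstract does not give the exact formula. The function name and the COD content argument (g COD per g VS) are hypothetical inputs introduced for illustration.

```python
THEORETICAL_CH4_ML_PER_G_COD = 350.0  # value quoted in the abstract


def predicted_ch4_yield(cod_g_per_g_vs, biodegradability):
    """Estimate an ultimate CH4 yield in ml per g VS added.

    Assumption: corrected yield = theoretical yield x biodegradability,
    where the theoretical yield treats all COD as destroyed.
    """
    if not 0.0 <= biodegradability <= 1.0:
        raise ValueError("biodegradability must be a fraction between 0 and 1")
    theoretical = THEORETICAL_CH4_ML_PER_G_COD * cod_g_per_g_vs
    return theoretical * biodegradability
```

With a biodegradability of 1.0 the function returns the uncorrected theoretical yield, which is the case the abstract says overestimates the experimental values.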
Padilha, Alessandro Haiduck; Cobuci, Jaime Araujo; Costa, Cláudio Napolis; Neto, José Braccini
2016-01-01
The aim of this study was to compare two random regression models (RRM) fitted by fourth (RRM4) and fifth-order Legendre polynomials (RRM5) with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike’s information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (−2LogL) were for RRM4. Heritability for 305-day milk yield (305MY) was 0.23 (RRM4), 0.24 (RRM5), and 0.21 (LM). Heritability, additive genetic and permanent environmental variances of test days on days in milk was from 0.16 to 0.27, from 3.76 to 6.88 and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from RRM4 and RRM5 were from 11% to 30% higher for bulls and around 28% higher for cows than that in LM. Rank correlations between RRM EBVs and LM EBVs were between 0.86 to 0.96 for bulls and 0.80 to 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM models was compared to those from LM model. Random regression model fitted by fourth order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values. PMID:26954176
Padilha, Alessandro Haiduck; Cobuci, Jaime Araujo; Costa, Cláudio Napolis; Neto, José Braccini
2016-06-01
The aim of this study was to compare two random regression models (RRM) fitted by fourth (RRM4) and fifth-order Legendre polynomials (RRM5) with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike's information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (-2LogL) were for RRM4. Heritability for 305-day milk yield (305MY) was 0.23 (RRM4), 0.24 (RRM5), and 0.21 (LM). Heritability, additive genetic and permanent environmental variances of test days on days in milk was from 0.16 to 0.27, from 3.76 to 6.88 and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from RRM4 and RRM5 were from 11% to 30% higher for bulls and around 28% higher for cows than that in LM. Rank correlations between RRM EBVs and LM EBVs were between 0.86 to 0.96 for bulls and 0.80 to 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM models was compared to those from LM model. Random regression model fitted by fourth order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values.
Optimization of fermentation conditions for alcohol production.
Bowman, L; Geiger, E
1984-12-01
The quantitative effects of carbohydrate levels, degree of initial saccharification, glucoamylase dosage, temperature, and fermentation time were investigated using a Box-Wilson central composite design protocol. With Saccharomyces cerevisiae ATCC 4126, it was found that the use of a partially saccharified starch substrate markedly increased yields and attainable alcohol levels. Balancing the degree of initial saccharification with the level of glucoamylase used to complete hydrolysis was found necessary to obtain optimum yields. The temperature optimum was found to be 36 degrees C. The regression equations obtained were used to model the fermentation in order to determine optimum fermentation conditions.
Hansson, Lisbeth; Khamis, Harry J
2008-12-01
Simulated data sets are used to evaluate conditional and unconditional maximum likelihood estimation in an individual case-control design with continuous covariates when there are different rates of excluded cases and different levels of other design parameters. The effectiveness of the estimation procedures is measured by method bias, variance of the estimators, root mean square error (RMSE) for logistic regression and the percentage of explained variation. Conditional estimation leads to higher RMSE than unconditional estimation in the presence of missing observations, especially for 1:1 matching. The RMSE is higher for the smaller stratum size, especially for the 1:1 matching. The percentage of explained variation appears to be insensitive to missing data, but is generally higher for the conditional estimation than for the unconditional estimation. It is particularly good for the 1:2 matching design. For minimizing RMSE, a high matching ratio is recommended; in this case, conditional and unconditional logistic regression models yield comparable levels of effectiveness. For maximizing the percentage of explained variation, the 1:2 matching design with the conditional logistic regression model is recommended.
Geospatial modeling of plant stable isotope ratios - the development of isoscapes
NASA Astrophysics Data System (ADS)
West, J. B.; Ehleringer, J. R.; Hurley, J. M.; Cerling, T. E.
2007-12-01
Large-scale spatial variation in stable isotope ratios can yield critical insights into the spatio-temporal dynamics of biogeochemical cycles, animal movements, and shifts in climate, as well as anthropogenic activities such as commerce, resource utilization, and forensic investigation. Interpreting these signals requires that we understand and model the variation. We report progress in our development of plant stable isotope ratio landscapes (isoscapes). Our approach utilizes a GIS, gridded datasets, a range of modeling approaches, and spatially distributed observations. We synthesize findings from four studies to illustrate the general utility of the approach, its ability to represent observed spatio-temporal variability in plant stable isotope ratios, and also outline some specific areas of uncertainty. We also address two basic, but critical questions central to our ability to model plant stable isotope ratios using this approach: 1. Do the continuous precipitation isotope ratio grids represent reasonable proxies for plant source water?, and 2. Do continuous climate grids (as is or modified) represent a reasonable proxy for the climate experienced by plants? Plant components modeled include leaf water, grape water (extracted from wine), bulk leaf material (Cannabis sativa; marijuana), and seed oil (Ricinus communis; castor bean). Our approaches to modeling the isotope ratios of these components varied from highly sophisticated process models to simple one-step fractionation models to regression approaches. The leaf water isoscapes were produced using steady-state models of enrichment and continuous grids of annual average precipitation isotope ratios and climate. These were compared to other modeling efforts, as well as a relatively sparse, but geographically distributed dataset from the literature. The latitudinal distributions and global averages compared favorably to other modeling efforts and the observational data compared well to model predictions. 
These results yield confidence in the precipitation isoscapes used to represent plant source water, the modified climate grids used to represent leaf climate, and the efficacy of this approach to modeling. Further work confirmed these observations. The seed oil isoscape was produced using a simple model of lipid fractionation driven with the precipitation grid, and compared well to widely distributed observations of castor bean oil, again suggesting that the precipitation grids were reasonable proxies for plant source water. The marijuana leaf δ2H observations distributed across the continental United States were regressed against the precipitation δ2H grids and yielded a strong relationship between them, again suggesting that plant source water was reasonably well represented by the precipitation grid. Finally, the wine water δ18O isoscape was developed from regressions that related precipitation isotope ratios and climate to observations from a single vintage. Favorable comparisons between year-specific wine water isoscapes and inter-annual variations in previous vintages yielded confidence in the climate grids. Clearly significant residual variability remains to be explained in all of these cases and uncertainties vary depending on the component modeled, but we conclude from this synthesis that isoscapes are capable of representing real spatial and temporal variability in plant stable isotope ratios.
Farrelly, Matthew C; Loomis, Brett R; Mann, Nathan H
2007-10-01
We used scanner data on cigarette prices and sales collected from supermarkets across the United States from 1994 to 2004 to test the hypothesis that cigarette prices are positively correlated with sales of cigarettes with higher tar and nicotine content. During this period the average inflation-adjusted price for menthol cigarettes increased 55.8%. Price elasticities from multivariate regression models suggest that this price increase led to an increase of 1.73% in sales-weighted average tar yields and a 1.28% increase in sales-weighted average nicotine yields for menthol cigarettes. The 50.5% price increase of nonmenthol varieties over the same period yielded an estimated increase of 1% in tar per cigarette but no statistically significant increase in nicotine yields. An ordered probit model of the impact of cigarette prices on cigarette strength (ultra-light, light, full flavor, unfiltered) offers an explanation: As cigarette prices increase, the probability that stronger cigarette types will be sold increases. This effect is larger for menthol than for nonmenthol cigarettes. Our results are consistent with earlier population-based cross-sectional and longitudinal studies showing that higher cigarette prices and taxes are associated with increasing consumption of higher-yield cigarettes by smokers.
Ameye, Lieveke; Fischerova, Daniela; Epstein, Elisabeth; Melis, Gian Benedetto; Guerriero, Stefano; Van Holsbeke, Caroline; Savelli, Luca; Fruscio, Robert; Lissoni, Andrea Alberto; Testa, Antonia Carla; Veldman, Joan; Vergote, Ignace; Van Huffel, Sabine; Bourne, Tom; Valentin, Lil
2010-01-01
Objectives To prospectively assess the diagnostic performance of simple ultrasound rules to predict benignity/malignancy in an adnexal mass and to test the performance of the risk of malignancy index, two logistic regression models, and subjective assessment of ultrasonic findings by an experienced ultrasound examiner in adnexal masses for which the simple rules yield an inconclusive result. Design Prospective temporal and external validation of simple ultrasound rules to distinguish benign from malignant adnexal masses. The rules comprised five ultrasonic features (including shape, size, solidity, and results of colour Doppler examination) to predict a malignant tumour (M features) and five to predict a benign tumour (B features). If one or more M features were present in the absence of a B feature, the mass was classified as malignant. If one or more B features were present in the absence of an M feature, it was classified as benign. If both M features and B features were present, or if none of the features was present, the simple rules were inconclusive. Setting 19 ultrasound centres in eight countries. Participants 1938 women with an adnexal mass examined with ultrasound by the principal investigator at each centre with a standardised research protocol. Reference standard Histological classification of the excised adnexal mass as benign or malignant. Main outcome measures Diagnostic sensitivity and specificity. Results Of the 1938 patients with an adnexal mass, 1396 (72%) had benign tumours, 373 (19.2%) had primary invasive tumours, 111 (5.7%) had borderline malignant tumours, and 58 (3%) had metastatic tumours in the ovary. The simple rules yielded a conclusive result in 1501 (77%) masses, for which they resulted in a sensitivity of 92% (95% confidence interval 89% to 94%) and a specificity of 96% (94% to 97%). The corresponding sensitivity and specificity of subjective assessment were 91% (88% to 94%) and 96% (94% to 97%). 
In the 357 masses for which the simple rules yielded an inconclusive result and with available results of CA-125 measurements, the sensitivities were 89% (83% to 93%) for subjective assessment, 50% (42% to 58%) for the risk of malignancy index, 89% (83% to 93%) for logistic regression model 1, and 82% (75% to 87%) for logistic regression model 2; the corresponding specificities were 78% (72% to 83%), 84% (78% to 88%), 44% (38% to 51%), and 48% (42% to 55%). Use of the simple rules as a triage test and subjective assessment for those masses for which the simple rules yielded an inconclusive result gave a sensitivity of 91% (88% to 93%) and a specificity of 93% (91% to 94%), compared with a sensitivity of 90% (88% to 93%) and a specificity of 93% (91% to 94%) when subjective assessment was used in all masses. Conclusions The use of the simple rules has the potential to improve the management of women with adnexal masses. In adnexal masses for which the rules yielded an inconclusive result, subjective assessment of ultrasonic findings by an experienced ultrasound examiner was the most accurate diagnostic test; the risk of malignancy index and the two regression models were not useful. PMID:21156740
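The decision logic of the simple rules, and their use as a triage test, can be written out directly from the description in this abstract. The sketch below assumes the caller has already counted how many M features and B features are present; the function names and the string labels are illustrative, not taken from the study.

```python
def simple_rules(m_features, b_features):
    """Classify an adnexal mass from counts of M and B ultrasound features."""
    has_m = m_features > 0
    has_b = b_features > 0
    if has_m and not has_b:
        return "malignant"
    if has_b and not has_m:
        return "benign"
    # Both kinds of feature present, or none present.
    return "inconclusive"


def triage(m_features, b_features, subjective_assessment):
    """Use the simple rules first; fall back to the examiner's assessment
    only when the rules are inconclusive."""
    result = simple_rules(m_features, b_features)
    if result == "inconclusive":
        return subjective_assessment
    return result
```

For example, `triage(1, 1, "benign")` returns the examiner's call because M and B features are both present, while `triage(2, 0, "benign")` returns "malignant" from the rules alone.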
2016-03-01
regression models that yield hedonic price indexes is closely related to standard techniques for developing cost estimating relationships ( CERs ...October 2014). iii analysis) and derives a price index from the coefficients on variables reflecting the year of purchase. In CER development, the...index. The relevant cost metric in both cases is unit recurring flyaway (URF) costs. For the current project, we develop a “Baseline” CER model, taking
Lamm, Steven H; Ferdosi, Hamid; Dissen, Elisabeth K; Li, Ji; Ahn, Jaeil
2015-12-07
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1-1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100-150 µg/L arsenic.
Lamm, Steven H.; Ferdosi, Hamid; Dissen, Elisabeth K.; Li, Ji; Ahn, Jaeil
2015-01-01
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1–1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100–150 µg/L arsenic. PMID:26690190
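The no-constant linear-quadratic form on log-transformed risk and exposure implies a closed-form exposure at which the fitted relative risk returns to 1. The sketch below assumes natural logarithms (the abstract only says "log") and the form ln(RR) = b1*ln(x) + b2*ln(x)^2. The coefficients in the usage note are illustrative values, not the study's estimates.

```python
import math


def log_relative_risk(x_ug_per_l, b1, b2):
    """ln(RR) under a no-constant linear-quadratic model in ln(exposure)."""
    lx = math.log(x_ug_per_l)
    return b1 * lx + b2 * lx * lx


def x_intercept(b1, b2):
    """Exposure above 1 ug/L at which ln(RR) = 0.

    Setting b1*L + b2*L**2 = 0 with L = ln(x) gives L = 0 or L = -b1/b2;
    the second root is the crossing of interest when b1 < 0 < b2.
    """
    if not (b1 < 0.0 < b2):
        raise ValueError("expects a negative linear and a positive quadratic term")
    return math.exp(-b1 / b2)
```

With the hypothetical values b1 = -0.5 and b2 = 0.1, `x_intercept` returns exp(5), about 148 µg/L; below that exposure the modelled ln(RR) is negative and above it positive, which is the shape the abstract's sign pattern implies.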
Evaluation of weather-based rice yield models in India.
Sudharsan, D; Adinarayana, J; Reddy, D Raji; Sreenivas, G; Ninomiya, S; Hirafuji, M; Kiura, T; Tanaka, K; Desai, U B; Merchant, S N
2013-01-01
The objective of this study was to compare two different rice simulation models--standalone (Decision Support System for Agrotechnology Transfer [DSSAT]) and web based (SImulation Model for RIce-Weather relations [SIMRIW])--with agrometeorological data and agronomic parameters for estimation of rice crop production in southern semi-arid tropics of India. Studies were carried out on the BPT5204 rice variety to evaluate two crop simulation models. Long-term experiments were conducted in a research farm of Acharya N G Ranga Agricultural University (ANGRAU), Hyderabad, India. Initially, the results were obtained using 4 years (1994-1997) of data with weather parameters from a local weather station to evaluate DSSAT simulated results with observed values. Linear regression models used for the purpose showed a close relationship between DSSAT and observed yield. Subsequently, yield comparisons were also carried out with SIMRIW and DSSAT, and validated with actual observed values. Realizing the correlation coefficient values of SIMRIW simulation values in acceptable limits, further rice experiments in monsoon (Kharif) and post-monsoon (Rabi) agricultural seasons (2009, 2010 and 2011) were carried out with a location-specific distributed sensor network system. These proximal systems help to simulate dry weight, leaf area index and potential yield by the Java based SIMRIW on a daily/weekly/monthly/seasonal basis. These dynamic parameters are useful to the farming community for necessary decision making in a ubiquitous manner. However, SIMRIW requires fine tuning for better results/decision making.
Hyperspectral sensing to detect the impact of herbicide drift on cotton growth and yield
NASA Astrophysics Data System (ADS)
Suarez, L. A.; Apan, A.; Werth, J.
2016-10-01
Yield loss in crops is often associated with plant disease or external factors such as environment, water supply and nutrient availability. Improper agricultural practices can also introduce risks into the equation. Herbicide drift can be a combination of improper practices and environmental conditions which can create a potential yield loss. As traditional assessment of plant damage is often imprecise and time consuming, the ability of remote and proximal sensing techniques to monitor various bio-chemical alterations in the plant may offer a faster, non-destructive and reliable approach to predict yield loss caused by herbicide drift. This paper examines the prediction capabilities of partial least squares regression (PLS-R) models for estimating yield. Models were constructed with hyperspectral data of a cotton crop sprayed with three simulated doses of the phenoxy herbicide 2,4-D at three different growth stages. Fibre quality, photosynthesis, conductance, and two main hormones, indole acetic acid (IAA) and abscisic acid (ABA) were also analysed. Except for fibre quality and ABA, Spearman correlations have shown that these variables were highly affected by the chemical. Four PLS-R models for predicting yield were developed according to four timings of data collection: 2, 7, 14 and 28 days after the exposure (DAE). As indicated by the model performance, the analysis revealed that 7 DAE was the best time for data collection purposes (RMSEP = 2.6 and R2 = 0.88), followed by 28 DAE (RMSEP = 3.2 and R2 = 0.84). In summary, the results of this study show that it is possible to accurately predict yield after a simulated herbicide drift of 2,4-D on a cotton crop, through the analysis of hyperspectral data, thereby providing a reliable, effective and non-destructive alternative based on the internal response of the cotton leaves.
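The two figures of merit quoted for the PLS-R models can be computed from paired observed and predicted yields. This is a minimal sketch, assuming R2 means the coefficient of determination (1 - SS_res/SS_tot); the abstract does not say whether it instead reports a squared correlation. The function names are illustrative.

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction on a held-out set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Comparing these two numbers across collection dates, as the study does for 2, 7, 14 and 28 DAE, requires that each model be scored on data it was not trained on.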
Field Scale Spatial Modelling of Surface Soil Quality Attributes in Controlled Traffic Farming
NASA Astrophysics Data System (ADS)
Guenette, Kris; Hernandez-Ramirez, Guillermo
2017-04-01
The employment of controlled traffic farming (CTF) can yield improvements to soil quality attributes through the confinement of equipment traffic to tramlines within the field. There is a need to quantify and explain the spatial heterogeneity of soil quality attributes affected by CTF to further improve our understanding and modelling ability of field scale soil dynamics. Soil properties such as available nitrogen (AN), pH, soil total nitrogen (STN), soil organic carbon (SOC), bulk density, macroporosity, soil quality S-Index, plant available water capacity (PAWC) and unsaturated hydraulic conductivity (Km) were analysed and compared among trafficked and un-trafficked areas. We contrasted standard geostatistical methods such as ordinary kriging (OK) and covariate kriging (COK) as well as the hybrid method of regression kriging (ROK) to predict the spatial distribution of soil properties across two annual cropland sites actively employing CTF in Alberta, Canada. Field scale variability was quantified more accurately through the inclusion of covariates; however, the use of ROK was shown to improve model accuracy despite the regression model composition limiting the robustness of the ROK method. The exclusion of traffic from the un-trafficked areas displayed significant improvements to bulk density, macroporosity and Km while subsequently enhancing AN, STN and SOC. The ability of the regression models and the ROK method to account for spatial trends led to the highest goodness-of-fit and lowest error achieved for the soil physical properties, as the rigid traffic regime of CTF altered their spatial distribution at the field scale. Conversely, the COK method produced the best predictions for the soil nutrient properties and Km. The use of terrain covariates derived from light detection and ranging (LiDAR), such as elevation and topographic position index (TPI), yielded the best models in the COK method at the field scale.
Modelling drought-related yield losses in Iberia using remote sensing and multiscalar indices
NASA Astrophysics Data System (ADS)
Ribeiro, Andreia F. S.; Russo, Ana; Gouveia, Célia M.; Páscoa, Patrícia
2018-04-01
The response of two rainfed winter cereal yields (wheat and barley) to drought conditions in the Iberian Peninsula (IP) was investigated for a long period (1986-2012). Drought hazard was evaluated based on the multiscalar Standardized Precipitation Evapotranspiration Index (SPEI) and three remote sensing indices, namely the Vegetation Condition (VCI), the Temperature Condition (TCI), and the Vegetation Health (VHI) Indices. A correlation analysis between the yield and the drought indicators was conducted, and multiple linear regression (MLR) and artificial neural network (ANN) models were established to estimate yield at the regional level. The correlation values suggested that yield decreases with moisture depletion (low values of VCI) during early spring and with excessively high temperatures (low values of TCI) close to harvest time. Generally, all drought indicators displayed the greatest influence during the plant stages in which the crop is photosynthetically more active (spring and summer), rather than the earlier moments of the plant's life cycle (autumn/winter). Our results suggested that SPEI is more relevant in the southern sector of the IP, while remote sensing indices are rather good in estimating cereal yield in the northern sector of the IP. The strength of the statistical relationships found by MLR and ANN methods is quite similar, with some improvements found by the ANN. A great number of true positives (hits) of occurrence of yield-losses exhibiting hit rate (HR) values higher than 69% was obtained.
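The hit rate mentioned at the end of this abstract can be illustrated with the usual contingency-table definition, hits divided by hits plus misses. The abstract does not state its exact formula, so this definition is an assumption, and the function name and boolean inputs are hypothetical.

```python
def hit_rate(observed_loss, predicted_loss):
    """Fraction of observed yield-loss events that the model also flagged.

    Both arguments are equal-length sequences of booleans, one entry per
    region-year (True = yield loss).
    """
    pairs = list(zip(observed_loss, predicted_loss))
    hits = sum(1 for o, p in pairs if o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    if hits + misses == 0:
        raise ValueError("no observed yield-loss events to score")
    return hits / (hits + misses)
```

Note that false alarms (loss predicted but not observed) do not enter this measure, so a high hit rate alone does not show that the model avoids over-predicting losses.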
Model for the separate collection of packaging waste in Portuguese low-performing recycling regions.
Oliveira, V; Sousa, V; Vaz, J M; Dias-Ferreira, C
2018-06-15
Separate collection of packaging waste (glass; plastic/metals; paper/cardboard) is currently a widespread practice throughout Europe. It enables the recovery of good quality recyclable materials. However, separate collection performance is quite heterogeneous, with some countries reaching higher levels than others. In the present work, separate collection of packaging waste has been evaluated in a low-performance recycling region in Portugal in order to investigate which factors most affect the performance of the bring-bank collection system. The variability of separate collection yields (kg per inhabitant per year) among 42 municipalities was scrutinized for the year 2015 against possible explanatory factors. A total of 14 possible explanatory factors were analysed, falling into two groups: socio-economic/demographic and waste collection service related. Regression models were built in an attempt to evaluate the individual effect of each factor on separate collection yields and predict changes in the collection yields by acting on those factors. The best model obtained is able to explain 73% of the variation found in the separate collection yields. The model includes the following statistically significant indicators affecting the success of separate collection yields: i) inhabitants per bring-bank; ii) relative accessibility to bring-banks; iii) degree of urbanization; iv) number of school years attended; and v) area. The model presented in this work was developed specifically for the bring-bank system, has explanatory power and quantifies the impact of each factor on separate collection yields. It can therefore be used as a support tool by local and regional waste management authorities in the definition of future strategies to increase collection of recyclables of good quality and to achieve national and regional targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
Speiser, Jaime Lynn; Lee, William M; Karvellas, Constantine J
2015-01-01
Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. King's College Criteria (KCC) have been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models. CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission) and post-admission (days 3-7) for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC), sensitivity (SN), specificity (SP), and area under receiver-operating curve (AUROC) were compared between 3 models: KCC (INR, creatinine, coma grade, pH), CART analysis using only KCC variables (KCC-CART) and a CART model using new variables (NEW-CART). Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission offered predictive 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had predictive 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for end stage liver disease), lactate and mechanical ventilation on admission yielded predictive 72% AC, 71% SP, 77% SN and AUROC 0.79. For later stages, NEW-CART (MELD, lactate, coma grade) offered predictive AC 86%, SP 91%, SN 46%, AUROC 0.73. CARTs offer simple prognostic models for APAP-ALF patients, which have higher AUROC and SN than KCC, with similar AC and negligibly worse SP. Admission and post-admission predictions were developed.
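The accuracy (AC), sensitivity (SN) and specificity (SP) figures quoted above all derive from a 2x2 confusion table. The sketch below shows the standard definitions on a small made-up set of outcomes; the data, the function names and the coding of 1 as "death or transplant" are illustrative assumptions, not the registry analysis.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for binary labels where 1 means the event occurred."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif a and not p:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summary(actual, predicted):
    """Accuracy, sensitivity and specificity as fractions between 0 and 1."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        "AC": (tp + tn) / (tp + fp + tn + fn),  # share of all cases classified correctly
        "SN": tp / (tp + fn),                   # share of true events that were flagged
        "SP": tn / (tn + fp),                   # share of non-events correctly cleared
    }


# Hypothetical outcomes for eight patients (1 = death or transplant by day 21).
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
print(summary(actual, predicted))
```

A rule can score well on AC and SP while missing most true events, which is the pattern the abstract reports for traditional KCC (90% SP but 27% SN).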
• Prognostication in acetaminophen-induced acute liver failure (APAP-ALF) is challenging beyond admission • Little has been published regarding the use of King's College Criteria (KCC) beyond admission and KCC has shown limited sensitivity in subsequent studies • Classification and Regression Tree (CART) methodology allows the development of predictive models using binary splits and offers an intuitive method for predicting outcome, using processes familiar to clinicians • Data from the ALFSG registry suggested that CART prognosis models for the APAP population offer improved sensitivity and model performance over traditional regression-based KCC, while maintaining similar accuracy and negligibly worse specificity • KCC-CART models offered modest improvement over traditional KCC, with NEW-CART models performing better than KCC-CART particularly at late time points.
Estimating VO2max Using a Personalized Step Test
ERIC Educational Resources Information Center
Webb, Carrie; Vehrs, Pat R.; George, James D.; Hager, Ronald
2014-01-01
The purpose of this study was to develop a step test with a personalized step rate and step height to predict cardiorespiratory fitness in 80 college-aged males and females using the self-reported perceived functional ability scale and data collected during the step test. Multiple linear regression analysis yielded a model (R = 0.90, SEE = 3.43…
ERIC Educational Resources Information Center
Culpepper, Steven Andrew
2012-01-01
The study of prediction bias is important and the last five decades include research studies that examined whether test scores differentially predict academic or employment performance. Previous studies used ordinary least squares (OLS) to assess whether groups differ in intercepts and slopes. This study shows that OLS yields inaccurate inferences…
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and it can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and the LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two previous main vegetation indices NDVI670 and NDVI705 are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination C-R2 of 0.920 and the validation set coefficient of determination V-R2 of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The root mean square error of the calibration set, C-RMSE, of 0.249 and the root mean square error of the validation set, V-RMSE, of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and validation set (V-RPD) reached 3.363 and 2.520, which were higher than those of the SVM regression model by 0.598 and 0.262, respectively. The trend-line slopes of the measured-versus-predicted scatterplots for the calibration and validation sets, C-S and V-S, are close to 1. The estimation result of the RF regression model is better than that of the SVM. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
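The three families of fit statistics compared above (R2, RMSE, RPD) can be computed as follows. This is a sketch under assumptions: R2 is taken as 1 - SS_res/SS_tot, and RPD as the sample standard deviation of the observations divided by RMSE, which is a common convention in spectroscopy but is not confirmed by the abstract. The LAI values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Sample standard deviation of the observations divided by RMSE (assumed definition)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    return sd / rmse(observed, predicted)


# Hypothetical measured and predicted LAI values.
lai_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
lai_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(lai_obs, lai_pred), rmse(lai_obs, lai_pred), rpd(lai_obs, lai_pred))
```

Under this convention a higher RPD means the prediction error is small relative to the natural spread of the data, so higher R2, lower RMSE and higher RPD all point the same way.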
Determination of the optimal level for combining area and yield estimates
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Hixson, M. M.; Jobusch, C. D.
1981-01-01
Several levels of obtaining both area and yield estimates of corn and soybeans in Iowa were considered: county, refined strata, refined/split strata, crop reporting district, and state. Using the CCEA model form and smoothed weather data, regression coefficients at each level were derived to compute yield and its variance. Variances were also computed with stratum level. The variance of the yield estimates was largest at the state and smallest at the county level for both crops. The refined strata had somewhat larger variances than those associated with the refined/split strata and CRD. For production estimates, the difference in standard deviations among levels was not large for corn, but for soybeans the standard deviation at the state level was more than 50% greater than for the other levels. The refined strata had the smallest standard deviations. The county level was not considered in evaluation of production estimates due to lack of county area variances.
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens
We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “submodel” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
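The "blending" idea described above can be illustrated with a toy rule. The abstract does not give the actual blending scheme, so everything here is an assumption made for illustration: a first-pass estimate from a full-range model selects between two sub-models, and inside a hypothetical overlap range the two sub-model predictions are mixed with a linear weight.

```python
def blend(first_pass, low_pred, high_pred, overlap=(30.0, 50.0)):
    """Combine two sub-model predictions using a first-pass estimate.

    first_pass: estimate from a model trained on the full composition range
    low_pred:   prediction from a sub-model trained on low-concentration targets
    high_pred:  prediction from a sub-model trained on high-concentration targets
    overlap:    hypothetical range (e.g. wt%) in which both sub-models are trusted
    """
    lo, hi = overlap
    if first_pass <= lo:
        return low_pred
    if first_pass >= hi:
        return high_pred
    # Linear ramp: the weight on the high sub-model grows from 0 to 1 across the overlap.
    w = (first_pass - lo) / (hi - lo)
    return (1.0 - w) * low_pred + w * high_pred


# Hypothetical predictions for three targets.
print(blend(10.0, 12.0, 15.0))  # first pass is clearly low: low sub-model is used
print(blend(40.0, 38.0, 42.0))  # first pass in the overlap: equal mix of both
print(blend(80.0, 60.0, 78.0))  # first pass is clearly high: high sub-model is used
```

The ramp avoids a discontinuity in the final estimate where responsibility passes from one sub-model to the other; any regression method could supply the three inputs.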
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens; ...
2016-12-15
We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “submodel” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
A study of Lusitano mare lactation curve with Wood's model.
Santos, A S; Silvestre, A M
2008-02-01
Milk yield and composition data from 7 nursing Lusitano mares (450 to 580 kg of body weight and 2 to 9 parities) were used in this study (5 measurements per mare for milk yield and 8 measurements for composition). Wood's lactation model was used to describe milk fat, protein, and lactose lactation curves. Mean values for the concentration of major milk components across the lactation period (180 d) were 5.9 g/kg of fat, 18.4 g/kg of protein, and 60.8 g/kg of lactose. Milk fat and protein (g/kg) decreased and lactose (g/kg) increased during the 180 d of lactation. Curves for milk protein and lactose yields (g) were similar in shape to the milk yield curve; protein yield peaked at 307 g on d 10 and lactose peaked at 816 g on d 45. The fat (g) curve was different in shape compared with milk, protein, and lactose yields. Total production of the major milk constituents throughout the 180 d of lactation was estimated to be 12.0, 36.1, and 124 kg for fat, protein, and lactose, respectively. The algebraic model fitted by a nonlinear regression procedure to the data resulted in reasonable prediction curves for milk yield (R(a)(2) of 0.89) and the major constituents (R(a)(2) ranged from 0.89 to 0.95). The lactation curves of major milk constituents in Lusitano mares were similar, both in shape and values, to those found in other horse breeds. The established curves facilitate the estimation of milk yield and variation of milk constituents at different stages of lactation for both nursing and dairy mares, providing important information relative to weaning time and foal supplementation.
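Wood's lactation model is usually written as y(t) = a * t^b * exp(-c * t), and setting dy/dt = y * (b/t - c) to zero places the peak at t = b/c. The sketch below evaluates the curve, the peak day and a day-by-day total over 180 d. The parameter values are hypothetical and chosen only so that the peak falls on day 10; they are not the fitted values from the study.

```python
import math


def wood(t, a, b, c):
    """Wood's incomplete-gamma lactation curve: yield on day t (t > 0)."""
    return a * t ** b * math.exp(-c * t)


def peak_day(b, c):
    """Day of peak yield, from dy/dt = y * (b / t - c) = 0."""
    return b / c


def total_yield(a, b, c, days=180):
    """Approximate total production as the sum of daily yields for days 1..days."""
    return sum(wood(t, a, b, c) for t in range(1, days + 1))


# Hypothetical parameters for a component that peaks early in lactation.
a, b, c = 250.0, 0.2, 0.02
print(peak_day(b, c))
print(wood(peak_day(b, c), a, b, c))
print(total_yield(a, b, c))
```

A larger b/c ratio moves the peak later, which is one way to read the reported difference between the protein curve (peak on d 10) and the lactose curve (peak on d 45).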
Zhou, Pei-pei; Shan, Jin-feng; Jiang, Jian-lan
2015-12-01
To optimize the microwave-assisted extraction of curcuminoids from Curcuma longa. On the basis of single-factor experiments, the ethanol concentration, the ratio of liquid to solid and the microwave time were selected for further optimization. Support Vector Regression (SVR) and Central Composite Design-Response Surface Methodology (CCD) were used to design and establish models respectively, while Particle Swarm Optimization (PSO) was introduced to optimize the parameters of the SVR models and to search for the optimal points of the models. The evaluation indicator was the sum of curcumin, demethoxycurcumin and bisdemethoxycurcumin determined by HPLC. The optimal parameters of microwave-assisted extraction were as follows: ethanol concentration of 69%, ratio of liquid to solid of 21 : 1, microwave time of 55 s. Under those conditions, the sum of the three curcuminoids was 28.97 mg/g (per gram of rhizome powder). Both the CCD model and the SVR model were credible: they predicted similar process conditions, and the deviation in yield was less than 1.2%.
Grieve, Richard; Nixon, Richard; Thompson, Simon G
2010-01-01
Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
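A rough feel for why ignoring clustering understates uncertainty comes from the classical design effect, DE = 1 + (m - 1) * ICC, where m is the mean cluster size. This is a simplification that assumes equal cluster sizes; it is not the Bayesian hierarchical model the authors develop. The sketch plugs in the trial size and the cost ICC quoted in the abstract.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_cases, n_clusters, icc):
    """Number of independent observations carrying the same information."""
    mean_size = n_cases / n_clusters
    return n_cases / design_effect(mean_size, icc)


# Figures quoted in the abstract: 1732 cases, 70 providers, cost ICC of 0.17.
de = design_effect(1732 / 70, 0.17)
ess = effective_sample_size(1732, 70, 0.17)
print(round(de, 2), round(ess))
```

Under these assumptions the variance of a mean cost is inflated roughly fivefold, which is consistent in direction with the abstract's finding that the hierarchical models give substantially wider uncertainty.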
NASA Technical Reports Server (NTRS)
Carder, K. L.; Lee, Z. P.; Marra, John; Steward, R. G.; Perry, M. J.
1995-01-01
The quantum yield of photosynthesis (mol C/mol photons), phi, was calculated at six depths for the waters of the Marine Light-Mixed Layer (MLML) cruise of May 1991. Because photosynthetically available radiation (PAR) measurements, but no spectral irradiance measurements, were available for the primary production incubations, three ways are presented here for calculating the photons absorbed (AP) by phytoplankton for the purpose of calculating phi. The first is based on a simple, nonspectral model; the second is based on a nonlinear regression using measured PAR values with depth; and the third is derived through remote sensing measurements. We show that the results of phi calculated using the nonlinear regression method and those using remote sensing are in good agreement with each other, and are consistent with the reported values of other studies. In deep waters, however, the simple nonspectral model may give quantum yield values much higher than theoretically possible.
Beef quality grading using machine vision
NASA Astrophysics Data System (ADS)
Jeyamkondan, S.; Ray, N.; Kranzler, Glenn A.; Biju, Nisha
2000-12-01
A video image analysis system was developed to support automation of beef quality grading. Forty images of ribeye steaks were acquired. Fat and lean meat were differentiated using a fuzzy c-means clustering algorithm. Muscle longissimus dorsi (l.d.) was segmented from the ribeye using morphological operations. At the end of each iteration of erosion and dilation, a convex hull was fitted to the image and compactness was measured. The number of iterations was selected to yield the most compact l.d. Match between the l.d. muscle traced by an expert grader and that segmented by the program was 95.9%. Marbling and color features were extracted from the l.d. muscle and were used to build regression models to predict marbling and color scores. Quality grade was predicted using another regression model incorporating all features. Grades predicted by the model were statistically equivalent to the grades assigned by expert graders.
Tchetgen Tchetgen, Eric
2011-03-01
This article considers the detection and evaluation of genetic effects incorporating gene-environment interaction and independence. Whereas ordinary logistic regression cannot exploit the assumption of gene-environment independence, the proposed approach makes explicit use of the independence assumption to improve estimation efficiency. This method, which uses both cases and controls, fits a constrained retrospective regression in which the genetic variant plays the role of the response variable, and the disease indicator and the environmental exposure are the independent variables. The regression model constrains the association of the environmental exposure with the genetic variant among the controls to be null, thus explicitly encoding the gene-environment independence assumption, which yields substantial gain in accuracy in the evaluation of genetic effects. The proposed retrospective regression approach has several advantages. It is easy to implement with standard software, and it readily accounts for multiple environmental exposures of a polytomous or of a continuous nature, while easily incorporating extraneous covariates. Unlike the profile likelihood approach of Chatterjee and Carroll (Biometrika. 2005;92:399-418), the proposed method does not require a model for the association of a polytomous or continuous exposure with the disease outcome, and, therefore, it is agnostic to the functional form of such a model and completely robust to its possible misspecification.
NASA Astrophysics Data System (ADS)
Bilal, Maria; Bilal, Muhammad; Saleem, Muhammad; Khurram, Muhammad; Khan, Saranjam; Ullah, Rahat; Ali, Hina; Ahmed, Mushtaq; Shahzada, Shaista; Ullah Khan, Ehsan
2017-04-01
Raman spectroscopy based investigations of the molecular changes associated with an early stage of dengue virus (DENV) infection using a partial least squares (PLS) regression model are presented. This study is based on non-structural protein 1 (NS1), which appears after three days of DENV infection. In total, 39 blood sera samples were collected and divided into two groups. The control group contained samples that were negative for NS1 and antibodies, and the positive group contained samples in which NS1 was positive and antibodies were negative. Out of 39 samples, 29 Raman spectra were used for model development while the remaining 10 were kept hidden for blind testing of the model. PLS regression yielded a vector of regression coefficients as a function of Raman shift, which were analyzed. Cytokines in the region 775-875 cm-1, lectins at 1003, 1238, 1340, 1449 and 1672 cm-1, DNA in the region 1040-1140 cm-1 and alpha and beta structures of proteins in the region 933-967 cm-1 have been identified in the regression vector for their role in an early stage of DENV infection. Validity of the model was established by its R-square value of 0.891. Sensitivity, specificity and accuracy were 100% each and the area under the receiver operator characteristic curve was found to be 1.
Assessment of parametric uncertainty for groundwater reactive transport modeling
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. 
The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
Vegetation Monitoring with Gaussian Processes and Latent Force Models
NASA Astrophysics Data System (ADS)
Camps-Valls, Gustau; Svendsen, Daniel; Martino, Luca; Campos, Manuel; Luengo, David
2017-04-01
Monitoring vegetation by biophysical parameter retrieval from Earth observation data is a challenging problem, where machine learning is currently a key player. Neural networks, kernel methods, and Gaussian Process (GP) regression have excelled in parameter retrieval tasks at both local and global scales. GP regression is based on solid Bayesian statistics, yields efficient and accurate parameter estimates, and provides interesting advantages over competing machine learning approaches, such as confidence intervals. However, GP models are hampered by a lack of interpretability, which has prevented their widespread adoption by a larger community. In this presentation we will summarize some of our latest developments to address this issue. We will review the main characteristics of GPs and their advantages in standard vegetation monitoring applications. Then, three advanced GP models will be introduced. First, we will derive sensitivity maps for the GP predictive function that allow us to obtain feature ranking from the model and to assess the influence of examples in the solution. Second, we will introduce a Joint GP (JGP) model that combines in situ measurements and simulated radiative transfer data in a single GP model. The JGP regression provides more sensible confidence intervals for the predictions, respects the physics of the underlying processes, and allows for transferability across time and space. Finally, a latent force model (LFM) for GP modeling that encodes ordinary differential equations to blend data-driven modeling and physical models of the system is presented. The LFM performs multi-output regression, adapts to the signal characteristics, is able to cope with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. Empirical evidence of the performance of these models will be presented through illustrative examples.
Ajaz Ahmed, Mukhtar Ahmed; Abd-Elrahman, Amr; Escobedo, Francisco J; Cropper, Wendell P; Martin, Timothy A; Timilsina, Nilesh
2017-09-01
Understanding ecosystem processes and the influence of regional scale drivers can provide useful information for managing forest ecosystems. Examining more local scale drivers of forest biomass and water yield can also provide insights for identifying and better understanding the effects of climate change and management on forests. We used diverse multi-scale datasets, functional models and Geographically Weighted Regression (GWR) to model ecosystem processes at the watershed scale and to interpret the influence of ecological drivers across the Southeastern United States (SE US). Aboveground forest biomass (AGB) was determined from available geospatial datasets and water yield was estimated using the Water Supply and Stress Index (WaSSI) model at the watershed level. Our geostatistical model examined the spatial variation in these relationships between ecosystem processes, climate, biophysical, and forest management variables at the watershed level across the SE US. Ecological and management drivers at the watershed level were analyzed locally to identify whether drivers contribute positively or negatively to aboveground forest biomass and water yield ecosystem processes and thus identifying potential synergies and tradeoffs across the SE US region. Although AGB and water yield drivers varied geographically across the study area, they were generally significantly influenced by climate (rainfall and temperature), land-cover factor1 (Water and barren), land-cover factor2 (wetland and forest), organic matter content high, rock depth, available water content, stand age, elevation, and LAI drivers. These drivers were positively or negatively associated with biomass or water yield which significantly contributes to ecosystem interactions or tradeoff/synergies. Our study introduced a spatially-explicit modelling framework to analyze the effect of ecosystem drivers on forest ecosystem structure, function and provision of services. 
This integrated model approach facilitates multi-scale analyses of drivers and interactions at the local to regional scale. Copyright © 2017 Elsevier Ltd. All rights reserved.
Li, Wenqin; Dang, Qi; Brown, Robert C; Laird, David; Wright, Mark M
2017-10-01
This study evaluated the impact of biomass properties on the pyrolysis product yields, economic and environmental performance for the pyrolysis-biochar-bioenergy platform. We developed and applied a fast pyrolysis, feedstock-sensitive, regression-based chemical process model to 346 different feedstocks, which were grouped into five types: woody, stalk/cob/ear, grass/plant, organic residue/product and husk/shell/pit. The results show that biomass ash content of 0.3-7.7wt% increases biochar yield from 0.13 to 0.16kg/kg of biomass, and decreases biofuel yields from 87.3 to 40.7 gallons per tonne. Higher O/C ratio (0.88-1.12) in biomass decreases biochar yield and increases biofuel yields within the same ash content level. Higher ash content of biomass increases minimum fuel selling price (MFSP), while higher O/C ratio of biomass decreases MFSP within the same ash content level. The impact of ash and O/C ratio of biomass on GHG emissions are not consistent for all feedstocks. Copyright © 2017 Elsevier Ltd. All rights reserved.
Tillard, E; Humblot, P; Faye, B; Lecomte, P; Dohoo, I; Bocquier, F
2008-03-01
The objective was to identify postpartum risk factors, among nutritional imbalances and health disorders, affecting first-service conception risk (FSCR) in 21 commercial Holstein herds in Reunion Island. Multivariate logistic-regression models including herd as a random effect were used to analyze the relationship between FSCR and energy status, nitrogen status, hepatic function, mineral deficiencies, and postpartum health disorders. Two models (A and B) were built on two subsets of data (n=446 and n=863) with risk indicators measured during the first month of lactation and around time of first service, respectively, adjusted for season, breed, parity, origin, milk yield, calving to first service interval (CS1), and type of estrus (spontaneous vs. induced). The average conception risk was 0.266±0.015 (n=913) (mean±S.E.M.). In both models, FSCR was decreased by CS1 ≤60 d and induced estrus. In model A, FSCR was decreased (p<0.05) for cows with mean cumulative 100 d daily milk yield ≤23 kg/d and >27 kg/d, with losses of body condition score >1.5, and with retained placenta. In model B, FSCR was decreased (p<0.05) for cows inseminated during the wet season, previously raised off the farm as nulliparous, with blood magnesium concentration ≤0.9 mmol/L, and for high-yielding cows (100 d milk yield >27 kg/d) with glutamate dehydrogenase >17 IU/L. Hence, high body-lipid mobilization during the first month of lactation was a strong nutritional predictor of low FSCR, together with liver damage in high-yielding cows. Interestingly, our models revealed that infertility is more closely related to nutritional factors than to the occurrence of postpartum health disorders.
Broderick, G A; Huhtanen, P; Ahvenjärvi, S; Reynal, S M; Shingfield, K J
2010-07-01
Mixed model analysis of data from 32 studies (122 diets) was used to evaluate the precision and accuracy of the omasal sampling technique for quantifying ruminal-N metabolism and to assess the relationships between nonammonia-N flow at the omasal canal and milk protein yield. Data were derived from experiments in cattle fed North American diets (n=36) based on alfalfa silage, corn silage, and corn grain and Northern European diets (n=86) composed of grass silage and barley-based concentrates. In all studies, digesta flow was quantified using a triple-marker approach. Linear regressions were used to predict microbial-N flow to the omasum from intake of dry matter (DM), organic matter (OM), or total digestible nutrients. Efficiency of microbial-N synthesis increased with DM intake and there were trends for increased efficiency with elevated dietary concentrations of crude protein (CP) and rumen-degraded protein (RDP) but these effects were small. Regression of omasal rumen-undegraded protein (RUP) flow on CP intake indicated that an average 32% of dietary CP escaped and 68% was degraded in the rumen. The slope from regression of observed omasal flows of RUP on flows predicted by the National Research Council (2001) model indicated that NRC predicted greater RUP supply. Measured microbial-N flow was, on average, 26% greater than that predicted by the NRC model. Zero ruminal N-balance (omasal CP flow=CP intake) was obtained at dietary CP and RDP concentrations of 147 and 106 g/kg of DM, corresponding to ruminal ammonia-N and milk urea N concentrations of 7.1 and 8.3mg/100mL, respectively. Milk protein yield was positively related to the efficiency of microbial-N synthesis and measured RUP concentration. Improved efficiency of microbial-N synthesis and reduced ruminal CP degradability were positively associated with efficiency of capture of dietary N as milk N. 
In conclusion, the results of this study indicate that the omasal sampling technique yields valuable estimates of RDP, RUP, and ruminal microbial protein supply in cattle. Copyright (c) 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Adler, Philipp; Hugen, Thorsten; Wiewiora, Marzena; Kunz, Benno
2011-03-07
An unstructured model for an integrated fermentation/membrane extraction process for the production of the aroma compounds 2-phenylethanol and 2-phenylethylacetate by Kluyveromyces marxianus CBS 600 was developed. The extent to which this model, based only on data from the conventional fermentation and separation processes, provided an estimation of the integrated process was evaluated. The effect of product inhibition on specific growth rate and on biomass yield by both aroma compounds was approximated by multivariate regression. Simulations of the respective submodels for fermentation and the separation process matched well with experimental results. With respect to the in situ product removal (ISPR) process, the effect of reduced product inhibition due to product removal on specific growth rate and biomass yield was predicted adequately by the model simulations. Overall product yields were increased considerably in this process (4.0 g/L 2-PE+2-PEA vs. 1.4 g/L in conventional fermentation) and were even higher than predicted by the model. To describe the effect of product concentration on product formation itself, the model was extended using results from the conventional and the ISPR process, thus agreement between model and experimental data improved notably. Therefore, this model can be a useful tool for the development and optimization of an efficient integrated bioprocess. Copyright © 2010 Elsevier Inc. All rights reserved.
Random regression analyses using B-splines to model growth of Australian Angus cattle
Meyer, Karin
2005-01-01
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011
NASA Astrophysics Data System (ADS)
Sivalingam, Udhayaraj; Wels, Michael; Rempfler, Markus; Grosskopf, Stefan; Suehling, Michael; Menze, Bjoern H.
2016-03-01
In this paper, we present a fully automated approach to coronary vessel segmentation, which involves calcification or soft plaque delineation in addition to accurate lumen delineation, from 3D Cardiac Computed Tomography Angiography data. Adequately virtualizing the coronary lumen plays a crucial role for simulating blood flow by means of fluid dynamics while additionally identifying the outer vessel wall in the case of arteriosclerosis is a prerequisite for further plaque compartment analysis. Our method is a hybrid approach complementing Active Contour Model-based segmentation with an external image force that relies on a Random Forest Regression model generated off-line. The regression model provides a strong estimate of the distance to the true vessel surface for every surface candidate point taking into account 3D wavelet-encoded contextual image features, which are aligned with the current surface hypothesis. The associated external image force is integrated in the objective function of the active contour model, such that the overall segmentation approach benefits from the advantages associated with snakes and from the ones associated with machine learning-based regression alike. This yields an integrated approach achieving competitive results on a publicly available benchmark data collection (Rotterdam segmentation challenge).
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Time series regression and ARIMAX for forecasting currency flow at Bank Indonesia in Sulawesi region
NASA Astrophysics Data System (ADS)
Suharsono, Agus; Suhartono, Masyitha, Aulia; Anuravega, Arum
2015-12-01
The purpose of the study is to forecast the outflow and inflow of currency at the Indonesian Central Bank, Bank Indonesia (BI), in the Sulawesi Region. The currency outflow and inflow data tend to have a trend pattern which is influenced by calendar variation effects. Therefore, this research focuses on applying forecasting methods that can handle calendar variation effects, i.e. Time Series Regression (TSR) and ARIMAX models, and compares their forecast accuracy with that of the ARIMA model. The best model is selected based on the lowest Root Mean Square Error (RMSE) on the out-of-sample dataset. The results show that ARIMA is the best model for forecasting the currency outflow and inflow at South Sulawesi, whereas the best model for forecasting the currency outflow at Central Sulawesi and Southeast Sulawesi, and for forecasting the currency inflow at South Sulawesi and North Sulawesi, is TSR. Additionally, ARIMAX is the best model for forecasting the currency outflow at North Sulawesi. Hence, the results show that more complex models do not necessarily yield more accurate forecasts than simpler ones.
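The selection rule described in this abstract (pick the model with the lowest RMSE on held-out data) can be sketched as follows. This is a minimal illustration only: the hold-out values, the candidate forecasts and the helper names `rmse` and `best_model` are hypothetical and are not taken from the study.

```python
import math

def rmse(actual, forecast):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("sequences must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def best_model(actual, forecasts):
    """Return (name, rmse) of the candidate with the lowest out-of-sample RMSE."""
    scores = {name: rmse(actual, values) for name, values in forecasts.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]

# Hypothetical hold-out observations and forecasts (not data from the study).
holdout = [10.0, 12.0, 15.0, 11.0]
candidates = {
    "ARIMA": [10.5, 11.5, 14.0, 11.5],
    "TSR": [9.0, 13.5, 16.5, 10.0],
    "ARIMAX": [10.0, 12.5, 13.0, 12.0],
}
winner, score = best_model(holdout, candidates)
print(winner, round(score, 4))
```

With these made-up numbers the simplest candidate happens to win, which mirrors the abstract's point that a more complex model is not guaranteed to forecast better.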
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electrical conductivity measurements was investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R squared <0.4). Better results were achieved using the supervised classifiers, but the algorithms tended to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.
Yin, Xinyou; Belay, Daniel W; van der Putten, Peter E L; Struik, Paul C
2014-12-01
Maximum quantum yield for leaf CO2 assimilation under limiting light conditions (Φ CO2LL) is commonly estimated as the slope of the linear regression of net photosynthetic rate against absorbed irradiance over a range of low-irradiance conditions. Methodological errors associated with this estimation have often been attributed either to light absorptance by non-photosynthetic pigments or to some data points being beyond the linear range of the irradiance response, both causing an underestimation of Φ CO2LL. We demonstrate here that a decrease in photosystem (PS) photochemical efficiency with increasing irradiance, even at very low levels, is another source of error that causes a systematic underestimation of Φ CO2LL. A model method accounting for this error was developed, and was used to estimate Φ CO2LL from simultaneous measurements of gas exchange and chlorophyll fluorescence on leaves using various combinations of species, CO2, O2, or leaf temperature levels. The conventional linear regression method under-estimated Φ CO2LL by ca. 10-15%. Differences in the estimated Φ CO2LL among measurement conditions were generally accounted for by different levels of photorespiration as described by the Farquhar-von Caemmerer-Berry model. However, our data revealed that the temperature dependence of PSII photochemical efficiency under low light was an additional factor that should be accounted for in the model.
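The conventional estimate mentioned in this abstract, the slope of a linear regression of net photosynthetic rate on absorbed irradiance, can be sketched with ordinary least squares. The data points and the function name `ols_slope_intercept` are hypothetical; the sketch shows only the conventional method, not the corrected model the authors propose.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical low-light measurements (not data from the study):
# absorbed irradiance and net photosynthetic rate, both in umol m-2 s-1.
absorbed = [20.0, 40.0, 60.0, 80.0, 100.0]
net_rate = [0.0, 1.0, 2.0, 3.0, 4.0]  # constructed to be exactly linear

slope, intercept = ols_slope_intercept(absorbed, net_rate)
# The slope plays the role of the quantum yield estimate (mol CO2 per mol photons);
# the negative intercept corresponds to respiration in darkness.
print(slope, intercept)
```

On real data the response curves away from linearity as irradiance rises, which is one of the sources of underestimation the abstract describes.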
Climate Change Impact on Rainfall: How will Threaten Wheat Yield?
NASA Astrophysics Data System (ADS)
Tafoughalti, K.; El Faleh, E. M.; Moujahid, Y.; Ouargaga, F.
2018-05-01
Climate change has a significant impact on the environmental condition of agricultural regions. Meknes has an agrarian economy and wheat production is of paramount importance. As most arable areas are under rainfed systems, Meknes is one of the regions sensitive to rainfall variability and consequently to climate change. Therefore, the use of changes in rainfall is vital for detecting the influence of the climate system on agricultural productivity. This article identifies rainfall temporal variability and its impact on wheat yields. We used monthly rainfall records for three decades and wheat yield records for fifteen years. Rainfall variability is assessed using the precipitation concentration index and the coefficient of variation. The association between wheat yields and cumulative rainfall amounts at different scales was calculated based on a regression model. The analysis showed a moderate seasonal and irregular annual rainfall distribution. Yields fluctuated from 210 to 4500 kg/ha with a coefficient of variation of 52%. The correlation results show that wheat yields are strongly correlated with rainfall in the period January to March. This investigation concluded that climate change is altering wheat yield and that it is crucial to adopt the necessary adaptation measures to address the risk.
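The two variability measures named in this abstract can be sketched as below. The abstract does not state its formulas, so this assumes the commonly used form of the precipitation concentration index, PCI = 100 · Σp² / (Σp)² over the twelve monthly totals, and a coefficient of variation based on the sample standard deviation. The rainfall and yield values are invented for illustration.

```python
import statistics

def precipitation_concentration_index(monthly):
    """PCI = 100 * sum(p_i^2) / (sum(p_i))^2 (assumed form; not stated in the abstract)."""
    total = sum(monthly)
    if total <= 0:
        raise ValueError("total precipitation must be positive")
    return 100.0 * sum(p * p for p in monthly) / total ** 2

def coefficient_of_variation(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical monthly rainfall totals in mm (not data from the study).
uniform_year = [50.0] * 12               # rain spread evenly over the year
single_month = [0.0] * 11 + [120.0]      # all rain in one month

print(round(precipitation_concentration_index(uniform_year), 2))
print(round(precipitation_concentration_index(single_month), 2))

# Hypothetical yields in kg/ha, for the coefficient of variation only.
yields = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(coefficient_of_variation(yields), 2))
```

Under this form the index ranges from about 8.3 for perfectly even monthly rainfall to 100 when all rain falls in a single month.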
Van Hertem, T; Maltz, E; Antler, A; Romanini, C E B; Viazzi, S; Bahr, C; Schlageter-Tello, A; Lokhorst, C; Berckmans, D; Halachmi, I
2013-07-01
The objective of this study was to develop and validate a mathematical model to detect clinical lameness based on existing sensor data that relate to the behavior and performance of cows in a commercial dairy farm. Identification of lame (44) and not lame (74) cows in the database was done based on the farm's daily herd health reports. All cows were equipped with a behavior sensor that measured neck activity and ruminating time. The cow's performance was measured with a milk yield meter in the milking parlor. In total, 38 model input variables were constructed from the sensor data comprising absolute values, relative values, daily standard deviations, slope coefficients, daytime and nighttime periods, variables related to individual temperament, and milk session-related variables. Daily data of the lame group (cows recognized and treated for lameness) were compared with those of the not lame group. Correlations between the dichotomous output variable (lame or not lame) and the model input variables were calculated. The highest correlation coefficient was obtained for the milk yield variable (rMY=0.45). In addition, a logistic regression model was developed based on the 7 highest correlated model input variables (the daily milk yield 4d before diagnosis; the slope coefficient of the daily milk yield 4d before diagnosis; the nighttime to daytime neck activity ratio 6d before diagnosis; the milk yield week difference ratio 4d before diagnosis; the milk yield week difference 4d before diagnosis; the neck activity level during the daytime 7d before diagnosis; the ruminating time during nighttime 6d before diagnosis). After a 10-fold cross-validation, the model obtained a sensitivity of 0.89 and a specificity of 0.85, with a correct classification rate of 0.86 when based on the averaged 10-fold model coefficients. 
This study demonstrates that existing farm data initially used for other purposes, such as heat detection, can be exploited for the automated detection of clinically lame animals on a daily basis as well. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
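The three performance figures quoted in this abstract follow from a confusion matrix. The abstract does not report one, so the counts below are hypothetical: they were chosen only to match the stated group sizes (44 lame, 74 not lame) and to round to the reported rates.

```python
def sensitivity(tp, fn):
    """Share of truly lame cows that the model flags as lame."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of not-lame cows that the model correctly leaves unflagged."""
    return tn / (tn + fp)

def correct_classification_rate(tp, tn, fp, fn):
    """Share of all cows classified correctly (overall accuracy)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts (assumption, not reported in the abstract).
tp, fn = 39, 5    # 44 lame cows
tn, fp = 63, 11   # 74 not-lame cows

print(round(sensitivity(tp, fn), 2))
print(round(specificity(tn, fp), 2))
print(round(correct_classification_rate(tp, tn, fp, fn), 2))
```

Because the not-lame group is larger, the overall rate sits closer to the specificity than to the sensitivity.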
Li, Liang; Wang, Yiying; Xu, Jiting; Flora, Joseph R V; Hoque, Shamia; Berge, Nicole D
2018-08-01
Hydrothermal carbonization (HTC) is a wet, low temperature thermal conversion process that continues to gain attention for the generation of hydrochar. The importance of specific process conditions and feedstock properties on hydrochar characteristics is not well understood. To evaluate this, linear and non-linear models were developed to describe hydrochar characteristics based on data collected from HTC-related literature. A Sobol analysis was subsequently conducted to identify parameters that most influence hydrochar characteristics. Results from this analysis indicate that for each investigated hydrochar property, the model fit and predictive capability associated with the random forest models is superior to both the linear and regression tree models. Based on results from the Sobol analysis, the feedstock properties and process conditions most influential on hydrochar yield, carbon content, and energy content were identified. In addition, a variational process parameter sensitivity analysis was conducted to determine how feedstock property importance changes with process conditions. Copyright © 2018 Elsevier Ltd. All rights reserved.
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Quality by Design approach to spray drying processing of crystalline nanosuspensions.
Kumar, Sumit; Gokhale, Rajeev; Burgess, Diane J
2014-04-10
Quality by Design (QbD) principles were explored to understand spray drying process for the conversion of liquid nanosuspensions into solid nano-crystalline dry powders using indomethacin as a model drug. The effects of critical process variables: inlet temperature, flow and aspiration rates on critical quality attributes (CQAs): particle size, moisture content, percent yield and crystallinity were investigated employing a full factorial design. A central cubic design was employed to generate the response surface for particle size and percent yield. Multiple linear regression analysis and ANOVA were employed to identify and estimate the effect of critical parameters, establish their relationship with CQAs, create design space and model the spray drying process. Inlet temperature was identified as the only significant factor (p value <0.05) to affect dry powder particle size. Higher inlet temperatures caused drug surface melting and hence aggregation of the dried nano-crystalline powders. Aspiration and flow rates were identified as significant factors affecting yield (p value <0.05). Higher yields were obtained at higher aspiration and lower flow rates. All formulations had less than 3% (w/w) moisture content. Formulations dried at higher inlet temperatures had lower moisture compared to those dried at lower inlet temperatures. Published by Elsevier B.V.
Wilderjans, Tom Frans; Vande Gaer, Eva; Kiers, Henk A L; Van Mechelen, Iven; Ceulemans, Eva
2017-03-01
In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases, ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons: first, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations being nested into persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.
Microwave pretreatment of switchgrass for bioethanol production
NASA Astrophysics Data System (ADS)
Keshwani, Deepak Radhakrishin
Lignocellulosic materials are promising alternative feedstocks for bioethanol production. These materials include agricultural residues, cellulosic waste such as newsprint and office paper, logging residues, and herbaceous and woody crops. However, the recalcitrant nature of lignocellulosic biomass necessitates a pretreatment step to improve the yield of fermentable sugars. The overall goal of this dissertation is to expand the current state of knowledge on microwave-based pretreatment of lignocellulosic biomass. Existing research on bioenergy and value-added applications of switchgrass is reviewed in Chapter 2. Switchgrass is an herbaceous energy crop native to North America and has high biomass productivity, potentially low requirements for agricultural inputs and positive environmental impacts. Based on results from test plots, yields in excess of 20 Mg/ha have been reported. Environmental benefits associated with switchgrass include the potential for carbon sequestration, nutrient recovery from run-off, soil remediation and provision of habitats for grassland birds. Published research on pretreatment of switchgrass reported glucose yields ranging from 70-90% and xylose yields ranging from 70-100% after hydrolysis and ethanol yields ranging from 72-92% after fermentation. Other potential value-added uses of switchgrass include gasification, bio-oil production, newsprint production and fiber reinforcement in thermoplastic composites. Research on microwave-based pretreatment of switchgrass and coastal bermudagrass is presented in Chapter 3. Pretreatments were carried out by immersing the biomass in dilute chemical reagents and exposing the slurry to microwave radiation at 250 watts for residence times ranging from 5 to 20 minutes. Preliminary experiments identified alkalis as suitable chemical reagents for microwave-based pretreatment. An evaluation of different alkalis identified sodium hydroxide as the most effective alkali reagent. 
Under optimum pretreatment conditions, 82% glucose and 63% xylose yields were achieved for switchgrass, and 87% glucose and 59% xylose yields were achieved for coastal bermudagrass following enzymatic hydrolysis of the pretreated biomass. The optimum enzyme loadings were 15 FPU/g and 20 CBU/g for switchgrass and 10 FPU/g and 20 CBU/g for coastal bermudagrass. Dielectric properties for dilute sodium hydroxide solutions were measured and compared to solid loss, lignin reduction and reducing sugar levels in hydrolyzates. Results indicate that the dielectric loss tangent of alkali solutions is a potential indicator of the severity of microwave-based pretreatments. Modeling of pretreatment processes can be a valuable tool in process simulations of bioethanol production from lignocellulosic biomass. Chapter 4 discusses three different approaches that were used to model delignification and carbohydrate loss during microwave-based pretreatment of switchgrass: statistical linear regression modeling, kinetic modeling using a time-dependent rate coefficient, and a Mamdani-type fuzzy inference system. The dielectric loss tangent of the alkali reagent and pretreatment time were used as predictors in all models. The statistical linear regression model for delignification gave comparable root mean square error (RMSE) values for training and testing data and predictions were approximately within 1% of experimental values. The kinetic model for delignification and xylan loss gave comparable RMSE values for training and testing data sets and predictions were approximately within 2% of experimental values. The kinetic model for cellulose loss was not as effective and predictions were only within 5-7% of experimental values. The time-dependent rate coefficients of the kinetic models calculated from experimental data were consistent with the heterogeneity (or lack thereof) of individual biomass components. 
The Mamdani-type fuzzy inference system was shown to be an effective means to model pretreatment processes and gave the most accurate predictions (<3%) for cellulose loss.
Modelling crop yield in Iberia under drought conditions
NASA Astrophysics Data System (ADS)
Ribeiro, Andreia; Páscoa, Patrícia; Russo, Ana; Gouveia, Célia
2017-04-01
Improved assessment of cereal yield and crop loss under drought conditions is essential to meet increasing economic demands. The growing frequency and severity of extreme drought conditions in the Iberian Peninsula (IP) has likely been responsible for negative impacts on agriculture, namely crop yield losses. Therefore, continuous monitoring of vegetation activity and reliable estimation of drought impacts are crucial to contribute to agricultural drought management and the development of suitable information tools. This work aims to assess the influence of drought conditions on agricultural yields over the IP, considering cereal yields from mainly rainfed agriculture for the provinces with higher productivity. The main target is to develop a strategy to model drought risk on agriculture for wheat yield at a province level. In order to achieve this goal, a combined assessment was made using a drought indicator (Standardized Precipitation Evapotranspiration Index, SPEI) to evaluate drought conditions together with a widely used vegetation index (Normalized Difference Vegetation Index, NDVI) to monitor vegetation activity. A correlation analysis between detrended wheat yield and SPEI was performed in order to assess the vegetation response to each time scale of drought occurrence and also to identify the moment of the vegetative cycle when crop yields are most vulnerable to drought conditions. The time scales and months of SPEI, together with the months of NDVI, that were best related to wheat yield were chosen to perform a multivariate regression analysis to simulate crop yield. Model results are satisfactory and highlight the usefulness of such analysis in the framework of developing a drought risk model for crop yields. 
From an operational point of view, the results aim to contribute to an improved understanding of crop yield management under dry conditions, particularly adding substantial information on the advantages of combining vegetation and hydro-meteorological drought indices for the assessment of cereal yield. Moreover, the present study will provide some guidance for users' decision-making processes in agricultural practices in the IP, assisting farmers in deciding whether to purchase crop insurance. Acknowledgements: This work was partially supported by national funds through FCT (Fundação para a Ciência e a Tecnologia, Portugal) under project IMDROFLOOD (WaterJPI/0004/2014). Ana Russo thanks FCT for granted support (SFRH/BPD/99757/2014). Andreia Ribeiro also thanks FCT for grant PD/BD/114481/2016.
Nikol'skii, A A
2017-11-01
Dependence of the sound-signal frequency on the animal body length was studied in 14 ground squirrel species (genus Spermophilus) of Eurasia. Regression analysis of the total sample yielded a low determination coefficient (R² = 26%), because the total sample proved to be heterogeneous in terms of signal frequency within the dimension classes of animals. When the total sample was divided into two groups according to signal frequency, two statistically significant models (regression equations) were obtained in which signal frequency depended on the body size at high determination coefficients (R² = 73 and 94% versus 26% for the total sample). Thus, the problem of correlation between animal body size and the frequency of their vocal signals does not have a unique solution.
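The effect reported in this abstract, a low determination coefficient for a pooled sample that rises sharply once the sample is split into homogeneous groups, can be reproduced with a small constructed example. All numbers and the helper name `r_squared` are hypothetical and have no connection to the ground squirrel measurements.

```python
def r_squared(x, y):
    """Determination coefficient of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # For one predictor with an intercept, R^2 equals the squared correlation.
    return sxy ** 2 / (sxx * syy)

# Hypothetical body sizes and signal frequencies in arbitrary units.
size = [1.0, 2.0, 3.0, 4.0]
high_group = [10.0, 9.0, 8.0, 7.0]   # high-frequency group, exactly linear
low_group = [4.0, 3.0, 2.0, 1.0]     # low-frequency group, same slope

pooled_x = size + size
pooled_y = high_group + low_group

print(round(r_squared(pooled_x, pooled_y), 3))   # pooled sample
print(r_squared(size, high_group), r_squared(size, low_group))
```

Each group follows its own line perfectly, yet the offset between the groups dominates the pooled variance, so a single regression explains little of it.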
NASA Astrophysics Data System (ADS)
Kouadio, Louis; Duveiller, Grégory; Djaby, Bakary; El Jarroudi, Moussa; Defourny, Pierre; Tychon, Bernard
2012-08-01
Earth observation data, owing to their synoptic, timely and repetitive coverage, have been recognized as a valuable tool for crop monitoring at different levels. At the field level, the close correlation between green leaf area (GLA) during maturation and grain yield in wheat revealed that the onset and rate of senescence appeared to be important factors for determining wheat grain yield. Our study sought to explore a simple approach for wheat yield forecasting at the regional level, based on metrics derived from the senescence phase of the green area index (GAI) retrieved from remote sensing data. This study took advantage of recent methodological improvements in which imagery with high revisit frequency but coarse spatial resolution can be exploited to derive crop-specific GAI time series by selecting pixels whose ground-projected instantaneous field of view is dominated by the target crop: winter wheat. A logistic function was used to characterize the GAI senescence phase and derive the metrics of this phase. Four regression-based models involving these metrics (i.e., the maximum GAI value, the senescence rate and the thermal time taken to reach 50% of the green surface in the senescent phase) were related to official wheat yield data. The performances of such models at this regional scale showed that final yield could be estimated with an RMSE of 0.57 ton ha⁻¹, representing about 7% as relative RMSE. Such an approach may be considered as a first yield estimate that could be performed in order to provide better integrated yield assessments in operational systems.
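The three senescence metrics listed in this abstract map naturally onto the parameters of a decreasing logistic curve. The sketch below assumes the form GAI(t) = GAI_max / (1 + exp(k · (t − t50))), where t is thermal time; the abstract does not give its exact parameterization, and the parameter values and function names are hypothetical.

```python
import math

def gai_senescence(thermal_time, gai_max, rate, t50):
    """Decreasing logistic curve for GAI during senescence (assumed form)."""
    return gai_max / (1.0 + math.exp(rate * (thermal_time - t50)))

def thermal_time_at_fraction(fraction, rate, t50):
    """Thermal time at which GAI has dropped to `fraction` of gai_max."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    # Invert fraction = 1 / (1 + exp(rate * (t - t50))) for t.
    return t50 + math.log(1.0 / fraction - 1.0) / rate

# Hypothetical parameters: maximum GAI, senescence rate, and the thermal time
# (degree-days) at which half of the green surface remains.
gai_max, rate, t50 = 5.0, 0.01, 900.0

print(gai_senescence(t50, gai_max, rate, t50))           # half of gai_max
print(thermal_time_at_fraction(0.5, rate, t50))          # returns t50
print(round(thermal_time_at_fraction(0.1, rate, t50), 1))
```

In practice the three parameters would be estimated by fitting the curve to each GAI time series, and the fitted values would then serve as predictors in the yield regressions.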
Bark analysis as a guide to cassava nutrition in Sierra Leone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Godfrey-Sam-Aggrey, W.; Garber, M.J.
1979-01-01
Cassava main stem barks from two experiments in which similar fertilizers were applied directly in a 2^5 confounded factorial design were analyzed and the bark nutrients used as a guide to cassava nutrition. The application of multiple regression analysis to the respective root yields and bark nutrient concentrations enabled nutrient levels and optimum adjusted root yields to be derived. Differences in bark nutrient concentrations reflected soil fertility levels. Bark analysis and the application of multiple regression analysis to root yields and bark nutrients appear to be useful tools for predicting fertilizer recommendations for cassava production.
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. 
Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.
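The log-transformed regression described above can be illustrated with a minimal sketch. It assumes a single predictor (contributing-drainage area) and noise-free, hypothetical flow values; the report's equations also use precipitation, permeability, and slope, and the names `fit_loglog` and `predict_flow` are illustrative, not taken from the report.

```python
import math

def fit_loglog(areas, flows):
    """Ordinary least squares fit of log10(flow) = a + b * log10(area)."""
    xs = [math.log10(v) for v in areas]
    ys = [math.log10(v) for v in flows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_flow(a, b, area):
    """Back-transform the log-space prediction to flow units."""
    return 10 ** (a + b * math.log10(area))

# Hypothetical drainage areas (square miles) and flows built from an
# assumed power law Q = 2 * A**0.8; these are not data from the report.
areas = [2.06, 50.0, 400.0, 3000.0, 12004.0]
flows = [2.0 * area ** 0.8 for area in areas]
a, b = fit_loglog(areas, flows)
```

Because the synthetic flows follow an exact power law, the fit recovers the assumed exponent as the slope `b` and log10 of the assumed coefficient as the intercept `a`.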
Guzman, L; Ortega-Hrepich, C; Polyzos, N P; Anckaert, E; Verheyen, G; Coucke, W; Devroey, P; Tournaye, H; Smitz, J; De Vos, M
2013-05-01
Which baseline patient characteristics can help assisted reproductive technology practitioners to identify patients who are suitable for in-vitro maturation (IVM) treatment? In patients with polycystic ovary syndrome (PCOS) who undergo oocyte IVM in a non-hCG-triggered system, circulating anti-Müllerian hormone (AMH), antral follicle count (AFC) and total testosterone are independently related to the number of immature oocytes and hold promise as outcome predictors to guide the patient selection process for IVM. Patient selection criteria for IVM treatment have been described in normo-ovulatory patients, although patients with PCOS constitute the major target population for IVM. With this study, we assessed the independent predictive value of clinical and endocrine parameters that are related to oocyte yield in patients with PCOS undergoing IVM. Cohort study involving 124 consecutive patients with PCOS undergoing IVM whose data were prospectively collected. Enrolment took place between January 2010 and January 2012. Only data relating to the first IVM cycle of each patient were included. Patients with PCOS underwent oocyte retrieval for IVM after minimal gonadotrophin stimulation and no hCG trigger. Correlation coefficients were calculated to investigate which parameters are related to immature oocyte yield (patient's age, BMI, baseline hormonal profile and AMH, AFC). The independence of predictive parameters was tested using multivariate linear regression analysis. Finally, multivariate receiver operating characteristic (ROC) analyses for cumulus oocyte complexes (COC) yield were performed to assess the efficiency of the prediction model to select suitable candidates for IVM. 
Using multivariate regression analysis, circulating baseline AMH, AFC and baseline total testosterone serum concentration were incorporated into a model to predict the number of COC retrieved in an IVM cycle, with unstandardized coefficients [95% confidence interval (CI)] of 0.03 (0.02-0.03) (P < 0.001), 0.012 (0.008-0.017) (P < 0.001) and 0.37 (0.18-0.57) (P < 0.001), respectively. Logistic regression analysis shows that a prediction model based on AMH and AFC, with unstandardized coefficients (95% CI) of 0.148 (0.03-0.25) (P < 0.001) and 0.034 (-0.003-0.07) (P = 0.025), respectively, is a useful patient selection tool to predict the probability to yield at least eight COCs for IVM in patients with PCOS. In this population, patients with at least eight COC available for IVM have a statistically higher number of embryos of good morphological quality (2.9 ± 2.3; 0.9 ± 0.9; P < 0.001) and cumulative ongoing pregnancy rate [30.4% (24 out of 79); 11% (5 out of 45); P = 0.01] when compared with patients with less than eight COC. ROC curve analysis showed that this prediction model has an area under the curve of 0.7864 (95% CI = 0.6997-0.8732) for the prediction of oocyte yield in IVM. The proposed model has been constructed based on a genuine IVM system, i.e. no hCG trigger was given and none of the oocytes matured in vivo. However, other variables, such as needle type, aspiration technique and whether or not hCG-triggering is used, should be considered as confounding factors. The results of this study have to be confirmed using a second independent validation sample. The proposed model could be applied to patients with PCOS after confirmation through a further validation study. This study was supported by a research grant by the Institute for the Promotion of Innovation by Science and Technology in Flanders, Project number IWT 070719.
Togashi, K; Lin, C Y
2008-07-01
The objective of this study was to compare 6 selection criteria in terms of 3-parity total milk yield and 9 selection criteria in terms of total net merit (H) comprising 3-parity total milk yield and total lactation persistency. The 6 selection criteria compared were as follows: first-parity milk estimated breeding value (EBV; M1), first 2-parity milk EBV (M2), first 3-parity milk EBV (M3), first-parity eigen index (EI(1)), first 2-parity eigen index (EI(2)), and first 3-parity eigen index (EI(3)). The 9 selection criteria compared in terms of H were M1, M2, M3, EI(1), EI(2), EI(3), and first-parity, first 2-parity, and first 3-parity selection indices (I(1), I(2), and I(3), respectively). In terms of total milk yield, selection on M3 or EI(3) achieved the greatest genetic response, whereas selection on EI(1) produced the largest genetic progress per day. In terms of total net merit, selection on I(3) brought the largest response, whereas selection on EI(1) yielded the greatest genetic progress per day. A multiple-lactation random regression test-day model simultaneously yields the EBV of the 3 lactations for all animals included in the analysis even though the younger animals do not have the opportunity to complete the first 3 lactations. It is important to use the first 3 lactation EBV for selection decisions rather than only the first lactation EBV, even though the first-parity selection criteria achieved faster genetic progress per day than the 3-parity selection criteria. Under a multiple-lactation random regression animal model analysis, the use of the first 3 lactation EBV for selection decisions does not prolong the generation interval as compared with the use of only the first lactation EBV. Thus, it is justified to compare genetic response on a lifetime basis rather than on a per-day basis. The results suggest the use of M3 or EI(3) for genetic improvement of total milk yield and the use of I(3) for genetic improvement of total net merit H. 
Although this study deals with selection for 3-parity milk production, the same principle applies to selection for lifetime milk production.
Quantum algorithm for linear regression
NASA Astrophysics Data System (ADS)
Wang, Guoming
2017-07-01
We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Unlike previous algorithms, which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in classical form. So by running it once, one completely determines the fitted model and can then use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log2(N), d, κ, 1/ɛ), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ɛ is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.
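For orientation, the classical problem that the quantum algorithm addresses can be sketched as follows. This is a plain classical baseline that solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination; it is not the quantum algorithm from the abstract, and the function names and toy data are assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b] without mutating the inputs.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def least_squares(X, y):
    """Return beta minimizing ||X beta - y||^2 via the normal equations."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve_linear(XtX, Xty)

# Toy design matrix (intercept column plus one feature); y = 1 + 2 * x exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(X, y)
```

Here N = 4 and d = 2. The classical cost grows with N, whereas the abstract's stated running time depends on N only through log2(N).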
Du, Ziyan; He, Yingsheng; Fan, Jianing; Fu, Heyun; Zheng, Shourong; Xu, Zhaoyi; Qu, Xiaolei; Kong, Ao; Zhu, Dongqiang
2018-03-01
Dissolved black carbon (DBC) is ubiquitous in aquatic systems, being an important subgroup of the dissolved organic matter (DOM) pool. Nevertheless, its aquatic photoactivity remains largely unknown. In this study, a range of spectroscopic indices of DBC and humic substance (HS) samples were determined using UV-Vis spectroscopy, fluorescence spectroscopy, and proton nuclear magnetic resonance. DBC can be readily differentiated from HS using spectroscopic indices. It has lower average molecular weight, but higher aromaticity and lignin content. The apparent singlet oxygen quantum yield (Φ singlet oxygen) of DBC under simulated sunlight varies from 3.46% to 6.13%, significantly higher than HS, 1.26%-3.57%, suggesting that DBC is the more photoactive component in the DOM pool. Despite drastically different formation processes and structural properties, the Φ singlet oxygen of DBC and HS can be well predicted by the same simple linear regression models using optical indices including spectral slope coefficient (S275-295) and absorbance ratio (E2/E3), which are proxies for the abundance of singlet oxygen sensitizers and for the significance of intramolecular charge transfer interactions. The regression models can be potentially used to assess the photoactivity of DOM at large scales with in situ water spectrophotometry or satellite remote sensing. Copyright © 2017 Elsevier Ltd. All rights reserved.
Embedded measures of performance validity using verbal fluency tests in a clinical sample.
Sugarman, Michael A; Axelrod, Bradley N
2015-01-01
The objective of this study was to determine to what extent verbal fluency measures can be used as performance validity indicators during neuropsychological evaluation. Participants were clinically referred for neuropsychological evaluation in an urban-based Veterans Affairs hospital. Participants were placed into 2 groups based on their objectively evaluated effort on performance validity tests (PVTs). Individuals who exhibited credible performance (n = 431) failed 0 PVTs, and those with poor effort (n = 192) failed 2 or more PVTs. All participants completed the Controlled Oral Word Association Test (COWAT) and Animals verbal fluency measures. We evaluated how well verbal fluency scores could discriminate between the 2 groups. Raw scores and T scores for Animals discriminated between the credible performance and poor-effort groups with 90% specificity and greater than 40% sensitivity. COWAT scores had lower sensitivity for detecting poor effort. A combination of FAS and Animals scores into logistic regression models yielded acceptable group classification, with 90% specificity and greater than 44% sensitivity. Verbal fluency measures can yield adequate detection of poor effort during neuropsychological evaluation. We provide suggested cut points and logistic regression models for predicting the probability of poor effort in our clinical setting and offer suggested cutoff scores to optimize sensitivity and specificity.
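How a cut point trades sensitivity against specificity can be sketched as follows. The scores and the cutoff of 10 are hypothetical values chosen for illustration; they are not the study's data or its suggested cut points, and `sens_spec` is an illustrative name.

```python
def sens_spec(credible_scores, poor_effort_scores, cutoff):
    """Flag any score at or below the cutoff as suspect.

    Sensitivity: share of the poor-effort group that is flagged.
    Specificity: share of the credible group that is not flagged.
    """
    true_pos = sum(s <= cutoff for s in poor_effort_scores)
    true_neg = sum(s > cutoff for s in credible_scores)
    return true_pos / len(poor_effort_scores), true_neg / len(credible_scores)

# Hypothetical fluency raw scores (not taken from the study).
credible = [18, 22, 15, 20, 25, 17, 19, 21, 9, 23]
poor_effort = [8, 12, 16, 7, 14]
sensitivity, specificity = sens_spec(credible, poor_effort, cutoff=10)
```

Raising the cutoff flags more poor-effort cases (higher sensitivity) but also more credible performers (lower specificity), which is why validity indicators of this kind are often tuned to hold specificity near 90%.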
Murad, Havi; Kipnis, Victor; Freedman, Laurence S
2016-10-01
Assessing interactions in linear regression models when covariates have measurement error (ME) is complex. We previously described regression calibration (RC) methods that yield consistent estimators and standard errors for interaction coefficients of normally distributed covariates having classical ME. Here we extend normal based RC (NBRC) and linear RC (LRC) methods to a non-classical ME model, and describe more efficient versions that combine estimates from the main study and internal sub-study. We apply these methods to data from the Observing Protein and Energy Nutrition (OPEN) study. Using simulations we show that (i) for normally distributed covariates efficient NBRC and LRC were nearly unbiased and performed well with sub-study size ≥200; (ii) efficient NBRC had lower MSE than efficient LRC; (iii) the naïve test for a single interaction had type I error probability close to the nominal significance level, whereas efficient NBRC and LRC were slightly anti-conservative but more powerful; (iv) for markedly non-normal covariates, efficient LRC yielded less biased estimators with smaller variance than efficient NBRC. Our simulations suggest that it is preferable to use: (i) efficient NBRC for estimating and testing interaction effects of normally distributed covariates and (ii) efficient LRC for estimating and testing interactions for markedly non-normal covariates. © The Author(s) 2013.
Robertson, Dale M.; Schwarz, Gregory E.; Saad, David A.; Alexander, Richard B.
2009-01-01
Excessive loads of nutrients transported by tributary rivers have been linked to hypoxia in the Gulf of Mexico. Management efforts to reduce the hypoxic zone in the Gulf of Mexico and improve the water quality of rivers and streams could benefit from targeting nutrient reductions toward watersheds with the highest nutrient yields delivered to sensitive downstream waters. One challenge is that most conventional watershed modeling approaches (e.g., mechanistic models) used in these management decisions do not consider uncertainties in the predictions of nutrient yields and their downstream delivery. The increasing use of parameter estimation procedures to statistically estimate model coefficients, however, allows uncertainties in these predictions to be reliably estimated. Here, we use a robust bootstrapping procedure applied to the results of a previous application of the hybrid statistical/mechanistic watershed model SPARROW (Spatially Referenced Regression On Watershed attributes) to develop a statistically reliable method for identifying “high priority” areas for management, based on a probabilistic ranking of delivered nutrient yields from watersheds throughout a basin. The method is designed to be used by managers to prioritize watersheds where additional stream monitoring and evaluations of nutrient-reduction strategies could be undertaken. Our ranking procedure incorporates information on the confidence intervals of model predictions and the corresponding watershed rankings of the delivered nutrient yields. From this quantified uncertainty, we estimate the probability that individual watersheds are among a collection of watersheds that have the highest delivered nutrient yields. We illustrate the application of the procedure to 818 eight-digit Hydrologic Unit Code watersheds in the Mississippi/Atchafalaya River basin by identifying 150 watersheds having the highest delivered nutrient yields to the Gulf of Mexico. 
Highest delivered yields were from watersheds in the Central Mississippi, Ohio, and Lower Mississippi River basins. With 90% confidence, only a few watersheds can be reliably placed into the highest 150 category; however, many more watersheds can be removed from consideration as not belonging to the highest 150 category. Results from this ranking procedure provide robust information on watershed nutrient yields that can benefit management efforts to reduce nutrient loadings to downstream coastal waters, such as the Gulf of Mexico, or to local receiving streams and reservoirs.
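The idea of a probabilistic ranking can be sketched as below. This is a simplified illustration, not the SPARROW bootstrap: it assumes each watershed's delivered yield is summarized by a mean and a standard deviation and draws independent normal samples, whereas the study resampled model coefficients. All names and numbers are hypothetical.

```python
import random

def top_k_probability(means, sds, k, n_draws=2000, seed=0):
    """Estimate, for each watershed, the probability of ranking in the top k.

    Each draw samples one plausible yield per watershed, ranks the draws,
    and counts membership in the k highest.
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_draws):
        draw = [rng.gauss(m, s) for m, s in zip(means, sds)]
        ranked = sorted(range(len(draw)), key=lambda i: draw[i], reverse=True)
        for i in ranked[:k]:
            counts[i] += 1
    return [c / n_draws for c in counts]

# Hypothetical predicted yields and uncertainties for five watersheds.
means = [10.0, 9.5, 9.0, 2.0, 1.0]
sds = [0.1, 2.0, 2.0, 0.1, 0.1]
probs = top_k_probability(means, sds, k=2)
```

In this toy setup the two low-yield watersheds are almost never in the top two and can be ruled out confidently, while uncertainty in the second and third watersheds keeps any one of them from being placed there with high confidence. This mirrors the abstract's observation that exclusion is easier than inclusion.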
He, Jin-Zhe; Shao, Ping; Liu, Jian-Hua; Ru, Qiao-Mei
2012-01-01
Supercritical carbon dioxide (SC-CO2) extraction of flavonoids from pomelo (Citrus grandis (L.) Osbeck) peel and their antioxidant activity were investigated. Box-Behnken design combined with response surface methodology was employed to maximize the extraction yield of flavonoids. Correlation analysis of the mathematical-regression model indicated that a quadratic polynomial model could be used to optimize the SC-CO2 extraction of flavonoids. The optimal conditions for obtaining the highest extraction yield of flavonoids from pomelo peel were a temperature of 80 °C, a pressure of 39 MPa and a static extraction time of 49 min in the presence of 85% ethanol as modifier. Under these conditions, the experimental yield was 2.37%, which agreed well with the value predicted by the model. Furthermore, flavonoids obtained by SC-CO2 extraction showed a higher scavenging activity on hydroxyl, 1,1-diphenyl-2-picrylhydrazyl (DPPH) and 2,2′-azino-bis(3-ethylbenzthiazoline-6-sulphonic acid) (ABTS) radicals than those obtained by conventional solvent extraction (CSE). Therefore, SC-CO2 extraction can be considered a suitable technique for obtaining flavonoids from pomelo peel. PMID:23202938
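The way a quadratic response surface locates an optimum can be sketched for a single factor at three coded levels (-1, 0, +1). This is a deliberate simplification: a Box-Behnken design fits several factors with interaction terms, and the yields and temperature levels below are hypothetical, not the study's measurements.

```python
def fit_three_level_quadratic(y_low, y_center, y_high):
    """Fit y = b0 + b1*x + b2*x**2 through coded levels x = -1, 0, +1."""
    b0 = y_center
    b1 = (y_high - y_low) / 2.0
    b2 = (y_high + y_low) / 2.0 - y_center
    return b0, b1, b2

def stationary_point(b1, b2):
    """Coded x where dy/dx = 0; it is a maximum when b2 < 0."""
    if b2 == 0:
        raise ValueError("no curvature: the model has no stationary point")
    return -b1 / (2.0 * b2)

def decode(x_coded, center, half_range):
    """Convert a coded level back to natural units."""
    return center + x_coded * half_range

# Hypothetical yields (%) at 40, 60 and 80 degrees C.
b0, b1, b2 = fit_three_level_quadratic(1.0, 2.0, 1.8)
x_star = stationary_point(b1, b2)
t_opt = decode(x_star, center=60.0, half_range=20.0)
y_opt = b0 + b1 * x_star + b2 * x_star ** 2
```

With these made-up numbers the fitted curvature is negative, so the stationary point is a maximum, at roughly 66.7 °C, inside the tested range.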
A hierarchical spatial model for well yield in complex aquifers
NASA Astrophysics Data System (ADS)
Montgomery, J.; O'sullivan, F.
2017-12-01
Efficiently siting and managing groundwater wells requires reliable estimates of the amount of water that can be produced, or the well yield. This can be challenging to predict in highly complex, heterogeneous fractured aquifers due to the uncertainty around local hydraulic properties. Promising statistical approaches have been advanced in recent years. For instance, kriging and multivariate regression analysis have been applied to well test data with limited but encouraging levels of prediction accuracy. Additionally, some analytical solutions to diffusion in homogeneous porous media have been used to infer "effective" properties consistent with observed flow rates or drawdown. However, this is an under-specified inverse problem with substantial and irreducible uncertainty. We describe a flexible machine learning approach capable of combining diverse datasets with constraining physical and geostatistical models for improved well yield prediction accuracy and uncertainty quantification. Our approach can be implemented within a hierarchical Bayesian framework using Markov Chain Monte Carlo, which allows for additional sources of information to be incorporated in priors to further constrain and improve predictions and reduce the model order. We demonstrate the usefulness of this approach using data from over 7,000 wells in a fractured bedrock aquifer.
Pakrokh Ghavi, Peyman
2015-04-01
Response surface methodology (RSM) with a central composite rotatable design (CCRD) based on five levels was employed to model and optimize four experimental operating conditions, namely extraction temperature (10-90 °C), extraction time (6-30 h), particle size (6-24 mm) and water to solid (W/S, 10-50) ratio, for obtaining polysaccharides from Althaea officinalis roots with high yield and antioxidant activity. For each response, a second-order polynomial model with high R² values (> 0.966) was developed using multiple linear regression analysis. Results showed that the most significant (P < 0.05) extraction conditions affecting the yield and antioxidant activity of extracted polysaccharides were the main effect of extraction temperature and the interaction effect of the particle size and W/S ratio. The optimum conditions to maximize yield (10.80%) and antioxidant activity (84.09%) for polysaccharide extraction from A. officinalis roots were an extraction temperature of 60.90 °C, an extraction time of 12.01 h, a particle size of 12.0 mm and a W/S ratio of 40.0. The experimental values were found to be in agreement with those predicted, indicating the models' suitability for optimizing the polysaccharide extraction conditions. Copyright © 2015 Elsevier B.V. All rights reserved.
The Role of Climate Covariability on Crop Yields in the Conterminous United States
Leng, Guoyong; Zhang, Xuesong; Huang, Maoyi; ...
2016-09-12
The covariability of temperature (T), precipitation (P) and radiation (R) is an important aspect in understanding the climate influence on crop yields. Here, we analyze county-level corn and soybean yields and observed climate for the period 1983–2012 to understand how growing-season (June, July and August) mean T, P and R influence crop yields jointly and in isolation across the CONterminous United States (CONUS). Results show that nationally averaged corn and soybean yields exhibit large interannual variability of 21% and 22%, of which 35% and 32% can be significantly explained by T and P, respectively. By including R, an additional 5% of the variability can be explained for both crops. Using partial regression analyses, we find that studies that ignore the covariability among T, P, and R can substantially overestimate the sensitivity of crop yields to a single climate factor at the county scale. Further analyses indicate large spatial variation in the relative contributions of different climate variables to the variability of historical corn and soybean yields. Finally, the structure of the dominant climate factors did not change substantially over 1983–2012, confirming the robustness of the findings, which have important implications for crop yield prediction and crop model validations.
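Why ignoring covariability can overstate a single-factor sensitivity can be sketched with two correlated predictors. The data are synthetic: yield anomalies are built as exactly -1·T + 2·P with T and P negatively correlated. The function names are illustrative, and the closed-form two-predictor solution stands in for the partial regression analyses used in the paper.

```python
def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def simple_slope(x, y):
    """Slope of y on x alone, ignoring every other predictor."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def partial_slopes(x1, x2, y):
    """Slopes of y on x1 and x2 jointly (two-predictor least squares)."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic anomalies: hot years tend to be dry years.
T = [-2.0, -1.0, 0.0, 1.0, 2.0]
P = [1.0, 1.0, 0.0, -1.0, -1.0]
yield_anom = [-1.0 * t + 2.0 * p for t, p in zip(T, P)]

naive_T = simple_slope(T, yield_anom)
partial_T, partial_P = partial_slopes(T, P, yield_anom)
```

The single-factor slope on T (-2.2) is more than twice the partial slope (-1.0) because it absorbs part of the precipitation effect that co-occurs with temperature.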
NASA Astrophysics Data System (ADS)
Al-Shomrany, Adel
The study aims to evaluate various remote sensing drought indices to identify those best suited for monitoring agricultural drought. The objectives are (1) to assess the impact of drought on corn and soybean production using crop mapping information and GIS technology; (2) to use Geographically Weighted Regression (GWR) to evaluate the spatial relationships between precipitation and irrigated and non-irrigated corn and soybean yield, using a Nebraska county-level case study; (3) to assess agricultural drought indices derived from remote sensing (NDVI, NMDI, NDWI, and NDII6); (4) to develop an optimal approach for agricultural drought detection based on remote sensing measurements, by determining the relationship between US county-level yields and commonly collected variables. Extreme drought leads to low corn and soybean production where irrigation systems are not implemented: the lack of soil moisture leads to dry land and poor crop yields. When precipitation and moisture are available across all states, corn and soybean production flourishes. For Kansas, Nebraska, and South Dakota, irrigation management methods support strong crop yields across the SPI monthly averages. The data gathered on irrigation relied on drought indices obtained from the National Agricultural Statistics Service website. For SPI periods ranging from one month to nine months, Kansas and Nebraska performed the best of the 12 states in the Midwestern primary Corn and Soybean Belt. The results for Kansas and Nebraska were attributed to a more efficient and sustainable irrigation system, which South Dakota lacked. South Dakota showed strong correlations throughout all SPI periods for corn only. Kansas showed its strongest correlations for the two-month and three-month averages, for both corn and soybean. 
Regression of irrigated and non-irrigated maize (corn) and soybean yields on precipitation shows yields as a function of precipitation. The OLS regression model showed a general correlation between observed yields and long-term mean precipitation totals, with 84% and 63% of the variability in mean yield explained by the mean annual precipitation for the non-irrigated crops. The GWR technique predicted maize (corn) and soybean yields significantly better than OLS. For instance, June, July, and August precipitation had greater impacts on maize (corn) yields than on soybean yields under non-irrigated conditions, as a result of the greater sensitivity of maize (corn) to water stress. SPI can be computed at various time scales, enabling it to show initial warning signs of drought conditions and the accompanying severity levels. SPI values calculated for a given location reflect the precipitation records acquired there during the period concerned. Over the 3-, 6-, and 9-month periods, NDII6 performed the best of all the MODIS indices in monitoring vegetation moisture and detecting drought. The 9-month SPI provides an indication of inter-seasonal precipitation patterns over a medium time scale. A new approach is to average corn and soybean yields for all counties of the study area and compare them with average anomalies of the MODIS indices for the growing season, May through September, from 2006 to 2012. There was a strong correlation between average corn yields and MODIS NDII6 averages for these years, with R² equaling 0.62. This suggests that NDII6 is the best indicator of drought conditions and vegetation moisture. 
There was a weak correlation (R² = 0.16) between average soybean yields and average precipitation. Irrigation and management systems, technological improvements from hybrids, producer management techniques, and other management practices have an impact on crop yields. (Abstract shortened by ProQuest.)
NASA Astrophysics Data System (ADS)
Alam, Md Jahangir; Goodall, Jonathan L.
2012-04-01
The goal of this research was to quantify the relative impact of hydrologic and nitrogen source changes on incremental nitrogen yield in the contiguous United States. Using nitrogen source estimates from various federal data bases, remotely sensed land use data from the National Land Cover Data program, and observed instream loadings from the United States Geological Survey National Stream Quality Accounting Network program, we calibrated and applied the spatially referenced regression model SPARROW to estimate incremental nitrogen yield for the contiguous United States. We ran different model scenarios to separate the effects of changes in source contributions from hydrologic changes for the years 1992 and 2001, assuming that only state conditions changed and that model coefficients describing the stream water-quality response to changes in state conditions remained constant between 1992 and 2001. Model results show a decrease of 8.2% in the median incremental nitrogen yield over the period of analysis with the vast majority of this decrease due to changes in hydrologic conditions rather than decreases in nitrogen sources. For example, when we changed the 1992 version of the model to have nitrogen source data from 2001, the model results showed only a small increase in median incremental nitrogen yield (0.12%). However, when we changed the 1992 version of the model to have hydrologic conditions from 2001, model results showed a decrease of approximately 8.7% in median incremental nitrogen yield. We did, however, find notable differences in incremental yield estimates for different sources of nitrogen after controlling for hydrologic changes, particularly for population related sources. For example, the median incremental yield for population related sources increased by 8.4% after controlling for hydrologic changes. This is in contrast to a 2.8% decrease in population related sources when hydrologic changes are included in the analysis. 
Likewise we found that median incremental yield from urban watersheds increased by 6.8% after controlling for hydrologic changes—in contrast to the median incremental nitrogen yield from cropland watersheds, which decreased by 2.1% over the same time period. These results suggest that, after accounting for hydrologic changes, population related sources became a more significant contributor of nitrogen yield to streams in the contiguous United States over the period of analysis. However, this study was not able to account for the influence of human management practices such as improvements in wastewater treatment plants or Best Management Practices that likely improved water quality, due to a lack of data for quantifying the impact of these practices for the study area.
Garriga, Miguel; Romero-Bravo, Sebastián; Estrada, Félix; Escobar, Alejandro; Matus, Iván A.; del Pozo, Alejandro; Astudillo, Cesar A.; Lobos, Gustavo A.
2017-01-01
Phenotyping, via remote and proximal sensing techniques, of the agronomic and physiological traits associated with yield potential and drought adaptation could contribute to improvements in breeding programs. In the present study, 384 genotypes of wheat (Triticum aestivum L.) were tested under fully irrigated (FI) and water stress (WS) conditions. The following traits were evaluated and assessed via spectral reflectance: Grain yield (GY), spikes per square meter (SM2), kernels per spike (KPS), thousand-kernel weight (TKW), chlorophyll content (SPAD), stem water soluble carbohydrate concentration and content (WSC and WSCC, respectively), carbon isotope discrimination (Δ13C), and leaf area index (LAI). The performances of spectral reflectance indices (SRIs), four regression algorithms (PCR, PLSR, ridge regression RR, and SVR), and three classification methods (PCA-LDA, PLS-DA, and kNN) were evaluated for the prediction of each trait. For the classification approaches, two classes were established for each trait: The lower 80% of the trait variability range (Class 1) and the remaining 20% (Class 2 or elite genotypes). Both the SRIs and regression methods performed better when data from FI and WS were combined. The traits that were best estimated by SRIs and regression methods were GY and Δ13C. For most traits and conditions, the estimations provided by RR and SVR were the same, or better than, those provided by the SRIs. PLS-DA showed the best performance among the categorical methods and, unlike the SRI and regression models, most traits were relatively well-classified within a specific hydric condition (FI or WS), proving that classification approach is an effective tool to be explored in future studies related to genotype selection. PMID:28337210
Ebshish, Ali; Yaakob, Zahira; Taufiq-Yap, Yun Hin; Bshish, Ahmed
2014-03-19
In this work, a response surface methodology (RSM) was implemented to investigate the process variables in a hydrogen production system. The effects of five independent variables, namely the temperature (X₁), the flow rate (X₂), the catalyst weight (X₃), the catalyst loading (X₄) and the glycerol-water molar ratio (X₅), on the H₂ yield (Y₁) and the conversion of glycerol to gaseous products (Y₂) were explored. Using multiple regression analysis, the experimental results of the H₂ yield and the glycerol conversion to gases were fit to quadratic polynomial models. The proposed mathematical models correlated the dependent factors well within the limits that were examined. The best values of the process variables were a temperature of approximately 600 °C, a feed flow rate of 0.05 mL/min, a catalyst weight of 0.2 g, a catalyst loading of 20% and a glycerol-water molar ratio of approximately 12, where the H₂ yield was predicted to be 57.6% and the conversion of glycerol was predicted to be 75%. To validate the proposed models, statistical analysis using a two-sample t-test was performed, and the results showed that the models could predict the responses satisfactorily within the limits of the variables that were studied.
Probabilistic Forecasting of Surface Ozone with a Novel Statistical Approach
NASA Technical Reports Server (NTRS)
Balashov, Nikolay V.; Thompson, Anne M.; Young, George S.
2017-01-01
The recent change in the Environmental Protection Agency's surface ozone regulation, lowering the surface ozone daily maximum 8-h average (MDA8) exceedance threshold from 75 to 70 ppbv, poses significant challenges to U.S. air quality (AQ) forecasters responsible for ozone MDA8 forecasts. The forecasters, supplied by only a few AQ model products, end up relying heavily on self-developed tools. To help U.S. AQ forecasters, this study explores a surface ozone MDA8 forecasting tool that is based solely on statistical methods and standard meteorological variables from the numerical weather prediction (NWP) models. The model combines the self-organizing map (SOM), which is a clustering technique, with a stepwise weighted quadratic regression using meteorological variables as predictors for ozone MDA8. The SOM method identifies different weather regimes, to distinguish between various modes of ozone variability, and groups them according to similarity. In this way, when a regression is developed for a specific regime, data from the other regimes are also used, with weights that are based on their similarity to this specific regime. This approach, regression in SOM (REGiS), yields a distinct model for each regime taking into account both the training cases for that regime and other similar training cases. To produce probabilistic MDA8 ozone forecasts, REGiS weighs and combines all of the developed regression models on the basis of the weather patterns predicted by an NWP model. REGiS is evaluated over the San Joaquin Valley in California and the northeastern plains of Colorado. The results suggest that the model performs best when trained and adjusted separately for an individual AQ station and its corresponding meteorological site.
Evaluation of weather-based rice yield models in India
NASA Astrophysics Data System (ADS)
Sudharsan, D.; Adinarayana, J.; Reddy, D. Raji; Sreenivas, G.; Ninomiya, S.; Hirafuji, M.; Kiura, T.; Tanaka, K.; Desai, U. B.; Merchant, S. N.
2013-01-01
The objective of this study was to compare two different rice simulation models—standalone (Decision Support System for Agrotechnology Transfer [DSSAT]) and web based (SImulation Model for RIce-Weather relations [SIMRIW])—with agrometeorological data and agronomic parameters for estimation of rice crop production in southern semi-arid tropics of India. Studies were carried out on the BPT5204 rice variety to evaluate two crop simulation models. Long-term experiments were conducted in a research farm of Acharya N G Ranga Agricultural University (ANGRAU), Hyderabad, India. Initially, the results were obtained using 4 years (1994-1997) of data with weather parameters from a local weather station to evaluate DSSAT simulated results with observed values. Linear regression models used for the purpose showed a close relationship between DSSAT and observed yield. Subsequently, yield comparisons were also carried out with SIMRIW and DSSAT, and validated with actual observed values. Realizing the correlation coefficient values of SIMRIW simulation values in acceptable limits, further rice experiments in monsoon (Kharif) and post-monsoon (Rabi) agricultural seasons (2009, 2010 and 2011) were carried out with a location-specific distributed sensor network system. These proximal systems help to simulate dry weight, leaf area index and potential yield by the Java based SIMRIW on a daily/weekly/monthly/seasonal basis. These dynamic parameters are useful to the farming community for necessary decision making in a ubiquitous manner. However, SIMRIW requires fine tuning for better results/decision making.
NASA Astrophysics Data System (ADS)
Berger, Lukas; Kleinheinz, Konstantin; Attili, Antonio; Bisetti, Fabrizio; Pitsch, Heinz; Mueller, Michael E.
2018-05-01
Modelling unclosed terms in partial differential equations typically involves two steps: First, a set of known quantities needs to be specified as input parameters for a model, and second, a specific functional form needs to be defined to model the unclosed terms by the input parameters. Both steps involve a certain modelling error, with the former known as the irreducible error and the latter referred to as the functional error. Typically, only the total modelling error, which is the sum of functional and irreducible error, is assessed, but the concept of the optimal estimator enables the separate analysis of the total and the irreducible errors, yielding a systematic modelling error decomposition. In this work, attention is paid to the techniques themselves required for the practical computation of irreducible errors. Typically, histograms are used for optimal estimator analyses, but this technique is found to add a non-negligible spurious contribution to the irreducible error if models with multiple input parameters are assessed. Thus, the error decomposition of an optimal estimator analysis becomes inaccurate, and misleading conclusions concerning modelling errors may be drawn. In this work, numerically accurate techniques for optimal estimator analyses are identified and a suitable evaluation of irreducible errors is presented. Four different computational techniques are considered: a histogram technique, artificial neural networks, multivariate adaptive regression splines, and an additive model based on a kernel method. For multiple input parameter models, only artificial neural networks and multivariate adaptive regression splines are found to yield satisfactorily accurate results. Beyond a certain number of input parameters, the assessment of models in an optimal estimator analysis even becomes practically infeasible if histograms are used. 
The optimal estimator analysis in this paper is applied to modelling the filtered soot intermittency in large eddy simulations using a dataset of a direct numerical simulation of a non-premixed sooting turbulent flame.
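The histogram technique mentioned in this abstract can be sketched as follows: the conditional mean of the target given the input is approximated by the average within each input bin, and the mean squared deviation from that bin average estimates the irreducible error. This is a one-input illustration only, with hypothetical function and variable names; the paper's point is that the approach degrades for several inputs, which the sketch does not attempt to reproduce.

```python
from collections import defaultdict


def irreducible_error_histogram(inputs, targets, n_bins):
    """Estimate E[(q - E[q | x])^2] using equal-width bins over the input x."""
    lo, hi = min(inputs), max(inputs)
    width = (hi - lo) / n_bins or 1.0  # fall back to one bin width if all inputs are equal

    def bin_index(x):
        # Clamp so that x == hi lands in the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, q in zip(inputs, targets):
        b = bin_index(x)
        sums[b] += q
        counts[b] += 1

    # Bin average approximates the optimal estimator E[q | x].
    cond_mean = {b: sums[b] / counts[b] for b in sums}
    squared = [(q - cond_mean[bin_index(x)]) ** 2 for x, q in zip(inputs, targets)]
    return sum(squared) / len(squared)
```

When bins hold very few samples, the bin average follows the samples too closely and the estimate becomes unreliable, which is one reason the number of bins matters as the number of inputs grows.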
Optimization of grapevine yield by applying mathematical models to obtain quality wine products
NASA Astrophysics Data System (ADS)
Alina, Dobrei; Alin, Dobrei; Eleonora, Nistor; Teodor, Cristea; Marius, Boldea; Florin, Sala
2016-06-01
The relationship between crop load and grape yield and quality is a dynamic process, specific to wine cultivars and to fresh-consumption varieties. Modeling these relations is important for the improvement of technological works. This study evaluated the interrelationship of crop load (B - buds number) and several production parameters (Y - yield; S - sugar; A - acidity; GaI - glucoacidimetric index; AP - alcoholic potential; F - flavorings; WA - wine alcohol; SR - sugar residue in the Muscat Ottonel wine cultivar, and Y - yield; S - sugar; A - acidity; GaI - glucoacidimetric index; CP - commercial production; BS - berries size in the Victoria table grape cultivar). In both varieties, correlations were identified between the independent variable (B - buds number as a result of pruning and training practices) and the quality parameters analyzed (r = -0.699 for B vs Y; r = 0.961 for B vs S; r = -0.959 for B vs AP; r = 0.743 for Y vs S, p < 0.01, in the Muscat Ottonel cultivar; and r = -0.907 for B vs Y; r = -0.975 for B vs CP; r = -0.971 for B vs BS; r = 0.990 for CP vs BS in the Victoria cultivar). Regression analysis yielded statistically significant models that describe the variation in production and quality parameters in relation to the independent variable (B - buds number).
Guillaume, Bryan; Wang, Changqing; Poh, Joann; Shen, Mo Jun; Ong, Mei Lyn; Tan, Pei Fang; Karnani, Neerja; Meaney, Michael; Qiu, Anqi
2018-06-01
Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
EPA Office of Water (OW): 2002 SPARROW Total NP (Catchments)
SPARROW (SPAtially Referenced Regressions On Watershed attributes) is a watershed modeling tool with output that allows the user to interpret water quality monitoring data at the regional and sub-regional scale. The model relates in-stream water-quality measurements to spatially referenced characteristics of watersheds, including pollutant sources and environmental factors that affect rates of pollutant delivery to streams from the land and aquatic, in-stream processing. The core of the model consists of a nonlinear regression equation describing the non-conservative transport of contaminants from point and non-point (or "diffuse") sources on land to rivers and through the stream and river network. SPARROW estimates contaminant concentrations, loads (or "mass," which is the product of concentration and streamflow), and yields in streams (mass of nitrogen and of phosphorus entering a stream per acre of land). It empirically estimates the origin and fate of contaminants in streams and receiving bodies, and quantifies uncertainties in model predictions. The model predictions are illustrated through detailed maps that provide information about contaminant loadings and source contributions at multiple scales for specific stream reaches, basins, or other geographic areas.
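The definitions of load and yield given in this record can be illustrated with a short calculation. The sketch below is not SPARROW code; the function names, the choice of units (mg/L, m³/s, kg/day, acres), and the numeric values are assumptions made only to show the arithmetic.

```python
def daily_load_kg(concentration_mg_per_l, streamflow_m3_per_s):
    """Load as concentration times streamflow, converted to kg/day.

    1 mg/L equals 1 g/m^3, so concentration * flow is in g/s;
    multiplying by 86400 s/day and dividing by 1000 g/kg gives kg/day.
    """
    return concentration_mg_per_l * streamflow_m3_per_s * 86400.0 / 1000.0


def yield_kg_per_acre(load_kg, area_acres):
    """Yield as mass delivered to the stream per acre of contributing land."""
    return load_kg / area_acres


# Hypothetical values: 2 mg/L of nitrogen at 10 m^3/s from an 864-acre catchment.
load = daily_load_kg(2.0, 10.0)
per_acre = yield_kg_per_acre(load, 864.0)
```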
Imaging genetics approach to predict progression of Parkinson's diseases.
Mansu Kim; Seong-Jin Son; Hyunjin Park
2017-07-01
Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e. MDS-UPDRS). Our model yielded high correlation (r = 0.697, p < 0.001) and low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging (from 123I-Ioflupane SPECT) predictors of regression model were computed from independent component analysis approach. Genetic features were computed using image genetics approach based on identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus have the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
NASA Astrophysics Data System (ADS)
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
2017-06-01
The GWOLR model combines geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields a difficult-to-solve system of nonlinear equations, and therefore a numerical approximation approach is required. The iterative approximation approach, in general, uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix consists of second derivatives evaluated at each iteration, so it does not always produce converging results. To address this, the NR method is modified by replacing its Hessian matrix with the Fisher information matrix, an approach termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimation using the Fisher scoring method and to apply the estimation to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of the sufferers, the IR category of DHF in both villages can be determined.
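The difference between the two iterations named in this abstract can be written in generic notation. The symbols below are standard textbook notation and are not taken from the paper: θ is the parameter vector, g the gradient (score) of the log-likelihood, H its Hessian, and 𝓘 the Fisher information. The geographic weighting of the GWOLR likelihood is not shown.

```latex
\begin{aligned}
\text{NR:}\quad \theta^{(t+1)} &= \theta^{(t)} - \left[H\!\left(\theta^{(t)}\right)\right]^{-1} g\!\left(\theta^{(t)}\right),\\
\text{FS:}\quad \theta^{(t+1)} &= \theta^{(t)} + \left[\mathcal{I}\!\left(\theta^{(t)}\right)\right]^{-1} g\!\left(\theta^{(t)}\right),
\qquad \mathcal{I}(\theta) = -\,\mathbb{E}\!\left[H(\theta)\right].
\end{aligned}
```

Because the Fisher information is an expected value, it is positive semi-definite, whereas the observed Hessian can be indefinite far from the optimum; this is the usual motivation for preferring Fisher scoring when Newton-Raphson steps are unstable.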
Fadzillah, Nurrulhidayah Ahmad; Man, Yaakob bin Che; Rohman, Abdul; Rosman, Arieff Salleh; Ismail, Amin; Mustafa, Shuhaimi; Khatib, Alfi
2015-01-01
Authenticating food products against the presence of components, such as lard, that are not permitted by certain religions is very important. In this study, we used proton nuclear magnetic resonance (¹H-NMR) spectroscopy for the analysis of butter adulterated with lard by simultaneous quantification of all proton-bearing compounds, and consequently all relevant sample classes. Since the spectra obtained were too complex to be analyzed visually, classification of the spectra was carried out. The multivariate calibration of partial least squares (PLS) regression was used for modelling the relationship between the actual value of lard and the predicted value. The model yielded the highest regression coefficient (R²) of 0.998, the lowest root mean square error of calibration (RMSEC) of 0.0091%, and a root mean square error of prediction (RMSEP) of 0.0090. Cross-validation testing evaluates the predictive power of the model. The PLS model was shown to be a good model, as the intercepts of R²Y and Q²Y were 0.0853 and -0.309, respectively.
Models of subjective response to in-flight motion data
NASA Technical Reports Server (NTRS)
Rudrapatna, A. N.; Jacobson, I. D.
1973-01-01
Mathematical relationships between subjective comfort and environmental variables in an air transportation system are investigated. As a first step in model building, only the motion variables are incorporated and sensitivities are obtained using stepwise multiple regression analysis. The data for these models have been collected from commercial passenger flights. Two models are considered. In the first, subjective comfort is assumed to depend on rms values of the six-degrees-of-freedom accelerations. The second assumes a Rustenburg type human response function in obtaining frequency weighted rms accelerations, which are used in a linear model. The form of the human response function is examined and the results yield a human response weighting function for different degrees of freedom.
A non-destructive selection criterion for fibre content in jute : II. Regression approach.
Arunachalam, V; Iyer, R D
1974-01-01
An experiment with ten populations of jute, comprising varieties and mutants of the two species Corchorus olitorius and C. capsularis, was conducted at two different locations with the object of evolving an effective criterion for selecting superior single plants for fibre yield. At Delhi, variation existed only between varieties as a group and mutants as a group, while at Pusa variation also existed among the mutant populations of C. capsularis. A multiple regression approach was used to find the optimum combination of characters for prediction of fibre yield. A process of successive elimination of characters based on the coefficient of determination provided by individual regression equations was employed to arrive at the optimal set of characters for predicting fibre yield. It was found that plant height, basal and mid-diameters and basal and mid-dry fibre weights would provide such an optimal set.
Kamphuis, C; Frank, E; Burke, J K; Verkerk, G A; Jago, J G
2013-01-01
The hypothesis was that sensors currently available on farm that monitor behavioral and physiological characteristics have potential for the detection of lameness in dairy cows. This was tested by applying additive logistic regression to variables derived from sensor data. Data were collected between November 2010 and June 2012 on 5 commercial pasture-based dairy farms. Sensor data from weigh scales (liveweight), pedometers (activity), and milk meters (milking order, unadjusted and adjusted milk yield in the first 2 min of milking, total milk yield, and milking duration) were collected at every milking from 4,904 cows. Lameness events were recorded by farmers who were trained in detecting lameness before the study commenced. A total of 318 lameness events affecting 292 cows were available for statistical analyses. For each lameness event, the lame cow's sensor data for a time period of 14 d before observation date were randomly matched by farm and date to 10 healthy cows (i.e., cows that were not lame and had no other health event recorded for the matched time period). Sensor data relating to the 14-d time periods were used for developing univariable (using one source of sensor data) and multivariable (using multiple sources of sensor data) models. Model development involved the use of additive logistic regression by applying the LogitBoost algorithm with a regression tree as base learner. The model's output was a probability estimate for lameness, given the sensor data collected during the 14-d time period. Models were validated using leave-one-farm-out cross-validation and, as a result of this validation, each cow in the data set (318 lame and 3,180 nonlame cows) received a probability estimate for lameness. Based on the area under the curve (AUC), results indicated that univariable models had low predictive potential, with the highest AUC values found for liveweight (AUC=0.66), activity (AUC=0.60), and milking order (AUC=0.65). 
Combining these 3 sensors improved AUC to 0.74. Detection performance of this combined model varied between farms but it consistently and significantly outperformed univariable models across farms at a fixed specificity of 80%. Still, detection performance was not high enough to be implemented in practice on large, pasture-based dairy farms. Future research may improve performance by developing variables based on sensor data of liveweight, activity, and milking order, but that better describe changes in sensor data patterns when cows go lame. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
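The leave-one-farm-out cross-validation described in this abstract can be sketched as a simple grouped split: each fold holds out all records from one farm and trains on the remaining farms. The record layout (dictionaries with a `farm` key) and the function name are hypothetical, and the sketch covers only the splitting step, not the LogitBoost model itself.

```python
def leave_one_farm_out(records):
    """Yield (held_out_farm, train, test), holding out one farm per fold."""
    farms = sorted({r["farm"] for r in records})
    for farm in farms:
        train = [r for r in records if r["farm"] != farm]
        test = [r for r in records if r["farm"] == farm]
        yield farm, train, test


# Hypothetical records: one row per cow, labelled with farm and lameness status.
records = [
    {"farm": "A", "cow": 1, "lame": 0},
    {"farm": "A", "cow": 2, "lame": 1},
    {"farm": "B", "cow": 3, "lame": 0},
    {"farm": "C", "cow": 4, "lame": 0},
    {"farm": "C", "cow": 5, "lame": 1},
]
folds = list(leave_one_farm_out(records))
```

Splitting by farm rather than by cow means every prediction is made for a farm the model has not seen, which gives a stricter estimate of how the model would transfer to a new farm.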
Estimating climate change, CO2 and technology development effects on wheat yield in northeast Iran
NASA Astrophysics Data System (ADS)
Bannayan, M.; Mansoori, H.; Rezaei, E. Eyshi
2014-04-01
Wheat is the main food for the majority of Iran's population. Precise estimation of wheat yield change in future is essential for any possible revision of management strategies. The main objective of this study was to evaluate the effects of climate change, CO2 concentration, technology development and their integrated effects on wheat production under future climate change. This study was performed under two scenarios of the IPCC Special Report on Emission Scenarios (SRES): regional economic (A2) and global environmental (B1). Crop production was projected for three future time periods (2020, 2050 and 2080) in comparison with a baseline year (2005) for Khorasan province located in the northeast of Iran. Four study locations in the study area included Mashhad, Birjand, Bojnourd and Sabzevar. The effect of technology development was calculated by fitting a regression equation between the observed wheat yields against historical years considering yield potential increase and yield gap reduction as technology development. Yield relative increase per unit change of CO2 concentration (1 ppm-1) was considered 0.05 % and was used to implement the effect of elevated CO2. The HadCM3 general circulation model along with the CSM-CERES-Wheat crop model were used to project climate change effects on wheat crop yield. Our results illustrate that, among all the factors considered, technology development provided the highest impact on wheat yield change. Highest wheat yield increase across all locations and time periods was obtained under the A2 scenario. Among study locations, Mashhad showed the highest change in wheat yield. Yield change compared to baseline ranged from -28 % to 56 % when the integration of all factors was considered across all locations. It seems that achieving higher yield of wheat in future may be expected in northeast Iran assuming stable improvements in production technology.
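The CO2 adjustment stated in this abstract (a relative yield increase of 0.05 % per 1 ppm) can be illustrated with a short calculation. The sketch assumes the increase is applied linearly to the difference from a baseline concentration; that linear form, the function name, and all numeric inputs are assumptions for illustration, not details confirmed by the abstract.

```python
def co2_adjusted_yield(base_yield, co2_ppm, baseline_co2_ppm, pct_per_ppm=0.05):
    """Scale a yield by a fixed percentage gain per ppm of CO2 above the baseline."""
    relative_gain = pct_per_ppm / 100.0 * (co2_ppm - baseline_co2_ppm)
    return base_yield * (1.0 + relative_gain)


# Hypothetical example: 2000 kg/ha at a 380 ppm baseline, evaluated at 480 ppm.
adjusted = co2_adjusted_yield(2000.0, 480.0, 380.0)
```

A 100 ppm increase at 0.05 % per ppm corresponds to a 5 % relative gain under this linear assumption.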
Estimating climate change, CO2 and technology development effects on wheat yield in northeast Iran.
Bannayan, M; Mansoori, H; Rezaei, E Eyshi
2014-04-01
Wheat is the main food for the majority of Iran's population. Precise estimation of wheat yield change in future is essential for any possible revision of management strategies. The main objective of this study was to evaluate the effects of climate change, CO2 concentration, technology development and their integrated effects on wheat production under future climate change. This study was performed under two scenarios of the IPCC Special Report on Emission Scenarios (SRES): regional economic (A2) and global environmental (B1). Crop production was projected for three future time periods (2020, 2050 and 2080) in comparison with a baseline year (2005) for Khorasan province located in the northeast of Iran. Four study locations in the study area included Mashhad, Birjand, Bojnourd and Sabzevar. The effect of technology development was calculated by fitting a regression equation between the observed wheat yields against historical years considering yield potential increase and yield gap reduction as technology development. Yield relative increase per unit change of CO2 concentration (1 ppm(-1)) was considered 0.05 % and was used to implement the effect of elevated CO2. The HadCM3 general circulation model along with the CSM-CERES-Wheat crop model were used to project climate change effects on wheat crop yield. Our results illustrate that, among all the factors considered, technology development provided the highest impact on wheat yield change. Highest wheat yield increase across all locations and time periods was obtained under the A2 scenario. Among study locations, Mashhad showed the highest change in wheat yield. Yield change compared to baseline ranged from -28 % to 56 % when the integration of all factors was considered across all locations. It seems that achieving higher yield of wheat in future may be expected in northeast Iran assuming stable improvements in production technology.
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. 
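The statement that the odds of malignancy decreased by 48% per 1 SD increase in a feature follows from the usual reading of a logistic regression coefficient: the odds ratio is the exponential of the coefficient. The sketch below shows that conversion; the function name is hypothetical, and the coefficient is back-calculated from the reported 48% figure rather than taken from the paper.

```python
import math


def percent_change_in_odds(beta, delta=1.0):
    """Percent change in the odds for a `delta`-unit increase in a predictor."""
    return (math.exp(beta * delta) - 1.0) * 100.0


# A 48% decrease in odds corresponds to an odds ratio of 0.52,
# i.e. a coefficient of ln(0.52) per standardized unit.
beta_per_sd = math.log(0.52)
change = percent_change_in_odds(beta_per_sd)
```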
The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08
Casey, Martin F.; Neidell, Matthew
2013-01-01
Background Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods And Findings We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
Peeters, R; Galesloot, P J B
2002-03-01
The objective of this study was to estimate the daily fat yield and fat percentage from one sampled milking per cow per test day in an automatic milking system herd, when the milking times and milk yields of all individual milkings are recorded by the automatic milking system. Multiple regression models were used to estimate the 24-h fat percentage when only one milking is sampled for components and milk yields and milking times are known for all milkings in the 24-h period before the sampled milking. In total, 10,697 cow test day records, from 595 herd tests at 91 Dutch herds milked with an automatic milking system, were used. The best model to predict 24-h fat percentage included fat percentage, protein percentage, milk yield and milking interval of the sampled milking, milk yield, and milking interval of the preceding milking, and the interaction between milking interval and the ratio of fat and protein percentage of the sampled milking. This model gave a standard deviation of the prediction error (SE) for 24-h fat percentage of 0.321 and a correlation between the predicted and actual 24-h fat percentage of 0.910. For the 24-h fat yield, we found SE = 90 g and correlation = 0.967. This precision is slightly better than that of present a.m.-p.m. testing schemes. Extra attention must be paid to correctly matching the sample jars and the milkings. Furthermore, milkings with an interval of less than 4 h must be excluded from sampling as well as milkings that are interrupted or that follow an interrupted milking. Under these restrictions (correct matching, interval of at least 4 h, and no interrupted milking), one sampled milking suffices to get a satisfactory estimate for the test-day fat yield.
Multi-fidelity Gaussian process regression for prediction of random fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parussini, L.; Venturi, D., E-mail: venturi@ucsc.edu; Perdikaris, P.
We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
Methods for estimating drought streamflow probabilities for Virginia streams
Austin, Samuel H.
2014-01-01
Maximum likelihood logistic regression model equations used to estimate drought flow probabilities for Virginia streams are presented for 259 hydrologic basins in Virginia. Winter streamflows were used to estimate the likelihood of streamflows during the subsequent drought-prone summer months. The maximum likelihood logistic regression models identify probable streamflows from 5 to 8 months in advance. More than 5 million streamflow daily values collected over the period of record (January 1, 1900 through May 16, 2012) were compiled and analyzed over a minimum 10-year (maximum 112-year) period of record. The analysis yielded the 46,704 equations with statistically significant fit statistics and parameter ranges published in two tables in this report. These model equations produce summer month (July, August, and September) drought flow threshold probabilities as a function of streamflows during the previous winter months (November, December, January, and February). Example calculations are provided, demonstrating how to use the equations to estimate probable streamflows as much as 8 months in advance.
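As a rough illustration of how a logistic regression equation of the kind described above maps a winter streamflow to a summer drought-flow probability, the sketch below evaluates the standard logistic function. The function name, the single untransformed predictor, and the coefficient values are all hypothetical; the fitted equations and parameter ranges are those tabulated in the report itself.

```python
import math


def drought_probability(winter_flow, intercept, slope):
    """Standard logistic model: P = 1 / (1 + exp(-(intercept + slope * flow))).

    `intercept` and `slope` stand in for fitted coefficients; the values used
    below are made up for illustration and are not taken from the report.
    """
    z = intercept + slope * winter_flow
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: a negative slope means that higher winter flow
# lowers the estimated probability of a summer drought flow.
p_low_winter_flow = drought_probability(20.0, intercept=3.0, slope=-0.05)
p_high_winter_flow = drought_probability(120.0, intercept=3.0, slope=-0.05)
```

With a negative slope the probability falls monotonically as winter flow rises, which is the qualitative behavior one would expect from such a model.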
Caccamo, M; Ferguson, J D; Veerkamp, R F; Schadt, I; Petriglieri, R; Azzaro, G; Pozzebon, A; Licitra, G
2014-01-01
As part of a larger project aiming to develop management evaluation tools based on results from test-day (TD) models, the objective of this study was to examine the effect of physical composition of total mixed rations (TMR) tested quarterly from March 2006 through December 2008 on milk, fat, and protein yield curves for 25 herds in Ragusa, Sicily. A random regression sire-maternal grandsire model was used to estimate variance components for milk, fat, and protein yields fitted on a full data set, including 241,153 TD records from 9,809 animals in 42 herds recorded from 1995 through 2008. The model included parity, age at calving, year at calving, and stage of pregnancy as fixed effects. Random effects were herd × test date, sire and maternal grandsire additive genetic effect, and permanent environmental effect modeled using third-order Legendre polynomials. Model fitting was carried out using ASREML. Afterward, for the 25 herds involved in the study, 9 particle size classes were defined based on the proportions of TMR particles on the top (19-mm) and middle (8-mm) screen of the Penn State Particle Separator. Subsequently, the model with estimated variance components was used to examine the influence of TMR particle size class on milk, fat, and protein yield curves. An interaction was included with the particle size class and days in milk. The effect of the TMR particle size class was modeled using a ninth-order Legendre polynomial. Lactation curves were predicted from the model while controlling for TMR chemical composition (crude protein content of 15.5%, neutral detergent fiber of 40.7%, and starch of 19.7% for all classes), to have pure estimates of particle distribution not confounded by nutrient content of TMR. We found little effect of class of particle proportions on milk yield and fat yield curves. Protein yield was greater for sieve classes with 10.4 to 17.4% of TMR particles retained on the top (19-mm) sieve. 
Optimal distributions different from those recommended may reflect regional differences based on climate and types and quality of forages fed. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Smith, James A.
1992-01-01
The inversion of the leaf area index (LAI) canopy parameter from optical spectral reflectance measurements is obtained using a backpropagation artificial neural network trained using input-output pairs generated by a multiple scattering reflectance model. The problem of LAI estimation over sparse canopies (LAI < 1.0) with varying soil reflectance backgrounds is particularly difficult. Standard multiple regression methods applied to canopies within a single homogeneous soil type yield good results but perform unacceptably when applied across soil boundaries, resulting in absolute percentage errors of >1000 percent for low LAI. Minimization methods applied to merit functions constructed from differences between measured reflectances and predicted reflectances using multiple-scattering models are unacceptably sensitive to a good initial guess for the desired parameter. In contrast, the neural network reported generally yields absolute percentage errors of <30 percent when weighting coefficients trained on one soil type were applied to predicted canopy reflectance at a different soil background.
Garrido-Varo, Ana; Sánchez, María-Teresa; De la Haba, María-José; Torres, Irina; Pérez-Marín, Dolores
2017-01-01
Near-Infrared (NIR) Spectroscopy was used for the non-destructive assessment of physico-chemical quality parameters in olive oil. At the same time, the influence of the sample presentation mode (spinning versus static cup) was evaluated using two spectrophotometers with similar optical characteristics. A total of 478 olive oil samples were used to develop calibration models, testing various spectral signal pre-treatments. The models obtained by applying MPLS regression to spectroscopic data yielded promising results for olive oil quality measurements, particularly for acidity, the peroxide index and alkyl and ethyl ester content. The results obtained indicate that this non-invasive technology can be used successfully by the olive oil sector to categorize olive oils, to detect potential fraud and to provide consumers with more reliable information. Although both sample presentation modes yielded comparable results, equations constructed with samples scanned using the spinning mode provided greater predictive capacity. PMID:29144417
Modeling the relationships between quality and biochemical composition of fatty liver in mule ducks.
Theron, L; Cullere, M; Bouillier-Oudot, M; Manse, H; Dalle Zotte, A; Molette, C; Fernandez, X; Vitezica, Z G
2012-09-01
The fatty liver of mule ducks (i.e., French "foie gras") is the most valuable product in duck production systems. Its quality is measured by the technological yield, which is the opposite of the fat loss during cooking. The purpose of this study was to determine whether biochemical measures of fatty liver could be used to accurately predict the technological yield (TY). Ninety-one male mule ducks were bred, overfed, and slaughtered under commercial conditions. Fatty liver weight (FLW) and biochemical variables, such as DM, lipid (LIP), and protein content (PROT), were collected. To evaluate evidence for nonlinear fat loss during cooking, we compared regression models describing linear and nonlinear relations between biochemical measures and TY. We detected significantly greater (P = 0.02) linear relation between DM and TY. Our results indicate that LIP and PROT follow a different pattern (linear) than DM and showed that LIP and PROT are nonexclusive contributing factors to TY. Other components, such as carbohydrates, other than those measured in this study, could contribute to DM. Stepwise regression for TY was performed. The traditional model with FLW was tested. The results showed that the weight of the liver is of limited value in the determination of fat loss during cooking (R² = 0.14). The most accurate TY prediction equation included DM (in linear and quadratic terms), FLW, and PROT (R² = 0.43). Biochemical measures in the fatty liver were more accurate predictors of TY than FLW. The model is useful in commercial conditions because DM, PROT, and FLW are noninvasive measures.
Hong, Huachang; Qian, Lingya; Xiong, Yujing; Xiao, Zhuoqun; Lin, Hongjun; Yu, Haiying
2015-01-01
The deterioration of water quality, especially organic pollution in Tai Lake and the Qiantang River, has recently received attention in China. The objectives of this study were to evaluate the formation of halonitromethanes (HNMs) using multiple regression models for chlorination and chloramination and to identify the key factors that influence the formation of HNMs in Tai Lake and the Qiantang River. The results showed that the total formation of HNMs (T-HNMs) during chlorination and chloramination could be described using the following models: (1) [Formula: see text] $= 10^{5.267}(\mathrm{DON})^{6.645}(\mathrm{Br}^{-})^{0.737}(\mathrm{DOC})^{-5.537}(\mathrm{Cl_2})^{0.333}(t)^{0.165}$ (R² = 0.974, p < 0.01, n = 33), and (2) $\text{T-HNM}_{\mathrm{NH_2Cl}} = 10^{-2.481}(\mathrm{Cl_2})^{0.451}(\mathrm{NO_2^{-}})^{0.382}(\mathrm{Br}^{-})^{0.630}(t)^{0.640}(\mathrm{Temp})^{0.581}$ (R² = 0.961, p < 0.05, n = 33), respectively. The key factors that influenced the T-HNM yields during chlorination were dissolved organic nitrogen (DON), bromide and dissolved organic carbon (DOC). The nitrite and bromide concentrations and the reaction time mainly affected the T-HNM yields during chloramination. Additional analysis indicated that the bromine incorporation factors (BIFs) for trihalogenated HNMs generally decreased as the chlorine/chloramine dose, temperature and reaction time decreased and increased as the bromide concentration increased. Copyright © 2014 Elsevier Ltd. All rights reserved.
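The two regression models reported above are power laws, so each predictor enters multiplicatively with its own exponent. A minimal sketch of evaluating the chloramination model (2) follows; the exponents and the leading coefficient are taken from the abstract, while the function names, the dictionary keys, and the absence of any unit handling are assumptions (the abstract does not state the units of the inputs or the output).

```python
def power_law(log10_coeff, exponents, values):
    """Evaluate 10**log10_coeff * product(values[name] ** exponent)."""
    result = 10.0 ** log10_coeff
    for name, exponent in exponents.items():
        result *= values[name] ** exponent
    return result


# Coefficients of model (2) as printed in the abstract; key names are
# illustrative labels chosen for this sketch.
CHLORAMINATION_LOG10_COEFF = -2.481
CHLORAMINATION_EXPONENTS = {
    "Cl2": 0.451,   # chloramine dose
    "NO2": 0.382,   # nitrite
    "Br": 0.630,    # bromide
    "t": 0.640,     # reaction time
    "Temp": 0.581,  # temperature
}


def t_hnm_chloramination(values):
    """Total HNM formation during chloramination under model (2)."""
    return power_law(CHLORAMINATION_LOG10_COEFF, CHLORAMINATION_EXPONENTS, values)


# Unit inputs isolate the leading coefficient; doubling the reaction time
# scales the prediction by 2 ** 0.640.
baseline_inputs = {"Cl2": 1.0, "NO2": 1.0, "Br": 1.0, "t": 1.0, "Temp": 1.0}
doubled_time_inputs = dict(baseline_inputs, t=2.0)
```

Because the model is log-linear, each exponent can be read as the percentage change in T-HNM per one percent change in the corresponding factor, holding the others fixed.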
Estimates of spatial and temporal variation of energy crops biomass yields in the US
NASA Astrophysics Data System (ADS)
Song, Y.; Jain, A. K.; Landuyt, W.; Kheshgi, H. S.
2013-12-01
Perennial grasses, such as switchgrass (Panicum virgatum) and Miscanthus (Miscanthus x giganteus), have been identified for potential use as biomass feedstocks in the US. Current research on perennial grass biomass production has been conducted on small-scale plots. However, the extent to which this potential can be realized at a landscape scale will depend on the biophysical potential to grow these grasses with the minimum possible amount of land diverted from food to fuel production. To assess this potential, three questions about the biomass yield of these grasses need to be answered: (1) how the yields of different grasses vary spatially and temporally across the US; (2) whether the yields are temporally stable; and (3) how the spatial and temporal trends in yields of these perennial grasses are controlled by limiting factors, including soil type, water availability, climate, and crop varieties. To answer these questions, the growth processes of the perennial grasses are implemented in a coupled biophysical, physiological and biogeochemical model (ISAM). The model has been applied to quantitatively investigate the spatial and temporal trends in biomass yields over the period 1980-2010 in the US. The bioenergy grasses considered in this study include Miscanthus, Cave-in-Rock switchgrass and Alamo switchgrass. The effects of climate, soil and topography on the spatial and temporal trends of biomass yields are quantitatively analyzed using principal component analysis and GIS-based geographically weighted regression. The spatial and temporal trend results are evaluated further to classify each part of the US into four homogeneous potential yield zones: high and stable yield zone (HS), high but unstable yield zone (HU), low and stable yield zone (LS) and low but unstable yield zone (LU).
Our preliminary results indicate that the yields of perennial grasses in the different zones are strongly related to different controlling factors. For example, the yield in the HS zone depends on soil and topography factors, whereas the yields in the HU zone are controlled more by climate factors, leading to a large uncertainty in the yield potential of bioenergy grasses under future climate change.
[Effects of Chemical Fertilizers and Organic Fertilizer on Yield of Ligusticum chuanxiong Rhizome].
Liang, Qin; Chen, Xing-fu; Li, Yan; Zhang, Jun; Meng, Jie; Peng, Shi-ming
2015-10-01
This study examined the effects of different rates of N, P, K and organic fertilizer (OF) on the yield of Ligusticum chuanxiong rhizome, in order to provide a theoretical foundation for establishing standardized cultivation techniques. The field plot experiments used Ligusticum chuanxiong planted in Pengshan as material and followed a four-factor, five-level quadratic regression rotation-orthogonal combination design. From the data obtained, a function model was established that could accurately predict the yield of Ligusticum chuanxiong rhizome from fertilization. The model analysis showed that rhizome yield was significantly influenced by the N, P, K and OF applications. Among these factors, the order of yield increase rates was K > OF > N > P. The effects of the interactions between N and K, N and OF, and K and OF on rhizome yield were significant. High levels of N and P, N and OF, and K and OF were conducive to improving rhizome yield. The optimal fertilizer application rates were 148.20-172.28 kg/hm² for N, 511.92-599.40 kg/hm² for P, 249.70-282.37 kg/hm² for K, and 940.00-1,104.00 kg/hm² for OF. N, P, K and OF clearly affect the yield of Ligusticum chuanxiong rhizome, and K and OF can significantly increase it. It is therefore suggested that a suitably high amount of K and OF, together with an appropriate increase in N, favors the cultivation of Ligusticum chuanxiong.
NASA Astrophysics Data System (ADS)
Pleijel, H.; Danielsson, H.; Emberson, L.; Ashmore, M. R.; Mills, G.
Applications of a parameterised Jarvis-type multiplicative stomatal conductance model with data collated from open-top chamber experiments on field-grown wheat and potato were used to derive relationships between relative yield and stomatal ozone uptake. The relationships were based on thirteen experiments from four European countries for wheat and seven experiments from four European countries for potato. The parameterisation of the conductance model was based both on an extensive literature review and primary data. Application of the stomatal conductance models to the open-top chamber experiments resulted in improved linear regressions between relative yield and ozone uptake compared to earlier stomatal conductance models, both for wheat (r² = 0.83) and potato (r² = 0.76). The improvement was largest for potato. The relationships with the highest correlation were obtained using a stomatal ozone flux threshold. For both wheat and potato the best performing exposure index was AFst6 (accumulated stomatal flux of ozone above a flux rate threshold of 6 nmol ozone m⁻² projected sunlit leaf area, based on hourly values of ozone flux). The results demonstrate that flux-based models are now sufficiently well calibrated to be used with confidence to predict the effects of ozone on yield loss of major arable crops across Europe. Further studies, using innovations in stomatal conductance modelling and plant exposure experimentation, are needed if these models are to be further improved.
Third molar development: measurements versus scores as age predictor.
Thevissen, P W; Fieuws, S; Willems, G
2011-10-01
Human third molar development is widely used to predict the chronological age of subadult individuals with unknown or doubted age. For these predictions, classically, the radiologically observed third molar growth and maturation is registered using a staging and related scoring technique. Measures of lengths and widths of the developing wisdom tooth and its adjacent second molar can be considered as an alternative registration. The aim of this study was to verify relations between mandibular third molar developmental stages or measurements of mandibular second and third molars and age. The age-related performance of stages and measurements was compared to assess whether measurements added information to age predictions from third molar formation stage. The sample was 340 orthopantomograms (170 females, 170 males) of individuals homogeneously distributed in age between 7 and 24 years. Mandibular right third and second molars were staged following Gleiser and Hunt, length and width measurements were registered, and various ratios of these measurements were calculated. Univariable regression models with age as response and third molar stage, measurements and ratios of second and third molars as predictors were considered. Multivariable regression models assessed whether measurements or ratios added information to age prediction from third molar stage. Coefficients of determination (R²) and root mean squared errors (RMSE) obtained from all regression models were compared. The univariable regression model using stages as predictor yielded the most accurate age predictions (males: R² 0.85, RMSE between 0.85 and 1.22 years; females: R² 0.77, RMSE between 1.19 and 2.11 years) compared to all models including measurements and ratios. The multivariable regression models indicated that measurements and ratios added no clinically relevant information to the age prediction from third molar stage.
Ratios and measurements of second and third molars are less accurate age predictors than stages of developing third molars. Copyright © 2011 Elsevier Ltd. All rights reserved.
Shackelford, S D; Wheeler, T L; Koohmaraie, M
2003-01-01
The present experiment was conducted to evaluate the ability of the U.S. Meat Animal Research Center's beef carcass image analysis system to predict calculated yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score under commercial beef processing conditions. In two commercial beef-processing facilities, image analysis was conducted on 800 carcasses on the beef-grading chain immediately after the conventional USDA beef quality and yield grades were applied. Carcasses were blocked by plant and observed calculated yield grade. The carcasses were then separated, with 400 carcasses assigned to a calibration data set that was used to develop regression equations, and the remaining 400 carcasses assigned to a prediction data set used to validate the regression equations. Prediction equations, which included image analysis variables and hot carcass weight, accounted for 90, 88, 90, 88, and 76% of the variation in calculated yield grade, longissimus muscle area, preliminary yield grade, adjusted preliminary yield grade, and marbling score, respectively, in the prediction data set. In comparison, the official USDA yield grade as applied by online graders accounted for 73% of the variation in calculated yield grade. The technology described herein could be used by the beef industry to more accurately determine beef yield grades; however, this system does not provide an accurate enough prediction of marbling score to be used without USDA grader interaction for USDA quality grading.
Chan, King-Pan; Chan, Kwok-Hung; Wong, Wilfred Hing-Sang; Peiris, J. S. Malik; Wong, Chit-Ming
2011-01-01
Background: Reliable estimates of disease burden associated with respiratory viruses are key to the deployment of preventive strategies such as vaccination and resource allocation. Such estimates are particularly needed in tropical and subtropical regions where some methods commonly used in temperate regions are not applicable. While a number of alternative approaches to assess the influenza associated disease burden have been recently reported, none of these models have been validated with virologically confirmed data. Even fewer methods have been developed for other common respiratory viruses such as respiratory syncytial virus (RSV), parainfluenza and adenovirus. Methods and Findings: We had recently conducted a prospective population-based study of virologically confirmed hospitalization for acute respiratory illnesses in persons <18 years residing in Hong Kong Island. Here we used this dataset to validate two commonly used models for estimation of influenza disease burden, namely the rate difference model and Poisson regression model, and also explored the applicability of these models to estimate the disease burden of other respiratory viruses. The Poisson regression models with different link functions all yielded estimates well correlated with the virologically confirmed influenza associated hospitalization, especially in children older than two years. The disease burden estimates for RSV, parainfluenza and adenovirus were less reliable with wide confidence intervals. The rate difference model was not applicable to RSV, parainfluenza and adenovirus and grossly underestimated the true burden of influenza associated hospitalization. Conclusion: The Poisson regression model generally produced satisfactory estimates in calculating the disease burden of respiratory viruses in a subtropical region such as Hong Kong. PMID:21412433
Portable visible and near-infrared spectrophotometer for triglyceride measurements.
Kobayashi, Takanori; Kato, Yukiko Hakariya; Tsukamoto, Megumi; Ikuta, Kazuyoshi; Sakudo, Akikazu
2009-01-01
An affordable and portable machine is required for the practical use of visible and near-infrared (Vis-NIR) spectroscopy. A portable fruit tester comprising a Vis-NIR spectrophotometer was modified for use in the transmittance mode and employed to quantify triglyceride levels in serum in combination with a chemometric analysis. Transmittance spectra collected in the 600- to 1100-nm region were subjected to a partial least-squares regression analysis and leave-out cross-validation to develop a chemometrics model for predicting triglyceride concentrations in serum. The model yielded a coefficient of determination in cross-validation (R2VAL) of 0.7831 with a standard error of cross-validation (SECV) of 43.68 mg/dl. The detection limit of the model was 148.79 mg/dl. Furthermore, masked samples predicted by the model yielded a coefficient of determination in prediction (R2PRED) of 0.6856 with a standard error of prediction (SEP) and detection limit of 61.54 and 159.38 mg/dl, respectively. The portable Vis-NIR spectrophotometer may prove convenient for the measurement of triglyceride concentrations in serum, although before practical use there remain obstacles, which are discussed.
He, Yong; Hou, Lingling; Wang, Hong; Hu, Kelin; McConkey, Brian
2014-07-30
Soil surface texture is an important environmental factor that influences crop productivity because of its direct effect on soil water and complex interactions with other environmental factors. Using 30-year data, an agricultural system model (DSSAT-CERES-Wheat) was calibrated and validated. After validation, the modelled yield and water use (WU) of spring wheat (Triticum aestivum L.) from two soil textures (silt loam and clay) under rain-fed conditions were analyzed. Regression analysis showed that wheat grown in silt loam soil is more sensitive to WU than wheat grown in clay soil, indicating that wheat grown in clay soil has higher drought tolerance than that grown in silt loam. Yield variation can be explained by WU rather than by precipitation use (PU). These results demonstrated that the DSSAT-CERES-Wheat model can be used to evaluate the WU of different soil textures and assess the feasibility of wheat production under various conditions. These outcomes can improve our understanding of the long-term effect of soil texture on spring wheat productivity under rain-fed conditions.
Estimation of crown closure from AVIRIS data using regression analysis
NASA Technical Reports Server (NTRS)
Staenz, K.; Williams, D. J.; Truchon, M.; Fritz, R.
1993-01-01
Crown closure is one of the input parameters used for forest growth and yield modelling. Preliminary work by Staenz et al. indicates that imaging spectrometer data acquired with sensors such as the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) have some potential for estimating crown closure on a stand level. The objectives of this paper are: (1) to establish a relationship between AVIRIS data and the crown closure derived from aerial photography of a forested test site within the Interior Douglas Fir biogeoclimatic zone in British Columbia, Canada; (2) to investigate the impact of atmospheric effects and the forest background on the correlation between AVIRIS data and crown closure estimates; and (3) to improve this relationship using multiple regression analysis.
Screening for ketosis using multiple logistic regression based on milk yield and composition.
Kayano, Mitsunori; Kataoka, Tomoko
2015-11-01
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF - 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
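The two diagnostic rules quoted above are linear scores compared against a cutoff of 10, which makes them easy to sketch in code. The coefficients below are those printed in the abstract; the function names are hypothetical, and reading a score below 10 as "flag for ketosis" is an interpretation of the abstract's inequalities rather than something it states outright.

```python
def multiparous_score(pf_ratio, milk_yield_kg_per_day):
    """Rule (1): 9.978 * P/F ratio + 0.085 * milk yield (kg/day/cow)."""
    return 9.978 * pf_ratio + 0.085 * milk_yield_kg_per_day


def primiparous_score(snf_pct, lactose_pct, mun_mg_per_dl):
    """Rule (2): 2.327 * SNF (%) - 2.703 * lactose (%) + 0.225 * MUN (mg/dl)."""
    return 2.327 * snf_pct - 2.703 * lactose_pct + 0.225 * mun_mg_per_dl


def flag_ketosis(score, cutoff=10.0):
    """Assumed reading of the rule: a score below the cutoff flags the cow."""
    return score < cutoff


# Made-up example records, for illustration only.
multiparous_example = multiparous_score(pf_ratio=0.7, milk_yield_kg_per_day=30.0)
primiparous_example = primiparous_score(snf_pct=8.8, lactose_pct=4.6, mun_mg_per_dl=10.0)
```

Keeping the score and the threshold decision in separate functions makes it straightforward to try other cutoffs when trading sensitivity against specificity.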
Zheljazkov, Valtcho D; Gawde, Archana; Cantrell, Charles L; Astatkie, Tess; Schlegel, Vicki
2015-01-01
A steam distillation extraction kinetics experiment was conducted to estimate essential oil yield, composition, antimalarial, and antioxidant capacity of cumin (Cuminum cyminum L.) seed (fruits). Furthermore, regression models were developed to predict essential oil yield and composition for a given duration of the steam distillation time (DT). Ten DT durations were tested in this study: 5, 7.5, 15, 30, 60, 120, 240, 360, 480, and 600 min. Oil yields increased with an increase in the DT. Maximum oil yield (content, 2.3 g/100 seed), was achieved at 480 min; longer DT did not increase oil yields. The concentrations of the major oil constituents α-pinene (0.14-0.5% concentration range), β-pinene (3.7-10.3% range), γ-cymene (5-7.3% range), γ-terpinene (1.8-7.2% range), cumin aldehyde (50-66% range), α-terpinen-7-al (3.8-16% range), and β-terpinen-7-al (12-20% range) varied as a function of the DT. The concentrations of α-pinene, β-pinene, γ-cymene, γ-terpinene in the oil increased with the increase of the duration of the DT; α-pinene was highest in the oil obtained at 600 min DT, β-pinene and γ-terpinene reached maximum concentrations in the oil at 360 min DT; γ-cymene reached a maximum in the oil at 60 min DT, cumin aldehyde was high in the oils obtained at 5-60 min DT, and low in the oils obtained at 240-600 min DT, α-terpinen-7-al reached maximum in the oils obtained at 480 or 600 min DT, whereas β-terpinen-7-al reached a maximum concentration in the oil at 60 min DT. The yield of individual oil constituents (calculated from the oil yields and the concentration of a given compound at a particular DT) increased and reached a maximum at 480 or 600 min DT. The antimalarial activity of the cumin seed oil obtained during the 0-5 and at 5-7.5 min DT timeframes was twice higher than the antimalarial activity of the oils obtained at the other DT. This study opens the possibility for distinct marketing and utilization for these improved oils. 
The antioxidant capacity of the oil was highest in the oil obtained at 30 min DT and lowest in the oil from 360 min DT. The Michaelis-Menten and the Power nonlinear regression models developed in this study can be utilized to predict essential oil yield and composition of cumin seed at any given duration of DT and may also be useful to compare previous reports on cumin oil yield and composition. DT can be utilized to obtain cumin seed oil with improved antimalarial activity, improved antioxidant capacity, and with various compositions.
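For readers unfamiliar with the two nonlinear model families named above, the sketch below gives their standard textbook forms as functions of distillation time. The abstract does not report the fitted parameters, so the parameter names and the values used in the example calls are placeholders, not estimates from the study.

```python
def michaelis_menten(dt_min, y_max, k):
    """Saturating curve: y = y_max * DT / (k + DT).

    y_max is the asymptotic response and k is the DT at which half of
    y_max is reached.
    """
    return y_max * dt_min / (k + dt_min)


def power_model(dt_min, a, b):
    """Power curve: y = a * DT ** b."""
    return a * dt_min ** b


# Placeholder parameters chosen only to show the shape of each curve.
half_saturation_value = michaelis_menten(dt_min=30.0, y_max=2.0, k=30.0)
power_value = power_model(dt_min=4.0, a=2.0, b=0.5)
```

The saturating form suits quantities that level off at long distillation times, while the power form suits quantities that keep rising or falling over the tested range.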
Sasaki, O; Aihara, M; Nishiura, A; Takeda, H
2017-09-01
Trends in genetic correlations between longevity, milk yield, and somatic cell score (SCS) during lactation in cows are difficult to trace. In this study, changes in the genetic correlations between milk yield, SCS, and cumulative pseudo-survival rate (PSR) during lactation were examined, and the effect of milk yield and SCS information on the reliability of estimated breeding value (EBV) of PSR were determined. Test day milk yield, SCS, and PSR records were obtained for Holstein cows in Japan from 2004 to 2013. A random subset of the data was used for the analysis (825 herds, 205,383 cows). This data set was randomly divided into 5 subsets (162-168 herds, 83,389-95,854 cows), and genetic parameters were estimated in each subset independently. Data were analyzed using multiple-trait random regression animal models including either the residual effect for the whole lactation period (H0), the residual effects for 5 lactation stages (H5), or both of these residual effects (HD). Milk yield heritability increased until 310 to 351 d in milk (DIM) and SCS heritability increased until 330 to 344 DIM. Heritability estimates for PSR increased with DIM from 0.00 to 0.05. The genetic correlation between milk yield and SCS increased negatively to under -0.60 at 455 DIM. The genetic correlation between milk yield and PSR increased until 342 to 355 DIM (0.53-0.57). The genetic correlation between the SCS and PSR was -0.82 to -0.83 at around 180 DIM, and decreased to -0.65 to -0.71 at 455 DIM. The reliability of EBV of PSR for sires with 30 or more recorded daughters was 0.17 to 0.45 when the effects of correlated traits were ignored. The maximum reliability of EBV was observed at 257 (H0) or 322 (HD) DIM. When the correlations of PSR with milk yield and SCS were considered, the reliabilities of PSR estimates increased to 0.31-0.76. The genetic parameter estimates of H5 were the same as those for HD. 
The rank correlation coefficients of the EBV of PSR between H0 and H5 or HD were greater than 0.9. Additionally, the reliabilities of EBV of PSR of H0 were similar to those for H5 and HD. Therefore, the genetic parameter estimates in H0 were not substantially different from those in H5 and HD. When milk yield and SCS, which were genetically correlated with PSR, were used, the reliability of PSR increased. Estimates of the genetic correlations between PSR and milk yield and between PSR and SCS are useful for management and breeding decisions to extend the herd life of cows. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Critical configurations (determinantal loci) for range and range difference satellite networks
NASA Technical Reports Server (NTRS)
Tsimis, E.
1973-01-01
The observational modes of Geometric Satellite Geodesy are discussed. The geometrical analysis of the problem yielded a regression model for the adjustment of the observations along with a suitable and convenient metric for the least-squares criterion. The determinantal loci (critical configurations) for range networks are analyzed. An attempt is made to apply elements of the theory of variants for this purpose. The use of continuously measured range differences for loci determination is proposed.
Mooney, Joshua J; Hedlin, Haley; Mohabir, Paul K; Vazquez, Rodrigo; Nguyen, John; Ha, Richard; Chiu, Peter; Patel, Kapilkumar; Zamora, Martin R.; Weill, David; Nicolls, Mark R; Dhillon, Gundeep S
2016-01-01
While controlled donation after circulatory determination of death (cDCDD) donors could increase the supply of donor lungs within the United States, the yield of lungs from cDCDD donors remains low compared to donation after neurologic determination of death (DNDD) donors. To explore the reason for low lung yield from cDCDD donors, Scientific Registry of Transplant Recipients data were used to assess the impact of donor lung quality on cDCDD lung utilization by fitting a logistic regression model. The relationship between center volume and cDCDD use was assessed, and the distance between center and donor hospital was calculated by cDCDD status. Recipient survival was compared using a multivariable Cox regression model. Lung utilization was 2.1% for cDCDD donors and 21.4% for DNDD donors. Being a cDCDD donor decreased lung donation (adjusted OR 0.101, CI 0.085–0.120). A minority of centers have performed cDCDD transplants, with higher-volume centers generally performing more cDCDD transplants. There was no difference in center-to-donor distance or recipient survival (adjusted HR 1.03, CI 0.78–1.37) between cDCDD and DNDD transplants. cDCDD lungs are underutilized compared to DNDD lungs after adjusting for lung quality. Increasing transplant center expertise and commitment to cDCDD lung procurement is needed to improve utilization. PMID:26844673
Wear, Keith A; Nagaraja, Srinidhi; Dreher, Maureen L; Sadoughi, Saghi; Zhu, Shan; Keaveny, Tony M
2017-10-01
Clinical bone sonometers applied at the calcaneus measure broadband ultrasound attenuation and speed of sound. However, the relation of ultrasound measurements to bone strength is not well-characterized. Addressing this issue, we assessed the extent to which ultrasonic measurements convey in vitro mechanical properties in 25 human calcaneal cancellous bone specimens (approximately 2×4×2cm). Normalized broadband ultrasound attenuation, speed of sound, and broadband ultrasound backscatter were measured with 500kHz transducers. To assess mechanical properties, non-linear finite element analysis, based on micro-computed tomography images (34-micron cubic voxel), was used to estimate apparent elastic modulus, overall specimen stiffness, and apparent yield stress, with models typically having approximately 25-30 million elements. We found that ultrasound parameters were correlated with mechanical properties with R=0.70-0.82 (p<0.001). Multiple regression analysis indicated that ultrasound measurements provide additional information regarding mechanical properties beyond that provided by bone quantity alone (p≤0.05). Adding ultrasound variables to linear regression models based on bone quantity improved adjusted squared correlation coefficients from 0.65 to 0.77 (stiffness), 0.76 to 0.81 (apparent modulus), and 0.67 to 0.73 (yield stress). These results indicate that ultrasound can provide complementary (to bone quantity) information regarding mechanical behavior of cancellous bone. Published by Elsevier Inc.
Spencer, Monique E; Jain, Alka; Matteini, Amy; Beamer, Brock A; Wang, Nae-Yuh; Leng, Sean X; Punjabi, Naresh M; Walston, Jeremy D; Fedarko, Neal S
2010-08-01
Neopterin, a GTP metabolite expressed by macrophages, is a marker of immune activation. We hypothesize that levels of this serum marker alter with donor age, reflecting increased chronic immune activation in normal aging. In addition to age, we assessed gender, race, body mass index (BMI), and percentage of body fat (%fat) as potential covariates. Serum was obtained from 426 healthy participants whose age ranged from 18 to 87 years. Anthropometric measures included %fat and BMI. Neopterin concentrations were measured by competitive ELISA. The paired associations between neopterin and age, BMI, or %fat were analyzed by Spearman's correlation or by linear regression of log-transformed neopterin, whereas overall associations were modeled by multiple regression of log-transformed neopterin as a function of age, gender, race, BMI, %fat, and interaction terms. Across all participants, neopterin exhibited a positive association with age, BMI, and %fat. Multiple regression modeling of neopterin in women and men as a function of age, BMI, and race revealed that each covariate contributed significantly to neopterin values and that optimal modeling required an interaction term between race and BMI. The covariate %fat was highly correlated with BMI and could be substituted for BMI to yield similar regression coefficients. The association of age and gender with neopterin levels and their modification by race, BMI, or %fat reflect the biology underlying chronic immune activation and perhaps gender differences in disease incidence, morbidity, and mortality.
Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.
Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva
2018-02-12
Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses are of both methods. In this article, we focus on the extents to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) in this regard is successful more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.
Wilke, Marko
2018-02-01
This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter) from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1-75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI) were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender) as well as technical (field strength, data quality) predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php.
Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.
Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie
2016-04-01
Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied to aggregated data in the context of chronic lung disease to German sickness funds data (from 1999), covering over 7.3 million insured. To assess the gain in numerical efficiency, the computational time of the innovative approach has been compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, regarding patient-level data, computational time increased disproportionately by sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80 % for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.
Brouckaert, D; Uyttersprot, J-S; Broeckx, W; De Beer, T
2018-03-01
Calibration transfer or standardisation aims at creating a uniform spectral response on different spectroscopic instruments or under varying conditions, without requiring a full recalibration for each situation. In the current study, this strategy is applied to construct at-line multivariate calibration models and consequently employ them in-line in a continuous industrial production line, using the same spectrometer. Firstly, quantitative multivariate models are constructed at-line at laboratory scale for predicting the concentration of two main ingredients in hard surface cleaners. By regressing the Raman spectra of a set of small-scale calibration samples against their reference concentration values, partial least squares (PLS) models are developed to quantify the surfactant levels in the liquid detergent compositions under investigation. After evaluating the models' performance with a set of independent validation samples, a univariate slope/bias correction is applied in view of transporting these at-line calibration models to an in-line manufacturing set-up. This standardisation technique allows a fast and easy transfer of the PLS regression models, by simply correcting the model predictions on the in-line set-up, without adjusting anything in the original multivariate calibration models. An extensive statistical analysis is performed in order to assess the predictive quality of the transferred regression models. Before and after transfer, the R2 and RMSEP of both models are compared to evaluate whether their magnitudes are similar. T-tests are then performed to investigate whether the slope and intercept of the transferred regression line are not statistically different from 1 and 0, respectively. Furthermore, it is inspected whether no significant bias can be noted. F-tests are executed as well, for assessing the linearity of the transfer regression line and for investigating the statistical coincidence of the transfer and validation regression line. 
Finally, a paired t-test is performed to compare the original at-line model to the slope/bias corrected in-line model, using interval hypotheses. It is shown that the calibration models of Surfactant 1 and Surfactant 2 yield satisfactory in-line predictions after slope/bias correction. While Surfactant 1 passes seven out of eight statistical tests, the recommended validation parameters are 100% successful for Surfactant 2. It is hence concluded that the proposed strategy for transferring at-line calibration models to an in-line industrial environment via a univariate slope/bias correction of the predicted values offers a successful standardisation approach. Copyright © 2017 Elsevier B.V. All rights reserved.
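The univariate slope/bias correction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fit_slope_bias` and `apply_correction` and all concentration values are hypothetical, and the correction is assumed to be an ordinary least-squares line fitted between in-line model predictions and reference values.

```python
def fit_slope_bias(predicted, reference):
    """Fit reference ~= slope * predicted + bias by ordinary least squares."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    slope = sxy / sxx
    bias = mean_r - slope * mean_p
    return slope, bias


def apply_correction(predictions, slope, bias):
    """Correct raw model predictions; the calibration model itself is untouched."""
    return [slope * p + bias for p in predictions]


# Hypothetical values: predictions of the at-line PLS model on in-line spectra,
# and the matching reference concentrations of the same samples.
inline_predictions = [9.0, 11.0, 13.0, 15.0]
reference_values = [10.0, 12.5, 15.0, 17.5]

slope, bias = fit_slope_bias(inline_predictions, reference_values)
corrected = apply_correction(inline_predictions, slope, bias)
```

Only the two scalars `slope` and `bias` have to be re-estimated for the in-line set-up, which is why the transfer is fast compared with a full recalibration.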
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflates their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are used. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to demonstrate the use of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value, and the Robust Component Selection (RCS) statistic.
Soil salt content estimation in the Yellow River delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.
2014-01-01
Background: Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results: For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy as compared to the standard approach of calculating diffusion coefficients relying on the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, the reduction of high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared to a reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions: Estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms according to the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463
Rebich, Richard A; Houston, Natalie A; Mize, Scott V; Pearson, Daniel K; Ging, Patricia B; Evan Hornig, C
2011-01-01
SPAtially Referenced Regressions On Watershed attributes (SPARROW) models were developed to estimate nutrient inputs [total nitrogen (TN) and total phosphorus (TP)] to the northwestern part of the Gulf of Mexico from streams in the South-Central United States (U.S.). This area included drainages of the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf hydrologic regions. The models were standardized to reflect nutrient sources and stream conditions during 2002. Model predictions of nutrient loads (mass per time) and yields (mass per area per time) generally were greatest in streams in the eastern part of the region and along reaches near the Texas and Louisiana shoreline. The Mississippi River and Atchafalaya River watersheds, which drain nearly two-thirds of the conterminous U.S., delivered the largest nutrient loads to the Gulf of Mexico, as expected. However, the three largest delivered TN yields were from the Trinity River/Galveston Bay, Calcasieu River, and Aransas River watersheds, while the three largest delivered TP yields were from the Calcasieu River, Mermentau River, and Trinity River/Galveston Bay watersheds. Model output indicated that the three largest sources of nitrogen from the region were atmospheric deposition (42%), commercial fertilizer (20%), and livestock manure (unconfined, 17%). The three largest sources of phosphorus were commercial fertilizer (28%), urban runoff (23%), and livestock manure (confined and unconfined, 23%). PMID:22457582
Ebshish, Ali; Yaakob, Zahira; Taufiq-Yap, Yun Hin; Bshish, Ahmed
2014-01-01
In this work, a response surface methodology (RSM) was implemented to investigate the process variables in a hydrogen production system. The effects of five independent variables, namely the temperature (X1), the flow rate (X2), the catalyst weight (X3), the catalyst loading (X4) and the glycerol-water molar ratio (X5), on the H2 yield (Y1) and the conversion of glycerol to gaseous products (Y2) were explored. Using multiple regression analysis, the experimental results of the H2 yield and the glycerol conversion to gases were fit to quadratic polynomial models. The proposed mathematical models have correlated the dependent factors well within the limits that were being examined. The best values of the process variables were a temperature of approximately 600 °C, a feed flow rate of 0.05 mL/min, a catalyst weight of 0.2 g, a catalyst loading of 20% and a glycerol-water molar ratio of approximately 12, where the H2 yield was predicted to be 57.6% and the conversion of glycerol was predicted to be 75%. To validate the proposed models, statistical analysis using a two-sample t-test was performed, and the results showed that the models could predict the responses satisfactorily within the limits of the variables that were studied. PMID:28788567
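The abstract does not state the fitted equations. As a point of reference only, the generic full second-order (quadratic) response surface model for five factors has the form below; the coefficient symbols are illustrative, and the authors' fitted models may omit some terms.

```latex
Y_k = \beta_0
    + \sum_{i=1}^{5} \beta_i X_i
    + \sum_{i=1}^{5} \beta_{ii} X_i^{2}
    + \sum_{1 \le i < j \le 5} \beta_{ij} X_i X_j
    + \varepsilon ,
\qquad k \in \{1, 2\}
```

With five factors this full form has 1 + 5 + 5 + 10 = 21 coefficients per response: one intercept, five linear terms, five squared terms, and ten pairwise interactions.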
Chen, Wei-Hsin; Hsu, Hung-Jen; Kumar, Gopalakrishnan; Budzianowski, Wojciech M; Ong, Hwai Chyuan
2017-12-01
This study focuses on the biochar formation and torrefaction performance of sugarcane bagasse, which are predicted using bilinear interpolation (BLI), inverse distance weighting (IDW) interpolation, and regression analysis. It is found that biomass torrefied at 275 °C for 60 min or at 300 °C for 30 min or longer is appropriate for producing biochar as an alternative fuel to coal with a low carbon footprint, but the energy yield from torrefaction at 300 °C is too low. Based on the biochar yield, enhancement factor of HHV, and energy yield, the results suggest that the three methods are all feasible for predicting the performance, especially for the enhancement factor. The power parameter of unity in the IDW method provides the best predictions, and the error is below 5%. Second-order regression gives more reasonable predictions than first-order regression and is recommended. Copyright © 2017 Elsevier Ltd. All rights reserved.
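Inverse distance weighting with a power parameter of unity, as mentioned in this abstract, can be sketched as below. This is an illustrative sketch only: the function name `idw`, the grid of operating points, and the yield values are hypothetical, and the distance is taken directly in mixed units (°C and min), a simplification the study may not share.

```python
def idw(samples, query, power=1.0):
    """Inverse distance weighting estimate at `query`.

    samples: list of ((temperature, duration), value) pairs.
    power:   exponent applied to the distance (1.0 = power parameter of unity).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        dist = ((x - query[0]) ** 2 + (y - query[1]) ** 2) ** 0.5
        if dist == 0.0:
            return value  # query coincides with a measured point
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical solid yields (%) at four (temperature in deg C, duration in min) points.
grid = [
    ((250.0, 30.0), 80.0),
    ((250.0, 60.0), 70.0),
    ((300.0, 30.0), 60.0),
    ((300.0, 60.0), 50.0),
]

# The query point is equidistant from all four samples, so the weights are equal.
estimate = idw(grid, (275.0, 45.0))
```

In practice the two axes would usually be rescaled before computing distances, so that neither temperature nor duration dominates the weights.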
Dissolved Solids in Streams of the Conterminous United States
NASA Astrophysics Data System (ADS)
Anning, D. W.; Flynn, M.
2014-12-01
Studies have shown that excessive dissolved-solids concentrations in water can have adverse effects on the environment and on agricultural, municipal, and industrial water users. Such effects motivated the U.S. Geological Survey's National Water-Quality Assessment Program to develop a SPAtially-Referenced Regression on Watershed Attributes (SPARROW) model to improve the understanding of dissolved solids in streams of the United States. Using the SPARROW model, annual dissolved-solids loads from 2,560 water-quality monitoring stations were statistically related to several spatial datasets serving as surrogates for dissolved-solids sources and transport processes. Sources investigated in the model included geologic materials, road de-icers, urban lands, cultivated lands, and pasture lands. Factors affecting transport from these sources to streams in the model included climate, soil, vegetation, terrain, population, irrigation, and artificial-drainage characteristics. The SPARROW model was used to predict long-term mean annual conditions for dissolved-solids sources, loads, yields, and concentrations in about 66,000 stream reaches and corresponding incremental catchments nationwide. The estimated total amount of dissolved solids delivered to the Nation's streams is 272 million metric tons (Mt) annually, of which 194 million Mt (71%) are from geologic sources, 38 million Mt (14%) are from road de-icers, 18 million Mt (7%) are from pasture lands, 14 million Mt (5 %) are from urban lands, and 8 million Mt (3%) are from cultivated lands. The median incremental-catchment yield delivered to local streams is 26 metric tons per year per square kilometer [(Mt/yr)/km2]. Ten percent of the incremental catchments yield less than 4 (Mt/yr)/km2, and 10 percent yield more than 90 (Mt/yr)/km2. In 13% of the reaches, predicted flow-weighted concentrations exceed 500 mg/L—the U.S. Environmental Protection Agency secondary non-enforceable drinking-water standard.
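The source breakdown quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the loads stated above; the variable names are illustrative.

```python
# Delivered dissolved-solids loads by source, in million metric tons per year,
# as quoted in the abstract.
TOTAL_LOAD = 272
source_loads = {
    "geologic materials": 194,
    "road de-icers": 38,
    "pasture lands": 18,
    "urban lands": 14,
    "cultivated lands": 8,
}

# Share of each source in the national total, rounded to whole percent.
shares = {name: round(100 * load / TOTAL_LOAD) for name, load in source_loads.items()}

# The five sources should add up to the stated total.
loads_sum = sum(source_loads.values())
```

The rounded shares reproduce the percentages given in the abstract (71, 14, 7, 5, and 3), and the five source loads sum to the stated 272 million metric tons.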
NASA Astrophysics Data System (ADS)
Haack, Lukas; Peniche, Ricardo; Sommer, Lutz; Kather, Alfons
2017-06-01
At early project stages, the main CSP plant design parameters such as turbine capacity, solar field size, and thermal storage capacity are varied during the techno-economic optimization to determine most suitable plant configurations. In general, a typical meteorological year with at least hourly time resolution is used to analyze each plant configuration. Different software tools are available to simulate the annual energy yield. Software tools offering a thermodynamic modeling approach of the power block and the CSP thermal cycle, such as EBSILONProfessional®, allow a flexible definition of plant topologies. In EBSILON, the thermodynamic equilibrium for each time step is calculated iteratively (quasi steady state), which requires approximately 45 minutes to process one year with hourly time resolution. For better presentation of gradients, 10 min time resolution is recommended, which increases processing time by a factor of 5. Therefore, analyzing a large number of plant sensitivities, as required during the techno-economic optimization procedure, the detailed thermodynamic simulation approach becomes impracticable. Suntrace has developed an in-house CSP-Simulation tool (CSPsim), based on EBSILON and applying predictive models, to approximate the CSP plant performance for central receiver and parabolic trough technology. CSPsim significantly increases the speed of energy yield calculations by factor ≥ 35 and has automated the simulation run of all predefined design configurations in sequential order during the optimization procedure. To develop the predictive models, multiple linear regression techniques and Design of Experiment methods are applied. The annual energy yield and derived LCOE calculated by the predictive model deviates less than ±1.5 % from the thermodynamic simulation in EBSILON and effectively identifies the optimal range of main design parameters for further, more specific analysis.
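The run-time argument in this abstract can be made concrete with a rough estimate. The sketch below takes the 45-minute run time, the factor of 5 for 10-minute resolution, and the lower-bound speed-up of 35 from the abstract; the number of plant configurations is hypothetical, and applying the speed-up to the hourly baseline is an assumption.

```python
MINUTES_PER_RUN_HOURLY = 45   # detailed simulation, one year at hourly resolution
RESOLUTION_FACTOR = 5         # run-time multiplier for 10 min resolution
SPEEDUP_LOWER_BOUND = 35      # stated speed-up of the predictive models


def sweep_minutes(n_configs, minutes_per_run):
    """Total run time when configurations are simulated one after another."""
    return n_configs * minutes_per_run


n_configs = 100  # hypothetical size of a techno-economic parameter sweep

detailed_hourly = sweep_minutes(n_configs, MINUTES_PER_RUN_HOURLY)
detailed_10min = sweep_minutes(n_configs, MINUTES_PER_RUN_HOURLY * RESOLUTION_FACTOR)

# Assumption: the speed-up is measured against the hourly detailed simulation.
surrogate_upper_bound = detailed_hourly / SPEEDUP_LOWER_BOUND
```

Under these assumptions a 100-configuration sweep takes 4,500 minutes (about three days) with the detailed hourly simulation and 22,500 minutes at 10-minute resolution, against roughly two hours with the predictive models.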
Temperature Increase Reduces Global Yields of Major Crops in Four Independent Estimates
NASA Technical Reports Server (NTRS)
Zhao, Chuang; Liu, Bing; Piao, Shilong; Wang, Xuhui; Lobell, David B.; Huang, Yao; Huang, Mengtian; Yao, Yitong; Bassu, Simona; Ciais, Philippe;
2017-01-01
Wheat, rice, maize, and soybean provide two-thirds of human caloric intake. Assessing the impact of global temperature increase on production of these crops is therefore critical to maintaining global food supply, but different studies have yielded different results. Here, we investigated the impacts of temperature on yields of the four crops by compiling extensive published results from four analytical methods: global grid-based and local point-based models, statistical regressions, and field-warming experiments. Results from the different methods consistently showed negative temperature impacts on crop yield at the global scale, generally underpinned by similar impacts at country and site scales. Without CO2 fertilization, effective adaptation, and genetic improvement, each degree-Celsius increase in global mean temperature would, on average, reduce global yields of wheat by 6.0%, rice by 3.2%, maize by 7.4%, and soybean by 3.1%. Results are highly heterogeneous across crops and geographical areas, with some positive impact estimates. Multi-method analyses improved the confidence in assessments of future climate impacts on global major crops and suggest crop- and region-specific adaptation strategies to ensure food security for an increasing world population.
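The per-degree estimates above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the yield response scales linearly with warming, which the abstract does not claim beyond a single degree; the 2 °C scenario and the function name `yield_loss_percent` are hypothetical.

```python
# Average global yield reduction per degree Celsius of warming (%), from the
# abstract; valid without CO2 fertilization, adaptation, or genetic improvement.
LOSS_PER_DEGREE = {"wheat": 6.0, "rice": 3.2, "maize": 7.4, "soybean": 3.1}


def yield_loss_percent(warming_deg_c):
    """Yield loss per crop, ASSUMING a linear response to warming."""
    return {crop: loss * warming_deg_c for crop, loss in LOSS_PER_DEGREE.items()}


# Hypothetical scenario: 2 degrees Celsius of global mean warming.
losses_2c = yield_loss_percent(2.0)
```

Because the abstract reports highly heterogeneous results across crops and regions, numbers obtained this way are at best a global-average first guess.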
Temperature increase reduces global yields of major crops in four independent estimates
Zhao, Chuang; Piao, Shilong; Wang, Xuhui; Lobell, David B.; Huang, Yao; Huang, Mengtian; Yao, Yitong; Bassu, Simona; Ciais, Philippe; Durand, Jean-Louis; Elliott, Joshua; Ewert, Frank; Janssens, Ivan A.; Li, Tao; Lin, Erda; Liu, Qiang; Martre, Pierre; Peng, Shushi; Wallach, Daniel; Wang, Tao; Wu, Donghai; Liu, Zhuo; Zhu, Yan; Zhu, Zaichun; Asseng, Senthold
2017-01-01
Wheat, rice, maize, and soybean provide two-thirds of human caloric intake. Assessing the impact of global temperature increase on production of these crops is therefore critical to maintaining global food supply, but different studies have yielded different results. Here, we investigated the impacts of temperature on yields of the four crops by compiling extensive published results from four analytical methods: global grid-based and local point-based models, statistical regressions, and field-warming experiments. Results from the different methods consistently showed negative temperature impacts on crop yield at the global scale, generally underpinned by similar impacts at country and site scales. Without CO2 fertilization, effective adaptation, and genetic improvement, each degree-Celsius increase in global mean temperature would, on average, reduce global yields of wheat by 6.0%, rice by 3.2%, maize by 7.4%, and soybean by 3.1%. Results are highly heterogeneous across crops and geographical areas, with some positive impact estimates. Multimethod analyses improved the confidence in assessments of future climate impacts on global major crops and suggest crop- and region-specific adaptation strategies to ensure food security for an increasing world population. PMID:28811375
Temperature increase reduces global yields of major crops in four independent estimates.
Zhao, Chuang; Liu, Bing; Piao, Shilong; Wang, Xuhui; Lobell, David B; Huang, Yao; Huang, Mengtian; Yao, Yitong; Bassu, Simona; Ciais, Philippe; Durand, Jean-Louis; Elliott, Joshua; Ewert, Frank; Janssens, Ivan A; Li, Tao; Lin, Erda; Liu, Qiang; Martre, Pierre; Müller, Christoph; Peng, Shushi; Peñuelas, Josep; Ruane, Alex C; Wallach, Daniel; Wang, Tao; Wu, Donghai; Liu, Zhuo; Zhu, Yan; Zhu, Zaichun; Asseng, Senthold
2017-08-29
Wheat, rice, maize, and soybean provide two-thirds of human caloric intake. Assessing the impact of global temperature increase on production of these crops is therefore critical to maintaining global food supply, but different studies have yielded different results. Here, we investigated the impacts of temperature on yields of the four crops by compiling extensive published results from four analytical methods: global grid-based and local point-based models, statistical regressions, and field-warming experiments. Results from the different methods consistently showed negative temperature impacts on crop yield at the global scale, generally underpinned by similar impacts at country and site scales. Without CO2 fertilization, effective adaptation, and genetic improvement, each degree-Celsius increase in global mean temperature would, on average, reduce global yields of wheat by 6.0%, rice by 3.2%, maize by 7.4%, and soybean by 3.1%. Results are highly heterogeneous across crops and geographical areas, with some positive impact estimates. Multimethod analyses improved the confidence in assessments of future climate impacts on global major crops and suggest crop- and region-specific adaptation strategies to ensure food security for an increasing world population.
Spectrally-Based Assessment of Crop Seasonal Performance and Yield
NASA Astrophysics Data System (ADS)
Kancheva, Rumiana; Borisova, Denitsa; Georgiev, Georgy
The rapid advances of space technologies concern almost all scientific areas from aeronautics to medicine, and a wide range of application fields from communications to crop yield predictions. Agricultural monitoring is among the priorities of remote sensing observations for getting timely information on crop development. Monitoring agricultural fields during the growing season plays an important role in crop health assessment and stress detection, provided that reliable data are obtained. Hyperspectral data are increasingly used in precision farming for plant growth and phenology monitoring, physiological state assessment, and yield prediction. In this paper, we investigated various spectral-biophysical relationships derived from in-situ reflectance measurements. The performance of spectral data for the assessment of agricultural crop condition and yield prediction was examined. The approach comprised the development of regression models between plant spectral and state-indicative variables such as biomass, vegetation cover fraction, leaf area index, etc., and the development of yield forecasting models from single-date (growth stage) and multitemporal (seasonal) reflectance data. Verification of spectral predictions was performed through comparison with estimations from biophysical relationships between crop growth variables. The study was carried out for spring barley and winter wheat. Visible and near-infrared reflectance data were acquired through the whole growing season, accompanied by detailed datasets on plant phenology and canopy structural and biochemical attributes. Empirical relationships were derived relating crop agronomic variables and yield to various spectral predictors. The study findings were tested using airborne remote sensing inputs. A good correspondence was found between predicted and actual (ground-truth) estimates.
Distillation time effect on lavender essential oil yield and composition.
Zheljazkov, Valtcho D; Cantrell, Charles L; Astatkie, Tess; Jeliazkova, Ekaterina
2013-01-01
Lavender (Lavandula angustifolia Mill.) is one of the most widely grown essential oil crops in the world. Commercial extraction of lavender oil is done using steam distillation. The objective of this study was to evaluate the effect of the length of the distillation time (DT) on lavender essential oil yield and composition when extracted from dried flowers. Therefore, the following distillation times (DT) were tested in this experiment: 1.5 min, 3 min, 3.75 min, 7.5 min, 15 min, 30 min, 60 min, 90 min, 120 min, 150 min, 180 min, and 240 min. The essential oil yield (range 0.5-6.8%) reached a maximum at 60 min DT. The concentrations of cineole (range 6.4-35%) and fenchol (range 1.7-2.9%) were highest at the 1.5 min DT and decreased with increasing length of the DT. The concentration of camphor (range 6.6-9.2%) reached a maximum at 7.5-15 min DT, while the concentration of linalool acetate (range 15-38%) reached a maximum at 30 min DT. Results suggest that lavender essential oil yield may not increase after 60 min DT. The change in essential oil yield, and the concentrations of cineole, fenchol and linalool acetate as DT changes were modeled very well by the asymptotic nonlinear regression model. DT may be used to modify the chemical profile of lavender oil and to obtain oils with differential chemical profiles from the same lavender flowers. DT must be taken into consideration when citing or comparing reports on lavender essential oil yield and composition.
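As an illustration of what an asymptotic nonlinear regression of yield on distillation time can look like, the sketch below fits one common asymptotic form, y = a(1 - exp(-b t)). The abstract does not state which functional form the authors used, so this form, the function name `fit_asymptotic`, and the synthetic yield values are all assumptions; only the twelve distillation times come from the abstract.

```python
import math

def fit_asymptotic(times, yields, b_grid):
    """Least-squares fit of y = a * (1 - exp(-b * t)) via a grid search over b.

    Returns (a, b, sse) for the grid value of b with the smallest squared error.
    """
    best = None
    for b in b_grid:
        f = [1.0 - math.exp(-b * t) for t in times]
        # For a fixed b the model is linear in a, so a has a closed form.
        a = sum(y * fi for y, fi in zip(yields, f)) / sum(fi * fi for fi in f)
        sse = sum((y - a * fi) ** 2 for y, fi in zip(yields, f))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best

# Distillation times listed in the abstract (minutes).
times = [1.5, 3, 3.75, 7.5, 15, 30, 60, 90, 120, 150, 180, 240]

# Synthetic yields generated from the assumed model; NOT the measured data.
true_a, true_b = 6.8, 0.05
yields = [true_a * (1.0 - math.exp(-true_b * t)) for t in times]

b_grid = [i / 1000 for i in range(1, 501)]  # candidate rate constants, 0.001 to 0.5
a_hat, b_hat, sse = fit_asymptotic(times, yields, b_grid)
print(f"plateau a = {a_hat:.3f}, rate b = {b_hat:.3f}, SSE = {sse:.2e}")
```

In this form the parameter a is the plateau that the yield approaches, which matches the abstract's statement that yield may not increase beyond a certain distillation time.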
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.
Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S
2011-09-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. 
It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework
Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.
2012-01-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. 
It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015
NASA Astrophysics Data System (ADS)
Chen, Yanling; Gong, Adu; Li, Jing; Wang, Jingmei
2017-04-01
Accurate crop growth monitoring and yield prediction are important for the sustainable development of agriculture and for national food security. Remote sensing observation and crop growth simulation models are two technologies with high potential for crop growth monitoring and yield forecasting. However, each has limitations, in mechanism or in regional application respectively. Remote sensing information cannot reveal crop growth and development, the inner mechanism of yield formation, or the effect of environmental and meteorological conditions. Crop growth simulation models face difficulties in obtaining data and in parameterization when moving from single-point to regional application. To make good use of the advantages of both technologies, the coupling of remote sensing information and crop growth simulation models has been studied. Filtering and optimizing model parameters are key to yield estimation by remote sensing and crop model based on regional crop assimilation. Winter wheat in GaoCheng was selected as the experimental object in this paper. The essential data were collected, including biochemical data, farmland environmental data, and meteorological data for several critical growing periods. Meanwhile, an HJ-CCD image from the environmental mitigation small satellite was obtained. The research work and major conclusions are as follows. (1) Seven vegetation indices were selected to retrieve LAI, and a linear regression model was built between each index and the measured LAI. The results show that the EVI model had the highest accuracy (R²=0.964 at the anthesis stage and R²=0.920 at the filling stage). Thus, EVI was chosen as the optimal vegetation index for predicting LAI in this paper.
(2) The EFAST method was adopted to conduct a sensitivity analysis of the 26 initial parameters of the WOFOST model, and a sensitivity index was constructed to evaluate the influence of each parameter on winter wheat yield formation. Six parameters with a sensitivity index greater than 0.1 were chosen as sensitive factors: TSUM1, SLATB1, SLATB2, SPAN, EFFTB3 and TMPF4. The other parameters were set from practical measurement and calculation, available literature, or WOFOST defaults. This completed the adjustment of the WOFOST parameters. (3) A look-up table algorithm was used to realize single-point yield estimation through the assimilation of the WOFOST model and the retrieved LAI. The simulation achieved high accuracy, meeting the purpose of assimilation (R²=0.941 and RMSE=194.58 kg/hm²). In this paper, the optimum values of the sensitive parameters were determined and single-point yield estimation was completed. Key words: yield estimation of winter wheat, LAI, WOFOST crop growth model, assimilation
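Step (1) of this abstract, regressing measured LAI on each vegetation index and comparing R², can be sketched with ordinary least squares. The helper name `linear_fit` and the EVI and LAI numbers below are hypothetical placeholders, not values from the study.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2

# Hypothetical index values and field-measured LAI (illustration only).
evi = [0.20, 0.30, 0.40, 0.50, 0.60]
lai = [1.1, 1.9, 3.2, 3.9, 5.1]

slope, intercept, r2 = linear_fit(evi, lai)
print(f"LAI = {intercept:.2f} + {slope:.2f} * EVI, R^2 = {r2:.3f}")
```

Repeating the fit once per candidate index and keeping the one with the highest R² mirrors the selection described in the abstract, under the assumption that R² was the only criterion.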
Phenotypic effects of subclinical paratuberculosis (Johne's disease) in dairy cattle.
Pritchard, Tracey C; Coffey, Mike P; Bond, Karen S; Hutchings, Mike R; Wall, Eileen
2017-01-01
The effect of subclinical paratuberculosis (or Johne's disease) risk status on performance, health, and fertility was studied in 58,096 UK Holstein-Friesian cows with 156,837 lactations across lactations 1 to 3. Cows were allocated to low-, medium-, and high-risk groups on the basis of a minimum of 4 ELISA milk tests taken at any time during their lactating life. Lactation curves of daily milk, protein, and fat yields and protein and fat percentage, together with log_e-transformed somatic cell count, were estimated using a random regression model to quantify differences between risk groups. The effects of subclinical paratuberculosis risk group on fertility, lactation-average somatic cell count, and mastitis were analyzed using linear regression fitting risk group as a fixed effect. Milk yield losses associated with high-risk cows compared with low-risk cows in lactations 1, 2, and 3 for mean daily yield were 0.34, 1.05, and 1.61 kg; likewise, accumulated 305-d yield losses were 103, 316, and 485 kg, respectively. The total loss was 904 kg over the first 3 lactations. Protein and fat yield losses associated with high-risk cows were significant, but primarily a feature of decreasing milk yield. Similar trends were observed for both test-day and lactation-average somatic cell count measures, with higher somatic cell counts from medium- and high-risk cows compared with low-risk cows, and differences were in almost all cases significant. Likewise, mastitis incidence was significantly higher in high-risk cows compared with low-risk cows in lactations 2 and 3. In contrast, the few significant differences between risk groups among fertility traits were inconsistent, with no clear trend. These results are expected to be conservative, as some animals that were considered negative may become positive after the timeframe of this study, particularly if the animal was tested when relatively young.
However, the magnitude of milk yield losses together with higher somatic cell counts and an increase in mastitis incidence should motivate farmers to implement the appropriate control measures to reduce the spread of the disease. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
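The reported total loss can be checked directly from the per-lactation 305-d figures given in the abstract. The short snippet below does only that arithmetic; the variable names are illustrative.

```python
# Accumulated 305-d milk yield losses (kg) for high-risk vs. low-risk cows,
# by lactation number, as reported in the abstract.
losses_305d_kg = {1: 103, 2: 316, 3: 485}

total_loss_kg = sum(losses_305d_kg.values())
print(total_loss_kg)  # 904, matching the total stated in the abstract
```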
Relationship between compatibilizer and yield strength of PLA/PP Blend
NASA Astrophysics Data System (ADS)
Jariyakulsith, Pattanun; Puajindanetr, Somchai
2018-01-01
The aim of this research is to study the relationship between compatibilizer and yield strength of polylactic acid (PLA) and polypropylene (PP) blends. PLA is blended with PP (PLA/PP) at ratios of 70/30, 50/50 and 30/70. In addition, (1) polypropylene grafted maleic anhydride (PP-g-MAH) as a compatibilizer at 0.3 and 0.7 parts per hundred of PLA/PP resin (phr) and (2) dicumyl peroxide (DCP) as an initiator at 0.03 and 0.07 phr are added to each composition. Yield strength is measured to study the interaction between compatibilizer, initiator and yield strength using a multilevel full factorial experimental design. The results show that (1) the yield strength of the PLA/PP blend increases after addition of the compatibilizer, because adding PP-g-MAH and DCP improves the compatibility between PLA and PP, and (2) there is an interaction between PP-g-MAH and DCP that affects the final properties of the PLA/PP blend. The highest yield strength, 27.68 MPa, is obtained for the 70/30 blend with 0.3 phr of PP-g-MAH and 0.03 phr of DCP. A linear regression model is fitted and satisfies the normality assumption.
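The factor levels named in the abstract can be enumerated as a full factorial grid. This is a sketch under the assumption that the design crosses exactly these three factors at the stated levels; the abstract does not say whether control runs without additives or replicates were included, and the variable names are illustrative.

```python
from itertools import product

# Factor levels taken from the abstract.
blend_ratios = ["70/30", "50/50", "30/70"]  # PLA/PP ratio
pp_g_mah_phr = [0.3, 0.7]                   # compatibilizer loading (phr)
dcp_phr = [0.03, 0.07]                      # initiator loading (phr)

# Full factorial: every combination of every level of every factor.
design = list(product(blend_ratios, pp_g_mah_phr, dcp_phr))

for run, (ratio, mah, dcp) in enumerate(design, start=1):
    print(f"run {run:2d}: PLA/PP {ratio}, PP-g-MAH {mah} phr, DCP {dcp} phr")
```

With 3 x 2 x 2 levels this gives 12 treatment combinations, one of which (70/30, 0.3 phr, 0.03 phr) is the combination reported to give the highest yield strength.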
Dudley, Robert W.
2015-12-03
The largest average errors of prediction are associated with regression equations for the lowest streamflows derived for months during which the lowest streamflows of the year occur (such as the 5 and 1 monthly percentiles for August and September). The regression equations have been derived on the basis of streamflow and basin characteristics data for unregulated, rural drainage basins without substantial streamflow or drainage modifications (for example, diversions and (or) regulation by dams or reservoirs, tile drainage, irrigation, channelization, and impervious paved surfaces); therefore, using the equations for regulated or urbanized basins with substantial streamflow or drainage modifications will yield results of unknown error. Input basin characteristics derived using techniques or datasets other than those documented in this report, or using values outside the ranges used to develop these regression equations, also will yield results of unknown error.
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data
Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
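The general idea of combining penalized regression with bootstrap quantiles for variable selection can be sketched as follows. This is not the authors' annotated R code or their exact QNT procedure: the ridge penalty, the fixed penalty value `lam`, the 95% interval, the rule "keep a predictor if its bootstrap interval excludes zero", and the simulated data are all assumptions made for illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge(X, y, lam):
    """Ridge coefficients without intercept: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def bootstrap_select(X, y, lam=1.0, n_boot=200, alpha=0.05, seed=0):
    """Keep predictors whose bootstrap quantile interval for the ridge coefficient excludes 0."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    draws = [[] for _ in range(p)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = ridge([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            draws[j].append(beta[j])
    selected = []
    for j in range(p):
        vals = sorted(draws[j])
        lo = vals[int((alpha / 2) * n_boot)]
        hi = vals[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
        if lo > 0 or hi < 0:
            selected.append(j)
    return selected

# Simulated data (hypothetical): only predictor 0 truly affects the outcome.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]

selected = bootstrap_select(X, y)
print("selected predictor indices:", selected)
```

A lasso or elastic-net penalty could be substituted for the ridge step; the bootstrap and quantile logic around it would stay the same.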
2018-01-01
Background: Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods: An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results: The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion: It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160
Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan
2014-09-01
Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and develop safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is that the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that the MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.
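The way a zero-inflated negative binomial distribution accommodates excess zero counts can be seen in its probability mass function. The sketch below covers only the univariate building block, with a standard mean/dispersion parameterization chosen here as an assumption; it is not the multivariate random-parameters model of the paper, and the parameter values are hypothetical.

```python
import math

def nb_pmf(k, mu, r):
    """Negative binomial probability of count k, with mean mu and dispersion r."""
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial distribution."""
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base

# Hypothetical parameters: mean 2 crashes, dispersion 1.5, 30% structural zeros.
mu, r, pi = 2.0, 1.5, 0.3
print("P(0) under NB:  ", round(nb_pmf(0, mu, r), 4))
print("P(0) under ZINB:", round(zinb_pmf(0, mu, r, pi), 4))
```

In a regression setting, mu (and often pi) would be linked to covariates such as traffic volume or pavement condition; that link is omitted here.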
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
2015-01-01
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, which was better than that of the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
2015-12-01
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, which was better than that of the generalized ridge regression heteroscedastic effect model for the traits evaluated.
Forecasting the remaining reservoir capacity in the Laurentian Great Lakes watershed
NASA Astrophysics Data System (ADS)
Alighalehbabakhani, Fatemeh; Miller, Carol J.; Baskaran, Mark; Selegean, James P.; Barkach, John H.; Dahl, Travis; Abkenar, Seyed Mohsen Sadatiyan
2017-12-01
Sediment accumulation behind a dam is a significant factor in reservoir operation and watershed management. There are many dams located within the Laurentian Great Lakes watershed whose operations have been adversely affected by excessive reservoir sedimentation. Reservoir sedimentation effects include reduction of flood control capability and limitations to both water supply withdrawals and power generation due to reduced reservoir storage. In this research, the sediment accumulation rates of twelve reservoirs within the Great Lakes watershed were evaluated using the Soil and Water Assessment Tool (SWAT). The sediment accumulation rates estimated by SWAT were compared to estimates relying on radionuclide dating of sediment cores and bathymetric survey methods. Based on the sediment accumulation rate, the remaining reservoir capacity for each study site was estimated. The anthropogenic impacts on sediment yield, including land use change and dam construction, were assessed in this research. Regression analysis was performed on the current and pre-European settlement sediment yield for the modeled watersheds to predict the current and natural sediment yield in un-modeled watersheds. These eleven watersheds are in the states of Indiana, Michigan, Ohio, New York, and Wisconsin.
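The step from a sediment accumulation rate to a remaining-capacity estimate can be sketched as simple arithmetic. The version below assumes a constant volumetric accumulation rate and ignores sediment compaction and changes in trap efficiency, which the abstract does not discuss; the function names and all numbers are hypothetical.

```python
def remaining_capacity(original_capacity_m3, accumulation_rate_m3_per_yr, years_in_service):
    """Storage left after a constant accumulation rate has acted for a given number of years."""
    deposited = accumulation_rate_m3_per_yr * years_in_service
    return max(0.0, original_capacity_m3 - deposited)

def years_until_full(remaining_m3, accumulation_rate_m3_per_yr):
    """Years until the remaining storage is filled at the same constant rate."""
    if accumulation_rate_m3_per_yr <= 0:
        return float("inf")
    return remaining_m3 / accumulation_rate_m3_per_yr

# Hypothetical reservoir: 1,000,000 m^3 built 80 years ago, filling at 5,000 m^3 per year.
left = remaining_capacity(1_000_000, 5_000, 80)
print(f"remaining capacity: {left:,.0f} m^3")
print(f"years until full:   {years_until_full(left, 5_000):.0f}")
```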
Olivoto, T; Nardino, M; Carvalho, I R; Follmann, D N; Ferrari, M; Szareski, V J; de Pelegrin, A J; de Souza, V Q
2017-03-22
Methodologies using restricted maximum likelihood/best linear unbiased prediction (REML/BLUP) in combination with sequential path analysis in maize are still limited in the literature. Therefore, the aims of this study were: i) to use REML/BLUP-based procedures in order to estimate variance components, genetic parameters, and genotypic values of simple maize hybrids, and ii) to fit stepwise regressions considering genotypic values to form a path diagram with multi-order predictors and minimum multicollinearity that explains the relationships of cause and effect among grain yield-related traits. Fifteen commercial simple maize hybrids were evaluated in multi-environment trials in a randomized complete block design with four replications. The environmental variance (78.80%) and genotype-vs-environment variance (20.83%) accounted for more than 99% of the phenotypic variance of grain yield, which hinders direct selection by breeders for this trait. The sequential path analysis model allowed the selection of traits with high explanatory power and minimum multicollinearity, resulting in models with elevated fit (R² > 0.9 and ε < 0.3). The number of kernels per ear (NKE) and thousand-kernel weight (TKW) are the traits with the largest direct effects on grain yield (r = 0.66 and 0.73, respectively). The high accuracy of selection (0.86 and 0.89) associated with the high heritability of the average (0.732 and 0.794) for NKE and TKW, respectively, indicated good reliability and prospects of success in the indirect selection of hybrids with high-yield potential through these traits. The negative direct effect of NKE on TKW (r = -0.856), however, must be considered. The joint use of mixed models and sequential path analysis is effective in the evaluation of maize-breeding trials.
Evaluation of Projected Agricultural Climate Risk over the Contiguous US
NASA Astrophysics Data System (ADS)
Zhu, X.; Troy, T. J.; Devineni, N.
2017-12-01
Food demands are rising due to an increasing population with changing food preferences, which places pressure on agricultural production. Additionally, climate extremes have recently highlighted the vulnerability of our agricultural system to climate variability. This study seeks to fill two important gaps in current knowledge: how the widespread response of irrigated crops differs from that of rainfed crops, and how best to account for uncertainty in yield responses. We developed a stochastic approach to evaluate climate risk quantitatively, to better understand the historical impacts of climate change and estimate the future impacts it may bring to the agricultural system. Our model consists of Bayesian regression, distribution fitting, and Monte Carlo simulation to simulate rainfed and irrigated crop yields at the US county level. The model was fit using historical data for 1970-2010 and was then applied over different climate regions in the contiguous US using the CMIP5 climate projections. The relative importance of many major growing season climate indices, such as consecutive dry days without rainfall or heavy precipitation, was evaluated to determine which climate indices play a role in affecting future crop yields. The statistical modeling framework also evaluated the impact of irrigation by using county-level irrigated and rainfed yields separately. Furthermore, the projected years with negative yield anomalies were specifically evaluated in terms of magnitude, trend and potential climate drivers. This framework provides estimates of the agricultural climate risk for the 21st century that account for the full uncertainty of climate occurrences, range of crop response, and spatial correlation in climate. The results of this study can contribute to decision making about crop choice and water use in an uncertain future climate.
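The distribution-fitting and Monte Carlo steps of such a framework can be illustrated in a few lines. The sketch below fits a normal distribution to a short series of yield anomalies and simulates from it to estimate the chance of a negative-anomaly year. The normal distribution, the function name, and the anomaly values are assumptions for illustration; the Bayesian regression on climate indices described in the abstract is omitted.

```python
import random
import statistics

def negative_anomaly_probability(history, n_sims=10_000, seed=0):
    """Fit a normal distribution to historical anomalies, then simulate
    n_sims years and return the fraction with a negative anomaly."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    rng = random.Random(seed)
    draws = [rng.gauss(mu, sigma) for _ in range(n_sims)]
    return sum(d < 0 for d in draws) / n_sims

# Hypothetical county-level yield anomalies (t/ha), NOT data from the study.
anomalies = [-0.8, 0.5, 0.3, -0.2, 0.9, -0.4, 0.1, -0.6, 0.7, -0.5]

p_negative = negative_anomaly_probability(anomalies)
print(f"simulated probability of a negative-anomaly year: {p_negative:.3f}")
```

Running the same simulation separately on irrigated and rainfed series would give the kind of side-by-side risk comparison the abstract describes.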
Poynter, Jenny N; Ross, Julie A; Hooten, Anthony J; Langer, Erica; Blommer, Crystal; Spector, Logan G
2013-08-12
Collection of high-quality DNA is essential for molecular epidemiology studies. Methods have been evaluated for optimal DNA collection in studies of adults; however, DNA collection in young children poses additional challenges. Here, we have evaluated predictors of DNA quantity in buccal cells collected for population-based studies of infant leukemia (N = 489 mothers and 392 children) and hepatoblastoma (HB; N = 446 mothers and 412 children) conducted through the Children's Oncology Group. DNA samples were collected by mail using mouthwash (for mothers and some children) and buccal brush (for children) collection kits and quantified using quantitative real-time PCR. Multivariable linear regression models were used to identify predictors of DNA yield. Median DNA yield was higher for mothers in both studies compared with their children (14 μg vs. <1 μg). Significant predictors of DNA yield in children included case-control status (β = -0.69, 50% reduction, P = 0.01 for case vs. control children), brush collection type, and season of sample collection. Demographic factors were not strong predictors of DNA yield in mothers or children in this analysis. The association with seasonality suggests that conditions during transport may influence DNA yield. The low yields observed in most children in these studies highlight the importance of developing alternative methods for DNA collection in younger age groups.
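The pairing of β = -0.69 with a "50% reduction" is consistent with a regression on natural-log-transformed DNA yield, where a coefficient β corresponds to multiplying the yield by exp(β). The abstract does not state the transformation, so treating it as a natural log is an assumption here.

```python
import math

beta = -0.69  # coefficient for case vs. control children, from the abstract

# On a natural-log outcome scale, exp(beta) is the multiplicative change in yield.
ratio = math.exp(beta)
percent_reduction = (1.0 - ratio) * 100.0

print(f"yield ratio (case / control): {ratio:.3f}")
print(f"implied reduction: {percent_reduction:.1f}%")
```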
Heterogeneity in hedonic modelling of house prices: looking at buyers' household profiles
NASA Astrophysics Data System (ADS)
Kestens, Yan; Thériault, Marius; Des Rosiers, François
2006-03-01
This paper introduces household-level data into hedonic models in order to measure the heterogeneity of implicit prices regarding household type, age, educational attainment, income, and the previous tenure status of the buyers. Two methods are used for this purpose: a first series of models uses expansion terms, whereas a second series applies Geographically Weighted Regressions. Both methods yield conclusive results, showing that the marginal value given to certain property specifics and location attributes does vary with the characteristics of the buyer’s household. In particular, major findings concern the significant effect of income on the location rent as well as the premium paid by highly-educated households in order to fulfil social homogeneity.
Wathes, D C; Bourne, N; Cheng, Z; Mann, G E; Taylor, V J; Coffey, M P
2007-03-01
Results from 4 studies were combined (representing a total of 500 lactations) to investigate the relationships between metabolic parameters and fertility in dairy cows. Information was collected on blood metabolic traits and body condition score at 1 to 2 wk prepartum and at 2, 4, and 7 wk postpartum. Fertility traits were days to commencement of luteal activity, days to first service, days to conception, and failure to conceive. Primiparous and multiparous cows were considered separately. Initial linear regression analyses were used to determine relationships among fertility, metabolic, and endocrine traits at each time point. All metabolic and endocrine traits significantly related to fertility were included in stepwise multiple regression analyses alone (model 1), including peak milk yield and interval to commencement of luteal activity (model 2), and with the further addition of dietary group (model 3). In multiparous cows, extended calving to conception intervals were associated prepartum with greater concentrations of leptin and lesser concentrations of nonesterified fatty acids and urea, and postpartum with reduced insulin-like growth factor-I at 2 wk, greater urea at 7 wk, and greater peak milk yield. In primiparous cows, extended calving to conception intervals were associated with more body condition and more urea prepartum, elevated urea postpartum, and more body condition loss by 7 wk. In conclusion, some metabolic measurements were associated with poorer fertility outcomes. Relationships between fertility and metabolic and endocrine traits varied both according to the lactation number of the cow and with the time relative to calving.
Screening for ketosis using multiple logistic regression based on milk yield and composition
KAYANO, Mitsunori; KATAOKA, Tomoko
2015-01-01
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis based on milk yield and composition, simultaneously. The cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows and (2) primiparous, including 318 healthy cows and 16 ketotic cows, since nutritional status, milk yield and composition are affected by parity. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P<0.05) for the diagnosis of ketosis. For primiparous cows, lactose content (%), solid not fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P<0.01). A diagnostic rule was constructed for each group of cows: (1) 9.978 × P/F ratio + 0.085 × milk yield <10 and (2) 2.327 × SNF − 2.703 × lactose + 0.225 × MUN <10. The sensitivity, specificity and the area under the curve (AUC) of the diagnostic rules were (1) 0.800, 0.729 and 0.811; (2) 0.813, 0.730 and 0.787, respectively. The P/F ratio, which is a widely used measure of ketosis, provided the sensitivity, specificity and AUC values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively. PMID:26074408
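The two diagnostic rules quoted above are simple linear scores compared against a cut-off of 10, so they can be sketched directly in code. The function and variable names below are hypothetical, and the example input values are made up for illustration. The abstract gives the inequalities without spelling out which side flags a cow, so this sketch only reports whether the stated inequality holds.

```python
# Sketch of the two screening scores quoted in the abstract.
# Function names and example inputs are illustrative assumptions;
# only the coefficients and the cut-off of 10 come from the abstract.

THRESHOLD = 10.0


def multiparous_score(pf_ratio, milk_yield_kg_per_day):
    """Rule (1): 9.978 x P/F ratio + 0.085 x milk yield."""
    return 9.978 * pf_ratio + 0.085 * milk_yield_kg_per_day


def primiparous_score(snf_pct, lactose_pct, mun_mg_per_dl):
    """Rule (2): 2.327 x SNF - 2.703 x lactose + 0.225 x MUN."""
    return 2.327 * snf_pct - 2.703 * lactose_pct + 0.225 * mun_mg_per_dl


def rule_holds(score):
    """True when the score satisfies the stated inequality (score < 10)."""
    return score < THRESHOLD


# Made-up example cows
score_a = multiparous_score(pf_ratio=0.6, milk_yield_kg_per_day=30.0)
score_b = primiparous_score(snf_pct=8.7, lactose_pct=4.5, mun_mg_per_dl=12.0)
print(round(score_a, 4), rule_holds(score_a))
print(round(score_b, 4), rule_holds(score_b))
```

Keeping the score and the threshold test separate makes it easy to try other cut-offs when trading sensitivity against specificity.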
Kauhanen, Heikki; Komi, Paavo V; Häkkinen, Keijo
2002-02-01
The problems in comparing the performances of Olympic weightlifters arise from the fact that the relationship between body weight and weightlifting results is not linear. In the present study, this relationship was examined by using a nonparametric curve fitting technique of robust locally weighted regression (LOWESS) on relatively large data sets of the weightlifting results made in top international competitions. Power function formulas were derived from the fitted LOWESS values to represent the relationship between the 2 variables in a way that directly compares the snatch, clean-and-jerk, and total weightlifting results of a given athlete with those of the world-class weightlifters (golden standards). A residual analysis of several other parametric models derived from the initial results showed that they all experience inconsistencies, yielding either underestimation or overestimation of certain body weights. In addition, the existing handicapping formulas commonly used in normalizing the performances of Olympic weightlifters did not yield satisfactory results when applied to the present data. It was concluded that the devised formulas may provide objective means for the evaluation of the performances of male weightlifters, regardless of their body weights, ages, or performance levels.
Relationships between milk culture results and milk yield in Norwegian dairy cattle.
Reksen, O; Sølverød, L; Østerås, O
2007-10-01
Associations between test-day milk yield and positive milk cultures for Staphylococcus aureus, Streptococcus spp., and other mastitis pathogens or a negative milk culture for mastitis pathogens were assessed in quarter milk samples from randomly sampled cows selected without regard to current or previous udder health status. Staphylococcus aureus was dichotomized according to sparse (≤1,500 cfu/mL of milk) or rich (>1,500 cfu/mL of milk) growth of the bacteria. Quarter milk samples were obtained on 1 to 4 occasions from 2,740 cows in 354 Norwegian dairy herds, resulting in a total of 3,430 samplings. Measures of test-day milk yield were obtained monthly and related to 3,547 microbiological diagnoses at the cow level. Mixed model linear regression models incorporating an autoregressive covariance structure accounting for repeated test-day milk yields within cow and random effects at the herd and sample level were used to quantify the effect of positive milk cultures on test-day milk yields. Identical models were run separately for first-parity, second-parity, and third-parity or older cows. Fixed effects were days in milk, the natural logarithm of days in milk, sparse and rich growth of Staph. aureus (1/0), Streptococcus spp. (1/0), other mastitis pathogens (1/0), calving season, time of test-day milk yields relative to time of microbiological diagnosis (test day relative to time of diagnosis), and the interaction terms between microbiological diagnosis and test day relative to time of diagnosis. The models were run with the logarithmically transformed composite milk somatic cell count excluded and included. Rich growth of Staph. aureus was associated with decreased production levels in first-parity cows. An interaction between rich growth of Staph. aureus and test day relative to time of diagnosis also predicted a decline in milk production in third-parity or older cows. Interaction between sparse growth of Staph. aureus and test day relative to time of diagnosis predicted declining test-day milk yields in first-parity cows. Sparse growth of Staph. aureus was associated with high milk yields in third-parity or older cows after including the logarithmically transformed composite milk somatic cell count in the model, which illustrates that lower production levels are related to elevated somatic cell counts in high-producing cows. The same association with test-day milk yield was found among Streptococcus spp.-positive pluriparous cows.
An Integrated Analysis of the Physiological Effects of Space Flight: Executive Summary
NASA Technical Reports Server (NTRS)
Leonard, J. I.
1985-01-01
A large array of models was applied in a unified manner to solve problems in space flight physiology. Mathematical simulation was used as an alternative way of looking at physiological systems and maximizing the yield from previous space flight experiments. A medical data analysis system was created which consists of an automated data base, a computerized biostatistical and data analysis system, and a set of simulation models of physiological systems. Five basic models were employed: (1) a pulsatile cardiovascular model; (2) a respiratory model; (3) a thermoregulatory model; (4) a circulatory, fluid, and electrolyte balance model; and (5) an erythropoiesis regulatory model. Algorithms were provided to perform routine statistical tests, multivariate analysis, nonlinear regression analysis, and autocorrelation analysis. Special purpose programs were prepared for rank correlation, factor analysis, and the integration of the metabolic balance data.
NASA Astrophysics Data System (ADS)
Wübbeler, Gerd; Bodnar, Olha; Elster, Clemens
2018-02-01
Weighted least-squares estimation is commonly applied in metrology to fit models to measurements that are accompanied by quoted uncertainties. The weights are chosen according to the quoted uncertainties. However, when data and model are inconsistent in view of the quoted uncertainties, this procedure does not yield adequate results. When it can be assumed that all uncertainties ought to be rescaled by a common factor, weighted least-squares estimation may still be used, provided that a simple correction of the uncertainty obtained for the estimated model is applied. We show that these uncertainties and credible intervals are robust, as they do not rely on the assumption of a Gaussian distribution of the data. Hence, common software for weighted least-squares estimation may still safely be employed in such a case, followed by a simple modification of the uncertainties obtained by that software. We also provide means of checking the assumptions of such an approach. The Bayesian regression procedure is applied to analyze the CODATA values for the Planck constant published over the past decades in terms of three different models: a constant model, a straight line model and a spline model. Our results indicate that the CODATA values may not have yet stabilized.
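The idea of fitting by weighted least squares and then rescaling the reported uncertainties by a common factor can be illustrated with a straight-line model. The sketch below is a minimal one under stated assumptions: it uses the classical chi-square per degree of freedom as the squared rescaling factor, which is one common choice. The abstract does not give the exact correction the authors derive, so this is not necessarily their formula. The function name `wls_line` and the example data are hypothetical.

```python
import math


def wls_line(x, y, u):
    """Weighted least-squares fit of y = a + b*x with quoted uncertainties u.

    Returns the estimates, their uncertainties as computed from the quoted
    u, and the same uncertainties after rescaling by a common factor
    (assumed here: sqrt(chi2 / (n - 2)), the classical choice).
    """
    w = [1.0 / ui ** 2 for ui in u]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2

    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    ua = math.sqrt(Sxx / delta)   # uncertainty of a from quoted u
    ub = math.sqrt(S / delta)     # uncertainty of b from quoted u

    chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    scale = math.sqrt(chi2 / (len(x) - 2))  # common rescaling factor

    return {"a": a, "b": b, "ua": ua, "ub": ub,
            "scale": scale,
            "ua_rescaled": ua * scale, "ub_rescaled": ub * scale}


# Made-up data: the same measurements with two different quoted uncertainties
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.9]
fit_small_u = wls_line(xs, ys, [0.1] * 4)
fit_large_u = wls_line(xs, ys, [1.0] * 4)
print(fit_small_u["ub_rescaled"], fit_large_u["ub_rescaled"])
```

A useful property of this kind of correction is that the rescaled uncertainties do not change when all quoted uncertainties are multiplied by the same constant, which the two fits in the example show.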
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. 
These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
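The models above are compared through their predicted residual sum of squares (PRESS). As a minimal sketch of what that statistic measures, the code below computes a leave-one-out PRESS for a one-predictor least-squares line. This is an illustration only: the study used multivariate models on 6144-channel spectra, and its exact cross-validation scheme is not stated in the abstract. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def press(x, y):
    """Leave-one-out PRESS: refit without sample i, predict it, sum squared errors."""
    total = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        total += (y[i] - (a + b * x[i])) ** 2
    return total


# Made-up data: three points that a straight line cannot fit exactly
print(press([0.0, 1.0, 2.0], [0.0, 1.0, 0.0]))
```

Because each sample is predicted by a model that never saw it, PRESS penalizes overfitting in a way that the ordinary residual sum of squares does not.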
Hoos, Anne B.; Terziotti, Silvia; McMahon, Gerard; Savvas, Katerina; Tighe, Kirsten C.; Alkons-Wolinsky, Ruth
2008-01-01
This report presents and describes the digital datasets that characterize nutrient source inputs, environmental characteristics, and instream nutrient loads for the purpose of calibrating and applying a nutrient water-quality model for the southeastern United States for 2002. The model area includes all of the river basins draining to the south Atlantic and the eastern Gulf of Mexico, as well as the Tennessee River basin (referred to collectively as the SAGT area). The water-quality model SPARROW (SPAtially-Referenced Regression On Watershed attributes), developed by the U.S. Geological Survey, uses a regression equation to describe the relation between watershed attributes (predictors) and measured instream loads (response). Watershed attributes that are considered to describe nutrient input conditions and are tested in the SPARROW model for the SAGT area as source variables include atmospheric deposition, fertilizer application to farmland, manure from livestock production, permitted wastewater discharge, and land cover. Watershed and channel attributes that are considered to affect rates of nutrient transport from land to water and are tested in the SAGT SPARROW model as nutrient-transport variables include characteristics of soil, landform, climate, reach time of travel, and reservoir hydraulic loading. Datasets with estimates of each of these attributes for each individual reach or catchment in the reach-catchment network are presented in this report, along with descriptions of methods used to produce them. Measurements of nutrient water quality at stream monitoring sites from a combination of monitoring programs were used to develop observations of the response variable - mean annual nitrogen or phosphorus load - in the SPARROW regression equation. 
Instream load of nitrogen and phosphorus was estimated using bias-corrected log-linear regression models using the program Fluxmaster, which provides temporally detrended estimates of long-term mean load well-suited for spatial comparisons. The detrended, or normalized, estimates of load are useful for regional-scale assessments but should be used with caution for local-scale interpretations, for which use of loads estimated for actual time periods and employing more detailed regression analysis is suggested. The mean value of the nitrogen yield estimates, normalized to 2002, for 637 stations in the SAGT area is 4.7 kilograms per hectare; the mean value of nitrogen flow-weighted mean concentration is 1.2 milligrams per liter. The mean value of the phosphorus yield estimates, normalized to 2002, for the 747 stations in the SAGT area is 0.66 kilogram per hectare; the mean value of phosphorus flow-weighted mean concentration is 0.17 milligram per liter. Nutrient conditions measured in streams affected by substantial influx or outflux of water and nutrient mass across surface-water basin divides do not reflect nutrient source and transport conditions in the topographic watershed; therefore, inclusion of such streams in the SPARROW modeling approach is considered inappropriate. River basins identified with this concern include south Florida (where surface-water flow paths have been extensively altered) and the Oklawaha, Crystal, Lower Sante Fe, Lower Suwanee, St. Marks, and Chipola River basins in central and northern Florida (where flow exchange with the underlying regional aquifer may represent substantial nitrogen influx to and outflux from the surface-water basins).
Son, Yeongkwon; Osornio-Vargas, Álvaro R; O'Neill, Marie S; Hystad, Perry; Texcalac-Sangrador, José L; Ohman-Strickland, Pamela; Meng, Qingyu; Schwander, Stephan
2018-05-17
The Mexico City Metropolitan Area (MCMA) is one of the largest and most populated urban environments in the world and experiences high air pollution levels. The objective was to develop models that estimate pollutant concentrations at fine spatiotemporal scales and provide improved air pollution exposure assessments for health studies in Mexico City. We developed finer spatiotemporal land use regression (LUR) models for PM2.5, PM10, O3, NO2, CO and SO2 using mixed effect models with the Least Absolute Shrinkage and Selection Operator (LASSO). Hourly traffic density was included as a temporal variable besides meteorological and holiday variables. Models of hourly, daily, monthly, 6-monthly and annual averages were developed and evaluated using traditional and novel indices. The developed spatiotemporal LUR models yielded predicted concentrations with good spatial and temporal agreements with measured pollutant levels except for the hourly PM2.5, PM10 and SO2. Most of the LUR models met performance goals based on the standardized indices. LUR models with temporal scales greater than one hour were successfully developed using mixed effect models with LASSO and showed superior model performance compared to earlier LUR models, especially for time scales of a day or longer. The newly developed LUR models will be further refined with ongoing Mexico City air pollution sampling campaigns to improve personal exposure assessments. Copyright © 2018. Published by Elsevier B.V.
A Remote Sensing-Derived Corn Yield Assessment Model
NASA Astrophysics Data System (ADS)
Shrestha, Ranjay Man
Agricultural studies and food security have become critical research topics due to continuous growth in human population and simultaneous shrinkage in agricultural land. In spite of modern technological advancements to improve agricultural productivity, more studies on crop yield assessments and food productivities are still necessary to fulfill the constantly increasing food demands. Besides human activities, natural disasters such as flood and drought, along with rapid climate changes, also inflict an adverse effect on food productivities. Understanding the impact of these disasters on crop yield and making early impact estimations could help planning for any national or international food crisis. Similarly, the United States Department of Agriculture (USDA) Risk Management Agency (RMA) insurance management utilizes appropriately estimated crop yield and damage assessment information to sustain farmers' practice through timely and proper compensations. Through County Agricultural Production Survey (CAPS), the USDA National Agricultural Statistics Service (NASS) uses traditional methods of field interviews and farmer-reported survey data to perform annual crop condition monitoring and production estimations at the regional and state levels. As these manual approaches of yield estimations are highly inefficient and produce very limited samples to represent the entire area, NASS requires supplemental spatial data that provides continuous and timely information on crop production and annual yield. Compared to traditional methods, remote sensing data and products offer wider spatial extent, more accurate location information, higher temporal resolution and data distribution, and lower data cost--thus providing a complementary option for estimation of crop yield information. 
Remote sensing derived vegetation indices such as Normalized Difference Vegetation Index (NDVI) provide measurable statistics of potential crop growth based on the spectral reflectance and could be further associated with the actual yield. Utilizing satellite remote sensing products, such as daily NDVI derived from Moderate Resolution Imaging Spectroradiometer (MODIS) at 250 m pixel size, the crop yield estimation can be performed at a very fine spatial resolution. Therefore, this study examined the potential of these daily NDVI products within agricultural studies and crop yield assessments. In this study, a regression-based approach was proposed to estimate the annual corn yield through changes in MODIS daily NDVI time series. The relationship between daily NDVI and corn yield was well defined and established, and as changes in corn phenology and yield were directly reflected by the changes in NDVI within the growing season, these two entities were combined to develop a relational model. The model was trained using 15 years (2000-2014) of historical NDVI and county-level corn yield data for four major corn producing states: Kansas, Nebraska, Iowa, and Indiana, representing four climatic regions as South, West North Central, East North Central, and Central, respectively, within the U.S. Corn Belt area. The model's goodness of fit was well defined with a high coefficient of determination (R2>0.81). Similarly, using 2015 yield data for validation, 92% of average accuracy signified the performance of the model in estimating corn yield at county level. Besides providing the county-level corn yield estimations, the derived model was also accurate enough to estimate the yield at finer spatial resolution (field level). The model's assessment accuracy was evaluated using the randomly selected field level corn yield within the study area for 2014, 2015, and 2016. 
A total of over 120 plot-level corn yields were used for validation, and the overall average accuracy was 87%, which statistically justified the model's capability to estimate plot-level corn yield. Additionally, the proposed model was applied to impact estimation by examining the changes in corn yield due to flood events during the growing season. Using a 2011 Missouri River flood event as a case study, a field-level flood impact map on corn yield throughout the flooded regions was produced, and an overall agreement of over 82.2% was achieved when compared with the reference impact map. The future research direction of this dissertation research would be to examine other major crops outside the Corn Belt region of the U.S.
On Relevance of Codon Usage to Expression of Synthetic and Natural Genes in Escherichia coli
Supek, Fran; Šmuc, Tomislav
2010-01-01
A recent investigation concluded that codon bias did not affect expression of green fluorescent protein (GFP) variants in Escherichia coli, while stability of an mRNA secondary structure near the 5′ end played a dominant role. We demonstrate that combining the two variables using regression trees or support vector regression yields a biologically plausible model with better support in the GFP data set and in other experimental data: codon usage is relevant for protein levels if the 5′ mRNA structures are not strong. Natural E. coli genes had weaker 5′ mRNA structures than the examined set of GFP variants and did not exhibit a correlation between the folding free energy of 5′ mRNA structures and protein expression. PMID:20421604
Leffondré, Karen; Abrahamowicz, Michal; Siemiatycki, Jack
2003-12-30
Case-control studies are typically analysed using the conventional logistic model, which does not directly account for changes in the covariate values over time. Yet, many exposures may vary over time. The most natural alternative to handle such exposures would be to use the Cox model with time-dependent covariates. However, its application to case-control data opens the question of how to manipulate the risk sets. Through a simulation study, we investigate how the accuracy of the estimates of Cox's model depends on the operational definition of risk sets and/or on some aspects of the time-varying exposure. We also assess the estimates obtained from conventional logistic regression. The lifetime experience of a hypothetical population is first generated, and a matched case-control study is then simulated from this population. We control the frequency, the age at initiation, and the total duration of exposure, as well as the strengths of their effects. All models considered include a fixed-in-time covariate and one or two time-dependent covariate(s): the indicator of current exposure and/or the exposure duration. Simulation results show that none of the models always performs well. The discrepancies between the odds ratios yielded by logistic regression and the 'true' hazard ratio depend on both the type of the covariate and the strength of its effect. In addition, it seems that logistic regression has difficulty separating the effects of inter-correlated time-dependent covariates. By contrast, each of the two versions of Cox's model systematically induces either a serious under-estimation or a moderate over-estimation bias. The magnitude of the latter bias is proportional to the true effect, suggesting that an improved manipulation of the risk sets may eliminate, or at least reduce, the bias. Copyright 2003 John Wiley & Sons, Ltd.
Savary, Serge; Delbac, Lionel; Rochas, Amélie; Taisant, Guillaume; Willocquet, Laetitia
2009-08-01
Dual epidemics are defined as epidemics developing on two or several plant organs in the course of a cropping season. Agricultural pathosystems where such epidemics develop are often very important, because the harvestable part is one of the organs affected. These epidemics also are often difficult to manage, because the linkage between epidemiological components occurring on different organs is poorly understood, and because prediction of the risk toward the harvestable organs is difficult. In the case of downy mildew (DM) and powdery mildew (PM) of grapevine, nonlinear modeling and logistic regression indicated nonlinearity in the foliage-cluster relationships. Nonlinear modeling enabled the parameterization of a transmission coefficient that numerically links the two components, leaves and clusters, in DM and PM epidemics. Logistic regression analysis yielded a series of probabilistic models that enabled predicting preset levels of cluster infection risks based on DM and PM severities on the foliage at successive crop stages. The usefulness of this framework for tactical decision-making for disease control is discussed.
Logistic Stick-Breaking Process
Ren, Lu; Du, Lan; Carin, Lawrence; Dunson, David B.
2013-01-01
A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. PMID:25258593
Wang, Zhao Dan; Li, Li Hua; Xia, Hui; Wang, Feng; Yang, Li Gang; Wang, Shao Kang; Sun, Gui Ju
2018-01-01
Oil extraction from onion was performed by steam distillation. Response surface methodology was applied to evaluate the effects of ratio of water to raw material, extraction time, zymolysis temperature and distillation time on the yield of onion oil. The maximum extraction yield (1.779%) was obtained under the following conditions: a ratio of water to raw material of 1, an extraction time of 2.5 h, a zymolysis temperature of 36 °C and a distillation time of 2.6 h. The experimental values agreed well with those predicted by the regression model. The chemical composition of the onion oil extracted under the optimum conditions was analysed by gas chromatography-mass spectrometry. The results showed that sulphur compounds, like alkanes, sulphide, alkenes, ester and alcohol, were the major components of onion oil.
Wang, Hong-wu; Liu, Yan-qing; Wang, Yuan-hong
2011-07-01
To investigate the ultrasonic-assisted extraction of total flavonoids from leaves of Artocarpus heterophyllus, the effects of ethanol concentration, extraction time, and liquid-solid ratio on flavonoid yield were examined. A 17-run response surface design involving three factors at three levels was generated by the Design-Expert software, and the experimental data obtained were subjected to quadratic regression analysis to create a mathematical model describing flavonoid extraction. The optimum ultrasonic-assisted extraction conditions were an ethanol volume fraction of 69.4% and a liquid-solid ratio of 22.6:1 for 32 min. Under these optimized conditions, the yield of flavonoids was 7.55 mg/g. The Box-Behnken design and response surface analysis can well optimize the ultrasonic-assisted extraction of total flavonoids from Artocarpus heterophyllus.
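A 17-run, three-factor, three-level design is consistent with a standard Box-Behnken layout: 12 edge-midpoint runs plus center-point replicates. The sketch below builds such a design in coded units (-1, 0, +1). The choice of five center points is an assumption made here to match the 17-run total, and the function name is hypothetical. The abstract does not list the actual factor levels, so no mapping to real units is attempted.

```python
from itertools import combinations, product


def box_behnken(n_factors=3, n_center=5):
    """Box-Behnken design in coded units for a small number of factors.

    For every pair of factors, the pair takes all four (+/-1, +/-1)
    combinations while the remaining factors stay at 0; center-point
    replicates are appended at the end.
    """
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for level_i, level_j in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = level_i, level_j
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)
    return runs


design = box_behnken()
for run in design:
    print(run)
print(len(design), "runs")
```

Each non-center run varies two factors at a time, which avoids the extreme corner combinations of a full factorial while still supporting a quadratic model.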
The effects of heat stress in Italian Holstein dairy cattle.
Bernabucci, U; Biffani, S; Buggiotti, L; Vitali, A; Lacetera, N; Nardone, A
2014-01-01
The data set for this study comprised 1,488,474 test-day records for milk, fat, and protein yields and fat and protein percentages from 191,012 first-, second-, and third-parity Holstein cows from 484 farms. Data were collected from 2001 through 2007 and merged with meteorological data from 35 weather stations. A linear model (M1) was used to estimate the effects of the temperature-humidity index (THI) on production traits. Least squares means from M1 were used to detect the THI thresholds for milk production in all parities by using a 2-phase linear regression procedure (M2). A multiple-trait repeatability test-model (M3) was used to estimate variance components for all traits and a dummy regression variable (t) was defined to estimate the production decline caused by heat stress. Additionally, the estimated variance components and M3 were used to estimate traditional and heat-tolerance breeding values (estimated breeding values, EBV) for milk yield and protein percentages at parity 1. An analysis of data (M2) indicated that the daily THI at which milk production started to decline for the 3 parities and traits ranged from 65 to 76. These THI values can be achieved with different temperature/humidity combinations with a range of temperatures from 21 to 36°C and relative humidity values from 5 to 95%. The highest negative effect of THI was observed 4 d before test day over the 3 parities for all traits. The negative effect of THI on production traits indicates that first-parity cows are less sensitive to heat stress than multiparous cows. Over the parities, the general additive genetic variance decreased for protein content and increased for milk yield and fat and protein yield. Additive genetic variance for heat tolerance showed an increase from the first to third parity for milk, protein, and fat yield, and for protein percentage. Genetic correlations between general and heat stress effects were all unfavorable (from -0.24 to -0.56). 
Three EBV per trait were calculated for each cow and bull (traditional EBV, traditional EBV estimated with the inclusion of THI covariate effect, and heat tolerance EBV) and the rankings of EBV for 283 bulls born after 1985 with at least 50 daughters were compared. When THI was included in the model, the ranking for 17 and 32 bulls changed for milk yield and protein percentage, respectively. The heat tolerance genetic component is not negligible, suggesting that heat tolerance selection should be included in the selection objectives. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
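The threshold detection described above (a 2-phase linear regression on least squares means) can be illustrated with a minimal broken-stick sketch. This is not the authors' M2 procedure, whose details the abstract does not give; the grid search, the function names, and the noise-free synthetic data (a plateau of 30 kg up to THI 70, then a loss of 0.5 kg per THI unit) are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns the SSE."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def two_phase_fit(thi, y, candidates):
    """Grid search over candidate thresholds.

    For each threshold t the model is flat below t and linear above it:
    y = plateau + slope * max(0, THI - t). The t with the smallest SSE wins.
    """
    best = None
    for t in candidates:
        z = [max(0.0, x - t) for x in thi]
        plateau, slope, sse = fit_line(z, y)
        if best is None or sse < best[3]:
            best = (t, plateau, slope, sse)
    return best


# Hypothetical, noise-free data (not taken from the study).
thi = list(range(60, 81))
milk = [30.0 - 0.5 * max(0, x - 70) for x in thi]

threshold, plateau, slope, sse = two_phase_fit(thi, milk, range(61, 80))
print(threshold, round(plateau, 3), round(slope, 3))
```

With real least squares means the SSE profile is noisy, so the grid spacing and the handling of ties would need more care than this sketch takes.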
Computational tools for exact conditional logistic regression.
Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P
Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Lee, Kang Il
2012-08-01
The present study aims to provide insight into the relationships of the phase velocity with the microarchitectural parameters in bovine trabecular bone in vitro. The frequency-dependent phase velocity was measured in 22 bovine femoral trabecular bone samples by using a pair of transducers with a diameter of 25.4 mm and a center frequency of 0.5 MHz. The phase velocity exhibited positive correlation coefficients of 0.48 and 0.32 with the ratio of bone volume to total volume and the trabecular thickness, respectively, but a negative correlation coefficient of -0.62 with the trabecular separation. The best univariate predictor of the phase velocity was the trabecular separation, yielding an adjusted squared correlation coefficient of 0.36. The multivariate regression models yielded adjusted squared correlation coefficients of 0.21-0.36. The theoretical phase velocity predicted by using a stratified model for wave propagation in periodically stratified media consisting of alternating parallel solid-fluid layers showed reasonable agreement with the experimental measurements.
Schmidt, Heinar; Scheier, Rico; Hopkins, David L
2013-01-01
A prototype handheld Raman system was used as a rapid non-invasive optical device to measure raw sheep meat to estimate cooked meat tenderness and cooking loss. Raman measurements were conducted on m. longissimus thoracis et lumborum samples from two sheep flocks from two different origins which had been aged for five days at 3-4°C before deep freezing and further analysis. The Raman data of 140 samples were correlated with shear force and cooking loss data using PLS regression. Both sample origins could be discriminated and separate correlation models yielded better correlations than the joint correlation model. For shear force, R² = 0.79 and R² = 0.86 were obtained for the two sites. Results for cooking loss were comparable: separate models yielded R² = 0.79 and R² = 0.83 for the two sites. The results show the potential usefulness of Raman spectra which can be recorded during meat processing for the prediction of quality traits such as tenderness and cooking loss. Copyright © 2012 Elsevier Ltd. All rights reserved.
Box-Behnken design for investigation of microwave-assisted extraction of patchouli oil
NASA Astrophysics Data System (ADS)
Kusuma, Heri Septya; Mahfud, Mahfud
2015-12-01
Microwave-assisted extraction (MAE) was employed to extract the essential oil from patchouli (Pogostemon cablin). The optimal conditions for microwave-assisted extraction of patchouli oil were determined by response surface methodology. A Box-Behnken design (BBD) was applied to evaluate the effects of three independent variables (microwave power (A: 400-800 W), plant material to solvent ratio (B: 0.10-0.20 g mL-1) and extraction time (C: 20-60 min)) on the extraction yield of patchouli oil. The correlation analysis of the mathematical-regression model indicated that a quadratic polynomial model could be employed to optimize the microwave extraction of patchouli oil. The optimal extraction conditions were microwave power 634.024 W, plant material to solvent ratio 0.147648 g mL-1 and extraction time 51.6174 min. The maximum patchouli oil yield was 2.80516% under these optimal conditions. Under these conditions, the experimental values agreed with the values predicted by the model, as confirmed by analysis of variance. This indicated a good fit of the model and the success of response surface methodology in optimizing the extraction conditions.
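As an illustration of the design named in this abstract, the sketch below builds a three-factor Box-Behnken design in coded units (-1, 0, +1) and maps it onto the factor ranges quoted above. The number of centre runs (three) and all function names are assumptions; the abstract does not report the actual run list.

```python
from itertools import combinations, product


def box_behnken(n_factors, n_center=3):
    """Coded Box-Behnken runs: every pair of factors at +/-1, the rest at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated centre points (the count is an assumption, not from the abstract).
    runs += [(0,) * n_factors] * n_center
    return runs


# Factor ranges quoted in the abstract: power (W), ratio (g/mL), time (min).
RANGES = [(400.0, 800.0), (0.10, 0.20), (20.0, 60.0)]


def decode(run):
    """Map coded levels -1/0/+1 to low/middle/high actual values."""
    return tuple(lo + (c + 1) / 2 * (hi - lo) for c, (lo, hi) in zip(run, RANGES))


design = box_behnken(3)
for run in design:
    print(run, decode(run))
```

A quadratic response-surface model would then be fitted to the yields measured at these runs; that step is omitted here because it needs the experimental data.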
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carpenter, Daniel; Westover, Tyler; Howe, Daniel
Here, we report on an experimental study to produce refinery-ready fuel blendstocks via catalytic hydrodeoxygenation (upgrading) of pyrolysis oil using several biomass feedstocks and various blends. Blends were tested along with the pure materials to determine the effect of blending on product yields and qualities. Within experimental error, oil yields from fast pyrolysis and upgrading are shown to be linear functions of the blend components. Switchgrass exhibited lower fast pyrolysis and upgrading yields than the woody samples, which included clean pine, oriented strand board (OSB), and a mix of pinon and juniper (PJ). The notable exception was PJ, for which the poor upgrading yield of 18% was likely associated with the very high viscosity of the PJ fast pyrolysis oil (947 cp). The highest fast pyrolysis yield (54% dry basis) was obtained from clean pine, while the highest upgrading yield (50%) was obtained from a blend of 80% clean pine and 20% OSB (CP8OSB2). For switchgrass, reducing the fast pyrolysis temperature to 450 °C resulted in a significant increase in the pyrolysis oil yield and reduced hydrogen consumption during hydrotreating, but did not directly affect the hydrotreating oil yield. The water content of fast pyrolysis oils was also observed to increase linearly with the summed content of potassium and sodium, ranging from 21% for clean pine to 37% for switchgrass. Multiple linear regression models demonstrate that fast pyrolysis is strongly dependent upon the contents of lignin and volatile matter as well as the sum of potassium and sodium.
Robust inference in the negative binomial regression model with an application to falls data.
Aeberhard, William H; Cantoni, Eva; Heritier, Stephane
2014-12-01
A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well-known to be sensitive to model misspecifications, taking the form of patients falling much more than expected in such intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
NASA Astrophysics Data System (ADS)
G, M. Yasin H.; Isnaeni, M.; Faesal; Azrai, M.
2018-05-01
Interaction between genotypes (G), environment (E) and season (S) in anthocyanin corn was studied in the lowland zone of Indonesia. The experiments were conducted using a Randomized Complete Block Design (RCBD) with three replications at three locations, namely Maros, Bajeng, and Polman. Ten populations of open pollinated varieties (OPV) of anthocyanin purple corn, including checks, were used in two planting seasons (dry and rainy) in 2015/2016. The objective of the experiment was to determine which populations are stable and high yielding, to be promoted as candidates for new OPV varieties. Genotypes were planted in four rows of 5.0 m length, at a spacing of 75 cm x 20 cm, one plant per hill, and fertilized with Urea and Ponska (300 and 200 kg ha-1). Populations were then selected by t test in the model $Y_{ij} = \mu + \beta_i I_j + \delta_{ij}$, $i \neq j$ (Y: yield, µ: mean, β: regression coefficient, I: environmental index, δ: deviation from regression), with stability parameter $b_i = \sum_j Y_{ij} I_j / \sum_j I_j^2$. The results show that there was a significant G x E x S interaction for PMU(S1).Synth.F.C1 and PPU(S1).F.C1, with yield ranging between 6.70 and 8.48 t ha-1 and grain potential found in the locations of Maros and Bajeng in the rainy season. Anthocyanin content was between 37.15 and 51.92 µg per 100 g sample.
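The stability parameter in this abstract is the regression of each genotype's yield on an environmental index. A minimal sketch follows, assuming the usual Eberhart-Russell convention that the index of an environment is its mean yield over all genotypes minus the grand mean (the abstract does not define the index). The genotype labels and yield figures are invented for illustration.

```python
# Hypothetical yields (t/ha) for three genotypes in four environments.
yields = {
    "G1": [6.0, 7.0, 8.0, 9.0],
    "G2": [6.5, 7.0, 7.5, 8.0],
    "G3": [5.5, 7.0, 8.5, 10.0],
}


def stability_coefficients(yields):
    """Return b_i = sum_j(Y_ij * I_j) / sum_j(I_j ** 2) for each genotype."""
    genotypes = list(yields)
    n_env = len(yields[genotypes[0]])
    # Environmental index: environment mean minus grand mean (assumed convention).
    env_means = [
        sum(yields[g][j] for g in genotypes) / len(genotypes) for j in range(n_env)
    ]
    grand_mean = sum(env_means) / n_env
    index = [m - grand_mean for m in env_means]
    ss_index = sum(i * i for i in index)
    return {
        g: sum(y * i for y, i in zip(yields[g], index)) / ss_index for g in genotypes
    }


b = stability_coefficients(yields)
for genotype, coefficient in b.items():
    print(genotype, round(coefficient, 3))
```

Under this convention the coefficients average to 1 across genotypes; a value near 1 with small deviations from regression is commonly read as stable performance.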
The intermediate endpoint effect in logistic and probit regression
MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM
2010-01-01
Background An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. 
A common method based on difference in coefficient methods can lead to distorted conclusions regarding the intermediate effect. PMID:17942466
Prakash Maran, J; Manikandan, S; Thirugnanasambandham, K; Vigna Nivetha, C; Dinesh, R
2013-01-30
In this study, the effects of ultrasound assisted extraction (UAE) conditions on the yield of polysaccharide from corn silk were studied using a three-factor, three-level Box-Behnken response surface design. Process parameters that affect the efficiency of UAE, such as extraction temperature (40-60 °C), time (10-30 min) and solid-liquid ratio (1:10-1:30 g/ml), were investigated. The results showed that the extraction conditions have significant effects on the extraction yield of polysaccharide. The experimental data were fitted to a second-order polynomial equation using multiple regression analysis, with a high coefficient of determination (R²) of 0.994. An optimization study using Derringer's desirability function methodology was performed and the optimal conditions based on both individual and combinations of all independent variables (extraction temperature of 56 °C, time of 17 min and solid-liquid ratio of 1:20 g/ml) were determined with maximum polysaccharide yield of 6.06%, which was confirmed through validation experiments. Copyright © 2012 Elsevier Ltd. All rights reserved.
Optimization of isolation of cellulose from orange peel using sodium hydroxide and chelating agents.
Bicu, Ioan; Mustata, Fanica
2013-10-15
Response surface methodology was used to optimize cellulose recovery from orange peel using sodium hydroxide (NaOH) as isolation reagent, and to minimize its ash content using ethylenediaminetetraacetic acid (EDTA) as chelating agent. The independent variables were NaOH charge, EDTA charge and cooking time. Two other parameters were held constant: cooking temperature (98 °C) and liquid-to-solid ratio (7.5). The dependent variables were cellulose yield and ash content. A second-order polynomial model was used for plotting response surfaces and for determining optimum cooking conditions. The analysis of coefficient values for independent variables in the regression equation showed that NaOH and EDTA charges were major factors influencing the cellulose yield and ash content, respectively. Optimum conditions were defined by: NaOH charge 38.2%, EDTA charge 9.56%, and cooking time 317 min. The predicted cellulose yield was 24.06% and ash content 0.69%. Good agreement between the experimental and predicted values was observed. Copyright © 2013 Elsevier Ltd. All rights reserved.
Process optimization of an auger pyrolyzer with heat carrier using response surface methodology.
Brown, J N; Brown, R C
2012-01-01
A 1 kg/h auger reactor utilizing mechanical mixing of steel shot heat carrier was used to pyrolyze red oak wood biomass. Response surface methodology was employed using a circumscribed central composite design of experiments to optimize the system. Factors investigated were: heat carrier inlet temperature and mass flow rate, rotational speed of screws in the reactor, and volumetric flow rate of sweep gas. Conditions for maximum bio-oil and minimum char yields were high flow rate of sweep gas (3.5 standard L/min), high heat carrier temperature (∼600 °C), high auger speeds (63 RPM) and high heat carrier mass flow rates (18 kg/h). Regression models for bio-oil and char yields are described including identification of a novel interaction effect between heat carrier mass flow rate and auger speed. Results suggest that auger reactors, which are rarely described in literature, are well suited for bio-oil production. The reactor achieved liquid yields greater than 73 wt.%. Copyright © 2011 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely, a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of triggers and the LIF, the accuracy of the QLSM methods differs. Moreover, how to balance the number of 0's (nonoccurrence) and 1's (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1's and 0's to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods has been widely investigated in the literature, the challenge of training set construction has not been adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used for testing the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses, with parameters similar to those of NNS. It is found that the LR-PRS, FLR-PRS and BLR-Whole Data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki
2018-04-01
Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple easy-to-implement statistical tools be used to improve analysis of accelerometer data.
[Willingness of Patients with Obesity to Use New Media in Rehabilitation Aftercare].
Dorow, M; Löbner, M; Stein, J; Kind, P; Markert, J; Keller, J; Weidauer, E; Riedel-Heller, S G
2017-06-01
Digital media offer new possibilities in rehabilitation aftercare. This study investigates the rehabilitants' willingness to use new media (SMS, internet, social networks) in rehabilitation aftercare and factors that are associated with the willingness to use media-based aftercare. 92 rehabilitants (patients with obesity) filled in a questionnaire on the willingness to use new media in rehabilitation aftercare. In order to identify influencing factors, binary logistic regression models were calculated. Three quarters of the rehabilitants (76.1%) reported that they would be willing to use new media in rehabilitation aftercare. The binary logistic regression model yielded two factors that were associated with the willingness to use media-based aftercare: the possession of a smartphone and the willingness to receive telephone counseling for aftercare. The majority of the rehabilitants were willing to use new media in rehabilitation aftercare. The reasons for refusal of media-based aftercare need to be examined more closely. © Georg Thieme Verlag KG Stuttgart · New York.
Kang, Jae-Hyun; Kim, Suna; Moon, BoKyung
2016-08-15
In this study, we used response surface methodology (RSM) to optimize the extraction conditions for recovering lutein from paprika leaves using accelerated solvent extraction (ASE). The lutein content was quantitatively analyzed using a UPLC equipped with a BEH C18 column. A central composite design (CCD) was employed for experimental design to obtain the optimized combination of extraction temperature (°C), static time (min), and solvent (EtOH, %). The experimental data obtained from a twenty sample set were fitted to a second-order polynomial equation using multiple regression analysis. The adjusted coefficient of determination (R²) for the lutein extraction model was 0.9518, and the probability value (p=0.0000) demonstrated a high significance for the regression model. The optimum extraction conditions for lutein were temperature: 93.26°C, static time: 5 min, and solvent: 79.63% EtOH. Under these conditions, the predicted extraction yield of lutein was 232.60 μg/g. Copyright © 2016 Elsevier Ltd. All rights reserved.
Berzins, Tiffany L.; Garcia, Antonio F.; Acosta, Melina; Osman, Augustine
2017-01-01
Two instrument validation studies broadened the research literature exploring the factor structure, internal consistency reliability, and concurrent validity of scores on the Social Anxiety and Depression Life Interference-24 Inventory (SADLI-24; Osman, Bagge, Freedenthal, Guiterrez, & Emmerich, 2011). Study 1 (N = 1065) was undertaken to concurrently appraise three competing factor models for the instrument: a unidimensional model, a two-factor oblique model and a bifactor model. The bifactor model provided the best fit to the study sample data. Study 2 (N = 220) extended the results from Study 1 with an investigation of the convergent and discriminant validity for the bifactor model of the SADLI-24 with multiple regression analyses and scale-level exploratory structural equation modeling. This project yields data that augment the initial instrument development investigations for the target measure. PMID:28781401
A new approach to synthesis of benzyl cinnamate: Optimization by response surface methodology.
Zhang, Dong-Hao; Zhang, Jiang-Yan; Che, Wen-Cai; Wang, Yun
2016-09-01
In this work, a new approach to the synthesis of benzyl cinnamate by enzymatic esterification of cinnamic acid with benzyl alcohol is optimized by response surface methodology. The effects of various reaction conditions, including temperature, enzyme loading, substrate molar ratio of benzyl alcohol to cinnamic acid, and reaction time, are investigated. A 5-level-4-factor central composite design is employed to search for the optimal yield of benzyl cinnamate. A quadratic polynomial regression model is used to analyze the experimental data at a 95% confidence level (P<0.05). The coefficient of determination of this model is found to be 0.9851. Three sets of optimum reaction conditions are established, and verification trials are performed to validate the optimum points. Under the optimum conditions (40°C, 31 mg/mL enzyme loading, 2.6:1 molar ratio, 27 h), the yield reaches 97.7%, which provides an efficient process for industrial production of benzyl cinnamate. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Tien, Hai Minh; Le, Kien Anh; Le, Phung Thi Kim
2017-09-01
Biohydrogen is a sustainable energy resource owing to its potentially high efficiency of conversion to usable power and its non-polluting nature. In this work, experiments were carried out to demonstrate the possibility of generating biohydrogen from cassava starch and to identify the influential factors and the optimum conditions. An experimental design was used to investigate the effects of operating temperature (37-43 °C), pH (6-7), and inoculum ratio (6-10%) on the hydrogen production yield, the COD reduction, and the ratio of the volume of hydrogen produced to the COD reduction. The statistical analysis indicated that the significant effects on the fermentation yield were the main effects of temperature, pH and inoculum ratio, whereas the interaction effects between them did not appear to be significant. The central composite design showed that the polynomial regression models were in good agreement with the experimental results. This result will be applied to enhance the treatment of cassava starch processing wastewater.
Predictability of tick-borne encephalitis fluctuations.
Zeman, P
2017-10-01
Tick-borne encephalitis is a serious arboviral infection with unstable dynamics and profound inter-annual fluctuations in case numbers. A dependable predictive model has been sought since the discovery of the disease. The present study demonstrates that four superimposed cycles, approximately 2·4, 3, 5·4, and 10·4 years long, can account for three-fifths of the variation in the disease fluctuations over central Europe. Using harmonic regression, these cycles can be projected into the future, yielding forecasts of sufficient accuracy for up to 4 years ahead. For the years 2016-2018, this model predicts elevated incidence levels in most parts of the region.
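The harmonic regression mentioned in this abstract can be sketched as an ordinary least squares fit of a constant plus one cosine and one sine term per cycle. Only the four cycle lengths come from the abstract; the annual sampling, the coefficients used to generate the synthetic series, and the function names are assumptions, and no real incidence data are involved.

```python
import math

PERIODS = [2.4, 3.0, 5.4, 10.4]  # cycle lengths in years, as quoted in the abstract


def design_row(t):
    """Regressors at time t: intercept, then cos and sin for each cycle."""
    row = [1.0]
    for period in PERIODS:
        angle = 2.0 * math.pi * t / period
        row += [math.cos(angle), math.sin(angle)]
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_harmonics(times, values):
    """Least squares via the normal equations (X'X) beta = X'y."""
    rows = [design_row(t) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, t):
    return sum(c * x for c, x in zip(coef, design_row(t)))


# Synthetic, noise-free annual series built from made-up coefficients.
true_coef = [10.0, 2.0, -1.0, 1.5, 0.5, -0.8, 1.2, 3.0, 0.7]
years = list(range(40))
series = [predict(true_coef, t) for t in years]

coef = fit_harmonics(years, series)
forecast = [predict(coef, t) for t in range(40, 44)]  # four years ahead
print([round(v, 3) for v in forecast])
```

On real, noisy case counts the forecast error grows with the horizon, which is consistent with the abstract limiting its claim to about four years ahead.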
Laidig, Friedrich; Piepho, Hans-Peter; Rentel, Dirk; Drobek, Thomas; Meyer, Uwe; Huesken, Alexandra
2017-05-01
Grain yield of hybrid varieties and population varieties in official German variety trials increased by 23.3 and 18.1%, respectively, over the last 26 years. On-farm gain in grain yield (18.9%) was comparable to that of population varieties in variety trials, yet at a level considerably lower than in variety trials. Rye quality is subject to large year-to-year fluctuation. Increase in grain yield and decline of protein concentration did not negatively influence quality traits. Performance progress of grain and quality traits of 78 winter rye varieties tested in official German trials to assess the value for cultivation and use (VCU) was evaluated for the period from 1989 to 2014. We dissected progress into a genetic and a non-genetic component for hybrid and population varieties by applying mixed models, including regression components to model trends. VCU trial results were compared with grain yield and quality data from a national harvest survey (on-farm data). Yield gain for hybrid varieties was 23.3% (18.9 dt ha-1) and for population varieties 18.1% (13.0 dt ha-1) relative to 1989. On-farm yield progress of 18.9% (8.7 dt ha-1) was considerably lagging behind VCU trials, and mean yield levels were substantially lower than in field trials. Most of the yield progress was generated by genetic improvement. For hybrid varieties, ear density was the determining yield component, whereas for population varieties, it was thousand grain mass. Results for VCU trials showed no statistically significant gains or losses in rye quality traits. For on-farm data, we found a positive but non-significant gain in falling number and amylogram viscosity and temperature. Variation of grain and quality traits was strongly influenced by environments, whereas genotypic variation was less than 19% of total variation. Grain yield was strongly negatively associated with protein concentration, yet was weakly to moderately positively associated with quality traits.
In general, our results from VCU trials and on-farm data indicated that increasing grain yield and decreasing protein concentration did not negatively affect rye quality traits.
Estimation of genetic parameters for milk yield in Murrah buffaloes by Bayesian inference.
Breda, F C; Albuquerque, L G; Euclydes, R F; Bignardi, A B; Baldi, F; Torres, R A; Barbosa, L; Tonhati, H
2010-02-01
Random regression models were used to estimate genetic parameters for test-day milk yield in Murrah buffaloes using Bayesian inference. Data comprised 17,935 test-day milk records from 1,433 buffaloes. Twelve models were tested using different combinations of third-, fourth-, fifth-, sixth-, and seventh-order orthogonal polynomials of weeks of lactation for additive genetic and permanent environmental effects. All models included the fixed effects of contemporary group, number of daily milkings and age of cow at calving as covariate (linear and quadratic effect). In addition, residual variances were considered to be heterogeneous with 6 classes of variance. Models were selected based on the residual mean square error, weighted average of residual variance estimates, and estimates of variance components, heritabilities, correlations, eigenvalues, and eigenfunctions. Results indicated that changes in the order of fit for additive genetic and permanent environmental random effects influenced the estimation of genetic parameters. Heritability estimates ranged from 0.19 to 0.31. Genetic correlation estimates were close to unity between adjacent test-day records, but decreased gradually as the interval between test-days increased. Results from mean squared error and weighted averages of residual variance estimates suggested that a model considering sixth- and seventh-order Legendre polynomials for additive and permanent environmental effects, respectively, and 6 classes for residual variances, provided the best fit. Nevertheless, this model presented the largest degree of complexity. A more parsimonious model, with fourth- and sixth-order polynomials, respectively, for these same effects, yielded very similar genetic parameter estimates. Therefore, this last model is recommended for routine applications. Copyright 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Bin; Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013; Lin, Boqiang, E-mail: bqlin@xmu.edu.cn
China is currently the world's largest carbon dioxide (CO2) emitter. Moreover, total energy consumption and CO2 emissions in China will continue to increase due to the rapid growth of industrialization and urbanization. Therefore, vigorously developing the high-tech industry becomes an inevitable choice to reduce CO2 emissions at the moment or in the future. However, ignoring the existing nonlinear links between economic variables, most scholars use traditional linear models to explore the impact of the high-tech industry on CO2 emissions from an aggregate perspective. Few studies have focused on nonlinear relationships and regional differences in China. Based on panel data of 1998-2014, this study uses the nonparametric additive regression model to explore the nonlinear effect of the high-tech industry from a regional perspective. The estimated results show that the residual sums of squares (SSR) of the nonparametric additive regression model in the eastern, central and western regions are 0.693, 0.054 and 0.085 respectively, which are much smaller than those of the traditional linear regression model (3.158, 4.227 and 7.196). This verifies that the nonparametric additive regression model has a better fitting effect. Specifically, the high-tech industry produces an inverted “U-shaped” nonlinear impact on CO2 emissions in the eastern region, but a positive “U-shaped” nonlinear effect in the central and western regions. Therefore, the nonlinear impact of the high-tech industry on CO2 emissions in the three regions should be given adequate attention in developing effective abatement policies. - Highlights: • The nonlinear effect of the high-tech industry on CO2 emissions was investigated. • The high-tech industry yields an inverted “U-shaped” effect in the eastern region. • The high-tech industry has a positive “U-shaped” nonlinear effect in other regions.
• The linear impact of the high-tech industry in the eastern region is the strongest.
Yamakado, Minoru; Tanaka, Takayuki; Nagao, Kenji; Imaizumi, Akira; Komatsu, Michiharu; Daimon, Takashi; Miyano, Hiroshi; Tani, Mizuki; Toda, Akiko; Yamamoto, Hiroshi; Horimoto, Katsuhisa; Ishizaka, Yuko
2017-11-03
Fatty liver disease (FLD) increases the risk of diabetes, cardiovascular disease, and steatohepatitis, which leads to fibrosis, cirrhosis, and hepatocellular carcinoma. Thus, the early detection of FLD is necessary. We aimed to find a quantitative and feasible model for discriminating the FLD, based on plasma free amino acid (PFAA) profiles. We constructed models of the relationship between PFAA levels in 2,000 generally healthy Japanese subjects and the diagnosis of FLD by abdominal ultrasound scan by multiple logistic regression analysis with variable selection. The performance of these models for FLD discrimination was validated using an independent data set of 2,160 subjects. The generated PFAA-based model was able to identify FLD patients. The area under the receiver operating characteristic curve for the model was 0.83, which was higher than those of other existing liver function-associated markers ranging from 0.53 to 0.80. The value of the linear discriminant in the model yielded the adjusted odds ratio (with 95% confidence intervals) for a 1 standard deviation increase of 2.63 (2.14-3.25) in the multiple logistic regression analysis with known liver function-associated covariates. Interestingly, the linear discriminant values were significantly associated with the progression of FLD, and patients with nonalcoholic steatohepatitis also exhibited higher values.
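The AUC of 0.83 reported above can be read as the probability that a randomly chosen FLD subject receives a higher model score than a randomly chosen non-FLD subject. A minimal sketch of that rank-based calculation follows; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical discriminant scores, for illustration only.
fld_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2]
auc = roc_auc(fld_scores, healthy_scores)
```

The pairwise form is quadratic in sample size, which is fine for a sketch; a sort-based rank formula would be the usual choice for thousands of subjects.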
Calabrò, P S; Catalán, E; Folino, A; Sánchez, A; Komilis, D
2018-01-01
Opuntia ficus-indica (OFI) is an emerging biomass that has the potential to be used as substrate in anaerobic digestion. The goal of this work was to investigate the effect of three pretreatment techniques (thermal, alkaline, acidic) on the chemical composition and the methane yield of OFI biomass. A composite experimental design with three factors and two to three levels was implemented, and regression modelling was employed using a total of 10 biochemical methane potential (BMP) tests. The measured methane yields ranged from 289 to 604 NmL/gVS added; according to the results, only the acidic pretreatment (HCl) was found to significantly increase methane generation. However, as the experimental values were quite high with regard to the theoretical methane yield of the substrate, this effect still needs to be confirmed via further research. The alkaline pretreatment (NaOH) did not noticeably affect methane yields (an average reduction of 8% was recorded), despite the fact that it did significantly reduce the lignin content. Thermal pretreatment had no effect on the methane yields or the chemical composition. Scanning electron microscopy images revealed changes in the chemical structure after the addition of NaOH and HCl. Modelling of the cumulative methane production by the modified Gompertz equation was successful and aided in understanding kinetic advantages linked to some of the pretreatments. For example, the alkaline treatment (at the 20% dosage) at room temperature resulted in a μmax (maximum specific methane production rate [NmL CH4/(gVS added·d)]) equal to 36.3 against 18.6 for the control.
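The abstract does not state which parameterization of the modified Gompertz equation was fitted. The sketch below assumes the form widely used for BMP curves, M(t) = P·exp(−exp(μmax·e/P·(λ − t) + 1)), where P is the ultimate methane yield, μmax the maximum production rate, and λ the lag time. The μmax values are those quoted above; the P and λ values are hypothetical placeholders.

```python
import math


def modified_gompertz(t, p_ultimate, mu_max, lag):
    """Cumulative methane yield at time t (days), assumed Gompertz form.

    p_ultimate: asymptotic yield; mu_max: maximum production rate;
    lag: lag-phase duration. The curve's steepest slope equals mu_max.
    """
    inner = mu_max * math.e / p_ultimate * (lag - t) + 1.0
    return p_ultimate * math.exp(-math.exp(inner))


# mu_max from the abstract; P = 500 and lag = 1 day are assumptions.
P_ASSUMED, LAG_ASSUMED = 500.0, 1.0
alkaline_day5 = modified_gompertz(5.0, P_ASSUMED, 36.3, LAG_ASSUMED)
control_day5 = modified_gompertz(5.0, P_ASSUMED, 18.6, LAG_ASSUMED)
```

With the same assumed P and λ, the curve with the larger μmax reaches any given yield sooner, which is the kind of kinetic advantage the abstract refers to.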
Zheljazkov, Valtcho D.; Gawde, Archana; Cantrell, Charles L.; Astatkie, Tess; Schlegel, Vicki
2015-01-01
A steam distillation extraction kinetics experiment was conducted to estimate essential oil yield, composition, antimalarial, and antioxidant capacity of cumin (Cuminum cyminum L.) seed (fruits). Furthermore, regression models were developed to predict essential oil yield and composition for a given duration of the steam distillation time (DT). Ten DT durations were tested in this study: 5, 7.5, 15, 30, 60, 120, 240, 360, 480, and 600 min. Oil yields increased with an increase in the DT. Maximum oil yield (content, 2.3 g/100 seed), was achieved at 480 min; longer DT did not increase oil yields. The concentrations of the major oil constituents α-pinene (0.14–0.5% concentration range), β-pinene (3.7–10.3% range), γ-cymene (5–7.3% range), γ-terpinene (1.8–7.2% range), cumin aldehyde (50–66% range), α-terpinen-7-al (3.8–16% range), and β-terpinen-7-al (12–20% range) varied as a function of the DT. The concentrations of α-pinene, β-pinene, γ-cymene, γ-terpinene in the oil increased with the increase of the duration of the DT; α-pinene was highest in the oil obtained at 600 min DT, β-pinene and γ-terpinene reached maximum concentrations in the oil at 360 min DT; γ-cymene reached a maximum in the oil at 60 min DT, cumin aldehyde was high in the oils obtained at 5–60 min DT, and low in the oils obtained at 240–600 min DT, α-terpinen-7-al reached maximum in the oils obtained at 480 or 600 min DT, whereas β-terpinen-7-al reached a maximum concentration in the oil at 60 min DT. The yield of individual oil constituents (calculated from the oil yields and the concentration of a given compound at a particular DT) increased and reached a maximum at 480 or 600 min DT. The antimalarial activity of the cumin seed oil obtained during the 0–5 and at 5–7.5 min DT timeframes was twice higher than the antimalarial activity of the oils obtained at the other DT. This study opens the possibility for distinct marketing and utilization for these improved oils. 
The antioxidant capacity of the oil was highest in the oil obtained at 30 min DT and lowest in the oil from 360 min DT. The Michaelis-Menten and the Power nonlinear regression models developed in this study can be utilized to predict essential oil yield and composition of cumin seed at any given duration of DT and may also be useful to compare previous reports on cumin oil yield and composition. DT can be utilized to obtain cumin seed oil with improved antimalarial activity, improved antioxidant capacity, and with various compositions. PMID:26641276
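The two model families named above can be sketched as simple functions of distillation time. The parameterizations below (y = y_max·t/(k + t) and y = a·t^b) are the textbook forms and are assumed here, since the abstract does not print the fitted equations. The power model is fitted by ordinary least squares on log-transformed values, which is a linearization chosen for brevity and not necessarily the authors' estimation method. The response values are synthetic.

```python
import math

# Distillation times (min) listed in the abstract.
DT_MIN = [5, 7.5, 15, 30, 60, 120, 240, 360, 480, 600]


def michaelis_menten(t, y_max, k):
    """Saturating response: approaches y_max, reaching y_max/2 at t = k."""
    return y_max * t / (k + t)


def power_model(t, a, b):
    """Power-law response y = a * t**b."""
    return a * t ** b


def fit_power_loglog(ts, ys):
    """Estimate (a, b) by least squares on log(y) = log(a) + b*log(t)."""
    lx = [math.log(t) for t in ts]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum(
        (x - mx) ** 2 for x in lx
    )
    a = math.exp(my - b * mx)
    return a, b


# Synthetic, noise-free responses generated from known parameters.
synthetic = [power_model(t, 0.5, 0.25) for t in DT_MIN]
a_hat, b_hat = fit_power_loglog(DT_MIN, synthetic)
```

On real, noisy data a direct nonlinear least-squares fit would weight the observations differently from the log-log fit, so the two can give somewhat different estimates.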
NASA Astrophysics Data System (ADS)
Meroni, M.; LEO, O.; Lopez-Lozano, R.; Baruth, B.; Duveiller, G.; Garcia-Condado, S.; Hooker, J.; Seguini, L.
2014-12-01
The site-specific relationship between EO indicators and actual crop yields has been explored in many different studies, describing semi-empirical regression models between spatially aggregated biophysical parameters or vegetation indices and observed yields (from field measurements or official statistics). However, when considering larger extents, from countries to continents, agro-climatic conditions and crop management may differ substantially among regions, and these differences may greatly influence the relationship between biophysical indicators and the observed yields, which may also be driven by limiting factors other than green biomass formation. The present study aims to better assess the contribution of EO indicators within an operational crop yield forecasting system in Europe and neighbouring countries, by evaluating how these geographic differences influence the relationship between biophysical indicators and crop yield. We therefore explore, as a first step, the correspondence between fAPAR time-series (1999-2013) and the inter-annual yield variability of wheat, barley and grain maize, at sub-national level across Europe (270-450 Administrative Units, depending on crop). In a second step, we map the agro-climatic contexts in which EO indicators better explain the observed inter-annual yield variability, identify the influence of some meteorological events on the fAPAR-yield relationship and provide some recommendations for further investigation. The results indicate that in water-limited environments (e.g. Mediterranean and Black Sea areas), fAPAR is highly correlated with yields, whereas in northern Europe, crop yield appears much less limited by leaf area expansion along the season, and the relationship between yield and EO products becomes more difficult to interpret.
Factors related to well yield in the fractured-bedrock aquifer of New Hampshire
Moore, Richard Bridge; Schwartz, Gregory E.; Clark, Stewart F.; Walsh, Gregory J.; Degnan, James R.
2002-01-01
The New Hampshire Bedrock Aquifer Assessment was designed to provide information that can be used by communities, industry, professional consultants, and other interests to evaluate the ground-water development potential of the fractured-bedrock aquifer in the State. The assessment was done at statewide, regional, and well field scales to identify relations that potentially could increase the success in locating high-yield water supplies in the fractured-bedrock aquifer. Statewide, data were collected for well construction and yield information, bedrock lithology, surficial geology, lineaments, topography, and various derivatives of these basic data sets. Regionally, geologic, fracture, and lineament data were collected for the Pinardville and Windham quadrangles in New Hampshire. The regional scale of the study examined the degree to which predictive well-yield relations, developed as part of the statewide reconnaissance investigation, could be improved by use of quadrangle-scale geologic mapping. Beginning in 1984, water-well contractors in the State were required to report detailed information on newly constructed wells to the New Hampshire Department of Environmental Services (NHDES). The reports contain basic data on well construction, including six characteristics used in this study: well yield, well depth, well use, method of construction, date drilled, and depth to bedrock (or length of casing). The NHDES has determined accurate georeferenced locations for more than 20,000 wells reported since 1984. The availability of this large data set provided an opportunity for a statistical analysis of bedrock-well yields. Well yields in the database ranged from zero to greater than 500 gallons per minute (gal/min). Multivariate regression was used as the primary statistical method of analysis because it is the most efficient tool for predicting a single variable with many potentially independent variables.
The dependent variable that was explored in this study was the natural logarithm (ln) of the reported well yield. One complication with using well yield as a dependent variable is that yield also is a function of demand. An innovative statistical technique that involves the use of instrumental variables was implemented to compensate for the effect of demand on well yield. Results of the multivariate-regression model show that a variety of factors are either positively or negatively related to well yields. Using instrumental variables, well depth is positively related to total well yield. Other factors that were found to be positively related to well yield include (1) distance to the nearest waterbody; (2) size of the drainage area upgradient of a well; (3) well location in swales or valley bottoms in the Massabesic Gneiss Complex and Breakfast Hill Granite; (4) well proximity to lineaments, identified using high-altitude (1:80,000-scale) aerial photography, which are correlated with the primary fracture direction (regional analysis); (5) use of a cable tool rig for well drilling; and (6) wells drilled for commercial or public supply. Factors negatively related to well yields include sites underlain by foliated plutons, sites on steep slopes, sites at high elevations, and sites on hilltops. Additionally, seven detailed geologic map units, identified during the detailed geologic mapping of the Pinardville and Windham quadrangles, were found to be positively or negatively related to well yields. Twenty-four geologic map units, depicted on the Bedrock Geologic Map of New Hampshire, also were found to be positively or negatively related to well yields. Maps or geographic information system (GIS) data sets identifying areas of various yield probabilities clearly display model results.
Probability criteria developed in this investigation can be used to select areas where other techniques, such as geophysical techniques, can be applied to more closely identify potential drilling sites for high-yielding
Mooney, J J; Hedlin, H; Mohabir, P K; Vazquez, R; Nguyen, J; Ha, R; Chiu, P; Patel, K; Zamora, M R; Weill, D; Nicolls, M R; Dhillon, G S
2016-04-01
Although controlled donation after circulatory determination of death (cDCDD) could increase the supply of donor lungs within the United States, the yield of lungs from cDCDD donors remains low compared with donation after neurologic determination of death (DNDD). To explore the reason for low lung yield from cDCDD donors, Scientific Registry of Transplant Recipient data were used to assess the impact of donor lung quality on cDCDD lung utilization by fitting a logistic regression model. The relationship between center volume and cDCDD use was assessed, and the distance between center and donor hospital was calculated by cDCDD status. Recipient survival was compared using a multivariable Cox regression model. Lung utilization was 2.1% for cDCDD donors and 21.4% for DNDD donors. Being a cDCDD donor decreased lung donation (adjusted odds ratio 0.101, 95% confidence interval [CI] 0.085-0.120). A minority of centers have performed cDCDD transplant, with higher volume centers generally performing more cDCDD transplants. There was no difference in center-to-donor distance or recipient survival (adjusted hazard ratio 1.03, 95% CI 0.78-1.37) between cDCDD and DNDD transplants. cDCDD lungs are underutilized compared with DNDD lungs after adjusting for lung quality. Increasing transplant center expertise and commitment to cDCDD lung procurement is needed to improve utilization. © Copyright 2015 The American Society of Transplantation and the American Society of Transplant Surgeons.
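For orientation, the two utilization rates quoted above imply a crude (unadjusted) odds ratio that can be computed directly. This is only arithmetic on the reported percentages; it does not reproduce the study's adjusted estimate of 0.101, which comes from a logistic regression controlling for donor lung quality.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_reference):
    """Ratio of the odds in the exposed group to the reference group."""
    return odds(p_exposed) / odds(p_reference)


# Lung utilization rates reported in the abstract.
CDCDD_RATE = 0.021
DNDD_RATE = 0.214
crude_or = odds_ratio(CDCDD_RATE, DNDD_RATE)  # roughly 0.08
```

The crude value sits close to, but slightly below, the adjusted odds ratio, consistent with the abstract's statement that underutilization persists after adjusting for lung quality.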
A Comparison Between Two OLS-Based Approaches to Estimating Urban Multifractal Parameters
NASA Astrophysics Data System (ADS)
Huang, Lin-Shan; Chen, Yan-Guang
Multifractal theory provides a new spatial analytical tool for urban studies, but many basic problems remain to be solved. Among various pending issues, the most significant one is how to obtain proper multifractal dimension spectrums. If an algorithm is improperly used, the parameter spectrums will be abnormal. This paper is devoted to investigating two ordinary least squares (OLS)-based approaches for estimating urban multifractal parameters. Using empirical study and comparative analysis, we demonstrate how to utilize the adequate linear regression to calculate multifractal parameters. The OLS regression analysis has two different approaches. One is that the intercept is fixed to zero, and the other is that the intercept is not limited. The results of comparative study show that the zero-intercept regression yields proper multifractal parameter spectrums within certain scale range of moment order, while the common regression method often leads to abnormal multifractal parameter values. A conclusion can be reached that fixing the intercept to zero is a more advisable regression method for multifractal parameters estimation, and the shapes of spectral curves and value ranges of fractal parameters can be employed to diagnose urban problems. This research is helpful for scientists to understand multifractal models and apply a more reasonable technique to multifractal parameter calculations.
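The two OLS variants compared in this record differ only in whether the intercept is estimated or fixed at zero. A minimal sketch of both estimators follows; in a multifractal setting the x and y values would typically be logarithms of scale and of a measure, but the data here are hypothetical and the function names are illustrative.

```python
def ols_with_intercept(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def ols_through_origin(xs, ys):
    """Return the least-squares slope with the intercept fixed at zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
free_slope, free_intercept = ols_with_intercept(xs, ys)
zero_slope = ols_through_origin(xs, ys)
```

When the data carry a genuine nonzero offset, as in this toy case, the zero-intercept slope differs from the free-intercept slope; the paper's argument is that for multifractal parameters the theoretical relation passes through the origin, so constraining the fit is appropriate there.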
Tests of a habitat suitability model for black-capped chickadees
Schroeder, Richard L.
1990-01-01
The black-capped chickadee (Parus atricapillus) Habitat Suitability Index (HSI) model provides a quantitative rating of the capability of a habitat to support breeding, based on measures related to food and nest site availability. The model assumption that tree canopy volume can be predicted from measures of tree height and canopy closure was tested using data from foliage volume studies conducted in the riparian cottonwood habitat along the South Platte River in Colorado. Least absolute deviations (LAD) regression showed that canopy cover and overstory tree height yielded volume predictions significantly lower than volume estimated by more direct methods. Revisions to these model relations resulted in improved predictions of foliage volume. The relation between the HSI and estimates of black-capped chickadee population densities was examined using LAD regression for both the original model and the model with the foliage volume revisions. Residuals from these models were compared to residuals from both a zero slope model and an ideal model. The fit model for the original HSI differed significantly from the ideal model, whereas the fit model for the revised HSI did not differ significantly from the ideal model. However, both the fit model for the original HSI and the fit model for the revised HSI did not differ significantly from a model with a zero slope. Although further testing of the revised model is needed, its use is recommended for more realistic estimates of tree canopy volume and habitat suitability.
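LAD regression minimizes the sum of absolute residuals, which makes it less sensitive to outliers than least squares. For a single predictor, an optimal LAD line can always be found among the lines passing through two of the data points, so a brute-force search over point pairs is enough for a small illustration. The sketch below uses hypothetical data and is not the software used in the study.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Return (slope, intercept, loss) minimizing the sum of |residuals|.

    Searches all lines through pairs of data points with distinct x.
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not a valid regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(abs(y - (intercept + slope * x)) for x, y in points)
        if best is None or loss < best[2]:
            best = (slope, intercept, loss)
    return best


# Hypothetical data: four points on y = x plus one gross outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 100.0]
slope, intercept, loss = lad_line(xs, ys)
```

Here the LAD fit follows the four well-behaved points and ignores the outlier, whereas a least-squares line would be pulled strongly toward it. The pairwise search costs on the order of n³ operations, so linear programming or iterative methods are preferable for larger samples.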
Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos
2017-06-01
We performed a systematic review and meta-regression analysis of randomized control trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: .02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Fischer, Thomas; Fischer, Susanne; Himmel, Wolfgang; Kochen, Michael M; Hummers-Pradier, Eva
2008-01-01
The influence of patient characteristics on family practitioners' (FPs') diagnostic decision making has mainly been investigated using indirect methods such as vignettes or questionnaires. Direct observation, borrowed from social and cultural anthropology, may be an alternative method for describing FPs' real-life behavior and may help in gaining insight into how FPs diagnose respiratory tract infections, which are frequent in primary care. To clarify FPs' diagnostic processes when treating patients suffering from symptoms of respiratory tract infection. This direct observation study was performed in 30 family practices using a checklist for patient complaints, history taking, physical examination, and diagnoses. The influence of patients' symptoms and complaints on the FPs' physical examination and diagnosis was calculated by logistic regression analyses. Dummy variables based on combinations of symptoms and complaints were constructed and tested against saturated (full) and backward regression models. In total, 273 patients (median age 37 years, 51% women) were included. The median number of symptoms described was 4 per patient, and most information was provided at the patients' own initiative. Multiple logistic regression analysis showed a strong association between patients' complaints and the physical examination. Frequent diagnoses were upper respiratory tract infection (URTI)/common cold (43%), bronchitis (26%), sinusitis (12%), and tonsillitis (11%). There were no significant statistical differences between "simple heuristic" models and saturated regression models in the diagnoses of bronchitis, sinusitis, and tonsillitis, indicating that simple heuristics are probably used by the FPs, whereas "URTI/common cold" was better explained by the full model. FPs tended to make their diagnosis based on a few patient symptoms and a limited physical examination. Simple heuristic models were almost as powerful in explaining most diagnoses as saturated models.
Direct observation allowed for the study of decision making under real conditions, yielding both quantitative data and "qualitative'' information about the FPs' performance. It is important for investigators to be aware of the specific disadvantages of the method (e.g., a possible observer effect).
Rebich, R.A.; Houston, N.A.; Mize, S.V.; Pearson, D.K.; Ging, P.B.; Evan, Hornig C.
2011-01-01
SPAtially Referenced Regressions On Watershed attributes (SPARROW) models were developed to estimate nutrient inputs [total nitrogen (TN) and total phosphorus (TP)] to the northwestern part of the Gulf of Mexico from streams in the South-Central United States (U.S.). This area included drainages of the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf hydrologic regions. The models were standardized to reflect nutrient sources and stream conditions during 2002. Model predictions of nutrient loads (mass per time) and yields (mass per area per time) generally were greatest in streams in the eastern part of the region and along reaches near the Texas and Louisiana shoreline. The Mississippi River and Atchafalaya River watersheds, which drain nearly two-thirds of the conterminous U.S., delivered the largest nutrient loads to the Gulf of Mexico, as expected. However, the three largest delivered TN yields were from the Trinity River/Galveston Bay, Calcasieu River, and Aransas River watersheds, while the three largest delivered TP yields were from the Calcasieu River, Mermentau River, and Trinity River/Galveston Bay watersheds. Model output indicated that the three largest sources of nitrogen from the region were atmospheric deposition (42%), commercial fertilizer (20%), and livestock manure (unconfined, 17%). The three largest sources of phosphorus were commercial fertilizer (28%), urban runoff (23%), and livestock manure (confined and unconfined, 23%). © 2011 American Water Resources Association. This article is a U.S. Government work and is in the public domain in the USA.
A Gaussian Process Based Multi-Person Interaction Model
NASA Astrophysics Data System (ADS)
Klinger, T.; Rottensteiner, F.; Heipke, C.
2016-06-01
Online multi-person tracking in image sequences is commonly guided by recursive filters, whose predictive models define the expected positions of future states. When a predictive model deviates too much from the true motion of a pedestrian, which is often the case in crowded scenes due to unpredicted accelerations, the data association is prone to fail. In this paper we propose a novel predictive model on the basis of Gaussian Process Regression. The model takes into account the motion of every tracked pedestrian in the scene and the prediction is executed with respect to the velocities of all interrelated persons. As shown by the experiments, the model is capable of yielding more plausible predictions even in the presence of mutual occlusions or missing measurements. The approach is evaluated on a publicly available benchmark and outperforms other state-of-the-art trackers.
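As background for the predictive model described above, the sketch below shows plain one-dimensional Gaussian Process Regression with a squared-exponential kernel. It is a generic illustration only: it does not model interactions between pedestrians, and the kernel, its hyperparameters, and the toy trajectory are all assumptions that do not come from the paper.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def gp_predict_mean(t_train, y_train, t_query, noise=1e-6):
    """Posterior mean of a zero-mean GP at the query inputs."""
    gram = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(t_train)]
        for i, a in enumerate(t_train)
    ]
    weights = solve(gram, y_train)
    return [
        sum(rbf(tq, a) * w for a, w in zip(t_train, weights)) for tq in t_query
    ]


# Hypothetical positions of one tracked target at four time steps.
times = [0.0, 1.0, 2.0, 3.0]
positions = [0.0, 1.0, 4.0, 9.0]
fitted = gp_predict_mean(times, positions, times)
next_position = gp_predict_mean(times, positions, [3.5])[0]
```

With near-zero observation noise the posterior mean passes almost exactly through the training points; a tracker would normally also use the posterior variance, which is omitted here for brevity.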
Effect of Calf Gender on Milk Yield and Fatty Acid Content in Holstein Dairy Cows
Gillespie, Amy V.; Ehrlich, James L.; Grove-White, Dai H.
2017-01-01
The scale of sexed semen use to avoid the birth of unwanted bull calves in the UK dairy industry depends on several economic factors. It has been suggested in other studies that calf gender may affect milk yield in Holsteins, something that would affect the economics of sexed semen use. The present study used a large milk recording data set to evaluate the effect of calf gender (both calf born and calf in utero) on both milk yield and saturated fat content. Linear regression was used to model data for first lactation and second lactation separately. Results showed that giving birth to a heifer calf conferred a 1% milk yield advantage in first lactation heifers, whilst giving birth to a bull calf conferred a 0.5% advantage in second lactation. Heifer calves were also associated with a 0.66 kg reduction in saturated fatty acid content of milk in first lactation, but there was no significant difference between the genders in second lactation. No relationship was found between calf gender and milk mono- or polyunsaturated fatty acid content. The observed effects of calf gender on both yield and saturated fatty acid content were considered minor when compared to nutritional and genetic influences. PMID:28068399
NASA Astrophysics Data System (ADS)
Nizamuddin, Mohammad; Akhand, Kawsar; Roytman, Leonid; Kogan, Felix; Goldberg, Mitch
2015-06-01
Rice is the dominant food crop of Bangladesh, accounting for about 75 percent of agricultural land use, and Bangladesh is currently the world's fourth largest rice-producing country. Rice provides about two-thirds of total calorie supply, about one-half of the agricultural GDP, and one-sixth of the national income in Bangladesh. Aus is one of the main rice varieties in Bangladesh. Crop production, especially of rice, the main food staple, is highly susceptible to climate change and variability. Any change in climate will thus increase uncertainty regarding rice production, as climate is a major cause of year-to-year variability in rice productivity. This paper shows the application of remote sensing data for estimating Aus rice yield in Bangladesh, using official statistics of rice yield together with satellite data acquired in real time from the Advanced Very High Resolution Radiometer (AVHRR) sensor; the Principal Component Regression (PCR) method was used to construct the model. Comparison of the simulated results with official agricultural statistics showed that the error of estimation of Aus rice yield was less than 10%. Remote sensing, therefore, is a valuable tool for estimating crop yields well in advance of harvest, and at a low cost.
The Impact of Changing Snowmelt Timing on Non-Irrigated Crop Yield in Idaho
NASA Astrophysics Data System (ADS)
Murray, E. M.; Cobourn, K.; Flores, A. N.; Pierce, J. L.; Kunkel, M. L.
2013-12-01
The impacts of climate change on water resources have implications for both agricultural production and grower welfare. Many mountainous regions in the western U.S. rely on snowmelt as the dominant surface water source, and in Idaho, reconstructions of spring snowmelt timing have demonstrated a trend toward earlier, more variable snowmelt dates within the past 20 years. This earlier date and increased variability in snowmelt timing have serious implications for agriculture, but there is considerable uncertainty about how agricultural impacts vary by region, crop-type, and practices like irrigation vs. dryland farming. Establishing the relationship between snowmelt timing and agricultural yield is important for understanding how changes in large-scale climatic indices (like snowmelt date) may be associated with changes in agricultural yield. This is particularly important where local practitioner behavior is influenced by historically observed relationships between these climate indices and yield. In addition, a better understanding of the influence of changes in snowmelt on non-irrigated crop yield may be extrapolated to better understand how climate change may alter biomass production in non-managed ecosystems. To investigate the impact of snowmelt date on non-irrigated crop yield, we developed a multiple linear regression model to predict historical wheat and barley yield in several Idaho counties as a function of snowmelt date, climate variables (precipitation and growing degree-days), and spatial differences between counties. The relationship between snowmelt timing and non-irrigated crop yield at the county level is strong in many of the models, but differs in magnitude and direction for the two different crops. Results show interesting spatial patterns of variability in the correlation between snowmelt timing and crop yield. 
In four southern counties that border the Snake River Plain and one county bordering Oregon, non-irrigated wheat and/or barley yield are significantly lower in years with early snowmelt timing, on average (P < 0.10). In contrast, in northern Idaho, barley yield is significantly higher in years with early snowmelt timing. Overall, this statistical modeling exercise indicates that the trend toward earlier snowmelt date may positively impact non-irrigated crop yield in some regions of Idaho, while negatively impacting yield in other areas. Additional research is necessary to identify spatial controls on the variable relationship between snowmelt timing and yield. Regional variability in the response of crops to changes in snowmelt timing may indicate that external factors (e.g. higher amounts of summer rain in northern vs. southern Idaho) may play an important role in crop yield. This study indicates that targeted regional analysis is necessary to determine the influence of climate change on agriculture, as local variability can cause the same forcing to produce opposite results.
Patient casemix classification for medicare psychiatric prospective payment.
Drozd, Edward M; Cromwell, Jerry; Gage, Barbara; Maier, Jan; Greenwald, Leslie M; Goldman, Howard H
2006-04-01
For a proposed Medicare prospective payment system for inpatient psychiatric facility treatment, the authors developed a casemix classification to capture differences in patients' real daily resource use. Primary data on patient characteristics and daily time spent in various activities were collected in a survey of 696 patients from 40 inpatient psychiatric facilities. Survey data were combined with Medicare claims data to estimate intensity-adjusted daily cost. Classification and Regression Trees (CART) analysis of average daily routine and ancillary costs yielded several hierarchical classification groupings. Regression analysis was used to control for facility and day-of-stay effects in order to compare hierarchical models with models based on the recently proposed payment system of the Centers for Medicare & Medicaid Services. CART analysis identified a small set of patient characteristics strongly associated with higher daily costs, including age, psychiatric diagnosis, deficits in daily living activities, and detox or ECT use. A parsimonious, 16-group, fully interactive model that used five major DSM-IV categories and stratified by age, illness severity, deficits in daily living activities, dangerousness, and use of ECT explained 40% (out of a possible 76%) of daily cost variation not attributable to idiosyncratic daily changes within patients. A noninteractive model based on diagnosis-related groups, age, and medical comorbidity had explanatory power of only 32%. A regression model with 16 casemix groups restricted to using "appropriate" payment variables (i.e., those with clinical face validity and low administrative burden that are easily validated and provide proper care incentives) produced more efficient and equitable payments than did a noninteractive system based on diagnosis-related groups.
Risser, Dennis W.; Thompson, Ronald E.; Stuckey, Marla H.
2008-01-01
A method was developed for making estimates of long-term, mean annual ground-water recharge from streamflow data at 80 streamflow-gaging stations in Pennsylvania. The method relates mean annual base-flow yield derived from the streamflow data (as a proxy for recharge) to the climatic, geologic, hydrologic, and physiographic characteristics of the basins (basin characteristics) by use of a regression equation. Base-flow yield is the base flow of a stream divided by the drainage area of the basin, expressed in inches of water basinwide. Mean annual base-flow yield was computed for the period of available streamflow record at continuous streamflow-gaging stations by use of the computer program PART, which separates base flow from direct runoff on the streamflow hydrograph. Base flow provides a reasonable estimate of recharge for basins where streamflow is mostly unaffected by upstream regulation, diversion, or mining. Twenty-eight basin characteristics were included in the exploratory regression analysis as possible predictors of base-flow yield. Basin characteristics found to be statistically significant predictors of mean annual base-flow yield during 1971-2000 at the 95-percent confidence level were (1) mean annual precipitation, (2) average maximum daily temperature, (3) percentage of sand in the soil, (4) percentage of carbonate bedrock in the basin, and (5) stream channel slope. The equation for predicting recharge was developed using ordinary least-squares regression. The standard error of prediction for the equation on log-transformed data was 9.7 percent, and the coefficient of determination was 0.80. The equation can be used to predict long-term, mean annual recharge rates for ungaged basins, providing that the explanatory basin characteristics can be determined and that the underlying assumption is accepted that base-flow yield derived from PART is a reasonable estimate of ground-water recharge rates. 
For example, application of the equation for 370 hydrologic units in Pennsylvania predicted a range of ground-water recharge from about 6.0 to 22 inches per year. A map of the predicted recharge illustrates the general magnitude and variability of recharge throughout Pennsylvania.
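The definition of base-flow yield given above (base flow divided by drainage area, expressed as a depth of water in inches) can be illustrated with a short unit conversion. This is only a minimal sketch and not the PART program: the function name, the input units (mean base flow in cubic feet per second, area in square miles), and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: non-leap year
SQ_FEET_PER_SQ_MILE = 5280 ** 2
INCHES_PER_FOOT = 12


def base_flow_yield_inches(mean_base_flow_cfs, drainage_area_sq_mi):
    """Convert a mean annual base flow (ft^3/s) to a basin-wide depth (in/yr)."""
    if drainage_area_sq_mi <= 0:
        raise ValueError("drainage area must be positive")
    volume_ft3 = mean_base_flow_cfs * SECONDS_PER_YEAR       # annual volume
    area_ft2 = drainage_area_sq_mi * SQ_FEET_PER_SQ_MILE     # basin area
    return volume_ft3 / area_ft2 * INCHES_PER_FOOT           # depth in inches


if __name__ == "__main__":
    # Hypothetical basin: 100 mi^2 with a mean base flow of 90 ft^3/s.
    print(round(base_flow_yield_inches(90.0, 100.0), 2))
```

Under these assumptions, 1 ft^3/s sustained over 1 mi^2 corresponds to roughly 13.6 inches per year, so the hypothetical basin falls inside the 6.0 to 22 inch range reported for Pennsylvania.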
Morphodynamic data assimilation used to understand changing coasts
Plant, Nathaniel G.; Long, Joseph W.
2015-01-01
Morphodynamic data assimilation blends observations with model predictions and comes in many forms, including linear regression, Kalman filter, brute-force parameter estimation, variational assimilation, and Bayesian analysis. Importantly, data assimilation can be used to identify sources of prediction errors that lead to improved fundamental understanding. Overall, models incorporating data assimilation yield better information to the people who must make decisions impacting safety and wellbeing in coastal regions that experience hazards due to storms, sea-level rise, and erosion. We present examples of data assimilation associated with morphologic change. We conclude that enough morphodynamic predictive capability is available now to be useful to people, and that we will increase our understanding and the level of detail of our predictions through assimilation of observations and numerical-statistical models.
NASA Technical Reports Server (NTRS)
Wentz, F. J.
1977-01-01
The general problem of bistatic scattering from a two scale surface was evaluated. The treatment was entirely two-dimensional and in a vector formulation independent of any particular coordinate system. The two scale scattering model was then applied to backscattering from the sea surface. In particular, the model was used in conjunction with the JONSWAP 1975 aircraft scatterometer measurements to determine the sea surface's two scale roughness distributions, namely the probability density of the large scale surface slope and the capillary wavenumber spectrum. Best fits yield, on the average, a 0.7 dB rms difference between the model computations and the vertical polarization measurements of the normalized radar cross section. Correlations between the distribution parameters and the wind speed were established from linear, least squares regressions.
Nie, Lei; Hu, Mingming; Yan, Xu; Guo, Tingting; Wang, Haibin; Zhang, Sheng; Qu, Haibin
2018-05-03
This case study described a successful application of the quality by design (QbD) principles to a coupling process development of insulin degludec. Failure mode effects analysis (FMEA) risk analysis was first used to recognize critical process parameters (CPPs). Five CPPs, including coupling temperature (Temp), pH of desB30 solution (pH), reaction time (Time), desB30 concentration (Conc), and molar equivalent of ester per mole of desB30 insulin (MolE), were then investigated using a fractional factorial design. The curvature effect was found significant, indicating the requirement of second-order models. Afterwards, a central composite design was built with an augmented star and center points study. Regression models were developed for the CPPs to predict the purity and yield of predegludec using the above experimental data. The R² and adjusted R² were higher than 96% and 93%, respectively, for the two models. The Q² values were more than 80%, indicating a good predictive ability of the models. MolE was found to be the most significant factor affecting both yield and purity of predegludec. Temp, pH, and Conc were also significant for predegludec purity, while Time appeared to remarkably influence the yield model. The multi-dimensional design space and normal operating region (NOR) with a robust setpoint were determined using a probability-based Monte-Carlo simulation method. The verified experimental results showed that the design space was reliable and effective. This study enriches the understanding of the acetylation process and is instructional to other complicated operations in biopharmaceutical engineering.
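The goodness-of-fit statistics quoted in this abstract follow standard definitions, sketched below. The function names are illustrative and the abstract does not say how the authors computed them. Q² is commonly obtained with the same formula as R² but from cross-validated (for example, leave-one-out) predictions; that convention is an assumption here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_runs, n_terms):
    """Penalize R^2 for the number of model terms (intercept not counted)."""
    if n_runs - n_terms - 1 <= 0:
        raise ValueError("need more runs than model terms plus one")
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


if __name__ == "__main__":
    # Hypothetical numbers: 30 experimental runs, 20 model terms, R^2 = 0.96.
    print(round(adjusted_r_squared(0.96, 30, 20), 3))
```

Passing cross-validated predictions to `r_squared` gives a Q²-style statistic under the convention stated above.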
2013-01-01
Background Collection of high-quality DNA is essential for molecular epidemiology studies. Methods have been evaluated for optimal DNA collection in studies of adults; however, DNA collection in young children poses additional challenges. Here, we have evaluated predictors of DNA quantity in buccal cells collected for population-based studies of infant leukemia (N = 489 mothers and 392 children) and hepatoblastoma (HB; N = 446 mothers and 412 children) conducted through the Children’s Oncology Group. DNA samples were collected by mail using mouthwash (for mothers and some children) and buccal brush (for children) collection kits and quantified using quantitative real-time PCR. Multivariable linear regression models were used to identify predictors of DNA yield. Results Median DNA yield was higher for mothers in both studies compared with their children (14 μg vs. <1 μg). Significant predictors of DNA yield in children included case–control status (β = −0.69, 50% reduction, P = 0.01 for case vs. control children), brush collection type, and season of sample collection. Demographic factors were not strong predictors of DNA yield in mothers or children in this analysis. Conclusions The association with seasonality suggests that conditions during transport may influence DNA yield. The low yields observed in most children in these studies highlight the importance of developing alternative methods for DNA collection in younger age groups. PMID:23937514
NASA Technical Reports Server (NTRS)
Fishman, Jack; Creilson, John K.; Parker, Peter A.; Ainsworth, Elizabeth A.; Vining, G. Geoffrey; Szarka, John; Booker, Fitzgerald L.; Xu, Xiaojing
2010-01-01
Elevated concentrations of ground-level ozone (O3) are frequently measured over farmland regions in many parts of the world. While numerous experimental studies show that O3 can significantly decrease crop productivity, independent verifications of yield losses at current ambient O3 concentrations in rural locations are sparse. In this study, soybean crop yield data during a 5-year period over the Midwest of the United States were combined with ground and satellite O3 measurements to provide evidence that yield losses on the order of 10% could be estimated through the use of a multiple linear regression model. Yield loss trends based on both conventional ground-based instrumentation and satellite-derived tropospheric O3 measurements were statistically significant and were consistent with results obtained from open-top chamber experiments and an open-air experimental facility (SoyFACE, Soybean Free Air Concentration Enrichment) in central Illinois. Our analysis suggests that such losses are a relatively new phenomenon due to the increase in background tropospheric O3 levels over recent decades. Extrapolation of these findings supports previous studies that estimate the global economic loss to the farming community of more than $10 billion annually.
Afshari, Kasra; Samavati, Vahid; Shahidi, Seyed-Ahmad
2015-03-01
The effects of ultrasonic power, extraction time, extraction temperature, and the water-to-raw material ratio on extraction yield of crude polysaccharide from the leaf of Hibiscus rosa-sinensis (HRLP) were optimized by statistical analysis using response surface methodology. The response surface methodology (RSM) was used to optimize HRLP extraction yield by implementing the Box-Behnken design (BBD). The experimental data obtained were fitted to a second-order polynomial equation using multiple regression analysis and also analyzed by appropriate statistical methods (ANOVA). Analysis of the results showed that the linear and quadratic terms of these four variables had significant effects. The optimal conditions for the highest extraction yield of HRLP were: ultrasonic power, 93.59 W; extraction time, 25.71 min; extraction temperature, 93.18°C; and the water to raw material ratio, 24.3 mL/g. Under these conditions, the experimental yield was 9.66±0.18%, which is in close agreement with the model-predicted value of 9.526%. The results demonstrated that HRLP had strong scavenging activities in vitro on DPPH and hydroxyl radicals. Copyright © 2014 Elsevier B.V. All rights reserved.
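For reference, the second-order polynomial customarily fitted in response surface methodology with four factors has the general form below. The abstract does not report the fitted coefficients, so the symbols (Y for extraction yield, X_1 to X_4 for the four coded factors, and the β coefficients) are generic notation assumed here, not the authors' own.

```latex
Y = \beta_0
  + \sum_{i=1}^{4} \beta_i X_i
  + \sum_{i=1}^{4} \beta_{ii} X_i^{2}
  + \sum_{1 \le i < j \le 4} \beta_{ij} X_i X_j
```

With four factors this form has 15 coefficients: one intercept, four linear terms, four quadratic terms, and six two-factor interactions.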
Validation of test-day models for genetic evaluation of dairy goats in Norway.
Andonov, S; Ødegård, J; Boman, I A; Svendsen, M; Holme, I J; Adnøy, T; Vukovic, V; Klemetsdal, G
2007-10-01
Test-day data for daily milk yield and fat, protein, and lactose content were sampled from the years 1988 to 2003 in 17 flocks belonging to 2 genetically well-tied buck circles. In total, records from 2,111 to 2,215 goats for content traits and 2,371 goats for daily milk yield were included in the analysis, averaging 2.6 and 4.8 observations per goat for the 2 groups of traits, respectively. The data were analyzed by using 4 test-day models with different modeling of fixed effects. Model [0] (the reference model) contained a fixed effect of year-season of kidding with regression on Ali-Schaeffer polynomials nested within the year-season classes, and a random effect of flock test-day. In model [1], the lactation curve effect from model [0] was replaced by a fixed effect of days in milk (in 3-d periods), the same for all year-seasons of kidding. Models [2] and [3] were obtained from model [1] by removing the fixed year-season of kidding effect and considering the flock test-day effect as either fixed or random, respectively. The models were compared by using 2 criteria: mean-squared error of prediction and a test of bias affecting the genetic trend. The first criterion indicated a preference for model [3], whereas the second criterion preferred model [1]. Mean-squared error of prediction is based on model fit, whereas the second criterion tests the ability of the model to produce unbiased genetic evaluation (i.e., its capability of separating environmental and genetic time trends). Thus, a fixed structure with year (year, year-season, or possibly flock-year) was indicated to appropriately separate time trends. Heritability estimates for daily milk yield and milk content were 0.26 and 0.24 to 0.27, respectively.
Huberts, W; Donders, W P; Delhaas, T; van de Vosse, F N
2014-12-01
Patient-specific modeling requires model personalization, which can be achieved in an efficient manner by parameter fixing and parameter prioritization. One efficient variance-based method uses generalized polynomial chaos expansion (gPCE), but it has not been applied in the context of model personalization, nor has it ever been compared with standard variance-based methods for models with many parameters. In this work, we apply the gPCE method to a previously reported pulse wave propagation model and compare the conclusions for model personalization with those of a reference analysis performed with Saltelli's efficient Monte Carlo method. We furthermore differentiate two approaches for obtaining the expansion coefficients: one based on spectral projection (gPCE-P) and one based on least squares regression (gPCE-R). It was found that in general the gPCE yields conclusions similar to those of the reference analysis but at much lower cost, as long as the polynomial metamodel does not contain unnecessary high order terms. Furthermore, the gPCE-R approach generally yielded better results than gPCE-P. The weak performance of the gPCE-P can be attributed to the assessment of the expansion coefficients using the Smolyak algorithm, which might be hampered by the high number of model parameters and/or by possible non-smoothness in the output space. Copyright © 2014 John Wiley & Sons, Ltd.
Artificial Neural Network for the Prediction of Chromosomal Abnormalities in Azoospermic Males.
Akinsal, Emre Can; Haznedar, Bulent; Baydilli, Numan; Kalinli, Adem; Ozturk, Ahmet; Ekmekçioğlu, Oğuz
2018-02-04
To evaluate whether an artificial neural network helps to diagnose any chromosomal abnormalities in azoospermic males. The data of azoospermic males attending a tertiary academic referral center were evaluated retrospectively. Height, total testicular volume, follicle stimulating hormone, luteinising hormone, total testosterone and ejaculate volume of the patients were used for the analyses. In the artificial neural network, the data of 310 azoospermics were used as the training set and 115 as the test set. Logistic regression analyses and discriminant analyses were performed for statistical analyses. The tests were re-analysed with a neural network. Both logistic regression analyses and the artificial neural network predicted the presence or absence of chromosomal abnormalities with more than 95% accuracy. The use of an artificial neural network model has yielded satisfactory results in terms of distinguishing whether patients have any chromosomal abnormality or not.
NASA Technical Reports Server (NTRS)
Raymond, William H.; Olson, William S.; Callan, Geary
1995-01-01
In this study, diabatic forcing, and liquid water assimilation techniques are tested in a semi-implicit hydrostatic regional forecast model containing explicit representations of grid-scale cloud water and rainwater. Diabatic forcing, in conjunction with diabatic contributions in the initialization, is found to help the forecast retain the diabatic signal found in the liquid water or heating rate data, consequently reducing the spinup time associated with grid-scale precipitation processes. Both observational Special Sensor Microwave/Imager (SSM/I) and model-generated data are used. A physical retrieval method incorporating SSM/I radiance data is utilized to estimate the 3D distribution of precipitating storms. In the retrieval method the relationship between precipitation distributions and upwelling microwave radiances is parameterized, based upon cloud ensemble-radiative model simulations. Regression formulae relating vertically integrated liquid and ice-phase precipitation amounts to latent heating rates are also derived from the cloud ensemble simulations. Thus, retrieved SSM/I precipitation structures can be used in conjunction with the regression-formulas to infer the 3D distribution of latent heating rates. These heating rates are used directly in the forecast model to help initiate Tropical Storm Emily (21 September 1987). The 14-h forecast of Emily's development yields atmospheric precipitation water contents that compare favorably with coincident SSM/I estimates.
Production of ethyl levulinate by direct conversion of wheat straw in ethanol media.
Chang, Chun; Xu, Guizhuan; Jiang, Xiaoxian
2012-10-01
The production of ethyl levulinate from wheat straw by direct conversion in ethanol media was investigated. Response surface methodology (RSM) was applied to optimize the effects of processing parameters, and regression analysis was performed on the data obtained. A close agreement between the experimental results and the model predictions was achieved. The optimal conditions for ethyl levulinate production from wheat straw were acid concentration 2.5%, reaction temperature 183°C, mass ratio of liquid to solid 19.8 and reaction time 36 min. Under the optimum conditions, an ethyl levulinate yield of 17.91% was obtained, representing 51.0% of the theoretical yield. The results suggest that wheat straw can be used as a potential raw material for the production of ethyl levulinate by direct conversion in ethanol media. Copyright © 2012 Elsevier Ltd. All rights reserved.
Multilevel joint competing risk models
NASA Astrophysics Data System (ADS)
Karunarathna, G. H. S.; Sooriyarachchi, M. R.
2017-09-01
Joint modeling approaches are often used for competing-risk time-to-event and count outcomes in biomedical and epidemiological studies in the presence of a cluster effect. Hospital length of stay (LOS) is a widely used outcome measure of hospital utilization because it serves as a benchmark for multiple terminations of hospitalization, such as discharge, transfer, death, and censoring (patients who have not completed the event of interest by the end of the follow-up period). Competing risk models provide a method of addressing such multiple destinations, since classical time-to-event models yield biased results when there are multiple events. In this study, the concept of joint modeling was applied to dengue epidemiology in Sri Lanka, 2006-2008, to assess the relationship between different outcomes of LOS and the platelet count of dengue patients, with a district cluster effect. Two key approaches were applied to build the joint scenario. In the first approach, each competing risk is modeled separately using the binary logistic model, treating all other events as censored under the multilevel discrete time-to-event model, while the platelet counts are assumed to follow a lognormal regression model. The second approach is based on the endogeneity effect in the multilevel competing risks and count model. Model parameters were estimated using maximum likelihood based on the Laplace approximation. Moreover, the study reveals that the joint modeling approach yields more precise results than fitting two separate univariate models, in terms of AIC (Akaike Information Criterion).
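The AIC comparison mentioned at the end of this abstract can be sketched as follows. The formula is the standard definition of the criterion. The log-likelihoods and parameter counts in the example are hypothetical, and summing the AICs of two independently fitted models to compare them against one joint model is an assumption about how such a comparison might be set up, not a description of the authors' procedure.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L); smaller is better."""
    return 2 * n_parameters - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical fits: one joint model versus two separate univariate models.
    joint = aic(log_likelihood=-1200.0, n_parameters=14)
    separate = (aic(log_likelihood=-700.0, n_parameters=7)
                + aic(log_likelihood=-520.0, n_parameters=6))
    print("joint" if joint < separate else "separate", joint, separate)
```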
Catalytic hydroprocessing of fast pyrolysis oils: Impact of biomass feedstock on process efficiency
Carpenter, Daniel; Westover, Tyler; Howe, Daniel; ...
2016-12-01
Here, we report on an experimental study to produce refinery-ready fuel blendstocks via catalytic hydrodeoxygenation (upgrading) of pyrolysis oil using several biomass feedstocks and various blends. Blends were tested along with the pure materials to determine the effect of blending on product yields and qualities. Within experimental error, oil yields from fast pyrolysis and upgrading are shown to be linear functions of the blend components. Switchgrass exhibited lower fast pyrolysis and upgrading yields than the woody samples, which included clean pine, oriented strand board (OSB), and a mix of pinon and juniper (PJ). The notable exception was PJ, for which the poor upgrading yield of 18% was likely associated with the very high viscosity of the PJ fast pyrolysis oil (947 cp). The highest fast pyrolysis yield (54% dry basis) was obtained from clean pine, while the highest upgrading yield (50%) was obtained from a blend of 80% clean pine and 20% OSB (CP8OSB2). For switchgrass, reducing the fast pyrolysis temperature to 450 °C resulted in a significant increase in the pyrolysis oil yield and reduced hydrogen consumption during hydrotreating, but did not directly affect the hydrotreating oil yield. The water content of fast pyrolysis oils was also observed to increase linearly with the summed content of potassium and sodium, ranging from 21% for clean pine to 37% for switchgrass. Multiple linear regression models demonstrate that fast pyrolysis is strongly dependent upon the contents of lignin and volatile matter as well as the sum of potassium and sodium.
Damane, Moslem Moghbeli; Fozi, Masood Asadi; Mehrgardi, Ahmad Ayatollahi
2016-01-01
In dairy cows, milk yield can be affected by the frequency of milking per day. Previous studies have shown that milk yield increases by 6-25% per lactation when the milking frequency is increased from 2 to 3 times per day, while the somatic cell count decreases. To investigate the effect of milking frequency (3X vs. 4X) on milk yield and its genetic parameters in the first and second lactations of Iranian Holstein dairy cows, a total of 142,604 test day (TD) records of milk yield were measured on 20,762 cows. Heritability estimates of milk yield were 0.25 and 0.19 for 3X milking frequency and 0.34 and 0.26 for 4X milking frequency throughout the first and second lactations, respectively. Repeatability estimates of milk yield were 0.70 and 0.71 for 3X milking frequency and 0.76 and 0.77 for 4X milking frequency, respectively. In comparison with 3X milking frequency, the milk yield of the first and second lactations was increased by 11.6% and 12.2%, respectively, when 4X was used (p < 0.01). Results of this research demonstrated that increasing milking frequency led to an increase in heritability and repeatability of milk yield. The current investigation provided clear evidence for the benefits of using 4X milking frequency instead of 3X in Iranian Holstein dairy cows.
Prediction of Aerosol Optical Depth in West Asia: Machine Learning Methods versus Numerical Models
NASA Astrophysics Data System (ADS)
Omid Nabavi, Seyed; Haimberger, Leopold; Abbasi, Reyhaneh; Samimi, Cyrus
2017-04-01
Dust-prone areas of West Asia are releasing increasingly large amounts of dust particles during warm months. Because of the lack of ground-based observations in the region, this phenomenon is mainly monitored through remotely sensed aerosol products. The recent development of mesoscale Numerical Models (NMs) has offered an unprecedented opportunity to predict dust emission and, subsequently, Aerosol Optical Depth (AOD) at finer spatial and temporal resolutions. Nevertheless, the significant uncertainties in input data and in simulations of dust activation and transport limit the performance of numerical models in dust prediction. The presented study aims to evaluate whether machine-learning algorithms (MLAs), which require much less computational expense, can yield the same or even better performance than NMs. Deep Blue (DB) AOD, which is observed by satellites but also predicted by MLAs and NMs, is used for validation. We concentrate our evaluations on the dry Iraq plains, known as the main origin of recently intensified dust storms in West Asia. Here we examine the performance of four MLAs: Linear Regression Model (LM), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Multivariate Adaptive Regression Splines (MARS). The Weather Research and Forecasting model coupled to Chemistry (WRF-Chem) and the Dust REgional Atmosphere Model (DREAM) are included as NMs. The MACC aerosol re-analysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) is also included, although it has assimilated satellite-based AOD data. Using the Recursive Feature Elimination (RFE) method, nine environmental features, including soil moisture and temperature, NDVI, dust source function, albedo, dust uplift potential, vertical velocity, precipitation and the 9-month SPEI drought index, are selected for dust (AOD) modeling by MLAs. During the feature selection process, we noticed that NDVI and SPEI are of the highest importance in MLA predictions.
The data set was divided into a training (2003-2010) and a testing (2011-2013) subset. The evaluation using the two subsets shows that ANN outperformed all other MLAs and NMs. Verified against monthly mean MODIS DB AOD, ANN yielded a Spearman correlation coefficient (SCC) of 0.74, whereas WRF-Chem, the most successful NM, reached an SCC of 0.71. In terms of simulation accuracy, SVM and MARS yielded the lowest bias (-0.001) and RMSE (0.16). DREAM showed the poorest performance, with an SCC of 0.52, a bias of -0.17 and an RMSE of 0.29.
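The three verification scores reported here (Spearman correlation, bias, RMSE) can be computed as in the following sketch, which uses only the Python standard library. The sign convention for bias (predicted minus observed) and all function names are assumptions; the abstract does not state which convention the authors used.

```python
import math


def bias(predicted, observed):
    """Mean of (predicted - observed); negative means underprediction."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean square error between paired predictions and observations."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because Spearman correlation depends only on ranks, a model can score well on SCC while still showing a large bias, which is why the abstract reports the three scores separately.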
Genetic modelling of test day records in dairy sheep using orthogonal Legendre polynomials.
Kominakis, A; Volanis, M; Rogdakis, E
2001-03-01
Test day milk yields of three lactations in Sfakia sheep were analyzed fitting a random regression (RR) model, regressing on orthogonal polynomials of the stage of the lactation period, i.e. days in milk. Univariate (UV) and multivariate (MV) analyses were also performed for four stages of the lactation period, represented by average days in milk, i.e. 15, 45, 70 and 105 days, to compare estimates obtained from RR models with estimates from UV and MV analyses. The total number of test day records were 790, 1314 and 1041 obtained from 214, 342 and 303 ewes in the first, second and third lactation, respectively. Error variances and covariances between regression coefficients were estimated by restricted maximum likelihood. Models were compared using likelihood ratio tests (LRTs). Log likelihoods were not significantly reduced when the rank of the orthogonal Legendre polynomials (LPs) of lactation stage was reduced from 4 to 2 and homogenous variances for lactation stages within lactations were considered. Mean weighted heritability estimates with RR models were 0.19, 0.09 and 0.08 for first, second and third lactation, respectively. The respective estimates obtained from UV analyses were 0.14, 0.12 and 0.08, respectively. Mean permanent environmental variance, as a proportion of the total, was high at all stages and lactations ranging from 0.54 to 0.71. Within lactations, genetic and permanent environmental correlations between lactation stages were in the range from 0.36 to 0.99 and 0.76 to 0.99, respectively. Genetic parameters for additive genetic and permanent environmental effects obtained from RR models were different from those obtained from UV and MV analyses.
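The covariates of a random regression on Legendre polynomials are usually built by rescaling days in milk to the interval [-1, 1] and evaluating the polynomials there. The sketch below shows that construction. The 15 to 105 day range is taken from the lactation stages named in the abstract purely for illustration, and the normalization factor sqrt((2n+1)/2) is the common convention, assumed here rather than confirmed by the source.

```python
import math


def standardize_dim(days_in_milk, dim_min, dim_max):
    """Map days in milk linearly onto [-1, 1], the domain of Legendre polynomials."""
    return -1.0 + 2.0 * (days_in_milk - dim_min) / (dim_max - dim_min)


def legendre_covariates(x, order, normalized=True):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x][: order + 1]
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    if normalized:
        values = [math.sqrt((2 * n + 1) / 2.0) * p for n, p in enumerate(values)]
    return values


if __name__ == "__main__":
    for dim in (15, 45, 70, 105):
        x = standardize_dim(dim, dim_min=15, dim_max=105)
        print(dim, [round(v, 3) for v in legendre_covariates(x, order=2)])
```

Each test-day record then contributes one row of these covariates to the design matrices for the additive genetic and permanent environmental regressions.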
Calibration Model for Apnea-Hypopnea Indices: Impact of Alternative Criteria for Hypopneas
Ho, Vu; Crainiceanu, Ciprian M.; Punjabi, Naresh M.; Redline, Susan; Gottlieb, Daniel J.
2015-01-01
Study Objective: To characterize the association among apnea-hypopnea indices (AHIs) determined using three common metrics for defining hypopnea, and to develop a model to calibrate between these AHIs. Design: Cross-sectional analysis of Sleep Heart Health Study Data. Setting: Community-based. Participants: There were 6,441 men and women age 40 y or older. Measurement and Results: Three separate AHIs have been calculated, using all apneas (defined as a decrease in airflow greater than 90% from baseline for ≥ 10 sec) plus hypopneas (defined as a decrease in airflow or chest wall or abdominal excursion greater than 30% from baseline, but not meeting apnea definitions) associated with either: (1) a 4% or greater fall in oxyhemoglobin saturation—AHI4; (2) a 3% or greater fall in oxyhemoglobin saturation—AHI3; or (3) a 3% or greater fall in oxyhemoglobin saturation or an event-related arousal—AHI3a. Median values were 5.4, 9.7, and 13.4 for AHI4, AHI3, and AHI3a, respectively (P < 0.0001). Penalized spline regression models were used to compare AHI values across the three metrics and to calculate prediction intervals. Comparison of regression models demonstrates divergence in AHI scores among the three methods at low AHI values and gradual convergence at higher levels of AHI. Conclusions: The three methods of scoring hypopneas yielded significantly different estimates of the apnea-hypopnea index (AHI), although the relative difference is reduced in severe disease. The regression models presented will enable clinicians and researchers to more appropriately compare AHI values obtained using differing metrics for hypopnea. Citation: Ho V, Crainiceanu CM, Punjabi NM, Redline S, Gottlieb DJ. Calibration model for apnea-hypopnea indices: impact of alternative criteria for hypopneas. SLEEP 2015;38(12):1887–1892. PMID:26564122
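The three indices differ only in which hypopneas are counted, which the following sketch makes explicit. An AHI is the number of qualifying events per hour of sleep. The `Event` record, its field names, and the example values are hypothetical and are not taken from the Sleep Heart Health Study data format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str                # "apnea" or "hypopnea"
    desaturation_pct: float  # fall in oxyhemoglobin saturation, in percent
    arousal: bool            # True if an event-related arousal was scored


def count_events(events, min_desaturation, allow_arousal=False):
    """Count all apneas plus the hypopneas that meet the chosen criterion."""
    total = 0
    for e in events:
        if e.kind == "apnea":
            total += 1
        elif e.desaturation_pct >= min_desaturation or (allow_arousal and e.arousal):
            total += 1
    return total


def ahi_indices(events, sleep_hours):
    """Events per hour of sleep under the three hypopnea definitions."""
    return {
        "AHI4": count_events(events, 4.0) / sleep_hours,
        "AHI3": count_events(events, 3.0) / sleep_hours,
        "AHI3a": count_events(events, 3.0, allow_arousal=True) / sleep_hours,
    }
```

By construction AHI4 ≤ AHI3 ≤ AHI3a for any recording, which matches the ordering of the medians reported in the abstract.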
Predicting acute pain after cesarean delivery using three simple questions.
Pan, Peter H; Tonidandel, Ashley M; Aschenbrenner, Carol A; Houle, Timothy T; Harris, Lynne C; Eisenach, James C
2013-05-01
Interindividual variability in postoperative pain presents a clinical challenge. Preoperative quantitative sensory testing is useful but time consuming in predicting postoperative pain intensity. The current study was conducted to develop and validate a predictive model of acute postcesarean pain using a simple three-item preoperative questionnaire. A total of 200 women scheduled for elective cesarean delivery under subarachnoid anesthesia were enrolled (192 subjects analyzed). Patients were asked to rate the intensity of loudness of audio tones, their level of anxiety and anticipated pain, and analgesic need from surgery. Postoperatively, patients reported the intensity of evoked pain. Regression analysis was performed to generate a predictive model for pain from these measures. A validation cohort of 151 women was enrolled to test the reliability of the model (131 subjects analyzed). Responses from each of the three preoperative questions correlated moderately with 24-h evoked pain intensity (r = 0.24-0.33, P < 0.001). Audio tone rating added uniquely, but minimally, to the model and was not included in the predictive model. The multiple regression analysis yielded a statistically significant model (R = 0.20, P < 0.001), whereas the validation cohort showed reliably a very similar regression line (R = 0.18). In predicting the upper 20th percentile of evoked pain scores, the optimal cut point was 46.9 (z =0.24) such that sensitivity of 0.68 and specificity of 0.67 were as balanced as possible. This simple three-item questionnaire is useful to help predict postcesarean evoked pain intensity, and could be applied to further research and clinical application to tailor analgesic therapy to those who need it most.
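The cut-point selection described here, choosing the threshold at which sensitivity and specificity are as balanced as possible, can be sketched as below. The rule that a score at or above the cut point predicts high pain, the function names, and the search over observed scores are assumptions for illustration; the abstract does not describe the authors' exact procedure.

```python
def sensitivity_specificity(scores, is_high_pain, cut_point):
    """Treat score >= cut_point as 'predicted high pain' and tally the 2x2 table.

    Both classes must be present, otherwise a denominator is zero.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_high_pain):
        predicted = score >= cut_point
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def most_balanced_cut(scores, is_high_pain):
    """Return the observed score that minimizes |sensitivity - specificity|."""
    best_gap, best_cut = None, None
    for cut in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_high_pain, cut)
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_cut = gap, cut
    return best_cut
```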
NASA Astrophysics Data System (ADS)
Li, D.; Nanseki, T.; Chomei, Y.; Yokota, S.
2017-07-01
Rice, a staple crop in Japan, is at risk of decreasing production and its yield highly depends on soil fertility. This study aimed to investigate determinants of rice yield, from the perspectives of fertilizer nitrogen and soil chemical properties. The data were sampled in 2014 and 2015 from 92 peat soil paddy fields on a large-scale farm located in the Kanto Region of Japan. The rice variety used was the most widely planted Koshihikari in Japan. Regression analysis indicated that fertilizer nitrogen significantly affected the yield, with a significant sustained effect to the subsequent year. Twelve soil chemical properties, including pH, cation exchange capacity, content of pyridine base elements, phosphoric acid, and silicic acid, were estimated. In addition to silicic acid, magnesia, in forms of its exchangeable content, saturation, and ratios to potassium and lime, positively affected the yield, while phosphoric acid negatively affected the yield. We assessed the soil chemical properties by soil quality index and principal component analysis. Positive effects were identified for both approaches, with the former performing better in explaining the rice yield. For soil quality index, the individual standardized soil properties and margins for improvement were indicated for each paddy field. Finally, multivariate regression on the principal components identified the most significant properties.
Estimation of Rice Crop Yields Using Random Forests in Taiwan
NASA Astrophysics Data System (ADS)
Chen, C. F.; Lin, H. S.; Nguyen, S. T.; Chen, C. R.
2017-12-01
Rice is globally one of the most important food crops, directly feeding more people than any other crops. Rice is not only the most important commodity, but also plays a critical role in the economy of Taiwan because it provides employment and income for large rural populations. The rice harvested area and production are thus monitored yearly due to the government's initiatives. Agronomic planners need such information for more precise assessment of food production to tackle issues of national food security and policymaking. This study aimed to develop a machine-learning approach using physical parameters to estimate rice crop yields in Taiwan. We processed the data for 2014 cropping seasons, following three main steps: (1) data pre-processing to construct input layers, including soil types and weather parameters (e.g., maxima and minima air temperature, precipitation, and solar radiation) obtained from meteorological stations across the country; (2) crop yield estimation using the random forests owing to its merits as it can process thousands of variables, estimate missing data, maintain the accuracy level when a large proportion of the data is missing, overcome most of over-fitting problems, and run fast and efficiently when handling large datasets; and (3) error verification. To execute the model, we separated the datasets into two groups of pixels: group-1 (70% of pixels) for training the model and group-2 (30% of pixels) for testing the model. Once the model is trained to produce small and stable out-of-bag error (i.e., the mean squared error between predicted and actual values), it can be used for estimating rice yields of cropping seasons. 
Comparing the results of the random forests-based regression with the actual yield statistics gave root mean square error (RMSE) and mean absolute error (MAE) values of 6.2% and 2.7%, respectively, for the first rice crop, and 5.3% and 2.9%, respectively, for the second rice crop. Although there are several uncertainties attributed to the data quality of input layers, our study demonstrates the promising application of random forests for estimating rice crop yields at the national level in Taiwan. This approach could be transferable to other regions of the world for improving large-scale estimation of rice crop yields.
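The evaluation steps in this abstract (a 70%/30% split followed by RMSE and MAE reported as percentages) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the errors were converted to percentages, so normalising by the mean observed yield is an assumption, the yield values are invented, and all function names are hypothetical. The random forest itself is not reproduced here.

```python
import math
import random


def split_70_30(items, seed=0):
    """Shuffle reproducibly and return (training 70%, testing 30%)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def as_percent_of_mean(error, obs):
    """Express an error as a percentage of the mean observed value (assumed convention)."""
    return 100.0 * error / (sum(obs) / len(obs))


# Hypothetical observed and predicted yields (t/ha) for four test pixels.
observed = [5.0, 6.0, 7.0, 6.0]
predicted = [5.5, 6.0, 6.5, 6.0]

train, test = split_70_30(range(10))
rmse_pct = as_percent_of_mean(rmse(predicted, observed), observed)
mae_pct = as_percent_of_mean(mae(predicted, observed), observed)
print(len(train), len(test), round(rmse_pct, 2), round(mae_pct, 2))
# -> 7 3 5.89 4.17
```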
Santana, M L; Pereira, R J; Bignardi, A B; Filho, A E Vercesi; Menéndez-Buxadera, A; El Faro, L
2015-12-01
In an attempt to determine the possible detrimental effects of continuous selection for milk yield on the genetic tolerance of Zebu cattle to heat stress, genetic parameters and trends of the response to heat stress for 86,950 test-day (TD) milk yield records from 14,670 first lactations of purebred dairy Gir cows were estimated. A random regression model with regression on days in milk (DIM) and temperature-humidity index (THI) values was applied to the data. The most detrimental effect of THI on milk yield was observed in the stage of lactation with higher milk production, DIM 61 to 120 (-0.099kg/d per THI). Although modest variations were observed for the THI scale, a reduction in additive genetic variance as well as in permanent environmental and residual variance was observed with increasing THI values. The heritability estimates showed a slight increase with increasing THI values for any DIM. The correlations between additive genetic effects across the THI scale showed that, for most of the THI values, genotype by environment interactions due to heat stress were less important for the ranking of bulls. However, for extreme THI values, this type of genotype by environment interaction may lead to an important error in selection. As a result of the selection for milk yield practiced in the dairy Gir population for 3 decades, the genetic trend of cumulative milk yield was significantly positive for production in both high (51.81kg/yr) and low THI values (78.48kg/yr). However, the difference between the breeding values of animals at high and low THI may be considered alarming (355kg in 2011). The genetic trends observed for the regression coefficients related to general production level (intercept of the reaction norm) and specific ability to respond to heat stress (slope of the reaction norm) indicate that the dairy Gir population is heading toward a higher production level at the expense of lower tolerance to heat stress. 
These trends reflect the genetic antagonism between production and tolerance to heat stress demonstrated by the negative genetic correlation between these components (-0.23). Monitoring trends of the genetic component of heat stress would be a reasonable measure to avoid deterioration in one of the main traits of Zebu cattle (i.e., high tolerance to heat stress). On the basis of current genetic trends, the need for future genetic evaluation of dairy Zebu animals for tolerance to heat stress cannot be ruled out. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A; Guinney, Justin
2015-01-01
Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. 
We evaluate the performance of our algorithm compared to well-known regularization methods such as LASSO, Ridge and Elastic net regression in the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporation of the biological priors selected by our model confers improved predictability and interpretability, despite much fewer predictors, over existing state-of-the-art methods.
USDA-ARS's Scientific Manuscript database
Data assimilation and regression are two commonly used methods for predicting agricultural yield from remote sensing observations. Data assimilation is a generative approach because it requires explicit approximations of the Bayesian prior and likelihood to compute the probability density function...
NASA Astrophysics Data System (ADS)
Soucemarianadin, Laure; Barré, Pierre; Baudin, François; Chenu, Claire; Houot, Sabine; Kätterer, Thomas; Macdonald, Andy; van Oort, Folkert; Plante, Alain F.; Cécillon, Lauric
2017-04-01
The organic carbon reservoir of soils is a key component of climate change, calling for an accurate knowledge of the residence time of soil organic carbon (SOC). Existing proxies of the size of SOC labile pool such as SOC fractionation or respiration tests are time consuming and unable to consistently predict SOC mineralization over years to decades. Similarly, models of SOC dynamics often yield unrealistic values of the size of SOC kinetic pools. Thermal analysis of bulk soil samples has recently been shown to provide useful and cost-effective information regarding the long-term in-situ decomposition of SOC. Barré et al. (2016) analyzed soil samples from long-term bare fallow sites in northwestern Europe using Rock-Eval 6 pyrolysis (RE6), and demonstrated that persistent SOC is thermally more stable and has less hydrogen-rich compounds (low RE6 HI parameter) than labile SOC. The objective of this study was to predict SOC loss over a 20-year period (i.e. the size of the SOC pool with a residence time lower than 20 years) using RE6 indicators. Thirty-six archive soil samples coming from 4 long-term bare fallow chronosequences (Grignon, France; Rothamsted, Great Britain; Ultuna, Sweden; Versailles, France) were used in this study. For each sample, the value of bi-decadal SOC mineralization was obtained from the observed SOC dynamics of its long-term bare fallow plot (approximated by a spline function). Those values ranged from 0.8 to 14.3 gC·kg-1 (concentration data), representing 8.6 to 50.6% of total SOC (proportion data). All samples were analyzed using RE6 and simple linear regression models were used to predict bi-decadal SOC loss (concentration and proportion data) from 4 RE6 parameters: HI, OI, PC/SOC and T50 CO2 oxidation. HI (the amount of hydrogen-rich effluents formed during the pyrolysis phase of RE6; mgCH.g-1SOC) and OI (the CO2 yield during the pyrolysis phase of RE6; mgCO2.g-1SOC) parameters describe SOC bulk chemistry. 
PC/SOC (the amount of organic C evolved during the pyrolysis phase of RE6; % of total SOC) and T50 CO2 oxidation (the temperature at which 50% of the residual organic C was oxidized to CO2 during the RE6 oxidation phase; °C) parameters represent SOC thermal stability. The RE6 HI parameter yielded the best predictions of bi-decadal SOC mineralization, for both concentration (R2 = 0.75) and proportion (R2 = 0.66) data. PC/SOC and T50 CO2 oxidation parameters also yielded significant regression models with R2 = 0.68 and 0.42 for concentration data and R2 = 0.59 and 0.26 for proportion data, respectively. The OI parameter was not a good predictor of bi-decadal SOC loss, with non-significant regression models. The RE6 thermal analysis method can predict in-situ SOC biogeochemical stability. SOC chemical composition, and to a lesser extent SOC thermal stability, are related to its bi-decadal dynamics. RE6 appears to be a more accurate and convenient proxy of the size of the bi-decadal labile SOC pool than other existing methodologies. Future developments include the validation of these RE6 models of bi-decadal SOC loss on soils from contrasted pedoclimatic conditions. Reference: Barré et al., 2016. Biogeochemistry 130, 1-12
Bryant, J R; Lopez-Villalobos, N; Holmes, C W; Pryce, J E; Pitman, G D; Davis, S R
2007-03-01
An evolutionary algorithm was applied to a mechanistic model of the mammary gland to find the parameter values that minimised the difference between predicted and actual lactation curves of milk yields in New Zealand Jersey cattle managed at different feeding levels. The effect of feeding level, genetic merit, body condition score at parturition and age on total lactation yields of milk, fat and protein, days in milk, live weight and evolutionary algorithm derived mammary gland parameters was then determined using a multiple regression model. The mechanistic model of the mammary gland was able to fit lactation curves that corresponded to actual lactation curves with a high degree of accuracy. The senescence rate of quiescent (inactive) alveoli was highest at the very low feeding level. The active alveoli population at peak lactation was highest at very low feeding levels, but lower nutritional status at this feeding level prevented high milk yields from being achieved. Genetic merit had a significant linear effect on the active alveoli population at peak and mid to late lactation, with higher values in animals, which had higher breeding values for milk yields. A type of genetic merit × feeding level scaling effect was observed for total yields of milk and fat, and total number of alveoli produced from conception until the end of lactation with the benefits of increases in genetic merit being greater at high feeding levels. A genetic merit × age scaling effect was observed for total lactation protein yields. Initial rates of differentiation of progenitor cells declined with age. Production levels of alveoli from conception to the end of lactation were lowest in 5- to 8-year-old animals; however, in these older animals, quiescent alveoli were reactivated more frequently. The active alveoli population at peak lactation and rates of active alveoli proceeding to quiescence were highest in animals of intermediate body condition scores of 4.0 to 5.0. 
The results illustrate the potential uses of a mechanistic model of the mammary gland to fit a lactation curve and to quantify the effects of feeding level, genetic merit, body condition score, and age on mammary gland dynamics throughout lactation.
Support vector regression to predict porosity and permeability: Effect of sample size
NASA Astrophysics Data System (ADS)
Al-Anazi, A. F.; Gates, I. D.
2012-02-01
Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. Particularly, the impact of Vapnik's ɛ-insensitivity loss function and least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method. 
Also, the performance of SVR depends on both kernel function type and loss functions used.
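The two loss functions compared in this abstract can be written down in a few lines. The sketch below uses the standard textbook definitions (Vapnik's ɛ-insensitive loss ignores residuals smaller than ɛ, and the least-modulus loss is the absolute residual); the function names and the porosity values are hypothetical, and nothing here reproduces the authors' actual SVR implementation.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def least_modulus_loss(y_true, y_pred):
    """Absolute residual; equals the eps-insensitive loss with eps = 0."""
    return abs(y_true - y_pred)


# Hypothetical measured porosity (fraction) and two candidate predictions.
measured = 0.20
inside_tube = eps_insensitive_loss(measured, 0.23, eps=0.05)   # residual 0.03 < eps
outside_tube = eps_insensitive_loss(measured, 0.30, eps=0.05)  # residual 0.10 > eps
print(inside_tube, round(outside_tube, 3), round(least_modulus_loss(measured, 0.30), 3))
# -> 0.0 0.05 0.1
```

The width of the tube, ɛ, controls how many training samples end up contributing to the fit, which is one reason the choice of loss function can matter when only a few core samples are available.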
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Braeton J.; Shaneyfelt, Calvin R.
A NISAC study on the economic effects of a hypothetical H1N1 pandemic was done in order to assess the differential impacts at the state and industry levels given changes in absenteeism, mortality, and consumer spending rates. Part of the analysis was to determine if there were any direct relationships between pandemic impacts and gross domestic product (GDP) losses. Multiple regression analysis was used because it shows very clearly which predictors are significant in their impact on GDP. GDP impact data taken from the REMI PI+ (Regional Economic Models, Inc., Policy Insight +) model was used to serve as the response variable. NISAC economists selected the average absenteeism rate, mortality rate, and consumer spending categories as the predictor variables. Two outliers were found in the data: Nevada and Washington, DC. The analysis was done twice, with the outliers removed for the second analysis. The second set of regressions yielded a cleaner model, but for the purposes of this study, the analysts deemed it not as useful because particular interest was placed on determining the differential impacts to states. Hospitals and accommodation were found to be the most important predictors of percentage change in GDP among the consumer spending variables.
Wilks, Scott E; Croom, Beth
2008-05-01
The study examined whether social support functioned as a protective, resilience factor among Alzheimer's disease (AD) caregivers. Moderation and mediation models were used to test social support amid stress and resilience. A cross-sectional analysis of self-reported data was conducted. Measures of demographics, perceived stress, family support, friend support, overall social support, and resilience were administered to caregiver attendees (N=229) of two AD caregiver conferences. Hierarchical regression analysis showed the compounded impact of predictors on resilience. Odds ratios generated probability of high resilience given high stress and social supports. Social support moderation and mediation were tested via distinct series of regression equations. Path analyses illustrated effects on the models for significant moderation and/or mediation. Stress negatively influenced and accounted for most variation in resilience. Social support positively influenced resilience, and caregivers with high family support had the highest probability of elevated resilience. Moderation was observed among all support factors. No social support fulfilled the complete mediation criteria. Evidence of social support as a protective, moderating factor yields implications for health care practitioners who deliver services to assist AD caregivers, particularly the promotion of identification and utilization of supportive familial and peer relations.
Misspecification of Cox regression models with composite endpoints
Wu, Longyang; Cook, Richard J
2012-01-01
Researchers routinely adopt composite endpoints in multicenter randomized trials designed to evaluate the effect of experimental interventions in cardiovascular disease, diabetes, and cancer. Despite their widespread use, relatively little attention has been paid to the statistical properties of estimators of treatment effect based on composite endpoints. We consider this here in the context of multivariate models for time to event data in which copula functions link marginal distributions with a proportional hazards structure. We then examine the asymptotic and empirical properties of the estimator of treatment effect arising from a Cox regression model for the time to the first event. We point out that even when the treatment effect is the same for the component events, the limiting value of the estimator based on the composite endpoint is usually inconsistent for this common value. We find that in this context the limiting value is determined by the degree of association between the events, the stochastic ordering of events, and the censoring distribution. Within the framework adopted, marginal methods for the analysis of multivariate failure time data yield consistent estimators of treatment effect and are therefore preferred. We illustrate the methods by application to a recent asthma study. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22736519
Alves, R S; Teodoro, P E; Farias, F C; Farias, F J C; Carvalho, L P; Rodrigues, J I S; Bhering, L L; Resende, M D V
2017-08-17
Cotton produces one of the most important textile fibers of the world and has great relevance in the world economy. It is an economically important crop in Brazil, which is the world's fifth largest producer. However, studies evaluating the genotype x environment (G x E) interactions in cotton are scarce in this country. Therefore, the goal of this study was to evaluate the G x E interactions in two important traits in cotton (fiber yield and fiber length) using the method proposed by Eberhart and Russell (simple linear regression) and reaction norm models (random regression). Eight trials with sixteen upland cotton genotypes, conducted in a randomized block design, were used. It was possible to identify a genotype with wide adaptability and stability for both traits. Reaction norm models have excellent theoretical and practical properties and led to more informative and accurate results than the method proposed by Eberhart and Russell and should, therefore, be preferred. Curves of genotypic values as a function of the environmental gradient, which predict the behavior of the genotypes along the environmental gradient, were generated. These curves make possible the recommendation to untested environmental levels.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Zhenhua; Rose, Adam Z.; Prager, Fynnwin
The state of the art approach to economic consequence analysis (ECA) is computable general equilibrium (CGE) modeling. However, such models contain thousands of equations and cannot readily be incorporated into computerized systems used by policy analysts to yield estimates of economic impacts of various types of transportation system failures due to natural hazards, human related attacks or technological accidents. This paper presents a reduced-form approach to simplify the analytical content of CGE models to make them more transparent and enhance their utilization potential. The reduced-form CGE analysis is conducted by first running simulations one hundred times, varying key parameters, such as magnitude of the initial shock, duration, location, remediation, and resilience, according to a Latin Hypercube sampling procedure. Statistical analysis is then applied to the “synthetic data” results in the form of both ordinary least squares and quantile regression. The analysis yields linear equations that are incorporated into a computerized system and utilized along with Monte Carlo simulation methods for propagating uncertainties in economic consequences. Although our demonstration and discussion focuses on aviation system disruptions caused by terrorist attacks, the approach can be applied to a broad range of threat scenarios.
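The Latin Hypercube sampling step mentioned in this abstract can be sketched in plain Python. This is a generic illustration of the technique, not the authors' procedure: the function name is hypothetical, the five dimensions merely stand in for the five example parameters listed in the abstract, and each sample is drawn on the unit interval and would still have to be rescaled to the real range of its parameter.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Draw n_samples points in [0, 1]^n_dims with one point per stratum per dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


# 100 simulation runs over 5 parameters (e.g. shock magnitude, duration,
# location, remediation, resilience), all still on the unit scale.
design = latin_hypercube(100, 5)
print(len(design), len(design[0]))
# -> 100 5
```

Each row of `design` would define one simulation run; the simulated outputs could then be regressed on the sampled parameters to obtain the reduced-form equations.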
Optimization of extraction of chitin from procambarus clarkia shell by Box-Behnken design
NASA Astrophysics Data System (ADS)
Dong, Fang; Qiu, Hailong; Jia, Shaoqian; Dai, Cuiping; Kong, Qingxin; Xu, Changliang
2018-06-01
This paper investigated the optimizing extraction processing of chitin from procambarus clarkia shell by Box-Behnken design. Firstly, four independent variables were explored in single factor experiments, namely, concentration of hydrochloric acid, soaking time, concentration of sodium hydroxide and reaction time. Then, based on the results of the above experiments, four factors and three levels experiments were planned by Box-Behnken design. According to the experimental results, we harvested a second-order polynomial equation using multiple regression analysis. In addition, the optimum extraction process of chitin of the model was obtained: concentration of HCl solution 1.54mol/L, soaking time 19.87h, concentration of NaOH solution 2.9mol/L and reaction time 3.54h. For proving the accuracy of the model, we finished the verification experiment under the following conditions: concentration of hydrochloric acid 1.5mol/L, soaking time 20h, concentration of sodium hydroxide 3mol/L and reaction time 3.5h. The actual yield of chitin reached 18.76%, which was very close to the predicted yield (18.66%) of the model. The result indicated that the optimum extraction processing of chitin was feasible and practical.
Wang, Peijie; Zhao, Hui; Sun, Jianguo
2016-12-01
Interval-censored failure time data occur in many fields such as demography, economics, medical research, and reliability and many inference procedures on them have been developed (Sun, 2006; Chen, Sun, and Peace, 2012). However, most of the existing approaches assume that the mechanism that yields interval censoring is independent of the failure time of interest and it is clear that this may not be true in practice (Zhang et al., 2007; Ma, Hu, and Sun, 2015). In this article, we consider regression analysis of case K interval-censored failure time data when the censoring mechanism may be related to the failure time of interest. For the problem, an estimated sieve maximum-likelihood approach is proposed for the data arising from the proportional hazards frailty model and for estimation, a two-step procedure is presented. In addition, the asymptotic properties of the proposed estimators of regression parameters are established and an extensive simulation study suggests that the method works well. Finally, we apply the method to a set of real interval-censored data that motivated this study. © 2016, The International Biometric Society.
Zhu, Xiang; Stephens, Matthew
2017-01-01
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241
Li, Xiujin; Lund, Mogens Sandø; Janss, Luc; Wang, Chonglong; Ding, Xiangdong; Zhang, Qin; Su, Guosheng
2017-03-15
With the development of SNP chips, SNP information provides an efficient approach to further disentangle different patterns of genomic variances and covariances across the genome for traits of interest. Due to the interaction between genotype and environment as well as possible differences in genetic background, it is reasonable to treat the performances of a biological trait in different populations as different but genetically correlated traits. In the present study, we performed an investigation on the patterns of region-specific genomic variances, covariances and correlations between Chinese and Nordic Holstein populations for three milk production traits. Variances and covariances between Chinese and Nordic Holstein populations were estimated for genomic regions at three different levels of genome region (all SNP as one region, each chromosome as one region and every 100 SNP as one region) using a novel multi-trait random regression model which uses latent variables to model heterogeneous variance and covariance. In the scenario of the whole genome as one region, the genomic variances, covariances and correlations obtained from the new multi-trait Bayesian method were comparable to those obtained from a multi-trait GBLUP for all the three milk production traits. In the scenario of each chromosome as one region, BTA 14 and BTA 5 accounted for very large genomic variance, covariance and correlation for milk yield and fat yield, whereas no specific chromosome showed very large genomic variance, covariance and correlation for protein yield. In the scenario of every 100 SNP as one region, most regions explained <0.50% of genomic variance and covariance for milk yield and fat yield, and explained <0.30% for protein yield, while some regions could present large variance and covariance. 
Although overall correlations between two populations for the three traits were positive and high, a few regions still showed weakly positive or highly negative genomic correlations for milk yield and fat yield. The new multi-trait Bayesian method using latent variables to model heterogeneous variance and covariance could work well for estimating the genomic variances and covariances for all genome regions simultaneously. Those estimated genomic parameters could be useful to improve the genomic prediction accuracy for Chinese and Nordic Holstein populations using a joint reference data in the future.
LAS bioconcentration is isomer specific
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tolls, J.; Haller, M.; Graaf, I. de
1995-12-31
The authors measured parent compound specific bioconcentration data for linear alkylbenzene sulfonates in Pimephales promelas. They did so by using cold, custom synthesized sulfophenyl alkanes. They observed that, within homologous series of isomers, the uptake rate constants (k_1) and the bioconcentration factor (BCF) increase with increasing number of carbon atoms in the alkyl chain (n_C-atoms). In contrast, the elimination rate constant k_2 appears to be independent of the alkyl chain length. Regressions of log BCF vs n_C-atoms yielded different slopes for the homologous groups of the 5- and the 2-sulfophenyl alkane isomers. Regression of all log BCF-data vs log 1/CMC yielded a good description of the data. However, when regressing the data for both homologous series separately again very different slopes are obtained. The results therefore indicate that hydrophobicity-bioconcentration relationships may be different for different homologous groups of sulfophenyl alkanes.
Pearce, B.D.; Grove, J.; Bonney, E.A.; Bliwise, N.; Dudley, D.J.; Schendel, D.E.; Thorsen, P.
2010-01-01
Background/Aims: To examine the relationship of biological mediators (cytokines, stress hormones), psychosocial, obstetric history, and demographic factors in the early prediction of preterm birth (PTB) using a comprehensive logistic regression model incorporating diverse risk factors. Methods: In this prospective case-control study, maternal serum biomarkers were quantified at 9–23 weeks’ gestation in 60 women delivering at <37 weeks compared to 123 women delivering at term. Biomarker data were combined with maternal sociodemographic factors and stress data into regression models encompassing 22 preterm risk factors and 1st-order interactions. Results: Among individual biomarkers, we found that macrophage migration inhibitory factor (MIF), interleukin-10, C-reactive protein (CRP), and tumor necrosis factor-α were statistically significant predictors of PTB at all cutoff levels tested (75th, 85th, and 90th percentiles). We fit multifactor models for PTB prediction at each biomarker cutoff. Our best models revealed that MIF, CRP, risk-taking behavior, and low educational attainment were consistent predictors of PTB at all biomarker cutoffs. The 75th percentile cutoff yielded the best predicting model with an area under the ROC curve of 0.808 (95% CI 0.743–0.874). Conclusion: Our comprehensive models highlight the prominence of behavioral risk factors for PTB and point to MIF as a possible psychobiological mediator. PMID:20160447
Does waist circumference uncorrelated with BMI add valuable information?
Ngueta, Gerard; Laouan-Sidi, Elhadji A; Lucas, Michel
2014-09-01
Estimation of relative contribution of Body Mass Index (BMI) and waist circumference (WC) on health outcomes requires a regression model that includes both obesity metrics. But, multicollinearity could yield biased estimates. To address the multicollinearity issue between BMI and WC, we used the residual model approach. The standard WC (Y-axis) was regressed on the BMI (X-axis) to obtain residual WC. Data from two adult population surveys (Nunavik Inuit and James Bay Cree) were analysed to evaluate relative effect of BMI and WC on four cardiometabolic risk factors: insulin, triglycerides, systolic blood pressure and high-density lipoprotein levels. In multivariate models, standard WC and BMI were significantly associated with cardiometabolic outcomes. Residual WC was not linked with any outcomes. The BMI effect was weakened by including standard WC in the model, but its effect remained unchanged if residual WC was considered. The strong correlation between standard WC and BMI does not allow assessment of their relative contributions to health in the same model without a risk of making erroneous estimations. By contrast with BMI, fat distribution (residual WC) does not add valuable information to a model that already contains overall adiposity (BMI) in Inuit and Cree. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
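The residual model approach described above can be sketched in a few lines: WC is regressed on BMI, and the residuals become a WC measure that is uncorrelated with BMI by construction. This is a minimal illustration using only the Python standard library; the data values and the helper names `ols_fit` and `pearson` are hypothetical and are not taken from the surveys.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical values for illustration only (not survey data).
bmi = [21.0, 24.5, 27.0, 30.2, 33.8, 36.1]   # kg/m^2
wc = [78.0, 85.0, 94.0, 99.0, 111.0, 113.0]  # cm

# Regress standard WC (Y) on BMI (X); the residuals are "residual WC".
a, b = ols_fit(bmi, wc)
residual_wc = [w - (a + b * x) for x, w in zip(bmi, wc)]

r_standard = pearson(bmi, wc)           # strong correlation with BMI
r_residual = pearson(bmi, residual_wc)  # zero up to rounding error
```

Because residual WC carries only the part of WC not explained by BMI, it can enter the same outcome model as BMI without the collinearity that standard WC would introduce.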
Pearce, B D; Grove, J; Bonney, E A; Bliwise, N; Dudley, D J; Schendel, D E; Thorsen, P
2010-01-01
To examine the relationship of biological mediators (cytokines, stress hormones), psychosocial, obstetric history, and demographic factors in the early prediction of preterm birth (PTB) using a comprehensive logistic regression model incorporating diverse risk factors. In this prospective case-control study, maternal serum biomarkers were quantified at 9-23 weeks' gestation in 60 women delivering at <37 weeks compared to 123 women delivering at term. Biomarker data were combined with maternal sociodemographic factors and stress data into regression models encompassing 22 preterm risk factors and 1st-order interactions. Among individual biomarkers, we found that macrophage migration inhibitory factor (MIF), interleukin-10, C-reactive protein (CRP), and tumor necrosis factor-alpha were statistically significant predictors of PTB at all cutoff levels tested (75th, 85th, and 90th percentiles). We fit multifactor models for PTB prediction at each biomarker cutoff. Our best models revealed that MIF, CRP, risk-taking behavior, and low educational attainment were consistent predictors of PTB at all biomarker cutoffs. The 75th percentile cutoff yielded the best predicting model with an area under the ROC curve of 0.808 (95% CI 0.743-0.874). Our comprehensive models highlight the prominence of behavioral risk factors for PTB and point to MIF as a possible psychobiological mediator. Copyright (c) 2010 S. Karger AG, Basel.
Volkova, Svitlana; Ayton, Ellyn; Porterfield, Katherine; ...
2017-12-15
This work is the first to take advantage of recurrent neural networks to predict influenza-like-illness (ILI) dynamics from various linguistic signals extracted from social media data. Unlike other approaches that rely on time-series analysis of historical ILI data [1, 2] and state-of-the-art machine learning models [3, 4], we build and evaluate the predictive power of Long Short-Term Memory (LSTM) architectures capable of nowcasting (predicting in "real-time") and forecasting (predicting the future) ILI dynamics in the 2011-2014 influenza seasons. To build our models we integrate information people post in social media, e.g., topics, stylistic and syntactic patterns, emotions and opinions, and communication behavior. We then quantitatively evaluate the predictive power of different social media signals and contrast the performance of state-of-the-art regression models with neural networks. Finally, we combine ILI and social media signals to build joint neural network models for ILI dynamics prediction. Unlike the majority of the existing work, we specifically focus on developing models for local rather than national ILI surveillance [1], specifically for military rather than general populations [3] in 26 U.S. and six international locations. Our approach demonstrates several advantages: (a) Neural network models learned from social media data yield the best performance compared to previously used regression models. (b) Previously under-explored language and communication behavior features are more predictive of ILI dynamics than syntactic and stylistic signals expressed in social media. (c) Neural network models learned exclusively from social media signals yield comparable or better performance than models learned from ILI historical data; thus, signals from social media can potentially be used to accurately forecast ILI dynamics for regions where ILI historical data are not available.
(d) Neural network models learned from combined ILI and social media signals significantly outperform models that rely solely on ILI historical data, which adds to the great potential of alternative public sources for ILI dynamics prediction. (e) Location-specific models outperform previously used location-independent models, e.g., U.S. only. (f) Prediction results vary significantly across geolocations depending on the amount of social media data available and ILI activity patterns.
DOE Office of Scientific and Technical Information (OSTI.GOV)
SPARROW models used to understand nutrient sources in the Mississippi/Atchafalaya River Basin
Robertson, Dale M.; Saad, David A.
2013-01-01
Nitrogen (N) and phosphorus (P) loading from the Mississippi/Atchafalaya River Basin (MARB) has been linked to hypoxia in the Gulf of Mexico. To describe where and from what sources those loads originate, SPAtially Referenced Regression On Watershed attributes (SPARROW) models were constructed for the MARB using geospatial datasets for 2002, including inputs from wastewater treatment plants (WWTPs), and calibration sites throughout the MARB. Previous studies found that highest N and P yields were from the north-central part of the MARB (Corn Belt). Based on the MARB SPARROW models, highest N yields were still from the Corn Belt but centered over Iowa and Indiana, and highest P yields were widely distributed throughout the center of the MARB. Similar to that found in other studies, agricultural inputs were found to be the largest N and P sources throughout most of the MARB: farm fertilizers were the largest N source, whereas farm fertilizers, manure, and urban inputs were dominant P sources. The MARB models enable individual N and P sources to be defined at scales ranging from SPARROW catchments (~50 km²) to the entire area of the MARB. Inputs of P from WWTPs and urban areas were more important than found in most other studies. Information from this study will help to reduce nutrient loading from the MARB by providing managers with a description of where each of the sources of N and P are most important, thus providing a basis for prioritizing management actions and ultimately reducing the extent of Gulf hypoxia.
Samad, Manar D; Ulloa, Alvaro; Wehner, Gregory J; Jing, Linyuan; Hartzel, Dustin; Good, Christopher W; Williams, Brent A; Haggerty, Christopher M; Fornwalt, Brandon K
2018-06-09
The goal of this study was to use machine learning to more accurately predict survival after echocardiography. Predicting patient outcomes (e.g., survival) following echocardiography is primarily based on ejection fraction (EF) and comorbidities. However, there may be significant predictive information within additional echocardiography-derived measurements combined with clinical electronic health record data. Mortality was studied in 171,510 unselected patients who underwent 331,317 echocardiograms in a large regional health system. We investigated the predictive performance of nonlinear machine learning models compared with that of linear logistic regression models using 3 different inputs: 1) clinical variables, including 90 cardiovascular-relevant International Classification of Diseases, Tenth Revision, codes, and age, sex, height, weight, heart rate, blood pressures, low-density lipoprotein, high-density lipoprotein, and smoking; 2) clinical variables plus physician-reported EF; and 3) clinical variables and EF, plus 57 additional echocardiographic measurements. Missing data were imputed using multivariate imputation by chained equations (MICE). We compared models versus each other and baseline clinical scoring systems by using a mean area under the curve (AUC) over 10 cross-validation folds and across 10 survival durations (6 to 60 months). Machine learning models achieved significantly higher prediction accuracy (all AUC >0.82) over common clinical risk scores (AUC = 0.61 to 0.79), with the nonlinear random forest models outperforming logistic regression (p < 0.01). The random forest model including all echocardiographic measurements yielded the highest prediction accuracy (p < 0.01 across all models and survival durations). Only 10 variables were needed to achieve 96% of the maximum prediction accuracy, with 6 of these variables being derived from echocardiography. 
Tricuspid regurgitation velocity was more predictive of survival than LVEF. In a subset of studies with complete data for the top 10 variables, multivariate imputation by chained equations yielded slightly reduced predictive accuracies (difference in AUC of 0.003) compared with the original data. Machine learning can fully utilize large combinations of disparate input variables to predict survival after echocardiography with superior accuracy. Copyright © 2018 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
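The comparison metric used above, a mean AUC over cross-validation folds, can be sketched as follows. This is a simplified illustration in pure Python, not the study's pipeline: the labels and scores are hypothetical, the scores are assumed to be out-of-fold predictions already produced by some model, and folds are assigned by simple interleaving.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc_over_folds(labels, scores, n_folds):
    """Average the per-fold AUC; case i is assigned to fold i % n_folds."""
    fold_aucs = []
    for k in range(n_folds):
        idx = [i for i in range(len(labels)) if i % n_folds == k]
        fold_aucs.append(auc([labels[i] for i in idx],
                             [scores[i] for i in idx]))
    return sum(fold_aucs) / n_folds

# Hypothetical outcomes (1 = died within the horizon) and model scores.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.7, 0.3, 0.4, 0.6, 0.75, 0.8, 0.1]

overall_auc = auc(labels, scores)
cv_auc = mean_auc_over_folds(labels, scores, n_folds=2)
```

Averaging per-fold AUCs, rather than pooling all predictions, keeps each fold's ranking separate, which matters when score calibration drifts between folds.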
Satellite-based studies of maize yield spatial variations and their causes in China
NASA Astrophysics Data System (ADS)
Zhao, Y.
2013-12-01
Maize production in China has been expanding significantly in the past two decades, but yield has become relatively stagnant in the past few years, and needs to be improved to meet increasing demand. Multiple studies found that the gap between potential and actual yield of maize is as large as 40% to 60% of yield potential. Although a few major causes of the yield gap have been qualitatively identified with surveys, there has not been spatial analysis aimed at quantifying the relative importance of specific biophysical and socio-economic causes, information which would be useful for targeting interventions. This study analyzes the causes of yield variation at field and village level in Quzhou county of the North China Plain (NCP). We combine remote sensing and crop modeling to estimate yields in 2009-2012, and identify fields that are consistently high or low yielding. To establish the relationship between yield and potential factors, we gather data on those factors through a household survey. We select targeted survey fields such that not only both extremes of the yield distribution but also all soil texture categories in the county are covered. Our survey assesses management and biophysical factors as well as social factors such as farmers' access to agronomic knowledge, which is approximated by distance to the closest demonstration plot or 'Science and technology backyard'. Our survey covers 10 townships, 53 villages and 180 fields. Three to ten farmers are surveyed depending on the amount of variation present among sub-pixels of each field. From the survey results, we extract the amount of variation within as well as between villages and/or soil types. The higher the within-village or within-field variation, the higher the importance of management factors. Factors such as soil type and access to knowledge are better represented by between-village variation. 
Through regression and analysis of variance, we gain a more quantitative and thorough understanding of the causes of yield variation at village scale, which further explains the gap between average and highest achieved yield.
Robust regression for large-scale neuroimaging studies.
Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand
2015-05-01
Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
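To illustrate why robust regression can help when artifacts are present, the sketch below fits a single-predictor line with Huber weights via iteratively reweighted least squares and compares it with ordinary least squares. It is a generic textbook procedure in pure Python, not the authors' method or the RPBI pipeline; the data, the tuning constant `c=1.345`, and the function names are assumptions made for illustration.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def huber_line(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of a line by iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Full weight for small residuals, shrinking weight for large ones.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b

# Hypothetical data: y is roughly 1 + 2x, with one gross outlier at x = 9.
x = list(range(10))
y = [1.1, 2.8, 5.15, 6.95, 9.2, 10.9, 13.05, 14.85, 17.1, 49.0]

a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_rob, b_rob = huber_line(x, y)
```

In this toy case the single outlier pulls the ordinary least-squares slope well away from 2, while the Huber fit stays close to it, which is the behavior that makes robust estimates attractive when a few subjects carry artifacts.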
de Souza Araújo, E; Pimenta, A S; Feijó, F M C; Castro, R V O; Fasciotti, M; Monteiro, T V C; de Lima, K M G
2018-01-01
This work aimed to evaluate the antibacterial and antifungal activities of two types of pyroligneous acid (PA) obtained from slow pyrolysis of wood of Mimosa tenuiflora and of a hybrid of Eucalyptus urophylla × Eucalyptus grandis. Wood wedges were carbonized at a heating rate of 1.25 °C min⁻¹ up to 450 °C. Pyrolysis smoke was trapped and condensed to yield liquid products. Crude pyrolysis liquids were bidistilled under 5 mmHg vacuum, yielding purified PA. Multi-antibiotic-resistant strains of Escherichia coli, Pseudomonas aeruginosa (ATCC 27853) and Staphylococcus aureus (ATCC 25923) had their sensitivity to PA evaluated using the agar diffusion test. Two yeasts were evaluated as well, Candida albicans (ATCC 10231) and Cryptococcus neoformans. GC-MS analysis of both PAs was carried out to obtain their chemical composition. Regression analysis was performed, and models were adjusted, with diameter of inhibition halos and PA concentration (100, 50 and 20%) as parameters. Identity of regression models and equality of parameters in polynomial orthogonal equations were verified. Inhibition halos were observed in the range of 15-25 mm in diameter. All micro-organisms were inhibited by both types of PA, even at the lowest concentration of 20%. The results demonstrate the feasibility of using PAs produced from wood species planted on a large scale in Brazil, and their real potential as a basis for producing natural antibacterial and antifungal agents for veterinary and zootechnical applications. © 2017 The Society for Applied Microbiology.
Predicting arsenic in drinking water wells of the Central Valley, California
Ayotte, Joseph; Nolan, Bernard T.; Gronberg, JoAnn M.
2016-01-01
Probabilities of arsenic in groundwater at depths used for domestic and public supply in the Central Valley of California are predicted using weak-learner ensemble models (boosted regression trees, BRT) and more traditional linear models (logistic regression, LR). Both methods captured major processes that affect arsenic concentrations, such as the chemical evolution of groundwater, redox differences, and the influence of aquifer geochemistry. Inferred flow-path length was the most important variable but near-surface-aquifer geochemical data also were significant. A unique feature of this study was that previously predicted nitrate concentrations in three dimensions were themselves predictive of arsenic and indicated an important redox effect at >10 μg/L, indicating low arsenic where nitrate was high. Additionally, a variable representing three-dimensional aquifer texture from the Central Valley Hydrologic Model was an important predictor, indicating high arsenic associated with fine-grained aquifer sediment. BRT outperformed LR at the 5 μg/L threshold in all five predictive performance measures and at 10 μg/L in four out of five measures. BRT yielded higher prediction sensitivity (39%) than LR (18%) at the 10 μg/L threshold, a useful outcome because a major objective of the modeling was to improve our ability to predict high arsenic areas.
Person, M.; Konikow, Leonard F.
1986-01-01
A solute-transport model of an irrigated stream-aquifer system was recalibrated because of discrepancies between prior predictions of ground-water salinity trends during 1971-1982 and the observed outcome in February 1982. The original model was calibrated with a 1-year record of data collected during 1971-1972 in an 18-km reach of the Arkansas River Valley in southeastern Colorado. The model is improved by incorporating additional hydrologic processes (salt transport through the unsaturated zone) and through reexamination of the reliability of some input data (regression relationship used to estimate salinity from specific conductance data). Extended simulations using the recalibrated model are made to investigate the usefulness of the model for predicting long-term trends of salinity and water levels within the study area. Predicted ground-water levels during 1971-1982 are in good agreement with the observed, indicating that the original 1971-1972 study period was sufficient to calibrate the flow model. However, long-term simulations using the recalibrated model based on recycling the 1971-1972 data alone yield an average ground-water salinity for 1982 that is too low by about 10%. Simulations that incorporate observed surface-water salinity variations yield better results, in that the calculated average ground-water salinity for 1982 is within 3% of the observed value. Statistical analysis of temporal salinity variations of the applied surface water indicates that at least a 4-year sampling period is needed to accurately calibrate the transport model. © 1986.
Kelly, Valerie J.; Hooper, Richard P.; Aulenbach, Brent T.; Janet, Mary
2001-01-01
This report contains concentrations and annual mass fluxes (loadings) for a broad range of water-quality constituents measured during 1996-2000 as part of the U.S. Geological Survey National Stream Quality Accounting Network (NASQAN). During this period, NASQAN operated a network of 40-42 stations in four of the largest river basins of the USA: the Colorado, the Columbia, the Mississippi (including the Missouri and Ohio), and the Rio Grande. The report contains surface-water quality data, streamflow data, field measurements (e.g. water temperature and pH), sediment-chemistry data, and quality-assurance data; interpretive products include annual and average loads, regression parameters for models used to estimate loads, sub-basin yield maps, maps depicting percent detections for censored constituents, and diagrams depicting flow-weighted average concentrations. Where possible, a regression model relating concentration to discharge and season was used for flux estimation. The interpretive context provided by annual loads includes identifying source and sink areas for constituents and estimating the loadings to receiving waters, such as reservoirs or the ocean.
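A regression model relating concentration to discharge and season is often written in the log-linear rating-curve form below. This specific functional form is an assumption used for illustration; the abstract does not give the equation or the terms actually used. Here C is concentration, Q is discharge, t is time in decimal years, and the β coefficients and the unit-conversion factor κ are hypothetical symbols.

```latex
\begin{align}
\ln C &= \beta_0 + \beta_1 \ln Q + \beta_2 \sin(2\pi t) + \beta_3 \cos(2\pi t) + \varepsilon \\
L_{\text{annual}} &= \kappa \sum_{d=1}^{365} \hat{C}_d \, Q_d
\end{align}
```

In this sketch the sine and cosine pair carries the seasonal cycle, and the annual flux sums the product of estimated daily concentration and measured daily discharge. Back-transforming from ln C to C generally needs a bias correction, which is omitted here.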
Regression Discontinuity Design in Gifted and Talented Education Research
ERIC Educational Resources Information Center
Matthews, Michael S.; Peters, Scott J.; Housand, Angela M.
2012-01-01
This Methodological Brief introduces the reader to the regression discontinuity design (RDD), which is a method that when used correctly can yield estimates of research treatment effects that are equivalent to those obtained through randomized control trials and can therefore be used to infer causality. However, RDD does not require the random…
7 CFR 275.23 - Determination of State agency program performance.
Code of Federal Regulations, 2011 CFR
2011-01-01
... NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING... section, the adjusted regressed payment error rate shall be calculated to yield the State agency's payment error rate. The adjusted regressed payment error rate is given by r1″ + r2″. (ii) If FNS determines...
NASA Astrophysics Data System (ADS)
Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui
2015-12-01
This paper presents automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on the adaptive-scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. The candidate objects are then extracted whole after region merging and elimination of contaminants. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching and non-bacillus objects. We evaluated logistic regression, random forest, and intersection-kernel support vector machine classifiers for classifying the bacillus objects. Experimental results demonstrate that the proposed method achieves high robustness and accuracy. The logistic regression classifier performs best, with an accuracy of 91.68%.
Welch, Jarrod R.; Vincent, Jeffrey R.; Auffhammer, Maximilian; Moya, Piedad F.; Dobermann, Achim; Dawe, David
2010-01-01
Data from farmer-managed fields have not been used previously to disentangle the impacts of daily minimum and maximum temperatures and solar radiation on rice yields in tropical/subtropical Asia. We used a multiple regression model to analyze data from 227 intensively managed irrigated rice farms in six important rice-producing countries. The farm-level detail, observed over multiple growing seasons, enabled us to construct farm-specific weather variables, control for unobserved factors that either were unique to each farm but did not vary over time or were common to all farms at a given site but varied by season and year, and obtain more precise estimates by including farm- and site-specific economic variables. Temperature and radiation had statistically significant impacts during both the vegetative and ripening phases of the rice plant. Higher minimum temperature reduced yield, whereas higher maximum temperature raised it; radiation impact varied by growth phase. Combined, these effects imply that yield at most sites would have grown more rapidly during the high-yielding season but less rapidly during the low-yielding season if observed temperature and radiation trends at the end of the 20th century had not occurred, with temperature trends being more influential. Looking ahead, they imply a net negative impact on yield from moderate warming in coming decades. Beyond that, the impact would likely become more negative, because prior research indicates that the impact of maximum temperature becomes negative at higher levels. Diurnal temperature variation must be considered when investigating the impacts of climate change on irrigated rice in Asia. PMID:20696908
Why bother with testing? The validity of immigrants' self-assessed language proficiency.
Edele, Aileen; Seuring, Julian; Kristen, Cornelia; Stanat, Petra
2015-07-01
Due to its central role in social integration, immigrants' language proficiency is a matter of considerable societal concern and scientific interest. This study examines whether commonly applied self-assessments of linguistic skills yield results that are similar to those of competence tests and thus whether these self-assessments are valid measures of language proficiency. Analyses of data for immigrant youth reveal moderate correlations between language test scores and two types of self-assessments (general ability estimates and concrete performance estimates) for the participants' first and second languages. More importantly, multiple regression models using self-assessments and models using test scores yield different results. This finding holds true for a variety of analyses and for both types of self-assessments. Our findings further suggest that self-assessed language skills are systematically biased in certain groups. Subjective measures thus seem to be inadequate estimates of language skills, and future research should use them with caution when research questions pertain to actual language skills rather than self-perceptions. Copyright © 2015 Elsevier Inc. All rights reserved.
Chinese time trade-off values for EQ-5D health states.
Liu, Gordon G; Wu, Hongyan; Li, Minghui; Gao, Chen; Luo, Nan
2014-07-01
To generate a Chinese general population-based three-level EuroQol five-dimension (EQ-5D-3L) social value set using the time trade-off method. The study sample was drawn from five cities in China: Beijing, Guangzhou, Shenyang, Chengdu, and Nanjing, using a quota sampling method. Utility values for a subset of 97 health states defined by the EQ-5D-3L descriptive system were directly elicited from the study sample using a modified Measurement and Valuation of Health protocol, with each respondent valuing 13 of the health states. The utility values for all 243 EQ-5D-3L health states were estimated on the basis of econometric models at both individual and aggregate levels. Various linear regression models using different model specifications were examined to determine the best model using predefined model selection criteria. The N3 model based on ordinary least squares regression at the aggregate level yielded the best model fit, with a mean absolute error of 0.020 and with 7 and 0 states having absolute prediction errors greater than 0.05 and 0.10, respectively. This model passed tests for model misspecification (F = 2.7; P = 0.0509, Ramsey Regression Equation Specification Error Test), heteroskedasticity (χ² = 0.97; P = 0.3254, Breusch-Pagan/Cook-Weisberg test), and normality of the residuals (χ² = 1.285; P = 0.5259, Jarque-Bera test). The range of the predicted values (-0.149 to 0.887) was similar to those estimated in other countries. The study successfully developed Chinese utility values for EQ-5D-3L health states using the time trade-off method. It is the first attempt ever to develop a standardized instrument for quantifying quality-adjusted life-years in China. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Rice production model based on the concept of ecological footprint
NASA Astrophysics Data System (ADS)
Faiz, S. A.; Wicaksono, A. D.; Dinanti, D.
2017-06-01
According to the Regional Spatial Plan (RTRW) of Malang Regency for 2010-2030, Malang Regency is designated as a center of agricultural development, including the districts bordering Malang City. To protect the region's function as a rice producer, a sustainable food farmland (LP2B) policy was established, whose implementation aims to protect rice land. At present, the LP2B system has not been fully implemented, which has limited the extent of rice land available for rice production. One cause is the development of settlements and industries under the influence of Malang City, which has converted land to other functions. The research focused on 30 villages that directly border Malang City. The study developed a model of the relationship between farm production output and ecological footprint variables. These variables were rice-land area (X1), percentage of built-up land (X2), and number of farmers (X3). The analysis technique was regression. The regression yielded the rice production model Y=-207,983 + 10.246X1; rice-land area (X1) was the most influential independent variable. Of the villages directly bordering Malang City, 11 had higher production potential, with rice production of more than 1,000 tons/year, while 12 were threatened by low output, with rice production of only 500 tons/year. Based on the model and the spatial direction of the RTRW, farming development policy should be redesigned to maintain rice-land area in regions where agricultural activity is still dominant, because rice-land area is the most influential factor in farm production: the larger the rice-land area, the higher the rice production output of each village.
Automatically rating trainee skill at a pediatric laparoscopic suturing task.
Oquendo, Yousi A; Riddle, Elijah W; Hiller, Dennis; Blinman, Thane A; Kuchenbecker, Katherine J
2018-04-01
Minimally invasive surgeons must acquire complex technical skills while minimizing patient risk, a challenge that is magnified in pediatric surgery. Trainees need realistic practice with frequent detailed feedback, but human grading is tedious and subjective. We aim to validate a novel motion-tracking system and algorithms that automatically evaluate trainee performance of a pediatric laparoscopic suturing task. Subjects (n = 32) ranging from medical students to fellows performed two trials of intracorporeal suturing in a custom pediatric laparoscopic box trainer after watching a video of ideal performance. The motions of the tools and endoscope were recorded over time using a magnetic sensing system, and both tool grip angles were recorded using handle-mounted flex sensors. An expert rated the 63 trial videos on five domains from the Objective Structured Assessment of Technical Skill (OSATS), yielding summed scores from 5 to 20. Motion data from each trial were processed to calculate 280 features. We used regularized least squares regression to identify the most predictive features from different subsets of the motion data and then built six regression tree models that predict summed OSATS score. Model accuracy was evaluated via leave-one-subject-out cross-validation. The model that used all sensor data streams performed best, achieving 71% accuracy at predicting summed scores within 2 points, 89% accuracy within 4, and a correlation of 0.85 with human ratings. 59% of the rounded average OSATS score predictions were perfect, and 100% were within 1 point. This model employed 87 features, including none based on completion time, 77 from tool tip motion, 3 from tool tip visibility, and 7 from grip angle. Our novel hardware and software automatically rated previously unseen trials with summed OSATS scores that closely match human expert ratings. 
Such a system facilitates more feedback-intensive surgical training and may yield insights into the fundamental components of surgical skill.
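The evaluation scheme described above (leave-one-subject-out cross-validation, scored as the share of predictions within a tolerance of the expert rating) can be sketched in plain Python. This is not the authors' code: the function names and the list-based data layout are assumptions, and the model-fitting step is deliberately left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    All trials from one subject are held out together, so a model is never
    tested on a person whose other trial it saw during training.
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


def accuracy_within(predicted, actual, tolerance):
    """Fraction of predictions whose absolute error is <= tolerance."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)
```

Grouping the folds by subject rather than by trial matters here because each subject performed more than one trial; splitting by trial would let a subject's own data leak into training.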
Estimating milk yield and value losses from increased somatic cell count on US dairy farms.
Hadrich, J C; Wolf, C A; Lombard, J; Dolak, T M
2018-04-01
Milk loss due to increased somatic cell counts (SCC) results in economic losses for dairy producers. This research uses 10 mo of consecutive dairy herd improvement data from 2013 and 2014 to estimate milk yield loss using SCC as a proxy for clinical and subclinical mastitis. A fixed effects regression was used to examine factors that affected milk yield while controlling for herd-level management. Breed, milking frequency, days in milk, seasonality, SCC, cumulative months with SCC greater than 100,000 cells/mL, lactation, and herd size were variables included in the regression analysis. The cumulative months with SCC above a threshold was included as a proxy for chronic mastitis. Milk yield loss increased as the number of test days with SCC ≥100,000 cells/mL increased. Results from the regression were used to estimate a monetary value of milk loss related to SCC as a function of cow and operation related explanatory variables for a representative dairy cow. The largest losses occurred from increased cumulative test days with a SCC ≥100,000 cells/mL, with daily losses of $1.20/cow per day in the first month to $2.06/cow per day in mo 10. Results demonstrate the importance of including the duration of months above a threshold SCC when estimating milk yield losses. Cows with chronic mastitis, measured by increased consecutive test days with SCC ≥100,000 cells/mL, resulted in higher milk losses than cows with a new infection. This provides farm managers with a method to evaluate the trade-off between treatment and culling decisions as it relates to mastitis control and early detection. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Oliver, A; Mendizabal, J A; Ripoll, G; Albertí, P; Purroy, A
2010-04-01
The SEUROP system is currently in use for carcass classification in Europe. Image analysis and other new technologies are being developed to enhance and supplement this classification system. After slaughtering, 91 carcasses of local Spanish beef breeds were weighed and classified according to the SEUROP system. Two digital photographs (a side and a dorsal view) were taken of the left carcass sides, and a total of 33 morphometric measurements (lengths, perimeters, areas) were made. Commercial butchering of these carcasses took place 24 h postmortem, and the different cuts were grouped according to four commercial meat cut quality categories: extra, first, second, and third. Multiple regression analysis of carcass weight and the SEUROP conformation score (x variables) on meat yield and the four commercial cut quality category yields (y variables) was performed as a measure of the accuracy of the SEUROP system. Stepwise regression analysis of carcass weight and the 33 morphometric image analysis measurements (x variables) on meat yield and yields of the four commercial cut quality categories (y variables) was also carried out. Higher accuracy was achieved using image analysis than using only the current SEUROP conformation score. The coefficients of determination ranged from R² = 0.66 to R² = 0.93 (P<0.001) for the SEUROP system and from R² = 0.81 to R² = 0.94 (P<0.001) for the image analysis method. These results suggest that the image analysis method should be helpful as a means of supplementing and enhancing the SEUROP system for grading beef carcasses. 2009 Elsevier Ltd. All rights reserved.
Petersen, Nanna; Stocks, Stuart; Gernaey, Krist V
2008-05-01
The main purpose of this article is to demonstrate that principal component analysis (PCA) and partial least squares regression (PLSR) can be used to extract information from particle size distribution data and predict rheological properties. Samples from commercially relevant Aspergillus oryzae fermentations conducted in 550 L pilot scale tanks were characterized with respect to particle size distribution, biomass concentration, and rheological properties. The rheological properties were described using the Herschel-Bulkley model. Estimation of all three parameters in the Herschel-Bulkley model (yield stress (τ_y), consistency index (K), and flow behavior index (n)) resulted in a large standard deviation of the parameter estimates. The flow behavior index was not found to be correlated with any of the other measured variables and previous studies have suggested a constant value of the flow behavior index in filamentous fermentations. It was therefore chosen to fix this parameter to the average value thereby decreasing the standard deviation of the estimates of the remaining rheological parameters significantly. Using a PLSR model, a reasonable prediction of apparent viscosity (μ_app), yield stress (τ_y), and consistency index (K), could be made from the size distributions, biomass concentration, and process information. This provides a predictive method with a high predictive power for the rheology of fermentation broth, and with the advantages over previous models that τ_y and K can be predicted as well as μ_app. Validation on an independent test set yielded a root mean square error of 1.21 Pa for τ_y, 0.209 Pa·s^n for K, and 0.0288 Pa·s for μ_app, corresponding to R² = 0.95, R² = 0.94, and R² = 0.95 respectively. Copyright 2007 Wiley Periodicals, Inc.
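For readers unfamiliar with the Herschel-Bulkley model named above, a minimal Python sketch of its standard form follows: shear stress equals yield stress plus K times shear rate raised to the power n, and apparent viscosity is shear stress divided by shear rate. The function names and the sample values in the usage note are illustrative assumptions, not figures from the study.

```python
def herschel_bulkley_stress(shear_rate, tau_y, K, n):
    """Shear stress (Pa) from the Herschel-Bulkley model.

    tau_y: yield stress (Pa); K: consistency index (Pa s^n);
    n: flow behavior index (dimensionless); shear_rate in 1/s.
    """
    return tau_y + K * shear_rate ** n


def apparent_viscosity(shear_rate, tau_y, K, n):
    """Apparent viscosity (Pa s): shear stress divided by shear rate.

    Only defined for shear_rate > 0.
    """
    return herschel_bulkley_stress(shear_rate, tau_y, K, n) / shear_rate
```

For example, with the hypothetical values tau_y = 1.0 Pa, K = 0.5 Pa s^n and n = 0.5, a shear rate of 4 1/s gives a stress of 2.0 Pa and an apparent viscosity of 0.5 Pa s. Fixing n at a constant, as the abstract describes, leaves only tau_y and K to estimate from each sample.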
Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold
2015-03-01
A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming are also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.
Spahr, Norman E.; Mueller, David K.; Wolock, David M.; Hitt, Kerie J.; Gronberg, JoAnn M.
2010-01-01
Data collected for the U.S. Geological Survey National Water-Quality Assessment program from 1992-2001 were used to investigate the relations between nutrient concentrations and nutrient sources, hydrology, and basin characteristics. Regression models were developed to estimate annual flow-weighted concentrations of total nitrogen and total phosphorus using explanatory variables derived from currently available national ancillary data. Different total-nitrogen regression models were used for agricultural (25 percent or more of basin area classified as agricultural land use) and nonagricultural basins. Atmospheric, fertilizer, and manure inputs of nitrogen, percent sand in soil, subsurface drainage, overland flow, mean annual precipitation, and percent undeveloped area were significant variables in the agricultural basin total nitrogen model. Significant explanatory variables in the nonagricultural total nitrogen model were total nonpoint-source nitrogen input (sum of nitrogen from manure, fertilizer, and atmospheric deposition), population density, mean annual runoff, and percent base flow. The concentrations of nutrients derived from regression (CONDOR) models were applied to drainage basins associated with the U.S. Environmental Protection Agency (USEPA) River Reach File (RF1) to predict flow-weighted mean annual total nitrogen concentrations for the conterminous United States. The majority of stream miles in the Nation have predicted concentrations less than 5 milligrams per liter. Concentrations greater than 5 milligrams per liter were predicted for a broad area extending from Ohio to eastern Nebraska, areas spatially associated with greater application of fertilizer and manure. Probabilities that mean annual total-nitrogen concentrations exceed the USEPA regional nutrient criteria were determined by incorporating model prediction uncertainty. 
In all nutrient regions where criteria have been established, there is at least a 50 percent probability of exceeding the criteria in more than half of the stream miles. Dividing calibration sites into agricultural and nonagricultural groups did not improve the explanatory capability for total phosphorus models. The group of explanatory variables that yielded the lowest model error for mean annual total phosphorus concentrations includes phosphorus input from manure, population density, amounts of range land and forest land, percent sand in soil, and percent base flow. However, the large unexplained variability and associated model error precluded the use of the total phosphorus model for nationwide extrapolations.
NASA Technical Reports Server (NTRS)
Leduc, S. (Principal Investigator)
1982-01-01
Models based on multiple regression were developed to estimate corn and soybean yield from weather data for agrophysical units (APU) in Iowa. The predictor variables are derived from monthly average temperature and monthly total precipitation data at meteorological stations in the cooperative network. The models are similar in form to the previous models developed for crop reporting districts (CRD). The trends and derived variables were the same and the approach to select the significant predictors was similar to that used in developing the CRD models. The APUs were selected to be more homogeneous with respect to crop production than the CRDs. The APU models are quite similar to the CRD models, with similar explained variation and number of predictor variables. The APU models are to be independently evaluated and compared to the previously evaluated CRD models. That comparison should indicate the preferred model area for this application, i.e., APU or CRD.
Hussain, Saddam; Khaliq, Abdul; Matloob, Amar; Fahad, Shah; Tanveer, Asif
2015-01-01
Little seed canary grass (LCG) is a pernicious weed of wheat crop causing enormous yield losses. Information on the interference and economic threshold (ET) level of LCG is of prime significance to rationalize the use of herbicide for its effective management in wheat fields. The present study was conducted to quantify interference and ET density of LCG in mid-sown (20 November) and late-sown (10 December) wheat. Experiment was triplicated in randomized split-plot design with sowing dates as the main plots and LCG densities (10, 20, 30, and 40 plants m(-2)) as the subplots. Plots with two natural infestations of weeds including and excluding LCG were maintained for comparing its interference in pure stands with designated densities. A season-long weed-free treatment was also run. Results indicated that composite stand of weeds, including LCG, and density of 40 LCG plants m(-2) were more competitive with wheat, especially when crop was sown late in season. Maximum weed dry biomass was attained by composite stand of weeds including LCG followed by 40 LCG plants m(-2) under both sowing dates. Significant variations in wheat growth and yield were observed under the influence of different LCG densities as well as sowing dates. Presence of 40 LCG plants m(-2) reduced wheat yield by 28 and 34% in mid- and late-sown wheat crop, respectively. These losses were much greater than those for infestation of all weeds, excluding LCG. Linear regression model was effective in simulating wheat yield losses over a wide range of LCG densities, and the regression equations showed good fit to observed data. The ET levels of LCG were 6-7 and 2.2-3.3 plants m(-2) in mid- and late-sown wheat crop, respectively. Herbicide should be applied in cases when LCG density exceeds these levels under respective sowing dates.
Characterizing the performance of the Conway-Maxwell Poisson generalized linear model.
Francis, Royce A; Geedipally, Srinivas Reddy; Guikema, Seth D; Dhavala, Soma Sekhar; Lord, Dominique; LaRocca, Sarah
2012-01-01
Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression. © 2011 Society for Risk Analysis.
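To make the dispersion behavior concrete, the sketch below computes the COM-Poisson probability mass function in its usual parameterization, P(Y = y) = lambda^y / ((y!)^nu * Z(lambda, nu)), where Z is the normalizing sum over all counts. It is not the maximum likelihood procedure assessed in the article; the function name and the truncation of the infinite sum at max_terms are assumptions made for illustration.

```python
import math


def com_poisson_pmf(y, lam, nu, max_terms=200):
    """COM-Poisson probability of observing count y.

    nu = 1 recovers the Poisson distribution, nu > 1 gives underdispersion
    and nu < 1 gives overdispersion. The normalizing constant is an infinite
    series, approximated here by its first max_terms terms (an assumption
    that can be inadequate for large lam combined with small nu).
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_terms)]
    # Log-sum-exp keeps the sum stable when individual terms are very large.
    peak = max(log_terms)
    log_z = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
```

Setting nu to 1 and comparing against the ordinary Poisson formula is a quick sanity check on any implementation of this kind.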
Aleza, Koutchoukalo; Villamor, Grace B; Nyarko, Benjamin Kofi; Wala, Kperkouma; Akpagana, Koffi
2018-01-01
Vitellaria paradoxa (Gaertn C. F.), or shea tree, remains one of the most valuable trees for farmers in the Atacora district of northern Benin, where rural communities depend on shea products for both food and income. To optimize productivity and management of shea agroforestry systems, or "parklands," accurate and up-to-date data are needed. For this purpose, we monitored 120 fruiting shea trees for two years under three land-use scenarios and different soil groups in Atacora, coupled with a farm household survey to elicit information on decision making and management practices. To examine the local pattern of shea tree productivity and relationships between morphological factors and yields, we used a randomized branch sampling method and applied a regression analysis to build a shea yield model based on dendrometric, soil and land-use variables. We also compared potential shea yields based on farm household socio-economic characteristics and management practices derived from the survey data. Soil and land-use variables were the most important determinants of shea fruit yield. In terms of land use, shea trees growing on farmland plots exhibited the highest yields (i.e., fruit quantity and mass) while trees growing on Lixisols performed better than those of the other soil group. Contrary to our expectations, dendrometric parameters had weak relationships with fruit yield regardless of land-use and soil group. There is an inter-annual variability in fruit yield in both soil groups and land-use type. In addition to observed inter-annual yield variability, there was a high degree of variability in production among individual shea trees. Furthermore, household socioeconomic characteristics such as road accessibility, landholding size, and gross annual income influence shea fruit yield. The use of fallow areas is an important land management practice in the study area that influences both conservation and shea yield. PMID:29346406
Moore, Richard Bridge; Johnston, Craig M.; Robinson, Keith W.; Deacon, Jeffrey R.
2004-01-01
The U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (USEPA) and the New England Interstate Water Pollution Control Commission (NEIWPCC), has developed a water-quality model, called SPARROW (Spatially Referenced Regressions on Watershed Attributes), to assist in regional total maximum daily load (TMDL) and nutrient-criteria activities in New England. SPARROW is a spatially detailed, statistical model that uses regression equations to relate total nitrogen and phosphorus (nutrient) stream loads to nutrient sources and watershed characteristics. The statistical relations in these equations are then used to predict nutrient loads in unmonitored streams. The New England SPARROW models are built using a hydrologic network of 42,000 stream reaches and associated watersheds. Watershed boundaries are defined for each stream reach in the network through the use of a digital elevation model and existing digitized watershed divides. Nutrient source data is from permitted wastewater discharge data from USEPA's Permit Compliance System (PCS), various land-use sources, and atmospheric deposition. Physical watershed characteristics include drainage area, land use, streamflow, time-of-travel, stream density, percent wetlands, slope of the land surface, and soil permeability. The New England SPARROW models for total nitrogen and total phosphorus have R-squared values of 0.95 and 0.94, with mean square errors of 0.16 and 0.23, respectively. Variables that were statistically significant in the total nitrogen model include permitted municipal-wastewater discharges, atmospheric deposition, agricultural area, and developed land area. Total nitrogen stream-loss rates were significant only in streams with average annual flows less than or equal to 2.83 cubic meters per second. In streams larger than this, there is nondetectable in-stream loss of annual total nitrogen in New England. 
Variables that were statistically significant in the total phosphorus model include discharges for municipal wastewater-treatment facilities and pulp and paper facilities, developed land area, agricultural area, and forested area. For total phosphorus, loss rates were significant for reservoirs with surface areas of 10 square kilometers or less, and in streams with flows less than or equal to 2.83 cubic meters per second. Applications of SPARROW for evaluating nutrient loading in New England waters include estimates of the spatial distributions of total nitrogen and phosphorus yields, sources of the nutrients, and the potential for delivery of those yields to receiving waters. This information can be used to (1) predict ranges in nutrient levels in surface waters, (2) identify the environmental variables that are statistically significant predictors of nutrient levels in streams, (3) evaluate monitoring efforts for better determination of nutrient loads, and (4) evaluate management options for reducing nutrient loads to achieve water-quality goals.
NASA Astrophysics Data System (ADS)
Beguería, S.
2017-12-01
While large efforts are devoted to developing crop status monitoring and yield forecasting systems through the use of Earth observation data (mostly remotely sensed satellite imagery) and observational and modeled weather data, here we focus on the information value of qualitative data on crop status from direct observations made by humans. This kind of data has a high value as it reflects the expert opinion of individuals directly involved in the development of the crop. However, such data have issues that prevent their direct use in crop monitoring and yield forecasting systems, such as their non-spatially explicit nature or, most importantly, their qualitative nature. Indeed, while the human brain is good at categorizing the status of physical systems in terms of qualitative scales ('very good', 'good', 'fair', etcetera), it has difficulties in quantifying it in physical units. This has prevented the incorporation of this kind of data into systems that make extensive use of numerical information. Here we show an example of using qualitative crop condition data to estimate yields of the most important crops in the US early in the season. We use USDA weekly crop condition reports, which are based on a sample of thousands of reporters including mostly farmers and people in direct contact with them. These reporters provide subjective evaluations of crop conditions, on a scale including five levels ranging from 'very poor' to 'excellent'. The USDA report indicates, for each state, the proportion of reporters for each condition level. We show how it is possible to model the underlying non-observed quantitative variable that reflects the crop status in each state, and how this model is consistent across states and years. Furthermore, we show how this information can be used to monitor the status of the crops and to produce yield forecasts early in the season.
Finally, we discuss approaches for blending this information source with other, more classical earth data sources such as remote sensing or weather data, in the context of hierarchical regression models.
Heimann, David C.; Rasmussen, Patrick P.; Cline, Teri L.; Pigue, Lori M.; Wagner, Holly R.
2010-01-01
Suspended-sediment data from 18 selected surface-water monitoring stations in the lower Missouri River Basin downstream from Gavins Point Dam were used in the computation of annual suspended-sediment and suspended-sand loads for 1976 through 2008. Three methods of suspended-sediment load determination were utilized and these included the subdivision method, regression of instantaneous turbidity with suspended-sediment concentrations at selected stations, and regression techniques using the Load Estimator (LOADEST) software. Characteristics of the suspended-sediment and streamflow data collected at the 18 monitoring stations and the tabulated annual suspended-sediment and suspended-sand loads and yields are presented.
Kheirabadi, Khabat; Rashidi, Amir; Alijani, Sadegh; Imumorin, Ikhide
2014-11-01
We compared the goodness of fit of three mathematical functions (including: Legendre polynomials, Lidauer-Mäntysaari function and Wilmink function) for describing the lactation curve of primiparous Iranian Holstein cows by using multiple-trait random regression models (MT-RRM). Lactational submodels provided the largest daily additive genetic (AG) and permanent environmental (PE) variance estimates at the end and at the onset of lactation, respectively, as well as low genetic correlations between peripheral test-day records. For all models, heritability estimates were highest at the end of lactation (245 to 305 days) and ranged from 0.05 to 0.26, 0.03 to 0.12 and 0.04 to 0.24 for milk, fat and protein yields, respectively. Generally, the genetic correlations between traits depend on how far apart they are or whether they are on the same day in any two traits. On average, genetic correlations between milk and fat were the lowest and those between fat and protein were intermediate, while those between milk and protein were the highest. Results from all criteria (Akaike's and Schwarz's Bayesian information criterion, and -2*logarithm of the likelihood function) suggested that a model with 2 and 5 coefficients of Legendre polynomials for AG and PE effects, respectively, was the most adequate for fitting the data. © 2014 Japanese Society of Animal Science.
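Two of the lactation-curve functions compared above can be sketched in a few lines of Python. The Wilmink function is written in its commonly cited form a + b*exp(-k*t) + c*t, and the Legendre covariates are generated on days in milk rescaled to [-1, 1]. The default k = 0.05, the unnormalized polynomials, and all function names are assumptions; the abstract does not give these details, and random regression software often uses normalized Legendre polynomials instead.

```python
import math


def standardize_dim(t, t_min, t_max):
    """Rescale days in milk t to the interval [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_covariates(x, n_coef):
    """First n_coef Legendre polynomials P_0..P_{n_coef-1} evaluated at x,
    built with the recurrence (n+1)P_{n+1} = (2n+1)x P_n - n P_{n-1}."""
    values = [1.0, x][:n_coef]
    for n in range(1, n_coef - 1):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def wilmink(t, a, b, c, k=0.05):
    """Wilmink lactation curve: level a, exponential term b*exp(-k*t),
    and linear term c*t. k is usually fixed in advance (assumed 0.05 here)."""
    return a + b * math.exp(-k * t) + c * t
```

In a random regression model, each animal's additive genetic and permanent environmental effects would each get their own set of coefficients on covariates like these; the selected model in the abstract used 2 coefficients for the former and 5 for the latter.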
Season of birth is associated with first-lactation milk yield in Holstein Friesian cattle.
Van Eetvelde, M; Kamal, M M; Vandaele, L; Opsomer, G
2017-12-01
The aim of the present research was to assess factors associated with first-lactation milk yield in dairy heifers, including maternal and environmental factors, factors related to the development of the heifer and factors related to its offspring such as gender of the calf. In addition, the potential underlying mechanism, in particular metabolic adaptations, was further explored. Data on body growth, reproduction and milk yield of 74 Holstein Friesian heifers on three herds in Flanders (Belgium) were collected. At birth, body measurements of the heifers were recorded and blood samples were taken (in order) to determine basal glucose and insulin concentrations. Body measurements were assessed every 3 months until first calving, and gender and weight of their first calf were recorded. Information on fertility and milk yield of the heifer and its dam were collected from the herd databases. Daily temperature and photoperiod were recorded from the database of the Belgian Royal Meteorological Institute. Linear mixed models were run with herd as a random factor, to account for differences in herd management. Heifers grew 867±80.7 g/day during their first year of life and were inseminated at 14.8±1.34 months. First calving took place at 24.5±1.93 months, at a weight of 642±61.5 kg and heifers produced 8506±1064 kg energy corrected milk during their first 305-day lactation. Regression models revealed that none of the maternal factors such as milk yield and parity, nor the growth of the heifer during the 1st year of life were associated with milk yield during first lactation. Age, and to a lesser extent BW at first parturition were positively associated with first-lactation milk yield. In addition, the season of birth, but not calving, had a significant influence on milk yield, with winter-born heifers producing less than heifers born in any other season. 
The lower yielding winter-born heifers had higher insulin concentrations at birth, whereas glucose concentrations were similar, the latter being suggestive of lower insulin sensitivity of the peripheral tissues. Furthermore, environmental temperature at the end of gestation was negatively correlated with neonatal insulin concentrations. In conclusion, results of the present study suggest that heifers born during the hotter months are born with a higher peripheral insulin sensitivity, finally leading to a higher first-lactation milk yield.
Reed, Derek D; Kaplan, Brent A; Brewer, Adam T
2012-01-01
In recent years, researchers and practitioners in the behavioral sciences have profited from a growing literature on delay discounting. The purpose of this article is to provide readers with a brief tutorial on how to use Microsoft Office Excel 2010 and Excel for Mac 2011 to analyze discounting data to yield parameters for both the hyperbolic discounting model and area under the curve. This tutorial is intended to encourage the quantitative analysis of behavior in both research and applied settings by readers with relatively little formal training in nonlinear regression.
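The tutorial itself works in Excel, but the two quantities it targets can also be sketched in Python. The hyperbolic model is taken in its standard form V = A / (1 + kD), and area under the curve is computed by the trapezoid rule on delays and indifference points normalized to the range 0 to 1. The function names and the convention of normalizing by the longest delay are assumptions; fitting k to data by nonlinear regression is not shown.

```python
def hyperbolic_value(amount, delay, k):
    """Discounted value of a delayed reward: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def area_under_curve(delays, indifference_points, amount):
    """Trapezoidal area under the normalized discounting curve.

    delays must be sorted in ascending order. Delays are divided by the
    longest delay and indifference points by the undiscounted amount, so
    the result lies between 0 (steep discounting) and 1 (no discounting)
    when the first delay is 0.
    """
    xs = [d / max(delays) for d in delays]
    ys = [v / amount for v in indifference_points]
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))
```

Area under the curve is attractive as a companion measure because it makes no assumption about the functional form of the discounting curve, whereas k is only meaningful if the hyperbolic model fits.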