The Effects of Model Misspecification and Sample Size on LISREL Maximum Likelihood Estimates.
ERIC Educational Resources Information Center
Baldwin, Beatrice
The robustness of LISREL computer program maximum likelihood estimates under specific conditions of model misspecification and sample size was examined. The population model used in this study contains one exogenous variable; three endogenous variables; and eight indicator variables, two for each latent variable. Conditions of model…
Brandstätter, Christian; Laner, David; Prantl, Roman; Fellner, Johann
2014-12-01
Municipal solid waste landfills pose a threat to the environment and human health, especially old landfills which lack facilities for collection and treatment of landfill gas and leachate. Consequently, missing information about emission flows prevents site-specific environmental risk assessments. To overcome this gap, combining waste sampling and analysis with statistical modeling is one option for estimating present and future emission potentials. Optimizing the tradeoff between investigation costs and reliable results requires knowledge of both the number of samples to be taken and the variables to be analyzed. This article aims to identify the optimal number of waste samples and variables in order to predict a larger set of variables. We therefore introduce a multivariate linear regression model and test its applicability in two case studies. Landfill A was used to set up and calibrate the model, based on 50 waste samples and twelve variables. The calibrated model was then applied to Landfill B, comprising 36 waste samples and the same twelve variables, using four predictor variables. The case study results are twofold: first, reliable and accurate prediction of the twelve variables can be achieved with knowledge of four predictor variables (LOI, EC, pH and Cl). Second, for Landfill B only ten full measurements would be needed for a reliable prediction of most response variables. The four predictor variables entail comparatively low analytical costs relative to the full set of measurements. This cost reduction could be used to increase the number of samples, yielding an improved understanding of the spatial waste heterogeneity in landfills. In conclusion, future application of the developed model potentially improves the reliability of predicted emission potentials. The model could become a standard screening tool for old landfills if its applicability and reliability were tested in additional case studies. Copyright © 2014 Elsevier Ltd. All rights reserved.
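The abstract's core device, predicting many response variables from a few cheap predictors with a calibrated multivariate linear model, can be sketched as follows. This is a hedged illustration with synthetic data: the four predictor names follow the abstract (LOI, EC, pH, Cl), but all numbers, dimensions and coefficients are invented.

```python
# Sketch: calibrate a multivariate linear model on one site (Landfill A),
# then apply it to samples from a second site (Landfill B).
# Synthetic, standardized data; not the study's measurements.
import numpy as np

rng = np.random.default_rng(0)
n_cal = 50                        # calibration samples ("Landfill A")
X = rng.normal(size=(n_cal, 4))   # cheap predictors: LOI, EC, pH, Cl
B_true = rng.normal(size=(4, 8))  # hypothetical mapping to 8 response variables
Y = X @ B_true + 0.1 * rng.normal(size=(n_cal, 8))

# One least-squares fit handles all responses jointly.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Apply the calibrated model to new samples ("Landfill B").
X_new = rng.normal(size=(10, 4))
Y_pred = X_new @ B_hat
print(Y_pred.shape)  # (10, 8)
```

With only the four cheap predictors measured on new samples, all eight responses are estimated at once; this is the cost-reduction argument of the abstract in miniature.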
GY SAMPLING THEORY AND GEOSTATISTICS: ALTERNATE MODELS OF VARIABILITY IN CONTINUOUS MEDIA
In the sampling theory developed by Pierre Gy, sample variability is modeled as the sum of a set of seven discrete error components. The variogram used in geostatistics provides an alternate model in which several of Gy's error components are combined in a continuous model...
ERIC Educational Resources Information Center
Chan, Wai
2007-01-01
In social science research, an indirect effect occurs when the influence of an antecedent variable on the effect variable is mediated by an intervening variable. To compare indirect effects within a sample or across different samples, structural equation modeling (SEM) can be used if the computer program supports model fitting with nonlinear…
NASA Astrophysics Data System (ADS)
Kumar, V.; Nayagum, D.; Thornton, S.; Banwart, S.; Schuhmacher, M.; Lerner, D.
2006-12-01
Characterization of uncertainty associated with groundwater quality models is often of critical importance, for example where environmental models are employed in risk assessment. Insufficient data, inherent variability and estimation errors of environmental model parameters introduce uncertainty into model predictions. However, uncertainty analysis using conventional methods such as standard Monte Carlo sampling (MCS) may not be efficient, or even suitable, for complex, computationally demanding models involving different natures of parametric variability and uncertainty. General MCS, or variants such as Latin Hypercube Sampling (LHS), treat variability and uncertainty as a single random entity, and the generated samples are treated as crisp values, conflating vagueness with randomness. Also, when models are used as purely predictive tools, uncertainty and variability lead to the need to assess the plausible range of model outputs. An improved, systematic variability and uncertainty analysis can provide insight into the level of confidence in model estimates, and can aid in assessing how various possible model estimates should be weighed. The present study introduces Fuzzy Latin Hypercube Sampling (FLHS), a hybrid approach incorporating cognitive and noncognitive uncertainties. Noncognitive uncertainty, such as physical randomness or statistical uncertainty due to limited information, can be described by its own probability density function (PDF), whereas cognitive uncertainty, such as estimation error, can be described by a membership function for its fuzziness, with confidence intervals given by α-cuts. An important property of this approach is its ability to merge the inexact generated data of the LHS step to increase the quality of information. The FLHS technique ensures that the entire range of each variable is sampled, with proper incorporation of uncertainty and variability.
A fuzzified statistical summary of the model results will produce indices of sensitivity and uncertainty that relate the effects of heterogeneity and uncertainty of input variables to model predictions. The feasibility of the method is demonstrated by assessing uncertainty propagation of parameter values in estimating the contamination level of a drinking water supply well due to transport of dissolved phenolics from a contaminated site in the UK.
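The non-fuzzy backbone of FLHS, plain Latin Hypercube Sampling, can be sketched in a few lines: each of d variables is divided into n equal-probability strata and each stratum is sampled exactly once. This is a hedged illustration of standard LHS only; the fuzzy α-cut layer described in the abstract is omitted.

```python
# Minimal Latin Hypercube Sampling: every variable gets exactly one
# sample per stratum, guaranteeing coverage of its whole range.
import numpy as np

def latin_hypercube(n, d, rng):
    u = rng.uniform(size=(n, d))                       # jitter within each stratum
    strata = np.array([rng.permutation(n) for _ in range(d)]).T
    return (strata + u) / n                            # points in [0, 1)^d

rng = np.random.default_rng(1)
pts = latin_hypercube(10, 2, rng)
# Each decile stratum 0..9 is hit exactly once per variable:
print(np.sort((pts * 10).astype(int), axis=0)[:, 0])   # [0 1 2 3 4 5 6 7 8 9]
```

The stratification is what "ensures that the entire range of each variable is sampled" in the abstract; FLHS would additionally attach membership functions to the cognitive-uncertainty inputs.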
Kanık, Emine Arzu; Temel, Gülhan Orekici; Erdoğan, Semra; Kaya, İrem Ersöz
2013-01-01
Objective: To introduce the method of Soft Independent Modeling of Class Analogy (SIMCA) and to examine whether the method is affected by the number of independent variables, the relationships between variables, and sample size. Study Design: Simulation study. Material and Methods: The SIMCA model is performed in two stages. To determine whether the method is influenced by the number of independent variables, the relationships between variables, and sample size, simulations were run. Conditions with equal sample sizes of 30, 100 and 1000 in both groups; with 2, 3, 5, 10, 50 and 100 variables; and with high, medium and low relationships between variables were considered. Results: Average classification accuracies over 1000 runs for each condition of the trial plan are given as tables. Conclusion: Diagnostic accuracy increases as the number of independent variables increases. SIMCA is suitable when the relationships between variables are strong, the number of independent variables is large, and the data contain outlier values. PMID:25207065
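SIMCA's core idea can be sketched as per-class principal component models: a new sample is assigned to the class whose PCA subspace reconstructs it with the smallest residual. This is a hedged, minimal sketch on synthetic two-class data; the F-test thresholds and critical distances of full SIMCA are omitted.

```python
# SIMCA sketch: one PCA model per class, classification by residual
# distance to each class subspace. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(2)

def fit_pca(X, k):
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                         # class mean and k loading vectors

def residual(x, model):
    mu, V = model
    z = (x - mu) @ V.T                        # scores in the class subspace
    return np.linalg.norm((x - mu) - z @ V)   # orthogonal distance

A = rng.normal(0.0, 1.0, size=(100, 5))       # class A centered at 0
B = rng.normal(4.0, 1.0, size=(100, 5))       # class B shifted to 4
models = {"A": fit_pca(A, 2), "B": fit_pca(B, 2)}

x = rng.normal(4.0, 1.0, size=5)              # a new class-B-like sample
label = min(models, key=lambda c: residual(x, models[c]))
print(label)  # "B"
```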
Hao, Yong; Sun, Xu-Dong; Yang, Qiang
2012-12-01
A variable selection strategy combined with locally linear embedding (LLE) was introduced for the analysis of complex samples by near infrared spectroscopy (NIRS). Three methods, Monte Carlo uninformative variable elimination (MCUVE), the successive projections algorithm (SPA), and MCUVE coupled with SPA, were used to eliminate redundant spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling the complex samples. The results showed that MCUVE can both extract effective informative variables and improve the precision of the models. Compared with PLSR models, LLE-PLSR models achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
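The MCUVE idea can be sketched briefly: refit a regression on many random subsets of samples, then score each variable by the stability of its coefficient (mean divided by standard deviation); uninformative variables score low. This is a hedged sketch on synthetic data with ordinary least squares standing in for the PLS sub-models used in practice.

```python
# MCUVE-style stability screening: informative variables have large,
# stable coefficients across Monte Carlo resamples. Synthetic data.
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=n)  # only vars 0, 3 matter

coefs = []
for _ in range(100):                          # Monte Carlo resampling
    idx = rng.choice(n, size=n // 2, replace=False)
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    coefs.append(b)
coefs = np.array(coefs)

stability = np.abs(coefs.mean(axis=0)) / coefs.std(axis=0)
selected = np.argsort(stability)[-2:]         # keep the 2 most stable variables
print(sorted(selected.tolist()))  # [0, 3]
```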
Variable selection based cotton bollworm odor spectroscopic detection
NASA Astrophysics Data System (ADS)
Lü, Chengxu; Gai, Shasha; Luo, Min; Zhao, Bo
2016-10-01
Aiming at rapid automatic pest detection for efficient, targeted pesticide application, and to avoid the problem of reflectance spectral signals being masked and attenuated by the solid plant, the feasibility of near infrared spectroscopy (NIRS) detection of cotton bollworm odor is studied. The underlying basis is that insects volatilize specific odors, including pheromones and allelochemicals, used for intra-specific and inter-specific communication, which can be detected by NIR spectroscopy. Three cotton bollworm odor samples and 3 blank air samples were prepared. Different concentrations of cotton bollworm odor were prepared by mixing the above gas samples, resulting in a calibration group of 62 samples and a validation group of 31 samples. The spectral collection system includes a light source, optical fiber, sample chamber and spectrometer. Spectra were pretreated by baseline correction, modeled with partial least squares (PLS), and optimized by a genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS). Only minor differences in counts are found among spectra of different cotton bollworm odor concentrations. A PLS model built on all variables gave an RMSEV of 14 and a validation R2 of 0.89. The 28 sensitive variables selected by GA gave a model with an RMSEV of 14 and a validation R2 of 0.90. Comparably, the 8 sensitive variables selected by CARS gave a model with an RMSEV of 13 and a validation R2 of 0.92. The CARS model employs only 1.5% of the variables while presenting a smaller error than the full-variable model. The odor-based NIR technique thus shows potential for cotton bollworm detection.
Ronald E. McRoberts; Veronica C. Lessard
2001-01-01
Uncertainty in diameter growth predictions is attributed to three general sources: measurement error or sampling variability in predictor variables, parameter covariances, and residual or unexplained variation around model expectations. Using measurement error and sampling variability distributions obtained from the literature and Monte Carlo simulation methods, the...
Junttila, Virpi; Kauranne, Tuomo; Finley, Andrew O.; Bradford, John B.
2015-01-01
Modern operational forest inventory often uses remotely sensed data that cover the whole inventory area to produce spatially explicit estimates of forest properties through statistical models. The data obtained by airborne light detection and ranging (LiDAR) correlate well with many forest inventory variables, such as the tree height, the timber volume, and the biomass. To construct an accurate model over thousands of hectares, LiDAR data must be supplemented with several hundred field sample measurements of forest inventory variables. This can be costly and time consuming. Different LiDAR-data-based and spatial-data-based sampling designs can reduce the number of field sample plots needed. However, problems arising from the features of the LiDAR data, such as a large number of predictors compared with the sample size (overfitting) or a strong correlation among predictors (multicollinearity), may decrease the accuracy and precision of the estimates and predictions. To overcome these problems, a Bayesian linear model with the singular value decomposition of predictors, combined with regularization, is proposed. The model performance in predicting different forest inventory variables is verified in ten inventory areas from two continents, where the number of field sample plots is reduced using different sampling designs. The results show that, with an appropriate field plot selection strategy and the proposed linear model, the total relative error of the predicted forest inventory variables is only 5%–15% larger using 50 field sample plots than the error of a linear model estimated with several hundred field sample plots when we sum up the error due to both the model noise variance and the model’s lack of fit.
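The abstract's central numerical device, a regularized linear model computed through the singular value decomposition of the predictors, can be sketched on synthetic data. This is a hedged illustration of ridge-style shrinkage via the SVD, which tames multicollinearity and overfitting when LiDAR predictors are many relative to a small field-plot sample; the full Bayesian treatment of the paper is omitted.

```python
# Regularized regression via SVD: each singular direction is shrunk by
# s / (s^2 + lambda), so tiny (collinear) directions are suppressed.
# Synthetic "few plots, many predictors" data; illustrative only.
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 30                                     # few plots, many predictors
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)     # strong collinearity
beta = np.zeros(p); beta[:3] = [1.0, 1.0, -2.0]
y = X @ beta + 0.1 * rng.normal(size=n)

def ridge_svd(X, y, lam):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = s / (s**2 + lam)                          # per-direction shrinkage
    return Vt.T @ (d * (U.T @ y))

b = ridge_svd(X, y, lam=1.0)
print(np.round(b[:4], 1))
```

The collinear pair (variables 0 and 1) gets its combined effect split stably between the two, instead of the wild offsetting coefficients plain least squares would produce.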
Impact of spatial variability and sampling design on model performance
NASA Astrophysics Data System (ADS)
Schrape, Charlotte; Schneider, Anne-Kathrin; Schröder, Boris; van Schaik, Loes
2017-04-01
Many environmental physical and chemical parameters, as well as species distributions, display spatial variability at different scales. When measurements are costly in labour time or money, a choice has to be made between high sampling resolution at small scales with low spatial cover of the study area, and lower small-scale sampling resolution, which brings local data uncertainties but better spatial cover of the whole area. This dilemma is often faced in the design of field sampling campaigns for large-scale studies. When the gathered field data are subsequently used for modelling purposes, the choice of sampling design and the resulting data quality influence the model performance criteria. We studied this influence with a virtual model study based on a large dataset of field information on the spatial variation of earthworms at different scales. To this end, we built a virtual map of anecic earthworm distributions over the Weiherbach catchment (Baden-Württemberg, Germany). First, the field-scale abundance of earthworms was estimated using a catchment-scale model based on 65 field measurements. Subsequently, the high small-scale variability was added using semi-variograms, based on five fields with a total of 430 measurements distributed in a spatially nested sampling design across these fields, to estimate the nugget, range and standard deviation of measurements within the fields. With the produced maps, we performed virtual samplings of one up to 50 random points per field. We then used these data to rebuild the catchment-scale models of anecic earthworm abundance with the same model parameters as in Palm et al. (2013). The results show clearly that a large part of the unexplained deviance of the models is due to the very high small-scale variability in earthworm abundance: models based on single virtual sampling points on average obtain an explained deviance of 0.20 and a correlation coefficient of 0.64. With increasing numbers of sampling points per field, we averaged the measured abundances within each field to obtain a more representative value of the field average. Doubling the samples per field strongly improved the model performance criteria (explained deviance 0.38, correlation coefficient 0.73). With 50 sampling points per field, the performance criteria were 0.91 and 0.97 for explained deviance and correlation coefficient, respectively. The relationship between the number of samples and the performance criteria can be described by a saturation curve; beyond five samples per field the model improvement becomes rather small. With this contribution we wish to discuss the impact of data variability at the sampling scale on model performance, and the implications for sampling design, assessment of model results, and ecological inferences.
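The saturation behaviour described above can be sketched in a few lines: averaging k noisy samples per field shrinks the small-scale variance by 1/k, so agreement with the field-scale signal rises steeply at first and then flattens. Synthetic field means and noise levels; the numbers do not reproduce the study's values.

```python
# Averaging k samples per field: correlation with the true field means
# follows a saturation curve in k. Synthetic, illustrative data.
import numpy as np

rng = np.random.default_rng(5)
n_fields = 65
true_mean = rng.normal(10.0, 2.0, size=n_fields)    # field-scale signal

def corr_with_truth(k):
    # mean of k samples per field, with large small-scale variability
    obs = true_mean[:, None] + rng.normal(0.0, 4.0, size=(n_fields, k))
    return np.corrcoef(true_mean, obs.mean(axis=1))[0, 1]

for k in (1, 2, 10, 50):
    print(k, round(corr_with_truth(k), 2))
```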
Inventory implications of using sampling variances in estimation of growth model coefficients
Albert R. Stage; William R. Wykoff
2000-01-01
Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...
Applications of MIDAS regression in analysing trends in water quality
NASA Astrophysics Data System (ADS)
Penev, Spiridon; Leonte, Daniela; Lazarov, Zdravetz; Mann, Rob A.
2014-04-01
We discuss novel statistical methods for analysing trends in water quality. Such analysis uses complex data sets of different classes of variables, including water quality, hydrological and meteorological. We analyse the effect of rainfall and flow on trends in water quality using a flexible model called Mixed Data Sampling (MIDAS). This model arises because of the mixed frequency of data collection: typically, water quality variables are sampled fortnightly, whereas rainfall data are sampled daily. The advantage of MIDAS regression is its flexible and parsimonious modelling of the influence of rain and flow on trends in water quality variables. We discuss the model and its implementation on a data set from the Shoalhaven Supply System and Catchments in the state of New South Wales, Australia. Information criteria indicate that MIDAS modelling improves upon simplistic approaches that do not exploit the mixed-frequency nature of the data.
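The parsimony of MIDAS comes from tying many high-frequency lags to a low-dimensional weight function, commonly an exponential Almon lag polynomial. The sketch below is a hedged toy: a fortnightly response driven by 14 daily rainfall lags whose weights cost only two parameters; for simplicity the weight shape is held fixed and only the slope is estimated, unlike a full nonlinear MIDAS fit.

```python
# MIDAS sketch: fortnightly response on daily rainfall through an
# exponential Almon lag weight function. Synthetic data.
import numpy as np

def almon_weights(theta1, theta2, n_lags):
    j = np.arange(n_lags)
    w = np.exp(theta1 * j + theta2 * j**2)
    return w / w.sum()                           # weights sum to 1

rng = np.random.default_rng(6)
rain = rng.gamma(2.0, 1.0, size=14 * 100)        # daily rainfall, 100 fortnights
w_true = almon_weights(-0.3, 0.0, 14)            # recent days weigh most

X = rain.reshape(100, 14)[:, ::-1]               # lag 0 = most recent day
y = 3.0 * (X @ w_true) + 0.1 * rng.normal(size=100)

# With the weight shape fixed, the slope is recovered by least squares.
beta = np.linalg.lstsq((X @ w_true)[:, None], y, rcond=None)[0][0]
print(round(beta, 1))  # close to 3.0
```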
Optical EVPA rotations in blazars: testing a stochastic variability model with RoboPol data
NASA Astrophysics Data System (ADS)
Kiehlmann, S.; Blinov, D.; Pearson, T. J.; Liodakis, I.
2017-12-01
We identify rotations of the polarization angle in a sample of blazars observed for three seasons with the RoboPol instrument. A simplistic stochastic variability model is tested against this sample of rotation events. The model is capable of producing samples of rotations with parameters similar to the observed ones, but fails to reproduce the polarization fraction at the same time. Even though we can neither accept nor conclusively reject the model, we point out various aspects of the observations that are fully consistent with a random walk process.
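The tested stochastic model can be sketched simply: if the Stokes parameters q and u each follow a random walk, the electric vector position angle EVPA = 0.5·atan2(u, q) shows apparent smooth rotations with no deterministic cause, while the polarization fraction follows from the same walk. Parameters below are illustrative, not the RoboPol fit.

```python
# Random-walk sketch of EVPA "rotations": both Stokes parameters
# random-walk, and the derived angle swings without any driving event.
import numpy as np

rng = np.random.default_rng(7)
n = 500
q = np.cumsum(rng.normal(0, 0.01, n)) + 0.05
u = np.cumsum(rng.normal(0, 0.01, n))

evpa = 0.5 * np.arctan2(u, q)                 # radians, in (-pi/4, pi/4] band
p_frac = np.hypot(q, u)                       # polarization fraction, same walk

swing = np.degrees(evpa.max() - evpa.min())   # total apparent angle swing
print(round(swing, 1))
```

The joint constraint in the abstract is exactly this coupling: any parameter choice that reproduces the observed rotations also fixes the statistics of p_frac, which is where the simplistic model fails.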
NASA Astrophysics Data System (ADS)
Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong
2018-01-01
Milk is among the most popular nutrient sources worldwide and is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples by brand and of determining protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiments: one contains 179 samples of four brands for classification, and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-squares discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In quantitative analysis, the partial least-squares regression (PLSR) model constructed on the selected subset of 260 variables significantly outperforms the full-spectrum model. The combination of NIR spectroscopy, MRMR and PLS-DA or PLSR thus appears to be a powerful tool for classifying different brands of milk and determining protein content.
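The MRMR criterion can be sketched as a greedy forward pass that trades relevance to the target against redundancy with already-chosen variables. This hedged toy uses absolute Pearson correlation for both scores (full MRMR implementations typically use mutual information), on synthetic data with one deliberately redundant variable.

```python
# Greedy MRMR sketch: pick the variable maximizing
# relevance(y) - mean redundancy(already chosen). Synthetic data.
import numpy as np

rng = np.random.default_rng(8)
n, p = 300, 8
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # var 1 duplicates var 0
y = X[:, 0] + X[:, 4] + 0.1 * rng.normal(size=n)

def mrmr(X, y, k):
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    chosen = [int(np.argmax(rel))]
    while len(chosen) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, c])[0, 1]) for c in chosen])
            score = rel[j] - red                 # relevance minus redundancy
            if score > best_score:
                best, best_score = j, score
        chosen.append(best)
    return chosen

print(mrmr(X, y, 2))  # one of the twins {0, 1} plus var 4, never both twins
```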
Lobréaux, Stéphane; Melodelima, Christelle
2015-02-01
We tested the use of Generalized Linear Mixed Models (GLMMs) to detect associations between genetic loci and environmental variables, taking into account the population structure of sampled individuals. We used a simulation approach to generate datasets under demographically and selectively explicit models. These datasets were used to analyze and optimize the capacity of GLMMs to detect associations between markers and selection coefficients, treated as environmental data, in terms of false and true positive rates. Different sampling strategies were tested, maximizing the number of populations sampled, sites sampled per population, or individuals sampled per site, and the effect of different selection intensities on the efficiency of the method was determined. Finally, we applied these models to an Arabidopsis thaliana SNP dataset from different accessions, looking for loci associated with minimal spring temperature. We identified 25 regions that exhibit unusual correlations with the climatic variable and contain genes with functions related to temperature stress. Copyright © 2014 Elsevier Inc. All rights reserved.
Ram Kumar Deo; Robert E. Froese; Michael J. Falkowski; Andrew T. Hudak
2016-01-01
The conventional approach to LiDAR-based forest inventory modeling depends on field sample data from fixed-radius plots (FRP). Because FRP sampling is cost intensive, combining variable-radius plot (VRP) sampling and LiDAR data has the potential to improve inventory efficiency. The overarching goal of this study was to evaluate the integration of LiDAR and VRP data....
Sampling and modeling riparian forest structure and riparian microclimate
Bianca N.I. Eskelson; Paul D. Anderson; Hailemariam Temesgen
2013-01-01
Riparian areas are extremely variable and dynamic, and represent some of the most complex terrestrial ecosystems in the world. The high variability within and among riparian areas poses challenges in developing efficient sampling and modeling approaches that accurately quantify riparian forest structure and riparian microclimate. Data from eight stream reaches that are...
Bayesian model comparison and parameter inference in systems biology using nested sampling.
Pullen, Nick; Morris, Richard J
2014-01-01
Inferring parameters for models of biological processes is a current challenge in systems biology, as is the related problem of comparing competing models that explain the data. In this work we apply Skilling's nested sampling to address both of these problems. Nested sampling is a Bayesian method for exploring parameter space that transforms a multi-dimensional integral to a 1D integration over likelihood space. This approach focuses on the computation of the marginal likelihood or evidence. The ratio of evidences of different models leads to the Bayes factor, which can be used for model comparison. We demonstrate how nested sampling can be used to reverse-engineer a system's behaviour whilst accounting for the uncertainty in the results. The effect of missing initial conditions of the variables as well as unknown parameters is investigated. We show how the evidence and the model ranking can change as a function of the available data. Furthermore, the addition of data from extra variables of the system can deliver more information for model comparison than increasing the data from one variable, thus providing a basis for experimental design.
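Skilling's nested sampling loop is compact enough to sketch directly: live points are drawn from the prior, the worst-likelihood point is repeatedly replaced by a new prior draw above its likelihood, and the evidence Z accumulates as a sum of likelihood times shrinking prior volume. This is a hedged toy on a 1D problem with a known answer; the rejection-sampling replacement step is fine here but far too slow for real systems-biology models.

```python
# Minimal nested sampling: uniform prior on [-5, 5], unit-Gaussian
# likelihood, so the true evidence is ~1/10 of the unit integral.
import numpy as np

rng = np.random.default_rng(9)

def loglike(theta):                        # N(0, 1) likelihood
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

n_live, lo, hi = 100, -5.0, 5.0            # uniform prior on [-5, 5]
live = rng.uniform(lo, hi, n_live)
logL = loglike(live)

Z, X_prev = 0.0, 1.0
for i in range(600):
    worst = int(np.argmin(logL))
    X_i = np.exp(-(i + 1) / n_live)        # expected shrunken prior volume
    Z += np.exp(logL[worst]) * (X_prev - X_i)
    X_prev = X_i
    while True:                            # rejection-sample a replacement
        cand = rng.uniform(lo, hi)
        if loglike(cand) > logL[worst]:
            live[worst], logL[worst] = cand, loglike(cand)
            break

Z += X_prev * np.mean(np.exp(logL))        # remaining live-point contribution
print(round(Z, 3))  # analytic evidence is ~0.1
```

Running this for two competing models and taking the ratio of their Z values gives the Bayes factor used for model comparison in the abstract.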
Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai
2015-01-01
The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by FT-NIR spectroscopy was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was applied to calibrate identification models using the wavelength variables selected by CARS and SCARS. Experimental results showed that CARS and SCARS selected 58 and 47 wavelength variables, respectively, from the 1557 original wavelength variables. Compared with full-spectrum PLS-DA, both wavelength variable selection methods enhanced the performance of the identification models. Moreover, compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in validation. The overall results demonstrate that a PLS-DA model constructed on wavelength variables chosen by a proper selection method enables more accurate identification of solid state fermentation degree. Copyright © 2015 Elsevier B.V. All rights reserved.
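The core CARS loop can be sketched as competitive retention: variables survive in proportion to the magnitude of their regression coefficients, with an exponentially decreasing retention ratio across iterations. This hedged toy uses ordinary least squares in place of the PLS sub-models and Monte Carlo sampling of the real algorithm.

```python
# CARS-style sketch: exponentially shrinking retention of the
# largest-|coefficient| variables. Synthetic data, 2 informative vars.
import numpy as np

rng = np.random.default_rng(15)
n, p = 120, 40
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 5] + 2.0 * X[:, 20] + 0.3 * rng.normal(size=n)

keep = np.arange(p)
for it in range(1, 11):
    ratio = 0.75 ** it                          # exponentially decreasing retention
    b, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    k = max(2, int(round(p * ratio)))
    keep = keep[np.argsort(np.abs(b))[-k:]]     # adaptive competitive retention

print(sorted(keep.tolist()))  # [5, 20]
```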
Makowski, David; Bancal, Rémi; Bensadoun, Arnaud; Monod, Hervé; Messéan, Antoine
2017-09-01
According to E.U. regulations, the maximum allowable rate of adventitious transgene presence in non-genetically modified (GM) crops is 0.9%. We compared four sampling methods for the detection of transgenic material in agricultural non-GM maize fields: random sampling, stratified sampling, random sampling + ratio reweighting, random sampling + regression reweighting. Random sampling involves simply sampling maize grains from different locations selected at random from the field concerned. The stratified and reweighting sampling methods make use of an auxiliary variable corresponding to the output of a gene-flow model (a zero-inflated Poisson model) simulating cross-pollination as a function of wind speed, wind direction, and distance to the closest GM maize field. With the stratified sampling method, an auxiliary variable is used to define several strata with contrasting transgene presence rates, and grains are then sampled at random from each stratum. With the two methods involving reweighting, grains are first sampled at random from various locations within the field, and the observations are then reweighted according to the auxiliary variable. Data collected from three maize fields were used to compare the four sampling methods, and the results were used to determine the extent to which transgene presence rate estimation was improved by the use of stratified and reweighting sampling methods. We found that transgene rate estimates were more accurate and that substantially smaller samples could be used with sampling strategies based on an auxiliary variable derived from a gene-flow model. © 2017 Society for Risk Analysis.
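The benefit of the auxiliary gene-flow variable can be sketched as classical stratified sampling: when the stratification variable tracks the true local contamination level, within-stratum variability is small and the stratified estimator of the field mean beats simple random sampling of the same total size. All numbers below are synthetic; the continuous "local rate" stands in for grain-level transgene presence.

```python
# Stratified vs simple random sampling with a model-based auxiliary
# variable. Equal-sized strata defined by terciles of the model output.
import numpy as np

rng = np.random.default_rng(10)
N = 9000
aux = rng.gamma(2.0, 1.0, N)                   # gene-flow model output per location
y = 0.004 * aux + 0.002 * rng.gamma(1.0, 1.0, N)  # true local transgene rate

order = np.argsort(aux)                        # 3 equal strata by predicted level
strata = [order[:3000], order[3000:6000], order[6000:]]

def srs(n):
    return y[rng.choice(N, n, replace=False)].mean()

def stratified(n_per):
    return np.mean([y[s][rng.choice(3000, n_per, replace=False)].mean()
                    for s in strata])

srs_sd = np.std([srs(90) for _ in range(400)])
strat_sd = np.std([stratified(30) for _ in range(400)])
print(round(srs_sd / strat_sd, 1))             # stratification is tighter
```

The same variance reduction is what lets the abstract's stratified and reweighted estimators reach a given accuracy with substantially smaller samples.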
Graves, Tabitha A.; Royle, J. Andrew; Kendall, Katherine C.; Beier, Paul; Stetz, Jeffrey B.; Macleod, Amy C.
2012-01-01
Using multiple detection methods can increase the number, kind, and distribution of individuals sampled, which may increase accuracy and precision and reduce cost of population abundance estimates. However, when variables influencing abundance are of interest, if individuals detected via different methods are influenced by the landscape differently, separate analysis of multiple detection methods may be more appropriate. We evaluated the effects of combining two detection methods on the identification of variables important to local abundance using detections of grizzly bears with hair traps (systematic) and bear rubs (opportunistic). We used hierarchical abundance models (N-mixture models) with separate model components for each detection method. If both methods sample the same population, the use of either data set alone should (1) lead to the selection of the same variables as important and (2) provide similar estimates of relative local abundance. We hypothesized that the inclusion of 2 detection methods versus either method alone should (3) yield more support for variables identified in single method analyses (i.e. fewer variables and models with greater weight), and (4) improve precision of covariate estimates for variables selected in both separate and combined analyses because sample size is larger. As expected, joint analysis of both methods increased precision as well as certainty in variable and model selection. However, the single-method analyses identified different variables and the resulting predicted abundances had different spatial distributions. We recommend comparing single-method and jointly modeled results to identify the presence of individual heterogeneity between detection methods in N-mixture models, along with consideration of detection probabilities, correlations among variables, and tolerance to risk of failing to identify variables important to a subset of the population. 
The benefits of increased precision should be weighed against those risks. The analysis framework presented here will be useful for other species exhibiting heterogeneity by detection method.
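The N-mixture likelihood underlying the analysis can be sketched in miniature: repeated counts at a site are Binomial(N, p) with latent abundance N ~ Poisson(λ), and the likelihood marginalizes N up to a finite bound K. This hedged toy has a single shared λ and p and no covariates or multiple detection methods, unlike the grizzly analysis; a coarse grid search recovers the generating parameters approximately.

```python
# Minimal N-mixture model: marginal likelihood over latent abundance,
# fitted by grid search. Synthetic data, 80 sites, 3 visits.
import numpy as np
from math import comb, exp, lgamma, log

rng = np.random.default_rng(11)
lam_true, p_true, K = 5.0, 0.4, 40
N = rng.poisson(lam_true, 80)                        # latent abundances
Y = rng.binomial(N[:, None], p_true, size=(80, 3))   # 3 visits per site

def loglik(lam, p):
    pois = [exp(k * log(lam) - lam - lgamma(k + 1)) for k in range(K + 1)]
    ll = 0.0
    for y in Y:
        site = 0.0
        for k in range(int(y.max()), K + 1):         # N >= max observed count
            prob = pois[k]
            for yj in y:
                yj = int(yj)
                prob *= comb(k, yj) * p**yj * (1 - p)**(k - yj)
            site += prob
        ll += log(site)
    return ll

grid = [(lam, p) for lam in (3, 4, 5, 6, 7) for p in (0.2, 0.3, 0.4, 0.5, 0.6)]
lam_hat, p_hat = max(grid, key=lambda g: loglik(*g))
print(lam_hat, p_hat)
```

Because λ and p are only weakly separated by a few visits, the product λ·p (the expected count) is estimated much more tightly than either parameter alone, which is why covariate and detection structure matter so much in the real analyses.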
A Bayesian Measurement Error Model for Misaligned Radiographic Data
Lennox, Kristin P.; Glascoe, Lee G.
2013-09-06
An understanding of the inherent variability in micro-computed tomography (micro-CT) data is essential to tasks such as statistical process control and the validation of radiographic simulation tools. The data present unique challenges to variability analysis due to the relatively low resolution of radiographs, and also due to minor variations from run to run which can result in misalignment or magnification changes between repeated measurements of a sample. Positioning changes artificially inflate the variability of the data in ways that mask true physical phenomena. We present a novel Bayesian nonparametric regression model that incorporates both additive and multiplicative measurement error in addition to heteroscedasticity to address this problem. We also use this model to assess the effects of sample thickness and sample position on measurement variability for an aluminum specimen. Supplementary materials for this article are available online.
Parametric Cost Models for Space Telescopes
NASA Technical Reports Server (NTRS)
Stahl, H. Philip
2010-01-01
A study is underway to develop a multivariable parametric cost model for space telescopes. Cost and engineering parametric data have been collected on 30 different space telescopes. Statistical correlations have been developed between 19 of the 59 variables sampled. Single Variable and Multi-Variable Cost Estimating Relationships have been developed. Results are being published.
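A single-variable cost estimating relationship (CER) of the kind mentioned is typically a power law, Cost = a · Aperture^b, fitted in log-log space. The sketch below is a hedged illustration on entirely synthetic telescope data; the exponent and coefficient are invented, not the study's results.

```python
# Power-law CER fit in log-log space. All numbers synthetic.
import numpy as np

rng = np.random.default_rng(14)
aperture = rng.uniform(0.3, 6.5, 30)        # meters, 30 hypothetical telescopes
cost = 100 * aperture**1.7 * np.exp(0.1 * rng.normal(size=30))  # $M, noisy power law

b, log_a = np.polyfit(np.log(aperture), np.log(cost), 1)
print(round(b, 1))  # exponent recovered near the generating value 1.7
```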
Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao
2014-10-07
In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
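The weighted binary matrix sampling (WBMS) idea above can be sketched in a few lines: each variable carries an inclusion probability, sub-models are drawn as Bernoulli rows of a binary matrix, and the probabilities are updated from the best-performing sub-models so the variable space shrinks each step. This is a simplified stand-in (AIC-scored linear sub-models on synthetic data), not the authors' VISSA code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)  # only vars 0 and 1 matter

def aic_of_subset(mask):
    """Score a sub-model by AIC so larger models pay a complexity penalty."""
    k = int(mask.sum())
    if k == 0:
        return np.inf
    Xs = X[:, mask]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(np.sum((y - Xs @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

weights = np.full(p, 0.5)                 # inclusion probability per variable
for step in range(20):
    sub = rng.random((100, p)) < weights  # weighted binary sampling matrix
    scores = np.array([aic_of_subset(m) for m in sub])
    best = sub[np.argsort(scores)[:10]]   # keep the ten best sub-models
    weights = best.mean(axis=0)           # shrink the variable space

selected = np.where(weights > 0.5)[0]
```

The two VISSA rules show up directly: the weight update can only concentrate the space (rule 1), and it is driven by the best-performing sub-models, so the new space outperforms the old one (rule 2).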
AGN Variability: Probing Black Hole Accretion
NASA Astrophysics Data System (ADS)
Moreno, Jackeline; O'Brien, Jack; Vogeley, Michael S.; Richards, Gordon T.; Kasliwal, Vishal P.
2017-01-01
We combine the long temporal baseline of Sloan Digital Sky Survey (SDSS) for quasars in Stripe 82 with the high precision photometry of the Kepler/K2 Satellite to study the physics of optical variability in the accretion disk and supermassive black hole engine. We model the lightcurves directly as Continuous-time Auto Regressive Moving Average processes (C-ARMA) with the Kali analysis package (Kasliwal et al. 2016). These models are extremely robust to irregular sampling and can capture aperiodic variability structure on various timescales. We also estimate the power spectral density and structure function of both the model family and the data. A Green's function kernel may also be estimated for the resulting C-ARMA parameter fit, which may be interpreted as the response to driving impulses such as hotspots in the accretion disk. We also examine available spectra for our AGN sample to relate observed and modelled behavior to spectral properties. The objective of this work is twofold: to explore the proper physical interpretation of different families of C-ARMA models applied to AGN optical flux variability and to relate empirical characteristic timescales of our AGN sample to physical theory or to properties estimated from spectra or simulations like the disk viscosity and temperature. We find that AGN with strong variability features on timescales resolved by K2 are well modelled by a low order C-ARMA family while K2 lightcurves with weak amplitude variability are dominated by outliers and measurement errors which force higher order model fits. This work explores a novel approach to combining SDSS and K2 data sets and presents recovered characteristic timescales of AGN variability.
Sterba, Sonya K; Rights, Jason D
2016-01-01
Item parceling remains widely used under conditions that can lead to parcel-allocation variability in results. Hence, researchers may be interested in quantifying and accounting for parcel-allocation variability within sample. To do so in practice, three key issues need to be addressed. First, how can we combine sources of uncertainty arising from sampling variability and parcel-allocation variability when drawing inferences about parameters in structural equation models? Second, on what basis can we choose the number of repeated item-to-parcel allocations within sample? Third, how can we diagnose and report proportions of total variability per estimate arising due to parcel-allocation variability versus sampling variability? This article addresses these three methodological issues. Developments are illustrated using simulated and empirical examples, and software for implementing them is provided.
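The first question above, combining sampling and parcel-allocation uncertainty, can be illustrated with a Rubin-style pooling rule across M repeated allocations: total variance is the mean within-allocation (sampling) variance plus an inflated between-allocation variance. All numbers are hypothetical:

```python
import statistics

# Hypothetical: one SEM parameter estimated on M = 5 repeated
# item-to-parcel allocations of the same sample
estimates = [0.52, 0.48, 0.55, 0.50, 0.49]       # point estimate per allocation
variances = [0.010, 0.012, 0.011, 0.009, 0.010]  # squared SE per allocation

M = len(estimates)
pooled = statistics.fmean(estimates)
within = statistics.fmean(variances)             # sampling variability
between = statistics.variance(estimates)         # parcel-allocation variability
total = within + (1 + 1 / M) * between           # Rubin-style combination

# Third question above: proportion of total variability due to allocation
prop_allocation = (1 + 1 / M) * between / total
```

Reporting prop_allocation per estimate is one way to diagnose how much of the uncertainty is attributable to the arbitrary item-to-parcel allocation rather than to sampling.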
Long-term behaviour and cross-correlation water quality analysis of the River Elbe, Germany.
Lehmann, A; Rode, M
2001-06-01
This study analyses weekly data samples from the river Elbe at Magdeburg between 1984 and 1996 to investigate the changes in metabolism and water quality in the river Elbe since the German reunification in 1990. Modelling water quality variables by autoregressive component models and ARIMA models reveals the improvement of water quality due to the reduction of waste water emissions since 1990. The models are used to determine the long-term and seasonal behaviour of important water quality variables. Organic and heavy metal pollution parameters have decreased significantly since 1990; however, no significant change in chlorophyll-a as a measure of primary production could be found. A new procedure for testing the significance of a sample correlation coefficient is discussed, which is able to detect spurious sample correlation coefficients without making use of time-consuming prewhitening. The cross-correlation analysis is applied to hydrophysical, biological, and chemical water quality variables of the river Elbe since 1984. Special emphasis is laid on the detection of spurious sample correlation coefficients.
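A minimal cross-correlation sketch in the spirit of the analysis above. The naive 1.96/sqrt(n) significance bound shown at the end is valid only for white (prewhitened) series, which is exactly why spurious correlations are a concern for autocorrelated water quality records. The series and lag are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)                        # e.g. a discharge-like series
y = np.roll(x, 3) + 0.5 * rng.normal(size=n)  # a variable responding 3 steps later

def cross_corr(a, b, lag):
    """Sample cross-correlation of a[t] with b[t + lag]."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    if lag < 0:
        a, b, lag = b, a, -lag
    return float(np.mean(a[:len(a) - lag] * b[lag:]))

lags = list(range(-10, 11))
r = [cross_corr(x, y, k) for k in lags]
best_lag = lags[int(np.argmax(r))]

# Naive 95% bound: appropriate for white series only; autocorrelated
# series inflate this and produce spurious "significant" correlations
bound = 1.96 / np.sqrt(n)
```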
Investigation to develop a multistage forest sampling inventory system using ERTS-1 imagery
NASA Technical Reports Server (NTRS)
Langley, P. G.; Vanroessel, J. W. (Principal Investigator); Wert, S. L.
1975-01-01
The author has identified the following significant results. The annotation system produced an RMSE of about 200 m ground distance in the MSS data system with the control data used. All the analytical MSS interpretation models tried were highly significant. However, the gains in forest sampling efficiency that can be achieved by using the models vary from zero to over 50 percent, depending on the area to which they are applied and the sampling method used. Among the sampling methods tried, regression sampling yielded the most substantial and consistent gains. The single most significant variable in the interpretation model was the difference between bands 5 and 7. The contrast variable, computed by the Hadamard transform, was significant but did not contribute much to the interpretation model. Forest areas containing very large timber volumes because of large tree sizes were not separable from areas of similar crown cover containing smaller trees using ERTS image interpretation alone. All correlations between space-derived timber volume predictions and estimates obtained from aerial and ground sampling were relatively low but significant and stable. There was a much stronger relationship between variables derived from MSS and U2 data than between U2 and ground data.
Ribic, C.A.; Miller, T.W.
1998-01-01
We investigated CART performance with a unimodal response curve for one continuous response and four continuous explanatory variables, where two variables were important (i.e., directly related to the response) and the other two were not. We explored performance under three relationship strengths and two explanatory-variable conditions: equal importance, and one variable four times as important as the other. We compared CART variable selection performance using three tree-selection rules ('minimum risk', 'minimum risk complexity', 'one standard error') to stepwise polynomial ordinary least squares (OLS) under four sample size conditions. The one-standard-error and minimum-risk-complexity methods performed about as well as stepwise OLS with large sample sizes when the relationship was strong. With weaker relationships, equally important explanatory variables and larger sample sizes, the one-standard-error and minimum-risk-complexity rules performed better than stepwise OLS. With weaker relationships and explanatory variables of unequal importance, tree-structured methods did not perform as well as stepwise OLS. Comparing performance within tree-structured methods, with a strong relationship and equally important explanatory variables, the one-standard-error rule was more likely to choose the correct model than were the other tree-selection rules; this advantage also held (1) with weaker relationships and equally important explanatory variables, and (2) under all relationship strengths when explanatory variables were of unequal importance and sample sizes were lower.
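The 'one standard error' tree-selection rule discussed above can be stated in a few lines: among candidate subtrees, pick the smallest tree whose cross-validated risk is within one standard error of the minimum risk. The cross-validation results below are hypothetical:

```python
# Hypothetical pruning sequence: tree size (terminal nodes),
# cross-validated risk, and the standard error of that risk
sizes   = [10, 8, 6, 4, 2, 1]
cv_risk = [0.42, 0.40, 0.39, 0.41, 0.55, 0.80]
cv_se   = [0.03, 0.03, 0.03, 0.03, 0.04, 0.05]

# 'Minimum risk' rule: take the tree with the smallest CV risk
i_min = cv_risk.index(min(cv_risk))

# 'One standard error' rule: smallest tree within one SE of that minimum
threshold = cv_risk[i_min] + cv_se[i_min]
candidates = [s for s, r in zip(sizes, cv_risk) if r <= threshold]
one_se_size = min(candidates)
```

Here the minimum-risk rule keeps the 6-node tree, while the one-SE rule prefers the smaller 4-node tree, illustrating why the rules can select different models from the same cross-validation run.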
Sun, Tong; Xu, Wen-Li; Hu, Tian; Liu, Mu-Hua
2013-12-01
The objective of the present research was to assess the soluble solids content (SSC) of Nanfeng mandarin by visible/near-infrared (Vis/NIR) spectroscopy combined with a new variable selection method, to simplify the prediction model, and to improve its performance for SSC of Nanfeng mandarin. A total of 300 Nanfeng mandarin samples were used; the numbers of samples in the calibration, validation and prediction sets were 150, 75 and 75, respectively. Vis/NIR spectra of the samples were acquired by a QualitySpec spectrometer in the wavelength range of 350-1000 nm. Uninformative variables elimination (UVE) was used to eliminate wavelength variables that carried little information about SSC, then independent component analysis (ICA) was used to extract independent components (ICs) from the spectra after the uninformative wavelength variables had been eliminated. Finally, least squares support vector machine (LS-SVM) was used to develop calibration models for SSC of Nanfeng mandarin using the extracted ICs, and 75 prediction samples that had not been used for model development were used to evaluate the performance of the SSC model. The results indicate that Vis/NIR spectroscopy combined with UVE-ICA-LS-SVM is suitable for assessing SSC of Nanfeng mandarin, with high prediction precision. UVE-ICA is an effective method to eliminate uninformative wavelength variables, extract important spectral information, simplify the prediction model and improve its performance. The SSC model developed by UVE-ICA-LS-SVM is superior to those developed by PLS, PCA-LS-SVM or ICA-LS-SVM; the coefficients of determination and root mean square errors in the calibration, validation and prediction sets were 0.978, 0.230%, 0.965, 0.301% and 0.967, 0.292%, respectively.
Edwards, T.C.; Cutler, D.R.; Zimmermann, N.E.; Geiser, L.; Moisen, Gretchen G.
2006-01-01
We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by resubstitution rates were similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracies were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating the classification trees fit the two evaluated survey forms on different sets of predictor variables. The magnitude of these differences in predictor variables throws doubt on ecological interpretation derived from prediction models based on non-probabilistic sample surveys. © 2006 Elsevier B.V. All rights reserved.
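The gap between resubstitution and cross-validation accuracy argued for above is easy to demonstrate: a 1-nearest-neighbour classifier always scores 100% on resubstitution, while leave-one-out cross-validation gives an honest (lower) estimate. The presence/absence data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical presence/absence data with weakly informative predictors
X = rng.normal(size=(80, 2))
ylab = (X[:, 0] + rng.normal(scale=2.0, size=80) > 0).astype(int)

def one_nn_predict(train_X, train_y, query):
    d = np.sum((train_X - query) ** 2, axis=1)
    return train_y[np.argmin(d)]

# Resubstitution: each point is predicted with itself still in the
# training set, so 1-NN is trivially perfect
resub = np.mean([one_nn_predict(X, ylab, X[i]) == ylab[i] for i in range(80)])

# Leave-one-out cross-validation: the honest estimate
mask = np.ones(80, dtype=bool)
loo = []
for i in range(80):
    mask[i] = False
    loo.append(one_nn_predict(X[mask], ylab[mask], X[i]) == ylab[i])
    mask[i] = True
cv = np.mean(loo)
```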
Ghosh, Sreya; Preza, Chrysanthe
2015-07-01
A three-dimensional (3-D) point spread function (PSF) model for wide-field fluorescence microscopy, suitable for imaging samples with variable refractive index (RI) in multilayered media, is presented. This PSF model is a key component for accurate 3-D image restoration of thick biological samples, such as lung tissue. Microscope- and specimen-derived parameters are combined with a rigorous vectorial formulation to obtain a new PSF model that accounts for additional aberrations due to specimen RI variability. Experimental evaluation and verification of the PSF model was accomplished using images from 175-nm fluorescent beads in a controlled test sample. Fundamental experimental validation of the advantage of using improved PSFs in depth-variant restoration was accomplished by restoring experimental data from beads (6 μm in diameter) mounted in a sample with RI variation. In the investigated study, improvement in restoration accuracy in the range of 18 to 35% was observed when PSFs from the proposed model were used over restoration using PSFs from an existing model. The new PSF model was further validated by showing that its prediction compares to an experimental PSF (determined from 175-nm beads located below a thick rat lung slice) with a 42% improved accuracy over the current PSF model prediction.
A Latent Variable Approach to the Simple View of Reading
ERIC Educational Resources Information Center
Kershaw, Sarah; Schatschneider, Chris
2012-01-01
The present study utilized a latent variable modeling approach to examine the Simple View of Reading in a sample of students from 3rd, 7th, and 10th grades (N = 215, 188, and 180, respectively). Latent interaction modeling and other latent variable models were employed to investigate (a) the functional form of the relationship between decoding and…
A Note on Sample Size and Solution Propriety for Confirmatory Factor Analytic Models
ERIC Educational Resources Information Center
Jackson, Dennis L.; Voth, Jennifer; Frey, Marc P.
2013-01-01
Determining an appropriate sample size for use in latent variable modeling techniques has presented ongoing challenges to researchers. In particular, small sample sizes are known to present concerns over sampling error for the variances and covariances on which model estimation is based, as well as for fit indexes and convergence failures. The…
Application of classification-tree methods to identify nitrate sources in ground water
Spruill, T.B.; Showers, W.J.; Howe, S.S.
2002-01-01
A study was conducted to determine if nitrate sources in ground water (fertilizer on crops, fertilizer on golf courses, irrigation spray from hog (Sus scrofa) wastes, and leachate from poultry litter and septic systems) could be classified with 80% or greater success. Two statistical classification-tree models were devised from 48 water samples containing nitrate from five source categories. Model 1 was constructed by evaluating 32 variables and selecting four primary predictor variables (δ15N, nitrate to ammonia ratio, sodium to potassium ratio, and zinc) to identify nitrate sources. A δ15N value of nitrate plus potassium 18.2 indicated inorganic or soil organic N. A nitrate to ammonia ratio 575 indicated nitrate from golf courses. A sodium to potassium ratio 3.2 indicated spray or poultry wastes. A value for zinc 2.8 indicated poultry wastes. Model 2 was devised by using all variables except δ15N. This model also included four variables (sodium plus potassium, nitrate to ammonia ratio, calcium to magnesium ratio, and sodium to potassium ratio) to distinguish categories. Both models were able to distinguish all five source categories with better than 80% overall success and with 71 to 100% success in individual categories using the learning samples. Seventeen water samples that were not used in model development were tested using Model 2 for three categories, and all were correctly classified. Classification-tree models show great potential in identifying sources of contamination and variables important in the source-identification process.
Variable selection under multiple imputation using the bootstrap in a prognostic study
Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW
2007-01-01
Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty and thus allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables, data were missing in the range of 0 to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined over the range of 0% (full model) to 90% variable selection, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion We recommend accounting for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values. PMID:17629912
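The inclusion-frequency idea above can be sketched with the bootstrap half of the procedure (the multiple imputation layer is omitted): refit a crude selector on each bootstrap resample and count how often each variable survives. The data and the backward-elimination selector are illustrative, not the low back pain study:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 120, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)  # vars 0 and 2 prognostic

def backward_select(Xb, yb, t=2.0):
    """Crude backward elimination: drop variables with |t-stat| < t."""
    keep = list(range(Xb.shape[1]))
    while keep:
        Xs = Xb[:, keep]
        beta, *_ = np.linalg.lstsq(Xs, yb, rcond=None)
        resid = yb - Xs @ beta
        s2 = resid @ resid / (len(yb) - len(keep))
        cov = s2 * np.linalg.inv(Xs.T @ Xs)
        tstat = np.abs(beta) / np.sqrt(np.diag(cov))
        worst = int(np.argmin(tstat))
        if tstat[worst] >= t:
            break
        keep.pop(worst)
    return set(keep)

B = 50
counts = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)        # bootstrap resample of subjects
    for j in backward_select(X[idx], y[idx]):
        counts[j] += 1
inclusion_freq = counts / B            # proportion of models containing each var
```

In the full procedure each bootstrap sample would additionally be imputed (once or repeatedly), and the inclusion frequency would be pooled across imputations as well as resamples.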
Bonawitz, Elizabeth; Denison, Stephanie; Griffiths, Thomas L; Gopnik, Alison
2014-10-01
Although probabilistic models of cognitive development have become increasingly prevalent, one challenge is to account for how children might cope with a potentially vast number of possible hypotheses. We propose that children might address this problem by 'sampling' hypotheses from a probability distribution. We discuss empirical results demonstrating signatures of sampling, which offer an explanation for the variability of children's responses. The sampling hypothesis provides an algorithmic account of how children might address computationally intractable problems and suggests a way to make sense of their 'noisy' behavior. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
McMillan, N. J.; Chavez, A.; Chanover, N.; Voelz, D.; Uckert, K.; Tawalbeh, R.; Gariano, J.; Dragulin, I.; Xiao, X.; Hull, R.
2014-12-01
Rapid, in-situ methods for identification of biologic and non-biologic mineral precipitation sites permit mapping of biological hot spots. Two portable spectrometers, Laser-Induced Breakdown Spectroscopy (LIBS) and Acousto-Optic Tunable Filter Reflectance Spectroscopy (AOTFRS), were used to differentiate between bacterially influenced and inorganically precipitated calcite specimens from Fort Stanton Cave, NM, USA. LIBS collects light emitted from the decay of excited electrons in a laser ablation plasma; the spectrum is a chemical fingerprint of the analyte. AOTFRS collects light reflected from the surface of a specimen and provides structural information about the material (i.e., the presence of O-H bonds). These orthogonal data sets provide a rigorous method to determine the origin of calcite in cave deposits. This study used a set of 48 calcite samples collected from Fort Stanton Cave. Samples were examined by SEM for the presence of biologic markers; these data were used to separate the samples into biologic and non-biologic groups. Spectra were modeled using the multivariate technique Partial Least Squares Regression (PLSR). Half of the spectra were used to train a PLSR model, in which biologic samples were assigned an independent-variable value of "0" and non-biologic samples a value of "1". Values of the independent variable were calculated for each of the training samples; these were close to 0 for the biologic samples (-0.09 to 0.23) and close to 1 for the non-biologic samples (0.57 to 1.14). A Value of Apparent Distinction (VAD) of 0.55 was used to numerically distinguish between the two groups; any sample with an independent variable value < 0.55 was classified as having a biologic origin, and a sample with a value > 0.55 was determined to be non-biologic in origin. After the model was trained, independent variable values for the remaining half of the samples were calculated. Biologic or non-biologic origin was assigned by comparison to the VAD.
Using LIBS data alone, the model has a 92% success rate, correctly identifying 23 of 25 samples. Modeling of AOTFRS spectra and the combined LIBS-AOTFRS data set have similar success rates. This study demonstrates that rapid, portable LIBS and AOTFRS instruments can be used to map the spatial distribution of biologic precipitation in caves.
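The train-then-threshold scheme above can be sketched with ridge-regularized least squares standing in for PLSR: regress a 0/1 class indicator on the spectra, then classify held-out samples against the Value of Apparent Distinction (0.55) from the abstract. The spectra are simulated, not the cave calcite data:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical "spectra": 40 samples x 50 channels; the non-biologic class
# shifts the mean of channels 10-14 (a stand-in for a real spectral feature)
n, k = 40, 50
labels = np.repeat([0, 1], n // 2)           # 0 = biologic, 1 = non-biologic
spectra = rng.normal(size=(n, k))
spectra[labels == 1, 10:15] += 2.0

train = np.arange(0, n, 2)                   # alternate samples for training
test = np.arange(1, n, 2)                    # remaining half held out

# Ridge-regularized least squares stands in here for the PLSR step:
# regress the 0/1 indicator on the centered spectra
Xtr, ytr = spectra[train], labels[train].astype(float)
xm, ym = Xtr.mean(axis=0), ytr.mean()
coef = np.linalg.solve((Xtr - xm).T @ (Xtr - xm) + 1.0 * np.eye(k),
                       (Xtr - xm).T @ (ytr - ym))
pred = (spectra[test] - xm) @ coef + ym

VAD = 0.55                                   # value of apparent distinction
assigned = (pred > VAD).astype(int)
accuracy = float(np.mean(assigned == labels[test]))
```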
NASA Astrophysics Data System (ADS)
Mao, Zhiyi; Shan, Ruifeng; Wang, Jiajun; Cai, Wensheng; Shao, Xueguang
2014-07-01
Polyphenols in plant samples have been extensively studied because phenolic compounds are ubiquitous in plants and can be used as antioxidants in promoting human health. A method for rapid determination of three phenolic compounds (chlorogenic acid, scopoletin and rutin) in plant samples using near-infrared diffuse reflectance spectroscopy (NIRDRS) is studied in this work. Partial least squares (PLS) regression was used for building the calibration models, and the effects of spectral preprocessing and variable selection on the models are investigated for optimization of the models. The results show that individual spectral preprocessing and variable selection has no or slight influence on the models, but the combination of the techniques can significantly improve the models. The combination of continuous wavelet transform (CWT) for removing the variant background, multiplicative scatter correction (MSC) for correcting the scattering effect and randomization test (RT) for selecting the informative variables was found to be the best way for building the optimal models. For validation of the models, the polyphenol contents in an independent sample set were predicted. The correlation coefficients between the predicted values and the contents determined by high performance liquid chromatography (HPLC) analysis are as high as 0.964, 0.948 and 0.934 for chlorogenic acid, scopoletin and rutin, respectively.
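Of the preprocessing steps named above, multiplicative scatter correction (MSC) is the most self-contained to sketch: regress each spectrum on the mean spectrum, then remove the fitted offset and slope. The band shape and scatter parameters below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)
wave = np.linspace(0.0, 1.0, 100)
pure = np.exp(-((wave - 0.5) / 0.1) ** 2)    # idealized absorption band

# Simulated spectra with multiplicative scatter (slope) and baseline (offset)
slopes = rng.uniform(0.7, 1.3, 20)
offsets = rng.uniform(-0.2, 0.2, 20)
spectra = slopes[:, None] * pure[None, :] + offsets[:, None]

# MSC: regress each spectrum on the mean spectrum (s ~ a + b * ref),
# then undo the fitted offset a and slope b
ref = spectra.mean(axis=0)
corrected = np.empty_like(spectra)
for i, s in enumerate(spectra):
    b, a = np.polyfit(ref, s, 1)
    corrected[i] = (s - a) / b
```

Because the simulated scatter is exactly affine, the corrected spectra collapse onto the reference; on real data the residual spread after MSC reflects chemical, not scatter, variation.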
Basic, David; Khoo, Angela
2015-09-01
To examine the relationship between newly made medical diagnoses and length of stay (LOS) of acutely unwell older patients. Consecutive patients admitted under the care of four geriatricians were randomly allocated to a model development sample (n = 937) or a model validation sample (n = 855). Cox regression was used to model LOS. Variables considered for inclusion in the development model were established risk factors for LOS and univariate predictors from our dataset. Variables selected in the development sample were tested in the validation sample. A median of five new medical diagnoses were made during a median LOS of 10 days. New diagnoses predicted an increased LOS (hazard ratio 0.90, 95% confidence interval 0.88-0.92). Other significant predictors of increased LOS in both samples were malnutrition and frailty. Identification of new medical diagnoses may have implications for Diagnosis Related Groups-based funding models and may improve the care of older people. © 2015 AJA Inc.
Corron, Louise; Marchal, François; Condemi, Silvana; Telmon, Norbert; Chaumoitre, Kathia; Adalian, Pascal
2018-05-31
Subadult age estimation should rely on sampling and statistical protocols capturing development variability for more accurate age estimates. In this perspective, measurements were taken on the fifth lumbar vertebrae and/or clavicles of 534 French males and females aged 0-19 years and the ilia of 244 males and females aged 0-12 years. These variables were fitted in nonparametric multivariate adaptive regression splines (MARS) models with 95% prediction intervals (PIs) of age. The models were tested on two independent samples from Marseille and the Luis Lopes reference collection from Lisbon. Models using ilium width and module, maximum clavicle length, and lateral vertebral body heights were more than 92% accurate. Precision was lower for postpubertal individuals. Integrating punctual nonlinearities of the relationship between age and the variables and dynamic prediction intervals incorporated the normal increase in interindividual growth variability (heteroscedasticity of variance) with age for more biologically accurate predictions. © 2018 American Academy of Forensic Sciences.
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because growing a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
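The recursive binary partitioning step described above, finding the split that best separates the response, can be written out for one variable. This is a plain least-squares split in Python, not the R ctree implementation, which chooses splits via conditional inference tests:

```python
import statistics

def best_split(xs, ys):
    """Return (threshold, sse) of the best binary split on one variable,
    scoring candidate splits by the summed squared error of the two
    child-node means (the CART-style criterion)."""
    best = (None, float("inf"))
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    for k in range(1, len(xs)):
        left = [ys[order[i]] for i in range(k)]
        right = [ys[order[i]] for i in range(k, len(xs))]
        sse = (sum((v - statistics.fmean(left)) ** 2 for v in left)
               + sum((v - statistics.fmean(right)) ** 2 for v in right))
        if sse < best[1]:
            thr = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (thr, sse)
    return best

# Tiny example: the response steps up when x crosses 5
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
threshold, sse = best_split(x, y)
```

A full tree learner applies this search over all partitioning variables, recurses into each child node, and stops when a criterion (minimum node size, insignificant association) is met.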
Jacob Strunk; Hailemariam Temesgen; Hans-Erik Andersen; James P. Flewelling; Lisa Madsen
2012-01-01
Using lidar in an area-based model-assisted approach to forest inventory has the potential to increase estimation precision for some forest inventory variables. This study documents the bias and precision of a model-assisted (regression estimation) approach to forest inventory with lidar-derived auxiliary variables relative to lidar pulse density and the number of...
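The model-assisted (regression estimation) approach mentioned above can be sketched for a population mean: auxiliary lidar values are known everywhere, the field variable only on sampled plots, and the regression estimator corrects the sample mean for auxiliary imbalance. All population parameters below are invented:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical population: lidar mean height (auxiliary, wall-to-wall)
N = 5000
lidar = rng.gamma(4.0, 5.0, N)
volume = 10 + 3.0 * lidar + rng.normal(0, 8, N)  # field variable of interest
true_mean = volume.mean()

# Field plots: a small simple random sample where volume is measured
idx = rng.choice(N, 80, replace=False)
ys, xs = volume[idx], lidar[idx]

# Regression (model-assisted) estimator of the population mean:
# ybar_reg = ybar_sample + b * (Xbar_population - xbar_sample)
b = np.polyfit(xs, ys, 1)[0]
ybar_reg = ys.mean() + b * (lidar.mean() - xs.mean())
```

The precision gain over the plain sample mean grows with the strength of the lidar-volume relationship, which is why pulse density and plot count trade off in such designs.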
NASA Astrophysics Data System (ADS)
Hu, Jiexiang; Zhou, Qi; Jiang, Ping; Shao, Xinyu; Xie, Tingli
2018-01-01
Variable-fidelity (VF) modelling methods have been widely used in complex engineering system design to mitigate the computational burden. Building a VF model generally includes two parts: design of experiments and metamodel construction. In this article, an adaptive sampling method based on improved hierarchical kriging (ASM-IHK) is proposed to refine the improved VF model. First, an improved hierarchical kriging model is developed as the metamodel, in which the low-fidelity model is varied through a polynomial response surface function to capture the characteristics of a high-fidelity model. Secondly, to reduce local approximation errors, an active learning strategy based on a sequential sampling method is introduced to make full use of the already required information on the current sampling points and to guide the sampling process of the high-fidelity model. Finally, two numerical examples and the modelling of the aerodynamic coefficient for an aircraft are provided to demonstrate the approximation capability of the proposed approach, as well as three other metamodelling methods and two sequential sampling methods. The results show that ASM-IHK provides a more accurate metamodel at the same simulation cost, which is very important in metamodel-based engineering design problems.
Optical variability properties of the largest AGN sample observed with Kepler/K2
NASA Astrophysics Data System (ADS)
Aranzana, E.; Koerding, E.; Uttley, P.; Scaringi, S.; Steven, B.
2017-10-01
We present the first short time-scale (hours to days) optical variability study of a large sample of Active Galactic Nuclei (AGN) observed with the Kepler/K2 mission. The sample contains 275 AGN observed over four campaigns with ~30-minute cadence, selected from the Million Quasar Catalogue with R magnitude < 19. We performed time series analysis to determine their variability properties by means of the power spectral densities (PSDs) and applied Monte Carlo techniques to find the best model parameters that fit the observed power spectra. A power-law model is sufficient to describe all the PSDs of the AGN in our sample. The average power-law slope is 2.5±0.5, steeper than the PSDs observed in X-rays, and the rest-frame amplitude variability in the frequency range of 6×10^{-6}-10^{-4} Hz varies from 1-10% with an average of 2.6%. We explore correlations between the variability amplitude and key parameters of the AGN, finding a significant correlation of rest-frame short-term variability amplitude with redshift, but no such correlation with luminosity. We attribute these effects to the known 'bluer when brighter' variability of quasars combined with the fixed bandpass of Kepler. This study enables us to distinguish between Seyferts and blazars and confirm AGN candidates.
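The PSD power-law fit described above can be sketched with a periodogram and a log-log least-squares fit. A random walk is used as the test signal because its spectrum falls off as roughly f^-2; spectral flattening near the Nyquist frequency pulls the fitted slope somewhat below 2. The cadence and length are loosely K2-like but otherwise arbitrary:

```python
import numpy as np

rng = np.random.default_rng(8)
# Red-noise test lightcurve: a random walk sampled at a uniform
# 30-minute (1800 s) cadence
n = 2048
flux = np.cumsum(rng.normal(size=n))

# Periodogram estimate of the PSD (zero frequency dropped)
freqs = np.fft.rfftfreq(n, d=1800.0)[1:]
power = np.abs(np.fft.rfft(flux - flux.mean()))[1:] ** 2

# Fit a power law P(f) ~ f^(-slope) by least squares in log-log space
slope = -np.polyfit(np.log(freqs), np.log(power), 1)[0]
```

A study like the one above would additionally simulate lightcurves from candidate slopes (Monte Carlo) to account for sampling and leakage biases rather than trusting the raw log-log fit.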
Kang, Jian; Li, Xin; Jin, Rui; Ge, Yong; Wang, Jinfeng; Wang, Jianghao
2014-01-01
The eco-hydrological wireless sensor network (EHWSN) in the middle reaches of the Heihe River Basin in China is designed to capture the spatial and temporal variability and to estimate the ground truth for validating remote sensing products. However, there is no available prior information about a target variable. To meet both requirements, a hybrid model-based sampling method without any spatial autocorrelation assumptions is developed to optimize the distribution of EHWSN nodes based on geostatistics. This hybrid model incorporates two sub-criteria: one for the variogram modeling to represent the variability, another for improving the spatial prediction to evaluate remote sensing products. The reasonability of the optimized EHWSN is validated in terms of representativeness, variogram modeling and spatial accuracy using 15 types of simulation fields generated with unconditional geostatistical stochastic simulation. The sampling design shows good representativeness; variograms estimated from samples have less than 3% mean error relative to the true variograms. Then, fields at multiple scales are predicted. As the scale increases, estimated fields have higher similarities to simulation fields at block sizes exceeding 240 m. The validations prove that this hybrid sampling method is effective for both objectives when we do not know the characteristics of the optimized variables. PMID:25317762
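The variogram-modeling criterion above rests on the empirical variogram, which is short to write down: half the average squared difference of the variable between locations, binned by separation distance. The 1-D transect and exponential correlation range below are invented:

```python
import numpy as np

rng = np.random.default_rng(9)
# Hypothetical 1-D transect of a soil-moisture-like variable with
# exponential spatial correlation (range parameter 10 m)
x = np.arange(0.0, 200.0, 2.0)
n = len(x)
dist = np.abs(x[:, None] - x[None, :])
cov = np.exp(-dist / 10.0)
field = np.linalg.cholesky(cov + 1e-10 * np.eye(n)) @ rng.normal(size=n)

def empirical_variogram(coords, values, bins):
    """Average of 0.5*(z_i - z_j)^2 over pairs, grouped by lag distance."""
    d = np.abs(coords[:, None] - coords[None, :])
    g = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(coords), k=1)   # each pair counted once
    d, g = d[iu], g[iu]
    which = np.digitize(d, bins)
    return np.array([g[which == b].mean() for b in range(1, len(bins))])

gamma = empirical_variogram(x, field, bins=np.array([0.0, 5.0, 10.0, 20.0, 40.0]))
```

Fitting a variogram model to gamma (and comparing it to the true variogram, here 1 - exp(-h/10)) is exactly the kind of check used to validate a sampling design's variogram sub-criterion.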
Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)
NASA Astrophysics Data System (ADS)
Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz
2017-05-01
Much of the literature applies Principal Component Analysis (PCA) as a preliminary visualization method, a variable construction method, or both. The focus of PCA can be on the samples (R-mode PCA) or on the variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach to reducing high-dimensional data before the application of Linear Discriminant Analysis (LDA) to solve classification problems. The output of PCA comprises two new matrices, known as the loadings and scores matrices. Each matrix can then be used to produce a plot: the loadings plot aids identification of important variables, whereas the scores plot presents the spatial distribution of samples on new axes known as Principal Components (PCs). Fundamentally, the scores matrix is always the input for building the classification model. A recent paper used Q-mode PCA, but the focus of the analysis was not on the variables but on the samples. As a result, the authors exchanged the use of the loadings and scores plots: clustering of samples was studied using the loadings plot, whereas the scores plot was used to identify important manifest variables. Therefore, the aim of this study is to statistically validate the proposed practice. Evaluation is based on the external error of LDA models as a function of the number of PCs. In addition, bootstrapping was conducted to evaluate the external error of each LDA model. Results show that LDA models built on PCs from R-mode PCA give logical performance and unbiased external errors, whereas those built with Q-mode PCA show the opposite. We therefore conclude that PCs produced by Q-mode PCA are not statistically stable and should not be applied to problems of classifying samples, but rather of classifying variables. We hope this paper provides some insight into these disputable issues.
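The conventional route the study endorses — PCA scores as the LDA input — can be sketched in plain NumPy on hypothetical two-class data (PCA via SVD of the centred data matrix, then a two-class Fisher discriminant; the dimensions and class separation are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical data: 60 samples, 20 variables, two classes with a mean shift
X = rng.normal(size=(60, 20))
y = np.repeat([0, 1], 30)
X[y == 1, :5] += 2.0          # class separation on the first five variables

# PCA via SVD of the column-centred data matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:5].T            # variables x PCs: aids variable interpretation
scores = Xc @ loadings         # samples x PCs: the usual LDA input

# two-class Fisher LDA on the scores
m0, m1 = scores[y == 0].mean(0), scores[y == 1].mean(0)
Sw = np.cov(scores[y == 0].T) + np.cov(scores[y == 1].T)  # within-class scatter
w = np.linalg.solve(Sw, m1 - m0)                          # discriminant direction
pred = (scores @ w > (m0 + m1) @ w / 2).astype(int)       # midpoint threshold
acc = (pred == y).mean()
```

Using the loadings matrix instead of the scores matrix as the classifier input — the exchanged practice the study questions — would classify variables rather than samples.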
ERIC Educational Resources Information Center
Henson, James M.; Reise, Steven P.; Kim, Kevin H.
2007-01-01
The accuracy of structural model parameter estimates in latent variable mixture modeling was explored with a 3 (sample size) × 3 (exogenous latent mean difference) × 3 (endogenous latent mean difference) × 3 (correlation between factors) × 3 (mixture proportions) factorial design. In addition, the efficacy of several…
New approaches for sampling and modeling native and exotic plant species richness
Chong, G.W.; Reich, R.M.; Kalkhan, M.A.; Stohlgren, T.J.
2001-01-01
We demonstrate new multi-phase, multi-scale approaches for sampling and modeling native and exotic plant species to predict the spread of invasive species and aid in control efforts. Our test site is a 54,000-ha portion of Rocky Mountain National Park, Colorado, USA. This work is based on previous research wherein we developed vegetation sampling techniques to identify hot spots of diversity, important rare habitats, and locations of invasive plant species. Here we demonstrate statistical modeling tools to rapidly assess current patterns of native and exotic plant species to determine which habitats are most vulnerable to invasion by exotic species. We use stepwise multiple regression and modified residual kriging to estimate numbers of native species and exotic species, as well as probability of observing an exotic species in 30 × 30-m cells. Final models accounted for 62% of the variability observed in number of native species, 51% of the variability observed in number of exotic species, and 47% of the variability associated with observing an exotic species. Important independent variables used in developing the models include geographical location, elevation, slope, aspect, and Landsat TM bands 1-7. These models can direct resource managers to areas in need of further inventory, monitoring, and exotic species control efforts.
Creel, Scott; Creel, Michael
2009-11-01
1. Sampling error in annual estimates of population size creates two widely recognized problems for the analysis of population growth. First, if sampling error is mistakenly treated as process error, one obtains inflated estimates of the variation in true population trajectories (Staples, Taper & Dennis 2004). Second, treating sampling error as process error is thought to overestimate the importance of density dependence in population growth (Viljugrein et al. 2005; Dennis et al. 2006). 2. In ecology, state-space models are used to account for sampling error when estimating the effects of density and other variables on population growth (Staples et al. 2004; Dennis et al. 2006). In econometrics, regression with instrumental variables is a well-established method that addresses the problem of correlation between regressors and the error term, but requires fewer assumptions than state-space models (Davidson & MacKinnon 1993; Cameron & Trivedi 2005). 3. We used instrumental variables to account for sampling error and fit a generalized linear model to 472 annual observations of population size for 35 Elk Management Units in Montana, from 1928 to 2004. We compared this model with state-space models fit with the likelihood function of Dennis et al. (2006). We discuss the general advantages and disadvantages of each method. Briefly, regression with instrumental variables is valid with fewer distributional assumptions, but state-space models are more efficient when their distributional assumptions are met. 4. Both methods found that population growth was negatively related to population density and winter snow accumulation. Summer rainfall and wolf (Canis lupus) presence had much weaker effects on elk (Cervus elaphus) dynamics [though limitation by wolves is strong in some elk populations with well-established wolf populations (Creel et al. 2007; Creel & Christianson 2008)]. 5. 
Coupled with predictions for Montana from global and regional climate models, our results predict a substantial reduction in the limiting effect of snow accumulation on Montana elk populations in the coming decades. If other limiting factors do not operate with greater force, population growth rates would increase substantially.
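The instrumental-variables idea in point 3 can be illustrated with a simulated errors-in-variables toy (NumPy; the true coefficient, noise levels, and the instrument are invented for the sketch): the IV slope cov(z, y)/cov(z, x) recovers the density-dependence coefficient that naive OLS attenuates when sampling error is treated as process error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x_true = rng.normal(size=n)                     # true log population density
x_obs = x_true + rng.normal(scale=0.5, size=n)  # density observed with sampling error
z = x_true + rng.normal(scale=0.5, size=n)      # instrument: correlated with the truth,
                                                # independent of x_obs's error
beta = -0.3                                     # true density-dependence coefficient
growth = beta * x_true + rng.normal(scale=0.1, size=n)

# naive OLS slope is attenuated toward zero by the error in x_obs
ols = np.cov(x_obs, growth)[0, 1] / np.var(x_obs, ddof=1)
# IV slope uses only the covariation that flows through the instrument
iv = np.cov(z, growth)[0, 1] / np.cov(z, x_obs)[0, 1]
```

With these noise levels the OLS slope is biased toward roughly 0.8 × beta, while the IV slope stays near beta, mirroring the attenuation the authors correct for.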
NASA Astrophysics Data System (ADS)
Aulenbach, B. T.; Burns, D. A.; Shanley, J. B.; Yanai, R. D.; Bae, K.; Wild, A.; Yang, Y.; Dong, Y.
2013-12-01
There are many sources of uncertainty in estimates of streamwater solute flux. Flux is the product of discharge and concentration (summed over time), each of which has measurement uncertainty of its own. Discharge can be measured almost continuously, but concentrations are usually determined from discrete samples, which adds uncertainty that depends on sampling frequency and on how concentrations are assigned for the periods between samples. Concentrations in the gaps between samples can be estimated by linear interpolation or by models that use the relations between concentration and continuously measured or known variables such as discharge, season, temperature, and time. For this project, developed in cooperation with QUEST (Quantifying Uncertainty in Ecosystem Studies), we evaluated uncertainty for three flux estimation methods and three different sampling frequencies (monthly, weekly, and weekly plus event). The constituents investigated were dissolved NO3, Si, SO4, and dissolved organic carbon (DOC), solutes whose concentration dynamics exhibit strongly contrasting behavior. The evaluation was completed for a 10-year period at five small, forested watersheds in Georgia, New Hampshire, New York, Puerto Rico, and Vermont. Concentration regression models were developed for each solute at each of the three sampling frequencies for all five watersheds. Fluxes were then calculated using (1) a linear interpolation approach, (2) a regression-model method, and (3) the composite method, which combines the regression-model method for estimating concentrations with the linear interpolation method for correcting model residuals to the observed sample concentrations. We considered the best estimates of flux to be those derived using the composite method at the highest sampling frequency.
We also evaluated the importance of sampling frequency and estimation method for flux estimate uncertainty; flux uncertainty depended on the variability characteristics of each solute and varied across reporting periods (e.g., the 10-year study period vs. annual vs. monthly). The usefulness of the two regression-model-based flux estimation approaches depended on the amount of variance in concentrations the regression models could explain. Our results can guide the development of optimal sampling strategies by weighing sampling frequency against improvements in the uncertainty of stream flux estimates for solutes with particular variability characteristics. The appropriate flux estimation method depends on a combination of sampling frequency and the strength of the concentration regression models. Sites: Biscuit Brook (Frost Valley, NY), Hubbard Brook Experimental Forest and LTER (West Thornton, NH), Luquillo Experimental Forest and LTER (Luquillo, Puerto Rico), Panola Mountain (Stockbridge, GA), Sleepers River Research Watershed (Danville, VT)
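The three estimation methods compared in this record can be sketched on synthetic data (Python; the discharge-concentration relation, weekly schedule, and noise levels are invented). Note how the composite method reproduces the observed concentrations exactly at the sample times, which is what distinguishes it from the plain regression model:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(365)                                     # daily time steps
Q = 1.0 + 0.5 * np.sin(2 * np.pi * t / 365) + 0.2 * rng.lognormal(0, 0.3, 365)
C_true = 2.0 * Q ** 0.5 + rng.normal(0, 0.1, 365)      # concentration tracks discharge
sample_idx = t[::7]                                    # weekly discrete sampling
C_obs = C_true[sample_idx]

# (1) linear interpolation between discrete samples
C_li = np.interp(t, sample_idx, C_obs)

# (2) regression model: C as a power function of Q, fitted on the samples
b, loga = np.polyfit(np.log(Q[sample_idx]), np.log(C_obs), 1)
C_rm = np.exp(loga) * Q ** b

# (3) composite: regression model plus interpolated residual correction
resid = C_obs - C_rm[sample_idx]
C_cm = C_rm + np.interp(t, sample_idx, resid)

flux = {k: np.sum(v * Q) for k, v in
        {"interp": C_li, "regression": C_rm, "composite": C_cm}.items()}
true_flux = np.sum(C_true * Q)
```

Daily concentration times discharge, summed, gives the flux for each method; in this toy all three land close to the true flux because the concentration-discharge relation is strong, which is exactly the regime where the regression-based methods pay off.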
ERIC Educational Resources Information Center
Castro-Villarreal, Felicia; Guerra, Norma; Sass, Daniel; Hseih, Pei-Hsuan
2014-01-01
Theoretical models were tested using structural equation modeling to evaluate the interrelations among cognitive motivational variables and academic achievement using a sample of 128 predominately Hispanic pre-service teachers enrolled in two undergraduate educational psychology classes. Data were gathered using: (1) a quantitative questionnaire…
A Multivariate Model of Achievement in Geometry
ERIC Educational Resources Information Center
Bailey, MarLynn; Taasoobshirazi, Gita; Carr, Martha
2014-01-01
Previous studies have shown that several key variables influence student achievement in geometry, but no research has been conducted to determine how these variables interact. A model of achievement in geometry was tested on a sample of 102 high school students. Structural equation modeling was used to test hypothesized relationships among…
Sample Size Limits for Estimating Upper Level Mediation Models Using Multilevel SEM
ERIC Educational Resources Information Center
Li, Xin; Beretvas, S. Natasha
2013-01-01
This simulation study investigated use of the multilevel structural equation model (MLSEM) for handling measurement error in both mediator and outcome variables ("M" and "Y") in an upper level multilevel mediation model. Mediation and outcome variable indicators were generated with measurement error. Parameter and standard…
Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal
2009-01-01
The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators. PMID:20174455
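The phase-two IPW idea underlying this record can be shown with a toy cohort (NumPy; the cohort values and sampling fractions are invented): each phase-two subject is weighted by the inverse of its known stratum sampling fraction, giving an unbiased estimate of the cohort total. Calibration, as in the R survey package, would further adjust these weights using auxiliary variables to shrink the design-based variance component.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
y = rng.gamma(2.0, 1.0, N)               # cohort values to be totalled (hypothetical)
strata = (y > np.median(y)).astype(int)  # two phase-two sampling strata
frac = np.array([0.1, 0.5])              # known sampling fractions per stratum

sampled = rng.random(N) < frac[strata]   # stratified phase-two sample
w = 1.0 / frac[strata]                   # inverse probability weights
ipw_total = np.sum((w * y)[sampled])     # Horvitz-Thompson estimate of sum(y)
```

The estimator's variance comes entirely from the phase-two design; in the case-cohort setting, y would be the efficient influence-function contributions rather than raw outcomes.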
Networks for image acquisition, processing and display
NASA Technical Reports Server (NTRS)
Ahumada, Albert J., Jr.
1990-01-01
The human visual system comprises layers of networks which sample, process, and code images. Understanding these networks is a valuable means of understanding human vision and of designing autonomous vision systems based on network processing. Ames Research Center has an ongoing program to develop computational models of such networks. The models predict human performance in detection of targets and in discrimination of displayed information. In addition, the models are artificial vision systems sharing properties with biological vision that has been tuned by evolution for high performance. Properties include variable density sampling, noise immunity, multi-resolution coding, and fault-tolerance. The research stresses analysis of noise in visual networks, including sampling, photon, and processing unit noises. Specific accomplishments include: models of sampling array growth with variable density and irregularity comparable to that of the retinal cone mosaic; noise models of networks with signal-dependent and independent noise; models of network connection development for preserving spatial registration and interpolation; multi-resolution encoding models based on hexagonal arrays (HOP transform); and mathematical procedures for simplifying analysis of large networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kelly, Brandon C.; Becker, Andrew C.; Sobolewska, Malgosia
2014-06-10
We present the use of continuous-time autoregressive moving average (CARMA) models as a method for estimating the variability features of a light curve, and in particular its power spectral density (PSD). CARMA models fully account for irregular sampling and measurement errors, making them valuable for quantifying variability, forecasting and interpolating light curves, and variability-based classification. We show that the PSD of a CARMA model can be expressed as a sum of Lorentzian functions, which makes them extremely flexible and able to model a broad range of PSDs. We present the likelihood function for light curves sampled from CARMA processes, placing them on a statistically rigorous foundation, and we present a Bayesian method to infer the probability distribution of the PSD given the measured light curve. Because calculation of the likelihood function scales linearly with the number of data points, CARMA modeling scales to current and future massive time-domain data sets. We conclude by applying our CARMA modeling approach to light curves for an X-ray binary, two active galactic nuclei, a long-period variable star, and an RR Lyrae star in order to illustrate their use, applicability, and interpretation.
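The CARMA PSD has the closed form σ²|β(2πif)|²/|α(2πif)|² (up to a normalization convention), which is what makes the sum-of-Lorentzians decomposition possible. A small NumPy sketch, with the coefficient ordering and the CAR(1) example chosen for illustration:

```python
import numpy as np

def carma_psd(freqs, alpha, beta, sigma=1.0):
    """PSD of a CARMA process, up to a normalization convention:
    S(f) = sigma^2 * |beta(2*pi*i*f)|^2 / |alpha(2*pi*i*f)|^2.

    alpha, beta: polynomial coefficients in increasing order (constant first).
    """
    s = 2j * np.pi * np.asarray(freqs, dtype=complex)
    num = np.abs(np.polyval(list(beta)[::-1], s)) ** 2   # polyval wants highest first
    den = np.abs(np.polyval(list(alpha)[::-1], s)) ** 2
    return sigma ** 2 * num / den

# CAR(1) / Ornstein-Uhlenbeck process: alpha(s) = s + omega0, beta(s) = 1,
# giving a single Lorentzian centred at zero frequency
psd = carma_psd(np.array([0.0, 0.1, 1.0]), alpha=[0.5, 1.0], beta=[1.0])
```

Higher-order CARMA(p, q) models simply add roots to α and β, and a partial-fraction expansion over the roots of α yields the sum of Lorentzians the abstract refers to.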
NASA Astrophysics Data System (ADS)
Yi, Jin; Li, Xinyu; Xiao, Mi; Xu, Junnan; Zhang, Lin
2017-01-01
Engineering design often involves different types of simulation, which results in expensive computational costs. Variable fidelity approximation-based design optimization approaches can realize effective simulation and efficiency optimization of the design space using approximation models with different levels of fidelity and have been widely used in different fields. As the foundations of variable fidelity approximation models, the selection of sample points of variable-fidelity approximation, called nested designs, is essential. In this article a novel nested maximin Latin hypercube design is constructed based on successive local enumeration and a modified novel global harmony search algorithm. In the proposed nested designs, successive local enumeration is employed to select sample points for a low-fidelity model, whereas the modified novel global harmony search algorithm is employed to select sample points for a high-fidelity model. A comparative study with multiple criteria and an engineering application are employed to verify the efficiency of the proposed nested designs approach.
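A Latin hypercube design with a crude maximin selection can be sketched in NumPy. This is a plain random-restart stand-in for intuition only, not the paper's successive local enumeration or modified novel global harmony search:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """One random Latin hypercube sample: n points in [0,1]^d, with each of
    the n equal-width bins occupied exactly once in every dimension."""
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n   # jitter within bins
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]                 # shuffle bins per dimension
    return u

def maximin_lhd(n, d, rng, n_trials=100):
    """Keep the candidate LHD with the largest minimum pairwise distance."""
    best, best_score = None, -np.inf
    for _ in range(n_trials):
        u = latin_hypercube(n, d, rng)
        dist = np.linalg.norm(u[:, None] - u[None, :], axis=-1)
        dmin = np.min(dist + np.eye(n) * 1e9)              # ignore self-distances
        if dmin > best_score:
            best, best_score = u, dmin
    return best
```

A nested design would additionally constrain the high-fidelity points to be a subset of (or aligned with) the low-fidelity points, which is the part the paper's two search algorithms handle.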
Modeling Signal-Noise Processes Supports Student Construction of a Hierarchical Image of Sample
ERIC Educational Resources Information Center
Lehrer, Richard
2017-01-01
Grade 6 (modal age 11) students invented and revised models of the variability generated as each measured the perimeter of a table in their classroom. To construct models, students represented variability as a linear composite of true measure (signal) and multiple sources of random error. Students revised models by developing sampling…
On the Asymptotic Relative Efficiency of Planned Missingness Designs.
Rhemtulla, Mijke; Savalei, Victoria; Little, Todd D
2016-03-01
In planned missingness (PM) designs, certain data are set a priori to be missing. PM designs can increase validity and reduce cost; however, little is known about the loss of efficiency that accompanies these designs. The present paper compares PM designs to reduced-N (RN) designs that concentrate the same total number of data points in fewer participants. In 4 studies, we consider models for both observed and latent variables, designs that do or do not include an "X set" of variables with complete data, and a full range of between- and within-set correlation values. All results are obtained using asymptotic relative efficiency formulas, and thus no data are generated; this novel approach allows us to examine whether PM designs have theoretical advantages over RN designs, removing the impact of sampling error. Our primary findings are that (a) in manifest variable regression models, estimates of regression coefficients have much lower relative efficiency in PM designs than in RN designs, (b) the relative efficiency of factor correlation or latent regression coefficient estimates is maximized when the indicators of each latent variable come from different sets, and (c) the addition of an X set improves efficiency in manifest variable regression models only for the parameters that directly involve the X-set variables, but it substantially improves the efficiency of most parameters in latent variable models. We conclude that PM designs can be beneficial when the model of interest is a latent variable model; recommendations are made for how to optimize such a design.
Polycyclic Aromatic Hydrocarbons in Residential Dust: Sources of Variability
Metayer, Catherine; Petreas, Myrto; Does, Monique; Buffler, Patricia A.; Rappaport, Stephen M.
2013-01-01
Background: There is interest in using residential dust to estimate human exposure to environmental contaminants. Objectives: We aimed to characterize the sources of variability for polycyclic aromatic hydrocarbons (PAHs) in residential dust and provide guidance for investigators who plan to use residential dust to assess exposure to PAHs. Methods: We collected repeat dust samples from 293 households in the Northern California Childhood Leukemia Study during two sampling rounds (from 2001 through 2007 and during 2010) using household vacuum cleaners, and measured 12 PAHs using gas chromatography–mass spectrometry. We used a random- and a mixed-effects model for each PAH to apportion observed variance into four components and to identify sources of variability. Results: Median concentrations for individual PAHs ranged from 10 to 190 ng/g of dust. For each PAH, total variance was apportioned into regional variability (1–9%), intraregional between-household variability (24–48%), within-household variability over time (41–57%), and within-sample analytical variability (2–33%). Regional differences in PAH dust levels were associated with estimated ambient air concentrations of PAH. Intraregional differences between households were associated with the residential construction date and the smoking habits of residents. For some PAHs, a decreasing time trend explained a modest fraction of the within-household variability; however, most of the within-household variability was unaccounted for by our mixed-effects models. Within-household differences between sampling rounds were largest when the interval between dust sample collections was at least 6 years in duration. Conclusions: Our findings indicate that it may be feasible to use residential dust for retrospective assessment of PAH exposures in studies of health effects. PMID:23461863
ERIC Educational Resources Information Center
Vardeman, Stephen B.; Wendelberger, Joanne R.
2005-01-01
There is a little-known but very simple generalization of the standard result that for uncorrelated random variables with common mean μ and variance σ², the expected value of the sample variance is σ². The generalization justifies the use of the usual standard error of the sample mean in possibly…
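The baseline result being generalized follows from a short computation (a standard derivation, assuming only that $X_1,\dots,X_n$ are uncorrelated with common mean $\mu$ and variance $\sigma^2$):

```latex
\begin{aligned}
\mathbb{E}\!\left[\sum_{i=1}^n (X_i-\bar X)^2\right]
  &= \sum_{i=1}^n \mathbb{E}[X_i^2] \;-\; n\,\mathbb{E}[\bar X^2] \\
  &= n(\sigma^2+\mu^2) \;-\; n\!\left(\frac{\sigma^2}{n}+\mu^2\right)
   \;=\; (n-1)\,\sigma^2 ,
\end{aligned}
```

so that $\mathbb{E}[s^2]=\sigma^2$ for $s^2=\tfrac{1}{n-1}\sum_i (X_i-\bar X)^2$. Uncorrelatedness (independence is not needed) enters only through $\operatorname{Var}(\bar X)=\sigma^2/n$.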
Wang, Ching-Yun; Song, Xiao
2017-01-01
Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women’s Health Initiative. PMID:27546625
NASA Astrophysics Data System (ADS)
Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Mello, Paola de Azevedo; Ferrão, Marco Flores; dos Santos, Maria de Fátima Pereira; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes
2012-04-01
Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of the models. Pre-treatment based on multiplicative scatter correction (MSC) and mean-centered data was selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with the spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm⁻¹). This model produced an RMSECV of 400 mg kg⁻¹ S and an RMSEP of 420 mg kg⁻¹ S, with a correlation coefficient of 0.990.
Hugo, Sanet; Altwegg, Res
2017-09-01
Using the Southern African Bird Atlas Project (SABAP2) as a case study, we examine the possible determinants of spatial bias in volunteer sampling effort and how well such biased data represent environmental gradients across the area covered by the atlas. For each province in South Africa, we used generalized linear mixed models to determine the combination of variables that explain spatial variation in sampling effort (number of visits per 5' × 5' grid cell, or "pentad"). The explanatory variables were distance to major road and exceptional birding locations or "sampling hubs," percentage cover of protected, urban, and cultivated area, and the climate variables mean annual precipitation, winter temperatures, and summer temperatures. Further, we used the climate variables and plant biomes to define subsets of pentads representing environmental zones across South Africa, Lesotho, and Swaziland. For each environmental zone, we quantified sampling intensity, and we assessed sampling completeness with species accumulation curves fitted to the asymptotic Lomolino model. Sampling effort was highest close to sampling hubs, major roads, urban areas, and protected areas. Cultivated area and the climate variables were less important. Further, environmental zones were not evenly represented by current data, and the zones varied in the amount of sampling required to represent the species that are present. SABAP2 volunteers' preferences in birding locations cause spatial bias in the dataset that should be taken into account when analyzing these data. Large parts of South Africa remain underrepresented, which may restrict the kinds of ecological questions that can be addressed. However, sampling bias may be reduced by directing volunteers toward undersampled regions while taking their preferences into account.
Ryan, Patrick H; Lemasters, Grace K; Levin, Linda; Burkle, Jeff; Biswas, Pratim; Hu, Shaohua; Grinshpun, Sergey; Reponen, Tiina
2008-10-01
The Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS) is a prospective birth cohort whose purpose is to determine if exposure to high levels of diesel exhaust particles (DEP) during early childhood increases the risk for developing allergic diseases. In order to estimate exposure to DEP, a land-use regression (LUR) model was developed using geographic data as independent variables and sampled levels of a marker of DEP as the dependent variable. A continuous wind direction variable was also created. The LUR model predicted 74% of the variability in sampled values with four variables: wind direction, length of bus routes within 300 m of the sampling site, a measure of truck intensity within 300 m of the sampling site, and elevation. The LUR model was subsequently applied to all locations where the child had spent more than eight hours per week from birth through age three. A time-weighted average (TWA) microenvironmental exposure estimate was derived for four time periods: 0-6 months, 7-12 months, 13-24 months, and 25-36 months. By age two, one third of the children were spending significant time at locations other than home, and by 36 months, 39% of the children had changed their residential addresses. The mean cumulative DEP exposure estimate increased from 70 µg/m³-days at age 6 months to 414 µg/m³-days at 36 months. Findings indicate that using birth addresses to estimate a child's exposure may result in exposure misclassification for some children who spend a significant amount of time at a location with high exposure to DEP.
Increasing precision of turbidity-based suspended sediment concentration and load estimates.
Jastram, John D; Zipper, Carl E; Zelazny, Lucian W; Hyer, Kenneth E
2010-01-01
Turbidity is an effective tool for estimating and monitoring suspended sediments in aquatic systems. Turbidity can be measured in situ remotely and at fine temporal scales as a surrogate for suspended sediment concentration (SSC), providing opportunity for a more complete record of SSC than is possible with physical sampling approaches. However, there is variability in turbidity-based SSC estimates and in sediment loadings calculated from those estimates. This study investigated the potential to improve turbidity-based SSC estimates, and by extension the resulting sediment loading estimates, by incorporating hydrologic variables that can be monitored remotely and continuously (typically at 15-min intervals) into the SSC estimation procedure. On the Roanoke River in southwestern Virginia, hydrologic stage, turbidity, and other water-quality parameters were monitored with in situ instrumentation; suspended sediments were sampled manually during elevated turbidity events; samples were analyzed for SSC and physical properties including particle-size distribution and organic C content; and rainfall was quantified by geologic source area. The study identified physical properties of the suspended-sediment samples that contribute to SSC estimation variance and hydrologic variables that explained variability of those physical properties. Results indicated that the inclusion of any of the measured physical properties in turbidity-based SSC estimation models reduces unexplained variance. Further, the use of hydrologic variables to represent these physical properties, along with turbidity, resulted in a model, relying solely on data collected remotely and continuously, that estimated SSC with less variance than a conventional turbidity-based univariate model, allowing a more precise estimate of sediment loading. Modeling results are consistent with known mechanisms governing sediment transport in hydrologic systems.
Variability estimation of urban wastewater biodegradable fractions by respirometry.
Lagarde, Fabienne; Tusseau-Vuillemin, Marie-Hélène; Lessard, Paul; Héduit, Alain; Dutrop, François; Mouchel, Jean-Marie
2005-11-01
This paper presents a methodology for assessing the variability of biodegradable chemical oxygen demand (COD) fractions in urban wastewaters. Thirteen raw wastewater samples from combined and separate sewers feeding the same plant were characterised, and two optimisation procedures were applied in order to evaluate the variability in biodegradable fractions and related kinetic parameters. Through an overall optimisation on all the samples, a unique kinetic parameter set was obtained with a three-substrate model including an adsorption stage. This method required powerful numerical treatment, but improved identifiability compared to the usual sample-to-sample optimisation. The results showed that the fractionation of samples collected in the combined sewer was much more variable (standard deviation of 70% of the mean values) than the fractionation of the separate sewer samples, and the slowly biodegradable COD fraction was the most significant fraction (45% of the total COD on average). Because these samples were collected under various rain conditions, the standard deviations obtained here on the combined sewer biodegradable fractions could be used as a first estimation of the variability of this type of sewer system.
POWER ANALYSIS FOR COMPLEX MEDIATIONAL DESIGNS USING MONTE CARLO METHODS
Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.
2013-01-01
Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex mediational models. The approach is based on the well known technique of generating a large number of samples in a Monte Carlo study, and estimating power as the percentage of cases in which an estimate of interest is significantly different from zero. Examples of power calculation for commonly used mediational models are provided. Power analyses for the single mediator, multiple mediators, three-path mediation, mediation with latent variables, moderated mediation, and mediation in longitudinal designs are described. Annotated sample syntax for Mplus is appended and tabled values of required sample sizes are shown for some models. PMID:23935262
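The Monte Carlo recipe described here — simulate many samples, fit the model, count significant estimates — can be sketched for the single-mediator case using a Sobel z-test (NumPy; the path values, sample size, and normal-theory test are illustrative choices, and the paper's own worked examples use Mplus):

```python
import numpy as np

def mediation_power(n, a, b, n_sims=2000, seed=0):
    """Monte Carlo power for the indirect effect a*b in X -> M -> Y,
    using the Sobel z-test on each simulated sample."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        # slope of M on X and its (approximate) standard error
        a_hat = np.cov(x, m)[0, 1] / np.var(x, ddof=1)
        sa = np.sqrt(np.var(m - a_hat * x, ddof=2) /
                     (np.var(x, ddof=1) * (n - 1)))
        # regress Y on M and X; the slope for M is b_hat
        X = np.column_stack([m, x, np.ones(n)])
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_hat = coefs[0]
        resid = y - X @ coefs
        cov = (resid @ resid / (n - 3)) * np.linalg.inv(X.T @ X)
        sb = np.sqrt(cov[0, 0])
        # Sobel z statistic for the indirect effect a_hat * b_hat
        z = (a_hat * b_hat) / np.sqrt(a_hat**2 * sb**2 + b_hat**2 * sa**2)
        hits += abs(z) > 1.96
    return hits / n_sims
```

Power is the fraction of replications in which the indirect effect is declared significant; sweeping n until the returned value crosses a target (e.g., 0.80) gives the required sample size for a given effect pattern.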
Data splitting for artificial neural networks using SOM-based stratified sampling.
May, R J; Maier, H R; Dandy, G C
2010-03-01
Data splitting is an important consideration during artificial neural network (ANN) development, where hold-out cross-validation is commonly employed to ensure generalization. Even for a moderate sample size, the sampling methodology used for data splitting can have a significant effect on the quality of the subsets used for training, testing and validating an ANN. Poor data splitting can result in inaccurate and highly variable model performance; however, the choice of sampling methodology is rarely given due consideration by ANN modellers. Increased confidence in the sampling is of paramount importance, since hold-out sampling is generally performed only once during ANN development. This paper considers the variability in the quality of subsets obtained using different data splitting approaches. A novel approach to stratified sampling, based on Neyman sampling of the self-organizing map (SOM), is developed, with several guidelines identified for setting the SOM size and sample allocation in order to minimize the bias and variance in the datasets. Using an example ANN function approximation task, the SOM-based approach is evaluated against random sampling, DUPLEX, systematic stratified sampling, and trial-and-error sampling that minimizes the statistical differences between data sets. Of these approaches, DUPLEX is found to provide benchmark performance, with good model performance and no variability. The results show that the SOM-based approach also reliably generates high-quality samples and can therefore be used with greater confidence than other approaches, especially for non-uniform datasets, with the benefit of scalability for data splitting on large datasets. Copyright 2009 Elsevier Ltd. All rights reserved.
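As a rough sketch of the allocation step only, the code below uses simple equal-width value strata in place of SOM clusters (an assumption, not the authors' method). Neyman allocation then assigns each stratum a share of the sample proportional to its size times its standard deviation, so heterogeneous strata contribute more points to the split.

```python
import statistics

def equal_width_strata(values, k):
    """Partition values into k equal-width bins (stand-ins for SOM units)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    strata = [[] for _ in range(k)]
    for v in values:
        strata[min(int((v - lo) / width), k - 1)].append(v)
    return strata

def neyman_allocation(strata, n_total):
    """Neyman allocation: n_h proportional to N_h * sigma_h."""
    weights = [len(s) * statistics.pstdev(s) if s else 0.0 for s in strata]
    total = sum(weights)
    return [round(n_total * w / total) for w in weights]
```

A stratum with zero spread receives no sampling effort under this rule, which is why the SOM-based guidelines in the paper also constrain minimum allocations in practice.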
[Discrimination of brake fluid varieties using visible-near infrared spectra].
Jiang, Lu-lu; Tan, Li-hong; Qiu, Zheng-jun; Lu, Jiang-feng; He, Yong
2008-06-01
A new method was developed to rapidly discriminate brands of brake fluid by means of visible-near infrared spectroscopy. Five brands of brake fluid were analyzed using a handheld near-infrared spectrometer manufactured by ASD, and 60 samples were obtained from each brand. The sample data were pretreated using average smoothing and the standard normal variate method, and then analyzed using principal component analysis (PCA). A two-dimensional plot based on the first and second principal components showed distinct clustering of the different brake fluids. The first six principal components were taken as input variables, and the brand of brake fluid as the output variable, to build a discriminant model by stepwise discriminant analysis. Of the 300 samples, 225 randomly selected samples were used to build the model and the remaining 75 to validate it. The validation accuracy was 94.67%, indicating that the proposed method performs well in classification and discrimination. It provides a new way to rapidly discriminate brands of brake fluid.
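The PCA step can be illustrated on synthetic two-band "spectra" (made-up data, not the ASD measurements): the leading principal component, extracted here by power iteration on the sample covariance matrix, should align with the direction of shared variation across bands.

```python
import random

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the sample covariance matrix via power
    iteration; rows are samples, columns are (e.g. spectral) variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(ci[a] * ci[b] for ci in centered) / (n - 1)
            for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Correlated two-band "spectra": a shared signal plus small band noise,
# so the leading component should point roughly along [1, 1] / sqrt(2).
random.seed(0)
data = []
for _ in range(500):
    t = random.gauss(0, 3)  # shared signal
    data.append([t + random.gauss(0, 0.3), t + random.gauss(0, 0.3)])
pc1 = first_principal_component(data)
```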
Wu, Chang-Fu; Lin, Hung-I; Ho, Chi-Chang; Yang, Tzu-Hui; Chen, Chu-Chih; Chan, Chang-Chuan
2014-08-01
Land use regression (LUR) models are increasingly used to evaluate intraurban variability in population exposure to fine particulate matter (PM2.5). However, most of these models lack information on PM2.5 elemental compositions and vertically distributed samples. The purpose of this study was to evaluate intraurban exposure to PM2.5 concentrations and compositions for populations in an Asian city using LUR models, with special emphasis on examining the effects of having measurements on different building stories. PM2.5 samples were collected at 20 sampling sites below the third story (low-level sites). Additional vertically stratified sampling sites were set up on the fourth to sixth (mid-level sites, n=5) and seventh to ninth (high-level sites, n=5) stories. LUR models were built for PM2.5, copper (Cu), iron (Fe), potassium (K), manganese (Mn), nickel (Ni), sulfur (S), silicon (Si), and zinc (Zn). The explained concentration variance (R(2)) of the PM2.5 model was 65%. R(2) values were >69% in the Cu, Fe, Mn, Ni, Si, and Zn models and <44% in the K and S models. Sampling height from ground level was a significant predictor in the PM2.5 and Si models. This finding stresses the importance of collecting vertically stratified information on PM2.5 mass concentrations to reduce potential exposure misclassification in future health studies. In addition to traffic variables, some models identified gravel-plant, industrial, and port variables with large buffer zones as important predictors, indicating that PM from these sources had significant effects at distant places. Copyright © 2014 Elsevier Inc. All rights reserved.
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standardized approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional (module and surface) iliac data were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module, and area give better and statistically valid age estimates overall. These models capture punctual nonlinearities in the relationship between age and osteometric variables. By constructing valid prediction intervals whose width increases with age, MARS models take into account the normal increase in individual variability. MARS models can thus serve as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
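MARS models are built from hinge basis functions max(0, x - knot), which is what lets them bend at punctual nonlinearities. The toy fit below, with a hypothetical fixed knot and noise-free data (MARS itself searches over knots and prunes terms), recovers a single hinge by least squares.

```python
def hinge(x, knot):
    """The basic MARS basis function: zero below the knot, linear above."""
    return max(0.0, x - knot)

def fit_hinge(xs, ys, knot):
    """Least-squares fit of y = b0 + b1 * max(0, x - knot) for a fixed knot."""
    z = [hinge(x, knot) for x in xs]
    n = len(z)
    mz, my = sum(z) / n, sum(ys) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, ys))
    b1 = szy / szz
    return my - b1 * mz, b1

# Growth-curve-like toy data: flat until the knot, then linear (no noise).
xs = [i / 10 for i in range(120)]             # "age" from 0 to 12
ys = [3.0 + 2.0 * hinge(x, 4.0) for x in xs]  # exact hinge relation
b0, b1 = fit_hinge(xs, ys, knot=4.0)
```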
NASA Astrophysics Data System (ADS)
Phillips, Thomas J.; Gates, W. Lawrence; Arpe, Klaus
1992-12-01
The effects of sampling frequency on the first- and second-moment statistics of selected European Centre for Medium-Range Weather Forecasts (ECMWF) model variables are investigated in a simulation of "perpetual July" with a diurnal cycle included and with surface and atmospheric fields saved at hourly intervals. The shortest characteristic time scales (as determined by the e-folding time of lagged autocorrelation functions) are those of ground heat fluxes and temperatures, precipitation and runoff, convective processes, cloud properties, and atmospheric vertical motion, while the longest time scales are exhibited by soil temperature and moisture, surface pressure, and atmospheric specific humidity, temperature, and wind. The time scales of surface heat and momentum fluxes and of convective processes are substantially shorter over land than over oceans. An appropriate sampling frequency for each model variable is obtained by comparing the estimates of first- and second-moment statistics determined at intervals ranging from 2 to 24 hours with the "best" estimates obtained from hourly sampling. Relatively accurate estimation of first- and second-moment climate statistics (10% errors in means, 20% errors in variances) can be achieved by sampling a model variable at intervals that usually are longer than the bandwidth of its time series but that often are shorter than its characteristic time scale. For the surface variables, sampling at intervals that are nonintegral divisors of a 24-hour day yields relatively more accurate time-mean statistics because of a reduction in errors associated with aliasing of the diurnal cycle and higher-frequency harmonics. The superior estimates of first-moment statistics are accompanied by inferior estimates of the variance of the daily means due to the presence of systematic biases, but these probably can be avoided by defining a different measure of low-frequency variability. 
Estimates of the intradiurnal variance of accumulated precipitation and surface runoff also are strongly affected by the length of the storage interval. In light of these results, several alternative strategies for storage of the ECMWF model variables are recommended.
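The aliasing point about sampling intervals can be demonstrated with a synthetic pure diurnal cycle: a 24-hour interval always samples the same phase and so biases the time mean, while a 7-hour interval (a non-integral divisor of the day) steps through every phase and the bias averages out. The signal below is illustrative, not an ECMWF field.

```python
import math

def sampled_mean(interval_h, days=56, mean=10.0, amp=1.0):
    """Mean of a pure diurnal cosine sampled every `interval_h` hours."""
    t, vals = 0.0, []
    while t < days * 24:
        vals.append(mean + amp * math.cos(2 * math.pi * t / 24))
        t += interval_h
    return sum(vals) / len(vals)

true_mean = 10.0
err_24h = abs(sampled_mean(24) - true_mean)  # always hits the same phase
err_7h = abs(sampled_mean(7) - true_mean)    # phases cycle through the day
```

With a 24-h interval the estimate is biased by the full diurnal amplitude; with a 7-h interval the samples visit all 24 phase residues equally and the bias essentially vanishes.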
ERIC Educational Resources Information Center
Kim, Seohyun; Lu, Zhenqiu; Cohen, Allan S.
2018-01-01
Bayesian algorithms have been used successfully in the social and behavioral sciences to analyze dichotomous data particularly with complex structural equation models. In this study, we investigate the use of the Polya-Gamma data augmentation method with Gibbs sampling to improve estimation of structural equation models with dichotomous variables.…
Spatial Sampling of Weather Data for Regional Crop Yield Simulations
NASA Technical Reports Server (NTRS)
Van Bussel, Lenny G. J.; Ewert, Frank; Zhao, Gang; Hoffmann, Holger; Enders, Andreas; Wallach, Daniel; Asseng, Senthold; Baigorria, Guillermo A.; Basso, Bruno; Biernath, Christian;
2016-01-01
Field-scale crop models are increasingly applied at spatio-temporal scales that range from regions to the globe and from decades up to 100 years. Sufficiently detailed data to capture the prevailing spatio-temporal heterogeneity in weather, soil, and management conditions, as needed by crop models, are rarely available. Effective sampling may overcome the problem of missing data but has rarely been investigated. In this study the effect of sampling weather data was evaluated for simulating yields of winter wheat in a region in Germany over a 30-year period (1982-2011) using 12 process-based crop models. A stratified sampling was applied to compare the effect of different sizes of spatially sampled weather data (10, 30, 50, 100, 500, 1000 and full coverage of 34,078 sampling points) on simulated wheat yields. Stratified sampling was further compared with random sampling. Possible interactions between sample size and crop model were evaluated. The results showed differences in simulated yields among crop models, but all models reproduced the stratification pattern well. Importantly, the regional mean of simulated yields based on full coverage could already be reproduced by a small sample of 10 points. This was also true for reproducing the temporal variability in simulated yields, but more sampling points (about 100) were required to accurately reproduce spatial yield variability. Fewer sampling points are needed when stratified rather than random sampling is applied. However, differences between crop models were observed, including some interaction between the effect of sampling on simulated yields and the model used. We conclude that stratified sampling can considerably reduce the number of required simulations, but differences between crop models must be considered, as the choice of a specific model can have larger effects on simulated yields than the sampling strategy.
The impact of sampling soil and crop management data on regional crop yield simulations still needs to be assessed.
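The core claim, that stratified sampling reproduces a regional mean with fewer points than random sampling, can be sketched on a synthetic population whose three strata differ in mean (hypothetical values, not the German weather grid): over repeated draws, the stratified sample mean varies much less.

```python
import random

random.seed(42)

# Synthetic "region": three equal-sized strata with different climatic means.
strata = [[random.gauss(mu, 1.0) for _ in range(1000)] for mu in (4.0, 7.0, 10.0)]
population = [v for s in strata for v in s]
true_mean = sum(population) / len(population)

def random_mean(n):
    """Simple random sample of n points from the whole region."""
    return sum(random.sample(population, n)) / n

def stratified_mean(n):
    """Proportional allocation: n/3 points from each equal-sized stratum."""
    per = n // len(strata)
    picks = [v for s in strata for v in random.sample(s, per)]
    return sum(picks) / len(picks)

# Compare the sampling variance of the two estimators of the regional mean.
reps = 400
var_rand = sum((random_mean(30) - true_mean) ** 2 for _ in range(reps)) / reps
var_strat = sum((stratified_mean(30) - true_mean) ** 2 for _ in range(reps)) / reps
```

Because stratification removes the between-strata component of variance, the stratified estimator needs fewer points for the same accuracy, mirroring the paper's finding.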
Sparse covariance estimation in heterogeneous samples*
Rodríguez, Abel; Lenkoski, Alex; Dobra, Adrian
2015-01-01
Standard Gaussian graphical models implicitly assume that the conditional independence among variables is common to all observations in the sample. However, in practice, observations are usually collected from heterogeneous populations where such an assumption is not satisfied, leading in turn to nonlinear relationships among variables. To address such situations we explore mixtures of Gaussian graphical models; in particular, we consider both infinite mixtures and infinite hidden Markov models where the emission distributions correspond to Gaussian graphical models. Such models allow us to divide a heterogeneous population into homogenous groups, with each cluster having its own conditional independence structure. As an illustration, we study the trends in foreign exchange rate fluctuations in the pre-Euro era. PMID:26925189
Simple Model of the Circulation.
ERIC Educational Resources Information Center
Greenway, Clive A.
1980-01-01
Describes a program in BASIC-11 that explores the relationships between various variables in the circulatory system and permits manipulation of several semiindependent variables to model the effects of hemorrhage, drug infusions, etc. A flow chart and accompanying sample printout are provided; the program is listed in the appendix. (CS)
Feasibility of conducting wetfall chemistry investigations around the Bowen Power Plant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, N.C.J.; Patrinos, A.A.N.
1979-10-01
The feasibility of expanding the Meteorological Effects of Thermal Energy Releases - Oak Ridge National Laboratory (METER-ORNL) research at the Bowen Power Plant, a coal-fired power plant in northwest Georgia, to include wetfall chemistry is evaluated using results of similar studies around other power plants, several atmospheric washout models, analysis of spatial variability in precipitation, and field logistical considerations. An optimal wetfall chemistry network design is proposed, incorporating the inner portion of the existing rain-gauge network augmented by additional sites to ensure adequate coverage of probable target areas. The predicted sulfate production rate differs by about four orders of magnitude among the models reviewed at a pH of 3. No model can claim superiority over any other without substantive data verification. The spatial uniformity in rain amount is evaluated using four storms that occurred at the METER-ORNL network. Values of spatial variability ranged from 8 to 31% and decreased as the mean rainfall increased. The field study of wetfall chemistry will require a minimum of 5 persons to operate the approximately 50 collectors covering an area of 740 km². Preliminary wetfall-only samples collected on an event basis showed lower pH and higher electrical conductivity in precipitation collected about 5 km downwind of the power plant relative to samples collected upwind. Wetfall samples collected on a weekly basis using automatic samplers, however, showed variable results with no consistent pattern. This suggests the need for event sampling to minimize the variable rain volume and multiple-source effects often associated with weekly samples.
Lachmann, Bernd; Sariyska, Rayna; Kannen, Christopher; Błaszkiewicz, Konrad; Trendafilov, Boris; Andone, Ionut; Eibes, Mark; Markowetz, Alexander; Li, Mei; Kendrick, Keith M.
2017-01-01
Virtually everybody would agree that life satisfaction is of immense importance in everyday life. Thus, it is not surprising that a considerable amount of research using many different methodological approaches has investigated what the best predictors of life satisfaction are. In the present study, we have focused on several key potential influences on life satisfaction including bottom-up and top-down models, cross-cultural effects, and demographic variables. In four independent (large scale) surveys with sample sizes ranging from N = 488 to 40,297, we examined the associations between life satisfaction and various related variables. Our findings demonstrate that prediction of overall life satisfaction works best when including information about specific life satisfaction variables. From this perspective, satisfaction with leisure showed the highest impact on overall life satisfaction in our European samples. Personality was also robustly associated with life satisfaction, but only when life satisfaction variables were not included in the regression model. These findings could be replicated in all four independent samples, but it was also demonstrated that the relevance of life satisfaction variables changed under the influence of cross-cultural effects. PMID:29295529
Polybrominated Diphenyl Ethers in Residential Dust: Sources of Variability
Whitehead, Todd P.; Brown, F. Reber; Metayer, Catherine; Park, June-Soo; Does, Monique; Petreas, Myrto X.; Buffler, Patricia A.; Rappaport, Stephen M.
2013-01-01
We characterized the sources of variability for polybrominated diphenyl ethers (PBDEs) in residential dust and provided guidance for investigators who plan to use residential dust to assess exposure to PBDEs. We collected repeat dust samples from 292 households in the Northern California Childhood Leukemia Study during two sampling rounds (from 2001–2007 and during 2010) using household vacuum cleaners and measured 22 PBDEs using high resolution gas chromatography-high resolution mass spectrometry. Median concentrations for individual PBDEs ranged from <0.1–2,500 ng per g of dust. For each of eight representative PBDEs, we used a random-effects model to apportion total variance into regional variability (0–11%), intra-regional between-household variability (17–50%), within-household variability over time (38–74%), and within-sample variability (0–23%) and we used a mixed-effects model to identify determinants of PBDE levels. Regional differences in PBDE dust levels were associated with residential characteristics that differed by region, including the presence of furniture with exposed or crumbling foam and the recent installation of carpets in the residence. Intra-regional differences between households were associated with neighborhood urban density, racial and ethnic characteristics, and to a lesser extent, income. For some PBDEs, a decreasing time trend explained a modest fraction of the within-household variability; however, most of the within-household variability was unaccounted for by our mixed-effects models. Our findings indicate that it may be feasible to use residential dust for retrospective assessment of PBDE exposures in studies of children’s health (e.g., the Northern California Childhood Leukemia Study). PMID:23628589
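The variance apportionment can be sketched with a one-way (household) variance-components model on simulated repeat measurements; the SDs below and the method-of-moments ANOVA estimator are illustrative stand-ins for the authors' random-effects fit on PBDE dust levels.

```python
import random

random.seed(3)

# Simulate repeat dust measurements: log-level = household effect + noise.
g, reps = 200, 2               # households, repeat samples per household
sigma_b, sigma_w = 1.5, 0.8    # hypothetical between / within SDs
data = []
for _ in range(g):
    house = random.gauss(0, sigma_b)
    data.append([house + random.gauss(0, sigma_w) for _ in range(reps)])

# One-way ANOVA (method-of-moments) variance components.
grand = sum(sum(row) for row in data) / (g * reps)
msw = sum(sum((x - sum(row) / reps) ** 2 for x in row)
          for row in data) / (g * (reps - 1))
means = [sum(row) / reps for row in data]
msb = reps * sum((m - grand) ** 2 for m in means) / (g - 1)
var_within = msw                      # estimates sigma_w**2
var_between = (msb - msw) / reps      # estimates sigma_b**2
pct_between = var_between / (var_between + var_within)
```

The estimated shares correspond to the percentages reported in the abstract (between-household vs within-household variability); the paper's model additionally separates regional and within-sample components.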
Nonlinear time series modeling and forecasting the seismic data of the Hindu Kush region
NASA Astrophysics Data System (ADS)
Khan, Muhammad Yousaf; Mittnik, Stefan
2018-01-01
In this study, we extended the application of linear and nonlinear time series models in the field of earthquake seismology and examined the out-of-sample forecast accuracy of linear Autoregressive (AR), Autoregressive Conditional Duration (ACD), Self-Exciting Threshold Autoregressive (SETAR), Threshold Autoregressive (TAR), Logistic Smooth Transition Autoregressive (LSTAR), Additive Autoregressive (AAR), and Artificial Neural Network (ANN) models for seismic data of the Hindu Kush region. We also extended previous studies by using Vector Autoregressive (VAR) and Threshold Vector Autoregressive (TVAR) models and compared their forecasting accuracy with the linear AR model. Unlike previous studies, which typically specify threshold models with an internal threshold variable, we specified these models with external transition variables and compared their out-of-sample forecasting performance with the linear benchmark AR model. The modeling results show that the time series models used in the present study are capable of capturing the dynamic structure present in the seismic data. The point forecast results indicate that the AR model generally outperforms the nonlinear models. However, in some cases, threshold models with external threshold variables produce more accurate forecasts, indicating that the specification of threshold time series models is of crucial importance. For raw seismic data, the ACD model does not show improved out-of-sample forecasting performance over the linear AR model. Overall, the AR model is the best device for modeling and forecasting the raw seismic data of the Hindu Kush region.
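A minimal sketch of the benchmark model, an AR(1) fitted by conditional least squares on simulated data (the paper's model set and seismic series are far richer):

```python
import random

random.seed(11)

# Simulate an AR(1) series x_t = phi * x_{t-1} + e_t.
phi_true = 0.6
x = [0.0]
for _ in range(2000):
    x.append(phi_true * x[-1] + random.gauss(0, 1))

# Conditional least-squares estimate of phi and a one-step-ahead forecast.
num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
phi_hat = num / den
forecast = phi_hat * x[-1]
```

Out-of-sample comparisons like those in the paper then score such forecasts (e.g. by mean squared error) against the nonlinear alternatives on a held-out segment of the series.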
[Study on Application of NIR Spectral Information Screening in Identification of Maca Origin].
Wang, Yuan-zhong; Zhao, Yan-li; Zhang, Ji; Jin, Hang
2016-02-01
The medicinal and edible plant Maca is rich in various nutrients and has great medicinal value. Based on near infrared diffuse reflectance spectra, 139 Maca samples collected from Peru and Yunnan were used to identify their geographical origins. Multiplicative signal correction (MSC) coupled with second derivative (SD) and Norris derivative filtering (ND) was employed for spectral pretreatment. The spectral range (7,500-4,061 cm⁻¹) was chosen by spectrum standard deviation. Combined with principal component analysis-Mahalanobis distance (PCA-MD), the appropriate number of principal components was selected as 5. Based on the selected spectral range and number of principal components, two abnormal samples were eliminated by a modular group iterative singular sample diagnosis method. Then, four methods were used to filter spectral variable information: competitive adaptive reweighted sampling (CARS), Monte Carlo-uninformative variable elimination (MC-UVE), genetic algorithm (GA) and subwindow permutation analysis (SPA). The filtered spectral variable information was evaluated by model population analysis (MPA). The results showed that RMSECV(SPA) > RMSECV(CARS) > RMSECV(MC-UVE) > RMSECV(GA), at 2.14, 2.05, 2.02, and 1.98, with 250, 240, 250 and 70 spectral variables, respectively. With the filtered spectral variables, partial least squares discriminant analysis (PLS-DA) was used to build the model, with 97 randomly selected samples as the training set and the other 40 samples as the validation set. The results showed that, for R²: GA > MC-UVE > CARS > SPA; for RMSEC and RMSEP: GA < MC-UVE < CARS
On the Accretion Rates of SW Sextantis Nova-like Variables
NASA Astrophysics Data System (ADS)
Ballouz, Ronald-Louis; Sion, Edward M.
2009-06-01
We present accretion rates for selected samples of nova-like variables having IUE archival spectra and distances uniformly determined using an infrared method by Knigge. A comparison with accretion rates derived independently with a multiparametric optimization modeling approach by Puebla et al. is carried out. The accretion rates of SW Sextantis nova-like systems are compared with the accretion rates of non-SW Sextantis systems in the Puebla et al. sample and in our sample, which was selected in the orbital period range of three to four and a half hours, with all systems having distances using the method of Knigge. Based upon the two independent modeling approaches, we find no significant difference between the accretion rates of SW Sextantis systems and non-SW Sextantis nova-like systems insofar as optically thick disk models are appropriate. We find little evidence to suggest that the SW Sex stars have higher accretion rates than other nova-like cataclysmic variables (CVs) above the period gap within the same range of orbital periods.
A Test of Biosocial Models of Adolescent Cigarette and Alcohol Involvement
ERIC Educational Resources Information Center
Foshee, Vangie A.; Ennett, Susan T.; Bauman, Karl E.; Granger, Douglas A.; Benefield, Thad; Suchindran, Chirayath; Hussong, Andrea M.; Karriker-Jaffe, Katherine J.; DuRant, Robert H.
2007-01-01
The authors test biosocial models that posit interactions between biological variables (testosterone, estradiol, pubertal status, and pubertal timing) and social context variables (family, peer, school, and neighborhood) in predicting adolescent involvement with cigarettes and alcohol in a sample of 409 adolescents in Grades 6 and 8. Models…
NASA Astrophysics Data System (ADS)
Luna, Aderval S.; Gonzaga, Fabiano B.; da Rocha, Werickson F. C.; Lima, Igor C. A.
2018-01-01
Laser-induced breakdown spectroscopy (LIBS) analysis was carried out on eleven steel samples to quantify the concentrations of chromium, nickel, and manganese. LIBS spectral data were correlated to known concentrations of the samples using different strategies in partial least squares (PLS) regression models. For the PLS analysis, one predictive model was separately generated for each element, while different approaches were used for the selection of variables (VIP: variable importance in projection and iPLS: interval partial least squares) in the PLS model to quantify the contents of the elements. The comparison of the performance of the models showed that there was no significant statistical difference using the Wilcoxon signed rank test. The elliptical joint confidence region (EJCR) did not detect systematic errors in these proposed methodologies for each metal.
Wang, Ching-Yun; Song, Xiao
2016-11-01
Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
How Robust Is Linear Regression with Dummy Variables?
ERIC Educational Resources Information Center
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
The use of auxiliary variables in capture-recapture and removal experiments
Pollock, K.H.; Hines, J.E.; Nichols, J.D.
1984-01-01
The dependence of animal capture probabilities on auxiliary variables is an important practical problem which has not been considered in the development of estimation procedures for capture-recapture and removal experiments. In this paper the linear logistic binary regression model is used to relate the probability of capture to continuous auxiliary variables. The auxiliary variables could be environmental quantities such as air or water temperature, or characteristics of individual animals, such as body length or weight. Maximum likelihood estimators of the population parameters are considered for a variety of models which all assume a closed population. Testing between models is also considered. The models can also be used when one auxiliary variable is a measure of the effort expended in obtaining the sample.
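The linear logistic binary regression model for capture probabilities can be sketched as follows; the covariate, the coefficients, and the grid-search fit with the intercept fixed at zero are all illustrative assumptions, not the paper's estimators.

```python
import math
import random

def capture_prob(z, b0, b1):
    """Logistic link: probability of capture given covariate z
    (e.g. body length or water temperature)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))

def log_likelihood(captures, covariates, b0, b1):
    """Bernoulli log-likelihood of capture outcomes for one occasion."""
    ll = 0.0
    for c, z in zip(captures, covariates):
        p = capture_prob(z, b0, b1)
        ll += math.log(p) if c else math.log(1.0 - p)
    return ll

# Simulate captures with true slope 1.0, then recover it by a coarse
# grid-search MLE (intercept held at its true value of 0 for simplicity).
random.seed(5)
zs = [random.gauss(0, 1) for _ in range(500)]
caps = [random.random() < capture_prob(z, 0.0, 1.0) for z in zs]
candidates = [i / 20 for i in range(-40, 41)]
best_b1 = max(candidates, key=lambda b: log_likelihood(caps, zs, 0.0, b))
```

In the closed-population models of the paper, this individual-level capture probability enters the full capture-history likelihood, and model comparison proceeds by likelihood-ratio tests.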
Motamarri, Srinivas; Boccelli, Dominic L
2012-09-15
Users of recreational waters may be exposed to elevated pathogen levels through various point/non-point sources. Typical daily notifications rely on microbial analysis of indicator organisms (e.g., Escherichia coli) that requires 18 or more hours to provide an adequate response. Modeling approaches, such as multivariate linear regression (MLR) and artificial neural networks (ANN), have been utilized to provide quick predictions of microbial concentrations for classification purposes, but generally suffer from high false negative rates. This study introduces the use of learning vector quantization (LVQ), a direct classification approach, for comparison with MLR and ANN approaches, and integrates input selection into model development with respect to primary and secondary water quality standards within the Charles River Basin (Massachusetts, USA) using meteorologic, hydrologic, and microbial explanatory variables. Integrating input selection into model development showed that discharge variables were the most important explanatory variables, while antecedent rainfall and time since previous events were also important. With respect to classification, all three models adequately represented the non-violated samples (>90%). The MLR approach had the highest false negative rates when classifying violated samples (41-62%, vs 13-43% for ANN and <16% for LVQ) when using five or more explanatory variables. The ANN performance was more similar to LVQ when a larger number of explanatory variables was utilized, but degraded toward MLR performance as explanatory variables were removed. Overall, LVQ as a direct classifier provided the best classification of violated/non-violated samples for both standards. Copyright © 2012 Elsevier Ltd. All rights reserved.
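A minimal LVQ1 sketch, assuming one prototype per class and a synthetic two-cluster dataset in place of the water-quality data: each training sample pulls its nearest prototype closer when the classes match and pushes it away otherwise.

```python
import random

def train_lvq(samples, labels, n_classes, lr=0.1, epochs=30, seed=1):
    """LVQ1 with one prototype per class, initialised at a random
    same-class sample and updated toward/away from training points."""
    rng = random.Random(seed)
    protos = []
    for c in range(n_classes):
        idx = rng.choice([i for i, l in enumerate(labels) if l == c])
        protos.append(list(samples[idx]))
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w = min(range(n_classes),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, protos[c])))
            sign = 1.0 if w == y else -1.0
            protos[w] = [p + sign * lr * (a - p) for p, a in zip(protos[w], x)]
    return protos

def classify(x, protos):
    """Assign x to the class of its nearest prototype."""
    return min(range(len(protos)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, protos[c])))

# Two well-separated synthetic clusters (non-violated = 0, violated = 1).
rng = random.Random(2)
X = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
     + [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(50)])
y = [0] * 50 + [1] * 50
protos = train_lvq(X, y, n_classes=2)
acc = sum(classify(xi, protos) == yi for xi, yi in zip(X, y)) / len(X)
```

Because the prototypes live in the input space, the fitted model is directly interpretable, which is part of the appeal of LVQ as a classifier relative to MLR and ANN thresholds.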
Corsi, Steven R.; Borchardt, M. A.; Spencer, S. K.; Hughes, Peter E.; Baldwin, Austin K.
2014-01-01
To examine the occurrence, hydrologic variability, and seasonal variability of human and bovine viruses in surface water, three stream locations were monitored in the Milwaukee River watershed in Wisconsin, USA, from February 2007 through June 2008. Monitoring sites included an urban subwatershed, a rural subwatershed, and the Milwaukee River at the mouth. To collect samples that characterize variability throughout changing hydrologic periods, a process control system was developed for unattended, large-volume (56–2800 L) filtration over extended durations. This system provided flow-weighted mean concentrations during runoff and extended (24-h) low-flow periods. Human viruses and bovine viruses were detected by real-time qPCR in 49% and 41% of samples (n = 63), respectively. All human viruses analyzed were detected at least once including adenovirus (40% of samples), GI norovirus (10%), enterovirus (8%), rotavirus (6%), GII norovirus (1.6%) and hepatitis A virus (1.6%). Three of seven bovine viruses analyzed were detected including bovine polyomavirus (32%), bovine rotavirus (19%), and bovine viral diarrhea virus type 1 (5%). Human viruses were present in 63% of runoff samples resulting from precipitation and snowmelt, and 20% of low-flow samples. Maximum human virus concentrations exceeded 300 genomic copies/L. Bovine viruses were present in 46% of runoff samples resulting from precipitation and snowmelt and 14% of low-flow samples. The maximum bovine virus concentration was 11 genomic copies/L. Statistical modeling indicated that stream flow, precipitation, and season explained the variability of human viruses in the watershed, and hydrologic condition (runoff event or low-flow) and season explained the variability of the sum of human and bovine viruses; however, no model was identified that could explain the variability of bovine viruses alone. 
Understanding the factors that affect virus fate and transport in rivers will aid watershed management for minimizing human exposure and disease transmission.
Radinger, Johannes; Wolter, Christian; Kail, Jochem
2015-01-01
Habitat suitability and the distinct mobility of species depict fundamental keys for explaining and understanding the distribution of river fishes. In recent years, comprehensive data on river hydromorphology has been mapped at spatial scales down to 100 m, potentially serving high resolution species-habitat models, e.g., for fish. However, the relative importance of specific hydromorphological and in-stream habitat variables and their spatial scales of influence is poorly understood. Applying boosted regression trees, we developed species-habitat models for 13 fish species in a sand-bed lowland river based on river morphological and in-stream habitat data. First, we calculated mean values for the predictor variables in five distance classes (from the sampling site up to 4000 m up- and downstream) to identify the spatial scale that best predicts the presence of fish species. Second, we compared the suitability of measured variables and assessment scores related to natural reference conditions. Third, we identified variables which best explained the presence of fish species. The mean model quality (AUC = 0.78, area under the receiver operating characteristic curve) significantly increased when information on the habitat conditions up- and downstream of a sampling site (maximum AUC at 2500 m distance class, +0.049) and topological variables (e.g., stream order) were included (AUC = +0.014). Both measured and assessed variables were similarly well suited to predict species’ presence. Stream order variables and measured cross section features (e.g., width, depth, velocity) were best-suited predictors. In addition, measured channel-bed characteristics (e.g., substrate types) and assessed longitudinal channel features (e.g., naturalness of river planform) were also good predictors. 
These findings demonstrate (i) the applicability of high resolution river morphological and instream-habitat data (measured and assessed variables) to predict fish presence, (ii) the importance of considering habitat at spatial scales larger than the sampling site, and (iii) that the importance of (river morphological) habitat characteristics differs depending on the spatial scale. PMID:26569119
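The boosted-regression-tree workflow described above can be sketched with scikit-learn's gradient boosting in place of the authors' implementation. The synthetic predictors (width, depth, velocity, stream order) and their effect sizes below are illustrative assumptions, not the study's data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical habitat predictors, imagined as means over one distance class
width = rng.normal(10.0, 3.0, n)        # channel width (m)
depth = rng.normal(1.2, 0.4, n)         # water depth (m)
velocity = rng.normal(0.5, 0.2, n)      # flow velocity (m/s)
order = rng.integers(1, 7, n)           # stream order (topological variable)
X = np.column_stack([width, depth, velocity, order])

# Toy presence/absence rule driven mainly by width and stream order
p = 1.0 / (1.0 + np.exp(-(0.3 * (width - 10.0) + 0.8 * (order - 3))))
y = rng.random(n) < p

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
brt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=3, random_state=0)
brt.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, brt.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")
print("relative influence:", np.round(brt.feature_importances_, 2))
```

The feature importances play the role of the "relative importance" of habitat variables discussed in the abstract; repeating the fit with predictors averaged over different distance classes would mimic the spatial-scale comparison.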
NASA Astrophysics Data System (ADS)
Alavi-Shoushtari, N.; King, D.
2017-12-01
Agricultural landscapes are highly variable ecosystems and are home to many local farmland species. Seasonal, phenological, and inter-annual agricultural landscape dynamics have the potential to affect the richness and abundance of farmland species. Remote sensing provides data and techniques that enable monitoring landscape changes at multiple temporal and spatial scales. MODIS high-temporal-resolution imagery enables detection of seasonal and phenological trends, while Landsat's higher-spatial-resolution imagery, with its long-term archive, enables inter-annual trend analysis over several decades. The objective of this study is to use multi-spatial and multi-temporal remote sensing data to model the response of farmland species to landscape metrics. The study area is the predominantly agricultural region of eastern Ontario. 92 sample landscapes were selected within this region using a protocol designed to maximize variance in composition and configuration heterogeneity while controlling for amount of forest and spatial autocorrelation. Two sample landscape extents (1×1 km and 3×3 km) were selected to analyze the impacts of spatial scale on biodiversity response. Gamma diversity index data for four taxa groups (birds, butterflies, plants, and beetles) were collected during the summers of 2011 and 2012 within the cropped area of each landscape. To extract the seasonal and phenological metrics, a 2000-2012 MODIS NDVI time series was used, while a 1985-2012 Landsat time series was used to model the inter-annual trends of change in the sample landscapes. The results of statistical modeling showed significant relationships between farmland biodiversity for several taxa and the phenological and inter-annual variables.
The following general results were obtained: 1) among the taxa groups, plant and beetle diversity was most significantly correlated with the phenological variables; 2) the phenological variables associated with the variability in the start-of-season date across the sample landscapes, and with the variability in the corresponding NDVI values at that date, showed the strongest correlation with the biodiversity indices; 3) the significance of the models improved when using the 3×3 km site extent for both the MODIS- and Landsat-based models, most likely due to the larger sampled area at 3×3 km.
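One phenological metric of the kind discussed, the start-of-season date, can be sketched as the day a smoothed NDVI curve first exceeds half of its seasonal amplitude. The synthetic NDVI curve and the 50%-amplitude rule below are assumptions for illustration, not the study's extraction method.

```python
import numpy as np

doy = np.arange(1, 366)                                   # day of year
# Toy smoothed seasonal NDVI curve: low winter baseline, summer peak
ndvi = 0.25 + 0.45 * np.exp(-((doy - 200) / 60.0) ** 2)

# Start of season (SOS): first day NDVI rises above 50% of seasonal amplitude
amp_threshold = ndvi.min() + 0.5 * (ndvi.max() - ndvi.min())
sos = doy[ndvi >= amp_threshold][0]
print("start of season (DOY):", sos)
```

Computing this per pixel per year across a MODIS NDVI time series, then taking its variability across landscapes, would yield metrics analogous to those the abstract reports as the strongest biodiversity correlates.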
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent and dependent variables. When the dependent variable is categorical, a logistic regression model is used to calculate the odds of its categories; when those categories are ordered, the appropriate model is ordinal logistic regression. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation sites. Parameter estimation is needed to determine population values from a sample. The purpose of this research is to estimate the parameters of a GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City; the observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village, from which the probability of each category of dengue fever patient counts can be obtained.
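A minimal sketch of the GWOLR idea: at each regression point, a proportional-odds (ordinal logistic) likelihood is maximized with Gaussian kernel weights based on distance to the observation sites. This is a toy Python stand-in for the paper's R-based estimation; the simulated data, the bandwidth, and the category cut-offs are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n = 144                                   # e.g. one observation per village
coords = rng.uniform(0.0, 10.0, size=(n, 2))   # hypothetical site locations
x = rng.normal(size=n)                    # one covariate for simplicity
latent = 1.0 * x + rng.logistic(size=n)   # toy ordinal response, 3 levels
y = np.digitize(latent, bins=[-1.0, 1.0])

def wnll(params, x, y, w):
    """Kernel-weighted negative log-likelihood of a proportional-odds model."""
    beta, a1, d = params
    cut1, cut2 = a1, a1 + np.exp(d)       # ordered thresholds
    eta = beta * x
    p0 = expit(cut1 - eta)                # P(Y = 0)
    p1 = expit(cut2 - eta) - p0           # P(Y = 1)
    p2 = 1.0 - expit(cut2 - eta)          # P(Y = 2)
    probs = np.choose(y, [p0, p1, p2])
    return -np.sum(w * np.log(np.clip(probs, 1e-12, None)))

def gwolr_fit(site, bandwidth=3.0):
    """Fit the local model at one regression point with Gaussian kernel weights."""
    dist = np.linalg.norm(coords - site, axis=1)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)
    return minimize(wnll, x0=[0.0, -1.0, 0.0], args=(x, y, w)).x

beta_hat = gwolr_fit(coords[0])[0]
print("local slope at site 0:", round(beta_hat, 2))
```

Repeating `gwolr_fit` at every site yields the location-specific coefficient surfaces that distinguish GWOLR from a single global ordinal logistic model.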
NASA Astrophysics Data System (ADS)
Davis, A. D.; Heimbach, P.; Marzouk, Y.
2017-12-01
We develop a Bayesian inverse modeling framework for predicting future ice sheet volume with associated formal uncertainty estimates. Marine ice sheets are drained by fast-flowing ice streams, which we simulate using a flowline model. Flowline models depend on geometric parameters (e.g., basal topography), parameterized physical processes (e.g., calving laws and basal sliding), and climate parameters (e.g., surface mass balance), most of which are unknown or uncertain. Given observations of ice surface velocity and thickness, we define a Bayesian posterior distribution over static parameters, such as basal topography. We also define a parameterized distribution over variable parameters, such as future surface mass balance, which we assume are not informed by the data. Hyperparameters are used to represent climate change scenarios, and sampling their distributions mimics internal variation. For example, a warming climate corresponds to increasing mean surface mass balance but an individual sample may have periods of increasing or decreasing surface mass balance. We characterize the predictive distribution of ice volume by evaluating the flowline model given samples from the posterior distribution and the distribution over variable parameters. Finally, we determine the effect of climate change on future ice sheet volume by investigating how changing the hyperparameters affects the predictive distribution. We use state-of-the-art Bayesian computation to address computational feasibility. Characterizing the posterior distribution (using Markov chain Monte Carlo), sampling the full range of variable parameters and evaluating the predictive model is prohibitively expensive. Furthermore, the required resolution of the inferred basal topography may be very high, which is often challenging for sampling methods. 
Instead, we leverage regularity in the predictive distribution to build a computationally cheaper surrogate over the low dimensional quantity of interest (future ice sheet volume). Continual surrogate refinement guarantees asymptotic sampling from the predictive distribution. Directly characterizing the predictive distribution in this way allows us to assess the ice sheet's sensitivity to climate variability and change.
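The Bayesian inference step described above can be illustrated at toy scale with a random-walk Metropolis sampler inferring a single uncertain parameter from noisy observations. The quadratic forward model and the positivity prior are placeholder assumptions standing in for the flowline model and its parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 1.5
# Toy forward model f(theta) = theta**2, observed with Gaussian noise
obs = theta_true ** 2 + rng.normal(0.0, 0.1, size=20)

def log_post(theta):
    if theta <= 0.0:                      # positivity prior (assumption)
        return -np.inf
    return -0.5 * np.sum((obs - theta ** 2) ** 2) / 0.1 ** 2

# Random-walk Metropolis over the posterior
samples, theta = [], 1.0
for _ in range(5000):
    prop = theta + 0.05 * rng.normal()
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)
post = np.array(samples[1000:])           # discard burn-in
print("posterior mean:", round(post.mean(), 2))
```

In the actual framework each posterior draw would additionally be pushed through the (expensive) predictive model, which is what motivates the surrogate over the scalar quantity of interest.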
Operating Comfort Prediction Model of Human-Machine Interface Layout for Cabin Based on GEP.
Deng, Li; Wang, Guohua; Chen, Bo
2015-01-01
To address the evaluation and decision-making problem of human-machine interface layout design for cabins, an operating comfort prediction model based on GEP (Gene Expression Programming) is proposed, using operating comfort to evaluate layout schemes. Joint angles are used to describe the operating posture of the upper limb and are taken as independent variables to establish the comfort model of operating posture. Factor analysis is adopted to reduce the variable dimension: the model's input variables are reduced from 16 joint angles to 4 comfort impact factors, and the output variable is the operating comfort score. A Chinese virtual human body model is built with CATIA software and used to simulate and evaluate operators' operating comfort. With 22 groups of evaluation data as training and validation samples, the GEP algorithm is used to obtain the best-fitting function between the joint angles and operating comfort; operating comfort can then be predicted quantitatively. The prediction results for the human-machine interface layout of a driller control room show that the GEP-based operating comfort prediction model is fast and efficient, has good predictive performance, and can improve design efficiency. PMID:26448740
Ni, Ai; Cai, Jianwen
2018-07-01
Case-cohort designs are commonly used in large epidemiological studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large, so an efficient variable selection method is needed for case-cohort studies in which the covariates are observed only in a subset of the sample. Current literature on this topic has focused on the proportional hazards model. However, in many studies the additive hazards model is preferred over the proportional hazards model, either because the proportional hazards assumption is violated or because the additive hazards model provides more relevant information for the research question. Motivated by one such study, the Atherosclerosis Risk in Communities (ARIC) study, we investigate the properties of a regularized variable selection procedure in stratified case-cohort design under an additive hazards model with a diverging number of parameters. We establish the consistency and asymptotic normality of the penalized estimator and prove its oracle property. Simulation studies are conducted to assess the finite-sample performance of the proposed method with a modified cross-validation tuning parameter selection method. We apply the variable selection procedure to the ARIC study to demonstrate its practical use.
Song, Li-Yu
2017-04-01
This study examined a comprehensive set of potential correlates of recovery based on the Unity Model of Recovery. Thirty-two community psychiatric rehabilitation centers in Taiwan agreed to participate in this study. A sample of 592 participants completed the questionnaires. Five groups of independent variables were included in the model: socio-demographic variables, illness variables, resilience, informal support, and formal support. The results of regression analysis provided support for the validity of the Unity Model of Recovery. The independent variables explained 53.5% of the variance in recovery for the full sample, and 55.5% for the subsample of consumers who have ever been employed. The significance of the three cornerstones (resilience, family support, and symptoms) for recovery was confirmed. Other critical support variables, including the extent of rehabilitation service use, professional relationship, and professional support, were also found to be significant factors. Among all the significant correlates, resilience, family support, and extent of rehabilitation service use ranked in the top three. The findings could shed light on paths to recovery. Implications for psychiatric services are discussed. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Decomposition and model selection for large contingency tables.
Dahinden, Corinne; Kalisch, Markus; Bühlmann, Peter
2010-04-01
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables, each or some of which may have many levels. We demonstrate the proposed method on simulated data and apply it to a biomedical problem in cancer research.
Improvements in sub-grid, microphysics averages using quadrature based approaches
NASA Astrophysics Data System (ADS)
Chowdhary, K.; Debusschere, B.; Larson, V. E.
2013-12-01
Sub-grid variability in microphysical processes plays a critical role in atmospheric climate models. In order to account for this sub-grid variability, Larson and Schanen (2013) propose placing a probability density function on the sub-grid cloud microphysics quantities, e.g. autoconversion rate, essentially interpreting the cloud microphysics quantities as a random variable in each grid box. Random sampling techniques, e.g. Monte Carlo and Latin Hypercube, can be used to calculate statistics, e.g. averages, on the microphysics quantities, which then feed back into the model dynamics on the coarse scale. We propose an alternate approach using numerical quadrature methods based on deterministic sampling points to compute the statistical moments of microphysics quantities in each grid box. We have performed a preliminary test on the Kessler autoconversion formula, and, upon comparison with Latin Hypercube sampling, our approach shows an increased level of accuracy with a reduction in sample size by almost two orders of magnitude. Application to other microphysics processes is the subject of ongoing research.
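The quadrature idea can be sketched for a Kessler-type autoconversion rate, max(0, k·(q − q_c)), averaged over an assumed normal sub-grid distribution of cloud water q. The constants below are illustrative toy values, not the paper's configuration; Gauss-Hermite quadrature with a few dozen deterministic nodes is compared against a large Monte Carlo sample.

```python
import numpy as np

k, q_c = 1.0e-3, 0.5        # rate constant and threshold (toy units)
mu, sigma = 0.6, 0.2        # assumed sub-grid mean and std of cloud water q

def kessler(q):
    """Kessler-type autoconversion: zero below threshold, linear above."""
    return k * np.maximum(0.0, q - q_c)

# Gauss-Hermite quadrature of E[f(q)] for q ~ N(mu, sigma^2)
nodes, weights = np.polynomial.hermite.hermgauss(50)
q_nodes = mu + np.sqrt(2.0) * sigma * nodes
quad_avg = np.sum(weights * kessler(q_nodes)) / np.sqrt(np.pi)

# Monte Carlo reference with a large random sample
rng = np.random.default_rng(3)
mc_avg = kessler(rng.normal(mu, sigma, 1_000_000)).mean()
print(f"quadrature {quad_avg:.3e}  Monte Carlo {mc_avg:.3e}")
```

The 50 deterministic evaluations reach an accuracy that the random estimator needs orders of magnitude more samples to match, which is the sample-size reduction the abstract reports.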
Skocic, Sonja; Jackson, Henry; Hulbert, Carol; Faber, Christina
2016-07-01
Clark and Wells' (1995) cognitive model of social anxiety (CWM) explains the maintenance of social anxiety and has been used as a guide for treatment of Social Anxiety Disorder (SAD). Few studies have examined the components of the model together across different samples. This study had two distinct aims: to test the components of CWM and to examine how the variables of CWM may differ between clinical and non-clinical samples with varying levels of social anxiety. Hypothesized relationships between three groups (i.e. a clinical sample of individuals diagnosed with SAD (ClinS), n = 40; socially anxious students (HSA), n = 40; and, non-anxious students (LSA), n = 40) were investigated. Four out of five CWM variables tested were able to distinguish between highly socially anxious and non-anxious groups after controlling for age and depression. CWM variables are able to distinguish between high and low levels of social anxiety and are uniquely related to social anxiety over depression.
Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; de Azevedo Mello, Paola; Ferrão, Marco Flores; de Fátima Pereira dos Santos, Maria; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes
2012-04-01
Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of the models. The pre-treatment based on multiplicative scatter correction (MSC) and mean-centered data was selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm(-1)). This model produced a RMSECV of 400 mg kg(-1) S and RMSEP of 420 mg kg(-1) S, showing a correlation coefficient of 0.990. Copyright © 2011 Elsevier B.V. All rights reserved.
Profile-likelihood Confidence Intervals in Item Response Theory Models.
Chalmers, R Philip; Pek, Jolynn; Liu, Yang
2017-01-01
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
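The profile-likelihood construction can be shown on a one-parameter model simpler than IRT: the 95% PL CI is the set of parameter values whose log-likelihood lies within χ²₁(0.95)/2 ≈ 1.92 of the maximum. The exponential-rate model and simulated data below are illustrative assumptions; only the construction itself carries over.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(5)
data = rng.exponential(scale=2.0, size=50)    # true rate = 0.5 (assumption)

def loglik(rate):
    """Exponential log-likelihood (up to terms constant in the rate)."""
    return len(data) * np.log(rate) - rate * data.sum()

mle = len(data) / data.sum()
# Likelihood-ratio cutoff: drop of chi2(1, 0.95)/2 ~= 1.92 from the max
cut = loglik(mle) - chi2.ppf(0.95, df=1) / 2.0

lo = brentq(lambda r: loglik(r) - cut, 1e-6, mle)   # lower PL bound
hi = brentq(lambda r: loglik(r) - cut, mle, 10.0)   # upper PL bound
print(f"95% PL CI for the rate: ({lo:.3f}, {hi:.3f})")
```

Unlike a Wald interval, this interval need not be symmetric about the MLE, which is one of the distinctions the article highlights for IRT parameters.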
Sonntag, Darrell B; Gao, H Oliver; Holmén, Britt A
2008-08-01
A linear mixed model was developed to quantify the variability of particle number emissions from transit buses tested in real-world driving conditions. Two conventional diesel buses and two hybrid diesel-electric buses were tested throughout 2004 under different aftertreatments, fuels, drivers, and bus routes. The mixed model controlled for the confounding influence of factors inherent to on-board testing. Statistical tests showed that particle number emissions varied significantly according to the aftertreatment, bus route, driver, bus type, and daily temperature, with only minor variability attributable to differences between fuel types. The daily setup and operation of the sampling equipment (electrical low pressure impactor) and mini-dilution system contributed 30-84% of the total random variability of particle measurements among tests with diesel oxidation catalysts. By controlling for the sampling-day variability, the model better defined the differences in particle emissions among bus routes. In contrast, the low particle number emissions measured with diesel particle filters (decreased by over 99%) did not vary according to operating conditions or bus type but did vary substantially with ambient temperature.
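A linear mixed model of this kind, fixed effects for operating factors plus a random intercept for sampling day, can be sketched with statsmodels MixedLM. The synthetic data below (log particle number driven by a hybrid/conventional indicator and day-to-day setup variability) are illustrative assumptions, not the study's measurements.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
days = np.repeat(np.arange(10), 8)            # 10 sampling days, 8 tests each
day_effect = rng.normal(0.0, 0.5, 10)[days]   # random day-to-day setup effect
hybrid = rng.integers(0, 2, size=days.size)   # bus type indicator (assumed)
log_pn = 12.0 - 0.8 * hybrid + day_effect + rng.normal(0.0, 0.3, days.size)

df = pd.DataFrame({"log_pn": log_pn, "hybrid": hybrid, "day": days})
# Random intercept per sampling day absorbs equipment-setup variability
m = smf.mixedlm("log_pn ~ hybrid", df, groups=df["day"]).fit()
print(m.params["hybrid"])
```

Partitioning variance this way is what lets the study attribute 30-84% of the random variability to the daily sampling setup while still estimating bus-type and route effects.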
NASA Astrophysics Data System (ADS)
Song, Yunquan; Lin, Lu; Jian, Ling
2016-07-01
The single-index varying-coefficient model is an important mathematical modeling method for nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric and parametric components. Under defined regularity conditions, with appropriate selection of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, owing to the robustness of the check loss function to outliers in finite samples, our proposed variable selection method is more robust than those based on the least squares criterion. Finally, the method is illustrated with numerical simulations.
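The shrinkage principle behind the paper's procedure, a penalty that drives irrelevant coefficients exactly to zero, can be illustrated with an ordinary LASSO on synthetic data. Note this is only the selection mechanism: the paper's estimator uses a check loss in a single-index varying-coefficient model, not the least-squares LASSO shown here.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(10)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]               # only 3 active variables (assumed)
y = X @ beta + 0.5 * rng.normal(size=n)

# L1 shrinkage zeroes out most inactive coefficients
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
print("selected variables:", selected)
```

The oracle property established in the paper says that, asymptotically, such a procedure selects exactly the true active set and estimates its coefficients as well as if that set were known in advance.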
Bello, Alessandra; Bianchi, Federica; Careri, Maria; Giannetto, Marco; Mori, Giovanni; Musci, Marilena
2007-11-05
A new NIR method based on multivariate calibration for the determination of ethanol in industrially packed wholemeal bread was developed and validated. GC-FID was used as the reference method for determining the actual ethanol concentration of different samples of wholemeal bread with known amounts of added ethanol, ranging from 0 to 3.5% (w/w). Stepwise discriminant analysis was carried out on the NIR dataset in order to reduce the number of original variables by selecting those able to discriminate between samples of different ethanol concentrations. With the selected variables, a multivariate calibration model was then obtained by multiple linear regression. The predictive power of the linear model was optimized by a new "leave one out" method, so that the number of original variables was further reduced.
Petrich, Nicholas T.; Spak, Scott N.; Carmichael, Gregory R.; Hu, Dingfei; Martinez, Andres; Hornbuckle, Keri C.
2013-01-01
Passive air samplers (PAS) including polyurethane foam (PUF) are widely deployed as an inexpensive and practical way to sample semi-volatile pollutants. However, concentration estimates from PAS rely on constant empirical mass transfer rates, which add unquantified uncertainties to concentrations. Here we present a method for modeling hourly sampling rates for semi-volatile compounds from hourly meteorology using first-principle chemistry, physics, and fluid dynamics, calibrated from depuration experiments. This approach quantifies and explains observed effects of meteorology on variability in compound-specific sampling rates and analyte concentrations; simulates nonlinear PUF uptake; and recovers synthetic hourly concentrations at a reference temperature. Sampling rates are evaluated for polychlorinated biphenyl congeners at a network of Harner model samplers in Chicago, Illinois during 2008, finding simulated average sampling rates within analytical uncertainty of those determined from loss of depuration compounds, and confirming quasi-linear uptake. Results indicate hourly, daily and interannual variability in sampling rates, sensitivity to temporal resolution in meteorology, and predictable volatility-based relationships between congeners. We quantify importance of each simulated process to sampling rates and mass transfer and assess uncertainty contributed by advection, molecular diffusion, volatilization, and flow regime within the PAS, finding PAS chamber temperature contributes the greatest variability to total process uncertainty (7.3%). PMID:23837599
Additional Samples: Where They Should Be Located
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pilger, G. G., E-mail: jfelipe@ufrgs.br; Costa, J. F. C. L.; Koppe, J. C.
2001-09-15
Information for mine planning needs to be closely spaced compared with the grid used for exploration and resource assessment. The additional samples collected during quasi-mining are usually located in the same pattern as the original diamond drillhole net, but more closely spaced. This procedure is not optimal, in a mathematical sense, for selecting a location. The impact of additional information in reducing uncertainty about the parameter being modeled is not the same everywhere within the deposit: some locations are more sensitive in reducing local and global uncertainty than others. This study introduces a methodology to select additional sample locations based on stochastic simulation. The procedure takes into account data variability and their spatial location. Multiple equally probable models representing a geological attribute are generated via geostatistical simulation. These models share basically the same histogram and the same variogram obtained from the original data set. At each block belonging to the model, a value is obtained from the n simulations, and their combination allows one to assess local variability. Variability is measured using a proposed uncertainty index, which was used to map zones of high variability. A value extracted from a given simulation is added to the original data set from a zone identified as erratic in the previous maps. The process of adding samples and simulating is repeated, and the benefit of the additional sample is evaluated. The benefit in terms of uncertainty reduction is measured locally and globally. The procedure proved to be robust and theoretically sound, mapping zones where additional information is most beneficial. A case study in a coal mine using coal seam thickness illustrates the method.
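The core of the procedure, ranking blocks by an uncertainty index computed across equally probable realizations, can be sketched as follows. The synthetic "simulations" below stand in for geostatistical realizations of, e.g., coal seam thickness; the block layout, index choice (coefficient of variation), and erratic zones are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_blocks, n_sims = 100, 50
# Toy realizations: most blocks well constrained, a few erratic zones
erratic = np.isin(np.arange(n_blocks), [17, 42, 77])
spread = np.where(erratic, 1.5, 0.2)
sims = 3.0 + spread[None, :] * rng.normal(size=(n_sims, n_blocks))

# Local uncertainty index: coefficient of variation across realizations
cv = sims.std(axis=0) / sims.mean(axis=0)
next_sample = int(np.argmax(cv))
print("add a sample at block", next_sample)
```

In the full methodology, a value drawn at that location is added to the data set, the simulations are re-run, and the drop in the index measures the benefit of the extra sample locally and globally.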
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
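The core Monte Carlo comparison can be sketched for two of the four techniques: with two highly correlated explanatory variables, OLS slope estimates become wildly variable while PCR estimates stabilize. The sample size, correlation, and error variance below are illustrative choices in the spirit of the experiment, not its actual settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
rho, n, reps = 0.98, 20, 300               # small sample, severe collinearity
cov = np.array([[1.0, rho], [rho, 1.0]])
beta = np.array([1.0, 1.0])

ols_b1, pcr_b1 = [], []
for _ in range(reps):
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = X @ beta + rng.normal(0.0, 1.0, n)
    # OLS coefficient on the first variable
    ols_b1.append(LinearRegression().fit(X, y).coef_[0])
    # PCR: regress on the leading principal component, map back to x-space
    pcr = make_pipeline(PCA(n_components=1), LinearRegression()).fit(X, y)
    pca, lr = pcr.named_steps["pca"], pcr.named_steps["linearregression"]
    pcr_b1.append((pca.components_.T @ lr.coef_)[0])

print("std of b1 estimates  OLS:", round(np.std(ols_b1), 2),
      " PCR:", round(np.std(pcr_b1), 2))
```

The inflated spread of the OLS estimates is exactly the variance inflation the abstract attributes to multicollinearity; PCR trades a little bias for a large variance reduction.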
Morokoff, Patricia J.; Redding, Colleen A.; Harlow, Lisa L.; Cho, Sookhyun; Rossi, Joseph S.; Meier, Kathryn S.; Mayer, Kenneth H.; Koblin, Beryl; Brown-Peterside, Pamela
2014-01-01
This study examined whether the Multifaceted Model of HIV Risk (MMOHR) would predict unprotected sex based on predictors including gender, childhood sexual abuse (CSA), sexual victimization (SV), depression, and sexual assertiveness for condom use. A community-based sample of 473 heterosexually active men and women, aged 18–46 years, completed survey measures of the model variables. Gender significantly predicted several variables. A separate model for women demonstrated excellent fit, while the model for men demonstrated reasonable fit. Multiple-sample model testing supported the use of MMOHR in both men and women, while simultaneously highlighting areas of gender difference. Prevention interventions should focus on sexual assertiveness, especially for CSA and SV survivors, as well as targeting depression, especially among men. PMID:25018617
Francy, Donna S.; Stelzer, Erin A.; Duris, Joseph W.; Brady, Amie M.G.; Harrison, John H.; Johnson, Heather E.; Ware, Michael W.
2013-01-01
Predictive models, based on environmental and water quality variables, have been used to improve the timeliness and accuracy of recreational water quality assessments, but their effectiveness has not been studied in inland waters. Sampling at eight inland recreational lakes in Ohio was done in order to investigate using predictive models for Escherichia coli and to understand the links between E. coli concentrations, predictive variables, and pathogens. Based upon results from 21 beach sites, models were developed for 13 sites, and the most predictive variables were rainfall, wind direction and speed, turbidity, and water temperature. Models were not developed at sites where the E. coli standard was seldom exceeded. Models were validated at nine sites during an independent year. At three sites, the model resulted in increased correct responses, sensitivities, and specificities compared to use of the previous day's E. coli concentration (the current method). Drought conditions during the validation year precluded being able to adequately assess model performance at most of the other sites. Cryptosporidium, adenovirus, eaeA (E. coli), ipaH (Shigella), and spvC (Salmonella) were found in at least 20% of samples collected for pathogens at five sites. The presence or absence of the three bacterial genes was related to some of the model variables but was not consistently related to E. coli concentrations. Predictive models were not effective at all inland lake sites; however, their use at two lakes with high swimmer densities will provide better estimates of public health risk than current methods and will be a valuable resource for beach managers and the public.
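A nowcast model of this kind can be sketched as a classifier on environmental variables, evaluated by the sensitivity and specificity the study uses to compare against the previous-day persistence method. The synthetic rainfall/turbidity data and the exceedance mechanism below are assumptions; the study's site-specific models may use different variables and forms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 300
rain = rng.exponential(5.0, n)            # 24-h rainfall (mm), assumed
turb = rng.lognormal(1.0, 0.5, n)         # turbidity (NTU), assumed
# Toy mechanism: exceedance of the E. coli standard driven by rain and turbidity
p = 1.0 / (1.0 + np.exp(-(-3.0 + 0.25 * rain + 0.3 * turb)))
exceed = rng.random(n) < p

model = LogisticRegression().fit(np.column_stack([rain, turb]), exceed)
pred = model.predict(np.column_stack([rain, turb]))
sens = (pred & exceed).sum() / exceed.sum()          # correct advisories
spec = (~pred & ~exceed).sum() / (~exceed).sum()     # correct all-clears
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Comparing these two rates against those of a persistence predictor (yesterday's E. coli concentration) mirrors the "increased correct responses, sensitivities, and specificities" criterion in the abstract.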
Sampling design for spatially distributed hydrogeologic and environmental processes
Christakos, G.; Olea, R.A.
1992-01-01
A methodology for the design of sampling networks over space is proposed. The methodology is based on spatial random field representations of nonhomogeneous natural processes, and on optimal spatial estimation techniques. One of the most important results of random field theory for physical sciences is its rationalization of correlations in spatial variability of natural processes. This correlation is extremely important both for interpreting spatially distributed observations and for predictive performance. The extent of site sampling and the types of data to be collected will depend on the relationship of subsurface variability to predictive uncertainty. While hypothesis formulation and initial identification of spatial variability characteristics are based on scientific understanding (such as knowledge of the physics of the underlying phenomena, geological interpretations, intuition and experience), the support offered by field data is statistically modelled. This model is not limited by the geometric nature of sampling and covers a wide range in subsurface uncertainties. A factorization scheme of the sampling error variance is derived, which possesses certain attractive properties allowing significant savings in computations. By means of this scheme, a practical sampling design procedure providing suitable indices of the sampling error variance is established. These indices can be used by way of multiobjective decision criteria to obtain the best sampling strategy. Neither the actual implementation of the in-situ sampling nor the solution of the large spatial estimation systems of equations is necessary. The required values of the accuracy parameters involved in the network design are derived using reference charts (readily available for various combinations of data configurations and spatial variability parameters) and certain simple yet accurate analytical formulas.
Insight is gained by applying the proposed sampling procedure to realistic examples related to sampling problems in two dimensions. © 1992.
Characterizing regional soil mineral composition using spectroscopy and geostatistics
Mulder, V.L.; de Bruin, S.; Weyermann, J.; Kokaly, Raymond F.; Schaepman, M.E.
2013-01-01
This work aims at improving the mapping of major mineral variability at regional scale using scale-dependent spatial variability observed in remote sensing data. Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data and statistical methods were combined with laboratory-based mineral characterization of field samples to create maps of the distributions of clay, mica and carbonate minerals and their abundances. The Material Identification and Characterization Algorithm (MICA) was used to identify the spectrally dominant minerals in field samples; these results were combined with ASTER data using multinomial logistic regression to map mineral distributions. X-ray diffraction (XRD) was used to quantify mineral composition in field samples. XRD results were combined with ASTER data using multiple linear regression to map mineral abundances. We tested whether smoothing the ASTER data to match the scale of variability of the target sample would improve model correlations. Smoothing was done with Fixed Rank Kriging (FRK) to represent the medium- and long-range spatial variability in the ASTER data. Stronger correlations resulted using the smoothed data compared to results obtained with the original data. The highest model accuracies came from using both medium- and long-range scaled ASTER data as input to the statistical models. High correlation coefficients were obtained for the abundances of calcite and mica (R² = 0.71 and 0.70, respectively). Moderately high correlation coefficients were found for smectite and kaolinite (R² = 0.57 and 0.45, respectively). Maps of mineral distributions, obtained by relating ASTER data to MICA analysis of field samples, were found to characterize major soil mineral variability (overall accuracies for mica, smectite and kaolinite were 76%, 89% and 86%, respectively).
The results of this study suggest that the distributions of minerals and their abundances derived using FRK-smoothed ASTER data more closely match the spatial variability of soil and environmental properties at regional scale.
ERIC Educational Resources Information Center
Teo, Timothy
2016-01-01
The aim of this study is to examine the factors that influenced the use of Facebook among university students. Using an extended technology acceptance model (TAM) with emotional attachment (EA) as an external variable, a sample of 498 students from a publicly funded Thai university was surveyed on their responses to five variables hypothesized…
Modeling the intraurban variation in nitrogen dioxide in urban areas in Kathmandu Valley, Nepal.
Gurung, Anobha; Levy, Jonathan I; Bell, Michelle L
2017-05-01
With growing urbanization, traffic has become one of the main sources of air pollution in Nepal. Understanding the impact of air pollution on health requires estimation of exposure. Land use regression (LUR) modeling is widely used to investigate intraurban variation in air pollution for Western cities, but LUR models are relatively scarce in developing countries. In this study, we developed LUR models to characterize intraurban variation of nitrogen dioxide (NO2) in urban areas of Kathmandu Valley, Nepal, one of the fastest urbanizing areas in South Asia. Over the study area, 135 monitoring sites were selected using stratified random sampling based on building density and road density, along with purposeful sampling. In 2014, four sampling campaigns were performed, one per season, for two weeks each. NO2 was measured using duplicate Palmes tubes at 135 sites, with additional information on nitric oxide (NO), NO2, and nitrogen oxides (NOx) concentrations derived from Ogawa badges at 28 sites. Geographical variables (e.g., road network, land use, built area) were used as predictor variables in LUR modeling, considering buffers of 25-400 m around each monitoring site. Annual average NO2 by site ranged from 5.7 to 120 ppb for the study area, with higher concentrations in the Village Development Committees (VDCs) of Kathmandu and Lalitpur than in Kirtipur, Thimi, and Bhaktapur, and with variability present within each VDC. In the final LUR model, length of major road, built area, and industrial area were positively associated with NO2 concentration while normalized difference vegetation index (NDVI) was negatively associated with NO2 concentration (R² = 0.51). Cross-validation of the results confirmed the reliability of the model. The combination of passive NO2 sampling and LUR modeling techniques allowed for characterization of nitrogen dioxide patterns in a developing country setting, demonstrating spatial variability and high pollution levels.
Copyright © 2017 Elsevier Inc. All rights reserved.
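At its core, each LUR predictor enters a linear regression against the measured NO2 concentrations. A toy one-predictor ordinary-least-squares fit, with invented data; the study's model combined several covariates over multiple buffer distances:

```python
# Minimal land-use-regression sketch: OLS of NO2 on a single geographic
# predictor (major-road length within a buffer). Data are hypothetical.

def ols(x, y):
    """Simple linear regression: returns (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

road_m = [0, 100, 200, 300, 400, 500]         # road length in buffer (m)
no2    = [6.0, 14.0, 25.0, 33.0, 47.0, 55.0]  # annual NO2 (ppb), invented

a, b, r2 = ols(road_m, no2)
print(round(b, 3), round(r2, 3))  # → 0.101 0.994
```

The full LUR workflow repeats this over candidate predictors and buffer sizes, keeping the combination that maximizes cross-validated fit.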
Joint modelling rationale for chained equations
2014-01-01
Background: Chained equations imputation is widely used in medical research. It uses a set of conditional models, so is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples. Methods: Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study which explores the consequences when the conditional models are compatible but the conditions otherwise are not satisfied. Results: We provide an additional "non-informative margins" condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that as a consequence of this violation order effects can occur; that is, systematic differences depending upon the ordering of the variables in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak. Conclusions: Since chained equations imputation is typically used in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous, but our results suggest they may be small enough to be negligible. PMID:24559129
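A minimal sketch of the chained-equations cycle for two continuous variables makes the "ordering of the variables" concrete: each variable is regressed on the other and its missing entries are redrawn, in a fixed order, every cycle. The linear conditional models, noise level, and data here are invented; real implementations draw from a posterior predictive distribution rather than adding fixed-s.d. noise.

```python
import random

def ols_fit(x, y):
    # least-squares intercept and slope of y regressed on x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

def mice_two_vars(x, y, n_iter=10, sd=0.5, seed=1):
    """Chained-equations imputation sketch; None marks a missing value."""
    rng = random.Random(seed)
    x, y = list(x), list(y)
    mis_x = [i for i, v in enumerate(x) if v is None]
    mis_y = [i for i, v in enumerate(y) if v is None]
    for v, mis in ((x, mis_x), (y, mis_y)):   # initial fill: observed mean
        obs = [v[i] for i in range(len(v)) if i not in mis]
        for i in mis:
            v[i] = sum(obs) / len(obs)
    for _ in range(n_iter):
        # the fixed visit order below is the "chain"; order effects arise
        # because x is always updated before y within a cycle
        a, b = ols_fit(y, x)                  # impute x from y
        for i in mis_x:
            x[i] = a + b * y[i] + rng.gauss(0, sd)
        a, b = ols_fit(x, y)                  # impute y from x
        for i in mis_y:
            y[i] = a + b * x[i] + rng.gauss(0, sd)
    return x, y

xs = [1.0, 2.0, 3.0, None, 5.0, 6.0]
ys = [1.1, 2.1, None, 4.0, 5.2, 6.1]
xi, yi = mice_two_vars(xs, ys)
print(xi[3], yi[2])  # imputed values, near 4 and 3 for these data
```

Swapping the two update steps gives the other ordering; the abstract's point is that the resulting predictive distributions can differ systematically when its conditions are violated.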
Fall risk factors analysis based on sample entropy of plantar kinematic signal during stance phase.
Shengyun Liang; Huiyu Jia; Zilong Li; Huiqi Li; Xing Gao; Zuchang Ma; Yingnan Ma; Guoru Zhao
2016-08-01
Falls are a multi-causal phenomenon involving complex interactions. The aim of our research is to study the effect of multiple variables on the potential risk of falls and to construct an elderly fall risk assessment model based on demographic data and gait characteristics. A total of 101 subjects from Malianwa Street, aged above 50 years, participated in a questionnaire survey. Participants were classified into three groups (high, medium and low risk) according to their scores on an elderly fall risk assessment scale. In addition, ground reaction force (GRF) and ground reaction moment (GRM) data were recorded while they walked at a comfortable pace. The demographic variables, sample entropy of GRF and GRM, and the impulse difference between the two feet were considered as potential explanatory variables of the risk assessment model. First, we investigated whether the groups differed on each variable. Statistical differences were found for the following variables: age (p=2.28e-05); impulse difference (p=0.02036); sample entropy of GRF in the vertical direction (p=0.0144); sample entropy of GRM in the anterior-posterior direction (p=0.0387). Finally, multiple regression analysis indicated that age, impulse difference and sample entropy of the resultant GRM could identify individuals with different levels of fall risk. These results could therefore be useful for fall risk assessment and for monitoring physical function in the elderly population.
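Sample entropy, the gait measure used above, is easy to state: the negative log of the conditional probability that subsequences matching for m points (within tolerance r) also match at the next point. A small sketch following the common Richman-Moorman counting, on toy series rather than gait signals:

```python
import math

def sample_entropy(u, m=2, r=0.2):
    """SampEn(m, r) = -ln(A/B): B counts pairs of length-m templates within
    Chebyshev tolerance r, A the same for length m+1. Raises a math error
    if no length-(m+1) matches exist (too short or too irregular a series)."""
    n = len(u)
    def matches(mm):
        # same number of templates for both lengths, as in the usual definition
        t = [u[i:i + mm] for i in range(n - m)]
        return sum(
            max(abs(a - b) for a, b in zip(t[i], t[j])) <= r
            for i in range(len(t)) for j in range(i + 1, len(t)))
    return -math.log(matches(m + 1) / matches(m))

print(sample_entropy([0, 1] * 5))                       # → -0.0 (regular: zero entropy)
print(sample_entropy([0, 1, 0, 1, 0, 1, 1, 0, 1, 0]))   # one irregularity raises it
```

Lower values indicate more regular (more predictable) signals, which is why reduced gait entropy can serve as a risk marker.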
A Model for the Correlates of Students' Creative Thinking
ERIC Educational Resources Information Center
Sarsani, Mahender Reddy
2007-01-01
The present study aimed to explore the relationships between organisational or school variables, students' personal background variables, and cognitive and motivational variables. The sample for the survey included 373 students drawn from nine Government schools in Andhra Pradesh, India. Students' creative thinking abilities were measured by…
Short time-scale optical variability properties of the largest AGN sample observed with Kepler/K2
NASA Astrophysics Data System (ADS)
Aranzana, E.; Körding, E.; Uttley, P.; Scaringi, S.; Bloemen, S.
2018-05-01
We present the first short time-scale (~hours to days) optical variability study of a large sample of active galactic nuclei (AGNs) observed with the Kepler/K2 mission. The sample contains 252 AGN observed over four campaigns with ~30 min cadence, selected from the Million Quasar Catalogue with R magnitude <19. We performed time series analysis to determine their variability properties by means of the power spectral densities (PSDs) and applied Monte Carlo techniques to find the best model parameters that fit the observed power spectra. A power-law model is sufficient to describe all the PSDs of our sample. A variety of power-law slopes were found, indicating that there is not a universal slope for all AGNs. We find that the rest-frame amplitude variability in the frequency range 6 × 10⁻⁶ to 10⁻⁴ Hz varies from 1 to 10 per cent with an average of 1.7 per cent. We explore correlations between the variability amplitude and key parameters of the AGN, finding a significant correlation of rest-frame short-term variability amplitude with redshift. We attribute this effect to the known 'bluer when brighter' variability of quasars combined with the fixed bandpass of Kepler data. This study also enables us to distinguish between Seyferts and blazars and confirm AGN candidates. For our study, we have compared results obtained from light curves extracted using different aperture sizes and with and without detrending. We find that limited detrending of the optimal photometric precision light curve is the best approach, although some systematic effects still remain present.
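Fitting a power-law PSD model P(f) ∝ f^(-α) reduces to a straight-line fit in log-log space, where the slope estimate is -α. A toy sketch on an exact synthetic power law; the paper's Monte Carlo fitting additionally accounts for sampling distortions such as red-noise leak and aliasing:

```python
import math

# Estimate a PSD power-law slope by least squares in log-log space.
# Frequencies and powers below are synthetic, not Kepler data.

def loglog_slope(freqs, powers):
    lx = [math.log10(f) for f in freqs]
    ly = [math.log10(p) for p in powers]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))

freqs = [1e-5 * (i + 1) for i in range(10)]   # Hz, in the K2-relevant range
alpha_true = 2.0
powers = [f ** -alpha_true for f in freqs]    # exact power law, no noise
print(loglog_slope(freqs, powers))            # slope ≈ -2.0 (i.e., alpha = 2)
```

With real periodograms the scatter around the line is large, which is why the study compares observed spectra against simulated ones rather than fitting the raw periodogram directly.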
Radio variability in complete samples of extragalactic radio sources at 1.4 GHz
NASA Astrophysics Data System (ADS)
Rys, S.; Machalski, J.
1990-09-01
Complete samples of extragalactic radio sources obtained in 1970-1975 and the sky survey of Condon and Broderick (1983) were used to select sources variable at 1.4 GHz, and to investigate the characteristics of variability in the whole population of sources at this frequency. The radio structures, radio spectral types, and optical identifications of the selected variables are discussed. Only compact flat-spectrum sources vary at 1.4 GHz, and all but four are identified with QSOs, BL Lacs, or other (unconfirmed spectroscopically) stellar objects. No correlation of degree of variability at 1.4 GHz with Galactic latitude or variability at 408 MHz has been found, suggesting that most of the 1.4-GHz variability is intrinsic and not caused by refractive scintillations. Numerical models of the variability have been computed.
Fitts, Douglas A
2017-09-21
The variable criteria sequential stopping rule (vcSSR) is an efficient way to add sample size to planned ANOVA tests while holding the observed rate of Type I errors, α_o, constant. The only difference from regular null hypothesis testing is that criteria for stopping the experiment are obtained from a table based on the desired power, rate of Type I errors, and beginning sample size. The vcSSR was developed using between-subjects ANOVAs, but it should work with p values from any type of F test. In the present study, α_o remained constant at the nominal level when using the previously published table of criteria with repeated measures designs with various numbers of treatments per subject, Type I error rates, values of ρ, and four different sample size models. New power curves allow researchers to select the optimal sample size model for a repeated measures experiment. The criteria held α_o constant either when used with a multiple correlation that varied the sample size model and the number of predictor variables, or when used with MANOVA with multiple groups and two levels of a within-subject variable at various levels of ρ. Although not recommended for use with χ² tests such as the Friedman rank ANOVA test, the vcSSR produces predictable results based on the relation between F and χ². Together, the data confirm the view that the vcSSR can be used to control Type I errors during sequential sampling with any t- or F-statistic rather than being restricted to certain ANOVA designs.
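The decision logic of a sequential stopping rule of this kind can be sketched as a simple loop over successive "looks" at the data. The lower and upper criteria below are placeholders for illustration, not values from the published vcSSR table:

```python
# Sketch of a vcSSR-style decision loop: after each look, the test's
# p value is compared against a lower (significance) and an upper
# (futility) criterion. The criterion values here are hypothetical.

def vcssr(p_values, lower=0.01, upper=0.36, max_looks=5):
    """p_values: p from the ANOVA/F test at each look, in order.
    Returns ('reject' | 'retain' | 'inconclusive', looks_used)."""
    for k, p in enumerate(p_values[:max_looks], start=1):
        if p <= lower:
            return "reject", k    # stop, declare significance
        if p >= upper:
            return "retain", k    # stop for futility
        # otherwise: add the planned increment of subjects and retest
    return "inconclusive", min(len(p_values), max_looks)

print(vcssr([0.20, 0.12, 0.004]))  # → ('reject', 3)
print(vcssr([0.50]))               # → ('retain', 1)
```

The point of the published table is that the criteria are tuned so this repeated testing still yields the nominal Type I error rate.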
ERIC Educational Resources Information Center
Seo, Hyojeong; Shaw, Leslie A.; Shogren, Karrie A.; Lang, Kyle M.; Little, Todd D.
2017-01-01
This article demonstrates the use of structural equation modeling to develop norms for a translated version of a standardized scale, the Supports Intensity Scale-Children's Version (SIS-C). The latent variable norming method proposed is useful when the standardization sample for a translated version is relatively small to derive norms…
Application of Influence Diagrams in Identifying Soviet Satellite Missions
1990-12-01
Probabilities Comparison … Continuous Model Variables … Sample Inclination Data … diagramming is a method which allows the simple construction of a model to illustrate the interrelationships which exist among variables by capturing an … environmental monitoring systems. The module also contained an array of instruments for geophysical and astrophysical experimentation. 4.3.14.3 Soyuz. The Soyuz…
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2015-01-01
A direct approach to point and interval estimation of Cronbach's coefficient alpha for multiple component measuring instruments is outlined. The procedure is based on a latent variable modeling application with widely circulated software. As a by-product, using sample data the method permits ascertaining whether the population discrepancy…
Modeling the Effects of Early Childhood Intervention Variables on Parent and Family Well-Being
ERIC Educational Resources Information Center
Dunst, Carl J.; Hamby, Deborah W.; Brookfield, Jeffri
2007-01-01
Structural equation modeling was used to evaluate the effects of family, child, and both early childhood intervention process and structural variables on parent and family well-being in a sample of 250 parents involved in birth to age three early childhood intervention programs. Family SES and income had direct positive effects, family-centered…
On Fitting a Multivariate Two-Part Latent Growth Model
Xu, Shu; Blozis, Shelley A.; Vandewater, Elizabeth A.
2017-01-01
A 2-part latent growth model can be used to analyze semicontinuous data to simultaneously study change in the probability that an individual engages in a behavior, and if engaged, change in the behavior. This article uses a Monte Carlo (MC) integration algorithm to study the interrelationships between the growth factors of 2 variables measured longitudinally where each variable can follow a 2-part latent growth model. A SAS macro implementing Mplus is developed to estimate the model to take into account the sampling uncertainty of this simulation-based computational approach. A sample of time-use data is used to show how maximum likelihood estimates can be obtained using a rectangular numerical integration method and an MC integration method. PMID:29333054
Graves, T.A.; Kendall, Katherine C.; Royle, J. Andrew; Stetz, J.B.; Macleod, A.C.
2011-01-01
Few studies link habitat to grizzly bear Ursus arctos abundance and these have not accounted for the variation in detection or spatial autocorrelation. We collected and genotyped bear hair in and around Glacier National Park in northwestern Montana during the summer of 2000. We developed a hierarchical Markov chain Monte Carlo model that extends the existing occupancy and count models by accounting for (1) spatially explicit variables that we hypothesized might influence abundance; (2) separate sub-models of detection probability for two distinct sampling methods (hair traps and rub trees) targeting different segments of the population; (3) covariates to explain variation in each sub-model of detection; (4) a conditional autoregressive term to account for spatial autocorrelation; (5) weights to identify most important variables. Road density and per cent mesic habitat best explained variation in female grizzly bear abundance; spatial autocorrelation was not supported. More female bears were predicted in places with lower road density and with more mesic habitat. Detection rates of females increased with rub tree sampling effort. Road density best explained variation in male grizzly bear abundance and spatial autocorrelation was supported. More male bears were predicted in areas of low road density. Detection rates of males increased with rub tree and hair trap sampling effort and decreased over the sampling period. We provide a new method to (1) incorporate multiple detection methods into hierarchical models of abundance; (2) determine whether spatial autocorrelation should be included in final models. Our results suggest that the influence of landscape variables is consistent between habitat selection and abundance in this system.
NASA Astrophysics Data System (ADS)
Cao, Lu; Li, Hengnian
2016-10-01
For the satellite attitude estimation problem, serious model errors always exist and hinder the estimation performance of the Attitude Determination and Control System (ADCS), especially for a small satellite with low-precision sensors. To deal with this problem, a new algorithm for attitude estimation, referred to as the unscented predictive variable structure filter (UPVSF), is presented. This strategy is based on the variable structure control concept and the unscented transform (UT) sampling method. It can be implemented in real time with the ability to estimate model errors on-line, in order to improve the state estimation precision. In addition, the model errors in this filter are not restricted to Gaussian noises; therefore, it has the advantage of dealing with various kinds of model errors or noises. It is anticipated that the UT sampling strategy can further enhance the robustness and accuracy of the novel UPVSF. Numerical simulations show that the proposed UPVSF is more effective and robust in dealing with model errors and low-precision sensors than the traditional unscented Kalman filter (UKF).
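The UT sampling step referenced above can be illustrated in one dimension: a handful of deterministically chosen sigma points carry a mean and variance through a nonlinearity, and weighted statistics of the transformed points approximate the output distribution. This is a generic UT sketch with the classic kappa weighting, not the UPVSF itself:

```python
import math

# One-dimensional unscented transform: propagate (mean, var) through f
# using 2n+1 sigma points (n = 1 here). kappa is a tuning parameter.

def unscented_transform(mean, var, f, kappa=2.0):
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    w = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    y_mean = sum(wi * yi for wi, yi in zip(w, ys))
    y_var = sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, ys))
    return y_mean, y_var

# For a linear map the UT is exact: y = 3x + 1 with x ~ (mean 2, var 4)
print(unscented_transform(2.0, 4.0, lambda x: 3 * x + 1))  # ≈ (7.0, 36.0)
```

For nonlinear attitude dynamics the UT captures second-order effects that a first-order linearization (as in the EKF) misses, which is the motivation for building the UPVSF on UT sampling.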
NASA Astrophysics Data System (ADS)
Žukovič, Milan; Hristopulos, Dionissios T.
2009-02-01
A current problem of practical significance is how to analyze large, spatially distributed, environmental data sets. The problem is more challenging for variables that follow non-Gaussian distributions. We show by means of numerical simulations that the spatial correlations between variables can be captured by interactions between 'spins'. The spins represent multilevel discretizations of environmental variables with respect to a number of pre-defined thresholds. The spatial dependence between the 'spins' is imposed by means of short-range interactions. We present two approaches, inspired by the Ising and Potts models, that generate conditional simulations of spatially distributed variables from samples with missing data. Currently, the sampling and simulation points are assumed to be at the nodes of a regular grid. The conditional simulations of the 'spin system' are forced to respect locally the sample values and the system statistics globally. The second constraint is enforced by minimizing a cost function representing the deviation between normalized correlation energies of the simulated and the sample distributions. In the approach based on the Nc-state Potts model, each point is assigned to one of Nc classes. The interactions involve all the points simultaneously. In the Ising model approach, a sequential simulation scheme is used: the discretization at each simulation level is binomial (i.e., ± 1). Information propagates from lower to higher levels as the simulation proceeds. We compare the two approaches in terms of their ability to reproduce the target statistics (e.g., the histogram and the variogram of the sample distribution), to predict data at unsampled locations, as well as in terms of their computational complexity. The comparison is based on a non-Gaussian data set (derived from a digital elevation model of the Walker Lake area, Nevada, USA). 
We discuss the impact of relevant simulation parameters, such as the domain size, the number of discretization levels, and the initial conditions.
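The idea of forcing simulations to "respect locally the sample values" can be sketched with a Metropolis sampler for a binary (±1) spin field in which observed sites are clamped. Grid size, coupling, and the clamped data below are illustrative; the paper's method additionally matches correlation-energy statistics through a cost function, which this sketch omits:

```python
import math
import random

def conditional_ising(n, clamped, J=1.0, sweeps=50, seed=3):
    """Metropolis simulation of an n x n Ising field conditioned on
    observed spins: sites in `clamped` keep their data values."""
    rng = random.Random(seed)
    s = [[clamped[(i, j)] if (i, j) in clamped else rng.choice((-1, 1))
          for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                if (i, j) in clamped:
                    continue                  # never flip observed sites
                nb = sum(s[x][y]
                         for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= x < n and 0 <= y < n)
                dE = 2 * J * s[i][j] * nb     # energy change of a flip
                if dE <= 0 or rng.random() < math.exp(-dE):
                    s[i][j] = -s[i][j]
    return s

clamped = {(0, 0): 1, (2, 3): -1, (4, 4): 1}  # hypothetical sample data
grid = conditional_ising(5, clamped)
print(grid[0][0], grid[2][3], grid[4][4])     # → 1 -1 1 (unchanged)
```

The ferromagnetic coupling makes unobserved spins align with their neighbours, so the clamped data points propagate spatial structure outward, which is the essence of the conditional simulation described above.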
An Investigation of the Sampling Distribution of the Congruence Coefficient.
ERIC Educational Resources Information Center
Broadbooks, Wendy J.; Elmore, Patricia B.
This study developed and investigated an empirical sampling distribution of the congruence coefficient. The effects of sample size, number of variables, and population value of the congruence coefficient on the sampling distribution of the congruence coefficient were examined. Sample data were generated on the basis of the common factor model and…
A new model for ancient DNA decay based on paleogenomic meta-analysis
Ware, Roselyn; Smith, Oliver; Collins, Matthew
2017-01-01
The persistence of DNA over archaeological and paleontological timescales in diverse environments has led to a revolutionary body of paleogenomic research, yet the dynamics of DNA degradation are still poorly understood. We analyzed 185 paleogenomic datasets and compared DNA survival with environmental variables and sample ages. We find cytosine deamination follows a conventional thermal age model, but we find no correlation between DNA fragmentation and sample age over the timespans analyzed, even when controlling for environmental variables. We propose a model for ancient DNA decay wherein fragmentation rapidly reaches a threshold, then subsequently slows. The observed loss of DNA over time may be due to a bulk diffusion process in many cases, highlighting the importance of tissues and environments creating effectively closed systems for DNA preservation. This model of DNA degradation is largely based on mammal bone samples due to published genomic dataset availability. Continued refinement to the model to reflect diverse biological systems and tissue types will further improve our understanding of ancient DNA breakdown dynamics. PMID:28486705
Flickinger, Allison; Christensen, Eric D.
2017-01-01
The Little Blue River in Jackson County, Missouri, was listed as impaired in 2012 due to Escherichia coli (E. coli) from urban runoff and storm sewers. A study was initiated to characterize E. coli concentrations and loads to aid in the development of a total maximum daily load implementation plan. Longitudinal sampling along the stream revealed spatial and temporal variability in E. coli loads. Regression models were developed to better represent E. coli variability in the impaired reach using continuous hydrologic and water-quality parameters as predictive parameters. Daily loads calculated from main-stem samples were significantly higher downstream compared to upstream even though there was no significant difference between the upstream and downstream measured concentrations and no significant conclusions could be drawn from model-estimated loads due to model-associated uncertainty. Increasing sample frequency could decrease the bias and increase the accuracy of the modeled results.
Alpha1 LASSO data bundles Lamont, OK
Gustafson, William Jr; Vogelmann, Andrew; Endo, Satoshi; Toto, Tami; Xiao, Heng; Li, Zhijin; Cheng, Xiaoping; Krishna, Bhargavi (ORCID: 0000-0001-8828-528X)
2016-08-03
A data bundle is a unified package consisting of LASSO LES input and output, observations, evaluation diagnostics, and model skill scores. LES input includes model configuration information and forcing data. LES output includes profile statistics and full domain fields of cloud and environmental variables. Model evaluation data consists of LES output and ARM observations co-registered on the same grid and sampling frequency. Model performance is quantified by skill scores and diagnostics in terms of cloud and environmental variables.
Bootstrap investigation of the stability of a Cox regression model.
Altman, D G; Andersen, P K
1989-07-01
We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
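The bootstrap stability check can be sketched generically: refit a selection rule on resampled rows and tabulate how often each candidate variable is chosen. A simple correlation-threshold rule on synthetic data stands in for stepwise Cox regression here; nothing below reproduces the trial's variables or models:

```python
import random

def corr(x, y):
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def bootstrap_selection(rows, outcome_idx, var_idx, n_boot=100, thr=0.5, seed=7):
    """Count, over bootstrap resamples, how often each candidate variable
    passes the (stand-in) selection rule |corr with outcome| > thr."""
    rng = random.Random(seed)
    counts = {j: 0 for j in var_idx}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        y = [r[outcome_idx] for r in sample]
        for j in var_idx:
            if abs(corr([r[j] for r in sample], y)) > thr:
                counts[j] += 1
    return counts

# column 0: outcome; column 1: informative variable; column 2: noise
rows = [(float(i), i + 0.1 * (-1) ** i, v)
        for i, v in enumerate([2, 9, 4, 7, 1, 8, 3, 9, 2, 7, 1, 6])]
counts = bootstrap_selection(rows, 0, [1, 2])
print(counts[1], counts[2])  # informative variable selected far more often
```

A variable selected in nearly every resample, as in the abstract's result, gives confidence that the original selection was not an artifact of one particular sample.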
40 CFR 90.706 - Engine sample selection.
Code of Federal Regulations, 2010 CFR
2010-07-01
... = emission test result for an individual engine. x = mean of emission test results of the actual sample. FEL... test with the last test result from the previous model year and then calculate the required sample size.... Test results used to calculate the variables in the following Sample Size Equation must be final...
A gentle introduction to quantile regression for ecologists
Cade, B.S.; Noon, B.R.
2003-01-01
Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Typically, not all the factors that affect ecological processes are measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
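The estimator behind quantile regression minimizes the asymmetric "check" (pinball) loss; with no covariates, the minimizer is simply the empirical quantile, which quantile regression then generalizes to a linear function of predictors. A brute-force illustration on invented data:

```python
# The check loss for quantile tau: under-predictions are weighted by tau,
# over-predictions by (1 - tau). Minimizing over a constant q recovers
# the empirical tau-quantile.

def pinball(y, q, tau):
    return sum(tau * (v - q) if v >= q else (tau - 1) * (v - q) for v in y)

y = [1.0, 2.0, 3.0, 4.0, 10.0]
grid = [i / 100 for i in range(0, 1101)]
best = min(grid, key=lambda q: pinball(y, q, 0.5))
print(best)  # → 3.0: tau = 0.5 recovers the median (least absolute deviations)
```

Note that the outlier 10.0 pulls the mean to 4.0 but leaves the median at 3.0, which is exactly the robustness to heterogeneous tails that the primer emphasizes.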
Measured acoustic properties of variable and low density bulk absorbers
NASA Technical Reports Server (NTRS)
Dahl, M. D.; Rice, E. J.
1985-01-01
Experimental data were taken to determine the acoustic absorbing properties of uniform low density and layered variable density samples, using a bulk absorber with a perforated plate facing to hold the material in place. In the layered variable density case, the bulk absorber was packed such that the lowest density layer began at the surface of the sample and progressed to higher density layers deeper inside. The samples were placed in a rectangular duct and measurements were taken using the two-microphone method. The data were used to calculate specific acoustic impedances and normal incidence absorption coefficients. Results showed that for uniform density samples the absorption coefficient at low frequencies decreased with increasing density, and resonances occurred in the absorption coefficient curve at lower densities. These results were confirmed by a model for uniform density bulk absorbers. Results from layered variable density samples showed that low frequency absorption was highest when the lowest density possible was packed in the first layer near the exposed surface. The layers of increasing density within the sample had the effect of damping the resonances.
NASA Astrophysics Data System (ADS)
Mikhailova, E. A.; Stiglitz, R. Y.; Post, C. J.; Schlautman, M. A.; Sharp, J. L.; Gerard, P. D.
2017-12-01
Color sensor technologies offer opportunities for affordable and rapid assessment of soil organic carbon (SOC) and total nitrogen (TN) in the field, but the applicability of these technologies may vary by soil type. The objective of this study was to use an inexpensive color sensor to develop SOC and TN prediction models for the Russian Chernozem (Haplic Chernozem) in the Kursk region of Russia. Twenty-one dried soil samples were analyzed using a Nix Pro™ color sensor, controlled through a mobile application and Bluetooth, to collect CIE L*a*b* (darkness to lightness, green to red, and blue to yellow) color data. Eleven samples were randomly selected to construct prediction models and the remaining ten samples were set aside for cross-validation. The root mean squared error (RMSE) was calculated to determine each model's prediction error. The data from the eleven soil samples were used to develop natural-log SOC (lnSOC) and TN (lnTN) prediction models using depth, L*, a*, and b* for each sample as predictor variables in regression analyses. Resulting residual plots, RMSE, mean squared prediction error (MSPE) and coefficients of determination (R², adjusted R²) were used to assess model fit for each of the SOC and TN prediction models. Final models were fit using all soil samples, which included depth and color variables, for lnSOC (R² = 0.987, adj. R² = 0.981, RMSE = 0.003, p-value < 0.001, MSPE = 0.182) and lnTN (R² = 0.980, adj. R² = 0.972, RMSE = 0.004, p-value < 0.001, MSPE = 0.001). Additionally, final models were fit for all soil samples, which included only color variables, for lnSOC (R² = 0.959, adj. R² = 0.949, RMSE = 0.007, p-value < 0.001, MSPE = 0.536) and lnTN (R² = 0.912, adj. R² = 0.890, RMSE = 0.015, p-value < 0.001, MSPE = 0.001). The results suggest that soil color may be used for rapid assessment of SOC and TN in these agriculturally important soils.
Boykin, K.G.; Thompson, B.C.; Propeck-Gray, S.
2010-01-01
Despite widespread and long-standing efforts to model wildlife-habitat associations using remotely sensed and other spatially explicit data, there are relatively few evaluations of the performance of variables included in predictive models relative to actual features on the landscape. As part of the National Gap Analysis Program, we specifically examined physical site features at randomly selected sample locations in the Southwestern U.S. to assess degree of concordance with predicted features used in modeling vertebrate habitat distribution. Our analysis considered hypotheses about relative accuracy with respect to 30 vertebrate species selected to represent the spectrum of habitat generalist to specialist and categorization of site by relative degree of conservation emphasis accorded to the site. Overall comparison of 19 variables observed at 382 sample sites indicated ≥60% concordance for 12 variables. Directly measured or observed variables (slope, soil composition, rock outcrop) generally displayed high concordance, while variables that required judgments regarding descriptive categories (aspect, ecological system, landform) were less concordant. There were no differences detected in concordance among taxa groups, degree of specialization or generalization of selected taxa, or land conservation categorization of sample sites with respect to all sites. We found no support for the hypothesis that accuracy of habitat models is inversely related to degree of taxa specialization when model features for a habitat specialist could be more difficult to represent spatially. Likewise, we did not find support for the hypothesis that physical features will be predicted with higher accuracy on lands with greater dedication to biodiversity conservation than on other lands because of relative differences regarding available information. Accuracy generally was similar (>60%) to that observed for land cover mapping at the ecological system level. 
These patterns demonstrate resilience of gap analysis deductive model processes to the type of remotely sensed or interpreted data used in habitat feature predictions. © 2010 Elsevier B.V.
Stelzer, Erin A.; Duris, Joseph W.; Brady, Amie M. G.; Harrison, John H.; Johnson, Heather E.; Ware, Michael W.
2013-01-01
Predictive models, based on environmental and water quality variables, have been used to improve the timeliness and accuracy of recreational water quality assessments, but their effectiveness has not been studied in inland waters. Sampling was conducted at eight inland recreational lakes in Ohio to investigate the use of predictive models for Escherichia coli and to understand the links between E. coli concentrations, predictive variables, and pathogens. Based upon results from 21 beach sites, models were developed for 13 sites, and the most predictive variables were rainfall, wind direction and speed, turbidity, and water temperature. Models were not developed at sites where the E. coli standard was seldom exceeded. Models were validated at nine sites during an independent year. At three sites, the model resulted in increased correct responses, sensitivities, and specificities compared to use of the previous day's E. coli concentration (the current method). Drought conditions during the validation year precluded being able to adequately assess model performance at most of the other sites. Cryptosporidium, adenovirus, eaeA (E. coli), ipaH (Shigella), and spvC (Salmonella) were found in at least 20% of samples collected for pathogens at five sites. The presence or absence of the three bacterial genes was related to some of the model variables but was not consistently related to E. coli concentrations. Predictive models were not effective at all inland lake sites; however, their use at two lakes with high swimmer densities will provide better estimates of public health risk than current methods and will be a valuable resource for beach managers and the public. PMID:23291550
[Adjustment of the Andersen's model to the Mexican context: access to prenatal care].
Tamez-González, Silvia; Valle-Arcos, Rosa Irene; Eibenschutz-Hartman, Catalina; Méndez-Ramírez, Ignacio
2006-01-01
The aim of this work was to propose an adjustment to Andersen's model that better reflects the social inequality of the population of Mexico City and makes it possible to evaluate the effect of socioeconomic factors on access to prenatal care in a sample stratified by degree of marginalization. The data come from a study of 663 women, randomly selected from a sampling frame of 21,421 homes in Mexico City. The study collected information about factors that affect utilization of health services: predisposing factors (age and socioeconomic level), enabling factors (education, social support, entitlement, out-of-pocket payment, and opinion of health services), and need factors. The sample was divided according to exclusion variables into three strata, and the data were analyzed using path analysis. The results indicate that socioeconomic level acted as a predisposing variable for utilization of prenatal care services in all three strata, and that education and social support were the most important enabling variables for utilization in the same three groups. In the low stratum, the most important enabling variables were education and entitlement; in the high stratum, out-of-pocket payment and social support. The medium stratum showed atypical behavior that was difficult to explain and understand. The need variable played no mediating role in any of the three models, indicating an absence of equity in all strata. However, the stronger correlations in the high stratum may indicate less inequitable conditions relative to the other strata.
Use of an Ecological Model to Study Sexual Satisfaction in a Heterosexual Spanish Sample.
Del Mar Sánchez-Fuentes, María; Salinas, José María; Sierra, Juan Carlos
2016-11-01
Sexual satisfaction is a key factor in sexual health and has been associated with quality of life. However, few studies have focused on the factors related to sexual satisfaction in the population in Spain. The main goal of this research was to analyze the predictive capacity of an ecological model for the study of sexual satisfaction in a Spanish sample of 723 men and 851 women with a mean age of 36.28 years (SD = 12.59), all of whom were in a heterosexual relationship. We analyzed, using structural equation modeling, the degree to which sexual satisfaction was related to different variables. These variables were the following: personal variables (depression and sexual attitudes); interpersonal variables (relationship satisfaction, sexual function, and sexual assertiveness); social variables (social support, parenthood, and annual income); and cultural variables (political ideology, religion, and religious practice). In men, sexual satisfaction was directly predicted by relationship satisfaction and sexual function. Furthermore, political ideology, religious practice, social support, annual income, initiation sexual assertiveness, and sexual attitudes were indirectly associated with sexual satisfaction. In women, sexual satisfaction was directly predicted by relationship satisfaction, sexual function, sexual assertiveness, and sexual attitudes. In addition, political ideology, religious practice, and social support were indirectly associated with sexual satisfaction. Implications for research and therapy are also discussed.
Estimating maize production in Kenya using NDVI: Some statistical considerations
Lewis, J.E.; Rowland, James; Nadeau , A.
1998-01-01
A regression model approach using a normalized difference vegetation index (NDVI) has the potential for estimating crop production in East Africa. However, before production estimation can become a reality, the underlying model assumptions and statistical nature of the sample data (NDVI and crop production) must be examined rigorously. Annual maize production statistics from 1982–90 for 36 agricultural districts within Kenya were used as the dependent variable; median area NDVI (independent variable) values from each agricultural district and year were extracted from the annual maximum NDVI data set. The input data and the statistical association of NDVI with maize production for Kenya were tested systematically for the following items: (1) homogeneity of the data when pooling the sample, (2) gross data errors and influence points, (3) serial (time) correlation, (4) spatial autocorrelation and (5) stability of the regression coefficients. The results of using a simple regression model with NDVI as the only independent variable are encouraging (r = 0.75, p < 0.05) and illustrate that NDVI can be a responsive indicator of maize production, especially in areas of high NDVI spatial variability, which coincide with areas of production variability in Kenya.
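Of the diagnostics listed, serial correlation is the easiest to illustrate. The Durbin-Watson statistic is one standard check (the abstract does not specify which test the authors used); a sketch on synthetic residuals:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 indicate no first-order serial
    correlation; values toward 0 suggest positive, toward 4 negative."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
white = rng.normal(size=200)              # uncorrelated regression residuals
dw_white = durbin_watson(white)

# AR(1) residuals with strong positive serial correlation
ar = np.empty(200)
ar[0] = rng.normal()
for t in range(1, 200):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
dw_ar = durbin_watson(ar)
```

For an AR(1) process with coefficient rho, the statistic is approximately 2(1 - rho), so strongly correlated residuals push it well below 2.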
Predicting active-layer soil thickness using topographic variables at a small watershed scale
Li, Aidi; Tan, Xing; Wu, Wei; Liu, Hongbin; Zhu, Jie
2017-01-01
Knowledge about the spatial distribution of active-layer (AL) soil thickness is indispensable for ecological modeling, precision agriculture, and land resource management. However, it is difficult to obtain detailed AL soil thickness data using conventional soil survey methods. The objective of this research was to investigate the feasibility and accuracy of mapping the spatial distribution of AL soil thickness with a random forest (RF) model using terrain variables at a small watershed scale. A total of 1113 soil samples collected from slope fields were randomly divided into calibration (770 samples) and validation (343 samples) sets. Seven terrain variables, including elevation, aspect, relative slope position, valley depth, flow path length, slope height, and topographic wetness index, were derived from a digital elevation model (30 m). The RF model was compared with multiple linear regression (MLR), geographically weighted regression (GWR), and support vector machine (SVM) approaches based on the validation set. Model performance was evaluated by the precision criteria of mean error (ME), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²). Comparative results showed that RF outperformed the MLR, GWR, and SVM models, giving better values of ME (0.39 cm), MAE (7.09 cm), and RMSE (10.85 cm) and a higher R² (62%). The sensitivity analysis demonstrated that the DEM had less uncertainty than the AL soil thickness. The outcome of the RF model indicated that elevation, flow path length, and valley depth were the most important factors affecting AL soil thickness variability across the watershed. These results demonstrate that the RF model is a promising method for predicting the spatial distribution of AL soil thickness using terrain parameters. PMID:28877196
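The precision criteria used for model comparison (ME, MAE, RMSE, R²) can be computed with a small helper; the sample values below are illustrative only.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return the precision criteria used above: ME (bias), MAE, RMSE, R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    me = err.mean()                                   # mean error (bias)
    mae = np.abs(err).mean()                          # mean absolute error
    rmse = np.sqrt((err ** 2).mean())                 # root mean square error
    ss_res = (err ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot                          # coefficient of determination
    return me, mae, rmse, r2

# Illustrative thickness values (cm), not the study's validation data
me, mae, rmse, r2 = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```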
Scheel, Ida; Ferkingstad, Egil; Frigessi, Arnoldo; Haug, Ola; Hinnerichsen, Mikkel; Meze-Hausken, Elisabeth
2013-01-01
Climate change will affect the insurance industry. We develop a Bayesian hierarchical statistical approach to explain and predict insurance losses due to weather events at a local geographic scale. The number of weather-related insurance claims is modelled by combining generalized linear models with spatially smoothed variable selection. Using Gibbs sampling and reversible jump Markov chain Monte Carlo methods, this model is fitted on daily weather and insurance data from each of the 319 municipalities which constitute southern and central Norway for the period 1997–2006. Precise out-of-sample predictions validate the model. Our results show interesting regional patterns in the effect of different weather covariates. In addition to being useful for insurance pricing, our model can be used for short-term predictions based on weather forecasts and for long-term predictions based on downscaled climate models. PMID:23396890
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
Liu, Ruijie; Holik, Aliaksei Z.; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E.; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.
2015-01-01
Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package. PMID:25925576
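A minimal sketch of the weighting idea, not limma's actual implementation: per-sample variance factors are converted to inverse-variance weights, combined with observation-level weights, and used in a weighted least-squares fit for a single gene. All names and data here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])                 # two conditions
sample_var_factor = np.array([1, 1, 1, 4, 1, 1, 1, 4.0])   # samples 4 and 8 noisy
# Synthetic log-expression for one gene: baseline 5, group effect 2
y = 5 + 2 * group + 0.1 * rng.normal(size=8) * np.sqrt(sample_var_factor)

obs_weight = np.ones(8)                  # stand-in for voom-style observation weights
w = obs_weight / sample_var_factor       # combined inverse-variance weight

# Weighted least squares: solve (X' W X) beta = X' W y
X = np.column_stack([np.ones(8), group])
WX = X * w[:, None]
beta = np.linalg.solve(X.T @ WX, WX.T @ y)
```

Down-weighting the two noisy samples keeps them in the analysis while reducing their influence on the estimated group effect, which is the compromise the abstract describes.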
Does Nationality Matter in the B2C Environment? Results from a Two Nation Study
NASA Astrophysics Data System (ADS)
Peikari, Hamid Reza
Different studies have explored the relations between different dimensions of e-commerce transactions and lots of models and findings have been proposed to the academic and business worlds. However, there is a doubt on the applications and generalization of such models and findings in different countries and nations. In other words, this study argues that the relations among the variables of a model ay differ in different countries, which raises questions on the findings of researchers collecting data in one country to test their hypotheses. This study intends to examine if different nations have different perceptions toward the elements of Website interface, security and purchase intention on Internet. Moreover, a simple model was developed to investigate whether the independent variables of the model are equally important in different nations and significantly influence the dependent variable in such nations or not. Since majority of the studies in the context of e-commerce were either focused on the developed countries which have a high e-readiness indices and overall ranks, two developing countries with different e-readiness indices and ranks were selected for the data collection. The results showed that the samples had different significant perceptions of security and some of the Website interface factors. Moreover, it was found that the significance of relations among the independent variables ad the dependent variable are different between the samples, which questions the findings of the researchers testing their model and hypotheses only based on the data collected in one country.
Typing SNP based on the near-infrared spectroscopy and artificial neural network
NASA Astrophysics Data System (ADS)
Ren, Li; Wang, Wei-Peng; Gao, Yu-Zhen; Yu, Xiao-Wei; Xie, Hong-Ping
2009-07-01
Based on the near-infrared spectra (NIRS) of the measured samples as the discriminant variables of their genotypes, the genotype discriminant model of SNP has been established by using back-propagation artificial neural network (BP-ANN). Taking a SNP (857G > A) of N-acetyltransferase 2 (NAT2) as an example, DNA fragments containing the SNP site were amplified by the PCR method based on a pair of primers to obtain the three-genotype (GG, AA, and GA) modeling samples. The NIRS-s of the amplified samples were directly measured in transmission by using quartz cell. Based on the sample spectra measured, the two BP-ANN-s were combined to obtain the stronger ability of the three-genotype classification. One of them was established to compress the measured NIRS variables by using the resilient back-propagation algorithm, and another network established by Levenberg-Marquardt algorithm according to the compressed NIRS-s was used as the discriminant model of the three-genotype classification. For the established model, the root mean square error for the training and the prediction sample sets were 0.0135 and 0.0132, respectively. Certainly, this model could rightly predict the three genotypes (i.e. the accuracy of prediction samples was up to100%) and had a good robust for the prediction of unknown samples. Since the three genotypes of SNP could be directly determined by using the NIRS-s without any preprocessing for the analyzed samples after PCR, this method is simple, rapid and low-cost.
VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS
Huang, Jian; Horowitz, Joel L.; Wei, Fengrong
2010-01-01
We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. PMID:21127739
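The component-selection step rests on group-wise shrinkage. A minimal sketch of the group soft-thresholding operator at the heart of (adaptive) group Lasso, with illustrative coefficient blocks:

```python
import numpy as np

def group_soft_threshold(beta_group, lam):
    """Shrink a whole block of B-spline coefficients toward zero, setting it
    exactly to zero when its Euclidean norm falls below lam. Zeroed blocks
    correspond to additive components dropped from the model."""
    norm = np.linalg.norm(beta_group)
    if norm <= lam:
        return np.zeros_like(beta_group)
    return (1 - lam / norm) * beta_group

strong = np.array([3.0, 4.0])   # norm 5: this component survives
weak = np.array([0.3, 0.4])     # norm 0.5: this component is removed
kept = group_soft_threshold(strong, 1.0)
dropped = group_soft_threshold(weak, 1.0)
```

The adaptive variant reweights lam per group using an initial estimate, which is what allows consistent selection of the nonzero components.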
ERIC Educational Resources Information Center
Svanum, Soren; Bringle, Robert G.
1980-01-01
The confluence model of cognitive development was tested on 7,060 children. Family size, sibling order within family sizes, and hypothesized age-dependent effects were tested. Findings indicated an inverse relationship between family size and the cognitive measures; age-dependent effects and other confluence variables were found to be…
The relative influence of nutrients and habitat on stream metabolism in agricultural streams
Frankforter, J.D.; Weyers, H.S.; Bales, J.D.; Moran, P.W.; Calhoun, D.L.
2010-01-01
Stream metabolism was measured in 33 streams across a gradient of nutrient concentrations in four agricultural areas of the USA to determine the relative influence of nutrient concentrations and habitat on primary production (GPP) and respiration (CR-24). In conjunction with the stream metabolism estimates, water quality and algal biomass samples were collected, as was an assessment of habitat in the sampling reach. When data for all study areas were combined, there were no statistically significant relations between gross primary production or community respiration and any of the independent variables. However, significant regression models were developed for three study areas for GPP (r² = 0.79–0.91) and CR-24 (r² = 0.76–0.77). Various forms of nutrients (total phosphorus and area-weighted total nitrogen loading) were significant for predicting GPP in two study areas, with habitat variables important in seven significant models. Important physical variables included light availability, precipitation, basin area, and in-stream habitat cover. Neither benthic nor seston chlorophyll was found to be an important explanatory variable in any of the models; however, benthic ash-free dry weight was important in two models for GPP. © 2009 The Author(s).
Selenium in irrigated agricultural areas of the western United States
Nolan, B.T.; Clark, M.L.
1997-01-01
A logistic regression model was developed to predict the likelihood that Se exceeds the USEPA chronic criterion for aquatic life (5 µg/L) in irrigated agricultural areas of the western USA. Preliminary analysis of explanatory variables used in the model indicated that surface-water Se concentration increased with increasing dissolved solids (DS) concentration and with the presence of Upper Cretaceous, mainly marine sediment. The presence or absence of Cretaceous sediment was the major variable affecting Se concentration in surface-water samples from the National Irrigation Water Quality Program. Median Se concentration was 14 µg/L in samples from areas underlain by Cretaceous sediments and < 1 µg/L in samples from areas underlain by non-Cretaceous sediments. Wilcoxon rank sum tests indicated that elevated Se concentrations in samples from areas with Cretaceous sediments, irrigated areas, and from closed lakes and ponds were statistically significant. Spearman correlations indicated that Se was positively correlated with a binary geology variable (0.64) and DS (0.45). Logistic regression models indicated that the concentration of Se in surface water was almost certain to exceed the Environmental Protection Agency aquatic-life chronic criterion of 5 µg/L when DS was greater than 3000 mg/L in areas with Cretaceous sediments. The 'best' logistic regression model correctly predicted Se exceedances and nonexceedances 84.4% of the time, and model sensitivity was 80.7%. A regional map of Cretaceous sediment showed the location of potential problem areas. The map and logistic regression model are tools that can be used to determine the potential for Se contamination of irrigated agricultural areas in the western USA.
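The reported 84.4% correct classification and 80.7% sensitivity come from a confusion matrix over predicted exceedances. A small helper computing those quantities (the labels below are hypothetical, not the study's data):

```python
def classification_summary(actual, predicted):
    """Percent correctly classified, plus sensitivity (true-positive rate)
    and specificity (true-negative rate) for binary exceedance predictions."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    accuracy = (tp + tn) / len(actual)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# 1 = Se exceeds the 5 ug/L criterion, 0 = does not (hypothetical labels)
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens, spec = classification_summary(actual, predicted)
```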
NASA Astrophysics Data System (ADS)
Sung, S.; Kim, H. G.; Lee, D. K.; Park, J. H.; Mo, Y.; Kil, S.; Park, C.
2016-12-01
The impact of climate change has been observed throughout the globe, and ecosystems are experiencing rapid changes such as vegetation shifts and species extinction. In this context, the Species Distribution Model (SDM) is a popular method for projecting the impact of climate change on ecosystems. An SDM is fundamentally based on the niche of a given species, which means that presence point data are essential to characterize that niche. Running SDMs for plants requires attention to the characteristics of vegetation data. Normally, remote sensing techniques are used to produce vegetation data over large areas; in other words, the exact presence points carry high uncertainty because presence data are selected from polygon and raster datasets. Thus, sampling methods for deriving vegetation presence data should be carefully selected. In this study, we used three different sampling methods to select vegetation presence data: random sampling, stratified sampling, and site-index-based sampling. We used the R package BIOMOD2 to assess uncertainty from modeling, and included BioCLIM variables and other environmental variables as input data. Despite differences among the 10 SDMs, the sampling methods showed clear differences in ROC values: random sampling showed the lowest ROC value, while site-index-based sampling showed the highest. As a result of this study, the uncertainties arising from presence-data sampling methods and SDMs can be quantified.
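The ROC values used to compare the sampling methods can be computed rank-wise: the AUC equals the probability that a randomly chosen presence point scores higher than a randomly chosen absence point. A minimal sketch with illustrative scores:

```python
def roc_auc(scores_pos, scores_neg):
    """Rank-based AUC (Mann-Whitney form): the probability that a randomly
    chosen presence point scores higher than a randomly chosen absence
    point, with ties counted as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative SDM suitability scores at presence and absence locations
auc = roc_auc([0.9, 0.8, 0.7], [0.6, 0.8, 0.2])
```

An AUC of 0.5 means the model scores presences no better than chance, which is why lower ROC values flag a poorer sampling method.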
Zheng, Lianqing; Chen, Mengen; Yang, Wei
2009-06-21
To overcome the pseudoergodicity problem, conformational sampling can be accelerated via generalized ensemble methods, e.g., through the realization of random walks along prechosen collective variables, such as spatial order parameters, energy scaling parameters, or even system temperatures or pressures. As usually observed in generalized ensemble simulations, hidden barriers are likely to exist in the space perpendicular to the collective variable direction, and these residual free energy barriers can greatly reduce sampling efficiency. This sampling issue is particularly severe when the collective variable is defined in a low-dimension subset of the target system; the "Hamiltonian lagging" problem, in which necessary structural relaxation falls behind the move of the collective variable, is then likely to occur. To overcome this problem in equilibrium conformational sampling, we adopted the orthogonal space random walk (OSRW) strategy, which was originally developed in the context of free energy simulation [L. Zheng, M. Chen, and W. Yang, Proc. Natl. Acad. Sci. U.S.A. 105, 20227 (2008)]. Thereby, generalized ensemble simulations can simultaneously escape both the explicit barriers along the collective variable direction and the hidden barriers that are strongly coupled with the collective variable move. As demonstrated in our model studies, the present OSRW-based generalized ensemble treatments show improved sampling capability over the corresponding classical generalized ensemble treatments.
Observational studies of patients in the emergency department: a comparison of 4 sampling methods.
Valley, Morgan A; Heard, Kennon J; Ginde, Adit A; Lezotte, Dennis C; Lowenstein, Steven R
2012-08-01
We evaluate the ability of 4 sampling methods to generate representative samples of the emergency department (ED) population. We analyzed the electronic records of 21,662 consecutive patient visits at an urban, academic ED. From this population, we simulated different models of study recruitment in the ED by using 2 sample sizes (n=200 and n=400) and 4 sampling methods: true random, random 4-hour time blocks by exact sample size, random 4-hour time blocks by a predetermined number of blocks, and convenience or "business hours." For each method and sample size, we obtained 1,000 samples from the population. Using χ² tests, we measured the number of statistically significant differences between the sample and the population for 8 variables (age, sex, race/ethnicity, language, triage acuity, arrival mode, disposition, and payer source). Then, for each variable, method, and sample size, we compared the proportion of the 1,000 samples that differed from the overall ED population to the expected proportion (5%). Only the true random samples represented the population with respect to sex, race/ethnicity, triage acuity, mode of arrival, language, and payer source in at least 95% of the samples. Patient samples obtained using random 4-hour time blocks and business hours sampling systematically differed from the overall ED patient population for several important demographic and clinical variables. However, the magnitude of these differences was not large. Common sampling strategies selected for ED-based studies may affect parameter estimates for several representative population variables. However, the potential for bias for these variables appears small. Copyright © 2012. Published by Mosby, Inc.
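The χ² comparison of a sample against the full ED population can be sketched as follows; the category counts are hypothetical, and in practice the statistic would be compared to a χ² distribution with (k − 1) degrees of freedom.

```python
def chi_square_stat(sample_counts, population_props):
    """Pearson chi-square statistic comparing a sample's category counts
    against the proportions seen in the full ED population."""
    n = sum(sample_counts)
    stat = 0.0
    for observed, p in zip(sample_counts, population_props):
        expected = n * p
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical payer-source counts in a sample of 200 vs. population shares
stat = chi_square_stat([90, 60, 30, 20], [0.45, 0.30, 0.15, 0.10])
# With df = 3, the 5% critical value is about 7.815
representative = stat < 7.815
```

Repeating this over 1,000 simulated samples and counting how often the statistic exceeds the critical value reproduces the study's "proportion of samples that differed" measure.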
NASA Astrophysics Data System (ADS)
De Lucia, Frank C., Jr.; Gottfried, Jennifer L.
2011-02-01
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.
Hering, Johanna; Hille, Katja; Frömke, Cornelia; von Münchhausen, Christiane; Hartmann, Maria; Schneider, Bettina; Friese, Anika; Roesler, Uwe; Merle, Roswitha; Kreienbrock, Lothar
2014-09-01
A cross-sectional study concerning farm prevalence and risk factors for the count of cefotaxime resistant Escherichia coli (E. coli) (CREC) positive samples per sampling group on German fattening pig farms was performed in 2011 and 2012. Altogether 48 farms in four agricultural regions across Germany were investigated. Faecal samples, boot swabs and dust samples from two sampling groups per farm were taken and supplemental data were collected using a questionnaire. On 85% of the farms, at least one sample contained cefotaxime resistant E. coli colonies. Positive samples were more frequent in faeces (61%) and boot swabs (54%) than in dust samples (11%). Relevant variables from the questionnaire were analysed in a univariable mixed effect Poisson regression model. Variables that were related to the number (risk) of positive samples per sampling group with a p-value <0.2 were entered in a multivariable model. This model was reduced to statistically significant variables via backward selection. Factors that increased the risk for positive samples involved farm management and hygienic aspects. Farms that had a separate pen for diseased pigs had a 2.8-fold higher mean count of positive samples (95%-CI [1.71; 4.58], p=0.001) than farms without an extra pen. The mean count was increased on farms with under-floor exhaust ventilation compared to farms with over-floor ventilation (2.22 [1.43; 3.46], p=0.001), and more positive samples were observed on farms that controlled flies with toxin compared to farms that did not (1.86 [1.24; 2.78], p=0.003). It can be concluded that CREC are widespread on German fattening pig farms. In addition, the explorative approach of the present study suggests an influence of management strategies on the occurrence of cefotaxime resistant E. coli. Copyright © 2014 Elsevier B.V. All rights reserved.
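The reported effect sizes (e.g., a 2.8-fold higher mean count with a 95% CI) are incidence-rate ratios from the Poisson model. A minimal two-group analogue of a univariable Poisson regression coefficient, with hypothetical counts:

```python
import math

def rate_ratio_ci(counts_exposed, counts_unexposed, z=1.96):
    """Two-group incidence-rate ratio with a Wald 95% CI on the log scale,
    the simplest analogue of a univariable Poisson regression coefficient."""
    s1, n1 = sum(counts_exposed), len(counts_exposed)
    s0, n0 = sum(counts_unexposed), len(counts_unexposed)
    irr = (s1 / n1) / (s0 / n0)
    se = math.sqrt(1 / s1 + 1 / s0)   # SE of log(IRR) for Poisson totals
    return irr, irr * math.exp(-z * se), irr * math.exp(z * se)

# Hypothetical counts of CREC-positive samples per sampling group:
# farms with vs. without a separate pen for diseased pigs
irr, lo, hi = rate_ratio_ci([4, 5, 6, 5], [2, 1, 2, 2])
```

A lower CI bound above 1 corresponds to the statistically significant risk factors retained by the study's backward selection (the mixed-effect model additionally accounts for clustering within farms, which this sketch omits).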
Efficient SRAM yield optimization with mixture surrogate modeling
NASA Astrophysics Data System (ADS)
Zhongjian, Jiang; Zuochang, Ye; Yan, Wang
2016-12-01
Largely repeated cells such as SRAM cells usually require extremely low failure rates to ensure a moderate chip yield. Though fast Monte Carlo methods such as importance sampling and its variants can be used for yield estimation, they are still very expensive if one needs to perform optimization based on such estimates, because circuit SPICE simulation accounts for the largest share of the time spent in yield calculation. In this paper, a new method is proposed to address this issue. The key idea is to establish an efficient mixture surrogate model over the design variables and process variables. The model is constructed by running SPICE simulation to obtain a set of sample points, which are then used to train the mixture surrogate model with the lasso algorithm. Experimental results show that the proposed model calculates yield accurately and brings significant speed-ups to the calculation of failure rate. Based on the model, we developed a further accelerated algorithm to enhance the speed of the yield calculation. The approach is suitable for high-dimensional process variables and multi-performance applications.
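The surrogate-model training step described above uses the lasso. A bare-bones coordinate-descent lasso on synthetic data, a sketch of the technique rather than the authors' mixture model:

```python
import numpy as np

def lasso_cd(X, y, lam, iters=200):
    """Plain coordinate-descent lasso (columns of X assumed standardized):
    a minimal stand-in for the surrogate-model training step, where sparsity
    drops design/process variables that do not affect the performance metric."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]     # partial residual
            rho = X[:, j] @ r_j / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

# Synthetic "simulation" data: 8 candidate variables, only 2 truly matter
rng = np.random.default_rng(3)
n, p = 120, 8
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)
beta = lasso_cd(X, y, lam=0.2)
```

The zeroed coefficients are what makes the surrogate cheap to evaluate in place of repeated SPICE runs.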
Data-driven process decomposition and robust online distributed modelling for large-scale processes
NASA Astrophysics Data System (ADS)
Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou
2018-02-01
With the increasing attention to networked control, system decomposition and distributed models are of significant importance in the implementation of model-based control strategies. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm is proposed for large-scale chemical processes. The key controlled variables are first partitioned into several clusters by the affinity propagation clustering algorithm; each cluster can be regarded as a subsystem. The inputs of each subsystem are then selected by offline canonical correlation analysis between all process variables and the subsystem's controlled variables. Process decomposition is thus realised after the screening of input and output variables. Once the system decomposition is finished, online subsystem modelling is carried out by recursively renewing the samples block-wise. The proposed algorithm was applied to the Tennessee Eastman process and its validity was verified.
Candela, L.; Olea, R.A.; Custodio, E.
1988-01-01
Groundwater quality observation networks are examples of discontinuous sampling of variables presenting spatial continuity and highly skewed frequency distributions. Anywhere in the aquifer, lognormal kriging provides an estimate of the variable being sampled together with a standard error of that estimate. The average and the maximum standard error within the network can be used to dynamically improve the network sampling efficiency, or to find a design able to assure a given reliability level. The approach requires neither the formulation of a physical model for the aquifer nor actual sampling of hypothetical configurations. A case study is presented using the network monitoring saltwater intrusion into the Llobregat delta confined aquifer, Barcelona, Spain. The variable chloride concentration, used to trace the intrusion, exhibits sudden changes within short distances, which makes the standard error fairly insensitive to changes in sampling pattern and to substantial fluctuations in the number of wells. © 1988.
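The design criterion, the standard error of a log-scale estimate over the aquifer, can be illustrated with a Gaussian process on log-transformed concentrations, which is the machine-learning analogue of kriging. The well locations and the concentration field below are invented for illustration; scikit-learn replaces a geostatistical package.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)

# Hypothetical monitoring wells (x, y in km) and chloride concentrations (mg/L)
wells = rng.uniform(0, 10, size=(40, 2))
true_field = lambda p: np.exp(3 + 0.4 * p[:, 0] - 0.2 * p[:, 1])
chloride = true_field(wells) * rng.lognormal(0, 0.1, size=40)

# "Lognormal kriging": fit the spatial model to log-concentration
gp = GaussianProcessRegressor(kernel=1.0 * RBF(2.0) + WhiteKernel(0.01),
                              normalize_y=True).fit(wells, np.log(chloride))

# Standard error of the log estimate over a prediction grid
gx, gy = np.meshgrid(np.linspace(0, 10, 25), np.linspace(0, 10, 25))
grid = np.column_stack([gx.ravel(), gy.ravel()])
_, se = gp.predict(grid, return_std=True)

# Network-design criteria: average and maximum standard error
print(f"avg SE={se.mean():.3f}, max SE={se.max():.3f}")
```

Candidate well additions or removals could then be scored by how much they change these two summary statistics, without any new field sampling.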
Anxiety, Depression and Hopelessness in Adolescents: A Structural Equation Model
Cunningham, Shaylyn; Gunn, Thelma; Alladin, Assen; Cawthorpe, David
2008-01-01
Objective This study tested a structural model examining the relationship between a latent variable termed demoralization and measured variables (anxiety, depression and hopelessness) in a community sample of Canadian youth. Methods The combined sample consisted of data collected from four independent studies from 2001 to 2005. Nine hundred and seventy-one (n=971) participants were high school students (grades 10-12) from three geographic locations: Calgary, Saskatchewan and Lethbridge. Participants completed the Beck Anxiety Inventory (BAI), Beck Depression Inventory-Revised (BDI-II), Beck Hopelessness Scale (BHS), and a demographic survey. Structural equation modeling was used for statistical analysis. Results The analysis revealed that the final model, including depression, anxiety and hopelessness and one latent variable, demoralization, fit the data (χ²(2) = 7.25, p < .001; goodness-of-fit indices CFI = 0.99, NFI = 0.98; standardized error 0.05). Overall, the findings suggest that close relationships exist among depression, anxiety, hopelessness and demoralization that are stable across demographic variables. Further, the model explains the relationship between sub-clinical anxiety, depression and hopelessness. Conclusion These findings contribute to a theoretical framework which has implications for educational and clinical intervention. The present findings will help guide further preventative research examining demoralization as a precursor to sub-clinical anxiety and depression. PMID:18769644
Bastistella, Luciane; Rousset, Patrick; Aviz, Antonio; Caldeira-Pires, Armando; Humbert, Gilles; Nogueira, Manoel
2018-02-09
New experimental techniques, as well as modern variants on known methods, have recently been employed to investigate the fundamental reactions underlying the oxidation of biochar. The purpose of this paper was to study, experimentally and statistically, how the relative humidity of air, sample mass, and particle size of four biochars influenced the adsorption of water and the increase in temperature. A random factorial design was employed using the statistical software Xlstat. A simple linear regression model and an analysis of variance with pairwise comparisons were performed. The experimental study was carried out on the wood of Quercus pubescens, Cyclobalanopsis glauca, Trigonostemon huangmosun, and Bambusa vulgaris, and involved five relative humidity conditions (22, 43, 75, 84, and 90%), two sample masses (0.1 and 1 g), and two particle sizes (powder and piece). Two response variables, water adsorption and temperature increase, were analyzed and discussed. The temperature did not increase linearly with the adsorption of water. Temperature was modeled by nine explanatory variables, while water adsorption was modeled by eight. Five variables, including factors and their interactions, were found to be common to the two models. Sample mass and relative humidity influenced both response variables, while particle size and biochar type only influenced the temperature.
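A factorial design analysed by a linear model with interactions and an ANOVA table, as described above, can be sketched with statsmodels. The factor levels match the study's design, but the response values are simulated and the effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(4)

# Hypothetical factorial design mirroring the study's factors
rh = np.tile([22, 43, 75, 84, 90], 8)            # relative humidity (%)
mass = np.tile(np.repeat([0.1, 1.0], 5), 4)      # sample mass (g)
size = np.repeat(["powder", "piece"], 20)        # particle size
# Simulated temperature rise: depends on humidity, mass and particle size
temp = (0.02 * rh + 2.0 * mass
        + np.where(size == "powder", 1.5, 0.0)
        + rng.normal(0, 0.3, 40))
df = pd.DataFrame({"rh": rh, "mass": mass, "size": size, "temp": temp})

# Linear model with main effects and one interaction, then the ANOVA table
model = smf.ols("temp ~ rh * mass + C(size)", data=df).fit()
print(anova_lm(model, typ=2))
```

The ANOVA table reports, for each factor and interaction, whether it contributes significantly to the response, which is how the study identified the variables common to its two models.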
Bayesian Normalization Model for Label-Free Quantitative Analysis by LC-MS
Nezami Ranjbar, Mohammad R.; Tadesse, Mahlet G.; Wang, Yue; Ressom, Habtom W.
2016-01-01
We introduce a new method for normalization of data acquired by liquid chromatography coupled with mass spectrometry (LC-MS) in label-free differential expression analysis. Normalization of LC-MS data prior to statistical analysis is desired to adjust for variabilities in ion intensities that are caused not by biological differences but by experimental bias. There are different sources of bias, including variability during sample collection and storage, poor experimental design, and noise. In addition, instrument variability in experiments involving a large number of LC-MS runs leads to a significant drift in intensity measurements. Although various methods have been proposed for normalization of LC-MS data, there is no universally applicable approach. In this paper, we propose a Bayesian normalization model (BNM) that utilizes scan-level information from LC-MS data. Specifically, the proposed method uses peak shapes to model the scan-level data acquired from extracted ion chromatograms (EIC), with parameters treated as a linear mixed effects model. We extend the model into BNM with drift (BNMD) to compensate for the variability in intensity measurements due to long LC-MS runs. We evaluated the performance of our method using synthetic and experimental data. In comparison with several existing methods, the proposed BNM and BNMD yielded significant improvement. PMID:26357332
NASA Astrophysics Data System (ADS)
Ouyang, Qin; Liu, Yan; Chen, Quansheng; Zhang, Zhengzhu; Zhao, Jiewen; Guo, Zhiming; Gu, Hang
2017-06-01
Instrumental testing of black tea samples, in place of human panel tests, has been attracting considerable attention recently. This study investigated the feasibility of estimating the color-related sensory quality of black tea samples using the VIS-NIR spectroscopy technique, comparing the performance of models based on spectral and color information. In model calibration, variables were first selected by a genetic algorithm (GA); nonlinear back-propagation artificial neural network (BPANN) models were then established on the optimal variables. Among the models compared, GA-BPANN models built from spectral information showed the best performance, with a correlation coefficient of 0.8935 and a root mean square error of 0.392 in the prediction set. In addition, models based on spectral information outperformed those based on color parameters. Therefore, the VIS-NIR spectroscopy technique is a promising tool for rapid and accurate evaluation of the sensory quality of black tea samples.
Åsenlöf, Pernilla; Bring, Annika; Söderlund, Anne
2013-12-21
Different recovery patterns are reported for those who have sustained a whiplash injury, but little is known about the variability within subgroups. The aims were (1) to compare a self-selected mildly affected sample (MILD) with a self-selected moderately to severely affected sample (MOD/SEV) with regard to background characteristics and pain-related disability, pain intensity, functional self-efficacy, fear of movement/(re)injury, pain catastrophising, and post-traumatic stress symptoms in the acute stage (at baseline); (2) to study the development of the above-listed clinical variables over the first year after the accident in the MILD sample; and (3) to study the validity of a prediction model including baseline levels of clinical variables on pain-related disability one year after baseline assessment. The study had a prospective, correlative design. Ninety-eight participants were consecutively selected. Inclusion criteria were: age 18 to 65 years, WAD grade I-II, Swedish language skills, and a subjective report of not being in need of treatment due to mild symptoms. A multivariate linear regression model was applied for the prediction analysis. The MILD sample was less affected on all study variables compared to the MOD/SEV sample. Pain-related disability, pain catastrophising, and post-traumatic stress symptoms decreased over the first year after the accident, whereas functional self-efficacy and fear of movement/(re)injury increased. Pain intensity was stable. Pain-related disability at baseline emerged as the only statistically significant predictor of pain-related disability one year after the accident (Adj r² = 0.67). A good prognosis over the first year is expected for the majority of individuals with WAD grade I or II who decline treatment due to mild symptoms. The prediction model was not valid in the MILD sample except for the contribution of pain-related disability.
An implication is that early observations of individuals with elevated levels of pain-related disability are warranted, although they may decline treatment.
Huang, Tao; Li, Xiao-yu; Xu, Meng-ling; Jin, Rui; Ku, Jing; Xu, Sen-miao; Wu, Zhen-zhong
2015-01-01
The quality of potatoes is directly related to their edible and industrial value. Hollow heart of potato, a physiological disease occurring inside the tuber, is difficult to detect. This paper puts forward a non-destructive detection method using semi-transmission hyperspectral imaging with a support vector machine (SVM) to detect hollow heart of potato. Compared to reflection and transmission hyperspectral images, semi-transmission hyperspectral images are clearer and contain internal quality information of agricultural products. In this study, 224 potato samples (149 normal and 75 hollow) were selected as the research object, and a semi-transmission hyperspectral image acquisition system was constructed to acquire hyperspectral images (390-1040 nm) of the potato samples; the average spectra of the regions of interest were then extracted for spectral characteristics analysis. Normalization was used to preprocess the original spectra, and a prediction model was developed based on SVM using all wavebands; the recognition accuracy on the test set was only 87.5%. In order to simplify the model, the competitive adaptive reweighted sampling algorithm (CARS) and the successive projections algorithm (SPA) were utilized to select important variables from all 520 spectral variables, and 8 variables were selected (454, 601, 639, 664, 748, 827, 874 and 936 nm). A test-set recognition accuracy of 94.64% was obtained by using these 8 variables to develop the SVM model. Parameter optimization algorithms, including the artificial fish swarm algorithm (AFSA), the genetic algorithm (GA) and grid search, were used to optimize the SVM model parameters: penalty parameter c and kernel parameter g. After comparative analysis, AFSA, a new bionic optimization algorithm based on the foraging behavior of fish swarms, was shown to yield the optimal model parameters (c=10.6591, g=0.3497), and a recognition accuracy of 100% was obtained for the AFSA-SVM model. The results indicate that combining semi-transmission hyperspectral imaging with CARS-SPA and AFSA-SVM can accurately detect hollow heart of potato, and provide technical support for its rapid, non-destructive detection.
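The final stage, an RBF-kernel SVM on a handful of selected spectral variables with its penalty parameter c and kernel parameter g tuned, can be sketched with scikit-learn. The 8 "spectral" features are simulated stand-ins, and grid search replaces the fish-swarm optimizer; the pattern of the workflow is what matters here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(7)

# Hypothetical stand-in for the 8 CARS/SPA-selected spectral variables
n = 224  # matches the study's sample count
X = rng.normal(0, 1, size=(n, 8))
y = (X[:, 0] + 0.8 * X[:, 3] + 0.3 * rng.normal(0, 1, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Tune penalty parameter C and RBF kernel parameter gamma (the "c" and "g")
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_tr, y_tr)
acc = grid.score(X_te, y_te)
print(f"best params: {grid.best_params_}, test accuracy: {acc:.3f}")
```

An evolutionary optimizer such as AFSA or GA explores the same (C, gamma) space but with a continuous, adaptive search instead of a fixed grid.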
Zimmerman, Tammy M.
2006-01-01
The Lake Erie shoreline in Pennsylvania spans nearly 40 miles and is a valuable recreational resource for Erie County. Nearly 7 miles of the Lake Erie shoreline lies within Presque Isle State Park in Erie, Pa. Concentrations of Escherichia coli (E. coli) bacteria at permitted Presque Isle beaches occasionally exceed the single-sample bathing-water standard, resulting in unsafe swimming conditions and closure of the beaches. E. coli concentrations and other water-quality and environmental data collected at Presque Isle Beach 2 during the 2004 and 2005 recreational seasons were used to develop models using tobit regression analyses to predict E. coli concentrations. All variables statistically related to E. coli concentrations were included in the initial regression analyses, and after several iterations, only those explanatory variables that made the models significantly better at predicting E. coli concentrations were included in the final models. Regression models were developed using data from 2004, 2005, and the combined 2-year dataset. Variables in the 2004 model and the combined 2004-2005 model were log10 turbidity, rain weight, wave height (calculated), and wind direction. Variables in the 2005 model were log10 turbidity and wind direction. Explanatory variables not included in the final models were water temperature, streamflow, wind speed, and current speed; model results indicated these variables did not meet significance criteria at the 95-percent confidence level (probabilities were greater than 0.05). The predicted E. coli concentrations produced by the models were used to develop probabilities that concentrations would exceed the single-sample bathing-water standard for E. coli of 235 colonies per 100 milliliters. 
Analysis of the exceedance probabilities helped determine a threshold probability for each model, chosen such that the number of correctly predicted exceedances and nonexceedances was maximized and the number of false positives and false negatives was minimized. Future samples with computed exceedance probabilities higher than the selected threshold probability, as determined by the model, will likely exceed the E. coli standard, and a beach advisory or closing may need to be issued; computed exceedance probabilities lower than the threshold probability indicate the standard will likely not be exceeded. Additional data collected each year can be used to test and possibly improve the model. This study will aid beach managers in more rapidly determining when waters are not safe for recreational use and, subsequently, when to issue beach advisories or closings.
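The threshold-selection step described above is a simple scan over candidate cutoffs. The probabilities and outcomes below are simulated placeholders for the tobit-model output and observed exceedances; the selection logic is the point.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical model output: exceedance probabilities and observed outcomes
prob = rng.uniform(0, 1, 200)                # predicted P(E. coli > 235/100 mL)
exceeded = rng.uniform(0, 1, 200) < prob     # simulated true exceedances

# Scan candidate thresholds; keep the one maximizing correct calls
thresholds = np.linspace(0.05, 0.95, 19)

def correct_calls(t):
    predicted = prob >= t
    return int(np.sum(predicted == exceeded))

best_t = max(thresholds, key=correct_calls)
print(f"threshold={best_t:.2f}, correct={correct_calls(best_t)}/200")
```

In practice the scan could weight false negatives (missed unsafe days) more heavily than false positives, shifting the chosen threshold downward.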
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xi; Tang, Jianwu; Mustard, John F.
2016-04-02
Understanding the temporal patterns of leaf traits is critical in determining the seasonality and magnitude of terrestrial carbon, water, and energy fluxes. However, we lack robust and efficient ways to monitor the temporal dynamics of leaf traits. Here we assessed the potential of leaf spectroscopy to predict and monitor leaf traits across their entire life cycle at different forest sites and light environments (sunlit vs. shaded) using a weekly sampled dataset across the entire growing season at two temperate deciduous forests. The dataset includes field-measured leaf-level directional-hemispherical reflectance/transmittance together with seven important leaf traits [total chlorophyll (chlorophyll a and b), carotenoids, mass-based nitrogen concentration (Nmass), mass-based carbon concentration (Cmass), and leaf mass per area (LMA)]. All leaf traits varied significantly throughout the growing season and displayed trait-specific temporal patterns. We used a Partial Least Squares Regression (PLSR) modeling approach to estimate leaf traits from spectra, and found that PLSR was able to capture the variability across time, sites, and light environments of all leaf traits investigated (R² = 0.6-0.8 for temporal variability; R² = 0.3-0.7 for cross-site variability; R² = 0.4-0.8 for variability across light environments). We also tested alternative field sampling designs and found that for most leaf traits, biweekly leaf sampling throughout the growing season enabled accurate characterization of the seasonal patterns. Compared with the estimation of foliar pigments, the performance of the Nmass, Cmass and LMA PLSR models improved more markedly with sampling frequency. Our results demonstrate that leaf spectra-trait relationships vary with time, and thus tracking the seasonality of leaf traits requires statistical models calibrated with data sampled throughout the growing season. These results have broad implications for future research that uses vegetation spectra to infer leaf traits at different growing stages.
A spatial model of bird abundance as adjusted for detection probability
Gorresen, P.M.; Mcmillan, G.P.; Camp, R.J.; Pratt, T.K.
2009-01-01
Modeling the spatial distribution of animals can be complicated by spatial and temporal effects (i.e. spatial autocorrelation and trends in abundance over time) and other factors such as imperfect detection probabilities and observation-related nuisance variables. Recent advances in modeling have demonstrated various approaches that handle most of these factors, but they require a degree of sampling effort (e.g. replication) not available to many field studies. We present a two-step approach that addresses these challenges to spatially model species abundance. Habitat, spatial and temporal variables were handled with a Bayesian approach which facilitated modeling hierarchically structured data. Predicted abundance was subsequently adjusted to account for imperfect detection and the area effectively sampled for each species. We provide examples of our modeling approach for two endemic Hawaiian nectarivorous honeycreepers: 'i'iwi Vestiaria coccinea and 'apapane Himatione sanguinea. © 2009 Ecography.
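The second step, adjusting model-predicted abundance for detection and effective survey area, is simple arithmetic. The values below (detection probability, effective area) are made-up placeholders, not the study's estimates.

```python
import numpy as np

# Hypothetical point-count predictions for one species (illustrative values)
predicted_abundance = np.array([4.2, 7.5, 1.3, 9.8])  # birds per survey station
p_detect = 0.62            # estimated detection probability
area_sampled_ha = 3.1      # area effectively sampled per station (ha)

# Correct counts for imperfect detection, then express abundance
# as a density over the effectively sampled area
adjusted = predicted_abundance / p_detect
density = adjusted / area_sampled_ha   # birds per hectare
print(np.round(density, 2))
```

Because the detection probability divides the prediction, underestimating it inflates the density estimate, which is why it is estimated per species.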
A Comparison between Multiple Regression Models and CUN-BAE Equation to Predict Body Fat in Adults
Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A.; Aguiló, Antoni
2015-01-01
Background Because the accurate measurement of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Methods Multiple regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF were compared, and also compared with the CUN-BAE equation. All analyses considered both a sample including all participants and one including only the overweight and obese subjects. The BF reference measure was made using Bioelectrical Impedance Analysis. Results The simplest models, including only BMI or BAI as independent variables, showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than BAI. For both the whole group of participants and the group of overweight and obese participants, simple models (BMI, age and sex as variables) yielded correlations with BF similar to those obtained with the more complex CUN-BAE (ρ = 0.87 vs. ρ = 0.86 for the whole sample and ρ = 0.88 vs. ρ = 0.89 for overweight and obese subjects, the second value in each pair corresponding to CUN-BAE). Conclusions There are simpler models than the CUN-BAE equation that fit BF as well as CUN-BAE does; therefore, CUN-BAE could be considered to overfit. Using a simple linear regression model, BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF. PMID:25821960
Dynamics relationship between stock prices and economic variables in Malaysia
NASA Astrophysics Data System (ADS)
Chun, Ooi Po; Arsad, Zainudin; Huen, Tan Bee
2014-07-01
Knowledge of the linkages between stock prices and macroeconomic variables is essential for the formulation of effective monetary policy. This study investigates the relationship between stock prices in Malaysia (KLCI) and four selected macroeconomic variables, namely the industrial production index (IPI), quasi money supply (MS2), real exchange rate (REXR) and 3-month Treasury bill rate (TRB). The variables used in this study are monthly data from 1996 to 2012. A vector error correction (VEC) model and the Kalman filter (KF) technique are utilized to assess the impact of the macroeconomic variables on stock prices. The cointegration test revealed that the stock prices and macroeconomic variables are cointegrated. Unlike the constant estimates of the static VEC model, the KF estimates exhibit clearly time-varying behaviour over the sample period; such varying impact coefficients should better reflect the changing economic environment. Surprisingly, IPI is negatively related to the KLCI, with the estimated impact slowly increasing and becoming positive in recent years. TRB is found to be generally negatively related to the KLCI, with the impact fluctuating around the constant estimate of the VEC model. The KF estimates for REXR and MS2 show a mixture of positive and negative impacts on the KLCI. The coefficients of the error correction term (ECT) are negative over the majority of the sample period, signifying that stock prices respond to stabilize any short-term deviation in the economic system. The findings from the KF model indicate that implications based on the usual static model may lead authorities to implement less appropriate policies.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.
Hero, Alfred O; Rajaratnam, Bala
2016-01-01
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high-dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the latter regime applies to exascale data dimensions. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
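The sample-starved regime can be demonstrated in a few lines: with the sample size n fixed and the variable dimension p growing, the number of spurious large sample correlations among truly independent variables grows with p. This simulation illustrates the phenomenon the framework quantifies; it is not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(12)

def spurious_correlations(n, p, threshold=0.5):
    """Fraction of variable pairs with |sample corr| > threshold
    when all p variables are actually independent."""
    X = rng.standard_normal((n, p))
    corr = np.corrcoef(X.T)
    iu = np.triu_indices(p, k=1)
    return float(np.mean(np.abs(corr[iu]) > threshold))

# Fixed sample size n, growing variable dimension p:
# the count of spurious "discoveries" grows with p
for p in (10, 100, 500):
    frac = spurious_correlations(n=15, p=p)
    pairs = p * (p - 1) // 2
    print(f"p={p:4d}: ~{frac * pairs:.0f} of {pairs} pairs "
          f"spuriously exceed the threshold")
```

Raising n shrinks the sampling variability of each correlation (roughly as 1/sqrt(n)), which is why the fixed-n, growing-p regime demands its own theory of screening thresholds.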
Predictive data modeling of human type II diabetes related statistics
NASA Astrophysics Data System (ADS)
Jaenisch, Kristina L.; Jaenisch, Holger M.; Handley, James W.; Albritton, Nathaniel G.
2009-04-01
During the course of routine Type II diabetes treatment of one of the authors, it was decided to derive predictive analytical Data Models of the daily sampled vital statistics, namely weight, blood pressure, and blood sugar, to determine whether the covariance among the observed variables could yield a descriptive equation-based model or, better still, a predictive analytical model that could forecast the expected future trend of the variables and possibly reduce the number of finger sticks required to monitor blood sugar levels. The personal history and analysis, with the resulting models, are presented.
NASA Astrophysics Data System (ADS)
WANG, P. T.
2015-12-01
Groundwater modeling requires assigning hydrogeological properties to every numerical grid cell. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. A hydrogeological property is assumed to follow a multivariate distribution with spatial correlation. By sampling random numbers from a given statistical distribution and assigning a value to each grid cell, a random field for modeling can be generated; statistical sampling therefore plays an important role in the efficiency of the modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the stratified random procedure of LHS with simulation by LU decomposition to form LULHS. Both conditional and unconditional LULHS simulations were developed. The simulation efficiency and spatial correlation of LULHS are compared to those of three other simulation methods. The results show that, for both conditional and unconditional simulation, the LULHS method is more efficient in terms of computational effort: fewer realizations are required to achieve the required statistical accuracy and spatial correlation.
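An unconditional LULHS-style simulation can be sketched with scipy: draw a Latin hypercube sample of standard normals, then impose the target spatial covariance through its triangular (Cholesky/LU) factor. The grid, correlation length, and realization count below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, qmc

n_grid = 50  # model grid cells along a transect

# Target spatial correlation: exponential decay with separation distance
x = np.arange(n_grid)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)
L = np.linalg.cholesky(C)      # triangular factor used to correlate the field

# Latin hypercube sample of uniforms, one row per realization,
# mapped to stratified standard-normal scores
n_real = 100
u = qmc.LatinHypercube(d=n_grid, seed=1).random(n_real)
z = norm.ppf(u)

# Unconditional realizations with the imposed correlation structure
fields = z @ L.T               # each row is one correlated random field
emp = np.corrcoef(fields.T)
print(f"target corr at lag 5: {C[0, 5]:.3f}, empirical: {emp[0, 5]:.3f}")
```

The stratification from LHS is what lets a relatively small number of realizations reproduce the marginal distribution accurately; conditional simulation would additionally honor observed values at sampled cells.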
Opportunities of probabilistic flood loss models
NASA Astrophysics Data System (ADS)
Schröter, Kai; Kreibich, Heidi; Lüdtke, Stefan; Vogel, Kristin; Merz, Bruno
2016-04-01
Oftentimes, traditional uni-variate damage models such as depth-damage curves fail to reproduce the variability of observed flood damage. However, reliable flood damage models are a prerequisite for the practical usefulness of the model results. Innovative multi-variate probabilistic modelling approaches are promising to capture and quantify the uncertainty involved and thus to improve the basis for decision making. In this study we compare the predictive capability of two probabilistic modelling approaches, namely bagging decision trees and Bayesian networks, with traditional stage-damage functions. For model evaluation we use empirical damage data compiled via computer-aided telephone interviews after the floods of 2002, 2005, 2006 and 2013 in the Elbe and Danube catchments in Germany. We carry out a split-sample test by sub-setting the damage records: one sub-set is used to derive the models and the remaining records are used to evaluate their predictive performance. Further, we stratify the sample according to catchments, which allows studying model performance in a spatial transfer context. Flood damage estimation is carried out on the scale of individual buildings in terms of relative damage. The predictive performance of the models is assessed in terms of systematic deviations (mean bias), precision (mean absolute error), as well as sharpness of the predictions and reliability, which is represented by the proportion of observations that fall within the 5%-95% predictive interval. The comparison of the uni-variable stage-damage function and the multi-variable model approaches emphasises the importance of quantifying predictive uncertainty. With each explanatory variable, the multi-variable model reveals an additional source of uncertainty.
However, the predictive performance in terms of bias (MBE), precision (MAE) and reliability (hit rate) is clearly improved in comparison to the uni-variable stage-damage function. Overall, probabilistic models provide quantitative information about prediction uncertainty, which is crucial for assessing the reliability of model predictions and improves the usefulness of model results.
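A minimal sketch of the three evaluation criteria named above (mean bias error, mean absolute error, and the reliability/hit rate of the 5%-95% predictive interval), applied to an invented, well-calibrated toy predictor; all names and numbers are mine, not the study's.

```python
import numpy as np

def evaluate(obs, draws):
    """obs: (n,) observed relative damage; draws: (m, n) predictive samples."""
    point = draws.mean(axis=0)
    mbe = float(np.mean(point - obs))                 # systematic deviation
    mae = float(np.mean(np.abs(point - obs)))         # precision
    lo, hi = np.percentile(draws, [5, 95], axis=0)
    hit = float(np.mean((obs >= lo) & (obs <= hi)))   # reliability (hit rate)
    return mbe, mae, hit

rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 1.0, 200)                    # latent relative damage
obs = truth + rng.normal(0.0, 0.1, 200)               # observed realization
draws = truth + rng.normal(0.0, 0.1, (1000, 200))     # calibrated predictive draws
mbe, mae, hit = evaluate(obs, draws)
```

For a well-calibrated probabilistic model the hit rate should be close to the nominal 90% coverage of the interval, which is the property a deterministic stage-damage function cannot express at all.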
2013-01-01
Background: Malnutrition is one of the principal causes of child mortality in developing countries, including Bangladesh. To our knowledge, most of the available studies addressing malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition (outcome) variable is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find predictors of this outcome variable. Methods: The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample based on a two-stage stratified sample of households. A total of 4,460 under-five children were analysed using various statistical techniques, namely the chi-square test and the GPR model. Results: The GPR model (as compared to standard Poisson regression and negative binomial regression) is found to be justified for this outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother's education, father's education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions: The consistency of our findings with many other studies suggests that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on the significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
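The under-dispersion check that motivates choosing the GPR model over Poisson or negative binomial regression can be sketched with a variance-to-mean ratio; the toy counts below are invented for illustration.

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio: 1 for Poisson data; values below 1 indicate
    the under-dispersion for which generalized Poisson regression is suited."""
    return statistics.pvariance(counts) / statistics.mean(counts)

# hypothetical per-family counts of malnourished under-five children
counts = [1, 1, 1, 2, 0, 1, 1, 2, 1, 1]
di = dispersion_index(counts)
```

A negative binomial model can only represent over-dispersion (variance > mean), which is why it is ruled out when this ratio falls below one.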
Optical Variability of Narrow-line and Broad-line Seyfert 1 Galaxies
NASA Astrophysics Data System (ADS)
Rakshit, Suvendu; Stalin, C. S.
2017-06-01
We studied the optical variability (OV) of a large sample of narrow-line Seyfert 1 (NLSy1) and broad-line Seyfert 1 (BLSy1) galaxies with z < 0.8 to investigate any differences in their OV properties. Using archival optical V-band light curves from the Catalina Real-Time Transient Survey that span 5-9 years and modeling them with a damped random walk, we estimated the amplitude of variability. We found that NLSy1 galaxies as a class show a lower amplitude of variability than their broad-line counterparts. In the samples of both NLSy1 and BLSy1 galaxies, radio-loud sources are found to have higher variability amplitudes than radio-quiet sources. Considering only sources that are detected in the X-ray band, NLSy1 galaxies are less optically variable than BLSy1 galaxies. The amplitude of variability in both samples is found to be anti-correlated with Fe II strength but correlated with the width of the Hβ line. The well-known anti-correlations of variability with luminosity and with Eddington ratio are present in our data. Among the radio-loud sample, variability amplitude is found to be correlated with radio-loudness and radio power, suggesting that jets also play an important role in the OV of radio-loud objects, in addition to the Eddington ratio, which is the main driving factor of OV in radio-quiet sources.
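A damped random walk is an Ornstein-Uhlenbeck process; the sketch below simulates one at uniform cadence (a simplification, since survey light curves are irregularly sampled) with made-up timescale and amplitude parameters.

```python
import numpy as np

def simulate_drw(n, dt, tau, sigma, rng):
    """Damped random walk: exponential relaxation toward the mean plus
    Gaussian driving noise; the stationary standard deviation is sigma."""
    a = np.exp(-dt / tau)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma)
    for i in range(1, n):
        x[i] = a * x[i - 1] + rng.normal(0.0, sigma * np.sqrt(1.0 - a * a))
    return x

rng = np.random.default_rng(1)
# toy light curve: 5000 daily epochs, 50-day damping timescale, 0.2 mag amplitude
mag = simulate_drw(n=5000, dt=1.0, tau=50.0, sigma=0.2, rng=rng)
```

Fitting tau and sigma to an observed light curve is what yields the variability amplitudes compared between the NLSy1 and BLSy1 classes.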
ERIC Educational Resources Information Center
Rhemtulla, Mijke; Brosseau-Liard, Patricia E.; Savalei, Victoria
2012-01-01
A simulation study compared the performance of robust normal theory maximum likelihood (ML) and robust categorical least squares (cat-LS) methodology for estimating confirmatory factor analysis models with ordinal variables. Data were generated from 2 models with 2-7 categories, 4 sample sizes, 2 latent distributions, and 5 patterns of category…
John, Emily E; Nekouei, Omid; McClure, J T; Cameron, Marguerite; Keefe, Greg; Stryhn, Henrik
2018-06-01
Bulk tank milk (BTM) samples are used to determine the infection status and estimate dairy herd prevalence for bovine leukaemia virus (BLV) using an antibody ELISA assay. BLV ELISA variability between samples from the same herd or from different herds has not been investigated over long time periods. The main objective of this study was to determine the within-herd and between-herd variability of a BTM BLV ELISA assay over 1-month, 3-month, and 3-year sampling intervals. All of the Canadian Maritime region dairy herds (n = 523) that were active in 2013 and 2016 were included (83.9% and 86.9% of total herds in 2013 and 2016, respectively). BLV antibody levels were measured in three BTM samples collected at 1-month intervals in early 2013 as well as two BTM samples collected over a 3-month interval in early 2016. Random-effects models, with fixed effects for sample replicate and province and random effects for herd, were used to estimate the variability between BTM samples from the same herd and between herds for 1-month, 3-month, and 3-year sampling intervals. The majority of variability of BTM BLV ELISA results was seen between herds (1-month, 6.792 ± 0.533; 3-month, 7.806 ± 0.652; 3-year, 6.222 ± 0.528). Unexplained variance between samples from the same herd, on square-root scale, was greatest for the 3-year (0.976 ± 0.104), followed by the 1-month (0.611 ± 0.035) then the 3-month (0.557 ± 0.071) intervals. Variability of BTM antibody levels within the same herd was present but was much smaller than the variability between herds, and was greatest for the 3-year sampling interval. The 3-month sampling interval resulted in the least variability and is appropriate to use for estimating the baseline level of within-herd prevalence for BLV control programs. Knowledge of the baseline variability and within-herd prevalence can help to determine effectiveness of control programs when BTM sampling is repeated at longer intervals. Copyright © 2018 Elsevier B.V. 
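The between-herd versus within-herd split reported above can be illustrated with a crude method-of-moments one-way decomposition, a simple stand-in for the random-effects models actually fitted; the data values and function names are hypothetical.

```python
import statistics

def variance_components(groups):
    """groups: one list of replicate measurements per herd (equal sizes).
    Returns (between-herd, within-herd) variance estimates."""
    k = len(groups[0])
    within = statistics.mean(statistics.variance(g) for g in groups)
    herd_means = [statistics.mean(g) for g in groups]
    between = max(statistics.variance(herd_means) - within / k, 0.0)
    return between, within

# hypothetical square-root-scale BTM ELISA values, 3 replicates per herd
herds = [[10.1, 10.3, 10.2], [4.0, 4.4, 4.2], [7.5, 7.7, 7.6]]
between, within = variance_components(herds)
```

When between-herd variance dominates, as in the study, a single bulk-tank sample per herd already ranks herds reliably, which supports the 3-month sampling interval recommendation.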
All rights reserved.
Pitfalls in statistical landslide susceptibility modelling
NASA Astrophysics Data System (ADS)
Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut
2010-05-01
The use of statistical methods is a well-established approach to predicting landslide occurrence probabilities and assessing landslide susceptibility. This is achieved by applying statistical methods that relate historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest applying a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" inherent in machine learning methods in order to achieve further explanatory insights into the preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding the processes that lead to slope failure, taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate interpretation of model response curves. Model quality assessment evaluates how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement.
In order to assess a possible violation of the assumption of independence in the training samples, or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial autocorrelation. For this purpose, we calculate spline correlograms. In addition, we investigate partial dependence plots and bivariate interaction plots, considering possible interactions between predictors, to improve model interpretation. In presenting this toolbox for model quality assessment, we investigate the influence of strategies for constructing training datasets on statistical model quality.
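The residual check described above uses spline correlograms; as a simpler stand-in, the sketch below scores residual spatial autocorrelation with Moran's I under an inverse-distance weight matrix (my choice of statistic and weights, not the authors').

```python
import numpy as np

def morans_i(resid, coords):
    """Moran's I with inverse-distance weights; near 0 for spatially
    independent residuals, positive when nearby residuals are similar."""
    n = len(resid)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(d)
    np.divide(1.0, d, out=w, where=d > 0)   # zero weight on the diagonal
    z = resid - resid.mean()
    return float(n / w.sum() * (z @ w @ z) / (z @ z))

rng = np.random.default_rng(3)
coords = rng.uniform(0.0, 10.0, (200, 2))
structured = np.sin(coords[:, 0] / 2.0)    # spatially smooth "residuals"
white = rng.normal(0.0, 1.0, 200)          # spatially independent residuals
```

Clearly positive values for model residuals would flag a violated independence assumption or a missing spatially structured predictor.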
Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping
2013-10-01
Coffee is the most heavily consumed beverage in the world after water, and quality is a key consideration in its commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined quickly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near-infrared spectroscopy (NIRS) and chemometrics for quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples spanning a wide range of roast levels were analyzed by NIR, while their caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to provide reference values. Calibration models based on chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples were then developed. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models constructed, the application of second-derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, with a root mean square error of prediction (RMSEP) of 0.378 mg/g, mean relative error of 1.976% and mean relative standard deviation (RSD) of 1.707%.
Thus, the results provided by the high-quality calibration model revealed the feasibility of NIR spectroscopy for at-line application to predict the caffeine content of unknown roasted coffee samples, thanks to the short analysis time of a few seconds and non-destructive advantages of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
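The two headline test-set figures, RMSEP and the correlation coefficient R, can be sketched directly from reference and predicted concentrations; the five sample values below are invented stand-ins, not the paper's data.

```python
import numpy as np

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction on an independent test set."""
    y_ref, y_pred = np.asarray(y_ref), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))

def corr_r(y_ref, y_pred):
    """Pearson correlation between reference and predicted values."""
    return float(np.corrcoef(y_ref, y_pred)[0, 1])

y_ref = [18.2, 19.5, 20.1, 17.8, 21.0]    # hypothetical caffeine, mg/g (HPLC-UV)
y_pred = [18.5, 19.2, 20.4, 17.5, 20.8]   # hypothetical NIR-PLS predictions
```

Reporting both statistics matters: R measures how well predictions track the reference ranking, while RMSEP carries the absolute error in the units of interest (mg/g).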
Extension of latin hypercube samples with correlated variables.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hora, Stephen Curtis; Helton, Jon Craig; Sallaberry, Cedric J. PhD.
2006-11-01
A procedure for extending the size of a Latin hypercube sample (LHS) with rank correlated variables is described and illustrated. The extension procedure starts with an LHS of size m and associated rank correlation matrix C and constructs a new LHS of size 2m that contains the elements of the original LHS and has a rank correlation matrix that is close to the original rank correlation matrix C. The procedure is intended for use in conjunction with uncertainty and sensitivity analysis of computationally demanding models in which it is important to make efficient use of a necessarily limited number of model evaluations.
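For uncorrelated variables, the doubling step can be sketched as follows: each variable's m new values are placed in the m half-strata that the original sample leaves empty, so the combined 2m points form a Latin hypercube again. Restoring the rank correlation matrix C, the heart of the reported procedure, would require an additional reordering step (e.g. Iman-Conover), which is omitted here.

```python
import numpy as np

def extend_lhs(u, rng):
    """u: (m, k) Latin hypercube on [0,1). Returns a (2m, k) Latin
    hypercube whose first m rows are exactly u."""
    m, k = u.shape
    new = np.empty((m, k))
    for j in range(k):
        occupied = set((u[:, j] * 2 * m).astype(int))   # filled half-strata
        empty = np.array([s for s in range(2 * m) if s not in occupied])
        pts = (empty + rng.random(m)) / (2 * m)         # one point per empty stratum
        new[:, j] = rng.permutation(pts)                # random pairing across rows
    return np.vstack([u, new])

rng = np.random.default_rng(7)
m, k = 10, 3
u = (np.arange(m)[:, None] + rng.random((m, k))) / m    # build a size-m LHS
for j in range(k):
    u[:, j] = rng.permutation(u[:, j])
u2 = extend_lhs(u, rng)
```

Keeping the original m rows intact is the point of the procedure: earlier model evaluations are reused, and only m new runs are needed.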
Zhou, Fuqun; Zhang, Aining
2016-01-01
Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2–3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests’ features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data. PMID:27792152
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-05-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability in the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is localization of the covariance estimates. One family of localization techniques is based on taking the Schur (entry-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by discretizing a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
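The Schur-product localization described here can be sketched in a few lines: an ensemble-based sample covariance is tapered entry-wise by a distance-dependent correlation matrix. The exp(-d/L) taper below stands in for the Gaspari-Cohn function usually used, and the grid and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_state, n_ens = 40, 10
grid = np.arange(n_state, dtype=float)
dist = np.abs(grid[:, None] - grid[None, :])

# small ensemble drawn from a covariance with a short true correlation length
truth_cov = np.exp(-dist / 3.0)
ens = rng.multivariate_normal(np.zeros(n_state), truth_cov, size=n_ens)
sample_cov = np.cov(ens, rowvar=False)

loc = np.exp(-dist / 5.0)          # distance-dependent localization matrix
localized = sample_cov * loc       # Schur (entry-wise) product
```

The taper leaves the diagonal (variances) untouched while shrinking the long-range entries, which in a small ensemble are dominated by sampling noise. The multivariate question the paper addresses is how to choose `loc` blocks between different physical state variables, where "distance" is no longer well defined.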
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference applied to receiver operating characteristic (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express the binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov chain Monte Carlo (MCMC) method, carried out with Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach considers model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, the posterior distributions of the parameters, and hence the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the adaptive rejection sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, the ARS requirement is satisfied and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
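The binormal ROC curve that the probit-link parameterization expresses can be sketched in closed form: with intercept a and slope b on the probit scale, TPF = Phi(a + b * Phi^-1(FPF)) and AUC = Phi(a / sqrt(1 + b^2)). These are standard binormal results; the parameter values below are arbitrary examples.

```python
import math
from statistics import NormalDist

phi = NormalDist()  # standard normal CDF / quantile function

def binormal_tpf(a, b, fpf):
    """True-positive fraction of the binormal ROC curve at a given FPF."""
    return phi.cdf(a + b * phi.inv_cdf(fpf))

def binormal_auc(a, b):
    """Area under the binormal ROC curve."""
    return phi.cdf(a / math.sqrt(1.0 + b * b))

auc = binormal_auc(1.5, 1.0)   # arbitrary example parameters
```

In the Bayesian setup, MCMC draws of (a, b) are pushed through `binormal_auc` to obtain a posterior distribution for the AUC rather than a single point estimate.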
Optimal regulation in systems with stochastic time sampling
NASA Technical Reports Server (NTRS)
Montgomery, R. C.; Lee, P. S.
1980-01-01
An optimal control theory that accounts for stochastic variable time sampling in a distributed microprocessor based flight control system is presented. The theory is developed by using a linear process model for the airplane dynamics and the information distribution process is modeled as a variable time increment process where, at the time that information is supplied to the control effectors, the control effectors know the time of the next information update only in a stochastic sense. An optimal control problem is formulated and solved for the control law that minimizes the expected value of a quadratic cost function. The optimal cost obtained with a variable time increment Markov information update process where the control effectors know only the past information update intervals and the Markov transition mechanism is almost identical to that obtained with a known and uniform information update interval.
A new model for ancient DNA decay based on paleogenomic meta-analysis.
Kistler, Logan; Ware, Roselyn; Smith, Oliver; Collins, Matthew; Allaby, Robin G
2017-06-20
The persistence of DNA over archaeological and paleontological timescales in diverse environments has led to a revolutionary body of paleogenomic research, yet the dynamics of DNA degradation are still poorly understood. We analyzed 185 paleogenomic datasets and compared DNA survival with environmental variables and sample ages. We find cytosine deamination follows a conventional thermal age model, but we find no correlation between DNA fragmentation and sample age over the timespans analyzed, even when controlling for environmental variables. We propose a model for ancient DNA decay wherein fragmentation rapidly reaches a threshold, then subsequently slows. The observed loss of DNA over time may be due to a bulk diffusion process in many cases, highlighting the importance of tissues and environments creating effectively closed systems for DNA preservation. This model of DNA degradation is largely based on mammal bone samples due to published genomic dataset availability. Continued refinement to the model to reflect diverse biological systems and tissue types will further improve our understanding of ancient DNA breakdown dynamics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Commeau, Natalie; Cornu, Marie; Albert, Isabelle; Denis, Jean-Baptiste; Parent, Eric
2012-03-01
Assessing within-batch and between-batch variability is of major interest for risk assessors and risk managers in the context of microbiological contamination of food. For example, the ratio between the within-batch variability and the between-batch variability has a large impact on the results of a sampling plan. Here, we designed hierarchical Bayesian models to represent such variability. Compatible priors were built mathematically to obtain sound model comparisons. A numeric criterion is proposed to assess the contamination structure by comparing the ability of the models to replicate grouped data at the batch level using a posterior predictive loss approach. Models were applied to two case studies: contamination by Listeria monocytogenes of pork breast used to produce diced bacon and contamination by the same microorganism on cold smoked salmon at the end of the process. In the first case study, a contamination structure clearly exists and is located at the batch level, that is, between-batch variability is relatively strong, whereas in the second case a structure also exists but is less marked. © 2012 Society for Risk Analysis.
Design of Malaria Diagnostic Criteria for the Sysmex XE-2100 Hematology Analyzer
Campuzano-Zuluaga, Germán; Álvarez-Sánchez, Gonzalo; Escobar-Gallo, Gloria Elcy; Valencia-Zuluaga, Luz Marina; Ríos-Orrego, Alexandra Marcela; Pabón-Vidal, Adriana; Miranda-Arboleda, Andrés Felipe; Blair-Trujillo, Silvia; Campuzano-Maya, Germán
2010-01-01
Thick film, the standard diagnostic procedure for malaria, is not always ordered promptly. A failsafe diagnostic strategy using an XE-2100 analyzer is proposed, and for this strategy, malaria diagnostic models for the XE-2100 were developed and tested for accuracy. Two hundred eighty-one samples were distributed into Plasmodium vivax, P. falciparum, and acute febrile syndrome groups for model construction. Model validation was performed using 60% of malaria cases and a composite control group of samples from AFS and healthy participants from endemic and non-endemic regions. For P. vivax, two observer-dependent models (accuracy = 95.3–96.9%), one non–observer-dependent model using built-in variables (accuracy = 94.7%), and one non–observer-dependent model using new and built-in variables (accuracy = 96.8%) were developed. For P. falciparum, two non–observer-dependent models (accuracies = 85% and 89%) were developed. These models could be used by health personnel or be integrated as a malaria alarm for the XE-2100 to prompt early malaria microscopic diagnosis. PMID:20207864
Innovation Motivation and Artistic Creativity
ERIC Educational Resources Information Center
Joy, Stephen P.
2005-01-01
Innovation motivation is a social learning model of originality comprising two variables: the need to be different and innovation expectancy. This study examined their contribution to artistic creativity in a sample of undergraduates. Participants completed measures of both innovation motivation variables as well as intelligence, adjustment, and…
Validated predictive modelling of the environmental resistome
Amos, Gregory CA; Gozzard, Emma; Carter, Charlotte E; Mead, Andrew; Bowes, Mike J; Hawkey, Peter M; Zhang, Lihong; Singer, Andrew C; Gaze, William H; Wellington, Elizabeth M H
2015-01-01
Multi-drug-resistant bacteria pose a significant threat to public health. The role of the environment in the overall rise in antibiotic-resistant infections and risk to humans is largely unknown. This study aimed to evaluate drivers of antibiotic-resistance levels across the River Thames catchment, model key biotic, spatial and chemical variables and produce predictive models for future risk assessment. Sediment samples from 13 sites across the River Thames basin were taken at four time points across 2011 and 2012. Samples were analysed for class 1 integron prevalence and enumeration of third-generation cephalosporin-resistant bacteria. Class 1 integron prevalence was validated as a molecular marker of antibiotic resistance; levels of resistance showed significant geospatial and temporal variation. The main explanatory variables of resistance levels at each sample site were the number, proximity, size and type of surrounding wastewater-treatment plants. Model 1 revealed treatment plants accounted for 49.5% of the variance in resistance levels. Other contributing factors were extent of different surrounding land cover types (for example, Neutral Grassland), temporal patterns and prior rainfall; when modelling all variables the resulting model (Model 2) could explain 82.9% of variations in resistance levels in the whole catchment. Chemical analyses correlated with key indicators of treatment plant effluent and a model (Model 3) was generated based on water quality parameters (contaminant and macro- and micro-nutrient levels). Model 2 was beta tested on independent sites and explained over 78% of the variation in integron prevalence showing a significant predictive ability. We believe all models in this study are highly useful tools for informing and prioritising mitigation strategies to reduce the environmental resistome. PMID:25679532
Ferguson, Kristin M; Bender, Kimberly; Thompson, Sanna J
2015-06-01
This study examined gender differences among homeless young adults' coping strategies and homelessness stressors as they relate to legal (e.g., full-time employment, selling personal possessions, selling blood/plasma) and illegal economic activity (e.g., selling drugs, theft, prostitution). A sample of 601 homeless young adults was recruited from 3 cities (Los Angeles, CA [n = 200], Austin, TX [n = 200], and Denver, CO [n = 201]) to participate in semi-structured interviews from March 2010 to July 2011. Risk and resilience correlates of legal and illegal economic activity were analyzed using six Ordinary Least Squares regression models with the full sample and with the female and male sub-samples. In the full sample, three variables (i.e., avoidant coping, problem-focused coping, and mania) were associated with legal income generation whereas eight variables (i.e., social coping, age, arrest history, transience, peer substance use, antisocial personality disorder [ASPD], substance use disorder [SUD], and major depressive episode [MDE]) were associated with illegal economic activity. In the female sub-sample, three variables (i.e., problem-focused coping, race/ethnicity, and transience) were correlated with legal income generation whereas six variables (i.e., problem-focused coping, social coping, age, arrest history, peer substance use, and ASPD) were correlated with illegal economic activity. Among males, the model depicting legal income generation was not significant yet seven variables (i.e., social coping, age, transience, peer substance use, ASPD, SUD, and MDE) were associated with illegal economic activity. Understanding gender differences in coping strategies and economic activity might help customize interventions aimed at safe and legal income generation for this population. Copyright © 2015 Elsevier Ltd. All rights reserved.
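As a generic illustration of the kind of OLS model described above, ordinary least squares can be fit directly with NumPy. The covariate names and data below are invented stand-ins for the study's variables, not its actual measurements:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 601                                  # full sample size, as in the study
# Invented coping/risk covariates (standardised)
avoidant = rng.normal(size=n)
problem_focused = rng.normal(size=n)
mania = rng.normal(size=n)
# Simulated outcome: legal income generation score with assumed true effects
legal_income = (1.0 + 0.4 * avoidant + 0.3 * problem_focused
                + 0.2 * mania + rng.normal(0, 1.0, n))

# Design matrix with intercept; solve the least-squares problem directly
X = np.column_stack([np.ones(n), avoidant, problem_focused, mania])
beta, *_ = np.linalg.lstsq(X, legal_income, rcond=None)
```

With n = 601 the coefficient estimates recover the assumed effects closely; in practice one would also report standard errors and model significance, as the study does.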
A Decision Tree for Nonmetric Sex Assessment from the Skull.
Langley, Natalie R; Dudzik, Beatrix; Cloutier, Alesia
2018-01-01
This study uses five well-documented cranial nonmetric traits (glabella, mastoid process, mental eminence, supraorbital margin, and nuchal crest) and one additional trait (zygomatic extension) to develop a validated decision tree for sex assessment. The decision tree was built and cross-validated on a sample of 293 U.S. White individuals from the William M. Bass Donated Skeletal Collection. Ordinal scores from the six traits were analyzed using the partition modeling option in JMP Pro 12. A holdout sample of 50 skulls was used to test the model. The most accurate decision tree includes three variables: glabella, zygomatic extension, and mastoid process. This decision tree yielded 93.5% accuracy on the training sample, 94% on the cross-validated sample, and 96% on a holdout validation sample. Linear weighted kappa statistics indicate acceptable agreement among observers for these variables. Mental eminence should be avoided, and definitions and figures should be referenced carefully to score nonmetric traits. © 2017 American Academy of Forensic Sciences.
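A minimal sketch of a decision-tree classifier over ordinal trait scores, in the spirit of the partition model described above. The traits, the score-generating rule, and the data are all simulated for illustration; the study itself used JMP Pro:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 300
# Simulated ordinal scores (1-5) for three traits, e.g. glabella,
# zygomatic extension, mastoid process (assignment is hypothetical)
X = rng.integers(1, 6, size=(n, 3))
# Invented rule: higher summed robusticity scores tend toward "male" (1)
y = (X.sum(axis=1) + rng.normal(0, 1, n) > 9).astype(int)

# A shallow tree keeps the decision rules interpretable, as in the study
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
acc = tree.score(X, y)
```

Capping the depth mirrors the appeal of the published tree: a few ordinal splits that an observer can apply by hand.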
NASA Astrophysics Data System (ADS)
Yan, Wen-juan; Yang, Ming; He, Guo-quan; Qin, Lin; Li, Gang
2014-11-01
In order to identify diabetic patients from the tongue's near-infrared (NIR) spectrum, a spectral classification model of the NIR reflectivity of the tongue tip is proposed, based on the partial least squares (PLS) method. Thirty-nine samples of tongue-tip NIR spectra were collected from healthy people and from diabetic patients, respectively. After pretreatment of the reflectivity, the spectral data were set as the independent variable matrix and the classification information as the dependent variable matrix. The samples were divided into two groups, 53 samples as the calibration set and 25 as the prediction set, and PLS was used to build the classification model. The model constructed from the 53 calibration samples had a correlation of 0.9614 and a root mean square error of cross-validation (RMSECV) of 0.1387. Predictions for the 25 samples had a correlation of 0.9146 and an RMSECV of 0.2122. The experimental results show that the PLS method can achieve good classification between healthy people and diabetic patients.
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses.
Liu, Ruijie; Holik, Aliaksei Z; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E; Asselin-Labat, Marie-Liesse; Smyth, Gordon K; Ritchie, Matthew E
2015-09-03
Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
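The core idea of down-weighting observations from more variable samples can be illustrated with a hand-rolled inverse-variance weighted least-squares fit. This is only a conceptual sketch of the weighting principle, not the actual limma/voom implementation, and the sample variances are assumed known here rather than estimated:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones(n), group])      # intercept + group effect
sigma = np.array([1, 1, 1, 4, 1, 1, 1, 1.0])  # one high-variation sample
y = X @ np.array([5.0, 2.0]) + rng.normal(0, sigma)

# Inverse-variance weights shrink the noisy sample's influence
w = 1.0 / sigma**2
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

The noisy sample is retained, so no information is discarded, but its leverage on the estimated group effect is reduced by a factor of sixteen relative to the clean samples.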
Kinetics of phase transformation in glass forming systems
NASA Technical Reports Server (NTRS)
Ray, Chandra S.
1994-01-01
The objectives of this research were to (1) develop computer models for realistic simulations of nucleation and crystal growth in glasses, with the flexibility to accommodate the different variables related to sample characteristics and experimental conditions, and (2) design and perform nucleation and crystallization experiments using calorimetric measurements, such as differential scanning calorimetry (DSC) and differential thermal analysis (DTA), to verify these models. The variables related to sample characteristics mentioned in (1) include the size of the glass particles, nucleating agents, and the relative concentration of surface and internal nuclei. A change in any of these variables changes the mode of the transformation (crystallization) kinetics. Variations in experimental conditions include isothermal and nonisothermal DSC/DTA measurements. This research is expected to lead to improved, more realistic methods for analyzing DSC/DTA peak profiles to determine the kinetic parameters of nucleation and crystal growth, as well as to an assessment of the relative merits and shortcomings of the thermoanalytical models presently used to study phase transformation in glasses.
Nunes, Karen M; Andrade, Marcus Vinícius O; Santos Filho, Antônio M P; Lasmar, Marcelo C; Sena, Marcelo M
2016-08-15
Concerns about meat authenticity have grown recently in the wake of major fraud scandals. This paper analysed real samples (43 adulterated and 12 controls) originating from criminal networks dismantled by the Brazilian Police. The fraud consisted of injecting solutions of non-meat ingredients (NaCl, phosphates, carrageenan, maltodextrin) into bovine meat to increase its water-holding capacity. Five physico-chemical variables were determined: protein, ash, chloride, sodium, and phosphate. Additionally, infrared spectra were recorded. Supervised PLS-DA classification models were built with each data set individually, but the best model was obtained with data fusion, correctly detecting 91% of the adulterated samples. From this model, a variable selection based on the highest VIP scores was performed and a new data fusion model was built with only one chemical variable, providing slightly lower predictions but a good cost/performance ratio. Finally, some of the selected infrared bands were specifically associated with the presence of the adulterants NaCl, tripolyphosphate and carrageenan. Copyright © 2016 Elsevier Ltd. All rights reserved.
When Is Rapid On-Site Evaluation Cost-Effective for Fine-Needle Aspiration Biopsy?
Schmidt, Robert L.; Walker, Brandon S.; Cohen, Michael B.
2015-01-01
Background Rapid on-site evaluation (ROSE) can improve adequacy rates of fine-needle aspiration biopsy (FNAB) but increases operational costs. The performance of ROSE relative to fixed sampling depends on many factors. It is not clear when ROSE is less costly than sampling with a fixed number of needle passes. The objective of this study was to determine the conditions under which ROSE is less costly than fixed sampling. Methods Cost comparison of sampling with and without ROSE using mathematical modeling. Models were based on a societal perspective and used a mechanistic, micro-costing approach. Sampling policies (ROSE, fixed) were compared using the difference in total expected costs per case. Scenarios were based on procedure complexity (palpation-guided or image-guided), adequacy rates (low, high) and sampling protocols (stopping criteria for ROSE and fixed sampling). One-way, probabilistic, and scenario-based sensitivity analysis was performed to determine which variables had the greatest influence on the cost difference. Results ROSE is favored relative to fixed sampling under the following conditions: (1) the cytologist is accurate, (2) the total variable cost ($/hr) is low, (3) fixed costs ($/procedure) are high, (4) the setup time is long, (5) the time between needle passes for ROSE is low, (6) when the per-pass adequacy rate is low, and (7) ROSE stops after observing one adequate sample. The model is most sensitive to variation in the fixed cost, the per-pass adequacy rate, and the time per needle pass with ROSE. Conclusions Mathematical modeling can be used to predict the difference in cost between sampling with and without ROSE. PMID:26317785
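A toy expected-cost comparison makes the trade-off concrete: a ROSE policy that stops after the first adequate pass incurs a geometric number of passes (expected 1/p for per-pass adequacy rate p), while a fixed policy always performs its full protocol. All rates and costs below are invented, and the model ignores the fixed and setup costs that the study shows can tip the balance:

```python
p = 0.6            # per-pass adequacy rate (invented)
c_pass = 50.0      # variable cost per needle pass, $ (invented)
c_rose = 80.0      # extra per-pass cost of on-site evaluation, $ (invented)
fixed_passes = 3   # fixed sampling protocol: always three passes

# ROSE stopping after one adequate sample: passes ~ Geometric(p)
expected_rose_passes = 1.0 / p
cost_rose = expected_rose_passes * (c_pass + c_rose)
cost_fixed = fixed_passes * c_pass
```

Under these particular numbers ROSE is the more expensive policy (about $216.67 versus $150.00); raising the per-pass cost of fixed sampling or lowering the adequacy rate reverses the ordering, consistent with the conditions listed in the Results.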
Cruz-Motta, Juan José; Miloslavich, Patricia; Palomo, Gabriela; Iken, Katrin; Konar, Brenda; Pohle, Gerhard; Trott, Tom; Benedetti-Cecchi, Lisandro; Herrera, César; Hernández, Alejandra; Sardi, Adriana; Bueno, Andrea; Castillo, Julio; Klein, Eduardo; Guerra-Castro, Edlin; Gobin, Judith; Gómez, Diana Isabel; Riosmena-Rodríguez, Rafael; Mead, Angela; Bigatti, Gregorio; Knowlton, Ann; Shirayama, Yoshihisa
2010-01-01
Assemblages associated with intertidal rocky shores were examined for large scale distribution patterns with specific emphasis on identifying latitudinal trends of species richness and taxonomic distinctiveness. Seventy-two sites distributed around the globe were evaluated following the standardized sampling protocol of the Census of Marine Life NaGISA project (www.nagisa.coml.org). There were no clear patterns of standardized estimators of species richness along latitudinal gradients or among Large Marine Ecosystems (LMEs); however, a strong latitudinal gradient in taxonomic composition (i.e., proportion of different taxonomic groups in a given sample) was observed. Environmental variables related to natural influences were strongly related to the distribution patterns of the assemblages on the LME scale, particularly photoperiod, sea surface temperature (SST) and rainfall. In contrast, no environmental variables directly associated with human influences (with the exception of the inorganic pollution index) were related to assemblage patterns among LMEs. Correlations of the natural assemblages with either latitudinal gradients or environmental variables were equally strong suggesting that neither neutral models nor models based solely on environmental variables sufficiently explain spatial variation of these assemblages at a global scale. Despite the data shortcomings in this study (e.g., unbalanced sample distribution), we show the importance of generating biological global databases for the use in large-scale diversity comparisons of rocky intertidal assemblages to stimulate continued sampling and analyses. PMID:21179546
Sampling of the Diurnal Cycle of Precipitation using TRMM
NASA Technical Reports Server (NTRS)
Negri, Andrew J.; Bell, Thomas L.; Xu, Li-Ming; Starr, David OC. (Technical Monitor)
2001-01-01
We examine the temporal sampling of tropical regions using observations from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) and Precipitation Radar (PR). We conclude that PR estimates at any one hour, even using three years of data, are inadequate to describe the diurnal cycle of precipitation over regions smaller than 12 degrees, due to high spatial variability in sampling. We show that the optimum period of accumulation is four hours. Diurnal signatures display half as much sampling error when averaged over four hours of local time. A similar pattern of sampling variability is found in the TMI data, despite the TMI's wider swath and increased sampling. These results are verified using an orbital model. The sensitivity of the sampling to satellite altitude is presented, as well as sampling patterns at the new TRMM altitude of 402.5 km.
Miñano Pérez, Pablo; Castejón Costa, Juan-Luis; Gilar Corbí, Raquel
2012-03-01
As a result of studies examining factors involved in the learning process, various structural models have been developed to explain the direct and indirect effects that occur between the variables in these models. The objective was to evaluate a structural model of cognitive and motivational variables predicting academic achievement, including general intelligence, academic self-concept, goal orientations, effort and learning strategies. The sample comprised 341 Spanish students in the first year of compulsory secondary education. Different tests and questionnaires were used to evaluate each variable, and Structural Equation Modelling (SEM) was applied to test the relationships in the initial model. The proposed model had a satisfactory fit, and all the hypothesised relationships were significant. General intelligence was the variable best able to explain academic achievement. Also important was the direct influence of academic self-concept on achievement, goal orientations and effort, as well as the mediating role of effort and learning strategies between academic goals and final achievement.
Dixon, Kristiana J; Edwards, Katie M; Gidycz, Christine A
2016-10-01
Previous research has examined the association between intimate partner violence (IPV) victimization experiences and investment model variables, particularly in relation to leaving intentions. However, research has only begun to explore the impact that various dyadic patterns of IPV (i.e., unidirectional victimization, unidirectional perpetration, bidirectional violence, and non-violence) have on investment model variables. Grounded in behavioral principles, the current study used a sample of college women to assess the impact that perpetration and victimization have on investment model variables. Results indicated that 69.2% of the sample was in a relationship with no IPV. Among those who reported IPV in their relationships, 11.9% reported unidirectional perpetration, 10.6% bidirectional violence, and 7.4% unidirectional victimization. Overall, the findings suggest that women's victimization (i.e., victim only and bidirectional IPV) is associated with lower levels of satisfaction and commitment, and that women's perpetration (i.e., perpetration only and bidirectional IPV) is associated with higher levels of investment. Women in bidirectionally violent relationships reported higher quality alternatives than women in non-violent relationships. The current study emphasizes the importance of considering both IPV perpetration and IPV victimization experiences when exploring women's decisions to remain in relationships. © The Author(s) 2015.
A discriminant function model for admission at undergraduate university level
NASA Astrophysics Data System (ADS)
Ali, Hamdi F.; Charbaji, Abdulrazzak; Hajj, Nada Kassim
1992-09-01
The study is aimed at predicting objective criteria based on a statistically tested model for admitting undergraduate students to Beirut University College. The University is faced with a dual problem of having to select only a fraction of an increasing number of applicants, and of trying to minimize the number of students placed on academic probation (currently 36 percent of new admissions). Out of 659 new students, a sample of 272 students (45 percent) were selected; these were all the students on the Dean's list and on academic probation. With academic performance as the dependent variable, the model included ten independent variables and their interactions. These variables included the type of high school, the language of instruction in high school, recommendations, sex, academic average in high school, score on the English Entrance Examination, the major in high school, and whether the major was originally applied for by the student. Discriminant analysis was used to evaluate the relative weight of the independent variables, and from the analysis three equations were developed, one for each academic division in the College. The predictive power of these equations was tested by using them to classify students not in the selected sample into successful and unsuccessful ones. Applicability of the model to other institutions of higher learning is discussed.
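A linear discriminant analysis along the lines described can be sketched with scikit-learn. The two predictors and the success rule below are simulated stand-ins for the study's ten admission variables:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n = 272                               # selected sample size, as in the study
# Invented predictors: standardised high-school average and entrance-exam score
X = rng.normal(size=(n, 2))
# Simulated outcome: success depends on a weighted combination plus noise
y = (0.8 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

# The discriminant coefficients play the role of the relative weights
# the study derived for its independent variables
lda = LinearDiscriminantAnalysis().fit(X, y)
acc = lda.score(X, y)
```

As in the study, the practical test of such a function is classifying students outside the training sample into successful and unsuccessful groups; here only in-sample accuracy is shown.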
Kirby, James B.; Bollen, Kenneth A.
2009-01-01
Structural Equation Modeling with latent variables (SEM) is a powerful tool for social and behavioral scientists, combining many of the strengths of psychometrics and econometrics into a single framework. The most common estimator for SEM is the full-information maximum likelihood estimator (ML), but there is continuing interest in limited information estimators because of their distributional robustness and their greater resistance to structural specification errors. However, the literature discussing model fit for limited information estimators for latent variable models is sparse compared to that for full information estimators. We address this shortcoming by providing several specification tests based on the 2SLS estimator for latent variable structural equation models developed by Bollen (1996). We explain how these tests can be used to not only identify a misspecified model, but to help diagnose the source of misspecification within a model. We present and discuss results from a Monte Carlo experiment designed to evaluate the finite sample properties of these tests. Our findings suggest that the 2SLS tests successfully identify most misspecified models, even those with modest misspecification, and that they provide researchers with information that can help diagnose the source of misspecification. PMID:20419054
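A generic two-stage least squares estimator (not Bollen's latent-variable 2SLS itself) can be hand-rolled in a few lines to show why instrumenting removes the endogeneity bias that plain OLS suffers. The data-generating process is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)                 # instrument, independent of the error
u = rng.normal(size=n)                 # unobserved confounder
x = z + u + rng.normal(size=n)         # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)   # structural equation, true slope = 2

# Stage 1: project x on z; Stage 2: regress y on the fitted values
x_hat = z * (z @ x) / (z @ z)
b_2sls = (x_hat @ y) / (x_hat @ x_hat)
b_ols = (x @ y) / (x @ x)              # biased upward by the confounder
```

The OLS slope converges to about 2.33 here (2 plus cov(x, u)/var(x)), while the 2SLS slope is consistent for the true value of 2; the same logic underlies using instruments for latent-variable equations.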
ERIC Educational Resources Information Center
Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel
2012-01-01
In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…
Comparing adult users of public and private dental services in the state of Minas Gerais, Brazil.
Pinto, Rafaela da Silveira; de Abreu, Mauro Henrique Nogueira Guimarães; Vargas, Andrea Maria Duarte
2014-08-06
Studying the factors associated with the use of dental services can provide the knowledge needed to understand why individuals seek out public healthcare services and to formulate public policies better suited to present-day reality. This work was a cross-sectional epidemiological study of a sample of adults drawn from a research databank on the oral health conditions of the population of the state of Minas Gerais, Brazil. The study examined both the main oral health disorders and relevant socioeconomic aspects. The dependent variable was defined as the type of service used, categorized as public or private. The independent variables were selected and grouped for insertion in the analysis model according to an adaptation of the behavioral model described by Andersen and Davidson. A hierarchical model was used to analyze the data. Variable descriptions and bivariate analyses were performed to verify possible associations. For each group of variables at each hierarchical level, the crude and adjusted odds ratios (OR) and the respective 95% confidence intervals (CI) were estimated by means of logistic regression. The Complex Samples module of the SPSS statistics program, version 19.0, was used to account for the sample design. In the final model, the factors associated with the use of public healthcare services by adults were directly related to the socioeconomic and demographic conditions of the individuals, including: being of a dark-skinned black race/color, belonging to families with more than four household residents and with a lower income level, residing in small towns, and having more teeth in need of treatment. According to the findings from this study, socioeconomic and demographic factors, as well as normative treatment needs, are associated with the use of public dental services.
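The hierarchical odds-ratio estimation described above can be sketched, in simplified form, as a logistic regression fit and an odds ratio read off from a coefficient. The covariates and effect sizes are invented; the study's actual analysis used SPSS Complex Samples with survey weights, which this sketch omits:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
n = 800
low_income = rng.integers(0, 2, n)      # invented binary covariates
small_town = rng.integers(0, 2, n)
# Simulated outcome: public service use, more likely under both conditions
logit = -0.5 + 0.9 * low_income + 0.7 * small_town
uses_public = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([low_income, small_town])
# C large -> effectively unpenalised, so coefficients estimate log-odds ratios
model = LogisticRegression(C=1e6).fit(X, uses_public)
or_low_income = float(np.exp(model.coef_[0, 0]))
```

With the assumed coefficient of 0.9, the estimated odds ratio for low income should land near exp(0.9) ≈ 2.5; a hierarchical analysis would enter variable groups level by level, adjusting each level for the ones above it.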
Mechor, G D; Gröhn, Y T; McDowell, L R; Van Saun, R J
1992-11-01
The effects of temperature and colostrum components on specific gravity in bovine colostrum were investigated. Thirty-nine first milking colostrum samples were collected from Holstein cows. The samples were assayed for alpha-tocopherol, fat, protein, total solids, and IgG. The concentrations of total solids, total protein, total IgG, and fat in colostrum were 26.6, 12.5, 3.7, and 9.4 g/100 g, respectively. A range of 1.8 to 24.7 micrograms/ml for alpha-tocopherol was measured in the colostrum samples. Specific gravity of the colostrum was measured using a hydrometer in increments of 5 degrees C from 0 to 40 degrees C. Specific gravity explained 76% of the variation in colostral total IgG at a colostrum temperature of 20 degrees C. The regression model was improved only slightly with the addition of protein, fat, and total solids. The model for samples at 20 degrees C was IgG (milligrams per milliliter) = 958 x (specific gravity) - 969. Measurement of specific gravity at variable temperatures necessitated inclusion of temperature in the model for estimation of IgG. Inclusion of the other components of colostrum into the model slightly improved the fit. The regression model for samples at variable temperatures was as follows: IgG (milligrams per milliliter) = 853 x (specific gravity) + .4 x temperature (Celsius degrees) - 866.
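The two quoted regression equations can be written out directly as functions; plugging in a specific gravity of 1.05 gives roughly 37 mg/mL of IgG under either model:

```python
def igg_at_20c(sg):
    """IgG (mg/mL) from colostral specific gravity measured at 20 degrees C,
    per the first regression model quoted above."""
    return 958.0 * sg - 969.0

def igg_variable_temp(sg, temp_c):
    """IgG (mg/mL) when specific gravity is read at an arbitrary temperature,
    per the second regression model, which adds a temperature term."""
    return 853.0 * sg + 0.4 * temp_c - 866.0

# Example: a hydrometer reading of 1.050
igg_20 = igg_at_20c(1.050)                 # about 36.9 mg/mL
igg_t = igg_variable_temp(1.050, 20.0)     # about 37.65 mg/mL at 20 C
```

Note that the two models give slightly different estimates even at 20 °C, since each was fit to its own subset of the measurements.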
Data Combination and Instrumental Variables in Linear Models
ERIC Educational Resources Information Center
Khawand, Christopher
2012-01-01
Instrumental variables (IV) methods allow for consistent estimation of causal effects, but suffer from poor finite-sample properties and data availability constraints. IV estimates also tend to have relatively large standard errors, often inhibiting the interpretability of differences between IV and non-IV point estimates. Lastly, instrumental…
Berrozpe, Pablo; Lamattina, Daniela; Santini, María Soledad; Araujo, Analía Vanesa; Utgés, María Eugenia; Salomón, Oscar Daniel
2017-10-01
Visceral leishmaniasis (VL) is an endemic disease in northeastern Argentina including the Corrientes province, where the presence of the vector and canine cases of VL were recently confirmed in December 2008. The objective of this study was to assess the modelling of micro- and macro-habitat variables to evaluate the urban environmental suitability for the spatial distribution of Lutzomyia longipalpis presence and abundance in an urban scenario. Sampling of 45 sites distributed throughout Corrientes city (Argentina) was carried out using REDILA-BL minilight traps in December 2013. The sampled specimens were identified according to methods described by Galati (2003). The analysis of variables derived from the processing of satellite images (macro-habitat variables) and from the entomological sampling and surveys (micro-habitat variables) was performed using the statistical software R. Three generalised linear models were constructed composed of micro- and macro-habitat variables to explain the spatial distribution of the abundance of Lu. longipalpis and one composed of micro-habitat variables to explain the occurrence of the vector. A total of 609 phlebotominae belonging to five species were collected, of which 56% were Lu. longipalpis. In addition, the presence of Nyssomyia neivai and Migonemya migonei, which are vectors of tegumentary leishmaniasis, were also documented and represented 34.81% and 6.74% of the collections, respectively. The explanatory variable normalised difference vegetation index (NDVI) described the abundance distribution, whereas the presence of farmyard animals was important for explaining both the abundance and the occurrence of the vector. The results contribute to the identification of variables that can be used to establish priority areas for entomological surveillance and provide an efficient transfer tool for the control and prevention of vector-borne diseases.
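Counts of trapped phlebotominae are a natural fit for a Poisson generalised linear model of the kind described. The study's models were built in R; the sketch below is an equivalent Python illustration with 45 simulated trap sites and invented NDVI and farmyard-animal covariates:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(8)
n = 45                                   # 45 sampled sites, as in the study
ndvi = rng.uniform(0.0, 0.8, n)          # invented NDVI values per site
farmyard = rng.integers(0, 2, n)         # invented farmyard-animal presence
# Simulated abundance: log-linear in both covariates (assumed effects)
mu = np.exp(0.5 + 2.0 * ndvi + 1.0 * farmyard)
counts = rng.poisson(mu)

X = np.column_stack([ndvi, farmyard])
glm = PoissonRegressor(alpha=0.0).fit(X, counts)  # unpenalised Poisson GLM
```

Positive fitted coefficients for both covariates would mirror the study's finding that NDVI and farmyard animals explain vector abundance; a separate binomial GLM would model presence/absence.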
Zugck, C; Krüger, C; Kell, R; Körber, S; Schellberg, D; Kübler, W; Haass, M
2001-10-01
The performance of a US-American scoring system (Heart Failure Survival Score, HFSS) was prospectively evaluated in a sample of ambulatory patients with congestive heart failure (CHF). Additionally, it was investigated whether the HFSS might be simplified by assessing the distance ambulated during a 6-min walk test (6'WT) instead of determining peak oxygen uptake (peak VO(2)). In 208 middle-aged CHF patients (age 54 ± 10 years, 82% male, NYHA class 2.3 ± 0.7; follow-up 28 ± 14 months), the seven variables of the HFSS were determined: CHF aetiology; heart rate; mean arterial pressure; serum sodium concentration; intraventricular conduction time; left ventricular ejection fraction (LVEF); and peak VO(2). Additionally, a 6'WT was performed. The HFSS allowed discrimination between patients at low, medium and high risk, with mortality rates of 16, 39 and 50%, respectively. However, the prognostic power of the HFSS was not superior to a two-variable model consisting only of LVEF and peak VO(2). The areas under the receiver operating characteristic curves (AUC) for prediction of 1-year survival were even higher for the two-variable model (0.84 vs. 0.74, P<0.05). Replacing peak VO(2) with the 6'WT resulted in a similar AUC (0.83). The HFSS continued to predict survival when applied to this patient sample. However, it was inferior to a two-variable model containing only LVEF and either peak VO(2) or the 6'WT. As the 6'WT requires no sophisticated equipment, a simplified two-variable model containing only LVEF and the 6'WT may be more widely applicable, and is therefore recommended.
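A two-variable prognostic model of the kind compared against the HFSS can be sketched as a logistic regression scored by ROC AUC. The LVEF and walk-distance values below are simulated (standardised, with assumed effect sizes), not the study's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 208                                # cohort size, as in the study
lvef = rng.normal(size=n)              # standardised LVEF (invented)
walk = rng.normal(size=n)              # standardised 6-min walk distance
# Simulated outcome: lower LVEF and shorter walk raise mortality risk
logit = -0.8 * lvef - 1.0 * walk
death = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([lvef, walk])
model = LogisticRegression().fit(X, death)
auc = roc_auc_score(death, model.predict_proba(X)[:, 1])
```

The AUC here plays the role of the areas under the receiver operating characteristic curves reported in the abstract; a proper evaluation would compute it on held-out patients rather than the training sample.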
NASA Astrophysics Data System (ADS)
García, Mariano; Saatchi, Sassan; Ustin, Susan; Balzter, Heiko
2018-04-01
Spatially-explicit information on forest structure is paramount to estimating aboveground carbon stocks for designing sustainable forest management strategies and mitigating greenhouse gas emissions from deforestation and forest degradation. LiDAR measurements provide samples of forest structure that must be integrated with satellite imagery to predict and to map landscape scale variations of forest structure. Here we evaluate the capability of existing satellite synthetic aperture radar (SAR) with multispectral data to estimate forest canopy height over five study sites across two biomes in North America, namely temperate broadleaf and mixed forests and temperate coniferous forests. Pixel size affected the modelling results, with an improvement in model performance as pixel resolution coarsened from 25 m to 100 m. Likewise, the sample size was an important factor in the uncertainty of height prediction using the Support Vector Machine modelling approach. Larger sample size yielded better results but the improvement stabilised when the sample size reached approximately 10% of the study area. We also evaluated the impact of surface moisture (soil and vegetation moisture) on the modelling approach. Whereas the impact of surface moisture had a moderate effect on the proportion of the variance explained by the model (up to 14%), its impact was more evident in the bias of the models with bias reaching values up to 4 m. Averaging the incidence angle corrected radar backscatter coefficient (γ°) reduced the impact of surface moisture on the models and improved their performance at all study sites, with R2 ranging between 0.61 and 0.82, RMSE between 2.02 and 5.64 and bias between 0.02 and -0.06, respectively, at 100 m spatial resolution. 
An evaluation of the relative importance of the variables in model performance showed that, for the study sites within the temperate broadleaf and mixed forests biome, ALOS-PALSAR HV-polarised backscatter was the most important variable, with Landsat Tasselled Cap Transformation components barely contributing to the models for two of the study sites while contributing significantly at the third. Over the temperate conifer forests, Landsat Tasselled Cap variables contributed more than the ALOS-PALSAR HV band to predicting landscape height variability. In all cases, incorporating multispectral data improved the retrieval of forest canopy height and reduced the estimation uncertainty for tall forests. Finally, we concluded that models trained at one study site had higher uncertainty when applied to other sites, but that a model developed from multiple sites performed as well as site-specific models in predicting forest canopy height. This result suggests that a biome-level model developed from several study sites can be used as a reliable estimator of biome-level forest structure from existing satellite imagery.
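A Support Vector Machine regression of canopy height on a radar and a multispectral predictor, in the spirit of the modelling above, can be sketched with scikit-learn. All values are simulated; the γ° backscatter and Tasselled Cap numbers are invented, and the true relationship is assumed linear plus noise:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(6)
n = 500
backscatter = rng.normal(-12.0, 3.0, n)   # invented HV gamma0 values (dB)
tc_green = rng.normal(0.3, 0.1, n)        # invented Tasselled Cap greenness
# Simulated canopy height (m) driven by both predictors plus noise
height = 1.5 * (backscatter + 12.0) + 20.0 * tc_green + rng.normal(0, 2, n)

X = np.column_stack([backscatter, tc_green])
X = (X - X.mean(0)) / X.std(0)            # standardise for the RBF kernel
svr = SVR(kernel="rbf", C=10.0).fit(X, height)
r2 = svr.score(X, height)
```

Standardising the predictors matters because the RBF kernel is scale-sensitive; the study's equivalents of R², RMSE and bias would be computed on independent LiDAR samples rather than in-sample.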
Ouyang, Liwen; Apley, Daniel W; Mehrotra, Sanjay
2016-04-01
Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time consuming, which limits the number of chart reviews in a large database. The authors' objective is to develop a new design-of-experiments-based systematic chart validation and review (DSCVR) approach that is more powerful than the random validation sampling used in existing approaches. The DSCVR approach judiciously and efficiently selects the cases to validate (i.e., validate whether the response values are correct for those cases) for maximum information content, based only on their predictor variable values. The final predictive model will be fit using only the validation sample, ignoring the remainder of the unvalidated and unreliable error-prone data. A Fisher information based D-optimality criterion is used, and an algorithm for optimizing it is developed. The authors' method is tested in a simulation comparison that is based on a sudden cardiac arrest case study with 23 041 patients' records. This DSCVR approach, using the Fisher information based D-optimality criterion, results in a fitted model with much better predictive performance, as measured by the receiver operating characteristic curve and the accuracy in predicting whether a patient will experience the event, than a model fitted using a random validation sample. The simulation comparisons demonstrate that this DSCVR approach can produce predictive models that are significantly better than those produced from random validation sampling, especially when the event rate is low. © The Author 2015. 
Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
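The D-optimality idea in the abstract above, choosing which records to validate so that the information matrix of the predictors is maximised, can be illustrated with a greedy sketch. This is not the authors' DSCVR algorithm: the candidate matrix and ridge term are invented, and the criterion below is the linear one (at an initial coefficient estimate of zero, the logistic Fisher information is proportional to X'X, so the same determinant criterion applies up to a constant).

```python
import numpy as np

rng = np.random.default_rng(0)
# Candidate records: intercept plus three predictor columns (illustrative data).
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])

def greedy_d_optimal(X, n_select, ridge=1e-6):
    """Greedily add the row that most increases log-det of the information
    matrix M = X_s' X_s (ridge-stabilised so the first picks are defined)."""
    chosen = []
    M = ridge * np.eye(X.shape[1])
    for _ in range(n_select):
        best_i, best_logdet = -1, -np.inf
        for i in range(X.shape[0]):
            if i in chosen:
                continue
            logdet = np.linalg.slogdet(M + np.outer(X[i], X[i]))[1]
            if logdet > best_logdet:
                best_i, best_logdet = i, logdet
        chosen.append(best_i)
        M += np.outer(X[best_i], X[best_i])
    return chosen, M

chosen, M_opt = greedy_d_optimal(X, 20)
# Compare against an arbitrary validation subset of the same size.
M_rand = X[:20].T @ X[:20] + 1e-6 * np.eye(4)
logdet_opt = np.linalg.slogdet(M_opt)[1]
logdet_rand = np.linalg.slogdet(M_rand)[1]
```

The optimised subset carries more information (larger log-determinant) than an arbitrary subset of equal size, which is the intuition behind validating selected charts rather than a random sample.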
Martyna, Agnieszka; Michalska, Aleksandra; Zadora, Grzegorz
2015-05-01
The problem of interpreting the common provenance of samples was investigated using an infrared spectra database of polypropylene samples from car body parts and plastic containers, as well as Raman spectra databases of blue solid and metallic automotive paints. The research involved statistical tools such as the likelihood ratio (LR) approach for expressing the evidential value of observed similarities and differences in the recorded spectra. Since LR models are easily proposed for databases described by a few variables, the research focused on reducing the dimensionality of spectra characterised by more than a thousand variables. The objective of the studies was to combine chemometric tools that deal readily with multidimensionality with an LR approach. The final variables used for constructing the LR models were derived from the discrete wavelet transform (DWT) as a data dimensionality reduction technique, supported by methods for variance analysis, and corresponded with chemical information, i.e. typical absorption bands for polypropylene and peaks associated with pigments present in the car paints. Univariate and multivariate LR models were proposed, aiming at obtaining more information about the chemical structure of the samples. Their performance was controlled by estimating the levels of false positive and false negative answers and by using the empirical cross entropy approach. The results for most of the LR models were satisfactory and enabled solving the stated comparison problems. The results prove that the variables generated from DWT preserve the signal characteristics, being a sparse representation of the original signal that keeps its shape and relevant chemical information.
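The dimensionality-reduction step can be sketched with a hand-rolled Haar transform, the simplest discrete wavelet. The abstract does not specify the wavelet family, decomposition depth, or which coefficients were retained, so the synthetic spectrum and the level count below are placeholders.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (length must be even)."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)   # low-pass: scaled local averages
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # high-pass: scaled local differences
    return approx, detail

def haar_compress(signal, levels):
    """Keep only the approximation coefficients at each level as a reduced representation."""
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        a, _ = haar_dwt(a)
    return a

spectrum = np.sin(np.linspace(0, 4 * np.pi, 1024))   # stand-in for a 1024-point spectrum
reduced = haar_compress(spectrum, 4)                  # 1024 variables -> 64 variables
```

Because the Haar transform is orthonormal, one decomposition level preserves the signal's energy exactly, which is why a truncated set of coefficients can still keep the spectrum's shape and the chemically relevant bands.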
NASA Technical Reports Server (NTRS)
Meitner, P. L.; Glassman, A. J.
1980-01-01
An off-design performance loss model is developed for variable-area (pivoted vane) radial turbines. The variation in stator loss with stator area is determined by a viscous loss model while the variation in rotor loss due to stator area variation (for no stator end-clearance gap) is determined through analytical matching of experimental data. An incidence loss model is also based on matching of the experimental data. A stator vane end-clearance leakage model is developed and sample calculations are made to show the predicted effects of stator vane end-clearance leakage on performance.
Small area estimation (SAE) model: Case study of poverty in West Java Province
NASA Astrophysics Data System (ADS)
Suhartini, Titin; Sadik, Kusman; Indahwati
2016-02-01
This paper compared direct estimation with indirect Small Area Estimation (SAE) models. Model selection addressed multicollinearity among the auxiliary variables in two ways: retaining only non-collinear variables, or applying principal components (PC). The parameters of interest were the area-level proportions of agricultural-venture poor households and of agricultural poor households in West Java Province, which can be estimated either directly or through SAE. Direct estimation is problematic here: because of small sample sizes, three areas even had no sampled units, so direct estimates could not be produced for them. The estimated proportion of agricultural-venture poor households was 19.22% and that of agricultural poor households 46.79%. The best model for agricultural-venture poor households retained only non-collinear variables, whereas the best model for agricultural poor households used PC. SAE performed better than direct estimation for both proportions: the small area estimation method overcame the small sample sizes and produced small-area estimates with higher accuracy and better precision than the direct estimator.
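The principal-component step used to defuse multicollinearity among the auxiliary variables can be sketched as follows. The two simulated auxiliaries and their degree of collinearity are invented for illustration; in an area-level SAE model of the Fay-Herriot type, the resulting component would replace the collinear pair as the covariate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two nearly collinear auxiliary variables for 30 small areas (illustrative data)
n_areas = 30
x1 = rng.normal(size=n_areas)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=n_areas)
X = np.column_stack([x1, x2])

# Principal components of the standardised auxiliaries
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)   # share of variance per component
pc1 = Z @ Vt[0]                   # first PC: one covariate replacing the collinear pair
```

With strongly collinear auxiliaries, the first component absorbs nearly all of their joint variance, so the area-level regression loses little information while avoiding an ill-conditioned design matrix.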
Study on the medical meteorological forecast of the number of hypertension inpatient based on SVR
NASA Astrophysics Data System (ADS)
Zhai, Guangyu; Chai, Guorong; Zhang, Haifeng
2017-06-01
The purpose of this study is to build a hypertension prediction model by examining the meteorological factors associated with hypertension incidence. The method is to take standard data on relative humidity, air temperature, visibility, wind speed and air pressure in Lanzhou from 2010 to 2012 (computing the maximum, minimum and average values over 5-day windows) as the input variables of Support Vector Regression (SVR), and standard data on hypertension incidence over the same period as the output variable; the optimal prediction parameters are obtained by a cross-validation algorithm, and after SVR learning and training a forecast model for hypertension incidence is built. The result shows that the hypertension prediction model comprises 15 input variables, with a training accuracy of 0.005 and a final error of 0.0026389. The forecast accuracy of the SVR model is 97.1429%, higher than that of a statistical forecast equation or a neural network prediction method. It is concluded that SVR provides a new method for hypertension prediction, with simple calculation, small error, a good fit to historical samples, and strong out-of-sample forecasting capability.
Mollenhauer, Robert; Mouser, Joshua B.; Brewer, Shannon K.
2018-01-01
Temporal and spatial variability in streams result in heterogeneous gear capture probability (i.e., the proportion of available individuals identified) that confounds interpretation of data used to monitor fish abundance. We modeled tow-barge electrofishing capture probability at multiple spatial scales for nine Ozark Highland stream fishes. In addition to fish size, we identified seven reach-scale environmental characteristics associated with variable capture probability: stream discharge, water depth, conductivity, water clarity, emergent vegetation, wetted width–depth ratio, and proportion of riffle habitat. The magnitude of the relationship between capture probability and both discharge and depth varied among stream fishes. We also identified lithological characteristics among stream segments as a coarse-scale source of variable capture probability. The resulting capture probability model can be used to adjust catch data and derive reach-scale absolute abundance estimates across a wide range of sampling conditions with similar effort as used in more traditional fisheries surveys (i.e., catch per unit effort). Adjusting catch data based on variable capture probability improves the comparability of data sets, thus promoting both well-informed conservation and management decisions and advances in stream-fish ecology.
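The adjustment the authors describe, dividing catch by a modelled capture probability to recover absolute abundance, can be sketched as below. The logistic coefficients are hypothetical placeholders, not values estimated in the study.

```python
import numpy as np

def logit_inv(x):
    """Inverse-logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical reach-level capture-probability model on standardised covariates
beta = {"intercept": 0.4, "discharge": -0.6, "depth": -0.3}

def capture_probability(discharge_z, depth_z):
    """Capture probability from standardised discharge and depth (illustrative coefficients)."""
    eta = beta["intercept"] + beta["discharge"] * discharge_z + beta["depth"] * depth_z
    return logit_inv(eta)

def adjusted_abundance(catch, discharge_z, depth_z):
    """Horvitz-Thompson style adjustment: observed catch divided by capture probability."""
    return catch / capture_probability(discharge_z, depth_z)

p = capture_probability(0.0, 0.0)          # capture probability at average conditions
n_hat = adjusted_abundance(30, 0.0, 0.0)   # 30 fish caught -> estimated number present
```

At average conditions the sketch gives p of about 0.6, so a catch of 30 corresponds to roughly 50 fish present; under higher discharge the same catch would imply more fish, which is exactly why raw catch-per-unit-effort data are not comparable across sampling conditions.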
Queri, S; Konrad, M; Keller, K
2012-08-01
Increasing stress-associated health problems in Germany are often attributed to problems on the job, in particular to rising work demands. The study combines several stress predictors from earlier findings and from the literature into one predictive model for the field of work of "psychiatric rehabilitation". A cross-sectional design was used to measure personal and organizational variables with quantitative standard questionnaires as self-ratings from n=243 pedagogically active employees from various professions. Overall stress and job stress were measured with different instruments. The sample showed above-average overall stress scores along with below-average job stress scores. The multivariate predictive model for explaining the heightened stress shows pathogenetic and salutogenetic main effects for organizational variables such as "gratification crisis" and personal variables such as "occupational self-efficacy expectations", as well as an interaction of both types of variables. There are relevant gender-specific results concerning empathy, and differences between professions in the extent of occupational self-efficacy. The results are of particular interest for the practice of workplace health promotion as well as for social work schools, the main group in our sample being social workers. © Georg Thieme Verlag KG Stuttgart · New York.
Montoya, Isaac D; Bell, David C
2006-11-01
This article examines the effect of target, perceiver, and relationship characteristics on the perceiver's assessment that the target may be HIV seropositive (HIV+). A sample of 267 persons was recruited from low income, high drug use neighborhoods. Respondents (perceivers) were asked to name people (targets) with whom they had a social, drug sharing, or sexual relationship. Perceivers described 1,640 such relationships. Perceivers were asked about the targets' age, gender, and race/ethnicity, whether the targets were good-looking, their level of trust with the target, and how long they had known them. Perceivers were then asked to evaluate the chances that the target mentioned was HIV+. Two regression models were estimated on the 1,640 relationships mentioned. Model 1 included variables reflecting only target characteristics as independent variables. Model 2 included variables reflecting target characteristics as well as variables reflecting perceivers and perceiver-target relationship characteristics. The results showed that targets that were female, younger, and good-looking were perceived as being less likely to be HIV+. However, when accounting for perceiver and relationship effects, some of the target characteristic effects disappeared. Copyright 2006 APA, all rights reserved.
Vila-Rodriguez, F; Ochoa, S; Autonell, J; Usall, J; Haro, J M
2011-12-01
Social functioning (SF) is the ultimate target of treatment plans in schizophrenia, so it is critical to know which factors determine SF. Gender is a well-established variable influencing SF, yet it is not known how social variables and symptoms interact in schizophrenia patients. Furthermore, it remains unclear whether the interaction between social variables and symptoms differs in men compared to women. Our aim is to test whether social variables are better predictors of SF in community-dwelling individuals with schizophrenia, and whether men and women differ in how symptoms and social variables interact to impact SF. Community-dwelling individuals with schizophrenia (N = 231) were randomly selected from a register. Participants were assessed with symptom measures (PANSS), a performance-based social scale (LSP), and objective social and demographic variables. Stratification by gender and stepwise multivariate regression analyses by gender were used to find the best-fitting models that predict SF in both genders. Men had poorer SF than women in spite of showing similar symptom scores. On stepwise regression analyses, gender was the main variable explaining SF, with a significant contribution by disorganized and excitatory symptoms. Age of onset made a less marked, yet significant, contribution to explaining SF. When the sample was stratified by gender, disorganized symptoms and the 'Income' variable entered the model and accounted for 30.8% of the SF variance in women. On the other hand, positive and disorganized symptoms entered the model and accounted for 36.1% of the SF variance in men. Community-dwelling men and women with schizophrenia differ in the constellation of variables associated with SF. Symptom scores still account for most of the variance in SF in both genders.
A structural model of the dimensions of teacher stress.
Boyle, G J; Borg, M G; Falzon, J M; Baglioni, A J
1995-03-01
A comprehensive survey of teacher stress, job satisfaction and career commitment among 710 full-time primary school teachers was undertaken by Borg, Riding & Falzon (1991) in the Mediterranean islands of Malta and Gozo. A principal components analysis of a 20-item sources of teacher stress inventory had suggested four distinct dimensions which were labelled: Pupil Misbehaviour, Time/Resource Difficulties, Professional Recognition Needs, and Poor Relationships, respectively. To check on the validity of the Borg et al. factor solution, the group of 710 teachers was randomly split into two separate samples. Exploratory factor analysis was carried out on the data from Sample 1 (N = 335), while Sample 2 (N = 375) provided the cross-validational data for a LISREL confirmatory factor analysis. Results supported the proposed dimensionality of the sources of teacher stress (measurement model), along with evidence of an additional teacher stress factor (Workload). Consequently, structural modelling of the 'causal relationships' between the various latent variables and self-reported stress was undertaken on the combined samples (N = 710). Although both non-recursive and recursive models incorporating Poor Colleague Relations as a mediating variable were tested for their goodness-of-fit, a simple regression model provided the most parsimonious fit to the empirical data, wherein Workload and Student Misbehaviour accounted for most of the variance in predicting teaching stress.
unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance
Fiske, Ian J.; Chandler, Richard B.
2011-01-01
Ecological research uses data collection techniques that are prone to substantial and unique types of measurement error to address scientific questions about species abundance and distribution. These data collection schemes include a number of survey methods in which unmarked individuals are counted, or determined to be present, at spatially referenced sites. Examples include site occupancy sampling, repeated counts, distance sampling, removal sampling, and double observer sampling. To appropriately analyze these data, hierarchical models have been developed to separately model explanatory variables of both a latent abundance or occurrence process and a conditional detection process. Because these models have a straightforward interpretation paralleling mechanisms under which the data arose, they have recently gained immense popularity. The common hierarchical structure of these models is well-suited for a unified modeling interface. The R package unmarked provides such a unified modeling framework, including tools for data exploration, model fitting, model criticism, post-hoc analysis, and model comparison.
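The hierarchical structure these models share, a latent occupancy state plus a conditional detection process, is easiest to see in the single-season occupancy likelihood of the kind unmarked's occu function maximises. A minimal sketch with constant occupancy probability psi and detection probability p (no covariates), written outside R for illustration:

```python
import numpy as np

def occupancy_loglik(psi, p, histories):
    """Log-likelihood of a single-season occupancy model with constant psi and p.
    histories: 0/1 detection histories, one row per site, one column per visit."""
    ll = 0.0
    for h in histories:
        h = np.asarray(h)
        ndet, nvis = int(h.sum()), h.size
        if ndet > 0:
            # at least one detection: the site is certainly occupied
            ll += np.log(psi) + ndet * np.log(p) + (nvis - ndet) * np.log(1 - p)
        else:
            # all-zero history: occupied but always missed, or truly unoccupied
            ll += np.log(psi * (1 - p) ** nvis + (1 - psi))
    return ll

histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
ll = occupancy_loglik(0.6, 0.4, histories)
```

The all-zero branch is the crux: it mixes the two latent explanations for never detecting the species, which is what separates occupancy from naive presence/absence summaries.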
Buesing, Lars; Bill, Johannes; Nessler, Bernhard; Maass, Wolfgang
2011-01-01
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons. PMID:22096452
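The abstract contrasts its non-reversible chains with standard Gibbs sampling. To make the sampling target concrete, here is the vanilla Gibbs scheme for a two-unit Boltzmann distribution (the weights and biases are invented); the empirical marginal can be checked against exact enumeration of the four states.

```python
import numpy as np

rng = np.random.default_rng(2)

# Boltzmann distribution over two binary units: p(z) proportional to exp(b.z + w*z1*z2)
b = np.array([0.5, -0.5])
w = 1.0

def gibbs_sample(n_steps):
    """Systematic-scan Gibbs sampler over the two binary units."""
    z = np.array([0, 0])
    samples = []
    for _ in range(n_steps):
        for k in (0, 1):
            eta = b[k] + w * z[1 - k]          # conditional log-odds of unit k firing
            z[k] = rng.random() < 1.0 / (1.0 + np.exp(-eta))
        samples.append(z.copy())
    return np.array(samples)

samples = gibbs_sample(20000)
est_p1 = samples[5000:, 0].mean()              # empirical P(z1 = 1) after burn-in

# Exact marginal by enumerating the four joint states
states = [(a, c) for a in (0, 1) for c in (0, 1)]
weights = np.array([np.exp(b[0] * a + b[1] * c + w * a * c) for a, c in states])
probs = weights / weights.sum()
exact_p1 = sum(pr for (a, c), pr in zip(states, probs) if a == 1)
```

This is the baseline the paper's approach departs from: each Gibbs update needs the instantaneous state of the other unit, whereas spiking neurons interact through temporally extended spikes, which is what motivates the non-reversible construction.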
Quantile regression models of animal habitat relationships
Cade, Brian S.
2003-01-01
Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). 
Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.
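The conditional-quantile machinery in the chapters above rests on the asymmetric check (pinball) loss: minimising it over a constant recovers the tau-th sample quantile, and quantile regression generalises this to a linear predictor. A self-contained illustration (the exponential data are arbitrary):

```python
import numpy as np

def pinball_loss(tau, y, q):
    """Mean check-function loss of candidate quantile q for the sample y."""
    r = y - q
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1) * r)))

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=5000)   # skewed response, like count-type densities

# Minimising the tau-pinball loss over a grid recovers the sample tau-quantile
tau = 0.9
grid = np.linspace(0, 15, 3001)
losses = [pinball_loss(tau, y, q) for q in grid]
q_hat = float(grid[int(np.argmin(losses))])
q_emp = float(np.quantile(y, tau))
```

Because the loss weights positive and negative residuals by tau and 1 - tau, upper quantiles (tau > 0.5) track the limiting edge of a heterogeneous response distribution, which is exactly the property exploited for limiting-factor habitat models.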
Estimation of Particulate Mass and Manganese Exposure Levels among Welders
Hobson, Angela; Seixas, Noah; Sterling, David; Racette, Brad A.
2011-01-01
Background: Welders are frequently exposed to manganese (Mn), which may increase the risk of neurological impairment. Historical exposure estimates for welding-exposed workers are needed for epidemiological studies evaluating the relationship between welding and neurological or other health outcomes. The objective of this study was to develop and validate a multivariate model to estimate quantitative levels of welding fume exposures based on welding particulate mass and Mn concentrations reported in the published literature. Methods: Articles that described welding particulate and Mn exposures during field welding activities were identified through a comprehensive literature search. Summary measures of exposure and related determinants such as year of sampling, welding process performed, type of ventilation used, degree of enclosure, base metal, and location of sampling filter were extracted from each article. The natural log of the reported arithmetic mean exposure level was used as the dependent variable in model building, while the independent variables included the exposure determinants. Cross-validation was performed to aid in model selection and to evaluate the generalizability of the models. Results: A total of 33 particulate and 27 Mn means were included in the regression analysis. The final model explained 76% of the variability in the mean exposures and included welding process and degree of enclosure as predictors. There was very little change in the explained variability and root mean squared error between the final model and its cross-validation model, indicating the final model is robust given the available data.
Conclusions: This model may be improved with more detailed exposure determinants; however, the relatively large amount of variance explained by the final model along with the positive generalizability results of the cross-validation increases the confidence that the estimates derived from this model can be used for estimating welder exposures in absence of individual measurement data. PMID:20870928
Santini, María Soledad; Utgés, María Eugenia; Berrozpe, Pablo; Manteca Acosta, Mariana; Casas, Natalia; Heuer, Paola; Salomón, O. Daniel
2015-01-01
The principal objective of this study was to assess a modeling approach to Lu. longipalpis distribution in an urban scenario, discriminating micro-scale landscape variables at microhabitat and macrohabitat scales, and the presence from the abundance of the vector. For this objective, we studied vectors and domestic reservoirs and evaluated different environmental variables simultaneously, constructing a set of 13 models to account for micro-habitats, macro-habitats and mixed habitats. We captured a total of 853 sandflies, of which 98.35% were Lu. longipalpis. We sampled a total of 197 dogs, 177 of which were associated with households where insects were sampled. Positive rK39 dogs represented 16.75% of the total, of which 47% were asymptomatic. Distance to the border of the city and high-to-medium density vegetation cover turned out to be the explanatory variables, all with positive coefficients, for the presence of sandflies in the city. All variables in the abundance model turned out to be explanatory: trees around the trap, distance to the stream, and its quadratic term, the last being the only one with a negative coefficient, indicating that maximum abundance was associated with intermediate distances to the stream. The spatial distribution of dogs infected with L. infantum showed a heterogeneous pattern throughout the city; however, we could not confirm an association of the distribution with the variables assessed. In relation to Lu. longipalpis distribution, the strategy of discriminating the micro-spatial scales at which the environmental variables were recorded allowed us to associate presence with macrohabitat variables and abundance with microhabitat and macrohabitat variables. Based on the variables associated with Lu.
longipalpis, the model will be validated in other cities, and environmental surveillance and control interventions will be proposed and evaluated at the microscale level and integrated with socio-cultural approaches and programmatic and village (mesoscale) strategies. PMID:26274318
[Measurement of Water COD Based on UV-Vis Spectroscopy Technology].
Wang, Xiao-ming; Zhang, Hai-liang; Luo, Wei; Liu, Xue-mei
2016-01-01
Ultraviolet/visible (UV/Vis) spectroscopy was used to measure water COD. A total of 135 water samples were collected from Zhejiang province. Raw spectra and 3 different pretreatment methods (Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV) and first derivatives) were compared to determine the optimal pretreatment method for analysis. Spectral variable selection is an important strategy in spectrum modeling analysis, because it yields a parsimonious data representation and can lead to multivariate models with better performance. In order to simplify the calibration models, the preprocessed spectra were then used to select sensitive wavelengths by the competitive adaptive reweighted sampling (CARS), Random frog and Successive Genetic Algorithm (GA) methods. Different numbers of sensitive wavelengths were selected by the different variable selection methods with the SNV preprocessing method. Partial least squares (PLS) was used to build models with the full spectra, and the Extreme Learning Machine (ELM) was applied to build models with the selected wavelength variables. The overall results showed that the ELM model performed better than the PLS model, and the ELM model with the wavelengths selected by CARS obtained the best results, with a determination coefficient (R2), RMSEP and RPD of 0.82, 14.48 and 2.34, respectively, for the prediction set. The results indicate that UV/Vis spectroscopy with characteristic wavelengths obtained by the CARS variable selection method, combined with ELM calibration, is feasible for the rapid and accurate determination of COD in aquaculture water. Moreover, this study lays the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.
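An extreme learning machine is a single-hidden-layer network in which the input weights are drawn at random and only the output weights are solved by least squares, which is why it calibrates quickly on a handful of selected wavelengths. A generic sketch (the toy target function below merely stands in for the wavelengths-to-COD mapping):

```python
import numpy as np

rng = np.random.default_rng(4)

def elm_train(X, y, n_hidden=50):
    """Extreme Learning Machine: random input weights, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear feature map
    beta = np.linalg.lstsq(H, y, rcond=None)[0]   # only this layer is fitted
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression standing in for "selected wavelengths -> COD"
X = rng.uniform(-1, 1, size=(300, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
W, b, beta = elm_train(X[:200], y[:200])
pred = elm_predict(X[200:], W, b, beta)
rmse = float(np.sqrt(np.mean((pred - y[200:]) ** 2)))
```

Since training reduces to one linear solve, model building is far cheaper than iteratively trained networks, at the cost of sensitivity to the number of hidden units and the random draw.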
Webster, R J; Williams, A; Marchetti, F; Yauk, C L
2018-07-01
Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
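A Monte-Carlo power calculation of the simplest kind discussed above (one child per family, two exposure groups, a one-sided z-test on mean de novo mutation counts) can be sketched as follows. The baseline of about 70 mutations per genome and the 10% exposure effect are illustrative assumptions; the paper's sibling-based mixed-effect models would require simulating family-level random effects as well.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_power(n_families, effect=1.1, base_rate=70, n_sim=500, alpha_z=1.645):
    """Monte-Carlo power for a two-group comparison of Poisson-distributed
    de novo mutation counts, using a normal-approximation z-test on group means."""
    hits = 0
    for _ in range(n_sim):
        control = rng.poisson(base_rate, n_families)
        exposed = rng.poisson(base_rate * effect, n_families)
        diff = exposed.mean() - control.mean()
        se = np.sqrt(exposed.var(ddof=1) / n_families + control.var(ddof=1) / n_families)
        if diff / se > alpha_z:        # one-sided test at alpha = 0.05
            hits += 1
    return hits / n_sim

power_small = simulate_power(5)    # few families: low power for a 10% increase
power_large = simulate_power(30)   # more families: power rises sharply
```

Because counts per child are high (~70) but the effect is a modest relative increase, power is driven by the number of families, which is the sample-size trade-off the authors quantify.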
Thinking Visually about Algebra
ERIC Educational Resources Information Center
Baroudi, Ziad
2015-01-01
Many introductions to algebra in high school begin with teaching students to generalise linear numerical patterns. This article argues that this approach needs to be changed so that students encounter variables in the context of modelling visual patterns so that the variables have a meaning. The article presents sample classroom activities,…
Nelson, Jon P
2014-01-01
Precise estimates of price elasticities are important for alcohol tax policy. Using meta-analysis, this paper corrects average beer elasticities for heterogeneity, dependence, and publication selection bias. A sample of 191 estimates is obtained from 114 primary studies. Simple and weighted means are reported. Dependence is addressed by restricting number of estimates per study, author-restricted samples, and author-specific variables. Publication bias is addressed using funnel graph, trim-and-fill, and Egger's intercept model. Heterogeneity and selection bias are examined jointly in meta-regressions containing moderator variables for econometric methodology, primary data, and precision of estimates. Results for fixed- and random-effects regressions are reported. Country-specific effects and sample time periods are unimportant, but several methodology variables help explain the dispersion of estimates. In models that correct for selection bias and heterogeneity, the average beer price elasticity is about -0.20, which is less elastic by 50% compared to values commonly used in alcohol tax policy simulations. Copyright © 2013 Elsevier B.V. All rights reserved.
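Egger's intercept test mentioned above regresses the standardised effect on precision; with no publication bias the intercept sits near zero, while selecting only "significant" estimates drives it away from zero. A simulation sketch with an assumed true elasticity of -0.2 (all data invented):

```python
import numpy as np

rng = np.random.default_rng(6)

def egger(effects, ses):
    """Egger's regression: standardised effect (effect/se) on precision (1/se).
    The intercept estimates funnel-plot asymmetry; the slope estimates the effect."""
    z = effects / ses
    X = np.column_stack([np.ones_like(ses), 1.0 / ses])
    intercept, slope = np.linalg.lstsq(X, z, rcond=None)[0]
    return float(intercept), float(slope)

ses = rng.uniform(0.05, 0.3, size=400)          # standard errors of primary estimates
effects = -0.2 + ses * rng.normal(size=400)     # unbiased estimates around -0.2

i_full, s_full = egger(effects, ses)            # intercept near 0, slope near -0.2

published = effects / ses < -1.645              # crude selection: only significant results
i_sel, s_sel = egger(effects[published], ses[published])   # intercept pushed negative
```

Imprecise studies only get "published" here when they draw an extreme negative estimate, so the standardised effects fan out asymmetrically and the intercept moves well away from zero, the funnel-plot asymmetry the test detects.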
Neural Spike Train Synchronisation Indices: Definitions, Interpretations and Applications.
Halliday, D M; Rosenberg, J R
2017-04-24
A comparison of previously defined spike train synchronization indices is undertaken within a stochastic point process framework. The second order cumulant density (covariance density) is shown to be common to all the indices. Simulation studies were used to investigate the sampling variability of a single index based on the second order cumulant. The simulations used a paired motoneurone model and a paired regular spiking cortical neurone model. The sampling variability of spike trains generated under identical conditions from the paired motoneurone model varied from 50% to 160% of the estimated value. On theoretical grounds, and on the basis of simulated data, a rate dependence is present in all synchronization indices. The application of coherence and pooled coherence estimates to the issue of synchronization indices is considered. This alternative frequency domain approach allows an arbitrary number of spike train pairs to be evaluated for statistically significant differences, and combined into a single population measure. The pooled coherence framework allows pooled time domain measures to be derived; their application to the simulated data is illustrated. Data from the cortical neurone model were generated over a wide range of firing rates (1-250 spikes/sec). The pooled coherence framework correctly characterizes the sampling variability as not significant over this wide operating range. The broader applicability of this approach to multi-electrode array data is briefly discussed.
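A minimal sketch of the frequency-domain idea, using two simulated trains that share a 20 Hz common rate modulation (a crude stand-in for the paired-neurone models, with `scipy.signal.coherence` playing the role of the spectral estimation):

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(9)

# Two spike trains driven by a shared 20 Hz rate modulation, binned at 1 ms
# over 200 s (all numbers illustrative).
fs, dur = 1000, 200
t = np.arange(fs * dur) / fs
rate = 20.0 * (1.0 + 0.5 * np.sin(2 * np.pi * 20.0 * t)) / fs   # spikes per bin
sp1 = (rng.random(t.size) < rate).astype(float)
sp2 = (rng.random(t.size) < rate).astype(float)

# Welch coherence between the binned trains: the shared modulation shows up
# as a peak near 20 Hz, while independent bins keep the floor near zero.
f, coh = coherence(sp1, sp2, fs=fs, nperseg=1024)
peak_f = f[np.argmax(coh)]
```

Averaging such coherence estimates over many pairs is the pooling step the abstract refers to; a single pair already shows the common-input peak clearly.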
Optical Variability of Narrow-line and Broad-line Seyfert 1 Galaxies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rakshit, Suvendu; Stalin, C. S., E-mail: suvenduat@gmail.com
We studied the optical variability (OV) of a large sample of narrow-line Seyfert 1 (NLSy1) and broad-line Seyfert 1 (BLSy1) galaxies with z < 0.8 to investigate any differences in their OV properties. Using archival optical V-band light curves from the Catalina Real Time Transient Survey that span 5–9 years and modeling them using a damped random walk, we estimated the amplitude of variability. We found that NLSy1 galaxies as a class show a lower amplitude of variability than their broad-line counterparts. In the sample of both NLSy1 and BLSy1 galaxies, radio-loud sources are found to have higher variability amplitude than radio-quiet sources. Considering only sources that are detected in the X-ray band, NLSy1 galaxies are less optically variable than BLSy1 galaxies. The amplitude of variability in the sample of both NLSy1 and BLSy1 galaxies is found to be anti-correlated with Fe ii strength but correlated with the width of the H β line. The well-known anti-correlation of variability–luminosity and the variability–Eddington ratio is present in our data. Among the radio-loud sample, variability amplitude is found to be correlated with radio-loudness and radio power, suggesting that jets also play an important role in the OV of radio-loud objects, in addition to the Eddington ratio, which is the main driving factor of OV in radio-quiet sources.
Study of Bias in 2012-Placement Test through Rasch Model in Terms of Gender Variable
ERIC Educational Resources Information Center
Turkan, Azmi; Cetin, Bayram
2017-01-01
Validity and reliability are among the most crucial characteristics of a test. One of the steps to make sure that a test is valid and reliable is to examine the bias in test items. The purpose of this study was to examine the bias in 2012 Placement Test items in terms of gender variable using Rasch Model in Turkey. The sample of this study was…
Attfield, Kathleen R; Hughes, Michael D; Spengler, John D; Lu, Chensheng
2014-02-01
Children are exposed to pesticides from many sources and routes, including dietary and incidental ingestion, dermal absorption, and inhalation. Linking health outcomes to these exposures using urinary metabolites requires understanding temporal variability within subjects to avoid exposure misclassification. We characterized the within- and between-child variability of urinary organophosphorus and pyrethroid metabolites in 23 participants of the Children's Pesticide Exposure Study-Washington over 1 year and examined the ability of one to four spot urine samples to categorize mean exposures. Each child provided urine samples twice daily over 7- to 16-day sessions in four seasons in 2003 and 2004. Samples were analyzed for five pyrethroid and five organophosphorus (OP) metabolites. After adjusting for specific gravity, we used a customized maximum likelihood estimation linear mixed-effects model that accounted for values below the limit of detection to calculate intraclass correlation coefficients (ICC) and conducted surrogate category analyses. Within-child variability was 2-11 times greater than between-child variability. When restricted to samples collected during a single season, ICCs were higher in the fall, winter, and spring than in summer for OPs, and higher in summer and winter for pyrethroids, indicating an increase in between-person variability relative to within-person variability during these seasons. Surrogate category analyses demonstrated that a single spot urine sample did not categorize metabolite concentrations well, and that four or more samples would be needed to categorize children into quartiles consistently. Urinary biomarkers of these short half-life pesticides exhibited substantial within-person variability in children observed over four seasons. Researchers investigating pesticides and health outcomes in children may need repeated biomarker measurements to derive accurate estimates of exposure and relative risks.
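The within- versus between-child decomposition behind the ICC can be sketched with a one-way random-effects ANOVA estimator; the variance components below are illustrative, chosen so that within-child variability is four times the between-child variability:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical log-metabolite concentrations: 23 children, 20 spot samples
# each, within-child variance 4x the between-child variance (illustrative).
k_children, n_samples = 23, 20
child = rng.normal(0.0, 0.5, k_children)                  # between-child SD 0.5
data = child[:, None] + rng.normal(0.0, 1.0, (k_children, n_samples))

# One-way random-effects ANOVA estimator of the ICC
grand = data.mean()
msb = n_samples * ((data.mean(axis=1) - grand) ** 2).sum() / (k_children - 1)
msw = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (k_children * (n_samples - 1))
sigma_b2 = (msb - msw) / n_samples        # between-child variance component
icc = sigma_b2 / (sigma_b2 + msw)         # true value here is 0.25 / 1.25 = 0.2
```

A low ICC of this kind is exactly the situation in which a single spot sample categorizes a child's mean exposure poorly and repeated samples are needed.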
Lavado Contador, J F; Maneta, M; Schnabel, S
2006-10-01
The capability of Artificial Neural Network models to forecast near-surface soil moisture at fine spatial resolution has been tested for a 99.5 ha watershed located in SW Spain, using several readily available digital models of topographic and land cover variables as inputs and a series of soil moisture measurements as the training data set. The study methods were designed to determine the potential of the neural network model as a tool to gain insight into the factors governing soil moisture distribution, and to optimize the data sampling scheme by finding the optimum size of the training data set. Results suggest the methods are efficient in forecasting soil moisture and in assessing the optimum number of field samples, and highlight the importance of the selected variables in explaining the final map obtained.
A Database Approach for Predicting and Monitoring Baked Anode Properties
NASA Astrophysics Data System (ADS)
Lauzon-Gauthier, Julien; Duchesne, Carl; Tessier, Jayson
2012-11-01
The baked anode quality control strategy currently used by most carbon plants based on testing anode core samples in the laboratory is inadequate for facing increased raw material variability. The low core sampling rate limited by lab capacity and the common practice of reporting averaged properties based on some anode population mask a significant amount of individual anode variability. In addition, lab results are typically available a few weeks after production and the anodes are often already set in the reduction cells preventing early remedial actions when necessary. A database approach is proposed in this work to develop a soft-sensor for predicting individual baked anode properties at the end of baking cycle. A large historical database including raw material properties, process operating parameters and anode core data was collected from a modern Alcoa plant. A multivariate latent variable PLS regression method was used for analyzing the large database and building the soft-sensor model. It is shown that the general low frequency trends in most anode physical and mechanical properties driven by raw material changes are very well captured by the model. Improvements in the data infrastructure (instrumentation, sampling frequency and location) will be necessary for predicting higher frequency variations in individual baked anode properties. This paper also demonstrates how multivariate latent variable models can be interpreted against process knowledge and used for real-time process monitoring of carbon plants, and detection of faults and abnormal operation.
NASA Technical Reports Server (NTRS)
Johnson, R. A.; Wehrly, T.
1976-01-01
Population models for dependence between two angular measurements and for dependence between an angular and a linear observation are proposed. The method of canonical correlations first leads to new population and sample measures of dependence in this latter situation. An example relating wind direction to the level of a pollutant is given. Next, applied to pairs of angular measurements, the method yields previously proposed sample measures in some special cases and a new sample measure in general.
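One commonly used sample measure of angular-linear dependence, in the spirit of the canonical-correlation construction (the squared multiple correlation of the linear variable with the sine and cosine of the angle; not necessarily the paper's exact statistic), can be sketched on simulated wind/pollutant data:

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated example: pollutant level varies sinusoidally with wind direction.
n = 2000
theta = rng.uniform(0.0, 2 * np.pi, n)                 # wind direction (radians)
x = 3.0 + 1.5 * np.cos(theta - 0.8) + rng.normal(0.0, 0.5, n)

# Angular-linear correlation via correlations with the cos/sin components
c, s = np.cos(theta), np.sin(theta)
rxc = np.corrcoef(x, c)[0, 1]
rxs = np.corrcoef(x, s)[0, 1]
rcs = np.corrcoef(c, s)[0, 1]
r2 = (rxc**2 + rxs**2 - 2 * rxc * rxs * rcs) / (1 - rcs**2)
```

Here the true proportion of variance explained by direction is 1.125/1.375, roughly 0.82, and the sample measure recovers it.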
Menke, S.B.; Holway, D.A.; Fisher, R.N.; Jetz, W.
2009-01-01
Aim: Species distribution models (SDMs) or, more specifically, ecological niche models (ENMs) are a useful and rapidly proliferating tool in ecology and global change biology. ENMs attempt to capture associations between a species and its environment and are often used to draw biological inferences, to predict potential occurrences in unoccupied regions and to forecast future distributions under environmental change. The accuracy of ENMs, however, hinges critically on the quality of occurrence data. ENMs often use haphazardly collected data rather than data collected across the full spectrum of existing environmental conditions. Moreover, it remains unclear how processes affecting ENM predictions operate at different spatial scales. The scale (i.e. grain size) of analysis may be dictated more by the sampling regime than by biologically meaningful processes. The aim of our study is to jointly quantify how issues relating to region and scale affect ENM predictions using an economically important and ecologically damaging invasive species, the Argentine ant (Linepithema humile). Location: California, USA. Methods: We analysed the relationship between sampling sufficiency, regional differences in environmental parameter space and cell size of analysis and resampling environmental layers using two independently collected sets of presence/absence data. Differences in variable importance were determined using model averaging and logistic regression. Model accuracy was measured with area under the curve (AUC) and Cohen's kappa. Results: We first demonstrate that insufficient sampling of environmental parameter space can cause large errors in predicted distributions and biological interpretation. Models performed best when they were parametrized with data that sufficiently sampled environmental parameter space. Second, we show that altering the spatial grain of analysis changes the relative importance of different environmental variables. 
These changes apparently result from how environmental constraints and the sampling distributions of environmental variables change with spatial grain. Conclusions: These findings have clear relevance for biological inference. Taken together, our results illustrate potentially general limitations for ENMs, especially when such models are used to predict species occurrences in novel environments. We offer basic methodological and conceptual guidelines for appropriate sampling and scale matching. © 2009 The Authors. Journal compilation © 2009 Blackwell Publishing.
2013-01-01
Background Different recovery patterns are reported for those who have sustained a whiplash injury, but little is known about the variability within subgroups. The aims were (1) to compare a self-selected mildly affected sample (MILD) with a self-selected moderately to severely affected sample (MOD/SEV) with regard to background characteristics and pain-related disability, pain intensity, functional self-efficacy, fear of movement/(re)injury, pain catastrophising, and post-traumatic stress symptoms in the acute stage (at baseline), (2) to study the development over the first year after the accident for the above listed clinical variables in the MILD sample, and (3) to study the validity of a prediction model including baseline levels of clinical variables on pain-related disability one year after baseline assessments. Methods The study had a prospective and correlative design. Ninety-eight participants were consecutively selected. Inclusion criteria: age 18 to 65 years, WAD grade I-II, Swedish language skills, and a subjective report of not being in need of treatment due to mild symptoms. A multivariate linear regression model was applied for the prediction analysis. Results The MILD sample was less affected in all study variables compared to the MOD/SEV sample. Pain-related disability, pain catastrophising, and post-traumatic stress symptoms decreased over the first year after the accident, whereas functional self-efficacy and fear of movement/(re)injury increased. Pain intensity was stable. Pain-related disability at baseline emerged as the only statistically significant predictor of pain-related disability one year after the accident (Adj r2 = 0.67). Conclusion A good prognosis over the first year is expected for the majority of individuals with WAD grade I or II who decline treatment due to mild symptoms. The prediction model was not valid in the MILD sample except for the contribution of pain-related disability.
An implication is that early observations of individuals with elevated levels of pain-related disability are warranted, although they may decline treatment. PMID:24359208
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hero, Alfred O.; Rajaratnam, Bala
When can reliable inference be drawn in the ‘‘Big Data’’ context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for ‘‘Big Data.’’ Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco
2005-01-01
A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204
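The stratified error model can be sketched as follows: estimate intensity-dependent replicate variability from the pilot arrays, then use it to z-score genes on a later unreplicated array. The intensity-noise relationship and thresholds below are illustrative assumptions, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Pilot study: 4 replicate arrays of 2000 genes (log2 intensities).  The
# error SD shrinks with intensity, a common microarray pattern; all numbers
# are illustrative.
n_genes, n_reps = 2000, 4
mu = rng.uniform(4.0, 14.0, n_genes)               # true log2 intensities
sd = 1.5 / np.sqrt(mu)                             # intensity-dependent noise
pilot = mu[:, None] + rng.normal(size=(n_genes, n_reps)) * sd[:, None]

# Stratified error model: pool replicate residuals within intensity bins
# (the rescaling makes the pooled SD approximately unbiased).
resid = (pilot - pilot.mean(axis=1, keepdims=True)) * np.sqrt(n_reps / (n_reps - 1))
avg = pilot.mean(axis=1)
bins = np.digitize(avg, np.quantile(avg, [0.25, 0.5, 0.75]))
bin_sd = np.array([resid[bins == b].std() for b in range(4)])

# Unreplicated follow-up array under the null (no true changes): flag genes
# whose deviation from the pilot mean exceeds a 99% band from the model.
new = mu + rng.normal(size=n_genes) * sd
z = (new - avg) / (bin_sd[bins] * np.sqrt(1.0 + 1.0 / n_reps))
false_call_rate = (np.abs(z) > 2.58).mean()        # should stay near 0.01
```

The paper builds the error model by bootstrapping rather than by binning alone, but the principle is the same: the replicated pilot calibrates the null variability that the unreplicated screens then reuse.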
Comparing hierarchical models via the marginalized deviance information criterion.
Quintero, Adrian; Lesaffre, Emmanuel
2018-07-20
Hierarchical models are extensively used in pharmacokinetics and longitudinal studies. When estimation is performed from a Bayesian approach, model comparison is often based on the deviance information criterion (DIC). In hierarchical models with latent variables, there are several versions of this statistic: the conditional DIC (cDIC), which includes the latent variables in the focus of the analysis, and the marginalized DIC (mDIC), which integrates them out. Despite the asymptotic and coherency difficulties of cDIC, it is usually used in Markov chain Monte Carlo (MCMC) methods for hierarchical models because of practical convenience. The mDIC criterion is more appropriate in most cases but requires integration of the likelihood, which is computationally demanding and not implemented in Bayesian software. Therefore, we consider a method to compute mDIC by generating replicate samples of the latent variables that need to be integrated out. This alternative can be easily conducted from the MCMC output of Bayesian packages and is widely applicable to hierarchical models in general. Additionally, we propose some approximations to reduce the computational complexity for large-sample situations. The method is illustrated with simulated data sets and 2 medical studies, showing that cDIC may be misleading whilst mDIC appears pertinent. Copyright © 2018 John Wiley & Sons, Ltd.
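A sketch of the replicate-sampling computation for a Gaussian random-intercept model; the "posterior draws" below are simulated stand-ins for real MCMC output, and a log-mean-exp average does the per-subject integration over the latent intercepts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Random-intercept model y_ij = mu + b_i + eps_ij, b_i ~ N(0, tau^2),
# eps ~ N(0, sigma^2).  Data and posterior draws are illustrative.
n_subj, n_obs, mu_t, tau, sigma = 30, 5, 2.0, 1.0, 0.5
b = rng.normal(0.0, tau, n_subj)
y = mu_t + b[:, None] + rng.normal(0.0, sigma, (n_subj, n_obs))

def marginal_loglik(mu, tau, sigma, n_draws=2000):
    """log p(y | theta) with the latent intercepts integrated out by Monte
    Carlo: average the conditional likelihood over replicate b-draws."""
    bs = rng.normal(0.0, tau, n_draws)
    total = 0.0
    for yi in y:
        ll = stats.norm.logpdf(yi[None, :], mu + bs[:, None], sigma).sum(axis=1)
        m = ll.max()
        total += m + np.log(np.exp(ll - m).mean())   # log-mean-exp
    return total

# mDIC = -4 E_post[log p(y|theta)] + 2 log p(y|theta_bar)
draws = [(mu_t + rng.normal(0.0, 0.1), tau, sigma) for _ in range(10)]
lls = np.array([marginal_loglik(*d) for d in draws])
theta_bar = (np.mean([d[0] for d in draws]), tau, sigma)
mdic = -4.0 * lls.mean() + 2.0 * marginal_loglik(*theta_bar)
```

In practice the draws of all hyperparameters (here only `mu` varies) would come from the MCMC output, and the same replicate-sampling trick applies to non-Gaussian latent structures.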
Implications of complete watershed soil moisture measurements to hydrologic modeling
NASA Technical Reports Server (NTRS)
Engman, E. T.; Jackson, T. J.; Schmugge, T. J.
1983-01-01
A series of six microwave data collection flights for measuring soil moisture were made over a small 7.8 square kilometer watershed in southwestern Minnesota. These flights were made to provide 100 percent coverage of the basin at a 400 m resolution. In addition, three flight lines were flown at preselected areas to provide a sample of data at a higher resolution of 60 m. The low level flights provide considerably more information on soil moisture variability. The results are discussed in terms of reproducibility, spatial variability and temporal variability, and their implications for hydrologic modeling.
A Q-GERT Model for Determining the Maintenance Crew Size for the SAC command Post Upgrade
1983-12-01
time that an equipment fails. DAY3: A real variable corresponding to the day that an LRU is removed from the equipment. DAY4: A real variable... variable corresponding to the time that an LRU is repaired. TIM5: A real variable corresponding to the time that an equipment returns to service. TNOW: The current time. UF(IFN): User function IFN. UN(I): A sample from the uniform distribution defined by parameter set I. YIlN1: A real variable
Shoukri, Mohamed M; Elkum, Nasser; Walter, Stephen D
2006-01-01
Background In this paper we propose the use of the within-subject coefficient of variation as an index of a measurement's reliability. For continuous variables and based on its maximum likelihood estimation we derive a variance-stabilizing transformation and discuss confidence interval construction within the framework of a one-way random effects model. We investigate sample size requirements for the within-subject coefficient of variation for continuous and binary variables. Methods We investigate the validity of the approximate normal confidence interval by Monte Carlo simulations. In designing a reliability study, a crucial issue is the balance between the number of subjects to be recruited and the number of repeated measurements per subject. We discuss efficiency of estimation and cost considerations for the optimal allocation of the sample resources. The approach is illustrated by an example on Magnetic Resonance Imaging (MRI). We also discuss the issue of sample size estimation for dichotomous responses with two examples. Results For the continuous variable, we found that the variance-stabilizing transformation improves the asymptotic coverage probabilities of the confidence interval for the within-subject coefficient of variation. The maximum likelihood estimation, and sample size estimation based on a pre-specified width of the confidence interval, are novel contributions to the literature for the binary variable. Conclusion Using the sample size formulas, we hope to help clinical epidemiologists and practicing statisticians to efficiently design reliability studies using the within-subject coefficient of variation, whether the variable of interest is continuous or binary.
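For the continuous case, the within-subject CV estimate from the one-way random-effects model is simply the square root of the within-subject mean square divided by the grand mean. A simulation sketch with illustrative parameter values (true WSCV = 5/100 = 0.05):

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical reliability study: 25 subjects, 3 repeated measurements each
# (e.g. repeated MRI readings); all variance parameters are illustrative.
k, m, mu_t, sd_b, sd_w = 25, 3, 100.0, 8.0, 5.0
subj = rng.normal(mu_t, sd_b, k)                       # subject true values
y = subj[:, None] + rng.normal(0.0, sd_w, (k, m))      # repeated measurements

# One-way random-effects estimates: within-subject variance from the
# within-subject mean square, overall mean from the grand mean.
msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (k * (m - 1))
wscv = np.sqrt(msw) / y.mean()                         # within-subject CV
```

The trade-off the abstract discusses is visible here: precision of `wscv` can be bought either with more subjects `k` or more repeats `m`, at different costs.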
Mandy, William; Charman, Tony; Puura, Kaija; Skuse, David
2014-01-01
The recent Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5) reformulation of autism spectrum disorder has received empirical support from North American and UK samples. Autism spectrum disorder is an increasingly global diagnosis, and research is needed to discover how well it generalises beyond North America and the United Kingdom. We tested the applicability of the DSM-5 model to a sample of Finnish young people with autism spectrum disorder (n = 130) or the broader autism phenotype (n = 110). Confirmatory factor analysis tested the DSM-5 model in Finland and compared the fit of this model between Finnish and UK participants (autism spectrum disorder, n = 488; broader autism phenotype, n = 220). In both countries, autistic symptoms were measured using the Developmental, Diagnostic and Dimensional Interview. Replicating findings from English-speaking samples, the DSM-5 model fitted well in Finnish autism spectrum disorder participants, outperforming a Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV) model. The DSM-5 model fitted equally well in Finnish and UK autism spectrum disorder samples. Among broader autism phenotype participants, this model fitted well in the United Kingdom but poorly in Finland, suggesting that cross-cultural variability may be greatest for milder autistic characteristics. We encourage researchers with data from other cultures to emulate our methodological approach, to map any cultural variability in the manifestation of autism spectrum disorder and the broader autism phenotype. This would be especially valuable given the ongoing revision of the International Classification of Diseases-11th Edition, the most global of the diagnostic manuals.
Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks
NASA Astrophysics Data System (ADS)
Sun, Wei; Chang, K. C.
2005-05-01
Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only approximate methods such as stochastic sampling can provide a solution under a time constraint. Several simulation methods are currently available: logic sampling (the first stochastic method proposed for Bayesian networks), the likelihood weighting algorithm (the most commonly used simulation method because of its simplicity and efficiency), the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called the linear Gaussian importance sampling algorithm for general hybrid models (LGIS). LGIS is aimed at hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses a linear function and Gaussian additive noise to approximate the true conditional probability distribution of a continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal importance function from the previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as junction tree (JT) and likelihood weighting (LW) shows that LGIS is very promising.
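Plain likelihood weighting, the baseline that LGIS improves upon, can be shown on a tiny hybrid network (the structure and parameters below are invented for illustration): sample the unobserved nodes from their priors and weight each sample by the likelihood of the evidence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Toy hybrid network (invented): discrete A in {0, 1}, X | A ~ N(mu_A, 1),
# evidence Y | X ~ N(X, 0.5).  Query: P(A = 1 | Y = 2.0).
p_a1 = 0.3
mu = {0: 0.0, 1: 2.0}
y_obs, n = 2.0, 50_000

a = (rng.random(n) < p_a1).astype(int)                # sample A from its prior
x = rng.normal(np.where(a == 1, mu[1], mu[0]), 1.0)   # sample X | A
w = stats.norm.pdf(y_obs, loc=x, scale=0.5)           # weight by evidence likelihood
p_a1_given_y = w[a == 1].sum() / w.sum()

# Closed-form check: marginally Y | A ~ N(mu_A, sqrt(1 + 0.5**2))
lik = {k: stats.norm.pdf(y_obs, mu[k], np.sqrt(1.25)) for k in (0, 1)}
exact = p_a1 * lik[1] / (p_a1 * lik[1] + (1 - p_a1) * lik[0])
```

Importance-sampling refinements like LGIS replace the prior proposal for `x` with one adapted to the evidence, which reduces weight variance when the evidence is unlikely under the prior.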
Cowell, Robert G
2018-05-04
Current models for single-source and mixture samples, and the probabilistic genotyping software based on them for analysing STR electropherogram data, assume simple probability distributions, such as the gamma distribution, to model allelic peak-height variability given the initial amount of DNA prior to PCR amplification. Here we illustrate how amplicon number distributions, for a model of the process of sample DNA collection and PCR amplification, may be efficiently computed by evaluating probability generating functions using discrete Fourier transforms. Copyright © 2018 Elsevier B.V. All rights reserved.
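The generating-function inversion can be illustrated with a toy collection-plus-amplification model (not the paper's model): evaluate the PGF at the N-th roots of unity and recover the amplicon-number distribution with a discrete Fourier transform.

```python
import numpy as np

# Toy model (illustrative): N ~ Binomial(n, p) template molecules are
# collected, then each undergoes c PCR cycles in which a molecule is copied
# with probability eff per cycle.
n, p = 20, 0.3          # collection step
c, eff = 4, 0.8         # PCR cycles and per-cycle copying efficiency

def pgf(s):
    # per-molecule PGF for one cycle: the molecule stays, a copy appears w.p. eff
    for _ in range(c):
        s = s * ((1 - eff) + eff * s)
    # compose with the PGF of the binomial collection step
    return (1 - p + p * s) ** n

# Evaluate the PGF at the roots of unity and invert with a DFT.
size = 1 << 10          # support bound: 20 molecules x 2**4 = 320 < 1024
points = np.exp(2j * np.pi * np.arange(size) / size)
pmf = np.fft.fft(pgf(points)).real / size
mean = (np.arange(size) * pmf).sum()      # equals n * p * (1 + eff)**c
```

Because each cycle multiplies the per-molecule mean by (1 + eff), the recovered distribution can be checked against the closed-form mean, which it matches to floating-point precision.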
NASA Astrophysics Data System (ADS)
WU, Chunhung
2015-04-01
The research built an original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and a landslide ratio-based logistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared the performance of the two models, and explained their error sources. The research assumes that the logistic regression model performs better when the distribution of the weighted value of each variable is similar to the distribution of landslide ratio. Landslide ratio is the ratio of landslide area to total area in a specific area and a useful index to evaluate the seriousness of landslide disasters in Taiwan. The research adopted the landslide inventory induced by 2009 Typhoon Morakot in the Chishan watershed, the most serious disaster event of the last decade in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. The six variables were divided into continuous variables, including elevation, slope, and accumulated rainfall, and categorical variables, including aspect, geological formation, and bank erosion, in building the or-LRLSM, while all variables, classified based on landslide ratio, were treated as categorical in building the lr-LRLSM. Because the number of basic units in the Chishan watershed was too large to process with commercial software, the research used random samples instead of the full set of basic units, with equal proportions of landslide and non-landslide units in the logistic regression analysis. The research took 10 random samples and selected the group with the best Cox & Snell R2 and Nagelkerke R2 values as the database for the following analysis.
Based on the best result from the 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2 = 0.253 (0.260). A unit with a landslide susceptibility value > 0.5 (≤ 0.5) is classified as a predicted landslide unit (non-landslide unit). The AUC, i.e. the area under the relative operating characteristic curve, of the or-LRLSM in the Chishan watershed is 0.72, while that of the lr-LRLSM is 0.77. Furthermore, the average correct ratio of the lr-LRLSM (73.3%) is better than that of the or-LRLSM (68.3%). The research analyzed the error sources of the two models in detail. For continuous variables, using the landslide ratio-based classification in building the lr-LRLSM makes the distribution of the weighted value more similar to the distribution of landslide ratio over the range of the continuous variable than in the or-LRLSM. For categorical variables, the effect of using the landslide ratio-based classification in building the lr-LRLSM is to group together parameters with similar landslide ratios. The mean correct ratio for continuous variables (categorical variables) using the lr-LRLSM is better than that of the or-LRLSM by 0.6% to 2.6% (1.7% to 6.0%). Building the landslide susceptibility model using landslide ratio-based classification is practical and performs better than the original logistic regression.
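The two pseudo-R2 statistics used above to select the best sampling group can be computed directly from the null and fitted log-likelihoods of a logistic regression; a minimal sketch follows, with hypothetical log-likelihood values rather than those of the study.

```python
# Sketch: Cox & Snell and Nagelkerke pseudo-R^2 from logistic regression
# log-likelihoods. The log-likelihood values below are hypothetical.
import math

def cox_snell_r2(ll_null, ll_model, n):
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    # Nagelkerke rescales Cox & Snell by its maximum attainable value
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Balanced sample of 1000 units: the null model predicts 0.5 for every unit
n = 1000
ll_null = n * math.log(0.5)
ll_model = -600.0  # hypothetical fitted log-likelihood
cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
```

As in the study, Nagelkerke R2 always exceeds Cox & Snell R2 for the same fit, since it divides by a maximum that is strictly less than one.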
Sample size calculations for case-control studies
This R package can be used to calculate the required sample size for unconditional multivariate analyses of unmatched case-control studies. The sample sizes are for a scalar exposure effect, such as a binary, ordinal, or continuous exposure, and can also be computed for scalar interaction effects. The analyses account for the effects of potential confounder variables that are also included in the multivariate logistic model.
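As a point of comparison, here is a minimal sketch of the classical two-group sample-size calculation for an unmatched case-control study (normal approximation, a single binary exposure, no confounders); the package described above handles the multivariate logistic case that this simple formula does not.

```python
# Sketch: per-group sample size for detecting a difference in exposure
# prevalence between cases and controls (normal approximation, no
# confounder adjustment). Inputs are illustrative.
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p_controls, p_cases, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_controls + p_cases) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_controls * (1 - p_controls)
                        + p_cases * (1 - p_cases))) ** 2
    return ceil(num / (p_cases - p_controls) ** 2)

# Exposure prevalence 50% in controls vs 60% in cases, alpha=0.05, power=0.80
n = n_per_group(0.5, 0.6)
```

Larger exposure contrasts require far fewer subjects, which is why adjusting for confounders (as the package does) matters: confounding shrinks the detectable effect.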
Muñoz, Manuel; Sanz, María; Pérez-Santos, Eloísa; Quiroga, María de Los Ángeles
2011-04-30
The social stigma of mental illness has received much attention in recent years and its effects on diverse variables such as psychiatric symptoms, social functioning, self-esteem, self-efficacy, quality of life, and social integration are well established. However, internalized stigma in people with severe and persistent mental illness has not received the same attention. The aim of the present work was to study the relationships between the principal variables involved in the functioning of internalized stigma (sociodemographic and clinical variables, social stigma, psychosocial functioning, recovery expectations, empowerment, and discrimination experiences) in a sample of people with severe and persistent mental illness (N=108). The main characteristics of the sample and the differences between groups with high and low internalized stigma were analyzed, a correlation analysis of the variables was performed, and a structural equation model, integrating variables of social, cognitive, and behavioral content, was proposed and tested. The results indicate the relationships among social stigma, discrimination experiences, recovery expectation, and internalized stigma and their role in the psychosocial and behavioral outcomes in schizophrenia spectrum disorders. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Assimilating bio-optical glider data during a phytoplankton bloom in the southern Ross Sea
NASA Astrophysics Data System (ADS)
Kaufman, Daniel E.; Friedrichs, Marjorie A. M.; Hemmings, John C. P.; Smith, Walker O., Jr.
2018-01-01
The Ross Sea is a region characterized by high primary productivity in comparison to other Antarctic coastal regions, and its productivity is marked by considerable variability both spatially (1-50 km) and temporally (days to weeks). This variability presents a challenge for inferring phytoplankton dynamics from observations that are limited in time or space, which is often the case due to logistical limitations of sampling. To better understand the spatiotemporal variability in Ross Sea phytoplankton dynamics and to determine how restricted sampling may skew dynamical interpretations, high-resolution bio-optical glider measurements were assimilated into a one-dimensional biogeochemical model adapted for the Ross Sea. The assimilation of data from the entire glider track using the micro-genetic and local search algorithms in the Marine Model Optimization Testbed improves the model-data fit by ~50 %, generating rates of integrated primary production of 104 g C m-2 yr-1 and export at 200 m of 27 g C m-2 yr-1. Assimilating glider data from three different latitudinal bands and three different longitudinal bands results in minimal changes to the simulations, improves the model-data fit with respect to unassimilated data by ~35 %, and confirms that analyzing these glider observations as a time series via a one-dimensional model is reasonable on these scales. Whereas assimilating the full glider data set produces well-constrained simulations, assimilating subsampled glider data at a frequency consistent with cruise-based sampling results in a wide range of primary production and export estimates. These estimates depend strongly on the timing of the assimilated observations, due to the presence of high mesoscale variability in this region. Assimilating surface glider data subsampled at a frequency consistent with available satellite-derived data results in 40 % lower carbon export, primarily resulting from optimized rates generating more slowly sinking diatoms.
This analysis highlights the need for the strategic consideration of the impacts of data frequency, duration, and coverage when combining observations with biogeochemical modeling in regions with strong mesoscale variability.
ERIC Educational Resources Information Center
Li, Tiandong
2012-01-01
In large-scale assessments, such as the National Assessment of Educational Progress (NAEP), plausible values based on Multiple Imputations (MI) have been used to estimate population characteristics for latent constructs under complex sample designs. Mislevy (1991) derived a closed-form analytic solution for a fixed-effect model in creating…
David C. Chojnacky; Randolph H. Wynne; Christine E. Blinn
2009-01-01
Methodology is lacking to easily map Forest Inventory and Analysis (FIA) inventory statistics for all attribute variables without having to develop separate models and methods for each variable. We developed a mapping method that can directly transfer tabular data to a map on which pixels can be added any way desired to estimate carbon (or any other variable) for a...
Inter-Individual Variability in Human Response to Low-Dose Ionizing Radiation, Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rocke, David
2016-08-01
In order to investigate inter-individual variability in response to low-dose ionizing radiation, we are working with three models: 1) in-vivo irradiated human skin, a realistic model, but with few subjects, all from a previous project; 2) ex-vivo irradiated human skin, also a realistic model, though with the limitations involved in keeping skin pieces alive in media; and 3) MatTek EpiDermFT skin plugs, which provide a more realistic model than cell lines and are more controllable than human samples.
Multivariate localization methods for ensemble Kalman filtering
NASA Astrophysics Data System (ADS)
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M. G.
2015-12-01
In ensemble Kalman filtering (EnKF), the small number of ensemble members that is feasible to use in a practical data assimilation application leads to sampling variability of the estimates of the background error covariances. The standard approach to reducing the effects of this sampling variability, which has also been found to be highly efficient in improving the performance of EnKF, is the localization of the estimates of the covariances. One family of localization techniques is based on taking the Schur (element-wise) product of the ensemble-based sample covariance matrix and a correlation matrix whose entries are obtained by the discretization of a distance-dependent correlation function. While the proper definition of the localization function for a single state variable has been extensively investigated, a rigorous definition of the localization function for multiple state variables that exist at the same locations has seldom been considered. This paper introduces two strategies for the construction of localization functions for multiple state variables. The proposed localization functions are tested in experiments that assimilate simulated observations into the bivariate Lorenz 95 model.
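For the single-variable case, the Schur-product localization can be sketched as follows; the Gaussian-shaped correlation function here is a stand-in for the compactly supported Gaspari-Cohn function often used in practice, and all sizes are illustrative.

```python
# Sketch: Schur (element-wise) product localization of a noisy ensemble
# sample covariance. Gaussian correlation stands in for Gaspari-Cohn.
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ens, L = 40, 10, 5.0

# A small ensemble produces a noisy sample covariance estimate
ensemble = rng.standard_normal((n_ens, n_state))
anomalies = ensemble - ensemble.mean(axis=0)
sample_cov = anomalies.T @ anomalies / (n_ens - 1)

# Distance-dependent localization matrix rho(d) = exp(-d^2 / (2 L^2))
idx = np.arange(n_state)
dist = np.abs(idx[:, None] - idx[None, :])
rho = np.exp(-dist**2 / (2 * L**2))

localized_cov = rho * sample_cov  # Schur product damps distant covariances
```

The diagonal (the variances) is untouched, while spurious long-range covariances arising from sampling noise are driven toward zero; the multivariate question the paper addresses is how to build `rho` when several state variables share the same grid points.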
Austin, Peter C; Steyerberg, Ewout W
2012-06-20
When outcomes are binary, the c-statistic (equivalent to the area under the receiver operating characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. We derived an analytical expression for the c-statistic under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal, or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
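The equal-variance binormal expression can be checked by simulation: with X ~ N(0,1) in those without the condition and X ~ N(delta,1) in those with it, the theoretical c-statistic is Phi(delta / sqrt(2)). The sketch below compares that value with the empirical c-statistic computed from ranks; delta and the sample sizes are illustrative.

```python
# Sketch: empirical vs theoretical c-statistic under binormality with
# equal variances. Sample sizes and effect size are illustrative.
import random
from math import sqrt
from statistics import NormalDist

random.seed(42)
delta, n = 1.0, 20000
x0 = [random.gauss(0.0, 1.0) for _ in range(n)]    # without the condition
x1 = [random.gauss(delta, 1.0) for _ in range(n)]  # with the condition

# Empirical c-statistic = P(X1 > X0), computed via the Mann-Whitney ranks
combined = sorted((v, g) for g in (0, 1) for v in (x0, x1)[g])
rank_sum = sum(r + 1 for r, (v, g) in enumerate(combined) if g == 1)
c_emp = (rank_sum - n * (n + 1) / 2) / (n * n)

c_theory = NormalDist().cdf(delta / sqrt(2))
```

With these sample sizes the empirical value agrees with the closed form to well within Monte Carlo error, illustrating the paper's point that discrimination is governed by the standardized separation of the two distributions rather than the odds ratio alone.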
Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong
2015-05-01
A study of medical expenditure and its influencing factors among students enrolled in the Urban Resident Basic Medical Insurance (URBMI) scheme in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing-data mechanism, this study proposes a two-stage method that handles both mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1 190 questionnaires were returned by the students (or their parents) selected in child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. Among the returned questionnaires, 2.52% exhibited not-missing-at-random (NMAR) values of the dependent variable and 7.14% exhibited missing-at-random (MAR) values. First, multiple imputation was conducted for the MAR values using the completed data; then a sample selection model was used to correct the NMAR values in the multiple imputation, and a multi-factor analysis model was established. Based on 1 000 resampling runs, the best scheme for filling the randomly missing values at this missing proportion was the predictive mean matching (PMM) method. With this optimal scheme, the two-stage analysis was conducted. Finally, it was found that the influencing factors on annual medical expenditure among the students enrolled in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for certain reasons, self-medication, and acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can deal effectively with non-response bias and selection bias in the dependent variable of survey data.
Workplace Determinants of Endotoxin Exposure in Dental Healthcare Facilities in South Africa
Singh, Tanusha S.; Bello, Braimoh; Mabe, Onnicah D.; Renton, Kevin; Jeebhay, Mohamed F.
2010-01-01
Objectives: Aerosols generated during dental procedures have been reported to contain endotoxin as a result of bacterial contamination of dental unit water lines. This study investigated the determinants of airborne endotoxin exposure in dental healthcare settings. Methods: The study population included dental personnel (n = 454) from five academic dental institutions in South Africa. Personal air samples (n = 413) in various dental jobs and water samples (n = 403) from dental handpieces and basin taps were collected. The chromogenic-1000 limulus amebocyte lysate assay was used to determine endotoxin levels. Exposure metrics were developed on the basis of individually measured exposures and average levels within each job category. Analysis of variance and multivariate linear regression models were constructed to ascertain the determinants of exposure in the dental group. Results: There was a 2-fold variation in personal airborne endotoxin from the least exposed (administration) to the most exposed (laboratory) jobs (geometric mean levels: 2.38 versus 5.63 EU m−3). Three percent of personal samples were above DECOS recommended exposure limit (50 EU m−3). In the univariate linear models, the age of the dental units explained the most variability observed in the personal air samples (R2 = 0.20, P < 0.001), followed by the season of the year (R2 = 0.11, P < 0.001). Other variables such as institution and total number of dental units per institution also explained a modest degree of variability. A multivariate model explaining the greatest variability (adjusted R2 = 0.40, P < 0.001) included: the age of institution buildings, total number of dental units per institution, ambient temperature, ambient air velocity, endotoxin levels in water, job category (staff versus students), dental unit model type and age of dental unit. Conclusions: Apart from job type, dental unit characteristics are important predictors of airborne endotoxin levels in this setting. PMID:20044586
NASA Astrophysics Data System (ADS)
La Cour, Brian R.
2017-07-01
An experiment has recently been performed to demonstrate quantum nonlocality by establishing contextuality in one of a pair of photons encoding four qubits; however, low detection efficiencies and use of the fair-sampling hypothesis leave these results open to possible criticism due to the detection loophole. In this Letter, a physically motivated local hidden-variable model is considered as a possible mechanism for explaining the experimentally observed results. The model, though not intrinsically contextual, acquires this quality upon post-selection of coincident detections.
Differentiating between precursor and control variables when analyzing reasoned action theories.
Hennessy, Michael; Bleakley, Amy; Fishbein, Martin; Brown, Larry; Diclemente, Ralph; Romer, Daniel; Valois, Robert; Vanable, Peter A; Carey, Michael P; Salazar, Laura
2010-02-01
This paper highlights the distinction between precursor and control variables in the context of reasoned action theory. Here the theory is combined with structural equation modeling to demonstrate how age and past sexual behavior should be situated in a reasoned action analysis. A two-wave longitudinal survey sample of African-American adolescents is analyzed where the target behavior is having vaginal sex. Results differ when age and past behavior are used as control variables and when they are correctly used as precursors. Because control variables do not appear in any form of reasoned action theory, this approach to including background variables is not correct when analyzing data sets based on the theoretical axioms of the Theory of Reasoned Action, the Theory of Planned Behavior, or the Integrative Model.
A generalized model for estimating the energy density of invertebrates
James, Daniel A.; Csargo, Isak J.; Von Eschen, Aaron; Thul, Megan D.; Baker, James M.; Hayer, Cari-Ann; Howell, Jessica; Krause, Jacob; Letvin, Alex; Chipps, Steven R.
2012-01-01
Invertebrate energy density (ED) values are traditionally measured using bomb calorimetry. However, many researchers rely on a few published literature sources to obtain ED values because of time and sampling constraints on measuring ED with bomb calorimetry. Literature values often do not account for spatial or temporal variability associated with invertebrate ED. Thus, these values can be unreliable for use in models and other ecological applications. We evaluated the generality of the relationship between invertebrate ED and proportion of dry-to-wet mass (pDM). We then developed and tested a regression model to predict ED from pDM based on a taxonomically, spatially, and temporally diverse sample of invertebrates representing 28 orders in aquatic (freshwater, estuarine, and marine) and terrestrial (temperate and arid) habitats from 4 continents and 2 oceans. Samples included invertebrates collected in all seasons over the last 19 y. Evaluation of these data revealed a significant relationship between ED and pDM (r2 = 0.96, p < 0.0001), where ED (as J/g wet mass) was estimated from pDM as ED = 22,960pDM − 174.2. Model evaluation showed that nearly all (98.8%) of the variability between observed and predicted values for invertebrate ED could be attributed to residual error in the model. Regression of observed on predicted values revealed that the 97.5% joint confidence region included the intercept of 0 (−103.0 ± 707.9) and slope of 1 (1.01 ± 0.12). Use of this model requires that only dry and wet mass measurements be obtained, resulting in significant time, sample size, and cost savings compared to traditional bomb calorimetry approaches. This model should prove useful for a wide range of ecological studies because it is unaffected by taxonomic, seasonal, or spatial variability.
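The published regression can be applied with nothing more than a dry and a wet mass measurement; a minimal sketch using the reported coefficients:

```python
# Sketch of the published regression: energy density (J/g wet mass)
# predicted from the proportion of dry-to-wet mass (pDM),
# ED = 22960 * pDM - 174.2. The sample masses below are illustrative.
def energy_density(dry_mass_g, wet_mass_g):
    p_dm = dry_mass_g / wet_mass_g
    return 22960.0 * p_dm - 174.2

# A hypothetical invertebrate sample weighing 1.00 g wet and 0.25 g dry
ed = energy_density(0.25, 1.00)  # 22960 * 0.25 - 174.2 = 5565.8 J/g wet mass
```

Only a drying oven and a balance are needed to obtain both inputs, which is the source of the time and cost savings over bomb calorimetry noted above.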
Tomperi, Jani; Leiviskä, Kauko
2018-06-01
Traditionally, modelling of the activated sludge process has been based solely on process measurements, but as interest in optically monitoring wastewater samples to characterize floc morphology has increased, the results of image analyses have in recent years been utilized more frequently to predict the characteristics of wastewater. This study shows that neither the traditional process measurements nor the automated optical monitoring variables by themselves are capable of producing the best predictive models for treated wastewater quality in a full-scale wastewater treatment plant; only by utilizing these variables together are the optimal models, which capture the level of and changes in treated wastewater quality, achieved. With this early warning, process operation can be optimized to avoid environmental damage and economic losses. The study also shows that specific optical monitoring variables are important in modelling certain quality parameters, regardless of the other input variables available.
NASA Astrophysics Data System (ADS)
Abbaszadeh Afshar, Farideh; Ayoubi, Shamsollah; Besalatpour, Ali Asghar; Khademi, Hossein; Castrignano, Annamaria
2016-03-01
This study was conducted to estimate soil clay content at two depths using geophysical techniques (Ground Penetrating Radar, GPR, and Electromagnetic Induction, EMI) and ancillary variables (remote sensing and topographic data) in an arid region of southeastern Iran. GPR measurements were performed along ten transects of 100 m length with a line spacing of 10 m, and EMI measurements were taken every 10 m on the same transects at six sites. Ten soil cores were sampled randomly at each site, soil samples were taken from depths of 0-20 and 20-40 cm, and the clay fraction of each of the sixty soil samples was measured in the laboratory. Clay content was predicted using three different input sets, namely geophysical data, ancillary data, and a combination of both, in multiple linear regression (MLR) models and the decision tree-based Chi-Squared Automatic Interaction Detection (CHAID) algorithm. The results of the CHAID and MLR models with all data combined showed that geophysical data were the most important variables for the prediction of clay content at the two depths in the study area. The proposed MLR model, using the combined data, could explain only 0.44 and 0.31 of the total variability of clay content at the 0-20 and 20-40 cm depths, respectively. The coefficient of determination (R2) values for clay content prediction using the constructed CHAID model with the combined data were 0.82 and 0.76 at the 0-20 and 20-40 cm depths, respectively. CHAID models therefore showed greater potential in predicting soil clay content from geophysical and ancillary data, while traditional regression methods (i.e. the MLR models) did not perform as well. Overall, the results may encourage researchers to use georeferenced GPR and EMI data as ancillary variables and the CHAID algorithm to improve the estimation of soil clay content.
Vidal-Martínez, Víctor M; Torres-Irineo, Edgar; Romero, David; Gold-Bouchot, Gerardo; Martínez-Meyer, Enrique; Valdés-Lozano, David; Aguirre-Macedo, M Leopoldina
2015-11-26
Understanding the environmental and anthropogenic factors influencing the probability of occurrence of the marine parasitic species is fundamental for determining the circumstances under which they can act as bioindicators of environmental impact. The aim of this study was to determine whether physicochemical variables, polyaromatic hydrocarbons or sewage discharge affect the probability of occurrence of the larval cestode Oncomegas wageneri, which infects the shoal flounder, Syacium gunteri, in the southern Gulf of Mexico. The study area included 162 sampling sites in the southern Gulf of Mexico and covered 288,205 km(2), where the benthic sediments, water and the shoal flounder individuals were collected. We used the boosted generalised additive models (boosted GAM) and the MaxEnt to examine the potential statistical relationships between the environmental variables (nutrients, contaminants and physicochemical variables from the water and sediments) and the probability of the occurrence of this parasite. The models were calibrated using all of the sampling sites (full area) with and without parasite occurrences (n = 162) and a polygon area that included sampling sites with a depth of 1500 m or less (n = 134). Oncomegas wageneri occurred at 29/162 sampling sites. The boosted GAM for the full area and the polygon area accurately predicted the probability of the occurrence of O. wageneri in the study area. By contrast, poor probabilities of occurrence were obtained with the MaxEnt models for the same areas. The variables with the highest frequencies of appearance in the models (proxies for the explained variability) were the polyaromatic hydrocarbons of high molecular weight (PAHH, 95 %), followed by a combination of nutrients, spatial variables and polyaromatic hydrocarbons of low molecular weight (PAHL, 5 %). 
The contribution of the PAHH to the variability was explained by the fact that these compounds, together with N and P, are carried by rivers that discharge into the ocean, which enhances the growth of hydrocarbonoclastic bacteria and the productivity and number of the intermediate hosts. Our results suggest that sites with PAHL/PAHH ratio values up to 1.89 promote transmission based on the high values of the prevalence of O. wageneri in the study area. In contrast, PAHL/PAHH ratio values ≥ 1.90 can be considered harmful for the transmission stages of O. wageneri and its hosts (copepods, shrimps and shoal flounders). Overall, the results indicate that the PAHHs affect the probability of occurrence of this helminth parasite in the southern Gulf of Mexico.
Probing AGN Accretion Physics through AGN Variability: Insights from Kepler
NASA Astrophysics Data System (ADS)
Kasliwal, Vishal Pramod
Active Galactic Nuclei (AGN) exhibit large luminosity variations over the entire electromagnetic spectrum on timescales ranging from hours to years. The variations in luminosity are devoid of any periodic character and appear stochastic. While complex correlations exist between the variability observed in different parts of the electromagnetic spectrum, no frequency band appears to be completely dominant, suggesting that the physical processes producing the variability are exceedingly rich and complex. In the absence of a clear theoretical explanation of the variability, phenomenological models are used to study AGN variability. The stochastic behavior of AGN variability makes formulating such models difficult and connecting them to the underlying physics exceedingly hard. We study AGN light curves serendipitously observed by the NASA Kepler planet-finding mission. Compared to previous ground-based observations, Kepler offers higher precision and a smaller sampling interval resulting in potentially higher quality light curves. Using structure functions, we demonstrate that (1) the simplest statistical model of AGN variability, the damped random walk (DRW), is insufficient to characterize the observed behavior of AGN light curves; and (2) variability begins to occur in AGN on time-scales as short as hours. Of the 20 light curves studied by us, only 3-8 may be consistent with the DRW. The structure functions of the AGN in our sample exhibit complex behavior with pronounced dips on time-scales of 10-100 d suggesting that AGN variability can be very complex and merits further analysis. We examine the accuracy of the Kepler pipeline-generated light curves and find that the publicly available light curves may require re-processing to reduce contamination from field sources. 
We show that while the re-processing changes the exact PSD power law slopes inferred by us, it is unlikely to change the conclusion of our structure function study: Kepler AGN light curves indicate that the DRW is insufficient to characterize AGN variability. We provide a new approach to probing accretion physics with variability by decomposing observed light curves into a set of impulses that drive diffusive processes using C-ARMA models. Applying our approach to Kepler data, we demonstrate how the time-scales reported in the literature can be interpreted in the context of the growth and decay time-scales for flux perturbations and tentatively identify the flux perturbation driving process with accretion disk turbulence on length-scales much longer than the characteristic eddy size. Our analysis technique is applicable to (1) studying the connection between AGN sub-type and variability properties; (2) probing the origins of variability by studying the multi-wavelength behavior of AGN; (3) testing numerical simulations of accretion flows with the goal of creating a library of the variability properties of different accretion mechanisms; (4) hunting for changes in the behavior of the accretion flow by block-analyzing observed light curves; and (5) constraining the sampling requirements of future surveys of AGN variability.
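The first-order structure function used in this analysis can be sketched for a regularly sampled light curve as the RMS flux difference at a given lag; the linear test series below is illustrative, chosen because its structure function is known exactly.

```python
# Sketch: first-order structure function SF(tau) for a regularly sampled
# light curve, SF(tau) = sqrt(mean((f(t + tau) - f(t))**2)).
def structure_function(flux, tau):
    """RMS flux difference at lag `tau` (in samples)."""
    diffs = [(flux[i + tau] - flux[i]) ** 2 for i in range(len(flux) - tau)]
    return (sum(diffs) / len(diffs)) ** 0.5

# For a linearly rising light curve f(t) = t, SF(tau) = tau exactly.
ramp = list(range(100))
sf = [structure_function(ramp, tau) for tau in (1, 5, 10)]
```

For a DRW, SF rises as the square root of the lag before flattening at the damping time-scale; the dips at 10-100 d reported above are departures from that monotonic shape.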
NASA Astrophysics Data System (ADS)
Gomez, Jose Alfonso; Owens, Phillip N.; Koiter, Alex J.; Lobb, David
2016-04-01
One of the major sources of uncertainty in attributing sediment sources in fingerprinting studies is the uncertainty in determining the concentrations of the elements used in the mixing model due to the variability of the concentrations of these elements in the source materials (e.g., Kraushaar et al., 2015). The uncertainty in determining the "true" concentration of a given element in each one of the source areas depends on several factors, among them the spatial variability of that element, the sampling procedure and sampling density. Researchers have limited control over these factors, and usually sampling density tends to be sparse, limited by time and the resources available. Monte Carlo analysis has been used regularly in fingerprinting studies to explore the probable solutions within the measured variability of the elements in the source areas, providing an appraisal of the probability of the different solutions (e.g., Collins et al., 2012). This problem can be considered analogous to the propagation of uncertainty in hydrologic models due to uncertainty in the determination of the values of the model parameters, and there are many examples of Monte Carlo analysis of this uncertainty (e.g., Freeze, 1980; Gómez et al., 2001). Some of these model analyses rely on the simulation of "virtual" situations that were calibrated from parameter values found in the literature, with the purpose of providing insight about the response of the model to different configurations of input parameters. This approach - evaluating the answer for a "virtual" problem whose solution could be known in advance - might be useful in evaluating the propagation of uncertainty in mixing models in sediment fingerprinting studies. In this communication, we present the preliminary results of an on-going study evaluating the effect of variability of element concentrations in source materials, sampling density, and the number of elements included in the mixing models. 
For this study a virtual catchment was constructed, composed of three sub-catchments, each 500 x 500 m in size. We assumed that there was no selectivity in sediment detachment or transport. A numerical exercise was performed considering these variables: 1) variability of element concentration: three levels with CVs of 20 %, 50 % and 80 %; 2) sampling density: 10, 25 and 50 "samples" per sub-catchment and element; and 3) number of elements included in the mixing model: two (determined) and five (overdetermined). This resulted in a total of 18 (3 x 3 x 2) possible combinations. The five fingerprinting elements considered in the study were C, N, 40K, Al and Pavail, and their average values, taken from the literature, were: sub-catchment 1: 4.0 %, 0.35 %, 0.50 ppm, 5.0 ppm, 1.42 ppm, respectively; sub-catchment 2: 2.0 %, 0.18 %, 0.20 ppm, 10.0 ppm, 0.20 ppm, respectively; and sub-catchment 3: 1.0 %, 0.06 %, 1.0 ppm, 16.0 ppm, 7.8 ppm, respectively. For each sub-catchment, three maps of the spatial distribution of each element were generated using the random generator of Mejia and Rodriguez-Iturbe (1974) as described in Freeze (1980), using the average value and the three different CVs defined above. Each map for each source area and property was generated on a 100 x 100 square grid, each grid cell being 5 m x 5 m. Maps were randomly generated for each property and source area. In doing so, we did not consider the possibility of cross-correlation among properties, and spatial autocorrelation was assumed to be weak. The reason for generating the maps was to create a "virtual" situation in which all the element concentration values at each point are known. Simultaneously, we arbitrarily set the percentage of sediment coming from each sub-catchment; these values were 30 %, 10 % and 60 % for sub-catchments 1, 2 and 3, respectively. Using these values, we determined the element concentrations in the sediment.
The exercise consisted of creating different sampling strategies in a virtual environment to determine an average value for each of the different maps of element concentration and sub-catchment, under different sampling densities: 200 different average values for the "high" sampling density (average of 50 samples); 400 different average values for the "medium" sampling density (average of 25 samples); and 1,000 different average values for the "low" sampling density (average of 10 samples). All these combinations of possible values of element concentrations in the source areas were solved for the concentration in the sediment already determined for the "true" solution using limSolve (Soetaert et al., 2014) in R language. The sediment source solutions found for the different situations and values were analyzed in order to: 1) evaluate the uncertainty in the sediment source attribution; and 2) explore strategies to detect the most probable solutions that might lead to improved methods for constructing the most robust mixing models. Preliminary results on these will be presented and discussed in this communication. Key words: sediment, fingerprinting, uncertainty, variability, mixing model. References Collins, A.L., Zhang, Y., McChesney, D., Walling, D.E., Haley, S.M., Smith, P. 2012. Sediment source tracing in a lowland agricultural catchment in southern England using a modified procedure combining statistical analysis and numerical modelling. Science of the Total Environment 414: 301-317. Freeze, R.A. 1980. A stochastic-conceptual analysis of rainfall-runoff processes on a hillslope. Water Resources Research 16: 391-408.
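The core computation in such studies, solving a mass-balance mixing model for source proportions, can be sketched in a few lines. The following Python sketch (using NumPy/SciPy rather than the limSolve package cited above) rebuilds the virtual catchment's "true" solution from the tabulated source means and then perturbs the source concentrations to mimic sampling variability; the 20 % CV and the weighted-row trick for the sum-to-one constraint are illustrative choices, not the study's exact procedure.

```python
import numpy as np
from scipy.optimize import nnls

# Source concentrations: rows are the 5 elements (C, N, 40K, Al, Pavail),
# columns the 3 sub-catchments, with the means given in the abstract.
A = np.array([
    [4.0,  2.0,  1.0],   # C (%)
    [0.35, 0.18, 0.06],  # N (%)
    [0.50, 0.20, 1.0],   # 40K (ppm)
    [5.0,  10.0, 16.0],  # Al (ppm)
    [1.42, 0.20, 7.8],   # Pavail (ppm)
])
true_p = np.array([0.30, 0.10, 0.60])  # "true" source contributions
b = A @ true_p                         # resulting sediment concentrations

# Enforce the sum-to-one constraint by appending it as a heavily weighted
# row, then solve with non-negative least squares.
w = 1e3
A_aug = np.vstack([A, w * np.ones(3)])
b_aug = np.append(b, w * 1.0)
p_hat, _ = nnls(A_aug, b_aug)          # exact data -> exact recovery

# Monte Carlo: perturb the source means with a 20 % CV to mimic the
# variability explored in the study, and collect the spread of solutions.
rng = np.random.default_rng(0)
solutions = []
for _ in range(1000):
    A_noisy = A * (1 + 0.2 * rng.standard_normal(A.shape))
    sol, _ = nnls(np.vstack([A_noisy, w * np.ones(3)]), b_aug)
    solutions.append(sol)

print("exact recovery:", np.round(p_hat, 3))
print("mean over noisy draws:", np.round(np.mean(solutions, axis=0), 3))
```

With noise-free source means the "true" 0.30/0.10/0.60 split is recovered exactly; the noisy draws scatter around it, which is precisely the uncertainty the virtual experiment quantifies.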
ERIC Educational Resources Information Center
Unlu, Melihan; Ertekin, Erhan; Dilmac, Bulent
2017-01-01
The purpose of the research is to investigate the relationships between self-efficacy beliefs toward mathematics, mathematics anxiety and self-efficacy beliefs toward mathematics teaching, mathematics teaching anxiety variables and testing the relationships between these variables with structural equation model. The sample of the research, which…
Niño-Sandoval, Tania Camila; Guevara Perez, Sonia V; González, Fabio A; Jaque, Robinson Andrés; Infante-Contreras, Clementina
2016-04-01
The mandibular bone is an important part of forensic facial reconstruction, and it can be lost in skeletonized remains. For this reason, it is necessary to facilitate the identification process by simulating the mandibular position from craniomaxillary measures alone. Different modeling techniques have been applied to this task, but they only contemplate the straight facial profile of skeletal pattern Class I, leaving out the 24.5% of the Colombian population with skeletal patterns Class II and III; moreover, craniofacial measures do not follow a parametric trend or a normal distribution. The aim of this study was to employ an automatic non-parametric method, Support Vector Machines, to classify skeletal patterns through craniomaxillary variables, in order to simulate the natural mandibular position in a contemporary Colombian sample. Lateral cephalograms (229) of Colombian young adults of both sexes were collected. Landmark coordinate protocols were used to create craniomaxillary variables. A Support Vector Machine with a linear kernel was trained on a subset of the available data and evaluated on the remaining samples. The weights of the model were used to select the 10 variables best for classification accuracy. An accuracy of 74.51% was obtained, defined by the Pr-A-N, N-Pr-A, A-N-Pr, A-Te-Pr, A-Pr-Rhi, Rhi-A-Pr, Pr-A-Te, Te-Pr-A, Zm-A-Pr and PNS-A-Pr angles. Class precision and class recall showed a correct distinction of Class II from Class III and vice versa. Support Vector Machines produced a useful model for classifying skeletal patterns from craniomaxillary variables that are not commonly used in the literature, applicable to the 24.5% of the contemporary Colombian sample. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
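A linear SVM's weights can be used to rank input variables, as done above to pick the 10 best angles. The sketch below illustrates the idea on synthetic data with scikit-learn; the feature count, effect sizes and train/test split are invented stand-ins for the cephalometric variables, not the study's data.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Synthetic two-class problem: 229 "subjects" (matching the study's sample
# size), 20 hypothetical features of which only the first 5 carry signal.
rng = np.random.default_rng(42)
n, p = 229, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]          # informative subset
y = (X @ beta + 0.5 * rng.standard_normal(n) > 0).astype(int)  # Class II vs III stand-in

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
svm = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)

accuracy = svm.score(X_te, y_te)
ranking = np.argsort(-np.abs(svm.coef_.ravel()))  # best variables first
print(f"holdout accuracy: {accuracy:.2%}")
print("top-5 variables by |weight|:", ranking[:5])
```

Because the kernel is linear, each weight multiplies one input variable directly, so |weight| is a defensible (if simple) importance score; nonlinear kernels do not offer this shortcut.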
Variance partitioning of stream diatom, fish, and invertebrate indicators of biological condition
Zuellig, Robert E.; Carlisle, Daren M.; Meador, Michael R.; Potapova, Marina
2012-01-01
Stream indicators used to make assessments of biological condition are influenced by many possible sources of variability. To examine this issue, we used multiple-year and multiple-reach diatom, fish, and invertebrate data collected from 20 least-disturbed and 46 developed stream segments between 1993 and 2004 as part of the US Geological Survey National Water Quality Assessment Program. We used a variance-component model to summarize the relative and absolute magnitude of 4 variance components (among-site, among-year, site × year interaction, and residual) in indicator values (observed/expected ratio [O/E] and regional multimetric indices [MMI]) among assemblages and between basin types (least-disturbed and developed). We used multiple-reach samples to evaluate discordance in site assessments of biological condition caused by sampling variability. Overall, patterns in variance partitioning were similar among assemblages and basin types with one exception. Among-site variance dominated the relative contribution to the total variance (64–80% of total variance), residual variance (sampling variance) accounted for more variability (8–26%) than interaction variance (5–12%), and among-year variance was always negligible (0–0.2%). The exception to this general pattern was for invertebrates at least-disturbed sites where variability in O/E indicators was partitioned between among-site and residual (sampling) variance (among-site = 36%, residual = 64%). This pattern was not observed for fish and diatom indicators (O/E and regional MMI). We suspect that unexplained sampling variability is what largely remained after the invertebrate indicators (O/E predictive models) had accounted for environmental differences among least-disturbed sites. The influence of sampling variability on discordance of within-site assessments was assemblage or basin-type specific. 
Discordance among assessments was nearly 2× greater in developed basins (29–31%) than in least-disturbed sites (15–16%) for invertebrates and diatoms, whereas discordance among assessments based on fish did not differ between basin types (least-disturbed = 16%, developed = 17%). Assessments made using invertebrate and diatom indicators from a single reach disagreed with other samples collected within the same stream segment nearly ⅓ of the time in developed basins, compared to ⅙ for all other cases.
ERIC Educational Resources Information Center
Barakat, Asia; Othman, Afaf
2015-01-01
The present study aims to identify the relationship between the five-factor model of personality and its relationship to cognitive style (rush and prudence) and academic achievement among a sample of students. The study is based on descriptive approach for studying the relationship between the variables of the study, results and analysis. The…
ERIC Educational Resources Information Center
Serry, Tanya Anne; Castles, Anne; Mensah, Fiona K.; Bavin, Edith L.; Eadie, Patricia; Pezic, Angela; Prior, Margot; Bretherton, Lesley; Reilly, Sheena
2015-01-01
The paper reports on a study designed to develop a risk model that can best predict single-word spelling in seven-year-old children when they were aged 4 and 5. Test measures, personal characteristics and environmental influences were all considered as variables from a community sample of 971 children. Strong concurrent correlations were found…
Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhaq, Chun-Jiang
2014-10-01
In order to detect the soluble solids content (SSC) of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectra of apple. Different wavelength variable selection methods, including uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA), were proposed to select effective wavelength variables of the NIR spectra of the SSC in apple based on PLS. The back interval LS-SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. Selected wavelength variables and the full wavelength range were set as input variables of the PLS model and LS-SVM model, respectively. The results indicated that the PLS model built using GA-CARS on 50 characteristic variables, selected from the full spectrum of 1512 wavelengths, achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for the prediction set were 0.962 and 0.403 °Brix, respectively, for SSC. The proposed GA-CARS method could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of a portable apple soluble solids content spectrometer.
NASA Astrophysics Data System (ADS)
Dietze, M. C.; Davidson, C. D.; Desai, A. R.; Feng, X.; Kelly, R.; Kooper, R.; LeBauer, D. S.; Mantooth, J.; McHenry, K.; Serbin, S. P.; Wang, D.
2012-12-01
Ecosystem models are designed to synthesize our current understanding of how ecosystems function and to predict responses to novel conditions, such as climate change. Reducing uncertainties in such models can thus improve both basic scientific understanding and our predictive capacity, but rarely have the models themselves been employed in the design of field campaigns. In the first part of this paper we provide a synthesis of uncertainty analyses conducted using the Predictive Ecosystem Analyzer (PEcAn) ecoinformatics workflow on the Ecosystem Demography model v2 (ED2). This work spans a number of projects synthesizing trait databases and using Bayesian data assimilation techniques to incorporate field data across temperate forests, grasslands, agriculture, short rotation forestry, boreal forests, and tundra. We report on a number of data needs that span a diverse array of biomes, such as the need for better constraint on growth respiration. We also identify other data needs that are biome-specific, such as reproductive allocation in tundra, leaf dark respiration in forestry and early-successional trees, and root allocation and turnover in mid- and late-successional trees. Future data collection needs to balance the unequal distribution of past measurements across biomes (temperate-biased) and processes (aboveground-biased) with the sensitivities of different processes. In the second part we present the development of a power analysis and sampling optimization module for the PEcAn system. This module uses the results of variance decomposition analyses to estimate the further reduction in model predictive uncertainty for different sample sizes of different variables. By assigning a cost to each measurement type, we apply basic economic theory to optimize the reduction in model uncertainty for any total expenditure, or to determine the cost required to reduce uncertainty to a given threshold.
Using this system we find that sampling switches among multiple measurement types but favors those with no prior measurements, due to the need to integrate over prior uncertainty in within- and among-site variability. When starting from scratch in a new system, the optimal design favors initial measurements of SLA due to its high sensitivity and low cost. The value of many data types, such as photosynthetic response curves, depends strongly on whether one includes initial equipment costs or just per-sample costs. Similarly, sampling at previously measured locations is favored when infrastructure costs are high; otherwise across-site sampling is favored over intensive sampling, except when within-site variability strongly dominates.
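The "basic economic theory" step, trading marginal uncertainty reduction against per-sample cost, can be sketched as a greedy allocation. In the toy model below, each parameter contributes v_i/n_i to predictive variance after n_i samples; the parameter names, variances and costs are invented for illustration and do not come from PEcAn.

```python
# Toy sketch of the cost-optimization idea (not PEcAn's actual module):
# assume parameter i contributes v_i / n_i to predictive variance once
# n_i samples have been taken, and each extra sample costs c_i. Greedily
# spend the budget where marginal variance reduction per unit cost is
# highest. All names and numbers are invented.
params = {            # (variance contribution, per-sample cost)
    "SLA":              (4.0, 1.0),
    "growth_resp":      (3.0, 5.0),
    "root_turnover":    (2.0, 8.0),
    "dark_respiration": (1.0, 2.0),
}

def optimize(budget):
    n = {k: 1 for k in params}                  # one prior sample each
    spent = 0.0

    def gain_per_cost(k):
        v, c = params[k]
        return (v / n[k] - v / (n[k] + 1)) / c  # marginal reduction / cost

    while True:
        best = max(params, key=gain_per_cost)
        cost = params[best][1]
        if spent + cost > budget:
            break  # stop at the first unaffordable pick (simplification)
        n[best] += 1
        spent += cost
    variance = sum(v / n[k] for k, (v, _) in params.items())
    return n, variance

n_alloc, var_left = optimize(budget=40.0)
print(n_alloc, round(var_left, 3))
```

The allocation reproduces the qualitative behavior described above: the cheap, high-sensitivity parameter (here "SLA") is sampled first and most, and sampling switches to other measurement types as its marginal value falls.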
A Structural Equation Model of Expertise in College Physics
ERIC Educational Resources Information Center
Taasoobshirazi, Gita; Carr, Martha
2009-01-01
A model of expertise in physics was tested on a sample of 374 college students in 2 different level physics courses. Structural equation modeling was used to test hypothesized relationships among variables linked to expert performance in physics including strategy use, pictorial representation, categorization skills, and motivation, and these…
NASA Astrophysics Data System (ADS)
Herkül, Kristjan; Peterson, Anneliis; Paekivi, Sander
2017-06-01
Both basic science and marine spatial planning are in need of high-resolution, spatially continuous data on seabed habitats and biota. As conventional point-wise sampling is unable to cover large spatial extents in high detail, it must be supplemented with remote sensing and modeling in order to fulfill scientific and management needs. The combined use of in situ sampling, sonar scanning, and mathematical modeling is becoming the main method for mapping both abiotic and biotic seabed features. Further development and testing of these methods in varying locations and environmental settings is essential for moving towards a unified and generally accepted methodology. To fill the relevant research gap in the Baltic Sea, we used multibeam sonar and mathematical modeling methods - generalized additive models (GAM) and random forest (RF) - together with underwater video to map the seabed substrate and epibenthos of offshore shallows. In addition to testing the general applicability of the proposed complex of techniques, the predictive power of different sonar-based variables and modeling algorithms was tested. Mean depth, followed by mean backscatter, were the most influential variables in most of the models. Generally, mean values of sonar-based variables had higher predictive power than their standard deviations. The predictive accuracy of RF was higher than that of GAM. To conclude, we found the method to be feasible, with predictive accuracy similar to previous studies of sonar-based mapping.
Quantum Inference on Bayesian Networks
NASA Astrophysics Data System (ADS)
Yoder, Theodore; Low, Guang Hao; Chuang, Isaac
2014-03-01
Because quantum physics is naturally probabilistic, it seems reasonable to expect physical systems to describe probabilities and their evolution in a natural fashion. Here, we use quantum computation to speed up sampling from a graphical probability model, the Bayesian network. A specialization of this sampling problem is approximate Bayesian inference, where the distribution on query variables is sampled given the values e of evidence variables. Inference is a key part of modern machine learning and artificial intelligence tasks, but is known to be NP-hard. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time O(nmP(e)^-1), depending critically on P(e), the probability that the evidence might occur in the first place. However, by implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking O(n2^m P(e)^-1/2) time per sample. The speedup is the result of amplitude amplification, which is proving to be broadly applicable in sampling and machine learning tasks. In particular, we provide an explicit and efficient circuit construction that implements the algorithm without the need for oracle access.
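The classical baseline being sped up here is ordinary rejection sampling: draw from the joint distribution and discard samples that contradict the evidence, at an expected cost of 1/P(e) draws per accepted sample. A minimal sketch on a two-node network (all probabilities invented):

```python
import random

# Two-node Bayesian network A -> B, sampled by classical rejection
# sampling on the evidence B = 1. The draws-per-accepted-sample ratio
# illustrates the 1/P(e) scaling the quantum algorithm improves on.
random.seed(0)
p_a = 0.3                       # P(A = 1)
p_b_given_a = {1: 0.9, 0: 0.1}  # P(B = 1 | A)

def sample_joint():
    a = 1 if random.random() < p_a else 0
    b = 1 if random.random() < p_b_given_a[a] else 0
    return a, b

# Approximate P(A = 1 | B = 1) by keeping only samples matching B = 1.
accepted, hits, tried = 0, 0, 0
while accepted < 10_000:
    a, b = sample_joint()
    tried += 1
    if b == 1:
        accepted += 1
        hits += a

posterior = hits / accepted
p_e = 0.3 * 0.9 + 0.7 * 0.1     # P(B = 1) = 0.34
print(f"P(A=1 | B=1) ~ {posterior:.3f} (exact: {0.27 / 0.34:.3f})")
print(f"draws per accepted sample ~ {tried / accepted:.2f} (1/P(e) = {1 / p_e:.2f})")
```

Making the evidence rarer (smaller P(e)) directly inflates the draws-per-sample ratio, which is why the quantum P(e)^-1/2 dependence is the headline result.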
The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.
Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S
2016-10-01
The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
Approach for environmental baseline water sampling
Smith, K.S.
2011-01-01
Samples collected during the exploration phase of mining represent baseline conditions at the site. As such, they can be very important in forecasting potential environmental impacts should mining proceed, and can become measurements against which future changes are compared. Constituents in stream water draining mined and mineralized areas tend to be geochemically, spatially, and temporally variable, which presents challenges in collecting both exploration and baseline water-quality samples. Because short-term (daily) variations can complicate long-term trends, it is important to consider recent findings concerning geochemical variability of stream-water constituents at short-term timescales in designing sampling plans. Also, adequate water-quality information is key to forecasting potential ecological impacts from mining. Therefore, it is useful to collect baseline water samples adequate for geochemical and toxicological modeling. This requires complete chemical analyses of dissolved constituents that include major and minor chemical elements as well as physicochemical properties (including pH, specific conductance, and dissolved oxygen) and dissolved organic carbon. Applying chemical-equilibrium and appropriate toxicological models to water-quality information leads to an understanding of the speciation, transport, sequestration, bioavailability, and aquatic toxicity of potential contaminants. Insights gained from geochemical and toxicological modeling of water-quality data can be used to design appropriate mitigation and for economic planning of future mining activities.
Development of a wound healing index for patients with chronic wounds.
Horn, Susan D; Fife, Caroline E; Smout, Randall J; Barrett, Ryan S; Thomson, Brett
2013-01-01
Randomized controlled trials in wound care generalize poorly because they exclude patients with significant comorbid conditions. Research using real-world wound care patients is hindered by lack of validated methods to stratify patients according to severity of underlying illnesses. We developed a comprehensive stratification system for patients with wounds that predicts healing likelihood. Complete medical record data on 50,967 wounds from the United States Wound Registry were assigned a clear outcome (healed, amputated, etc.). Factors known to be associated with healing were evaluated using logistic regression models. Significant variables (p < 0.05) were determined and subsequently tested on a holdout sample of data. A different model predicted healing for each wound type. Some variables predicted significantly in nearly all models: wound size, wound age, number of wounds, evidence of bioburden, tissue type exposed (Wagner grade or stage), being nonambulatory, and requiring hospitalization during the course of care. Variables significant in some models included renal failure, renal transplant, malnutrition, autoimmune disease, and cardiovascular disease. All models validated well when applied to the holdout sample. The "Wound Healing Index" can validly predict likelihood of wound healing among real-world patients and can facilitate comparative effectiveness research to identify patients needing advanced therapeutics. © 2013 by the Wound Healing Society.
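The modeling step described, fitting a logistic regression of healing on a development set and checking it on a holdout sample, can be sketched as follows. The predictors and coefficients below are invented stand-ins for the registry variables, and the Newton-Raphson fit is a generic implementation, not the authors' software.

```python
import numpy as np

# Synthetic stand-in for the registry: healing (y = 1) depends on wound
# size, wound age and number of wounds, with invented coefficients.
rng = np.random.default_rng(7)
n = 5000
X = np.column_stack([
    rng.lognormal(1.0, 0.8, n),   # wound size (cm^2)
    rng.exponential(60, n),       # wound age (days)
    rng.poisson(1.5, n) + 1,      # number of wounds
])
logit = 2.0 - 0.15 * X[:, 0] - 0.01 * X[:, 1] - 0.3 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

# Newton-Raphson fit of a logistic model with intercept on a development
# set, then a simple holdout check of discrimination.
Z = np.column_stack([np.ones(n), X])
train, hold = slice(0, 4000), slice(4000, None)
beta = np.zeros(Z.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Z[train] @ beta))
    grad = Z[train].T @ (y[train] - p)
    H = (Z[train] * (p * (1 - p))[:, None]).T @ Z[train]
    beta += np.linalg.solve(H, grad)

p_hold = 1 / (1 + np.exp(-Z[hold] @ beta))
sep = p_hold[y[hold] == 1].mean() - p_hold[y[hold] == 0].mean()
print("coefficients:", np.round(beta, 3))
print(f"holdout separation in predicted probability: {sep:.3f}")
```

The holdout step is the part that matters for an index meant to generalize: significance on the development data alone says little about how the model scores patients it has never seen.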
Puissant, Sylvia Pinna; Gauthier, Jean-Marie; Van Oirbeek, Robin
2011-11-01
This study explores the relative contribution of the overall quality of attachment to the mother, to the father and to peers (Inventory of Parent and Peer Attachment scales), the style of attachment towards peers (Attachment Questionnaire for Children scale), the social rank variables (submissive behavior and social comparison), and sex and age variables in predicting the depression score (Center of Epidemiological Studies Depression Scale) on a non-psychiatric sample of 13-18 year old adolescents (n = 225). Results of our integrated model (adjusted R-Square of .50) show that attachment variables (overall quality of attachment to the father and to the mother), social rank variables (social comparison and submissive behavior), age and sex are important in predicting depressive symptoms during adolescence. Moreover, the attachment to peers variables (quality of attachment to peers, secure and ambivalent style of attachment) and sex are mediated by the social rank variables (social comparison and submissive behavior).
Van Bogaert, Peter; Clarke, Sean; Willems, Riet; Mondelaers, Mieke
2013-07-01
To study the relationships between nurse practice environment, workload, burnout, job outcomes and nurse-reported quality of care in psychiatric hospital staff. Nurses' practice environments in general hospitals have been extensively investigated, but potential variations across practice settings, for instance in psychiatric hospitals, have been much less studied. A cross-sectional survey design was used. A structural equation model previously tested in acute hospitals was evaluated using survey data from a sample of 357 registered nurses, licensed practical nurses, and non-registered caregivers from two psychiatric hospitals in Belgium between December 2010 and April 2011. The model included paths between practice environment dimensions and outcome variables, with burnout in a mediating position. A workload measure was also tested as a potential mediator between the practice environment and outcome variables. An improved model, slightly modified from the one validated earlier in samples of acute care nurses, was confirmed. This model explained 50% and 38% of the variance in job outcomes and nurse-reported quality of care, respectively. In addition, workload was found to play a mediating role in accounting for job outcomes and significantly improved a model that ultimately explained 60% of the variance in these variables. In psychiatric hospitals as in general hospitals, the nurse-physician relationship and other organizational dimensions such as nursing and hospital management were closely associated with perceptions of workload and with burnout and job satisfaction, turnover intentions, and nurse-reported quality of care. Mechanisms linking key variables and differences across settings in these relationships merit attention from managers and researchers. © 2012 Blackwell Publishing Ltd.
Incorporating imperfect detection into joint models of communities: A response to Warton et al.
Beissinger, Steven R.; Iknayan, Kelly J.; Guillera-Arroita, Gurutzeta; Zipkin, Elise; Dorazio, Robert; Royle, Andy; Kery, Marc
2016-01-01
Warton et al. [1] advance community ecology by describing a statistical framework that can jointly model abundances (or distributions) across many taxa to quantify how community properties respond to environmental variables. This framework specifies the effects of both measured and unmeasured (latent) variables on the abundance (or occurrence) of each species. Latent variables are random effects that capture the effects of both missing environmental predictors and correlations in parameter values among different species. As presented in Warton et al., however, the joint modeling framework fails to account for the common problem of detection or measurement errors that always accompany field sampling of abundance or occupancy, and are well known to obscure species- and community-level inferences.
Usami, Satoshi
2017-03-01
Behavioral and psychological researchers have shown strong interest in investigating contextual effects (i.e., the influences of combinations of individual- and group-level predictors on individual-level outcomes). The present research provides generalized formulas for determining the sample size needed to investigate contextual effects according to the desired level of statistical power as well as the width of the confidence interval. These formulas are derived within a three-level random intercept model that includes one predictor/contextual variable at each level, so as to simultaneously cover the various kinds of contextual effects in which researchers may be interested. The relative influences of the indices included in the formulas on the standard errors of contextual effect estimates are investigated with the aim of further simplifying sample size determination procedures. In addition, simulation studies are performed to investigate the finite-sample behavior of the calculated statistical power, showing that sample sizes estimated from the derived formulas can be both positively and negatively biased due to the complex effects of unreliability of contextual variables, multicollinearity, and violation of the assumption of known variances. Thus, it is advisable to compare estimated sample sizes under various specifications of the indices and to evaluate their potential bias, as illustrated in the example.
Factors Determining Success in Youth Judokas
Krstulović, Saša; Caput, Petra Đapić
2017-01-01
Abstract The aim of this study was to compare two models of determining factors for success in judo. The first model (Model A) included testing the motor abilities of high-level Croatian judokas in the cadet age category. The sample in Model A consisted of 71 male and female judokas aged 16 ± 0.6 years who were divided into four subsamples according to sex and weight category. The second model (Model B) consisted of interviewing 40 top-level judo experts on the importance of motor abilities for cadets’ success in judo. According to Model A, the greatest impact on the criterion variable of success in males and females of the heavier weight categories came from the variables assessing maximum strength, coordination and jumping ability. In the lighter-weight male categories, the highest correlation with the criterion variable of success was shown by the variable assessing agility, whereas in the lighter-weight female categories the variable assessing muscular endurance had the greatest impact on success. In Model B, specific endurance was crucial for success in judo, while flexibility was the least important, regardless of sex and weight category. Spearman’s rank correlation coefficients showed no significant correlations between the results obtained in Models A and B for all observed subsamples. Although no significant correlations between the factors for success obtained through Models A and B were found, common determinants of success, regardless of the applied model, were identified. PMID:28469759
Simulation on Poisson and negative binomial models of count road accident modeling
NASA Astrophysics Data System (ADS)
Sapuan, M. S.; Razali, A. M.; Zamzuri, Z. H.; Ibrahim, K.
2016-11-01
Accident count data have often been shown to exhibit overdispersion. On the other hand, the data might contain excess zero counts. A simulation study was conducted to create scenarios in which accidents happen at a T-junction, under the assumption that the dependent variable of the generated data follows a certain distribution, namely the Poisson or negative binomial distribution, with different sample sizes from n=30 to n=500. The study objective was accomplished by fitting Poisson regression, negative binomial regression and hurdle negative binomial models to the simulated data. Model validation was compared, and the simulation results show that, for each sample size, not every model fits the data well even when the data were generated from its own distribution, especially when the sample size is larger. Furthermore, larger sample sizes produce more zero accident counts in the dataset.
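The overdispersion that motivates the negative binomial alternative is easy to demonstrate by simulation. The sketch below generates counts from both distributions at the study's range of sample sizes and compares the variance-to-mean ratio, which is approximately 1 under a Poisson model and larger under a negative binomial; the means and dispersion chosen are illustrative, not the paper's settings.

```python
import numpy as np

# Simulated accident counts: Poisson vs negative binomial with the same
# mean (2.0). NB with r = 1 and p = 1/3 has mean r(1-p)/p = 2.0 and
# variance r(1-p)/p^2 = 6.0, i.e. a dispersion ratio of 3.
rng = np.random.default_rng(3)

def dispersion_ratio(sample):
    """Variance/mean ratio; ~1 under Poisson, >1 under overdispersion."""
    return sample.var(ddof=1) / sample.mean()

for n in (30, 100, 500):                     # sample sizes as in the study
    poisson_counts = rng.poisson(lam=2.0, size=n)
    nb_counts = rng.negative_binomial(n=1, p=1 / 3, size=n)
    print(f"n={n:3d}  Poisson ratio={dispersion_ratio(poisson_counts):.2f}"
          f"  NB ratio={dispersion_ratio(nb_counts):.2f}")
```

At n = 30 the ratios fluctuate considerably, which is one reason model choice is unstable at small sample sizes; by n = 500 the overdispersed data are clearly incompatible with a Poisson fit.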
Bispo, Polyanna da Conceição; dos Santos, João Roberto; Valeriano, Márcio de Morisson; Graça, Paulo Maurício Lima de Alencastro; Balzter, Heiko; França, Helena; Bispo, Pitágoras da Conceição
2016-01-01
Surveying primary tropical forest over large regions is challenging. Indirect methods of relating terrain information or other external spatial datasets to forest biophysical parameters can provide forest structural maps at large scales, but the inherent uncertainties need to be evaluated fully. The goal of the present study was to evaluate relief characteristics, measured through geomorphometric variables, as predictors of forest structural characteristics such as average tree basal area (BA), average height (H) and average percentage canopy openness (CO). Our hypothesis is that geomorphometric variables are good predictors of the structure of primary tropical forest, even in areas with low altitude variation. The study was performed at the Tapajós National Forest, located in the western State of Pará, Brazil. Forty-three plots were sampled. Predictive models for BA, H and CO were parameterized based on geomorphometric variables using multiple linear regression. Validation of the models with nine independent sample plots revealed a root mean square error (RMSE) of 3.73 m2/ha (20%) for BA, 1.70 m (12%) for H, and 1.78% (21%) for CO. The coefficients of determination between observed and predicted values were r2 = 0.32 for CO, r2 = 0.26 for H and r2 = 0.52 for BA. The models obtained were able to adequately estimate BA and CO. In summary, relief variables are good predictors of vegetation structure and enable the creation of forest structure maps in primary tropical rainforest with an acceptable uncertainty. PMID:27089013
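The calibrate-then-validate workflow (43 calibration plots, 9 independent validation plots, RMSE as the error measure) can be sketched generically; the predictors, coefficients and noise level below are invented, not the study's data.

```python
import numpy as np

# Schematic of the study's workflow on synthetic numbers: fit a multiple
# linear regression of a forest-structure variable (e.g. BA in m2/ha) on
# geomorphometric predictors, then validate on held-out plots with RMSE.
rng = np.random.default_rng(11)
n_cal, n_val, n_pred = 43, 9, 3                    # plot counts from the paper
X = rng.standard_normal((n_cal + n_val, n_pred))   # e.g. slope, curvature, elevation
beta_true = np.array([5.0, -2.0, 1.0])             # invented effect sizes
y = 20.0 + X @ beta_true + rng.normal(0, 2.0, n_cal + n_val)

# Ordinary least squares on the calibration plots only.
Z = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(Z[:n_cal], y[:n_cal], rcond=None)

# RMSE on the independent validation plots.
pred = Z[n_cal:] @ coef
rmse = np.sqrt(np.mean((pred - y[n_cal:]) ** 2))
print("coefficients:", np.round(coef, 2))
print(f"validation RMSE: {rmse:.2f}")
```

With only nine validation plots the RMSE itself is a noisy estimate, which is worth keeping in mind when comparing the reported 12-21% relative errors.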
Fiske, Ian J.; Royle, J. Andrew; Gross, Kevin
2014-01-01
Ecologists and wildlife biologists increasingly use latent variable models to study patterns of species occurrence when detection is imperfect. These models have recently been generalized to accommodate both a more expansive description of state than simple presence or absence, and Markovian dynamics in the latent state over successive sampling seasons. In this paper, we write these multi-season, multi-state models as hidden Markov models to find both maximum likelihood estimates of model parameters and finite-sample estimators of the trajectory of the latent state over time. These estimators are especially useful for characterizing population trends in species of conservation concern. We also develop parametric bootstrap procedures that allow formal inference about latent trend. We examine model behavior through simulation, and we apply the model to data from the North American Amphibian Monitoring Program.
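The likelihood computation that makes the hidden Markov formulation attractive is the forward algorithm, which sums over all latent state trajectories in time linear in the number of seasons. A stripped-down two-state sketch (site unoccupied/occupied, detection imperfect; all probabilities invented):

```python
import numpy as np

# Minimal forward algorithm for a 2-state latent Markov chain observed
# with detection error across seasons -- a toy analogue of the
# multi-season occupancy models written as HMMs in the paper.
psi0 = np.array([0.6, 0.4])          # initial state distribution
P = np.array([[0.8, 0.2],            # transitions: colonization = 0.2
              [0.3, 0.7]])           #              extinction    = 0.3
E = np.array([[1.0, 0.0],            # unoccupied: never detected
              [0.3, 0.7]])           # occupied: missed 30 % of the time

def forward_loglik(obs):
    """Log-likelihood of a detection history (0/1 per season)."""
    alpha = psi0 * E[:, obs[0]]          # filter over latent states
    for o in obs[1:]:
        alpha = (alpha @ P) * E[:, o]    # propagate, then condition
    return np.log(alpha.sum())

history = [0, 1, 0, 0, 1]            # detected in seasons 2 and 5
print(f"log-likelihood: {forward_loglik(history):.4f}")
```

Maximizing this likelihood over the transition and detection parameters is exactly the fitting problem the paper solves; the same forward recursion also yields the finite-sample state estimators via the companion backward pass.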
Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition
Fraley, Chris; Percival, Daniel
2014-01-01
Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001
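The core idea of treating sparsity patterns as a model space and averaging over them can be illustrated with a much simpler device than the paper's MC3 sampler: enumerate a few candidate supports (standing in for the distinct sparsity patterns along an ℓ1 path), weight each by an approximate posterior via BIC, and average the coefficients. This is a deliberately reduced sketch of the BMA principle, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)  # true support {0, 1}

# Candidate supports, mimicking nested sparsity patterns along an l1 path.
supports = [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]

def bic_fit(S):
    """OLS fit restricted to support S, with its BIC score."""
    A = np.column_stack([np.ones(n), X[:, list(S)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / n
    ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * ll + (len(S) + 1) * np.log(n), beta

bics, betas = zip(*[bic_fit(S) for S in supports])
w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()                               # approximate model posteriors

avg = np.zeros(p)                          # model-averaged coefficients
for weight, S, beta in zip(w, supports, betas):
    avg[list(S)] += weight * np.asarray(beta[1:])
print(np.round(avg, 2))
```

Variables never appearing in any candidate support get an averaged coefficient of exactly zero, which is how BMA expresses selection uncertainty; the paper replaces this toy enumeration with a Markov chain over path points so it scales to p much larger than n.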
NASA Astrophysics Data System (ADS)
Setyaningsih, S.
2017-01-01
The main element in building a leading university is lecturer commitment in a professional manner. Commitment was measured through willpower, loyalty, pride, and integrity as a professional lecturer. A total of 135 of 337 university lecturers were sampled to collect data. Data were analyzed using validity and reliability tests and multiple linear regression. Many studies have found links to lecturer commitment, but the underlying causal relationships are generally neglected. The results indicate that the professional commitment of lecturers is affected by the variables empowerment, academic culture, and trust. The relationship model between variables is composed of three substructures. The first substructure consists of the endogenous variable professional commitment and three exogenous variables, namely academic culture, empowerment and trust, as well as the residual variable ɛy. The second substructure consists of one endogenous variable, trust, two exogenous variables, namely empowerment and academic culture, and the residual variable ɛ3. The third substructure consists of one endogenous variable, academic culture, and one exogenous variable, empowerment, as well as the residual variable ɛ2. Multiple linear regression was used in the path model for each substructure. The results showed that the hypotheses were supported, and these findings provide empirical evidence that increasing these variables will have an impact on increasing the professional commitment of the lecturers.
A novel model incorporating two variability sources for describing motor evoked potentials
Goetz, Stefan M.; Luber, Bruce; Lisanby, Sarah H.; Peterchev, Angel V.
2014-01-01
Objective: Motor evoked potentials (MEPs) play a pivotal role in transcranial magnetic stimulation (TMS), e.g., for determining the motor threshold and probing cortical excitability. Sampled across the range of stimulation strengths, MEPs outline an input–output (IO) curve, which is often used to characterize the corticospinal tract. More detailed understanding of the signal generation and variability of MEPs would provide insight into the underlying physiology and aid correct statistical treatment of MEP data. Methods: A novel regression model is tested using measured IO data of twelve subjects. The model splits MEP variability into two independent contributions, acting on both sides of a strong sigmoidal nonlinearity that represents neural recruitment. Traditional sigmoidal regression with a single variability source after the nonlinearity is used for comparison. Results: The distribution of MEP amplitudes varied across different stimulation strengths, violating statistical assumptions in traditional regression models. In contrast to the conventional regression model, the dual variability source model better described the IO characteristics, including phenomena such as changing distribution spread and skewness along the IO curve. Conclusions: MEP variability is best described by two sources that most likely separate variability in the initial excitation process from effects occurring later on. The new model enables more accurate and sensitive estimation of the IO curve characteristics, enhancing its power as a detection tool, and may apply to other brain stimulation modalities. Furthermore, it extracts new information from the IO data concerning neural variability, information that has previously been treated as noise. PMID:24794287
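The dual-source structure (one noise term before the sigmoidal recruitment nonlinearity, one after it) can be simulated in a few lines, reproducing the abstract's qualitative claim that distribution spread changes along the IO curve. The sigmoid slope and both noise scales below are assumptions for illustration, not the paper's fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mep(stim, n=5000):
    """Dual-source sketch: additive noise enters before the sigmoidal
    recruitment nonlinearity, multiplicative (log-normal) noise after it.
    Parameter values are illustrative only."""
    ex = rng.normal(scale=0.5, size=n)      # input-side variability
    ey = rng.lognormal(sigma=0.3, size=n)   # output-side variability
    return sigmoid(4.0 * (stim - 1.0) + ex) * ey

# Distribution shape varies with stimulation strength, as the paper reports:
low, mid = mep(0.8), mep(1.0)
print(low.std() < mid.std())                # spread grows toward the steep part
```

A single post-nonlinearity noise term cannot produce this stimulation-dependent change in spread and skewness, which is the abstract's argument for the two-source model.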
Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.
Yaghouby, Farid; Sunderam, Sridhar
2015-04-01
The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30 s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. some states are scored but not others; 2. samples of all states are scored but not for transitional epochs; and 3. two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models (specifically, Gaussian mixtures and hidden Markov models) are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Gavilan, C.; Grunwald, S.; Quiroz, R.; Zhu, L.
2015-12-01
The Andes represent the largest and highest mountain range in the tropics. Geological and climatic differentiation favored landscape and soil diversity, resulting in ecosystems adapted to very different climatic patterns. Although several studies support the fact that the Andes are a vast sink of soil organic carbon (SOC), only a few have quantified this variable in situ. Estimating the spatial distribution of SOC stocks in data-poor and/or poorly accessible areas, like the Andean region, is challenging due to the lack of recent soil data at high spatial resolution and the wide range of coexistent ecosystems. Thus, the sampling strategy is vital in order to ensure the whole range of environmental covariates (EC) controlling SOC dynamics is represented. This approach allows grasping the variability of the area, which leads to more efficient statistical estimates and improves the modeling process. The objectives of this study were to i) characterize and model the spatial distribution of SOC stocks in the Central Andean region using soil-landscape modeling techniques, and to ii) validate and evaluate the model for predicting SOC content in the area. For that purpose, three representative study areas were identified and a suite of variables including elevation, mean annual temperature, annual precipitation and Normalized Difference Vegetation Index (NDVI), among others, was selected as EC. A stratified random sampling (namely conditioned Latin Hypercube) was implemented and a total of 400 sampling locations were identified. At all sites, four composite topsoil samples (0-30 cm) were collected within a 2 m radius. SOC content was measured using dry combustion and SOC stocks were estimated using bulk density measurements. Regression Kriging was used to map the spatial variation of SOC stocks. The accuracy, fit and bias of the SOC models were assessed using a rigorous validation assessment.
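Regression kriging, the mapping method named above, combines a regression trend on covariates with kriging of the trend residuals. A compact sketch under strong simplifying assumptions (one covariate, a fixed exponential covariance with assumed sill and range rather than a fitted variogram, simulated data in place of the study's soils) is:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 80
coords = rng.random((n, 2)) * 10            # sample locations (km), assumed
elev = rng.random(n) * 100                  # one environmental covariate

def cov(h, sill=1.0, rng_par=3.0):
    """Exponential covariance; sill and range are assumed, not fitted."""
    return sill * np.exp(-h / rng_par)

# Simulated SOC: a linear trend in elevation plus spatially correlated
# residuals (values are purely illustrative).
D = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
L = np.linalg.cholesky(cov(D) + 1e-8 * np.eye(n))
soc = 5.0 + 0.05 * elev + L @ rng.normal(size=n)

# Regression kriging: (1) OLS trend, (2) simple kriging of the residuals.
A = np.column_stack([np.ones(n), elev])
beta, *_ = np.linalg.lstsq(A, soc, rcond=None)
resid = soc - A @ beta

target = np.array([5.0, 5.0])               # prediction location
c0 = cov(np.linalg.norm(coords - target, axis=1))
w = np.linalg.solve(cov(D) + 1e-8 * np.eye(n), c0)   # kriging weights
pred = beta[0] + beta[1] * 50.0 + w @ resid          # covariate value assumed 50
print(round(float(pred), 1))
```

In the study itself the trend uses the full EC suite and the residual covariance comes from a fitted variogram; the decomposition into trend plus kriged residual is the part this sketch preserves.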
This study produced the first comprehensive, geospatial SOC stock assessment in this undersampled region that serves as a baseline reference to assess potential impacts of climate and land use change.
NASA Astrophysics Data System (ADS)
Häme, Tuomas; Mutanen, Teemu; Rauste, Yrjö; Antropov, Oleg; Molinier, Matthieu; Quegan, Shaun; Kantzas, Euripides; Mäkelä, Annikki; Minunno, Francesco; Atli Benediktsson, Jon; Falco, Nicola; Arnason, Kolbeinn; Storvold, Rune; Haarpaintner, Jörg; Elsakov, Vladimir; Rasinmäki, Jussi
2015-04-01
The objective of project North State, funded by Framework Program 7 of the European Union, is to develop innovative data fusion methods that exploit the new generation of multi-source data from Sentinels and other satellites in an intelligent, self-learning framework. The remote sensing outputs are interfaced with state-of-the-art carbon and water flux models for monitoring the fluxes over boreal Europe to reduce current large uncertainties. This will provide a paradigm for the development of products for future Copernicus services. The models to be interfaced are a dynamic vegetation model and a light use efficiency model. We have identified four groups of variables that will be estimated with remotely sensed data: land cover variables, forest characteristics, vegetation activity, and hydrological variables. The estimates will be used as model inputs and to validate the model outputs. The earth observation variables are computed as automatically as possible, with the goal of fully automatic estimation. North State has two sites for intensive studies in southern and northern Finland, respectively, one in Iceland and one in the Komi Republic of Russia. Additionally, the model input variables will be estimated and models applied over the European boreal and sub-arctic region from the Ural Mountains to Iceland. The accuracy assessment of the earth observation variables will follow a statistical sampling design. Model output predictions are compared to earth observation variables. Flux tower measurements are also applied in the model assessment. In the paper, results of hyperspectral, Sentinel-1, and Landsat data and their use in the models are presented, as well as an example of a completely automatic land cover class prediction.
Unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance
Fiske, I.J.; Chandler, R.B.
2011-01-01
Ecological research uses data collection techniques that are prone to substantial and unique types of measurement error to address scientific questions about species abundance and distribution. These data collection schemes include a number of survey methods in which unmarked individuals are counted, or determined to be present, at spatially referenced sites. Examples include site occupancy sampling, repeated counts, distance sampling, removal sampling, and double observer sampling. To appropriately analyze these data, hierarchical models have been developed to separately model explanatory variables of both a latent abundance or occurrence process and a conditional detection process. Because these models have a straightforward interpretation paralleling mechanisms under which the data arose, they have recently gained immense popularity. The common hierarchical structure of these models is well-suited for a unified modeling interface. The R package unmarked provides such a unified modeling framework, including tools for data exploration, model fitting, model criticism, post-hoc analysis, and model comparison.
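The hierarchical structure these models share (a latent occurrence process plus a conditional detection process) is easiest to see in the single-season occupancy likelihood. A numpy sketch, using a crude grid search where unmarked uses proper likelihood optimization with covariates, is below; the true parameter values are chosen only to generate test data:

```python
import numpy as np

rng = np.random.default_rng(3)
psi_true, p_true, n_sites, J = 0.6, 0.4, 200, 5   # illustrative truth

z = rng.random(n_sites) < psi_true                # latent occupancy state
d = np.where(z, rng.binomial(J, p_true, n_sites), 0)  # detections in J visits

def nll(psi, p):
    """Negative log-likelihood of the site-occupancy model: a site with no
    detections is either occupied-but-missed or genuinely unoccupied."""
    occ = psi * p ** d * (1 - p) ** (J - d)
    never = (1 - psi) * (d == 0)
    return -np.sum(np.log(occ + never))

# Grid-search MLE over (psi, p).
grid = np.linspace(0.05, 0.95, 91)
_, psi_hat, p_hat = min((nll(a, b), a, b) for a in grid for b in grid)
print(round(psi_hat, 2), round(p_hat, 2))
```

The key feature is the mixture term for all-zero detection histories: without it, naive occupancy estimates are biased low whenever detection is imperfect, which is the measurement-error problem the abstract describes.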
The choice of product indicators in latent variable interaction models: post hoc analyses.
Foldnes, Njål; Hagtvet, Knut Arne
2014-09-01
The unconstrained product indicator (PI) approach is a simple and popular approach for modeling nonlinear effects among latent variables. This approach leaves the practitioner to choose the PIs to be included in the model, introducing arbitrariness into the modeling. In contrast to previous Monte Carlo studies, we evaluated the PI approach by 3 post hoc analyses applied to a real-world case adopted from a research effort in social psychology. The measurement design applied 3 and 4 indicators for the 2 latent 1st-order variables, leaving the researcher with a choice among more than 4,000 possible PI configurations. Sixty so-called matched-pair configurations that have been recommended in previous literature are of special interest. In the 1st post hoc analysis we estimated the interaction effect for all PI configurations, keeping the real-world sample fixed. The estimated interaction effect was substantially affected by the choice of PIs, also across matched-pair configurations. Subsequently, a post hoc Monte Carlo study was conducted, with varying sample sizes and data distributions. Convergence, bias, Type I error and power of the interaction test were investigated for each matched-pair configuration and the all-pairs configuration. Variation in estimates across matched-pair configurations for a typical sample was substantial. The choice of specific configuration significantly affected convergence and the interaction test's outcome. The all-pairs configuration performed overall better than the matched-pair configurations. A further advantage of the all-pairs over the matched-pairs approach is its unambiguity. The final study evaluates the all-pairs configuration for small sample sizes and compares it to the non-PI approach of latent moderated structural equations. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Thomas, Joanna; Shiels, Chris; Gabbay, Mark B
2014-01-01
To date, most condom research has focused on young or high-risk groups, with little evidence about influences on condom use amongst lower-risk community samples. These groups are not risk free and may still wish to negotiate safer sex; yet the considerations involved could be different from those in higher-risk groups. Our research addresses this gap: We report a cross-sectional questionnaire study enquiring about recent condom use and future use intentions in community settings. Our sample (n = 311) purposively included couples in established relationships, known to be condom users. Items included demographics, sexual history and social-cognitive variables taken from the theory of planned behaviour. The strongest association with condom use/use intentions amongst our respondents was sexual partner's perceived willingness to use them. This applied across both univariate and multivariate analyses. Whilst most social-cognitive variables (attitudes; self-efficacy and peer social norms) were significant in univariate analyses, this was not supported in multivariate regression. Of the social-cognitive variables, only "condom-related attitudes" were retained in the model explaining recent condom use, whilst none of them entered the model explaining future use intentions. Further analysis showed that attitudes concerning pleasure, identity stigma and condom effectiveness were most salient for this cohort. Our results suggest that, in community samples, the decision to use a condom involves different considerations from those highlighted in previous research. Explanatory models for established couples should embrace interpersonal perspectives, emphasising couple-factors rather than individual beliefs. Messages to this cohort could usefully focus on negotiation skills, condom advantages (other than disease prevention) and reducing the stigma associated with use.
Effect of land use on the spatial variability of organic matter and nutrient status in an Oxisol
NASA Astrophysics Data System (ADS)
Paz-Ferreiro, Jorge; Alves, Marlene Cristina; Vidal Vázquez, Eva
2013-04-01
Heterogeneity is now considered an inherent soil property. Spatial variability of soil attributes in natural landscapes results mainly from soil formation factors. In cultivated soils, much heterogeneity can additionally occur as a result of land use, agricultural systems and management practices. Organic matter content (OMC) and nutrients associated with the soil exchange complex are key attributes in the maintenance of a high quality soil. Neglecting spatial heterogeneity in soil OMC and nutrient status at the field scale might result in reduced yield and in environmental damage. We analyzed the impact of land use on the pattern of spatial variability of OMC and soil macronutrients at the stand scale. The study was conducted in São Paulo state, Brazil. Land uses were pasture, mango orchard and corn field. Soil samples were taken at 0-10 cm and 10-20 cm depth at 84 points within 100 m x 100 m plots. Texture, pH, OMC, cation exchange capacity (CEC), exchangeable cations (Ca, Mg, K, H, Al) and resin extractable phosphorus were analyzed. Statistical variability was found to be higher in parameters defining the soil nutrient status (resin extractable P, K, Ca and Mg) than in general soil properties (OMC, CEC, base saturation and pH). Geostatistical analysis showed contrasting patterns of spatial dependence for the different soil uses, sampling depths and studied properties. Most of the data sets collected at the two depths exhibited spatial dependence at the sampled scale, and their semivariograms were modeled by a nugget effect plus a structure. The pattern of soil spatial variability was found to differ between the three studied soil uses and the two sampling depths, as far as model type, nugget effect or ranges of spatial dependence were concerned. Both statistical and geostatistical results pointed out the importance of OMC as a driver responsible for the spatial variability of soil nutrient status.
Simulation Study Using a New Type of Sample Variance
NASA Technical Reports Server (NTRS)
Howe, D. A.; Lainson, K. J.
1996-01-01
We evaluate with simulated data a new type of sample variance for the characterization of frequency stability. The new statistic (referred to as TOTALVAR and its square root TOTALDEV) is a better predictor of long-term frequency variations than the present sample Allan deviation. The statistical model uses the assumption that a time series of phase or frequency differences is wrapped (periodic) with overall frequency difference removed. We find that the variability at long averaging times is reduced considerably for the five models of power-law noise commonly encountered with frequency standards and oscillators.
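The comparison in this abstract rests on the classical Allan deviation plus a "total" variant that treats the series as extended beyond its endpoints. The sketch below implements a non-overlapping Allan deviation and, as a hedged approximation, a reflected-extension variant that follows the wrapped-data intuition of the abstract rather than the exact TOTALVAR definition:

```python
import numpy as np

def adev(y, m):
    """Allan deviation of fractional-frequency data y at averaging factor m
    (simple non-overlapping estimator)."""
    k = len(y) // m
    avg = y[:k * m].reshape(k, m).mean(axis=1)   # m-sample averages
    return np.sqrt(0.5 * np.mean(np.diff(avg) ** 2))

def total_adev(y, m):
    """Sketch of the 'total' idea: extend the series by reflection at both
    ends so long-tau estimates use more first differences. This is an
    approximation of the concept, not the published TOTALVAR formula."""
    ext = np.concatenate([y[m - 1::-1], y, y[:-m - 1:-1]])
    return adev(ext, m)

rng = np.random.default_rng(4)
y = rng.normal(size=512)                         # white frequency noise
print(round(float(adev(y, 1)), 2), round(float(total_adev(y, 8)), 2))
```

For white frequency noise the Allan deviation at m = 1 should sit near the noise standard deviation; the practical benefit claimed in the abstract, reduced estimator variability at long averaging times, comes from the extra differences the extension makes available.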
Examining a Model of Life Satisfaction among Unemployed Adults
ERIC Educational Resources Information Center
Duffy, Ryan D.; Bott, Elizabeth M.; Allan, Blake A.; Torrey, Carrie L.
2013-01-01
The present study examined a model of life satisfaction among a diverse sample of 184 adults who had been unemployed for an average of 10.60 months. Using the Lent (2004) model of life satisfaction as a framework, a model was tested with 5 hypothesized predictor variables: optimism, job search self-efficacy, job search support, job search…
NASA Astrophysics Data System (ADS)
Gaughan, D. J.; Pearce, A. F.; Lewis, P. D.
2009-08-01
A transect that extended 40 km offshore across the continental shelf off Perth, Western Australia, was sampled monthly during 1997 and 1998. Zooplankton was sampled at 5 km intervals with a 300 micron-mesh bongo net deployed vertically to within 3 m of the bottom, or to a maximum depth of 70 m. Numbers of species of chaetognaths and siphonophores were quantified, as were abundances of the common species from these groups and of the hydromedusa Aglaura hemistoma. The potential influences of four environmental variables (sea-level, sea surface temperature, salinity and chlorophyll concentration) on variability in diversity and abundance were assessed using generalized additive modeling. A combination of factors was found to influence the seasonal and spatial biological variability and, of these factors, non-linear relationships always contributed to the best fitting models. In all but one case, each of the environmental variables was included in the final model. The seasonally variable Leeuwin Current, whose strength is measured as variations in local sea-level, is the dominant mesoscale oceanographic feature in the study region but was not found to have an overriding influence on the shelf zooplankton. This contrasts with a previous hypothesis that subjectively attributed seasonal variability of the same taxa examined in this study to seasonal variations in the Leeuwin Current. There remains a poor understanding of shelf zooplankton off Western Australia and, in particular, of the processes that influence seasonal and spatial variability. A more complete understanding of potential causative influences of the Leeuwin Current on the shelf plankton community of south-western Australia must be cognizant of a range of biophysical factors operating at both the broader mesoscale and at smaller scales within the shelf pelagic ecosystem.
Statistical modelling of formaldehyde occupational exposure levels in French industries, 1986-2003.
Lavoué, Jérôme; Vincent, Raymond; Gérin, Michel
2006-04-01
Occupational exposure databanks (OEDBs) have been cited as sources of exposure data for exposure surveillance and exposure assessment in epidemiology. In 2003, an extract was made from COLCHIC, the French national OEDB, of all concentrations of formaldehyde. The data were analysed with extended linear mixed-effects models in order to identify influential variables and develop a multi-sector picture of formaldehyde exposures. Respectively, 1401 and 1448 personal and area concentrations were available for the analysis. The fixed effects of the personal and area models explained, respectively, 57 and 53% of the total variance. Personal concentrations were related to the sampling duration (short-term higher than TWA levels), decreased with the year of sampling (-9% per year) and were higher when local exhaust ventilation was present. Personal levels taken during planned visits and for occupational illness notification purposes were consistently lower than those taken during ventilation modification programmes or because the hygienist suspected the presence of significant risk or exposure. Area concentrations were related to the sampling duration (short-term higher than TWA levels), decreased with the year of sampling (-7% per year) and decreased when the measurement sampling flow increased. Significant within-facility (correlation coefficient 0.4-0.5) and within-sampling-campaign correlation (correlation coefficient 0.8) was found for both area and personal data. The industry/task classification appeared to have the greatest influence on exposure variability, while the sample duration and the sampling flow were significant in some cases. Estimates made from the models for year 2002 showed elevated formaldehyde exposure in the fields of anatomopathological and biological analyses, operation of gluing machinery in the wood industry, operation and monitoring of mixers in the pharmaceutical industry, and garages and warehouses in urban transit authorities.
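The within-facility correlation reported above is an intraclass correlation: the share of total variance contributed by the facility-level random effect in the mixed model. A self-contained sketch using the balanced one-way method-of-moments estimator (a simplification of the paper's extended mixed-effects models; all variances below are simulated, not COLCHIC values) is:

```python
import numpy as np

rng = np.random.default_rng(5)
n_fac, n_per = 40, 10                        # facilities x samples, assumed
fac_effect = rng.normal(scale=1.0, size=n_fac)        # between-facility sd = 1
data = fac_effect[:, None] + rng.normal(scale=1.0, size=(n_fac, n_per))

# Method-of-moments variance components from a balanced one-way layout.
grand = data.mean()
msb = n_per * np.sum((data.mean(axis=1) - grand) ** 2) / (n_fac - 1)
msw = np.sum((data - data.mean(axis=1, keepdims=True)) ** 2) / (n_fac * (n_per - 1))
var_between = (msb - msw) / n_per
icc = var_between / (var_between + msw)      # within-facility correlation
print(round(float(icc), 2))
```

With equal between- and within-facility variance the true ICC is 0.5, in the same range as the 0.4-0.5 within-facility correlation the abstract reports; ignoring this correlation would understate the uncertainty of any multi-sector exposure estimate.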
Monitoring benthic algal communities: A comparison of targeted and coefficient sampling methods
Edwards, Matthew S.; Tinker, M. Tim
2009-01-01
Choosing an appropriate sample unit is a fundamental decision in the design of ecological studies. While numerous methods have been developed to estimate organism abundance, they differ in cost, accuracy and precision. Using both field data and computer simulation modeling, we evaluated the costs and benefits associated with two methods commonly used to sample benthic organisms in temperate kelp forests. One of these methods, the Targeted Sampling method, relies on different sample units, each "targeted" for a specific species or group of species, while the other method relies on coefficients that represent ranges of bottom cover obtained from visual estimates within standardized sample units. Both the field data and the computer simulations suggest that the two methods yield remarkably similar estimates of organism abundance and among-site variability, although the Coefficient method slightly underestimates variability among sample units when abundances are low. In contrast, the two methods differ considerably in the effort needed to sample these communities; the Targeted Sampling requires more time and twice the personnel to complete. We conclude that the Coefficient Sampling method may be better for environmental monitoring programs where changes in mean abundance are of central concern and resources are limiting, but that the Targeted Sampling method may be better for ecological studies where quantitative relationships among species and small-scale variability in abundance are of central concern.
On representation of temporal variability in electricity capacity planning models
Merrick, James H.
2016-08-23
This study systematically investigates how to represent intra-annual temporal variability in models of optimum electricity capacity investment. Inappropriate aggregation of temporal resolution can introduce substantial error into model outputs and associated economic insight. The mechanisms underlying the introduction of this error are shown. How many representative periods are needed to fully capture the variability is then investigated. For a sample dataset, a scenario-robust aggregation of hourly (8760) resolution is possible in the order of 10 representative hours when electricity demand is the only source of variability. The inclusion of wind and solar supply variability increases the resolution of the robust aggregation to the order of 1000. A similar scale of expansion is shown for representative days and weeks. These concepts can be applied to any such temporal dataset, providing, at the least, a benchmark that any other aggregation method can aim to emulate. Finally, how prior information about peak pricing hours can potentially reduce resolution further is also discussed.
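A common way to build the representative hours this abstract discusses is to cluster the 8760 hourly observations and weight each cluster centre by its share of the year. The sketch below uses plain k-means on stylized demand, wind and solar series (the abstract does not specify its aggregation method, so this is one standard technique, with invented data):

```python
import numpy as np

rng = np.random.default_rng(6)
hours = 8760
t = np.arange(hours)
# Stylized hourly features: demand plus wind and solar capacity factors.
demand = 1.0 + 0.3 * np.sin(2 * np.pi * t / 24) + 0.05 * rng.normal(size=hours)
wind = np.clip(0.4 + 0.3 * rng.normal(size=hours), 0, 1)
solar = np.clip(np.sin(2 * np.pi * ((t % 24) - 6) / 24), 0, None)
X = np.column_stack([demand, wind, solar])

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm; each centroid becomes one representative
    hour for the capacity planning model."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(axis=0)
    return C, lab

C, lab = kmeans(X, 12)
weights = np.bincount(lab, minlength=12) / hours   # duration of each rep. hour
print(weights.sum())
```

The study's finding maps directly onto this setup: with demand alone, around 10 such centroids can reproduce the full-resolution planning solution, but adding wind and solar columns pushes the required k toward 1000 because joint demand-wind-solar combinations must be preserved.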
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...
2016-12-01
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
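A drastically simplified structural analogue of this greedy selection is learning a maximum-|correlation| spanning tree over the binary variables: a tree is trivially planar, whereas the paper's algorithm searches the much richer family of planar graphs with planarity testing at each step. The data below are an illustrative binary chain, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 500, 6
# Binary samples with chain dependence: each variable copies its left
# neighbour with probability 0.9 (illustrative data only).
Xb = np.empty((n, p), dtype=int)
Xb[:, 0] = rng.integers(0, 2, n)
for j in range(1, p):
    flip = rng.random(n) < 0.1
    Xb[:, j] = np.where(flip, 1 - Xb[:, j - 1], Xb[:, j - 1])

corr = np.corrcoef(Xb.T)

# Kruskal-style greedy selection of a maximum-|correlation| spanning tree.
edges = sorted(((abs(corr[i, j]), i, j)
                for i in range(p) for j in range(i + 1, p)), reverse=True)
parent = list(range(p))
def find(a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]     # path halving
        a = parent[a]
    return a
tree = []
for w, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:                           # keep edge only if it joins components
        parent[ri] = rj
        tree.append((i, j))
print(sorted(tree))
```

The greedy-by-edge-weight pattern is the shared idea; the paper replaces the acyclicity check with a planarity check and scores candidate edges by the exactly computable planar Ising likelihood rather than raw correlation.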
Lesmerises, Rémi; St-Laurent, Martin-Hugues
2017-11-01
Habitat selection studies conducted at the population scale commonly aim to describe general patterns that could improve our understanding of the limiting factors in species-habitat relationships. Researchers often consider interindividual variation in selection patterns to control for its effects and avoid pseudoreplication by using mixed-effect models that include individuals as random factors. Here, we highlight common pitfalls and possible misinterpretations of this strategy by describing habitat selection of 21 black bears Ursus americanus. We used Bayesian mixed-effect models and compared results obtained when using random intercept (i.e., population level) versus calculating individual coefficients for each independent variable (i.e., individual level). We then related interindividual variability to individual characteristics (i.e., age, sex, reproductive status, body condition) in a multivariate analysis. The assumption of comparable behavior among individuals was verified only in 40% of the cases in our seasonal best models. Indeed, we found strong and opposite responses among sampled bears and individual coefficients were linked to individual characteristics. For some covariates, contrasted responses canceled each other out at the population level. In other cases, interindividual variability was concealed by the composition of our sample, with the majority of the bears (e.g., old individuals and bears in good physical condition) driving the population response (e.g., selection of young forest cuts). Our results stress the need to consider interindividual variability to avoid misinterpretation and uninformative results, especially for a flexible and opportunistic species. This study helps to identify some ecological drivers of interindividual variability in bear habitat selection patterns.
NASA Astrophysics Data System (ADS)
Gao, Yuan; Ma, Jiayi; Yuille, Alan L.
2017-05-01
This paper addresses the problem of face recognition when there are only a few, or even a single, labeled examples of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive nuisance variables such as bad lighting, wearing of glasses) and non-linear (i.e., non-additive pixel-wise nuisance variables such as expression changes). The small number of labeled examples means that it is hard to remove these nuisance variables between the training and testing faces to obtain good recognition performance. To address the problem we propose a method called Semi-Supervised Sparse Representation based Classification (S$^3$RC). This is based on recent work on sparsity where faces are represented in terms of two dictionaries: a gallery dictionary consisting of one or more examples of each person, and a variation dictionary representing linear nuisance variables (e.g., different lighting conditions, different glasses). The main idea is that (i) we use the variation dictionary to characterize the linear nuisance variables via the sparsity framework, then (ii) prototype face images are estimated as a gallery dictionary via a Gaussian Mixture Model (GMM), with mixed labeled and unlabeled samples in a semi-supervised manner, to deal with the non-linear nuisance variations between labeled and unlabeled samples. We have done experiments with insufficient labeled samples, even when there is only a single labeled sample per person. Our results on the AR, Multi-PIE, CAS-PEAL, and LFW databases demonstrate that the proposed method is able to deliver significantly improved performance over existing methods.
Investigating Organizational Alienation Behavior in Terms of Some Variables
ERIC Educational Resources Information Center
Dagli, Abidin; Averbek, Emel
2017-01-01
The aim of this study is to detect the perceptions of public primary school teachers regarding organizational alienation behaviors in terms of some variables (gender, marital status and seniority). Survey model was used in this study. The research sample consists of randomly selected 346 teachers from 40 schools in the central district of Mardin,…
ERIC Educational Resources Information Center
Ercan, Hülya
2017-01-01
The aim of this study is to investigate the blind and constructive patriotism tendencies of university students in light of the demographic structure and variables. The investigation is performed by using the correlational descriptive model. The purposeful sampling technique has been used and data was collected from a total of 390 university…
ERIC Educational Resources Information Center
Blau, Gary
2007-01-01
This study proposed and tested corresponding sets of variables for explaining voluntary organizational versus occupational turnover for a sample of medical technologists. This study is believed to be the first test of the Rhodes and Doering (1983) occupational change model using occupational turnover data. Results showed that corresponding job…
The Variables Affecting the Success of Students
ERIC Educational Resources Information Center
Savas, Behsat; Gurel, Ramazan
2014-01-01
The aim of this study is to determine the variables affecting the success of students. This research, which was conducted through the relational screening model, has a sampling of students who were selected from a middle city in Turkey. The schools are classified into three as low, medium and high. A total of 3491 students are selected by using…
ERIC Educational Resources Information Center
Sanderson, Bettie; Kurdek, Lawrence A.
1993-01-01
Examined relationship satisfaction and commitment among African-American (n=34 couples) and white (n=61 couples) dating heterosexual couples. Found that extent to which variables from interdependence, individual differences, and problem-solving models were linked to both relationship satisfaction and relationship commitment did not differ for…
ERIC Educational Resources Information Center
Kadi, Sinem; Eldeniz Cetin, Muzeyyen
2018-01-01
This study investigated the resilience levels of parents with children with multiple disabilities by utilizing different variables. The study, conducted with survey model--a qualitative method--included a sample composed of a total of 222 voluntary parents (183 females, 39 males) residing in Bolu, Duzce and Zonguldak in Turkey. Parental…
Modeling the development of written language
Puranik, Cynthia S.; Foorman, Barbara; Foster, Elizabeth; Wilson, Laura Gehron; Tschinkel, Erika; Kantor, Patricia Thatcher
2011-01-01
Alternative models of the structure of individual and developmental differences of written composition and handwriting fluency were tested using confirmatory factor analysis of writing samples provided by first- and fourth-grade students. For both groups, a five-factor model provided the best fit to the data. Four of the factors represented aspects of written composition: macro-organization (use of top sentence and number and ordering of ideas), productivity (number and diversity of words used), complexity (mean length of T-unit and syntactic density), and spelling and punctuation. The fifth factor represented handwriting fluency. Handwriting fluency was correlated with written composition factors at both grades. The magnitude of developmental differences between first grade and fourth grade expressed as effect sizes varied for variables representing the five constructs: large effect sizes were found for productivity and handwriting fluency variables; moderate effect sizes were found for complexity and macro-organization variables; and minimal effect sizes were found for spelling and punctuation variables. PMID:22228924
NASA Astrophysics Data System (ADS)
Yasa, I. B. A.; Parnata, I. K.; Susilawati, N. L. N. A. S.
2018-01-01
This study aims to apply an analytical review model to analyze the influence of GCG, accounting conservatism, financial distress models and company size on good and poor financial performance of LPDs in Bangli Regency. Ordinal regression analysis is used to perform the analytical review, so that the influence of and relationships between variables to be considered for further audit are obtained. Respondents in this study were the LPDs in Bangli Regency, which number 159 in total; of these, 100 LPDs were randomly selected as the sample. The test results found that GCG and company size have a significant effect on both good and poor financial performance, while accounting conservatism and the financial distress model have no significant effect. The four variables together explain 58.8% of overall financial performance, while the remaining 41.2% is influenced by other variables. Size, the financial distress model and accounting conservatism are the variables recommended for further audit.
Random vectors and spatial analysis by geostatistics for geotechnical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, D.S.
1987-08-01
Geostatistics is extended to the spatial analysis of vector variables by defining the estimation variance and vector variogram in terms of the magnitude of difference vectors. Many random variables in geotechnology are vectorial rather than scalar, and their structural analysis requires sample variable interpolations to construct and characterize structural models. A better local estimator results in higher-quality input models, and geostatistics can provide such estimators: kriging estimators. The efficiency of geostatistics for vector variables is demonstrated in a case study of rock joint orientations in geological formations. The positive cross-validation encourages application of geostatistics to spatial analysis of random vectors in geoscience as well as various geotechnical fields, including optimum site characterization, rock mechanics for mining and civil structures, cavability analysis of block cavings, petroleum engineering, and hydrologic and hydraulic modeling.
Higher-Order Factors of Personality: Do They Exist?
Ashton, Michael C.; Lee, Kibeom; Goldberg, Lewis R.; de Vries, Reinout E.
2010-01-01
Scales that measure the Big Five personality factors are often substantially intercorrelated. These correlations are sometimes interpreted as implying the existence of two higher-order factors of personality. We show that correlations between measures of broad personality factors do not necessarily imply the existence of higher-order factors, and might instead be due to variables that represent same-signed blends of orthogonal factors. Therefore, the hypotheses of higher-order factors and blended variables can only be tested with data on lower-level personality variables that define the personality factors. We compared the higher-order factor model and the blended variable model in three participant samples using the Big Five Aspect Scales, and found better fit for the latter model. In other analyses using the HEXACO Personality Inventory, we identified mutually uncorrelated markers of six personality factors. We conclude that correlations between personality factor scales can be explained without postulating any higher-order dimensions of personality. PMID:19458345
An uncertainty analysis of the flood-stage upstream from a bridge.
Sowiński, M
2006-01-01
The paper begins with the formulation of the problem in the form of a general performance function. Next the Latin hypercube sampling (LHS) technique--a modified version of the Monte Carlo method--is briefly described. The essential uncertainty analysis of the flood-stage upstream from a bridge starts with a description of the hydraulic model. This model concept is based on the HEC-RAS model developed for subcritical flow under a bridge without piers in which the energy equation is applied. The next section contains the characteristics of the basic variables, including a specification of their statistics (means and variances). Next the problem of correlated variables is discussed and assumptions concerning correlation among basic variables are formulated. The analysis of results is based on LHS ranking lists obtained from the computer package UNCSAM. Results for two examples are given: one for independent and the other for correlated variables.
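The stratification property that distinguishes LHS from plain Monte Carlo can be shown in a few lines. This is a generic sketch (not the UNCSAM package): `scipy.stats.qmc.LatinHypercube` draws a design in which each of the n equal-probability strata of every dimension contains exactly one point, and the uniform design is then mapped through the inverse CDFs of the basic variables.

```python
import numpy as np
from scipy.stats import norm, qmc

# Latin hypercube sample: n points in d dimensions, one point per stratum
# in each dimension (the property that reduces variance vs. crude MC).
n, d = 10, 2
sampler = qmc.LatinHypercube(d=d, seed=42)
u = sampler.random(n)                      # uniform [0,1)^d design

# Map to the basic-variable distributions, e.g. normal means/std. devs.
# (illustrative values, not the paper's flood-stage statistics)
x = norm.ppf(u, loc=[5.0, 2.0], scale=[1.0, 0.5])

# Verify the stratification property: strata indices 0..n-1 each occur once
strata = np.floor(u * n).astype(int)
print(sorted(strata[:, 0]), sorted(strata[:, 1]))
```

Correlated variables, as discussed in the paper, would additionally require inducing a target rank correlation on the design (e.g., the Iman-Conover method) before the inverse-CDF mapping.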
Sauvé, Jean-François; Beaudry, Charles; Bégin, Denis; Dion, Chantal; Gérin, Michel; Lavoué, Jérôme
2012-09-01
A quantitative determinants-of-exposure analysis of respirable crystalline silica (RCS) levels in the construction industry was performed using a database compiled from an extensive literature review. Statistical models were developed to predict work-shift exposure levels by trade. Monte Carlo simulation was used to recreate exposures derived from summarized measurements which were combined with single measurements for analysis. Modeling was performed using Tobit models within a multimodel inference framework, with year, sampling duration, type of environment, project purpose, project type, sampling strategy and use of exposure controls as potential predictors. 1346 RCS measurements were included in the analysis, of which 318 were non-detects and 228 were simulated from summary statistics. The model containing all the variables explained 22% of total variability. Apart from trade, sampling duration, year and strategy were the most influential predictors of RCS levels. The use of exposure controls was associated with an average decrease of 19% in exposure levels compared to none, and increased concentrations were found for industrial, demolition and renovation projects. Predicted geometric means for year 1999 were the highest for drilling rig operators (0.238 mg m(-3)) and tunnel construction workers (0.224 mg m(-3)), while the estimated exceedance fraction of the ACGIH TLV by trade ranged from 47% to 91%. The predicted geometric means in this study indicated important overexposure compared to the TLV. However, the low proportion of variability explained by the models suggests that the construction trade is only a moderate predictor of work-shift exposure levels. The impact of the different tasks performed during a work shift should also be assessed to provide better management and control of RCS exposure levels on construction sites.
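The Tobit likelihood used for the censored (non-detect) measurements can be sketched generically. This is a simplified, hypothetical example (synthetic data, one predictor, left-censoring at a fixed limit, no multimodel inference): censored observations contribute the normal CDF below the limit, detected observations contribute the density.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical left-censored (Tobit) regression sketch, as used for exposure
# data with non-detects: values below the detection limit are only known to
# lie below it, so they contribute Pr(y* <= limit) to the likelihood.
n, limit = 500, 0.0
x = rng.normal(size=n)
y_star = 0.5 + 1.0 * x + rng.normal(size=n)   # latent (e.g. log) exposure
y = np.maximum(y_star, limit)                 # observed, floored at the limit
cens = y_star <= limit                        # non-detect indicator

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    mu = b0 + b1 * x
    ll = np.where(cens,
                  norm.logcdf((limit - mu) / s),           # censored part
                  norm.logpdf((y - mu) / s) - np.log(s))   # detected part
    return -ll.sum()

fit = minimize(neg_loglik, x0=[0.0, 0.0, 0.0],
               method="Nelder-Mead", options={"maxiter": 2000})
b0_hat, b1_hat = fit.x[0], fit.x[1]
print(round(b0_hat, 2), round(b1_hat, 2))
```

Despite roughly a third of the observations being censored, the maximum likelihood fit recovers the slope and intercept, which is why Tobit models are preferred over substitution methods (e.g., limit/2) for non-detects.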
Wang, Xiao; Gu, Jinghua; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua
2017-01-15
The advent of high-throughput DNA methylation profiling techniques has enabled the possibility of accurate identification of differentially methylated genes for cancer research. The large number of measured loci facilitates whole genome methylation study, yet posing great challenges for differential methylation detection due to the high variability in tumor samples. We have developed a novel probabilistic approach, Differential Methylation detection using a hierarchical Bayesian model exploiting Local Dependency (DM-BLD), to detect differentially methylated genes based on a Bayesian framework. The DM-BLD approach features a joint model to capture both the local dependency of measured loci and the dependency of methylation change in samples. Specifically, the local dependency is modeled by a Leroux conditional autoregressive structure; the dependency of methylation changes is modeled by a discrete Markov random field. A hierarchical Bayesian model is developed to fully take into account the local dependency for differential analysis, in which differential states are embedded as hidden variables. Simulation studies demonstrate that DM-BLD outperforms existing methods for differential methylation detection, particularly when the methylation change is moderate and the variability of methylation in samples is high. DM-BLD has been applied to breast cancer data to identify important methylated genes (such as polycomb target genes and genes involved in transcription factor activity) associated with breast cancer recurrence. A Matlab package of DM-BLD is available at http://www.cbil.ece.vt.edu/software.htm Contact: Xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Lu, Hang
2015-01-01
This study attempted to examine what factors might motivate Chinese international students, the fastest growing ethnic student group in the United States, to seek and process information about potential health risks from eating American-style food. This goal was accomplished by applying the Risk Information Seeking and Processing (RISP) model to this study. An online 2 (severity: high vs. low) × 2 (coping strategies: present vs. absent) between-subjects experiment was conducted via Qualtrics to evaluate the effects of the manipulated variables on the dependent variables of interest as well as various relationships proposed in the RISP model. A convenience sample of 635 participants was recruited online. Data were analyzed primarily using structural equation modeling (SEM) in AMOS 21.0 with maximum likelihood estimation. The final conceptual model has a good model fit to the data given the sample size. The results showed that although the experimentally manipulated variables failed to cause any significant differences in individuals' perceived severity and self-efficacy, this study largely supported the RISP model's propositions about the sociopsychological factors that explain individual variations in information seeking and processing. More specifically, the findings indicated a prominent role of informational subjective norms and affective responses (both negative and positive emotions) in predicting individuals' information seeking and processing. Future implications and limitations are also discussed.
González, Mari Feli; Facal, David; Juncos-Rabadán, Onésimo; Yanguas, Javier
2017-10-01
Cognitive performance is not easily predicted, since different variables play an important role in the manifestation of age-related declines. The objective of this study is to analyze the predictors of cognitive performance in a Spanish sample of adults over 50 years of age from a multidimensional perspective, including socioeconomic, affective, and physical variables. Some of them are well-known predictors of cognition and others are emergent variables in the study of cognition. The total sample, drawn from the "Longitudinal Study Aging in Spain (ELES)" project, consisted of 832 individuals without signs of cognitive impairment. Cognitive function was measured with tests evaluating episodic and working memory, visuomotor speed, fluency, and naming. Thirteen independent variables were selected as predictors belonging to socioeconomic, emotional, and physical execution areas. Multiple linear regressions, following the enter method, were calculated for each age group in order to study the influence of these variables on cognitive performance. Education is the variable which best predicts cognitive performance in the 50-59, 60-69, and 70-79 years old groups. In the 80+ group, the best predictor is objective economic status and education does not enter in the model. Age-related decline can be modified by the influence of educational and socioeconomic variables. In this context, it is relevant to take into account how easy it is to modify certain variables, compared to others which depend on each person's life course.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.
Molecular dynamics (MD) simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules but are limited by the timescale barrier, i.e., we may be unable to efficiently obtain properties because we need to run microseconds or longer simulations using femtosecond time steps. While there are several existing methods to overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain in regard to being able to sample unknown systems, deal with high-dimensional spaces of collective variables, and focus the computational effort on slow timescales. Hence, a new sampling method, called the "Concurrent Adaptive Sampling (CAS) algorithm," has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and uses macrostates (a partition of the collective variable space) to enhance the sampling. The exploration is done by running a large number of short simulations, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and bio-molecules, such as penta-alanine and triazine polymer.
A motivational model for environmentally responsible behavior.
Tabernero, Carmen; Hernández, Bernardo
2012-07-01
This paper presents a study examining whether self-efficacy and intrinsic motivation are related to environmentally responsible behavior (ERB). The study analysed past environmental behavior, self-regulatory mechanisms (self-efficacy, satisfaction, goals), and intrinsic and extrinsic motivation in relation to ERBs in a sample of 156 university students. Results show that all the motivational variables studied are linked to ERB. The effects of self-efficacy on ERB are mediated by the intrinsic motivation responses of the participants. A theoretical model was created by means of path analysis, revealing the power of motivational variables to predict ERB. Structural equation modeling was used to test and fit the research model. The role of motivational variables is discussed with a view to creating adequate learning contexts and experiences to generate interest and new sensations in which self-efficacy and affective reactions play an important role.
Ryberg, Karen R.; Vecchia, Aldo V.
2013-01-01
The seawaveQ R package fits a parametric regression model (seawaveQ) to pesticide concentration data from streamwater samples to assess variability and trends. The model incorporates the strong seasonality and high degree of censoring common in pesticide data and users can incorporate numerous ancillary variables, such as streamflow anomalies. The model is fitted to pesticide data using maximum likelihood methods for censored data and is robust in terms of pesticide, stream location, and degree of censoring of the concentration data. This R package standardizes this methodology for trend analysis, documents the code, and provides help and tutorial information, as well as providing additional utility functions for plotting pesticide and other chemical concentration data.
Yeung, Wing-Fai; Chung, Ka-Fai; Zhang, Nevin Lian-Wen; Zhang, Shi Ping; Yung, Kam-Ping; Chen, Pei-Xian; Ho, Yan-Yee
2016-01-01
Chinese medicine (CM) syndrome (zheng) differentiation is based on the co-occurrence of CM manifestation profiles, such as signs and symptoms, and pulse and tongue features. Insomnia is a symptom that frequently occurs in major depressive disorder despite adequate antidepressant treatment. This study aims to identify co-occurrence patterns in participants with persistent insomnia and major depressive disorder from clinical feature data using latent tree analysis, and to compare the latent variables with relevant CM syndromes. One hundred and forty-two participants with persistent insomnia and a history of major depressive disorder completed a standardized checklist (the Chinese Medicine Insomnia Symptom Checklist) specially developed for CM syndrome classification of insomnia. The checklist covers symptoms and signs, including tongue and pulse features. The clinical features assessed by the checklist were analyzed using Lantern software. CM practitioners with relevant experience compared the clinical feature variables under each latent variable with reference to relevant CM syndromes, based on a previous review of CM syndromes. The symptom data were analyzed to build the latent tree model and the model with the highest Bayes information criterion score was regarded as the best model. This model contained 18 latent variables, each of which divided participants into two clusters. Six clusters represented more than 50 % of the sample. The clinical feature co-occurrence patterns of these six clusters were interpreted as the CM syndromes Liver qi stagnation transforming into fire, Liver fire flaming upward, Stomach disharmony, Hyperactivity of fire due to yin deficiency, Heart-kidney noninteraction, and Qi deficiency of the heart and gallbladder. The clinical feature variables that contributed significant cumulative information coverage (at least 95 %) were identified. 
Latent tree model analysis on a sample of depressed participants with insomnia revealed 13 clinical feature co-occurrence patterns, four mutual-exclusion patterns, and one pattern with a single clinical feature variable.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Salloum, Maher N.; Sargsyan, Khachik; Jones, Reese E.
2015-08-11
We present a methodology to assess the predictive fidelity of multiscale simulations by incorporating uncertainty in the information exchanged between the components of an atomistic-to-continuum simulation. We account for both the uncertainty due to finite sampling in molecular dynamics (MD) simulations and the uncertainty in the physical parameters of the model. Using Bayesian inference, we represent the expensive atomistic component by a surrogate model that relates the long-term output of the atomistic simulation to its uncertain inputs. We then present algorithms to solve for the variables exchanged across the atomistic-continuum interface in terms of polynomial chaos expansions (PCEs). We also consider a simple Couette flow where velocities are exchanged between the atomistic and continuum components, while accounting for uncertainty in the atomistic model parameters and the continuum boundary conditions. Results show convergence of the coupling algorithm at a reasonable number of iterations. As a result, the uncertainty in the obtained variables significantly depends on the amount of data sampled from the MD simulations and on the width of the time averaging window used in the MD simulations.
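A polynomial chaos expansion represents an uncertain output as a series in orthogonal polynomials of a standard random input. The following is a minimal one-dimensional sketch (an assumed setup, not the paper's coupled atomistic-continuum solver): the output f(xi) = xi^2 of a standard normal input xi is projected onto probabilists' Hermite polynomials He_n via Gauss-Hermite quadrature. Exactly, xi^2 = He_0(xi) + He_2(xi), so the coefficients should be c0 = 1, c2 = 1 and all others zero.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

# Minimal 1-D PCE sketch: project f(xi) = xi**2, xi ~ N(0,1), onto the
# probabilists' Hermite basis He_n (orthogonal w.r.t. weight exp(-x^2/2)).
f = lambda xi: xi**2
order, nquad = 4, 8
nodes, weights = He.hermegauss(nquad)          # Gauss-Hermite_e rule

coeffs = []
for n in range(order + 1):
    basis = He.hermeval(nodes, [0] * n + [1])  # He_n evaluated at the nodes
    norm_n = math.sqrt(2 * math.pi) * math.factorial(n)  # <He_n, He_n>
    coeffs.append(weights @ (f(nodes) * basis) / norm_n)
coeffs = np.array(coeffs)

# c0 is the mean of the output under the input distribution: E[xi^2] = 1
print(coeffs.round(6))
```

In the paper's setting the "output" would be an interface quantity such as a velocity from the MD component, and the PCE coefficients propagate its uncertainty into the continuum solve.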
Generalized correlation integral vectors: A distance concept for chaotic dynamical systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haario, Heikki, E-mail: heikki.haario@lut.fi; Kalachev, Leonid, E-mail: KalachevL@mso.umt.edu; Hakkarainen, Janne
2015-06-15
Several concepts of fractal dimension have been developed to characterise properties of attractors of chaotic dynamical systems. Numerical approximations of them must be calculated by finite samples of simulated trajectories. In principle, the quantities should not depend on the choice of the trajectory, as long as it provides properly distributed samples of the underlying attractor. In practice, however, the trajectories are sensitive with respect to varying initial values, small changes of the model parameters, to the choice of a solver, numeric tolerances, etc. The purpose of this paper is to present a statistically sound approach to quantify this variability. We modify the concept of correlation integral to produce a vector that summarises the variability at all selected scales. The distribution of this stochastic vector can be estimated, and it provides a statistical distance concept between trajectories. Here, we demonstrate the use of the distance for the purpose of estimating model parameters of a chaotic dynamic model. The methodology is illustrated using computational examples for the Lorenz 63 and Lorenz 95 systems, together with a framework for Markov chain Monte Carlo sampling to produce posterior distributions of model parameters.
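The basic object being generalised, the correlation integral, is simple to compute: for each radius r, C(r) is the fraction of point pairs of a sampled trajectory that lie closer than r, and the vector (C(r_1), ..., C(r_k)) summarises the geometry across scales. The sketch below uses Gaussian points as a stand-in for actual attractor samples (an illustrative assumption, not the Lorenz trajectories of the paper).

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)

# Correlation-integral vector sketch: C(r) = fraction of point pairs
# within distance r, evaluated over a vector of log-spaced radii.
points = rng.normal(size=(500, 3))     # stand-in for attractor samples
d = pdist(points)                      # all pairwise Euclidean distances
radii = np.logspace(-1, 1, 10)
C = np.array([(d < r).mean() for r in radii])

print(C.round(3))
```

By construction C is nondecreasing in r and bounded in [0, 1]; in the paper the distribution of such vectors across repeated trajectories provides the statistical distance used for parameter estimation.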
NASA Astrophysics Data System (ADS)
Hanan, Lu; Qiushi, Li; Shaobin, Li
2016-12-01
This paper presents an integrated optimization design method in which uniform design, response surface methodology and genetic algorithm are used in combination. In detail, uniform design is used to select the experimental sampling points in the experimental domain and the system performance is evaluated by means of computational fluid dynamics to construct a database. After that, response surface methodology is employed to generate a surrogate mathematical model relating the optimization objective and the design variables. Subsequently, genetic algorithm is adopted and applied to the surrogate model to acquire the optimal solution in the case of satisfying some constraints. The method has been applied to the optimization design of an axisymmetric diverging duct, dealing with three design variables including one qualitative variable and two quantitative variables. The method of modeling and optimization design performs well in improving the duct aerodynamic performance and can be also applied to wider fields of mechanical design and seen as a useful tool for engineering designers, by reducing the design time and computation consumption.
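The sample-then-surrogate-then-optimize loop described above can be sketched end to end. The stand-ins are assumptions for illustration: random uniform sampling in place of a formal uniform design, an analytic quadratic in place of the CFD evaluation, and `scipy.optimize.differential_evolution` in place of the genetic algorithm.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(3)

def expensive_eval(x):                      # stand-in for a CFD run
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2

# Step 1: sample design points and build the database
X = rng.uniform(-1, 1, size=(30, 2))
y = np.array([expensive_eval(x) for x in X])

# Step 2: fit a quadratic response surface y ~ 1, x1, x2, x1^2, x2^2, x1*x2
A = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def surrogate(x):
    return beta @ [1.0, x[0], x[1], x[0] ** 2, x[1] ** 2, x[0] * x[1]]

# Step 3: evolutionary search on the cheap surrogate, not the expensive model
res = differential_evolution(surrogate, bounds=[(-1, 1), (-1, 1)], seed=3)
print(res.x.round(3))
```

Because the test function here is itself quadratic, the surrogate is exact and the search lands on the true optimum (0.3, -0.2); with a real CFD objective the surrogate error would motivate the validation step the paper describes.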
Marshall, Andrew J; Evanovich, Emma K; David, Sarah Jo; Mumma, Gregory H
2018-01-17
High comorbidity rates among emotional disorders have led researchers to examine transdiagnostic factors that may contribute to shared psychopathology. Bifactor models provide a unique method for examining transdiagnostic variables by modelling the common and unique factors within measures. Previous findings suggest that the bifactor model of the Depression Anxiety and Stress Scale (DASS) may provide a method for examining transdiagnostic factors within emotional disorders. This study aimed to replicate the bifactor model of the DASS, a multidimensional measure of psychological distress, within a US adult sample and provide initial estimates of the reliability of the general and domain-specific factors. Furthermore, this study hypothesized that Worry, a theorized transdiagnostic variable, would show stronger relations to general emotional distress than domain-specific subscales. Confirmatory factor analysis was used to evaluate the bifactor model structure of the DASS in 456 US adult participants (279 females and 177 males, mean age 35.9 years) recruited online. The DASS bifactor model fitted well (CFI = 0.98; RMSEA = 0.05). The General Emotional Distress factor accounted for most of the reliable variance in item scores. Domain-specific subscales accounted for modest portions of reliable variance in items after accounting for the general scale. Finally, structural equation modelling indicated that Worry was strongly predicted by the General Emotional Distress factor. The DASS bifactor model is generalizable to a US community sample and General Emotional Distress, but not domain-specific factors, strongly predict the transdiagnostic variable Worry.
Marković, Snežana; Kerč, Janez; Horvat, Matej
2017-03-01
We present a new approach to identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields comparable results to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases when historical process data is not straightforwardly available. In the presented case the changes of lactose characteristics are influencing the performance of the extrusion/spheronization process step. The pellet cores produced by using one (considered as less suitable) lactose source were on average larger and more fragile, leading to consequent breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process leading to compromised film coating.
Dempsey, Steven J; Gese, Eric M; Kluever, Bryan M; Lonsinger, Robert C; Waits, Lisette P
2015-01-01
Development and evaluation of noninvasive methods for monitoring species distribution and abundance is a growing area of ecological research. While noninvasive methods have the advantage of a reduced risk of the negative factors associated with capture, comparisons to methods using more traditional invasive sampling are lacking. Historically, kit foxes (Vulpes macrotis) occupied the desert and semi-arid regions of southwestern North America. Once the most abundant carnivore in the Great Basin Desert of Utah, the species is now considered rare. In recent decades, attempts have been made to model the environmental variables influencing kit fox distribution. Using noninvasive scat deposition surveys for determination of kit fox presence, we modeled resource selection functions to predict kit fox distribution using three popular techniques (Maxent, fixed-effects, and mixed-effects generalized linear models) and compared these with similar models developed from invasive sampling (telemetry locations from radio-collared foxes). Resource selection functions were developed using a combination of landscape variables including elevation, slope, aspect, vegetation height, and soil type. All models were tested against subsequent scat collections as a method of model validation. We demonstrate the importance of comparing multiple model types for development of resource selection functions used to predict a species distribution, and of evaluating the importance of environmental variables on species distribution. All models we examined showed a large effect of elevation on kit fox presence, followed by slope and vegetation height. However, the invasive sampling method (i.e., radio-telemetry) appeared to be better at determining resource selection, and therefore may be more robust in predicting kit fox distribution.
In contrast, the distribution maps created from the noninvasive sampling (i.e., scat transects) differed significantly from those based on the invasive method; scat transects may nonetheless be appropriate when used within an occupancy framework to predict species distribution. We concluded that while scat deposition transects may be useful for monitoring kit fox abundance and possibly occupancy, they do not appear to be appropriate for determining resource selection. On our study area, scat transects were biased toward roadways, while data collected using radio-telemetry were dictated by the movements of the kit foxes themselves. We recommend that future studies applying noninvasive scat sampling consider a more robust random sampling design across the landscape (e.g., random transects or more complete road coverage), which would provide a more accurate and unbiased depiction of resource selection useful for predicting kit fox distribution.
Growth-simulation model for lodgepole pine in central Oregon.
Walter G. Dahms
1983-01-01
A growth-simulation model for central Oregon lodgepole pine (Pinus contorta Dougl.) has been constructed by combining data from temporary and permanent sample plots. The model is similar to a conventional yield table with the added capacity for dealing with the stand-density variable. The simulator runs on a desk-top computer.
A Structural Equation Model for Predicting Business Student Performance
ERIC Educational Resources Information Center
Pomykalski, James J.; Dion, Paul; Brock, James L.
2008-01-01
In this study, the authors developed a structural equation model that accounted for 79% of the variability of a student's final grade point average by using a sample size of 147 students. The model is based on student grades in 4 foundational business courses: introduction to business, macroeconomics, statistics, and using databases. Educators and…
Modelling tourist arrivals using a time-varying parameter
NASA Astrophysics Data System (ADS)
Suciptawati, P.; Sukarsa, K. G.; Kencana, Eka N.
2017-06-01
The importance of tourism and its related sectors for economic development and poverty reduction in many countries has increased researchers' attention to studying and modelling tourist arrivals. This work demonstrates the time-varying parameter (TVP) technique for modelling the arrival of Korean tourists in Bali. The number of Korean tourists visiting Bali over the period January 2010 to December 2015 was used as the dependent variable (KOR). The predictors are the Won-to-IDR exchange rate (WON), the inflation rate in Korea (INFKR), and the inflation rate in Indonesia (INFID). Because tourist visits to Bali tend to fluctuate by nationality, the model was built using TVP, with its parameters approximated by the Kalman filter algorithm. The results showed that all predictor variables (WON, INFKR, INFID) significantly affect KOR. For in-sample and out-of-sample forecasts, with ARIMA-forecasted values for the predictors, the TVP model gave mean absolute percentage errors (MAPE) of 11.24 percent and 12.86 percent, respectively.
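A time-varying-parameter regression of this kind can be sketched with a random-walk state equation estimated by a Kalman filter. The code below is a minimal numpy sketch on synthetic data; the series, noise levels, and coefficient path are illustrative stand-ins, not the Korean-arrivals data used in the study.

```python
import numpy as np

def tvp_kalman(y, X, q=0.01, r=1.0):
    """Estimate time-varying regression coefficients beta_t with a Kalman filter.

    State equation:  beta_t = beta_{t-1} + w_t,  w_t ~ N(0, q*I)  (random walk)
    Observation:     y_t = x_t' beta_t + v_t,    v_t ~ N(0, r)
    """
    T, k = X.shape
    beta = np.zeros(k)            # state estimate
    P = np.eye(k) * 1e3           # diffuse initial state covariance
    betas = np.empty((T, k))
    for t in range(T):
        P = P + q * np.eye(k)     # predict: random walk, covariance grows
        x = X[t]
        S = x @ P @ x + r         # innovation variance (scalar)
        K = P @ x / S             # Kalman gain
        beta = beta + K * (y[t] - x @ beta)
        P = P - np.outer(K, x) @ P
        betas[t] = beta
    return betas

# synthetic example: the slope drifts from 1 to 2 over the sample
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
true_slope = np.linspace(1.0, 2.0, T)
y = 0.5 + true_slope * X[:, 1] + rng.normal(scale=0.1, size=T)
betas = tvp_kalman(y, X, q=0.001, r=0.01)
```

With these settings the filtered slope tracks the drifting true coefficient, which is the behaviour a TVP model exploits when predictor effects change over time.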
A size-dependent constitutive model of bulk metallic glasses in the supercooled liquid region
Yao, Di; Deng, Lei; Zhang, Mao; Wang, Xinyun; Tang, Na; Li, Jianjun
2015-01-01
Size effect is of great importance in micro forming processes. In this paper, micro cylinder compression was conducted to investigate the deformation behavior of bulk metallic glasses (BMGs) in supercooled liquid region with different deformation variables including sample size, temperature and strain rate. It was found that the elastic and plastic behaviors of BMGs have a strong dependence on the sample size. The free volume and defect concentration were introduced to explain the size effect. In order to demonstrate the influence of deformation variables on steady stress, elastic modulus and overshoot phenomenon, four size-dependent factors were proposed to construct a size-dependent constitutive model based on the Maxwell-pulse type model previously presented by the authors according to viscosity theory and free volume model. The proposed constitutive model was then adopted in finite element method simulations, and validated by comparing the micro cylinder compression and micro double cup extrusion experimental data with the numerical results. Furthermore, the model provides a new approach to understanding the size-dependent plastic deformation behavior of BMGs. PMID:25626690
NASA Astrophysics Data System (ADS)
Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua
2017-10-01
Wet gluten is a useful quality indicator for wheat, and short-wave near-infrared spectroscopy (NIRS) is a high-performance technique with the advantages of being economical, rapid, and nondestructive. To study the feasibility of short-wave NIRS for analyzing wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned by spectrometer. Eight spectral pretreatment methods and a genetic algorithm (GA) variable selection method were used to optimize the analysis. Both quantitative and qualitative models of wet gluten were built by partial least squares regression and discriminant analysis. For quantitative analysis, normalization was the optimal pretreatment method; 17 wet-gluten-sensitive variables were selected by GA, and the GA model outperformed the all-variable model, with R2V=0.88 and RMSEV=1.47. For qualitative analysis, automatic weighted least squares baseline was the optimal pretreatment method, and the all-variable models outperformed the GA models. The correct classification rates for the three classes of wet gluten content (<24%, 24-30%, >30%) were 95.45, 84.52, and 90.00%, respectively. The short-wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten in wheat seed.
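Genetic-algorithm variable selection of the sort described can be sketched as a search over binary inclusion masks, scored by the holdout fit of a regression restricted to the selected variables. The sketch below is a much-simplified numpy version on synthetic data (OLS fitness rather than PLS, arbitrary GA settings); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def fitness(mask, X, y):
    """Holdout R^2 of an OLS model using only the variables selected by mask."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    half = len(y) // 2
    A = np.column_stack([np.ones(half), Xs[:half]])
    beta, *_ = np.linalg.lstsq(A, y[:half], rcond=None)
    B = np.column_stack([np.ones(len(y) - half), Xs[half:]])
    resid = y[half:] - B @ beta
    yv = y[half:]
    return 1 - resid @ resid / ((yv - yv.mean()) @ (yv - yv.mean()))

def ga_select(X, y, pop=30, gens=40, p_mut=0.05):
    """Truncation selection + one-point crossover + bit-flip mutation."""
    n_var = X.shape[1]
    population = (rng.random((pop, n_var)) < 0.5).astype(int)
    for _ in range(gens):
        scores = np.array([fitness(m, X, y) for m in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]        # keep the better half
        cut = rng.integers(1, n_var, size=pop // 2)    # one-point crossover
        children = parents.copy()
        for i, c in enumerate(cut):
            children[i, c:] = parents[(i + 1) % len(parents), c:]
        children ^= (rng.random(children.shape) < p_mut).astype(int)  # mutate
        population = np.vstack([parents, children])
    scores = np.array([fitness(m, X, y) for m in population])
    return population[np.argmax(scores)]

# synthetic data: 20 "spectral" variables, only columns 3 and 11 informative
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 11] + rng.normal(scale=0.2, size=200)
best = ga_select(X, y)
```

Because irrelevant variables slightly hurt holdout fit while the informative ones improve it greatly, the surviving masks quickly converge on the true predictors.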
Schleicher, Rosemary L; Sternberg, Maya R; Pfeiffer, Christine M
2013-06-01
Sociodemographic and lifestyle factors exert important influences on nutritional status; however, information on their association with biomarkers of fat-soluble nutrients is limited, particularly in a representative sample of adults. Serum or plasma concentrations of vitamin A, vitamin E, carotenes, xanthophylls, 25-hydroxyvitamin D [25(OH)D], SFAs, MUFAs, PUFAs, and total fatty acids (tFAs) were measured in adults (aged ≥ 20 y) during all or part of NHANES 2003-2006. Simple and multiple linear regression models were used to assess 5 sociodemographic variables (age, sex, race-ethnicity, education, and income) and 5 lifestyle behaviors (smoking, alcohol consumption, BMI, physical activity, and supplement use) and their relation to biomarker concentrations. Adjustment for total serum cholesterol and lipid-altering drug use was added to the full regression model. Adjustment for latitude and season was added to the full model for 25(OH)D. Based on simple linear regression, race-ethnicity, BMI, and supplement use were significantly related to all fat-soluble biomarkers. Sociodemographic variables as a group explained 5-17% of biomarker variability, whereas together, sociodemographic and lifestyle variables explained 22-23% [25(OH)D, vitamin E, xanthophylls], 17% (vitamin A), 15% (MUFAs), 10-11% (SFAs, carotenes, tFAs), and 6% (PUFAs) of biomarker variability. Although lipid adjustment explained additional variability for all biomarkers except for 25(OH)D, it appeared to be largely independent of sociodemographic and lifestyle variables. After adjusting for sociodemographic, lifestyle, and lipid-related variables, major differences in biomarkers were associated with race-ethnicity (from -44 to 57%), smoking (up to -25%), supplement use (up to 21%), and BMI (up to -15%). Latitude and season attenuated some race-ethnicity differences. 
Of the sociodemographic and lifestyle variables examined, with or without lipid adjustment, most fat-soluble nutrient biomarkers were significantly associated with race-ethnicity.
Kristensen, Terje; Ohlson, Mikael; Bolstad, Paul; Nagy, Zoltan
2015-08-01
Accurate field measurements from inventories across fine spatial scales are critical to improve sampling designs and to increase the precision of forest C cycling modeling. By studying soils undisturbed by active forest management, this paper gives a unique insight into the naturally occurring variability of organic layer C and provides valuable references against which subsequent and future sampling schemes can be evaluated. We found that the organic layer C stocks displayed great short-range variability, with spatial autocorrelation distances ranging from 0.86 up to 2.85 m. When spatial autocorrelations are known, we show that a minimum of 20 inventory samples separated by ∼5 m is needed to determine the organic layer C stock with a precision of ±0.5 kg C m(-2). Our data also demonstrate a strong relationship between the organic layer C stock and horizon thickness (R(2) ranging from 0.58 to 0.82). This relationship suggests that relatively inexpensive measurements of horizon thickness can supplement soil C sampling, either by reducing the number of soil samples collected or by enhancing the spatial resolution of organic layer C mapping.
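The link between sampling effort and precision follows from the standard sample-size formula n = (z·s/E)², which applies once samples are spaced beyond the autocorrelation range and can be treated as independent. A quick sketch, with an assumed (illustrative) standard deviation rather than the study's own value:

```python
import math

def n_required(sd, half_width, z=1.96):
    """Samples needed so the 95% CI half-width on the mean is <= half_width,
    assuming independent samples (spacing beyond the autocorrelation range)."""
    return math.ceil((z * sd / half_width) ** 2)

# illustrative SD for organic layer C stocks (kg C m^-2); an assumption,
# not a value reported in the study
n = n_required(sd=1.1, half_width=0.5)
print(n)  # 19 samples for +/- 0.5 kg C m^-2 at this assumed SD
```

A between-sample SD on the order of 1 kg C m⁻² reproduces the ~20-sample requirement the abstract reports.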
NASA Astrophysics Data System (ADS)
Hashemian, Behrooz; Millán, Daniel; Arroyo, Marino
2013-12-01
Collective variables (CVs) are low-dimensional representations of the state of a complex system, which help us rationalize molecular conformations and sample free energy landscapes with molecular dynamics simulations. Given their importance, there is need for systematic methods that effectively identify CVs for complex systems. In recent years, nonlinear manifold learning has shown its ability to automatically characterize molecular collective behavior. Unfortunately, these methods fail to provide a differentiable function mapping high-dimensional configurations to their low-dimensional representation, as required in enhanced sampling methods. We introduce a methodology that, starting from an ensemble representative of molecular flexibility, builds smooth and nonlinear data-driven collective variables (SandCV) from the output of nonlinear manifold learning algorithms. We demonstrate the method with a standard benchmark molecule, alanine dipeptide, and show how it can be non-intrusively combined with off-the-shelf enhanced sampling methods, here the adaptive biasing force method. We illustrate how enhanced sampling simulations with SandCV can explore regions that were poorly sampled in the original molecular ensemble. We further explore the transferability of SandCV from a simpler system, alanine dipeptide in vacuum, to a more complex system, alanine dipeptide in explicit water.
Martínez Vega, Mabel V; Sharifzadeh, Sara; Wulfsohn, Dvoralai; Skov, Thomas; Clemmensen, Line Harder; Toldam-Andersen, Torben B
2013-12-01
Visible-near infrared spectroscopy remains a method of increasing interest as a fast alternative for the evaluation of fruit quality. The success of the method is assumed to be achieved by using large sets of samples to produce robust calibration models. In this study we used representative samples of an early and a late season apple cultivar to evaluate model robustness (in terms of prediction ability and error) for soluble solids content (SSC) and acidity prediction, in the wavelength range 400-1100 nm. A total of 196 middle-early season and 219 late season apple (Malus domestica Borkh.) samples of the cvs 'Aroma' and 'Holsteiner Cox' were used to construct spectral models for SSC and acidity. Partial least squares (PLS), ridge regression (RR) and elastic net (EN) models were used to build prediction models. Furthermore, we compared three sub-sample arrangements for forming training and test sets ('smooth fractionator', by date of measurement after harvest, and random). Using the 'smooth fractionator' sampling method, fewer spectral bands (26) and elastic net resulted in improved performance for SSC models of 'Aroma' apples, with a coefficient of variation CVSSC = 13%. The model showed consistently low errors and bias (PLS/EN: R(2)cal = 0.60/0.60; SEC = 0.88/0.88°Brix; Biascal = 0.00/0.00; R(2)val = 0.33/0.44; SEP = 1.14/1.03; Biasval = 0.04/0.03). However, predictions of acidity and SSC (CV = 5%) for the late cultivar 'Holsteiner Cox' were inferior to those for 'Aroma'. It was possible to construct local SSC and acidity calibration models for early season apple cultivars with CVs of SSC and acidity around 10%. The overall model performance for these data sets also depends on the proper selection of training and test sets. The 'smooth fractionator' protocol provided an objective method for obtaining training and test sets that capture the existing variability of the fruit samples for construction of visible-NIR prediction models. 
The implication is that such 'efficient' sampling methods, used both to obtain an initial sample of fruit that represents the variability of the population and to sub-sample it into training and test sets, should make it possible to develop spectral predictions of fruit quality from relatively small sample sizes. Using feature selection and elastic net appears to improve SSC model performance in terms of R(2), RMSECV and RMSEP for 'Aroma' apples. © 2013 Society of Chemical Industry.
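The idea of a split that preserves sample variability can be sketched by sorting samples on the response and sending every k-th one to the test set, then fitting one of the regressions mentioned (here ridge, in closed form). This is only a simplified analogue of the 'smooth fractionator' protocol, with synthetic data standing in for spectra:

```python
import numpy as np

def systematic_split(y, test_every=4):
    """Sort samples by the response and place every k-th one in the test set,
    so both sets span the full range of variability (a simplified analogue of
    the 'smooth fractionator' idea, not the published protocol)."""
    order = np.argsort(y)
    test = order[::test_every]
    train = np.setdiff1d(order, test)
    return train, test

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: beta = (X'X + lam I)^-1 X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))                 # stand-in for spectral bands
beta_true = np.zeros(10)
beta_true[:3] = [1.0, -0.5, 0.8]
y = X @ beta_true + rng.normal(scale=0.1, size=100)   # stand-in for SSC
tr, te = systematic_split(y)
beta = ridge_fit(X[tr], y[tr], lam=0.1)
rmsep = np.sqrt(np.mean((X[te] @ beta - y[te]) ** 2))
```

Because both sets cover the whole response range, the test error (RMSEP) is a fairer estimate of how the model handles the population's variability than a purely random split of a small sample.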
Variables affecting the academic and social integration of nursing students.
Zeitlin-Ophir, Iris; Melitz, Osnat; Miller, Rina; Podoshin, Pia; Mesh, Gustavo
2004-07-01
This study attempted to analyze the variables that influence the academic integration of nursing students. The theoretical model presented by Leigler was adapted to the existing conditions in a school of nursing in northern Israel. The independent variables included the student's background; amount of support received in the course of studies; extent of outside family and social commitments; satisfaction with the school's facilities and services; and level of social integration. The dependent variable was the student's level of academic integration. The findings substantiated four central hypotheses, with the study model explaining approximately 45% of the variance in the dependent variable. Academic integration is influenced by a number of variables, the most prominent of which is the social integration of the student with colleagues and educational staff. Among the background variables, country of origin was found to be significant to both social and academic integration for two main groups in the sample: Israeli-born students (both Jewish and Arab) and immigrant students.
Rudolph, Kara E.; Sánchez, Brisa N.; Stuart, Elizabeth A.; Greenberg, Benjamin; Fujishiro, Kaori; Wand, Gary S.; Shrager, Sandi; Seeman, Teresa; Diez Roux, Ana V.; Golden, Sherita H.
2016-01-01
Evidence of the link between job strain and cortisol levels has been inconsistent. This could be due to failure to account for cortisol variability leading to underestimated standard errors. Our objective was to model the relationship between job strain and the whole cortisol curve, accounting for sources of cortisol variability. Our functional mixed-model approach incorporated all available data—18 samples over 3 days—and uncertainty in estimated relationships. We used employed participants from the Multi-Ethnic Study of Atherosclerosis Stress I Study and data collected between 2002 and 2006. We used propensity score matching on an extensive set of variables to control for sources of confounding. We found that job strain was associated with lower salivary cortisol levels and lower total area under the curve. We found no relationship between job strain and the cortisol awakening response. Our findings differed from those of several previous studies. It is plausible that our results were unique to middle- to older-aged racially, ethnically, and occupationally diverse adults and were therefore not inconsistent with previous research among younger, mostly white samples. However, it is also plausible that previous findings were influenced by residual confounding and failure to propagate uncertainty (i.e., account for the multiple sources of variability) in estimating cortisol features. PMID:26905339
Cheung, Emily Yee Man; Sachs, John
2006-12-01
The modified technology acceptance model was used to predict actual Blackboard usage (a web-based information system) in a sample of 57 Hong Kong student teachers whose mean age was 27.8 yr. (SD = 6.9). While the general form of the model was supported, Application-specific Self-efficacy was a more powerful predictor of system use than Behavioural Intention as predicted by the theory of reasoned action. Thus in this cultural and educational context, it has been shown that the model does not fully mediate the effect of Self-efficacy on System Use. Also, users' Enjoyment exerted considerable influence on the component variables of Usefulness and Ease of Use and on Application-specific Self-efficacy, thus indirectly influencing system usage. Consequently, efforts to gain students' acceptance and, therefore, use of information systems such as Blackboard must pay adequate attention to users' Self-efficacy and motivational variables such as Enjoyment.
Systematic on-site monitoring of compliance dust samples
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grayson, R.L.; Gandy, J.R.
1996-12-31
Maintaining compliance with U.S. respirable coal mine dust standards can be difficult on high-productivity longwall panels. Comprehensive and systematic analysis of compliance dust sample data, coupled with access to the U.S. Bureau of Mines (USBM) DUSTPRO, can yield important information for use in maintaining compliance. The objective of this study was to develop and apply customized software for the collection, storage, modification, and analysis of respirable dust data, while providing for flexible export of data and linking with the USBM's expert advisory system on dust control. An executable, IBM-compatible software package was created and customized for use by the person in charge of collecting, submitting, analyzing, and monitoring respirable dust compliance samples. Both descriptive statistics and multiple regression analysis were incorporated. The software allows ASCII files to be exported and links directly with DUSTPRO. After development and validation of the software, longwall compliance data from two different mines were analyzed to evaluate the value of the software. Data included variables on respirable dust concentration, tons produced, the existence of roof/floor rock (dummy variable), and the sampling cycle (dummy variables). Because of confidentiality, specific data are not presented, only the equations and ANOVA tables. The final regression models explained 83.8% and 61.1% of the variation in the data for the two panels. Important correlations among variables within sampling cycles showed the value of using dummy variables for sampling cycles. The software proved flexible and fast for its intended use. The insights obtained from its use improved the systematic monitoring of respirable dust compliance data, especially for pinpointing the most effective dust control methods during specific sampling cycles.
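A regression of the kind described, dust concentration on tonnage plus dummy variables, can be sketched in a few lines of numpy. The data below are synthetic (the mine data are confidential) and the coefficient values are illustrative, not the panels' fitted equations:

```python
import numpy as np

def ols_r2(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return beta, r2

rng = np.random.default_rng(2)
n = 120
tons = rng.uniform(5, 15, n)         # production per shift (synthetic, kilotons)
rock = rng.integers(0, 2, n)         # dummy: roof/floor rock being cut
cycle2 = rng.integers(0, 2, n)       # dummy: second bimonthly sampling cycle
dust = 0.3 + 0.12 * tons + 0.8 * rock + 0.2 * cycle2 \
       + rng.normal(scale=0.3, size=n)
beta, r2 = ols_r2(np.column_stack([tons, rock, cycle2]), dust)
```

The fitted dummy coefficients isolate the shift in dust concentration attributable to rock cutting or to a particular sampling cycle, which is how dummy variables expose the correlations the abstract mentions.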
Drug awareness in adolescents attending a mental health service: analysis of longitudinal data.
Arnau, Jaume; Bono, Roser; Díaz, Rosa; Goti, Javier
2011-11-01
One of the procedures used most recently with longitudinal data is the linear mixed model. In the context of health research, the increasing number of studies that now use these models bears witness to the growing interest in this type of analysis. This paper describes the application of linear mixed models to a longitudinal study of a sample of Spanish adolescents attending a mental health service, the aim being to investigate their knowledge about the consumption of alcohol and other drugs. More specifically, the main objective was to compare the efficacy of a motivational interviewing programme with a standard approach to drug awareness. The models used to analyse the overall indicator of drug awareness were as follows: (a) an unconditional linear growth curve model; (b) a growth model with subject-associated variables; and (c) an individual curve model with predictive variables. The results showed that awareness increased over time and that the variable 'schooling years' explained part of the between-subjects variation. The effect of motivational interviewing was also significant.
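A growth-curve analysis along these lines can be approximated, without a mixed-model library, by a two-stage sketch: fit one OLS growth line per subject, then summarise the individual slopes (the fixed effect). This ignores the shrinkage a true linear mixed model provides, so it is only an illustration of the growth-curve idea on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n_subj, n_waves = 30, 4
time = np.arange(n_waves)                  # measurement occasions

# each adolescent gets their own baseline awareness and growth rate
intercepts = rng.normal(10.0, 2.0, n_subj)
slopes = rng.normal(0.8, 0.3, n_subj)      # awareness increases over time
y = intercepts[:, None] + slopes[:, None] * time \
    + rng.normal(scale=0.5, size=(n_subj, n_waves))

# stage 1: one OLS growth line per subject (multiple right-hand sides at once)
T = np.column_stack([np.ones(n_waves), time])
coef = np.linalg.lstsq(T, y.T, rcond=None)[0]   # shape (2, n_subj)

# stage 2: summarise the individual slopes; the mean is the average growth
mean_slope = coef[1].mean()
```

The between-subject spread of `coef[1]` is what a mixed model would attribute to the random slope variance; a covariate such as schooling years would then be used to explain part of that spread.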
Lamers, L M
1999-01-01
OBJECTIVE: To evaluate the predictive accuracy of the Diagnostic Cost Group (DCG) model using health survey information. DATA SOURCES/STUDY SETTING: Longitudinal data collected for a sample of members of a Dutch sickness fund. In the Netherlands the sickness funds provide compulsory health insurance coverage for the 60 percent of the population in the lowest income brackets. STUDY DESIGN: A demographic model and DCG capitation models are estimated by means of ordinary least squares, with an individual's annual healthcare expenditures in 1994 as the dependent variable. For subgroups based on health survey information, costs predicted by the models are compared with actual costs. Using stepwise regression procedures a subset of relevant survey variables that could improve the predictive accuracy of the three-year DCG model was identified. Capitation models were extended with these variables. DATA COLLECTION/EXTRACTION METHODS: For the empirical analysis, panel data of sickness fund members were used that contained demographic information, annual healthcare expenditures, and diagnostic information from hospitalizations for each member. In 1993, a mailed health survey was conducted among a random sample of 15,000 persons in the panel data set, with a 70 percent response rate. PRINCIPAL FINDINGS: The predictive accuracy of the demographic model improves when it is extended with diagnostic information from prior hospitalizations (DCGs). A subset of survey variables further improves the predictive accuracy of the DCG capitation models. The predictable profits and losses based on survey information for the DCG models are smaller than for the demographic model. Most persons with predictable losses based on health survey information were not hospitalized in the preceding year. CONCLUSIONS: The use of diagnostic information from prior hospitalizations is a promising option for improving the demographic capitation payment formula. 
This study suggests that diagnostic information from outpatient utilization is complementary to DCGs in predicting future costs. PMID:10029506
NASA Astrophysics Data System (ADS)
Babcock, C. R.; Finley, A. O.; Andersen, H. E.; Moskal, L. M.; Morton, D. C.; Cook, B.; Nelson, R.
2017-12-01
Upcoming satellite lidar missions, such as GEDI and ICESat-2, are designed to collect laser altimetry data from space along narrow bands of orbital tracks. As a result, lidar metric sets derived from these sources will not have complete spatial coverage. This lack of complete coverage, or sparsity, means traditional regression approaches that consider lidar metrics as explanatory variables (without error) cannot be used to generate wall-to-wall maps of forest inventory variables. We implement a coregionalization framework to jointly model sparsely sampled lidar information and point-referenced forest variable measurements to create wall-to-wall maps with full probabilistic uncertainty quantification of all inputs. We inform the model with USFS Forest Inventory and Analysis (FIA) in-situ forest measurements and GLAS lidar data to spatially predict aboveground forest biomass (AGB) across the contiguous US. We cast our model within a Bayesian hierarchical framework to better model complex space-varying correlation structures among the lidar metrics and FIA data, which yields improved prediction and uncertainty assessment. To circumvent computational difficulties that arise when fitting complex geostatistical models to massive datasets, we use a Nearest Neighbor Gaussian process (NNGP) prior. Results indicate that a coregionalization modeling approach to leveraging sampled lidar data to improve AGB estimation is effective. Further, fitting the coregionalization model within a Bayesian mode of inference allows for AGB quantification across scales ranging from individual pixel estimates of AGB density to total AGB for the continental US with uncertainty. The coregionalization framework examined here is directly applicable to future spaceborne lidar acquisitions from GEDI and ICESat-2. 
Pairing these lidar sources with the extensive FIA forest monitoring plot network using a joint prediction framework, such as the coregionalization model explored here, offers the potential to improve forest AGB accounting certainty and provide maps for post-model fitting analysis of the spatial distribution of AGB.
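The geostatistical core of such a framework, predicting a surface away from sparse samples with quantified uncertainty, can be illustrated with a plain Gaussian-process (kriging) sketch in one dimension. This is far simpler than the coregionalized NNGP model described (one response, no lidar covariates, dense covariance), and the data are synthetic:

```python
import numpy as np

def sq_exp(a, b, ls=2.0, var=1.0):
    """Squared-exponential covariance between 1-D coordinate sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(8)
# sparse sample locations along a transect (stand-in for orbital-track samples)
x_obs = np.sort(rng.uniform(0, 10, 15))
agb_obs = 50 + 10 * np.sin(x_obs) + rng.normal(scale=1.0, size=15)

x_grid = np.linspace(0, 10, 200)        # "wall-to-wall" prediction grid
K = sq_exp(x_obs, x_obs, var=100.0) + 1.0 * np.eye(15)   # + noise variance
Ks = sq_exp(x_grid, x_obs, var=100.0)
mean = 50 + Ks @ np.linalg.solve(K, agb_obs - 50)        # posterior mean
var_pred = 100.0 - np.einsum('ij,ji->i', Ks,
                             np.linalg.solve(K, Ks.T))   # posterior variance
```

The posterior variance shrinks near the sampled tracks and reverts toward the prior in the gaps, which is exactly the pixel-level uncertainty behaviour the Bayesian framework propagates up to regional AGB totals.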
Engelmann Spruce Site Index Models: A Comparison of Model Functions and Parameterizations
Nigh, Gordon
2015-01-01
Engelmann spruce (Picea engelmannii Parry ex Engelm.) is a high-elevation species found in western Canada and the western USA. As this species becomes increasingly targeted for harvesting, better height growth information is required for good management of this species. This project was initiated to fill this need. The objective of the project was threefold: develop a site index model for Engelmann spruce; compare the fits and the modelling and application issues between three model formulations and four parameterizations; and more closely examine the grounded-Generalized Algebraic Difference Approach (g-GADA) model parameterization. The model fitting data consisted of 84 stem-analyzed Engelmann spruce site trees sampled across the Engelmann Spruce – Subalpine Fir biogeoclimatic zone. The fitted models were based on the Chapman-Richards function, a modified Hossfeld IV function, and the Schumacher function. The model parameterizations that were tested are indicator variables, mixed-effects, GADA, and g-GADA. Model evaluation was based on the finite-sample corrected version of Akaike's Information Criterion and the estimated variance. Model parameterization had more of an influence on the fit than did model formulation, with the indicator variable method providing the best fit, followed by the mixed-effects modelling (9% increase in the variance for the Chapman-Richards and Schumacher formulations over the indicator variable parameterization), g-GADA (optimal approach) (335% increase in the variance), and the GADA/g-GADA (with the GADA parameterization) (346% increase in the variance). Factors related to the application of the model must be considered when selecting the model for use, as the best fitting methods have the most barriers to their application in terms of data and software requirements. PMID:25853472
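One of the formulations compared, the Chapman-Richards function, can be fitted by nonlinear least squares and scored with a finite-sample corrected AIC of the kind used for model evaluation here. The sketch below uses scipy on synthetic height-age data; the parameter values are illustrative, not the fitted Engelmann spruce values:

```python
import numpy as np
from scipy.optimize import curve_fit

def chapman_richards(t, a, b, c):
    """Chapman-Richards height-growth function: H(t) = a * (1 - exp(-b t))^c."""
    return a * (1.0 - np.exp(-b * t)) ** c

def aicc(rss, n, k):
    """Finite-sample corrected AIC for a least-squares fit with k parameters."""
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

rng = np.random.default_rng(4)
age = np.linspace(5, 120, 40)      # breast-height age, years (synthetic)
height = chapman_richards(age, 28.0, 0.03, 1.4) \
         + rng.normal(scale=0.4, size=40)
popt, _ = curve_fit(chapman_richards, age, height, p0=(25.0, 0.05, 1.0))
rss = np.sum((height - chapman_richards(age, *popt)) ** 2)
score = aicc(rss, n=40, k=3)
```

Fitting each candidate function the same way and comparing the `aicc` scores (with k counting each parameterization's parameters) mirrors the model comparison the abstract describes.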
D'Zurilla, T J; Chang, E C; Nottingham, E J; Faccini, L
1998-12-01
The Social Problem-Solving Inventory-Revised was used to examine the relations between problem-solving abilities and hopelessness, depression, and suicidal risk in three different samples: undergraduate college students, general psychiatric inpatients, and suicidal psychiatric inpatients. A similar pattern of results was found in both college students and psychiatric patients: a negative problem orientation was most highly correlated with all three criterion variables, followed by either a positive problem orientation or an avoidance problem-solving style. Rational problem-solving skills emerged as an important predictor variable in the suicidal psychiatric sample. Support was found for a prediction model of suicidal risk that includes problem-solving deficits and hopelessness, with partial support being found for including depression in the model as well.
de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino
2018-05-01
This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.
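The role of the kernel trick here, letting a linear calibration method capture nonlinear spectral features, can be illustrated with a kernel ridge sketch, a simpler cousin of Kernel-PLS whose dual solution is alpha = (K + λI)⁻¹ y. This is pure numpy on synthetic data standing in for transflectance spectra, not the iSPA-Kernel-PLS algorithm itself:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row-sample sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, gamma=0.5, lam=1e-3):
    """Dual-form kernel ridge: alpha = (K + lam I)^-1 y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=0.5):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(80, 2))      # stand-in for selected spectral bands
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2    # nonlinear response, e.g. Brix
alpha = kernel_ridge_fit(X, y)
pred = kernel_ridge_predict(X, alpha, X)
rmse = np.sqrt(np.mean((pred - y) ** 2))
```

Interval selection such as iSPA would restrict the columns of `X` before the kernel is ever computed, which is why it combines naturally with kernel methods: fewer variables shrink the kernel's effective input space without changing the dual solver.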
Empirical findings on socioeconomic determinants of fertility differentials in Costa Rica.
Carvajal, M J; Geithman, D T
1986-01-01
"This paper seeks to (1) identify socioeconomic variables that are expected to generate fertility differentials; (2) hypothesize the direction and magnitude of the effect of each variable by reference to a demand-for-children model; and (3) test empirically the model using evidence from Costa Rica. The estimates are obtained from a ten-percent systematic random sample of all Costa Rican individual-family households. There are 15,924 families in the sample...." The authors specifically seek "to capture the effects of changing relative prices and available income and time constraints on parental preferences for children. Least-squares estimates show statistically significant relationships between household fertility and opportunity cost of time, parental education, occurrence of an extended family, medical care, household sanitation, economic sector of employment, and household stock of nonhuman capital." excerpt
MacInnis, Martin J; McGlory, Chris; Gibala, Martin J; Phillips, Stuart M
2017-06-01
Direct sampling of human skeletal muscle using the needle biopsy technique can facilitate insight into the biochemical and histological responses resulting from changes in exercise or feeding. However, the muscle biopsy procedure is invasive, and analyses are often expensive, which places pragmatic restraints on sample sizes. The unilateral exercise model can serve to increase statistical power and reduce the time and cost of a study. With this approach, 2 limbs of a participant are randomized to 1 of 2 treatments that can be applied almost concurrently or sequentially depending on the nature of the intervention. Similar to a typical repeated measures design, comparisons are made within participants, which increases statistical power by reducing the amount of between-person variability. A washout period is often unnecessary, reducing the time needed to complete the experiment and the influence of potential confounding variables such as habitual diet, activity, and sleep. Variations of the unilateral exercise model have been employed to investigate the influence of exercise, diet, and the interaction between the 2, on a wide range of variables including mitochondrial content, capillary density, and skeletal muscle hypertrophy. Like any model, unilateral exercise has some limitations: it cannot be used to study variables that potentially transfer across limbs, and it is generally limited to exercises that can be performed in pairs of treatments. Where appropriate, however, the unilateral exercise model can yield robust, well-controlled investigations of skeletal muscle responses to a wide range of interventions and conditions including exercise, dietary manipulation, and disuse or immobilization.
Bayesian dynamic modeling of time series of dengue disease case counts.
Martínez-Bello, Daniel Adyro; López-Quílez, Antonio; Torres-Prieto, Alexander
2017-07-01
The aim of this study is to model the association between weekly time series of dengue case counts and meteorological variables in a high-incidence city of Colombia, applying Bayesian hierarchical dynamic generalized linear models over the period January 2008 to August 2015. Additionally, we evaluate the models' short-term performance for predicting dengue cases. The methodology comprises dynamic Poisson log-link models including constant or time-varying coefficients for the meteorological variables. Calendar effects were modeled using constant or first- or second-order random walk time-varying coefficients. The meteorological variables were modeled using constant coefficients and first-order random walk time-varying coefficients. We applied Markov chain Monte Carlo simulations for parameter estimation and the deviance information criterion (DIC) for model selection. We assessed the short-term predictive performance of the selected final model at several time points within the study period using the mean absolute percentage error. The best model included first-order random walk time-varying coefficients for both the calendar trend and the meteorological variables. Beyond the computational challenges, interpreting the results requires a complete analysis of the dengue time series with respect to the parameter estimates of the meteorological effects. We found small mean absolute percentage errors for one- and two-week out-of-sample predictions at most prediction points, associated with low-volatility periods in the dengue counts. We discuss the advantages and limitations of dynamic Poisson models for studying the association between time series of dengue disease and meteorological variables.
The key conclusion of the study is that dynamic Poisson models account for the dynamic nature of the variables involved in the modeling of time series of dengue disease, producing useful models for decision-making in public health.
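The dynamic structure described above, a Poisson log link whose coefficient evolves as a first-order random walk, can be illustrated with a minimal simulation sketch in plain Python. The parameter values are hypothetical and the paper's MCMC estimation is not reproduced; the sketch only shows the generative mechanism and the mean absolute percentage error (MAPE) metric against a naive one-week-ahead forecast:

```python
import math
import random

random.seed(42)

def simulate_dynamic_poisson(n_weeks, beta0=1.5, sigma=0.05):
    """Weekly counts from a Poisson log-link model whose intercept
    follows a first-order random walk (hypothetical parameters)."""
    beta, counts = beta0, []
    for _ in range(n_weeks):
        beta += random.gauss(0.0, sigma)      # random-walk evolution of beta_t
        rate = math.exp(beta)                 # log link: log(mu_t) = beta_t
        k, p, limit = 0, 1.0, math.exp(-rate)  # Knuth's Poisson sampler
        while p > limit:
            k += 1
            p *= random.random()
        counts.append(k - 1)
    return counts

def mape(actual, forecast):
    """Mean absolute percentage error over weeks with non-zero counts."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a > 0]
    return 100.0 * sum(abs(a - f) / a for a, f in pairs) / len(pairs)

counts = simulate_dynamic_poisson(200)
err = mape(counts[1:], counts[:-1])  # naive one-week-ahead forecast
```

In a low-volatility stretch of such a series, consecutive counts stay close and the MAPE of even this naive forecast is modest, which matches the intuition behind the authors' short-term prediction results.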
Attachment change processes in the early years of marriage.
Davila, J; Karney, B R; Bradbury, T N
1999-05-01
The authors examined 4 models of attachment change: a contextual model, a social-cognitive model, an individual-difference model, and a diathesis-stress model. Models were examined in a sample of newlyweds over the first 2 years of marriage, using growth curve analyses. Reciprocal processes, whereby attachment representations and interpersonal life circumstances affect one another over time, also were studied. On average, newlyweds became more secure over time. However, there was significant within-subject variability on attachment change that was predicted by intra- and interpersonal factors. Attachment representations changed in response to contextual, social-cognitive, and individual-difference factors. Reciprocal processes between attachment representations and marital variables emerged, suggesting that these factors influence one another in an ongoing way.
NASA Astrophysics Data System (ADS)
Hamalainen, Sampsa; Geng, Xiaoyuan; He, Juanxia
2017-04-01
Latin Hypercube Sampling (LHS) at variable resolutions for enhanced watershed-scale soil sampling and digital soil mapping. The LHS approach to assist with digital soil mapping has been in development for some time; the purpose of this work was to complement LHS with the use of multiple spatial resolutions of covariate datasets and variability in the range of sampling points produced. This allowed specific sets of LHS points to be produced to fulfil the needs of various partners from multiple projects working in the Ontario and Prince Edward Island provinces of Canada. Secondary soil and environmental attributes are critical inputs required in the development of sampling points by LHS. These include a required Digital Elevation Model (DEM) and subsequent covariate datasets produced by a digital terrain analysis performed on the DEM. These additional covariates, which are continuous data, often include but are not limited to Topographic Wetness Index (TWI), Length-Slope (LS) Factor, and Slope. The number of points created in LHS ranged from 50 to 200, depending on the size of the watershed and, more importantly, the number of soil types found within it. The spatial resolution of covariates included within the work ranged from 5 to 30 m. The iterations within the LHS sampling were run at an optimal level so that the LHS model provided a good spatial representation of the environmental attributes within the watershed. Additional covariates that are categorical in nature, such as external surficial geology data, were also included in the LHS approach. Some initial results of the work include using 1,000 iterations within the LHS model.
A value of 1,000 iterations consistently produced sampling points that provided a good spatial representation of the environmental attributes. When working within the same spatial resolution for covariates but modifying only the desired number of sampling points, the point locations retained a strong geospatial relationship when continuous data were used. Access to agricultural fields and adjacent land uses is often cited as the greatest deterrent to performing soil sampling for both soil survey and soil attribute validation work. The lack of access can result from poor road access and/or geographical conditions that are difficult for field crews to navigate. This is a simple yet persistent issue for the scientific community and, in particular, soils professionals. Assisting with ease of access to sampling points will be a future contribution to the LHS approach. By removing inaccessible locations from the DEM at the outset, the LHS model can be restricted to locations reachable from an adjacent road or trail. To extend the approach, a road-network geospatial dataset can be included within Geographic Information Systems (GIS) applications to reach already-produced points using a shortest-distance network method.
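The core LHS idea, dividing each covariate's range into as many equal-probability strata as there are sampling points and drawing exactly once per stratum, can be sketched in a few lines of pure Python. The unit-interval covariates and the point count below are illustrative assumptions, not the study's datasets:

```python
import random

random.seed(1)

def latin_hypercube(n_points, n_vars):
    """One LHS draw: each variable's [0, 1) range is split into n_points
    equal strata, one sample per stratum, strata order shuffled per variable."""
    sample = []
    for _ in range(n_vars):
        strata = list(range(n_points))
        random.shuffle(strata)              # decouple the variables' strata
        column = [(s + random.random()) / n_points for s in strata]
        sample.append(column)
    return list(zip(*sample))               # transpose: one row per point

# e.g. 50 candidate soil-sampling points over 3 continuous covariates
pts = latin_hypercube(50, 3)
```

Real applications would map each unit-interval coordinate back onto the empirical distribution of a covariate such as TWI or slope; restricting candidate cells to road-accessible DEM locations, as proposed above, simply shrinks the domain before the draw.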
Local air temperature tolerance: a sensible basis for estimating climate variability
NASA Astrophysics Data System (ADS)
Kärner, Olavi; Post, Piia
2016-11-01
The customary representation of climate using sample moments is generally biased due to the noticeably nonstationary behaviour of many climate series. In this study, we introduce a moment-free climate representation based on a statistical model fitted to a long-term daily air temperature anomaly series. This model allows us to separate the climate- and weather-scale variability in the series. As a result, the climate scale can be characterized using the mean annual cycle of the series and the local air temperature tolerance, where the latter is computed using the fitted model. The representation of weather-scale variability is specified using the frequency and the range of outliers based on the tolerance. The scheme is illustrated using five long-term air temperature records observed at different European meteorological stations.
VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA
Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu
2009-01-01
We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation (SCAD) penalty and the adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. In particular, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial are presented to illustrate the proposed methodology. PMID:20336190
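As a generic taste of how an L1 penalty performs variable selection, here is a plain coordinate-descent LASSO on complete data. This is a sketch only: it implements neither the SCAD penalty, the ICQ criterion, nor the missing-data machinery of the paper, and the toy design matrix is invented for illustration:

```python
def soft_threshold(z, t):
    """Soft-thresholding operator behind L1-penalized coordinate updates."""
    return z - t if z > t else z + t if z < -t else 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent minimizing 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual with feature j's contribution removed
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            beta[j] = soft_threshold(rho, lam) / sum(X[i][j] ** 2 for i in range(n))
    return beta

X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.05], [4.0, 0.0]]
y = [2.0 * row[0] for row in X]   # y depends on the first feature only
b_ols = lasso_cd(X, y, lam=0.0)   # no penalty: ordinary least-squares fit
b_big = lasso_cd(X, y, lam=1e6)   # heavy penalty: every coefficient shrunk to 0
```

With no penalty the fit recovers the true coefficient on the first feature and zero on the irrelevant one; a large enough penalty zeroes everything, which is the selection behaviour the SCAD and adaptive LASSO penalties refine to achieve oracle properties.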
Creating Matched Samples Using Exact Matching. Statistical Report 2016-3
ERIC Educational Resources Information Center
Godfrey, Kelly E.
2016-01-01
By creating and analyzing matched samples, researchers can simplify their analyses to include fewer covariate variables, relying less on model assumptions, and thus generating results that may be easier to report and interpret. When two groups essentially "look" the same, it is easier to explore their differences and make comparisons…
Time Delay Embedding Increases Estimation Precision of Models of Intraindividual Variability
ERIC Educational Resources Information Center
von Oertzen, Timo; Boker, Steven M.
2010-01-01
This paper investigates the precision of parameters estimated from local samples of time dependent functions. We find that "time delay embedding," i.e., structuring data prior to analysis by constructing a data matrix of overlapping samples, increases the precision of parameter estimates and in turn statistical power compared to standard…
Bowden, Stephen C; Lissner, Dianne; McCarthy, Kerri A L; Weiss, Lawrence G; Holdnack, James A
2007-10-01
Equivalence of the psychological model underlying Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) scores obtained in the United States and Australia was examined in this study. Examination of metric invariance involves testing the hypothesis that all components of the measurement model relating observed scores to latent variables are numerically equal in different samples. The assumption of metric invariance is necessary for interpretation of scores derived from research studies that seek to generalize patterns of convergent and divergent validity and patterns of deficit or disability. An Australian community volunteer sample was compared to the US standardization data. A pattern of strict metric invariance was observed across samples. In addition, when the effects of different demographic characteristics of the US and Australian samples were included, structural parameters reflecting values of the latent cognitive variables were found not to differ. These results provide important evidence for the equivalence of measurement of core cognitive abilities with the WAIS-III and suggest that latent cognitive abilities in the US and Australia do not differ.
Uncertainty quantification in Rothermel's Model using an efficient sampling method
Edwin Jimenez; M. Yousuff Hussaini; Scott L. Goodrick
2007-01-01
The purpose of the present work is to quantify parametric uncertainty in Rothermel's wildland fire spread model (implemented in software such as BehavePlus3 and FARSITE), which is undoubtedly among the most widely used fire spread models in the United States. This model consists of a nonlinear system of equations that relates environmental variables (input parameter...
F. Mauro; Vicente J. Monleon; H. Temesgen; L.A. Ruiz
2017-01-01
Accounting for spatial correlation of LiDAR model errors can improve the precision of model-based estimators. To estimate spatial correlation, sample designs that provide close observations are needed, but their implementation might be prohibitively expensive. To quantify the gains obtained by accounting for the spatial correlation of model errors, we examined (
NASA Technical Reports Server (NTRS)
Colarco, Peter; daSilva, Arlindo; Chin, Mian; Diehl, Thomas
2010-01-01
We have implemented a module for tropospheric aerosols (GOCART) online in the NASA Goddard Earth Observing System version 4 model and simulated global aerosol distributions for the period 2000-2006. The new online system offers several advantages over the previous offline version, providing a platform for aerosol data assimilation, aerosol-chemistry-climate interaction studies, and short-range chemical weather forecasting and climate prediction. We introduce as well a methodology for sampling model output consistently with satellite aerosol optical thickness (AOT) retrievals to facilitate model-satellite comparison. Our results are similar to the offline GOCART model and to the models participating in the AeroCom intercomparison. The simulated AOT has similar seasonal and regional variability and magnitude to Aerosol Robotic Network (AERONET), Moderate Resolution Imaging Spectroradiometer, and Multiangle Imaging Spectroradiometer observations. The model AOT and Angstrom parameter are consistently low relative to AERONET in biomass-burning-dominated regions, where emissions appear to be underestimated, consistent with the results of the offline GOCART model. In contrast, the model AOT is biased high in sulfate-dominated regions of North America and Europe. Our model-satellite comparison methodology shows that diurnal variability in aerosol loading is unimportant compared to sampling the model where the satellite has cloud-free observations, particularly in sulfate-dominated regions. Simulated sea salt burden and optical thickness are high by a factor of 2-3 relative to other models, and agreement between model and satellite over-ocean AOT is improved by reducing the model sea salt burden by a factor of 2. The best agreement in both AOT magnitude and variability occurs immediately downwind of the Saharan dust plume.
Sauvé, Jean-François; Beaudry, Charles; Bégin, Denis; Dion, Chantal; Gérin, Michel; Lavoué, Jérôme
2013-05-01
Many construction activities can put workers at risk of breathing silica-containing dusts, and there is an important body of literature documenting exposure levels using a task-based strategy. In this study, statistical modeling was used to analyze a data set containing 1,466 task-based, personal respirable crystalline silica (RCS) measurements gathered from 46 sources to estimate exposure levels during construction tasks and the effects of determinants of exposure. Monte Carlo simulation was used to recreate individual exposures from summary parameters, and the statistical modeling involved multimodel inference with Tobit models containing combinations of the following exposure variables: sampling year, sampling duration, construction sector, project type, workspace, ventilation, and controls. Exposure levels by task were predicted based on the median reported duration by activity, the year 1998, absence of source control methods, and an equal distribution of the other determinants of exposure. The model containing all the variables explained 60% of the variability and was identified as the best approximating model. Of the 27 tasks contained in the data set, abrasive blasting, masonry chipping, scabbling concrete, tuck pointing, and tunnel boring had estimated geometric means above 0.1 mg m(-3) based on the exposure scenario developed. Water-fed tools and local exhaust ventilation were associated with reductions of 71 and 69% in exposure levels compared with no controls, respectively. The predictive model developed can be used to estimate RCS concentrations for many construction activities in a wide range of circumstances.
ERIC Educational Resources Information Center
Fauth, Elizabeth Braungart; Zarit, Steven H.; Malmberg, Bo; Johansson, Boo
2007-01-01
Purpose: This study used the Disablement Process Model to predict whether a sample of the oldest-old maintained their disability or disability-free status over a 2- and 4-year follow-up, or whether they transitioned into a state of disability during this time. Design and Methods: We followed a sample of 149 Swedish adults who were 86 years of age…
EMMMA: A web-based system for environmental mercury mapping, modeling, and analysis
Hearn,, Paul P.; Wente, Stephen P.; Donato, David I.; Aguinaldo, John J.
2006-01-01
tissue, atmospheric emissions and deposition, stream sediments, soils, and coal) and mercuryrelated data (mine locations); 2) Interactively view and access predictions of the National Descriptive Model of Mercury in Fish (NDMMF) at 4,976 sites and 6,829 sampling events (events are unique combinations of site and sampling date) across the United States; and 3) Use interactive mapping and graphing capabilities to visualize spatial and temporal trends and study relationships between mercury and other variables.
ERIC Educational Resources Information Center
Peters, Christina D.; Kranzler, John H.; Algina, James; Smith, Stephen W.; Daunic, Ann P.
2014-01-01
The aim of the current study was to examine mean-group differences on behavior rating scales and variables that may predict such differences. Sixty-five teachers completed the Clinical Assessment of Behavior-Teacher Form (CAB-T) for a sample of 982 students. Four outcome variables from the CAB-T were assessed. Hierarchical linear modeling was used…
ERIC Educational Resources Information Center
Goodyear, Rodney K.; Newcomb, Micheal D.; Locke, Thomas F.
2002-01-01
Data from a community sample of 493 pregnant Latina teenagers were used to test a mediated model of mate selection with 5 classes of variables: (a) male partner characteristics (antisocial behaviors, negative relationships with women, harm risk, and relationship length), (b) young women's psychosocial variables (antisocial behaviors, drug use,…
F. Mauro; Vicente Monleon; H. Temesgen
2015-01-01
Small area estimation (SAE) techniques have been successfully applied in forest inventories to provide reliable estimates for domains where the sample size is small (i.e. small areas). Previous studies have explored the use of either Area Level or Unit Level Empirical Best Linear Unbiased Predictors (EBLUPs) in a univariate framework, modeling each variable of interest...
USDA-ARS?s Scientific Manuscript database
Directed soil sampling based on geospatial measurements of apparent soil electrical conductivity (ECa) is a potential means of characterizing the spatial variability of any soil property that influences ECa including soil salinity, water content, texture, bulk density, organic matter, and cation exc...
Effects of sample size and sampling frequency on studies of brown bear home ranges and habitat use
Arthur, Steve M.; Schwartz, Charles C.
1999-01-01
We equipped 9 brown bears (Ursus arctos) on the Kenai Peninsula, Alaska, with collars containing both conventional very-high-frequency (VHF) transmitters and global positioning system (GPS) receivers programmed to determine an animal's position at 5.75-hr intervals. We calculated minimum convex polygon (MCP) and fixed and adaptive kernel home ranges for randomly selected subsets of the GPS data to examine the effects of sample size on accuracy and precision of home range estimates. We also compared results obtained by weekly aerial radiotracking versus more frequent GPS locations to test for biases in conventional radiotracking data. Home ranges based on the MCP were 20-606 km2 (mean = 201) for aerial radiotracking data (n = 12-16 locations/bear) and 116-1,505 km2 (mean = 522) for the complete GPS data sets (n = 245-466 locations/bear). Fixed kernel home ranges were 34-955 km2 (mean = 224) for radiotracking data and 16-130 km2 (mean = 60) for the GPS data. Differences between means for radiotracking and GPS data were due primarily to the larger samples provided by the GPS data. Means did not differ between radiotracking data and equivalent-sized subsets of GPS data (P > 0.10). For the MCP, home range area increased and variability decreased asymptotically with number of locations. For the kernel models, both area and variability decreased with increasing sample size. Simulations suggested that the MCP and kernel models required >60 and >80 locations, respectively, for estimates to be both accurate (change in area <1%/additional location) and precise (CV < 50%). Although the radiotracking data appeared unbiased, except for the relationship between area and sample size, these data failed to indicate some areas that likely were important to bears. Our results suggest that the usefulness of conventional radiotracking data may be limited by potential biases and variability due to small samples.
Investigators that use home range estimates in statistical tests should consider the effects of variability of those estimates. Use of GPS-equipped collars can facilitate obtaining larger samples of unbiased data and improve accuracy and precision of home range estimates.
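The minimum convex polygon at the heart of these comparisons is just the convex hull of the location fixes. A compact sketch (planar coordinates in arbitrary units, not the study's telemetry data) shows the estimator and why its area can only grow as locations are added:

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def mcp_area(points):
    """Shoelace area of the minimum convex polygon around the locations."""
    h = convex_hull(points)
    return 0.5 * abs(sum(h[i][0] * h[(i + 1) % len(h)][1] -
                         h[(i + 1) % len(h)][0] * h[i][1]
                         for i in range(len(h))))
```

Because every new fix either falls inside the existing hull or expands it, MCP area is a non-decreasing function of sample size, which is exactly the asymptotic area-versus-locations behaviour the authors report for small radiotracking samples.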
NASA Astrophysics Data System (ADS)
Walker, David M.; Allingham, David; Lee, Heung Wing Joseph; Small, Michael
2010-02-01
Small world network models have been effective in capturing the variable behaviour of reported case data of the SARS coronavirus outbreak in Hong Kong during 2003. Simulations of these models have previously been realized using informed “guesses” of the proposed model parameters and tested for consistency with the reported data by surrogate analysis. In this paper we attempt to provide statistically rigorous parameter distributions using Approximate Bayesian Computation sampling methods. We find that such sampling schemes are a useful framework for fitting parameters of stochastic small world network models where simulation of the system is straightforward but expressing a likelihood is cumbersome.
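The ABC rejection idea, drawing parameters from the prior, simulating forward, and keeping only draws whose summary statistic lands close to the observed one, can be sketched in a few lines. The example below is a toy Gaussian-mean problem, not the SARS small-world network model; prior range, tolerance, and sample sizes are illustrative assumptions:

```python
import random
import statistics

random.seed(7)

# "observed" data generated from a model with unknown mean parameter
observed = [random.gauss(3.0, 1.0) for _ in range(100)]
obs_mean = statistics.mean(observed)

def simulate(theta, n=100):
    """Forward model: cheap to simulate even when a likelihood is awkward."""
    return statistics.mean(random.gauss(theta, 1.0) for _ in range(n))

accepted = []
for _ in range(5000):
    theta = random.uniform(0.0, 10.0)           # draw from the prior
    if abs(simulate(theta) - obs_mean) < 0.2:   # keep if summary is close
        accepted.append(theta)

posterior_mean = statistics.mean(accepted)      # approximate posterior mean
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is why the scheme suits stochastic network models that are straightforward to simulate but cumbersome to write a likelihood for.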
Survival of white-tailed deer neonates in Minnesota and South Dakota
Grovenburg, T.W.; Swanson, C.C.; Jacques, C.N.; Klaver, R.W.; Brinkman, T.J.; Burris, B.M.; Deperno, C.S.; Jenks, J.A.
2011-01-01
Understanding the influence of intrinsic (e.g., age, birth mass, and sex) and habitat factors on survival of neonate white-tailed deer improves understanding of population ecology. During 2002–2004, we captured and radiocollared 78 neonates in eastern South Dakota and southwestern Minnesota, of which 16 died before 1 September. Predation accounted for 80% of mortality; the remaining 20% was attributed to starvation. Canids (coyotes [Canis latrans], domestic dogs) accounted for 100% of predation on neonates. We used known fate analysis in Program MARK to estimate survival rates and investigate the influence of intrinsic and habitat variables on survival. We developed 2 a priori model sets, including intrinsic variables (model set 1) and habitat variables (model set 2; forested cover, wetlands, grasslands, and croplands). For model set 1, model {Sage-interval} had the lowest AICc (Akaike's information criterion for small sample size) value, indicating that age at mortality (3-stage age-interval: 0–2 weeks, 2–8 weeks, and >8 weeks) best explained survival. Model set 2 indicated that habitat variables did not further influence survival in the study area; β-estimates and 95% confidence intervals for habitat variables in competing models encompassed zero; thus, we excluded these models from consideration. Overall survival rate using model {Sage-interval} was 0.87 (95% CI = 0.83–0.91); 61% of mortalities occurred at 0–2 weeks of age, 26% at 2–8 weeks of age, and 13% at >8 weeks of age. Our results indicate that variables influencing survival may be area specific. Region-specific data are needed to determine influences of intrinsic and habitat variables on neonate survival before wildlife managers can determine which habitat management activities influence neonate populations.
Variables Associated With Tic Exacerbation in Children With Chronic Tic Disorders
Himle, Michael B.; Capriotti, Matthew R.; Hayes, Loran P.; Ramanujam, Krishnapriya; Scahill, Lawrence; Sukhodolsky, Denis G.; Wilhelm, Sabine; Deckersbach, Thilo; Peterson, Alan L.; Specht, Matt W.; Walkup, John T.; Chang, Susanna; Piacentini, John
2014-01-01
Research has shown that motor and vocal tics fluctuate in frequency, intensity, and form in response to environmental and contextual cues. Behavioral models have proposed that some of the variation in tics may reflect context-dependent interactive learning processes such that once tics are performed, they are influenced by environmental contingencies. The current study describes the results of a function-based assessment of tics (FBAT) from a recently completed study comparing Comprehensive Behavioral Intervention for Tics (CBIT) with supportive psychotherapy. The current study describes the frequency with which antecedent and consequence variables were reported to exacerbate tics and the relationships between these functional variables and sample baseline characteristics, comorbidities, and measures of tic severity. Results showed that tic-exacerbating antecedents and consequences were nearly ubiquitous in a sample of children with chronic tic disorder. In addition, functional variables were related to baseline measures of comorbid internalizing symptoms and specific measures of tic severity. PMID:24778433
Mousavi, S. Mostafa; Beroza, Gregory C.; Hoover, Susan M.
2018-01-01
Probabilistic seismic hazard analysis (PSHA) characterizes ground-motion hazard from earthquakes. Typically, the time horizon of a PSHA forecast is long, but in response to induced seismicity related to hydrocarbon development, the USGS developed one-year PSHA models. In this paper, we present a display of the variability in USGS hazard curves due to epistemic uncertainty in its informed submodel using a simple bootstrapping approach. We find that variability is highest in low-seismicity areas. On the other hand, areas of high seismic hazard, such as the New Madrid seismic zone or Oklahoma, exhibit relatively lower variability simply because of more available data and a better understanding of the seismicity. Comparing areas of high hazard, New Madrid, which has a history of large naturally occurring earthquakes, has lower forecast variability than Oklahoma, where the hazard is driven mainly by suspected induced earthquakes since 2009. Overall, the mean hazard obtained from bootstrapping is close to the published model, and variability increased in the 2017 one-year model relative to the 2016 model. Comparing the relative variations caused by individual logic-tree branches, we find that the highest hazard variation (as measured by the 95% confidence interval of bootstrapping samples) in the final model is associated with different ground-motion models and maximum magnitudes used in the logic tree, while the variability due to the smoothing distance is minimal. It should be pointed out that this study is not looking at the uncertainty in the hazard in general, but only as it is represented in the USGS one-year models.
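A minimal sketch of the percentile bootstrap underlying this kind of variability display follows; the toy rate data are invented for illustration and are not the USGS hazard curves:

```python
import random
import statistics

random.seed(0)

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05):
    """Percentile bootstrap: resample with replacement, then take the
    alpha/2 and 1 - alpha/2 quantiles of the resampled statistic."""
    reps = sorted(stat(random.choices(data, k=len(data)))
                  for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# hypothetical annual exceedance rates for one site
rates = [0.8, 1.1, 0.9, 1.4, 2.2, 0.7, 1.0, 1.3, 0.6, 1.8]
lo, hi = bootstrap_ci(rates)
```

The width of such an interval is what shrinks with more data, mirroring the paper's finding that well-observed, high-seismicity areas show lower forecast variability than sparse, low-seismicity ones.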
Review of Factors, Methods, and Outcome Definition in Designing Opioid Abuse Predictive Models.
Alzeer, Abdullah H; Jones, Josette; Bair, Matthew J
2018-05-01
Several opioid risk assessment tools are available to prescribers to evaluate opioid analgesic abuse among patients with chronic pain. The objectives of this study are to 1) identify variables available in the literature to predict opioid abuse; 2) explore and compare the methods (population, database, and analysis) used to develop statistical models that predict opioid abuse; and 3) understand how outcomes were defined in each statistical model predicting opioid abuse. The OVID database was searched for this study. The search was limited to articles written in English and published from January 1990 to April 2016. This search generated 1,409 articles. Only seven studies and nine models met our inclusion-exclusion criteria. Across the nine models, we identified 75 distinct variables. Three studies used administrative claims data, and four studies used electronic health record data. The majority, four of seven articles (six of nine models), depended primarily on the presence or absence of an opioid abuse or dependence diagnosis (ICD-9 code) to define opioid abuse. However, two articles used a predefined list of opioid-related aberrant behaviors. We identified variables used to predict opioid abuse from electronic health records and administrative data. Medication variables are the most recurrent variables in the articles reviewed (33 variables). Age and gender are the most consistent demographic variables in predicting opioid abuse. Overall, there is similarity in the sampling method and inclusion/exclusion criteria (age, number of prescriptions, follow-up period, and data analysis methods). Further research utilizing unstructured data may improve the accuracy of opioid abuse models.
NASA Technical Reports Server (NTRS)
Maughan, P. M. (Principal Investigator)
1973-01-01
The author has identified the following significant results. Linear regression of secchi disc visibility against number of sets yielded significant results in a number of instances. The variability seen in the slope of the regression lines is due to the nonuniformity of sample size. The longer the period sampled, the larger the total number of attempts. Further, there is no reason to expect either the influence of transparency or of other variables to remain constant throughout the season. However, the fact that the data for the entire season, variable as they are, were significant at the 5% level suggests their potential utility for predictive modeling. Thus, this regression equation will be considered representative and will be utilized for the first numerical model. Secchi disc visibility was also regressed against number of sets for the three-day period September 27-September 29, 1972 to determine if surface truth data supported the intense relationship between ERTS-1 identified turbidity and fishing effort previously discussed. A strongly negative correlation was found. These relationships lend additional credence to the hypothesis that ERTS imagery, when utilized as a source of visibility (turbidity) data, may be useful as a predictive tool.
Applications of Geostatistics in Plant Nematology
Wallace, M. K.; Hawkins, D. M.
1994-01-01
The application of geostatistics to plant nematology was made by evaluating soil and nematode data acquired from 200 soil samples collected from the Ap horizon of a reed canary-grass field in northern Minnesota. Geostatistical concepts relevant to nematology include semi-variogram modelling, kriging, and change of support calculations. Soil and nematode data generally followed a spherical semi-variogram model, with little random variability associated with soil data and large inherent variability for nematode data. Block kriging of soil and nematode data provided useful contour maps of the data. Change of support calculations indicated that most of the random variation in nematode data was due to short-range spatial variability in the nematode population densities. PMID:19279938
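The empirical semi-variogram that such spherical models are fitted to can be computed directly from point pairs. A minimal sketch follows, using a toy one-dimensional transect rather than the canary-grass field data:

```python
import math

def semivariogram(locations, values, lags, tol):
    """Empirical semivariance gamma(h): half the mean squared difference
    between values at point pairs separated by roughly h (within tol)."""
    gammas = []
    for h in lags:
        sq, count = 0.0, 0
        for i in range(len(locations)):
            for j in range(i + 1, len(locations)):
                if abs(math.dist(locations[i], locations[j]) - h) <= tol:
                    sq += (values[i] - values[j]) ** 2
                    count += 1
        gammas.append(sq / (2 * count) if count else float("nan"))
    return gammas

# toy transect: perfectly alternating values give high semivariance at
# lag 1 and none at lag 2
transect = [(float(i), 0.0) for i in range(10)]
alternating = [i % 2 for i in range(10)]
gamma = semivariogram(transect, alternating, lags=[1.0, 2.0], tol=0.1)
```

Fitting a spherical (or other licit) model to these binned semivariances is what supplies the weights for kriging and the variance terms for change-of-support calculations.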
Smoking and Cancers: Case-Robust Analysis of a Classic Data Set
ERIC Educational Resources Information Center
Bentler, Peter M.; Satorra, Albert; Yuan, Ke-Hai
2009-01-01
A typical structural equation model is intended to reproduce the means, variances, and correlations or covariances among a set of variables based on parameter estimates of a highly restricted model. It is not widely appreciated that the sample statistics being modeled can be quite sensitive to outliers and influential observations, leading to bias…
ERIC Educational Resources Information Center
Arbona, Consuelo; And Others
1995-01-01
Examined adequacy of Keefe and Padilla's model of cultural orientation on a sample of Mexican American students enrolled either in technical college (n=125) or state university (n=239) in Texas. Specifically examined how well the model fit the Cultural Awareness and Ethnic Loyalty scales. Results indicated excellent fit for the model. (JBJ)
Zhang, Miaomiao; Wells, William M; Golland, Polina
2016-10-01
Using image-based descriptors to investigate clinical hypotheses and therapeutic implications is challenging due to the notorious "curse of dimensionality" coupled with a small sample size. In this paper, we present a low-dimensional analysis of anatomical shape variability in the space of diffeomorphisms and demonstrate its benefits for clinical studies. To combat the high dimensionality of the deformation descriptors, we develop a probabilistic model of principal geodesic analysis in a bandlimited low-dimensional space that still captures the underlying variability of image data. We demonstrate the performance of our model on a set of 3D brain MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our model yields a more compact representation of group variation at substantially lower computational cost than models based on the high-dimensional state-of-the-art approaches such as tangent space PCA (TPCA) and probabilistic principal geodesic analysis (PPGA).
Spatial generalised linear mixed models based on distances.
Melo, Oscar O; Mateu, Jorge; Melo, Carlos E
2016-10-01
Risk models derived from environmental data have been widely shown to be effective in delineating geographical areas of risk because they are intuitively easy to understand. We present a new method based on distances, which allows the modelling of continuous and non-continuous random variables through distance-based spatial generalised linear mixed models. The parameters are estimated using Markov chain Monte Carlo maximum likelihood, which is a feasible and useful technique. The proposed method depends on a detrending step built from continuous or categorical explanatory variables, or a mixture among them, by using an appropriate Euclidean distance. The method is illustrated through the analysis of the variation in the prevalence of Loa loa among a sample of village residents in Cameroon, where the explanatory variables included elevation, together with maximum normalised-difference vegetation index and the standard deviation of normalised-difference vegetation index calculated from repeated satellite scans over time. © The Author(s) 2013.
Zhu, Qi; Burzykowski, Tomasz
2011-03-01
To reduce the influence of the between-spectra variability on the results of peptide quantification, one can consider the (18)O-labeling approach. Ideally, with such a labeling technique, a mass shift of 4 Da of the isotopic distributions of peptides from the labeled sample is induced, which allows one to distinguish the two samples and to quantify the relative abundance of the peptides. It is worth noting, however, that the presence of small quantities of (16)O and (17)O atoms during the labeling step can cause incomplete labeling. In practice, ignoring incomplete labeling may result in the biased estimation of the relative abundance of the peptide in the compared samples. A Markov model was developed to address this issue (Zhu, Valkenborg, Burzykowski. J. Proteome Res. 9, 2669-2677, 2010). The model assumed that the peak intensities were normally distributed with heteroscedasticity using a power-of-the-mean variance function. Such a dependence has been observed in practice. Alternatively, we formulate the model within the Bayesian framework. This opens the possibility to further extend the model by the inclusion of random effects that can be used to capture the biological/technical variability of the peptide abundance. The operational characteristics of the model were investigated by applications to real-life mass-spectrometry data sets and a simulation study. © American Society for Mass Spectrometry, 2011
Improving CSF biomarker accuracy in predicting prevalent and incident Alzheimer disease
Fagan, A.M.; Williams, M.M.; Ghoshal, N.; Aeschleman, M.; Grant, E.A.; Marcus, D.S.; Mintun, M.A.; Holtzman, D.M.; Morris, J.C.
2011-01-01
Objective: To investigate factors, including cognitive and brain reserve, which may independently predict prevalent and incident dementia of the Alzheimer type (DAT) and to determine whether inclusion of identified factors increases the predictive accuracy of the CSF biomarkers Aβ42, tau, ptau181, tau/Aβ42, and ptau181/Aβ42. Methods: Logistic regression identified variables that predicted prevalent DAT when considered together with each CSF biomarker in a cross-sectional sample of 201 participants with normal cognition and 46 with DAT. The area under the receiver operating characteristic curve (AUC) from the resulting model was compared with the AUC generated using the biomarker alone. In a second sample with normal cognition at baseline and longitudinal data available (n = 213), Cox proportional hazards models identified variables that predicted incident DAT together with each biomarker, and each model's concordance probability estimate (CPE) was compared to the CPE generated using the biomarker alone. Results: APOE genotype including an ε4 allele, male gender, and smaller normalized whole brain volumes (nWBV) were cross-sectionally associated with DAT when considered together with every biomarker. In the longitudinal sample (mean follow-up = 3.2 years), 14 participants (6.6%) developed DAT. Older age predicted a faster time to DAT in every model, and greater education predicted a slower time in 4 of 5 models. Inclusion of ancillary variables resulted in better cross-sectional prediction of DAT for all biomarkers (p < 0.0021), and better longitudinal prediction for 4 of 5 biomarkers (p < 0.0022). Conclusions: The predictive accuracy of CSF biomarkers is improved by including age, education, and nWBV in analyses. PMID:21228296
Wang, Xulong; Philip, Vivek M.; Ananda, Guruprasad; White, Charles C.; Malhotra, Ankit; Michalski, Paul J.; Karuturi, Krishna R. Murthy; Chintalapudi, Sumana R.; Acklin, Casey; Sasner, Michael; Bennett, David A.; De Jager, Philip L.; Howell, Gareth R.; Carter, Gregory W.
2018-01-01
Recent technical and methodological advances have greatly enhanced genome-wide association studies (GWAS). The advent of low-cost, whole-genome sequencing facilitates high-resolution variant identification, and the development of linear mixed models (LMM) allows improved identification of putatively causal variants. While essential for correcting false positive associations due to sample relatedness and population stratification, LMMs have commonly been restricted to quantitative variables. However, phenotypic traits in association studies are often categorical, coded as binary case-control or ordered variables describing disease stages. To address these issues, we have devised a method for genomic association studies that implements a generalized LMM (GLMM) in a Bayesian framework, called Bayes-GLMM. Bayes-GLMM has four major features: (1) support of categorical, binary, and quantitative variables; (2) cohesive integration of previous GWAS results for related traits; (3) correction for sample relatedness by mixed modeling; and (4) model estimation by both Markov chain Monte Carlo sampling and maximum likelihood estimation. We applied Bayes-GLMM to the whole-genome sequencing cohort of the Alzheimer’s Disease Sequencing Project. This study contains 570 individuals from 111 families, each with Alzheimer’s disease diagnosed at one of four confidence levels. Using Bayes-GLMM we identified four variants in three loci significantly associated with Alzheimer’s disease. Two variants, rs140233081 and rs149372995, lie between PRKAR1B and PDGFA. The coded proteins are localized to the glial-vascular unit, and PDGFA transcript levels are associated with Alzheimer’s disease-related neuropathology. In summary, this work provides implementation of a flexible, generalized mixed-model approach in a Bayesian framework for association studies. PMID:29507048
Multiple site receptor modeling with a minimal spanning tree combined with a Kohonen neural network
NASA Astrophysics Data System (ADS)
Hopke, Philip K.
1999-12-01
A combination of two pattern recognition methods has been developed that allows the generation of geographical emission maps from multivariate environmental data. In such a projection into a visually interpretable subspace by a Kohonen Self-Organizing Feature Map, the topology of the higher-dimensional variable space can be preserved, but parts of the information about the correct neighborhood among the sample vectors will be lost. This can partly be compensated for by an additional projection of Prim's Minimal Spanning Tree into the trained neural network. This new environmental receptor modeling technique has been adapted for multiple sampling sites. The behavior of the method has been studied using simulated data. Subsequently, the method has been applied to mapping data sets from the Southern California Air Quality Study. The projection of 17 chemical variables measured at up to 8 sampling sites provided a 2D, visually interpretable, geometrically reasonable arrangement of air pollution sources in the South Coast Air Basin.
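Prim's Minimal Spanning Tree, the structure projected into the trained network above, can be sketched in a few lines; the distance matrix and sample vectors here are synthetic, not from the study.

```python
import numpy as np

def prim_mst(dist):
    """Prim's algorithm on a full symmetric distance matrix;
    returns the n-1 edges of the minimal spanning tree."""
    n = dist.shape[0]
    in_tree = [0]                       # start from an arbitrary node
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree:
                    if best is None or dist[i, j] < dist[best[0], best[1]]:
                        best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

# Pairwise Euclidean distances between (synthetic) sample vectors
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
edges = prim_mst(D)   # 3 edges connecting the 4 samples
```

Overlaying these edges on the 2D map of winning neurons is what restores part of the neighborhood information lost in the Kohonen projection.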
Kuo, Ben Ch; Kwantes, Catherine T
2014-01-01
Despite the prevalence and popularity of research on positive and negative affect within the field of psychology, there is currently little research on affect involving the examination of cultural variables and with participants of diverse cultural and ethnic backgrounds. To the authors' knowledge, currently no empirical studies have comprehensively examined predictive models of positive and negative affect based specifically on multiple psychosocial, acculturation, and coping variables as predictors with any sample population. Therefore, the purpose of the present study was to test the predictive power of perceived stress, social support, bidirectional acculturation (i.e., Canadian acculturation and heritage acculturation), religious coping and cultural coping (i.e., collective, avoidance, and engagement coping) in explaining positive and negative affect in a multiethnic sample of 301 undergraduate students in Canada. Two hierarchical multiple regressions were conducted, one for each affect as the dependent variable, with the above-described predictors. The results supported the hypotheses and showed the two overall models to be significant in predicting affect of both kinds. Specifically, a higher level of positive affect was predicted by a lower level of perceived stress, less use of religious coping, and more use of engagement coping in dealing with stress by the participants. A higher level of negative affect, however, was predicted by a higher level of perceived stress and more use of avoidance coping in responding to stress. The current findings highlight the value and relevance of empirically examining the stress-coping-adaptation experiences of diverse populations from an affective conceptual framework, particularly with the inclusion of positive affect. Implications and recommendations for advancing future research and theoretical works in this area are considered and presented.
Tavakol, Najmeh; Kheiri, Soleiman; Sedehi, Morteza
2016-01-01
The time until the next blood donation plays a major role in a regular donor becoming a continuing one. The aim of this study was to determine the factors affecting the interval between blood donations. In a longitudinal study in 2008, 864 first-time donors at the Shahrekord Blood Transfusion Center, capital city of Chaharmahal and Bakhtiari Province, Iran, were selected by systematic sampling and were followed up for five years. Among these, a subset of 424 donors who had at least two successful blood donations was chosen for this study, and the time intervals between their donations were measured as the response variable. Sex, body weight, age, marital status, education, residence and job were recorded as independent variables. Data analysis was performed based on a log-normal hazard model with gamma correlated frailty. In this model, the frailties are the sum of two independent components assumed to follow a gamma distribution. The analysis was done via a Bayesian approach using a Markov chain Monte Carlo algorithm in OpenBUGS. Convergence was checked via the Gelman-Rubin criterion using the BOA program in R. Age, job and education had significant effects on the chance of donating blood (P<0.05). The chances of blood donation were higher for older donors, clerical workers, laborers, the self-employed, students and educated donors, and correspondingly the time intervals between their blood donations were shorter. Given the significant effects of some variables in the log-normal correlated frailty model, it is necessary to plan educational and cultural programs to encourage people with longer inter-donation intervals to donate more frequently.
Niazi, Nabeel K; Bishop, Thomas F A; Singh, Balwant
2011-12-15
This study investigated the spatial variability of total and phosphate-extractable arsenic (As) concentrations in soil adjacent to a cattle-dip site, employing a linear mixed model-based geostatistical approach. The soil samples in the study area (n = 102 in 8.1 m(2)) were taken at the nodes of a 0.30 × 0.35 m grid. The results showed that total As concentration (0-0.2 m depth) and phosphate-extractable As concentration (at depths of 0-0.2, 0.2-0.4, and 0.4-0.6 m) in soil adjacent to the dip varied greatly. Both total and phosphate-extractable soil As concentrations significantly (p = 0.004-0.048) increased toward the cattle-dip. Using the linear mixed model, we suggest that 5 samples are sufficient to assess a dip site for soil As contamination (95% confidence interval of ±475.9 mg kg(-1)), but 15 samples (95% confidence interval of ±212.3 mg kg(-1)) are desirable as a baseline when the ultimate goal is to evaluate the effects of phytoremediation. Such guidelines on sampling requirements are crucial for the assessment of As contamination levels at other cattle-dip sites, and to determine the effect of phytoremediation on soil As.
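The trade-off between sample count and interval width in this abstract can be illustrated with the simple independent-sample formula for a mean's 95% confidence interval half-width; the paper's intervals come from a linear mixed model that accounts for spatial correlation, so the spread value below is an assumption chosen only to reproduce the order of magnitude of the n = 5 interval.

```python
import math

def ci_half_width(s, n, z=1.96):
    """Approximate 95% CI half-width for a mean: z * s / sqrt(n).
    Ignores spatial correlation, unlike the paper's mixed model."""
    return z * s / math.sqrt(n)

# Illustrative spread (mg/kg); chosen so n = 5 gives roughly +/-476
s = 543.0
for n in (5, 15):
    print(n, round(ci_half_width(s, n), 1))
```

The half-width shrinks with the square root of n, which is why tripling the sample count does not triple the precision.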
Mathematical modeling to predict residential solid waste generation.
Benítez, Sara Ojeda; Lozano-Olvera, Gabriela; Morelos, Raúl Adalberto; Vega, Carolina Armijo de
2008-01-01
One of the challenges faced by waste management authorities is determining the amount of waste generated by households in order to establish waste management systems, to set rates consistent with principles applied worldwide, and to design a fair payment system for households according to the amount of residential solid waste (RSW) they generate. The goal of this research work was to establish mathematical models that correlate the generation of RSW per capita to the following variables: education, income per household, and number of residents. This work was based on data from a study on generation, quantification and composition of residential waste in a Mexican city in three stages. In order to define prediction models, five variables were identified and included in the model. For each waste sampling stage a different mathematical model was developed, in order to find the model that showed the best linear relation to predict residential solid waste generation. Later on, models exploring combinations of the included variables were established, and those showing a higher R(2) were selected. The tests applied were normality, multicollinearity and heteroskedasticity. Another model, formulated with four variables, was generated and the Durbin-Watson test was applied to it. Finally, a general mathematical model is proposed to predict residential waste generation, which accounts for 51% of the total.
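The kind of per-capita prediction model described above can be sketched as an ordinary least-squares fit with an R² check; the predictor names follow the paper's variables, but the data, coefficients and sample size here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120

# Synthetic per-household predictors (names follow the paper's variables)
education = rng.integers(0, 4, n).astype(float)   # coded education level
income = rng.normal(10.0, 3.0, n)                 # income per household
residents = rng.integers(1, 7, n).astype(float)   # number of residents

# Synthetic per-capita waste generation with noise (kg/person/day)
waste = (0.3 + 0.05 * education + 0.02 * income
         - 0.04 * residents + rng.normal(0.0, 0.05, n))

# OLS fit with an intercept column, then coefficient of determination
X = np.column_stack([np.ones(n), education, income, residents])
beta, *_ = np.linalg.lstsq(X, waste, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((waste - pred) ** 2) / np.sum((waste - waste.mean()) ** 2)
```

Comparing R² across candidate variable combinations, as the authors did per sampling stage, is then a matter of refitting with different column stacks.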
Designing efficient nitrous oxide sampling strategies in agroecosystems using simulation models
NASA Astrophysics Data System (ADS)
Saha, Debasish; Kemanian, Armen R.; Rau, Benjamin M.; Adler, Paul R.; Montes, Felipe
2017-04-01
Annual cumulative soil nitrous oxide (N2O) emissions calculated from discrete chamber-based flux measurements have unknown uncertainty. We used outputs from simulations obtained with an agroecosystem model to design sampling strategies that yield accurate cumulative N2O flux estimates with a known uncertainty level. Daily soil N2O fluxes were simulated for Ames, IA (corn-soybean rotation), College Station, TX (corn-vetch rotation), Fort Collins, CO (irrigated corn), and Pullman, WA (winter wheat), representing diverse agro-ecoregions of the United States. Fertilization source, rate, and timing were site-specific. These simulated fluxes served as surrogates for daily measurements in the analysis. We "sampled" the fluxes using a fixed interval (1-32 days) or a rule-based (decision tree-based) sampling method. Two types of decision trees were built: a high-input tree (HI) that included soil inorganic nitrogen (SIN) as a predictor variable, and a low-input tree (LI) that excluded SIN. Other predictor variables were identified with Random Forest. The decision trees were inverted to be used as rules for sampling a representative number of members from each terminal node. The uncertainty of the annual N2O flux estimation increased along with the fixed interval length. A 4- and 8-day fixed sampling interval was required at College Station and Ames, respectively, to yield ±20% accuracy in the flux estimate; a 12-day interval rendered the same accuracy at Fort Collins and Pullman. Both the HI and the LI rule-based methods provided the same accuracy as that of the fixed interval method with up to a 60% reduction in sampling events, particularly at locations with greater temporal flux variability. For instance, at Ames, the HI rule-based and the fixed interval methods required 16 and 91 sampling events, respectively, to achieve the same absolute bias of 0.2 kg N ha-1 yr-1 in estimating cumulative N2O flux. 
These results suggest that using simulation models along with decision trees can reduce the cost and improve the accuracy of the estimations of cumulative N2O fluxes using the discrete chamber-based method.
2012-01-01
Background When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Results Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998
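The binormal closed form referred to in this abstract reduces, for a normal explanatory variable in both groups, to c = Φ(|μ1 − μ0| / √(σ0² + σ1²)); a minimal numerical check against the pairwise (Mann-Whitney) definition of the c-statistic, with synthetic group parameters:

```python
import math
import random

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binormal_c(mu0, mu1, sd0, sd1):
    """Closed-form c-statistic under binormality."""
    return phi(abs(mu1 - mu0) / math.sqrt(sd0 ** 2 + sd1 ** 2))

# Empirical c: fraction of (case, control) pairs ranked correctly
random.seed(42)
cases = [random.gauss(1.0, 1.0) for _ in range(200)]
controls = [random.gauss(0.0, 1.0) for _ in range(200)]
concordant = 0
for x1 in cases:
    for x0 in controls:
        concordant += x1 > x0
empirical_c = concordant / (200 * 200)
predicted_c = binormal_c(0.0, 1.0, 1.0, 1.0)   # Phi(1/sqrt(2)), about 0.76
```

With equal variances the argument is the standardized log-odds effect the authors describe; with unequal variances it becomes the standardized difference of the explanatory variable.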
NASA Astrophysics Data System (ADS)
Kusrini, Elisa; Subagyo; Aini Masruroh, Nur
2016-01-01
This research is a sequel to the authors' earlier research in the field of designing integrated performance measurement between supply chain actors and the regulator. In the previous paper, the performance measurement was designed by combining the Balanced Scorecard - Supply Chain Operation Reference - Regulator Contribution model with Data Envelopment Analysis. This combination is referred to as the B-S-Rc-DEA model. The combination has the disadvantage that all performance variables have the same weight. This paper investigates whether giving weights to performance variables produces a performance measurement that is more sensitive in detecting performance improvement. Therefore, this paper discusses the development of the B-S-Rc-DEA model by giving weights to its performance variables; this model is referred to as the Scale B-S-Rc-DEA model. To illustrate the model development, samples from the supply chain of small and medium leather-craft enterprises in the province of Yogyakarta, Indonesia, are used in this research. It is found that the Scale B-S-Rc-DEA model is more sensitive in detecting performance improvement than the B-S-Rc-DEA model.
Gustafson, William Jr; Vogelmann, Andrew; Endo, Satoshi; Toto, Tami; Xiao, Heng; Li, Zhijin; Cheng, Xiaoping; Kim, Jinwon; Krishna, Bhargavi
2015-08-31
The Alpha 2 release is the second release from the LASSO Pilot Phase that builds upon the Alpha 1 release. Alpha 2 contains additional diagnostics in the data bundles and focuses on cases from spring-summer 2016. A data bundle is a unified package consisting of LASSO LES input and output, observations, evaluation diagnostics, and model skill scores. LES inputs include model configuration information and forcing data. LES outputs include profile statistics and full-domain fields of cloud and environmental variables. Model evaluation data consist of LES output and ARM observations co-registered on the same grid and sampling frequency. Model performance is quantified by skill scores and diagnostics in terms of cloud and environmental variables.
Soil nutrient-landscape relationships in a lowland tropical rainforest in Panama
Barthold, F.K.; Stallard, R.F.; Elsenbeer, H.
2008-01-01
Soils play a crucial role in biogeochemical cycles as spatially distributed sources and sinks of nutrients. Any spatial patterns depend on soil forming processes, our understanding of which is still limited, especially with regard to tropical rainforests. The objective of our study was to investigate the effects of landscape properties, with an emphasis on the geometry of the land surface, on the spatial heterogeneity of soil chemical properties, and to test the suitability of soil-landscape modeling as an appropriate technique to predict the spatial variability of exchangeable K and Mg in a humid tropical forest in Panama. We used a design-based, stratified sampling scheme to collect soil samples at 108 sites on Barro Colorado Island, Panama. Stratifying variables are lithology, vegetation and topography. Topographic variables were generated from high-resolution digital elevation models with a grid size of 5 m. We took samples from five depths down to 1 m, and analyzed for total and exchangeable K and Mg. We used simple explorative data analysis techniques to elucidate the importance of lithology for soil total and exchangeable K and Mg. Classification and Regression Trees (CART) were adopted to investigate the importance of topography, lithology and vegetation for the spatial distribution of exchangeable K and Mg, and with the intention to develop models that regionalize the point observations using digital terrain data as explanatory variables. Our results suggest that topography and vegetation do not control the spatial distribution of the selected soil chemical properties at a landscape scale, and lithology is important to some degree. Exchangeable K is distributed equally across the study area, indicating that processes other than landscape processes, e.g. biogeochemical processes, are responsible for its spatial distribution. Lithology contributes to the spatial variation of exchangeable Mg but controlling variables could not be detected. 
The spatial variation of soil total K and Mg is mainly influenced by lithology. © 2007 Elsevier B.V. All rights reserved.
A Contingency Model for Predicting Institutionalization of Innovation Across Divergent Organizations.
ERIC Educational Resources Information Center
Howes, Nancy J.
This study was undertaken to compare the variables related to the successful institutionalization of changes across divergent organizations, and to design, through cross-validation, an interorganization model of change. Descriptive survey questionnaires and structured interviews were the instruments used. The respondent sample consisted of 1,500…
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
USDA-ARS?s Scientific Manuscript database
Waterborne pathogens were detected in 96% of samples collected at three Lake Michigan beaches during the summer of 2010. Linear regression models were developed to explore environmental factors that may be influential for pathogen prevalence. Simulation of pathogen concentration using these models, ...
Optimal Experimental Design for Model Discrimination
ERIC Educational Resources Information Center
Myung, Jay I.; Pitt, Mark A.
2009-01-01
Models of a psychological process can be difficult to discriminate experimentally because it is not easy to determine the values of the critical design variables (e.g., presentation schedule, stimulus structure) that will be most informative in differentiating them. Recent developments in sampling-based search methods in statistics make it…
An Investigation of Calculus Learning Using Factorial Modeling.
ERIC Educational Resources Information Center
Dick, Thomas P.; Balomenos, Richard H.
Structural covariance models that would explain the correlations observed among mathematics achievement and participation measures and related cognitive and affective variables were developed. A sample of college calculus students (N=268; 124 females and 144 males) was administered a battery of cognitive tests (including measures of spatial-visual…
Modeling Noisy Data with Differential Equations Using Observed and Expected Matrices
ERIC Educational Resources Information Center
Deboeck, Pascal R.; Boker, Steven M.
2010-01-01
Complex intraindividual variability observed in psychology may be well described using differential equations. It is difficult, however, to apply differential equation models in psychological contexts, as time series are frequently short, poorly sampled, and have large proportions of measurement and dynamic error. Furthermore, current methods for…
NASA Astrophysics Data System (ADS)
Hu, Haixin
This dissertation consists of two parts. The first part studies the sample selection and spatial models of housing price index using transaction data on detached single-family houses of two California metropolitan areas from 1990 through 2008. House prices are often spatially correlated due to shared amenities, or when the properties are viewed as close substitutes in a housing submarket. There have been many studies that address spatial correlation in the context of housing markets. However, to the best of my knowledge, none has used spatial models to construct housing price indexes at the zip code level for the entire time period analyzed in this dissertation. In this paper, I study a first-order autoregressive spatial model with four different weighting matrix schemes. Four sets of housing price indexes are constructed accordingly. Gatzlaff and Haurin (1997, 1998) study the sample selection problem in housing indexes by using Heckman's two-step method. This method, however, is generally inefficient and can cause a multicollinearity problem. Also, it requires data on unsold houses in order to carry out the first-step probit regression. The maximum likelihood (ML) method can be used to estimate an incidental truncation model, which allows one to correct for sample selection based on transaction data only. However, convergence problems are very prevalent in practice. In this paper I adopt Lewbel's (2007) sample selection correction method, which does not require one to model or estimate the selection model, except for some very general assumptions. I then extend this method to correct for spatial correlation. In the second part, I analyze the U.S. gasoline market with a disequilibrium model that allows lagged latent variables, endogenous prices, and panel data with fixed effects. Most existing studies (see the survey of Espey, 1998, Energy Economics) of the gasoline market assume equilibrium. In practice, however, prices do not always adjust fast enough to clear the market. 
Equilibrium assumptions greatly simplify statistical inference, but are very restrictive and can produce conflicting estimates. For example, econometric models of markets that assume equilibrium often produce more elastic estimates of the price elasticity of demand than their disequilibrium counterparts (Holt and Johnson, 1989, Review of Economics and Statistics; Oczkowski, 1998, Economics Letters). The few studies that allow disequilibrium, however, have been limited to macroeconomic time-series data without lagged latent variables. While time-series data allow one to investigate national trends, they cannot be used to identify and analyze regional differences and the role of local markets. Exclusion of the lagged latent variables is also undesirable because such variables capture adjustment costs and inter-temporal spillovers. Simulation methods offer tractable solutions to dynamic and panel data disequilibrium models (Lee, 1997, Journal of Econometrics), but assume normally distributed errors. This paper compares estimates of price/income elasticity and excess supply/demand across time periods, regions, and model specifications, using both equilibrium and disequilibrium methods. In the equilibrium model, I compare the within-group estimator with Anderson and Hsiao's first-difference 2SLS estimator. In the disequilibrium model, I extend Amemiya's 2SLS by using Newey's efficient estimator with optimal instruments.
Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much.
He, Bryan; De Sa, Christopher; Mitliagkas, Ioannis; Ré, Christopher
2016-01-01
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
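The two scan orders compared in this abstract can be made concrete on a toy target; a minimal sketch on a bivariate normal with correlation rho, where each conditional is N(rho·other, 1 − rho²) (the example target and parameters are illustrative, not the paper's counterexample):

```python
import random

def gibbs(n_sweeps, rho, scan="systematic", seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    'systematic' updates the coordinates in a fixed order each sweep;
    'random' picks a coordinate uniformly at random for each update."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    sd = (1.0 - rho * rho) ** 0.5      # conditional standard deviation
    samples = []
    for _ in range(n_sweeps):
        if scan == "systematic":
            order = [0, 1]
        else:
            order = [rng.randrange(2) for _ in range(2)]
        for i in order:
            x[i] = rng.gauss(rho * x[1 - i], sd)
        samples.append(tuple(x))
    return samples

sys_samples = gibbs(20000, rho=0.5, scan="systematic")
rand_samples = gibbs(20000, rho=0.5, scan="random")
```

Both scans leave the same stationary distribution invariant; the paper's contribution concerns how their mixing times can differ, not their limits.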
NASA Astrophysics Data System (ADS)
Collatz, G. J.; Kawa, S. R.; Liu, Y.; Zeng, F.; Ivanoff, A.
2013-12-01
We evaluate our understanding of the land biospheric carbon cycle by benchmarking a model and its variants against atmospheric CO2 observations and an atmospheric CO2 inversion. Though the seasonal cycle in CO2 observations is well simulated by the model (RMSE/standard deviation of observations <0.5 at most sites north of 15N and <1 for Southern Hemisphere sites), different model setups suggest that the CO2 seasonal cycle provides some constraint on gross photosynthesis, respiration, and fire fluxes, as revealed in the amplitude and phase at northern-latitude sites. CarbonTracker inversions (CT) and the model show similar phasing of the seasonal fluxes, but agreement in the amplitude varies by region. We also evaluate interannual variability (IAV) in the measured atmospheric CO2, which, in contrast to the seasonal cycle, is not well represented by the model. We estimate the contributions of biospheric and fire fluxes and atmospheric transport variability to explaining observed variability in measured CO2. Comparisons with CT show that modeled IAV has some correspondence to the inversion results north of 40N, though fluxes match poorly at regional to continental scales. Regional and global fire emissions are strongly correlated with variability observed at northern flask sample sites and in the global atmospheric CO2 growth rate, though in the latter case fire emission anomalies are not large enough to account fully for the observed variability. We discuss remaining unexplained variability in CO2 observations in terms of the representation of fluxes by the model. This work also demonstrates the limitations of the current network of CO2 observations and the potential of new, denser surface measurements and space-based column measurements for constraining carbon cycle processes in models.
Drivers of Variability in Public-Supply Water Use Across the Contiguous United States
NASA Astrophysics Data System (ADS)
Worland, Scott C.; Steinschneider, Scott; Hornberger, George M.
2018-03-01
This study explores the relationship between municipal water use and an array of climate, economic, behavioral, and policy variables across the contiguous U.S. The relationship is explored using Bayesian-hierarchical regression models for over 2,500 counties, 18 covariates, and three higher-level grouping variables. Additionally, a second analysis is included for 83 cities where water price and water conservation policy information are available. A hierarchical model using the nine climate regions (a product of the National Oceanic and Atmospheric Administration) as the higher-level groups results in the best out-of-sample performance, as estimated by the Widely Applicable Information Criterion (WAIC), compared to counties grouped by urban continuum classification or primary economic activity. The regression coefficients indicate that the controls on water use are not uniform across the nation: e.g., counties in the Northeast and Northwest climate regions are more sensitive to social variables, whereas counties in the Southwest and East North Central climate regions are more sensitive to environmental variables. For the national city-level model, it appears that arid cities with a high cost of living and relatively low water bills sell more water per customer, but as with the county-level model, the effect of each variable depends heavily on where a city is located.
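The out-of-sample criterion used above, WAIC, can be computed from a matrix of pointwise posterior log-likelihoods (one row per posterior draw, one column per observation). The sketch below uses synthetic draws, not the study's posterior; it implements the standard log pointwise predictive density minus an effective-parameter penalty, reported on the deviance scale.

```python
# Hedged sketch of WAIC from an S x N matrix of pointwise log-likelihoods.
# The synthetic "posterior" below stands in for real MCMC output.
import math
import random

rng = random.Random(0)
S, N = 400, 30
loglik = [[-1.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(S)]

def waic(loglik):
    S, N = len(loglik), len(loglik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(N):
        col = [loglik[s][i] for s in range(S)]
        m = max(col)
        # log of the mean likelihood across draws (log-sum-exp for stability)
        lppd += m + math.log(sum(math.exp(c - m) for c in col) / S)
        mean = sum(col) / S
        # effective number of parameters: variance of the log-likelihood
        p_waic += sum((c - mean) ** 2 for c in col) / (S - 1)
    return -2.0 * (lppd - p_waic)      # deviance scale; lower is better

score = waic(loglik)
```

Models with lower WAIC, like the climate-region grouping in the study, are preferred for out-of-sample prediction.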
Cerda, Gamal; Pérez, Carlos; Navarro, José I; Aguilar, Manuel; Casas, José A; Aragón, Estíbaliz
2015-01-01
This study tested a structural model of cognitive-emotional explanatory variables to explain performance in mathematics. The predictor variables assessed were related to students' level of development of early mathematical competencies (EMCs), specifically, relational and numerical competencies, predisposition toward mathematics, and the level of logical intelligence in a population of primary school Chilean students (n = 634). This longitudinal study also included the academic performance of the students during a period of 4 years as a variable. The sampled students were initially assessed by means of an Early Numeracy Test, and, subsequently, they were administered a Likert-type scale to measure their predisposition toward mathematics (EPMAT) and a basic test of logical intelligence. The results of these tests were used to analyse the interaction of all the aforementioned variables by means of a structural equation model. This combined interaction model was able to predict 64.3% of the variability of observed performance. Preschool students' performance in EMCs was a strong predictor for achievement in mathematics for students between 8 and 11 years of age. Therefore, this paper highlights the importance of EMCs and the modulating role of predisposition toward mathematics. Also, this paper discusses the educational role of these findings, as well as possible ways to improve negative predispositions toward mathematical tasks in the school domain.
Nikodelis, Thomas; Moscha, Dimitra; Metaxiotis, Dimitris; Kollias, Iraklis
2011-08-01
To investigate what sampling frequency is adequate for gait analysis, the correlation of spatiotemporal parameters and the kinematic differences between normal and CP spastic gait were assessed for three sampling frequencies (100 Hz, 50 Hz, 25 Hz). Spatiotemporal, angular, and linear displacement variables in the sagittal plane, along with their 1st and 2nd derivatives, were analyzed. Spatiotemporal stride parameters were highly correlated among the three sampling frequencies. The statistical model (2 × 3 ANOVA) gave no interactions between the factors group and frequency, indicating that group differences were invariant to sampling frequency. Lower frequencies led to smoother curves for all the variables, though with a loss of information, especially for the 2nd derivatives, an effect similar to oversmoothing. It is proposed that when only spatiotemporal stride parameters and angular and linear displacements are to be used in gait reports, commercial video camera speeds (25/30 Hz, or 50/60 Hz when deinterlaced) can be considered a low-cost solution that produces acceptable results.
Structural Equation Models in a Redundancy Analysis Framework With Covariates.
Lovaglio, Pietro Giorgio; Vittadini, Giorgio
2014-01-01
A recent method to specify and fit structural equation modeling in the Redundancy Analysis framework based on so-called Extended Redundancy Analysis (ERA) has been proposed in the literature. In this approach, the relationships between the observed exogenous variables and the observed endogenous variables are moderated by the presence of unobservable composites, estimated as linear combinations of exogenous variables. However, in the presence of direct effects linking exogenous and endogenous variables, or concomitant indicators, the composite scores are estimated by ignoring the presence of the specified direct effects. To fit structural equation models, we propose a new specification and estimation method, called Generalized Redundancy Analysis (GRA), allowing us to specify and fit a variety of relationships among composites, endogenous variables, and external covariates. The proposed methodology extends the ERA method, using a more suitable specification and estimation algorithm, by allowing for covariates that affect endogenous indicators indirectly through the composites and/or directly. To illustrate the advantages of GRA over ERA we propose a simulation study of small samples. Moreover, we propose an application aimed at estimating the impact of formal human capital on the initial earnings of graduates of an Italian university, utilizing a structural model consistent with well-established economic theory.
NASA Technical Reports Server (NTRS)
Brown, A. M.
1998-01-01
Accounting for the statistical geometric and material variability of structures in analysis has been a topic of considerable research for the last 30 years. The determination of quantifiable measures of statistical probability of a desired response variable, such as natural frequency, maximum displacement, or stress, to replace experience-based "safety factors" has been a primary goal of these studies. There are, however, several problems associated with their satisfactory application to realistic structures, such as bladed disks in turbomachinery. These include the accurate definition of the input random variables (rv's), the large size of the finite element models frequently used to simulate these structures, which makes even a single deterministic analysis expensive, and accurate generation of the cumulative distribution function (CDF) necessary to obtain the probability of the desired response variables. The research presented here applies a methodology called probabilistic dynamic synthesis (PDS) to solve these problems. The PDS method uses dynamic characteristics of substructures measured from modal test as the input rv's, rather than "primitive" rv's such as material or geometric uncertainties. These dynamic characteristics, which are the free-free eigenvalues, eigenvectors, and residual flexibility (RF), are readily measured and for many substructures, a reasonable sample set of these measurements can be obtained. The statistics for these rv's accurately account for the entire random character of the substructure. Using the RF method of component mode synthesis, these dynamic characteristics are used to generate reduced-size sample models of the substructures, which are then coupled to form system models. 
These sample models are used to obtain the CDF of the response variable by either applying Monte Carlo simulation or by generating data points for use in the response surface reliability method, which can perform the probabilistic analysis with an order of magnitude less computational effort. Both free- and forced-response analyses have been performed, and the results indicate that, while there is considerable room for improvement, the method produces usable and more representative solutions for the design of realistic structures with a substantial savings in computer time.
Exploratory reconstructability analysis of accident TBI data
NASA Astrophysics Data System (ADS)
Zwick, Martin; Carney, Nancy; Nettleton, Rosemary
2018-02-01
This paper describes the use of reconstructability analysis to perform a secondary study of traumatic brain injury data from automobile accidents. Neutral searches were done and their results displayed with a hypergraph. Directed searches, using both variable-based and state-based models, were applied to predict performance on two cognitive tests and one neurological test. Very simple state-based models gave large uncertainty reductions for all three DVs and sizeable improvements in percent correct for the two cognitive test DVs which were equally sampled. Conditional probability distributions for these models are easily visualized with simple decision trees. Confounding variables and counter-intuitive findings are also reported.
Corbière, Marc; Zaniboni, Sara; Lecomte, Tania; Bond, Gary; Gilles, Pierre-Yves; Lesage, Alain; Goldner, Elliot
2011-09-01
The main purpose of this study was to test a conceptual model based on the theory of planned behaviour (TPB) to explain competitive job acquisition of people with severe mental disorders enrolled in supported employment programs. Using a sample of 281 people with severe mental disorders participating in a prospective study design, the authors examined the contribution of the TPB in a model including clinical (e.g., severity of symptoms), psychosocial (e.g., self-esteem) and work related variables (e.g., length of time absent from the workplace) as predictors of job acquisition. Path analyses were used to test two conceptual models: (1) the model of job acquisition for people with mental illness adapted from the TPB, and (2) the extended TPB including clinical, psychosocial, and work related variables recognized in the literature as significant determinants of competitive employment. Findings revealed that both models presented good fit indices. In total, individual factors predicted 26% of the variance in job search behaviours (behavioural actions). However, client characteristics explained only 8% of variance in work outcomes, suggesting that environmental variables (e.g., stigma towards mental disorders) play an important role in predicting job acquisition. About 56% (N = 157) of our sample obtained competitive employment. Results suggest that employment specialists can be guided in their interventions by the concepts found in the extended model of work integration since most of these are modifiable, such as perceived barriers to employment, self-efficacy, and self-esteem.
Combining climatic and soil properties better predicts covers of Brazilian biomes.
Arruda, Daniel M; Fernandes-Filho, Elpídio I; Solar, Ricardo R C; Schaefer, Carlos E G R
2017-04-01
Several techniques have been used to model the area covered by biomes or species. However, most models allow little freedom of choice of response variables and are conditioned to the use of climate predictors. This major restriction of the models has generated distributions of low accuracy or inconsistent with the actual cover. Our objective was to characterize the environmental space of the most representative biomes of Brazil and predict their cover, using climate and soil-related predictors. As sample units, we used 500 cells of 100 km2 for ten biomes, derived from the official vegetation map of Brazil (IBGE 2004). With a total of 38 (climatic and soil-related) predictors, an a priori model was run with the random forest classifier. Each biome was calibrated with 75% of the samples. The final model was based on four climate and six soil-related predictors, the most important variables for the a priori model, without collinearity. The model reached a kappa value of 0.82, generating a highly consistent prediction with the actual cover of the country. We showed here that the richness of biomes should not be underestimated, and that in spite of the complex relationship, highly accurate modeling based on climatic and soil-related predictors is possible. These predictors are complementary, for covering different parts of the multidimensional niche. Thus, a single biome can cover a wide range of climatic space, versus a narrow range of soil types, so that its prediction is best adjusted by soil-related variables, or vice versa.
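The kappa value of 0.82 reported above measures agreement between predicted and actual biome classes beyond what chance alone would produce. A minimal sketch of Cohen's kappa, using invented labels that merely stand in for biome classes:

```python
# Hedged sketch of Cohen's kappa: observed agreement corrected for the
# agreement expected by chance. Labels below are illustrative only.
from collections import Counter

def cohens_kappa(actual, predicted):
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    ca, cp = Counter(actual), Counter(predicted)
    # chance agreement: product of marginal class frequencies
    expected = sum(ca[k] * cp.get(k, 0) for k in ca) / (n * n)
    return (observed - expected) / (1.0 - expected)

actual    = ["cerrado", "cerrado", "caatinga", "forest", "forest", "forest"]
predicted = ["cerrado", "caatinga", "caatinga", "forest", "forest", "cerrado"]
kappa = cohens_kappa(actual, predicted)   # 0.5 for this toy labelling
```

Values near 1 indicate near-perfect agreement, so the study's 0.82 reflects a prediction highly consistent with the actual cover.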
Individual differences in attention influence perceptual decision making.
Nunez, Michael D; Srinivasan, Ramesh; Vandekerckhove, Joachim
2015-01-01
Sequential sampling decision-making models have been successful in accounting for reaction time (RT) and accuracy data in two-alternative forced choice tasks. These models have been used to describe the behavior of populations of participants, and explanatory structures have been proposed to account for between individual variability in model parameters. In this study we show that individual differences in behavior from a novel perceptual decision making task can be attributed to (1) differences in evidence accumulation rates, (2) differences in variability of evidence accumulation within trials, and (3) differences in non-decision times across individuals. Using electroencephalography (EEG), we demonstrate that these differences in cognitive variables, in turn, can be explained by attentional differences as measured by phase-locking of steady-state visual evoked potential (SSVEP) responses to the signal and noise components of the visual stimulus. Parameters of a cognitive model (a diffusion model) were obtained from accuracy and RT distributions and related to phase-locking indices (PLIs) of SSVEPs with a single step in a hierarchical Bayesian framework. Participants who were able to suppress the SSVEP response to visual noise in high frequency bands were able to accumulate correct evidence faster and had shorter non-decision times (preprocessing or motor response times), leading to more accurate responses and faster response times. We show that the combination of cognitive modeling and neural data in a hierarchical Bayesian framework relates physiological processes to the cognitive processes of participants, and that a model with a new (out-of-sample) participant's neural data can predict that participant's behavior more accurately than models without physiological data.
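The diffusion model referred to above generates choices and reaction times by accumulating noisy evidence until a boundary is reached. A minimal forward simulation using an Euler scheme is sketched below; the drift, boundary, and non-decision-time values are illustrative assumptions, not the study's estimates.

```python
# Hedged sketch: simulating one trial of a simple drift-diffusion model
# (drift v, boundary separation a, non-decision time t0, noise sigma).
import random

def simulate_trial(v, a, t0, dt=0.001, sigma=1.0, rng=random):
    x, t = a / 2.0, 0.0                        # start midway between boundaries
    while 0.0 < x < a:
        x += v * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    choice = 1 if x >= a else 0                # upper boundary = correct
    return choice, t0 + t                      # RT = decision time + t0

rng = random.Random(1)
trials = [simulate_trial(v=1.5, a=2.0, t0=0.3, rng=rng) for _ in range(500)]
accuracy = sum(c for c, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

Higher drift rates (faster evidence accumulation) raise accuracy and shorten decision times, which is the qualitative pattern the study links to SSVEP-measured attention.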
Shahyad, Shima; Pakdaman, Shahla; Shokri, Omid; Saadat, Seyed Hassan
2018-01-12
The aim of the present study was to examine the causal relationships between psychological and social factors as independent variables and body image dissatisfaction plus symptoms of eating disorders as dependent variables, through the mediation of social comparison and thin-ideal internalization. To conduct the study, 477 high-school students from Tehran were recruited by cluster sampling. They then filled out the Rosenberg Self-esteem Scale (RSES), Physical Appearance Comparison Scale (PACS), Self-Concept Clarity Scale (SCCS), Appearance Perfectionism Scale (APS), Eating Disorder Inventory (EDI), Multidimensional Body Self Relations Questionnaire (MBSRQ) and Sociocultural Attitudes towards Appearance Questionnaire (SATAQ-4). The collected data were analyzed using structural equation modeling. Findings showed that the assumed model fitted the data well after modification, and as a result all the path coefficients of latent variables (except for the path between self-esteem and thin-ideal internalization) were statistically significant (p < 0.05). Also, in this model, 75% of the variance in body dissatisfaction scores was explained by psychological variables, socio-cultural variables, social comparison, and internalization of the thin ideal. The results of the present study provide an empirical basis for the confirmation of the proposed causal model. The combination of psychological, social, and cultural variables could efficiently predict body image dissatisfaction of young girls in Iran.
An Analysis of School Principals' Listening Skills According to Teacher Feedback
ERIC Educational Resources Information Center
Yavuz, Mustafa
2010-01-01
This study investigates school principals' listening skills according to teacher feedback in terms of a number of variables. The study is conducted according to a general survey model. The sample consists of 477 elementary, general and vocational secondary school teachers working in Konya, Turkey, in the 2007-2008 education year. The sample was…
Estimation and applications of size-biased distributions in forestry
Jeffrey H. Gove
2003-01-01
Size-biased distributions arise naturally in several contexts in forestry and ecology. Simple power relationships (e.g. basal area and diameter at breast height) between variables are one such area of interest arising from a modelling perspective. Another, probability proportional to size (PPS) sampling, is found in the most widely used methods for sampling standing or...
Assessment of metabolic phenotypic variability in children’s urine using 1H NMR spectroscopy
NASA Astrophysics Data System (ADS)
Maitre, Léa; Lau, Chung-Ho E.; Vizcaino, Esther; Robinson, Oliver; Casas, Maribel; Siskos, Alexandros P.; Want, Elizabeth J.; Athersuch, Toby; Slama, Remy; Vrijheid, Martine; Keun, Hector C.; Coen, Muireann
2017-04-01
The application of metabolic phenotyping in clinical and epidemiological studies is limited by a poor understanding of inter-individual, intra-individual and temporal variability in metabolic phenotypes. Using 1H NMR spectroscopy, we characterised short-term variability in urinary metabolites measured from 20 children aged 8-9 years old. Daily spot morning, night-time and pooled (50:50 morning and night-time) urine samples across six days (18 samples per child) were analysed, and 44 metabolites quantified. Intraclass correlation coefficients (ICC) and mixed effect models were applied to assess the reproducibility and biological variance of metabolic phenotypes. Excellent analytical reproducibility and precision were demonstrated for the 1H NMR spectroscopic platform (median CV 7.2%). Pooled samples captured inter-individual variability best, with a median ICC of 0.40. Trimethylamine, N-acetyl neuraminic acid, 3-hydroxyisobutyrate, 3-hydroxybutyrate/3-aminoisobutyrate, tyrosine, valine and 3-hydroxyisovalerate exhibited the highest stability, with over 50% of variance specific to the child. The pooled sample was shown to capture the most inter-individual variance in the metabolic phenotype, which is of importance for molecular epidemiology study design. A substantial proportion of the variation in the urinary metabolome of children is specific to the individual, underlining the potential of such data to inform clinical and exposome studies conducted early in life.
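The intraclass correlation coefficients used above quantify how much of the total variance lies between children rather than within a child's repeated samples. A sketch of the one-way ICC(1,1), computed from ANOVA mean squares; the measurements below are made-up values, not the study's data:

```python
# Hedged sketch of a one-way intraclass correlation coefficient, ICC(1,1),
# for k repeated measurements on each of n subjects (values are invented).
def icc_oneway(groups):
    k = len(groups[0])                           # measurements per subject
    n = len(groups)                              # number of subjects
    grand = sum(sum(g) for g in groups) / (n * k)
    ms_between = k * sum((sum(g) / k - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - sum(g) / k) ** 2
                    for g in groups for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# three "children", three repeated measurements each (illustrative values)
measurements = [[1.0, 1.1, 0.9], [2.0, 2.2, 1.9], [3.1, 3.0, 2.9]]
icc = icc_oneway(measurements)
```

An ICC near 1 means repeated samples from the same child are far more alike than samples from different children; the study's median ICC of 0.40 for pooled samples indicates moderate between-child discrimination.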
Using atmospheric 14CO to constrain OH variability: concept and potential for future measurements
NASA Astrophysics Data System (ADS)
Petrenko, V. V.; Murray, L. T.; Smith, A. W.
2017-12-01
The primary source of 14C-containing carbon monoxide (14CO) in the atmosphere is via 14C production from 14N by secondary cosmic rays, and the primary sink is removal by OH. Variations in the global abundance of 14CO that are not explained by variations in 14C production are mainly driven by variations in the global abundance of OH. Monitoring OH variability via methyl chloroform is becoming increasingly difficult as methyl chloroform abundance continues to decline. Measurements of atmospheric 14CO have previously been successfully used to infer OH variability. However, these measurements are currently continuing at only one location (Baring Head, New Zealand), which is insufficient to infer global trends. We propose to restart global 14CO monitoring with the aim of providing another constraint on OH variability. A new analytical system for 14CO sampling and measurements is in development, which will allow a substantial reduction in the required sample air volumes (previously ≥ 400 L) and simplify field logistics. A set of test measurements is planned, with sampling at the Mauna Loa Observatory. Preliminary work with a state-of-the-art chemical transport model is identifying the most promising locations for global 14CO sampling.
Zhang, Peng; Parenteau, Chantal; Wang, Lu; Holcombe, Sven; Kohoyda-Inglis, Carla; Sullivan, June; Wang, Stewart
2013-11-01
This study resulted in a model-averaging methodology that predicts crash injury risk using vehicle, demographic, and morphomic variables and assesses the importance of individual predictors. The effectiveness of this methodology was illustrated through analysis of occupant chest injuries in frontal vehicle crashes. The crash data were obtained from the International Center for Automotive Medicine (ICAM) database for calendar years 1996 to 2012. The morphomic data are quantitative measurements of variations in human body 3-dimensional anatomy. Morphomics are obtained from imaging records. In this study, morphomics were obtained from chest, abdomen, and spine CT using novel patented algorithms. A NASS-trained crash investigator with over thirty years of experience collected the in-depth crash data. There were 226 cases available with occupants involved in frontal crashes and morphomic measurements. Only cases with complete recorded data were retained for statistical analysis. Logistic regression models were fitted using all possible configurations of vehicle, demographic, and morphomic variables. Different models were ranked by the Akaike Information Criterion (AIC). An averaged logistic regression model approach was used due to the limited sample size relative to the number of variables. This approach is helpful when addressing variable selection, building prediction models, and assessing the importance of individual variables. The final predictive results were developed using this approach, based on the top 100 models in the AIC ranking. Model-averaging minimized model uncertainty, decreased the overall prediction variance, and provided an approach to evaluating the importance of individual variables. There were 17 variables investigated: four vehicle, four demographic, and nine morphomic. More than 130,000 logistic models were investigated in total. The models were characterized into four scenarios to assess individual variable contribution to injury risk.
Scenario 1 used vehicle variables; Scenario 2, vehicle and demographic variables; Scenario 3, vehicle and morphomic variables; and Scenario 4 used all variables. AIC was used to rank the models and to address over-fitting. In each scenario, the results based on the top three models and the averages of the top 100 models were presented. The AIC and the area under the receiver operating characteristic curve (AUC) were reported in each model. The models were re-fitted after removing each variable one at a time. The increases of AIC and the decreases of AUC were then assessed to measure the contribution and importance of the individual variables in each model. The importance of the individual variables was also determined by their weighted frequencies of appearance in the top 100 selected models. Overall, the AUC was 0.58 in Scenario 1, 0.78 in Scenario 2, 0.76 in Scenario 3 and 0.82 in Scenario 4. The results showed that morphomic variables are as accurate at predicting injury risk as demographic variables. The results of this study emphasize the importance of including morphomic variables when assessing injury risk. The results also highlight the need for morphomic data in the development of human mathematical models when assessing restraint performance in frontal crashes, since morphomic variables are more "tangible" measurements compared to demographic variables such as age and gender. Copyright © 2013 Elsevier Ltd. All rights reserved.
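The model-averaging step described above can be sketched with Akaike weights: each candidate model's AIC difference from the best model is converted into a weight, and the models' predictions are averaged with those weights. The AIC values and per-model risk predictions below are invented for illustration, not taken from the study.

```python
# Hedged sketch of AIC-based model averaging with Akaike weights.
import math

def akaike_weights(aics):
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]  # relative likelihoods
    total = sum(rel)
    return [r / total for r in rel]

aics  = [210.3, 211.1, 214.0, 218.7]   # candidate models, best AIC first
risks = [0.42, 0.38, 0.51, 0.33]       # each model's predicted injury risk
w = akaike_weights(aics)
averaged_risk = sum(wi * ri for wi, ri in zip(w, risks))
```

Averaging over many well-supported models, rather than trusting one best-AIC model, is what reduces model uncertainty and prediction variance in the study's approach.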
Lietz, A.C.
2002-01-01
The acoustic Doppler current profiler (ADCP) and acoustic Doppler velocity meter (ADVM) were used to estimate constituent concentrations and loads at a sampling site along the Hendry-Collier County boundary in southwestern Florida. The sampling site is strategically placed within a highly managed canal system that exhibits low and rapidly changing water conditions. With the ADCP and ADVM, flow can be gaged more accurately than by conventional field-data collection methods. An ADVM velocity rating relates measured velocity determined by the ADCP (dependent variable) with the ADVM velocity (independent variable) by means of regression analysis techniques. The coefficient of determination (R2) for this rating is 0.99 at the sampling site. Concentrations and loads of total phosphorus, total Kjeldahl nitrogen, and total nitrogen (dependent variables) were related to instantaneous discharge, acoustic backscatter, stage, or water temperature (independent variables) recorded at the time of sampling. Only positive discharges were used for this analysis. Discharges less than 100 cubic feet per second generally are considered inaccurate (probably as a result of acoustic ray bending and vertical temperature gradients in the water column). Of the concentration models, only total phosphorus was statistically significant at the 95-percent confidence level (p-value less than 0.05). Total phosphorus had an adjusted R2 of 0.93, indicating most of the variation in the concentration can be explained by the discharge. All of the load models for total phosphorus, total Kjeldahl nitrogen, and total nitrogen were statistically significant. Most of the variation in load can be explained by the discharge as reflected in the adjusted R2 for total phosphorus (0.98), total Kjeldahl nitrogen (0.99), and total nitrogen (0.99).
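A velocity rating like the one above is an ordinary least-squares fit of ADCP velocity on ADVM velocity, with R2 quantifying how much of the variation the rating explains. A sketch with synthetic velocity pairs (not the study's record):

```python
# Hedged sketch of a velocity rating: OLS of y (ADCP velocity, dependent)
# on x (ADVM velocity, independent), returning slope, intercept, and R^2.
def fit_rating(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

advm = [0.10, 0.25, 0.40, 0.55, 0.70]   # synthetic velocities
adcp = [0.12, 0.27, 0.41, 0.58, 0.71]
slope, intercept, r2 = fit_rating(advm, adcp)
```

An R2 near 0.99, as at the study's site, means the ADVM velocity alone reproduces the ADCP-measured velocity almost exactly.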
Active subspace uncertainty quantification for a polydomain ferroelectric phase-field model
NASA Astrophysics Data System (ADS)
Leon, Lider S.; Smith, Ralph C.; Miles, Paul; Oates, William S.
2018-03-01
Quantum-informed ferroelectric phase-field models capable of predicting material behavior are necessary for facilitating the development and production of many adaptive structures and intelligent systems. Uncertainty is present in these models, given the quantum scale at which calculations take place. A necessary analysis is to determine how the uncertainty in the response can be attributed to the uncertainty in the model inputs or parameters. A second analysis is to identify active subspaces within the original parameter space, which quantify directions in which the model response varies most dominantly, thus reducing sampling effort and computational cost. In this investigation, we identify an active subspace for a poly-domain ferroelectric phase-field model. Using the active variables as our independent variables, we then construct a surrogate model and perform Bayesian inference. Once we quantify the uncertainties in the active variables, we obtain uncertainties for the original parameters via an inverse mapping. The analysis provides insight into how active subspace methodologies can be used to reduce computational power needed to perform Bayesian inference on model parameters informed by experimental or simulated data.
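Active subspaces of the kind used above are commonly identified by eigendecomposing the average outer product of the model's parameter gradients: dominant eigenvectors give the directions in which the response varies most. The sketch below substitutes a toy response for the phase-field model; the function and its gradient are assumptions for illustration only.

```python
# Hedged sketch of active subspace identification: eigendecompose
# C = E[grad f(p) grad f(p)^T] estimated by Monte Carlo. The toy response
# f(p) = sin(2*p0 + p1) + 0.01*p2 varies mostly along direction (2, 1, 0).
import numpy as np

rng = np.random.default_rng(0)

def grad_f(p):
    g0 = 2.0 * np.cos(2.0 * p[0] + p[1])
    g1 = np.cos(2.0 * p[0] + p[1])
    return np.array([g0, g1, 0.01])        # d f / d p2 is tiny and constant

samples = rng.uniform(-1.0, 1.0, size=(200, 3))
C = np.mean([np.outer(grad_f(p), grad_f(p)) for p in samples], axis=0)
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
active_dir = eigvecs[:, -1]                # dominant (active) direction
```

A sharp drop after the leading eigenvalue, as here, signals a one-dimensional active subspace, so a surrogate model can be built over a single active variable instead of the full parameter space.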
Prediction of Ba, Mn and Zn for tropical soils using iron oxides and magnetic susceptibility
NASA Astrophysics Data System (ADS)
Marques Júnior, José; Arantes Camargo, Livia; Reynaldo Ferracciú Alleoni, Luís; Tadeu Pereira, Gener; De Bortoli Teixeira, Daniel; Santos Rabelo de Souza Bahia, Angelica
2017-04-01
Agricultural activity is an important source of potentially toxic elements (PTEs) in soil worldwide but particularly in heavily farmed areas. Spatial distribution characterization of PTE contents in farming areas is crucial to assess further environmental impacts caused by soil contamination. Prediction models are quite useful for characterizing the spatial variability of continuous variables, as they allow prediction of soil attributes that might be difficult to obtain for a large number of samples through conventional methods. This study aimed to evaluate, in three geomorphic surfaces of Oxisols, the capacity for predicting PTEs (Ba, Mn, Zn) and their spatial variability using iron oxides and magnetic susceptibility (MS). Soil samples were collected from three geomorphic surfaces and analyzed for chemical, physical, and mineralogical properties, as well as MS. PTE prediction models were calibrated by multiple linear regression (MLR). MLR calibration accuracy was evaluated using the coefficient of determination (R2). PTE spatial distribution maps were built using the values calculated by the calibrated models that reached the best accuracy by means of geostatistics. The high correlations of the attributes clay, MS, hematite (Hm), iron oxides extracted by sodium dithionite-citrate-bicarbonate (Fed), and iron oxides extracted using acid ammonium oxalate (Feo) with the elements Ba, Mn, and Zn enabled them to be selected as predictors for PTEs. Stepwise multiple linear regression showed that MS and Fed were the best PTE predictors individually, as combining two or more attributes produced no significant increase in R2. The MS-calibrated models for Ba, Mn, and Zn prediction exhibited R2 values of 0.88, 0.66, and 0.55, respectively.
These are promising results since MS is a fast, cheap, and non-destructive tool, allowing the prediction of a large number of samples, which in turn enables detailed mapping of large areas. MS predicted values enabled the characterization and the understanding of spatial variability of the studied PTEs.
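The MLR calibration step can be sketched as follows. The predictor values and Ba contents below are hypothetical, and the solver is a plain normal-equations implementation rather than the stepwise procedure used in the study:

```python
# Illustrative multiple linear regression (MLR) calibration: predict a PTE
# content (here "Ba") from cheap proxies (here MS and Fe_d). Data are made up.

def mlr_fit(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):
        piv = A[i][i]
        for j in range(k):
            A[i][j] /= piv
        b[i] /= piv
        for r in range(k):
            if r != i:
                f = A[r][i]
                for j in range(k):
                    A[r][j] -= f * A[i][j]
                b[r] -= f * b[i]
    return b

# rows: [1 (intercept), magnetic susceptibility, Fe_d]; y: Ba content
X = [[1, 2.0, 5.0], [1, 3.0, 6.0], [1, 4.0, 8.0], [1, 5.0, 9.0], [1, 6.0, 11.0]]
y = [20.1, 25.9, 32.2, 37.8, 44.1]
coef = mlr_fit(X, y)
```

The calibrated coefficients can then be applied to samples for which only the cheap proxies were measured, which is exactly the cost saving the abstract describes.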
Newman, Andrea K; Van Dyke, Benjamin P; Torres, Calia A; Baxter, Jacob W; Eyer, Joshua C; Kapoor, Shweta; Thorn, Beverly E
2017-09-01
Chronic pain is a pervasive condition that is complicated by economic, educational, and racial disparities. This study analyzes key factors associated with chronic pain within an understudied and underserved population. The sample is characterized by a triple disparity with respect to income, education/literacy, and racial barriers that substantially increase the vulnerability to the negative consequences of chronic pain. The study examined the pretreatment data of 290 participants enrolled in the Learning About My Pain trial, a randomized controlled comparative effectiveness trial of psychosocial interventions (B.E.T., Principal Investigator, Patient-Centered Outcomes Research Institute Contract No. 941; clinicaltrials.gov identifier NCT01967342) for chronic pain. Hierarchical multiple regression analyses evaluated the relationships among sociodemographic (sex, age, race, poverty status, literacy, and education level) and psychological (depressive symptoms and pain catastrophizing) variables and pain interference, pain severity, and disability. The indirect effects of depressive symptoms and pain catastrophizing on the sociodemographic and pain variables were investigated using bootstrap resampling. Reversed mediation models were also examined. Results suggested that the experience of chronic pain within this low-income sample is better accounted for by psychological factors than sex, age, race, poverty status, literacy, and education level. Depressive symptoms and pain catastrophizing mediated the relationships between age and pain variables, whereas pain catastrophizing mediated the effects of primary literacy and poverty status. Some reversed models were equivalent to the hypothesized models, suggesting the possibility of bidirectionality. Although cross-sectional findings cannot establish causality, our results highlight the critical role psychological factors play in individuals with chronic pain and multiple health disparities.
ERIC Educational Resources Information Center
Sandler, Martin E.
The effects of selected variables on the academic persistence of adult students were examined in a study of a random sample of 469 adult students aged 24 years or older enrolled in a four-year college. The survey questionnaire, the Adult Student Experiences Survey, collected data regarding 12 endogenous variables and 13 exogenous variables…
ERIC Educational Resources Information Center
Frisby, Craig L.; Wang, Ze
2016-01-01
Data from the standardization sample of the Woodcock-Johnson Psychoeducational Battery--Third Edition (WJ III) Cognitive standard battery and Test Session Observation Checklist items were analyzed to understand the relationship between g (general mental ability) and test session behavior (TSB; n = 5,769). Latent variable modeling methods were used…
State-space modeling of population sizes and trends in Nihoa Finch and Millerbird
Gorresen, P. Marcos; Brinck, Kevin W.; Camp, Richard J.; Farmer, Chris; Plentovich, Sheldon M.; Banko, Paul C.
2016-01-01
Both of the 2 passerines endemic to Nihoa Island, Hawai‘i, USA—the Nihoa Millerbird (Acrocephalus familiaris kingi) and Nihoa Finch (Telespiza ultima)—are listed as endangered by federal and state agencies. Their abundances have been estimated by irregularly implemented fixed-width strip-transect sampling from 1967 to 2012, from which area-based extrapolation of the raw counts produced highly variable abundance estimates for both species. To evaluate an alternative survey method and improve abundance estimates, we conducted variable-distance point-transect sampling between 2010 and 2014. We compared our results to those obtained from strip-transect samples. In addition, we applied state-space models to derive improved estimates of population size and trends from the legacy time series of strip-transect counts. Both species were fairly evenly distributed across Nihoa and occurred in all or nearly all available habitat. Population trends for Nihoa Millerbird were inconclusive because of high within-year variance. Trends for Nihoa Finch were positive, particularly since the early 1990s. Distance-based analysis of point-transect counts produced mean estimates of abundance similar to those from strip-transects but was generally more precise. However, both survey methods produced biologically unrealistic variability between years. State-space modeling of the long-term time series of abundances obtained from strip-transect counts effectively reduced uncertainty in both within- and between-year estimates of population size, and allowed short-term changes in abundance trajectories to be smoothed into a long-term trend.
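The smoothing effect of a state-space model can be illustrated with the simplest case: a scalar local-level model in which the true (log) abundance follows a random walk and each year's survey adds observation noise. A Kalman filter pools information across years, shrinking the unrealistic year-to-year swings of the raw counts. The count series and variances below are hypothetical:

```python
# Scalar local-level state-space model filtered recursively (Kalman filter).
# q = process (random-walk) variance, r = observation (survey) variance.

def kalman_filter(obs, q=0.01, r=0.25):
    est, p = obs[0], r          # initialize at the first observation
    out = [est]
    for y in obs[1:]:
        p += q                  # predict: state drifts, variance grows
        k = p / (p + r)         # Kalman gain
        est += k * (y - est)    # update with this year's count
        p *= (1 - k)
        out.append(est)
    return out

raw = [5.0, 6.2, 4.1, 6.8, 4.5, 6.5, 5.2]   # hypothetical log-counts
smooth = kalman_filter(raw)
```

With a small process variance relative to the observation variance, the filtered series varies far less between years than the raw counts, which is the behavior the abstract attributes to the state-space estimates.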
Child-related cognitions and affective functioning of physically abusive and comparison parents.
Haskett, Mary E; Smith Scott, Susan; Grant, Raven; Ward, Caryn Sabourin; Robinson, Canby
2003-06-01
The goal of this research was to utilize the cognitive behavioral model of abusive parenting to select and examine risk factors to illuminate the unique and combined influences of social cognitive and affective variables in predicting abuse group membership. Participants included physically abusive parents (n=56) and a closely-matched group of comparison parents (n=62). Social cognitive risk variables measured were (a) parent's expectations for children's abilities and maturity, (b) parental attributions of intentionality of child misbehavior, and (c) parents' perceptions of their children's adjustment. Affective risk variables included (a) psychopathology and (b) parenting stress. A series of logistic regression models were constructed to test the individual, combined, and interactive effects of risk variables on abuse group membership. The full set of five risk variables was predictive of abuse status; however, not all variables were predictive when considered individually and interactions did not contribute significantly to prediction. A risk composite score computed for each parent based on the five risk variables significantly predicted abuse status. Wide individual differences in risk across the five variables were apparent within the sample of abusive parents. Findings were generally consistent with a cognitive behavioral model of abuse, with cognitive variables being more salient in predicting abuse status than affective factors. Results point to the importance of considering diversity in characteristics of abusive parents.
Binford, Michael W.; Lee, Tae Jeong; Townsend, Robert M.
2004-01-01
Environmental variability is an important risk factor in rural agricultural communities. Testing models requires empirical sampling that generates data that are representative in both economic and ecological domains. Detrended correspondence analysis of satellite remote sensing data was used to design an effective low-cost sampling protocol for a field study to create an integrated socioeconomic and ecological database when no prior information on ecology of the survey area existed. We stratified the sample for the selection of tambons from various preselected provinces in Thailand based on factor analysis of spectral land-cover classes derived from satellite data. We conducted the survey for the sampled villages in the chosen tambons. The resulting data capture interesting variations in soil productivity and in the timing of good and bad years, which a purely random sample would likely have missed. Thus, this database will allow tests of hypotheses concerning the effect of credit on productivity, the sharing of idiosyncratic risks, and the economic influence of environmental variability. PMID:15254298
Successful aging in Spanish older adults: the role of psychosocial resources.
Dumitrache, Cristina G; Rubio, Laura; Cordón-Pozo, Eulogio
2018-05-25
Background: Psychological and social resources such as extraversion, optimism, social support, or social networks contribute to adaptation and to successful aging. Building on assumptions derived from successful aging and from the developmental adaptation models, this study aims to analyze the joint impact of different psychosocial resources, such as personality, social relations, health, and socio-demographic characteristics on life satisfaction in a group of people aged 65 years and older from Spain. A cross-sectional survey using non-proportional quota sampling was carried out. The sample comprised 406 community-dwelling older adults (M = 74.88, SD = 6.75). In order to collect the data, face-to-face interviews were individually conducted. A structural equation model (SEM) was estimated using the PLS software. The SEM results showed that, within this sample, psychosocial variables explain 47.4% of the variance in life satisfaction. Social relations and personality, specifically optimism, were strongly related with life satisfaction, while health status and socio-demographic characteristics were modestly associated with life satisfaction. Findings support the view that psychosocial resources are important for successful aging and therefore should be included in successful aging models. Furthermore, interventions aimed at fostering successful aging should take into account the role of psychosocial variables.
Prieto, Carlos; Dahners, Hans W.
2009-01-01
Coexistence by a great number of species could reflect niche segregation along several resource axes. Differences in the use of a hilltop as a mating site for a Eumaeini (Lycaenidae) community were measured to test whether niche segregation exists within this group. Specimens were collected during 21 samplings between July-October 2004 and July-October 2005. Two environmental variables and three temporal-spatial variables were analyzed utilizing null models with three randomization algorithms. Significant differences were found among the species with respect to utilization of vertical space, horizontal space, temporal distribution and environmental temperature. The species did not show significant differences with respect to light intensity. For all samplings, the niche overlap observed in the two environmental variables was higher or significantly higher than expected by chance, suggesting that niche segregation does not exist due to competition within these variables. Similar results were observed for temporal distribution. Some evidence of niche segregation was found in vertical space and horizontal space variables where some samples presented lower overlap than expected by chance. The results indicate that the community assemblage could be shaped mainly in two ways. The first is that species with determined habitat requirements fit into unoccupied niche spaces. The second is by niche segregation in the vertical space distribution variable. PMID:19613456
Using maximum entropy modeling for optimal selection of sampling sites for monitoring networks
Stohlgren, Thomas J.; Kumar, Sunil; Barnett, David T.; Evangelista, Paul H.
2011-01-01
Environmental monitoring programs must efficiently describe state shifts. We propose using maximum entropy modeling to select dissimilar sampling sites to capture environmental variability at low cost, and demonstrate a specific application: sample site selection for the Central Plains domain (453,490 km2) of the National Ecological Observatory Network (NEON). We relied on four environmental factors: mean annual temperature and precipitation, elevation, and vegetation type. A “sample site” was defined as a 20 km × 20 km area (equal to NEON’s airborne observation platform [AOP] footprint), within which each 1 km2 cell was evaluated for each environmental factor. After each model run, the most environmentally dissimilar site was selected from all potential sample sites. The iterative selection of eight sites captured approximately 80% of the environmental envelope of the domain, an improvement over stratified random sampling and simple random designs for sample site selection. This approach can be widely used for cost-efficient selection of survey and monitoring sites.
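A simplified stand-in for the iterative selection is greedy farthest-point sampling in standardized environmental space: repeatedly choose the candidate site whose minimum distance to the already-selected sites is largest. This is not the Maxent procedure itself, and the four-dimensional environmental vectors below (temperature, precipitation, elevation, vegetation code) are hypothetical:

```python
# Greedy farthest-point selection of environmentally dissimilar sample sites.
# Each site is a standardized environmental vector; new sites maximize the
# minimum Euclidean distance to sites already chosen.

def farthest_point_sites(sites, n_pick):
    chosen = [0]                                   # seed with the first site
    while len(chosen) < n_pick:
        best, best_d = None, -1.0
        for i, s in enumerate(sites):
            if i in chosen:
                continue
            d = min(sum((a - b) ** 2 for a, b in zip(s, sites[c])) ** 0.5
                    for c in chosen)               # distance to nearest chosen
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen

sites = [(0.1, 0.2, 0.1, 0.0), (0.9, 0.8, 0.7, 1.0),
         (0.5, 0.5, 0.4, 0.0), (0.1, 0.9, 0.8, 1.0),
         (0.8, 0.1, 0.2, 0.0)]
picked = farthest_point_sites(sites, 3)
```

Like the Maxent-based selection, each iteration adds the site least like those already covered, so a few picks span most of the environmental envelope.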
Baritaux, Jean-Charles; Simon, Anne-Catherine; Schultz, Emmanuelle; Emain, C; Laurent, P; Dinten, Jean-Marc
2016-05-01
We report on our recent efforts towards identifying bacteria in environmental samples by means of Raman spectroscopy. We established a database of Raman spectra from bacteria submitted to various environmental conditions. This dataset was used to verify that Raman typing is possible from measurements performed in non-ideal conditions. Starting from the same dataset, we then varied the phenotype and matrix diversity content included in the reference library used to train the statistical model. The results show that it is possible to obtain models with an extended coverage of spectral variabilities, compared to environment-specific models trained on spectra from a restricted set of conditions. Broad coverage models are desirable for environmental samples since the exact conditions of the bacteria cannot be controlled.
Validation of Metrics as Error Predictors
NASA Astrophysics Data System (ADS)
Mendling, Jan
In this chapter, we test the validity of metrics that were defined in the previous chapter for predicting errors in EPC business process models. In Section 5.1, we provide an overview of how the analysis data is generated. Section 5.2 describes the sample of EPCs from practice that we use for the analysis. Here we discuss a disaggregation by the EPC model group and by error as well as a correlation analysis between metrics and error. Based on this sample, we calculate a logistic regression model for predicting error probability with the metrics as input variables in Section 5.3. In Section 5.4, we then test the regression function for an independent sample of EPC models from textbooks as a cross-validation. Section 5.5 summarizes the findings.
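The regression step can be sketched as a logistic model mapping a single hypothetical "size" metric to an error probability, fitted by plain gradient descent; the chapter itself uses several metrics and real EPC data:

```python
# Toy logistic regression: probability of an EPC model containing an error
# as a function of one (hypothetical, rescaled) size metric.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        gw = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(x, y)) / n
        gb = sum((sigmoid(w * xi + b) - yi) for xi, yi in zip(x, y)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# hypothetical training data: larger models are more often erroneous
size = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
err  = [0,   0,   0,   0,   1,   1,   1,   1]
w, b = fit_logistic(size, err)
```

Cross-validation against an independent sample, as done in Section 5.4, would then check that the fitted w and b generalize beyond the training models.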
Johnelle Sparks, P
2009-11-01
The objective was to examine disparities in low birthweight using a diverse set of racial/ethnic categories and a nationally representative sample. This research explored the degree to which sociodemographic characteristics, health care access, maternal health status, and health behaviors influence birthweight disparities among seven racial/ethnic groups. Binary logistic regression models were estimated using a nationally representative sample of singleton, normal-for-gestational-age births from 2001 using the ECLS-B, which has an approximate sample size of 7,800 infants. The multiple-variable models examine disparities in low birthweight (LBW) for seven racial/ethnic groups, including non-Hispanic white, non-Hispanic black, U.S.-born Mexican-origin Hispanic, foreign-born Mexican-origin Hispanic, other Hispanic, Native American, and Asian mothers. Race-stratified logistic regression models were also examined. In the full sample models, only non-Hispanic black mothers have an LBW disadvantage compared to non-Hispanic white mothers. Maternal WIC usage was protective against LBW in the full models. No prenatal care and adequate plus prenatal care increase the odds of LBW. In the race-stratified models, prenatal care adequacy and high maternal health risks are the only variables that influence LBW for all racial/ethnic groups. The race-stratified models highlight the different mechanisms important across the racial/ethnic groups in determining LBW. Differences in the distribution of maternal sociodemographic, health care access, health status, and behavior characteristics by race/ethnicity demonstrate that a single empirical framework may distort associations with LBW for certain racial and ethnic groups. More attention must be given to the specific mechanisms linking maternal risk factors to poor birth outcomes for specific racial/ethnic groups.
Landguth, Erin L.; Fedy, Bradley C.; Oyler-McCance, Sara J.; Garey, Andrew L.; Emel, Sarah L.; Mumma, Matthew; Wagner, Helene H.; Fortin, Marie-Josée; Cushman, Samuel A.
2012-01-01
The influence of study design on the ability to detect the effects of landscape pattern on gene flow is one of the most pressing methodological gaps in landscape genetic research. To investigate the effect of study design on landscape genetics inference, we used a spatially-explicit, individual-based program to simulate gene flow in a spatially continuous population inhabiting a landscape with gradual spatial changes in resistance to movement. We simulated a wide range of combinations of number of loci, number of alleles per locus and number of individuals sampled from the population. We assessed how these three aspects of study design influenced the statistical power to successfully identify the generating process among competing hypotheses of isolation-by-distance, isolation-by-barrier, and isolation-by-landscape resistance using a causal modelling approach with partial Mantel tests. We modelled the statistical power to identify the generating process as a response surface for equilibrium and non-equilibrium conditions after introduction of isolation-by-landscape resistance. All three variables (loci, alleles and sampled individuals) affect the power of causal modelling, but to different degrees. Stronger partial Mantel r correlations between landscape distances and genetic distances were found when more loci were used and when loci were more variable, which makes comparisons of effect size between studies difficult. Number of individuals did not affect the accuracy through mean equilibrium partial Mantel r, but larger samples decreased the uncertainty (increasing the precision) of equilibrium partial Mantel r estimates. We conclude that amplifying more (and more variable) loci is likely to increase the power of landscape genetic inferences more than increasing number of individuals.
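The Mantel machinery underlying these comparisons can be sketched as a simple (not partial) Mantel test: correlate the upper triangles of two inter-individual distance matrices and judge significance by permuting individuals, i.e., rows and columns together. A partial Mantel test would additionally control for a third matrix; the matrices below are small hypothetical examples:

```python
# Simple Mantel test: correlation between geographic and genetic distance
# matrices, with a permutation p-value. Hypothetical 5-individual example.
import random

def upper(m, order=None):
    n = len(m)
    idx = order or list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def mantel(geo, gen, n_perm=999, seed=1):
    random.seed(seed)
    r_obs = pearson(upper(geo), upper(gen))
    n = len(geo)
    hits = sum(
        1 for _ in range(n_perm)
        if pearson(upper(geo), upper(gen, random.sample(range(n), n))) >= r_obs
    )
    return r_obs, (hits + 1) / (n_perm + 1)

pos = [0, 1, 2, 4, 7]                                 # individuals on a line
geo = [[abs(a - b) for b in pos] for a in pos]        # geographic distances
gen = [[abs(a - b) ** 1.1 for b in pos] for a in pos] # correlated "genetic" distances
r, p = mantel(geo, gen)
```

The study's design question then becomes how the strength and stability of such correlations change as loci, alleles, and sampled individuals are varied.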
O’Shea, Thomas J.; Bowen, Richard A.; Stanley, Thomas R.; Shankar, Vidya; Rupprecht, Charles E.
2014-01-01
In 2001–2005 we sampled permanently marked big brown bats (Eptesicus fuscus) at summer roosts in buildings at Fort Collins, Colorado, for rabies virus neutralizing antibodies (RVNA). Seroprevalence was higher in adult females (17.9%, n = 2,332) than males (9.4%, n = 128; P = 0.007) or volant juveniles (10.2%, n = 738; P < 0.0001). Seroprevalence was lowest in a drought year with local insecticide use and highest in the year with normal conditions, suggesting that environmental stress may suppress RVNA production in big brown bats. Seroprevalence also increased with age of bat, and varied from 6.2 to 26.7% among adult females at five roosts sampled each year for five years. Seroprevalence of adult females at 17 other roosts sampled for 1 to 4 years ranged from 0.0 to 47.1%. In logistic regression analyses, the top-ranking model in our candidate set of explanatory variables for serological status at first sampling included year, day of season, and a year by day of season interaction that varied with relative drought conditions. The presence or absence of antibodies in individual bats showed temporal variability. Year alone provided the best model to explain the likelihood of adult female bats showing a transition to seronegative from a previously seropositive state. Day of the season was the only competitive model to explain the likelihood of a transition from seronegative to seropositive, which increased as the season progressed. We found no rabies viral RNA in oropharyngeal secretions of 261 seropositive bats or in organs of 13 euthanized seropositive bats. Survival of seropositive and seronegative bats did not differ. The presence of RVNA in serum of bats should not be interpreted as evidence for ongoing rabies infection.
Seabloom, William; Seabloom, Mary E; Seabloom, Eric; Barron, Robert; Hendrickson, Sharon
2003-08-01
The study determines the effectiveness of a sexuality-positive adolescent sexual offender treatment program and examines subsequent criminal recidivism in the three outcome groups (completed, withdrawn, referred). The sample consists of 122 adolescent males and their families (491 individuals). Of the demographic variables, only living situation was significant, such that patients living with parents were more likely to graduate. None of the behavioral variables were found to be significant. Of the treatment variables, length of time in the program and participation in the Family Journey Seminar were included in the final model. When they were included in the model, no other treatment variables were significantly related to the probability of graduation. There were no arrests or convictions for sex-related crimes in the population of participants that successfully completed the program. This group was also less likely than the other groups to be arrested (p = 0.014) or convicted (p = 0.004) across all crime categories.
Stephens, Christine; Noone, Jack; Alpass, Fiona
2014-01-01
This study tested the effects of social network engagement and social support on the health of older people moving into retirement, using a model which includes social context variables. A prospective survey of a New Zealand population sample aged 54-70 at baseline (N = 2,282) was used to assess the effects on mental and physical health across time. A structural equation model assessed pathways from the social context variables through network engagement to social support and then to mental and physical health 2 years later. The proposed model of effects on mental health was supported when gender, economic living standards, and ethnicity were included along with the direct effects of these variables on social support. These findings confirm the importance of taking social context variables into account when considering social support networks. Social engagement appears to be an important aspect of social network functioning which could be investigated further.
Allen, Stephanie L.; Duku, Eric; Vaillancourt, Tracy; Szatmari, Peter; Bryson, Susan; Fombonne, Eric; Volden, Joanne; Waddell, Charlotte; Zwaigenbaum, Lonnie; Roberts, Wendy; Mirenda, Pat; Bennett, Teresa; Elsabbagh, Mayada; Georgiades, Stelios
2015-01-01
Objective: The factor structure and validity of the Behavioral Pediatrics Feeding Assessment Scale (BPFAS; Crist & Napier-Phillips, 2001) were examined in preschoolers with autism spectrum disorder (ASD). Methods: Confirmatory factor analysis was used to examine the original BPFAS five-factor model, the fit of each latent variable, and a rival one-factor model. None of the models was adequate; thus, a categorical exploratory factor analysis (CEFA) was conducted. Correlations were used to examine relations between the BPFAS and concurrent variables of interest. Results: The CEFA identified an acceptable three-factor model. Correlational analyses indicated that feeding problems were positively related to parent-reported autism symptoms, behavior problems, sleep problems, and parenting stress, but largely unrelated to performance-based indices of autism symptom severity, language, and cognitive abilities, as well as child age. Conclusion: These results provide evidence supporting the use of the identified BPFAS three-factor model for samples of young children with ASD. PMID:25725217
Altweck, Laura; Marshall, Tara C; Ferenczi, Nelli; Lefringhausen, Katharina
2015-01-01
Many families worldwide have at least one member with a behavioral or mental disorder, and yet the majority of the public fails to correctly recognize symptoms of mental illness. Previous research has found that Mental Health Literacy (MHL)-the knowledge and positive beliefs about mental disorders-tends to be higher in European and North American cultures, compared to Asian and African cultures. Nonetheless quantitative research examining the variables that explain this cultural difference remains limited. The purpose of our study was fourfold: (a) to validate measures of MHL cross-culturally, (b) to examine the MHL model quantitatively, (c) to investigate cultural differences in the MHL model, and (d) to examine collectivism as a predictor of MHL. We validated measures of MHL in European American and Indian samples. The results lend strong quantitative support to the MHL model. Recognition of symptoms of mental illness was a central variable: greater recognition predicted greater endorsement of social causes of mental illness and endorsement of professional help-seeking as well as lesser endorsement of lay help-seeking. The MHL model also showed an overwhelming cultural difference; namely, lay help-seeking beliefs played a central role in the Indian sample, and a negligible role in the European American sample. Further, collectivism was positively associated with causal beliefs of mental illness in the European American sample, and with lay help-seeking beliefs in the Indian sample. These findings demonstrate the importance of understanding cultural differences in beliefs about mental illness, particularly in relation to help-seeking beliefs.
Havens, Karl E; Harwell, Matthew C; Brady, Mark A; Sharfstein, Bruce; East, Therese L; Rodusky, Andrew J; Anson, Daniel; Maki, Ryan P
2002-04-09
A spatially intensive sampling program was developed for mapping the submerged aquatic vegetation (SAV) over an area of approximately 20,000 ha in a large, shallow lake in Florida, U.S. The sampling program integrates Geographic Information System (GIS) technology with traditional field sampling of SAV and has the capability of producing robust vegetation maps under a wide range of conditions, including high turbidity, variable depth (0 to 2 m), and variable sediment types. Based on sampling carried out in August-September 2000, we measured 1,050 to 4,300 ha of vascular SAV species and approximately 14,000 ha of the macroalga Chara spp. The results were similar to those reported in the early 1990s, when the last large-scale SAV sampling occurred. Occurrence of Chara was strongly associated with peat sediments, and maximal depths of occurrence varied between sediment types (mud, sand, rock, and peat). A simple model of Chara occurrence, based only on water depth, had an accuracy of 55%. It predicted occurrence of Chara over large areas where the plant actually was not found. A model based on sediment type and depth had an accuracy of 75% and produced a spatial map very similar to that based on observations. While this approach needs to be validated with independent data in order to test its general utility, we believe it may have application elsewhere. The simple modeling approach could serve as a coarse-scale tool for evaluating effects of water level management on Chara populations.
Toma, Luiza; Stott, Alistair W; Heffernan, Claire; Ringrose, Siân; Gunn, George J
2013-03-01
The paper analyses the impact of a priori determinants of biosecurity behaviour of farmers in Great Britain. We use a dataset collected through a stratified telephone survey of 900 cattle and sheep farmers in Great Britain (400 in England and a further 250 in Wales and Scotland respectively) which took place between 25 March 2010 and 18 June 2010. The survey was stratified by farm type, farm size and region. To test the influence of a priori determinants on biosecurity behaviour, we used a behavioural economics method, structural equation modelling (SEM) with observed and latent variables. SEM is a statistical technique for testing and estimating causal relationships amongst variables, some of which may be latent, using a combination of statistical data and qualitative causal assumptions. Thirteen latent variables were identified and extracted, expressing the behaviour and the underlying determining factors. The variables were: experience, economic factors, organic certification of farm, membership in a cattle/sheep health scheme, perceived usefulness of biosecurity information sources, knowledge about biosecurity measures, perceived importance of specific biosecurity strategies, perceived effect (on farm business in the past five years) of welfare/health regulation, perceived effect of severe outbreaks of animal diseases, attitudes towards livestock biosecurity, attitudes towards animal welfare, influence on decision to apply biosecurity measures and biosecurity behaviour. The SEM model applied to the Great Britain sample has an adequate fit according to the measures of absolute, incremental and parsimonious fit.
The results suggest that farmers' perceived importance of specific biosecurity strategies, organic certification of farm, knowledge about biosecurity measures, attitudes towards animal welfare, perceived usefulness of biosecurity information sources, perceived effect on business during the past five years of severe outbreaks of animal diseases, membership in a cattle/sheep health scheme, attitudes towards livestock biosecurity, influence on decision to apply biosecurity measures, experience and economic factors significantly influence behaviour (overall explaining 64% of the variance in behaviour). Three other models were run for the individual regions (England, Scotland and Wales). A smaller number of variables were included in each model to account for the smaller sample sizes. Results show lower but still high levels of variance explained for the individual models (about 40% for each country). The individual models' results are consistent with those of the total sample model. The results suggest that ways to achieve behavioural change could include ensuring increased access of farmers to biosecurity information and advice sources. Copyright © 2012 Elsevier B.V. All rights reserved.
Ajala, E O; Aberuagba, F; Olaniyan, A M; Onifade, K R
2016-01-01
Shea butter (SB) was extracted from its kernel using n-hexane as solvent in an optimization study. The aims were to determine the optimal operating variables that would give the optimum yield of SB and to study the effect of the solvent on the physico-chemical properties and chemical composition of the SB extracted using n-hexane. A Box-Behnken response surface methodology (RSM) was used for the optimization study, while statistical analysis using ANOVA was used to test the significance of the process variables. The variables considered in this study were: sample weight (g), solvent volume (ml) and extraction time (min). The physico-chemical properties of the extracted SB were determined using standard methods, and Fourier transform infrared (FTIR) spectroscopy was used for the chemical composition. The RSM analysis showed that the three variables investigated had a significant effect (p < 0.05) on the % yield of SB, with R(2) = 0.8989, indicating a good fit of the second-order model. Based on this model, the optimal operating variables for the extraction process were established as: sample weight of 30.04 g, solvent volume of 346.04 ml and extraction time of 40 min, which gave a 66.90% yield of SB. Furthermore, the physico-chemical properties obtained showed that shea butter extracted using the traditional method (SBT) is a more suitable raw material for food, biodiesel production, cosmetics, and medicinal and pharmaceutical purposes than shea butter extracted using the solvent extraction method (SBS). The FTIR results obtained for the two samples were similar to those of other vegetable oils.
A simulation study on Bayesian Ridge regression models for several collinearity levels
NASA Astrophysics Data System (ADS)
Efendi, Achmad; Effrihan
2017-12-01
When analyzing data with a multiple regression model, predictor variables involved in collinearity are usually omitted from the model. Sometimes, however, for medical or economic reasons, all of the predictors are important and should be included in the model. Ridge regression is commonly used to cope with collinearity: in this modeling, weights (penalties) on the predictor variables are used when estimating parameters, and the estimation can follow the likelihood framework. A Bayesian version of this estimation is a further alternative. It has historically been less popular than the likelihood method because of computational difficulties, but with recent improvements in computational methodology this caveat is no longer a serious problem. This paper discusses a simulation study evaluating the characteristics of Bayesian Ridge regression parameter estimates, with simulation settings based on a variety of collinearity levels and sample sizes. The results show that the Bayesian method performs better for relatively small sample sizes, while for the other settings it performs similarly to the likelihood method.
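The mechanics of the comparison can be sketched with a toy simulation. This is a minimal sketch, not the paper's method: the closed-form posterior mean under a Gaussian prior with assumed known variances stands in for the full Bayesian machinery, and all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                  # small sample, where Bayesian gains are reported
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)     # x2 nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
beta_true = np.array([1.0, 1.0, 0.5])
sigma = 0.5
y = X @ beta_true + sigma * rng.normal(size=n)

# OLS: (X'X)^-1 X'y -- unstable along the nearly collinear direction.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Bayesian ridge posterior mean under beta ~ N(0, tau^2 I) and known
# noise variance sigma^2: (X'X + (sigma^2/tau^2) I)^-1 X'y.
tau = 1.0
lam = sigma**2 / tau**2
beta_bayes = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print("OLS:  ", np.round(beta_ols, 2))
print("Bayes:", np.round(beta_bayes, 2))
```

The prior shrinks the coefficient vector toward zero, which stabilizes the poorly identified collinear direction while leaving the well-identified combined effect (here, the sum of the two collinear coefficients) essentially unchanged.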
New scaling model for variables and increments with heavy-tailed distributions
NASA Astrophysics Data System (ADS)
Riva, Monica; Neuman, Shlomo P.; Guadagnini, Alberto
2015-06-01
Many hydrological (as well as diverse earth, environmental, ecological, biological, physical, social, financial and other) variables, Y, exhibit frequency distributions that are difficult to reconcile with those of their spatial or temporal increments, ΔY. Whereas distributions of Y (or its logarithm) are at times slightly asymmetric with relatively mild peaks and tails, those of ΔY tend to be symmetric with peaks that grow sharper, and tails that become heavier, as the separation distance (lag) between pairs of Y values decreases. No statistical model known to us captures these behaviors of Y and ΔY in a unified and consistent manner. We propose a new, generalized sub-Gaussian model that does so. We derive analytical expressions for probability distribution functions (pdfs) of Y and ΔY as well as corresponding leading statistical moments. In our model the peak and tails of the ΔY pdf scale with lag in line with observed behavior. The model allows one to estimate, accurately and efficiently, all relevant parameters by analyzing jointly sample moments of Y and ΔY. We illustrate key features of our new model and method of inference on synthetically generated samples and neutron porosity data from a deep borehole.
Preliminary assessment of factors influencing riverine fish communities in Massachusetts.
Armstrong, David S.; Richards, Todd A.; Brandt, Sara L.
2010-01-01
The U.S. Geological Survey, in cooperation with the Massachusetts Department of Conservation and Recreation (MDCR), Massachusetts Department of Environmental Protection (MDEP), and the Massachusetts Department of Fish and Game (MDFG), conducted a preliminary investigation of fish communities in small- to medium-sized Massachusetts streams. The objective of this investigation was to determine relations between fish-community characteristics and anthropogenic alteration, including flow alteration and impervious cover, relative to the effect of physical basin and land-cover (environmental) characteristics. Fish data were obtained for 756 fish-sampling sites from the Massachusetts Division of Fisheries and Wildlife fish-community database. A review of the literature was used to select a set of fish metrics responsive to flow alteration. Fish metrics tested include two fish-community metrics (fluvial-fish relative abundance and fluvial-fish species richness), and five indicator species metrics (relative abundance of brook trout, blacknose dace, fallfish, white sucker, and redfin pickerel). Streamflows were simulated for each fish-sampling site using the Sustainable Yield Estimator application (SYE). Daily streamflows and the SYE water-use database were used to determine a set of indicators of flow alteration, including percent alteration of August median flow, water-use intensity, and withdrawal and return-flow fraction. The contributing areas to the fish-sampling sites were delineated and used with a Geographic Information System (GIS) to determine a set of environmental characteristics, including elevation, basin slope, percent sand and gravel, percent wetland, and percent open water, and a set of anthropogenic-alteration variables, including impervious cover and dam density. 
Two analytical techniques, quantile regression and generalized linear modeling, were applied to determine the association between fish-response variables and the selected environmental and anthropogenic explanatory variables. Quantile regression indicated that flow alteration and impervious cover were negatively associated with both fluvial-fish relative abundance and fluvial-fish species richness. Three generalized linear models (GLMs) were developed to quantify the response of fish communities to multiple environmental and anthropogenic variables. Flow-alteration variables are statistically significant for the fluvial-fish relative-abundance model. Impervious cover is statistically significant for the fluvial-fish relative-abundance, fluvial-fish species richness, and brook trout relative-abundance models. The variables in the equations were demonstrated to be significant, and the variability explained by the models, as measured by the correlation between observed and predicted values, ranged from 39 to 65 percent. The GLMs indicated that, keeping all other variables the same, a one-unit (1 percent) increase in the percent depletion or percent surcharging of August median flow would result in a 0.4-percent decrease in the relative abundance (in counts per hour) of fluvial fish and that the relative abundance of fluvial fish was expected to be about 55 percent lower in net-depleted streams than in net-surcharged streams. The GLMs also indicated that a unit increase in impervious cover resulted in a 5.5-percent decrease in the relative abundance of fluvial fish and a 2.5-percent decrease in fluvial-fish species richness.
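The limiting-factor logic that motivates quantile regression in studies like this can be illustrated with synthetic data (all numbers hypothetical): when a covariate sets a ceiling on abundance rather than shifting its mean, an upper conditional quantile tracks the constraint while the mean is diluted by the scatter below the ceiling.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
impervious = rng.uniform(0, 40, size=n)        # % impervious cover (hypothetical)
# Abundance limited from above by impervious cover (a "ceiling" response):
ceiling = 100 - 2.0 * impervious
abundance = ceiling * rng.uniform(0, 1, size=n)

# The 0.9 conditional quantile, approximated here within covariate bins,
# tracks the declining ceiling.
bins = np.array([0, 10, 20, 30, 40])
q90 = [np.quantile(abundance[(impervious >= lo) & (impervious < hi)], 0.9)
       for lo, hi in zip(bins[:-1], bins[1:])]
print(np.round(q90, 1))   # declines across bins
```

A full quantile-regression fit (e.g. minimizing the pinball loss) would estimate this ceiling as a continuous function of the covariate; the binned quantiles above are just the simplest demonstration of the idea.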
Werneke, Mark W; Edmond, Susan; Deutscher, Daniel; Ward, Jason; Grigsby, David; Young, Michelle; McGill, Troy; McClenahan, Brian; Weinberg, Jon; Davidow, Amy L
2016-09-01
Study Design Retrospective cohort. Background Patient-classification subgroupings may be important prognostic factors explaining outcomes. Objectives To determine effects of adding classification variables (McKenzie syndrome and pain patterns, including centralization and directional preference; Symptom Checklist Back Pain Prediction Model [SCL BPPM]; and the Fear-Avoidance Beliefs Questionnaire subscales of work and physical activity) to a baseline risk-adjusted model predicting functional status (FS) outcomes. Methods Consecutive patients completed a battery of questionnaires that gathered information on 11 risk-adjustment variables. Physical therapists trained in Mechanical Diagnosis and Therapy methods classified each patient by McKenzie syndromes and pain pattern. Functional status was assessed at discharge by patient-reported outcomes. Only patients with complete data were included. Risk of selection bias was assessed. Prediction of discharge FS was assessed using linear stepwise regression models, allowing 13 variables to enter the model. Significant variables were retained in subsequent models. Model power (R(2)) and beta coefficients for model variables were estimated. Results Two thousand sixty-six patients with lumbar impairments were evaluated. Of those, 994 (48%), 10 (<1%), and 601 (29%) were excluded due to incomplete psychosocial data, McKenzie classification data, and missing FS at discharge, respectively. The final sample for analyses was 723 (35%). Overall R(2) for the baseline prediction FS model was 0.40. Adding classification variables to the baseline model did not result in significant increases in R(2). McKenzie syndrome or pain pattern explained 2.8% and 3.0% of the variance, respectively. When pain pattern and SCL BPPM were added simultaneously, overall model R(2) increased to 0.44. 
Although none of these increases in R(2) were significant, some classification variables were stronger predictors compared with some other variables included in the baseline model. Conclusion The small added prognostic capabilities identified when combining McKenzie or pain-pattern classifications with the SCL BPPM classification did not significantly improve prediction of FS outcomes in this study. Additional research is warranted to investigate the importance of classification variables compared with those used in the baseline model to maximize predictive power. Level of Evidence Prognosis, level 4. J Orthop Sports Phys Ther 2016;46(9):726-741. Epub 31 Jul 2016. doi:10.2519/jospt.2016.6266.
Lerdal, Anners; Kottorp, Anders; Gay, Caryl; Aouizerat, Bradley E; Portillo, Carmen J; Lee, Kathryn A
2011-11-01
To examine the psychometric properties of the 9-item Fatigue Severity Scale (FSS) using a Rasch model application. A convenience sample of HIV-infected adults was recruited, and a subset of the sample was assessed at 6-month intervals for 2 years. Socio-demographic, clinical, and symptom data were collected by self-report questionnaires. CD4 T-cell count and viral load measures were obtained from medical records. The Rasch analysis included 316 participants with 698 valid questionnaires. FSS item 2 did not advance monotonically, and items 1 and 2 did not show acceptable goodness-of-fit to the Rasch model. A reduced 7-item FSS version demonstrated acceptable goodness-of-fit and explained 61.2% of the total variance in the scale. In the 7-item FSS version, no uniform Differential Item Functioning was found in relation to time of evaluation or to any of the socio-demographic or clinical variables. This study demonstrated that the FSS-7 has better psychometric properties than the FSS-9 in this HIV sample and that responses to the different items are comparable over time and unrelated to socio-demographic and clinical variables.
Zhang, Houxi; Zhuang, Shunyao; Qian, Haiyan; Wang, Feng; Ji, Haibao
2015-01-01
Understanding the spatial variability of soil organic carbon (SOC) must be enhanced to improve sampling design and to develop soil management strategies in terrestrial ecosystems. Moso bamboo (Phyllostachys pubescens Mazel ex Houz.) forests have a high SOC storage potential; however, they also vary significantly spatially. This study investigated the spatial variability of SOC (0-20 cm) in association with other soil properties and with spatial variables in the Moso bamboo forests of Jian’ou City, a typical bamboo-producing area in China. A total of 209 soil samples were collected from Moso bamboo stands and then analyzed for SOC, bulk density (BD), pH, cation exchange capacity (CEC), and gravel content (GC) based on spatial distribution. The spatial variability of SOC was then examined using geostatistics. A Kriging map was produced through ordinary interpolation, and the required sample numbers were calculated by classical and Kriging methods. An aggregated boosted tree (ABT) analysis was also conducted. A semivariogram analysis indicated that ln(SOC) was best fitted with an exponential model and that it exhibited moderate spatial dependence, with a nugget/sill ratio of 0.462. SOC was significantly and linearly correlated with BD (r = −0.373**), pH (r = −0.429**), GC (r = −0.163*), CEC (r = 0.263**), and elevation (r = 0.192**). Moreover, the Kriging method requires fewer samples than the classical method for a given expected standard error level, as per a variance analysis. The ABT analysis indicated that the physicochemical variables of soil affected SOC variation more significantly than spatial variables did, thus suggesting that the SOC in Moso bamboo forests can be strongly influenced by management practices. Thus, this study provides valuable information on sampling strategy and insight into the potential of adjustments in agronomic measures, such as fertilization, for Moso bamboo production. PMID:25789615
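As a minimal sketch of the exponential semivariogram model used in analyses like this one: gamma(h) = c0 + (c - c0)(1 - exp(-h/a)), where c0 is the nugget, c the sill, and a the range parameter. The parameter values below are illustrative, not the fitted ones (only the nugget/sill ratio of 0.462 is taken from the abstract), and note that some software reports the effective range as 3a.

```python
import numpy as np

def exp_semivariogram(h, nugget, sill, a):
    """Exponential model: gamma(h) = c0 + (c - c0) * (1 - exp(-h / a))."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-h / a))

# Illustrative parameters; nugget/sill = 0.462 indicates moderate
# spatial dependence (strong < 0.25, moderate 0.25-0.75, weak > 0.75).
nugget, sill, a = 0.462, 1.0, 300.0   # a = range parameter in metres (assumed)
h = np.array([0.0, 100.0, 300.0, 900.0])
gamma = exp_semivariogram(h, nugget, sill, a)
print(np.round(gamma, 3))
print("nugget/sill ratio:", nugget / sill)
```

The semivariance rises from the nugget at zero lag toward the sill; samples separated by less than the range are spatially correlated, which is what justifies Kriging needing fewer samples than classical random sampling for the same precision.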
A search for optical variability of type 2 quasars in SDSS stripe 82
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barth, Aaron J.; Carson, Daniel J.; Voevodkin, Alexey
Hundreds of Type 2 quasars have been identified in Sloan Digital Sky Survey (SDSS) data, and there is substantial evidence that they are generally galaxies with highly obscured central engines, in accord with unified models for active galactic nuclei (AGNs). A straightforward expectation of unified models is that highly obscured Type 2 AGNs should show little or no optical variability on timescales of days to years. As a test of this prediction, we have carried out a search for variability in Type 2 quasars in SDSS Stripe 82 using difference-imaging photometry. Starting with the Type 2 AGN catalogs of Zakamska et al. and Reyes et al., we find evidence of significant g-band variability in 17 out of 173 objects for which light curves could be measured from the Stripe 82 data. To determine the nature of this variability, we obtained new Keck spectropolarimetry observations for seven of these variable AGNs. The Keck data show that these objects have low continuum polarizations (p ≲ 1% in most cases) and all seven have broad Hα and/or Mg II emission lines in their total (unpolarized) spectra, indicating that they should actually be classified as Type 1 AGNs. We conclude that the primary reason variability is found in the SDSS-selected Type 2 AGN samples is that these samples contain a small fraction of Type 1 AGNs as contaminants, and it is not necessary to invoke more exotic possible explanations such as a population of 'naked' or unobscured Type 2 quasars. Aside from misclassified Type 1 objects, the Type 2 quasars do not generally show detectable optical variability over the duration of the Stripe 82 survey.
Effects of parceling on model selection: Parcel-allocation variability in model ranking.
Sterba, Sonya K; Rights, Jason D
2017-03-01
Research interest often lies in comparing structural model specifications implying different relationships among latent factors. In this context parceling is commonly accepted, assuming the item-level measurement structure is well known and, conservatively, assuming items are unidimensional in the population. Under these assumptions, researchers compare competing structural models, each specified using the same parcel-level measurement model. However, little is known about consequences of parceling for model selection in this context, including whether and when model ranking could vary across alternative item-to-parcel allocations within-sample. This article first provides a theoretical framework that predicts the occurrence of parcel-allocation variability (PAV) in model selection index values and its consequences for PAV in ranking of competing structural models. These predictions are then investigated via simulation. We show that conditions known to manifest PAV in absolute fit of a single model may or may not manifest PAV in model ranking. Thus, one cannot assume that low PAV in absolute fit implies a lack of PAV in ranking, and vice versa. PAV in ranking is shown to occur under a variety of conditions, including large samples. To provide an empirically supported strategy for selecting a model when PAV in ranking exists, we draw on relationships between structural model rankings in parcel- versus item-level solutions. This strategy employs the across-allocation modal ranking. We developed software tools for implementing this strategy in practice, and illustrate them with an example. Even if a researcher has substantive reason to prefer one particular allocation, investigating PAV in ranking within-sample still provides an informative sensitivity analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
NASA Astrophysics Data System (ADS)
Bogunović, Igor; Pereira, Paulo; Šeput, Miranda
2016-04-01
Soil organic carbon (SOC), pH, available phosphorus (P), and potassium (K) are among the most important factors for soil fertility. These soil parameters are highly variable in space and time, with implications for crop production. The aim of this work is to study the spatial variability of SOC, pH, P and K in an organic farm located in the river Rasa valley (Croatia). A regular grid (100 x 100 m) was designed and 182 samples were collected on Silty Clay Loam soil. P, K and SOC showed moderate heterogeneity, with coefficients of variation (CV) of 21.6%, 32.8% and 51.9%, respectively. Soil pH recorded low spatial variability, with a CV of 1.5%. Soil pH, P and SOC did not follow a normal distribution; only after a Box-Cox transformation did the data meet the normality requirements. Directional exponential models were the best fitted and were used to describe spatial autocorrelation. Soil pH, P and SOC showed strong spatial dependence, with nugget-to-sill ratios of 13.78%, 0.00% and 20.29%, respectively. Only K recorded moderate spatial dependence. Semivariogram ranges indicate that the future sampling interval could be 150-200 m in order to reduce sampling costs. Fourteen different interpolation models for mapping soil properties were tested; the method with the lowest root mean square error was considered the most appropriate for mapping each variable. The results showed that radial basis function models (Spline with Tension and Completely Regularized Spline) were the best predictors for P and K, while Thin Plate Spline and inverse distance weighting models were the least accurate. The best interpolator for pH and SOC was the local polynomial with a power of 1, while the least accurate was the Thin Plate Spline. According to the soil nutrient maps, the investigated area has a very rich K supply, while the P supply is insufficient over the largest part of the area.
The soil pH maps showed a mostly neutral reaction, while individual patches of alkaline soil indicate possible seawater intrusion and salt accumulation in the soil profile. Future research should focus on spatial patterns of soil pH, electrical conductivity and sodium adsorption ratio. Keywords: geostatistics, semivariogram, interpolation models, soil chemical properties
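Inverse distance weighting, one of the interpolators tested in studies of this kind, is simple enough to sketch directly. The observation coordinates and pH values below are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=1.0, eps=1e-12):
    """Inverse distance weighting: each prediction is a weighted average
    of the observations, with weights 1/d^power."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / (d + eps) ** power
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

# Hypothetical pH observations on a 100 m grid (illustrative values only).
pts = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
ph = np.array([7.0, 7.2, 6.9, 7.4])
grid = np.array([[50.0, 50.0], [10.0, 10.0]])
print(np.round(idw(pts, ph, grid), 3))
```

Because IDW is a convex combination of the observations, predictions can never exceed the observed range; at the grid center, equidistant from all four points, the prediction is simply their mean. This bounded, smoothing behavior is one reason IDW can underperform splines or local polynomials on data with strong trends.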
Saad, David A.; Schwarz, Gregory E.; Robertson, Dale M.; Booth, Nathaniel
2011-01-01
Stream-loading information was compiled from federal, state, and local agencies, and selected universities as part of an effort to develop regional SPAtially Referenced Regressions On Watershed attributes (SPARROW) models to help describe the distribution, sources, and transport of nutrients in streams throughout much of the United States. After screening, 2,739 sites, sampled by 73 agencies, were identified as having suitable data for calculating the long-term mean annual nutrient loads required for SPARROW model calibration. These sites had a wide range of nutrient concentrations, loads, and yields, and of environmental characteristics in their basins. An analysis of the accuracy in load estimates relative to site attributes indicated that accuracy in loads improves with increases in the number of observations, the proportion of uncensored data, and the variability in flow on observation days, whereas accuracy declines with increases in the root mean square error of the water-quality model, the flow-bias ratio, the number of days between samples, the variability in daily streamflow for the prediction period, and if the load estimate has been detrended. Based on compiled data, all areas of the country had recent declines in the number of sites with sufficient water-quality data to compute accurate annual loads and support regional modeling analyses. These declines were caused by decreases in the number of sites being sampled and data not being entered in readily accessible databases.
Efficient robust doubly adaptive regularized regression with applications.
Karunamuni, Rohana J; Kong, Linglong; Tu, Wei
2018-01-01
We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.
NASA Astrophysics Data System (ADS)
Yu, Huiling; Liang, Hao; Lin, Xue; Zhang, Yizhuo
2018-04-01
A nondestructive methodology is proposed to determine the modulus of elasticity (MOE) of Fraxinus mandschurica samples by using near-infrared (NIR) spectroscopy. The test data consisted of 150 NIR absorption spectra of the wood samples obtained using an NIR spectrometer, with a wavelength range of 900 to 1900 nm. To eliminate high-frequency noise and systematic variations in the baseline, Savitzky-Golay convolution combined with standard normal variate and detrending transformation was applied as the data pretreatment method. Uninformative variable elimination (UVE), improved by an evolutionary Monte Carlo (EMC) algorithm and the successive projections algorithm (SPA), selected three characteristic variables from the full 117 variables. The predictive ability of the models was evaluated in terms of the root-mean-square error of prediction (RMSEP) and the coefficient of determination (Rp2) in the prediction set. In comparison with the predicted results of all the models established in the experiments, UVE-EMC-SPA-LS-SVM presented the best results, with the smallest RMSEP of 0.652 and the highest Rp2 of 0.887. Thus, it is feasible to accurately determine the MOE of F. mandschurica using NIR spectroscopy.
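The pretreatment pipeline (Savitzky-Golay smoothing followed by standard normal variate transformation) can be sketched on synthetic spectra, assuming SciPy is available. The band shape, noise levels, and matrix dimensions below are illustrative stand-ins, not the study's data.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(2)
# Synthetic stand-in for NIR absorbance spectra: 10 samples x 117 variables,
# each with random additive/multiplicative baseline effects plus noise.
wave = np.linspace(900, 1900, 117)
band = np.exp(-((wave - 1450) / 120) ** 2)            # an assumed absorption band
spectra = (rng.uniform(0.8, 1.2, (10, 1)) * band
           + rng.uniform(-0.1, 0.1, (10, 1))
           + 0.01 * rng.normal(size=(10, 117)))

# Savitzky-Golay smoothing suppresses high-frequency noise ...
smooth = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)
# ... then standard normal variate (SNV) removes per-sample baseline
# offset and scale by centering and scaling each spectrum.
snv = (smooth - smooth.mean(axis=1, keepdims=True)) / smooth.std(axis=1, keepdims=True)

print(np.round(snv.mean(axis=1), 6))   # each row now has mean ~0
```

After SNV every spectrum has zero mean and unit standard deviation, so the multiplicative and additive scatter effects no longer dominate the variance that downstream variable selection (UVE/SPA) and regression see.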
Incorporating social anxiety into a model of college problem drinking: replication and extension.
Ham, Lindsay S; Hope, Debra A
2006-09-01
Although research has found an association between social anxiety and alcohol use in noncollege samples, results have been mixed for college samples. College students face many novel social situations in which they may drink to reduce social anxiety. In the current study, the authors tested a model of college problem drinking, incorporating social anxiety and related psychosocial variables among 228 undergraduate volunteers. According to structural equation modeling (SEM) results, social anxiety was unrelated to alcohol use and was negatively related to drinking consequences. Perceived drinking norms mediated the social anxiety-alcohol use relation and was the variable most strongly associated with problem drinking. College students appear to be unique with respect to drinking and social anxiety. Although the notion of social anxiety alone as a risk factor for problem drinking was unsupported, additional research is necessary to determine whether there is a subset of socially anxious students who have high drinking norms and are in need of intervention. ((c) 2006 APA, all rights reserved).
Variable classification in the LSST era: exploring a model for quasi-periodic light curves
NASA Astrophysics Data System (ADS)
Zinn, J. C.; Kochanek, C. S.; Kozłowski, S.; Udalski, A.; Szymański, M. K.; Soszyński, I.; Wyrzykowski, Ł.; Ulaczyk, K.; Poleski, R.; Pietrukowicz, P.; Skowron, J.; Mróz, P.; Pawlak, M.
2017-06-01
The Large Synoptic Survey Telescope (LSST) is expected to yield ~10^7 light curves over the course of its mission, which will require a concerted effort in automated classification. Stochastic processes provide one means of quantitatively describing variability with the potential advantage over simple light-curve statistics that the parameters may be physically meaningful. Here, we survey a large sample of periodic, quasi-periodic and stochastic Optical Gravitational Lensing Experiment-III variables using the damped random walk (DRW; CARMA(1,0)) and quasi-periodic oscillation (QPO; CARMA(2,1)) stochastic process models. The QPO model is described by an amplitude, a period and a coherence time-scale, while the DRW has only an amplitude and a time-scale. We find that the periodic and quasi-periodic stellar variables are generally better described by a QPO than a DRW, while quasars are better described by the DRW model. There are ambiguities in interpreting the QPO coherence time due to non-sinusoidal light-curve shapes, signal-to-noise ratio, error mischaracterizations and cadence. Higher order implementations of the QPO model that better capture light-curve shapes are necessary for the coherence time to have its implied physical meaning. Independent of physical meaning, the extra parameter of the QPO model successfully distinguishes most of the classes of periodic and quasi-periodic variables we consider.
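The damped random walk is mathematically an Ornstein-Uhlenbeck process, and its exact discrete-time update makes it easy to simulate on an irregular cadence. The sketch below is illustrative (parameter values and cadence are assumptions, not the survey's), but the update rule is the standard exact OU recursion.

```python
import numpy as np

rng = np.random.default_rng(3)

def drw_light_curve(t, tau, amp, mean_mag=19.0):
    """Damped random walk (CARMA(1,0) / Ornstein-Uhlenbeck) sampler.
    tau: damping time-scale; amp: asymptotic standard deviation.
    Uses the exact conditional distribution between observation times."""
    x = np.empty(len(t))
    x[0] = rng.normal(0.0, amp)
    for i in range(1, len(t)):
        r = np.exp(-(t[i] - t[i - 1]) / tau)          # decay over the gap
        x[i] = r * x[i - 1] + rng.normal(0.0, amp * np.sqrt(1.0 - r * r))
    return mean_mag + x

t = np.cumsum(rng.uniform(1.0, 5.0, size=5000))       # irregular cadence, days
lc = drw_light_curve(t, tau=200.0, amp=0.2)
print(f"scatter: {lc.std():.3f} mag")                 # approaches amp for long baselines
```

The QPO (CARMA(2,1)) model adds an oscillatory kernel with a period and coherence time on top of this damping behavior; fitting either model to real light curves is typically done with Kalman-filter or Gaussian-process likelihoods rather than direct simulation.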
Modeling workplace bullying using catastrophe theory.
Escartin, J; Ceja, L; Navarro, J; Zapf, D
2013-10-01
Workplace bullying is defined as negative behaviors directed at organizational members or their work context that occur regularly and repeatedly over a period of time. Employees' perceptions of psychosocial safety climate, workplace bullying victimization, and workplace bullying perpetration were assessed within a sample of nearly 5,000 workers. Linear and nonlinear approaches were applied in order to model both continuous and sudden changes in workplace bullying. More specifically, the present study examines whether a nonlinear dynamical systems model (i.e., a cusp catastrophe model) is superior to the linear combination of variables for predicting the effect of psychosocial safety climate and workplace bullying victimization on workplace bullying perpetration. According to the AICc and BIC indices, the linear regression model fits the data better than the cusp catastrophe model. The study concludes that some phenomena, especially unhealthy behaviors at work (like workplace bullying), may be better studied using linear approaches as opposed to nonlinear dynamical systems models. This can be explained through the healthy variability hypothesis, which argues that positive organizational behavior is likely to present nonlinear behavior, while a decrease in such variability may indicate the occurrence of negative behaviors at work.
General methods for sensitivity analysis of equilibrium dynamics in patch occupancy models
Miller, David A.W.
2012-01-01
Sensitivity analysis is a useful tool for the study of ecological models that has many potential applications for patch occupancy modeling. Drawing from the rich foundation of existing methods for Markov chain models, I demonstrate new methods for sensitivity analysis of the equilibrium state dynamics of occupancy models. Estimates from three previous studies are used to illustrate the utility of the sensitivity calculations: a joint occupancy model for a prey species, its predators, and habitat used by both; occurrence dynamics from a well-known metapopulation study of three butterfly species; and Golden Eagle occupancy and reproductive dynamics. I show how to deal efficiently with multistate models and how to calculate sensitivities involving derived state variables and lower-level parameters. In addition, I extend methods to incorporate environmental variation by allowing for spatial and temporal variability in transition probabilities. The approach used here is concise and general and can fully account for environmental variability in transition parameters. The methods can be used to improve inferences in occupancy studies by quantifying the effects of underlying parameters, aiding prediction of future system states, and identifying priorities for sampling effort.
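For the simplest two-state (unoccupied/occupied) patch model, the equilibrium sensitivities the abstract describes can be checked directly. The sketch below, with made-up colonization and extinction probabilities, recovers the analytic sensitivity d(psi*)/d(gamma) = eps/(gamma+eps)^2 numerically from the transition matrix:

```python
import numpy as np

def equilibrium_occupancy(gamma, eps):
    """Stationary occupancy of a two-state patch model with colonization
    probability gamma and extinction probability eps; psi* = gamma/(gamma+eps)."""
    P = np.array([[1 - gamma, gamma],
                  [eps, 1 - eps]])             # rows: unoccupied, occupied
    vals, vecs = np.linalg.eig(P.T)            # left eigenvector of P for eigenvalue 1
    stat = np.real(vecs[:, np.argmax(np.real(vals))])
    stat = stat / stat.sum()                   # normalize to a probability vector
    return stat[1]                             # probability of "occupied"

gamma, eps, h = 0.2, 0.1, 1e-6
numeric = (equilibrium_occupancy(gamma + h, eps)
           - equilibrium_occupancy(gamma - h, eps)) / (2 * h)   # central difference
analytic = eps / (gamma + eps) ** 2
print(round(numeric, 4), round(analytic, 4))   # both ≈ 1.1111
```

Multistate and environmentally varying versions replace the 2x2 matrix with larger, time-varying ones, but the eigenvector-plus-perturbation logic is the same.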
Simplified rotor load models and fatigue damage estimates for offshore wind turbines.
Muskulus, M
2015-02-28
The aim of rotor load models is to characterize and generate the thrust loads acting on an offshore wind turbine. Ideally, the rotor simulation can be replaced by time series from a model with a few parameters and state variables only. Such models are used extensively in control system design and, as a potentially new application area, structural optimization of support structures. Different rotor load models are here evaluated for a jacket support structure in terms of fatigue lifetimes of relevant structural variables. All models were found to be lacking in accuracy, with differences of more than 20% in fatigue load estimates. The most accurate models were the use of an effective thrust coefficient determined from a regression analysis of dynamic thrust loads, and a novel stochastic model in state-space form. The stochastic model explicitly models the quasi-periodic components obtained from rotational sampling of turbulent fluctuations. Its state variables follow a mean-reverting Ornstein-Uhlenbeck process. Although promising, more work is needed on how to determine the parameters of the stochastic model before accurate lifetime predictions can be obtained without comprehensive rotor simulations.
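The paper's state-space model is not reproduced here, but its two ingredients, a mean-reverting Ornstein-Uhlenbeck fluctuation and quasi-periodic components from rotational sampling, can be caricatured in a few lines. Every parameter value below is invented for illustration; a real model would calibrate them against rotor simulations.

```python
import numpy as np

def thrust_series(t, mean_kN=800.0, tau=5.0, sd=60.0, amp_3p=40.0, f_rot=0.2, seed=0):
    """Toy thrust load [kN]: a mean value, a mean-reverting OU fluctuation
    standing in for turbulence, and a 3P harmonic mimicking rotational
    sampling by a three-bladed rotor. All parameter values are made up."""
    rng = np.random.default_rng(seed)
    dt = t[1] - t[0]
    decay = np.exp(-dt / tau)
    step_sd = sd * np.sqrt(1.0 - decay**2)     # exact OU innovation scale
    x = np.zeros(len(t))
    for i in range(1, len(t)):
        x[i] = x[i - 1] * decay + rng.normal(0.0, step_sd)
    return mean_kN + x + amp_3p * np.sin(2.0 * np.pi * 3.0 * f_rot * t)

t = np.arange(0.0, 600.0, 0.1)                 # a ten-minute record at 10 Hz
load = thrust_series(t)
```

Feeding such a series through a rainflow count and an S-N curve is how the fatigue comparison in the paper would then proceed.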
Rochman, Chelsea M; Lewison, Rebecca L; Eriksen, Marcus; Allen, Harry; Cook, Anna-Marie; Teh, Swee J
2014-04-01
The accumulation of plastic debris in pelagic habitats of the subtropical gyres is a global phenomenon of growing concern, particularly with regard to wildlife. When animals ingest plastic debris that is associated with chemical contaminants, they are at risk of bioaccumulating hazardous pollutants. We examined the relationship between the bioaccumulation of hazardous chemicals in myctophid fish associated with plastic debris and plastic contamination in remote and previously unmonitored pelagic habitats in the South Atlantic Ocean. Using a published model, we defined three sampling zones where accumulated densities of plastic debris were predicted to differ. Contrary to model predictions, we found variable levels of plastic debris density across all stations within the sampling zones. Mesopelagic lanternfishes, sampled from each station and analyzed for bisphenol A (BPA), alkylphenols, alkylphenol ethoxylates, polychlorinated biphenyls (PCBs) and polybrominated diphenyl ethers (PBDEs), exhibited variability in contaminant levels, but this variability was not related to plastic debris density for most of the targeted compounds, with the exception of PBDEs. We found that myctophids sampled at stations with greater plastic densities did have significantly larger concentrations of BDE#s 183-209 in their tissues, suggesting that higher brominated congeners of PBDEs, added to plastics as flame retardants, are indicative of plastic contamination in the marine environment. Our results provide data on a previously unsampled pelagic gyre and highlight the challenges associated with characterizing plastic debris accumulation and associated risks to wildlife.
Wiedermann, Wolfgang; Li, Xintong
2018-04-16
In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.
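The core DDA intuition, that the mis-directed regression inherits the non-normality of the true cause while the correctly directed one leaves only the (normal) error, can be demonstrated without the SPSS macros. A minimal numpy sketch with a skewed simulated cause and normal errors:

```python
import numpy as np

def skewness(r):
    """Sample skewness (third standardized moment)."""
    r = r - r.mean()
    return float((r**3).mean() / (r**2).mean() ** 1.5)

def ols_residuals(pred, out):
    """Residuals of a simple OLS regression of `out` on `pred`."""
    X = np.column_stack([np.ones(len(pred)), pred])
    beta, *_ = np.linalg.lstsq(X, out, rcond=None)
    return out - X @ beta

rng = np.random.default_rng(2)
n = 5000
x = rng.exponential(1.0, n)                # skewed putative cause
y = 0.7 * x + rng.normal(0.0, 1.0, n)      # normal error, so the true direction is x -> y

skew_true = abs(skewness(ols_residuals(x, y)))   # residuals of the correctly directed model
skew_rev = abs(skewness(ols_residuals(y, x)))    # residuals of the reversed model
print(skew_true < skew_rev)  # → True: the mis-directed model inherits the cause's skewness
```

Full DDA combines this residual-distribution check with checks on the observed-variable distributions and predictor-residual independence, as the abstract describes.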
Anderson, Chauncey W.; Rounds, Stewart A.
2010-01-01
Management of water quality in streams of the United States is becoming increasingly complex as regulators seek to control aquatic pollution and ecological problems through Total Maximum Daily Load programs that target reductions in the concentrations of certain constituents. Sediment, nutrients, and bacteria, for example, are constituents that regulators target for reduction nationally and in the Tualatin River basin, Oregon. These constituents require laboratory analysis of discrete samples for definitive determinations of concentrations in streams. Recent technological advances in the nearly continuous, in situ monitoring of related water-quality parameters have fostered the use of these parameters as surrogates for the labor intensive, laboratory-analyzed constituents. Although these correlative techniques have been successful in large rivers, it was unclear whether they could be applied successfully in tributaries of the Tualatin River, primarily because these streams tend to be small, have rapid hydrologic response to rainfall and high streamflow variability, and may contain unique sources of sediment, nutrients, and bacteria. This report evaluates the feasibility of developing correlative regression models for predicting dependent variables (concentrations of total suspended solids, total phosphorus, and Escherichia coli bacteria) in two Tualatin River basin streams: one draining highly urbanized land (Fanno Creek near Durham, Oregon) and one draining rural agricultural land (Dairy Creek at Highway 8 near Hillsboro, Oregon), during 2002-04. An important difference between these two streams is their response to storm runoff; Fanno Creek has a relatively rapid response due to extensive upstream impervious areas and Dairy Creek has a relatively slow response because of the large amount of undeveloped upstream land. Four other stream sites also were evaluated, but in less detail. 
Potential explanatory variables included continuously monitored streamflow (discharge), stream stage, specific conductance, turbidity, and time (to account for seasonal processes). Preliminary multiple-regression models were identified using stepwise regression and Mallows's Cp, which balances model fit against the loss of additional degrees of freedom when extra explanatory variables are used. Several data scenarios were created and evaluated for each site to assess the representativeness of existing monitoring data and autosampler-derived data, and to assess the utility of the available data to develop robust predictive models. The goodness-of-fit of candidate predictive models was assessed with diagnostic statistics from validation exercises that compared predictions against a subset of the available data. The regression modeling met with mixed success. Functional model forms that have a high likelihood of success were identified for most (but not all) dependent variables at each site, but there were limitations in the available datasets, notably the lack of samples from high flows. These limitations increase the uncertainty in the predictions of the models and suggest that the models are not yet ready for use in assessing these streams, particularly under high-flow conditions, without additional data collection and recalibration of model coefficients. Nonetheless, the results reveal opportunities to use existing resources more efficiently. Baseline conditions are well represented in the available data, and, for the most part, the models reproduced these conditions well. Future sampling might therefore focus on high-flow conditions, without much loss of ability to characterize the baseline. 
Seasonal cycles, as represented by trigonometric functions of time, were not significant in the evaluated models, perhaps because the baseline conditions are well characterized in the datasets or because the other explanatory variables indirectly incorporate seasonal aspects. Multicollinearity among independent variables…
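Mallows's Cp, used above for subset selection, is straightforward to compute from residual sums of squares. A sketch on synthetic data, with variable names (discharge, turbidity, conductance) only loosely echoing the report's predictors; the data-generating model and all values are invented:

```python
import numpy as np
from itertools import combinations

def fit_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

rng = np.random.default_rng(3)
n = 300
Q, turb, cond = rng.lognormal(0.0, 0.4, (3, n))   # synthetic discharge, turbidity, conductance
# suspended-solids surrogate: turbidity and flow matter, conductance does not
tss = 5 + 1.5 * np.log(Q) + 3 * np.log(turb) + rng.normal(0, 0.5, n)

names = ["logQ", "logTurb", "logCond"]
cols = [np.log(Q), np.log(turb), np.log(cond)]
X_full = np.column_stack([np.ones(n)] + cols)
s2_full = fit_rss(X_full, tss) / (n - X_full.shape[1])   # error variance from the full model

cp = {}
for r in range(1, 4):
    for idx in combinations(range(3), r):
        X = np.column_stack([np.ones(n)] + [cols[i] for i in idx])
        p = X.shape[1]                                   # parameters incl. intercept
        cp[tuple(names[i] for i in idx)] = fit_rss(X, tss) / s2_full + 2 * p - n
# subsets with Cp close to p are nearly unbiased; Cp >> p signals a missing predictor
```

For the full model Cp equals p by construction, so the criterion is informative only for the smaller subsets.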
Liang, Shih-Hsiung; Walther, Bruno Andreas; Shieh, Bao-Sen
2017-01-01
Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 species escaped to the field successfully (introduction success); of these 95 introduced species, 36 species reproduced in the field of Taiwan successfully (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. 
Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that, for both the establishment model and introduction model, the number of invaded countries was the most important or second most important determinant, respectively. Therefore, we suggest that future success for introduction and establishment of exotic birds may be gauged by simply looking at previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averaged but either minimum or maximum values of species traits. Therefore, we suggest that in addition to averaged values, reproductive potential represented by minimum and maximum values of species traits should be considered in invasion studies.
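AUROC, the headline performance measure in this comparison, reduces to a Mann-Whitney rank statistic: the probability that a randomly chosen success outscores a randomly chosen failure. A self-contained implementation (independent of any particular decision tree library):

```python
import numpy as np

def auroc(y_true, scores):
    """Rank-based AUROC: probability that a random positive outscores
    a random negative; tied scores receive average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(s, kind="mergesort")
    ranks = np.empty(len(s), dtype=float)
    ranks[order] = np.arange(1, len(s) + 1)
    for v in np.unique(s):                 # average ranks over ties
        mask = s == v
        ranks[mask] = ranks[mask].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# toy example: 1 = established species, scores from some hypothetical classifier
y = [1, 1, 1, 0, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(auroc(y, s))  # → 0.9166666666666666 (11 of 12 positive/negative pairs correctly ordered)
```

A rank-based AUROC is also robust to monotone rescaling of the scores, which is why it is a sensible common yardstick across the five tree models.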
Belsey, Natalie A; Cant, David J H; Minelli, Caterina; Araujo, Joyce R; Bock, Bernd; Brüner, Philipp; Castner, David G; Ceccone, Giacomo; Counsell, Jonathan D P; Dietrich, Paul M; Engelhard, Mark H; Fearn, Sarah; Galhardo, Carlos E; Kalbe, Henryk; Won Kim, Jeong; Lartundo-Rojas, Luis; Luftman, Henry S; Nunney, Tim S; Pseiner, Johannes; Smith, Emily F; Spampinato, Valentina; Sturm, Jacobus M; Thomas, Andrew G; Treacy, Jon P W; Veith, Lothar; Wagstaffe, Michael; Wang, Hai; Wang, Meiling; Wang, Yung-Chen; Werner, Wolfgang; Yang, Li; Shard, Alexander G
2016-10-27
We report the results of a VAMAS (Versailles Project on Advanced Materials and Standards) inter-laboratory study on the measurement of the shell thickness and chemistry of nanoparticle coatings. Peptide-coated gold particles were supplied to laboratories in two forms: as a colloidal suspension in pure water, and as particles dried onto a silicon wafer. Participants prepared and analyzed these samples using either X-ray photoelectron spectroscopy (XPS) or low energy ion scattering (LEIS). Careful data analysis revealed some significant sources of discrepancy, particularly for XPS. Degradation during transportation, storage or sample preparation resulted in a variability in thickness of 53 %. The calculation method chosen by XPS participants contributed a variability of 67 %. However, variability of 12 % was achieved for the samples deposited using a single method and by choosing photoelectron peaks that were not adversely affected by instrumental transmission effects. The study identified a need for more consistency in instrumental transmission functions and relative sensitivity factors, since this contributed a variability of 33 %. The results from the LEIS participants were more consistent, with variability of less than 10 % in thickness, and this is mostly due to a common method of data analysis. The calculation was performed using a model developed for uniform, flat films, and some participants employed a correction factor to account for the sample geometry, which appears warranted based upon a simulation of LEIS data from one of the participants and comparison to the XPS results.
A Structural Equation Model Explaining 8th Grade Students' Mathematics Achievements
ERIC Educational Resources Information Center
Yurt, Eyüp; Sünbül, Ali Murat
2014-01-01
The purpose of this study is to investigate, via a model, the explanatory and predictive relationships among the following variables: Mathematical Problem Solving and Reasoning Skills, Sources of Mathematics Self-Efficacy, Spatial Ability, and Mathematics Achievements of Secondary School 8th Grade Students. The sample group of the study, itself…
ERIC Educational Resources Information Center
Sutton, Jazmyne A.; Walsh-Buhi, Eric R.
2017-01-01
Objective: This study investigated variables within the Integrative Model of Behavioral Prediction (IMBP) as well as differences across socioeconomic status (SES) levels within the context of inconsistent contraceptive use among college women. Participants: A nonprobability sample of 515 female college students completed an Internet-based survey…
On the Power of Multivariate Latent Growth Curve Models to Detect Correlated Change
ERIC Educational Resources Information Center
Hertzog, Christopher; Lindenberger, Ulman; Ghisletta, Paolo; Oertzen, Timo von
2006-01-01
We evaluated the statistical power of single-indicator latent growth curve models (LGCMs) to detect correlated change between two variables (covariance of slopes) as a function of sample size, number of longitudinal measurement occasions, and reliability (measurement error variance). Power approximations following the method of Satorra and Saris…
ERIC Educational Resources Information Center
James, Kai'Iah A.
2010-01-01
This dissertation study examines the impact of traditional and non-cognitive variables on the academic prediction model for a sample of collegiate student-athletes. Three hundred and fifty-nine NCAA Division IA male and female student-athletes, representing 13 sports, including football and Men's and Women's Basketball provided demographic…
Aspen, climate, and sudden decline in western USA
Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston
2009-01-01
A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...
Response surface models of subsoil K concentration for loess over till soils in Missouri
USDA-ARS?s Scientific Manuscript database
Crop uptake of potassium (K) has demonstrated sensitivity to subsoil variation in K content. This fact has not been sufficiently considered in K management strategies in part due to logistical difficulties in sampling spatially variable subsoil K. We propose a simplified soil factorial model, a resp...
Exploring Sex Differences in Worry with a Cognitive Vulnerability Model
ERIC Educational Resources Information Center
Zalta, Alyson K.; Chambless, Dianne L.
2008-01-01
A multivariate model was developed to examine the relative contributions of mastery, stress, interpretive bias, and coping to sex differences in worry. Rumination was incorporated as a second outcome variable to test the specificity of these associations. Participants included two samples of undergraduates totaling 302 men and 379 women. A path…