DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, D.L.
The last decade has been a period of rapid development in the implementation of covariance-matrix methodology in nuclear data research. This paper offers some perspective on the progress which has been made, on some of the unresolved problems, and on the potential yet to be realized. These discussions address a variety of issues related to the development of nuclear data. Topics examined are: the importance of designing and conducting experiments so that error information is conveniently generated; the procedures for identifying error sources and quantifying their magnitudes and correlations; the combination of errors; the importance of consistent and well-characterized measurement standards; the role of covariances in data parameterization (fitting); the estimation of covariances for values calculated from mathematical models; the identification of abnormalities in covariance matrices and the analysis of their consequences; the problems encountered in representing covariance information in evaluated files; the role of covariances in the weighting of diverse data sets; the comparison of various evaluations; the influence of primary-data covariance in the analysis of covariances for derived quantities (sensitivity); and the role of covariances in the merging of the diverse nuclear data information. 226 refs., 2 tabs.
NASA Technical Reports Server (NTRS)
Hepner, T. E.; Meyers, J. F. (Inventor)
1985-01-01
A laser velocimeter covariance processor that calculates the auto-covariance and cross-covariance functions for a turbulent flow field, based on Poisson-sampled measurements in time from a laser velocimeter, is described. The device will process a block of data up to 4096 data points in length and return a 512-point covariance function with 48-bit resolution, along with a 512-point histogram of the interarrival times that is used to normalize the covariance function. The device is designed to interface with, and be controlled by, a minicomputer, from which the data are received and to which the results are returned. A typical 4096-point computation takes approximately 1.5 seconds to receive the data, compute the covariance function, and return the results to the computer.
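To make the normalization step concrete, here is a minimal software sketch of the computation the processor performs in hardware: products of velocity fluctuations are accumulated into lag bins and divided by the interarrival-time histogram. The function name, array layout, and bin width `dt` are illustrative assumptions, and the device's 48-bit fixed-point arithmetic is not modeled.

```python
import numpy as np

def lv_autocovariance(t, u, n_lags=512, dt=1e-4):
    """Autocovariance of Poisson-sampled velocity data: accumulate
    products of fluctuations into lag bins, then normalize by the
    number of sample pairs per bin (the interarrival-time histogram)."""
    u = u - u.mean()                      # work with velocity fluctuations
    acc = np.zeros(n_lags)                # sum of products per lag bin
    hist = np.zeros(n_lags)               # pair count per lag bin
    for i in range(len(t)):
        lags = t[i:] - t[i]               # non-negative time lags from sample i
        bins = (lags / dt).astype(int)    # quantize lags to bins
        ok = bins < n_lags
        np.add.at(acc, bins[ok], u[i] * u[i:][ok])
        np.add.at(hist, bins[ok], 1.0)
    return np.where(hist > 0, acc / np.maximum(hist, 1), 0.0)
```

For a 4096-point block this returns the 512-point normalized covariance function described above.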
Williams, M. L.; Wiarda, D.; Ilas, G.; ...
2014-06-15
Recently, we processed a new covariance data library based on ENDF/B-VII.1 for the SCALE nuclear analysis code system. The multigroup covariance data are discussed here, along with testing and application results for critical benchmark experiments. Moreover, the cross section covariance library, along with covariances for fission product yields and decay data, is used to compute uncertainties in the decay heat produced by a burned reactor fuel assembly.
NASA Astrophysics Data System (ADS)
Shirasaki, Masato; Takada, Masahiro; Miyatake, Hironao; Takahashi, Ryuichi; Hamana, Takashi; Nishimichi, Takahiro; Murata, Ryoma
2017-09-01
We develop a method to simulate galaxy-galaxy weak lensing by utilizing all-sky, light-cone simulations and their inherent halo catalogues. Using the mock catalogue to study the error covariance matrix of galaxy-galaxy weak lensing, we compare the full covariance with the 'jackknife' (JK) covariance, the method often used in the literature that estimates the covariance from resamples of the data itself. We show that the JK covariance varies over realizations of mock lensing measurements, while the average JK covariance over mocks gives a reasonably accurate estimate of the true covariance up to separations comparable with the size of the JK subregion. The scatter in JK covariances is found to be ∼10 per cent after we subtract the lensing measurement around random points. However, the JK method tends to underestimate the covariance at larger separations, increasingly so for a survey with a higher number density of source galaxies. We apply our method to the Sloan Digital Sky Survey (SDSS) data, and show that the 48 mock SDSS catalogues nicely reproduce the signals and the JK covariance measured from the real data. We then argue that use of the accurate covariance, rather than the JK covariance, allows us to use the lensing signals at large scales beyond the size of the JK subregion, which contain cleaner cosmological information in the linear regime.
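For readers unfamiliar with the JK estimator being tested above, the following is a minimal sketch of delete-one jackknife covariance estimation over spatial subregions; the array shapes and function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

def jackknife_covariance(signals):
    """Delete-one jackknife covariance estimate.
    signals: array of shape (n_sub, n_bins), the per-subregion
    contribution to the lensing signal in each separation bin."""
    n = signals.shape[0]
    # Signal re-estimated with subregion i removed, for each i.
    loo = (signals.sum(axis=0) - signals) / (n - 1)
    diff = loo - loo.mean(axis=0)
    return (n - 1) / n * diff.T @ diff
```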
Das, Kiranmoy; Daniels, Michael J.
2014-01-01
Estimation of the covariance structure for irregular sparse longitudinal data has been studied by many authors in recent years but typically using fully parametric specifications. In addition, when data are collected from several groups over time, it is known that assuming the same or completely different covariance matrices over groups can lead to loss of efficiency and/or bias. Nonparametric approaches have been proposed for estimating the covariance matrix for regular univariate longitudinal data by sharing information across the groups under study. For the irregular case, with longitudinal measurements that are bivariate or multivariate, modeling becomes more difficult. In this article, to model bivariate sparse longitudinal data from several groups, we propose a flexible covariance structure via a novel matrix stick-breaking process for the residual covariance structure and a Dirichlet process mixture of normals for the random effects. Simulation studies are performed to investigate the effectiveness of the proposed approach over more traditional approaches. We also analyze a subset of Framingham Heart Study data to examine how the blood pressure trajectories and covariance structures differ for the patients from different BMI groups (high, medium and low) at baseline. PMID:24400941
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.
Xie, Yanmei; Zhang, Biao
2017-04-20
Missing covariate data occur often in regression analysis, arising frequently in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they introduced the use of two working models: the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, in which there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with those of other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
Development and Testing of Neutron Cross Section Covariance Data for SCALE 6.2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marshall, William BJ J; Williams, Mark L; Wiarda, Dorothea
2015-01-01
Neutron cross-section covariance data are essential for many sensitivity/uncertainty and uncertainty quantification assessments performed both within the TSUNAMI suite and more broadly throughout the SCALE code system. The release of ENDF/B-VII.1 included a more complete set of neutron cross-section covariance data: these data form the basis for a new cross-section covariance library to be released in SCALE 6.2. A range of testing is conducted to investigate the properties of these covariance data and ensure that the data are reasonable. These tests include examination of the uncertainty in critical experiment benchmark model k_eff values due to nuclear data uncertainties, as well as similarity assessments of irradiated pressurized water reactor (PWR) and boiling water reactor (BWR) fuel with suites of critical experiments. The contents of the new covariance library, the testing performed, and the behavior of the new covariance data are described in this paper. The neutron cross-section covariances can be combined with a sensitivity data file generated using the TSUNAMI suite of codes within SCALE to determine the uncertainty in system k_eff caused by nuclear data uncertainties. The Verified, Archived Library of Inputs and Data (VALID) maintained at Oak Ridge National Laboratory (ORNL) contains over 400 critical experiment benchmark models, and sensitivity data are generated for each of these models. The nuclear data uncertainty in k_eff is generated for each experiment, and the resulting uncertainties are tabulated and compared to the differences in measured and calculated results. The magnitude of the uncertainty for categories of nuclides (such as actinides, fission products, and structural materials) is calculated for irradiated PWR and BWR fuel to quantify the effect of covariance library changes between the SCALE 6.1 and 6.2 libraries. One of the primary applications of sensitivity/uncertainty methods within SCALE is the assessment of similarities between benchmark experiments and safety applications. This is described by a c_k value for each experiment with each application. Several studies have analyzed typical c_k values for a range of critical experiments compared with hypothetical irradiated fuel applications. The c_k value is sensitive to the cross-section covariance data because the contribution of each nuclide is influenced by its uncertainty; large uncertainties indicate more likely bias sources and are thus given more weight. Changes in c_k values resulting from different covariance data can be used to examine and assess underlying data changes. These comparisons are performed for PWR and BWR fuel in storage and transportation systems.
A quantile regression model for failure-time data with time-dependent covariates
Gorfine, Malka; Goldberg, Yair; Ritov, Ya’acov
2017-01-01
Since survival data occur over time, important covariates that we wish to consider often also change over time. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariate effects to vary with quantiles. This article provides a novel quantile regression model accommodating time-dependent covariates for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992, Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N. P., Dietz, K. and Farewell, V. T. (editors), AIDS Epidemiology. Boston: Birkhäuser, pp. 297–331). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset. PMID:27485534
Random sampling and validation of covariance matrices of resonance parameters
NASA Astrophysics Data System (ADS)
Plevnik, Lucijan; Zerovnik, Gašper
2017-09-01
Analytically exact methods for random sampling of arbitrarily correlated parameters are presented. Emphasis is given, on the one hand, to possible inconsistencies in the covariance data, concentrating on positive semi-definiteness and on consistent sampling of correlated, inherently positive parameters, and, on the other hand, to optimization of the implementation of the methods themselves. The methods have been applied in the program ENDSAM, written in Fortran, which reads a file of a chosen isotope in ENDF-6 format from a nuclear data library and produces an arbitrary number of new ENDF-6 files in which the original resonance parameter values are replaced by random samples drawn in accordance with the corresponding covariance matrices. The source code for the program ENDSAM is available from the OECD/NEA Data Bank. The program works in the following steps: it reads resonance parameters and their covariance data from the nuclear data library, checks whether the covariance data are consistent, and produces random samples of the resonance parameters. The code has been validated with both realistic and artificial data to show that the produced samples are statistically consistent. Additionally, the code was used to validate covariance data in existing nuclear data libraries. A list of inconsistencies observed in the covariance data of resonance parameters in ENDF-VII.1, JEFF-3.2 and JENDL-4.0 is presented. For now the work has been limited to resonance parameters; however, the methods presented are general and can in principle be extended to the sampling and validation of any nuclear data.
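A minimal sketch of the core sampling step, using an eigendecomposition rather than a Cholesky factor so that positive semi-definite (possibly rank-deficient) covariance matrices are handled; ENDSAM's actual implementation is in Fortran and includes further consistency checks not shown here.

```python
import numpy as np

def sample_correlated(mean, cov, n_samples, rng=np.random.default_rng(0)):
    """Draw n_samples vectors with the given mean and covariance.
    Eigendecomposition tolerates positive *semi*-definite matrices,
    unlike Cholesky, which requires strict positive definiteness."""
    w, v = np.linalg.eigh(cov)
    if w.min() < -1e-8 * max(w.max(), 1.0):
        raise ValueError("covariance matrix is not positive semi-definite")
    w = np.clip(w, 0.0, None)             # remove tiny negative round-off
    A = v * np.sqrt(w)                    # A @ A.T == cov
    z = rng.standard_normal((n_samples, len(mean)))
    # For inherently positive parameters, one option is to apply this
    # to the log-parameters instead (lognormal sampling).
    return mean + z @ A.T
```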
Covariate analysis of bivariate survival data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, L.E.
1992-01-01
The methods developed are used to analyze the effects of covariates on bivariate survival data when censoring and ties are present. The proposed method provides models for bivariate survival data that include differential covariate effects and censored observations. The proposed models are based on an extension of the univariate Buckley-James estimators, which replace censored data points by their expected values, conditional on the censoring time and the covariates. For the bivariate situation, it is necessary to determine the expectation of the failure times for one component conditional on the failure or censoring time of the other component. Two different methods have been developed to estimate these expectations. In the semiparametric approach these expectations are determined from a modification of Burke's estimate of the bivariate empirical survival function. In the parametric approach censored data points are also replaced by their conditional expected values, where the expected values are determined from a specified parametric distribution. The model estimation is based on the revised data set, comprised of uncensored components and expected values for the censored components. The variance-covariance matrix for the estimated covariate parameters has also been derived for both the semiparametric and parametric methods. Data from the Demographic and Health Survey were analyzed by these methods. The two outcome variables are post-partum amenorrhea and breastfeeding; education and parity were used as the covariates. Both the covariate parameter estimates and the variance-covariance estimates for the semiparametric and parametric models are compared. In addition, a multivariate test statistic was used in the semiparametric model to examine contrasts. The significance of the statistic was determined from a bootstrap distribution of the test statistic.
Wallace, Meredith L; Anderson, Stewart J; Mazumdar, Sati
2010-12-20
Missing covariate data present a challenge to tree-structured methodology because a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.
Using Fit Indexes to Select a Covariance Model for Longitudinal Data
ERIC Educational Resources Information Center
Liu, Siwei; Rovine, Michael J.; Molenaar, Peter C. M.
2012-01-01
This study investigated the performance of fit indexes in selecting a covariance structure for longitudinal data. Data were simulated to follow a compound symmetry, first-order autoregressive, first-order moving average, or random-coefficients covariance structure. We examined the ability of the likelihood ratio test (LRT), root mean square error…
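As a reference for the first two structures named above, here is a small sketch constructing compound-symmetry (exchangeable) and first-order autoregressive covariance matrices; the function and parameter names are illustrative.

```python
import numpy as np

def compound_symmetry(n, rho, sigma2=1.0):
    """Compound symmetry: equal correlation rho between any two occasions."""
    return sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

def ar1(n, rho, sigma2=1.0):
    """First-order autoregressive: correlation rho**|i-j| decays with lag."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
```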
Massive data compression for parameter-dependent covariance matrices
NASA Astrophysics Data System (ADS)
Heavens, Alan F.; Sellentin, Elena; de Mijolla, Damien; Vianello, Alvise
2017-12-01
We show how the massive data compression algorithm MOPED can be used to reduce, by orders of magnitude, the number of simulated data sets required to estimate the covariance matrix needed for the analysis of Gaussian-distributed data. This is relevant when the covariance matrix cannot be calculated directly. The compression is especially valuable when the covariance matrix varies with the model parameters. In this case, it may be prohibitively expensive to run enough simulations to estimate the full covariance matrix throughout the parameter space. This compression may be particularly valuable for the next generation of weak lensing surveys, such as those proposed for Euclid and the Large Synoptic Survey Telescope, for which the number of summary data (such as band power or shear correlation estimates) is very large, ∼10^4, due to the large number of tomographic redshift bins into which the data will be divided. In the pessimistic case where the covariance matrix is estimated separately for all points in a Markov chain Monte Carlo analysis, this may require an unfeasible 10^9 simulations. We show here that MOPED can reduce this number by a factor of 1000, or a factor of ∼10^6 if some regularity in the covariance matrix is assumed, reducing the number of simulations required to a manageable 10^3, making an otherwise intractable analysis feasible.
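For orientation, the following is a minimal sketch of MOPED-style compression for a single parameter: the data vector is projected onto a weight vector built from the covariance and the derivative of the mean with respect to that parameter. The full algorithm of Heavens et al. orthogonalizes successive weight vectors for multiple parameters, which is not shown; names are illustrative.

```python
import numpy as np

def moped_weight(C, dmu):
    """MOPED weight for one parameter: b = C^{-1} mu' / sqrt(mu'^T C^{-1} mu'),
    where mu' is the derivative of the mean data vector w.r.t. the parameter."""
    b = np.linalg.solve(C, dmu)
    return b / np.sqrt(dmu @ b)

def compress(x, b):
    """Reduce the full data vector x to a single summary number."""
    return b @ x
```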
Measuring continuous baseline covariate imbalances in clinical trial data
Ciolino, Jody D.; Martin, Renee’ H.; Zhao, Wenle; Hill, Michael D.; Jauch, Edward C.; Palesch, Yuko Y.
2014-01-01
This paper presents and compares several methods of measuring continuous baseline covariate imbalance in clinical trial data. Simulations illustrate that although the t-test is an inappropriate method of assessing continuous baseline covariate imbalance, the test statistic itself is a robust measure for capturing imbalance in continuous covariate distributions. Guidelines for assessing the effects of imbalance on bias, type I error rate, and power of the hypothesis test for a treatment effect on continuous outcomes are presented, and the benefit of covariate-adjusted analysis (ANCOVA) is also illustrated. PMID:21865270
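To make the distinction concrete, a minimal sketch of the t-statistic used descriptively as an imbalance measure (not as a hypothesis test); names are illustrative.

```python
import numpy as np

def imbalance_t(x_treat, x_ctrl):
    """Welch two-sample t-statistic, used here as a descriptive measure of
    baseline covariate imbalance between arms rather than for inference."""
    n1, n0 = len(x_treat), len(x_ctrl)
    se2 = x_treat.var(ddof=1) / n1 + x_ctrl.var(ddof=1) / n0
    return (x_treat.mean() - x_ctrl.mean()) / np.sqrt(se2)
```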
Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello
2018-04-22
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which are the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent of the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovery of the clusters in the data relative to the fixed-covariates paradigm. The class of hidden Markov regression models with random covariates is defined with a focus on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data.
Gaskins, J T; Daniels, M J
2016-01-02
The estimation of the covariance matrix is a key concern in the analysis of longitudinal data. When data consist of multiple groups, it is often assumed that the covariance matrices are either equal across groups or completely distinct. We seek methodology that allows borrowing of strength across potentially similar groups to improve estimation. To that end, we introduce a covariance partition prior that proposes a partition of the groups at each measurement time. Groups in the same set of the partition share dependence parameters for the distribution of the current measurement given the preceding ones, and the sequence of partitions is modeled as a Markov chain to encourage similar structure at nearby measurement times. This approach additionally encourages a lower-dimensional structure of the covariance matrices by shrinking the parameters of the Cholesky decomposition toward zero. We demonstrate the performance of our model through two simulation studies and the analysis of data from a depression study. This article includes Supplementary Material available online.
The use of a covariate reduces experimental error in nutrient digestion studies in growing pigs
USDA-ARS?s Scientific Manuscript database
Covariance analysis limits error, the degree of nuisance variation, and overparameterizing factors to accurately measure treatment effects. Data dealing with growth, carcass composition, and genetics often utilize covariates in data analysis. In contrast, nutritional studies typically do not. The ob...
On analyzing ordinal data when responses and covariates are both missing at random.
Rana, Subrata; Roy, Surupa; Das, Kalyan
2016-08-01
On many occasions, particularly in biomedical studies, data are unavailable for some responses and covariates. This leads to biased inference when a substantial proportion of responses, a covariate, or both are missing. With a few exceptions, methods for missing data have previously been considered either for missing responses or for missing covariates; comparatively little attention has been directed to accounting for both missing responses and missing covariates, which is partly attributable to the complexity of modeling and computation. This is important because the precise impact of substantial missing data also depends on the association between the two missing-data processes. The real difficulty arises when the responses are ordinal by nature. We develop a joint model that simultaneously takes into account the association between the ordinal response variable and the covariates and that between the missing-data indicators. Such a complex model is analyzed here using the Markov chain Monte Carlo approach and also the Monte Carlo relative likelihood approach. Their performance in estimating the model parameters in finite samples has been examined. We illustrate the application of these two methods using data from an orthodontic study. Analysis of such data provides some interesting information on human habit.
Jin, Huifeng; Cheng, Haojie; Chen, Wei; Sheng, Xiaoming; Levy, Mark A; Brown, Mark J; Tian, Junqiang
2018-05-01
The single nucleotide polymorphism of the gene 5,10-methylenetetrahydrofolate reductase (MTHFR) C677T (or rs1801133) is the most established genetic factor that increases plasma total homocysteine (tHcy) and consequently results in hyperhomocysteinemia. Yet, given the limited penetrance of this genetic variant, it is necessary to individually predict the risk of hyperhomocysteinemia for an rs1801133 carrier. We hypothesized that variability in this genetic risk is largely due to the presence of factors (covariates) that serve as effect modifiers, confounders, or both, such as folic acid (FA) intake, and aimed to assess this risk in the complex context of these covariates. We systematically extracted from published studies the data on tHcy, rs1801133, and any previously reported rs1801133 covariates. The resulting metadata set was first used to analyze the covariates' modifying effect by meta-regression and other statistical means. Subsequently, we controlled for this modifying effect by genotype-stratifying the tHcy data and analyzed the variability in the risk resulting from the confounding of covariates. The data set contains data on 36 rs1801133 covariates collected from 114,799 participants and 256 qualified studies, among which 6 covariates (sex, age, race, FA intake, smoking, and alcohol consumption) are the most frequently reported and were therefore included for statistical analysis. The effect of rs1801133 on tHcy exhibits significant variability that can be attributed to effect modification as well as confounding by these covariates. Via statistical modeling, we predicted the covariate-dependent risk of tHcy elevation and hyperhomocysteinemia in a systematic manner. We present an evidence-based approach that globally assesses the covariate-dependent effect of rs1801133 on tHcy. The results should assist clinicians in interpreting rs1801133 data from genetic testing for their patients. Such information is also important for the public, who increasingly receive genetic data from commercial services without interpretation of its clinical relevance. This study was registered at Research Registry with the registration number reviewregistry328.
TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION
Allen, Genevera I.; Tibshirani, Robert
2015-01-01
Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility. PMID:26877823
ERIC Educational Resources Information Center
Fu, Jianbin
2016-01-01
The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…
Song, Rui; Kosorok, Michael R.; Cai, Jianwen
2009-01-01
Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107
Spatio-Temporal EEG Models for Brain Interfaces
Gonzalez-Navarro, P.; Moghadamfalahi, M.; Akcakaya, M.; Erdogmus, D.
2016-01-01
Multichannel electroencephalography (EEG) is widely used in non-invasive brain computer interfaces (BCIs) for user intent inference. EEG can be assumed to be a Gaussian process with unknown mean and autocovariance, and estimation of these parameters is required for BCI inference. However, the relatively high dimensionality of the EEG feature vectors with respect to the number of labeled observations leads to rank-deficient covariance matrix estimates. In this manuscript, to overcome ill-conditioned covariance estimation, we propose a structure for the covariance matrices of the multichannel EEG signals. Specifically, we assume that these covariances can be modeled as a Kronecker product of temporal and spatial covariances. Our results on experimental data collected from users of a letter-by-letter typing BCI show that, with fewer parameters to estimate, the system can achieve higher classification accuracies compared to a method that uses full unstructured covariance estimation. Moreover, to illustrate that the proposed Kronecker product structure could enable shortening the BCI calibration data collection sessions, we use a Cramer-Rao bound analysis on simulated data to demonstrate that a model with structured covariance matrices will achieve the same estimation error as a model with no covariance structure using fewer labeled EEG observations. PMID:27713590
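A brief sketch of the parameter saving the Kronecker assumption buys, with illustrative dimensions (8 channels by 50 time samples); the factors here are estimated from random placeholder data only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ch, n_t = 8, 50                                  # assumed channel/time dimensions
C_spatial = np.cov(rng.standard_normal((n_ch, 500)))   # spatial factor (placeholder)
C_temporal = np.cov(rng.standard_normal((n_t, 500)))   # temporal factor (placeholder)
C_full = np.kron(C_spatial, C_temporal)            # structured 400 x 400 covariance
# Free parameters: unstructured 400*401/2 = 80200 versus
# Kronecker 8*9/2 + 50*51/2 = 1311, a dramatic reduction.
print(C_full.shape)
```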
Inventory Uncertainty Quantification using TENDL Covariance Data in Fispact-II
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eastwood, J.W.; Morgan, J.G.; Sublet, J.-Ch., E-mail: jean-christophe.sublet@ccfe.ac.uk
2015-01-15
The new inventory code FISPACT-II provides predictions of inventory, radiological quantities and their uncertainties using nuclear data covariance information. Central to the method is a novel fast pathways search algorithm using directed graphs. The pathways output provides (1) an aid to identifying important reactions, (2) fast estimates of uncertainties, (3) reduced models that retain important nuclides and reactions for use in the code's Monte Carlo sensitivity analysis module. Described are the methods that are being implemented for improving uncertainty predictions, quantification and propagation using the covariance data that the recent nuclear data libraries contain. In the TENDL library, above the upper energy of the resolved resonance range, a Monte Carlo method in which the covariance data come from uncertainties of the nuclear model calculations is used. The nuclear data files are read directly by FISPACT-II without any further intermediate processing. Variance and covariance data are processed and used by FISPACT-II to compute uncertainties in collapsed cross sections, and these are in turn used to predict uncertainties in inventories and all derived radiological data.
On the Likely Utility of Hybrid Weights Optimized for Variances in Hybrid Error Covariance Models
NASA Astrophysics Data System (ADS)
Satterfield, E.; Hodyss, D.; Kuhl, D.; Bishop, C. H.
2017-12-01
Because of imperfections in ensemble data assimilation schemes, one cannot assume that the ensemble covariance is equal to the true error covariance of a forecast. Previous work demonstrated how information about the distribution of true error variances given an ensemble sample variance can be revealed from an archive of (observation-minus-forecast, ensemble-variance) data pairs. Here, we derive a simple and intuitively compelling formula to obtain the mean of this distribution of true error variances given an ensemble sample variance from (observation-minus-forecast, ensemble-variance) data pairs produced by a single run of a data assimilation system. This formula takes the form of a Hybrid weighted average of the climatological forecast error variance and the ensemble sample variance. Here, we test the extent to which these readily obtainable weights can be used to rapidly optimize the covariance weights used in Hybrid data assimilation systems that employ weighted averages of static covariance models and flow-dependent ensemble based covariance models. Univariate data assimilation and multi-variate cycling ensemble data assimilation are considered. In both cases, it is found that our computationally efficient formula gives Hybrid weights that closely approximate the optimal weights found through the simple but computationally expensive process of testing every plausible combination of weights.
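A minimal sketch of the hybrid-weight idea under stated assumptions: the predicted error variance is a weighted average of the climatological and ensemble variances, and the weight is fit by least squares to an archive of squared innovations. This regression stands in for the paper's closed-form expression, which is not reproduced here; all names are illustrative.

```python
import numpy as np

def hybrid_variance(ens_var, clim_var, w):
    """Predicted error variance as a hybrid weighted average of the
    climatological variance and the ensemble sample variance."""
    return w * clim_var + (1.0 - w) * ens_var

def fit_weight(innov_sq, ens_var, clim_var):
    """Least-squares weight from (observation-minus-forecast)^2 and
    ensemble-variance pairs: model innov_sq ~ ens_var + w*(clim_var - ens_var)."""
    x = clim_var - ens_var
    y = innov_sq - ens_var
    return float(x @ y / (x @ x))
```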
NASA Astrophysics Data System (ADS)
Shulman, Igor; Gould, Richard W.; Frolov, Sergey; McCarthy, Sean; Penta, Brad; Anderson, Stephanie; Sakalaukus, Peter
2018-03-01
An ensemble-based approach to specifying the observational error covariance in the data assimilation of satellite bio-optical properties is proposed. The observational error covariance is derived from statistical properties of the generated ensemble of satellite MODIS-Aqua chlorophyll (Chl) images. The proposed observational error covariance is used in the Optimal Interpolation scheme for the assimilation of MODIS-Aqua Chl observations. The forecast error covariance is specified in the subspace of the multivariate (bio-optical, physical) empirical orthogonal functions (EOFs) estimated from a month-long model run. The assimilation of surface MODIS-Aqua Chl improved surface and subsurface model Chl predictions. Comparisons with surface and subsurface water samples demonstrate that the data assimilation run with the proposed observational error covariance has a higher RMSE than the run with an 'optimistic' assumption about observational errors (10% of the ensemble mean), but a smaller or comparable RMSE relative to the run that assumes observational errors equal to 35% of the ensemble mean (the target error for the satellite chlorophyll data product). Also, with the assimilation of the MODIS-Aqua Chl data, the RMSE between observed and model-predicted fractions of diatoms to the total phytoplankton is reduced by a factor of two in comparison to the nonassimilative run.
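For reference, the Optimal Interpolation analysis step into which the ensemble-derived observation error covariance R enters is the standard update below (a generic sketch, not the authors' code).

```python
import numpy as np

def oi_analysis(xb, y, H, B, R):
    """Optimal Interpolation: xa = xb + B H^T (H B H^T + R)^{-1} (y - H xb),
    with background xb, observations y, observation operator H,
    forecast error covariance B, and observation error covariance R."""
    S = H @ B @ H.T + R                        # innovation covariance
    return xb + B @ H.T @ np.linalg.solve(S, y - H @ xb)
```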
Covariance specification and estimation to improve top-down Greenhouse Gas emission estimates
NASA Astrophysics Data System (ADS)
Ghosh, S.; Lopez-Coto, I.; Prasad, K.; Whetstone, J. R.
2015-12-01
The National Institute of Standards and Technology (NIST) operates the North-East Corridor (NEC) project and the Indianapolis Flux Experiment (INFLUX) in order to develop measurement methods to quantify sources of greenhouse gas (GHG) emissions, as well as their uncertainties, in urban domains using a top-down inversion method. Top-down inversion updates prior knowledge using observations in a Bayesian way. One primary consideration in a Bayesian inversion framework is the covariance structure of (1) the emission prior residuals and (2) the observation residuals (i.e. the difference between observations and model-predicted observations). These covariance matrices are respectively referred to as the prior covariance matrix and the model-data mismatch covariance matrix. It is known that the choice of these covariances can have a large effect on estimates. The main objective of this work is to determine the impact of different covariance models on inversion estimates and their associated uncertainties in urban domains. We use a pseudo-data Bayesian inversion framework using footprints (i.e. sensitivities of tower measurements of GHGs to surface emissions) and emission priors (based on the Hestia project to quantify fossil-fuel emissions) to estimate posterior emissions under different covariance schemes. The posterior emission estimates and uncertainties are compared to the hypothetical truth. We find that, if we correctly specify spatial variability and spatio-temporal variability in the prior and model-data mismatch covariances respectively, we can compute more accurate posterior estimates. We discuss a few covariance models for introducing space-time interacting mismatches, along with estimation of the parameters involved. We then compare several candidate prior spatial covariance models from the Matern covariance class and estimate their parameters with the specified mismatches. We find that the best-fitted prior covariances are not always best at recovering the truth. To achieve accuracy, we perform a sensitivity study to further tune the covariance parameters. Finally, we introduce a shrinkage-based sample covariance estimation technique for both the prior and mismatch covariances. This technique allows us to achieve similar accuracy nonparametrically in a more efficient and automated way.
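A minimal sketch of one common form of shrinkage-based covariance estimation, linear shrinkage of the sample covariance toward its diagonal; the intensity lam would in practice be chosen by a Ledoit-Wolf-type formula or cross-validation, which is not shown.

```python
import numpy as np

def shrink_covariance(X, lam):
    """Linear shrinkage toward the diagonal target:
    C(lam) = (1 - lam) * S + lam * diag(S), lam in [0, 1].
    X: data matrix of shape (n_samples, n_features)."""
    S = np.cov(X, rowvar=False)
    return (1.0 - lam) * S + lam * np.diag(np.diag(S))
```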
A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.
Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun
2017-09-21
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision-making. It has many real applications, including statistical tests and information theory. Due to the statistical and computational challenges of high dimensionality, little work has been done on estimating the determinant of a high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrices. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines, based on the sample size, the dimension, and the correlation of the data set, for estimating the determinant of a high-dimensional covariance matrix. Finally, from the perspective of the loss function, the comparison study in this paper may also serve as a proxy for assessing the performance of covariance matrix estimation methods.
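Whichever covariance estimate is used, the determinant itself should be computed on the log scale to avoid overflow or underflow in high dimension; a standard sketch:

```python
import numpy as np

def log_determinant(S):
    """Numerically stable log-determinant via Cholesky
    (requires S positive definite); np.linalg.slogdet(S)
    is an equivalent built-in alternative."""
    L = np.linalg.cholesky(S)
    return 2.0 * np.sum(np.log(np.diag(L)))
```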
Beamforming using subspace estimation from a diagonally averaged sample covariance.
Quijano, Jorge E; Zurk, Lisa M
2017-08-01
The potential benefit of a large-aperture sonar array for high resolution target localization is often challenged by the lack of sufficient data required for adaptive beamforming. This paper introduces a Toeplitz-constrained estimator of the clairvoyant signal covariance matrix corresponding to multiple far-field targets embedded in background isotropic noise. The estimator is obtained by averaging along subdiagonals of the sample covariance matrix, followed by covariance extrapolation using the method of maximum entropy. The sample covariance is computed from limited data snapshots, a situation commonly encountered with large-aperture arrays in environments characterized by short periods of local stationarity. Eigenvectors computed from the Toeplitz-constrained covariance are used to construct signal-subspace projector matrices, which are shown to reduce background noise and improve detection of closely spaced targets when applied to subspace beamforming. Monte Carlo simulations corresponding to increasing array aperture suggest convergence of the proposed projector to the clairvoyant signal projector, thereby outperforming the classic projector obtained from the sample eigenvectors. Beamforming performance of the proposed method is analyzed using simulated data, as well as experimental data from the Shallow Water Array Performance experiment.
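A minimal sketch of the diagonal-averaging step that produces the Toeplitz-constrained estimate; the maximum-entropy extrapolation step is not shown, and the function name is illustrative.

```python
import numpy as np

def toeplitzify(S):
    """Replace each subdiagonal of the (possibly complex Hermitian)
    sample covariance S by its average, yielding the Toeplitz-constrained
    estimate; the lower diagonals are the conjugates of the upper ones."""
    n = S.shape[0]
    T = np.zeros_like(S)
    for k in range(n):
        m = np.mean(np.diag(S, k))              # average of the k-th subdiagonal
        T += np.diag(np.full(n - k, m), k)
        if k:
            T += np.diag(np.full(n - k, np.conj(m)), -k)
    return T
```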
Shen, Chung-Wei; Chen, Yi-Hau
2015-10-01
Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model, accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. In contrast, naive procedures that ignore such complexity in the data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis of binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified; unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or was not planned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
NASA Astrophysics Data System (ADS)
Islamiyati, A.; Fatmawati; Chamidah, N.
2018-03-01
In longitudinal data with bi-response, correlation occurs between measurements on the same subject and between the two responses. This induces auto-correlation of the errors, which can be handled by using a covariance matrix. In this article, we estimate the covariance matrix based on a penalized spline regression model. The penalized spline involves knot points and smoothing parameters simultaneously in controlling the smoothness of the curve. Based on our simulation study, the estimated regression model of the weighted penalized spline with a covariance matrix gives a smaller error value than the model without a covariance matrix.
The Performance Analysis Based on SAR Sample Covariance Matrix
Erten, Esra
2012-01-01
Multi-channel systems appear in several fields of application in science. In the Synthetic Aperture Radar (SAR) context, multi-channel systems may refer to different domains, such as multi-polarization, multi-interferometric or multi-temporal data, or even a combination of them. Due to the inherent speckle phenomenon present in SAR images, a statistical description of the data is almost mandatory for its utilization. The complex images acquired over natural media present, in general, zero-mean circular Gaussian characteristics. In this case, second-order statistics such as the multi-channel covariance matrix fully describe the data. For practical situations, however, the covariance matrix has to be estimated using a limited number of samples, and this sample covariance matrix follows the complex Wishart distribution. In this context, the eigendecomposition of the multi-channel covariance matrix has been shown to be of high relevance in different areas regarding the physical properties of the imaged scene. Specifically, the maximum eigenvalue of the covariance matrix has been frequently used in different applications such as target or change detection, estimation of the dominant scattering mechanism in polarimetric data, moving target indication, etc. In this paper, the statistical behavior of the maximum eigenvalue derived from the eigendecomposition of the sample multi-channel covariance matrix for multi-channel SAR images is simplified for the SAR community. Validation is performed against simulated data, and examples of estimation and detection problems using the analytical expressions are given as well. PMID:22736976
NASA Astrophysics Data System (ADS)
Mikosch, Jochen; Patchkovskii, Serguei
2013-10-01
We use an analytical theory of noisy Poisson processes, developed in the preceding companion publication, to compare coincidence and covariance measurement approaches in photoelectron and -ion spectroscopy. For non-unit detection efficiencies, coincidence data acquisition (DAQ) suffers from false coincidences. The rate of false coincidences grows quadratically with the rate of elementary ionization events. To minimize false coincidences for rare event outcomes, very low event rates may hence be required. Coincidence measurements exhibit high tolerance to noise introduced by unstable experimental conditions. Covariance DAQ on the other hand is free of systematic errors as long as stable experimental conditions are maintained. In the presence of noise, all channels in a covariance measurement become correlated. Under favourable conditions, covariance DAQ may allow orders of magnitude reduction in measurement times. Finally, we use experimental data for strong-field ionization of 1,3-butadiene to illustrate how fluctuations in experimental conditions can contaminate a covariance measurement, and how such contamination can be detected.
Effect of correlation on covariate selection in linear and nonlinear mixed effect models.
Bonate, Peter L
2017-01-01
The effect of correlation among covariates on covariate selection was examined with linear and nonlinear mixed effect models. Demographic covariates were extracted from the National Health and Nutrition Examination Survey III database. Concentration-time profiles were Monte Carlo simulated where only one covariate affected apparent oral clearance (CL/F). A series of univariate covariate population pharmacokinetic models was fit to the data and compared with the reduced model without the covariate. The "best" covariate was identified using either the likelihood ratio test statistic or AIC. Weight and body surface area (calculated using the Gehan and George equation, 1970) were highly correlated (r = 0.98). Body surface area was often selected as a better covariate than weight, sometimes as often as 1 in 5 times, when weight was the covariate used in the data-generating mechanism. In a second simulation, parent drug concentration and three metabolites were simulated from a thorough QT study and used as covariates in a series of univariate linear mixed effects models of ddQTc interval prolongation. The covariate with the largest significant LRT statistic was deemed the "best" predictor. When the metabolite was formation-rate limited and only parent concentrations affected ddQTc intervals, the metabolite was chosen as the better predictor as often as 1 in 5 times, depending on the slope of the relationship between parent concentrations and ddQTc intervals. A correlated covariate can be chosen as a better predictor than another covariate in a linear or nonlinear population analysis by sheer correlation. These results explain why different covariates may be identified in different analyses of the same drug.
Covariance hypotheses for LANDSAT data
NASA Technical Reports Server (NTRS)
Decell, H. P.; Peters, C.
1983-01-01
Two covariance hypotheses are considered for LANDSAT data acquired by sampling fields, one an autoregressive covariance structure and the other the hypothesis of exchangeability. A minimum entropy approximation of the first structure by the second is derived and shown to have desirable properties for incorporation into a mixture density estimation procedure. Results of a rough test of the exchangeability hypothesis are presented.
Foldnes, Njål; Olsson, Ulf Henning
2016-01-01
We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.
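A minimal sketch of the IG construction under stated assumptions: independent, standardized, nonnormal generator variables are mixed through a Cholesky factor so that the output has covariance Sigma. The published method additionally solves for generator moments that yield the prespecified skewness and kurtosis, which this sketch omits; a centred exponential generator is used purely for illustration.

```python
import numpy as np

def ig_sample(n, Sigma, rng=np.random.default_rng(0)):
    """Nonnormal data with covariance Sigma built from independent
    generators (IG). Each generator has mean 0 and variance 1, so the
    rows of the output have covariance L L^T = Sigma."""
    p = Sigma.shape[0]
    G = rng.exponential(size=(n, p)) - 1.0   # independent, standardized, skewed
    L = np.linalg.cholesky(Sigma)
    return G @ L.T
```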
Analysis of capture-recapture models with individual covariates using data augmentation
Royle, J. Andrew
2009-01-01
I consider the analysis of capture-recapture models with individual covariates that influence detection probability. Bayesian analysis of the joint likelihood is carried out using a flexible data augmentation scheme that facilitates analysis by Markov chain Monte Carlo methods, and a simple and straightforward implementation in freely available software. This approach is applied to a study of meadow voles (Microtus pennsylvanicus) in which auxiliary data on a continuous covariate (body mass) are recorded, and it is thought that detection probability is related to body mass. In a second example, the model is applied to an aerial waterfowl survey in which a double-observer protocol is used. The fundamental unit of observation is the cluster of individual birds, and the size of the cluster (a discrete covariate) is used as a covariate on detection probability.
Improving chemical species tomography of turbulent flows using covariance estimation.
Grauer, Samuel J; Hadwin, Paul J; Daun, Kyle J
2017-05-01
Chemical species tomography (CST) experiments can be divided into limited-data and full-rank cases. Both require solving ill-posed inverse problems, and thus the measurement data must be supplemented with prior information to carry out reconstructions. The Bayesian framework formalizes the role of this additional information, expressed as the mean and covariance of a joint-normal prior probability density function. We present techniques for estimating the spatial covariance of a flow under limited-data and full-rank conditions. Our results show that incorporating a covariance estimate into CST reconstruction via a Bayesian prior increases the accuracy of instantaneous estimates. Improvements are especially dramatic in real-time limited-data CST, which is directly applicable to many industrially relevant experiments.
A three domain covariance framework for EEG/MEG data.
Roś, Beata P; Bijma, Fetsje; de Gunst, Mathisca C M; de Munck, Jan C
2015-10-01
In this paper we introduce a covariance framework for the analysis of single subject EEG and MEG data that takes into account observed temporal stationarity on small time scales and trial-to-trial variations. We formulate a model for the covariance matrix, which is a Kronecker product of three components that correspond to space, time and epochs/trials, and consider maximum likelihood estimation of the unknown parameter values. An iterative algorithm that finds approximations of the maximum likelihood estimates is proposed. Our covariance model is applicable in a variety of cases where spontaneous EEG or MEG acts as source of noise and realistic noise covariance estimates are needed, such as in evoked activity studies, or where the properties of spontaneous EEG or MEG are themselves the topic of interest, like in combined EEG-fMRI experiments in which the correlation between EEG and fMRI signals is investigated. We use a simulation study to assess the performance of the estimator and investigate the influence of different assumptions about the covariance factors on the estimated covariance matrix and on its components. We apply our method to real EEG and MEG data sets.
¹³C-¹⁵N correlation via unsymmetrical indirect covariance NMR: application to vinblastine.
Martin, Gary E; Hilton, Bruce D; Blinov, Kirill A; Williams, Antony J
2007-12-01
Unsymmetrical indirect covariance processing methods allow the derivation of hyphenated 2D NMR data from the component 2D spectra, potentially circumventing the acquisition of the much lower sensitivity hyphenated 2D NMR experimental data. Calculation of HSQC-COSY and HSQC-NOESY spectra from GHSQC, COSY, and NOESY spectra, respectively, has been reported. The use of unsymmetrical indirect covariance processing has also been applied to the combination of ¹H-¹³C GHSQC and ¹H-¹⁵N long-range correlation data (GHMBC, IMPEACH, or CIGAR-HMBC). The application of unsymmetrical indirect covariance processing to spectra of vinblastine is now reported, specifically the algorithmic extraction of ¹³C-¹⁵N correlations via the unsymmetrical indirect covariance processing of the combination of ¹H-¹³C GHSQC and long-range ¹H-¹⁵N GHMBC data to produce the equivalent of a ¹³C-¹⁵N HSQC-HMBC correlation spectrum. The elimination of artifact responses with aromatic solvent-induced shifts (ASIS) is shown, in addition to a method of forecasting potential artifact responses through the indirect covariance processing of the GHSQC spectrum used in the unsymmetrical indirect covariance processing.
NASA Astrophysics Data System (ADS)
Cossarini, Gianpiero; D'Ortenzio, Fabrizio; Mariotti, Laura; Mignot, Alexandre; Salon, Stefano
2017-04-01
The Mediterranean Sea is a very promising site for developing and testing the assimilation of Bio-Argo data, since (1) the Bio-Argo network is one of the densest in the global ocean, and (2) a consolidated data assimilation framework for biogeochemical variables (3DVAR-BIO, presently based on assimilation of satellite-estimated surface chlorophyll data) already exists within the CMEMS biogeochemical model system for the Mediterranean Sea. The MASSIMILI project, granted by the CMEMS Service Evolution initiative, aims to develop the assimilation of Bio-Argo float data into the CMEMS biogeochemical model system of the Mediterranean Sea by means of an upgrade of the 3DVAR-BIO scheme. Specific developments of the 3DVAR-BIO scheme focus on the estimation of new operators for the variational decomposition of the background error covariance matrix and on the implementation of a new observation operator specifically for the Bio-Argo float vertical profile data. In particular, a new horizontal covariance operator for chlorophyll, nitrate and oxygen is based on 3D fields of horizontal correlation radius calculated from a long-term reanalysis simulation. A new vertical covariance operator is built on monthly and spatially varying EOF decompositions to account for the spatiotemporal variability of the vertical structure of the error covariance of the three variables. Further, the observation error covariance is a key factor for effective assimilation of the Bio-Argo data into the model dynamics. The sensitivities of the assimilation to the different factors are estimated. First results of the implementation of the new 3DVAR-BIO scheme show the impact of Bio-Argo data on the 3D fields of chlorophyll, nitrate and oxygen. Tuning the length-scale factors of the horizontal covariance, analysing the sensitivity of the observation error covariance, and introducing a non-diagonal biogeochemical covariance operator and a non-diagonal multi-platform operator (i.e. Bio-Argo and satellite) are crucial future steps for the success of the MASSIMILI project. In our contribution, we will discuss the recent and promising advances this strategic project has made in the past year and its potential for the whole operational biogeochemical modelling community.
Simplification of the Kalman filter for meteorological data assimilation
NASA Technical Reports Server (NTRS)
Dee, Dick P.
1991-01-01
The paper proposes a new statistical method of data assimilation that is based on a simplification of the Kalman filter equations. The forecast error covariance evolution is approximated simply by advecting the mass-error covariance field, deriving the remaining covariances geostrophically, and accounting for external model-error forcing only at the end of each forecast cycle. This greatly reduces the cost of computation of the forecast error covariance. In simulations with a linear, one-dimensional shallow-water model and data generated artificially, the performance of the simplified filter is compared with that of the Kalman filter and the optimal interpolation (OI) method. The simplified filter produces analyses that are nearly optimal, and represents a significant improvement over OI.
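For reference, the quantity the simplified filter approximates is the full Kalman filter forecast error covariance update, P_f = M P_a M^T + Q, whose per-step cost motivates the simplification. A toy numpy sketch with an illustrative advection model (not the paper's shallow-water setup):

```python
import numpy as np

def kf_forecast_cov(M, Pa, Q):
    """Full Kalman filter forecast error covariance: Pf = M Pa M^T + Q.
    This is the expensive step the simplified filter avoids."""
    return M @ Pa @ M.T + Q

# Toy 1D advection model: state shifted one grid point per step (periodic).
n = 8
M = np.roll(np.eye(n), 1, axis=0)
Pa = np.eye(n)
Q = 0.01 * np.eye(n)
Pf = kf_forecast_cov(M, Pa, Q)
# The simplified filter instead advects only the mass-error covariance
# field, derives wind-error covariances geostrophically, and adds the
# model-error forcing Q only at the end of each forecast cycle.
```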
Ding, Aidong Adam; Hsieh, Jin-Jian; Wang, Weijing
2015-01-01
Bivariate survival analysis has wide applications. In the presence of covariates, most literature focuses on studying their effects on the marginal distributions. However, covariates can also affect the association between the two variables. In this article, we consider the latter issue by proposing a nonstandard local linear estimator for the concordance probability as a function of covariates. Under the Clayton copula, the conditional concordance probability has a simple one-to-one correspondence with the copula parameter for different data structures, including those subject to independent or dependent censoring and dependent truncation. The proposed method can be used to study how covariates affect the Clayton association parameter without specifying marginal regression models. Asymptotic properties of the proposed estimators are derived and their finite-sample performances are examined via simulations. Finally, for illustration, we apply the proposed method to analyze a bone marrow transplant data set.
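The one-to-one correspondence invoked here is the standard Clayton relation: Kendall's tau equals theta/(theta + 2), and for continuous pairs the concordance probability is (tau + 1)/2. A two-function conversion utility makes the map explicit:

```python
def clayton_theta_to_concordance(theta):
    """Kendall's tau under the Clayton copula is theta / (theta + 2);
    the concordance probability for continuous pairs is (tau + 1) / 2."""
    tau = theta / (theta + 2.0)
    return (tau + 1.0) / 2.0

def concordance_to_clayton_theta(p):
    """Invert the map: tau = 2p - 1, theta = 2 tau / (1 - tau)."""
    tau = 2.0 * p - 1.0
    return 2.0 * tau / (1.0 - tau)
```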
Generating survival times to simulate Cox proportional hazards models with time-varying covariates.
Austin, Peter C
2012-12-20
Simulations and Monte Carlo methods serve an important role in modern statistical research. They allow for an examination of the performance of statistical procedures in settings in which analytic and mathematical derivations may not be feasible. A key element in any statistical simulation is the existence of an appropriate data-generating process: one must be able to simulate data from a specified statistical model. We describe data-generating processes for the Cox proportional hazards model with time-varying covariates when event times follow an exponential, Weibull, or Gompertz distribution. We consider three types of time-varying covariates: first, a dichotomous time-varying covariate that can change at most once from untreated to treated (e.g., organ transplant); second, a continuous time-varying covariate such as cumulative exposure at a constant dose to radiation or to a pharmaceutical agent used for a chronic condition; third, a dichotomous time-varying covariate with a subject being able to move repeatedly between treatment states (e.g., current compliance or use of a medication). In each setting, we derive closed-form expressions that allow one to simulate survival times so that survival times are related to a vector of fixed or time-invariant covariates and to a single time-varying covariate. We illustrate the utility of our closed-form expressions for simulating event times by using Monte Carlo simulations to estimate the statistical power to detect as statistically significant the effect of different types of binary time-varying covariates. This is compared with the statistical power to detect as statistically significant a binary time-invariant covariate. Copyright © 2012 John Wiley & Sons, Ltd.
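As a concrete instance of such a closed-form expression — a sketch in the spirit of the paper's first covariate type, not its exact notation — consider an exponential baseline hazard with a single treatment switch at an assumed time t0; inverting the piecewise-linear cumulative hazard yields the event time directly:

```python
import numpy as np

def sim_exp_tvc(lambda0, beta, t0, rng):
    """Simulate one event time under an exponential baseline hazard with a
    dichotomous covariate switching from 0 to 1 at time t0 (e.g., organ
    transplant): h(t) = lambda0 * exp(beta * I(t >= t0)).  Inverting the
    cumulative hazard H(t) gives a closed form."""
    h = -np.log(rng.uniform())          # target cumulative hazard
    if h < lambda0 * t0:                # event before the covariate changes
        return h / lambda0
    return t0 + (h - lambda0 * t0) / (lambda0 * np.exp(beta))

rng = np.random.default_rng(42)
times = [sim_exp_tvc(0.1, np.log(2.0), 5.0, rng) for _ in range(1000)]
```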
On the use of the covariance matrix to fit correlated data
NASA Astrophysics Data System (ADS)
D'Agostini, G.
1994-07-01
Best fits to data which are affected by systematic uncertainties on the normalization factor tend to produce curves lower than expected if the covariance matrix of the data points is used in the definition of the χ2. This paper shows that the effect is a direct consequence of the hypothesis used to estimate the empirical covariance matrix, namely the linearization on which the usual error propagation relies. The bias can become unacceptable if the normalization error is large or if a large number of data points are fitted.
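The bias is easy to reproduce numerically in the classic "Peelle's pertinent puzzle" setup (the numbers below are illustrative): fitting a constant to two measurements whose covariance includes a common normalization component drives the generalized-least-squares estimate below both data points.

```python
import numpy as np

# Two measurements of the same quantity with 2% statistical errors and a
# common 10% normalization error.
y = np.array([8.0, 8.5])
stat = 0.02 * y
f = 0.10
V = np.diag(stat**2) + f**2 * np.outer(y, y)   # full covariance matrix

ones = np.ones(2)
Vinv = np.linalg.inv(V)
yhat = (ones @ Vinv @ y) / (ones @ Vinv @ ones)
print(yhat)   # ~7.9: below both data points, illustrating the bias
```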
NASA Astrophysics Data System (ADS)
Congedo, Marco; Barachant, Alexandre
2015-01-01
Currently, the Riemannian geometry of symmetric positive definite (SPD) matrices is gaining momentum as a powerful tool in a wide range of engineering applications such as image, radar and biomedical signal processing. If the data are not natively represented in the form of SPD matrices, we may typically summarize them in such form by estimating covariance matrices of the data. However, once we manipulate such covariance matrices on the Riemannian manifold we lose the representation in the original data space. For instance, we can evaluate the geometric mean of a set of covariance matrices, but not the geometric mean of the data generating the covariance matrices, the space of interest in which the geometric mean can be interpreted. As a consequence, Riemannian information geometry is often perceived by non-experts as a "black-box" tool, and this perception prevents wider adoption in the scientific community. Here we show that we can overcome this limitation by constructing a special form of SPD matrix embedding both the covariance structure of the data and the data itself. Incidentally, whenever the original data can be represented in the form of a generic data matrix (not necessarily square), this special SPD matrix enables an exhaustive and unique description of the data up to second-order statistics. This is achieved by embedding the covariance structure of both the rows and columns of the data matrix, naturally allowing a wide range of possible applications and taking us beyond a mere interpretability issue. We demonstrate the method by manipulating satellite images (pansharpening) and event-related potentials (ERPs) from an electroencephalography brain-computer interface (BCI) study. The first example illustrates the effect of moving along geodesics in the original data space and the second provides a novel estimation of the ERP average (geometric mean), showing that, in contrast to the usual arithmetic mean, this estimation is robust to outliers. In conclusion, we show that the Riemannian concepts of distance, geometric mean, moving along a geodesic, etc., can be readily transposed into a generic data space, whatever this data space represents.
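As a minimal illustration of the Riemannian machinery the abstract refers to (toy matrices; this is not the authors' embedding construction), the affine-invariant geodesic between two SPD matrices, whose midpoint is their geometric mean, can be sketched as follows:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as fmp

def spd_geodesic(A, B, t):
    """Point at parameter t on the affine-invariant geodesic between SPD
    matrices A and B; t = 0.5 gives their geometric mean."""
    A_half = fmp(A, 0.5)
    A_ihalf = fmp(A, -0.5)
    return A_half @ fmp(A_ihalf @ B @ A_ihalf, t) @ A_half

# Two toy covariance matrices and their geometric mean.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.2], [-0.2, 3.0]])
G = spd_geodesic(A, B, 0.5)
```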
A generalized spatiotemporal covariance model for stationary background in analysis of MEG data.
Plis, S M; Schmidt, D M; Jun, S C; Ranken, D M
2006-01-01
Using a noise covariance model based on a single Kronecker product of spatial and temporal covariance in the spatiotemporal analysis of MEG data has been demonstrated to improve results over those of the commonly used diagonal noise covariance model. In this paper we present a model that is a generalization of all of the above models. It encompasses models based on a single Kronecker product of spatial and temporal covariance as well as more complicated multi-pair models, together with any intermediate form expressed as a sum of Kronecker products of reduced-rank spatial component matrices and their corresponding temporal covariance matrices. The model provides a framework for controlling the tradeoff between the described complexity of the background and the computational demand of the analysis. Ways to estimate the value of the parameter controlling this tradeoff are also discussed.
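A small numpy sketch of the covariance family described — a sum of Kronecker products of (possibly reduced-rank) spatial components with temporal covariances; sizes are illustrative, and a single pair recovers the original one-Kronecker model:

```python
import numpy as np

# Sum-of-Kronecker-products spatiotemporal covariance:
#   C = sum_k kron(S_k, T_k).
# Toy sizes only: the full C is (n_s * n_t) square, which is why the
# factored form matters in practice.
rng = np.random.default_rng(2)
n_s, n_t, n_pairs = 5, 4, 2

C = np.zeros((n_s * n_t, n_s * n_t))
for _ in range(n_pairs):
    L = rng.standard_normal((n_s, 2))
    S = L @ L.T                      # reduced-rank spatial component
    M = rng.standard_normal((n_t, n_t))
    T = M @ M.T                      # temporal covariance
    C += np.kron(S, T)
```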
Relative-Error-Covariance Algorithms
NASA Technical Reports Server (NTRS)
Bierman, Gerald J.; Wolff, Peter J.
1991-01-01
Two algorithms compute error covariance of difference between optimal estimates, based on data acquired during overlapping or disjoint intervals, of state of discrete linear system. Provides quantitative measure of mutual consistency or inconsistency of estimates of states. Relative-error-covariance concept applied to determine degree of correlation between trajectories calculated from two overlapping sets of measurements and to construct real-time test of consistency of state estimates based upon recently acquired data.
Nasejje, Justine B; Mwambi, Henry; Dheda, Keertan; Lesosky, Maia
2017-07-28
Random survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points, and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for selecting the best covariate to split on from the search for the best split point for the selected covariate. In this study, we compare the random survival forest model to the conditional inference forest (CIF) model using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under five years of age in Uganda and consists of categorical covariates, most of which have more than two levels (many split-points). The second dataset is based on the survival of patients with extremely drug resistant tuberculosis (XDR TB) and consists of mainly categorical covariates with two levels (few split-points). The study findings indicate that the conditional inference forest model is superior to random survival forest models in analysing time-to-event data that consist of covariates with many split-points, based on the values of the bootstrap cross-validated estimates for integrated Brier scores. However, conditional inference forests perform comparably to random survival forest models in analysing time-to-event data consisting of covariates with fewer split-points. Although survival forests are promising methods for analysing time-to-event data, it is important to identify the best forest model for analysis based on the nature of the covariates of the dataset in question.
Structural covariance networks across the life span, from 6 to 94 years of age.
DuPre, Elizabeth; Spreng, R Nathan
2017-10-01
Structural covariance examines covariation of gray matter morphology between brain regions and across individuals. Despite significant interest in the influence of age on structural covariance patterns, no study to date has provided a complete life span perspective-bridging childhood with early, middle, and late adulthood-on the development of structural covariance networks. Here, we investigate the life span trajectories of structural covariance in six canonical neurocognitive networks: default, dorsal attention, frontoparietal control, somatomotor, ventral attention, and visual. By combining data from five open-access data sources, we examine the structural covariance trajectories of these networks from 6 to 94 years of age in a sample of 1,580 participants. Using partial least squares, we show that structural covariance patterns across the life span exhibit two significant, age-dependent trends. The first trend is a stable pattern whose integrity declines over the life span. The second trend is an inverted-U that differentiates young adulthood from other age groups. Hub regions, including posterior cingulate cortex and anterior insula, appear particularly influential in the expression of this second age-dependent trend. Overall, our results suggest that structural covariance provides a reliable definition of neurocognitive networks across the life span and reveal both shared and network-specific trajectories.
ERIC Educational Resources Information Center
Muthen, Bengt
This paper investigates methods that avoid using multiple groups to represent the missing data patterns in covariance structure modeling, attempting instead to do a single-group analysis where the only action the analyst has to take is to indicate that data is missing. A new covariance structure approach developed by B. Muthen and G. Arminger is…
NASA Astrophysics Data System (ADS)
Falchetti, Silvia; Alvarez, Alberto
2018-04-01
Data assimilation through an ensemble Kalman filter (EnKF) is not exempt from deficiencies, including the generation of long-range unphysical correlations that degrade its performance. The covariance localization technique has been proposed and used in previous research to mitigate this effect. However, an evaluation of its performance is usually hindered by the sparseness and unsustained collection of independent observations. This article assesses the performance of an ocean prediction system composed of a multivariate EnKF coupled with a regional configuration of the Regional Ocean Model System (ROMS), with a covariance localization solution and data assimilation from an ocean glider that operated over a limited region of the Ligurian Sea. Simultaneously with the operation of the forecast system, a high-quality validation data set was collected with a CTD sensor on board the NR/V Alliance every day from 5 to 20 August 2013 (a period approximately 4 to 5 times the synoptic time scale of the area). Comparisons between the validation data set and the forecasts provide evidence that the performance of the prediction system with covariance localization is superior to that observed using only EnKF assimilation without localization or using a free-run ensemble. Furthermore, it is shown that covariance localization also increases the robustness of the model to the location of the assimilated data. Our analysis reveals that improvements are detected with regard to not only preventing the occurrence of spurious correlations but also preserving the spatial coherence in the updated covariance matrix. Covariance localization has been shown to be relevant in operational frameworks where short-term forecasts (on the order of days) are required.
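Covariance localization is typically implemented as a Schur (element-wise) product of the ensemble covariance with a compactly supported correlation function; a common choice is the Gaspari-Cohn fifth-order function. A sketch on a 1D grid (grid, ensemble size, and half-width are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn (1999) compactly supported fifth-order correlation
    function; d = separation distance, c = localization half-width
    (support vanishes beyond 2c)."""
    r = np.abs(d) / c
    out = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    r1, r2 = r[m1], r[m2]
    out[m1] = -0.25*r1**5 + 0.5*r1**4 + 0.625*r1**3 - (5/3)*r1**2 + 1.0
    out[m2] = (r2**5)/12 - 0.5*r2**4 + 0.625*r2**3 + (5/3)*r2**2 \
              - 5.0*r2 + 4.0 - 2.0/(3.0*r2)
    return out

# Schur product localization of an ensemble covariance on a 1D grid.
x = np.arange(50, dtype=float)
dist = np.abs(x[:, None] - x[None, :])
ens = np.random.default_rng(3).standard_normal((20, 50))   # 20 members
P_ens = np.cov(ens, rowvar=False)
P_loc = P_ens * gaspari_cohn(dist, c=10.0)
```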
ERIC Educational Resources Information Center
Moeyaert, Mariola; Ugille, Maaike; Ferron, John M.; Beretvas, S. Natasha; Van den Noortgate, Wim
2016-01-01
The impact of misspecifying covariance matrices at the second and third levels of the three-level model is evaluated. Results indicate that ignoring existing covariance has no effect on the treatment effect estimate. In addition, the between-case variance estimates are unbiased when covariance is either modeled or ignored. If the research interest…
Marginalized zero-inflated Poisson models with missing covariates.
Benecha, Habtamu K; Preisser, John S; Divaris, Kimon; Herring, Amy H; Das, Kalyan
2018-05-11
Unlike zero-inflated Poisson regression, marginalized zero-inflated Poisson (MZIP) models for counts with excess zeros provide estimates with direct interpretations for the overall effects of covariates on the marginal mean. In the presence of missing covariates, MZIP and many other count data models are ordinarily fitted using complete case analysis methods due to lack of appropriate statistical methods and software. This article presents an estimation method for MZIP models with missing covariates. The method, which is applicable to other missing data problems, is illustrated and compared with complete case analysis by using simulations and dental data on the caries preventive effects of a school-based fluoride mouthrinse program. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
ERIC Educational Resources Information Center
Cook, Thomas D.; Steiner, Peter M.; Pohl, Steffi
2009-01-01
This study uses within-study comparisons to assess the relative importance of covariate choice, unreliability in the measurement of these covariates, and whether regression or various forms of propensity score analysis are used to analyze the outcome data. Two of the within-study comparisons are of the four-arm type, and many more are of the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sartori, E.; Roussin, R.W.
This paper presents a brief review of computer codes concerned with checking, plotting, processing and using covariances of neutron cross-section data. It concentrates on those available from the computer code information centers of the United States and the OECD/Nuclear Energy Agency. Emphasis is also placed on codes using covariances for specific applications such as uncertainty analysis, data adjustment and data consistency analysis. Recent evaluations contain neutron cross-section covariance information for all isotopes of major importance for technological applications of nuclear energy. It is therefore important that the available software tools needed for taking advantage of this information are widely known, as they permit the determination of better safety margins and allow the optimization of more economical designs of nuclear energy systems.
A statistical approach for isolating fossil fuel emissions in atmospheric inverse problems
Yadav, Vineet; Michalak, Anna M.; Ray, Jaideep; ...
2016-10-27
Independent verification and quantification of fossil fuel (FF) emissions constitutes a considerable scientific challenge. By coupling atmospheric observations of CO2 with models of atmospheric transport, inverse models offer the possibility of overcoming this challenge. However, disaggregating the biospheric and FF flux components of terrestrial fluxes from CO2 concentration measurements has proven to be difficult, due to observational and modeling limitations. In this study, we propose a statistical inverse modeling scheme for disaggregating wintertime fluxes on the basis of their unique error covariances and covariates, where these covariances and covariates are representative of the underlying processes affecting FF and biospheric fluxes. The application of the method is demonstrated with one synthetic and two real-data prototypical inversions using in situ CO2 measurements over North America. Inversions are performed only for the month of January, as the predominance of the biospheric CO2 signal relative to the FF CO2 signal and observational limitations preclude disaggregation of the fluxes in other months. The quality of disaggregation is assessed primarily through examination of the a posteriori covariance between disaggregated FF and biospheric fluxes at regional scales. Findings indicate that the proposed method is able to robustly disaggregate fluxes regionally at monthly temporal resolution, with a posteriori cross covariance lower than 0.15 µmol m^-2 s^-1 between FF and biospheric fluxes. Error covariance models and covariates based on temporally varying FF inventory data provide a more robust disaggregation than static proxies (e.g., nightlight intensity and population density). However, the synthetic data case study shows that disaggregation is possible even in the absence of detailed temporally varying FF inventory data.
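The role played by distinct prior covariances can be conveyed with a generic Bayesian linear inversion (purely illustrative operators and magnitudes, not the study's transport model); the off-diagonal block of the posterior covariance is the a posteriori FF-biosphere cross covariance examined in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_flux = 60, 20

# Hypothetical transport sensitivities (identical for both components,
# since both flux types mix through the same atmosphere).
H = rng.standard_normal((n_obs, n_flux))
H_full = np.hstack([H, H])                     # [FF | biosphere]

# Distinct prior covariances encode the distinct processes, e.g. FF fluxes
# tied to inventory covariates, biospheric fluxes more variable.
Q_ff = 0.5 * np.eye(n_flux)
Q_bio = 2.0 * np.eye(n_flux)
Q = np.block([[Q_ff, np.zeros((n_flux, n_flux))],
              [np.zeros((n_flux, n_flux)), Q_bio]])
R = 0.1 * np.eye(n_obs)                        # model-data mismatch

# Posterior covariance of the stacked fluxes; its off-diagonal block is
# the a posteriori cross covariance between FF and biospheric components.
V_post = np.linalg.inv(H_full.T @ np.linalg.inv(R) @ H_full
                       + np.linalg.inv(Q))
cross_cov = V_post[:n_flux, n_flux:]
```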
Smooth individual level covariates adjustment in disease mapping.
Huque, Md Hamidul; Anderson, Craig; Walton, Richard; Woolford, Samuel; Ryan, Louise
2018-05-01
Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available "indiCAR" model fits the popular conditional autoregresssive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log-linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non-log-linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth-indiCAR, to fit an extension to the popular conditional autoregresssive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two-step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth-indiCAR through simulation. Our results indicate that the smooth-indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bayesian hierarchical model for large-scale covariance matrix estimation.
Zhu, Dongxiao; Hero, Alfred O
2007-12-01
Many bioinformatics problems implicitly depend on estimating a large-scale covariance matrix. The traditional approaches tend to give rise to high variance and low accuracy due to "overfitting." We cast the large-scale covariance matrix estimation problem into the Bayesian hierarchical model framework and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over the traditional approaches using simulations and OMICS data analysis.
A Framework for Propagation of Uncertainties in the Kepler Data Analysis Pipeline
NASA Technical Reports Server (NTRS)
Clarke, Bruce D.; Allen, Christopher; Bryson, Stephen T.; Caldwell, Douglas A.; Chandrasekaran, Hema; Cote, Miles T.; Girouard, Forrest; Jenkins, Jon M.; Klaus, Todd C.; Li, Jie;
2010-01-01
The Kepler space telescope is designed to detect Earth-like planets around Sun-like stars using transit photometry by simultaneously observing 100,000 stellar targets nearly continuously over a three-and-a-half-year period. The 96-megapixel focal plane consists of 42 charge-coupled devices (CCD), each containing two 1024 x 1100 pixel arrays. Cross-correlations between calibrated pixels are introduced by common calibrations performed on each CCD, requiring downstream data products to access the calibrated pixel covariance matrix in order to properly estimate uncertainties. The prohibitively large covariance matrices corresponding to the 75,000 calibrated pixels per CCD preclude calculating and storing the covariance in standard lock-step fashion. We present a novel framework used to implement standard propagation of uncertainties (POU) in the Kepler Science Operations Center (SOC) data processing pipeline. The POU framework captures the variance of the raw pixel data and the kernel of each subsequent calibration transformation, allowing the full covariance matrix of any subset of calibrated pixels to be recalled on-the-fly at any step in the calibration process. Singular value decomposition (SVD) is used to compress and low-pass filter the raw uncertainty data as well as any data-dependent kernels. The combination of the POU framework and SVD compression provides downstream consumers of the calibrated pixel data access to the full covariance matrix of any subset of the calibrated pixels, traceable to pixel-level measurement uncertainties, without having to store, retrieve and operate on prohibitively large covariance matrices. We describe the POU framework and SVD compression scheme and its implementation in the Kepler SOC pipeline.
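The heart of such a POU scheme, propagating covariance through each linear calibration step and compressing the result, can be sketched as follows (hypothetical kernel and sizes; the real pipeline stores transformation kernels rather than full matrices):

```python
import numpy as np

# Propagate pixel covariance through one linear calibration step
# y = A x + b:  C_y = A C_x A^T.  Repeating this per step lets any subset
# covariance be reconstructed on the fly instead of stored whole.
rng = np.random.default_rng(5)
n = 200
Cx = np.diag(rng.uniform(0.5, 1.5, n))     # raw-pixel variances
A = np.eye(n) - 0.01 * rng.random((n, n))  # hypothetical calibration kernel
Cy = A @ Cx @ A.T

# SVD compression: keep the k dominant modes as a low-rank surrogate.
U, s, Vt = np.linalg.svd(Cy)
k = 20
Cy_compressed = (U[:, :k] * s[:k]) @ Vt[:k, :]
```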
NASA Astrophysics Data System (ADS)
Ballard, S.; Hipp, J. R.; Encarnacao, A.; Young, C. J.; Begnaud, M. L.; Phillips, W. S.
2012-12-01
Seismic event locations can be made more accurate and precise by computing predictions of seismic travel time through high fidelity 3D models of the wave speed in the Earth's interior. Given the variable data quality and uneven data sampling associated with this type of model, it is essential that there be a means to calculate high-quality estimates of the path-dependent variance and covariance associated with the predicted travel times of ray paths through the model. In this paper, we describe a methodology for accomplishing this by exploiting the full model covariance matrix and show examples of path-dependent travel time prediction uncertainty computed from SALSA3D, our global, seamless 3D tomographic P-velocity model. Typical global 3D models have on the order of 1/2 million nodes, so the challenge in calculating the covariance matrix is formidable: 0.9 TB storage for 1/2 of a symmetric matrix, necessitating an Out-Of-Core (OOC) blocked matrix solution technique. With our approach the tomography matrix (G which includes Tikhonov regularization terms) is multiplied by its transpose (GTG) and written in a blocked sub-matrix fashion. We employ a distributed parallel solution paradigm that solves for (GTG)-1 by assigning blocks to individual processing nodes for matrix decomposition update and scaling operations. We first find the Cholesky decomposition of GTG which is subsequently inverted. Next, we employ OOC matrix multiplication methods to calculate the model covariance matrix from (GTG)-1 and an assumed data covariance matrix. Given the model covariance matrix, we solve for the travel-time covariance associated with arbitrary ray-paths by summing the model covariance along both ray paths. Setting the paths equal and taking the square root yields the travel prediction uncertainty for the single path.
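The final step has a compact form: with w holding a ray path's sensitivity to each model node (path length per node) and C the model covariance, the travel-time covariance between two paths is w1^T C w2, and the single-path uncertainty is the square root of w^T C w. A toy sketch with assumed sizes:

```python
import numpy as np

def travel_time_cov(w1, w2, C):
    """Covariance between the travel-time predictions of two ray paths:
        cov(t1, t2) = w1^T C w2,
    where w1 and w2 hold each path's sensitivity (path length per model
    node) and C is the model covariance matrix.  With w1 == w2 the square
    root gives the single-path prediction uncertainty."""
    return w1 @ C @ w2

rng = np.random.default_rng(6)
n_nodes = 100
L = rng.standard_normal((n_nodes, n_nodes))
C = L @ L.T / n_nodes                 # toy model covariance
w = rng.uniform(0.0, 1.0, n_nodes)    # toy path sensitivities
sigma_tt = np.sqrt(travel_time_cov(w, w, C))
```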
Treating Sample Covariances for Use in Strongly Coupled Atmosphere-Ocean Data Assimilation
NASA Astrophysics Data System (ADS)
Smith, Polly J.; Lawless, Amos S.; Nichols, Nancy K.
2018-01-01
Strongly coupled data assimilation requires cross-domain forecast error covariances; information from ensembles can be used, but limited sampling means that ensemble derived error covariances are routinely rank deficient and/or ill-conditioned and marred by noise. Thus, they require modification before they can be incorporated into a standard assimilation framework. Here we compare methods for improving the rank and conditioning of multivariate sample error covariance matrices for coupled atmosphere-ocean data assimilation. The first method, reconditioning, alters the matrix eigenvalues directly; this preserves the correlation structures but does not remove sampling noise. We show that it is better to recondition the correlation matrix rather than the covariance matrix as this prevents small but dynamically important modes from being lost. The second method, model state-space localization via the Schur product, effectively removes sample noise but can dampen small cross-correlation signals. A combination that exploits the merits of each is found to offer an effective alternative.
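The first method can be sketched directly: floor the eigenvalues of the sample correlation matrix so that its condition number meets a target, then restore the unit diagonal (a minimal sketch; the target condition number is an assumed tuning choice):

```python
import numpy as np

def recondition_corr(R, kappa_max):
    """Floor the eigenvalues of a correlation matrix so its condition
    number does not exceed kappa_max, then restore the unit diagonal."""
    evals, evecs = np.linalg.eigh(R)
    floor = evals.max() / kappa_max
    R_new = (evecs * np.maximum(evals, floor)) @ evecs.T
    d = np.sqrt(np.diag(R_new))
    return R_new / np.outer(d, d)     # back to unit diagonal

# Toy rank-deficient sample correlation from a small ensemble.
rng = np.random.default_rng(7)
X = rng.standard_normal((5, 12))      # 5 members, 12 variables
R = np.corrcoef(X, rowvar=False)
R_fixed = recondition_corr(R, kappa_max=100.0)
```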
Meta-STEPP: subpopulation treatment effect pattern plot for individual patient data meta-analysis.
Wang, Xin Victoria; Cole, Bernard; Bonetti, Marco; Gelber, Richard D
2016-09-20
We have developed a method, called Meta-STEPP (subpopulation treatment effect pattern plot for meta-analysis), to explore treatment effect heterogeneity across covariate values in the meta-analysis setting for time-to-event data when the covariate of interest is continuous. Meta-STEPP forms overlapping subpopulations from individual patient data containing similar numbers of events with increasing covariate values, estimates subpopulation treatment effects using standard fixed-effects meta-analysis methodology, displays the estimated subpopulation treatment effect as a function of the covariate values, and provides a statistical test to detect possibly complex treatment-covariate interactions. Simulation studies show that this test has adequate type-I error rate recovery as well as power when reasonable window sizes are chosen. When applied to eight breast cancer trials, Meta-STEPP suggests that chemotherapy is less effective for tumors with high estrogen receptor expression compared with those with low expression. Copyright © 2016 John Wiley & Sons, Ltd.
Treatment decisions based on scalar and functional baseline covariates.
Ciarleglio, Adam; Petkova, Eva; Ogden, R Todd; Tarpey, Thaddeus
2015-12-01
The amount and complexity of patient-level data being collected in randomized-controlled trials offer both opportunities and challenges for developing personalized rules for assigning treatment for a given disease or ailment. For example, trials examining treatments for major depressive disorder are not only collecting typical baseline data such as age, gender, or scores on various tests, but also data that measure the structure and function of the brain such as images from magnetic resonance imaging (MRI), functional MRI (fMRI), or electroencephalography (EEG). These latter types of data have an inherent structure and may be considered as functional data. We propose an approach that uses baseline covariates, both scalars and functions, to aid in the selection of an optimal treatment. In addition to providing information on which treatment should be selected for a new patient, the estimated regime has the potential to provide insight into the relationship between treatment response and the set of baseline covariates. Our approach can be viewed as an extension of "advantage learning" to include both scalar and functional covariates. We describe our method and how to implement it using existing software. Empirical performance of our method is evaluated with simulated data in a variety of settings and also applied to data arising from a study of patients with major depressive disorder from whom baseline scalar covariates as well as functional data from EEG are available. © 2015, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Holmes, Jesse Curtis
Nuclear data libraries provide fundamental reaction information required by nuclear system simulation codes. The inclusion of data covariances in these libraries allows the user to assess uncertainties in system response parameters as a function of uncertainties in the nuclear data. Formats and procedures are currently established for representing covariances for various types of reaction data in ENDF libraries. This covariance data is typically generated utilizing experimental measurements and empirical models, consistent with the method of parent data production. However, ENDF File 7 thermal neutron scattering library data is, by convention, produced theoretically through fundamental scattering physics model calculations. Currently, there is no published covariance data for ENDF File 7 thermal libraries. Furthermore, no accepted methodology exists for quantifying or representing uncertainty information associated with this thermal library data. The quality of thermal neutron inelastic scattering cross section data can be of high importance in reactor analysis and criticality safety applications. These cross sections depend on the material's structure and dynamics. The double-differential scattering law, S(alpha, beta), tabulated in ENDF File 7 libraries contains this information. For crystalline solids, S(alpha, beta) is primarily a function of the material's phonon density of states (DOS). Published ENDF File 7 libraries are commonly produced by calculation and processing codes, such as the LEAPR module of NJOY, which utilize the phonon DOS as the fundamental input for inelastic scattering calculations to directly output an S(alpha, beta) matrix. To determine covariances for the S(alpha, beta) data generated by this process, information about uncertainties in the DOS is required. The phonon DOS may be viewed as a probability density function of atomic vibrational energy states that exist in a material. Probable variation in the shape of this spectrum may be established that depends on uncertainties in the physics models and methodology employed to produce the DOS. Through Monte Carlo sampling of perturbations from the reference phonon spectrum, an S(alpha, beta) covariance matrix may be generated. In this work, density functional theory and lattice dynamics in the harmonic approximation are used to calculate the phonon DOS for hexagonal crystalline graphite. This form of graphite is used as an example material for the purpose of demonstrating procedures for analyzing, calculating and processing thermal neutron inelastic scattering uncertainty information. Several sources of uncertainty in thermal neutron inelastic scattering calculations are examined, including sources which cannot be directly characterized through a description of the phonon DOS uncertainty, and their impacts are evaluated. Covariances for hexagonal crystalline graphite S(alpha, beta) data are quantified by coupling the standard methodology of LEAPR with a Monte Carlo sampling process. The mechanics of efficiently representing and processing this covariance information is also examined. Finally, with appropriate sensitivity information, it is shown that an S(alpha, beta) covariance matrix can be propagated to generate covariance data for integrated cross sections, secondary energy distributions, and coupled energy-angle distributions. This approach enables a complete description of thermal neutron inelastic scattering cross section uncertainties which may be employed to improve the simulation of nuclear systems.
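The Monte Carlo step described — sampling perturbations of the reference phonon DOS and accumulating a covariance of the derived quantities — reduces to a simple loop. In the sketch below, `scattering_model` is a purely illustrative stand-in for a LEAPR-like S(alpha, beta) calculation, and the 5% perturbation model is an assumption:

```python
import numpy as np

# Generic sketch: perturb a reference phonon DOS within its assumed
# uncertainty, push each sample through the scattering model, and form
# the covariance of the outputs.
rng = np.random.default_rng(8)
energies = np.linspace(0.001, 0.2, 50)         # eV, illustrative grid
dos_ref = np.exp(-((energies - 0.08) / 0.03)**2)

def scattering_model(dos):
    # placeholder: any deterministic map from DOS to derived quantities
    return np.cumsum(dos) / dos.sum()

samples = []
for _ in range(500):
    perturbation = 1.0 + 0.05 * rng.standard_normal(dos_ref.size)
    samples.append(scattering_model(dos_ref * perturbation))

cov_matrix = np.cov(np.array(samples), rowvar=False)
```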
Kustatscher, Georg; Grabowski, Piotr; Rappsilber, Juri
2016-02-01
Subcellular localization is an important aspect of protein function, but the protein composition of many intracellular compartments is poorly characterized. For example, many nuclear bodies are challenging to isolate biochemically and thus remain inaccessible to proteomics. Here, we explore covariation in proteomics data as an alternative route to subcellular proteomes. Rather than targeting a structure of interest biochemically, we target it by machine learning. This becomes possible by taking data obtained for one organelle and searching it for traces of another organelle. As an extreme example and proof-of-concept we predict mitochondrial proteins based on their covariation in published interphase chromatin data. We detect about ⅓ of the known mitochondrial proteins in our chromatin data, presumably most as contaminants. However, these proteins are not present at random. We show covariation of mitochondrial proteins in chromatin proteomics data. We then exploit this covariation by multiclassifier combinatorial proteomics to define a list of mitochondrial proteins. This list agrees well with different databases on mitochondrial composition. This benchmark test raises the possibility that, in principle, covariation proteomics may also be applicable to structures for which no biochemical isolation procedures are available. © 2015 The Authors. Proteomics Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Seaman, Shaun R; White, Ian R; Carpenter, James R
2015-01-01
Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available. PMID:24525487
Treatment selection in a randomized clinical trial via covariate-specific treatment effect curves.
Ma, Yunbei; Zhou, Xiao-Hua
2017-02-01
For time-to-event data in a randomized clinical trial, we proposed two new methods for selecting an optimal treatment for a patient based on the covariate-specific treatment effect curve, which is used to represent the clinical utility of a predictive biomarker. To select an optimal treatment for a patient with a specific biomarker value, we proposed pointwise confidence intervals for each covariate-specific treatment effect curve and the difference between covariate-specific treatment effect curves of two treatments. Furthermore, to select an optimal treatment for a future biomarker-defined subpopulation of patients, we proposed confidence bands for each covariate-specific treatment effect curve and the difference between each pair of covariate-specific treatment effect curve over a fixed interval of biomarker values. We constructed the confidence bands based on a resampling technique. We also conducted simulation studies to evaluate finite-sample properties of the proposed estimation methods. Finally, we illustrated the application of the proposed method in a real-world data set.
Bayes Factor Covariance Testing in Item Response Models.
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
2017-12-01
Two marginal one-parameter item response theory models are introduced by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., the assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted inverse gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Covariate Selection for Multilevel Models with Missing Data
Marino, Miguel; Buxton, Orfeu M.; Li, Yi
2017-01-01
Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
Estimation of covariate-specific time-dependent ROC curves in the presence of missing biomarkers.
Li, Shanshan; Ning, Yang
2015-09-01
Covariate-specific time-dependent ROC curves are often used to evaluate the diagnostic accuracy of a biomarker with time-to-event outcomes, when certain covariates have an impact on the test accuracy. In many medical studies, measurements of biomarkers are subject to missingness due to high cost or limitation of technology. This article considers estimation of covariate-specific time-dependent ROC curves in the presence of missing biomarkers. To incorporate the covariate effect, we assume a proportional hazards model for the failure time given the biomarker and the covariates, and a semiparametric location model for the biomarker given the covariates. In the presence of missing biomarkers, we propose a simple weighted estimator for the ROC curves where the weights are inversely proportional to the selection probability. We also propose an augmented weighted estimator which utilizes information from the subjects with missing biomarkers. The augmented weighted estimator enjoys the double-robustness property in the sense that the estimator remains consistent if either the missing data process or the conditional distribution of the missing data given the observed data is correctly specified. We derive the large sample properties of the proposed estimators and evaluate their finite sample performance using numerical studies. The proposed approaches are illustrated using the US Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. © 2015, The International Biometric Society.
Utilizing soil polypedons to improve model performance for digital soil mapping
USDA-ARS?s Scientific Manuscript database
Most digital soil mapping approaches that use point data to develop relationships with covariate data intersect sample locations with one raster pixel regardless of pixel size. Resulting models are subject to spurious values in covariate data which may limit model performance. An alternative approac...
Linear and nonlinear variable selection in competing risks data.
Ren, Xiaowei; Li, Shanshan; Shen, Changyu; Yu, Zhangsheng
2018-06-15
The subdistribution hazard model for competing risks data has been applied extensively in clinical research. Variable selection methods for linear effects in competing risks data have been studied in the past decade. There is no existing work on the selection of potential nonlinear effects for the subdistribution hazard model. We propose a two-stage procedure to select linear and nonlinear covariate(s) simultaneously and to estimate the selected covariate effect(s). We use a spectral decomposition approach to distinguish the linear and nonlinear parts of each covariate, and adaptive LASSO to select each of the two components. Extensive numerical studies are conducted to demonstrate that the proposed procedure can achieve good selection accuracy in the first stage and small estimation biases in the second stage. The proposed method is applied to analyze a cardiovascular disease data set with competing death causes. Copyright © 2018 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Hipp, J. R.; Ballard, S.; Begnaud, M. L.; Encarnacao, A. V.; Young, C. J.; Phillips, W. S.
2015-12-01
Recently our combined SNL-LANL research team has succeeded in developing a global, seamless 3D tomographic P- and S-velocity model (SALSA3D) that provides superior first P and first S travel time predictions at both regional and teleseismic distances. However, given the variable data quality and uneven data sampling associated with this type of model, it is essential that there be a means to calculate high-quality estimates of the path-dependent variance and covariance associated with the predicted travel times of ray paths through the model. In this paper, we describe a methodology for accomplishing this by exploiting the full model covariance matrix and show examples of path-dependent travel time prediction uncertainty computed from our latest tomographic model. Typical global 3D SALSA3D models have on the order of 1/2 million nodes, so the challenge in calculating the covariance matrix is formidable: 0.9 TB storage for 1/2 of a symmetric matrix, necessitating an Out-Of-Core (OOC) blocked matrix solution technique. With our approach the tomography matrix (G which includes a prior model covariance constraint) is multiplied by its transpose (GTG) and written in a blocked sub-matrix fashion. We employ a distributed parallel solution paradigm that solves for (GTG)-1 by assigning blocks to individual processing nodes for matrix decomposition update and scaling operations. We first find the Cholesky decomposition of GTG which is subsequently inverted. Next, we employ OOC matrix multiplication methods to calculate the model covariance matrix from (GTG)-1 and an assumed data covariance matrix. Given the model covariance matrix, we solve for the travel-time covariance associated with arbitrary ray-paths by summing the model covariance along both ray paths. Setting the paths equal and taking the square root yields the travel prediction uncertainty for the single path.
Yahya, Noorazrul; Chua, Xin-Jane; Manan, Hanani A; Ismail, Fuad
2018-05-17
This systematic review evaluates the completeness of dosimetric features and their inclusion as covariates in genetic-toxicity association studies. Original research studies associating genetic features and normal tissue complications following radiotherapy were identified from PubMed. The use of dosimetric data was determined by mining the statement of prescription dose, dose fractionation, target volume selection or arrangement, and dose distribution. The consideration of the dosimetric data as covariates was based on the statement mentioned in the statistical analysis section. The significance of these covariates was extracted from the results section. Descriptive analyses were performed to determine their completeness and inclusion as covariates. A total of 174 studies were found to satisfy the inclusion criteria. Studies published ≥2010 showed increased use of dose distribution information (p = 0.07). 33% of studies did not include any dose features in the analysis of gene-toxicity associations. Only 29% included dose distribution features as covariates and reported the results. 59% of studies which included dose distribution features found significant associations with toxicity. A large proportion of studies on the correlation of genetic markers with radiotherapy-related side effects considered no dosimetric parameters. Significance of dose distribution features was found in more than half of the studies that included them, emphasizing their importance. The completeness of radiation-specific clinical data may have increased in recent years, which may improve gene-toxicity association studies.
Working covariance model selection for generalized estimating equations.
Carey, Vincent J; Wang, You-Gan
2011-11-20
We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice. Copyright © 2011 John Wiley & Sons, Ltd.
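The Gaussian pseudolikelihood criterion has a direct implementation: score each candidate working covariance model by the multivariate-normal log density of the cluster residuals (a minimal sketch under assumed inputs, not the authors' open-source software):

```python
import numpy as np

def gaussian_pseudologlik(resid_by_cluster, V_by_cluster):
    """Gaussian pseudolikelihood of clustered residuals under a candidate
    working covariance model: sum of multivariate-normal log densities.
    Larger values favour the candidate working structure."""
    ll = 0.0
    for r, V in zip(resid_by_cluster, V_by_cluster):
        sign, logdet = np.linalg.slogdet(V)
        ll += -0.5 * (logdet + r @ np.linalg.solve(V, r)
                      + len(r) * np.log(2 * np.pi))
    return ll

def exchangeable_cov(m, sigma2, rho):
    """Candidate exchangeable working covariance for a cluster of size m."""
    return sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
```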
Covariance Data File Formats for Whisper-1.0 & Whisper-1.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Forrest B.; Rising, Michael Evan
2017-01-09
Whisper is a statistical analysis package developed in 2014 to support nuclear criticality safety (NCS) validation. It uses the sensitivity profile data for an application as computed by MCNP6 along with covariance files for the nuclear data to determine a baseline upper-subcritical-limit (USL) for the application. Whisper version 1.0 was first developed and used at LANL in 2014. During 2015-2016, Whisper was updated to version 1.1 and is to be included with the upcoming release of MCNP6.2. This report describes the file formats used for the covariance data in both Whisper-1.0 and Whisper-1.1.
Large Covariance Estimation by Thresholding Principal Orthogonal Complements
Fan, Jianqing; Liao, Yuan; Mincheva, Martina
2012-01-01
This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented. PMID:24348088
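A minimal numpy sketch of the POET recipe follows; hard thresholding is used for brevity where the paper also allows adaptive rules, and K and tau are user choices:

```python
import numpy as np

def poet(X, K, tau):
    """Minimal POET sketch: principal orthogonal complement thresholding.
    X is (n_samples, p); K factors are kept and the residual ('orthogonal
    complement') covariance is hard-thresholded at tau off the diagonal."""
    S = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(S)
    idx = np.argsort(evals)[::-1][:K]
    low_rank = (evecs[:, idx] * evals[idx]) @ evecs[:, idx].T
    resid = S - low_rank
    off = resid - np.diag(np.diag(resid))
    off[np.abs(off) < tau] = 0.0              # hard thresholding
    return low_rank + np.diag(np.diag(resid)) + off

rng = np.random.default_rng(9)
X = rng.standard_normal((200, 50))
Sigma_hat = poet(X, K=3, tau=0.1)
```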
Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling
ERIC Educational Resources Information Center
Lee, Taehun; Cai, Li
2012-01-01
Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…
Logistic regression of family data from retrospective study designs.
Whittemore, Alice S; Halpern, Jerry
2003-11-01
We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate of beta(RE) and a consistent estimate of its covariance matrix. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate of beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Abd-Elmotaal, Hussein; Kühtreiber, Norbert
2016-04-01
In the framework of the IAG African Geoid Project, there are many large data gaps in its gravity database. These gaps are initially filled using an unequal-weight least-squares prediction technique. This technique uses a generalized Hirvonen covariance function model in place of the empirically determined covariance function. The generalized Hirvonen covariance function model has a sensitive parameter related to the curvature of the covariance function at the origin. This paper studies the effect of the curvature parameter on the least-squares prediction results, especially in the large data gaps appearing in the African gravity database. An optimum estimation of the curvature parameter has also been carried out. A wide comparison among the results obtained in this research, along with their accuracy, is given and thoroughly discussed.
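Least-squares prediction (collocation) with a Hirvonen-type model is compact enough to sketch; the functional form below and all numbers are illustrative assumptions (the paper's generalized model carries an additional curvature-related parameter):

```python
import numpy as np

def hirvonen(s, C0, d):
    """Hirvonen-type covariance model (illustrative basic form):
    C(s) = C0 / (1 + (s/d)^2)."""
    return C0 / (1.0 + (s / d)**2)

# Unequal-weight least-squares prediction of a gravity value at a gap
# point from surrounding observations; numbers are illustrative.
x_obs = np.array([0.0, 10.0, 25.0, 40.0])      # km
g_obs = np.array([12.0, 8.0, -3.0, 5.0])       # mGal anomalies
noise = np.array([1.0, 1.0, 2.0, 2.0])         # per-observation sigmas

C0, d = 100.0, 20.0
x_new = 18.0
Cpp = hirvonen(np.abs(x_obs[:, None] - x_obs[None, :]), C0, d)
Csp = hirvonen(np.abs(x_new - x_obs), C0, d)
g_pred = Csp @ np.linalg.solve(Cpp + np.diag(noise**2), g_obs)
```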
Construction of Covariance Functions with Variable Length Fields
NASA Technical Reports Server (NTRS)
Gaspari, Gregory; Cohn, Stephen E.; Guo, Jing; Pawson, Steven
2005-01-01
This article focuses on the construction, directly in physical space, of three-dimensional covariance functions parametrized by a tunable length field, and on an application of this theory to reproduce the Quasi-Biennial Oscillation (QBO) in the Goddard Earth Observing System, Version 4 (GEOS-4) data assimilation system. These covariance models are referred to as multi-level or non-separable, to associate them with the application, where a multi-level covariance with a large troposphere-to-stratosphere length-field gradient is used to reproduce the QBO from sparse radiosonde observations in the tropical lower stratosphere. The multi-level covariance functions extend well-known single-level covariance functions that depend only on a length scale. Generalizations of the first- and third-order autoregressive covariances in three dimensions are given, providing multi-level covariances with zero and three derivatives at zero separation, respectively. Multi-level piecewise-rational covariances with two continuous derivatives at zero separation are also provided. Multi-level power-law covariances are constructed with continuous derivatives of all orders. Additional multi-level covariance functions are constructed using the Schur product of single- and multi-level covariance functions. A multi-level power-law covariance used to reproduce the QBO in GEOS-4 is described along with details of the assimilation experiments. The new covariance model is shown to represent the vertical wind shear associated with the QBO much more effectively than the baseline GEOS-4 system.
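The article's AR and piecewise-rational families are specific constructions. As a hedged illustration of the underlying idea (a covariance with a tunable length field that remains positive semi-definite by construction), here is the well-known nonstationary Gaussian construction of Paciorek and Schervish, not the article's own families; the level grid and the length values are made up.

```python
import numpy as np

# Vertical levels (km) and a length field with a sharp troposphere-to-
# stratosphere gradient (values are illustrative, not GEOS-4 settings).
z = np.linspace(0.0, 30.0, 61)
L = np.where(z < 15.0, 1.0, 5.0)   # short lengths below 15 km, long above

# Paciorek-Schervish nonstationary Gaussian correlation (1D form):
# C_ij = sqrt(2 L_i L_j / (L_i^2 + L_j^2)) * exp(-(z_i - z_j)^2 / (L_i^2 + L_j^2))
Li2 = L[:, None] ** 2
Lj2 = L[None, :] ** 2
dz2 = (z[:, None] - z[None, :]) ** 2
C = np.sqrt(2 * np.outer(L, L) / (Li2 + Lj2)) * np.exp(-dz2 / (Li2 + Lj2))

# Positive semi-definiteness holds by construction (up to round-off).
print(np.linalg.eigvalsh(C).min())
```

For constant L the formula reduces to the familiar stationary Gaussian correlation exp(-dz^2 / (2 L^2)), which is why a length *field* rather than a length *scale* is the natural generalization.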
Keshavarzi, Sareh; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf; Pakfetrat, Maryam
2012-01-01
BACKGROUND. In many studies with longitudinal data, time-dependent covariates can only be measured intermittently (not at all observation times), which presents difficulties for standard statistical analyses. This situation is common in medical studies, and methods that deal with this challenge would be useful. METHODS. In this study, we fitted seemingly unrelated regression (SUR)-based models with respect to each observation time in longitudinal data with intermittently observed time-dependent covariates, and further compared these models with mixed-effect regression models (MRMs) under three classic imputation procedures. Simulation studies were performed to compare the finite-sample properties of the estimated coefficients for different modeling choices. RESULTS. In general, the proposed models performed well in the presence of intermittently observed time-dependent covariates. However, when we considered only the observed values of the covariate without any imputation, the resulting biases were greater. The performance of the proposed SUR-based models was close to that of MRM with classic imputation methods, with approximately equal bias and MSE. CONCLUSION. The simulation study suggests that the SUR-based models work as efficiently as MRM in the case of intermittently observed time-dependent covariates. Thus, they can be used as an alternative to MRM.
Are your covariates under control? How normalization can re-introduce covariate effects.
Pain, Oliver; Dudbridge, Frank; Ronald, Angelica
2018-04-30
Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations where normality of the dependent variable is required.
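The ordering effect described here is easy to reproduce. A minimal simulation sketch, assuming a Blom-offset INT and a skewed outcome generated from the covariate; all simulation choices below are ours, not the study's.

```python
import numpy as np
from scipy.stats import rankdata, norm, pearsonr

def rank_int(x, c=3.0 / 8.0):
    """Rank-based inverse normal transformation with the Blom offset."""
    r = rankdata(x)
    return norm.ppf((r - c) / (len(x) - 2 * c + 1))

rng = np.random.default_rng(1)
n = 5000
covar = rng.normal(size=n)
y = np.exp(covar + rng.normal(size=n))   # skewed outcome related to the covariate

# Problematic order: residualize first, then INT the residuals.
resid = y - np.polyval(np.polyfit(covar, y, 1), covar)
print("INT after residualizing :", pearsonr(rank_int(resid), covar)[0])

# Recommended order: INT the outcome first, then residualize on the covariate.
y_int = rank_int(y)
resid2 = y_int - np.polyval(np.polyfit(covar, y_int, 1), covar)
print("INT before residualizing:", pearsonr(resid2, covar)[0])
```

The first correlation is re-introduced by the rank transform and is clearly non-zero; the second is zero by construction of least squares, matching the study's recommendation.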
Comparison of Two Approaches for Handling Missing Covariates in Logistic Regression
ERIC Educational Resources Information Center
Peng, Chao-Ying Joanne; Zhu, Jin
2008-01-01
For the past 25 years, methodological advances have been made in missing data treatment. Most published work has focused on missing data in dependent variables under various conditions. The present study seeks to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression: the…
A formal and data-based comparison of measures of motor-equivalent covariation.
Verrel, Julius
2011-09-15
Different analysis methods have been developed for assessing motor-equivalent organization of movement variability. In the uncontrolled manifold (UCM) method, the structure of variability is analyzed by comparing goal-equivalent and non-goal-equivalent variability components at the level of elemental variables (e.g., joint angles). In contrast, in the covariation by randomization (CR) approach, motor-equivalent organization is assessed by comparing variability at the task level between empirical and decorrelated surrogate data. UCM effects can be due to both covariation among elemental variables and selective channeling of variability to elemental variables with low task sensitivity ("individual variation"), suggesting a link between the UCM and CR method. However, the precise relationship between the notion of covariation in the two approaches has not been analyzed in detail yet. Analysis of empirical and simulated data from a study on manual pointing shows that in general the two approaches are not equivalent, but the respective covariation measures are highly correlated (ρ > 0.7) for two proposed definitions of covariation in the UCM context. For one-dimensional task spaces, a formal comparison is possible and in fact the two notions of covariation are equivalent. In situations in which individual variation does not contribute to UCM effects, for which necessary and sufficient conditions are derived, this entails the equivalence of the UCM and CR analysis. Implications for the interpretation of UCM effects are discussed. Copyright © 2011 Elsevier B.V. All rights reserved.
MODELING LEFT-TRUNCATED AND RIGHT-CENSORED SURVIVAL DATA WITH LONGITUDINAL COVARIATES
Su, Yu-Ru; Wang, Jane-Ling
2018-01-01
There is a surge in medical follow-up studies that include longitudinal covariates in the modeling of survival data. So far, the focus has been largely on right-censored survival data. We consider survival data that are subject to both left truncation and right censoring. Left truncation is well known to produce a biased sample. The sampling bias issue has been resolved in the literature for the case involving baseline or time-varying covariates that are observable. The problem remains open, however, for the important case where longitudinal covariates are present in survival models. A joint likelihood approach has been shown in the literature to provide an effective way to overcome those difficulties for right-censored data, but this approach faces substantial additional challenges in the presence of left truncation. Here we thus propose an alternative likelihood to overcome these difficulties and show that the regression coefficient in the survival component can be estimated unbiasedly and efficiently. Issues about the bias for the longitudinal component are discussed. The new approach is illustrated numerically through simulations and data from a multi-center AIDS cohort study. PMID:29479122
VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA
Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu
2009-01-01
We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation (SCAD) penalty and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. In particular, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite-sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial are presented to illustrate the proposed methodology. PMID:20336190
Bruce, Iain P.; Karaman, M. Muge; Rowe, Daniel B.
2012-01-01
The acquisition of sub-sampled data from an array of receiver coils has become a common means of reducing data acquisition time in MRI. Of the various techniques used in parallel MRI, SENSitivity Encoding (SENSE) is one of the most common, making use of a complex-valued weighted least squares estimation to unfold the aliased images. It was recently shown in Bruce et al. [Magn. Reson. Imag. 29(2011):1267–1287] that when the SENSE model is represented in terms of a real-valued isomorphism, it assumes a skew-symmetric covariance between receiver coils, as well as an identity covariance structure between voxels. In this manuscript, we show not only that the skew-symmetric coil covariance is unlike that of real data, but also that the estimated covariance structure between voxels over a time series of experimental data is not an identity matrix. As such, a new model, entitled SENSE-ITIVE, is described with both revised coil and voxel covariance structures. Both the SENSE and SENSE-ITIVE models are represented in terms of real-valued isomorphisms, allowing a statistical analysis of the reconstructed voxel means, variances, and correlations that result from the different coil and voxel covariance structures used in the reconstruction process. It is shown through both theoretical and experimental illustrations that the misspecification of the coil and voxel covariance structures in the SENSE model results in a lower standard deviation in each voxel of the reconstructed images, and thus an artificial increase in SNR, compared to the standard deviation and SNR of the SENSE-ITIVE model, where both the coil and voxel covariances are appropriately accounted for. It is also shown that there are differences in the correlations induced by the reconstruction operations of the two models, and consequently there are differences in the correlations estimated throughout the course of reconstructed time series. These differences in correlations could result in meaningful differences in the interpretation of results. PMID:22617147
Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals.
Engemann, Denis A; Gramfort, Alexandre
2015-03-01
Magnetoencephalography and electroencephalography (M/EEG) measure non-invasively the weak electromagnetic fields induced by post-synaptic neural currents. The estimation of the spatial covariance of the signals recorded on M/EEG sensors is a building block of modern data analysis pipelines. Such covariance estimates are used in brain-computer interfaces (BCI) systems, in nearly all source localization methods for spatial whitening as well as for data covariance estimation in beamformers. The rationale for such models is that the signals can be modeled by a zero mean Gaussian distribution. While maximizing the Gaussian likelihood seems natural, it leads to a covariance estimate known as empirical covariance (EC). It turns out that the EC is a poor estimate of the true covariance when the number of samples is small. To address this issue the estimation needs to be regularized. The most common approach downweights off-diagonal coefficients, while more advanced regularization methods are based on shrinkage techniques or generative models with low rank assumptions: probabilistic PCA (PPCA) and factor analysis (FA). Using cross-validation all of these models can be tuned and compared based on Gaussian likelihood computed on unseen data. We investigated these models on simulations, one electroencephalography (EEG) dataset as well as magnetoencephalography (MEG) datasets from the most common MEG systems. First, our results demonstrate that different models can be the best, depending on the number of samples, heterogeneity of sensor types and noise properties. Second, we show that the models tuned by cross-validation are superior to models with hand-selected regularization. Hence, we propose an automated solution to the often overlooked problem of covariance estimation of M/EEG signals. The relevance of the procedure is demonstrated here for spatial whitening and source localization of MEG signals. Copyright © 2015 Elsevier Inc. All rights reserved.
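Cross-validated covariance model selection of the kind described can be sketched with scikit-learn's covariance estimators, whose score method returns the Gaussian log-likelihood of held-out data. The sensor count, sample size, and candidate set below are illustrative; the paper additionally considers PPCA and FA models, which are omitted here.

```python
import numpy as np
from sklearn.covariance import (EmpiricalCovariance, LedoitWolf, OAS,
                                ShrunkCovariance)
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n_sensors, n_samples = 60, 150            # few samples relative to dimension
A = rng.normal(size=(n_sensors, n_sensors))
true_cov = A @ A.T / n_sensors + np.eye(n_sensors)
X = rng.multivariate_normal(np.zeros(n_sensors), true_cov, size=n_samples)

estimators = {"empirical": EmpiricalCovariance(), "ledoit-wolf": LedoitWolf(),
              "oas": OAS(), "shrunk(0.1)": ShrunkCovariance(shrinkage=0.1)}
for name, est in estimators.items():
    scores = []
    for train, test in KFold(n_splits=5).split(X):
        est.fit(X[train])
        scores.append(est.score(X[test]))   # held-out Gaussian log-likelihood
    print(f"{name:12s} mean CV log-likelihood: {np.mean(scores):.1f}")
```

With few samples the empirical covariance typically scores worst on unseen data, which is the paper's motivation for automated, likelihood-based model selection.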
Conditional Covariance Theory and Detect for Polytomous Items
ERIC Educational Resources Information Center
Zhang, Jinming
2007-01-01
This paper extends the theory of conditional covariances to polytomous items. It has been proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, given an appropriately chosen composite is positive if, and only if, the two items…
Simulation-Extrapolation for Estimating Means and Causal Effects with Mismeasured Covariates
ERIC Educational Resources Information Center
Lockwood, J. R.; McCaffrey, Daniel F.
2015-01-01
Regression, weighting and related approaches to estimating a population mean from a sample with nonrandom missing data often rely on the assumption that conditional on covariates, observed samples can be treated as random. Standard methods using this assumption generally will fail to yield consistent estimators when covariates are measured with…
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has increased dramatically across several academic fields. Within the context of meta-analysis, however, methods for synthesizing such models have not developed at a commensurate pace. One difficulty hindering their development is the disparity in the sets of covariates used across models in the literature. If the sets of covariates differ across models, the interpretation of the coefficients also differs, making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often face problems because the covariance matrix of the coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study therefore proposes a method to synthesize linear regression models fitted to different covariate sets by using a generalized least squares method involving bias-correction terms. In particular, we also propose an approach to recover (at most) three correlations of covariates, which are required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.
Zero-inflated count models for longitudinal measurements with heterogeneous random effects.
Zhu, Huirong; Luo, Sheng; DeSantis, Stacia M
2017-08-01
Longitudinal zero-inflated count data arise frequently in substance use research when assessing the effects of behavioral and pharmacological interventions. Zero-inflated count models (e.g. zero-inflated Poisson or zero-inflated negative binomial) with random effects have been developed to analyze this type of data. In random effects zero-inflated count models, the random effects covariance matrix is typically assumed to be homogeneous (constant across subjects). However, in many situations this matrix may be heterogeneous (differ by measured covariates). In this paper, we extend zero-inflated count models to account for random effects heterogeneity by modeling their variance as a function of covariates. We show via simulation that ignoring intervention- and covariate-specific heterogeneity can produce biased covariate and random effect estimates. Moreover, those biased estimates can be rectified by correctly modeling the random effects covariance structure. The methodological development is motivated by and applied to the Combined Pharmacotherapies and Behavioral Interventions for Alcohol Dependence (COMBINE) study, the largest clinical trial of alcohol dependence performed in the United States, with 1383 individuals.
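As a rough sketch of the base model class only (a plain zero-inflated Poisson, without the random-effects heterogeneity the paper develops), statsmodels provides ZeroInflatedPoisson. The simulated intervention effect sizes below are invented for illustration, and the snippet assumes statsmodels >= 0.9.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 2000
treat = rng.integers(0, 2, size=n)                  # hypothetical intervention flag
X = sm.add_constant(treat.astype(float))
lam = np.exp(0.8 - 0.5 * treat)                     # count intensity (invented)
p_zero = 1 / (1 + np.exp(-(-1.0 + 0.7 * treat)))    # structural-zero probability

y = np.where(rng.uniform(size=n) < p_zero, 0, rng.poisson(lam))

# Count and inflation parts both depend on the intervention covariate.
fit = ZeroInflatedPoisson(y, X, exog_infl=X, inflation='logit').fit(
    method='bfgs', maxiter=200, disp=False)
print(fit.summary())
```

The paper's extension would additionally let the random-effects variance depend on covariates such as the intervention arm, which this off-the-shelf model does not do.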
NASA Astrophysics Data System (ADS)
Hipp, J. R.; Encarnacao, A.; Ballard, S.; Young, C. J.; Phillips, W. S.; Begnaud, M. L.
2011-12-01
Recently our combined SNL-LANL research team has succeeded in developing a global, seamless 3D tomographic P-velocity model (SALSA3D) that provides superior first-P travel-time predictions at both regional and teleseismic distances. However, given the variable data quality and uneven data sampling associated with this type of model, it is essential that there be a means to calculate high-quality estimates of the path-dependent variance and covariance associated with the predicted travel times of ray paths through the model. In this paper, we show a methodology for accomplishing this by exploiting the full model covariance matrix. Our model has on the order of 1/2 million nodes, so the challenge in calculating the covariance matrix is formidable: 0.9 TB of storage for half of a symmetric matrix, necessitating an out-of-core (OOC) blocked matrix solution technique. With our approach, the tomography matrix (G, which includes Tikhonov regularization terms) is multiplied by its transpose (G^T G) and written in a blocked sub-matrix fashion. We employ a distributed parallel solution paradigm that solves for (G^T G)^-1 by assigning blocks to individual processing nodes for matrix decomposition, update and scaling operations. We first find the Cholesky decomposition of G^T G, which is subsequently inverted. Next, we employ OOC matrix multiply methods to calculate the model covariance matrix from (G^T G)^-1 and an assumed data covariance matrix. Given the model covariance matrix, we solve for the travel-time covariance associated with arbitrary ray paths by integrating the model covariance along both ray paths. Setting the paths equal gives the variance for that path. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
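At toy scale, the covariance pipeline described (form G^T G with regularization, Cholesky-factor and invert it, then contract with ray-path sensitivities) takes only a few lines; the sizes, regularization weight, and diagonal data covariance below are placeholders, and the out-of-core blocking that makes the real problem tractable is of course omitted.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(4)
n_rays, n_nodes = 400, 120                 # toy sizes; SALSA3D has ~5e5 nodes
# Sparse, synthetic ray-path sensitivity matrix.
G = rng.normal(size=(n_rays, n_nodes)) * (rng.uniform(size=(n_rays, n_nodes)) < 0.1)
lam = 0.5                                  # Tikhonov regularization weight (assumed)
data_var = 0.01                            # assumed (diagonal) data covariance

# Model covariance C_m = data_var * (G^T G + lam^2 I)^{-1}, via Cholesky.
GtG = G.T @ G + lam**2 * np.eye(n_nodes)
c, low = cho_factor(GtG)
C_m = data_var * cho_solve((c, low), np.eye(n_nodes))

# Travel-time (co)variance for two ray paths with node-sensitivity rows a and b;
# setting the paths equal gives the variance for a single path.
a, b = G[0], G[1]
print("path variance :", a @ C_m @ a)
print("path covariance:", a @ C_m @ b)
```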
An estimating equation approach to dimension reduction for longitudinal data
Xu, Kelin; Guo, Wensheng; Xiong, Momiao; Zhu, Liping; Jin, Li
2016-01-01
Sufficient dimension reduction has been extensively explored in the context of independent and identically distributed data. In this article we generalize sufficient dimension reduction to longitudinal data and propose an estimating equation approach to estimating the central mean subspace. The proposed method accounts for the covariance structure within each subject and improves estimation efficiency when the covariance structure is correctly specified. Even if the covariance structure is misspecified, our estimator remains consistent. In addition, our method relaxes distributional assumptions on the covariates and is doubly robust. To determine the structural dimension of the central mean subspace, we propose a Bayesian-type information criterion. We show that the estimated structural dimension is consistent and that the estimated basis directions are root-n consistent, asymptotically normal and locally efficient. Simulations and an analysis of the Framingham Heart Study data confirm the effectiveness of our approach. PMID:27017956
Sekihara, K; Poeppel, D; Marantz, A; Koizumi, H; Miyashita, Y
1997-09-01
This paper proposes a method of localizing multiple current dipoles from spatio-temporal biomagnetic data. The method is based on the multiple signal classification (MUSIC) algorithm and is tolerant of the influence of background brain activity. In this method, the noise covariance matrix is estimated using a portion of the data that contains noise, but does not contain any signal information. Then, a modified noise subspace projector is formed using the generalized eigenvectors of the noise and measured-data covariance matrices. The MUSIC localizer is calculated using this noise subspace projector and the noise covariance matrix. The results from a computer simulation have verified the effectiveness of the method. The method was then applied to source estimation for auditory-evoked fields elicited by syllable speech sounds. The results strongly suggest the method's effectiveness in removing the influence of background activity.
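A schematic version of the projector construction: generalized eigenvectors of the measured-data and noise covariance matrices define a noise subspace, and the localizer peaks where a lead-field vector leaves that subspace. The simulated lead fields and noise mixing below are invented, and this is a sketch of the idea rather than the paper's exact localizer.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
n_ch, n_t, n_src = 32, 1000, 2
lead = rng.normal(size=(n_ch, 200))         # hypothetical lead fields, 200 grid points
src_idx = [40, 150]                         # true source locations (toy)
mix = 0.1 * rng.normal(size=(n_ch, n_ch))   # spatially correlated background noise

pre = mix @ rng.normal(size=(n_ch, n_t))                    # noise-only portion
post = (lead[:, src_idx] @ rng.normal(size=(n_src, n_t))
        + mix @ rng.normal(size=(n_ch, n_t)))               # signal + noise portion

R_noise = pre @ pre.T / n_t     # noise covariance from the signal-free data
R_data = post @ post.T / n_t    # measured-data covariance

# Generalized eigendecomposition R_data v = lambda R_noise v (ascending order);
# the eigenvectors beyond the n_src largest eigenvalues span the noise subspace.
_, V = eigh(R_data, R_noise)
Vn = V[:, :-n_src]

# Localizer: large when a lead-field vector leaves the (whitened) noise subspace.
Rn_inv = np.linalg.inv(R_noise)
scores = [(g @ Rn_inv @ g) / (g @ Vn @ Vn.T @ g) for g in lead.T]
print(sorted(np.argsort(scores)[-n_src:]))   # should recover src_idx
```

Because the eigenvectors are R_noise-orthonormal, this is equivalent to whitening the data with the noise covariance and running ordinary MUSIC, which is how the background activity is suppressed.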
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santamarina, A.; Bernard, D.; Dos Santos, N.
This paper describes the method to define relevant targeted integral measurements that allow the improvement of nuclear data evaluations and the determination of corresponding reliable covariances. ²³⁵U and ⁵⁶Fe examples are pointed out for the improvement of JEFF3 data. Uses of these covariances are shown for sensitivity and representativity studies, uncertainty calculations, and transposition of experimental results to industrial applications. S/U (sensitivity/uncertainty) studies are increasingly used in reactor physics and criticality safety. However, the reliability of study results depends strongly on the relevance of the nuclear data covariances. Our method derives the real uncertainty associated with each evaluation from calibration on targeted integral measurements. These realistic covariance matrices allow reliable JEFF3.1.1 calculation of the prior uncertainty due to nuclear data, as well as uncertainty reduction based on representative integral experiments, in challenging design calculations such as the GEN3 and RJH reactors.
Hattori, Masasi; Oaksford, Mike
2007-09-10
In this article, 41 models of covariation detection from 2 × 2 contingency tables were evaluated against past data in the literature and against data from new experiments. A new model was also included based on a limiting case of the normative phi-coefficient under an extreme rarity assumption, which has been shown to be an important factor in covariation detection (McKenzie & Mikkelsen, 2007) and data selection (Hattori, 2002; Oaksford & Chater, 1994, 2003). The results were supportive of the new model. To investigate its explanatory adequacy, a rational analysis using two computer simulations was conducted. These simulations revealed the environmental conditions and the memory restrictions under which the new model best approximates the normative model of covariation detection in these tasks. They thus demonstrated the adaptive rationality of the new model. 2007 Cognitive Science Society, Inc.
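The limiting case mentioned in the abstract can be checked numerically: as the d cell of a 2 × 2 table [[a, b], [c, d]] grows (the events become rare relative to non-events), the phi coefficient tends to a/sqrt((a+b)(a+c)). A short verification with made-up cell counts:

```python
import numpy as np

def phi(a, b, c, d):
    """Phi coefficient of a 2x2 contingency table [[a, b], [c, d]]."""
    num = a * d - b * c
    den = np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

def rarity_limit(a, b, c):
    """Limit of phi as d -> infinity (extreme rarity): a / sqrt((a+b)(a+c))."""
    return a / np.sqrt((a + b) * (a + c))

a, b, c = 20, 5, 8   # made-up cell counts
for d in (10, 100, 1000, 100000):
    print(d, phi(a, b, c, d))
print("rarity limit:", rarity_limit(a, b, c))
```

The convergence is quick, which is consistent with treating the limiting form as an approximation to the normative phi under rarity.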
Haem, Elham; Harling, Kajsa; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf; Karlsson, Mats O
2017-02-01
One important aim in population pharmacokinetics (PK) and pharmacodynamics is the identification and quantification of the relationships between the parameters and covariates. Lasso has been suggested as a technique for simultaneous estimation and covariate selection. In linear regression, it has been shown that Lasso possesses no oracle properties; that is, it does not asymptotically perform as though the true underlying model were given in advance. Adaptive Lasso (ALasso) with appropriate initial weights is claimed to possess oracle properties; however, it can lead to poor predictive performance when there is multicollinearity between covariates. This simulation study implemented a new version of ALasso, called adjusted ALasso (AALasso), which takes the ratio of the standard error of the maximum likelihood (ML) estimator to the ML coefficient as the initial weight in ALasso, to deal with multicollinearity in non-linear mixed-effect models. The performance of AALasso was compared with that of ALasso and Lasso. PK data were simulated in four set-ups from a one-compartment bolus input model. Covariates were created by sampling from a multivariate standard normal distribution with no, low (0.2), moderate (0.5) or high (0.7) correlation. The true covariates influenced only clearance, at different magnitudes. AALasso, ALasso and Lasso were compared in terms of mean absolute prediction error and error of the estimated covariate coefficient. The results show that AALasso performed better in small data sets, even in those in which a high correlation existed between covariates. This makes AALasso a promising method for covariate selection in non-linear mixed-effect models.
Selmer, Randi; Haglund, Bengt; Furu, Kari; Andersen, Morten; Nørgaard, Mette; Zoëga, Helga; Kieler, Helle
2016-10-01
We compared analyses of a pooled data set on the individual level with aggregate meta-analysis in a multi-database study. We reanalysed data on 2.3 million births in a Nordic register-based cohort study. We compared estimated odds ratios (OR) for the effect of selective serotonin reuptake inhibitor (SSRI) and venlafaxine use in pregnancy on any cardiovascular birth defect and on the rare outcome right ventricular outflow tract obstructions (RVOTO). Common covariates included maternal age, calendar year, birth order, maternal diabetes, and co-medication. Additional covariates were added in analyses with country-optimized adjustment. The country-adjusted OR (95%CI) for any cardiovascular birth defect in the individual-based pooled analysis was 1.27 (1.17-1.39); 1.17 (1.07-1.27) adjusted for common covariates; and 1.15 (1.05-1.26) adjusted for all covariates. In fixed-effects meta-analyses the pooled OR was 1.29 (1.19-1.41) based on crude country-specific ORs, 1.19 (1.09-1.29) adjusted for common covariates, and 1.16 (1.06-1.27) with country-optimized adjustment. In a random-effects model the adjusted OR was 1.07 (0.87-1.32). For RVOTO, the OR was 1.48 (1.15-1.89) adjusted for all covariates in the pooled data set, and 1.53 (1.19-1.96) after country-optimized adjustment. Country-specific adjusted analyses at the substance level were not possible for RVOTO. Results of fixed-effects meta-analysis and individual-based analyses of a pooled dataset were similar in this study of the association of SSRI/venlafaxine use and cardiovascular birth defects. Country-optimized adjustment attenuated the estimates more than adjustment for common covariates only. When data are sparse, pooled data on the individual level are needed for adjusted analyses. Copyright © 2016 John Wiley & Sons, Ltd.
Zheng, Xueying; Qin, Guoyou; Tu, Dongsheng
2017-05-30
Motivated by the analysis of quality of life data from a clinical trial on early breast cancer, we propose in this paper a generalized partially linear mean-covariance regression model for longitudinal proportional data, which are bounded in a closed interval. Cholesky decomposition of the covariance matrix for within-subject responses and generalized estimating equations are used to estimate the unknown parameters and the nonlinear function in the model. Simulation studies are performed to evaluate the performance of the proposed estimation procedures. Our new model is also applied to analyze the data from the cancer clinical trial that motivated this research. In comparison with available models in the literature, the proposed model does not require specific parametric assumptions on the density function of the longitudinal responses or the probability function of the boundary values, and it can capture dynamic effects of time or other variables of interest on both the mean and covariance of the correlated proportional responses. Copyright © 2017 John Wiley & Sons, Ltd.
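The Cholesky-based covariance modelling referred to here rests on the modified Cholesky decomposition, which turns a within-subject covariance matrix into unconstrained autoregressive coefficients and innovation variances that can then be regressed on covariates. A minimal numpy sketch with a toy covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
T_len = 5                                     # repeated measures per subject (toy)
A = rng.normal(size=(T_len, T_len))
Sigma = A @ A.T + T_len * np.eye(T_len)       # within-subject covariance (toy)

# Modified Cholesky decomposition T Sigma T' = D: T is unit lower triangular
# whose sub-diagonal entries are minus the autoregressive coefficients, and
# D holds the innovation variances -- both are unconstrained and can be
# modelled as functions of covariates.
L = np.linalg.cholesky(Sigma)
d = np.diag(L)
C = L / d                                     # unit lower triangular: Sigma = C D C'
T = np.linalg.inv(C)
D = np.diag(d ** 2)

print(np.allclose(T @ Sigma @ T.T, D))        # True
print("autoregressive coefficients:\n", -np.tril(T, -1))
```

Because the entries of T and log-diagonal of D are unconstrained, any regression model placed on them is guaranteed to yield a valid (positive definite) covariance matrix.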
Performance of internal covariance estimators for cosmic shear correlation functions
Friedrich, O.; Seitz, S.; Eifler, T. F.; ...
2015-12-31
Data re-sampling methods such as the delete-one jackknife are a common tool for estimating the covariance of large-scale structure probes. In this paper we investigate the concepts of internal covariance estimation in the context of cosmic shear two-point statistics. We demonstrate how to use log-normal simulations of the convergence field and the corresponding shear field to carry out realistic tests of internal covariance estimators, and find that most estimators such as the jackknife or sub-sample covariance can reach a satisfactory compromise between bias and variance of the estimated covariance. In a forecast for the complete, 5-year DES survey we show that internally estimated covariance matrices can provide a large fraction of the true uncertainties on cosmological parameters in a 2D cosmic shear analysis. The volume inside contours of constant likelihood in the $\Omega_m$-$\sigma_8$ plane as measured with internally estimated covariance matrices is on average $\gtrsim 85\%$ of the volume derived from the true covariance matrix. The uncertainty on the parameter combination $\Sigma_8 \sim \sigma_8 \Omega_m^{0.5}$ derived from internally estimated covariances is $\sim 90\%$ of the true uncertainty.
A Class of Population Covariance Matrices in the Bootstrap Approach to Covariance Structure Analysis
ERIC Educational Resources Information Center
Yuan, Ke-Hai; Hayashi, Kentaro; Yanagihara, Hirokazu
2007-01-01
Model evaluation in covariance structure analysis is critical before the results can be trusted. Due to finite sample sizes and unknown distributions of real data, existing conclusions regarding a particular statistic may not be applicable in practice. The bootstrap procedure automatically takes care of the unknown distribution and, for a given…
Conditional Covariance Theory and DETECT for Polytomous Items. Research Report. ETS RR-04-50
ERIC Educational Resources Information Center
Zhang, Jinming
2004-01-01
This paper extends the theory of conditional covariances to polytomous items. It has been mathematically proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, is positive if the two items are dimensionally homogeneous and negative…
Bayes linear covariance matrix adjustment
NASA Astrophysics Data System (ADS)
Wilkinson, Darren J.
1995-12-01
In this thesis, a Bayes linear methodology for the adjustment of covariance matrices is presented and discussed. A geometric framework for quantifying uncertainties about covariance matrices is set up, and an inner-product for spaces of random matrices is motivated and constructed. The inner-product on this space captures aspects of our beliefs about the relationship between covariance matrices of interest to us, providing a structure rich enough for us to adjust beliefs about unknown matrices in the light of data such as sample covariance matrices, exploiting second-order exchangeability and related specifications to obtain representations allowing analysis. Adjustment is associated with orthogonal projection, and illustrated with examples of adjustments for some common problems. The problem of adjusting the covariance matrices underlying exchangeable random vectors is tackled and discussed. Learning about the covariance matrices associated with multivariate time series dynamic linear models is shown to be amenable to a similar approach. Diagnostics for matrix adjustments are also discussed.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume; Koster, Randal D. (Editor)
2014-01-01
An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.
Visualizing DOM super-spectrum covariance in vanKrevelen space
NASA Astrophysics Data System (ADS)
Fatland, D. R.; Kalawe, J.; Stubbins, A.; Spencer, R. G.; Sleighter, R. L.; Abdulla, H. A.; Dittmar, T.
2011-12-01
We investigate the fate of terrigenous dissolved organic matter (DOM) exported to the coastal marine environment. Many methods (fluorescence, FT-ICR-MS, NMR, 13C, lignin, etc.) help characterize this DOM. We define a 'super spectrum' as an amalgamation of analyses into a data stack, and we search for physically significant patterns therein, beginning with covariance across 31 samples from six circum-Arctic rivers: the Ob, Kolyma, Mackenzie, Yukon, Lena, and Yenisey, each sampled five times throughout the year. A van Krevelen diagram is convenient for viewing distributions of molecules provided by Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR-MS). We augment this distribution space in the vertical dimension, for example to show peak height, molecular mass, principal component weighting or covariance. We use Worldwide Telescope, a virtual globe with strong data support from Microsoft Research, to explore covariance results along 3+ dimensions (adding brightness, color and a parameter slide). The results show interesting covariance, e.g. between molecules and PARAFAC peaks, a step towards fluorophore and cohort identification in the terrigenous DOM spectrum. Given the geoscience explosion in data volume and data complexity, we feel these results should survive beyond the end point of a journal article. We are building a cloud-based library on the Microsoft Azure platform to support this and subsequent analyses, enabling the data and methods to carry over and benefit other research groups and objectives.
Comparing spatial regression to random forests for large environmental data sets
Environmental data may be "large" due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and poor).
Escarela, Gabriel; Ruiz-de-Chavez, Juan; Castillo-Morales, Alberto
2016-08-01
Competing risks arise in medical research when subjects are exposed to various types or causes of death. Data from large cohort studies usually exhibit subsets of regressors that are missing for some study subjects. Furthermore, such studies often give rise to censored data. In this article, a carefully formulated likelihood-based technique for the regression analysis of right-censored competing risks data, when two of the covariates are discrete and partially missing, is developed. The approach envisaged here comprises two models: one describes the covariate effects on both long-term incidence and conditional latencies for each cause of death, whilst the other deals with the observation process by which the covariates are missing. The former is formulated with a well-established mixture model and the latter is characterised by copula-based bivariate probability functions for both the missing covariates and the missing data mechanism. The resulting formulation lends itself to the empirical assessment of non-ignorability by performing sensitivity analyses using models with and without a non-ignorable component. The methods are illustrated on a 20-year follow-up involving a prostate cancer cohort from the National Cancer Institute's Surveillance, Epidemiology, and End Results program. © The Author(s) 2013.
Li, Dan; Wang, Xia; Dey, Dipak K
2016-09-01
Our present work proposes a new survival model in a Bayesian context to analyze right-censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require a more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include the spatial variation in the model, since it is sometimes unexplained by the covariates considered in the analysis. Therefore, the nonlinear covariate effects and the spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationship and have recently become extremely powerful in nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear structure of continuous covariate. With the consideration of data availability and computational complexity, the conditionally autoregressive distribution is placed on the region-specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset involving a colon cancer clinical trial from the state of Iowa. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kistner, Emily O; Muller, Keith E
2004-09-01
Intraclass correlation and Cronbach's alpha are widely used to describe reliability of tests and measurements. Even with Gaussian data, exact distributions are known only for compound symmetric covariance (equal variances and equal correlations). Recently, large sample Gaussian approximations were derived for the distribution functions. New exact results allow calculating the exact distribution function and other properties of intraclass correlation and Cronbach's alpha, for Gaussian data with any covariance pattern, not just compound symmetry. Probabilities are computed in terms of the distribution function of a weighted sum of independent chi-square random variables. New F approximations for the distribution functions of intraclass correlation and Cronbach's alpha are much simpler and faster to compute than the exact forms. Assuming the covariance matrix is known, the approximations typically provide sufficient accuracy, even with as few as ten observations. Either the exact or approximate distributions may be used to create confidence intervals around an estimate of reliability. Monte Carlo simulations led to a number of conclusions. Correctly assuming that the covariance matrix is compound symmetric leads to accurate confidence intervals, as was expected from previously known results. However, assuming and estimating a general covariance matrix produces somewhat optimistically narrow confidence intervals with 10 observations. Increasing sample size to 100 gives essentially unbiased coverage. Incorrectly assuming compound symmetry leads to pessimistically large confidence intervals, with pessimism increasing with sample size. In contrast, incorrectly assuming general covariance introduces only a modest optimistic bias in small samples. Hence the new methods seem preferable for creating confidence intervals, except when compound symmetry definitely holds.
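Cronbach's alpha itself is a one-line computation from the item covariance matrix. The sketch below computes it for a deliberately non-compound-symmetric Gaussian design and then approximates its sampling distribution by Monte Carlo, the quantity that the exact weighted-chi-square results characterize analytically. All sizes and the covariance structure are illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha from an (n subjects x k items) score matrix:
    alpha = k/(k-1) * (1 - trace(S) / sum(S)), S the item covariance."""
    S = np.cov(scores, rowvar=False)
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())

rng = np.random.default_rng(7)
k, n = 6, 100
# General covariance: unequal variances and correlations (not compound symmetric).
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)
X = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
print("alpha estimate:", cronbach_alpha(X))

# Monte Carlo view of alpha's sampling distribution under this covariance.
alphas = [cronbach_alpha(rng.multivariate_normal(np.zeros(k), Sigma, size=n))
          for _ in range(2000)]
print("2.5%/97.5% quantiles:", np.percentile(alphas, [2.5, 97.5]))
```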
NASA Astrophysics Data System (ADS)
Friedrich, Oliver; Eifler, Tim
2018-01-01
Computing the inverse covariance matrix (or precision matrix) of large data vectors is crucial in weak lensing (and multiprobe) analyses of the large-scale structure of the Universe. Analytically computed covariances are noise-free and hence straightforward to invert; however, the model approximations might be insufficient for the statistical precision of future cosmological data. Estimating covariances from numerical simulations improves on these approximations, but the sample covariance estimator is inherently noisy, which introduces uncertainties in the error bars on cosmological parameters and also additional scatter in their best-fitting values. For future surveys, reducing both effects to an acceptable level requires an unfeasibly large number of simulations. In this paper we describe a way to expand the precision matrix around a covariance model and show how to estimate the leading order terms of this expansion from simulations. This is especially powerful if the covariance matrix is the sum of two contributions, C = A+B, where A is well understood analytically and can be turned off in simulations (e.g. shape noise for cosmic shear) to yield a direct estimate of B. We test our method in mock experiments resembling tomographic weak lensing data vectors from the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST). For DES we find that 400 N-body simulations are sufficient to achieve negligible statistical uncertainties on parameter constraints. For LSST this is achieved with 2400 simulations. The standard covariance estimator would require >105 simulations to reach a similar precision. We extend our analysis to a DES multiprobe case finding a similar performance.
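The core trick (estimate only B from simulations with A switched off, then expand the precision matrix around A) can be demonstrated at toy scale. The sketch below uses a first-order expansion and invented dimensions; it illustrates the idea rather than the authors' full estimator.

```python
import numpy as np

rng = np.random.default_rng(8)
p, n_sims = 50, 400
A = np.eye(p)                                 # analytic part (e.g. shape noise)
Q = rng.normal(size=(p, p)) / np.sqrt(p)
B = 0.1 * (Q @ Q.T)                           # part to be estimated from simulations

# Simulations with A "turned off" draw directly from N(0, B).
sims = rng.multivariate_normal(np.zeros(p), B, size=n_sims)
B_hat = np.cov(sims, rowvar=False)

# First-order expansion of the precision matrix around the known part A:
# (A + B)^{-1} ~= A^{-1} - A^{-1} B_hat A^{-1}
A_inv = np.linalg.inv(A)
prec_expansion = A_inv - A_inv @ B_hat @ A_inv
prec_true = np.linalg.inv(A + B)

# Baseline: invert a noisy sample covariance of the full data vector.
full = rng.multivariate_normal(np.zeros(p), A + B, size=n_sims)
prec_sample = np.linalg.inv(np.cov(full, rowvar=False))

err = lambda M: np.linalg.norm(M - prec_true) / np.linalg.norm(prec_true)
print("expansion error:", err(prec_expansion))
print("sample error   :", err(prec_sample))
```

Because the simulation noise enters only through B (the small contribution), the expansion typically needs far fewer realizations than brute-force inversion of the full sample covariance, which is the paper's point.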
Covariance descriptor fusion for target detection
NASA Astrophysics Data System (ADS)
Cukur, Huseyin; Binol, Hamidullah; Bal, Abdullah; Yavuz, Fatih
2016-05-01
Target detection is one of the most important topics for military and civilian applications. To address such detection tasks, hyperspectral imaging sensors provide image data containing both spatial and spectral information. Target detection involves various challenging scenarios for hyperspectral images. To overcome these challenges, the covariance descriptor offers many advantages. The detection capability of the conventional covariance descriptor technique can be improved by fusion methods. In this paper, hyperspectral bands are clustered according to inter-band correlation. Target detection is then realized by fusing covariance descriptor results based on the band clusters. The proposed combination technique is denoted Covariance Descriptor Fusion (CDF). The efficiency of the CDF is evaluated by applying it to hyperspectral imagery to detect man-made objects. The obtained results show that the CDF performs better than the conventional covariance descriptor.
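The building block here is the region covariance descriptor: the covariance of per-pixel feature vectors over a region, compared using a metric on covariance matrices. The sketch below uses raw spectral bands as features and the Foerstner log-generalized-eigenvalue distance; the band-clustering and fusion step of the proposed CDF is omitted, and all image contents are synthetic.

```python
import numpy as np
from scipy.linalg import eigh

def region_covariance(cube, rows, cols):
    """Covariance descriptor of a region: covariance of the per-pixel
    feature vectors (here simply the spectral bands)."""
    feats = cube[:, rows, cols].reshape(cube.shape[0], -1)
    return np.cov(feats)

def descriptor_distance(A, B):
    """Foerstner metric: sqrt of the sum of squared log generalized
    eigenvalues of (A, B). Small distance means similar descriptors."""
    lam = eigh(A, B, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(9)
cube = rng.normal(size=(20, 64, 64))       # 20-band toy hyperspectral image
spectrum = rng.normal(size=20)             # synthetic target spectral signature
pattern = rng.normal(size=(10, 10))        # synthetic target spatial texture
cube[:, 30:40, 30:40] += 2 * spectrum[:, None, None] * pattern[None, :, :]

tgt = region_covariance(cube, *np.ix_(range(30, 40), range(30, 40)))
bkg = region_covariance(cube, *np.ix_(range(0, 10), range(0, 10)))
probe = region_covariance(cube, *np.ix_(range(32, 42), range(32, 42)))
print("distance to target    :", descriptor_distance(probe, tgt))
print("distance to background:", descriptor_distance(probe, bkg))
```

A probe window overlapping the target yields a much smaller distance to the target descriptor than to the background descriptor, which is the detection signal the fusion step then combines across band clusters.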
Galaxy–galaxy lensing estimators and their covariance properties
Singh, Sukhdeep; Mandelbaum, Rachel; Seljak, Uros; ...
2017-07-21
Here, we study the covariance properties of real-space correlation function estimators - primarily galaxy-shear correlations, or galaxy-galaxy lensing - using SDSS data for both shear catalogues and lenses (specifically the BOSS LOWZ sample). Using mock catalogues of lenses and sources, we disentangle the various contributions to the covariance matrix and compare them with a simple analytical model. We show that not subtracting the lensing measurement around random points from the measurement around the lens sample is equivalent to performing the measurement using the lens density field instead of the lens overdensity field. While the measurement using the lens density field is unbiased (in the absence of systematics), its error is significantly larger due to an additional term in the covariance. Therefore, this subtraction should be performed regardless of its beneficial effects on systematics. Comparing the error estimates from data and mocks for estimators that involve the overdensity, we find that the errors are dominated by the shape noise and lens clustering, that empirically estimated covariances (jackknife and standard deviation across mocks) are consistent with theoretical estimates, and that both the connected parts of the four-point function and the supersample covariance can be neglected for the current levels of noise. While the trade-off between different terms in the covariance depends on the survey configuration (area, source number density), the diagnostics that we use in this work should be useful for future works to test their empirically determined covariances.
Using machine learning to assess covariate balance in matching studies.
Linden, Ariel; Yarnold, Paul R
2016-12-01
In order to assess the effectiveness of matching approaches in observational studies, investigators typically present summary statistics for each observed pre-intervention covariate, with the objective of showing that matching reduces the difference in means (or proportions) between groups to as close to zero as possible. In this paper, we introduce a new approach to distinguish between study groups based on their distributions of the covariates using a machine-learning algorithm called optimal discriminant analysis (ODA). Assessing covariate balance using ODA as compared with the conventional method has several key advantages: the ability to ascertain how individuals self-select based on optimal (maximum-accuracy) cut-points on the covariates; applicability to any variable metric and number of groups; insensitivity to skewed data or outliers; and the use of accuracy measures that can be widely applied to all analyses. Moreover, ODA accepts analytic weights, thereby extending the assessment of covariate balance to any study design where weights are used for covariate adjustment. By comparing the two approaches using empirical data, we demonstrate that using measures of classification accuracy as balance diagnostics produces results highly consistent with those obtained via the conventional approach (in our matched-pairs example, ODA revealed a weak statistically significant relationship not detected by the conventional approach). Thus, investigators should consider ODA as a robust complement, or perhaps an alternative, to the conventional approach for assessing covariate balance in matching studies. © 2016 John Wiley & Sons, Ltd.
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is a common occurrence in cluster randomised trials, which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanism and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group.
Inadequacy of internal covariance estimation for super-sample covariance
NASA Astrophysics Data System (ADS)
Lacasa, Fabien; Kunz, Martin
2017-08-01
We give an analytical interpretation of how subsample-based internal covariance estimators lead to biased estimates of the covariance, due to underestimating the super-sample covariance (SSC). This includes the jackknife and bootstrap methods as estimators for the full survey area, and subsampling as an estimator of the covariance of subsamples. The limitations of the jackknife covariance have previously been noted in the literature: it is effectively a rescaling of the covariance of the subsample area. However, we point out that subsampling is also biased, but for a different reason: the subsamples are not independent, and the corresponding lack of power results in SSC underprediction. We develop the formalism in the case of cluster counts that allows the bias of each covariance estimator to be exactly predicted. We find significant effects for a small survey area or when a low number of subsamples is used, with auto-redshift biases ranging from 0.4% to 15% for subsampling and from 5% to 75% for jackknife covariance estimates. The cross-redshift covariance is even more affected; biases range from 8% to 25% for subsampling and from 50% to 90% for jackknife. Owing to the redshift evolution of the probe, the covariances cannot be debiased by a simple rescaling factor, and an exact debiasing has the same requirements as the full SSC prediction. These results thus disfavour the use of internal covariance estimators on the data itself or a single simulation, leaving analytical prediction and simulation suites as possible SSC predictors.
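Schematically, the rescaling interpretation above can be written as follows (a paraphrase of the abstract's argument, not the paper's exact notation):

```latex
\hat{C}_{\mathrm{JK}}\left(A_{\mathrm{survey}}\right) \;\approx\;
\frac{A_{\mathrm{sub}}}{A_{\mathrm{survey}}}\, C\left(A_{\mathrm{sub}}\right)
```

Gaussian and shot-noise contributions scale roughly inversely with area, so the rescaling is adequate for them; the SSC term does not follow this scaling, and modes larger than the subsample are missing from the internal estimate altogether, which is the source of the bias.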
Nuclear Data Covariances in the Indian Context – Progress, Challenges, Excitement and Perspectives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ganesan, S., E-mail: ganesan555@gmail.com
We present a brief overview of progress, challenges, excitement and perspectives in developing nuclear data covariances in the Indian context in relation to target accuracies and sensitivity studies that are of great importance to Bhabha's 3-stage nuclear programme for energy and non-energy applications.
Yee, Yohan; Fernandes, Darren J; French, Leon; Ellegood, Jacob; Cahill, Lindsay S; Vousden, Dulcie A; Spencer Noakes, Leigh; Scholz, Jan; van Eede, Matthijs C; Nieman, Brian J; Sled, John G; Lerch, Jason P
2018-05-18
An organizational pattern seen in the brain, termed structural covariance, is the statistical association of pairs of brain regions in their anatomical properties. These associations, measured across a population as covariances or correlations, usually in cortical thickness or volume, are thought to reflect genetic and environmental underpinnings. Here, we examine the biological basis of structural volume covariance in the mouse brain. We first examined large-scale associations between brain region volumes using an atlas-based approach that parcellated the entire mouse brain into 318 regions over which correlations in volume were assessed, for volumes obtained from 153 mouse brain images via high-resolution MRI. We then used a seed-based approach and determined, for 108 different seed regions across the brain and using mouse gene expression and connectivity data from the Allen Institute for Brain Science, the variation in structural covariance data that could be explained by distance to seed, transcriptomic similarity to seed, and connectivity to seed. We found that, overall, correlations in structure volumes hierarchically clustered into distinct anatomical systems, similar to findings from other studies and similar to other types of networks in the brain, including structural connectivity and transcriptomic similarity networks. Across seeds, this structural covariance was significantly explained by distance (17% of the variation, up to a maximum of 49% for structural covariance to the visceral area of the cortex), transcriptomic similarity (13% of the variation, up to a maximum of 28% for structural covariance to the primary visual area) and connectivity (15% of the variation, up to a maximum of 36% for structural covariance to the intermediate reticular nucleus in the medulla) of covarying structures. Together, distance, connectivity, and transcriptomic similarity explained 37% of structural covariance, up to a maximum of 63% for structural covariance to the visceral area. Additionally, this pattern of explained variation differed spatially across the brain, with transcriptomic similarity playing a larger role in the cortex than in the subcortex, while connectivity explained structural covariance best in parts of the cortex, midbrain, and hindbrain. These results suggest that both gene expression and connectivity underlie structural volume covariance, albeit to different extents depending on brain region, and that this relationship is modulated by distance.
A comparative study of mixture cure models with covariate
NASA Astrophysics Data System (ADS)
Leng, Oh Yit; Khalid, Zarina Mohd
2017-05-01
In survival analysis, the survival time is assumed to follow a non-negative distribution, such as the exponential, Weibull, and log-normal distributions. In some cases, the survival time is influenced by observed factors, and omitting these factors may cause inaccurate estimation of the survival function. Therefore, a survival model which incorporates the influences of observed factors is more appropriate in such cases. These observed factors are included in the survival model as covariates. In addition, there are cases in which a group of individuals is cured, that is, never experiences the event of interest. Ignoring the cure fraction may lead to overestimation of the survival function. Thus, a mixture cure model is more suitable for modelling survival data in the presence of a cure fraction. In this study, three mixture cure survival models are used to analyse survival data with a covariate and a cure fraction. The first model includes the covariate in the parameterization of the susceptible individuals' survival function, the second model allows the cure fraction to depend on the covariate, and the third model incorporates the covariate in both the cure fraction and the survival function of susceptible individuals. This study aims to compare the performance of these models via a simulation approach. Therefore, survival data with varying sample sizes and cure fractions are simulated, with the survival time assumed to follow the Weibull distribution. The simulated data are then modelled using the three mixture cure survival models. The results show that the three mixture cure models are more appropriate for modelling survival data in the presence of a cure fraction and an observed factor.
A Robust Statistics Approach to Minimum Variance Portfolio Optimization
NASA Astrophysics Data System (ADS)
Yang, Liusha; Couillet, Romain; McKay, Matthew R.
2015-12-01
We study the design of portfolios under a minimum risk criterion. The performance of the optimized portfolio relies on the accuracy of the estimated covariance matrix of the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of assets, so that the sample covariance matrix performs poorly as a covariance estimator. Additionally, financial market data often contain outliers which, if not correctly handled, may further corrupt the covariance estimation. We address these shortcomings by studying the performance of a hybrid covariance matrix estimator based on Tyler's robust M-estimator and on Ledoit-Wolf's shrinkage estimator while assuming samples with heavy-tailed distribution. Employing recent results from random matrix theory, we develop a consistent estimator of (a scaled version of) the realized portfolio risk, which is minimized by optimizing the shrinkage intensity online. Our portfolio optimization method is shown via simulations to outperform existing methods for both synthetic and real market data.
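The hybrid estimator described above combines a robust fixed-point iteration with shrinkage; a minimal sketch under simplifying assumptions (a fixed, hand-set shrinkage intensity rather than the online-optimized one, and toy Student-t returns) is:

```python
import numpy as np

def tyler_shrinkage_cov(X, rho=0.1, n_iter=50):
    """Tyler's M-estimator with shrinkage toward the identity.

    X   : (n, p) array of (heavy-tailed) asset returns, assumed centred.
    rho : shrinkage intensity in (0, 1); fixed here for simplicity.
    """
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        inv_C = np.linalg.inv(C)
        # Per-sample weights of Tyler's fixed-point equation.
        w = p / np.einsum('ij,jk,ik->i', X, inv_C, X)
        S = (X * w[:, None]).T @ X / n
        C = (1 - rho) * S + rho * np.eye(p)
        C *= p / np.trace(C)          # fix the (arbitrary) scale
    return C

def min_variance_weights(C):
    """Global minimum-variance portfolio: w = C^{-1}1 / (1'C^{-1}1)."""
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)
    return w / w.sum()

rng = np.random.default_rng(1)
# Student-t returns: heavy tails corrupt the plain sample covariance.
returns = rng.standard_t(df=3, size=(250, 50))
w = min_variance_weights(tyler_shrinkage_cov(returns))
print(w.sum(), w[:5])
```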
On the Bayesian Treed Multivariate Gaussian Process with Linear Model of Coregionalization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konomi, Bledar A.; Karagiannis, Georgios; Lin, Guang
2015-02-01
The Bayesian treed Gaussian process (BTGP) has gained popularity in recent years because it provides a straightforward mechanism for modeling non-stationary data and can alleviate computational demands by fitting models to less data. The extension of BTGP to the multivariate setting requires us to model the cross-covariance and to propose efficient algorithms that can deal with trans-dimensional MCMC moves. In this paper we extend the cross-covariance of the Bayesian treed multivariate Gaussian process (BTMGP) to that of the linear model of coregionalization (LMC). Different strategies have been developed to improve the MCMC mixing and invert smaller matrices in the Bayesian inference. Moreover, we compare the proposed BTMGP with the existing multiple BTGP and BTMGP in test cases and in a multiphase flow computer experiment in a full-scale regenerator of a carbon capture unit. The use of the BTMGP with LMC cross-covariance helped to predict the computer experiments better than existing competitors. The proposed model has a wide variety of applications, such as computer experiments and environmental data. In the case of computer experiments we also develop an adaptive sampling strategy for the BTMGP with LMC cross-covariance function.
Censored quantile regression with recursive partitioning-based weights
Wey, Andrew; Wang, Lan; Rudser, Kyle
2014-01-01
Censored quantile regression provides a useful alternative to the Cox proportional hazards model for analyzing survival data. It directly models the conditional quantile of the survival time and hence is easy to interpret. Moreover, it relaxes the proportionality constraint on the hazard function associated with the popular Cox model and is natural for modeling heterogeneity of the data. Recently, Wang and Wang (2009, 'Locally weighted censored quantile regression', Journal of the American Statistical Association 103, 1117–1128) proposed a locally weighted censored quantile regression approach that allows for covariate-dependent censoring and is less restrictive than other censored quantile regression methods. However, their kernel smoothing-based weighting scheme requires all covariates to be continuous and encounters practical difficulty with even a moderate number of covariates. We propose a new weighting approach that uses recursive partitioning, e.g. survival trees, that offers greater flexibility in handling covariate-dependent censoring in moderately high dimensions and can incorporate both continuous and discrete covariates. We prove that this new weighting scheme leads to consistent estimation of the quantile regression coefficients and demonstrate its effectiveness via Monte Carlo simulations. We also illustrate the new method using a widely recognized data set from a clinical trial on primary biliary cirrhosis.
Beeman, John; Hansel, Hal; Perry, Russell; Hockersmith, Eric; Sandford, Ben
2011-01-01
This report describes analyses of data from radio- or acoustic-tagged juvenile salmonids passing through hydro-dam turbines to determine factors affecting fish survival. The data were collected during a series of studies designed to estimate passage and survival probabilities at McNary (2002-09) and John Day (2002-03) Dams on the Columbia River during controlled experiments of structures or operations at spillways. Relatively few tagged fish passed turbines in any single study, but sample sizes generally were adequate for our analyses when data were combined from studies using common methods over a series of years. We used information-theoretic methods to evaluate biological, operational, and group covariates by creating models fitting linear (all covariates) or curvilinear (operational covariates only) functions to the data. Biological covariates included tag burden, weight, and water temperature; operational covariates included spill percentage, total discharge, hydraulic head, and turbine unit discharge; and group covariates included year, treatment, and photoperiod. Several interactions between the variables also were considered. Support of covariates by the data was assessed by comparing the Akaike Information Criterion of competing models. The analyses were conducted because there was a lack of information about factors affecting survival of fish passing turbines volitionally and the data were available from past studies. The depth of acclimation, tag size relative to fish size (tag burden), turbine unit discharge, and area of entry into the turbine intake have been shown to affect turbine passage survival of juvenile salmonids in other studies. This study indicates that turbine passage survival of the study fish was primarily affected by biological covariates rather than operational covariates. A negative effect of tag burden was strongly supported in data from yearling Chinook salmon at John Day and McNary dams, but not for subyearling Chinook salmon or juvenile steelhead. The negative effect of tag burden in data we examined from yearling Chinook salmon supports the recent findings from laboratory studies of barotrauma effects. A curvilinear (quadratic) effect of turbine unit discharge was weakly supported in data from subyearling Chinook salmon at John Day Dam. The maximum survival from those data was estimated to occur at a discharge of 15.9 thousand cubic feet per second, but the estimate was imprecise (95 percent confidence interval of -1.7 to 33.7 thousand cubic feet per second). This estimate is within the range of 1 percent of peak turbine operating efficiency (12.0-21.6 thousand cubic feet per second), but is lower than the 17.2 thousand cubic feet per second discharge at peak operating efficiency (at a head of 102 feet, near the median in the data we examined). Effects of water temperature were supported in four of the five examined data sets and were strongly supported in all but one. Spill percentage, head, and total discharge received weak or moderate support in some cases. The results are consistent with those of several controlled field experiments of turbine discharge. Studies based on the Hi-Z Turb'N tag (balloon tag) often show small, generally statistically insignificant, differences in survival at different turbine discharge levels. Some studies also show that a quadratic equation can be well fit to the relation of survival and turbine unit discharge.
The lack of support for the operational covariates in most of the data sets we examined may be due to the small effect turbine discharge has even in controlled studies, the observational nature of the data we used, and the evaluation method. We assessed support of the data for models of linear and quadratic effects, whereas controlled experiments often statistically compare the point estimates of survival from each operational treatment studied. The results of our analyses suggest tag burden should be minimized or controlled for in analyses of future studies.
Regression analysis of sparse asynchronous longitudinal data.
Cao, Hongyuan; Zeng, Donglin; Fine, Jason P
2015-09-01
We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies show that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus.
Nasejje, Justine B; Mwambi, Henry
2017-09-07
Uganda, just like any other Sub-Saharan African country, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to critically identify factors strongly associated with under-five child mortality rates. The Cox proportional hazards model has been a common choice in analysing data to understand factors strongly associated with high child mortality rates, taking age as the time-to-event variable. However, due to its restrictive proportional hazards (PH) assumption, some covariates of interest which do not satisfy the assumption are often excluded from the analysis to avoid mis-specifying the model, since using covariates that clearly violate the assumption would yield invalid results. Survival trees and random survival forests are increasingly becoming popular in analysing survival data, particularly in the case of large survey data, and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests, which have never been used in understanding factors affecting under-five child mortality rates in Uganda, using Demographic and Health Survey data. Thus the first part of the analysis is based on the classical Cox PH model and the second part is based on random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption. Random survival forests and the Cox proportional hazards model agree that the sex of the household head, the sex of the child, and the number of births in the past year are strongly associated with under-five child mortality in Uganda, given that all three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates that were originally excluded from the earlier analysis due to violation of the PH assumption were important in explaining under-five child mortality rates. These covariates include the number of children under the age of five in a household, the number of births in the past 5 years, wealth index, total number of children ever born and the child's birth order. The results further indicated that the predictive performance of random survival forests built using covariates including those that violate the PH assumption was higher than that of random survival forests built using only covariates that satisfy the PH assumption. Random survival forests are appealing methods for analysing public health data to understand factors strongly associated with under-five child mortality rates, especially in the presence of covariates that violate the proportional hazards assumption.
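As an illustration of the workflow described above, a random survival forest can be fitted to toy right-censored data, assuming the scikit-survival package (a stand-in for the authors' tooling; their actual analysis used Demographic and Health Survey data and is not reproduced here):

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

# Toy stand-in for survey data: 5 covariates, exponential event
# times depending on the first covariate, independent censoring.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
t_event = rng.exponential(scale=np.exp(-0.7 * X[:, 0]))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens          # True where the event was observed

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=20,
                           random_state=0)
rsf.fit(X, Surv.from_arrays(event=event, time=time))
# Concordance index on the training data (illustration only; a real
# comparison with a Cox PH fit would use held-out data).
print(rsf.score(X, Surv.from_arrays(event=event, time=time)))
```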
Using Audit Information to Adjust Parameter Estimates for Data Errors in Clinical Trials
Shepherd, Bryan E.; Shaw, Pamela A.; Dodd, Lori E.
2013-01-01
Background: Audits are often performed to assess the quality of clinical trial data, but beyond detecting fraud or sloppiness, the audit data are generally ignored. In earlier work using data from a non-randomized study, Shepherd and Yu (2011) developed statistical methods to incorporate audit results into study estimates, and demonstrated that audit data could be used to eliminate bias. Purpose: In this manuscript we examine the usefulness of audit-based error-correction methods in clinical trial settings where a continuous outcome is of primary interest. Methods: We demonstrate the bias of multiple linear regression estimates in general settings with an outcome that may have errors and a set of covariates for which some may have errors and others, including treatment assignment, are recorded correctly for all subjects. We study this bias under different assumptions, including independence between treatment assignment, covariates, and data errors (conceivable in a double-blinded randomized trial) and independence between treatment assignment and covariates but not data errors (possible in an unblinded randomized trial). We review moment-based estimators to incorporate the audit data and propose new multiple imputation estimators. The performance of estimators is studied in simulations. Results: When treatment is randomized and unrelated to data errors, estimates of the treatment effect using the original error-prone data (i.e., ignoring the audit results) are unbiased. In this setting, both moment and multiple imputation estimators incorporating audit data are more variable than standard analyses using the original data. In contrast, in settings where treatment is randomized but correlated with data errors and in settings where treatment is not randomized, standard treatment effect estimates will be biased. And in all settings, parameter estimates for the original, error-prone covariates will be biased. Treatment and covariate effect estimates can be corrected by incorporating audit data using either the multiple imputation or moment-based approaches. Bias, precision, and coverage of confidence intervals improve as the audit size increases. Limitations: The extent of bias and the performance of methods depend on the extent and nature of the error as well as the size of the audit. This work only considers methods for the linear model. Settings much different than those considered here need further study. Conclusions: In randomized trials with continuous outcomes and treatment assignment independent of data errors, standard analyses of treatment effects will be unbiased and are recommended. However, if treatment assignment is correlated with data errors or other covariates, naive analyses may be biased. In these settings, and when covariate effects are of interest, approaches for incorporating audit results should be considered.
Uehara, Takashi; Sartori, Matteo; Tanaka, Toshihisa; Fiori, Simone
2017-06-01
The estimation of covariance matrices is of prime importance to analyze the distribution of multivariate signals. In motor imagery-based brain-computer interfaces (MI-BCI), covariance matrices play a central role in the extraction of features from recorded electroencephalograms (EEGs); therefore, correctly estimating covariance is crucial for EEG classification. This letter discusses algorithms to average sample covariance matrices (SCMs) for the selection of the reference matrix in tangent space mapping (TSM)-based MI-BCI. Tangent space mapping is a powerful method of feature extraction and strongly depends on the selection of a reference covariance matrix. In general, the observed signals may include outliers; therefore, taking the geometric mean of SCMs as the reference matrix may not be the best choice. In order to deal with the effects of outliers, robust estimators have to be used. In particular, we discuss and test the use of geometric medians and trimmed averages (defined on the basis of several metrics) as robust estimators. The main idea behind trimmed averages is to eliminate data that exhibit the largest distance from the average covariance calculated on the basis of all available data. The results of the experiments show that while the geometric medians show little difference from conventional methods in terms of classification accuracy of electroencephalographic recordings, the trimmed averages show significant improvement for all subjects.
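Of the robust reference choices discussed above, the geometric median is the simplest to sketch; below is a Weiszfeld-type iteration under the Euclidean (Frobenius) metric, one of several metrics the letter considers (toy SCMs with one inflated outlier epoch; all names are illustrative):

```python
import numpy as np

def geometric_median_spd(mats, n_iter=100, tol=1e-8):
    """Weiszfeld iteration for the geometric median of matrices under
    the Frobenius norm (a Euclidean surrogate for the Riemannian
    medians used in the TSM-BCI literature)."""
    X = np.stack(mats)                # (k, p, p)
    M = X.mean(axis=0)
    for _ in range(n_iter):
        d = np.linalg.norm(X - M, axis=(1, 2))
        d = np.maximum(d, 1e-12)      # guard against division by zero
        w = 1.0 / d
        M_new = (w[:, None, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(M_new - M) < tol:
            break
        M = M_new
    return M

# Toy usage: 20 SCMs from 8-channel EEG epochs, one gross outlier.
rng = np.random.default_rng(0)
scms = [np.cov(rng.normal(size=(8, 200))) for _ in range(20)]
scms[0] *= 50.0                       # outlier epoch
ref = geometric_median_spd(scms)      # robust reference matrix
print(np.trace(ref))
```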
ERIC Educational Resources Information Center
Kistner, Emily O.; Muller, Keith E.
2004-01-01
Intraclass correlation and Cronbach's alpha are widely used to describe reliability of tests and measurements. Even with Gaussian data, exact distributions are known only for compound symmetric covariance (equal variances and equal correlations). Recently, large sample Gaussian approximations were derived for the distribution functions. New exact…
Empirical data and the variance-covariance matrix for the 1969 Smithsonian Standard Earth (2)
NASA Technical Reports Server (NTRS)
Gaposchkin, E. M.
1972-01-01
The empirical data used in the 1969 Smithsonian Standard Earth (2) are presented. The variance-covariance matrix, or the normal equations, used for correlation analysis is considered. The format and contents of the matrix, available on magnetic tape, are described and a sample printout is given.
NASA Technical Reports Server (NTRS)
Whitmore, S. A.
1985-01-01
The dynamics model and data sources used to perform air-data reconstruction are discussed, as well as the Kalman filter. The need for adaptive determination of the noise statistics of the process is indicated. The filter innovations are presented as a means of developing the adaptive criterion, which is based on the true mean and covariance of the filter innovations. A method for the numerical approximation of the mean and covariance of the filter innovations is presented. The algorithm as developed is applied to air-data reconstruction for the space shuttle, and data obtained from the third landing are presented. To verify the performance of the adaptive algorithm, the reconstruction is also performed using a constant covariance Kalman filter. The results of the reconstructions are compared, and the adaptive algorithm exhibits better performance.
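The adaptive criterion above rests on the sample statistics of the filter innovations; a minimal sketch of approximating the innovation mean and covariance over a sliding window (generic scalar innovations, not the shuttle air-data implementation) is:

```python
import numpy as np

def innovation_stats(innovations, window=50):
    """Windowed sample mean and covariance of Kalman filter
    innovations; for a consistent filter the mean should be near zero
    and the covariance near the predicted H P H' + R."""
    v = np.asarray(innovations)[-window:]
    mean = v.mean(axis=0)
    cov = np.atleast_2d(np.cov(v, rowvar=False))
    return mean, cov

# Toy usage: innovations from a filter whose assumed measurement
# noise is too small exhibit an inflated sample covariance.
rng = np.random.default_rng(0)
v = rng.normal(scale=2.0, size=500)   # true innovation std is 2.0
mean, cov = innovation_stats(v)
assumed_S = 1.0                       # filter's predicted innovation variance
scale = float(cov) / assumed_S        # >1 suggests R should be inflated
print(mean, cov, scale)
```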
SEPARABLE FACTOR ANALYSIS WITH APPLICATIONS TO MORTALITY DATA
Fosdick, Bailey K.; Hoff, Peter D.
2014-01-01
Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that estimates the covariance matrix for each dimension as having factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment how it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations.
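The separable structure referred to above can be made concrete in a few lines: the covariance of the vectorized array is a Kronecker product of per-dimension covariances, each of which the paper further restricts to factor-analytic (low-rank-plus-diagonal) form. A toy sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def factor_cov(p, k):
    """Low-rank-plus-diagonal ('factor analytic') covariance."""
    L = rng.normal(size=(p, k))        # loadings, rank k
    d = rng.uniform(0.5, 1.5, size=p)  # unique variances
    return L @ L.T + np.diag(d)

# Dimensions of a toy mortality array: 4 ages x 3 countries x 5 years.
C_age, C_cty, C_yr = factor_cov(4, 2), factor_cov(3, 1), factor_cov(5, 2)

# Separable covariance of vec(Y): Kronecker product of the factors.
C = np.kron(C_age, np.kron(C_cty, C_yr))
print(C.shape)
# (60, 60), yet it is parameterized by only 4*2+4 + 3*1+3 + 5*2+5 = 33
# numbers instead of the 60*61/2 = 1830 of an unstructured covariance.
```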
ERIC Educational Resources Information Center
Gil, Einat; Gibbs, Alison L.
2017-01-01
In this study, we follow students' modeling and covariational reasoning in the context of learning about big data. A three-week unit was designed to allow 12th grade students in a mathematics course to explore big and mid-size data using concepts such as trend and scatter to describe the relationships between variables in multivariate settings.…
Strangeness S = -1 hyperon-nucleon scattering in covariant chiral effective field theory
NASA Astrophysics Data System (ADS)
Li, Kai-Wen; Ren, Xiu-Lei; Geng, Li-Sheng; Long, Bingwei
2016-07-01
Motivated by the successes of covariant baryon chiral perturbation theory in one-baryon systems and in heavy-light systems, we study the relevance of relativistic effects in hyperon-nucleon interactions with strangeness S = -1. In this exploratory work, we follow the covariant framework developed by Epelbaum and Gegelia to calculate the YN scattering amplitude at leading order. By fitting the five low-energy constants to the experimental data, we find that the cutoff dependence is mitigated, compared with the heavy-baryon approach. Nevertheless, the description of the experimental data remains quantitatively similar at leading order.
NASA Technical Reports Server (NTRS)
Morgera, S. D.; Cooper, D. B.
1976-01-01
The experimental observation that a surprisingly small sample size vis-a-vis dimension is needed to achieve good signal-to-interference ratio (SIR) performance with an adaptive predetection filter is explained. The adaptive filter requires estimates, obtained by a recursive stochastic algorithm, of the inverse of the filter input data covariance matrix. The SIR performance with sample size is compared for the situations where the covariance matrix estimates are of unstructured (generalized) form and of structured (finite Toeplitz) form; the latter case is consistent with weak stationarity of the input data stochastic process.
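A minimal sketch of the structured-versus-unstructured comparison above: the Toeplitz estimate here is obtained by averaging the sample covariance along its diagonals, one simple way to impose the structure implied by weak stationarity (the paper's recursive stochastic algorithm is not reproduced):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_cov(X):
    """Structured covariance: average the sample covariance along its
    diagonals, enforcing the Toeplitz form implied by a weakly
    stationary input process."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    r = np.array([np.mean(np.diag(S, k)) for k in range(p)])
    return toeplitz(r)

# Toy usage: p = 32 dimensions but only n = 40 snapshots; the
# structured estimate is far better conditioned than the sample one.
rng = np.random.default_rng(0)
true_r = 0.9 ** np.arange(32)              # AR(1)-like correlation
L = np.linalg.cholesky(toeplitz(true_r))
X = rng.normal(size=(40, 32)) @ L.T
S, T = np.cov(X, rowvar=False), toeplitz_cov(X)
print(np.linalg.cond(S), np.linalg.cond(T))
```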
Flux Tower Eddy Covariance and Meteorological Measurements for Barrow, Alaska: 2012-2016
Dengel, Sigrid; Torn, Margaret; Billesbach, David
2017-08-24
The dataset contains half-hourly eddy covariance flux measurements and determinations, companion meteorological measurements, and ancillary data from the flux tower (US-NGB) on the Barrow Environmental Observatory at Barrow (Utqiagvik), Alaska for the period 2012 through 2016. Data have been processed using EddyPro software and screened by the contributor. The flux tower sits in an Arctic coastal tundra ecosystem. This dataset updates a previous dataset by reprocessing a longer period of record in the same manner. Related dataset "Eddy-Covariance and auxiliary measurements, NGEE-Barrow, 2012-2013" DOI:10.5440/1124200.
Competing risks models and time-dependent covariates
Barnett, Adrian; Graves, Nick
2008-01-01
New statistical models for analysing survival data in an intensive care unit context have recently been developed. Two models that offer significant advantages over standard survival analyses are competing risks models and multistate models. Wolkewitz and colleagues used a competing risks model to examine survival times for nosocomial pneumonia and mortality. Their model was able to incorporate time-dependent covariates and so examine how risk factors that changed with time affected the chances of infection or death. We briefly explain how an alternative modelling technique (using logistic regression) can more fully exploit time-dependent covariates for this type of data.
Zhang, Bo; Liu, Wei; Zhang, Zhiwei; Qu, Yanping; Chen, Zhen; Albert, Paul S
2017-08-01
Joint modeling and within-cluster resampling are two approaches that are used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performance and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to the misspecification of cluster size models in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate its validity in modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.
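For concreteness, the within-cluster resampling procedure evaluated above can be sketched as follows (ordinary least squares on toy clustered data with informative cluster sizes stands in for the GLMM; all names are illustrative):

```python
import numpy as np

def within_cluster_resampling(y, X, cluster, n_resamples=500, seed=0):
    """Within-cluster resampling: repeatedly draw one observation per
    cluster, fit the marginal model to each resampled data set, and
    average the coefficient estimates."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster)
    betas = []
    for _ in range(n_resamples):
        idx = [rng.choice(np.flatnonzero(cluster == c)) for c in clusters]
        Xi = np.column_stack([np.ones(len(idx)), X[idx]])
        beta, *_ = np.linalg.lstsq(Xi, y[idx], rcond=None)
        betas.append(beta)
    return np.mean(betas, axis=0)

# Toy clustered data: 100 litters of informative size (larger litters
# have smaller responses), one cluster-level covariate.
rng = np.random.default_rng(1)
rows, x_c, cl = [], [], []
for c in range(100):
    size = rng.integers(2, 12)
    x = rng.normal()
    y_c = 1.0 + 0.5 * x - 0.1 * size + rng.normal(scale=0.3, size=size)
    rows.extend(y_c); x_c.extend([x] * size); cl.extend([c] * size)
y, X, cluster = map(np.asarray, (rows, x_c, cl))
print(within_cluster_resampling(y, X, cluster))  # intercept, slope
```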
Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.
Cai, T Tony; Zhang, Anru
2016-09-01
Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.
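A minimal sketch of the starting point of the estimators above: a generalized sample covariance computed from pairwise-complete observations under MCAR, followed by banding (the bandwidth here is hand-picked, whereas the paper derives rate-optimal choices; note also that the unregularized pairwise estimate need not be positive semi-definite):

```python
import numpy as np

def pairwise_cov(X):
    """Sample covariance from incomplete data: entry (j, k) uses only
    the rows where both coordinates are observed (NaN = missing)."""
    p = X.shape[1]
    S = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            ok = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            xj, xk = X[ok, j], X[ok, k]
            S[j, k] = S[k, j] = np.mean((xj - xj.mean()) * (xk - xk.mean()))
    return S

def band(S, h):
    """Banding regularizer for bandable covariance matrices."""
    p = S.shape[0]
    mask = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) <= h
    return S * mask

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.3] = np.nan   # 30% missing completely at random
S_hat = band(pairwise_cov(X), h=2)
print(S_hat[:3, :3])
```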
NASA Astrophysics Data System (ADS)
Tarai, Madhumita; Kumar, Keshav; Divya, O.; Bairi, Partha; Mishra, Kishor Kumar; Mishra, Ashok Kumar
2017-09-01
The present work compares dissimilarity-based and covariance-based unsupervised chemometric classification approaches, using total synchronous fluorescence spectroscopy data sets acquired for cumin and non-cumin based herbal preparations. The conventional decomposition method involves eigenvalue-eigenvector analysis of the covariance of the data set and finds the factors that can explain the overall major sources of variation present in the data set. The conventional approach does this irrespective of the fact that the samples belong to intrinsically different groups and hence leads to poor class separation. The present work shows that classification of such samples can be optimized by performing the eigenvalue-eigenvector decomposition on the pair-wise dissimilarity matrix.
Generating Nonnormal Multivariate Data Using Copulas: Applications to SEM.
Mair, Patrick; Satorra, Albert; Bentler, Peter M
2012-07-01
This article develops a procedure based on copulas to simulate multivariate nonnormal data that satisfy a prespecified variance-covariance matrix. The covariance matrix used can comply with a specific moment structure form (e.g., a factor analysis or a general structural equation model). Thus, the method is particularly useful for Monte Carlo evaluation of structural equation models within the context of nonnormal data. The new procedure for nonnormal data simulation is theoretically described and also implemented in the widely used R environment. The quality of the method is assessed by Monte Carlo simulations. A 1-sample test on the observed covariance matrix based on the copula methodology is proposed. This new test for evaluating the quality of a simulation is defined through a particular structural model specification and is robust against normality violations.
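A minimal sketch of the copula mechanism described above, using a Gaussian copula and three nonnormal margins (the article additionally calibrates the latent correlations so the final covariance matches a prespecified target; that calibration step is omitted here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Target correlation for the latent Gaussian copula (3 variables).
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

# 1. Draw correlated normals and map them to uniforms (the copula).
Z = rng.multivariate_normal(np.zeros(3), R, size=10_000)
U = stats.norm.cdf(Z)

# 2. Push the uniforms through nonnormal inverse CDFs (the margins).
Y = np.column_stack([stats.expon.ppf(U[:, 0]),
                     stats.gamma.ppf(U[:, 1], a=2.0),
                     stats.lognorm.ppf(U[:, 2], s=0.5)])

# The margins are skewed, yet the dependence came from R; with the
# article's calibration step the resulting covariance can be made to
# match a prespecified target.
print(np.corrcoef(Y, rowvar=False).round(2))
```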
Using Item-Type Performance Covariance to Improve the Skill Model of an Existing Tutor
ERIC Educational Resources Information Center
Pavlik, Philip I., Jr.; Cen, Hao; Wu, Lili; Koedinger, Kenneth R.
2008-01-01
Using data from an existing pre-algebra computer-based tutor, we analyzed the covariance of item-types with the goal of describing a more effective way to assign skill labels to item-types. Analyzing covariance is important because it allows us to place the skills in a related network in which we can identify the role each skill plays in learning…
NASA Astrophysics Data System (ADS)
Zha, Yuanyuan; Yeh, Tian-Chyi J.; Illman, Walter A.; Zeng, Wenzhi; Zhang, Yonggen; Sun, Fangqiang; Shi, Liangsheng
2018-03-01
Hydraulic tomography (HT) is a recently developed technology for characterizing high-resolution, site-specific heterogeneity using hydraulic data (n_d measurements) from a series of cross-hole pumping tests. To properly account for the subsurface heterogeneity and to flexibly incorporate additional information, geostatistical inverse models, which permit a large number of spatially correlated unknowns (n_y), are frequently used to interpret the collected data. However, the memory storage requirements for the covariance of the unknowns (n_y × n_y) in these models are prodigious for large-scale 3-D problems. Moreover, the sensitivity evaluation is often computationally intensive using the traditional difference method (n_y forward runs). Although employment of the adjoint method can reduce the cost to n_d forward runs, the adjoint model requires intrusive coding effort. In order to resolve these issues, this paper presents a Reduced-Order Successive Linear Estimator (ROSLE) for analyzing HT data. This new estimator approximates the covariance of the unknowns using a Karhunen-Loeve Expansion (KLE) truncated at order n_kl, and it calculates the directional sensitivities (in the directions of the n_kl eigenvectors) to form the covariance and cross-covariance used in the Successive Linear Estimator (SLE). In addition, the covariance of the unknowns is updated every iteration by updating the eigenvalues and eigenfunctions. The computational advantages of the proposed algorithm are demonstrated through numerical experiments and a 3-D transient HT analysis of data from a highly heterogeneous field site.
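The covariance compression at the heart of the estimator above can be sketched directly: an eigendecomposition truncated at order n_kl stores an n_y-by-n_kl factor in place of the full n_y-by-n_y covariance (the exponential covariance and truncation order below are illustrative):

```python
import numpy as np

def kle_truncate(C, n_kl):
    """Truncated Karhunen-Loeve (eigen) expansion of a covariance:
    keep the leading n_kl eigenpairs, so C ~ Phi diag(lam) Phi'."""
    lam, phi = np.linalg.eigh(C)           # ascending eigenvalues
    lam, phi = lam[::-1][:n_kl], phi[:, ::-1][:, :n_kl]
    return lam, phi                        # stores n_y * n_kl numbers

# Exponential covariance of a 1-D log-conductivity field on 500 cells.
x = np.linspace(0.0, 10.0, 500)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
lam, phi = kle_truncate(C, n_kl=20)
C_approx = (phi * lam) @ phi.T
rel_err = np.linalg.norm(C - C_approx) / np.linalg.norm(C)
print(rel_err)   # smooth fields compress well: a few modes suffice
```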
Use and Impact of Covariance Data in the Japanese Latest Adjusted Library ADJ2010 Based on JENDL-4.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yokoyama, K., E-mail: yokoyama.kenji09@jaea.go.jp; Ishikawa, M.
2015-01-15
The current status of covariance applications to fast reactor analysis and design in Japan is summarized. In Japan, the covariance data are mainly used for three purposes: (1) to quantify the uncertainty of nuclear core parameters, (2) to identify important nuclides, reactions and energy ranges which are dominant to the uncertainty of core parameters, and (3) to improve the accuracy of core design values by adopting integral data such as the critical experiments and the power reactor operation data. For the last purpose, the cross section adjustment based on the Bayesian theorem is used. After the release of JENDL-4.0, a development project for the new adjusted group-constant set ADJ2010 was started in 2010 and completed in 2013. In the present paper, the final results of ADJ2010 are briefly summarized. In addition, the adjustment results of ADJ2010 are discussed from the viewpoint of the use and impact of nuclear data covariances, focusing on 239Pu capture cross section alterations. For this purpose three kinds of indices, called “degree of mobility,” “adjustment motive force,” and “adjustment potential,” are proposed.
AFCI-2.0 Library of Neutron Cross Section Covariances
DOE Office of Scientific and Technical Information (OSTI.GOV)
Herman, M.; Herman,M.; Oblozinsky,P.
2011-06-26
A neutron cross section covariance library has been under development by a BNL-LANL collaborative effort over the last three years. The primary purpose of the library is to provide covariances for the Advanced Fuel Cycle Initiative (AFCI) data adjustment project, which is focusing on the needs of fast advanced burner reactors. The covariances refer to central values given in the 2006 release of the U.S. neutron evaluated library ENDF/B-VII. The preliminary version (AFCI-2.0beta) was completed in October 2010 and made available to the users for comments. In the final 2.0 release, covariances for a few materials were updated; in particular, new LANL evaluations for 238,240Pu and 241Am were adopted. BNL was responsible for covariances for structural materials and fission products, management of the library, and coordination of the work, while LANL was in charge of covariances for light nuclei and for actinides.
Quantitative methods to direct exploration based on hydrogeologic information
Graettinger, A.J.; Lee, J.; Reeves, H.W.; Dethan, D.
2006-01-01
Quantitatively Directed Exploration (QDE) approaches based on information such as model sensitivity, input data covariance and model output covariance are presented. Seven approaches for directing exploration are developed, applied, and evaluated on a synthetic hydrogeologic site. The QDE approaches evaluate input information uncertainty, subsurface model sensitivity and, most importantly, output covariance to identify the next location to sample. Spatial input parameter values and covariances are calculated with the multivariate conditional probability calculation from a limited number of samples. A variogram structure is used during data extrapolation to describe the spatial continuity, or correlation, of subsurface information. Model sensitivity can be determined by perturbing input data and evaluating output response or, as in this work, sensitivities can be programmed directly into an analysis model. Output covariance is calculated by the First-Order Second Moment (FOSM) method, which combines the covariance of input information with model sensitivity. A groundwater flow example, modeled in MODFLOW-2000, is chosen to demonstrate the seven QDE approaches. MODFLOW-2000 is used to obtain the piezometric head and the model sensitivity simultaneously. The seven QDE approaches are evaluated based on the accuracy of the modeled piezometric head after information from a QDE sample is added. For the synthetic site used in this study, the QDE approach that identifies the location of hydraulic conductivity that contributes the most to the overall piezometric head variance proved to be the best method to quantitatively direct exploration.
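The FOSM step named above amounts to propagating the input covariance through the model Jacobian, C_out = J C_in J^T; a toy sketch with hypothetical sensitivities standing in for the MODFLOW-2000 output:

```python
import numpy as np

def fosm_output_cov(J, C_in):
    """First-Order Second Moment propagation: combine input-parameter
    covariance with model sensitivities to get output covariance."""
    return J @ C_in @ J.T

# Toy example: 2 head predictions sensitive to 3 conductivity zones.
J = np.array([[0.8, 0.1, 0.0],     # d(head_i)/d(logK_j), made-up values
              [0.2, 0.5, 0.3]])
C_in = np.diag([0.4, 0.4, 0.9])    # input (log-conductivity) variances
C_out = fosm_output_cov(J, C_in)
# The diagonal gives prediction variances; sampling where the
# contribution to this variance is largest is one QDE criterion.
print(np.diag(C_out))
```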
Uncertainty quantification in (α,n) neutron source calculations for an oxide matrix
Pigni, M. T.; Croft, S.; Gauld, I. C.
2016-04-25
Here we present a methodology to propagate nuclear data covariance information in neutron source calculations from (α,n) reactions. The approach is applied to estimate the uncertainty in the neutron generation rates for uranium oxide fuel types due to uncertainties in 1) 17,18O(α,n) reaction cross sections and 2) uranium and oxygen stopping power cross sections. The procedure to generate reaction cross section covariance information is based on the Bayesian fitting method implemented in the R-matrix SAMMY code. The evaluation methodology uses the Reich-Moore approximation to fit the 17,18O(α,n) reaction cross sections in order to derive a set of resonance parameters and a related covariance matrix that is then used to calculate the energy-dependent cross section covariance matrix. The stopping power cross sections and related covariance information for uranium and oxygen were obtained by fitting stopping power data in the energy range of 1 keV up to 12 MeV. Cross section perturbation factors based on the covariance information relative to the evaluated 17,18O(α,n) reaction cross sections, as well as uranium and oxygen stopping power cross sections, were used to generate a varied set of nuclear data libraries used in SOURCES4C and ORIGEN for inventory and source term calculations. The set of randomly perturbed output (α,n) source responses provides the mean values and standard deviations of the calculated responses, reflecting the uncertainties in the nuclear data used in the calculations. Lastly, the results and related uncertainties are compared with experimental thick-target (α,n) yields for uranium oxide.
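The sampling step described above can be sketched generically: perturbation factors are drawn from a multivariate normal built from the relative covariance and applied multiplicatively before each rerun of the source calculation (toy 5-group covariance and a stand-in response function; the actual work uses SAMMY-derived covariances with SOURCES4C and ORIGEN):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy relative covariance for a 5-group (alpha,n) cross section:
# 5% uncertainty with strong correlation between adjacent groups.
std = np.full(5, 0.05)
corr = 0.8 ** np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
rel_cov = np.outer(std, std) * corr

sigma_eval = np.array([1.2, 2.3, 3.1, 2.0, 0.7])  # evaluated xs, arbitrary units

def neutron_yield(sigma, flux_weights=np.ones(5)):
    """Stand-in for the (alpha,n) source calculation."""
    return float(sigma @ flux_weights)

# Sample perturbation factors, rerun the response, collect statistics.
factors = rng.multivariate_normal(np.ones(5), rel_cov, size=2000)
yields = np.array([neutron_yield(sigma_eval * f) for f in factors])
print(yields.mean(), yields.std())  # mean response and its uncertainty
```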
Stone, William S; Mesholam-Gately, Raquelle I; Braff, David L; Calkins, Monica E; Freedman, Robert; Green, Michael F; Greenwood, Tiffany A; Gur, Raquel E; Gur, Ruben C; Lazzeroni, Laura C; Light, Gregory A; Nuechterlein, Keith H; Olincy, Ann; Radant, Allen D; Siever, Larry J; Silverman, Jeremy M; Sprock, Joyce; Sugar, Catherine A; Swerdlow, Neal R; Tsuang, Debby W; Tsuang, Ming T; Turetsky, Bruce I; Seidman, Larry J
2015-04-01
The first phase of the Consortium on the Genetics of Schizophrenia (COGS-1) showed performance deficits in learning and memory on the California Verbal Learning Test, Second Edition (CVLT-II) in individuals with schizophrenia (SZ), compared to healthy comparison subjects (HCS). A question is whether the COGS-1 study, which used a family study design (i.e. studying relatively intact families), yielded "milder" SZ phenotypes than those acquired subsequently in the COGS-2 case-control design that did not recruit unaffected family members. CVLT-II performance was compared for the COGS-1 and COGS-2 samples. Analyses focused on learning, recall and recognition variables, with age, gender and education as covariates. Analyses of COGS-2 data explored effects of additional covariates and moderating factors in CVLT-II performance. 324 SZ subjects and 510 HCS had complete CVLT-II and covariate data in COGS-1, while 1356 SZ and 1036 HCS had complete data in COGS-2. Except for recognition memory, analysis of covariance showed significantly worse performance in COGS-2 on all CVLT-II variables for SZ and HCS, and remained significant in the presence of the covariates. Performance in each of the 5 learning trials differed significantly. However, effect sizes comparing cases and controls were comparable across the two studies. COGS-2 analyses confirmed SZ performance deficits despite effects of multiple significant covariates and moderating factors. CVLT-II performance was worse in COGS-2 than in COGS-1 for both the SZ and the HCS in this large cohort, likely due to cohort effects. Demographically corrected data yield a consistent pattern of performance across the two studies in SZ.
Suboptimal schemes for atmospheric data assimilation based on the Kalman filter
NASA Technical Reports Server (NTRS)
Todling, Ricardo; Cohn, Stephen E.
1994-01-01
This work is directed toward approximating the evolution of forecast error covariances for data assimilation. The performance of different algorithms based on simplification of the standard Kalman filter (KF) is studied. These are suboptimal schemes (SOSs) when compared to the KF, which is optimal for linear problems with known statistics. The SOSs considered here are several versions of optimal interpolation (OI), a scheme for height error variance advection, and a simplified KF in which the full height error covariance is advected. To employ a methodology for exact comparison among these schemes, a linear environment is maintained, in which a beta-plane shallow-water model linearized about a constant zonal flow is chosen for the test-bed dynamics. The results show that constructing dynamically balanced forecast error covariances rather than using conventional geostrophically balanced ones is essential for successful performance of any SOS. A posteriori initialization of SOSs to compensate for model-data imbalance sometimes results in poor performance. Instead, properly constructed dynamically balanced forecast error covariances eliminate the need for initialization. When the SOSs studied here make use of dynamically balanced forecast error covariances, the difference among their performances progresses naturally from conventional OI to the KF. In fact, the results suggest that even modest enhancements of OI, such as including an approximate dynamical equation for height error variances while leaving height error correlation structure homogeneous, go a long way toward achieving the performance of the KF, provided that dynamically balanced cross-covariances are constructed and that model errors are accounted for properly. The results indicate that such enhancements are necessary if unconventional data are to have a positive impact.
Adjoints and Low-rank Covariance Representation
NASA Technical Reports Server (NTRS)
Tippett, Michael K.; Cohn, Stephen E.
2000-01-01
Quantitative measures of the uncertainty of Earth System estimates can be as important as the estimates themselves. Second moments of estimation errors are described by the covariance matrix, whose direct calculation is impractical when the number of degrees of freedom of the system state is large. Ensemble and reduced-state approaches to prediction and data assimilation replace full estimation error covariance matrices by low-rank approximations. The appropriateness of such approximations depends on the spectrum of the full error covariance matrix, whose calculation is also often impractical. Here we examine the situation where the error covariance is a linear transformation of a forcing error covariance. We use operator norms and adjoints to relate the appropriateness of low-rank representations to the conditioning of this transformation. The analysis is used to investigate low-rank representations of the steady-state response to random forcing of an idealized discrete-time dynamical system.
Bayesian Factor Analysis When Only a Sample Covariance Matrix Is Available
ERIC Educational Resources Information Center
Hayashi, Kentaro; Arav, Marina
2006-01-01
In traditional factor analysis, the variance-covariance matrix or the correlation matrix has often been a form of inputting data. In contrast, in Bayesian factor analysis, the entire data set is typically required to compute the posterior estimates, such as Bayes factor loadings and Bayes unique variances. We propose a simple method for computing…
ERIC Educational Resources Information Center
Cai, Li; Lee, Taehun
2009-01-01
We apply the Supplemented EM algorithm (Meng & Rubin, 1991) to address a chronic problem with the "two-stage" fitting of covariance structure models in the presence of ignorable missing data: the lack of an asymptotically chi-square distributed goodness-of-fit statistic. We show that the Supplemented EM algorithm provides a…
USDA-ARS?s Scientific Manuscript database
(Co)variance components for calving ease and stillbirth in US Holsteins were estimated using a single-trait threshold animal model and two different sets of data edits. Six sets of approximately 250,000 records each were created by randomly selecting herd codes without replacement from the data used...
Data Selection for Within-Class Covariance Estimation
2016-09-08
NIST evaluations to train the within-class and across-class covariance matrices required by these techniques, little attention has been paid to the...multiple utterances from a large population of speakers. Fortunately, participants in NIST evaluations have access to a repository of legacy data from...utterances chosen from previous NIST evaluations. Training data for the UBM and T-matrix was obtained from the NIST Switchboard 2 phases 2-5 and
Singularity and Nonnormality in the Classification of Compositional Data
Bohling, Geoffrey C.; Davis, J.C.; Olea, R.A.; Harff, Jan
1998-01-01
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data always will be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; this is done automatically by some statistical packages such as SAS. Granulometric data from the Darss Sill region of the Baltic Sea is used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore-Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation is different only slightly from one based on raw granulometric data, suggesting that closure problems do not influence severely regionalized classification of compositional data.
Jahng, Seungmin; Wood, Phillip K.
2017-01-01
Intensive longitudinal studies, such as ecological momentary assessment studies using electronic diaries, are gaining popularity across many areas of psychology. Multilevel models (MLMs) are the most widely used analytical tools for intensive longitudinal data (ILD). Although ILD often have individually distinct patterns of serial correlation of measures over time, inferences about the fixed effects and random components in MLMs are made under the assumption that all variance and autocovariance components are homogenous across individuals. In the present study, we introduced a multilevel model with Cholesky transformation to model ILD with individually heterogeneous covariance structure. In addition, the performance of the transformation method and the effects of misspecification of heterogeneous covariance structure were investigated through a Monte Carlo simulation. We found that, if individually heterogeneous covariances are incorrectly assumed to be homogenous independent or homogenous autoregressive, MLMs produce highly biased estimates of the variance of random intercepts and the standard errors of the fixed intercept and the fixed effect of a level 2 covariate when the average autocorrelation is high. For intensive longitudinal data with individual-specific residual covariance, the suggested transformation method showed lower bias in those estimates than the misspecified models when the number of repeated observations within individuals is 50 or more.
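The Cholesky transformation mentioned above whitens each individual's serially correlated residuals; a minimal sketch with an AR(1) residual covariance (used purely for illustration) recovering generalized-least-squares estimates:

```python
import numpy as np

def ar1_cov(n, phi, sigma2=1.0):
    """AR(1) residual covariance for one individual's n time points."""
    return sigma2 * phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def cholesky_whiten(y, X, C):
    """Transform (y, X) by L^{-1}, where C = L L', so the transformed
    residuals are uncorrelated with unit variance."""
    L = np.linalg.cholesky(C)
    return np.linalg.solve(L, y), np.linalg.solve(L, X)

# Toy: one person's 60 diary entries with strong autocorrelation.
rng = np.random.default_rng(0)
n = 60
C = ar1_cov(n, phi=0.7)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 0.5]) + np.linalg.cholesky(C) @ rng.normal(size=n)
y_t, X_t = cholesky_whiten(y, X, C)
beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
print(beta)   # GLS estimate via the Cholesky transformation
```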
Independent component analysis of DTI data reveals white matter covariances in Alzheimer's disease
NASA Astrophysics Data System (ADS)
Ouyang, Xin; Sun, Xiaoyu; Guo, Ting; Sun, Qiaoyue; Chen, Kewei; Yao, Li; Wu, Xia; Guo, Xiaojuan
2014-03-01
Alzheimer's disease (AD) is a progressive neurodegenerative disease whose clinical symptoms include the continuous deterioration of cognitive and memory functions. Multiple diffusion tensor imaging (DTI) indices such as fractional anisotropy (FA) and mean diffusivity (MD) can successfully characterize the white matter damage in AD patients. However, most studies have focused on univariate measures (voxel-based analysis) to examine the differences between AD patients and normal controls (NCs). In this investigation, we applied a multivariate independent component analysis (ICA) to investigate white matter covariances based on FA measurements from DTI data in 35 AD patients and 45 NCs from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We found that six independent components (ICs) showed significant FA reductions in white matter covariances in AD compared with NC, including the genu and splenium of the corpus callosum (IC-1 and IC-2), the middle temporal gyrus of the temporal lobe (IC-3), sub-gyral regions of the frontal lobe (IC-4 and IC-5) and sub-gyral regions of the parietal lobe (IC-6). Our findings revealed covariant white matter loss in AD patients and suggest that the unsupervised data-driven ICA method is effective in exploring FA changes in AD. This study assists us in understanding the mechanism of covariant white matter reductions in the development of AD.
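A minimal sketch of the multivariate decomposition step, using scikit-learn's FastICA; the subject-by-voxel FA matrix, group sizes, and component count are illustrative stand-ins for the ADNI data, not the authors' pipeline.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.decomposition import FastICA

# fa: subjects x voxels matrix of FA values (synthetic stand-in here).
rng = np.random.default_rng(1)
fa = rng.normal(0.45, 0.05, size=(80, 5000))

ica = FastICA(n_components=6, max_iter=500, random_state=0)
loadings = ica.fit_transform(fa)    # subjects x components loadings
maps = ica.components_              # components x voxels spatial maps

# Group comparison of per-subject loadings, e.g. AD (first 35) vs NC (rest).
t, p = ttest_ind(loadings[:35], loadings[35:], axis=0)
print(p)  # components with small p suggest group differences in FA covariance
```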
Wave height data assimilation using non-stationary kriging
NASA Astrophysics Data System (ADS)
Tolosana-Delgado, R.; Egozcue, J. J.; Sánchez-Arcilla, A.; Gómez, J.
2011-03-01
Data assimilation into numerical models should be both computationally fast and physically meaningful in order to be applicable in online environmental surveillance. We present a way to improve assimilation for computationally intensive models, based on non-stationary kriging and a separable space-time covariance function. The method is illustrated with significant wave height data. The covariance function is expressed as a collection of fields: each one is obtained as the empirical covariance between the studied property (significant wave height in log-scale) at a pixel where a measurement is located (a wave-buoy is available) and the same parameter at every other pixel of the field. These covariances are computed from the available history of forecasts. The method provides a set of weights that can be mapped for each measuring location and that do not vary with time. The resulting weights may be used in a weighted average of the differences between the forecast and the measured parameter. In the case presented, these weights may show long-range connection patterns, such as between the Catalan coast and the eastern coast of Sardinia, associated with common prevailing meteo-oceanographic conditions. When such patterns are considered non-informative of the present situation, it is always possible to diminish their influence by relaxing the covariance maps.
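A minimal sketch of the covariance-map construction described above, on synthetic stand-ins: empirical covariances between each buoy pixel and the whole field are computed from an archive of forecasts, turned into weight maps, and used to spread observation-minus-forecast increments. The normalization across buoys is a simplification of ours, not the authors' scheme.

```python
import numpy as np

# hindcasts: archived forecast fields, shape (n_times, ny, nx), log wave height.
rng = np.random.default_rng(2)
hindcasts = rng.normal(size=(500, 40, 60))
buoys = [(10, 12), (25, 48)]           # pixel indices of measuring locations

anom = hindcasts - hindcasts.mean(axis=0)
cov_maps = []
for iy, ix in buoys:
    # Empirical covariance between the buoy pixel and every other pixel.
    c = np.einsum('t,tij->ij', anom[:, iy, ix], anom) / (len(anom) - 1)
    cov_maps.append(c / c[iy, ix])     # normalize so the weight is 1 at the buoy

# Assimilation step: correct a forecast with observed-minus-forecast increments.
forecast = hindcasts[-1]
obs = {(10, 12): 0.3, (25, 48): -0.1}  # log-scale observations at the buoys
weights = np.stack(cov_maps)
weights /= np.maximum(weights.sum(axis=0), 1e-6)   # crude multi-buoy blending
increment = sum(w * (obs[b] - forecast[b]) for w, b in zip(weights, buoys))
analysis = forecast + increment
```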
Dorazio, Robert M.
2012-01-01
Several models have been developed to predict the geographic distribution of a species by combining measurements of covariates of occurrence at locations where the species is known to be present with measurements of the same covariates at other locations where species occurrence status (presence or absence) is unknown. In the absence of species detection errors, spatial point-process models and binary-regression models for case-augmented surveys provide consistent estimators of a species’ geographic distribution without prior knowledge of species prevalence. In addition, these regression models can be modified to produce estimators of species abundance that are asymptotically equivalent to those of the spatial point-process models. However, if species presence locations are subject to detection errors, neither class of models provides a consistent estimator of covariate effects unless the covariates of species abundance are distinct and independently distributed from the covariates of species detection probability. These analytical results are illustrated using simulation studies of data sets that contain a wide range of presence-only sample sizes. Analyses of presence-only data of three avian species observed in a survey of landbirds in western Montana and northern Idaho are compared with site-occupancy analyses of detections and nondetections of these species.
ERIC Educational Resources Information Center
Moustaki, Irini; Joreskog, Karl G.; Mavridis, Dimitris
2004-01-01
We consider a general type of model for analyzing ordinal variables with covariate effects and 2 approaches for analyzing data for such models, the item response theory (IRT) approach and the PRELIS-LISREL (PLA) approach. We compare these 2 approaches on the basis of 2 examples, 1 involving only covariate effects directly on the ordinal variables…
Covariance Recovery from a Square Root Information Matrix for Data Association
2009-07-02
Data association is one of the core problems of simultaneous localization and mapping (SLAM), and it requires knowledge about the uncertainties of the...back-substitution as well as efficient access to marginal covariances, which is described next. 2.2. Recovering Marginal Covariances. Knowledge of the
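A minimal sketch of the recovery step named in the snippet: given an upper-triangular square-root information factor R with information matrix Lambda = R^T R, the covariance Sigma = R^{-1} R^{-T} follows from two triangular back-substitutions. This is a toy illustration, not the paper's efficient algorithm for selected marginals.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 6))           # measurement Jacobian (toy problem)
Lam = A.T @ A                          # information matrix
R = np.linalg.cholesky(Lam).T          # upper-triangular square-root factor

# Sigma = (R^T R)^{-1} = R^{-1} R^{-T}, via back-substitution only:
I = np.eye(R.shape[0])
Rinv_T = solve_triangular(R, I, trans='T', lower=False)  # solves R^T X = I
Sigma = solve_triangular(R, Rinv_T, lower=False)         # solves R X = R^{-T}

print(np.allclose(Sigma, np.linalg.inv(Lam)))  # True
print(np.sqrt(np.diag(Sigma)))                 # marginal standard deviations
```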
Li, Siying; Koch, Gary G; Preisser, John S; Lam, Diana; Sanchez-Kam, Matilde
2017-01-01
Dichotomous endpoints in clinical trials have only two possible outcomes, either directly or via categorization of an ordinal or continuous observation. It is common to have missing data for one or more visits during a multi-visit study. This paper presents a closed form method for sensitivity analysis of a randomized multi-visit clinical trial that possibly has missing not at random (MNAR) dichotomous data. Counts of missing data are redistributed to the favorable and unfavorable outcomes mathematically to address possibly informative missing data. Adjusted proportion estimates and their closed form covariance matrix estimates are provided. Treatment comparisons over time are addressed with Mantel-Haenszel adjustment for a stratification factor and/or randomization-based adjustment for baseline covariables. The application of such sensitivity analyses is illustrated with an example. An appendix outlines an extension of the methodology to ordinal endpoints.
A Robust Adaptive Unscented Kalman Filter for Nonlinear Estimation with Uncertain Noise Covariance.
Zheng, Binqi; Fu, Pengcheng; Li, Baoqing; Yuan, Xiaobing
2018-03-07
The Unscented Kalman filter (UKF) may suffer from performance degradation and even divergence when there is a mismatch between the noise distribution assumed a priori by users and the actual one in a real nonlinear system. To resolve this problem, this paper proposes a robust adaptive UKF (RAUKF) to improve the accuracy and robustness of state estimation with uncertain noise covariance. More specifically, at each timestep, a standard UKF is implemented first to obtain the state estimates using the newly acquired measurement data. Then an online fault-detection mechanism is adopted to judge whether it is necessary to update the current noise covariance. If necessary, an innovation-based method and a residual-based method are used to estimate the current noise covariance of the process and measurement, respectively. By utilizing a weighting factor, the filter combines the previous noise covariance matrices with the estimates to form the new noise covariance matrices. Finally, the state estimates are corrected according to the new noise covariance matrices and previous state estimates. Compared with the standard UKF and other adaptive UKF algorithms, RAUKF converges faster to the actual noise covariance and thus achieves better performance in terms of robustness, accuracy, and computation for nonlinear estimation with uncertain noise covariance, as demonstrated by simulation results.
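The weighting-factor step described above amounts to a convex blend of the previous covariance with a freshly estimated one. Below is a minimal sketch under assumed values: the measurement-noise estimate is formed from a window of post-fit residuals, and `weight` is an illustrative tuning factor, not a value from the paper.

```python
import numpy as np

def blend_noise_cov(cov_old, cov_est, weight=0.15):
    # Convex combination of the previous noise covariance and the new
    # estimate; `weight` is an illustrative tuning factor (assumption).
    return (1.0 - weight) * cov_old + weight * cov_est

rng = np.random.default_rng(4)
# Residual-based estimate of the measurement-noise covariance R from a
# window of post-fit residuals eps_k = z_k - h(x_k) (toy data here).
residuals = rng.normal(scale=0.3, size=(50, 2))
R_est = residuals.T @ residuals / len(residuals)

R_old = 0.01 * np.eye(2)
R_new = blend_noise_cov(R_old, R_est)
print(R_new)
```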
Strategies for reducing large fMRI data sets for independent component analysis.
Wang, Ze; Wang, Jiongjiong; Calhoun, Vince; Rao, Hengyi; Detre, John A; Childress, Anna R
2006-06-01
In independent component analysis (ICA), principal component analysis (PCA) is generally used to reduce the raw data to a few principal components (PCs) through eigenvector decomposition (EVD) on the data covariance matrix. Although this works for spatial ICA (sICA) on moderately sized fMRI data, it is intractable for temporal ICA (tICA), since typical fMRI data have a high spatial dimension, resulting in an unmanageable data covariance matrix. To solve this problem, two practical data reduction methods are presented in this paper. The first solution is to calculate the PCs of tICA from the PCs of sICA. This approach works well for moderately sized fMRI data; however, it is highly computationally intensive, even intractable, when the number of scans increases. The second solution proposed is to perform PCA decomposition via a cascade recursive least squared (CRLS) network, which provides a uniform data reduction solution for both sICA and tICA. Without the need to calculate the covariance matrix, CRLS extracts PCs directly from the raw data, and the PC extraction can be terminated after computing an arbitrary number of PCs without the need to estimate the whole set of PCs. Moreover, when the whole data set becomes too large to be loaded into the machine memory, CRLS-PCA can save data retrieval time by reading the data once, while the conventional PCA requires numerous data retrieval steps for both covariance matrix calculation and PC extractions. Real fMRI data were used to evaluate the PC extraction precision, computational expense, and memory usage of the presented methods.
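As an aside, a covariance-free, single-pass reduction in the same spirit (though not the CRLS network itself) is available in scikit-learn's IncrementalPCA, which updates principal components from chunks read once; the sizes below are illustrative.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Toy stand-in for a large fMRI data matrix (time points x voxels),
# processed in chunks so the full covariance matrix is never formed.
rng = np.random.default_rng(5)
ipca = IncrementalPCA(n_components=20)
for _ in range(40):                      # e.g. 40 chunks streamed from disk
    chunk = rng.normal(size=(50, 10000)) # 50 scans x 10000 voxels per chunk
    ipca.partial_fit(chunk)

print(ipca.components_.shape)            # (20, 10000) spatial eigenimages
```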
Tarai, Madhumita; Kumar, Keshav; Divya, O; Bairi, Partha; Mishra, Kishor Kumar; Mishra, Ashok Kumar
2017-09-05
The present work compares dissimilarity-based and covariance-based unsupervised chemometric classification approaches using total synchronous fluorescence spectroscopy data sets acquired for cumin and non-cumin based herbal preparations. The conventional decomposition method involves eigenvalue-eigenvector analysis of the covariance of the data set and finds the factors that can explain the overall major sources of variation present in the data set. The conventional approach does this irrespective of the fact that the samples belong to intrinsically different groups and hence leads to poor class separation. The present work shows that classification of such samples can be optimized by performing the eigenvalue-eigenvector decomposition on the pair-wise dissimilarity matrix.
Toward a clearer portrayal of confounding bias in instrumental variable applications.
Jackson, John W; Swanson, Sonja A
2015-07-01
Recommendations for reporting instrumental variable analyses often include presenting the balance of covariates across levels of the proposed instrument and levels of the treatment. However, such presentation can be misleading as relatively small imbalances among covariates across levels of the instrument can result in greater bias because of bias amplification. We introduce bias plots and bias component plots as alternative tools for understanding biases in instrumental variable analyses. Using previously published data on proposed preference-based, geography-based, and distance-based instruments, we demonstrate why presenting covariate balance alone can be problematic, and how bias component plots can provide more accurate context for bias from omitting a covariate from an instrumental variable versus non-instrumental variable analysis. These plots can also provide relevant comparisons of different proposed instruments considered in the same data. Adaptable code is provided for creating the plots.
Best (but oft-forgotten) practices: propensity score methods in clinical nutrition research.
Ali, M Sanni; Groenwold, Rolf Hh; Klungel, Olaf H
2016-08-01
In observational studies, treatment assignment is a nonrandom process and treatment groups may not be comparable in their baseline characteristics, a phenomenon known as confounding. Propensity score (PS) methods can be used to achieve comparability of treated and nontreated groups in terms of their observed covariates and, as such, control for confounding in estimating treatment effects. In this article, we provide step-by-step guidance on how to use PS methods. For illustrative purposes, we used simulated data based on an observational study of the relation between oral nutritional supplementation and hospital length of stay. We focused on the key aspects of PS analysis, including covariate selection, PS estimation, covariate balance assessment, treatment effect estimation, and reporting. PS matching, stratification, covariate adjustment, and weighting are discussed. R code and example data are provided to show the different steps in a PS analysis.
Kovalchik, Stephanie A; Cumberland, William G
2012-05-01
Subgroup analyses are important to medical research because they shed light on the heterogeneity of treatment effects. A treatment-covariate interaction in an individual patient data (IPD) meta-analysis is the most reliable means to estimate how a subgroup factor modifies a treatment's effectiveness. However, owing to the challenges in collecting participant data, an approach based on aggregate data might be the only option. In these circumstances, it would be useful to assess the relative efficiency and power loss of a subgroup analysis without patient-level data. We present methods that use aggregate data to estimate the standard error of an IPD meta-analysis' treatment-covariate interaction for regression models of a continuous or dichotomous patient outcome. Numerical studies indicate that the estimators have good accuracy. An application to a previously published meta-regression illustrates the practical utility of the methodology.
Regression analysis of sparse asynchronous longitudinal data
Cao, Hongyuan; Zeng, Donglin; Fine, Jason P.
2015-01-01
We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time-invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time-invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies show that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last-value-carried-forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus. PMID:26568699
DISSCO: direct imputation of summary statistics allowing covariates
Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun
2015-01-01
Background: Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), both assuming a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. Methods: We analytically show that in the absence of covariates, the correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus providing a theoretical justification for the recently proposed methods. We further prove that in the presence of covariates, the correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). Results: We consider two real-life scenarios where the correlation and partial correlation likely make a practical difference: (i) association studies in admixed populations; (ii) association studies in the presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%. Availability and implementation: http://www.unc.edu/∼yunmli/DISSCO. Contact: yunli@med.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25810429
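The partial correlation that replaces the plain genotype correlation in the presence of covariates can be computed by residualizing both genotypes on the covariates; a minimal sketch with a synthetic confounding ancestry covariate follows.

```python
import numpy as np

def partial_corr(g1, g2, C):
    # Correlation of g1 and g2 after regressing out covariate matrix C.
    C1 = np.column_stack([np.ones(len(C)), C])          # add intercept
    r1 = g1 - C1 @ np.linalg.lstsq(C1, g1, rcond=None)[0]
    r2 = g2 - C1 @ np.linalg.lstsq(C1, g2, rcond=None)[0]
    return np.corrcoef(r1, r2)[0, 1]

rng = np.random.default_rng(6)
ancestry = rng.normal(size=1000)                        # confounding covariate
g1 = rng.binomial(2, 0.3, 1000) + 0.5 * ancestry        # synthetic "genotypes"
g2 = rng.binomial(2, 0.3, 1000) + 0.5 * ancestry

# Raw correlation is inflated by shared ancestry; partial correlation is not.
print(np.corrcoef(g1, g2)[0, 1], partial_corr(g1, g2, ancestry))
```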
[Calculating Pearson residual in logistic regressions: a comparison between SPSS and SAS].
Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan
2015-01-01
To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS, we reviewed Pearson residual calculation methods and used two data sets to test logistic models constructed in SPSS and SAS. One model contained a small number of covariates relative to the number of observations; the other contained a similar number of covariates as observations. The two software packages produced similar Pearson residual estimates when the models contained a similar number of covariates as observations, but the results differed when the number of observations was much greater than the number of covariates. The two software packages produce different Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
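For reference, the per-observation Pearson residual in a logistic model is r_i = (y_i - p_i) / sqrt(p_i (1 - p_i)); packages differ mainly in whether residuals are computed per observation or per aggregated covariate pattern, which is one reason results diverge when covariates are few. A minimal per-observation sketch (scikit-learn 1.2+ for penalty=None; data synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ [0.8, -0.5, 0.2]))))

# Unpenalized fit, to match the maximum-likelihood output of SPSS/SAS.
model = LogisticRegression(penalty=None).fit(X, y)
p = model.predict_proba(X)[:, 1]

pearson_resid = (y - p) / np.sqrt(p * (1 - p))
print(pearson_resid[:5])
```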
Wang, Xiaojing; Chen, Ming-Hui; Yan, Jun
2013-07-01
Cox models with time-varying coefficients offer great flexibility in capturing the temporal dynamics of covariate effects on event times, which could be hidden from a Cox proportional hazards model. Methodology development for varying coefficient Cox models, however, has been largely limited to right censored data; only limited work on interval censored data has been done. In most existing methods for varying coefficient models, analysts need to specify at the time of fitting which covariate coefficients are time-varying and which are not. We propose a dynamic Cox regression model for interval censored data in a Bayesian framework, where the coefficient curves are piecewise constant but the number of pieces and the jump points are covariate specific and estimated from the data. The model automatically determines the extent to which the temporal dynamics is needed for each covariate, resulting in smoother and more stable curve estimates. The posterior computation is carried out via an efficient reversible jump Markov chain Monte Carlo algorithm. Inference for each coefficient is based on an average of models with different numbers of pieces and jump points. A simulation study with three covariates, each with a coefficient of a different degree of temporal dynamics, confirmed that the dynamic model is preferred to the existing time-varying model in terms of model comparison criteria based on the conditional predictive ordinate. When applied to dental health data of children aged between 7 and 12 years, the dynamic model reveals that the relative risk of emergence of permanent tooth 24 between children with and without an infected primary predecessor is highest at around age 7.5, and that it gradually reduces to one after age 11. These findings were not seen in the existing studies with Cox proportional hazards models.
Covariance Structure Models for Gene Expression Microarray Data
ERIC Educational Resources Information Center
Xie, Jun; Bentler, Peter M.
2003-01-01
Covariance structure models are applied to gene expression data using a factor model, a path model, and their combination. The factor model is based on a few factors that capture most of the expression information. A common factor of a group of genes may represent a common protein factor for the transcript of the co-expressed genes, and hence, it…
Transurethral Ultrasound Diffraction Tomography
2007-03-01
the covariance matrix was derived. The covariance reduced to that of the X-ray CT under the assumptions of linear operator and real data.[5] The...the covariance matrix in the linear X-ray computed tomography is a special case of the inverse scattering matrix derived in this paper. The matrix was...is derived in Sec. IV, and its relation to that of the linear X-ray computed tomography appears in Sec. V. In Sec. VI, the inverse scattering
Detecting population-environmental interactions with mismatched time series data.
Ferguson, Jake M; Reichert, Brian E; Fletcher, Robert J; Jager, Henriëtte I
2017-11-01
Time series analysis is an essential method for decomposing the influences of density and exogenous factors such as weather and climate on population regulation. However, there has been little work focused on understanding how well commonly collected data can reconstruct the effects of environmental factors on population dynamics. We show that, analogous to similar scale issues in spatial data analysis, coarsely sampled temporal data can fail to detect covariate effects when interactions occur on timescales that are fast relative to the survey period. We propose a method for modeling mismatched time series data that couples high-resolution environmental data to low-resolution abundance data. We illustrate our approach with simulations and by applying it to Florida's southern Snail kite population. Our simulation results show that our method can reliably detect linear environmental effects and that detecting nonlinear effects requires high-resolution covariate data even when the population turnover rate is slow. In the Snail kite analysis, our approach performed among the best in a suite of previously used environmental covariates explaining Snail kite dynamics and was able to detect a potential phenological shift in the environmental dependence of Snail kites. Our work provides a statistical framework for reliably detecting population-environment interactions from coarsely surveyed time series. An important implication of this work is that the low predictability of animal population growth by weather variables found in previous studies may be due, in part, to how these data are utilized as covariates.
Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno
2016-01-22
Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential so that the statistical analysis can be adjusted if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small compared with the total number of baseline covariates (≥40% of covariates unbalanced). We also provide a strategy for preselection of the covariates to be included in the PS model to enhance imbalance detection.
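The diagnostic reduces to fitting a PS model for arm membership and reading off its c-statistic (the area under the ROC curve); a minimal sketch that ignores the cluster structure for brevity, with synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 8))                            # baseline covariates
arm = rng.binomial(1, 1 / (1 + np.exp(-0.4 * X[:, 0])))   # mildly unbalanced

# Propensity score model: predict arm membership from baseline covariates.
ps = LogisticRegression(max_iter=1000).fit(X, arm).predict_proba(X)[:, 1]

# c-statistic near 0.5 suggests balance; larger values flag global imbalance.
print("c-statistic:", roc_auc_score(arm, ps))
```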
NASA Technical Reports Server (NTRS)
Menga, G.
1975-01-01
An approach is proposed for the design of approximate, fixed-order, discrete-time realizations of stochastic processes from the output covariance over a finite time interval. No restrictive assumptions are imposed on the process; it can be nonstationary and lead to a high-dimension realization. Classes of fixed-order models are defined having the joint covariance matrix of the combined vector of the outputs in the interval of definition greater than or equal to the process covariance (the difference matrix is nonnegative definite). The design is achieved by minimizing, within one of those classes, a measure of the approximation between the model and the process, evaluated by the trace of the difference of the respective covariance matrices. Models belonging to these classes have the notable property that, under the same measurement system and estimator structure, the output estimation error covariance matrix computed on the model is an upper bound on the corresponding covariance on the real process. An application of the approach is illustrated by the modeling of random meteorological wind profiles from the statistical analysis of historical data.
Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan
2015-01-01
Gene expression data are typically large, complex, and highly noisy. Their dimension is high, with several thousand genes (i.e., features) but only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a new and novel approach using the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop the celebrated Akaike information criterion (AIC), the consistent Akaike information criterion (CAIC), and Bozdogan's information-theoretic measure of complexity (ICOMP) criterion. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate, to perform PPCA for dimension reduction and for supervised classification of cancer groups in high dimensions. PMID:25838836
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.
Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R
2018-06-01
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of available diagnostic statistics, in comparison to their parametric counterparts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can detect two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt independence criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
NASA Astrophysics Data System (ADS)
McKay, N.
2017-12-01
As timescale increases from years to centuries, the spatial scale of covariability in the climate system is hypothesized to increase as well. Covarying spatial scales are larger for temperature than for hydroclimate; however, both aspects of the climate system show systematic changes on large spatial scales at orbital to tectonic timescales. The extent to which this phenomenon is evident in temperature and hydroclimate at centennial timescales is largely unknown. Recent syntheses of multidecadal to century-scale variability in hydroclimate during the past 2k years in the Arctic, North America, and Australasia show little spatial covariability in hydroclimate during the Common Era. To determine 1) the evidence for systematic relationships between the spatial scale of climate covariability and timescale, and 2) whether century-scale hydroclimate variability deviates from this relationship, we quantify the phenomenon during the Common Era by calculating the e-folding distance in large instrumental and paleoclimate datasets. We calculate this metric of spatial covariability, at different timescales (1, 10 and 100 yr), for a large network of temperature and precipitation observations from the Global Historical Climatology Network (n=2447), from v2.0.0 of the PAGES2k temperature database (n=692), and from moisture-sensitive paleoclimate records from North America, the Arctic, and the Iso2k project (n=328). Initial results support the hypothesis that the spatial scale of covariability is larger for temperature than for precipitation or paleoclimate hydroclimate indicators. Spatially, e-folding distances for temperature are largest at low latitudes and over the ocean. Both instrumental and proxy temperature data show clear evidence for increasing spatial extent as a function of timescale, but this phenomenon is very weak in the hydroclimate data analyzed here. In the proxy hydroclimate data, which are predominantly indicators of effective moisture, e-folding distance increases from annual to decadal timescales, but does not continue to increase at centennial timescales. Future work includes examining additional instrumental and proxy datasets of moisture variability, and extending the analysis to millennial timescales of variability.
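The e-folding distance used here can be estimated by fitting an exponential decay to correlation versus separation; a minimal sketch with synthetic station pairs (the 1500 km scale and the noise level are arbitrary):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(9)
dist = rng.uniform(0, 5000, 400)                        # station separations, km
corr = np.exp(-dist / 1500) + rng.normal(0, 0.05, 400)  # noisy correlations

# e-folding distance L: separation at which correlation drops to 1/e.
decay = lambda d, L: np.exp(-d / L)
(L_hat,), _ = curve_fit(decay, dist, corr, p0=[1000])
print(f"e-folding distance: {L_hat:.0f} km")
```

In practice the correlations themselves would first be computed between record pairs after averaging to the timescale of interest (annual, decadal, or centennial means).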
Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume
2013-01-01
Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
Combined Use of Integral Experiments and Covariance Data
NASA Astrophysics Data System (ADS)
Palmiotti, G.; Salvatores, M.; Aliberti, G.; Herman, M.; Hoblit, S. D.; McKnight, R. D.; Obložinský, P.; Talou, P.; Hale, G. M.; Hiruta, H.; Kawano, T.; Mattoon, C. M.; Nobre, G. P. A.; Palumbo, A.; Pigni, M.; Rising, M. E.; Yang, W.-S.; Kahler, A. C.
2014-04-01
In the framework of a US-DOE sponsored project, ANL, BNL, INL and LANL have performed a joint multidisciplinary research activity to explore the combined use of integral experiments and covariance data, with the objective of giving quantitative indications on possible improvements of the ENDF evaluated data files and of reducing, at the same time, crucial reactor design parameter uncertainties. Methods developed over the last four decades for the purposes indicated above have been improved by new developments that also benefited from continuous exchanges with international groups working in similar areas. The major new developments that allowed significant progress are to be found in several specific domains: a) new science-based covariance data; b) integral experiment covariance data assessment and improved experiment analysis, e.g., of sample irradiation experiments; c) sensitivity analysis, where several improvements were necessary despite the generally good understanding of these techniques, e.g., to account for fission spectrum sensitivity; d) a critical approach to the analysis of statistical adjustment performance, both a priori and a posteriori; e) generalization of the assimilation method, now applied for the first time not only to multigroup cross section data but also to nuclear model parameters (the "consistent" method). This article describes the major results obtained in each of these areas; a large-scale nuclear data adjustment, based on the use of approximately one hundred high-accuracy integral experiments, is reported along with a significant example of the application of the new "consistent" method of data assimilation.
A Bayesian approach to multivariate measurement system assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamada, Michael Scott
This article considers system assessment for multivariate measurements and presents a Bayesian approach to analyzing gauge R&R study data. The evaluation of variances for univariate measurement becomes the evaluation of covariance matrices for multivariate measurements. The Bayesian approach ensures positive definite estimates of the covariance matrices and easily provides their uncertainty. Furthermore, various measurement system assessment criteria are easily evaluated. The approach is illustrated with data from a real gauge R&R study as well as simulated data.
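The positive-definiteness guarantee mentioned above falls out naturally when the covariance matrix is given a conjugate inverse-Wishart prior. The sketch below is a generic illustration of that idea, not necessarily the article's exact model: the prior degrees of freedom nu0 and scale Psi0 are assumed values, the mean is treated as estimated, and the data are synthetic.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(10)
Y = rng.multivariate_normal([0, 0, 0], np.diag([1.0, 0.5, 0.2]), size=60)

n, p = Y.shape
S = (Y - Y.mean(axis=0)).T @ (Y - Y.mean(axis=0))   # scatter matrix
nu0, Psi0 = p + 2, np.eye(p)                        # weak prior (assumption)

# Posterior over the covariance matrix is inverse-Wishart; every draw is
# positive definite, and the spread of the draws quantifies uncertainty.
draws = invwishart.rvs(df=nu0 + n, scale=Psi0 + S, size=2000, random_state=1)
print(draws.mean(axis=0))                           # posterior mean covariance
```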
Statistics for characterizing data on the periphery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
Estimating animal resource selection from telemetry data using point process models
Johnson, Devin S.; Hooten, Mevin B.; Kuhn, Carey E.
2013-01-01
To demonstrate the analysis of telemetry data with the point process approach, we analysed a data set of telemetry locations from northern fur seals (Callorhinus ursinus) in the Pribilof Islands, Alaska. Both a space–time and an aggregated space-only model were fitted. At the individual level, the space–time analysis showed little selection relative to the habitat covariates. However, at the study area level, the space-only model showed strong selection relative to the covariates.
Andrew D. Richardson; David Y. Hollinger
2005-01-01
Whether the goal is to fill gaps in the flux record, or to extract physiological parameters from eddy covariance data, researchers are frequently interested in fitting simple models of ecosystem physiology to measured data. Presently, there is no consensus on the best models to use, or the ideal optimization criteria. We demonstrate that, given our estimates of the...
Brier, Matthew R; Mitra, Anish; McCarthy, John E; Ances, Beau M; Snyder, Abraham Z
2015-11-01
Functional connectivity refers to shared signals among brain regions and is typically assessed in a task free state. Functional connectivity is commonly quantified between signal pairs using Pearson correlation. However, resting-state fMRI is a multivariate process exhibiting a complicated covariance structure. Partial covariance assesses the unique variance shared between two brain regions excluding any widely shared variance, hence is appropriate for the analysis of multivariate fMRI datasets. However, calculation of partial covariance requires inversion of the covariance matrix, which, in most functional connectivity studies, is not invertible owing to rank deficiency. Here we apply Ledoit-Wolf shrinkage (L2 regularization) to invert the high dimensional BOLD covariance matrix. We investigate the network organization and brain-state dependence of partial covariance-based functional connectivity. Although RSNs are conventionally defined in terms of shared variance, removal of widely shared variance, surprisingly, improved the separation of RSNs in a spring embedded graphical model. This result suggests that pair-wise unique shared variance plays a heretofore unrecognized role in RSN covariance organization. In addition, application of partial correlation to fMRI data acquired in the eyes open vs. eyes closed states revealed focal changes in uniquely shared variance between the thalamus and visual cortices. This result suggests that partial correlation of resting state BOLD time series reflects functional processes in addition to structural connectivity.
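A minimal sketch of the regularized-inversion step: scikit-learn's LedoitWolf estimator shrinks the sample covariance so that the rank-deficient BOLD covariance becomes invertible, after which partial correlations follow from the precision matrix. Dimensions and data are synthetic stand-ins.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(11)
bold = rng.normal(size=(150, 300))   # time points x regions (rank deficient)

lw = LedoitWolf().fit(bold)          # shrinkage makes the estimate invertible
P = lw.precision_                    # regularized precision matrix

# Partial correlation between regions i and j, given all other regions.
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
print(partial_corr[0, 1])
```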
Unleashing Empirical Equations with "Nonlinear Fitting" and "GUM Tree Calculator"
NASA Astrophysics Data System (ADS)
Lovell-Smith, J. W.; Saunders, P.; Feistel, R.
2017-10-01
Empirical equations having large numbers of fitted parameters, such as the international standard reference equations published by the International Association for the Properties of Water and Steam (IAPWS), which form the basis of the "Thermodynamic Equation of Seawater—2010" (TEOS-10), provide the means to calculate many quantities very accurately. The parameters of these equations are found by least-squares fitting to large bodies of measurement data. However, the usefulness of these equations is limited since uncertainties are not readily available for most of the quantities able to be calculated, the covariance of the measurement data is not considered, and further propagation of the uncertainty in the calculated result is restricted since the covariance of calculated quantities is unknown. In this paper, we present two tools developed at MSL that are particularly useful in unleashing the full power of such empirical equations. "Nonlinear Fitting" enables propagation of the covariance of the measurement data into the parameters using generalized least-squares methods. The parameter covariance then may be published along with the equations. Then, when using these large, complex equations, "GUM Tree Calculator" enables the simultaneous calculation of any derived quantity and its uncertainty, by automatic propagation of the parameter covariance into the calculated quantity. We demonstrate these tools in exploratory work to determine and propagate uncertainties associated with the IAPWS-95 parameters.
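The propagation that GUM Tree Calculator automates is, to first order, u^2(y) = J Sigma J^T, where J holds the sensitivities of the calculated quantity to the fitted parameters and Sigma is the parameter covariance. A minimal sketch with a toy two-parameter equation (the function, parameter values, and covariance are invented for illustration):

```python
import numpy as np

def propagate(f, p, Sigma, eps=1e-6):
    # GUM-style first-order propagation of parameter covariance Sigma
    # through y = f(p), using a central finite-difference Jacobian.
    p = np.asarray(p, dtype=float)
    J = np.empty_like(p)
    for i in range(len(p)):
        dp = np.zeros_like(p)
        dp[i] = eps
        J[i] = (f(p + dp) - f(p - dp)) / (2 * eps)
    return f(p), np.sqrt(J @ Sigma @ J)

# Toy "empirical equation" with fitted parameters a, b and their covariance.
f = lambda p: p[0] * np.exp(p[1] * 25.0)      # e.g. a property at 25 degC
Sigma = np.array([[1e-4, -2e-6], [-2e-6, 1e-7]])
y, u = propagate(f, [1.2, 0.01], Sigma)
print(f"y = {y:.4f} +/- {u:.4f}")
```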
Coil-to-coil physiological noise correlations and their impact on fMRI time-series SNR
Triantafyllou, C.; Polimeni, J. R.; Keil, B.; Wald, L. L.
2017-01-01
Purpose: Physiological nuisance fluctuations (“physiological noise”) are a major contribution to the time-series signal-to-noise ratio (tSNR) of functional imaging. While thermal noise correlations between array coil elements have a well-characterized effect on the image signal-to-noise ratio (SNR0), the element-to-element covariance matrix of the time-series fluctuations has not yet been analyzed. We examine this effect with a goal of ultimately improving the combination of multichannel array data. Theory and Methods: We extend the theoretical relationship between tSNR and SNR0 to include a time-series noise covariance matrix Ψt, distinct from the thermal noise covariance matrix Ψ0, and compare its structure to Ψ0 and the signal coupling matrix SSH formed from the signal intensity vectors S. Results: Inclusion of the measured time-series noise covariance matrix into the model relating tSNR and SNR0 improves the fit of experimental multichannel data and is shown to be distinct from Ψ0 or SSH. Conclusion: Time-series noise covariances in array coils are found to differ from Ψ0 and, more surprisingly, from the signal coupling matrix SSH. Correct characterization of the time-series noise has implications for the analysis of time-series data and for improving the coil element combination process. PMID:26756964
HSQC-1,n-ADEQUATE: a new approach to long-range 13C-13C correlation by covariance processing.
Martin, Gary E; Hilton, Bruce D; Willcott, M Robert; Blinov, Kirill A
2011-10-01
Long-range, two-dimensional heteronuclear shift correlation NMR methods play a pivotal role in the assembly of novel molecular structures. The well-established GHMBC method is a high-sensitivity mainstay technique, affording connectivity information via nJ(CH) coupling pathways. Unfortunately, there is no simple way of determining the value of n and hence no way of differentiating two-bond from three- and occasionally four-bond correlations. Three-bond correlations, however, generally predominate. Recent work has shown that the unsymmetrical indirect covariance or generalized indirect covariance processing of multiplicity edited GHSQC and 1,1-ADEQUATE spectra provides high-sensitivity access to a 13C-13C connectivity map in the form of an HSQC-1,1-ADEQUATE spectrum. Covariance processing of these data allows the 1,1-ADEQUATE connectivity information to be exploited with the inherent sensitivity of the GHSQC spectrum rather than the intrinsically lower sensitivity of the 1,1-ADEQUATE spectrum itself. Data acquisition times and/or sample size can be substantially reduced when covariance processing is to be employed. In an extension of that work, 1,n-ADEQUATE spectra can likewise be subjected to covariance processing to afford high-sensitivity access to the equivalent of 4J(CH) GHMBC connectivity information. The method is illustrated using strychnine as a model compound.
Estimating restricted mean treatment effects with stacked survival models
Wey, Andrew; Vock, David M.; Connett, John; Rudser, Kyle
2016-01-01
The difference in restricted mean survival times between two groups is a clinically relevant summary measure. With observational data, there may be imbalances in confounding variables between the two groups. One approach to account for such imbalances is estimating a covariate-adjusted restricted mean difference by modeling the covariate-adjusted survival distribution, and then marginalizing over the covariate distribution. Since the estimator for the restricted mean difference is defined by the estimator for the covariate-adjusted survival distribution, it is natural to expect that a better estimator of the covariate-adjusted survival distribution is associated with a better estimator of the restricted mean difference. We therefore propose estimating restricted mean differences with stacked survival models. Stacked survival models estimate a weighted average of several survival models by minimizing predicted error. By including a range of parametric, semi-parametric, and non-parametric models, stacked survival models can robustly estimate a covariate-adjusted survival distribution and, therefore, the restricted mean treatment effect in a wide range of scenarios. We demonstrate through a simulation study that better performance of the covariate-adjusted survival distribution often leads to better mean-squared error of the restricted mean difference although there are notable exceptions. In addition, we demonstrate that the proposed estimator can perform nearly as well as Cox regression when the proportional hazards assumption is satisfied and significantly better when proportional hazards is violated. Finally, the proposed estimator is illustrated with data from the United Network for Organ Sharing to evaluate post-lung transplant survival between large and small-volume centers. PMID:26934835
Least-Squares Data Adjustment with Rank-Deficient Data Covariance Matrices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, J.G.
2011-07-01
A derivation of the linear least-squares adjustment formulae is required that avoids the assumption that the covariance matrix of prior parameters can be inverted. Possible proofs are of several kinds, including: (i) extension of standard results for the linear regression formulae, and (ii) minimization by differentiation of a quadratic form of the deviations in parameters and responses. In this paper, the least-squares adjustment equations are derived in both these ways, while explicitly assuming that the covariance matrix of prior parameters is singular. It is proved that the solutions are unique and that, contrary to statements that have appeared in the literature, the least-squares adjustment problem is not ill-posed. In conclusion, the linear least-squares adjustment formula that has been used in the past remains valid when the covariance matrix of prior parameters is singular, and it provides a unique solution; no modification to the adjustment formulae and no regularization of the problem are required.
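The point can be made concrete: the standard adjustment update uses only the inverse of (A C A^T + V), never C^{-1}, so a singular prior covariance C causes no difficulty. A minimal numeric sketch with a deliberately rank-deficient C (all values invented):

```python
import numpy as np

# Prior parameters x0 with a singular covariance C (two perfectly
# correlated parameters), responses y = A x + noise with covariance V.
x0 = np.array([1.0, 1.0, 2.0])
C = np.array([[0.04, 0.04, 0.0],
              [0.04, 0.04, 0.0],    # rank-deficient: rows 1 and 2 identical
              [0.0,  0.0,  0.09]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
V = np.diag([0.01, 0.01])
y = np.array([2.3, 3.4])

# Least-squares adjustment: C^{-1} never appears, only (A C A^T + V)^{-1}.
K = C @ A.T @ np.linalg.inv(A @ C @ A.T + V)
x_adj = x0 + K @ (y - A @ x0)
C_adj = C - K @ A @ C
print(x_adj)
print(np.linalg.eigvalsh(C_adj))    # eigenvalues >= 0 up to rounding
```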
Statistics of the residual refraction errors in laser ranging data
NASA Technical Reports Server (NTRS)
Gardner, C. S.
1977-01-01
A theoretical model for the range error covariance was derived by assuming that the residual refraction errors are due entirely to errors in the meteorological data which are used to calculate the atmospheric correction. The properties of the covariance function are illustrated by evaluating the theoretical model for the special case of a dense network of weather stations uniformly distributed within a circle.
ERIC Educational Resources Information Center
Hattori, Masasi; Oaksford, Mike
2007-01-01
In this article, 41 models of covariation detection from 2 x 2 contingency tables were evaluated against past data in the literature and against data from new experiments. A new model was also included based on a limiting case of the normative phi-coefficient under an extreme rarity assumption, which has been shown to be an important factor in…
Towards a covariance matrix of CAB model parameters for H(H2O)
NASA Astrophysics Data System (ADS)
Scotta, Juan Pablo; Noguere, Gilles; Damian, José Ignacio Marquez
2017-09-01
Preliminary results on the uncertainties of the CAB model thermal scattering law for hydrogen in light water are presented. This was achieved through a coupling between the nuclear data code CONRAD and the molecular dynamics simulation code GROMACS. The Generalized Least Squares method was used to adjust the model parameters to evaluated data and to generate covariance matrices between the CAB model parameters.
Space-time models based on random fields with local interactions
NASA Astrophysics Data System (ADS)
Hristopulos, Dionissios T.; Tsantili, Ivi C.
2016-08-01
The analysis of space-time data from complex, real-life phenomena requires the use of flexible and physically motivated covariance functions. In most cases, it is not possible to explicitly solve the equations of motion for the fields or the respective covariance functions. In the statistical literature, covariance functions are often based on mathematical constructions. In this paper, we propose deriving space-time covariance functions by solving “effective equations of motion”, which can be used as statistical representations of systems with diffusive behavior. In particular, we propose to formulate space-time covariance functions based on an equilibrium effective Hamiltonian using the linear response theory. The effective space-time dynamics is then generated by a stochastic perturbation around the equilibrium point of the classical field Hamiltonian leading to an associated Langevin equation. We employ a Hamiltonian which extends the classical Gaussian field theory by including a curvature term and leads to a diffusive Langevin equation. Finally, we derive new forms of space-time covariance functions.
NASA Astrophysics Data System (ADS)
Zhaunerchyk, V.; Frasinski, L. J.; Eland, J. H. D.; Feifel, R.
2014-05-01
Multidimensional covariance analysis and its validity for correlating processes leading to multiple products are investigated from a theoretical point of view. The need to correct for false correlations induced by experimental parameters which fluctuate from shot to shot, such as the intensity of self-amplified spontaneous emission x-ray free-electron laser pulses, is emphasized. Threefold covariance analysis based on a simple extension of the two-variable formulation is shown to be valid for variables exhibiting Poisson statistics. In this case, false correlations arising from fluctuations in an unstable experimental parameter that scale linearly with the signals can be eliminated by threefold partial covariance analysis, as defined here. Fourfold covariance based on the same simple extension is found to be invalid in general. Where fluctuations in an unstable parameter induce nonlinear signal variations, a technique of contingent covariance analysis is proposed here to suppress false correlations. In this paper we also show a method to eliminate false correlations associated with fluctuations of several unstable experimental parameters.
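For two signals X and Y and a fluctuating parameter I (for example the pulse intensity), the two-variable partial covariance referred to above removes the I-induced term: pcov(X, Y) = cov(X, Y) - cov(X, I) cov(I, Y) / var(I). A minimal numeric check with synthetic shot-to-shot data:

```python
import numpy as np

rng = np.random.default_rng(12)
I = rng.gamma(5.0, 1.0, 20000)            # fluctuating pulse intensity
X = 2.0 * I + rng.normal(0, 1, 20000)     # two signals that both scale
Y = 3.0 * I + rng.normal(0, 1, 20000)     # linearly with the intensity

cov = lambda a, b: np.mean(a * b) - np.mean(a) * np.mean(b)

plain = cov(X, Y)                                         # dominated by I
partial = cov(X, Y) - cov(X, I) * cov(I, Y) / cov(I, I)   # false term removed
print(plain, partial)                                     # partial is near 0
```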
Using the Cross-Correlation Function to Evaluate the Quality of Eddy-Covariance Data
NASA Astrophysics Data System (ADS)
Qi, Yongfeng; Shang, Xiaodong; Chen, Guiying; Gao, Zhiqiu; Bi, Xueyan
2015-11-01
A cross-correlation test is proposed for evaluating the quality of 30-min eddy-covariance data. Cross-correlation as a function of time lag is computed for vertical velocity paired with temperature, humidity, and carbon dioxide concentration. High quality data have a dominant peak at zero time lag and approach zero within a time lag of 20 s. Poor quality data have erratic cross-correlation functions, which indicates that the eddy flux may no longer represent the energy and mass exchange between the atmospheric surface layer and the canopy; such data should be rejected in subsequent analyses. Eddy-covariance data over grassland in July 2004 are used to evaluate the proposed test. The results show that 17, 29, and 36% of the available data should be rejected because of poor quality measurements of sensible heat, latent heat, and CO2 fluxes, respectively. The rejected data mainly occurred on calm nights and at day/night transitions when the atmospheric surface layer became stable or neutrally stratified. We found no friction velocity (u_*) threshold below which all data should be rejected, a test that many other studies have implemented for rejecting questionable data. We instead found that some data with low u_* were reliable, whereas other data with higher u_* were not. The poor quality measurements collected under less than ideal conditions were replaced by using the mean diurnal variation gap-filling method. The correction for poor quality data shifted the daily average CO2 flux by +0.34 g C m^{-2} day^{-1}. After applying the quality-control test, the eddy CO2 fluxes did not display a clear dependence on u_*. The results suggest that the cross-correlation test is a potentially valuable step in evaluating the quality of eddy-covariance data.
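A sketch of the proposed diagnostic under stated assumptions (the 20 Hz sampling rate and simple mean removal are illustrative choices of mine, not taken from the paper):

```python
import numpy as np

def lagged_cross_correlation(w, c, fs=20.0, max_lag_s=30.0):
    """Cross-correlation of vertical velocity w and a scalar c
    (temperature, humidity, or CO2) as a function of time lag.
    Returns lags in seconds and the correlation at each lag.
    """
    w = w - w.mean()
    c = c - c.mean()
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    n = len(w)
    r = np.empty(lags.size)
    for k, lag in enumerate(lags):
        if lag >= 0:
            a, b = w[lag:], c[:n - lag]
        else:
            a, b = w[:n + lag], c[-lag:]
        r[k] = np.corrcoef(a, b)[0, 1]
    return lags / fs, r

# Quality check: a dominant peak at zero lag that decays within ~20 s
# indicates good data; an erratic function flags the half-hour block.
```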
Batson, Sarah; Sutton, Alex; Abrams, Keith
2016-01-01
Patients with atrial fibrillation are at a greater risk of stroke, and therefore the main goal of treatment is to prevent stroke from occurring. A number of different stroke prevention treatments are available, including warfarin and novel oral anticoagulants. Previous network meta-analyses of novel oral anticoagulants for stroke prevention in atrial fibrillation acknowledge the limitation of heterogeneity across the included trials but have not explored the impact of potentially important treatment modifying covariates. To explore potentially important treatment modifying covariates using network meta-regression analyses for stroke prevention in atrial fibrillation. We performed a network meta-analysis for the outcome of ischaemic stroke and conducted an exploratory regression analysis considering potentially important treatment modifying covariates. These covariates included the proportion of patients with a previous stroke, the proportion of males, mean age, the duration of study follow-up, and the patients' underlying risk of ischaemic stroke. None of the covariates explored affected treatment effects relative to placebo. Notably, the exploration of study follow-up as a covariate supported the assumption that differences in trial duration are unimportant in this indication, despite the variation across trials in the network. This study is limited by the quantity of data available. Further investigation is warranted and, as justifying further trials may be difficult, it would be desirable to obtain individual patient level data (IPD) to facilitate an effort to relate treatment effects to IPD covariates in order to investigate heterogeneity. Observational data could also be examined to establish whether there are potential trends elsewhere. The approach and methods presented have potentially wide application within any indication and highlight the potential benefit of extending decision problems to include additional comparators beyond those of primary interest, to allow for the exploration of heterogeneity.
Investigation of the Capability of Compact Polarimetric SAR Interferometry to Estimate Forest Height
NASA Astrophysics Data System (ADS)
Zhang, Hong; Xie, Lei; Wang, Chao; Chen, Jiehong
2013-08-01
The main objective of this paper is to investigate the capability of compact polarimetric SAR interferometry (C-PolInSAR) for forest height estimation. For this, the pseudo fully polarimetric interferometric (F-PolInSAR) covariance matrix is first reconstructed; then the three-stage inversion algorithm, the hybrid algorithm, and the MUSIC and Capon algorithms are applied to both the C-PolInSAR covariance matrix and the pseudo F-PolInSAR covariance matrix. The feasibility of forest height estimation is demonstrated using L-band data generated by the PolSARProSim simulator and X-band airborne data acquired by the East China Research Institute of Electronic Engineering, China Electronics Technology Group Corporation.
Communication: Three-fold covariance imaging of laser-induced Coulomb explosions
NASA Astrophysics Data System (ADS)
Pickering, James D.; Amini, Kasra; Brouard, Mark; Burt, Michael; Bush, Ian J.; Christensen, Lauge; Lauer, Alexandra; Nielsen, Jens H.; Slater, Craig S.; Stapelfeldt, Henrik
2016-04-01
We apply a three-fold covariance imaging method to analyse previously acquired data [C. S. Slater et al., Phys. Rev. A 89, 011401(R) (2014)] on the femtosecond laser-induced Coulomb explosion of spatially pre-aligned 3,5-dibromo-3',5'-difluoro-4'-cyanobiphenyl molecules. The data were acquired using the "Pixel Imaging Mass Spectrometry" camera. We show how three-fold covariance imaging of ionic photofragment recoil trajectories can be used to provide new information about the parent ion's molecular structure prior to its Coulomb explosion. In particular, we show how the analysis may be used to obtain information about molecular conformation and provide an alternative route for enantiomer determination.
Data Sharing and Scientific Impact in Eddy Covariance Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bond-Lamberty, B.
Do the benefits of data sharing outweigh its perceived costs? This is a critical question, and one with the potential to change culture and behavior. Dai et al. (2018) examine how data sharing is related to scientific impact in the field of eddy covariance (EC), and find that data sharers are disproportionately high-impact researchers, and vice versa; they also note strong regional differences in EC data sharing norms. The current policies and restrictions of EC journals and repositories are highly uneven. Incentivizing data sharing and enhancing computational reproducibility are critical next steps for EC, ecology, and science more broadly.
NASA Astrophysics Data System (ADS)
Soares, Edward J.; Gifford, Howard C.; Glick, Stephen J.
2003-05-01
We investigated the estimation of the ensemble channelized Hotelling observer (CHO) signal-to-noise ratio (SNR) for ordered-subset (OS) image reconstruction using noisy projection data. Previously, we computed the ensemble CHO SNR using a method for approximating the channelized covariance of OS reconstruction, which requires knowledge of the noise-free projection data. Here, we use a "plug-in" approach, in which noisy data are used in place of the noise-free data in the aforementioned channelized covariance approximation. Additionally, we evaluated the use of smoothing of the noisy projections before use in the covariance approximation. The task was detection of a 10% contrast Gaussian signal within a slice of the MCAT phantom. Simulated projections of the MCAT phantom were scaled and Poisson noise was added to create 100 noisy signal-absent data sets. Simulated projections of the scaled signal were then added to the noisy background projections to create 100 noisy signal-present data sets. These noisy data sets were then used to generate 100 estimates of the ensemble CHO SNR for reconstructions at various iterations. For comparison purposes, the same calculation was repeated with the noise-free data. The results, reported as plots of the average CHO SNR generated in this fashion, along with 95% confidence intervals, demonstrate that this approach works very well and would allow optimization of imaging systems and reconstruction methods using a more accurate object model (i.e., real patient data).
Plis, Sergey M; George, J S; Jun, S C; Paré-Blagoev, J; Ranken, D M; Wood, C C; Schmidt, D M
2007-01-01
We propose a new model to approximate spatiotemporal noise covariance for use in neural electromagnetic source analysis, which better captures temporal variability in background activity. As with other existing formalisms, our model employs a Kronecker product of matrices representing temporal and spatial covariance. In our model, spatial components are allowed to have differing temporal covariances. Variability is represented as a series of Kronecker products of spatial component covariances and corresponding temporal covariances. Unlike previous attempts to model covariance through a sum of Kronecker products, our model is designed to have a computationally manageable inverse. Despite increased descriptive power, inversion of the model is fast, making it useful in source analysis. We have explored two versions of the model. One is estimated based on the assumption that spatial components of background noise have uncorrelated time courses. Another version, which gives a closer approximation, is based on the assumption that the time courses are statistically independent. The accuracy of the structural approximation is compared to an existing model, based on a single Kronecker product, using both the Frobenius norm of the difference between the spatiotemporal sample covariance and the model covariance, and scatter plots. The performance of our model and previous models is compared in source analysis of a large number of single dipole problems with simulated time courses and with background from authentic magnetoencephalography data.
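To illustrate the covariance structure being described, a toy sketch of a sum of Kronecker products of spatial component covariances with their temporal covariances (dimensions and components are hypothetical; the paper's contribution is a structured form whose inverse is cheap, whereas the naive construction below is only meant to show the shape of the model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_times, n_comp = 32, 50, 3

# C = sum_k kron(spatial component covariance, its temporal covariance)
C = np.zeros((n_sensors * n_times, n_sensors * n_times))
for _ in range(n_comp):
    a = rng.standard_normal(n_sensors)        # spatial component pattern
    S_k = np.outer(a, a)                      # rank-1 spatial covariance
    L = rng.standard_normal((n_times, n_times))
    T_k = L @ L.T                             # temporal covariance of component
    C += np.kron(S_k, T_k)                    # component contribution
```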
Model Comparison of Nonlinear Structural Equation Models with Fixed Covariates.
ERIC Educational Resources Information Center
Lee, Sik-Yum; Song, Xin-Yuan
2003-01-01
Proposed a new nonlinear structural equation model with fixed covariates to deal with some complicated substantive theory and developed a Bayesian path sampling procedure for model comparison. Illustrated the approach with an illustrative example using data from an international study. (SLD)
Population Pharmacokinetics of Intranasal Scopolamine
NASA Technical Reports Server (NTRS)
Wu, L.; Chow, D. S. L.; Putcha, L.
2013-01-01
Introduction: An intranasal gel dosage formulation of scopolamine (INSCOP) was developed for the treatment of Space Motion Sickness (SMS). The bioavailability and pharmacokinetics (PK) were evaluated using data collected in Phase II IND protocols. We reported earlier statistically significant gender differences in PK parameters of INSCOP at a dose level of 0.4 mg. To identify covariates that influence PK parameters of INSCOP, we examined population covariates of the INSCOP PK model for the 0.4 mg dose. Methods: Plasma scopolamine concentration versus time data were collected from 20 normal healthy human subjects (11 male/9 female) after a 0.4 mg dose. Phoenix NLME was employed for PK analysis of these data using gender, body weight, and age as covariates for model selection. Model selection was based on a likelihood ratio test on the difference of criteria (-2LL). Statistical significance for base model building and individual covariate analysis was set at P less than 0.05 (delta(-2LL) = 3.84). Results: A one-compartment pharmacokinetic model with first-order elimination best described the INSCOP concentration-time profiles. Inclusion of gender, body weight, and age as individual covariates significantly reduced -2LL by more than the cut-off value of 3.84 (P less than 0.05) when tested against the base model. After forward stepwise selection and backward elimination, gender was retained in the final model, in which it had a significant influence on the absorption rate constant (ka) and the volume of distribution (V) of INSCOP. Conclusion: A population pharmacokinetic model for INSCOP has been identified, with gender as a significant contributing covariate in the final model. The volume of distribution and ka were significantly higher in males than in females, which confirms gender-dependent pharmacokinetics of scopolamine after administration of a 0.4 mg dose.
Using Covariance Matrix for Change Detection of Polarimetric SAR Data
NASA Astrophysics Data System (ADS)
Esmaeilzade, M.; Jahani, F.; Amini, J.
2017-09-01
Nowadays change detection plays an important role in civil and military fields. Synthetic Aperture Radar (SAR) images, owing to their independence from atmospheric conditions and cloud cover, have attracted much attention in change detection applications. When SAR data are used, one appropriate way to represent the backscattered signal is the covariance matrix, which follows the Wishart distribution. Based on this distribution, a statistical test for equality of two complex variance-covariance matrices can be used. In this study, two full-polarization L-band data sets from UAVSAR are used for change detection in agricultural fields and urban areas in the United States; the first image is from 2014 and the second from 2017. To investigate the effect of polarization on the detected changes, full-polarization and dual-polarization data were used and the results were compared. According to the results, full polarization reveals more changes than dual polarization.
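For context, a sketch of the complex-Wishart likelihood-ratio statistic commonly used for such equality tests (following Conradsen et al.; treating the inputs as n-look and m-look averaged covariance matrices is my assumption and not necessarily the exact procedure of this paper):

```python
import numpy as np

def wishart_lnQ(X, Y, n, m):
    """ln of the likelihood-ratio statistic for equality of two complex
    variance-covariance matrices, where X and Y are the n-look and
    m-look sample covariance matrices (p x p, Hermitian).
    A small lnQ (large -2*lnQ) indicates change between acquisitions.
    """
    p = X.shape[0]
    det = lambda M: np.linalg.det(M).real
    return (p * ((n + m) * np.log(n + m) - n * np.log(n) - m * np.log(m))
            + n * np.log(det(n * X)) + m * np.log(det(m * Y))
            - (n + m) * np.log(det(n * X + m * Y)))
```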
Cheng, Dunlei; Branscum, Adam J; Stamey, James D
2010-07-01
To quantify the impact of ignoring misclassification of a response variable and measurement error in a covariate on statistical power, and to develop software for sample size and power analysis that accounts for these flaws in epidemiologic data. A Monte Carlo simulation-based procedure is developed to illustrate the differences in design requirements and inferences between analytic methods that properly account for misclassification and measurement error and those that do not, in regression models for cross-sectional and cohort data. We found that failure to account for these flaws in epidemiologic data can lead to a substantial reduction in statistical power, over 25% in some cases. The proposed method substantially reduced bias, by up to a ten-fold margin, compared to naive estimates obtained by ignoring misclassification and mismeasurement. We recommend as routine practice that researchers account for errors in measurement of both response and covariate data when determining sample size, performing power calculations, or analyzing data from epidemiological studies. 2010 Elsevier Inc. All rights reserved.
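A minimal sketch of the kind of simulation-based power calculation described (the sensitivity/specificity values, effect size, and model are hypothetical illustrations; the authors' actual software is not reproduced here):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def naive_power(n, beta=0.5, sens=0.9, spec=0.9, n_sim=500, alpha=0.05):
    """Power of a naive logistic analysis when the binary response is
    misclassified with the given sensitivity and specificity."""
    hits = 0
    for _ in range(n_sim):
        x = rng.standard_normal(n)
        p = 1 / (1 + np.exp(-(-0.5 + beta * x)))
        y = rng.binomial(1, p)                      # true response
        flip = np.where(y == 1,
                        rng.binomial(1, 1 - sens, n),   # false negatives
                        rng.binomial(1, 1 - spec, n))   # false positives
        y_obs = np.abs(y - flip)                    # observed, error-prone response
        fit = sm.Logit(y_obs, sm.add_constant(x)).fit(disp=0)
        hits += fit.pvalues[1] < alpha
    return hits / n_sim
```

Comparing `naive_power` against the same simulation without flips quantifies the power loss due to misclassification.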
Venter, Anre; Maxwell, Scott E; Bolig, Erika
2002-06-01
Adding a pretest as a covariate to a randomized posttest-only design increases statistical power, as does the addition of intermediate time points to a randomized pretest-posttest design. Although typically 5 waves of data are required in this instance to produce meaningful gains in power, a 3-wave intensive design allows the evaluation of the straight-line growth model and may reduce the effect of missing data. The authors identify the statistically most powerful method of data analysis in the 3-wave intensive design. If straight-line growth is assumed, the pretest-posttest slope must assume fairly extreme values for the intermediate time point to increase power beyond the standard analysis of covariance on the posttest with the pretest as covariate, ignoring the intermediate time point.
Sternberg, Maya R; Schleicher, Rosemary L; Pfeiffer, Christine M
2013-06-01
The collection of articles in this supplement issue provides insight into the association of various covariates with concentrations of biochemical indicators of diet and nutrition (biomarkers), beyond age, race, and sex, using linear regression. We studied 10 specific sociodemographic and lifestyle covariates in combination with 29 biomarkers from NHANES 2003-2006 for persons aged ≥ 20 y. The covariates were organized into 2 sets or "chunks": sociodemographic (age, sex, race-ethnicity, education, and income) and lifestyle (dietary supplement use, smoking, alcohol consumption, BMI, and physical activity) and fit in hierarchical fashion by using each category or set of related variables to determine how covariates, jointly, are related to biomarker concentrations. In contrast to many regression modeling applications, all variables were retained in a full regression model regardless of significance to preserve the interpretation of the statistical properties of β coefficients, P values, and CIs and to keep the interpretation consistent across a set of biomarkers. The variables were preselected before data analysis, and the data analysis plan was designed at the outset to minimize the reporting of false-positive findings by limiting the amount of preliminary hypothesis testing. Although we generally found that demographic differences seen in biomarkers were over- or underestimated when ignoring other key covariates, the demographic differences generally remained significant after adjusting for sociodemographic and lifestyle variables. These articles are intended to provide a foundation to researchers to help them generate hypotheses for future studies or data analyses and/or develop predictive regression models using the wealth of NHANES data.
Coincidence and covariance data acquisition in photoelectron and -ion spectroscopy. I. Formal theory
NASA Astrophysics Data System (ADS)
Mikosch, Jochen; Patchkovskii, Serguei
2013-10-01
We derive a formal theory of noisy Poisson processes with multiple outcomes. We obtain simple, compact expressions for the probability distribution function of arbitrarily complex composite events and its moments. We illustrate the utility of the theory by analyzing properties of coincidence and covariance photoelectron-photoion detection involving single-ionization events. The results and techniques introduced in this work are directly applicable to more general coincidence and covariance experiments, including multiple ionization and multiple-ion fragmentation pathways.
Revised error propagation of 40Ar/39Ar data, including covariances
NASA Astrophysics Data System (ADS)
Vermeesch, Pieter
2015-12-01
The main advantage of the 40Ar/39Ar method over conventional K-Ar dating is that it does not depend on any absolute abundance or concentration measurements, but only uses the relative ratios between five isotopes of the same element (argon), which can be measured with great precision on a noble gas mass spectrometer. The relative abundances of the argon isotopes are subject to a constant sum constraint, which imposes a covariant structure on the data: the relative amount of any of the five isotopes can always be obtained from that of the other four. Thus, the 40Ar/39Ar method is a classic example of a 'compositional data problem'. In addition to the constant sum constraint, covariances are introduced by a host of other processes, including data acquisition, blank correction, detector calibration, mass fractionation, decay correction, interference correction, atmospheric argon correction, interpolation of the irradiation parameter, and age calculation. The myriad of correlated errors arising during the data reduction is best handled by casting the 40Ar/39Ar data reduction protocol in matrix form. The completely revised workflow presented in this paper is implemented in a new software platform, Ar-Ar_Redux, which takes raw mass spectrometer data as input and generates accurate 40Ar/39Ar ages and their (co-)variances as output. Ar-Ar_Redux accounts for all sources of analytical uncertainty, including those associated with decay constants and the air ratio. Knowing the covariance matrix of the ages removes the need to consider 'internal' and 'external' uncertainties separately when calculating (weighted) mean ages. Ar-Ar_Redux is built on the same principles as its sibling program in the U-Pb community (U-Pb_Redux), thus improving the intercomparability of the two methods, with tangible benefits to the accuracy of the geologic time scale. The program can be downloaded free of charge from http://redux.london-geochron.com.
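To illustrate the matrix form of error propagation that such a workflow relies on, a generic first-order sketch (the finite-difference Jacobian and the example are placeholders of mine, not Ar-Ar_Redux internals):

```python
import numpy as np

def propagate_covariance(f, x, Vx, eps=1e-8):
    """First-order covariance propagation V_f = J Vx J^T, with the
    Jacobian J of f estimated by finite differences at x."""
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps * max(1.0, abs(x[j]))
        J[:, j] = (np.atleast_1d(f(x + dx)) - f0) / dx[j]
    return J @ Vx @ J.T

# Example: covariance of isotope ratios r = x[:-1] / x[-1] computed
# from correlated isotope abundances x with covariance Vx.
```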
Tangen, C M; Koch, G G
1999-03-01
In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
Treatment of Nuclear Data Covariance Information in Sample Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, Laura Painton; Adams, Brian M.; Wieselquist, William
This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on developing a sampling capability that can handle the challenges of generating samples from nuclear cross-section data. The covariance information between energy groups tends to be very ill-conditioned and thus poses a problem for traditional methods of generating correlated samples. This report outlines a method that addresses sample generation from cross-section covariance matrices.
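A sketch of one common way to handle such ill-conditioning when generating correlated samples (eigenvalue clipping; this is a generic device under my own assumptions, not necessarily the report's method):

```python
import numpy as np

def sample_from_covariance(mean, cov, n_samples, rng, tol=1e-12):
    """Draw correlated samples when Cholesky factorization fails for an
    ill-conditioned (nearly singular) covariance matrix.
    Negative or tiny eigenvalues caused by round-off are clipped to zero."""
    vals, vecs = np.linalg.eigh(cov)          # symmetric eigendecomposition
    vals = np.clip(vals, 0.0, None)
    vals[vals < tol * vals.max()] = 0.0
    A = vecs * np.sqrt(vals)                  # factor with cov ~= A A^T
    z = rng.standard_normal((n_samples, len(mean)))
    return mean + z @ A.T

rng = np.random.default_rng(0)
draws = sample_from_covariance(np.zeros(5), np.ones((5, 5)) + 1e-9 * np.eye(5),
                               1000, rng)     # nearly rank-1 covariance
```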
Doubly robust matching estimators for high dimensional confounding adjustment.
Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan; Agniel, Denis
2018-05-11
Valid estimation of treatment effects from observational data requires proper control of confounding. If the number of covariates is large relative to the number of observations, then controlling for all available covariates is infeasible. In cases where a sparsity condition holds, variable selection or penalization can reduce the dimension of the covariate space in a manner that allows for valid estimation of treatment effects. In this article, we propose matching on both the estimated propensity score and the estimated prognostic scores when the number of covariates is large relative to the number of observations. We derive asymptotic results for the matching estimator and show that it is doubly robust in the sense that only one of the two score models need be correct to obtain a consistent estimator. We show via simulation its effectiveness in controlling for confounding and highlight its potential to address nonlinear confounding. Finally, we apply the proposed procedure to analyze the effect of gender on prescription opioid use using insurance claims data. © 2018, The International Biometric Society.
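A sketch of the two-score matching idea under stated assumptions (plain logistic/linear working models and 1-nearest-neighbor matching are illustrative choices of mine; the paper additionally uses penalization for high-dimensional covariates):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def two_score_match(X, t, y):
    """Match each treated unit to the control nearest in the
    (propensity score, prognostic score) plane, then estimate the
    average treatment effect on the treated. Scores could be
    standardized before computing distances."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    prog = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)
    scores = np.column_stack([ps, prog])
    treated, controls = np.where(t == 1)[0], np.where(t == 0)[0]
    effects = []
    for i in treated:
        d = np.linalg.norm(scores[controls] - scores[i], axis=1)
        effects.append(y[i] - y[controls[np.argmin(d)]])
    return float(np.mean(effects))
```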
Multiple imputation of missing covariates for the Cox proportional hazards cure model
Beesley, Lauren J; Bartlett, Jonathan W; Wolf, Gregory T; Taylor, Jeremy M G
2016-01-01
We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects “cured,” and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification (FCS). We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. PMID:27439726
A New Approach to Extract Forest Water Use Efficiency from Eddy Covariance Data
NASA Astrophysics Data System (ADS)
Scanlon, T. M.; Sulman, B. N.
2016-12-01
Determination of forest water use efficiency (WUE) from eddy covariance data typically involves the following steps: (a) estimating gross primary productivity (GPP) from direct measurements of net ecosystem exchange (NEE) by extrapolating nighttime ecosystem respiration (ER) to daytime conditions, and (b) assuming direct evaporation (E) is minimal several days after rainfall, meaning that direct measurements of evapotranspiration (ET) are identical to transpiration (T). Both of these steps could lead to errors in the estimation of forest WUE. Here, we present a theoretical approach for estimating WUE through the analysis of standard eddy covariance data, which circumvents these steps. Only five statistics are needed from the high-frequency time series to extract WUE: CO2 flux, water vapor flux, standard deviation in CO2 concentration, standard deviation in water vapor concentration, and the correlation coefficient between CO2 and water vapor concentration for each half-hour period. The approach is based on the assumption that stomatal fluxes (i.e. photosynthesis and transpiration) lead to perfectly negative correlations and non-stomatal fluxes (i.e. ecosystem respiration and direct evaporation) lead to perfectly positive correlations within the CO2 and water vapor high frequency time series measured above forest canopies. A mathematical framework is presented, followed by a proof of concept using eddy covariance data and leaf-level measurements of WUE.
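A sketch of the five per-half-hour statistics the method consumes (the flux estimates via w-covariances are the standard eddy-covariance forms; the partitioning step that turns these into WUE is not reproduced here):

```python
import numpy as np

def halfhour_statistics(w, c, q):
    """Five statistics from high-frequency (e.g. 10-20 Hz) series of
    vertical wind w, CO2 concentration c, and water vapor q:
    CO2 flux, water vapor flux, standard deviations of c and q, and the
    c-q correlation coefficient used to separate stomatal (perfectly
    negative) from non-stomatal (perfectly positive) exchange."""
    wp, cp, qp = w - w.mean(), c - c.mean(), q - q.mean()
    return {
        "F_c": np.mean(wp * cp),              # CO2 flux (covariance w'c')
        "F_q": np.mean(wp * qp),              # water vapor flux (w'q')
        "sigma_c": cp.std(),
        "sigma_q": qp.std(),
        "rho_cq": np.corrcoef(cp, qp)[0, 1],
    }
```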
Reconstructing signals from noisy data with unknown signal and noise covariance.
Oppermann, Niels; Robbers, Georg; Ensslin, Torsten A
2011-10-01
We derive a method to reconstruct Gaussian signals from linear measurements with Gaussian noise. This new algorithm is intended for applications in astrophysics and other sciences. The starting point of our considerations is the principle of minimum Gibbs free energy, which was previously used to derive a signal reconstruction algorithm handling uncertainties in the signal covariance. We extend this algorithm to simultaneously uncertain noise and signal covariances using the same principles in the derivation. The resulting equations are general enough to be applied in many different contexts. We demonstrate the performance of the algorithm by applying it to specific example situations and compare it to algorithms not allowing for uncertainties in the noise covariance. The results show that the method we suggest performs very well under a variety of circumstances and is indeed qualitatively superior to the other methods in cases where uncertainty in the noise covariance is present.
NASA Technical Reports Server (NTRS)
Tangborn, Andrew; Auger, Ludovic
2003-01-01
A suboptimal Kalman filter system which evolves error covariances in terms of a truncated set of wavelet coefficients has been developed for the assimilation of chemical tracer observations of CH4. This scheme projects the discretized covariance propagation equations and covariance matrix onto an orthogonal set of compactly supported wavelets. The wavelet representation is localized in both location and scale, which allows for efficient representation of the inherently anisotropic structure of the error covariances. The truncation is carried out in such a way that the resolution of the error covariance is reduced only in the zonal direction, where gradients are smaller. Assimilation experiments lasting 24 days and using different degrees of truncation were carried out. These reduced the covariance size by 90, 97, and 99% and the computational cost of covariance propagation by 80, 93, and 96%, respectively. The difference in both the error covariance and the tracer field between the truncated and full systems over this period was found not to grow in the first case, and to grow relatively slowly in the latter two cases. The largest errors in the tracer fields occurred in regions of largest zonal gradients in the constituent field. These results indicate that propagation of error covariances for a global two-dimensional data assimilation system is currently feasible. Recommendations for further reduction in computational cost are made with the goal of extending this technique to three-dimensional global assimilation systems.
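To illustrate the compression idea, a toy sketch using an orthonormal Haar transform on a one-dimensional grid (the operational scheme uses other compactly supported wavelets and truncates only zonally; this simplification is mine):

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet matrix for n a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])              # averaging rows
    bot = np.kron(np.eye(n // 2), [1.0, -1.0])  # detail rows
    return np.vstack([top, bot]) / np.sqrt(2.0)

def truncate_covariance(P, keep_frac=0.1):
    """Project covariance P onto the wavelet basis, keep only the
    largest coefficients, and reconstruct the compressed covariance."""
    W = haar_matrix(P.shape[0])
    C = W @ P @ W.T                            # covariance in wavelet space
    thresh = np.quantile(np.abs(C), 1.0 - keep_frac)
    C[np.abs(C) < thresh] = 0.0
    return W.T @ C @ W
```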
Non-linear matter power spectrum covariance matrix errors and cosmological parameter uncertainties
NASA Astrophysics Data System (ADS)
Blot, L.; Corasaniti, P. S.; Amendola, L.; Kitching, T. D.
2016-06-01
The covariance of the matter power spectrum is a key element of the analysis of galaxy clustering data. Independent realizations of observational measurements can be used to sample the covariance; nevertheless, statistical sampling errors will propagate into the cosmological parameter inference, potentially limiting the capabilities of the upcoming generation of galaxy surveys. The impact of these errors as a function of the number of realizations has been previously evaluated for Gaussian distributed data. However, non-linearities in the late-time clustering of matter cause departures from Gaussian statistics. Here, we address the impact of non-Gaussian errors on the sample covariance and precision matrix errors using a large ensemble of N-body simulations. In the range of modes where finite volume effects are negligible (0.1 ≲ k [h Mpc-1] ≲ 1.2), we find deviations of the variance of the sample covariance with respect to Gaussian predictions above ˜10 per cent at k > 0.3 h Mpc-1. Over the entire range these reduce to about ˜5 per cent for the precision matrix. Finally, we perform a Fisher analysis to estimate the effect of covariance errors on the cosmological parameter constraints. In particular, assuming Euclid-like survey characteristics we find that a number of independent realizations larger than 5000 is necessary to reduce the contribution of sampling errors to the cosmological parameter uncertainties to the subpercent level. We also show that restricting the analysis to large scales k ≲ 0.2 h Mpc-1 results in a considerable loss in constraining power, while using the linear covariance to include smaller scales leads to an underestimation of the errors on the cosmological parameters.
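As a reference point for this sampling-error discussion: under Gaussian statistics, the standard de-biased precision-matrix estimate carries the Hartlap et al. (2007) factor, sketched below (the non-Gaussian corrections quantified in the paper go beyond this):

```python
import numpy as np

def precision_matrix(samples):
    """Precision matrix from n_s independent realizations of an
    n_b-dimensional data vector, de-biased (under Gaussian statistics)
    with the Hartlap factor (n_s - n_b - 2) / (n_s - 1)."""
    n_s, n_b = samples.shape
    cov = np.cov(samples, rowvar=False)
    hartlap = (n_s - n_b - 2) / (n_s - 1)     # < 1; requires n_s > n_b + 2
    return hartlap * np.linalg.inv(cov)
```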
Leffondré, Karen; Abrahamowicz, Michal; Siemiatycki, Jack
2003-12-30
Case-control studies are typically analysed using the conventional logistic model, which does not directly account for changes in covariate values over time. Yet, many exposures may vary over time. The most natural alternative for handling such exposures would be to use the Cox model with time-dependent covariates. However, its application to case-control data opens the question of how to manipulate the risk sets. Through a simulation study, we investigate how the accuracy of the estimates of Cox's model depends on the operational definition of risk sets and/or on some aspects of the time-varying exposure. We also assess the estimates obtained from conventional logistic regression. The lifetime experience of a hypothetical population is first generated, and a matched case-control study is then simulated from this population. We control the frequency, the age at initiation, and the total duration of exposure, as well as the strengths of their effects. All models considered include a fixed-in-time covariate and one or two time-dependent covariate(s): the indicator of current exposure and/or the exposure duration. Simulation results show that none of the models always performs well. The discrepancies between the odds ratios yielded by logistic regression and the 'true' hazard ratio depend on both the type of the covariate and the strength of its effect. In addition, it seems that logistic regression has difficulty separating the effects of inter-correlated time-dependent covariates. By contrast, each of the two versions of Cox's model systematically induces either a serious under-estimation or a moderate over-estimation bias. The magnitude of the latter bias is proportional to the true effect, suggesting that an improved manipulation of the risk sets may eliminate, or at least reduce, the bias. Copyright 2003 John Wiley & Sons, Ltd.
ORNL Resolved Resonance Covariance Generation for ENDF/B-VII.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leal, Luiz C.; Guber, Klaus H.; Wiarda, Dorothea
2012-12-01
Resonance-parameter covariance matrix (RPCM) evaluations in the resolved resonance region were done at the Oak Ridge National Laboratory (ORNL) for the chromium isotopes, titanium isotopes, 19F, 58Ni, 60Ni, 35Cl, 37Cl, 39K, 41K, 55Mn, 233U, 235U, 238U, and 239Pu using the computer code SAMMY. The retroactive approach of the code SAMMY was used to generate the RPCMs for 233U. For 235U, the approach used for covariance generation was similar to the retroactive approach, with the distinction that real experimental data were used as opposed to data generated from the resonance parameters. RPCMs for 238U and 239Pu were generated together with the resonance parameter evaluations. The RPCMs were then converted into the ENDF format using the FILE32 representation. Alternatively, for computer storage reasons, FILE32 was converted into the FILE33 cross-section covariance matrix (CSCM). Both representations were processed using the computer code PUFF-IV. This paper describes the procedures used to generate the RPCM and CSCM in the resonance region for ENDF/B-VII.1. The impact of data uncertainty on nuclear reactor benchmark calculations is also presented.
Do current cosmological observations rule out all covariant Galileons?
NASA Astrophysics Data System (ADS)
Peirone, Simone; Frusciante, Noemi; Hu, Bin; Raveri, Marco; Silvestri, Alessandra
2018-03-01
We revisit the cosmology of covariant Galileon gravity in view of the most recent cosmological data sets, including weak lensing. As a higher derivative theory, covariant Galileon models do not have a ΛCDM limit and predict a very different structure formation pattern compared with the standard ΛCDM scenario. Previous cosmological analyses suggest that this model is marginally disfavored, yet cannot be completely ruled out. In this work we use a more recent and extended combination of data, and we allow for more freedom in the cosmology by including a massive neutrino sector with three different mass hierarchies. We use the Planck measurements of cosmic microwave background temperature and polarization; baryonic acoustic oscillation measurements by BOSS DR12; local measurements of H0; the joint light-curve analysis supernovae sample; and, for the first time, weak gravitational lensing from the KiDS Collaboration. We find that, in order to provide a reasonable fit, a nonzero neutrino mass is indeed necessary, but we do not report any sizable difference among the three neutrino hierarchies. Finally, the comparison of the Bayesian evidence to that of ΛCDM shows that in all the cases considered, covariant Galileon models are statistically ruled out by cosmological data.
Defining habitat covariates in camera-trap based occupancy studies
Niedballa, Jürgen; Sollmann, Rahel; Mohamed, Azlan bin; Bender, Johannes; Wilting, Andreas
2015-01-01
In species-habitat association studies, both the type and spatial scale of habitat covariates need to match the ecology of the focal species. We assessed the potential of high-resolution satellite imagery for generating habitat covariates using camera-trapping data from Sabah, Malaysian Borneo, within an occupancy framework. We tested the predictive power of covariates generated from satellite imagery at different resolutions and extents (focal patch sizes, 10–500 m around sample points) on estimates of occupancy patterns of six small to medium sized mammal species/species groups. High-resolution land cover information had considerably more model support for small, patchily distributed habitat features, whereas it had no advantage for large, homogeneous habitat features. A comparison of different focal patch sizes including remote sensing data and an in-situ measure showed that patches with a 50-m radius had most support for the target species. Thus, high-resolution satellite imagery proved to be particularly useful in heterogeneous landscapes, and can be used as a surrogate for certain in-situ measures, reducing field effort in logistically challenging environments. Additionally, remote sensed data provide more flexibility in defining appropriate spatial scales, which we show to impact estimates of wildlife-habitat associations. PMID:26596779
NASA Technical Reports Server (NTRS)
Murray, C. W., Jr.; Mueller, J. L.; Zwally, H. J.
1984-01-01
A field of measured anomalies of some physical variable relative to their time averages is partitioned in either the space domain or the time domain. Eigenvectors and corresponding principal components of the smaller-dimensioned covariance matrices associated with the partitioned data sets are calculated independently, then joined to approximate the eigenstructure of the larger covariance matrix associated with the unpartitioned data set. The accuracy of the approximation (fraction of the total variance in the field) and the magnitudes of the largest eigenvalues from the partitioned covariance matrices together determine the number of local EOFs and principal components to be joined at any particular level of approximation. The space-time distribution of Nimbus-5 ESMR sea ice measurements is analyzed.
An alternative covariance estimator to investigate genetic heterogeneity in populations
USDA-ARS?s Scientific Manuscript database
Genomic predictions and GWAS have used mixed models for identification of associations and trait predictions. In both cases, the covariance between individuals for performance is estimated using molecular markers. Mixed model properties indicate that the use of the data for prediction is optimal if ...
WAIS-IV subtest covariance structure: conceptual and statistical considerations.
Ward, L Charles; Bergman, Maria A; Hebert, Katina R
2012-06-01
D. Wechsler (2008b) reported confirmatory factor analyses (CFAs) with standardization data (ages 16-69 years) for 10 core and 5 supplemental subtests from the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV). Analyses of the 15 subtests supported 4 hypothesized oblique factors (Verbal Comprehension, Working Memory, Perceptual Reasoning, and Processing Speed) but also revealed unexplained covariance between Block Design and Visual Puzzles (Perceptual Reasoning subtests). That covariance was not included in the final models. Instead, a path was added from Working Memory to Figure Weights (Perceptual Reasoning subtest) to improve fit and achieve a desired factor pattern. The present research with the same data (N = 1,800) showed that the path from Working Memory to Figure Weights increases the association between Working Memory and Matrix Reasoning. Specifying both paths improves model fit and largely eliminates unexplained covariance between Block Design and Visual Puzzles but with the undesirable consequence that Figure Weights and Matrix Reasoning are equally determined by Perceptual Reasoning and Working Memory. An alternative 4-factor model was proposed that explained theory-implied covariance between Block Design and Visual Puzzles and between Arithmetic and Figure Weights while maintaining compatibility with WAIS-IV Index structure. The proposed model compared favorably with a 5-factor model based on Cattell-Horn-Carroll theory. The present findings emphasize that covariance model comparisons should involve considerations of conceptual coherence and theoretical adherence in addition to statistical fit. (c) 2012 APA, all rights reserved
Triantafyllou, Christina; Polimeni, Jonathan R; Keil, Boris; Wald, Lawrence L
2016-12-01
Physiological nuisance fluctuations ("physiological noise") are a major contribution to the time-series signal-to-noise ratio (tSNR) of functional imaging. While thermal noise correlations between array coil elements have a well-characterized effect on the image signal-to-noise ratio (SNR0), the element-to-element covariance matrix of the time-series fluctuations has not yet been analyzed. We examine this effect with a goal of ultimately improving the combination of multichannel array data. We extend the theoretical relationship between tSNR and SNR0 to include a time-series noise covariance matrix Ψt, distinct from the thermal noise covariance matrix Ψ0, and compare its structure to Ψ0 and the signal coupling matrix SS^H formed from the signal intensity vectors S. Inclusion of the measured time-series noise covariance matrix into the model relating tSNR and SNR0 improves the fit of experimental multichannel data and is shown to be distinct from Ψ0 or SS^H. Time-series noise covariances in array coils are found to differ from Ψ0 and, more surprisingly, from the signal coupling matrix SS^H. Correct characterization of the time-series noise has implications for the analysis of time-series data and for improving the coil element combination process. Magn Reson Med 76:1708-1719, 2016. © 2016 International Society for Magnetic Resonance in Medicine.
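For context, a sketch of the standard covariance-weighted (Roemer-style) channel combination such a characterization feeds into, with whichever covariance one adopts (thermal Ψ0 or time-series Ψt); the function and variable names are illustrative:

```python
import numpy as np

def optimal_combine(data, s, psi):
    """SNR-optimal linear combination of multichannel data.

    data : (n_coils, n_samples) complex channel signals
    s    : (n_coils,) coil sensitivity (signal intensity) vector
    psi  : (n_coils, n_coils) noise covariance matrix

    Weights w ~ psi^-1 s maximize SNR; substituting the time-series
    covariance for the thermal one is the change suggested by the
    analysis above.
    """
    w = np.linalg.solve(psi, s)
    return (w.conj() @ data) / (w.conj() @ s)
```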
Bansal, Ravi; Hao, Xuejun; Peterson, Bradley S
2015-05-01
We hypothesize that coordinated functional activity within discrete neural circuits induces morphological organization and plasticity within those circuits. Identifying regions of morphological covariation that are independent of morphological covariation in other regions may therefore allow us to identify discrete neural systems within the brain. Comparing the magnitude of these variations in individuals who have psychiatric disorders with the magnitude of variations in healthy controls may allow us to identify aberrant neural pathways in psychiatric illnesses. We measured surface morphological features by applying nonlinear, high-dimensional warping algorithms to manually defined brain regions. We transferred those measures onto the surface of a unit sphere via conformal mapping and then used spherical wavelets and their scaling coefficients to simplify the data structure representing these surface morphological features of each brain region. We used principal component analysis (PCA) to calculate covariation in these morphological measures, as represented by their scaling coefficients, across several brain regions. We then assessed whether brain subregions that covaried in morphology, as identified by large eigenvalues in the PCA, identified specific neural pathways of the brain. To do so, we spatially registered the subnuclei for each eigenvector into the coordinate space of a Diffusion Tensor Imaging dataset; we used these subnuclei as seed regions to track and compare fiber pathways with known fiber pathways identified in neuroanatomical atlases. We applied these procedures to anatomical MRI data in a cohort of 82 healthy participants (42 children, 18 males, age 10.5 ± 2.43 years; 40 adults, 22 males, age 32.42 ± 10.7 years) and 107 participants with Tourette's Syndrome (TS) (71 children, 59 males, age 11.19 ± 2.2 years; 36 adults, 21 males, age 37.34 ± 10.9 years). We evaluated the construct validity of the identified covariation in morphology using DTI data from a different set of 20 healthy adults (10 males, mean age 29.7 ± 7.7 years). The PCA identified portions of structures that covaried across the brain, with the eigenvalues measuring the magnitude of the covariation in morphology along the respective eigenvectors. Our results showed that the eigenvectors, and the DTI fibers tracked from their associated brain regions, corresponded with known neural pathways in the brain. In addition, the eigenvectors that captured morphological covariation across regions, and the principal components along those eigenvectors, identified neural pathways with aberrant morphological features associated with TS. These findings suggest that covariations in brain morphology can identify aberrant neural pathways in specific neuropsychiatric disorders. Copyright © 2015. Published by Elsevier Inc.
Assessing the Extent and Impact of Online Data Sharing in Eddy Covariance Flux Research
NASA Astrophysics Data System (ADS)
Dai, Sheng-Qi; Li, Hong; Xiong, Jun; Ma, Jun; Guo, Hai-Qiang; Xiao, Xiangming; Zhao, Bin
2018-01-01
Research data sharing is appealing for its potential benefits to sharers' scientific impact and is also advocated by various policies. How do scientific benefits and policies correlate with practical ecological data sharing? In this study, we investigated data-sharing practices in eddy covariance flux research as a typical case. First, we collected researchers' data-sharing information from major observation networks. Then, we downloaded bibliometric data from the Web of Science and evaluated scientific impact using LeaderRank, a synthetic algorithm that takes both citation and cooperation impacts into consideration. Our results demonstrated the following: (1) specific to eddy covariance flux research, 8% of researchers published information in public data portals, whereas 64% of researchers provided their available data online in a downloadable form; (2) regional differences in data sharing, publications, and observation networks existed; and (3) the distribution of data sharers across impact-ranked ecologists followed a long tail, which suggests that, although sharing data is not necessary for researchers to be influential, data sharers are more likely to be high-impact researchers. Differentiated policies should be proposed to encourage ecologists in the long tail of data sharers, and from regions with little tradition of data sharing, to embrace a more open model of science.
Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen
2014-01-01
Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally-efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently-conducted GenIMS study.
A class of covariate-dependent spatiotemporal covariance functions
Reich, Brian J; Eidsvik, Jo; Guindani, Michele; Nail, Amy J; Schmidt, Alexandra M.
2014-01-01
In geostatistics, it is common to model spatially distributed phenomena through an underlying stationary and isotropic spatial process. However, these assumptions are often untenable in practice because of the influence of local effects in the correlation structure. Therefore, it has been of prolonged interest in the literature to provide flexible and effective ways to model non-stationarity in the spatial effects. Arguably, due to the local nature of the problem, we might envision that the correlation structure would be highly dependent on local characteristics of the domain of study, namely the latitude, longitude and altitude of the observation sites, as well as other locally defined covariate information. In this work, we provide a flexible and computationally feasible way for allowing the correlation structure of the underlying processes to depend on local covariate information. We discuss the properties of the induced covariance functions and discuss methods to assess its dependence on local covariate information by means of a simulation study and the analysis of data observed at ozone-monitoring stations in the Southeast United States. PMID:24772199
Tremblay, Marlène; Crim, Stacy M; Cole, Dana J; Hoekstra, Robert M; Henao, Olga L; Döpfer, Dörte
2017-10-01
The Foodborne Diseases Active Surveillance Network (FoodNet) is currently using a negative binomial (NB) regression model to estimate temporal changes in the incidence of Campylobacter infection. FoodNet active surveillance in 483 counties collected data on 40,212 Campylobacter cases between years 2004 and 2011. We explored models that disaggregated these data to allow us to account for demographic, geographic, and seasonal factors when examining changes in incidence of Campylobacter infection. We hypothesized that modeling structural zeros and including demographic variables would increase the fit of FoodNet's Campylobacter incidence regression models. Five different models were compared: NB without demographic covariates, NB with demographic covariates, hurdle NB with covariates in the count component only, hurdle NB with covariates in both zero and count components, and zero-inflated NB with covariates in the count component only. Of the models evaluated, the nonzero-augmented NB model with demographic variables provided the best fit. Results suggest that even though zero inflation was not present at this level, individualizing the level of aggregation and using different model structures and predictors per site might be required to correctly distinguish between structural and observational zeros and account for risk factors that vary geographically.
On Muthen's Maximum Likelihood for Two-Level Covariance Structure Models
ERIC Educational Resources Information Center
Yuan, Ke-Hai; Hayashi, Kentaro
2005-01-01
Data in social and behavioral sciences are often hierarchically organized. Special statistical procedures that take into account the dependence of such observations have been developed. Among procedures for 2-level covariance structure analysis, Muthen's maximum likelihood (MUML) has the advantage of easier computation and faster convergence. When…
Interpreting Graphs: Students Developing an Understanding of Covariation
ERIC Educational Resources Information Center
Fitzallen, Noleine
2012-01-01
Students' development of an understanding of covariation was the focus of a research project that investigated the way in which 12 Year 5/6 students engaged with the learning environment afforded by the graphing software, "TinkerPlots." Using data generated from individual interviews, the results demonstrate that upper primary students…
On the Bias-Amplifying Effect of Near Instruments in Observational Studies
ERIC Educational Resources Information Center
Steiner, Peter M.; Kim, Yongnam
2014-01-01
In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
ERIC Educational Resources Information Center
Dolan, Conor V.; Molenaar, Peter C. M.
1994-01-01
In multigroup covariance structure analysis with structured means, the traditional latent selection model is formulated as a special case of phenotypic selection. Illustrations with real and simulated data demonstrate how one can test specific hypotheses concerning selection on latent variables. (SLD)
Rebar: Reinforcing a Matching Estimator with Predictions from High-Dimensional Covariates
ERIC Educational Resources Information Center
Sales, Adam C.; Hansen, Ben B.; Rowan, Brian
2018-01-01
In causal matching designs, some control subjects are often left unmatched, and some covariates are often left unmodeled. This article introduces "rebar," a method using high-dimensional modeling to incorporate these commonly discarded data without sacrificing the integrity of the matching design. After constructing a match, a researcher…
Observed Score Linear Equating with Covariates
ERIC Educational Resources Information Center
Branberg, Kenny; Wiberg, Marie
2011-01-01
This paper examined observed score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and…
Homonuclear long-range correlation spectra from HMBC experiments by covariance processing.
Schoefberger, Wolfgang; Smrecki, Vilko; Vikić-Topić, Drazen; Müller, Norbert
2007-07-01
We present a new application of covariance nuclear magnetic resonance processing based on 1H-13C HMBC experiments, which provides an effective way of establishing indirect 1H-1H and 13C-13C nuclear spin connectivity at natural isotope abundance. The method, which identifies correlated spin networks in terms of covariance between one-dimensional traces from a single decoupled HMBC experiment, derives 13C-13C as well as 1H-1H spin connectivity maps from the two-dimensional frequency-domain heteronuclear long-range correlation data matrix. The potential and limitations of this novel covariance NMR application are demonstrated on two compounds: eugenyl-beta-D-glucopyranoside and an emodin derivative. Copyright (c) 2007 John Wiley & Sons, Ltd.
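A sketch of the core covariance-processing step (direct covariance of the 2D data matrix followed by a matrix square root, in the style of Brüschweiler-type covariance NMR; its application to HMBC data as in the paper involves additional bookkeeping not shown):

```python
import numpy as np
from scipy.linalg import sqrtm

def covariance_spectrum(F):
    """Covariance processing of a 2D spectrum.

    F : (n_t1, n_f2) real 2D frequency-domain data matrix.
    Returns the symmetric covariance spectrum C = (F^T F)^(1/2),
    whose cross-peaks encode indirect spin-spin connectivities.
    """
    C = F.T @ F
    return np.real(sqrtm(C))
```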
Bayesian probability of success for clinical trials using historical data
Ibrahim, Joseph G.; Chen, Ming-Hui; Lakshminarayanan, Mani; Liu, Guanghan F.; Heyse, Joseph F.
2015-01-01
Developing sophisticated statistical methods for go/no-go decisions is crucial for clinical trials, as planning phase III or phase IV trials is costly and time consuming. In this paper, we develop a novel Bayesian methodology for determining the probability of success of a treatment regimen on the basis of the current data of a given trial. We introduce a new criterion for calculating the probability of success that allows for inclusion of covariates as well as allowing for historical data based on the treatment regimen, and patient characteristics. A new class of prior distributions and covariate distributions is developed to achieve this goal. The methodology is quite general and can be used with univariate or multivariate continuous or discrete data, and it generalizes Chuang-Stein’s work. This methodology will be invaluable for informing the scientist on the likelihood of success of the compound, while including the information of covariates for patient characteristics in the trial population for planning future pre-market or post-market trials. PMID:25339499
Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity.
Chang, Jinyuan; Zheng, Chao; Zhou, Wen-Xin; Zhou, Wen
2017-12-01
In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and parametric bootstrap techniques to compute the critical values. Unlike existing tests that rely heavily on structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance the powers of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may assist in detecting disease-associated gene sets. The proposed methods have been implemented in the R package HDtest, available on CRAN. © 2017, The International Biometric Society.
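A minimal sketch of the one-sample case: a max-type statistic with critical values from a parametric (Gaussian) bootstrap drawn from the sample covariance, which imposes no structural assumptions on that covariance. The two-sample and screening variants build on the same ingredients; the dimension is assumed modest here so the sample covariance can be sampled from directly:

```python
import numpy as np

def max_test_parametric_bootstrap(X, B=1000, seed=0):
    """One-sample max-type test of H0: mean vector = 0.
    X : (n, p) data matrix; p assumed moderate for this sketch."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    xbar = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    T = np.max(np.abs(np.sqrt(n) * xbar / sd))   # observed max statistic
    # Parametric bootstrap: resample from N(0, Sigma_hat), recompute statistic.
    S = np.cov(X, rowvar=False)
    Tb = np.empty(B)
    for b in range(B):
        Z = rng.multivariate_normal(np.zeros(p), S, size=n)
        Tb[b] = np.max(np.abs(np.sqrt(n) * Z.mean(0) / Z.std(0, ddof=1)))
    return T, np.mean(Tb >= T)                   # statistic and p-value
```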
NASA Technical Reports Server (NTRS)
Borovikov, Anna; Rienecker, Michele M.; Keppenne, Christian; Johnson, Gregory C.
2004-01-01
One of the most difficult aspects of ocean state estimation is the prescription of the model forecast error covariances. The paucity of ocean observations limits our ability to estimate the covariance structures from model-observation differences. In most practical applications, simple covariances are usually prescribed. Rarely are cross-covariances between different model variables used. Here a comparison is made between a univariate Optimal Interpolation (UOI) scheme and a multivariate OI algorithm (MvOI) in the assimilation of ocean temperature. In the UOI case, only temperature is updated using a Gaussian covariance function; in the MvOI case, salinity, zonal and meridional velocities, as well as temperature, are updated using an empirically estimated multivariate covariance matrix. Earlier studies have shown that a univariate OI has a detrimental effect on the salinity and velocity fields of the model. Apparently, in a sequential framework it is important to analyze temperature and salinity together. For the MvOI, an estimation of the model error statistics is made by Monte-Carlo techniques from an ensemble of model integrations. An important advantage of using an ensemble of ocean states is that it provides a natural way to estimate cross-covariances between the fields of different physical variables constituting the model state vector, at the same time incorporating the model's dynamical and thermodynamical constraints as well as the effects of physical boundaries. Only temperature observations from the Tropical Atmosphere-Ocean array have been assimilated in this study. In order to investigate the efficacy of the multivariate scheme, two data assimilation experiments are validated with a large independent set of recently published subsurface observations of salinity, zonal velocity and temperature. For reference, a third control run with no data assimilation is used to check how the data assimilation affects systematic model errors. While the performance of the UOI and MvOI is similar with respect to the temperature field, the salinity and velocity fields are greatly improved when multivariate correction is used, as evident from the analyses of the rms differences of these fields and independent observations. The MvOI assimilation is found to improve upon the control run in generating water masses with properties close to those observed, while the UOI failed to maintain the temperature and salinity structure.
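The essence of the multivariate scheme is that an ensemble-estimated covariance carries temperature-salinity and temperature-velocity cross terms, so a temperature innovation increments the other fields too. A schematic numpy sketch of one such analysis step (the operational MvOI additionally localizes and empirically fits the covariances; names are illustrative):

```python
import numpy as np

def multivariate_oi_update(ensemble, H, y, R):
    """One multivariate OI analysis step with an ensemble-estimated
    forecast error covariance.

    ensemble : (n_members, n_state) model states stacking T, S, u, v
    H        : (n_obs, n_state) picks out the observed temperatures
    y        : (n_obs,) temperature observations
    R        : (n_obs, n_obs) observation error covariance"""
    xf = ensemble.mean(axis=0)
    A = ensemble - xf                      # ensemble anomalies
    P = A.T @ A / (len(ensemble) - 1)      # includes T-S, T-u cross-covariances
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # gain matrix
    # The cross-covariance rows of P spread the temperature innovation
    # into the salinity and velocity parts of the state as well.
    return xf + K @ (y - H @ xf)
```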
Sparse Covariance Matrix Estimation With Eigenvalue Constraints
LIU, Han; WANG, Lie; ZHAO, Tuo
2014-01-01
We propose a new approach for estimating high-dimensional, positive-definite covariance matrices. Our method extends the generalized thresholding operator by adding an explicit eigenvalue constraint. The estimated covariance matrix simultaneously achieves sparsity and positive definiteness. The estimator is rate optimal in the minimax sense and we develop an efficient iterative soft-thresholding and projection algorithm based on the alternating direction method of multipliers. Empirically, we conduct thorough numerical experiments on simulated datasets as well as real data examples to illustrate the usefulness of our method. Supplementary materials for the article are available online. PMID:25620866
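A sketch of the two ingredients, sparsity via entrywise soft-thresholding and positive definiteness via an eigenvalue floor. The paper solves the joint problem with ADMM; the simple alternation below only conveys the idea under that stated simplification:

```python
import numpy as np

def sparse_pd_covariance(S, lam, eps=1e-2, n_iter=50):
    """Sparse, positive-definite covariance estimate from a sample
    covariance S: alternate (i) soft-thresholding of off-diagonal
    entries and (ii) projection onto matrices with eigenvalues >= eps.
    A heuristic stand-in for the paper's ADMM algorithm."""
    Sigma = S.copy()
    for _ in range(n_iter):
        # (i) soft-threshold off-diagonals to induce sparsity
        T = np.sign(Sigma) * np.maximum(np.abs(Sigma) - lam, 0.0)
        np.fill_diagonal(T, np.diag(Sigma))
        # (ii) clip eigenvalues to enforce the positive-definiteness constraint
        w, V = np.linalg.eigh((T + T.T) / 2)
        Sigma = (V * np.maximum(w, eps)) @ V.T
    return Sigma
```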
Modeling nonstationarity in space and time.
Shand, Lyndsay; Li, Bo
2017-09-01
We propose to model a spatio-temporal random field that has nonstationary covariance structure in both space and time domains by applying the concept of the dimension expansion method in Bornn et al. (2012). Simulations are conducted for both separable and nonseparable space-time covariance models, and the model is also illustrated with a streamflow dataset. Both simulation and data analyses show that modeling nonstationarity in both space and time can improve the predictive performance over stationary covariance models or models that are nonstationary in space but stationary in time. © 2017, The International Biometric Society.
Bayesian operational modal analysis with asynchronous data, Part II: Posterior uncertainty
NASA Astrophysics Data System (ADS)
Zhu, Yi-Chen; Au, Siu-Kui
2018-01-01
A Bayesian modal identification method has been proposed in the companion paper that allows the most probable values of modal parameters to be determined using asynchronous ambient vibration data. This paper investigates the identification uncertainty of modal parameters in terms of their posterior covariance matrix. Computational issues are addressed. Analytical expressions are derived to allow the posterior covariance matrix to be evaluated accurately and efficiently. Synthetic, laboratory and field data examples are presented to verify the consistency, investigate potential modelling error and demonstrate practical applications.
Alterations in Anatomical Covariance in the Prematurely Born
Scheinost, Dustin; Kwon, Soo Hyun; Lacadie, Cheryl; Vohr, Betty R.; Schneider, Karen C.; Papademetris, Xenophon; Constable, R. Todd; Ment, Laura R.
2017-01-01
Abstract Preterm (PT) birth results in long-term alterations in functional and structural connectivity, but the related changes in anatomical covariance are just beginning to be explored. To test the hypothesis that PT birth alters patterns of anatomical covariance, we investigated brain volumes of 25 PTs and 22 terms at young adulthood using magnetic resonance imaging. Using regional volumetrics, seed-based analyses, and whole brain graphs, we show that PT birth is associated with reduced volume in bilateral temporal and inferior frontal lobes, left caudate, left fusiform, and posterior cingulate for prematurely born subjects at young adulthood. Seed-based analyses demonstrate altered patterns of anatomical covariance for PTs compared with terms. PTs exhibit reduced covariance with R Brodmann area (BA) 47, Broca's area, and L BA 21, Wernicke's area, and white matter volume in the left prefrontal lobe, but increased covariance with R BA 47 and left cerebellum. Graph theory analyses demonstrate that measures of network complexity are significantly less robust in PTs compared with term controls. Volumes in regions showing group differences are significantly correlated with phonological awareness, the fundamental basis for reading acquisition, for the PTs. These data suggest both long-lasting and clinically significant alterations in the covariance in the PTs at young adulthood. PMID:26494796
An alternative covariance estimator to investigate genetic heterogeneity in populations.
Heslot, Nicolas; Jannink, Jean-Luc
2015-11-26
For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under this assumption, adding individuals to the analysis should never be detrimental. However, some empirical studies showed that increasing training population size decreased prediction accuracy. Recently, results from theoretical models indicated that even if marker density is high and the genetic architecture of traits is controlled by many loci with small additive effects, the covariance between individuals, which depends on relationships at causal loci, is not always well estimated by the whole-genome kinship. We propose an alternative covariance estimator named K-kernel to account for potential genetic heterogeneity between populations that is characterized by a lack of genetic correlation, and to limit the information flow between a priori unknown populations in a trait-specific manner. This is similar to a multi-trait model; parameters are estimated by REML and, in extreme cases, the model can allow for an independent genetic architecture between populations. As such, K-kernel is useful to study the problem of the design of training populations. K-kernel was compared to other covariance estimators or kernels to examine its fit to the data, cross-validated accuracy and suitability for GWAS on several datasets. It provides a significantly better fit to the data than the genomic best linear unbiased prediction model and, in some cases, it performs better than other kernels such as the Gaussian kernel, as shown by an empirical null distribution. In GWAS simulations, alternative kernels control type I errors as well as or better than the classical whole-genome kinship and increase statistical power. No or small gains were observed in cross-validated prediction accuracy. This alternative covariance estimator can be used to gain insight into trait-specific genetic heterogeneity by identifying relevant sub-populations that lack genetic correlation. Because the estimated genetic correlation between identified sub-populations can be zero, the method effectively performs an automatic selection of the relevant sets of individuals to include in the training population. It may also increase statistical power in GWAS.
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is fundamentally different from other approaches to data fusion for remote sensing data because it is inferential rather than merely descriptive. All approaches combine data in a way that minimizes some specified loss function; most of these are more or less ad hoc criteria based on what looks good to the eye, or criteria that relate only to the data at hand.
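The computational payoff of the basis-function (spatial mixed effects) covariance is that the kriging weights can be computed with r x r solves via the Sherman-Morrison-Woodbury identity, never inverting the n x n covariance. A minimal sketch, assuming a zero-mean field and ignoring the change-of-support (pixel-averaging) step of the full model:

```python
import numpy as np

def fixed_rank_kriging(S_obs, S_pred, K, z, sigma2):
    """Kriging with a low-rank covariance C(s, s') = S(s) K S(s')^T + noise.

    S_obs  : (n, r) basis functions at observation locations
    S_pred : (m, r) basis functions at prediction locations
    K      : (r, r) random-effects covariance
    z      : (n,) noisy observations
    sigma2 : measurement-error variance"""
    # Woodbury identity: (S K S^T + sigma2 I)^{-1} z via an r x r solve.
    M = np.linalg.inv(K) + S_obs.T @ S_obs / sigma2           # r x r
    w = z / sigma2 - S_obs @ np.linalg.solve(M, S_obs.T @ z) / sigma2**2
    # Simple-kriging predictor: Cov(pred, obs) applied to the weights.
    return S_pred @ (K @ (S_obs.T @ w))
```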
Synergies in the space of control variables within the equilibrium-point hypothesis.
Ambike, S; Mattos, D; Zatsiorsky, V M; Latash, M L
2016-02-19
We use an approach rooted in the recent theory of synergies to analyze possible co-variation between two hypothetical control variables involved in finger force production based on the equilibrium-point (EP) hypothesis. These control variables are the referent coordinate (R) and apparent stiffness (C) of the finger. We tested a hypothesis that inter-trial co-variation in the {R; C} space during repeated, accurate force production trials stabilizes the fingertip force. This was expected to correspond to a relatively low amount of inter-trial variability affecting force and a high amount of variability keeping the force unchanged. We used the "inverse piano" apparatus to apply small and smooth positional perturbations to fingers during force production tasks. Across trials, R and C showed strong co-variation with the data points lying close to a hyperbolic curve. Hyperbolic regressions accounted for over 99% of the variance in the {R; C} space. Another analysis was conducted by randomizing the original {R; C} data sets and creating surrogate data sets that were then used to compute predicted force values. The surrogate sets always showed much higher force variance compared to the actual data, thus reinforcing the conclusion that finger force control was organized in the {R; C} space, as predicted by the EP hypothesis, and involved co-variation in that space stabilizing total force. Copyright © 2015 IBRO. Published by Elsevier Ltd. All rights reserved.
Yang, Yang; DeGruttola, Victor
2016-01-01
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584
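A sketch of the resampling scheme around the Bartlett statistic, using ordinary (non-robust) moment estimates and permutation resampling of the whitened residuals; the paper's contribution is to replace these sample moments with robust estimates. Group sizes are assumed to exceed the dimension so the whitening is well defined:

```python
import numpy as np

def bartlett_stat(groups):
    """Bartlett's statistic for homogeneity of covariance matrices:
    M = (N - k) ln|S_pooled| - sum_i (n_i - 1) ln|S_i|."""
    k = len(groups)
    ns = np.array([len(g) for g in groups])
    Ss = [np.cov(g, rowvar=False) for g in groups]
    Sp = sum((n - 1) * S for n, S in zip(ns, Ss)) / (ns.sum() - k)
    return ((ns.sum() - k) * np.linalg.slogdet(Sp)[1]
            - sum((n - 1) * np.linalg.slogdet(S)[1] for n, S in zip(ns, Ss)))

def resampling_pvalue(groups, B=500, seed=0):
    """Reference distribution from *standardized* residuals (centred by
    group means, whitened by group covariances), so resamples share
    second moments under H0. Requires n_i > p in every group."""
    rng = np.random.default_rng(seed)
    std_resid = []
    for g in groups:
        r = g - g.mean(axis=0)
        w, V = np.linalg.eigh(np.cov(g, rowvar=False))
        std_resid.append(r @ V @ np.diag(w ** -0.5) @ V.T)   # whiten
    pool = np.vstack(std_resid)
    ns = [len(g) for g in groups]
    M0 = bartlett_stat(groups)
    Mb = []
    for _ in range(B):
        perm = rng.permutation(len(pool))
        idx = np.split(perm, np.cumsum(ns)[:-1])
        Mb.append(bartlett_stat([pool[i] for i in idx]))
    return np.mean(np.array(Mb) >= M0)
```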
Selecting a Separable Parametric Spatiotemporal Covariance Structure for Longitudinal Imaging Data
George, Brandon; Aban, Inmaculada
2014-01-01
Longitudinal imaging studies allow great insight into how the structure and function of a subject's internal anatomy changes over time. Unfortunately, the analysis of longitudinal imaging data is complicated by inherent spatial and temporal correlation: the temporal from the repeated measures, and the spatial from the outcomes of interest being observed at multiple points in a patient's body. We propose the use of a linear model with a separable parametric spatiotemporal error structure for the analysis of repeated imaging data. The model makes use of spatial (exponential, spherical, and Matérn) and temporal (compound symmetric, autoregressive-1, Toeplitz, and unstructured) parametric correlation functions. A simulation study, inspired by a longitudinal cardiac imaging study on mitral regurgitation patients, compared different information criteria for selecting a particular separable parametric spatiotemporal correlation structure as well as the effects on Type I and II error rates for inference on fixed effects when the specified model is incorrect. Information criteria were found to be highly accurate at choosing between separable parametric spatiotemporal correlation structures. Misspecification of the covariance structure was found to inflate the Type I error or produce an overly conservative test size, which corresponded to decreased power. An example with clinical data illustrates how the covariance structure selection procedure can be done in practice, as well as how the choice of covariance structure can change inferences about fixed effects. PMID:25293361
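A separable parametric spatiotemporal covariance is simply a Kronecker product of a spatial and a temporal correlation matrix. A small numpy sketch with an exponential spatial and an AR(1) temporal structure, two of the families the paper considers (parameter values are illustrative):

```python
import numpy as np

def separable_cov(dist, times, range_s, rho_t, sigma2):
    """Separable spatiotemporal covariance as a Kronecker product:
    exponential spatial correlation x AR(1) temporal correlation.
    Ordering of the (n_s * n_t) state vector is time-major."""
    R_s = np.exp(-dist / range_s)                    # exponential, spatial
    lag = np.abs(np.subtract.outer(times, times))
    R_t = rho_t ** lag                               # AR(1), temporal
    return sigma2 * np.kron(R_t, R_s)

# Example: 4 spatial points observed at 3 visits -> a 12 x 12 covariance.
pts = np.random.default_rng(1).uniform(0, 10, (4, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
V = separable_cov(dist, np.array([0., 1., 2.]), range_s=5., rho_t=0.6, sigma2=2.)
print(V.shape)   # (12, 12)
```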
Penalized gaussian process regression and classification for high-dimensional nonlinear data.
Yi, G; Shi, J Q; Choi, T
2011-12-01
The model based on a Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to complex nonlinear systems in many different areas, particularly in machine learning. However, the model is challenging to apply to large-scale and high-dimensional data sets, such as the meat data discussed in this article, which have 100 highly correlated covariates. For such data, it suffers from large variance in parameter estimation and high predictive error, and, numerically, from unstable computation. In this article, a penalized likelihood framework is applied to the GP-based model. Different penalties are investigated, and their suitability to the characteristics of GP models is discussed. The asymptotic properties are also presented with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets are reported. © 2011, The International Biometric Society No claim to original US government works.
Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources
O’Brien, Liam M.; Fitzmaurice, Garrett M.
2006-01-01
We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which parsimonious models for the covariance can be obtained for longitudinal multiple source data. The methods are illustrated with an example of multiple informant data arising from a longitudinal interventional trial in psychiatry. PMID:15726666
Lebigre, Christophe; Arcese, Peter; Reid, Jane M
2013-07-01
Age-specific variances and covariances in reproductive success shape the total variance in lifetime reproductive success (LRS), age-specific opportunities for selection, and population demographic variance and effective size. Age-specific (co)variances in reproductive success achieved through different reproductive routes must therefore be quantified to predict population, phenotypic and evolutionary dynamics in age-structured populations. While numerous studies have quantified age-specific variation in mean reproductive success, age-specific variances and covariances in reproductive success, and the contributions of different reproductive routes to these (co)variances, have not been comprehensively quantified in natural populations. We applied 'additive' and 'independent' methods of variance decomposition to complete data describing apparent (social) and realised (genetic) age-specific reproductive success across 11 cohorts of socially monogamous but genetically polygynandrous song sparrows (Melospiza melodia). We thereby quantified age-specific (co)variances in male within-pair and extra-pair reproductive success (WPRS and EPRS) and the contributions of these (co)variances to the total variances in age-specific reproductive success and LRS. 'Additive' decomposition showed that within-age and among-age (co)variances in WPRS across males aged 2-4 years contributed most to the total variance in LRS. Age-specific (co)variances in EPRS contributed relatively little. However, extra-pair reproduction altered age-specific variances in reproductive success relative to the social mating system, and hence altered the relative contributions of age-specific reproductive success to the total variance in LRS. 'Independent' decomposition showed that the (co)variances in age-specific WPRS, EPRS and total reproductive success, and the resulting opportunities for selection, varied substantially across males that survived to each age. Furthermore, extra-pair reproduction increased the variance in age-specific reproductive success relative to the social mating system to a degree that increased across successive age classes. This comprehensive decomposition of the total variances in age-specific reproductive success and LRS into age-specific (co)variances attributable to two reproductive routes showed that within-age and among-age covariances contributed substantially to the total variance and that extra-pair reproduction can alter the (co)variance structure of age-specific reproductive success. Such covariances and impacts should consequently be integrated into theoretical assessments of demographic and evolutionary processes in age-structured populations. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.
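The 'additive' decomposition rests on the identity that the variance of a sum equals the sum of all within- and cross-(co)variance terms. A compact numpy sketch with hypothetical arrays of within-pair (W) and extra-pair (E) success by age; the assertion verifies the identity on the data:

```python
import numpy as np

def additive_decomposition(W, E):
    """'Additive' decomposition of var(LRS) into age-specific (co)variances.

    W, E : (n_males, n_ages) within-pair and extra-pair reproductive
           success by age (zeros after death), so that
    var(sum_a W_a + sum_a E_a)
        = sum_{a,b} [cov(W_a, W_b) + cov(E_a, E_b) + 2 cov(W_a, E_b)]."""
    A = W.shape[1]
    CWW = np.cov(W, rowvar=False)
    CEE = np.cov(E, rowvar=False)
    CWE = np.cov(W.T, E.T)[:A, A:]        # cross-route covariance block
    total = CWW.sum() + CEE.sum() + 2 * CWE.sum()
    assert np.isclose(total, np.var(W.sum(1) + E.sum(1), ddof=1))
    return CWW, CEE, CWE
```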
The Impact of Conditional Scores on the Performance of DETECT.
ERIC Educational Resources Information Center
Zhang, Yanwei Oliver; Yu, Feng; Nandakumar, Ratna
DETECT is a nonparametric, conditional covariance-based procedure to identify dimensional structure and the degree of multidimensionality of test data. The ability composite or conditional score used to estimate conditional covariance plays a significant role in the performance of DETECT. The number correct score of all items in the test (T) and…
Properties of Endogenous Post-Stratified Estimation using remote sensing data
John Tipton; Jean Opsomer; Gretchen Moisen
2013-01-01
Post-stratification is commonly used to improve the precision of survey estimates. In traditional poststratification methods, the stratification variable must be known at the population level. When suitable covariates are available at the population level, an alternative approach consists of fitting a model on the covariates, making predictions for the population and...
ERIC Educational Resources Information Center
Bashkov, Bozhidar M.; Finney, Sara J.
2013-01-01
Traditional methods of assessing construct stability are reviewed and longitudinal mean and covariance structures (LMACS) analysis, a modern approach, is didactically illustrated using psychological entitlement data. Measurement invariance and latent variable stability results are interpreted, emphasizing substantive implications for educators and…
Wieder, Robert; Shafiq, Basit; Adam, Nabil
2016-01-01
BACKGROUND: African American race negatively impacts survival from localized breast cancer but co-variable factors confound the impact. METHODS: Data sets were analyzed from the Surveillance, Epidemiology and End Results (SEER) directories from 1973 to 2011 consisting of patients with designated diagnosis of breast adenocarcinoma, race as White or Caucasian, Black or African American, Asian, American Indian or Alaskan Native, Native Hawaiian or Pacific Islander, age, stage I, II or III, grade 1, 2 or 3, estrogen receptor or progesterone receptor positive or negative, marital status as single, married, separated, divorced or widowed and laterality as right or left. The Cox Proportional Hazards Regression model was used to determine hazard ratios for survival. Chi square test was applied to determine the interdependence of variables found significant in the multivariable Cox Proportional Hazards Regression analysis. Cells with stratified data of patients with identical characteristics except African American or Caucasian race were compared. RESULTS: Age, stage, grade, ER and PR status and marital status significantly co-varied with race and with each other. Stratifications by single co-variables demonstrated worse hazard ratios for survival for African Americans. Stratification by three and four co-variables demonstrated worse hazard ratios for survival for African Americans in most subgroupings with sufficient numbers of values. Differences in some subgroupings containing poor prognostic co-variables did not reach significance, suggesting that race effects may be partly overcome by additional poor prognostic indicators. CONCLUSIONS: African American race is a poor prognostic indicator for survival from breast cancer independent of 6 associated co-variables with prognostic significance. PMID:27698895
Covariance Between Genotypic Effects and its Use for Genomic Inference in Half-Sib Families
Wittenburg, Dörte; Teuscher, Friedrich; Klosa, Jan; Reinsch, Norbert
2016-01-01
In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, and this covariance matrix was then used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and using semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationship among predictor variables, the accuracy of genetic value prediction was improved by 10–22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes. PMID:27402363
Bayesian Covariate Selection in Mixed-Effects Models For Longitudinal Shape Analysis
Muralidharan, Prasanna; Fishbaugh, James; Kim, Eun Young; Johnson, Hans J.; Paulsen, Jane S.; Gerig, Guido; Fletcher, P. Thomas
2016-01-01
The goal of longitudinal shape analysis is to understand how anatomical shape changes over time, in response to biological processes, including growth, aging, or disease. In many imaging studies, it is also critical to understand how these shape changes are affected by other factors, such as sex, disease diagnosis, IQ, etc. Current approaches to longitudinal shape analysis have focused on modeling age-related shape changes, but have not included the ability to handle covariates. In this paper, we present a novel Bayesian mixed-effects shape model that incorporates simultaneous relationships between longitudinal shape data and multiple predictors or covariates to the model. Moreover, we place an Automatic Relevance Determination (ARD) prior on the parameters, that lets us automatically select which covariates are most relevant to the model based on observed data. We evaluate our proposed model and inference procedure on a longitudinal study of Huntington's disease from PREDICT-HD. We first show the utility of the ARD prior for model selection in a univariate modeling of striatal volume, and next we apply the full high-dimensional longitudinal shape model to putamen shapes. PMID:28090246
Generalized semiparametric varying-coefficient models for longitudinal data
NASA Astrophysics Data System (ADS)
Qi, Li
In this dissertation, we investigate generalized semiparametric varying-coefficient models for longitudinal data that can flexibly model three types of covariate effects: time-constant effects, time-varying effects, and covariate-varying effects, i.e., covariate effects that depend on other, possibly time-dependent, exposure variables. First, we consider the model that assumes the time-varying effects are unspecified functions of time while the covariate-varying effects are parametric functions of an exposure variable specified up to a finite number of unknown parameters. The estimation procedures are developed using multivariate local linear smoothing and generalized weighted least squares estimation techniques. The asymptotic properties of the proposed estimators are established. The simulation studies show that the proposed methods have satisfactory finite sample performance. The methods are applied to the ACTG 244 clinical trial of HIV-infected patients to examine the effects of switching antiretroviral treatment before and after the virus develops the 215 mutation. Our analysis shows a benefit of switching treatment before the 215 mutation develops. The proposed methods are also applied to the STEP study with MITT cases, demonstrating their broad applicability in medical research.
Quantification of Covariance in Tropical Cyclone Activity across Teleconnected Basins
NASA Astrophysics Data System (ADS)
Tolwinski-Ward, S. E.; Wang, D.
2015-12-01
Rigorous statistical quantification of natural hazard covariance across regions has important implications for risk management, and is also of fundamental scientific interest. We present a multivariate Bayesian Poisson regression model for inferring the covariance in tropical cyclone (TC) counts across multiple ocean basins and across Saffir-Simpson intensity categories. Such covariability results from the influence of large-scale modes of climate variability on local environments that can alternately suppress or enhance TC genesis and intensification, and our model also simultaneously quantifies the covariance of TC counts with various climatic modes in order to deduce the source of inter-basin TC covariability. The model explicitly treats the time-dependent uncertainty in observed maximum sustained wind data, and hence the nominal intensity category of each TC. Differences in annual TC counts as measured by different agencies are also formally addressed. The probabilistic output of the model can be probed for probabilistic answers to such questions as: - Does the relationship between different categories of TCs differ statistically by basin? - Which climatic predictors have significant relationships with TC activity in each basin? - Are the relationships between counts in different basins conditionally independent given the climatic predictors, or are there other factors at play affecting inter-basin covariability? - How can a portfolio of insured property be optimized across space to minimize risk? Although we present results of our model applied to TCs, the framework is generalizable to covariance estimation between multivariate counts of natural hazards across regions and/or across peril types.
Cox model with interval-censored covariate in cohort studies.
Ahn, Soohyun; Lim, Johan; Paik, Myunghee Cho; Sacco, Ralph L; Elkind, Mitchell S
2018-05-18
In cohort studies the outcome is often time to a particular event, and subjects are followed at regular intervals. Periodic visits may also monitor a secondary irreversible event influencing the event of primary interest, and a significant proportion of subjects develop the secondary event over the period of follow-up. The status of the secondary event serves as a time-varying covariate, but is recorded only at the times of the scheduled visits, generating incomplete time-varying covariates. While information on a typical time-varying covariate is missing for the entire follow-up period except at the visit times, the status of the secondary event is unavailable only between visits where the status has changed, and is thus interval-censored. One may view the interval-censored covariate of the secondary event status as a missing time-varying covariate, yet the missingness is partial since partial information is provided throughout the follow-up period. The current practice of using the latest observed status produces biased estimators, and existing missing-covariate techniques cannot accommodate this special feature of missingness due to interval censoring. To handle interval-censored covariates in the Cox proportional hazards model, we propose an available-data estimator and a doubly robust-type estimator, as well as the maximum likelihood estimator via the EM algorithm, and present their asymptotic properties. We also present practical approaches that are valid. We demonstrate the proposed methods using our motivating example from the Northern Manhattan Study. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Li, Y.; Graubard, B. I.; Huang, P.; Gastwirth, J. L.
2015-01-01
Determining the extent of a disparity, if any, between groups of people, for example, race or gender, is of interest in many fields, including public health for medical treatment and prevention of disease. An observed difference in the mean outcome between an advantaged group (AG) and disadvantaged group (DG) can be due to differences in the distribution of relevant covariates. The Peters–Belson (PB) method fits a regression model with covariates to the AG to predict, for each DG member, their outcome measure as if they had been from the AG. The difference between the mean predicted and the mean observed outcomes of DG members is the (unexplained) disparity of interest. We focus on applying the PB method to estimate the disparity based on binary/multinomial/proportional odds logistic regression models using data collected from complex surveys with more than one DG. Estimators of the unexplained disparity, an analytic variance–covariance estimator that is based on the Taylor linearization variance–covariance estimation method, as well as a Wald test for testing a joint null hypothesis of zero for unexplained disparities between two or more minority groups and a majority group, are provided. Simulation studies with data selected from simple random sampling and cluster sampling, as well as the analyses of disparity in body mass index in the National Health and Nutrition Examination Survey 1999–2004, are conducted. Empirical results indicate that the Taylor linearization variance–covariance estimation is accurate and that the proposed Wald test maintains the nominal level. PMID:25382235
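The PB point estimate itself is simple to compute. A sketch of the unweighted, continuous-outcome skeleton of the idea (the paper's versions use logistic, multinomial, and proportional-odds models, survey weights, and Taylor linearization variances, none of which appear here):

```python
import numpy as np

def peters_belson_disparity(X_ag, y_ag, X_dg, y_dg):
    """Peters-Belson unexplained disparity: fit a regression to the
    advantaged group (AG), predict each disadvantaged-group (DG)
    member's outcome as if they were AG, and compare means."""
    Xa = np.column_stack([np.ones(len(X_ag)), X_ag])
    beta = np.linalg.lstsq(Xa, y_ag, rcond=None)[0]    # AG regression fit
    Xd = np.column_stack([np.ones(len(X_dg)), X_dg])
    y_hat = Xd @ beta                                  # DG predicted 'as if AG'
    return y_dg.mean() - y_hat.mean()                  # unexplained disparity
```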
Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension
Wang, Lan; Wu, Yichao
2012-01-01
Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In theoretic development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition which relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploring this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods. PMID:23082036
NASA Astrophysics Data System (ADS)
Langford, B.; Acton, W.; Ammann, C.; Valach, A.; Nemitz, E.
2015-10-01
All eddy-covariance flux measurements are associated with random uncertainties, which are a combination of sampling error due to natural variability in turbulence and of sensor noise. The former is the principal error for systems where the signal-to-noise ratio of the analyser is high, as is usually the case when measuring fluxes of heat, CO2 or H2O. Where signal is limited, which is often the case for measurements of other trace gases and aerosols, instrument uncertainties dominate. Here, we apply a consistent approach based on auto- and cross-covariance functions to quantify separately the total random flux error and the random error due to instrument noise. As with previous approaches, the random error quantification assumes that the time lag between wind and concentration measurement is known. However, if combined with commonly used automated methods that identify the individual time lag by looking for the maximum in the cross-covariance function of the two entities, analyser noise additionally leads to a systematic bias in the fluxes. Combining data sets from several analysers and using simulations, we show that the method of time-lag determination becomes increasingly important as the magnitude of the instrument error approaches that of the sampling error. The flux bias can be particularly significant for disjunct data, whereas using a prescribed time lag eliminates these effects (provided the time lag does not fluctuate unduly over time). We also demonstrate that when sampling at higher elevations, where low-frequency turbulence dominates and covariance peaks are broader, both the probability and magnitude of bias are magnified. We show that the statistical significance of noisy flux data can be increased (the limit of detection can be decreased) by appropriate averaging of individual fluxes, but only if systematic biases are avoided by using a prescribed time lag. Finally, we make recommendations for the analysis and reporting of data with low signal-to-noise ratios and their associated errors.
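A self-contained toy illustration of why lag-searching biases noisy fluxes: the maximum of the cross-covariance is taken over the noise envelope, so its magnitude can only meet or exceed the value at the prescribed (true) lag. All signals below are synthetic, and the white-noise 'turbulence' is a deliberate simplification:

```python
import numpy as np

def crosscov(w, c, max_lag):
    """Cross-covariance of vertical wind w and concentration c for
    lags -max_lag..max_lag (in samples); the flux is its value at
    the wind-to-concentration time lag."""
    w = w - w.mean()
    c = c - c.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, np.array([np.mean(w * np.roll(c, -k)) for k in lags])

rng = np.random.default_rng(2)
n, true_lag, noise_sd = 36000, 10, 5.0       # 1 h at 10 Hz, noisy analyser
w = rng.normal(0, 0.3, n)                    # toy 'turbulence'
c = np.roll(2.0 * w, true_lag) + rng.normal(0, noise_sd, n)

lags, cc = crosscov(w, c, 50)
flux_max = cc[np.argmax(np.abs(cc))]         # automated lag search
flux_fix = cc[lags == true_lag][0]           # prescribed time lag
# The searched maximum rides on the noise envelope, so over many
# averaging periods its magnitude is biased high relative to the
# prescribed-lag flux.
print(abs(flux_max) >= abs(flux_fix))        # True by construction
```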
BORAH, BIJAN J.; BASU, ANIRBAN
2014-01-01
The quantile regression (QR) framework provides a pragmatic approach in understanding the differential impacts of covariates along the distribution of an outcome. However, the QR framework that has pervaded the applied economics literature is based on the conditional quantile regression method. It is used to assess the impact of a covariate on a quantile of the outcome conditional on specific values of other covariates. In most cases, conditional quantile regression may generate results that are often not generalizable or interpretable in a policy or population context. In contrast, the unconditional quantile regression method provides more interpretable results as it marginalizes the effect over the distributions of other covariates in the model. In this paper, the differences between these two regression frameworks are highlighted, both conceptually and econometrically. Additionally, using real-world claims data from a large US health insurer, alternative QR frameworks are implemented to assess the differential impacts of covariates along the distribution of medication adherence among elderly patients with Alzheimer’s disease. PMID:23616446
Semi-parametric regression model for survival data: graphical visualization with R
2016-01-01
The Cox proportional hazards model is a semi-parametric model that leaves its baseline hazard function unspecified. The rationale for using the Cox proportional hazards model is that (I) assuming a specific parametric form for the underlying hazard function is stringent and often unrealistic, and (II) researchers are usually interested only in estimating how the hazard changes with covariates (the relative hazard). A Cox regression model can be easily fit with the coxph() function in the survival package. A stratified Cox model may be used for a covariate that violates the proportional hazards assumption. The relative importance of covariates in a population can be examined with the rankhazard package in R. Hazard ratio curves for continuous covariates can be visualized using the smoothHR package. Such a curve helps to better understand the effect that each continuous covariate has on the outcome. The population attributable fraction is a classic quantity in epidemiology for evaluating the impact of a risk factor on the occurrence of an event in the population. In survival analysis, the adjusted/unadjusted attributable fraction can be plotted against survival time to obtain the attributable fraction function. PMID:28090517
Efficient Storage Scheme of Covariance Matrix during Inverse Modeling
NASA Astrophysics Data System (ADS)
Mao, D.; Yeh, T. J.
2013-12-01
During stochastic inverse modeling, the covariance matrix of geostatistics-based methods carries the information about the geologic structure. Its update during iterations reflects the decrease of uncertainty as observed data are incorporated. For large-scale problems, its storage and update consume substantial memory and computational resources. In this study, we propose a new efficient scheme for storing and updating the covariance matrix. The Compressed Sparse Column (CSC) format is used to store the matrix, and users can choose how many entries to retain based on correlation scales, since entries beyond several correlation scales are usually not very informative for inverse modeling. After every iteration, only the diagonal terms of the covariance matrix are updated exactly. The off-diagonal terms are recalculated from shortened correlation scales with a pre-assigned exponential model. The correlation scales are shortened by a coefficient, e.g., 0.95, at every iteration to reflect the decrease of uncertainty. There is no universal coefficient for all problems, and users are encouraged to try several values. This new scheme is tested first with 1D examples. The estimated results and uncertainty are compared with those of the traditional full-storage method. In the end, a large-scale numerical model is used to validate the new scheme.
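A sketch of the storage idea using scipy's CSC format: truncate an exponential covariance beyond a few correlation lengths, keep the (separately updated) diagonal, and rebuild off-diagonals from a shortened scale each iteration. For clarity the matrices are formed densely before conversion; a memory-conscious implementation would fill the sparse columns directly:

```python
import numpy as np
from scipy import sparse

def truncated_exp_cov(x, sigma2, corr_len, cutoff=3.0):
    """Exponential covariance on 1D locations x, stored as CSC with
    entries beyond `cutoff` correlation lengths dropped."""
    d = np.abs(x[:, None] - x[None, :])
    C = sigma2 * np.exp(-d / corr_len)
    C[d > cutoff * corr_len] = 0.0
    return sparse.csc_matrix(C)

def shrink_iteration(C, x, corr_len, shrink=0.95, cutoff=3.0):
    """One update: retain the diagonal (updated elsewhere), rebuild the
    off-diagonals from a correlation length shortened by `shrink` to
    mimic the decrease of uncertainty across iterations."""
    new_len = shrink * corr_len
    var = C.diagonal()
    d = np.abs(x[:, None] - x[None, :])
    R = np.exp(-d / new_len)
    R[d > cutoff * new_len] = 0.0
    return sparse.csc_matrix(np.sqrt(np.outer(var, var)) * R), new_len

x = np.linspace(0.0, 100.0, 200)
C = truncated_exp_cov(x, sigma2=1.0, corr_len=10.0)
C, L = shrink_iteration(C, x, corr_len=10.0)
print(C.nnz, L)   # sparsity pattern tightens as the scale shrinks
```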
Baldocchi, Dennis
2014-12-01
The application of the eddy covariance flux method to measure fluxes of trace gas and energy between ecosystems and the atmosphere has exploded over the past 25 years. This opinion paper provides a perspective on the contributions and future opportunities of the eddy covariance method. First, the paper discusses the pros and cons of this method relative to other methods used to measure the exchange of trace gases between ecosystems and the atmosphere. Second, it discusses how the use of the eddy covariance method has grown and evolved. Today, more than 400 flux measurement sites are operating world-wide, and the duration of the time series exceeds a decade at dozens of sites. Networks of tower sites now enable scientists to ask scientific questions related to climatic and ecological gradients, disturbance, changes in land use, and management. The paper ends with a discussion of where the field of flux measurement is heading. Topics discussed include the role of open-access data sharing and data mining in this new era of big data, and the opportunities that new sensors measuring a variety of trace gases, like volatile organic carbon compounds, methane and nitrous oxide, and aerosols, may yield. © 2014 John Wiley & Sons Ltd.
A Unimodal Model for Double Observer Distance Sampling Surveys.
Becker, Earl F; Christ, Aaron M
2015-01-01
Distance sampling is a widely used method to estimate animal population size. Most distance sampling models utilize a monotonically decreasing detection function such as a half-normal. Recent advances in distance sampling modeling allow for the incorporation of covariates into the distance model, and the elimination of the assumption of perfect detection at some fixed distance (usually the transect line) with the use of double-observer models. The assumption of full observer independence in the double-observer model is problematic, but can be addressed by using the point-independence assumption, which assumes there is one distance, the apex of the detection function, at which the two observers are independent. Aerially collected distance sampling data can have a unimodal shape and have been successfully modeled with a gamma detection function. Covariates in gamma detection models cause the apex of detection to shift depending upon covariate levels, making that model incompatible with the point-independence assumption when double-observer data are used. This paper reports a unimodal detection model based on a two-piece normal distribution that allows covariates, has only one apex, and is consistent with the point-independence assumption when double-observer data are utilized. An aerial line-transect survey of black bears in Alaska illustrates how this method can be applied.
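A minimal sketch of a two-piece-normal detection curve: a single apex with separate spreads below and above it, so covariates can act on the spreads without moving the apex. The parameterization and values are illustrative, not the paper's fitted model:

```python
import numpy as np

def two_piece_normal(x, apex, s1, s2):
    """Unimodal detection-style curve from a two-piece normal: spread s1
    below the apex, s2 above, single maximum (probability 1) at `apex`.
    Covariates can modify s1/s2 while the apex stays fixed, keeping the
    model compatible with the point-independence assumption."""
    x = np.asarray(x, dtype=float)
    s = np.where(x < apex, s1, s2)
    return np.exp(-0.5 * ((x - apex) / s) ** 2)

# Example: detection rises to 1 at 80 m from the aircraft, then falls off.
dist = np.linspace(0, 600, 7)
print(np.round(two_piece_normal(dist, apex=80.0, s1=40.0, s2=200.0), 3))
```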
Morin, T. H.; Bohrer, G.; Stefanik, K. C.; ...
2017-02-17
Methane (CH4) emissions and carbon uptake in temperate freshwater wetlands act in opposing directions in the context of global radiative forcing. Large uncertainties exist for the rates of CH4 emissions, making it difficult to determine the extent to which CH4 emissions counteract the carbon sequestration of wetlands. Urban temperate wetlands are typically small and feature highly heterogeneous land cover, posing an additional challenge to determining their CH4 budget. The data analysis approach we introduce here combines two different CH4 flux measurement techniques to overcome scale and heterogeneity problems and determine the overall CH4 budget of a small, heterogeneous, urban wetland landscape. Temporally intermittent point measurements from non-steady-state chambers provided information about patch-level heterogeneity of fluxes, while continuous, high temporal resolution flux measurements using the eddy-covariance (EC) technique provided information about the temporal dynamics of the fluxes. Patch-level scaling parameterization was developed from the chamber data to scale eddy covariance data to a 'fixed frame', which corrects for variability in the spatial coverage of the eddy covariance observation footprint at any single point in time. Finally, by combining two measurement techniques at different scales, we addressed shortcomings of both techniques with respect to heterogeneous wetland sites.
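One plausible reading of the 'fixed-frame' correction is a ratio rescaling of each EC flux by chamber-derived patch emission strengths, weighted by the patch fractions in the instantaneous footprint versus those in the fixed landscape frame. The sketch below is an assumption about that construction, not the paper's exact parameterization; all names and numbers are hypothetical:

```python
import numpy as np

def fixed_frame_flux(F_ec, footprint_w, landscape_w, patch_factors):
    """Rescale one eddy-covariance CH4 flux from the shifting footprint
    to a fixed landscape frame (assumed ratio form).

    F_ec          : measured EC flux for the averaging period
    footprint_w   : patch area fractions inside the current footprint
    landscape_w   : patch area fractions inside the fixed frame
    patch_factors : relative patch emission strengths from chambers"""
    f = np.dot(footprint_w, patch_factors)   # footprint-weighted strength
    l = np.dot(landscape_w, patch_factors)   # fixed-frame strength
    return F_ec * l / f

# e.g. three hypothetical patches: open water, emergent vegetation, mudflat
print(fixed_frame_flux(F_ec=35.0,
                       footprint_w=np.array([0.6, 0.3, 0.1]),
                       landscape_w=np.array([0.4, 0.4, 0.2]),
                       patch_factors=np.array([1.0, 2.5, 0.4])))
```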
NASA Astrophysics Data System (ADS)
Terranova, Nicholas; Serot, Olivier; Archier, Pascal; De Saint Jean, Cyrille; Sumini, Marco
2017-09-01
Fission product yields (FY) are fundamental nuclear data for several applications, including decay heat, shielding, dosimetry and burn-up calculations. To be safe and sustainable, modern and future nuclear systems require accurate knowledge of reactor parameters, with reduced margins of uncertainty. Present nuclear data libraries for FY do not provide consistent and complete uncertainty information, which is limited, in many cases, to variances only. In the present work we propose a methodology to evaluate covariance matrices for thermal and fast neutron induced fission yields. The semi-empirical models adopted to evaluate the JEFF-3.1.1 FY library have been used in the Generalized Least Square Method available in CONRAD (COde for Nuclear Reaction Analysis and Data assimilation) to generate covariance matrices for several fissioning systems, such as the thermal fission of U235, Pu239 and Pu241 and the fast fission of U238, Pu239 and Pu240. The impact of such covariances on nuclear applications has been estimated using deterministic and Monte Carlo uncertainty propagation techniques. We studied the effects on decay heat and reactivity loss uncertainty estimation for simplified test-case geometries, such as PWR and SFR pin-cells. The impact on existing nuclear reactors, such as the Jules Horowitz Reactor under construction at CEA-Cadarache, has also been considered.
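The Generalized Least Squares machinery referenced here has a compact linear-Gaussian form. A minimal sketch with hypothetical inputs (sensitivity matrix G, prior and experimental covariances); CONRAD's implementation handles the model evaluation and assimilation details:

```python
import numpy as np

def gls_posterior(theta0, M0, G, V, y, f0):
    """One Generalized Least Squares step for model-parameter fitting.

    theta0, M0 : prior parameters and their covariance
    G          : d(yields)/d(parameters) sensitivity matrix
    V          : experimental covariance of the measured yields y
    f0         : model-calculated yields at theta0
    Propagating the posterior parameter covariance through G yields a
    full covariance matrix for the evaluated fission yields."""
    M0_inv = np.linalg.inv(M0)
    V_inv = np.linalg.inv(V)
    M1 = np.linalg.inv(M0_inv + G.T @ V_inv @ G)      # posterior param cov
    theta1 = theta0 + M1 @ G.T @ V_inv @ (y - f0)     # updated parameters
    cov_yields = G @ M1 @ G.T                         # evaluated FY covariance
    return theta1, M1, cov_yields
```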
Hua, Hairui; Burke, Danielle L; Crowther, Michael J; Ensor, Joie; Tudur Smith, Catrin; Riley, Richard D
2017-02-28
Stratified medicine utilizes individual-level covariates that are associated with a differential treatment effect, also known as treatment-covariate interactions. When multiple trials are available, meta-analysis is used to help detect true treatment-covariate interactions by combining their data. Meta-regression of trial-level information is prone to low power and ecological bias, and therefore, individual participant data (IPD) meta-analyses are preferable to examine interactions utilizing individual-level information. However, one-stage IPD models are often wrongly specified, such that interactions are based on amalgamating within- and across-trial information. We compare, through simulations and an applied example, fixed-effect and random-effects models for a one-stage IPD meta-analysis of time-to-event data where the goal is to estimate a treatment-covariate interaction. We show that it is crucial to centre patient-level covariates by their mean value in each trial, in order to separate out within-trial and across-trial information. Otherwise, bias and coverage of interaction estimates may be adversely affected, leading to potentially erroneous conclusions driven by ecological bias. We revisit an IPD meta-analysis of five epilepsy trials and examine age as a treatment effect modifier. The interaction is -0.011 (95% CI: -0.019 to -0.003; p = 0.004), and thus highly significant, when amalgamating within-trial and across-trial information. However, when separating within-trial from across-trial information, the interaction is -0.007 (95% CI: -0.019 to 0.005; p = 0.22), and thus its magnitude and statistical significance are greatly reduced. We recommend that meta-analysts should only use within-trial information to examine individual predictors of treatment effect and that one-stage IPD models should separate within-trial from across-trial information to avoid ecological bias. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
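A sketch of the recommended centring, using hypothetical column names and an ordinary linear model as a stand-in for the paper's time-to-event model: after centring, the treat:age_c coefficient carries purely within-trial interaction information, while treat:age_mean isolates the across-trial (ecologically biased) component as a separate term:

```python
import statsmodels.formula.api as smf

def fit_interaction(ipd):
    """One-stage IPD interaction model separating within-trial from
    across-trial information. `ipd` is a DataFrame with hypothetical
    columns: trial, treat, age, outcome."""
    ipd = ipd.copy()
    # Trial-specific mean of the covariate, and the centred covariate.
    ipd["age_mean"] = ipd.groupby("trial")["age"].transform("mean")
    ipd["age_c"] = ipd["age"] - ipd["age_mean"]
    # treat:age_c  -> within-trial interaction (the estimand of interest)
    # treat:age_mean -> across-trial term, kept separate to absorb
    #                   ecological bias rather than contaminate treat:age_c
    model = smf.ols("outcome ~ C(trial) + treat + age_c"
                    " + treat:age_c + treat:age_mean", data=ipd)
    return model.fit()
```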
Data Fusion of Gridded Snow Products Enhanced with Terrain Covariates and a Simple Snow Model
NASA Astrophysics Data System (ADS)
Snauffer, A. M.; Hsieh, W. W.; Cannon, A. J.
2017-12-01
Hydrologic planning requires accurate estimates of regional snow water equivalent (SWE), particularly in areas with hydrologic regimes dominated by spring melt. While numerous gridded data products provide such estimates, accurate representations are particularly challenging under conditions of mountainous terrain, heavy forest cover and large snow accumulations, contexts which in many ways define the province of British Columbia (BC), Canada. One promising avenue for improving SWE estimates is a data fusion approach which combines field observations with gridded SWE products and relevant covariates. A base artificial neural network (ANN) was constructed using three of the best performing gridded SWE products over BC (ERA-Interim/Land, MERRA and GLDAS-2) and simple location and time covariates. This base ANN was then enhanced to include terrain covariates (slope, aspect and Terrain Roughness Index, TRI) as well as a simple 1-layer energy balance snow model driven by gridded bias-corrected ANUSPLIN temperature and precipitation values. The ANN enhanced with all aforementioned covariates performed better than the base ANN, but most of the skill improvement was attributable to the snow model, with very little contribution from the terrain covariates. The enhanced ANN improved station mean absolute error (MAE) by an average of 53% relative to the composing gridded products over the province. The interannual peak SWE correlation coefficient was found to be 0.78, an improvement of 0.05 to 0.18 over the composing products. This nonlinear approach outperformed a comparable multiple linear regression (MLR) model by 22% in MAE and 0.04 in interannual correlation. The enhanced ANN was also shown to outperform the Variable Infiltration Capacity (VIC) hydrologic model calibrated and run for four BC watersheds, improving MAE by 22% and correlation by 0.05. The performance improvements of the enhanced ANN are statistically significant at the 5% level across the province and in four out of five physiographic regions.
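The base fusion step can be sketched in a few lines. The fragment below is only an illustration under invented data: the three "products" are synthetic biased views of a true SWE series, and the covariates are random stand-ins for location and day-of-year, not the authors' pipeline.

```python
# Hedged sketch of ANN data fusion: regress station SWE on several gridded
# products plus simple covariates, then compare against the best raw product.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 2000
true_swe = rng.gamma(2.0, 150.0, n)                      # "observed" station SWE
# three synthetic gridded products: biased, noisy views of the truth
products = np.column_stack([b * true_swe + rng.normal(0, 60, n)
                            for b in (0.7, 0.9, 1.2)])
loc_time = rng.normal(size=(n, 3))                       # stand-ins for lon/lat/doy
X = np.column_stack([products, loc_time])

X_tr, X_te, y_tr, y_te = train_test_split(X, true_swe, random_state=0)
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                                 random_state=0))
ann.fit(X_tr, y_tr)
print(f"fusion MAE: {mean_absolute_error(y_te, ann.predict(X_te)):.1f}")
print(f"best single-product MAE: "
      f"{min(np.abs(X_te[:, j] - y_te).mean() for j in range(3)):.1f}")
```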
On-line estimation of error covariance parameters for atmospheric data assimilation
NASA Technical Reports Server (NTRS)
Dee, Dick P.
1995-01-01
A simple scheme is presented for on-line estimation of covariance parameters in statistical data assimilation systems. The scheme is based on a maximum-likelihood approach in which estimates are produced on the basis of a single batch of simultaneous observations. Single-sample covariance estimation is reasonable as long as the number of available observations exceeds the number of tunable parameters by two or three orders of magnitude. Not much is known at present about model error associated with actual forecast systems. Our scheme can be used to estimate some important statistical model error parameters such as regionally averaged variances or characteristic correlation length scales. The advantage of the single-sample approach is that it does not rely on any assumptions about the temporal behavior of the covariance parameters: time-dependent parameter estimates can be continuously adjusted on the basis of current observations. This is of practical importance since it is likely to be the case that both model error and observation error strongly depend on the actual state of the atmosphere. The single-sample estimation scheme can be incorporated into any four-dimensional statistical data assimilation system that involves explicit calculation of forecast error covariances, including optimal interpolation (OI) and the simplified Kalman filter (SKF). The computational cost of the scheme is high but not prohibitive; on-line estimation of one or two covariance parameters in each analysis box of an operational boxed-OI system is currently feasible. A number of numerical experiments performed with an adaptive SKF and an adaptive version of OI, using a linear two-dimensional shallow-water model and artificially generated model error, are described. The performance of the nonadaptive versions of these methods turns out to depend rather strongly on correct specification of model error parameters. These parameters are estimated under a variety of conditions, including uniformly distributed model error and time-dependent model error statistics.
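The single-batch maximum-likelihood idea can be illustrated with a scalar toy problem. In the sketch below (all variances and the innovation model are invented stand-ins, not the operational system), one model-error variance q is tuned so that a batch of innovations d = y - Hx_f is statistically consistent with its theoretical variance S(q):

```python
# Hedged sketch of single-batch ML estimation of one covariance parameter.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
m = 500                                    # one batch of simultaneous observations
Pb, R, q_true = 1.0, 0.5, 0.8              # background, observation, model-error var
d = rng.normal(0.0, np.sqrt(Pb + q_true + R), size=m)   # innovations

def neg_log_lik(q):
    s = Pb + q + R                         # innovation variance S(q)
    return 0.5 * m * np.log(s) + 0.5 * np.sum(d**2) / s

qhat = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded").x
print(f"estimated model-error variance: {qhat:.2f} (truth {q_true})")
```

Because each batch is fit independently, nothing in this construction assumes the parameter is constant in time, which is the point made in the abstract.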
Analysis of filter tuning techniques for sequential orbit determination
NASA Technical Reports Server (NTRS)
Lee, T.; Yee, C.; Oza, D.
1995-01-01
This paper examines filter tuning techniques for a sequential orbit determination (OD) covariance analysis. Recently, there has been a renewed interest in sequential OD, primarily due to the successful flight qualification of the Tracking and Data Relay Satellite System (TDRSS) Onboard Navigation System (TONS) using Doppler data extracted onboard the Extreme Ultraviolet Explorer (EUVE) spacecraft. TONS computes highly accurate orbit solutions onboard the spacecraft in real time using a sequential filter. As a result of the successful TONS-EUVE flight qualification experiment, the Earth Observing System (EOS) AM-1 Project has selected TONS as the prime navigation system. In addition, sequential OD methods can be used successfully for ground OD. Whether data are processed onboard or on the ground, a sequential OD procedure is generally favored over a batch technique when a real-time automated OD system is desired. Recently, OD covariance analyses were performed for the TONS-EUVE and TONS-EOS missions using the sequential processing options of the Orbit Determination Error Analysis System (ODEAS). ODEAS is the primary covariance analysis system used by the Goddard Space Flight Center (GSFC) Flight Dynamics Division (FDD). The results of these analyses revealed a high sensitivity of the OD solutions to the state process noise filter tuning parameters. The covariance analysis results show that the state estimate error contributions from measurement-related error sources, especially those due to the random noise and satellite-to-satellite ionospheric refraction correction errors, increase rapidly as the state process noise increases. These results prompted an in-depth investigation of the role of the filter tuning parameters in sequential OD covariance analysis. This paper analyzes how the spacecraft state estimate errors due to dynamic and measurement-related error sources are affected by the process noise level used. This information is then used to establish guidelines for determining optimal filter tuning parameters in a given sequential OD scenario for both covariance analysis and actual OD. Comparisons are also made with corresponding definitive OD results available from the TONS-EUVE analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palmiotti, Giuseppe; Salvatores, Massimo
2014-04-01
The Working Party on International Nuclear Data Evaluation Cooperation (WPEC) of the Nuclear Science Committee under the Nuclear Energy Agency (NEA/OECD) established a Subgroup (called “Subgroup 33”) in 2009 on “Methods and issues for the combined use of integral experiments and covariance data.” The first stage was devoted to producing the description of different adjustment methodologies and assessing their merits. A detailed document related to this first stage has been issued. Nine leading organizations (often with a long and recognized expertise in the field) have contributed: ANL, CEA, INL, IPPE, JAEA, JSI, NRG, IRSN and ORNL. In the second stage, a practical benchmark exercise was defined in order to test the reliability of the nuclear data adjustment methodology. A comparison of the results obtained by the participants and major lessons learned in the exercise are discussed in the present paper, which summarizes individual contributions that often include several original developments not reported separately. The paper provides the analysis of the most important results of the adjustment of the main nuclear data of 11 major isotopes in a 33-group energy structure. This benchmark exercise was based on a set of 20 well-defined integral parameters from 7 fast assembly experiments. The exercise showed that, using a common shared set of integral experiments but different starting evaluated libraries and/or different covariance matrices, there is a good convergence of trends for adjustments. Moreover, a significant reduction of the original uncertainties is often observed. Using the a posteriori covariance data, there is a strong reduction of the uncertainties of integral parameters for reference reactor designs, mainly due to the new correlations in the a posteriori covariance matrix. Furthermore, criteria have been proposed and applied to verify the consistency of the differential and integral data used in the adjustment. Finally, recommendations are given for an appropriate use of sensitivity analysis methods and indications for future work are provided.
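The generalized least-squares update at the core of such an adjustment is compact. The sketch below uses toy dimensions and synthetic sensitivities (the 33-group, 20-parameter sizes mirror the exercise, but the numbers are invented) to show how integral experiments shrink the prior covariance:

```python
# Hedged sketch of a GLS nuclear-data adjustment: prior x0 with covariance M
# is updated with integral experiments y (covariance V) via sensitivities S.
import numpy as np

n_sigma, n_exp = 33, 20                       # 33-group data, 20 integral parameters
rng = np.random.default_rng(0)
S = rng.normal(size=(n_exp, n_sigma))         # sensitivity matrix dy/dx (toy)
M = np.eye(n_sigma) * 0.05**2                 # prior covariance (5%, uncorrelated)
V = np.eye(n_exp) * 0.02**2                   # experimental covariance
x0 = np.zeros(n_sigma)                        # prior (relative) adjustments
y = rng.normal(0.0, 0.03, size=n_exp)         # C/E - 1 discrepancies (toy)

G = M @ S.T @ np.linalg.inv(S @ M @ S.T + V)  # gain
x_post = x0 + G @ (y - S @ x0)                # adjusted data
M_post = M - G @ S @ M                        # a posteriori covariance
print("average uncertainty reduction factor:",
      round(np.sqrt(np.diag(M_post)).mean() / np.sqrt(np.diag(M)).mean(), 3))
```

The off-diagonal terms that appear in M_post are exactly the new correlations the abstract credits with the strong reduction of integral-parameter uncertainties.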
Gosho, Masahiko; Hirakawa, Akihiro; Noma, Hisashi; Maruo, Kazushi; Sato, Yasunori
2017-10-01
In longitudinal clinical trials, some subjects will drop out before completing the trial, so their measurements towards the end of the trial are not obtained. Mixed-effects model for repeated measures (MMRM) analyses with an "unstructured" (UN) covariance structure are increasingly common as the primary analysis for group comparisons in these trials. Furthermore, model-based covariance estimators have been routinely used for testing the group difference and estimating confidence intervals of the difference in MMRM analyses using the UN covariance. However, MMRM analysis with the UN covariance can lead to convergence problems in the numerical optimization, especially in trials with a small sample size. Although the so-called sandwich covariance estimator is robust to misspecification of the covariance structure, its performance deteriorates in small-sample settings. We investigated, through a simulation study, the performance of the sandwich covariance estimator and of the covariance estimators adjusted for small-sample bias proposed by Kauermann and Carroll (J Am Stat Assoc 2001; 96: 1387-1396) and Mancl and DeRouen (Biometrics 2001; 57: 126-134) when fitting simpler covariance structures. In terms of the type I error rate and the coverage probability of confidence intervals, Mancl and DeRouen's covariance estimator with compound symmetry, first-order autoregressive (AR(1)), heterogeneous AR(1), and antedependence structures performed better than the original sandwich estimator and Kauermann and Carroll's estimator with these structures in the scenarios where the variance increased across visits. The performance based on Mancl and DeRouen's estimator with these structures was nearly equivalent to that based on the Kenward-Roger method for adjusting the standard errors and degrees of freedom with the UN structure. The model-based covariance estimator with the UN structure without adjustment of the degrees of freedom, which is frequently used in applications, resulted in substantial inflation of the type I error rate. We recommend the use of Mancl and DeRouen's estimator in MMRM analyses if the number of subjects completing is (n + 5) or fewer, where n is the number of planned visits; otherwise, Kenward and Roger's method with the UN structure is preferable.
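To make the Mancl-DeRouen correction concrete, the sketch below implements it for a simple clustered working model (an ordinary least-squares fit with an identity working covariance, which is only a stand-in for the full MMRM machinery; the data-generating model with variance increasing across visits is invented for illustration). The correction inflates each subject's residual by (I - H_i)^{-1} before forming the sandwich "meat":

```python
# Hedged sketch of the Mancl-DeRouen small-sample-corrected sandwich estimator.
import numpy as np

def md_sandwich(X_blocks, y_blocks):
    """X_blocks/y_blocks: per-subject design matrices and response vectors."""
    X = np.vstack(X_blocks)
    y = np.concatenate(y_blocks)
    B = X.T @ X                                   # bread (identity working covariance)
    beta = np.linalg.solve(B, X.T @ y)
    Binv = np.linalg.inv(B)
    meat = np.zeros_like(B)
    for Xi, yi in zip(X_blocks, y_blocks):
        ei = yi - Xi @ beta
        Hi = Xi @ Binv @ Xi.T                     # subject-level leverage
        ai = np.linalg.solve(np.eye(len(yi)) - Hi, ei)   # (I - H_i)^{-1} e_i
        meat += np.outer(Xi.T @ ai, Xi.T @ ai)
    return beta, Binv @ meat @ Binv               # estimate and MD covariance

# toy use: 20 subjects, 4 visits, intercept + group + visit effects
rng = np.random.default_rng(3)
Xb, yb = [], []
for i in range(20):
    Xi = np.column_stack([np.ones(4), np.full(4, i % 2), np.arange(4)])
    Xb.append(Xi)
    # residual variance increases across visits, as in the simulation scenarios
    yb.append(Xi @ np.array([1.0, 0.5, 0.2]) + rng.normal(0, 1 + 0.3 * np.arange(4)))
beta, V = md_sandwich(Xb, yb)
print(f"group-effect estimate {beta[1]:.2f}, MD standard error {np.sqrt(V[1, 1]):.2f}")
```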
Evaluation of digital soil mapping approaches with large sets of environmental covariates
NASA Astrophysics Data System (ADS)
Nussbaum, Madlene; Spiess, Kay; Baltensweiler, Andri; Grob, Urs; Keller, Armin; Greiner, Lucie; Schaepman, Michael E.; Papritz, Andreas
2018-01-01
The spatial assessment of soil functions requires maps of basic soil properties. Unfortunately, these are either missing for many regions or are not available at the desired spatial resolution or down to the required soil depth. The field-based generation of large soil datasets and conventional soil maps remains costly. Meanwhile, legacy soil data and comprehensive sets of spatial environmental data are available for many regions. Digital soil mapping (DSM) approaches relating soil data (responses) to environmental data (covariates) face the challenge of building statistical models from large sets of covariates originating, for example, from airborne imaging spectroscopy or multi-scale terrain analysis. We evaluated six approaches for DSM in three study regions in Switzerland (Berne, Greifensee, ZH forest) by mapping the effective soil depth available to plants (SD), pH, soil organic matter (SOM), effective cation exchange capacity (ECEC), clay, silt, gravel content and fine fraction bulk density for four soil depths (totalling 48 responses). Models were built from 300-500 environmental covariates by selecting linear models through (1) grouped lasso and (2) an ad hoc stepwise procedure for robust external-drift kriging (georob). For (3) geoadditive models we selected penalized smoothing spline terms by component-wise gradient boosting (geoGAM). We further used two tree-based methods: (4) boosted regression trees (BRTs) and (5) random forest (RF). Lastly, we computed (6) weighted model averages (MAs) from the predictions obtained from methods 1-5. Lasso, georob and geoGAM successfully selected strongly reduced sets of covariates (3-6 % of all covariates). Differences in predictive performance, tested on independent validation data, were mostly small and did not reveal a single best method across the 48 responses. Nevertheless, RF was often the best among methods 1-5 (28 of 48 responses) but was outcompeted by MA for 14 of these 28 responses. RF tended to over-fit the data. The performance of BRT was slightly worse than that of RF. GeoGAM performed poorly on some responses and was the best for only 7 of 48 responses. The prediction accuracy of lasso was intermediate. All models generally had small bias. Only the computationally very efficient lasso had slightly larger bias because it tended to under-fit the data. In summary, although differences were small, the frequencies of best and worst performance clearly favoured RF if a single method is applied, and MA if multiple prediction models can be developed.
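As a rough sketch of methods (1), (5) and (6) above (not the authors' pipeline; the synthetic data, the sizes, and the inverse-MSE weighting rule are all illustrative assumptions), the fragment below fits a lasso and a random forest to a wide covariate matrix and averages their predictions:

```python
# Hedged sketch: lasso and random forest on many covariates, then a
# validation-error-weighted model average of the two predictions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=400, n_features=300, n_informative=15,
                       noise=5.0, random_state=0)       # ~300 covariates, few useful
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

lasso = LassoCV(cv=5).fit(X_tr, y_tr)                   # selects a small subset
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

preds = [m.predict(X_val) for m in (lasso, rf)]
w = np.array([1.0 / mean_squared_error(y_val, p) for p in preds])
w /= w.sum()                                            # inverse-MSE weights
ma = w[0] * preds[0] + w[1] * preds[1]                  # weighted model average
print("lasso kept", np.sum(lasso.coef_ != 0), "covariates; MA weights:", w.round(2))
```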
NASA Mars rover: a testbed for evaluating applications of covariance intersection
NASA Astrophysics Data System (ADS)
Uhlmann, Jeffrey K.; Julier, Simon J.; Kamgar-Parsi, Behzad; Lanzagorta, Marco O.; Shyu, Haw-Jye S.
1999-07-01
The Naval Research Laboratory (NRL) has spearheaded the development and application of Covariance Intersection (CI) for a variety of decentralized data fusion problems. Such problems include distributed control, onboard sensor fusion, and dynamic map building and localization. In this paper we describe NRL's development of a CI-based navigation system for the NASA Mars rover that stresses almost all aspects of decentralized data fusion. We also describe how this project relates to NRL's augmented reality, advanced visualization, and REBOT projects.
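The CI update itself is compact enough to show directly. Below is a minimal sketch of the standard covariance intersection rule, C^{-1} = w A^{-1} + (1 - w) B^{-1}, with the weight w chosen to minimize the trace of the fused covariance; the toy states and covariances are invented for illustration and are unrelated to the rover system:

```python
# Minimal sketch of Covariance Intersection: fuse two estimates whose
# cross-correlation is unknown, guaranteeing a consistent fused covariance.
import numpy as np
from scipy.optimize import minimize_scalar

def ci_fuse(a, A, b, B):
    Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
    trace = lambda w: np.trace(np.linalg.inv(w * Ai + (1 - w) * Bi))
    w = minimize_scalar(trace, bounds=(0.0, 1.0), method="bounded").x
    C = np.linalg.inv(w * Ai + (1 - w) * Bi)             # fused covariance
    c = C @ (w * Ai @ a + (1 - w) * Bi @ b)              # fused state
    return c, C, w

a, A = np.array([1.0, 2.0]), np.diag([1.0, 4.0])
b, B = np.array([1.5, 1.0]), np.diag([4.0, 1.0])
c, C, w = ci_fuse(a, A, b, B)
print("fused state:", c.round(2), " omega:", round(w, 2))
```

Because CI never assumes independence of the two inputs, it is safe for decentralized fusion networks where the same information may arrive via multiple paths.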
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
Huang, Chiung-Yu; Qin, Jing; Follmann, Dean A.
2012-01-01
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. PMID:23843659
NASA Astrophysics Data System (ADS)
Horiuti, Kiyosi; Suzuki, Shu
2014-11-01
We study the elongation process of polymers released in Newtonian homogeneous isotropic turbulence by connecting a mesoscopic description of an ensemble of elastic dumbbells, using Brownian dynamics simulation (BDS), to the macroscopic description of the fluid using DNS. The dumbbells are allowed to be advected non-affinely with the macroscopically imposed deformation. Drag reduction is more drastic when non-affinity is maximal than in the completely affine case. In the former case the dumbbell is convected as a covariant vector, and in the latter as a contravariant vector. We derive the exact solution of the governing equation for the motion of the dumbbells. The maximum stretching of a dumbbell is achieved when it aligns with the direction of vorticity in the contravariant case, and when it points outward, perpendicular to the vortex sheet, in the covariant case. The alignments in the BDS-DNS data agree with the theoretical results. In a mixture of contravariant and covariant dumbbells, the covariant dumbbells are transversely aligned with the contravariant dumbbells. Compared with the cases without mixture, stretching of the covariant dumbbells is enhanced, while that of the contravariant dumbbells is reduced. An application of this phenomenon is discussed.
Propensity score method: a non-parametric technique to reduce model dependence
2017-01-01
Propensity score analysis (PSA) is a powerful technique that balances pretreatment covariates, making causal inference from observational data as reliable as possible. The use of PSA in the medical literature has increased exponentially in recent years, and the trend continues to rise. This article introduces the rationale behind PSA and then illustrates how to perform PSA in R with the MatchIt package. A variety of methods are available for PS matching, such as nearest neighbor, full matching, exact matching and genetic matching. The task can be done simply by assigning a string value to the method argument of the matchit() function. The generic summary() and plot() functions can be applied to an object of class matchit to check covariate balance after matching. Furthermore, the useful package PSAgraphics contains several graphical functions for checking covariate balance between treatment groups across strata. If covariate balance is not achieved, one can modify the model specification or use other techniques, such as random forests and recursive partitioning, to better represent the underlying structure between pretreatment covariates and treatment assignment. The process can be repeated until the desired covariate balance is achieved. PMID:28164092
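The article itself works in R with MatchIt; as a hedged Python analog (synthetic data, greedy 1:1 nearest-neighbour matching without replacement, and a standardized-mean-difference balance check are all assumptions of this sketch, not the article's code), the core mechanics look like this:

```python
# Hedged sketch: propensity-score estimation, greedy nearest-neighbour
# matching, and a covariate-balance check via standardized mean differences.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                         # pretreatment covariates
p = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))    # treatment depends on X
treat = rng.binomial(1, p)

ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]
treated, controls = np.where(treat == 1)[0], np.where(treat == 0)[0]

matches, used = [], set()
for i in treated:                                   # match each treated unit
    j = min((c for c in controls if c not in used),
            key=lambda c: abs(ps[c] - ps[i]), default=None)
    if j is not None:
        matches.append((i, j)); used.add(j)

idx = np.array(matches)
smd = (X[idx[:, 0]].mean(0) - X[idx[:, 1]].mean(0)) / X.std(0)
print("standardized mean differences after matching:", smd.round(3))
```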
Convex Banding of the Covariance Matrix
Bien, Jacob; Bunea, Florentina; Xiao, Luo
2016-01-01
We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. PMID:28042189
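The paper's estimator is the solution of a convex program; for intuition only, the sketch below shows the simplest relative, plain banding of the sample covariance (entries more than k off the diagonal set to zero), applied to an AR(1) truth that is only close to banded, which is exactly the regime the abstract discusses. This is not the authors' convex banding estimator.

```python
# Hedged sketch: plain banding of the sample covariance matrix.
import numpy as np

def band(S, k):
    """Zero out entries more than k off the main diagonal."""
    idx = np.arange(S.shape[0])
    return S * (np.abs(np.subtract.outer(idx, idx)) <= k)

rng = np.random.default_rng(0)
p, n = 30, 200
idx = np.arange(p)
true = 0.6 ** np.abs(np.subtract.outer(idx, idx))   # AR(1): close to banded
X = rng.multivariate_normal(np.zeros(p), true, size=n)
S = np.cov(X, rowvar=False)

for k in (2, 5, 10, p - 1):                          # k = p-1 is the raw estimate
    err = np.linalg.norm(band(S, k) - true, ord=2)
    print(f"bandwidth {k:2d}: operator-norm error {err:.3f}")
```

An intermediate bandwidth typically beats the raw sample covariance here; the convex formulation in the paper chooses the taper adaptively rather than by a fixed k.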
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
NASA Astrophysics Data System (ADS)
Nunnenmann, Elena; Fischer, Ulrich; Stieglitz, Robert
2017-09-01
An uncertainty analysis was performed for the tritium breeding ratio (TBR) of a fusion power plant of the European DEMO type using the MCSEN patch to the MCNP Monte Carlo code. The breeding blanket was of the Helium Cooled Pebble Bed (HCPB) type, currently under development in the European Power Plant Physics and Technology (PPPT) programme for a fusion power demonstration reactor (DEMO). A suitable 3D model of the DEMO reactor with HCPB blanket modules, as routinely used for blanket design calculations, was employed. The nuclear cross-section data were taken from the JEFF-3.2 data library. For the uncertainty analysis, the isotopes H-1, Li-6, Li-7, Be-9, O-16, Si-28, Si-29, Si-30, Cr-52, Fe-54, Fe-56, Ni-58, W-182, W-183, W-184 and W-186 were considered. The covariance data were taken from JEFF-3.2 where available; otherwise, a combination of FENDL-2.1 for Li-7, EFF-3 for Be-9 and JENDL-3.2 for O-16 was used, and the results were compared with those obtained using TENDL-2014 covariance data. Another comparison was performed with covariance data from JEFF-3.3T1. The analyses show an overall uncertainty of ±3.2% for the TBR when using JEFF-3.2 covariance data with the mentioned additions. When using TENDL-2014 covariance data as a replacement, the uncertainty increases to ±8.6%. For JEFF-3.3T1 the uncertainty is ±5.6%. The uncertainty is dominated by the O-16, Li-6 and Li-7 cross-sections.
A general regression framework for a secondary outcome in case-control studies.
Tchetgen Tchetgen, Eric J
2014-01-01
Modern case-control studies typically involve the collection of data on a large number of outcomes, often at considerable logistical and monetary expense. These data are of potentially great value to subsequent researchers, who, although not necessarily concerned with the disease that defined the case series in the original study, may want to use the available information for a regression analysis involving a secondary outcome. Because cases and controls are selected with unequal probability, regression analysis involving a secondary outcome generally must acknowledge the sampling design. In this paper, the author presents a new framework for the analysis of secondary outcomes in case-control studies. The approach is based on a careful re-parameterization of the conditional model for the secondary outcome given the case-control outcome and regression covariates, in terms of (a) the population regression of interest of the secondary outcome given covariates and (b) the population regression of the case-control outcome on covariates. The error distribution for the secondary outcome given covariates and case-control status is otherwise unrestricted. For a continuous outcome, the approach sometimes reduces to extending model (a) by including a residual of (b) as a covariate. However, the framework is general in the sense that models (a) and (b) can take any functional form, and the methodology allows for an identity, log or logit link function for model (a).
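The continuous-outcome special case mentioned at the end is easy to demonstrate. In the sketch below (variable names, effect sizes, and the use of statsmodels are illustrative assumptions, not the author's implementation), the case-control regression (b) is fit first and its residual is added as an extra covariate to the secondary-outcome regression (a):

```python
# Hedged sketch: extend the secondary-outcome model with the residual from a
# regression of case-control status on the covariates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x - 1))))   # case-control outcome
y = 1.0 + 0.3 * x + 0.8 * d + rng.normal(size=n)        # secondary outcome

fit_b = sm.Logit(d, sm.add_constant(x)).fit(disp=0)     # model (b)
resid_b = d - fit_b.predict(sm.add_constant(x))

X_a = sm.add_constant(np.column_stack([x, resid_b]))    # model (a) + residual term
fit_a = sm.OLS(y, X_a).fit()
print("intercept, x-slope, residual term:", fit_a.params.round(3))
```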
Addressing data privacy in matched studies via virtual pooling.
Saha-Chaudhuri, P; Weinberg, C R
2017-09-07
Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. While a single dataset containing covariate information for all participants would be ideal for straightforward analysis, confidentiality restrictions forbid its creation. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data and, since individual participant data are not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. The parameter estimates from the standard conditional logistic regression are compared to the estimates based on a conditional logistic regression model with aggregated data. The parameter estimates are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. Virtual data pooling can be used to maintain confidentiality of data from a multi-center study and can be particularly useful in research with large-scale distributed data.
Adapting Covariance Propagation to Account for the Presence of Modeled and Unmodeled Maneuvers
NASA Technical Reports Server (NTRS)
Schiff, Conrad
2006-01-01
This paper explores techniques that can be used to adapt the standard linearized propagation of an orbital covariance matrix to the case where there is a maneuver and an associated execution uncertainty. A Monte Carlo technique is used to construct a final orbital covariance matrix for a 'prop-burn-prop' process that takes into account initial state uncertainty and execution uncertainties in the maneuver magnitude. This final orbital covariance matrix is regarded as 'truth' and comparisons are made with three methods using modified linearized covariance propagation. The first method accounts for the maneuver by modeling its nominal effect within the state transition matrix but excludes the execution uncertainty by omitting a process noise matrix from the computation. The second method does not model the maneuver but includes a process noise matrix to account for the uncertainty in its magnitude. The third method, which is essentially a hybrid of the first two, includes the nominal portion of the maneuver via the state transition matrix and uses a process noise matrix to account for the magnitude uncertainty. The first method is unable to produce the final orbit covariance except in the case of zero maneuver uncertainty. The second method yields good accuracy for the final covariance matrix but fails to model the final orbital state accurately. Agreement between the simulated covariance data produced by this method and the Monte Carlo truth data fell within 0.5-2.5 percent over a range of maneuver sizes that span two orders of magnitude (0.1-20 m/s). The third method, which yields a combination of good accuracy in the computation of the final covariance matrix and correct accounting for the presence of the maneuver in the nominal orbit, is the best method for applications involving the computation of times of closest approach and the corresponding probability of collision, PC. However, applications for the two other methods exist and are briefly discussed. Although the process model ("prop-burn-prop") that was studied is very simple - point-mass gravitational effects due to the Earth combined with an impulsive delta-V in the velocity direction for the maneuver - generalizations to more complex scenarios, including high fidelity force models, finite duration maneuvers, and maneuver pointing errors, are straightforward and are discussed in the conclusion.
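The hybrid (third) method reduces to a familiar recursion. The sketch below uses a deliberately tiny 1-D, two-state (position, velocity) model rather than the paper's orbital dynamics: the covariance is coasted with P <- Phi P Phi^T, and at the impulsive burn the nominal delta-V shifts only the mean state while the execution uncertainty enters as process noise:

```python
# Hedged sketch of "prop-burn-prop" covariance propagation with maneuver
# execution uncertainty folded in as process noise at the burn epoch.
import numpy as np

dt = 60.0
Phi = np.array([[1.0, dt],             # toy constant-velocity state transition
                [0.0, 1.0]])
P = np.diag([100.0**2, 0.1**2])        # initial position/velocity covariance

def coast(P, steps):
    """Linear covariance propagation with no process noise."""
    for _ in range(steps):
        P = Phi @ P @ Phi.T
    return P

sigma_dv = 0.02                         # 2 cm/s (1-sigma) execution uncertainty
Q_burn = np.diag([0.0, sigma_dv**2])    # magnitude error hits the velocity state

P = coast(P, 10)                        # prop: coast to the burn
P = P + Q_burn                          # burn: to first order the nominal dV maps
                                        # covariance through the identity; only the
                                        # execution noise changes P
P = coast(P, 10)                        # prop: coast after the burn
print(f"final position sigma: {np.sqrt(P[0, 0]):.1f} m")
```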
Corrected score estimation in the proportional hazards model with misclassified discrete covariates
Zucker, David M.; Spiegelman, Donna
2013-01-01
We consider Cox proportional hazards regression when the covariate vector includes error-prone discrete covariates along with error-free covariates, which may be discrete or continuous. The misclassification in the discrete error-prone covariates is allowed to be of any specified form. Building on the work of Nakamura and his colleagues, we present a corrected score method for this setting. The method can handle all three major study designs (internal validation design, external validation design, and replicate measures design), both functional and structural error models, and time-dependent covariates satisfying a certain ‘localized error’ condition. We derive the asymptotic properties of the method and indicate how to adjust the covariance matrix of the regression coefficient estimates to account for estimation of the misclassification matrix. We present the results of a finite-sample simulation study under Weibull survival with a single binary covariate having known misclassification rates. The performance of the method described here was similar to that of related methods we have examined in previous works. Specifically, our new estimator performed as well as or, in a few cases, better than the full Weibull maximum likelihood estimator. We also present simulation results for our method for the case where the misclassification probabilities are estimated from an external replicate measures study. Our method generally performed well in these simulations. The new estimator has a broader range of applicability than many other estimators proposed in the literature, including those described in our own earlier work, in that it can handle time-dependent covariates with an arbitrary misclassification structure. We illustrate the method on data from a study of the relationship between dietary calcium intake and distal colon cancer. PMID:18219700
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pritychenko, B., E-mail: pritychenko@bnl.gov
Spectrum-averaged cross sections and their uncertainties have been calculated for ENDF materials in nuclear astrophysics and californium fission neutron fields. Absolute values were deduced with Maxwellian and Mannhart spectra, while uncertainties are based on ENDF/B-VII.1, JEFF-3.1.2, JENDL-4.0 and Low-Fidelity covariances. These quantities are compared with available data, independent benchmarks and the EXFOR library, and are analyzed for a wide range of cases. Recommendations for neutron cross section covariances are given and implications are discussed.
Beason, Melissa; Smith, Christopher; Coffaro, Joseph; Belichki, Sara; Spychalsky, Jonathon; Titus, Franklin; Crabbs, Robert; Andrews, Larry; Phillips, Ronald
2018-06-01
Experimental measurements were recently made that displayed characteristics of plane-wave propagation through anisotropic optical turbulence. A near-plane-wave beam was propagated over distances of 1 and 2 km at a height of 2 m above the concrete runway at the Shuttle Landing Facility, Kennedy Space Center, Florida, during January and February of 2017. The spatial-temporal fluctuations of the beam were recorded, and the covariance of intensity was calculated. These data sets were compared to a theoretical calculation of the covariance of intensity for a plane wave.
Filter Tuning Using the Chi-Squared Statistic
NASA Technical Reports Server (NTRS)
Lilly-Salkowski, Tyler
2017-01-01
The Goddard Space Flight Center (GSFC) Flight Dynamics Facility (FDF) performs orbit determination (OD) for the Aqua and Aura satellites. Both satellites are located in low Earth orbit (LEO) and are part of what is considered the A-Train satellite constellation, and both spacecraft are currently in the science phase of their respective missions. The FDF has recently been tasked with delivering definitive covariance for each satellite. The primary OD tool used for these missions is the Orbit Determination Toolkit (ODTK) developed by Analytical Graphics Inc. (AGI). This software uses an Extended Kalman Filter (EKF) to estimate the states of both spacecraft. The filter incorporates force modelling along with ground station and space network measurements to determine the spacecraft states, and it also generates a covariance at each measurement. This covariance is useful for evaluating the overall performance of the tracking data measurements and the filter itself. An accurate covariance is also useful for covariance propagation, which is utilized in collision avoidance operations, and is valuable when attempting to determine whether the current orbital solution will meet mission requirements in the future. This paper examines the use of the Chi-squared statistic as a means of evaluating filter performance. The Chi-squared statistic is calculated to determine the realism of a covariance based on the prediction accuracy and the covariance values at a given point in time. Once calculated, it is the distribution of this statistic that provides insight into the accuracy of the covariance. For the EKF to correctly calculate the covariance, the error models associated with the tracking data measurements must be accurately tuned. Overestimating or underestimating these error values can have detrimental effects on the overall filter performance. The filter incorporates ground station measurements, which can be tuned based on the accuracy of the individual ground stations, as well as measurements from the NASA space network (SN), which can be affected by the assumed accuracy of the TDRS satellite state at the time of the measurement. The force modelling in the EKF is also an important factor affecting the propagation accuracy and covariance sizing. The dominant perturbation in the LEO regime is atmospheric drag, so accurate accounting of the drag force is especially important for the accuracy of the propagated state. The implementation of a box-and-wing model to improve drag estimation accuracy, and its overall effect on the covariance, is explored. The process of tuning the EKF for Aqua and Aura support is described, including examination of the measurement errors of the available observation types (Doppler and range) and methods of dealing with potentially volatile atmospheric drag modeling. Predictive accuracy and the distribution of the Chi-squared statistic, calculated based on the ODTK EKF solutions, are assessed against accepted norms for the orbit regime.
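The realism test described here can be written down in a few lines. As a hedged sketch (synthetic errors and a diagonal covariance stand in for real filter output), the statistic e^T P^{-1} e for a k-dimensional state should follow a chi-squared distribution with k degrees of freedom whenever the reported covariance P is realistic:

```python
# Hedged sketch of the chi-squared covariance-realism check.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 1000, 6                                   # samples, state dimension
P = np.diag([1e2, 1e2, 1e2, 1e-2, 1e-2, 1e-2])   # filter-reported covariance (toy)

e = rng.multivariate_normal(np.zeros(k), P, size=n)   # errors consistent with P
chi2 = np.einsum("ij,jk,ik->i", e, np.linalg.inv(P), e)

# If P is realistic, the mean is ~k and the empirical distribution matches
# the chi-squared CDF; a systematically larger statistic flags an optimistic P.
print(f"mean statistic: {chi2.mean():.2f}  (expected {k})")
print(f"KS p-value vs chi2({k}): {stats.kstest(chi2, 'chi2', args=(k,)).pvalue:.3f}")
```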
NASA Astrophysics Data System (ADS)
Müller, Silvia; Brockmann, Jan Martin; Schuh, Wolf-Dieter
2015-04-01
The ocean's dynamic topography, the difference between the sea surface and the geoid, reflects many characteristics of the general ocean circulation. Consequently, it provides valuable information for evaluating or tuning ocean circulation models. The sea surface is directly observed by satellite radar altimetry, while the geoid cannot be observed directly. Satellite-based gravity field determination requires different measurement principles (satellite-to-satellite tracking (e.g. GRACE), satellite gravity gradiometry (GOCE)). In addition, hydrographic measurements (salinity, temperature and pressure; near-surface velocities) provide information on the dynamic topography. The observation types have different representations and spatial as well as temporal resolutions. Therefore, the determination of the dynamic topography is not straightforward. Furthermore, the integration of the dynamic topography into ocean circulation models requires not only the dynamic topography itself but also its inverse covariance matrix on the ocean model grid. We developed a rigorous combination method in which the dynamic topography is parameterized in space as well as in time. The altimetric sea surface heights are expressed as a sum of geoid heights, represented in terms of spherical harmonics, and the dynamic topography, parameterized by a finite element method which can be directly related to the particular ocean model grid. Besides the difficult task of combining altimetry data with a gravity field model, a major aspect is the consistent combination of satellite data and in-situ observations. The particular characteristics and the signal content of the different observations must be adequately considered, requiring the introduction of auxiliary parameters. Within our model the individual observation groups are combined in terms of normal equations considering their full covariance information; i.e. a rigorous variance/covariance propagation from the original measurements to the final product is accomplished. In conclusion, the developed integrated approach allows for estimating the dynamic topography and its inverse covariance matrix on arbitrary grids in space and time. The inverse covariance matrix contains the appropriate weights for model-data misfits in least-squares ocean model inversions. The focus of this study is on the North Atlantic Ocean. We will present the conceptual design and dynamic topography estimates based on time-variable data from seven satellite altimeter missions (Jason-1, Jason-2, Topex/Poseidon, Envisat, ERS-2, GFO, Cryosat2) in combination with the latest GOCE gravity field model and in-situ data from the Argo floats and near-surface drifting buoys.
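Combining observation groups "in terms of normal equations" has a compact generic form. The sketch below is a toy stand-in (random designs and simple diagonal weights, nothing resembling the actual altimetry or gravity functionals): each group contributes A_i^T W_i A_i and A_i^T W_i y_i, and the inverse of the summed normals is the rigorously propagated parameter covariance:

```python
# Hedged sketch: combining heterogeneous observation groups via normal equations.
import numpy as np

rng = np.random.default_rng(0)
n_par = 10
x_true = rng.normal(size=n_par)

def group(n_obs, sigma):
    """One observation group: design matrix, data, inverse covariance."""
    A = rng.normal(size=(n_obs, n_par))
    y = A @ x_true + rng.normal(0, sigma, n_obs)
    W = np.eye(n_obs) / sigma**2
    return A, y, W

N, rhs = np.zeros((n_par, n_par)), np.zeros(n_par)
for A, y, W in (group(200, 0.1), group(50, 0.02)):   # e.g. satellite vs in-situ
    N += A.T @ W @ A                                 # accumulate normal equations
    rhs += A.T @ W @ y

x_hat = np.linalg.solve(N, rhs)
Cx = np.linalg.inv(N)                                # full parameter covariance
print(f"rms parameter error: {np.sqrt(np.mean((x_hat - x_true)**2)):.4f}")
```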
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least-squares method is nevertheless applied to such data, it produces a model that cannot represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on phytoplankton productivity, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.
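The MCD half of the comparison is available off the shelf in scikit-learn (TELBS is not in standard Python libraries, so only MCD is sketched here; the contaminated data are synthetic). The robust fit downweights outliers, and its Mahalanobis distances flag the observations that distort an ordinary fit:

```python
# Hedged sketch of the MCD idea: robust covariance estimation and
# outlier flagging via robust Mahalanobis distances.
import numpy as np
from sklearn.covariance import MinCovDet, EmpiricalCovariance

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[:10] += 8                                    # 10% contamination (outliers)

mcd = MinCovDet(random_state=0).fit(X)
emp = EmpiricalCovariance().fit(X)

d_mcd = mcd.mahalanobis(X)                     # robust squared distances
print("flagged as outliers:", int(np.sum(d_mcd > 9.21)))   # chi2(2) 99% cutoff
print("robust covariance:\n", mcd.covariance_.round(2))
print("classical covariance:\n", emp.covariance_.round(2))
```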
A trade-off solution between model resolution and covariance in surface-wave inversion
Xia, J.; Xu, Y.; Miller, R.D.; Zeng, C.
2010-01-01
Regularization is necessary for inversion of ill-posed geophysical problems. Appraisal of inverse models is essential for meaningful interpretation of these models. Because uncertainties are associated with regularization parameters, extra conditions are usually required to determine proper parameters for assessing inverse models. Commonly used techniques for assessment of a geophysical inverse model derived (generally iteratively) from a linear system are based on calculating the model resolution and the model covariance matrices. Because the model resolution and the model covariance matrices of the regularized solutions are controlled by the regularization parameter, direct assessment of inverse models using only the covariance matrix may provide incorrect results. To assess an inverted model, we use the concept of a trade-off between model resolution and covariance to find a proper regularization parameter with singular values calculated in the last iteration. We plot the singular values from large to small to form a singular value plot. A proper regularization parameter is normally the first singular value that approaches zero in the plot. With this regularization parameter, we obtain a trade-off solution between model resolution and model covariance in the vicinity of a regularized solution. The unit covariance matrix can then be used to calculate error bars of the inverse model at a resolution level determined by the regularization parameter. We demonstrate this approach with both synthetic and real surface-wave data. © 2010 Birkhäuser / Springer Basel AG.
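The trade-off can be illustrated with a truncated singular value decomposition. In the sketch below (a synthetic, deliberately ill-posed Jacobian; the cutoff rule is a simple threshold standing in for reading the singular value plot), truncation fixes the effective regularization, after which the model resolution matrix and the unit covariance follow directly:

```python
# Hedged sketch: singular-value truncation, model resolution, and unit covariance.
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(40, 20)) @ np.diag(np.exp(-0.5 * np.arange(20)))  # ill-posed
U, s, Vt = np.linalg.svd(G, full_matrices=False)

k = int(np.sum(s > 1e-3 * s[0]))              # keep values well above zero
Vk, sk = Vt[:k].T, s[:k]

R = Vk @ Vk.T                                  # model resolution matrix
C = Vk @ np.diag(1.0 / sk**2) @ Vk.T           # unit model covariance
print(f"kept {k} of {len(s)} singular values")
print("resolution diagonal (first 5):", np.diag(R)[:5].round(2))
print("error bars (first 5):", np.sqrt(np.diag(C))[:5].round(2))
```

Raising the cutoff improves the covariance (smaller error bars) at the cost of resolution, and vice versa, which is the trade-off the abstract exploits.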
Alterations in Anatomical Covariance in the Prematurely Born.
Scheinost, Dustin; Kwon, Soo Hyun; Lacadie, Cheryl; Vohr, Betty R; Schneider, Karen C; Papademetris, Xenophon; Constable, R Todd; Ment, Laura R
2017-01-01
Preterm (PT) birth results in long-term alterations in functional and structural connectivity, but the related changes in anatomical covariance are just beginning to be explored. To test the hypothesis that PT birth alters patterns of anatomical covariance, we investigated brain volumes of 25 PTs and 22 terms at young adulthood using magnetic resonance imaging. Using regional volumetrics, seed-based analyses, and whole brain graphs, we show that PT birth is associated with reduced volume in bilateral temporal and inferior frontal lobes, left caudate, left fusiform, and posterior cingulate for prematurely born subjects at young adulthood. Seed-based analyses demonstrate altered patterns of anatomical covariance for PTs compared with terms. PTs exhibit reduced covariance with R Brodmann area (BA) 47, Broca's area, and L BA 21, Wernicke's area, and white matter volume in the left prefrontal lobe, but increased covariance with R BA 47 and left cerebellum. Graph theory analyses demonstrate that measures of network complexity are significantly less robust in PTs compared with term controls. Volumes in regions showing group differences are significantly correlated with phonological awareness, the fundamental basis for reading acquisition, for the PTs. These data suggest both long-lasting and clinically significant alterations in the covariance in the PTs at young adulthood. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
ERIC Educational Resources Information Center
Poon, Wai-Yin; Wong, Yuen-Kwan
2004-01-01
This study uses a Cook's distance type diagnostic statistic to identify unusual observations in a data set that unduly influence the estimation of a covariance matrix. Similar to many other deletion-type diagnostic statistics, this proposed measure is susceptible to masking or swamping effect in the presence of several unusual observations. In…
Uncertainty in eddy covariance flux estimates resulting from spectral attenuation [Chapter 4
W. J. Massman; R. Clement
2004-01-01
Surface exchange fluxes measured by eddy covariance tend to be underestimated as a result of limitations in sensor design, signal processing methods, and finite flux-averaging periods. But, careful system design, modern instrumentation, and appropriate data processing algorithms can minimize these losses, which, if not too large, can be estimated and corrected using...
ERIC Educational Resources Information Center
Thompson, Bruce
The relationship between analysis of variance (ANOVA) methods and their analogs (analysis of covariance and multiple analyses of variance and covariance--collectively referred to as OVA methods) and the more general analytic case is explored. A small heuristic data set is used, with a hypothetical sample of 20 subjects, randomly assigned to five…
ERIC Educational Resources Information Center
Wiesner, Margit; Silbereisen, Rainer K.
2003-01-01
This longitudinal study examined individual, family, and peer covariates of distinctive trajectories of juvenile delinquency, using data from a community sample of 318 German adolescents (mean age at the first wave was 11.45 years). Latent growth mixture modelling analysis revealed four trajectory groups: high-level offenders, medium-level…
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
The utility of covariances: a response to Ranta et al
J.E. Houlahan; K. Cottenie; G.S. Cumming; D.J. Currie; C.S. Findlay; U. Gaedke; P. Legendre; J.J. Magnuson; B.H. McArdle; R.D. Stevens; I.P. Woiwod; S.M. Wondzell
2008-01-01
In an earlier publication (Houlahan et al. 2007) we reviewed trends in population covariances within communities across a range of long-term empirical data sets. We used these results to argue that compensatory dynamics are rare in natural communities. Ranta et al. (2008) explored interspecific interactions in a simulated environment and showed that 'negative...
ERIC Educational Resources Information Center
Noell, George H.
2004-01-01
A preliminary set of analyses was conducted linking students to courses and courses to teachers based upon data collected by the Louisiana Department of Education's Divisions of Planning, Analysis, and Information Resources and Student Standards and Assessments. An analysis of covariance, a weighted analysis of covariance, and a hierarchical…
Alvarez, Otto; Guo, Qinghua; Klinger, Robert C.; Li, Wenkai; Doherty, Paul
2013-01-01
Climate models may be limited in their inferential use if they cannot be locally validated or do not account for spatial uncertainty. Much of the focus has gone into determining which interpolation method is best suited for creating gridded climate surfaces, in which a covariate such as elevation (from a Digital Elevation Model, DEM) is often used to improve interpolation accuracy. One key question that little research has addressed is which covariate best improves that accuracy. In this study, a comprehensive evaluation was carried out to determine which covariates were most suitable for interpolating climatic variables (e.g. precipitation, mean temperature, minimum temperature, and maximum temperature). We compiled data for each climate variable from 1950 to 1999 from approximately 500 weather stations across the Western United States (32° to 49° latitude and −124.7° to −112.9° longitude). In addition, we examined the uncertainty of the interpolated climate surface. Specifically, Thin Plate Spline (TPS) was used as the interpolation method since it is one of the most popular interpolation techniques for generating climate surfaces. We considered several covariates, including DEM, slope, distance to coast (Euclidean distance), aspect, solar potential, radar, and two Normalized Difference Vegetation Index (NDVI) products derived from the Advanced Very High Resolution Radiometer (AVHRR) and the Moderate Resolution Imaging Spectroradiometer (MODIS). A tenfold cross-validation was applied to determine the uncertainty of the interpolation based on each covariate. In general, the leading covariate for precipitation was radar, while DEM was the leading covariate for maximum, mean, and minimum temperatures. A comparison to other products such as PRISM and WorldClim showed strong agreement across large geographic areas, but the climate surfaces generated in this study (ClimSurf) had greater variability in high-elevation regions, such as the Sierra Nevada Mountains.
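One way to fold a covariate into a TPS interpolation, sketched below with scipy's RBFInterpolator, is to treat elevation as a third input dimension alongside longitude and latitude; the synthetic stations, the lapse-rate data model, the covariate rescaling, and the smoothing value are all assumptions of this sketch, not the study's setup:

```python
# Hedged sketch: thin-plate-spline interpolation of temperature with elevation
# as an extra input dimension, assessed by tenfold cross-validation.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
n = 500
lon = rng.uniform(-124.7, -112.9, n)
lat = rng.uniform(32.0, 49.0, n)
elev = rng.uniform(0.0, 3000.0, n)
temp = 30 - 0.0065 * elev - 0.8 * (lat - 32) + rng.normal(0, 0.5, n)  # synthetic

pts = np.column_stack([lon, lat, elev / 1000.0])   # crude rescale of the covariate

folds = np.array_split(rng.permutation(n), 10)     # tenfold cross-validation
errs = []
for f in folds:
    tr = np.setdiff1d(np.arange(n), f)
    model = RBFInterpolator(pts[tr], temp[tr], kernel="thin_plate_spline",
                            smoothing=1.0)
    errs.append(np.abs(model(pts[f]) - temp[f]).mean())
print(f"10-fold MAE: {np.mean(errs):.2f} deg C")
```

Swapping the third column for slope, aspect, or an NDVI product and re-running the cross-validation is the kind of covariate comparison the study performs.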
NASA Astrophysics Data System (ADS)
Ghosh, S.; Lopez-Coto, I.; Prasad, K.; Karion, A.; Mueller, K.; Gourdji, S.; Martin, C.; Whetstone, J. R.
2017-12-01
The National Institute of Standards and Technology (NIST) supports the North-East Corridor Baltimore Washington (NEC-B/W) project and the Indianapolis Flux Experiment (INFLUX), aiming to quantify sources of greenhouse gas (GHG) emissions as well as their uncertainties. These projects employ different flux estimation methods, including top-down inversion approaches. The traditional Bayesian inversion method estimates emission distributions by updating prior information using atmospheric observations of GHGs coupled to an atmospheric transport and dispersion model. The magnitude of the update is dependent upon the observed enhancement along with the assumed errors, such as those associated with the prior information and the atmospheric transport and dispersion model. These errors are specified within the inversion covariance matrices. The assumed structure and magnitude of the specified errors can have a large impact on the emission estimates from the inversion. The main objective of this work is to build a data-adaptive model for these covariance matrices. We construct a synthetic data experiment using a Kalman Filter inversion framework (Lopez et al., 2017) employing different configurations of the transport and dispersion model and an assumed prior. Unlike previous traditional Bayesian approaches, we estimate posterior emissions using regularized sample covariance matrices associated with prior errors to investigate whether the structure of the matrices helps to better recover our hypothetical true emissions. To incorporate transport model error, we use an ensemble of transport models combined with a space-time analytical covariance to construct a covariance that accounts for errors in space and time. A Kalman Filter is then run using these covariances along with Maximum Likelihood Estimates (MLE) of the involved parameters. Preliminary results indicate that specifying spatio-temporally varying errors in the error covariances can improve the flux estimates and uncertainties. We also demonstrate that differences between the modeled and observed meteorology can be used to predict uncertainties associated with atmospheric transport and dispersion modeling, which can help improve the skill of an inversion at urban scales.
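A sample covariance built from a small model ensemble is rank deficient and needs regularization before it can enter a Kalman update. As one hedged illustration (Ledoit-Wolf shrinkage is a named stand-in technique, not necessarily the regularization used in this work, and the ensemble is synthetic):

```python
# Hedged sketch: regularizing an ensemble-based sample covariance by
# Ledoit-Wolf shrinkage so it stays well conditioned for inversion.
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
n_members, n_cells = 12, 100                  # small ensemble, many grid cells
ensemble = 2.0 * rng.normal(size=(n_members, n_cells))   # toy flux-error draws

raw = np.cov(ensemble, rowvar=False)          # rank-deficient (rank <= 11)
lw = LedoitWolf().fit(ensemble)

print("raw sample-covariance rank:", np.linalg.matrix_rank(raw))
print(f"shrinkage intensity: {lw.shrinkage_:.2f}")
print(f"shrunk covariance condition number: {np.linalg.cond(lw.covariance_):.1f}")
```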
NASA Astrophysics Data System (ADS)
Chan, S.; Billesbach, D. P.; Hanson, C. V.; Biraud, S.
2014-12-01
The AmeriFlux quality assurance and quality control (QA/QC) technical team conducts short-term (<2 weeks) intercomparisons using a portable eddy covariance system (PECS) to maintain high-quality data observations and data consistency across the AmeriFlux network (http://ameriflux.lbl.gov/). Site intercomparisons identify discrepancies between the in situ and portable measurements and calculated fluxes. Findings are jointly discussed by the site staff and the QA/QC team to improve the in situ observations. Despite the relatively short duration of an individual site intercomparison, the accumulated record of all site visits (numbering over 100 since 2002) is a unique dataset. The ability to deploy redundant sensors provides a rare opportunity to identify, quantify, and understand uncertainties in eddy covariance and ancillary measurements. We present a few specific case studies from QA/QC site visits to highlight and share new and relevant findings related to eddy covariance instrumentation and operation.
A Functional Varying-Coefficient Single-Index Model for Functional Response Data
Li, Jialiang; Huang, Chao; Zhu, Hongtu
2016-01-01
Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. PMID:29200540
The SAMI Galaxy Survey: cubism and covariance, putting round pegs into square holes
NASA Astrophysics Data System (ADS)
Sharp, R.; Allen, J. T.; Fogarty, L. M. R.; Croom, S. M.; Cortese, L.; Green, A. W.; Nielsen, J.; Richards, S. N.; Scott, N.; Taylor, E. N.; Barnes, L. A.; Bauer, A. E.; Birchall, M.; Bland-Hawthorn, J.; Bloom, J. V.; Brough, S.; Bryant, J. J.; Cecil, G. N.; Colless, M.; Couch, W. J.; Drinkwater, M. J.; Driver, S.; Foster, C.; Goodwin, M.; Gunawardhana, M. L. P.; Ho, I.-T.; Hampton, E. J.; Hopkins, A. M.; Jones, H.; Konstantopoulos, I. S.; Lawrence, J. S.; Leslie, S. K.; Lewis, G. F.; Liske, J.; López-Sánchez, Á. R.; Lorente, N. P. F.; McElroy, R.; Medling, A. M.; Mahajan, S.; Mould, J.; Parker, Q.; Pracy, M. B.; Obreschkow, D.; Owers, M. S.; Schaefer, A. L.; Sweet, S. M.; Thomas, A. D.; Tonini, C.; Walcher, C. J.
2015-01-01
We present a methodology for the regularization and combination of sparsely sampled and irregularly gridded observations from fibre-optic multiobject integral field spectroscopy. The approach minimizes interpolation and retains image resolution on combining subpixel dithered data. We discuss the methodology in the context of the Sydney-AAO multiobject integral field spectrograph (SAMI) Galaxy Survey underway at the Anglo-Australian Telescope. The SAMI instrument uses 13 fibre bundles to perform high-multiplex integral field spectroscopy across a 1° diameter field of view. The SAMI Galaxy Survey is targeting ˜3000 galaxies drawn from the full range of galaxy environments. We demonstrate that the subcritical sampling of the seeing and the incomplete fill factor of the integral field bundles result in only a 10 per cent degradation in the final image resolution recovered. We also implement a new methodology for tracking covariance between elements of the resulting data cubes, which retains 90 per cent of the covariance information while incurring only a modest increase in the survey data volume.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ballard, Sanford; Hipp, James R.; Begnaud, Michael L.; ...
2016-10-11
The task of monitoring the Earth for nuclear explosions relies heavily on seismic data to detect, locate, and characterize suspected nuclear tests. In this study, motivated by the need to locate suspected explosions as accurately and precisely as possible, we developed a tomographic model of the compressional wave slowness in the Earth’s mantle with primary focus on the accuracy and precision of travel-time predictions for P and Pn ray paths through the model. Path-dependent travel-time prediction uncertainties are obtained by computing the full 3D model covariance matrix and then integrating slowness variance and covariance along ray paths from source to receiver. Path-dependent travel-time prediction uncertainties reflect the amount of seismic data that was used in tomography, with very low values for paths represented by abundant data in the tomographic data set and very high values for paths through portions of the model that were poorly sampled by the tomography data set. The pattern of travel-time prediction uncertainty is a direct result of the off-diagonal terms of the model covariance matrix and underscores the importance of incorporating the full model covariance matrix in the determination of travel-time prediction uncertainty. In addition, the computed pattern of uncertainty differs significantly from that of 1D distance-dependent travel-time uncertainties computed using traditional methods, which are only appropriate for use with travel times computed through 1D velocity models.
New Insights into Activity Patterns in Children, Found Using Functional Data Analyses.
Goldsmith, Jeff; Liu, Xinyue; Jacobson, Judith S; Rundle, Andrew
2016-09-01
Continuous monitoring of activity using accelerometers and other wearable devices provides objective, unbiased measurement of physical activity in minute-by-minute or finer resolutions. Accelerometers have already been widely deployed in studies of healthy aging, recovery of function after heart surgery, and other outcomes. Although common analyses of accelerometer data focus on single summary variables, such as the total or average activity count, there is growing interest in the determinants of diurnal profiles of activity. We use tools from functional data analysis (FDA), an area with an established statistical literature, to treat complete 24-h diurnal profiles as outcomes in a regression model. We illustrate the use of such models by analyzing data collected in New York City from 420 children participating in a Head Start program. Covariates of interest include season, sex, body mass index z-score, presence of an asthma diagnosis, and mother's birthplace. The FDA model finds meaningful associations between several covariates and diurnal profiles of activity. In some cases, including shifted activity patterns for children of foreign-born mothers and time-specific effects of asthma on activity, these associations exist for covariates that are not associated with average activity count. FDA provides a useful statistical framework for settings in which the effect of covariates on the timing of activity is of interest. The use of similar models in other applications should be considered, and we make our code public to facilitate this process.
Selecting a separable parametric spatiotemporal covariance structure for longitudinal imaging data.
George, Brandon; Aban, Inmaculada
2015-01-15
Longitudinal imaging studies allow great insight into how the structure and function of a subject's internal anatomy changes over time. Unfortunately, the analysis of longitudinal imaging data is complicated by inherent spatial and temporal correlation: the temporal from the repeated measures and the spatial from the outcomes of interest being observed at multiple points in a patient's body. We propose the use of a linear model with a separable parametric spatiotemporal error structure for the analysis of repeated imaging data. The model makes use of spatial (exponential, spherical, and Matérn) and temporal (compound symmetric, autoregressive-1, Toeplitz, and unstructured) parametric correlation functions. A simulation study, inspired by a longitudinal cardiac imaging study on mitral regurgitation patients, compared different information criteria for selecting a particular separable parametric spatiotemporal correlation structure, as well as the effects on type I and type II error rates for inference on fixed effects when the specified model is incorrect. Information criteria were found to be highly accurate at choosing between separable parametric spatiotemporal correlation structures. Misspecification of the covariance structure was found to inflate the type I error or to produce an overly conservative test size, the latter corresponding to decreased power. An example with clinical data is given illustrating how the covariance structure procedure can be performed in practice, as well as how covariance structure choice can change inferences about fixed effects. Copyright © 2014 John Wiley & Sons, Ltd.
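For readers unfamiliar with separable structures, the sketch below (not taken from the paper) shows how such a covariance is assembled as the Kronecker product of a temporal and a spatial correlation matrix, here with an exponential spatial and an AR(1) temporal component; all dimensions and parameter values are illustrative.

```python
import numpy as np

def exponential_corr(dists, range_param):
    """Exponential spatial correlation: rho(d) = exp(-d / range)."""
    return np.exp(-dists / range_param)

def ar1_corr(n_times, rho):
    """AR(1) temporal correlation: rho^|i-j| for time points i, j."""
    idx = np.arange(n_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Illustrative setup: 4 measurement locations on a line, 3 repeated scans.
coords = np.array([0.0, 1.0, 2.0, 3.0])
dists = np.abs(coords[:, None] - coords[None, :])

R_space = exponential_corr(dists, range_param=2.0)
R_time = ar1_corr(n_times=3, rho=0.6)

# Separable spatiotemporal covariance: sigma^2 * (R_time kron R_space).
sigma2 = 1.5
V = sigma2 * np.kron(R_time, R_space)   # (12 x 12) error covariance
print(V.shape)
```

Separability is what keeps the full space-time covariance tractable: only the two small factor matrices need to be estimated and inverted, rather than the full joint matrix.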
Modeling survival: application of the Andersen-Gill model to Yellowstone grizzly bears
Johnson, Christopher J.; Boyce, Mark S.; Schwartz, Charles C.; Haroldson, Mark A.
2004-01-01
Wildlife ecologists often use the Kaplan-Meier procedure or Cox proportional hazards model to estimate survival rates, distributions, and magnitude of risk factors. The Andersen-Gill formulation (A-G) of the Cox proportional hazards model has seen limited application to mark-resight data but has a number of advantages, including the ability to accommodate left-censored data, time-varying covariates, multiple events, and discontinuous intervals of risks. We introduce the A-G model including structure of data, interpretation of results, and assessment of assumptions. We then apply the model to 22 years of radiotelemetry data for grizzly bears (Ursus arctos) of the Greater Yellowstone Grizzly Bear Recovery Zone in Montana, Idaho, and Wyoming, USA. We used Akaike's Information Criterion (AICc) and multi-model inference to assess a number of potentially useful predictive models relative to explanatory covariates for demography, human disturbance, and habitat. Using the most parsimonious models, we generated risk ratios, hypothetical survival curves, and a map of the spatial distribution of high-risk areas across the recovery zone. Our results were in agreement with past studies of mortality factors for Yellowstone grizzly bears. Holding other covariates constant, mortality was highest for bears that were subjected to repeated management actions and inhabited areas with high road densities outside Yellowstone National Park. Hazard models developed with covariates descriptive of foraging habitats were not the most parsimonious, but they suggested that high-elevation areas offered lower risks of mortality when compared to agricultural areas.
Yoneoka, Daisuke; Henmi, Masayuki
2017-11-30
Recently, the number of clinical prediction models sharing the same regression task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these regression models have not been sufficiently studied, particularly in meta-analysis settings where only regression coefficients are available. One of the difficulties lies in the differences between the categorization schemes of continuous covariates across different studies. In general, categorization methods using cutoff values are study specific across available models, even if they focus on the same covariates of interest. Differences in the categorization of covariates could lead to serious bias in the estimated regression coefficients and thus in subsequent syntheses. To tackle this issue, we developed synthesis methods for linear regression models with different categorization schemes of covariates. A 2-step approach to aggregate the regression coefficient estimates is proposed. The first step is to estimate the joint distribution of covariates by introducing a latent sampling distribution, which uses one set of individual participant data to estimate the marginal distribution of covariates with categorization. The second step is to use a nonlinear mixed-effects model with correction terms for the bias due to categorization to estimate the overall regression coefficients. Especially in terms of precision, numerical simulations show that our approach outperforms conventional methods, which only use studies with common covariates or ignore the differences between categorization schemes. The method developed in this study is also applied to a series of WHO epidemiologic studies on white blood cell counts. Copyright © 2017 John Wiley & Sons, Ltd.
Multivariate Phylogenetic Comparative Methods: Evaluations, Comparisons, and Recommendations.
Adams, Dean C; Collyer, Michael L
2018-01-01
Recent years have seen increased interest in phylogenetic comparative analyses of multivariate data sets, but to date the varied proposed approaches have not been extensively examined. Here we review the mathematical properties required of any multivariate method, and specifically evaluate existing multivariate phylogenetic comparative methods in this context. Phylogenetic comparative methods based on the full multivariate likelihood are robust to levels of covariation among trait dimensions and are insensitive to the orientation of the data set, but display increasing model misspecification as the number of trait dimensions increases. This is because the expected evolutionary covariance matrix (V) used in the likelihood calculations becomes more ill-conditioned as trait dimensionality increases, and as evolutionary models become more complex. Thus, these approaches are only appropriate for data sets with few traits and many species. Methods that summarize patterns across trait dimensions treated separately (e.g., SURFACE) incorrectly assume independence among trait dimensions, resulting in nearly a 100% model misspecification rate. Methods using pairwise composite likelihood are highly sensitive to levels of trait covariation, the orientation of the data set, and the number of trait dimensions. The consequences of these debilitating deficiencies are that a user can arrive at differing statistical conclusions, and therefore biological inferences, simply from a dataspace rotation, like principal component analysis. By contrast, algebraic generalizations of the standard phylogenetic comparative toolkit that use the trace of covariance matrices are insensitive to levels of trait covariation, the number of trait dimensions, and the orientation of the data set. Further, when appropriate permutation tests are used, these approaches display acceptable Type I error and statistical power. We conclude that methods summarizing information across trait dimensions, as well as pairwise composite likelihood methods should be avoided, whereas algebraic generalizations of the phylogenetic comparative toolkit provide a useful means of assessing macroevolutionary patterns in multivariate data. Finally, we discuss areas in which multivariate phylogenetic comparative methods are still in need of future development; namely highly multivariate Ornstein-Uhlenbeck models and approaches for multivariate evolutionary model comparisons. © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Wu, Kai; Shu, Hong; Nie, Lei; Jiao, Zhenhang
2018-01-01
Spatially correlated errors are typically ignored in data assimilation, thus degenerating the observation error covariance R to a diagonal matrix. We argue that a nondiagonal R carries more observation information, making assimilation results more accurate. A method, denoted TC_Cov, was proposed for soil moisture data assimilation to estimate spatially correlated observation error covariance based on triple collocation (TC). Assimilation experiments were carried out to test the performance of TC_Cov. AMSR-E soil moisture was assimilated using a diagonal R matrix computed by TC and using a nondiagonal R matrix estimated by the proposed TC_Cov. The ensemble Kalman filter was used as the assimilation method. Our assimilation results were validated against climate change initiative data and ground-based soil moisture measurements using the Pearson correlation coefficient and unbiased root mean square difference (ubRMSD) metrics. These experiments confirmed that diagonal R assimilation results deteriorate when the model simulation is more accurate than the observation data. Furthermore, nondiagonal R achieved higher correlation coefficients and lower ubRMSD values than diagonal R in the experiments, demonstrating the effectiveness of TC_Cov in estimating a richly structured R for data assimilation. In sum, compared with diagonal R, nondiagonal R may relieve the detrimental effects of assimilation when simulated model results outperform observation data.
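The TC_Cov estimator itself is specific to the paper, but the role of a nondiagonal R is easy to see in a single Kalman analysis step. The following minimal numpy sketch (all matrices invented for illustration) contrasts a diagonal R with a spatially correlated one.

```python
import numpy as np

def kalman_analysis(x_b, B, y, H, R):
    """One Kalman analysis step: x_a = x_b + K (y - H x_b)."""
    S = H @ B @ H.T + R                  # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)       # Kalman gain
    return x_b + K @ (y - H @ x_b)

x_b = np.array([0.30, 0.25])             # background soil moisture (toy)
B = np.array([[0.004, 0.001],
              [0.001, 0.003]])           # background error covariance
H = np.eye(2)                            # direct observations of both cells
y = np.array([0.27, 0.29])

R_diag = np.diag([0.002, 0.002])         # spatial error correlation ignored
R_full = np.array([[0.002, 0.0012],
                   [0.0012, 0.002]])     # spatially correlated obs errors

print(kalman_analysis(x_b, B, y, H, R_diag))
print(kalman_analysis(x_b, B, y, H, R_full))
```

The off-diagonal terms in R change the gain, and hence how much each observation pulls each state component.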
Li, Shi; Mukherjee, Bhramar; Taylor, Jeremy M G; Rice, Kenneth M; Wen, Xiaoquan; Rice, John D; Stringham, Heather M; Boehnke, Michael
2014-07-01
With challenges in data harmonization and environmental heterogeneity across various data sources, meta-analysis of gene-environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins, we obtain analytic efficiency results for both methods under certain assumptions. The relative efficiency of the two methods depends on the ratio of within versus between cohort variability of the environmental covariate. We propose to use an adaptively weighted estimator (AWE), between meta-analysis and meta-regression, for the interaction parameter. The AWE retains full efficiency of the joint analysis using individual level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse-variance weighted estimator retains the full efficiency of joint analysis using individual level data, if the estimates with full covariance matrices for all the common parameters are pooled across all studies. We show consistency of our work with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and bypasses issues with sharing individual level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through meta-analysis of interaction between Single Nucleotide Polymorphisms in the FTO gene and body mass index on high-density lipoprotein cholesterol data from a set of eight studies of type 2 diabetes. © 2014 WILEY PERIODICALS, INC.
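The AWE weighting scheme is specific to the paper, but the fixed-effect inverse-variance pooling it builds on is standard; a minimal sketch with invented cohort-level interaction estimates:

```python
import numpy as np

def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect meta-analysis: weights w_i = 1 / se_i^2."""
    w = 1.0 / np.asarray(std_errors) ** 2
    b = np.asarray(estimates)
    pooled = np.sum(w * b) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se

# Illustrative interaction-coefficient estimates from 8 cohorts.
betas = [0.12, 0.08, 0.15, 0.05, 0.10, 0.09, 0.14, 0.07]
ses   = [0.05, 0.04, 0.06, 0.03, 0.05, 0.04, 0.07, 0.03]
print(inverse_variance_pool(betas, ses))
```

Per the abstract, the AWE interpolates between this pooled estimator and a meta-regression estimator; its adaptive weights are not reproduced here.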
Guillaume, Bryan; Wang, Changqing; Poh, Joann; Shen, Mo Jun; Ong, Mei Lyn; Tan, Pei Fang; Karnani, Neerja; Meaney, Michael; Qiu, Anqi
2018-06-01
Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Rainfall and evapotranspiration data for southwest Medina County, Texas, August 2006-December 2009
Slattery, Richard N.; Asquith, William H.; Ockerman, Darwin J.
2011-01-01
During August 2006-December 2009, the U.S. Geological Survey (USGS), in cooperation with the U.S. Army Corps of Engineers, Fort Worth District, collected rainfall and evapotranspiration data to help characterize the hydrology of the Nueces River Basin, Texas. The USGS installed and operated a station to collect continuous (30-minute interval) rainfall and evapotranspiration data in southwest Medina County approximately 14 miles southwest of D'Hanis, Texas, and 23 miles northwest of Pearsall, Texas. Rainfall data were collected by using an 8-inch tipping bucket raingage. Meteorological and surface-energy flux data used to calculate evapotranspiration were collected by using an extended Open Path Eddy Covariance system from Campbell Scientific, Inc. Data recorded by the system were used to calculate evapotranspiration by using the eddy covariance and Bowen ratio closure methods and to analyze the surface energy budget closure. During August 2006-December 2009 (excluding days of missing record), measured rainfall totaled 86.85 inches. In 2007, 2008, and 2009, annual rainfall totaled 40.98, 12.35, and 27.15 inches, respectively. The largest monthly rainfall total, 12.30 inches, occurred in July 2007. During August 2006-December 2009, evapotranspiration calculated by using the eddy covariance method totaled 69.91 inches. Annual evapotranspiration calculated by using the eddy covariance method totaled 34.62 inches in 2007, 15.24 inches in 2008, and 15.57 inches in 2009. During August 2006-December 2009, evapotranspiration calculated by using the Bowen ratio closure method (the more refined of the two datasets) totaled 68.33 inches. Annual evapotranspiration calculated by using the Bowen ratio closure method totaled 32.49, 15.54, and 15.80 inches in 2007, 2008, and 2009, respectively (excluding days of missing record).
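As context for the eddy covariance totals reported above: at its core, the method estimates a turbulent flux as the covariance of vertical wind and scalar fluctuations over an averaging interval. A toy numpy sketch on synthetic data (the air density, units, and signal levels are illustrative, not from the study):

```python
import numpy as np

def eddy_covariance_flux(w, c, rho_air=1.2):
    """Turbulent flux as the covariance of vertical wind (w) and scalar (c)
    fluctuations over an averaging interval (Reynolds decomposition)."""
    w_prime = w - np.mean(w)
    c_prime = c - np.mean(c)
    return rho_air * np.mean(w_prime * c_prime)

# Illustrative 30-min record sampled at 10 Hz (18000 samples).
rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(18000)                  # vertical wind, m/s
c = 400 + 5 * rng.standard_normal(18000) + 2.0 * w    # CO2-like scalar
print(eddy_covariance_flux(w, c))
```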
Chen, Jie; Li, Jiahong; Yang, Shuanghua; Deng, Fang
2017-11-01
The identification of the nonlinearity and coupling is crucial in the nonlinear target tracking problem in collaborative sensor networks. According to the adaptive Kalman filtering (KF) method, the nonlinearity and coupling can be regarded as the model noise covariance, and estimated by minimizing the innovation or residual errors of the states. However, the method requires a large time window of data to achieve a reliable covariance measurement, making it impractical for rapidly changing nonlinear systems. To deal with this problem, a weighted optimization-based distributed KF algorithm (WODKF) is proposed in this paper. The algorithm enlarges the data size of each sensor by using the received measurements and state estimates from its connected sensors instead of the time window. A new cost function is defined as the weighted sum of the bias and oscillation of the state, from which the "best" estimate of the model noise covariance is obtained. The bias and oscillation of the state of each sensor are estimated by polynomial fitting over a time window of state estimates and measurements of the sensor and its neighbors, weighted by the measurement noise covariance. The best estimate of the model noise covariance is computed by minimizing the weighted cost function using an exhaustive search. A sensor selection method is added to the algorithm to decrease the computational load of the filter and increase the scalability of the sensor network. The existence, suboptimality and stability analysis of the algorithm are given. The local probability data association method is used in the proposed algorithm for the multitarget tracking case. The algorithm is demonstrated in simulations on tracking examples for a random signal, one nonlinear target, and four nonlinear targets. Results show the feasibility and superiority of WODKF against other filtering algorithms for a large class of systems.
Geerligs, Linda; Cam-Can; Henson, Richard N
2016-07-15
Studies of brain-wide functional connectivity or structural covariance typically use measures like the Pearson correlation coefficient, applied to data that have been averaged across voxels within regions of interest (ROIs). However, averaging across voxels may result in biased connectivity estimates when there is inhomogeneity within those ROIs, e.g., sub-regions that exhibit different patterns of functional connectivity or structural covariance. Here, we propose a new measure based on "distance correlation"; a test of multivariate dependence of high dimensional vectors, which allows for both linear and non-linear dependencies. We used simulations to show how distance correlation out-performs Pearson correlation in the face of inhomogeneous ROIs. To evaluate this new measure on real data, we use resting-state fMRI scans and T1 structural scans from 2 sessions on each of 214 participants from the Cambridge Centre for Ageing & Neuroscience (Cam-CAN) project. Pearson correlation and distance correlation showed similar average connectivity patterns, for both functional connectivity and structural covariance. Nevertheless, distance correlation was shown to be 1) more reliable across sessions, 2) more similar across participants, and 3) more robust to different sets of ROIs. Moreover, we found that the similarity between functional connectivity and structural covariance estimates was higher for distance correlation compared to Pearson correlation. We also explored the relative effects of different preprocessing options and motion artefacts on functional connectivity. Because distance correlation is easy to implement and fast to compute, it is a promising alternative to Pearson correlations for investigating ROI-based brain-wide connectivity patterns, for functional as well as structural data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
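Distance correlation is straightforward to implement, which the abstract notes as one of its practical advantages. A compact numpy/scipy sketch of the (biased) sample estimator follows; the ROI sizes and data are invented for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def _centered_dists(Z):
    """Double-centered Euclidean distance matrix (rows = observations)."""
    D = squareform(pdist(Z))
    return D - D.mean(0) - D.mean(1)[:, None] + D.mean()

def distance_correlation(X, Y):
    """Sample distance correlation between two multivariate samples,
    e.g. the voxel time series of two ROIs; captures linear and
    non-linear dependence between the high-dimensional vectors."""
    A, B = _centered_dists(X), _centered_dists(Y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 10))        # ROI 1: 200 scans x 10 voxels
y = np.sin(x[:, :3]) + 0.1 * rng.standard_normal((200, 3))  # ROI 2
print(distance_correlation(x, y))
```

Unlike a Pearson correlation of ROI-averaged signals, no averaging across voxels is required, which is exactly what protects the measure against inhomogeneous ROIs.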
Hammerle, Albin; Haslwanter, Alois; Schmitt, Michael; Bahn, Michael; Tappeiner, Ulrike; Cernusca, Alexander; Wohlfahrt, Georg
2014-01-01
Carbon dioxide, latent and sensible energy fluxes were measured by means of the eddy covariance method above a mountain meadow situated on a steep slope in the Stubai Valley/Austria, based on the hypothesis that, due to the low canopy height, measurements can be made in the shallow equilibrium layer where the wind field exhibits characteristics akin to level terrain. In order to test the validity of this hypothesis and to identify effects of complex terrain in the turbulence measurements, data were subjected to a rigorous testing procedure using a series of quality control measures established for surface layer flows. The resulting high-quality data set comprised 36 % of the original observations, the substantial reduction being mainly due to a change in surface roughness and associated fetch limitations in the wind sector dominating during nighttime and transition periods. The validity of the high-quality data set was further assessed by two independent tests: i) a comparison with the net ecosystem carbon dioxide exchange measured by means of ecosystem chambers and ii) the ability of the eddy covariance measurements to close the energy balance. The net ecosystem CO2 exchange measured by the eddy covariance method agreed reasonably with ecosystem chamber measurements. The assessment of the energy balance closure showed that there was no significant difference in the correspondence between the meadow on the slope and another one situated on flat ground at the bottom of the Stubai Valley, available energy being underestimated by 28 and 29 %, respectively. We thus conclude that, appropriate quality control provided, the eddy covariance measurements made above a mountain meadow on a steep slope are of similar quality as compared to flat terrain. PMID:24465032
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Eddy Covariance Method: Overview of General Guidelines and Conventional Workflow
NASA Astrophysics Data System (ADS)
Burba, G. G.; Anderson, D. J.; Amen, J. L.
2007-12-01
Atmospheric flux measurements are widely used to estimate water, heat, carbon dioxide and trace gas exchange between the ecosystem and the atmosphere. The Eddy Covariance method is one of the most direct, defensible ways to measure and calculate turbulent fluxes within the atmospheric boundary layer. However, the method is mathematically complex, and requires significant care to set up and process data. These reasons may be why the method is currently used predominantly by micrometeorologists. Modern instruments and software can potentially expand the use of this method beyond micrometeorology and prove valuable for plant physiology, hydrology, biology, ecology, entomology, and other non-micrometeorological areas of research. The main challenge of the method for a non-expert is the complexity of system design, implementation, and processing of the large volume of data. In the past several years, efforts of the flux networks (e.g., FluxNet, Ameriflux, CarboEurope, Fluxnet-Canada, Asiaflux, etc.) have led to noticeable progress in unification of the terminology and general standardization of processing steps. The methodology itself, however, is difficult to unify, because various experimental sites and different purposes of studies dictate different treatments, and site-, measurement- and purpose-specific approaches. Here we present an overview of theory and typical workflow of the Eddy Covariance method in a format specifically designed to (i) familiarize a non-expert with general principles, requirements, applications, and processing steps of the conventional Eddy Covariance technique, (ii) to assist in further understanding the method through more advanced references such as textbooks, network guidelines and journal papers, (iii) to help technicians, students and new researchers in the field deployment of the Eddy Covariance method, and (iv) to assist in its use beyond micrometeorology. The overview is based, to a large degree, on the frequently asked questions received from new users of the Eddy Covariance method and relevant instrumentation, and employs non-technical language to be of practical use to those new to this field. Information is provided on theory of the method (including state of methodology, basic derivations, practical formulations, major assumptions and sources of errors, error treatment, and use in non- traditional terrains), practical workflow (e.g., experimental design, implementation, data processing, and quality control), alternative methods and applications, and the most frequently overlooked details of the measurements. References and access to an extended 141-page Eddy Covariance Guideline in three electronic formats are also provided.
NASA Astrophysics Data System (ADS)
Papale, Dario; Fratini, Gerardo
2013-04-01
Eddy-covariance is the most direct and most commonly applied methodology for measuring exchange fluxes of mass and energy between ecosystems and the atmosphere. In recent years, the number of environmental monitoring stations deploying eddy-covariance systems increased dramatically at the global level, exceeding 500 sites worldwide and covering most climatic and ecological regions. Several long-term environmental research infrastructures such as ICOS, NEON and AmeriFlux selected the eddy-covariance as a method to monitor GHG fluxes and are currently collaboratively working towards defining common measurements standards, data processing approaches, QA/QC procedures and uncertainty estimation strategies, to the aim of increasing defensibility of resulting fluxes and intra and inter-comparability of flux databases. In the meanwhile, the eddy-covariance research community keeps identifying technical and methodological flaws that, in some cases, can introduce - and can have introduced to date - significant biases in measured fluxes or increase their uncertainty. Among those, we identify three issues of presumably greater concern, namely: (1) strong underestimation of water vapour fluxes in closed-path systems, and its dependency on relative humidity; (2) flux biases induced by erroneous measurement of absolute gas concentrations; (3) and systematic errors due to underestimation of vertical wind variance in non-orthogonal anemometers. If not properly addressed, these issues can reduce the quality and reliability of the method, especially as a standard methodology in long-term monitoring networks. In this work, we review the status of the art regarding such problems, and propose new evidences based on field experiments as well as numerical simulations. Our analyses confirm the potential relevance of these issues but also hint at possible coping approaches, to minimize problems during setup design, data collection and post-field flux correction. Corrections are under implementation in eddy-covariance processing software and will be readily applicable by individual investigators as well as by centralized processing facilities of long-term research infrastructures. This new understandings suggest that a reanalysis of eddy-covariance data collected in the last 20 years may be appropriate in order to obtain more accurate and consistent flux time series. The availability of dedicated powerful computing facilities at the research infrastructures today makes this goal achievable at an affordable cost.
Picture of All Solutions of Successive 2-Block Maxbet Problems
ERIC Educational Resources Information Center
Choulakian, Vartan
2011-01-01
The Maxbet method is a generalized principal components analysis of a data set, where the group structure of the variables is taken into account. Similarly, 3-block[12,13] partial Maxdiff method is a generalization of covariance analysis, where only the covariances between blocks (1, 2) and (1, 3) are taken into account. The aim of this paper is…
Might "Unique" Factors Be "Common"? On the Possibility of Indeterminate Common-Unique Covariances
ERIC Educational Resources Information Center
Grayson, Dave
2006-01-01
The present paper shows that the usual factor analytic structured data dispersion matrix $\Lambda \Psi \Lambda' + \Delta$ can readily arise from a set of scores $y = \Lambda \eta + \epsilon$, where the "common" ($\eta$) and "unique" ($\epsilon$) factors have nonzero covariance: $\Gamma = \operatorname{Cov}(\epsilon, \eta) \neq 0$. Implications of this finding are discussed…
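For clarity, the dispersion implied by correlated common and unique factors can be written out directly; the standard structure is the special case $\Gamma = 0$, and the abstract's point is that the same observed matrix can also be generated with $\Gamma \neq 0$:

```latex
% Dispersion of y = \Lambda\eta + \epsilon with \Gamma = \operatorname{Cov}(\epsilon,\eta):
\operatorname{Cov}(y)
  = \Lambda \Psi \Lambda' + \Lambda \Gamma' + \Gamma \Lambda' + \operatorname{Cov}(\epsilon),
\qquad
\Gamma = 0 \;\Rightarrow\; \operatorname{Cov}(y) = \Lambda \Psi \Lambda' + \Delta .
```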
ERIC Educational Resources Information Center
Steiner, Peter M.; Cook, Thomas D.; Shadish, William R.
2011-01-01
The effect of unreliability of measurement on propensity score (PS) adjusted treatment effects has not been previously studied. The authors report on a study simulating different degrees of unreliability in the multiple covariates that were used to estimate the PS. The simulation uses the same data as two prior studies. Shadish, Clark, and Steiner…
A LISREL Model for the Analysis of Repeated Measures with a Patterned Covariance Matrix.
ERIC Educational Resources Information Center
Rovine, Michael J.; Molenaar, Peter C. M.
1998-01-01
Presents a LISREL model for the estimation of the repeated measures analysis of variance (ANOVA) with a patterned covariance matrix. The model is demonstrated for a 5 x 2 (Time x Group) ANOVA in which the data are assumed to be serially correlated. Similarities with the Statistical Analysis System PROC MIXED model are discussed. (SLD)
Zhang, Zhongrui; Zhong, Quanlin; Niklas, Karl J; Cai, Liang; Yang, Yusheng; Cheng, Dongliang
2016-08-24
Metabolic scaling theory (MST) posits that the scaling exponents among plant height H, diameter D, and biomass M will covary across phyletically diverse species. However, the relationships between scaling exponents and normalization constants remain unclear. Therefore, we developed a predictive model for the covariation of H, D, and stem volume V scaling relationships and used data from Chinese fir (Cunninghamia lanceolata) in Jiangxi province, China to test it. As predicted by the model and supported by the data, normalization constants are positively correlated with their associated scaling exponents for D vs. V and H vs. V, whereas normalization constants are negatively correlated with the scaling exponents of H vs. D. The prediction model also yielded reliable estimations of V (mean absolute percentage error = 10.5 ± 0.32 SE across 12 model calibrated sites). These results (1) support a totally new covariation scaling model, (2) indicate that differences in stem volume scaling relationships at the intra-specific level are driven by anatomical or ecophysiological responses to site quality and/or management practices, and (3) provide an accurate non-destructive method for predicting Chinese fir stem volume.
Neustifter, Benjamin; Rathbun, Stephen L; Shiffman, Saul
2012-01-01
Ecological Momentary Assessment is an emerging method of data collection in behavioral research that may be used to capture the times of repeated behavioral events on electronic devices, and information on subjects' psychological states through the electronic administration of questionnaires at times selected from a probability-based design as well as the event times. A method for fitting a mixed Poisson point process model is proposed for the impact of partially-observed, time-varying covariates on the timing of repeated behavioral events. A random frailty is included in the point-process intensity to describe variation among subjects in baseline rates of event occurrence. Covariate coefficients are estimated using estimating equations constructed by replacing the integrated intensity in the Poisson score equations with a design-unbiased estimator. An estimator is also proposed for the variance of the random frailties. Our estimators are robust in the sense that no model assumptions are made regarding the distribution of the time-varying covariates or the distribution of the random effects. However, subject effects are estimated under gamma frailties using an approximate hierarchical likelihood. The proposed approach is illustrated using smoking data.
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.
Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit
2018-06-01
In this study, when the assumptions of linearity and homogeneity of regression slopes in conventional ANCOVA are not met, a new approach named SEYHAN is suggested so that conventional ANCOVA can still be used instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transforming a continuous covariate into a categorical structure when the relationship between the covariate and the dependent variable is nonlinear and the regression slopes are not homogeneous. A simulated data set was used to illustrate SEYHAN's approach. In this approach, after the MARS method was used to categorize the covariate, we performed conventional ANCOVA in each subgroup constituted according to the knot values, as well as a two-factor analysis of variance. The first model is simpler than the second, which includes an interaction term. Since the model with the interaction effect uses more subjects, the power of the test also increases and existing significant differences are revealed more clearly. With the help of this approach, nonlinearity and heterogeneity of regression slopes are no longer an obstacle to analyzing the data with a conventional linear ANCOVA model. The approach can be used quickly and efficiently in the presence of one or more covariates.
Multisensor Parallel Largest Ellipsoid Distributed Data Fusion with Unknown Cross-Covariances
Liu, Baoyu; Zhan, Xingqun; Zhu, Zheng H.
2017-01-01
As the largest ellipsoid (LE) data fusion algorithm can only be applied to two-sensor systems, in this contribution a parallel fusion structure is proposed to introduce the LE algorithm into a multisensor system with unknown cross-covariances, and three parallel fusion structures based on different estimate pairing methods are presented and analyzed. In order to assess the influence of fusion structure on fusion performance, two fusion performance assessment parameters are defined: Fusion Distance and Fusion Index. Moreover, the formula for calculating the upper bounds of the actual fused error covariances of the presented multisensor LE fusers is also provided. As demonstrated with simulation examples, the Fusion Index indicates a fuser's actual fused accuracy and its sensitivity to the sensor order, as well as its robustness to the accuracy of newly added sensors. Compared to the LE fuser with a sequential structure, the LE fusers with the proposed parallel structures not only significantly improve their properties in these respects, but also achieve better performance in consistency and computational efficiency. The presented multisensor LE fusers generally have better accuracy than the covariance intersection (CI) fusion algorithm and are consistent when the local estimates are weakly correlated. PMID:28661442
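The LE algorithm's internals are not reproduced here, but covariance intersection, the baseline it is compared against, is the standard recipe for fusing estimates with unknown cross-covariances; a minimal numpy sketch with invented local estimates:

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, omega):
    """Fuse two estimates with unknown cross-covariance via CI:
    P^-1 = w P1^-1 + (1-w) P2^-1; consistent for any true correlation."""
    P1i, P2i = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(omega * P1i + (1 - omega) * P2i)
    x = P @ (omega * P1i @ x1 + (1 - omega) * P2i @ x2)
    return x, P

x1, P1 = np.array([1.0, 2.0]), np.diag([1.0, 4.0])
x2, P2 = np.array([1.2, 1.8]), np.diag([4.0, 1.0])

# A common choice: pick omega minimizing the trace of the fused covariance.
omegas = np.linspace(0.01, 0.99, 99)
best = min(omegas, key=lambda w: np.trace(
    covariance_intersection(x1, P1, x2, P2, w)[1]))
print(covariance_intersection(x1, P1, x2, P2, best))
```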
MISTIC2: comprehensive server to study coevolution in protein families.
Colell, Eloy A; Iserte, Javier A; Simonetti, Franco L; Marino-Buslje, Cristina
2018-06-14
Correlated mutations between residue pairs in evolutionarily related proteins arise from constraints needed to maintain a functional and stable protein. Identifying these inter-related positions narrows down the search for structurally or functionally important sites. MISTIC is a server designed to assist users in calculating covariation in protein families and to provide them with an interactive tool to visualize the results. Here, we present MISTIC2, an update to the previous server, which allows users to calculate four covariation methods (MIp, mfDCA, plmDCA and gaussianDCA). The results visualization framework has been reworked for improved performance, compatibility and user experience. It includes a circos representation of the information contained in the alignment, an interactive covariation network, a 3D structure viewer and a sequence logo. Other components provide additional information such as residue annotations, a ROC curve for assessing contact prediction, data tables and different ways of filtering the data and exporting figures. Comparison of different methods is easily done, and combining scores is also possible. A newly implemented web service allows users to access MISTIC2 programmatically using an API to calculate covariation and retrieve results. MISTIC2 is available at: https://mistic2.leloir.org.ar.
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
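Two of the reviewed ingredients, shrinkage estimation of a large covariance matrix and sparse inverse-covariance (network) estimation, are available off the shelf; a brief scikit-learn sketch on synthetic data (dimensions illustrative, not from the paper):

```python
import numpy as np
from sklearn.covariance import LedoitWolf, GraphicalLassoCV

# Illustrative expression matrix: 100 samples x 200 genes.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 200))

# Shrinkage estimator for a large covariance matrix (useful when p >> n).
lw = LedoitWolf().fit(X)
print(lw.covariance_.shape)

# Sparse inverse covariance (graphical lasso) for network structure;
# run on a 20-gene subset to keep the demo fast.
gl = GraphicalLassoCV().fit(X[:, :20])
print((np.abs(gl.precision_) > 1e-8).sum())  # nonzero precision entries
```

The nonzero pattern of the estimated precision matrix is what defines the conditional-independence network among genes.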
NASA Astrophysics Data System (ADS)
Langford, B.; Acton, W.; Ammann, C.; Valach, A.; Nemitz, E.
2015-03-01
All eddy-covariance flux measurements are associated with random uncertainties which are a combination of sampling error due to natural variability in turbulence and sensor noise. The former is the principal error for systems where the signal-to-noise ratio of the analyser is high, as is usually the case when measuring fluxes of heat, CO2 or H2O. Where signal is limited, which is often the case for measurements of other trace gases and aerosols, instrument uncertainties dominate. We are here applying a consistent approach based on auto- and cross-covariance functions to quantifying the total random flux error and the random error due to instrument noise separately. As with previous approaches, the random error quantification assumes that the time-lag between wind and concentration measurement is known. However, if combined with commonly used automated methods that identify the individual time-lag by looking for the maximum in the cross-covariance function of the two entities, analyser noise additionally leads to a systematic bias in the fluxes. Combining datasets from several analysers and using simulations we show that the method of time-lag determination becomes increasingly important as the magnitude of the instrument error approaches that of the sampling error. The flux bias can be particularly significant for disjunct data, whereas using a prescribed time-lag eliminates these effects (provided the time-lag does not fluctuate unduly over time). We also demonstrate that when sampling at higher elevations, where low frequency turbulence dominates and covariance peaks are broader, both the probability and magnitude of bias are magnified. We show that the statistical significance of noisy flux data can be increased (limit of detection can be decreased) by appropriate averaging of individual fluxes, but only if systematic biases are avoided by using a prescribed time-lag. Finally, we make recommendations for the analysis and reporting of data with low signal-to-noise and their associated errors.
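The bias mechanism described above is easy to reproduce: with a noisy analyser, the empirical maximum of the cross-covariance function tends to overshoot the flux at the true lag. A self-contained numpy sketch (synthetic series, invented lag and noise level):

```python
import numpy as np

def cross_covariance(w, c, max_lag):
    """Cross-covariance of fluctuations for lags -max_lag..max_lag."""
    wp, cp = w - w.mean(), c - c.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    cov = [np.mean(wp[max(0, -k):len(wp) - max(0, k)] *
                   cp[max(0, k):len(cp) - max(0, -k)]) for k in lags]
    return lags, np.array(cov)

rng = np.random.default_rng(3)
n = 36000                                     # ~1 h at 10 Hz
w = rng.standard_normal(n)                    # vertical wind (normalized)
c_true = 0.02 * np.roll(w, 5)                 # true lag of 5 samples
c = c_true + 2.0 * rng.standard_normal(n)     # low signal-to-noise analyser

lags, cov = cross_covariance(w, c, max_lag=50)
# Searching for the maximum picks whichever lag the noise inflates most,
# biasing the flux high; a prescribed lag avoids this.
print("max-covariance lag:", lags[np.argmax(cov)], "flux:", cov.max())
print("prescribed lag 5 flux:", cov[lags == 5][0])
```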
Gray, B.R.; Rogala, J.R.; Houser, J.N.
2013-01-01
Contiguous floodplain lakes ('lakes') have historically been used as study units for comparative studies of limnological variables that vary within lakes. The hierarchical nature of these studies implies that study variables may be correlated within lakes and that covariate associations may differ not only among lakes but also by spatial scale. We evaluated the utility of treating lakes as study units for limnological variables that vary within lakes based on the criteria of important levels of among-lake variation in study variables and the observation of covariate associations that vary among lakes. These concerns were selected, respectively, to ensure that lake signatures were distinguishable from within-lake variation and that lake-scale effects on covariate associations might provide inferences not available by ignoring those effects. Study data represented chlorophyll a (CHL) and inorganic suspended solids (ISS) data from lakes within three reaches of the Upper Mississippi River. Sampling occurred in summer from 1993 through 2005 (except 2003); numbers of lakes per reach varied from 7 to 19, and median lake area varied from 53 to 101 ha. CHL and ISS levels were modelled linearly, with lake, year and lake x year effects treated as random. For all reaches, the proportions of variation in CHL and ISS attributable to differences among lakes (including lake and lake x year effects) were substantial (range: 18%-73%). Finally, among-lake variation in CHL and ISS was strongly associated with covariates and covariate effects that varied by lakes or lake-years (including with vegetation levels and, for CHL, log(ISS)). These findings demonstrate the utility of treating floodplain lakes as study units for the study of limnological variables and the importance of addressing hierarchy within study designs when making inferences from data collected within floodplain lakes.
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
Testing a single regression coefficient in high dimensional linear models.
Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2016-11-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668
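A rough sketch of the screening idea, assuming the simplest variant (keep the predictors most correlated with the target covariate, then run OLS with heteroscedasticity-robust errors); the helper name cps_test and all tuning choices below are ours, not the paper's:

```python
import numpy as np
import statsmodels.api as sm

def cps_test(y, X, target, n_keep=5):
    """Screen the n_keep predictors most correlated with the target
    covariate, fit OLS on target + screened set, and z-test the
    target's coefficient with heteroscedasticity-robust (HC3) errors."""
    corr = np.abs(np.corrcoef(X, rowvar=False)[target])
    corr[target] = -np.inf                     # exclude the target itself
    keep = np.argsort(corr)[-n_keep:]
    design = sm.add_constant(X[:, np.r_[target, keep]])
    fit = sm.OLS(y, design).fit(cov_type='HC3')
    return fit.params[1], fit.pvalues[1]       # target coef and p-value

rng = np.random.default_rng(4)
n, p = 100, 500                                # p >> n
X = rng.standard_normal((n, p))
y = 0.5 * X[:, 0] + rng.standard_normal(n)
print(cps_test(y, X, target=0))
```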
Borah, Bijan J; Basu, Anirban
2013-09-01
The quantile regression (QR) framework provides a pragmatic approach in understanding the differential impacts of covariates along the distribution of an outcome. However, the QR framework that has pervaded the applied economics literature is based on the conditional quantile regression method. It is used to assess the impact of a covariate on a quantile of the outcome conditional on specific values of other covariates. In most cases, conditional quantile regression may generate results that are often not generalizable or interpretable in a policy or population context. In contrast, the unconditional quantile regression method provides more interpretable results as it marginalizes the effect over the distributions of other covariates in the model. In this paper, the differences between these two regression frameworks are highlighted, both conceptually and econometrically. Additionally, using real-world claims data from a large US health insurer, alternative QR frameworks are implemented to assess the differential impacts of covariates along the distribution of medication adherence among elderly patients with Alzheimer's disease. Copyright © 2013 John Wiley & Sons, Ltd.
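Conditional quantile regression, the first of the two frameworks discussed, is readily available in statsmodels; the unconditional (recentered-influence-function) variant is not shown here. A toy sketch with invented adherence-style data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative adherence-like outcome with a binary covariate.
rng = np.random.default_rng(5)
n = 1000
treat = rng.integers(0, 2, n)
adherence = 0.6 + 0.05 * treat + rng.gamma(2, 0.05, n) * (1 + 0.5 * treat)
df = pd.DataFrame({"adherence": adherence, "treat": treat})

# Conditional QR at the 25th, 50th and 75th percentiles: the covariate
# effect is allowed to differ across the outcome distribution.
for q in (0.25, 0.50, 0.75):
    fit = smf.quantreg("adherence ~ treat", df).fit(q=q)
    print(q, fit.params["treat"])
```

These coefficients are conditional-quantile effects; as the abstract stresses, they need not match the unconditional (population-level) quantile effects.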
Cai, Gaigai; Chen, Xuefeng; Li, Bing; Chen, Baojia; He, Zhengjia
2012-01-01
The reliability of cutting tools is critical to machining precision and production efficiency. The conventional statistic-based reliability assessment method aims at providing a general and overall estimation of reliability for a large population of identical units under given and fixed conditions. However, it has limited effectiveness in depicting the operational characteristics of a cutting tool. To overcome this limitation, this paper proposes an approach to assess the operation reliability of cutting tools. A proportional covariate model is introduced to construct the relationship between operation reliability and condition monitoring information. The wavelet packet transform and an improved distance evaluation technique are used to extract sensitive features from vibration signals, and a covariate function is constructed based on the proportional covariate model. Ultimately, the failure rate function of the cutting tool being assessed is calculated using the baseline covariate function obtained from a small sample of historical data. Experimental results and a comparative study show that the proposed method is effective for assessing the operation reliability of cutting tools. PMID:23201980
Using SAS PROC CALIS to fit Level-1 error covariance structures of latent growth models.
Ding, Cherng G; Jane, Ten-Der
2012-09-01
In the present article, we demonstrate the use of SAS PROC CALIS to fit various types of Level-1 error covariance structures of latent growth models (LGM). Advantages of the SEM approach, on which PROC CALIS is based, include the capabilities of modeling the change over time for latent constructs, measured by multiple indicators; embedding LGM into a larger latent variable model; incorporating measurement models for latent predictors; and better assessing model fit and the flexibility in specifying error covariance structures. The strength of PROC CALIS is always accompanied by technical coding work, which needs to be specifically addressed. We provide a tutorial on the SAS syntax for modeling the growth of a manifest variable and the growth of a latent construct, focusing the documentation on the specification of Level-1 error covariance structures. Illustrations are conducted with data generated from two given latent growth models. The coding provided is helpful when the growth model has been well determined and the Level-1 error covariance structure is to be identified.
Nuclear data uncertainty propagation by the XSUSA method in the HELIOS2 lattice code
NASA Astrophysics Data System (ADS)
Wemple, Charles; Zwermann, Winfried
2017-09-01
Uncertainty quantification has been extensively applied to nuclear criticality analyses for many years and has recently begun to be applied to depletion calculations. However, regulatory bodies worldwide are trending toward requiring such analyses for reactor fuel cycle calculations, which also requires uncertainty propagation for isotopics and nuclear reaction rates. XSUSA is a proven methodology for cross section uncertainty propagation based on random sampling of the nuclear data according to covariance data in multi-group representation; HELIOS2 is a lattice code widely used for commercial and research reactor fuel cycle calculations. This work describes a technique to automatically propagate the nuclear data uncertainties via the XSUSA approach through fuel lattice calculations in HELIOS2. Application of the XSUSA methodology in HELIOS2 presented some unusual challenges because of the highly-processed multi-group cross section data used in commercial lattice codes. Currently, uncertainties based on the SCALE 6.1 covariance data file are being used, but the implementation can be adapted to other covariance data in multi-group structure. Pin-cell and assembly depletion calculations, based on models described in the UAM-LWR Phase I and II benchmarks, are performed and uncertainties in multiplication factor, reaction rates, isotope concentrations, and delayed-neutron data are calculated. With this extension, it will be possible for HELIOS2 users to propagate nuclear data uncertainties directly from the microscopic cross sections to subsequent core simulations.
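The XSUSA machinery and the SCALE covariance files are far richer than can be shown here, but the core random-sampling idea reduces to drawing correlated perturbation factors from a covariance matrix and re-evaluating the model; a toy sketch with an invented two-group relative covariance and a one-group criticality formula:

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented 2x2 relative covariance for two cross sections.
rel_cov = np.array([[0.0004, 0.0001],
                    [0.0001, 0.0009]])
L = np.linalg.cholesky(rel_cov)

def k_eff(sig_f, sig_a):
    """Toy one-group criticality model: k = nu * sig_f / sig_a."""
    return 2.4 * sig_f / sig_a

samples = []
for _ in range(1000):
    f = 1.0 + L @ rng.standard_normal(2)   # perturbation factors ~ N(1, cov)
    samples.append(k_eff(0.05 * f[0], 0.12 * f[1]))

# Sample statistics over the perturbed runs give the propagated uncertainty.
print(np.mean(samples), np.std(samples))
```

In the real workflow, each sampled nuclear data set drives a full lattice calculation, and the spread of the outputs is the propagated nuclear data uncertainty.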
On the computation and updating of the modified Cholesky decomposition of a covariance matrix
NASA Technical Reports Server (NTRS)
Vanrooy, D. L.
1976-01-01
Methods for obtaining and updating the modified Cholesky decomposition (MCD) for the particular case of a covariance matrix, when one is given only the original data, are described. These methods are: the standard method of forming the covariance matrix K and then solving for the MCD factors L and D (where K = LDL^T); a method based on Householder reflections; and lastly, a method employing the composite-t algorithm. For many cases in the analysis of remotely sensed data, the composite-t method is the superior method despite being the slowest, since (1) the relative amount of time spent computing MCDs is often quite small, (2) its stability properties are the best of the three, and (3) it affords an efficient and numerically stable procedure for updating the MCD. The properties of these methods are discussed and FORTRAN programs implementing these algorithms are listed.
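A minimal numpy illustration of the first of the three methods (form K, then factor it); the conversion from the standard Cholesky factor to the MCD is a simple column rescaling:

```python
import numpy as np

def modified_cholesky(K):
    """MCD of a covariance matrix: K = L D L^T with unit lower-triangular L
    and positive diagonal D, derived from the standard factor K = G G^T."""
    G = np.linalg.cholesky(K)
    d = np.diag(G)
    L = G / d                       # scale each column: unit diagonal
    D = np.diag(d ** 2)
    return L, D

# Covariance of illustrative "remotely sensed" 3-band data.
rng = np.random.default_rng(7)
X = rng.standard_normal((500, 3)) @ np.array([[1.0, 0.2, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 1.0]])
K = np.cov(X, rowvar=False)
L, D = modified_cholesky(K)
print(np.allclose(L @ D @ L.T, K))  # True: factorization reproduces K
```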
MIMICKING COUNTERFACTUAL OUTCOMES TO ESTIMATE CAUSAL EFFECTS.
Lok, Judith J
2017-04-01
In observational studies, treatment may be adapted to covariates at several times without a fixed protocol, in continuous time. Treatment influences covariates, which influence treatment, which influences covariates, and so on. Then even time-dependent Cox-models cannot be used to estimate the net treatment effect. Structural nested models have been applied in this setting. Structural nested models are based on counterfactuals: the outcome a person would have had had treatment been withheld after a certain time. Previous work on continuous-time structural nested models assumes that counterfactuals depend deterministically on observed data, while conjecturing that this assumption can be relaxed. This article proves that one can mimic counterfactuals by constructing random variables, solutions to a differential equation, that have the same distribution as the counterfactuals, even given past observed data. These "mimicking" variables can be used to estimate the parameters of structural nested models without assuming the treatment effect to be deterministic.
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
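A minimal sketch of the FPCA-then-test pipeline, assuming a densely observed functional covariate stored as an n × n_grid array (the helper and its defaults are ours, not the authors' code):

```python
import numpy as np
from scipy import stats

def flm_f_test(X_func, y, n_pc=3):
    """F-test of no association between scalar y and a functional covariate:
    project X onto its leading principal component scores, then compare the
    score regression against the intercept-only model."""
    Xc = X_func - X_func.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)  # FPCA via SVD
    scores = U[:, :n_pc] * s[:n_pc]
    Z = np.column_stack([np.ones(len(y)), scores])
    _, rss_full = np.linalg.lstsq(Z, y, rcond=None)[:2]
    rss_null = np.sum((y - y.mean()) ** 2)
    n, p = len(y), n_pc
    F = ((rss_null - rss_full[0]) / p) / (rss_full[0] / (n - p - 1))
    return F, stats.f.sf(F, p, n - p - 1)
```

The Wald, score, and likelihood ratio variants would replace the final test statistic but share the same projection step.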
Estimation of group means when adjusting for covariates in generalized linear models.
Qu, Yongming; Luo, Junxiang
2015-01-01
Generalized linear models are commonly used to analyze categorical data such as binary, count, and ordinal outcomes. Adjusting for important prognostic factors or baseline covariates in generalized linear models may improve estimation efficiency. The model-based mean for a treatment group produced by most software packages estimates the response at the mean covariate, not the mean response for this treatment group for the studied population. Although this is not an issue for linear models, the model-based group mean estimates in generalized linear models can be seriously biased for the true group means. We propose a new method to estimate the group means consistently, with a corresponding variance estimator. Simulations showed that the proposed method produces an unbiased estimator of the group means and provides correct coverage probability. The proposed method was applied to analyze hypoglycemia data from clinical trials in diabetes. Copyright © 2014 John Wiley & Sons, Ltd.
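The distinction the abstract draws can be made concrete with a small simulation (hypothetical data; statsmodels is used only for convenience): the prediction at the mean covariate differs from the average of per-subject predictions, and only the latter targets the population group mean.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
trt = rng.integers(0, 2, n)                      # treatment indicator
x = rng.normal(size=n)                           # baseline covariate
p = 1 / (1 + np.exp(-(-0.5 + 1.0 * trt + 0.8 * x)))
y = rng.binomial(1, p)                           # binary outcome

X = np.column_stack([np.ones(n), trt, x])
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# Response at the mean covariate (what many packages report as the group mean)
at_mean_cov = fit.predict([[1.0, 1.0, x.mean()]])[0]
# Population-averaged group mean: set trt=1 for everyone, average predictions
marginal = fit.predict(np.column_stack([np.ones(n), np.ones(n), x])).mean()
print(at_mean_cov, marginal)   # differ because the link is nonlinear
```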
Levy Matrices and Financial Covariances
NASA Astrophysics Data System (ADS)
Burda, Zdzislaw; Jurkiewicz, Jerzy; Nowak, Maciej A.; Papp, Gabor; Zahed, Ismail
2003-10-01
In a given market, financial covariances capture the intra-stock correlations and can be used to address statistically the bulk nature of the market as a complex system. We provide a statistical analysis of three SP500 covariances with evidence for raw tail distributions. We study the stability of these tails against reshuffling for the SP500 data and show that the covariance with the strongest tails is robust, with a spectral density in remarkable agreement with random Lévy matrix theory. We study the inverse participation ratio for the three covariances. The strong localization observed at both ends of the spectral density is analogous to the localization exhibited in the random Lévy matrix ensemble. We discuss two competing mechanisms responsible for the occurrence of an extensive and delocalized eigenvalue at the edge of the spectrum: (a) the Lévy character of the entries of the correlation matrix, and (b) a sort of off-diagonal order induced by underlying inter-stock correlations. (b) can be destroyed by reshuffling, while (a) cannot. We show that the stocks with the largest scattering are the least susceptible to correlations, and are likely candidates for the localized states. We introduce a simple model for price fluctuations which captures the behavior of the SP500 covariances. It may be of importance for asset diversification.
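The localization diagnostic used above, the inverse participation ratio, is easy to reproduce on toy heavy-tailed data (illustrative only; not the authors' SP500 data):

```python
import numpy as np

def ipr(v):
    """Inverse participation ratio of an eigenvector: about 1/N for a
    delocalized state, O(1) for a state localized on few components."""
    v = v / np.linalg.norm(v)
    return np.sum(v ** 4)

rng = np.random.default_rng(3)
returns = rng.standard_t(df=3, size=(1000, 50))   # heavy-tailed toy returns
C = np.corrcoef(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals[-1], ipr(eigvecs[:, -1]))  # largest eigenvalue's eigenvector
print(eigvals[0], ipr(eigvecs[:, 0]))    # smallest eigenvalue's eigenvector
```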
Taylor, J M; Law, N
1998-10-30
We investigate the importance of the assumed covariance structure for longitudinal modelling of CD4 counts and examine how individual predictions of future CD4 counts are affected by the covariance structure. We consider four covariance structures: one based on an integrated Ornstein-Uhlenbeck stochastic process; one based on Brownian motion; and two derived from standard linear and quadratic random-effects models. Using data from the Multicenter AIDS Cohort Study and from a simulation study, we show that there is a noticeable deterioration in the coverage rate of confidence intervals, as well as a loss in efficiency, if we assume the wrong covariance. The quadratic random-effects model is found to be the best in terms of correctly calibrated prediction intervals, but is substantially less efficient than the others. Incorrectly specifying the covariance structure as linear random effects gives too-narrow prediction intervals with poor coverage rates. The model based on the integrated Ornstein-Uhlenbeck stochastic process is the preferred one of the four considered because of its efficiency and robustness properties. We also use the difference between the future predicted and observed CD4 counts to assess an appropriate transformation of CD4 counts; a fourth root, cube root and square root all appear reasonable choices.
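For reference, one common parameterization of the integrated Ornstein-Uhlenbeck covariance used in such comparisons can be coded directly (a sketch under that parameterization; the exact form should be checked against the paper):

```python
import numpy as np

def iou_cov(times, alpha, sigma2):
    """Covariance of an integrated OU process at the given times:
    Cov(s, t) = sigma2/(2 a^3) * (2 a min(s,t) + e^{-a s} + e^{-a t}
                - 1 - e^{-a |s - t|})."""
    s, t = np.meshgrid(times, times, indexing="ij")
    return sigma2 / (2 * alpha**3) * (
        2 * alpha * np.minimum(s, t)
        + np.exp(-alpha * s) + np.exp(-alpha * t)
        - 1 - np.exp(-alpha * np.abs(s - t))
    )

print(iou_cov(np.array([0.5, 1.0, 2.0]), alpha=1.0, sigma2=1.0))
```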
Davies, Christopher E; Glonek, Gary Fv; Giles, Lynne C
2017-08-01
One purpose of a longitudinal study is to gain a better understanding of how an outcome of interest changes among a given population over time. In what follows, a trajectory will be taken to mean the series of measurements of the outcome variable for an individual. Group-based trajectory modelling methods seek to identify subgroups of trajectories within a population, such that trajectories that are grouped together are more similar to each other than to trajectories in distinct groups. Group-based trajectory models generally assume a certain structure in the covariances between measurements, for example, conditional independence, homogeneous variance between groups, or stationary variance over time. Violations of these assumptions could be expected to result in poor model performance. We used simulation to investigate the effect of covariance misspecification on the misclassification of trajectories in commonly used models under a range of scenarios, defining a measure of performance relative to the ideal Bayesian correct classification rate. We found that the more complex models generally performed better over a range of scenarios. In particular, incorrectly specified covariance matrices could significantly bias the results, but using models with a correct but more complicated than necessary covariance matrix incurred little cost.
Murad, Havi; Kipnis, Victor; Freedman, Laurence S
2016-10-01
Assessing interactions in linear regression models when covariates have measurement error (ME) is complex. We previously described regression calibration (RC) methods that yield consistent estimators and standard errors for interaction coefficients of normally distributed covariates having classical ME. Here we extend normal-based RC (NBRC) and linear RC (LRC) methods to a non-classical ME model, and describe more efficient versions that combine estimates from the main study and internal sub-study. We apply these methods to data from the Observing Protein and Energy Nutrition (OPEN) study. Using simulations we show that (i) for normally distributed covariates, efficient NBRC and LRC were nearly unbiased and performed well with sub-study size ≥200; (ii) efficient NBRC had lower MSE than efficient LRC; (iii) the naïve test for a single interaction had type I error probability close to the nominal significance level, whereas efficient NBRC and LRC were slightly anti-conservative but more powerful; (iv) for markedly non-normal covariates, efficient LRC yielded less biased estimators with smaller variance than efficient NBRC. Our simulations suggest that it is preferable to use: (i) efficient NBRC for estimating and testing interaction effects of normally distributed covariates and (ii) efficient LRC for estimating and testing interactions for markedly non-normal covariates. © The Author(s) 2013.
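A stripped-down version of the linear regression calibration idea, with an internal sub-study providing the true covariate (all data here are simulated; the efficient combined estimators in the paper go beyond this sketch):

```python
import numpy as np

def regression_calibration(w_main, w_sub, x_sub):
    """Learn E[X | W] by linear regression in the sub-study (true X known),
    then impute calibrated covariate values in the main study."""
    A = np.column_stack([np.ones_like(w_sub), w_sub])
    gamma, *_ = np.linalg.lstsq(A, x_sub, rcond=None)
    return gamma[0] + gamma[1] * w_main

rng = np.random.default_rng(4)
x_sub = rng.normal(size=200)                       # sub-study truth
w_sub = x_sub + rng.normal(scale=0.5, size=200)    # classical ME
x_main = rng.normal(size=2000)
w_main = x_main + rng.normal(scale=0.5, size=2000)
x_hat = regression_calibration(w_main, w_sub, x_sub)
```

The calibrated x_hat (and products of calibrated covariates, for interactions) then replaces the error-prone w_main in the outcome model.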
A mixed model framework for teratology studies.
Braeken, Johan; Tuerlinckx, Francis
2009-10-01
A mixed model framework is presented to model the characteristic multivariate binary anomaly data as provided in some teratology studies. The key features of the model are the incorporation of covariate effects, a flexible random effects distribution by means of a finite mixture, and the application of copula functions to better account for the relation structure of the anomalies. The framework is motivated by data of the Boston Anticonvulsant Teratogenesis study and offers an integrated approach to investigate substantive questions, concerning general and anomaly-specific exposure effects of covariates, interrelations between anomalies, and objective diagnostic measurement.
Li, Xiujin; Lund, Mogens Sandø; Janss, Luc; Wang, Chonglong; Ding, Xiangdong; Zhang, Qin; Su, Guosheng
2017-03-15
With the development of SNP chips, SNP information provides an efficient approach to further disentangle different patterns of genomic variances and covariances across the genome for traits of interest. Due to the interaction between genotype and environment as well as possible differences in genetic background, it is reasonable to treat the performances of a biological trait in different populations as different but genetically correlated traits. In the present study, we performed an investigation on the patterns of region-specific genomic variances, covariances and correlations between Chinese and Nordic Holstein populations for three milk production traits. Variances and covariances between Chinese and Nordic Holstein populations were estimated for genomic regions at three different levels of genome region (all SNPs as one region, each chromosome as one region, and every 100 SNPs as one region) using a novel multi-trait random regression model which uses latent variables to model heterogeneous variance and covariance. In the scenario of the whole genome as one region, the genomic variances, covariances and correlations obtained from the new multi-trait Bayesian method were comparable to those obtained from a multi-trait GBLUP for all three milk production traits. In the scenario of each chromosome as one region, BTA 14 and BTA 5 accounted for very large genomic variance, covariance and correlation for milk yield and fat yield, whereas no specific chromosome showed very large genomic variance, covariance and correlation for protein yield. In the scenario of every 100 SNPs as one region, most regions explained <0.50% of genomic variance and covariance for milk yield and fat yield, and explained <0.30% for protein yield, while some regions could present large variance and covariance. Although overall correlations between the two populations for the three traits were positive and high, a few regions still showed weakly positive or highly negative genomic correlations for milk yield and fat yield. The new multi-trait Bayesian method using latent variables to model heterogeneous variance and covariance could work well for estimating the genomic variances and covariances for all genome regions simultaneously. These estimated genomic parameters could be useful to improve genomic prediction accuracy for Chinese and Nordic Holstein populations using a joint reference data set in the future.
NASA Astrophysics Data System (ADS)
Kaiser, Olga; Martius, Olivia; Horenko, Illia
2017-04-01
Regression-based Generalized Pareto Distribution (GPD) models are often used to describe the dynamics of hydrological threshold excesses, relying on the explicit availability of all of the relevant covariates. In real applications, however, the complete set of relevant covariates might not be available. In this context, it was shown that under weak assumptions the influence of systematically missing covariates can be reflected by nonstationary and nonhomogeneous dynamics. We present a data-driven, semiparametric, and adaptive approach for spatio-temporal regression-based clustering of threshold excesses in the presence of systematically missing covariates. The nonstationary and nonhomogeneous behavior of threshold excesses is described by a set of local stationary GPD models, in which the parameters are expressed as regression models, and a non-parametric spatio-temporal hidden switching process. By exploiting the nonparametric Finite Element time-series analysis Methodology (FEM) with Bounded Variation of the model parameters (BV) for resolving the spatio-temporal switching process, the approach goes beyond the strong a priori assumptions made in standard latent class models such as Mixture Models and Hidden Markov Models. Additionally, the presented FEM-BV-GPD approach provides a pragmatic description of the corresponding spatial dependence structure by grouping together all locations that exhibit similar behavior of the switching process. The performance of the framework is demonstrated on daily accumulated precipitation series over 17 different locations in Switzerland from 1981 to 2013, showing that the introduced approach allows for a better description of the historical data.
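The stationary GPD building block of this framework is straightforward to fit with scipy (a minimal sketch on simulated precipitation; the FEM-BV switching and regression layers are not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
precip = rng.gamma(shape=0.8, scale=8.0, size=12000)  # toy daily series
u = np.quantile(precip, 0.95)                         # threshold
excesses = precip[precip > u] - u

# Stationary GPD fit to the threshold excesses (location fixed at zero)
shape, _, scale = stats.genpareto.fit(excesses, floc=0)
print(shape, scale)
```

In FEM-BV-GPD, the shape and scale would instead be local regression models on covariates, switched in space and time by the hidden process.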
Enhanced brainstem and cortical evoked response amplitudes: single-trial covariance analysis.
Galbraith, G C
2001-06-01
The purpose of the present study was to develop analytic procedures that improve the definition of sensory evoked response components. Such procedures could benefit all recordings but would especially benefit difficult recordings where many trials are contaminated by muscle and movement artifacts. First, cross-correlation and latency adjustment analyses were applied to the human brainstem frequency-following response and cortical auditory evoked response recorded on the same trials. Lagged cross-correlation functions were computed, for each of 17 subjects, between single-trial data and templates consisting of the sinusoid stimulus waveform for the brainstem response and the subject's own smoothed averaged evoked response P2 component for the cortical response. Trials were considered in the analysis only if the maximum correlation-squared (r²) exceeded .5 (negatively correlated trials were thus included). Identical correlation coefficients may be based on signals with quite different amplitudes, but it is possible to assess amplitude by the non-normalized covariance function. Next, an algorithm is applied in which each trial with negative covariance is matched to a trial with similar, but positive, covariance and these matched-trial pairs are deleted. When an evoked response signal is present in the data, the majority of trials positively correlate with the template. Thus, a residual of positively correlated trials remains after matched covariance trials are deleted. When these residual trials are averaged, the resulting brainstem and cortical responses show greatly enhanced amplitudes. This result supports the utility of this analysis technique in clarifying and assessing evoked response signals.
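The screening step described above (keep a trial only if its maximum lagged r² with the template exceeds .5, then record the signed, non-normalized covariance) can be sketched as follows; array shapes and the lag search are our assumptions:

```python
import numpy as np

def max_lagged_r2(trial, template, max_lag):
    """Maximum squared correlation between one trial and a template over
    lags in [-max_lag, max_lag]; also returns the signed covariance there."""
    best_r2, best_cov = 0.0, 0.0
    n = len(trial)
    for lag in range(-max_lag, max_lag + 1):
        a = trial[max(0, lag): n + min(0, lag)]
        b = template[max(0, -lag): n + min(0, -lag)]
        r = np.corrcoef(a, b)[0, 1]
        if r ** 2 > best_r2:
            best_r2, best_cov = r ** 2, np.cov(a, b)[0, 1]
    return best_r2, best_cov

# Trials with best_r2 > .5 enter the matched-covariance deletion step.
```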
On the Lighthill relationship and sound generation from isotropic turbulence
NASA Technical Reports Server (NTRS)
Zhou, Ye; Praskovsky, Alexander; Oncley, Steven
1994-01-01
In 1952, Lighthill developed a theory for determining the sound generated by a turbulent motion of a fluid. With some statistical assumptions, Proudman applied this theory to estimate the acoustic power of isotropic turbulence. Recently, Lighthill established a simple relationship that relates the fourth-order retarded time and space covariance of his stress tensor to the corresponding second-order covariance and the turbulent flatness factor, without making statistical assumptions for a homogeneous turbulence. Lilley revisited Proudman's work and applied the Lighthill relationship to evaluate directly the radiated acoustic power from isotropic turbulence. After choosing the time separation dependence in the two-point velocity time and space covariance based on the insights gained from direct numerical simulations, Lilley concluded that the Proudman constant is determined by the turbulent flatness factor and the second-order spatial velocity covariance. In order to estimate the Proudman constant at high Reynolds numbers, we analyzed a unique data set of measurements in a large wind tunnel and the atmospheric surface layer that covers a range of Taylor-microscale Reynolds numbers 2.0 × 10³ ≤ R_λ ≤ 12.7 × 10³. Our measurements demonstrate that the Lighthill relationship is a good approximation, providing additional support to Lilley's approach. The flatness factor is found to be between 2.7 and 3.3, and the second-order spatial velocity covariance is obtained. Based on these experimental data, the Proudman constant is estimated to be 0.68 - 3.68.
Linden, Ariel; Yarnold, Paul R
2016-12-01
Program evaluations often utilize various matching approaches to emulate the randomization process for group assignment in experimental studies. Typically, the matching strategy is implemented, and then covariate balance is assessed before estimating treatment effects. This paper introduces a novel analytic framework utilizing a machine learning algorithm called optimal discriminant analysis (ODA) for assessing covariate balance and estimating treatment effects, once the matching strategy has been implemented. This framework holds several key advantages over the conventional approach: application to any variable metric and number of groups; insensitivity to skewed data or outliers; and use of accuracy measures applicable to all prognostic analyses. Moreover, ODA accepts analytic weights, thereby extending the methodology to any study design where weights are used for covariate adjustment or more precise (differential) outcome measurement. One-to-one matching on the propensity score was used as the matching strategy. Covariate balance was assessed using standardized difference in means (conventional approach) and measures of classification accuracy (ODA). Treatment effects were estimated using ordinary least squares regression and ODA. Using empirical data, ODA produced results highly consistent with those obtained via the conventional methodology for assessing covariate balance and estimating treatment effects. When ODA is combined with matching techniques within a treatment effects framework, the results are consistent with conventional approaches. However, given that it provides additional dimensions and robustness to the analysis versus what can currently be achieved using conventional approaches, ODA offers an appealing alternative. © 2016 John Wiley & Sons, Ltd.
Feature Screening for Ultrahigh Dimensional Categorical Data with Applications.
Huang, Danyang; Li, Runze; Wang, Hansheng
2014-01-01
Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses the screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies, and illustrate the proposed method by two empirical datasets.
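A bare-bones version of the screening statistic (integer-coded categories assumed; levels never observed are dropped before the chi-square call; the helper name is ours):

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_screen(X_cat, y_cat, top_k=10):
    """Rank categorical covariates by Pearson chi-square association with a
    categorical response; return the indices of the top_k features."""
    stats_ = []
    for j in range(X_cat.shape[1]):
        table = np.zeros((X_cat[:, j].max() + 1, y_cat.max() + 1))
        np.add.at(table, (X_cat[:, j], y_cat), 1)       # contingency table
        rows, cols = table.sum(1) > 0, table.sum(0) > 0
        stats_.append(chi2_contingency(table[np.ix_(rows, cols)])[0])
    return np.argsort(stats_)[::-1][:top_k]
```

Interaction effects can be screened the same way by coding the cross-classification of two covariates as a single categorical feature.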
On the regularity of the covariance matrix of a discretized scalar field on the sphere
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bilbao-Ahedo, J.D.; Barreiro, R.B.; Herranz, D.
2017-02-01
We present a comprehensive study of the regularity of the covariance matrix of a discretized field on the sphere. In particular, we study how the rank of the matrix depends on the number of pixels, the number of spherical harmonics, the symmetries of the pixelization scheme, and the presence of a mask. Taking into account the above mentioned components, we provide analytical expressions that constrain the rank of the matrix. They are obtained by expanding the determinant of the covariance matrix as a sum of determinants of matrices made up of spherical harmonics. We investigate these constraints for five different pixelizations that have been used in the context of Cosmic Microwave Background (CMB) data analysis: Cube, Icosahedron, Igloo, GLESP and HEALPix, finding that, at least in the considered cases, the HEALPix pixelization tends to provide a covariance matrix with a rank closer to the maximum expected theoretical value than the other pixelizations. The effect of the propagation of numerical errors in the regularity of the covariance matrix is also studied for different computational precisions, as well as the effect of adding a certain level of noise in order to regularize the matrix. In addition, we investigate the application of the previous results to a particular example that requires the inversion of the covariance matrix: the estimation of the CMB temperature power spectrum through the Quadratic Maximum Likelihood algorithm. Finally, some general considerations in order to achieve a regular covariance matrix are also presented.
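The core rank constraint is easy to see numerically: a band-limited covariance built from (lmax+1)² spherical harmonics cannot have rank above that number, regardless of the pixel count. A toy check (random pixels, flat spectrum; our construction, not the paper's pixelizations):

```python
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(6)
n_pix, lmax = 200, 5
theta = np.arccos(rng.uniform(-1, 1, n_pix))  # colatitude
phi = rng.uniform(0, 2 * np.pi, n_pix)        # azimuth

# Columns are Y_lm evaluated at the pixels (scipy order: azimuth, polar)
Y = np.column_stack([sph_harm(m, l, phi, theta)
                     for l in range(lmax + 1) for m in range(-l, l + 1)])
C = (Y @ Y.conj().T).real                     # band-limited covariance
print(np.linalg.matrix_rank(C), (lmax + 1) ** 2)  # rank <= 36 < 200
```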
Empirical likelihood inference in randomized clinical trials.
Zhang, Biao
2017-01-01
In individually randomized controlled trials, in addition to the primary outcome, information is often available on a number of covariates prior to randomization. This information is frequently utilized to undertake adjustment for baseline characteristics in order to increase the precision of the estimation of average treatment effects; such adjustment is usually performed via covariate adjustment in outcome regression models. Although the use of covariate adjustment is widely seen as desirable for making treatment effect estimates more precise and the corresponding hypothesis tests more powerful, there are considerable concerns that objective inference in randomized clinical trials can potentially be compromised. In this paper, we study an empirical likelihood approach to covariate adjustment and propose two unbiased estimating functions that automatically decouple evaluation of average treatment effects from regression modeling of covariate-outcome relationships. The resulting empirical likelihood estimator of the average treatment effect is as efficient as the existing efficient adjusted estimators when separate treatment-specific working regression models are correctly specified, and is at least as efficient for any given treatment-specific working regression models whether or not they coincide with the true treatment-specific covariate-outcome relationships. We present a simulation study to compare the finite sample performance of various methods along with some results on analysis of a data set from an HIV clinical trial. The simulation results indicate that the proposed empirical likelihood approach is more efficient and powerful than its competitors when the working covariate-outcome relationships by treatment status are misspecified.
NASA Astrophysics Data System (ADS)
Jolivet, R.; Simons, M.
2018-02-01
Interferometric synthetic aperture radar time series methods aim to reconstruct time-dependent ground displacements over large areas from sets of interferograms in order to detect transient, periodic, or small-amplitude deformation. Because of computational limitations, most existing methods consider each pixel independently, ignoring important spatial covariances between observations. We describe a framework to reconstruct time series of ground deformation while considering all pixels simultaneously, allowing us to account for spatial covariances, imprecise orbits, and residual atmospheric perturbations. We describe spatial covariances by an exponential decay function dependent on pixel-to-pixel distance. We approximate the impact of imprecise orbit information and residual long-wavelength atmosphere as a low-order polynomial function. Tests on synthetic data illustrate the importance of incorporating full covariances between pixels in order to avoid biased parameter reconstruction. An example of application to the northern Chilean subduction zone highlights the potential of this method.
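The exponential spatial covariance used for the pixels can be assembled directly (a sketch; parameter names are ours):

```python
import numpy as np

def exp_spatial_cov(coords, sigma2, lam):
    """Pixel-to-pixel covariance decaying exponentially with distance:
    Cov(i, j) = sigma2 * exp(-d_ij / lam)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / lam)

# Three pixels (meters), unit variance, 50 m e-folding length
coords = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 200.0]])
print(exp_spatial_cov(coords, sigma2=1.0, lam=50.0))
```

This full matrix is what couples the pixels in the joint time-series inversion, in contrast to pixel-by-pixel methods that implicitly assume it is diagonal.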
A Kalman filter for a two-dimensional shallow-water model
NASA Technical Reports Server (NTRS)
Parrish, D. F.; Cohn, S. E.
1985-01-01
A two-dimensional Kalman filter is described for data assimilation for making weather forecasts. The filter is regarded as superior to the optimal interpolation method because the filter determines the forecast error covariance matrix exactly instead of using an approximation. A generalized time step is defined which includes expressions for one time step of the forecast model, the error covariance matrix, the gain matrix, and the evolution of the covariance matrix. Subsequent time steps are achieved by quantifying the forecast variables or employing a linear extrapolation from a current variable set, assuming the forecast dynamics are linear. Calculations for the evolution of the error covariance matrix are banded, i.e., are performed only with the elements significantly different from zero. Experimental results are provided from an application of the filter to a shallow-water simulation covering a 6000 x 6000 km grid.
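The exact covariance propagation that distinguishes the Kalman filter from optimal interpolation is compact to write down (generic linear KF cycle; matrix names follow the usual textbook convention, not necessarily the paper's notation):

```python
import numpy as np

def kalman_step(x, P, A, Q, H, R, y):
    """One forecast/analysis cycle with exact covariance evolution."""
    x_f = A @ x                        # forecast state
    P_f = A @ P @ A.T + Q              # forecast error covariance (exact)
    S = H @ P_f @ H.T + R              # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)   # gain matrix
    x_a = x_f + K @ (y - H @ x_f)      # analysis state
    P_a = (np.eye(len(x)) - K @ H) @ P_f
    return x_a, P_a
```

The banded evaluation mentioned above corresponds to computing only the entries of P significantly different from zero, rather than the dense products shown here.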
Gentry, Amanda Elswick; Jackson-Cook, Colleen K; Lyon, Debra E; Archer, Kellie J
2015-01-01
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to predict an ordinal outcome for high-dimensional covariate spaces. Our method penalizes some covariates (high-throughput genomic features) without penalizing others (such as demographic and/or clinical covariates). We demonstrate the application of our method to predict the stage of breast cancer. In our model, breast cancer subtype is a nonpenalized predictor, and CpG site methylation values from the Illumina Human Methylation 450K assay are penalized predictors. The method has been made available in the ordinalgmifs package in the R programming environment.
Bernhardt, Paul W.; Zhang, Daowen; Wang, Huixia Judy
2014-01-01
Joint modeling techniques have become a popular strategy for studying the association between a response and one or more longitudinal covariates. Motivated by the GenIMS study, where it is of interest to model the event of survival using censored longitudinal biomarkers, a joint model is proposed for describing the relationship between a binary outcome and multiple longitudinal covariates subject to detection limits. A fast, approximate EM algorithm is developed that reduces the dimension of integration in the E-step of the algorithm to one, regardless of the number of random effects in the joint model. Numerical studies demonstrate that the proposed approximate EM algorithm leads to satisfactory parameter and variance estimates in situations with and without censoring on the longitudinal covariates. The approximate EM algorithm is applied to analyze the GenIMS data set.
Geographic analysis of shigellosis in Vietnam.
Kim, Deok Ryun; Ali, Mohammad; Thiem, Vu Dinh; Park, Jin-Kyung; von Seidlein, Lorenz; Clemens, John
2008-12-01
Geographic and ecological analysis may provide investigators useful ecological information for the control of shigellosis. This paper provides the distribution of individual Shigella species in space, and ecological covariates for shigellosis, in Nha Trang, Vietnam. Data on shigellosis in neighborhoods were used to identify ecological covariates. A Bayesian hierarchical model was used to obtain the joint posterior distribution of model parameters and to construct smoothed risk maps for shigellosis. Neighborhoods with a high proportion of worshippers of traditional religion, close proximity to the hospital, or close proximity to the river had increased risk for shigellosis. The ecological covariates associated with Shigella flexneri differed from the covariates for Shigella sonnei; in contrast, the spatial distribution of the two species was similar. The disease maps can help identify high-risk areas of shigellosis that can be targeted for interventions. This approach may be useful for the selection of populations and the analysis of vaccine trials.
Directly Reconstructing Principal Components of Heterogeneous Particles from Cryo-EM Images
Tagare, Hemant D.; Kucukelbir, Alp; Sigworth, Fred J.; Wang, Hongwei; Rao, Murali
2015-01-01
Structural heterogeneity of particles can be investigated by their three-dimensional principal components. This paper addresses the question of whether, and with what algorithm, the three-dimensional principal components can be directly recovered from cryo-EM images. The first part of the paper extends the Fourier slice theorem to covariance functions, showing that the three-dimensional covariance, and hence the principal components, of a heterogeneous particle can indeed be recovered from two-dimensional cryo-EM images. The second part of the paper proposes a practical algorithm for reconstructing the principal components directly from cryo-EM images without the intermediate step of calculating covariances. This algorithm is based on maximizing the (posterior) likelihood using the Expectation-Maximization algorithm. The last part of the paper applies this algorithm to simulated data and to two real cryo-EM data sets: a data set of the 70S ribosome with and without Elongation Factor-G (EF-G), and a data set of the influenza virus RNA-dependent RNA polymerase (RdRP). The first principal component of the 70S ribosome data set reveals the expected conformational changes of the ribosome as the EF-G binds and unbinds. The first principal component of the RdRP data set reveals a conformational change in the two dimers of the RdRP.
Error due to unresolved scales in estimation problems for atmospheric data assimilation
NASA Astrophysics Data System (ADS)
Janjic, Tijana
The error arising due to unresolved scales in data assimilation procedures is examined. The problem of estimating the projection of the state of a passive scalar undergoing advection at a sequence of times is considered. The projection belongs to a finite-dimensional function space and is defined on the continuum. Using the continuum projection of the state of a passive scalar, a mathematical definition is obtained for the error arising due to the presence, in the continuum system, of scales unresolved by the discrete dynamical model. This error affects the estimation procedure through point observations that include the unresolved scales. In this work, two approximate methods for taking into account the error due to unresolved scales and the resulting correlations are developed and employed in the estimation procedure. The resulting formulas resemble the Schmidt-Kalman filter and the usual discrete Kalman filter, respectively. For this reason, the newly developed filters are called the Schmidt-Kalman filter and the traditional filter. In order to test the assimilation methods, a two-dimensional advection model with nonstationary spectrum was developed for passive scalar transport in the atmosphere. An analytical solution on the sphere was found depicting the model dynamics evolution. Using this analytical solution the model error is avoided, and the error due to unresolved scales is the only error left in the estimation problem. It is demonstrated that the traditional and the Schmidt-Kalman filter work well provided the exact covariance function of the unresolved scales is known. However, this requirement is not satisfied in practice, and the covariance function must be modeled. The Schmidt-Kalman filter cannot be computed in practice without further approximations. Therefore, the traditional filter is better suited for practical use. Also, the traditional filter does not require modeling of the full covariance function of the unresolved scales, but only modeling of the covariance matrix obtained by evaluating the covariance function at the observation points. We first assumed that this covariance matrix is stationary and that the unresolved scales are not correlated between the observation points, i.e., the matrix is diagonal, and that the values along the diagonal are constant. Tests with these assumptions were unsuccessful, indicating that a more sophisticated model of the covariance is needed for assimilation of data with nonstationary spectrum. A new method for modeling the covariance matrix based on an extended set of modeling assumptions is proposed. First, it is assumed that the covariance matrix is diagonal, that is, that the unresolved scales are not correlated between the observation points. It is postulated that the values on the diagonal depend on a wavenumber that is characteristic for the unresolved part of the spectrum. It is further postulated that this characteristic wavenumber can be diagnosed from the observations and from the estimate of the projection of the state that is being estimated. It is demonstrated that the new method successfully overcomes previously encountered difficulties.
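In the spirit of the "traditional filter" described above, the unresolved scales can be treated as an additional observation-error term, diagonal at the observation points (a generic sketch under that modeling assumption, not the dissertation's code):

```python
import numpy as np

def analysis_with_unresolved_scales(x_f, P_f, H, R_instr, r_unres, y):
    """Analysis update where the diagonal covariance of the unresolved
    scales, evaluated at the observation points, inflates R."""
    R = R_instr + np.diag(r_unres)     # modeled unresolved-scale variances
    S = H @ P_f @ H.T + R
    K = P_f @ H.T @ np.linalg.inv(S)
    x_a = x_f + K @ (y - H @ x_f)
    P_a = (np.eye(len(x_f)) - K @ H) @ P_f
    return x_a, P_a
```

The proposed method makes r_unres depend on a characteristic wavenumber diagnosed from the observations and the current state estimate, rather than holding it constant.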
Covariation assessment for neutral and emotional verbal stimuli in paranoid delusions.
Díez-Alegría, Cristina; Vázquez, Carmelo; Hernández-Lloreda, María J
2008-11-01
Selective processing of emotion-relevant information is considered a central feature in various types of psychopathology, yet the mechanisms underlying these biases are not well understood. One of the first steps in processing information is to gather data to judge the covariation or association of events. The aim of this study was to explore whether patients with persecutory delusions would show a covariation bias when processing stimuli related to social threat. We assessed estimations of covariation in patients with current persecutory (CP) beliefs (N=40), patients with past persecutory (PP) beliefs (N=25), and a non-clinical control (NC) group (N=36). Covariation estimations were assessed under three different experimental conditions. The first two conditions focused on neutral behaviours (Condition 1) and psychological traits (Condition 2) for two distant cultural groups, while the third condition included self-relevant material by exposing the participant to either protective social (positive) or threatening social (negative) statements about the participant or a third person. Our results showed that all participants were precise in their covariation estimations. However, when judging covariation for self-relevant sentences related to social statements (Condition 3), all groups showed a significant tendency to associate positive social interaction (protection themed) sentences to the self. Yet, when using sentences related to social threat, the CP group showed a bias consisting of overestimating the number of self-referent sentences. Our results showed that there was no specific covariation assessment bias related to paranoid beliefs. Both NCs and participants with persecutory beliefs showed a similar pattern of results when processing neutral or social threat-related sentences. The implications for understanding of the role of self-referent information processing biases in delusion formation are discussed.
DISSCO: direct imputation of summary statistics allowing covariates.
Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun
2015-08-01
Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), that assume a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We continue to prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). We consider two real-life scenarios where the correlation and partial correlation likely make practical difference: (i) association studies in admixed populations; (ii) association studies in presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
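The key quantity, the partial correlation of genotypes controlling for covariates, can be computed by correlating residuals (a sketch; variable names are ours):

```python
import numpy as np

def partial_corr(g1, g2, covars):
    """Partial correlation of two genotype vectors given covariates:
    regress each genotype on the covariates, correlate the residuals."""
    Z = np.column_stack([np.ones(len(g1)), covars])
    resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    r1, r2 = resid(g1), resid(g2)
    return np.corrcoef(r1, r2)[0, 1]
```

DISSCO uses such partial correlations, in place of raw genotype correlations, to build the Gaussian model from which untyped summary statistics are imputed.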
On the Use of the Log-Normal Particle Size Distribution to Characterize Global Rain
NASA Technical Reports Server (NTRS)
Meneghini, Robert; Rincon, Rafael; Liao, Liang
2003-01-01
Although most parameterizations of the drop size distributions (DSD) use the gamma function, there are several advantages to the log-normal form, particularly if we want to characterize the large scale space-time variability of the DSD and rain rate. The advantages of the distribution are twofold: the logarithm of any moment can be expressed as a linear combination of the individual parameters of the distribution; the parameters of the distribution are approximately normally distributed. Since all radar and rainfall-related parameters can be written approximately as a moment of the DSD, the first property allows us to express the logarithm of any radar/rainfall variable as a linear combination of the individual DSD parameters. Another consequence is that any power law relationship between rain rate, reflectivity factor, specific attenuation or water content can be expressed in terms of the covariance matrix of the DSD parameters. The joint-normal property of the DSD parameters has applications to the description of the space-time variation of rainfall in the sense that any radar-rainfall quantity can be specified by the covariance matrix associated with the DSD parameters at two arbitrary space-time points. As such, the parameterization provides a means by which we can use the spaceborne radar-derived DSD parameters to specify in part the covariance matrices globally. However, since satellite observations have coarse temporal sampling, the specification of the temporal covariance must be derived from ancillary measurements and models. Work is presently underway to determine whether the use of instantaneous rain rate data from the TRMM Precipitation Radar can provide good estimates of the spatial correlation in rain rate from data collected in 5° × 5° × 1 month space-time boxes. To characterize the temporal characteristics of the DSD parameters, disdrometer data are being used from the Wallops Flight Facility site where as many as 4 disdrometers have been used to acquire data over a 2 km path. These data should help quantify the temporal form of the covariance matrix at this site.
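The linearity property that motivates the log-normal choice is a one-line formula: for a log-normal DSD with total number concentration N_T and parameters (mu, sigma), the n-th moment satisfies M_n = N_T exp(n mu + n² sigma²/2), so its logarithm is linear in (ln N_T, mu, sigma²). A minimal sketch:

```python
import numpy as np

def log_moment(n, ln_Nt, mu, sigma):
    """ln M_n for a log-normal drop size distribution:
    ln M_n = ln N_T + n*mu + 0.5 * n**2 * sigma**2."""
    return ln_Nt + n * mu + 0.5 * n ** 2 * sigma ** 2

# Example: sixth moment (~ reflectivity) vs third moment (~ water content)
print(log_moment(6, 0.0, np.log(1.0), 0.3))
print(log_moment(3, 0.0, np.log(1.0), 0.3))
```

Any radar or rainfall quantity that is approximately a moment therefore inherits its variance from the covariance matrix of these three parameters.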
Application Of Multi-grid Method On China Seas' Temperature Forecast
NASA Astrophysics Data System (ADS)
Li, W.; Xie, Y.; He, Z.; Liu, K.; Han, G.; Ma, J.; Li, D.
2006-12-01
Correlation scales have been used in traditional schemes of 3-dimensional variational (3D-Var) data assimilation to estimate the background error covariance for the numerical forecast and reanalysis of atmosphere and ocean for decades. However, there are still some drawbacks to this scheme. First, the correlation scales are difficult to determine accurately. Second, the positive definiteness of the first-guess error covariance matrix cannot be guaranteed unless the correlation scales are sufficiently small. Xie et al. (2005) indicated that a traditional 3D-Var only corrects errors at certain wavelengths and that its accuracy depends on the accuracy of the first-guess covariance. In general, short-wavelength errors cannot be corrected well until long-wavelength errors are corrected, and an inaccurate first-guess covariance may mistakenly treat long-wave errors as short-wave ones, resulting in an erroneous analysis. For the purpose of quickly minimizing the errors of long and short waves successively, a new 3D-Var data assimilation scheme, called the multi-grid data assimilation scheme, is proposed in this paper. By assimilating shipboard SST and temperature profiles into a numerical model of the China Seas, we applied this scheme in a two-month data assimilation and forecast experiment with favorable results. Compared with the traditional 3D-Var scheme, the new scheme has higher forecast accuracy and a lower forecast Root-Mean-Square (RMS) error. Furthermore, this scheme was applied to assimilate shipboard SST, AVHRR Pathfinder Version 5.0 SST, and temperature profiles at the same time, and a ten-month forecast experiment on the sea temperature of the China Seas was carried out, in which a successful forecast result was obtained. In particular, the new scheme demonstrated great numerical efficiency in these analyses.
Statistical Modeling of Fire Occurrence Using Data from the Tōhoku, Japan Earthquake and Tsunami.
Anderson, Dana; Davidson, Rachel A; Himoto, Keisuke; Scawthorn, Charles
2016-02-01
In this article, we develop statistical models to predict the number and geographic distribution of fires caused by earthquake ground motion and tsunami inundation in Japan. Using new, uniquely large, and consistent data sets from the 2011 Tōhoku earthquake and tsunami, we fitted three types of models: generalized linear models (GLMs), generalized additive models (GAMs), and boosted regression trees (BRTs). This is the first time the latter two have been used in this application. A simple conceptual framework guided identification of candidate covariates. Models were then compared based on their out-of-sample predictive power, goodness of fit to the data, ease of implementation, and relative importance of the framework concepts. For the ground motion data set, we recommend a Poisson GAM; for the tsunami data set, a negative binomial (NB) GLM or NB GAM. The best models generate out-of-sample predictions of the total number of ignitions in the region within one or two. Prefecture-level prediction errors average approximately three. All models demonstrate predictive power far superior to four models from the literature that were also tested. A nonlinear relationship is apparent between ignitions and ground motion, so for GLMs, which assume a linear response-covariate relationship, instrumental intensity was the preferred ground motion covariate because it captures part of that nonlinearity. Measures of commercial exposure were preferred over measures of residential exposure for both ground motion and tsunami ignition models. This may vary in other regions, but nevertheless highlights the value of testing alternative measures for each concept. Models with the best predictive power included two or three covariates. © 2015 Society for Risk Analysis.
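For the tsunami data set, the recommended NB GLM is a few lines in statsmodels (simulated stand-in data; covariate names, coefficients, and the dispersion value are our assumptions, not the paper's fitted model):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
intensity = rng.uniform(4, 9, 300)            # toy instrumental intensity
exposure = rng.lognormal(0.0, 1.0, 300)       # toy commercial exposure
lam = np.exp(-6 + 0.7 * intensity + 0.3 * np.log(exposure))
counts = rng.negative_binomial(2, 2 / (2 + lam))   # NB ignition counts

X = sm.add_constant(np.column_stack([intensity, np.log(exposure)]))
nb = sm.GLM(counts, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(nb.params)
```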
Kumar, Jitendra; Hoffman, Forrest M.; Hargrove, William W.; ...
2016-08-23
Eddy covariance data from regional flux networks are direct in situ measurements of carbon, water, and energy fluxes and are of vital importance for understanding the spatio-temporal dynamics of the global carbon cycle. FLUXNET links regional networks of eddy covariance sites across the globe to quantify the spatial and temporal variability of fluxes at regional to global scales and to detect emergent ecosystem properties. This study presents an assessment of the representativeness of FLUXNET based on the recently released FLUXNET2015 data set. We present a detailed high resolution analysis of the evolving representativeness of FLUXNET through time. Results provide quantitative insights into the extent to which various biomes are sampled by the network of networks, the role of the spatial distribution of the sites on the network-scale representativeness at any given time, and how that representativeness has changed through time due to changing operational status and data availability at sites in the network. To realize the full potential of FLUXNET observations for understanding emergent ecosystem properties at regional and global scales, we present an approach for upscaling eddy covariance measurements. Informed by the representativeness of observations at the flux sites in the network, the upscaled data reflect the spatio-temporal dynamics of the carbon cycle captured by the in situ measurements. In conclusion, this study presents a method for optimal use of the rich point measurements from FLUXNET to derive an understanding of upscaled carbon fluxes, which can be routinely updated as new data become available, and to direct network expansion by identifying regions poorly sampled by the current network.
Sohl, Terry L.
2014-01-01
Species distribution models often use climate data to assess contemporary and/or future ranges for animal or plant species. Land use and land cover (LULC) data are important predictor variables for determining species range, yet are rarely used when modeling future distributions. In this study, maximum entropy modeling was used to construct species distribution maps for 50 North American bird species to determine relative contributions of climate and LULC for contemporary (2001) and future (2075) time periods. Species presence data were used as a dependent variable, while climate, LULC, and topographic data were used as predictor variables. Results varied by species, but in general, measures of model fit for 2001 indicated significantly poorer fit when either climate or LULC data were excluded from model simulations. Climate covariates provided a higher contribution to 2001 model results than did LULC variables, although both categories of variables strongly contributed. The area deemed to be "suitable" for 2001 species presence was strongly affected by the choice of model covariates, with significantly larger ranges predicted when LULC was excluded as a covariate. Changes in species ranges for 2075 indicate much larger overall range changes due to projected climate change than due to projected LULC change. However, the choice of study area impacted results for both current and projected model applications, with truncation of actual species ranges resulting in lower model fit scores and increased difficulty in interpreting covariate impacts on species range. Results indicate species-specific response to climate and LULC variables; however, both climate and LULC variables clearly are important for modeling both contemporary and potential future species ranges.
Real longitudinal data analysis for real people: building a good enough mixed model.
Cheng, Jing; Edwards, Lloyd J; Maldonado-Molina, Mildred M; Komro, Kelli A; Muller, Keith E
2010-02-20
Mixed effects models have become very popular, especially for the analysis of longitudinal data. One challenge is how to build a good enough mixed effects model. In this paper, we suggest a systematic strategy for addressing this challenge and introduce easily implemented practical advice to build mixed effects models. A general discussion of the scientific strategies motivates the recommended five-step procedure for model fitting. The need to model both the mean structure (the fixed effects) and the covariance structure (the random effects and residual error) creates the fundamental flexibility and complexity. Some very practical recommendations help to conquer the complexity. Centering, scaling, and full-rank coding of all the predictor variables radically improve the chances of convergence, computing speed, and numerical accuracy. Applying computational and assumption diagnostics from univariate linear models to mixed model data greatly helps to detect and solve related computational problems. The approach helps to fit more general covariance models, a crucial step in selecting a credible covariance model needed for defensible inference. A detailed demonstration of the recommended strategy is based on data from a published study of a randomized trial of a multicomponent intervention to prevent young adolescents' alcohol use. The discussion highlights a need for additional covariance and inference tools for mixed models, and for improving how scientists and statisticians teach and review the process of finding a good enough mixed model. (c) 2009 John Wiley & Sons, Ltd.
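The centering/scaling recommendation is trivial to implement and worth doing before any mixed-model fit (a generic sketch; the helper name is ours):

```python
import numpy as np

def center_scale(X):
    """Center and scale predictor columns; returns the transformed matrix
    plus the means and SDs needed to back-transform fitted coefficients."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    return (X - mu) / sd, mu, sd
```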
POLARIS: A 30-meter probabilistic soil series map of the contiguous United States
Chaney, Nathaniel W; Wood, Eric F; McBratney, Alexander B; Hempel, Jonathan W; Nauman, Travis; Brungard, Colby W.; Odgers, Nathan P
2016-01-01
A new complete map of soil series probabilities has been produced for the contiguous United States at a 30 m spatial resolution. This innovative database, named POLARIS, is constructed using available high-resolution geospatial environmental data and a state-of-the-art machine learning algorithm (DSMART-HPC) to remap the Soil Survey Geographic (SSURGO) database. This 9-billion-grid-cell database is made possible by available high-performance computing resources. POLARIS provides a spatially continuous, internally consistent, quantitative prediction of soil series. It offers potential solutions to the primary weaknesses in SSURGO: 1) unmapped areas are gap-filled using survey data from the surrounding regions, 2) the artificial discontinuities at political boundaries are removed, and 3) the use of high resolution environmental covariate data leads to a spatial disaggregation of the coarse polygons. The geospatial environmental covariates that have the largest role in assembling POLARIS over the contiguous United States (CONUS) are fine-scale (30 m) elevation data and coarse-scale (~ 2 km) estimates of the geographic distribution of uranium, thorium, and potassium. A preliminary validation of POLARIS using the NRCS National Soil Information System (NASIS) database shows variable performance over CONUS. In general, the best performance is obtained at grid cells where DSMART-HPC is most able to reduce the chance of misclassification. The important role of environmental covariates in limiting prediction uncertainty suggests that including additional covariates is pivotal to improving POLARIS' accuracy. This database has the potential to improve the modeling of biogeochemical, water, and energy cycles in environmental models; enhance availability of data for precision agriculture; and assist hydrologic monitoring and forecasting to ensure food and water security.
Wallace, Chris; Xue, Ming-Zhan; Newhouse, Stephen J; Marcano, Ana Carolina B; Onipinla, Abiodun K; Burke, Beverley; Gungadoo, Johannie; Dobson, Richard J; Brown, Morris; Connell, John M; Dominiczak, Anna; Lathrop, G Mark; Webster, John; Farrall, Martin; Mein, Charles; Samani, Nilesh J; Caulfield, Mark J; Clayton, David G; Munroe, Patricia B
2006-08-01
Identification of the genetic influences on human essential hypertension and other complex diseases has proved difficult, partly because of genetic heterogeneity. In many complex-trait resources, additional phenotypic data have been collected, allowing comorbid intermediary phenotypes to be used to characterize more genetically homogeneous subsets. The traditional approach to analyzing covariate-defined subsets has typically depended on researchers' previous expectations for definition of a comorbid subset and leads to smaller data sets, with a concomitant attrition in power. An alternative is to test for dependence between genetic sharing and covariates across the entire data set. This approach offers the advantage of exploiting the full data set and could be widely applied to complex-trait genome scans. However, existing maximum-likelihood methods can be prohibitively computationally expensive, especially since permutation is often required to determine significance. We developed a less computationally intensive score test and applied it to biometric and biochemical covariate data, from 2,044 sibling pairs with severe hypertension, collected by the British Genetics of Hypertension (BRIGHT) study. We found genomewide-significant evidence for linkage with hypertension and several related covariates. The strongest signals were with leaner-body-mass measures on chromosome 20q (maximum LOD = 4.24) and with parameters of renal function on chromosome 5p (maximum LOD = 3.71). After correction for the multiple traits and genetic locations studied, our global genomewide P value was .046. This is the first identity-by-descent regression analysis of hypertension to our knowledge, and it demonstrates the value of this approach for the incorporation of additional phenotypic information in genetic studies of complex traits.
Ranjeet John; Jiquan Chen; Asko Noormets; Xiangming Xiao; Jianye Xu; Nan Lu; Shiping Chen
2013-01-01
We evaluate the modelling of carbon fluxes from eddy covariance (EC) tower observations in different water-limited land-cover/land-use (LCLU) and biome types in semi-arid Inner Mongolia, China. The vegetation photosynthesis model (VPM) and modified VPM (MVPM), driven by the enhanced vegetation index (EVI) and land-surface water index (LSWI), which were derived from the...
Guided transect sampling - a new design combining prior information and field surveying
Anna Ringvall; Goran Stahl; Tomas Lamas
2000-01-01
Guided transect sampling is a two-stage sampling design in which prior information is used to guide the field survey in the second stage. In the first stage, broad strips are randomly selected and divided into grid-cells. For each cell a covariate value is estimated from remote sensing data, for example. The covariate is the basis for subsampling of a transect through...
Adam Wolf; Nick Saliendra; Kanat Akshalov; Douglas A. Johnson; Emilio Laca
2008-01-01
Eddy covariance (EC) and modified Bowen ratio (MBR) systems have been shown to yield subtly different estimates of sensible heat (H), latent heat (LE), and CO2 fluxes (Fc). Our study analyzed the discrepancies between these two systems by first considering the role of the data processing algorithm used to estimate fluxes using EC and later...
The causes of covariation between C and O isotopes in the inorganic carbonate record
NASA Astrophysics Data System (ADS)
Swart, Peter; Oehlert, Amanda
2017-04-01
The δ13C values of carbonate rocks are widely used as proxies for understanding the global carbon cycle. While most workers would prefer to use δ13C values measured in oceanic sediments, during older times the only records that exist are those found in sediments deposited in epeiric seas or on continental margins and carbonate platforms. However, such records are often compromised by near surface diagenesis and therefore care must be taken to exclude altered records. One approach, which has been widely applied, has been to examine the covariation between δ13C and δ18O values, where a positive covariation has been suggested to indicate alteration. In order to test this assumption we present data from a core taken in the Bahamas that has been unequivocally subjected to both freshwater and marine diagenetic processes. Our data suggest that the majority of the zone which has been altered by freshwater shows no correlation between δ13C and δ18O values, with small intervals associated with sub-aerial exposure exhibiting inverse correlations, and only the upper partially altered portion of the core exhibiting positive relationships. The zone below the region of freshwater alteration, previously interpreted as being the mixing-zone, is characterized by a strong covariation between δ13C and δ18O values as a result of the upper portion of this zone having been affected by fresh water diagenesis compared to the lower portion. Within the marine influenced realm a variety of relationships are produced as a result of differences in sediment origin and diagenesis. For example, non-depositional surfaces, where marine diagenetic processes are maximized, are typically expressed by sharp positive correlations between δ13C and δ18O values, while changes related to different sediment sources are expressed as weak positive covariations. While the data set presented here may not be applicable in every situation, the study certainly emphasizes that care must be taken when rules of thumb such as covariation of δ13C and δ18O values suggest diagenesis.
Quantile Regression for Recurrent Gap Time Data
Luo, Xianghua; Huang, Chiung-Yu; Wang, Lan
2014-01-01
Evaluating covariate effects on gap times between successive recurrent events is of interest in many medical and public health studies. While most existing methods for recurrent gap time analysis focus on modeling the hazard function of gap times, a direct interpretation of the covariate effects on the gap times is not available through these methods. In this article, we consider quantile regression that can provide direct assessment of covariate effects on the quantiles of the gap time distribution. Following the spirit of the weighted risk-set method by Luo and Huang (2011, Statistics in Medicine 30, 301–311), we extend the martingale-based estimating equation method considered by Peng and Huang (2008, Journal of the American Statistical Association 103, 637–649) for univariate survival data to analyze recurrent gap time data. The proposed estimation procedure can be easily implemented in existing software for univariate censored quantile regression. Uniform consistency and weak convergence of the proposed estimators are established. Monte Carlo studies demonstrate the effectiveness of the proposed method. An application to data from the Danish Psychiatric Central Register is presented to illustrate the methods developed in this article. PMID:23489055
Separation in Logistic Regression: Causes, Consequences, and Control.
Mansournia, Mohammad Ali; Geroldinger, Angelika; Greenland, Sander; Heinze, Georg
2018-04-01
Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection.
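A self-contained illustration of the phenomenon, with a ridge penalty as a simple stand-in for the penalized-likelihood methods discussed above (data, step sizes, and function names are hypothetical):

```python
import numpy as np

x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = (x > 0).astype(float)                 # outcome perfectly separated by sign(x)
X = np.column_stack([np.ones_like(x), x])

def fit_logistic(X, y, lam=0.0, steps=5000, lr=0.1):
    """Gradient ascent on the (optionally ridge-penalized) log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * (X.T @ (y - p) - lam * beta)
    return beta

print(fit_logistic(X, y, lam=0.0))   # slope keeps growing as steps increase
print(fit_logistic(X, y, lam=0.1))   # penalty yields finite, stable estimates
```

Without the penalty the likelihood has no finite maximizer, so the slope estimate drifts upward indefinitely; the quadratic penalty corresponds to a weakly informative normal prior, in line with the Bayesian interpretation mentioned in the abstract.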
A Closed-Form Error Model of Straight Lines for Improved Data Association and Sensor Fusing
2018-01-01
Linear regression is a basic tool in mobile robotics, since it enables accurate estimation of straight lines from range-bearing scans or in digital images, which is a prerequisite for reliable data association and sensor fusing in the context of feature-based SLAM. This paper discusses, extends and compares existing algorithms for line fitting that remain applicable in the case of strong covariances between the coordinates at each single data point, which must not be neglected if range-bearing sensors are used. In particular, the determination of the covariance matrix, which is required for stochastic modeling, is considered. The main contribution is a new closed-form error model of straight lines for quickly and reliably calculating the covariance matrix from just a few comprehensible and easily obtainable parameters. The model can be applied widely whenever a line is fitted from a number of distinct points, even without a priori knowledge of the specific measurement noise. By means of extensive simulations, the performance and robustness of the new model in comparison to existing approaches is shown. PMID:29673205
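As a baseline for comparison (ordinary least squares with independent noise on y only, not the paper's closed-form model for correlated range-bearing noise), numpy already returns a parameter covariance matrix for a fitted line:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
y = 0.8 * x + 1.5 + rng.normal(scale=0.1, size=x.size)   # noisy "scan" points

coef, cov = np.polyfit(x, y, deg=1, cov=True)            # slope/intercept + cov
slope_sd, intercept_sd = np.sqrt(np.diag(cov))
print(coef, slope_sd, intercept_sd)
```

The paper's point is precisely that this baseline covariance is wrong when the x and y coordinates of each point are themselves strongly correlated, as they are for range-bearing sensors.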
Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model
NASA Astrophysics Data System (ADS)
Wilton, Darren C.
The goal of this dissertation is to develop models capable of predicting long-term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NOx measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables from various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was then incorporated into the land-use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70) than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density and commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate for the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross-validation (R²=0.73) contained the Caline3 prediction, a static covariate for the length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate. This model was tested against annual average NOx concentrations from an independent data set from the EPA's Air Quality System (AQS) and MESA Air fixed site monitors, and performed very well (R²=0.82).
On computations of variance, covariance and correlation for interval data
NASA Astrophysics Data System (ADS)
Kishida, Masako
2017-02-01
In many practical situations, the data on which statistical analysis is to be performed is only known with interval uncertainty. Different combinations of values from the interval data usually lead to different values of variance, covariance, and correlation. Hence, it is desirable to compute the endpoints of possible values of these statistics. This problem is, however, NP-hard in general. This paper shows that the problem of computing the endpoints of possible values of these statistics can be rewritten as the problem of computing skewed structured singular values ν, for which there exist feasible (polynomial-time) algorithms that compute reasonably tight bounds in most practical cases. This allows one to find tight intervals of the aforementioned statistics for interval data.
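For small n, the endpoints can be bracketed by brute force (a sketch of the problem itself, not the ν-based algorithm the paper proposes): the sample variance is a convex quadratic in the data vector, so its maximum over the box of intervals is attained at a vertex, while the minimum is approximated here by clamping every value toward a common level m. All numbers are hypothetical.

```python
import itertools
import numpy as np

intervals = [(1.0, 2.0), (1.5, 3.0), (2.5, 2.8), (0.5, 1.2)]
lo = np.array([a for a, _ in intervals])
hi = np.array([b for _, b in intervals])

# exact upper endpoint: enumerate the 2^n vertices of the interval box
var_max = max(np.var(np.where(mask, hi, lo), ddof=1)
              for mask in itertools.product([False, True], repeat=len(intervals)))

# approximate lower endpoint: pull each x_i as close as possible to a common m
var_min = min(np.var(np.clip(m, lo, hi), ddof=1)
              for m in np.linspace(lo.min(), hi.max(), 2001))

print(var_min, var_max)
```

The exponential vertex enumeration is exactly why the general problem is NP-hard, and why polynomial-time bounding algorithms such as those mentioned above are of practical interest.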
Haider, Adil H; Saleem, Taimur; Leow, Jeffrey J; Villegas, Cassandra V; Kisat, Mehreen; Schneider, Eric B; Haut, Elliott R; Stevens, Kent A; Cornwell, Edward E; MacKenzie, Ellen J; Efron, David T
2012-05-01
Risk-adjusted analyses are critical in evaluating trauma outcomes. The National Trauma Data Bank (NTDB) is a statistically robust registry that allows such analyses; however, analytical techniques are not yet standardized. In this study, we examined peer-reviewed manuscripts published using NTDB data, with particular attention to characteristics strongly associated with trauma outcomes. Our objective was to determine if there are substantial variations in the methodology and quality of risk-adjusted analyses and therefore, whether development of best practices for risk-adjusted analyses is warranted. A database of all studies using NTDB data published through December 2010 was created by searching PubMed and Embase. Studies with multivariate risk-adjusted analyses were examined for their central question, main outcomes measures, analytical techniques, covariates in adjusted analyses, and handling of missing data. Of 286 NTDB publications, 122 performed a multivariable adjusted analysis. These studies focused on clinical outcomes (51 studies), public health policy or injury prevention (30), quality (16), disparities (15), trauma center designation (6), or scoring systems (4). Mortality was the main outcome in 98 of these studies. There were considerable differences in the covariates used for case adjustment. The 3 covariates most frequently controlled for were age (95%), Injury Severity Score (85%), and sex (78%). Up to 43% of studies did not control for the 5 basic covariates necessary to conduct a risk-adjusted analysis of trauma mortality. Less than 10% of studies used clustering to adjust for facility differences or imputation to handle missing data. There is significant variability in how risk-adjusted analyses using data from the NTDB are performed. Best practices are needed to further improve the quality of research from the NTDB. Copyright © 2012 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Papale, D.; Baldocchi, D. D.; Loescher, H. W.; Torn, M. S.
2014-12-01
Small networks of eddy covariance sites measuring exchanges of CO2, water and energy between ecosystems and the atmosphere began to be organized in Europe and the USA more than 15 years ago with the AmeriFlux and EuroFlux initiatives. Each comprised fewer than 20 sites, mainly over undisturbed forest, with little coordination between sites, in particular across the ocean. In the following years the networks grew exponentially at both continental and global levels, reaching more than 500 sites a few years ago and extending eddy covariance measurements to different ecosystem types, climate regions and management/disturbance regimes. At the same time, important steps were taken toward cooperation and harmonization in data processing, data description and data sharing policies, leading to inter-continental and global activities under the FLUXNET framework. Today the networks face a new evolutionary step, moving from purely research activities toward roles that also include monitoring and research-infrastructure functions. AmeriFlux and NEON (National Ecological Observatory Network) in the USA and ICOS (Integrated Carbon Observation System) in Europe are opening a new phase for the eddy covariance networks: with a long-term perspective, an increased level of standardization and a completely open access policy, they will hopefully stimulate even more global synthesis studies and a wider use of the flux measurements by other scientific communities. AmeriFlux, NEON and ICOS are also strongly involved in cross-network harmonization activities in terms of data acquisition, data processing and data format, in order to simplify and encourage the joint use of their measurements. A brief history of the development, challenges and solutions in the organization of the different networks and their common activities will be presented; the focus will then turn to selected scientific results that were possible only thanks to global integration and international collaboration, and finally to future developments and ongoing activities in data harmonization/standardization and sharing.
Real-Valued Covariance Vector Sparsity-Inducing DOA Estimation for Monostatic MIMO Radar
Wang, Xianpeng; Wang, Wei; Li, Xin; Liu, Jing
2015-01-01
In this paper, a real-valued covariance vector sparsity-inducing method for direction of arrival (DOA) estimation is proposed in monostatic multiple-input multiple-output (MIMO) radar. Exploiting the special configuration of monostatic MIMO radar, low-dimensional real-valued received data can be obtained by using the reduced-dimensional transformation and unitary transformation technique. Then, based on the Khatri–Rao product, a real-valued sparse representation framework of the covariance vector is formulated to estimate DOA. Compared to the existing sparsity-inducing DOA estimation methods, the proposed method provides better angle estimation performance and lower computational complexity. Simulation results verify the effectiveness and advantage of the proposed method. PMID:26569241
Preface by the CW2014 Organizers - Including Program, Advisory Board, Participants and Photo
Neudecker, Denise; Kawano, Toshihiko; Talou, Patrick; ...
2015-01-09
This issue of the Nuclear Data Sheets contains the proceedings of the 'International Workshop on Nuclear Data Covariances'. This workshop was the third in a series that started with the 'Workshop on Neutron Cross Section Covariances' (Port Jefferson, USA, 2008) and continued with the 'Second Workshop on Neutron Cross Section Covariances' (Vienna, Austria, 2011). The current workshop returned to the US and took place in the center of the beautiful and historic city of Santa Fe, New Mexico, USA, from April 28 to May 1, 2014. The purpose of this workshop was to bring together scientists in the field of nuclear data evaluation, nuclear reaction theory, reactor physics and associated experiments to review recent developments in nuclear data evaluation methodology, and to assess and discuss open questions regarding uncertainty estimates and associated formatting requirements from the points of view of the experimentalist, the theoretician and the evaluator, as well as from the application side, e.g. in transport calculations. The workshop was open to contributions on a wide variety of nuclear data observables (cross sections, fission yields, energy and angle spectra, etc.), from the resonance range up to the high energy range, and for light to heavy elements.
Revealing transient strain in geodetic data with Gaussian process regression
NASA Astrophysics Data System (ADS)
Hines, T. T.; Hetland, E. A.
2018-03-01
Transient strain derived from global navigation satellite system (GNSS) data can be used to detect and understand geophysical processes such as slow slip events and post-seismic deformation. Here we propose using Gaussian process regression (GPR) as a tool for estimating transient strain from GNSS data. GPR is a non-parametric, Bayesian method for interpolating scattered data. In our approach, we assume a stochastic prior model for transient displacements. The prior describes how much we expect transient displacements to covary spatially and temporally. A posterior estimate of transient strain is obtained by differentiating the posterior transient displacements, which are formed by conditioning the prior with the GNSS data. As a demonstration, we use GPR to detect transient strain resulting from slow slip events in the Pacific Northwest. Maximum likelihood methods are used to constrain a prior model for transient displacements in this region. The temporal covariance of our prior model is described by a compact Wendland covariance function, which significantly reduces the computational burden that can be associated with GPR. Our results reveal the spatial and temporal evolution of strain from slow slip events. We verify that the transient strain estimated with GPR is in fact geophysical signal by comparing it to the seismic tremor that is associated with Pacific Northwest slow slip events.
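A minimal sketch of GP conditioning with a compactly supported Wendland covariance, on hypothetical 1-D data (not the authors' code; the kernel form below is the Wendland C2 function, and all parameter values are assumptions):

```python
import numpy as np

def wendland_c2(r, c=3.0, sigma2=1.0):
    """Wendland C2 kernel: identically zero beyond support radius c."""
    s = np.clip(r / c, 0.0, 1.0)
    return sigma2 * (1.0 - s) ** 4 * (4.0 * s + 1.0)

t_obs = np.linspace(0.0, 10.0, 60)                     # observation times
y_obs = np.sin(t_obs) + 0.05 * np.random.default_rng(2).normal(size=60)
t_new = np.linspace(0.0, 10.0, 200)

K = wendland_c2(np.abs(t_obs[:, None] - t_obs[None, :]))
K_s = wendland_c2(np.abs(t_new[:, None] - t_obs[None, :]))
noise = 0.05 ** 2 * np.eye(60)

post_mean = K_s @ np.linalg.solve(K + noise, y_obs)    # posterior mean
post_var = wendland_c2(0.0) - np.sum(
    K_s * np.linalg.solve(K + noise, K_s.T).T, axis=1) # pointwise variance
```

Compact support makes the Gram matrix sparse beyond the support radius, which is the source of the reduced computational burden the abstract attributes to the Wendland covariance.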
A Gibbs sampler for Bayesian analysis of site-occupancy data
Dorazio, Robert M.; Rodriguez, Daniel Taylor
2012-01-01
1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.
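The probit-regression data-augmentation step (Albert-Chib) at the heart of such a Gibbs sampler can be sketched as follows. This is plain probit regression on hypothetical data, not the full occupancy model with its detection layer; prior settings and all names are assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.0, -0.7])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

B0_inv = np.eye(p) / 100.0            # weak N(0, 100 I) prior on beta
V = np.linalg.inv(X.T @ X + B0_inv)   # posterior covariance of beta given z
beta, draws = np.zeros(p), []
for it in range(2000):
    mu = X @ beta
    # latent z_i ~ N(mu_i, 1), truncated to (0, inf) if y_i = 1, else (-inf, 0]
    a = np.where(y == 1, -mu, -np.inf)
    b = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(a, b, size=n, random_state=rng)
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    if it >= 500:                     # discard burn-in
        draws.append(beta)

print(np.mean(draws, axis=0))         # posterior mean, near beta_true
```

Because both conditionals are available in closed form, no tuning is required, which is the practical advantage over general-purpose MCMC software that the abstract emphasizes.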
Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data
NASA Astrophysics Data System (ADS)
Sampson, Paul D.; Szpiro, Adam A.; Sheppard, Lianne; Lindström, Johan; Kaufman, Joel D.
2011-11-01
Statistical analyses of health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in "land use" regression models. More recently these spatial regression models have accounted for spatial correlation structure in combining monitoring data with land use covariates. We present a flexible spatio-temporal modeling framework and pragmatic, multi-step estimation procedure that accommodates essentially arbitrary patterns of missing data with respect to an ideally complete space by time matrix of observations on a network of monitoring sites. The methodology incorporates a model for smooth temporal trends with coefficients varying in space according to Partial Least Squares regressions on a large set of geographic covariates and nonstationary modeling of spatio-temporal residuals from these regressions. This work was developed to provide spatial point predictions of PM2.5 concentrations for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) using irregular monitoring data derived from the AQS regulatory monitoring network and supplemental short-time scale monitoring campaigns conducted to better predict intra-urban variation in air quality. We demonstrate the interpretation and accuracy of this methodology in modeling data from 2000 through 2006 in six U.S. metropolitan areas and establish a basis for likelihood-based estimation.
Lee, MinJae; Rahbar, Mohammad H; Talebi, Hooshang
2018-01-01
We propose a nonparametric test for interactions for investigating the simultaneous effects of two or more factors in a median regression model with right-censored survival data. Our approach is developed to detect interaction in special situations where the covariates have a finite number of levels with a limited number of observations in each level, and it allows varying levels of variance and censoring at different levels of the covariates. Through simulation studies, we compare the power of detecting an interaction between the study group variable and a covariate using our proposed procedure with that of the Cox proportional hazards (PH) model and the censored quantile regression model. We also assess the impact of censoring rate and type on the standard error of the parameter estimators. Finally, we illustrate the application of our proposed method to real-life data from the Prospective Observational Multicenter Major Trauma Transfusion (PROMMTT) study, testing an interaction effect between type of injury and study site using the median time for a trauma patient to receive three units of red blood cells. The results from the simulation studies indicate that our procedure performs better than both the Cox PH model and the censored quantile regression model in terms of statistical power for detecting the interaction, especially when the number of observations is small. It is also relatively less sensitive to censoring rates or even the presence of conditionally independent censoring, conditional on the levels of covariates.
Choice of time-scale in Cox's model analysis of epidemiologic cohort data: a simulation study.
Thiébaut, Anne C M; Bénichou, Jacques
2004-12-30
Cox's regression model is widely used for assessing associations between potential risk factors and disease occurrence in epidemiologic cohort studies. Although age is often a strong determinant of disease risk, authors have frequently used time-on-study instead of age as the time-scale, as for clinical trials. Unless the baseline hazard is an exponential function of age, this approach can yield different estimates of relative hazards than using age as the time-scale, even when age is adjusted for. We performed a simulation study in order to investigate the existence and magnitude of bias for different degrees of association between age and the covariate of interest. Age to disease onset was generated from exponential, Weibull or piecewise Weibull distributions, and both fixed and time-dependent dichotomous covariates were considered. We observed no bias upon using age as the time-scale. Upon using time-on-study, we verified the absence of bias for exponentially distributed age to disease onset. For non-exponential distributions, we found that bias could occur even when the covariate of interest was independent from age. It could be severe in case of substantial association with age, especially with time-dependent covariates. These findings were illustrated on data from a cohort of 84,329 French women followed prospectively for breast cancer occurrence. In view of our results, we strongly recommend not using time-on-study as the time-scale for analysing epidemiologic cohort data. 2004 John Wiley & Sons, Ltd.
CONSTRUCTING A FLEXIBLE LIKELIHOOD FUNCTION FOR SPECTROSCOPIC INFERENCE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czekala, Ian; Andrews, Sean M.; Mandel, Kaisey S.
2015-10-20
We present a modular, extensible likelihood framework for spectroscopic inference based on synthetic model spectra. The subtraction of an imperfect model from a continuously sampled spectrum introduces covariance between adjacent datapoints (pixels) into the residual spectrum. For the high signal-to-noise data with large spectral range that is commonly employed in stellar astrophysics, that covariant structure can lead to dramatically underestimated parameter uncertainties (and, in some cases, biases). We construct a likelihood function that accounts for the structure of the covariance matrix, utilizing the machinery of Gaussian process kernels. This framework specifically addresses the common problem of mismatches in model spectral line strengths (with respect to data) due to intrinsic model imperfections (e.g., in the atomic/molecular databases or opacity prescriptions) by developing a novel local covariance kernel formalism that identifies and self-consistently downweights pathological spectral line “outliers.” By fitting many spectra in a hierarchical manner, these local kernels provide a mechanism to learn about and build data-driven corrections to synthetic spectral libraries. An open-source software implementation of this approach is available at http://iancze.github.io/Starfish, including a sophisticated probabilistic scheme for spectral interpolation when using model libraries that are sparsely sampled in the stellar parameters. We demonstrate some salient features of the framework by fitting the high-resolution V-band spectrum of WASP-14, an F5 dwarf with a transiting exoplanet, and the moderate-resolution K-band spectrum of Gliese 51, an M5 field dwarf.
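A minimal sketch of the covariance-aware likelihood idea, using a squared-exponential global kernel added to the diagonal noise (an illustration, not the Starfish code; the function name, kernel choice, and parameter values are assumptions; the paper's local outlier kernels would contribute further localized terms to C):

```python
import numpy as np

def gp_loglike(resid, wave, sigma, a2=1e-3, ell=0.5):
    """Gaussian log-likelihood of resid = data - model with a GP-augmented
    pixel covariance: C = diag(sigma^2) + global stationary kernel."""
    dw = wave[:, None] - wave[None, :]
    C = np.diag(sigma ** 2) + a2 * np.exp(-0.5 * (dw / ell) ** 2)
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (resid @ np.linalg.solve(C, resid)
                   + logdet + resid.size * np.log(2.0 * np.pi))

rng = np.random.default_rng(4)
wave = np.linspace(5000.0, 5010.0, 300)      # pixel wavelengths (hypothetical)
resid = rng.normal(scale=0.01, size=300)     # data minus synthetic model
print(gp_loglike(resid, wave, sigma=np.full(300, 0.01)))
```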
NASA Astrophysics Data System (ADS)
Henrot, Alexandra-Jane; François, Louis; Dury, Marie; Hambuckers, Alain; Jacquemin, Ingrid; Minet, Julien; Tychon, Bernard; Heinesch, Bernard; Horemans, Joanna; Deckmyn, Gaby
2015-04-01
Eddy covariance measurements are an essential resource for understanding how ecosystem carbon fluxes react in response to climate change, and for helping to evaluate and validate the performance of land surface and vegetation models at regional and global scales. In the framework of the MASC project (« Modelling and Assessing Surface Change impacts on Belgian and Western European climate »), vegetation dynamics and carbon fluxes of forest and grassland ecosystems simulated by the CARAIB dynamic vegetation model (Dury et al., iForest - Biogeosciences and Forestry, 4:82-99, 2011) are evaluated and validated by comparing the model predictions with eddy covariance data. Here carbon fluxes (e.g. net ecosystem exchange (NEE), gross primary productivity (GPP) and ecosystem respiration (RECO)) and evapotranspiration (ET) simulated with the CARAIB model are compared with the fluxes measured at several eddy covariance flux tower sites in Belgium and Western Europe, chosen from the FLUXNET global network (http://fluxnet.ornl.gov/). CARAIB is forced either with surface atmospheric variables derived from the global CRU climatology, or with in situ meteorological data. Several tree (e.g. Pinus sylvestris, Fagus sylvatica, Picea abies) and grass species (e.g. Poaceae, Asteraceae) are simulated, depending on the species encountered at the studied sites. The aim of our work is to assess the model's ability to reproduce the daily, seasonal and interannual variability of carbon fluxes and the carbon dynamics of forest and grassland ecosystems in Belgium and Western Europe.
Reid, J M; Arcese, P; Losdat, S
2014-01-01
The evolutionary trajectories of reproductive systems, including both male and female multiple mating and hence polygyny and polyandry, are expected to depend on the additive genetic variances and covariances in and among components of male reproductive success achieved through different reproductive tactics. However, genetic covariances among key components of male reproductive success have not been estimated in wild populations. We used comprehensive paternity data from socially monogamous but genetically polygynandrous song sparrows (Melospiza melodia) to estimate additive genetic variance and covariance in the total number of offspring a male sired per year outside his social pairings (i.e. his total extra-pair reproductive success achieved through multiple mating) and his liability to sire offspring produced by his socially paired female (i.e. his success in defending within-pair paternity). Both components of male fitness showed nonzero additive genetic variance, and the estimated genetic covariance was positive, implying that males with high additive genetic value for extra-pair reproduction also have high additive genetic propensity to sire their socially paired female's offspring. There was consequently no evidence of a genetic or phenotypic trade-off between male within-pair paternity success and extra-pair reproductive success. Such positive genetic covariance might be expected to facilitate ongoing evolution of polygyny and could also shape the ongoing evolution of polyandry through indirect selection. PMID:25186454
Fast Component Pursuit for Large-Scale Inverse Covariance Estimation.
Han, Lei; Zhang, Yu; Zhang, Tong
2016-08-01
The maximum likelihood estimation (MLE) for the Gaussian graphical model, which is also known as the inverse covariance estimation problem, has gained increasing interest recently. Most existing works assume that inverse covariance estimators contain sparse structure and then construct models with the ℓ1 regularization. In this paper, different from existing works, we study the inverse covariance estimation problem from another perspective by efficiently modeling the low-rank structure in the inverse covariance, which is assumed to be a combination of a low-rank part and a diagonal matrix. One motivation for this assumption is that low-rank structure is common in many applications, including climate and financial analysis, and another is that such an assumption can reduce the computational complexity of computing the inverse. Specifically, we propose an efficient COmponent Pursuit (COP) method to obtain the low-rank part, where each component can be sparse. For optimization, the COP method greedily learns a rank-one component in each iteration by maximizing the log-likelihood. Moreover, the COP algorithm enjoys several appealing properties, including the existence of an efficient solution in each iteration and a theoretical guarantee on the convergence of this greedy approach. Experiments on large-scale synthetic and real-world datasets with up to thousands of millions of variables show that the COP method is faster than the state-of-the-art techniques for the inverse covariance estimation problem while achieving comparable log-likelihood on test data.
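The computational payoff of the low-rank-plus-diagonal assumption can be illustrated with the Woodbury identity (background linear algebra, not the COP algorithm itself; all shapes and values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
p, k = 1000, 5
d = rng.uniform(1.0, 2.0, p)                 # diagonal part of Omega
V = rng.normal(size=(p, k)) / np.sqrt(p)     # low-rank part, one column per
                                             # (greedily learned) component

# Woodbury: (D + V V^T)^{-1} = D^{-1} - D^{-1} V (I + V^T D^{-1} V)^{-1} V^T D^{-1}
Dinv_V = V / d[:, None]
small = np.eye(k) + V.T @ Dinv_V             # only a k x k dense system
Sigma = np.diag(1.0 / d) - Dinv_V @ np.linalg.solve(small, Dinv_V.T)

# sanity check against the direct p x p inverse
assert np.allclose(Sigma, np.linalg.inv(np.diag(d) + V @ V.T), atol=1e-8)
```

Inverting the k x k system instead of the full p x p matrix is what makes the covariance cheap to recover when k << p.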
Chang, Jinyuan; Zhou, Wen; Zhou, Wen-Xin; Wang, Lan
2017-03-01
Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN. © 2016, The International Biometric Society.
Convex Regression with Interpretable Sharp Partitions
Petersen, Ashley; Simon, Noah; Witten, Daniela
2016-01-01
We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set. PMID:27635120
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariables. The second is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology; although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
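For concreteness, the two model families can be written in their standard textbook forms (notation mine, not quoted from the chapter):

```latex
% AFT model: the log-lifetime is linear in the covariables x
\log T = \beta_0 + x^{\top}\beta + \sigma\,\varepsilon, \qquad \varepsilon \sim F_0
% Semiparametric (Cox-type) model: covariables act multiplicatively on the hazard
h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\beta\right)
```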
Modeling Hubble Space Telescope flight data by Q-Markov cover identification
NASA Technical Reports Server (NTRS)
Liu, K.; Skelton, R. E.; Sharkey, J. P.
1992-01-01
A state space model for the Hubble Space Telescope under the influence of unknown disturbances in orbit is presented. This model was obtained from flight data by applying the Q-Markov covariance equivalent realization identification algorithm. This state space model guarantees the match of the first Q-Markov parameters and covariance parameters of the Hubble system. The flight data were partitioned into high- and low-frequency components for more efficient Q-Markov cover modeling, to reduce some computational difficulties of the Q-Markov cover algorithm. This identification revealed more than 20 lightly damped modes within the bandwidth of the attitude control system. Comparisons with the analytical (TREETOPS) model are also included.
Summary documentation of LASL nuclear data evaluations for ENDF/B-V
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, P.G.
1979-01-01
Summaries are presented of nuclear data evaluations performed at Los Alamos Scientific Laboratory (LASL) that will comprise part of Version V of the Evaluated Nuclear Data File, ENDF/B. A total of 18 general purpose evaluations of neutron-induced data are summarized, together with 6 summaries directed specifically at covariance data evaluations. The general purpose evaluation summaries cover the following isotopes: ¹⁻³H, ³,⁴He, ⁶,⁷Li, ¹⁰B, ¹⁴,¹⁵N, ¹⁶O, ²⁷Al, ¹⁸²,¹⁸³,¹⁸⁴,¹⁸⁶W, ²³³U, and ²⁴²Pu. The covariance data summaries are given for ¹H, ⁶Li, ¹⁰B, ¹⁴N, ¹⁶O, and ²⁷Al. 28 figures.
NASA Technical Reports Server (NTRS)
Kalayeh, H. M.; Landgrebe, D. A.
1983-01-01
A criterion which measures the quality of the estimate of the covariance matrix of a multivariate normal distribution is developed. Based on this criterion, the necessary number of training samples is predicted. Experimental results which are used as a guide for determining the number of training samples are included. Previously announced in STAR as N82-28109
ERIC Educational Resources Information Center
Dolan, Conor V.; Colom, Roberto; Abad, Francisco J.; Wicherts, Jelte M.; Hessen, David J.; van de Sluis, Sophie
2006-01-01
We investigated sex effects and the effects of educational attainment (EA) on the covariance structure of the WAIS-III in a subsample of the Spanish standardization data. We fitted both first order common factor models and second order common factor models. The latter include general intelligence ("g") as a second order common factor.…
Statistical classification techniques for engineering and climatic data samples
NASA Technical Reports Server (NTRS)
Temple, E. C.; Shipman, J. R.
1981-01-01
Fisher's sample linear discriminant function is modified through an appropriate alteration of the common sample variance-covariance matrix. The alteration consists of adding nonnegative values to the eigenvalues of the sample variance-covariance matrix. The desired result of this modification is to increase the number of correct classifications by the new linear discriminant function relative to Fisher's function. This study is limited to the two-group discriminant problem.
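A sketch of the modification on synthetic data (the report's rule for choosing the nonnegative shifts is not reproduced here; a single constant shift is assumed purely for illustration, and all names are hypothetical):

```python
import numpy as np

def modified_lda_direction(X0, X1, shift=0.1):
    """Fisher direction w = S^{-1}(m1 - m0), with the eigenvalues of the
    pooled covariance S inflated by a nonnegative shift before inversion."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    vals, vecs = np.linalg.eigh(S)
    S_mod = vecs @ np.diag(vals + shift) @ vecs.T   # add nonnegative values
    return np.linalg.solve(S_mod, m1 - m0)

rng = np.random.default_rng(6)
X0 = rng.normal(0.0, 1.0, size=(30, 5))
X1 = rng.normal(0.5, 1.0, size=(30, 5))
w = modified_lda_direction(X0, X1)
# classify x to group 1 when w @ (x - midpoint) > 0
midpoint = 0.5 * (X0.mean(axis=0) + X1.mean(axis=0))
print(np.mean((X1 - midpoint) @ w > 0))   # fraction of group 1 correct
```

Inflating small eigenvalues stabilizes the inverse when the sample covariance is nearly singular, the situation in which Fisher's unmodified rule misclassifies most often.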
Depressive symptomatology in middle-aged and older married couples: a dyadic analysis.
Townsend, A L; Miller, B; Guo, S
2001-11-01
Depressive symptomatology has been frequently conceptualized as an individual matter, but social contextual models argue that symptom levels are likely to covary in close relationships. The present study investigated correlation between spouses' depressive symptomatology in middle-aged and older married couples, the influence of gender and race/ethnicity in predicting variability in symptom level, and the importance of individual-level covariates (education, health, and age) and couple-level covariates (household income and net worth). Results were based on secondary analysis of Wave 1 interviews with White, Black, and Mexican American married couples (N = 5,423) from the Health and Retirement Study (HRS) and the Study of Asset and Health Dynamics Among the Oldest Old (AHEAD). Dyadic data from husbands and wives were analyzed with multilevel modeling. Husbands' and wives' depressive symptoms were moderately correlated, gender and race/ethnicity (and their interaction) predicted depressive symptoms, and both individual-level and couple-level characteristics were significant covariates. Similarities as well as differences are noted between the HRS and AHEAD results. Results highlight the importance of dyadic data and multilevel models for understanding depressive symptomatology in married couples. The influence of race/ethnicity merits greater attention in future research. Differences in findings between HRS and AHEAD suggest life-course, cohort, or methodological influences.
Hastings, Kelly K; Jemison, Lauri A; Pendleton, Grey W
2018-01-01
Population dynamics of long-lived vertebrates depend critically on adult survival, yet factors affecting survival and covariation between survival and other vital rates in adults remain poorly examined for many taxonomic groups of long-lived mammals (e.g. actuarial senescence has been examined for only 9 of 34 extant pinniped species using longitudinal data). We used mark-recapture models and data from 2795 Steller sea lion (Eumetopias jubatus) pups individually marked at four of five rookeries in southeastern Alaska (SEAK) and resighted for 21 years to examine senescence, annual variability and covariation among life-history traits in this long-lived, sexually dimorphic pinniped. Sexes differed in age of onset (approx. 16-17 and approx. 8-9 years for females and males, respectively), but not rate (-0.047 and -0.046/year of age for females and males) of senescence. Survival of adult males from northern SEAK had greatest annual variability (approx. ±0.30 among years), whereas survival of adult females ranged approximately ±0.10 annually. Positive covariation between male survival and reproductive success was observed. Survival of territorial males was 0.20 higher than that of non-territorial males, resulting in the majority of males alive at oldest ages being territorial.
Accounting for baseline differences and measurement error in the analysis of change over time.
Braun, Julia; Held, Leonhard; Ledergerber, Bruno
2014-01-15
If change over time is compared in several groups, it is important to take into account baseline values so that the comparison is carried out under the same preconditions. As the observed baseline measurements are distorted by measurement error, it may not be sufficient to include them as a covariate. By fitting a longitudinal mixed-effects model to all data including the baseline observations and subsequently calculating the expected change conditional on the underlying baseline value, a solution to this problem has been provided recently so that groups with the same baseline characteristics can be compared. In this article, we present an extended approach where a broader set of models can be used. Specifically, it is possible to include any desired set of interactions between the time variable and the other covariates, and also, time-dependent covariates can be included. Additionally, we extend the method to adjust for baseline measurement error of other time-varying covariates. We apply the methodology to data from the Swiss HIV Cohort Study to address the question of whether a joint infection with HIV-1 and hepatitis C virus leads to a slower increase of CD4 lymphocyte counts over time after the start of antiretroviral therapy. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Pauwels, V. R. N.; DeLannoy, G. J. M.; Hendricks Franssen, H.-J.; Vereecken, H.
2013-01-01
In this paper, we present a two-stage hybrid Kalman filter to estimate both observation and forecast bias in hydrologic models, in addition to state variables. The biases are estimated using the discrete Kalman filter, and the state variables using the ensemble Kalman filter. A key issue in this multi-component assimilation scheme is the exact partitioning of the difference between observation and forecasts into state, forecast bias and observation bias updates. Here, the error covariances of the forecast bias and the unbiased states are calculated as constant fractions of the biased state error covariance, and the observation bias error covariance is a function of the observation prediction error covariance. In a series of synthetic experiments, focusing on the assimilation of discharge into a rainfall-runoff model, it is shown that both static and dynamic observation and forecast biases can be successfully estimated. The results indicate a strong improvement in the estimation of the state variables and resulting discharge as opposed to the use of a bias-unaware ensemble Kalman filter. Furthermore, minimal code modification in existing data assimilation software is needed to implement the method. The results suggest that a better performance of data assimilation methods should be possible if both forecast and observation biases are taken into account.
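The partitioning idea can be illustrated with a deliberately simplified sketch. The toy below tracks a scalar AR(1) "storage" state and a forecast bias only, via plain state augmentation; the paper's scheme instead splits the innovation among state, forecast-bias, and observation-bias updates, with the bias covariances tied to the state covariance as described above. All model coefficients and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
a, q, r, true_bias = 0.9, 0.02, 0.1, 0.4      # toy AR(1) model with forecast bias

F = np.array([[a, 1.0], [0.0, 1.0]])          # augmented state: (x, forecast bias)
H = np.array([[1.0, 0.0]])                    # only x is observed
Q = np.diag([q, 1e-4])                        # slow random walk for the bias
s, P = np.zeros(2), np.eye(2)

x_true = 0.0
for t in range(300):
    x_true = a * x_true + true_bias + rng.normal(scale=np.sqrt(q))
    y = x_true + rng.normal(scale=np.sqrt(r))
    s, P = F @ s, F @ P @ F.T + Q             # forecast step
    innov = y - H @ s                         # innovation to be partitioned
    S = H @ P @ H.T + r
    K = P @ H.T / S                           # gains for state AND bias
    s = s + (K * innov).ravel()
    P = (np.eye(2) - K @ H) @ P

print(s)   # s[1] converges toward true_bias = 0.4
```

Estimating an observation bias as well requires the extra structure the abstract describes, since a constant forecast bias and a constant observation bias are not separately identifiable from the innovations alone in a simple augmented filter like this one.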
NASA Astrophysics Data System (ADS)
Bentel, Katrin; Meyer, Ulrich; Arnold, Daniel; Jean, Yoomin; Jäggi, Adrian
2017-04-01
The Astronomical Institute at the University of Bern (AIUB) derives static and time-variable gravity fields by means of the Celestial Mechanics Approach (CMA) from GRACE (level 1B) data. This approach makes use of the close link between orbit and gravity field determination. GPS-derived kinematic GRACE orbit positions, inter-satellite K-band observations, which are the core observations of GRACE, and accelerometer data are combined to rigorously estimate orbit and spherical harmonic gravity field coefficients in one adjustment step. Pseudo-stochastic orbit parameters are set up to absorb unmodeled noise. The K-band range measurements in along-track direction lead to a much higher correlation of the observations in this direction compared to the other directions and thus, to north-south stripes in the unconstrained gravity field solutions, so-called correlated errors. By using a full covariance matrix for the K-band observations the correlation can be taken into account. One possibility is to derive correlation information from post-processing K-band residuals. This is then used in a second iteration step to derive an improved gravity field solution. We study the effects of pre-defined covariance matrices and residual-derived covariance matrices on the final gravity field product with the CMA.
Kinyoki, Damaris K; Berkley, James A; Moloney, Grainne M; Odundo, Elijah O; Kandala, Ngianga-Bakwin; Noor, Abdisalan M
2016-07-28
Stunting among children under five years old is associated with long-term effects on cognitive development, school achievement, economic productivity in adulthood and maternal reproductive outcomes. Accurate estimation of stunting and tools to forecast risk are key to planning interventions. We estimated the prevalence and distribution of stunting among children under five years in Somalia from 2007 to 2010 and explored the role of environmental covariates in its forecasting. Data from household nutritional surveys in Somalia from 2007 to 2010 with a total of 1,066 clusters covering 73,778 children were included. We developed a Bayesian hierarchical space-time model to forecast stunting by using the relationship between observed stunting and environmental covariates in the preceding years. We then applied the model coefficients to environmental covariates in subsequent years. To determine the accuracy of the forecasting, we compared this model with a model that used data from all the years with the corresponding environmental covariates. Rainfall (OR = 0.994, 95 % Credible interval (CrI): 0.993, 0.995) and vegetation cover (OR = 0.719, 95 % CrI: 0.603, 0.858) were significant in forecasting stunting. The difference in estimates of stunting using the two approaches was less than 3 % in all the regions for all forecast years. Stunting in Somalia is spatially and temporally heterogeneous. Rainfall and vegetation are major drivers of these variations. The use of environmental covariates for forecasting of stunting is a potentially useful and affordable tool for planning interventions to reduce the high burden of malnutrition in Somalia.
Solid-state NMR covariance of homonuclear correlation spectra.
Hu, Bingwen; Amoureux, Jean-Paul; Trebosc, Julien; Deschamps, Michael; Tricot, Gregory
2008-04-07
Direct covariance NMR spectroscopy, which does not involve a Fourier transformation along the indirect dimension, is demonstrated as a means of obtaining homonuclear correlation two-dimensional (2D) spectra in the solid state. In contrast to the usual 2D Fourier transform (2D-FT) NMR, in a 2D covariance (2D-Cov) spectrum the spectral resolution in the indirect dimension is determined by the resolution along the detection dimension, thereby largely reducing the time-consuming indirect sampling requirement. The covariance method does not need any separate phase correction or apodization along the indirect dimension because it uses those applied in the detection dimension. We compare in detail the specifications obtained with 2D-FT and 2D-Cov for narrow and broad resonances. The efficiency of the covariance data treatment is demonstrated on organic and inorganic samples, both well crystallized and amorphous, for spin-1/2 nuclei (13C, 29Si, and 31P) with through-space or through-bond homonuclear 2D correlation spectra. In all cases, the experimental time has been reduced by at least a factor of 10, without any loss of resolution or signal-to-noise ratio, with respect to what is necessary with 2D-FT NMR. Using this method, we have been able to study the silicate network of glasses by 2D NMR within reasonable experimental time despite the very long relaxation time of the 29Si nucleus. The main limitation of the 2D-Cov data treatment is the introduction of autocorrelated peaks onto the diagonal, which do not represent any actual connectivity.
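A generic sketch of the direct covariance step (not the authors' implementation; the data matrix here is a random stand-in for planes already Fourier transformed along the detection dimension, and all array shapes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)
n_t1, n_w2 = 64, 512                       # few indirect rows, many detected points
data = rng.normal(size=(n_t1, n_w2))       # stand-in for the (t1 x omega2) planes

S = data - data.mean(axis=0)               # center along the indirect dimension
C = S.T @ S                                # (omega2 x omega2) covariance matrix
vals, vecs = np.linalg.eigh(C)             # matrix square root via eigenvalues
spectrum2d = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
```

Because the square root acts on the detected dimension, both axes of the resulting symmetric 2D spectrum inherit the detection resolution, which is what permits the roughly tenfold reduction in indirect sampling noted above.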
NASA Astrophysics Data System (ADS)
Schünemann, Adriano Luis; Inácio Fernandes Filho, Elpídio; Rocha Francelino, Marcio; Rodrigues Santos, Gérson; Thomazini, Andre; Batista Pereira, Antônio; Gonçalves Reynaud Schaefer, Carlos Ernesto
2017-04-01
Values of environmental variables at non-sampled sites can be estimated from a minimum data set through interpolation techniques. Kriging and the Random Forest classifier are examples of predictors with this aim. The objective of this work was to compare methods of soil attribute spatialization in a recently deglaciated environment with complex landforms. Predictions of the selected soil attributes (potassium, calcium and magnesium) in ice-free areas were tested using morphometric covariates, and geostatistical models without these covariates. For this, 106 soil samples were collected at 0-10 cm depth in Keller Peninsula, King George Island, Maritime Antarctica. Soil chemical analysis was performed by the gravimetric method, determining values of potassium, calcium and magnesium for each sampled point. Digital terrain models (DTMs) were obtained by using a Terrestrial Laser Scanner. DTMs were generated from a cloud of points with spatial resolutions of 1, 5, 10, 20 and 30 m. Hence, 40 morphometric covariates were generated. Simple Kriging was performed using the R software. The same data set, coupled with morphometric covariates, was used to predict values of the studied attributes at non-sampled sites through the Random Forest interpolator. Little difference was observed between the predictions generated by the Simple Kriging and Random Forest interpolators. Also, DTMs with better spatial resolution did not improve the quality of soil attribute prediction. Results revealed that Simple Kriging can be used as an interpolator when morphometric covariates are not available, with little loss of quality. Further work on prediction techniques for soil chemical attributes is needed, especially in periglacial areas with complex landforms.
Austin, Peter C
2018-01-01
Propensity score methods are increasingly being used to estimate the effects of treatments and exposures when using observational data. The propensity score was initially developed for use with binary exposures (e.g., active treatment vs. control). The generalized propensity score is an extension of the propensity score for use with quantitative exposures (e.g., dose or quantity of medication, income, years of education). A crucial component of any propensity score analysis is that of balance assessment. This entails assessing the degree to which conditioning on the propensity score (via matching, weighting, or stratification) has balanced measured baseline covariates between exposure groups. Methods for balance assessment have been well described and are frequently implemented when using the propensity score with binary exposures. However, there is a paucity of information on how to assess baseline covariate balance when using the generalized propensity score. We describe how methods based on the standardized difference can be adapted for use with quantitative exposures when using the generalized propensity score. We also describe a method based on assessing the correlation between the quantitative exposure and each covariate in the sample when weighted using generalized propensity score-based weights. We conducted a series of Monte Carlo simulations to evaluate the performance of these methods. We also compared two different methods of estimating the generalized propensity score: ordinary least squares regression and the covariate balancing propensity score method. We illustrate the application of these methods using data on patients hospitalized with a heart attack, with the quantitative exposure being creatinine level.
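A sketch of the weighted-correlation balance check under simplifying assumptions (a normal-density GPS estimated by ordinary least squares; gps_weighted_correlations and all data below are hypothetical, not the author's code):

```python
import numpy as np

def gps_weighted_correlations(Z, X):
    """Z: quantitative exposure (n,); X: covariates (n, p).
    Returns the GPS-weighted correlation of Z with each covariate."""
    n = len(Z)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, Z, rcond=None)      # OLS model for exposure
    resid = Z - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])
    dens_cond = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    mz = Z - Z.mean()
    dens_marg = np.exp(-0.5 * mz**2 / Z.var()) / np.sqrt(2 * np.pi * Z.var())
    w = dens_marg / dens_cond                         # stabilized weights

    def wcorr(u, v):
        mu_u, mu_v = np.average(u, weights=w), np.average(v, weights=w)
        cov = np.average((u - mu_u) * (v - mu_v), weights=w)
        return cov / np.sqrt(np.average((u - mu_u)**2, weights=w)
                             * np.average((v - mu_v)**2, weights=w))
    return np.array([wcorr(Z, X[:, j]) for j in range(X.shape[1])])

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 3))
Z = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=500)
print(gps_weighted_correlations(Z, X))   # near zero if weighting balances X
```

Weighted correlations near zero indicate that, after weighting, the exposure is no longer associated with the measured covariates, the quantitative analogue of a small standardized difference in the binary-exposure setting.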
Multivariate analysis of longitudinal rates of change.
Bryan, Matthew; Heagerty, Patrick J
2016-12-10
Longitudinal data allow direct comparison of the change in patient outcomes associated with treatment or exposure. Frequently, several longitudinal measures are collected that either reflect a common underlying health status, or characterize processes that are influenced in a similar way by covariates such as exposure or demographic characteristics. Statistical methods that can combine multivariate response variables into common measures of covariate effects have been proposed in the literature. Current methods for characterizing the relationship between covariates and the rate of change in multivariate outcomes are limited to select models. For example, 'accelerated time' methods have been developed which assume that covariates rescale time in longitudinal models for disease progression. In this manuscript, we detail an alternative multivariate model formulation that directly structures longitudinal rates of change and that permits a common covariate effect across multiple outcomes. We detail maximum likelihood estimation for a multivariate longitudinal mixed model. We show via asymptotic calculations the potential gain in power that may be achieved with a common analysis of multiple outcomes. We apply the proposed methods to the analysis of a trivariate outcome for infant growth and compare rates of change for HIV infected and uninfected infants. Copyright © 2016 John Wiley & Sons, Ltd.
Batson, Sarah; Score, Robert; Sutton, Alex J
2017-06-01
The aim of the study was to develop the three-dimensional (3D) evidence network plot system: a novel web-based interactive 3D tool to facilitate the visualization and exploration of covariate distributions and imbalances across evidence networks for network meta-analysis (NMA). We developed the 3D evidence network plot system within an AngularJS environment using a third-party JavaScript library (Three.js) to create the 3D element of the application. Data used to enable the creation of the 3D element for a particular topic are inputted via a Microsoft Excel template spreadsheet that has been specifically formatted to hold these data. We display and discuss the findings of applying the tool to two NMA examples considering multiple covariates. These two examples have been previously identified as having potentially important covariate effects and allow us to document the various features of the tool while illustrating how it can be used. The 3D evidence network plot system provides an immediate, intuitive, and accessible way to assess the similarity and differences between the values of covariates for individual studies within and between each treatment contrast in an evidence network. In this way, differences between the studies, which may invalidate the usual assumptions of an NMA, can be identified for further scrutiny. Hence, the tool facilitates NMA feasibility/validity assessments and aids in the interpretation of NMA results. The 3D evidence network plot system is the first tool designed specifically to visualize covariate distributions and imbalances across evidence networks in 3D. This will be of primary interest to systematic review and meta-analysis researchers and, more generally, those assessing the validity and robustness of an NMA to inform reimbursement decisions. Copyright © 2017 Elsevier Inc. All rights reserved.
Directly reconstructing principal components of heterogeneous particles from cryo-EM images.
Tagare, Hemant D; Kucukelbir, Alp; Sigworth, Fred J; Wang, Hongwei; Rao, Murali
2015-08-01
Structural heterogeneity of particles can be investigated by their three-dimensional principal components. This paper addresses the question of whether, and with what algorithm, the three-dimensional principal components can be directly recovered from cryo-EM images. The first part of the paper extends the Fourier slice theorem to covariance functions showing that the three-dimensional covariance, and hence the principal components, of a heterogeneous particle can indeed be recovered from two-dimensional cryo-EM images. The second part of the paper proposes a practical algorithm for reconstructing the principal components directly from cryo-EM images without the intermediate step of calculating covariances. This algorithm is based on maximizing the posterior likelihood using the Expectation-Maximization algorithm. The last part of the paper applies this algorithm to simulated data and to two real cryo-EM data sets: a data set of the 70S ribosome with and without Elongation Factor-G (EF-G), and a data set of the influenza virus RNA-dependent RNA polymerase (RdRP). The first principal component of the 70S ribosome data set reveals the expected conformational changes of the ribosome as the EF-G binds and unbinds. The first principal component of the RdRP data set reveals a conformational change in the two dimers of the RdRP. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Heavens, A. F.; Seikel, M.; Nord, B. D.; Aich, M.; Bouffanais, Y.; Bassett, B. A.; Hobson, M. P.
2014-12-01
The Fisher Information Matrix formalism (Fisher 1935) is extended to cases where the data are divided into two parts (X, Y), where the expectation value of Y depends on X according to some theoretical model, and X and Y both have errors with arbitrary covariance. In the simplest case, (X, Y) represent data pairs of abscissa and ordinate, in which case the analysis deals with the case of data pairs with errors in both coordinates, but X can be any measured quantities on which Y depends. The analysis applies for arbitrary covariance, provided all errors are Gaussian, and provided the errors in X are small, both in comparison with the scale over which the expected signal Y changes, and with the width of the prior distribution. This generalizes the Fisher Matrix approach, which normally only considers errors in the `ordinate' Y. In this work, we include errors in X by marginalizing over latent variables, effectively employing a Bayesian hierarchical model, and deriving the Fisher Matrix for this more general case. The methods here also extend to likelihood surfaces which are not Gaussian in the parameter space, and so techniques such as DALI (Derivative Approximation for Likelihoods) can be generalized straightforwardly to include arbitrary Gaussian data error covariances. For simple mock data and theoretical models, we compare to Markov Chain Monte Carlo experiments, illustrating the method with cosmological supernova data. We also include the new method in the FISHER4CAST software.
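The small-x-error limit described above has a compact computational form: fold the abscissa errors into an effective covariance C_eff = C_y + y'(x) C_x y'(x)^T and apply the usual Fisher formula. A toy sketch for a straight-line model follows (illustrative numbers, not the paper's examples).

```python
# Sketch: Fisher matrix for y = a + b*x with Gaussian errors in both
# coordinates, folding x-errors into an effective covariance.
import numpy as np

x = np.linspace(0.0, 10.0, 20)
a_true, b_true = 1.0, 2.0
C_y = np.eye(x.size) * 0.5 ** 2          # ordinate error covariance
C_x = np.eye(x.size) * 0.2 ** 2          # abscissa error covariance

dy_dx = np.full(x.size, b_true)          # model slope, y'(x) = b
C_eff = C_y + np.diag(dy_dx) @ C_x @ np.diag(dy_dx)

# Derivatives of the model with respect to the parameters (a, b).
J = np.column_stack([np.ones_like(x), x])
F = J.T @ np.linalg.inv(C_eff) @ J       # Fisher matrix
print("parameter covariance:\n", np.linalg.inv(F))
```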
ter Heine, Rob; van Maarseveen, Erik M; van der Westerlaken, Monique M L; Braun, Kees P J; Koudijs, Suzanne M; Berg, Maarten J Ten; Malingré, Mirte M
2014-06-01
Dosing of phenytoin is difficult in children because of its variable pharmacokinetics and protein binding. Possible covariates for this protein binding have mostly been univariately investigated in small, and often adult, populations. We conducted a study to identify and quantify these covariates in children. We extracted data on serum phenytoin concentrations, albumin, triglycerides, urea, total bilirubin and creatinine concentrations and data on coadministration of valproic acid or carbamazepine in 186 children. Using nonlinear mixed-effects modeling, the effects of covariates on the unbound phenytoin fraction were investigated. Serum albumin, serum urea concentrations, and concomitant valproic acid use significantly influenced the unbound phenytoin fraction. For clinical practice, we recommend that unbound phenytoin concentrations are measured routinely. However, if this is impossible, we suggest using our model to calculate the unbound concentration. In selected children, close treatment monitoring and dose reductions should be considered to prevent toxicity. © The Author(s) 2013.
Blocking for Sequential Political Experiments
Moore, Sally A.
2013-01-01
In typical political experiments, researchers randomize a set of households, precincts, or individuals to treatments all at once, and characteristics of all units are known at the time of randomization. However, in many other experiments, subjects “trickle in” to be randomized to treatment conditions, usually via complete randomization. To take advantage of the rich background data that researchers often have (but underutilize) in these experiments, we develop methods that use continuous covariates to assign treatments sequentially. We build on biased coin and minimization procedures for discrete covariates and demonstrate that our methods outperform complete randomization, producing better covariate balance in simulated data. We then describe how we selected and deployed a sequential blocking method in a clinical trial and demonstrate the advantages of our having done so. Further, we show how that method would have performed in two larger sequential political trials. Finally, we compare causal effect estimates from differences in means, augmented inverse propensity weighted estimators, and randomization test inversion. PMID:24143061
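As a concrete illustration of sequential assignment with continuous covariates, here is a generic minimization-style rule with a biased coin; it is a sketch of the general idea, not the authors' exact procedure, and all constants are invented.

```python
# Sketch: sequential covariate-adaptive assignment. Each arriving unit goes
# (with high probability, biased-coin style) to the arm that most reduces
# imbalance in covariate means.
import numpy as np

rng = np.random.default_rng(2)
p_best = 0.8                       # biased-coin probability for the better arm

def imbalance(units_t, units_c):
    if not units_t or not units_c:
        return np.inf
    return np.abs(np.mean(units_t, axis=0) - np.mean(units_c, axis=0)).sum()

treat, ctrl = [], []
for _ in range(200):               # units "trickle in" one at a time
    x = rng.normal(size=3)         # covariates observed at arrival
    d_t = imbalance(treat + [x], ctrl)   # imbalance if assigned to treatment
    d_c = imbalance(treat, ctrl + [x])   # imbalance if assigned to control
    better_is_t = d_t < d_c
    go_t = rng.random() < (p_best if better_is_t else 1 - p_best)
    (treat if go_t else ctrl).append(x)

print("arm sizes:", len(treat), len(ctrl))
print("mean difference per covariate:",
      np.mean(treat, axis=0) - np.mean(ctrl, axis=0))
```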
NASA Astrophysics Data System (ADS)
Widyaningsih, Yekti; Saefuddin, Asep; Notodiputro, Khairil A.; Wigena, Aji H.
2012-05-01
The objective of this research is to build a nested generalized linear mixed model using an ordinal response variable with some covariates. The paper addresses three main tasks: the parameter estimation procedure, a simulation study, and application of the model to real data. For parameter estimation, the concepts of thresholds, nested random effects, and the computational algorithm are described. Simulated data are generated under three conditions to assess the effect of different parameter values for the random-effect distributions. The last task is the application of the model to data on poverty in 9 districts of Java Island. The districts are Kuningan, Karawang, and Majalengka, chosen randomly in West Java; Temanggung, Boyolali, and Cilacap from Central Java; and Blitar, Ngawi, and Jember from East Java. The covariates in this model are province, number of malnutrition cases, number of farmer families, and number of health personnel. In this modeling, all covariates are grouped on an ordinal scale. The unit of observation in this research is the sub-district (kecamatan), nested in districts (kabupaten), which are in turn nested in provinces. Simulation results are assessed using ARB (absolute relative bias) and RRMSE (relative root mean square error). They show that the province parameters have the highest bias but the most stable RRMSE across all conditions. The simulation design needs to be improved by adding further conditions, such as higher correlation between covariates. Furthermore, in the application of the model to the data, only the number of farmer families and the number of health personnel contribute significantly to the level of poverty in the Central Java and East Java provinces, and only district 2 (Karawang) of province 1 (West Java) has a random effect different from the others. The source of the data is PODES (Potensi Desa) 2008 from BPS (Badan Pusat Statistik).
NASA Astrophysics Data System (ADS)
Ali, Mohamed H.; Rakib, Fazle; Al-Saad, Khalid; Al-Saady, Rafif; Lyng, Fiona M.; Goormaghtigh, Erik
2018-07-01
Breast cancer is the second most common cancer after lung cancer. So far, in clinical practice, most cancer parameters originating from histopathology rely on the visualization by a pathologist of microscopic structures observed in stained tissue sections, including immunohistochemistry markers. Fourier transform infrared (FTIR) spectroscopy provides a biochemical fingerprint of a biopsy sample and, together with advanced data analysis techniques, can accurately classify cell types. Yet, one of the challenges when dealing with FTIR imaging is the slow recording of the data. A 1 cm² tissue section requires several hours of image recording. We show in the present paper that 2D covariance analysis singles out only a few wavenumbers where both variance and covariance are large. Simple models could be built using 4 wavenumbers to identify the 4 main cell types present in breast cancer tissue sections. Decision trees provide particularly simple models to reach discrimination between the 4 cell types. The robustness of these simple decision-tree models was challenged with FTIR spectral data obtained using different recording conditions. One test set was recorded by transflection on tissue sections in the presence of paraffin while the training set was obtained on dewaxed tissue sections by transmission. Furthermore, the test set was collected with a different brand of FTIR microscope and a different pixel size. Despite the different recording conditions, separating extracellular matrix (ECM) from carcinoma spectra was 100% successful, underscoring the robustness of this univariate model and the utility of covariance analysis for revealing efficient wavenumbers. We suggest that 2D covariance maps using the full spectral range could be most useful to select the interesting wavenumbers and achieve very fast data acquisition on quantum cascade laser infrared imaging microscopes.
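A sketch of the workflow the abstract describes, on synthetic spectra: build the 2D covariance map, keep the few channels where variance is largest, and fit a shallow decision tree on them. Wavenumber positions and class signatures are invented.

```python
# Sketch: 2D covariance map for channel selection, then a shallow tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_spec, n_wn = 400, 200                    # spectra x wavenumber channels
labels = rng.integers(0, 4, size=n_spec)   # 4 cell types
spectra = rng.normal(size=(n_spec, n_wn))
for c in range(4):                         # give each class a band signature
    spectra[labels == c, 40 + 30 * c] += 2.0

centered = spectra - spectra.mean(axis=0)
cov2d = centered.T @ centered / (n_spec - 1)   # 2D covariance map (n_wn x n_wn)

# Keep the 4 channels with the largest variance (diagonal of the map).
picks = np.argsort(np.diag(cov2d))[-4:]
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
acc = cross_val_score(tree, spectra[:, picks], labels, cv=5).mean()
print("selected channels:", sorted(picks.tolist()), " CV accuracy:", round(acc, 3))
```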
Adams, K M; Brown, G G; Grant, I
1985-08-01
Analysis of Covariance (ANCOVA) is often used in neuropsychological studies to effect ex-post-facto adjustment of performance variables amongst groups of subjects mismatched on some relevant demographic variable. This paper reviews some of the statistical assumptions underlying this usage. In an attempt to illustrate the complexities of this statistical technique, three sham studies using actual patient data are presented. These staged simulations have varying relationships between group test performance differences and levels of covariate discrepancy. The results were robust and consistent in their nature, and were held to support the wisdom of previous cautions by statisticians concerning the employment of ANCOVA to justify comparisons between incomparable groups. ANCOVA should not be used in neuropsychological research to equate groups unequal on variables such as age and education or to exert statistical control whose objective is to eliminate consideration of the covariate as an explanation for results. Finally, the report advocates by example the use of simulation to further our understanding of neuropsychological variables.
Liao, Hstau Y.; Hashem, Yaser; Frank, Joachim
2015-01-01
Single-particle cryogenic electron microscopy (cryo-EM) is a powerful tool for the study of macromolecular structures at high resolution. Classification allows multiple structural states to be extracted and reconstructed from the same sample. One classification approach is via the covariance matrix, which captures the correlation between every pair of voxels. Earlier approaches employ computing-intensive resampling and estimate only the eigenvectors of the matrix, which are then used in a separate fast classification step. We propose an iterative scheme to explicitly estimate the covariance matrix in its entirety. In our approach, the flexibility in choosing the solution domain allows us to examine a part of the molecule in greater detail. 3D covariance maps obtained in this way from experimental data (cryo-EM images of the eukaryotic pre-initiation complex) prove to be in excellent agreement with conclusions derived by using traditional approaches, revealing in addition the interdependencies of ligand bindings and structural changes. PMID:25982529
Multi-state models for colon cancer recurrence and death with a cured fraction.
Conlon, A S C; Taylor, J M G; Sargent, D J
2014-05-10
In cancer clinical trials, patients often experience a recurrence of disease prior to the outcome of interest, overall survival. Additionally, for many cancers, there is a cured fraction of the population who will never experience a recurrence. There is often interest in how different covariates affect the probability of being cured of disease and the time to recurrence, time to death, and time to death after recurrence. We propose a multi-state Markov model with an incorporated cured fraction to jointly model recurrence and death in colon cancer. A Bayesian estimation strategy is used to obtain parameter estimates. The model can be used to assess how individual covariates affect the probability of being cured and each of the transition rates. Checks for the adequacy of the model fit and for the functional forms of covariates are explored. The methods are applied to data from 12 randomized trials in colon cancer, where we show common effects of specific covariates across the trials. Copyright © 2013 John Wiley & Sons, Ltd.
Tests for detecting overdispersion in models with measurement error in covariates.
Yang, Yingsi; Wong, Man Yu
2015-11-30
Measurement error in covariates can affect the accuracy in count data modeling and analysis. In overdispersion identification, the true mean-variance relationship can be obscured under the influence of measurement error in covariates. In this paper, we propose three tests for detecting overdispersion when covariates are measured with error: a modified score test and two score tests based on the proposed approximate likelihood and quasi-likelihood, respectively. The proposed approximate likelihood is derived under the classical measurement error model, and the resulting approximate maximum likelihood estimator is shown to have superior efficiency. Simulation results also show that the score test based on approximate likelihood outperforms the test based on quasi-likelihood and other alternatives in terms of empirical power. By analyzing a real dataset containing the health-related quality-of-life measurements of a particular group of patients, we demonstrate the importance of the proposed methods by showing that the analyses with and without measurement error correction yield significantly different results. Copyright © 2015 John Wiley & Sons, Ltd.
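For orientation, here is the standard (Dean-Lawless-type) overdispersion score test for a Poisson regression without any measurement-error correction; the paper's proposed tests modify statistics of this kind to account for error in the covariates. Data are simulated.

```python
# Sketch: score test for overdispersion in a Poisson GLM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
mu_true = np.exp(0.3 + 0.5 * x)
y = rng.negative_binomial(n=5, p=5 / (5 + mu_true))   # overdispersed counts

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = fit.fittedvalues

# Score statistic: large positive values indicate overdispersion.
T = np.sum((y - mu) ** 2 - y) / np.sqrt(2 * np.sum(mu ** 2))
print("overdispersion score statistic (approx. N(0,1) under the null):", T)
```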
Preconditioning of the background error covariance matrix in data assimilation for the Caspian Sea
NASA Astrophysics Data System (ADS)
Arcucci, Rossella; D'Amore, Luisa; Toumi, Ralf
2017-06-01
Data Assimilation (DA) is an uncertainty quantification technique used to improve numerically forecasted results by incorporating observed data into prediction models. Since a crucial issue in DA models is the ill-conditioning of the covariance matrices involved, it is mandatory to introduce preconditioning methods into DA software. Here we present first studies on the introduction of two different preconditioning methods into the DA software we are developing, named S3DVAR, which implements a scalable three-dimensional variational data assimilation model for assimilating sea surface temperature (SST) values collected in the Caspian Sea, using the Regional Ocean Modeling System (ROMS) with observations provided by the Group for High Resolution Sea Surface Temperature (GHRSST). We also present the algorithmic strategies we employ.
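The classic way to precondition a variational DA problem is the control-variable transform: substituting x = B^{1/2} v replaces the Hessian B^{-1} + H^T R^{-1} H with I + B^{1/2} H^T R^{-1} H B^{1/2}. Whether S3DVAR uses exactly this transform is not stated above, so the following is a generic toy illustration of the conditioning gain.

```python
# Sketch: conditioning of the 3D-Var Hessian before and after the
# B^{1/2} control-variable transform. Toy matrices only.
import numpy as np

rng = np.random.default_rng(5)
n, m = 100, 30
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
B = np.exp(-(dist / 15.0) ** 2) + 1e-6 * np.eye(n)   # smooth, ill-conditioned B
H = rng.normal(size=(m, n)) / np.sqrt(n)             # toy observation operator
R_inv = np.eye(m) / 0.1 ** 2                         # obs error variance 0.01

hess = np.linalg.inv(B) + H.T @ R_inv @ H            # unpreconditioned Hessian

w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
hess_pc = np.eye(n) + B_half @ H.T @ R_inv @ H @ B_half

print("cond(Hessian)        : %.3e" % np.linalg.cond(hess))
print("cond(preconditioned) : %.3e" % np.linalg.cond(hess_pc))
```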
Orbit/attitude estimation with LANDSAT Landmark data
NASA Technical Reports Server (NTRS)
Hall, D. L.; Waligora, S.
1979-01-01
The use of LANDSAT landmark data for orbit/attitude and camera bias estimation was studied. The preliminary results of these investigations are presented. The Goddard Trajectory Determination System (GTDS) error analysis capability was used to perform error analysis studies. A number of questions were addressed, including parameter observability and sensitivity, and the effects on the solve-for parameter errors of data span, density, and distribution, and of a priori covariance weighting. The use of the GTDS differential correction capability with actual landmark data was examined. The rms line and element observation residuals were studied as a function of the solve-for parameter set, a priori covariance weighting, force model, attitude model, and data characteristics. Sample results are presented. Finally, verification and preliminary system evaluation of the LANDSAT NAVPAK system for sequential (extended Kalman filter) estimation of orbit, attitude, and camera bias parameters is given.
Asymptotic approximations to posterior distributions via conditional moment equations
Yee, J.L.; Johnson, W.O.; Samaniego, F.J.
2002-01-01
We consider asymptotic approximations to joint posterior distributions in situations where the full conditional distributions referred to in Gibbs sampling are asymptotically normal. Our development focuses on problems where data augmentation facilitates simpler calculations, but results hold more generally. Asymptotic mean vectors are obtained as simultaneous solutions to fixed point equations that arise naturally in the development. Asymptotic covariance matrices flow naturally from the work of Arnold & Press (1989) and involve the conditional asymptotic covariance matrices and first derivative matrices for conditional mean functions. When the fixed point equations admit an analytical solution, explicit formulae are subsequently obtained for the covariance structure of the joint limiting distribution, which may shed light on the use of the given statistical model. Two illustrations are given. © 2002 Biometrika Trust.
Optimal trading strategies—a time series approach
NASA Astrophysics Data System (ADS)
Bebbington, Peter A.; Kühn, Reimer
2016-05-01
Motivated by recent advances in the spectral theory of auto-covariance matrices, we are led to revisit a reformulation of Markowitz’ mean-variance portfolio optimization approach in the time domain. In its simplest incarnation it applies to a single traded asset and allows an optimal trading strategy to be found which—for a given return—is minimally exposed to market price fluctuations. The model is initially investigated for a range of synthetic price processes, taken to be either second order stationary, or to exhibit second order stationary increments. Attention is paid to the consequences of estimating auto-covariance matrices from small finite samples, and auto-covariance matrix cleaning strategies to mitigate these effects are investigated. Finally we apply our framework to real world data.
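A sketch of the time-domain Markowitz idea on an AR(1) toy process: the auto-covariance matrix C takes the role of the usual cross-asset covariance, and the minimum-variance holding schedule has the familiar closed form w = C^{-1}1 / (1^T C^{-1}1). The small-sample part hints at why the cleaning strategies mentioned above matter. All process parameters are invented.

```python
# Sketch: minimum-variance trading schedule from an auto-covariance matrix.
import numpy as np

rng = np.random.default_rng(6)
T, phi = 50, 0.9
# Theoretical auto-covariance of a stationary AR(1): C_ij = phi^|i-j|/(1-phi^2).
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
C = phi ** lags / (1 - phi ** 2)

ones = np.ones(T)
w = np.linalg.solve(C, ones)
w /= ones @ w                        # minimum-variance holding schedule
print("schedule variance:", w @ C @ w)
print("naive equal-weight variance:", ones @ C @ ones / T ** 2)

# Small-sample estimation: empirical auto-covariances are noisy, which is
# why cleaning strategies are needed before inverting C.
x = np.zeros(2000)
for t in range(1, x.size):
    x[t] = phi * x[t - 1] + rng.normal()
emp = np.array([np.cov(x[:-k or None], x[k:])[0, 1] if k else np.var(x)
                for k in range(T)])
C_emp = emp[lags]
w_emp = np.linalg.solve(C_emp, ones)
w_emp /= ones @ w_emp
print("true variance of the estimated schedule:", w_emp @ C @ w_emp)
```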
A Cautious Note on Auxiliary Variables That Can Increase Bias in Missing Data Problems.
Thoemmes, Felix; Rose, Norman
2014-01-01
The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to increase the likelihood that missing at random is achieved consists of including many covariates as so-called auxiliary variables. These variables are either included based on data considerations or in an inclusive fashion; that is, taking all available auxiliary variables. In this article, we point out that there are some instances in which auxiliary variables exhibit the surprising property of increasing bias in missing data problems. In a series of focused simulation studies, we highlight some situations in which this type of biasing behavior can occur. We briefly discuss possible ways to avoid selecting bias-inducing covariates as auxiliary variables.
Risk-Stratified Imputation in Survival Analysis
Kennedy, Richard E.; Adragni, Kofi P.; Tiwari, Hemant K.; Voeks, Jenifer H.; Brott, Thomas G.; Howard, George
2013-01-01
Background: Censoring that is dependent on covariates associated with survival can arise in randomized trials due to changes in recruitment and eligibility criteria to minimize withdrawals, potentially leading to biased treatment effect estimates. Imputation approaches have been proposed to address censoring in survival analysis, and while these approaches may provide unbiased estimates of treatment effects, imputation of a large number of outcomes may over- or underestimate the associated variance based on the imputation pool selected. Purpose: We propose an improved method, risk-stratified imputation, as an alternative to address withdrawal related to the risk of events in the context of time-to-event analyses. Methods: Our algorithm performs imputation from a pool of replacement subjects with similar values of both treatment and covariate(s) of interest, that is, from a risk-stratified sample. This stratification prior to imputation addresses the requirement of time-to-event analysis that censored observations are representative of all other observations in the risk group with similar exposure variables. We compared our risk-stratified imputation to case deletion and bootstrap imputation in a simulated dataset in which the covariate of interest (study withdrawal) was related to treatment. A motivating example from a recent clinical trial is also presented to demonstrate the utility of our method. Results: In our simulations, risk-stratified imputation gives estimates of treatment effect comparable to bootstrap and auxiliary variable imputation while avoiding inaccuracies of the latter two in estimating the associated variance. Similar results were obtained in analysis of clinical trial data. Limitations: Risk-stratified imputation has little advantage over other imputation methods when covariates of interest are not related to treatment, although its performance is superior when covariates are related to treatment. Risk-stratified imputation is intended for categorical covariates, and may be sensitive to the width of the matching window if continuous covariates are used. Conclusions: The use of risk-stratified imputation should facilitate the analysis of many clinical trials, in which one group has a higher withdrawal rate that is related to treatment. PMID:23818434
Dark matter statistics for large galaxy catalogs: power spectra and covariance matrices
NASA Astrophysics Data System (ADS)
Klypin, Anatoly; Prada, Francisco
2018-06-01
Large-scale surveys of galaxies require accurate theoretical predictions of the dark matter clustering for thousands of mock galaxy catalogs. We demonstrate that this goal can be achieved with the new Parallel Particle-Mesh (PM) N-body code GLAM at a very low computational cost. We run ~22,000 simulations with ~2 billion particles that provide ~1% accuracy of the dark matter power spectra P(k) for wave-numbers up to k ~ 1 h Mpc⁻¹. Using this large data set we study the power spectrum covariance matrix. In contrast to many previous analytical and numerical results, we find that the covariance matrix normalised to the power spectrum, C(k, k′)/P(k)P(k′), has a complex structure of non-diagonal components: an upturn at small k, followed by a minimum at k ≈ 0.1-0.2 h Mpc⁻¹, and a maximum at k ≈ 0.5-0.6 h Mpc⁻¹. The normalised covariance matrix evolves strongly with redshift: C(k, k′) ∝ δ(t)^α P(k)P(k′), where δ is the linear growth factor and α ≈ 1-1.25, which indicates that the covariance matrix depends on cosmological parameters. We also show that waves longer than 1 h⁻¹ Gpc have very little impact on the power spectrum and covariance matrix. This significantly reduces the computational costs and complexity of theoretical predictions: relatively small-volume ~(1 h⁻¹ Gpc)³ simulations capture the necessary properties of dark matter clustering statistics. As our results also indicate, achieving ~1% errors in the covariance matrix for k < 0.50 h Mpc⁻¹ requires a resolution better than ε ~ 0.5 h⁻¹ Mpc.
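The covariance estimation itself is mechanically simple once an ensemble of mock spectra exists; a sketch with synthetic Gaussian band powers (chi-square mocks, not N-body outputs) shows the normalization C(k, k′)/P(k)P(k′) and the Gaussian expectation 2/N_modes on the diagonal.

```python
# Sketch: power-spectrum covariance matrix estimated from mock spectra.
import numpy as np

rng = np.random.default_rng(7)
n_mocks, n_k = 5000, 20
k = np.linspace(0.05, 1.0, n_k)
P_true = 1e4 * k ** -1.5                        # toy power spectrum

# Gaussian mocks: chi^2-distributed band powers, ~100*k^2 modes per bin.
n_modes = np.maximum((100 * k ** 2).astype(int), 2)
P_hat = P_true * rng.chisquare(n_modes, size=(n_mocks, n_k)) / n_modes

C = np.cov(P_hat, rowvar=False)                 # (n_k x n_k) covariance
mean_P = P_hat.mean(axis=0)
C_norm = C / np.outer(mean_P, mean_P)           # C(k,k')/P(k)P(k')

# For a Gaussian field the diagonal should be close to 2/N_modes.
print("diag(C_norm):", np.round(np.diag(C_norm), 4))
print("2/N_modes   :", np.round(2 / n_modes, 4))
```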
Subirà, Marta; Cano, Marta; de Wit, Stella J; Alonso, Pino; Cardoner, Narcís; Hoexter, Marcelo Q; Kwon, Jun Soo; Nakamae, Takashi; Lochner, Christine; Sato, João R; Jung, Wi Hoon; Narumoto, Jin; Stein, Dan J; Pujol, Jesus; Mataix-Cols, David; Veltman, Dick J; Menchón, José M; van den Heuvel, Odile A; Soriano-Mas, Carles
2016-03-01
Frontostriatal and frontoamygdalar connectivity alterations in patients with obsessive-compulsive disorder (OCD) have been typically described in functional neuroimaging studies. However, structural covariance, or volumetric correlations across distant brain regions, also provides network-level information. Altered structural covariance has been described in patients with different psychiatric disorders, including OCD, but to our knowledge, alterations within frontostriatal and frontoamygdalar circuits have not been explored. We performed a mega-analysis pooling structural MRI scans from the Obsessive-compulsive Brain Imaging Consortium and assessed whole-brain voxel-wise structural covariance of 4 striatal regions (dorsal and ventral caudate nucleus, and dorsal-caudal and ventral-rostral putamen) and 2 amygdalar nuclei (basolateral and centromedial-superficial). Images were preprocessed with the standard pipeline of voxel-based morphometry studies using Statistical Parametric Mapping software. Our analyses involved 329 patients with OCD and 316 healthy controls. Patients showed increased structural covariance between the left ventral-rostral putamen and the left inferior frontal gyrus/frontal operculum region. This finding had a significant interaction with age; the association held only in the subgroup of older participants. Patients with OCD also showed increased structural covariance between the right centromedial-superficial amygdala and the ventromedial prefrontal cortex. This was a cross-sectional study. Because this is a multisite data set analysis, participant recruitment and image acquisition were performed in different centres. Most patients were taking medication, and treatment protocols differed across centres. Our results provide evidence for structural network-level alterations in patients with OCD involving 2 frontosubcortical circuits of relevance for the disorder and indicate that structural covariance contributes to fully characterizing brain alterations in patients with psychiatric disorders.
2010-01-01
Background: The objective was to study whether an association exists between the incidence of malaria and some weather parameters in tropical Maputo province, Mozambique. Methods: A Bayesian hierarchical model is formulated for malaria count data aggregated at the district level over a two-year period. This model makes it possible to account for spatial variation across areas. The model was extended to include the environmental covariates temperature and rainfall. The study period was then divided into two climate conditions, rainy and dry seasons, and the incidences of malaria between the two seasons were compared. Parameter estimation and inference were carried out using MCMC simulation techniques based on Poisson variation. Model comparisons are made using the DIC. Results: For the winter season in 2001, the temperature covariate, with an estimated value of -8.88, shows no association with malaria incidence. In 2002, the estimate for the same covariate was 5.498, a positive level of association. In both years the rainfall covariate shows no dependence of malaria incidence. Malaria transmission is higher in the wet season, with both covariates positively related to malaria (posterior means 1.99 and 2.83) in 2001. For 2002, only temperature is associated with malaria incidence, with an estimated value of 2.23. Conclusions: The incidence of malaria in 2001 presents a spatial pattern independent of temperature in the summer season and of rainfall in the winter season, respectively. In 2002, temperature determines the spatial pattern of malaria incidence in the region. Temperature influences the model in cases where both covariates are introduced in the winter and summer seasons, and its influence extends to the summer model with the temperature covariate only. It is reasonable to state that, with the occurrence of high temperatures, malaria incidence escalated in this year. PMID:20302674
Using dark current data to estimate AVIRIS noise covariance and improve spectral analyses
NASA Technical Reports Server (NTRS)
Boardman, Joseph W.
1995-01-01
Starting in 1994, all AVIRIS data distributions include a new product useful for quantification and modeling of the noise in the reported radiance data. The 'postcal' file contains approximately 100 lines of dark current data collected at the end of each data acquisition run. In essence this is a regular spectral-image cube, with 614 samples, 100 lines and 224 channels, collected with a closed shutter. Since there is no incident radiance signal, the recorded DN measure only the DC signal level and the noise in the system. Similar dark current measurements, made at the end of each line, are used, with a 100-line moving average, to remove the DC signal offset. Therefore, the pixel-by-pixel fluctuations about the mean of this dark current image provide an excellent model for the additive noise that is present in AVIRIS reported radiance data. The 61,400 dark current spectra can be used to calculate the noise levels in each channel and the noise covariance matrix. Both of these noise parameters should be used to improve spectral processing techniques. Some processing techniques, such as spectral curve fitting, will benefit from a robust estimate of the channel-dependent noise levels. Other techniques, such as automated unmixing and classification, will be improved by the stable and scene-independent noise covariance estimate. Future imaging spectrometry systems should have a similar ability to record dark current data, permitting this noise characterization and modeling.
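A sketch of the noise characterization the abstract proposes, with random numbers standing in for a real 'postcal' file; only the cube dimensions are taken from the text.

```python
# Sketch: per-channel noise levels and a noise covariance matrix from a
# dark-current cube (614 samples x 100 lines x 224 channels).
import numpy as np

rng = np.random.default_rng(8)
samples, lines, channels = 614, 100, 224
dark = 100 + rng.normal(size=(samples, lines, channels))   # DC level + noise

spectra = dark.reshape(-1, channels)          # 61,400 dark-current spectra
noise_sigma = spectra.std(axis=0)             # per-channel noise level
noise_cov = np.cov(spectra, rowvar=False)     # (224 x 224) noise covariance

print("mean channel noise (DN):", noise_sigma.mean())
print("covariance matrix shape:", noise_cov.shape)
```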
A dynamic spatio-temporal model for spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin; Walsh, Daniel P.
2017-01-01
Analyzing spatial data often requires modeling dependencies created by a dynamic spatio-temporal data generating process. In many applications, a generalized linear mixed model (GLMM) is used with a random effect to account for spatial dependence and to provide optimal spatial predictions. Location-specific covariates are often included as fixed effects in a GLMM and may be collinear with the spatial random effect, which can negatively affect inference. We propose a dynamic approach to account for spatial dependence that incorporates scientific knowledge of the spatio-temporal data generating process. Our approach relies on a dynamic spatio-temporal model that explicitly incorporates location-specific covariates. We illustrate our approach with a spatially varying ecological diffusion model implemented using a computationally efficient homogenization technique. We apply our model to understand individual-level and location-specific risk factors associated with chronic wasting disease in white-tailed deer from Wisconsin, USA, and to estimate the location where the disease was first introduced. We compare our approach to several existing methods that are commonly used in spatial statistics. Our spatio-temporal approach resulted in a higher predictive accuracy when compared to methods based on optimal spatial prediction, obviated confounding among the spatially indexed covariates and the spatial random effect, and provided additional information that will be important for containing disease outbreaks.
A Bayesian Approach for Population Pharmacokinetic Modeling of Alcohol in Japanese Individuals.
Nemoto, Asuka; Masaaki, Matsuura; Yamaoka, Kazue
2017-01-01
Blood alcohol concentration data that were previously obtained from 34 healthy Japanese subjects with limited sampling times were reanalyzed. Characteristics of the data were that the concentrations were obtained from only the early part of the time-concentration curve. The aims were to explore significant covariates for the population pharmacokinetic analysis of alcohol, incorporating external data using a Bayesian method, and to estimate the effects of the covariates. The data were analyzed using Markov chain Monte Carlo Bayesian estimation with NONMEM 7.3 (ICON Clinical Research LLC, North Wales, Pennsylvania). Informative priors were obtained from the external study. A 1-compartment model with Michaelis-Menten elimination was used. The typical value for the apparent volume of distribution was 49.3 L at the age of 29.4 years. Volume of distribution was estimated to be 20.4 L smaller in subjects with the ALDH2*1/*2 genotype than in subjects with the ALDH2*1/*1 genotype. A population pharmacokinetic (PPK) model for alcohol was updated. A Bayesian approach allowed interpretation of significant covariate relationships, even if the current dataset is not informative about all parameters. This is the first study reporting an estimate of the effect of the ALDH2 genotype in a PPK model.
Qualitatively Assessing Randomness in SVD Results
NASA Astrophysics Data System (ADS)
Lamb, K. W.; Miller, W. P.; Kalra, A.; Anderson, S.; Rodriguez, A.
2012-12-01
Singular Value Decomposition (SVD) is a powerful tool for identifying regions of significant co-variability between two spatially distributed datasets. SVD has been widely used in atmospheric research to define relationships between sea surface temperatures, geopotential height, wind, precipitation and streamflow data for myriad regions across the globe. A typical application for SVD is to identify leading climate drivers (as observed in the wind or pressure data) for a particular hydrologic response variable such as precipitation, streamflow, or soil moisture. One can also investigate the lagged relationship between a climate variable and the hydrologic response variable using SVD. When performing these studies it is important to limit the spatial bounds of the climate variable to reduce the chance of random co-variance relationships being identified. On the other hand, a climate region that is too small may ignore climate signals which have more than a statistical relationship to a hydrologic response variable. The proposed research seeks to develop a qualitative method for identifying random co-variability relationships between two data sets. The research takes heterogeneous correlation maps from several past results and compares them with correlation maps produced using purely random and quasi-random climate data. The comparison yields a methodology to determine whether a particular region on a correlation map may be explained by a physical mechanism or is simply statistical chance.
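One way to implement the proposed check is to compare the squared covariance fraction (SCF) of the leading SVD mode against a null distribution built from random climate fields; the following sketch is one plausible reading of that idea, with invented field sizes.

```python
# Sketch: leading-mode SCF of a cross-covariance SVD versus random surrogates.
import numpy as np

rng = np.random.default_rng(9)
n_t, n_x, n_y = 60, 150, 80            # years x climate grid x hydrology grid
shared = rng.normal(size=n_t)          # common signal linking the two fields
climate = np.outer(shared, rng.normal(size=n_x)) + rng.normal(size=(n_t, n_x))
hydro = np.outer(shared, rng.normal(size=n_y)) + rng.normal(size=(n_t, n_y))

def leading_scf(a, b):
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    s = np.linalg.svd(a.T @ b / (a.shape[0] - 1), compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

scf_real = leading_scf(climate, hydro)
scf_rand = [leading_scf(rng.normal(size=(n_t, n_x)), hydro) for _ in range(200)]
print("observed SCF:", round(scf_real, 3),
      " null 95th percentile:", round(np.percentile(scf_rand, 95), 3))
```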
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max
1999-01-01
A massively parallel ensemble Kalman filter (EnKF) is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP) quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This implementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the EnKF are also examined.
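For reference, the analysis step of a stochastic (perturbed-observation) EnKF looks like the following; this toy version is serial and dense, omitting the domain-decomposed parallelism the abstract describes.

```python
# Sketch: EnKF analysis update with ensemble-estimated covariances.
import numpy as np

rng = np.random.default_rng(10)
n_state, n_obs, n_ens = 200, 30, 50
ens = rng.normal(size=(n_state, n_ens))               # forecast ensemble

idx = np.linspace(0, n_state - 1, n_obs).astype(int)  # observed state entries
H = np.zeros((n_obs, n_state))
H[np.arange(n_obs), idx] = 1.0
R = 0.5 * np.eye(n_obs)
y = rng.normal(size=n_obs)                            # observations

X = ens - ens.mean(axis=1, keepdims=True)             # ensemble anomalies
Pf = X @ X.T / (n_ens - 1)                            # sample forecast covariance
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)        # Kalman gain

# Perturbed observations keep the analysis spread statistically consistent.
y_pert = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, n_ens).T
ens_a = ens + K @ (y_pert - H @ ens)
print("forecast spread:", float(X.std()),
      "analysis spread:", float((ens_a - ens_a.mean(axis=1, keepdims=True)).std()))
```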
Graph reconstruction using covariance-based methods.
Sulaimanov, Nurgazy; Koeppl, Heinz
2016-12-01
Methods based on correlation and partial correlation are today employed in the reconstruction of a statistical interaction graph from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure, and by discussing concrete small examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods for large realistic graphs. In particular, we perform the comparisons with optimally selected parameters based on the true underlying graph and with data-driven approaches where the parameters are directly estimated from the data.
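The two graph-extraction routes compared above can be reduced to a few lines: threshold the correlation matrix, or invert the covariance and threshold the partial correlations. A chain-structured toy example where the two disagree:

```python
# Sketch: graphs from correlation vs. partial correlation thresholds.
import numpy as np

rng = np.random.default_rng(11)
p, n = 6, 2000
# Chain-structured precision matrix: the true graph is 1-2-3-...-6.
Theta = (np.eye(p) + np.diag(np.full(p - 1, 0.4), 1)
         + np.diag(np.full(p - 1, 0.4), -1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)

S = np.cov(X, rowvar=False)
corr = S / np.sqrt(np.outer(np.diag(S), np.diag(S)))

P = np.linalg.inv(S)                              # concentration matrix
pcorr = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
np.fill_diagonal(pcorr, 1.0)

thr = 0.2
print("edges from correlation:\n", (np.abs(corr) > thr).astype(int))
print("edges from partial correlation:\n", (np.abs(pcorr) > thr).astype(int))
```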
Analysis of fMRI data using noise-diffusion network models: a new covariance-coding perspective.
Gilson, Matthieu
2018-04-01
Since the middle of the 1990s, studies of resting-state fMRI/BOLD data have explored the correlation patterns of activity across the whole brain, which is referred to as functional connectivity (FC). Among the many methods that have been developed to interpret FC, a recently proposed model-based approach describes the propagation of fluctuating BOLD activity within the recurrently connected brain network by inferring the effective connectivity (EC). In this model, EC quantifies the strengths of directional interactions between brain regions, viewed from the proxy of BOLD activity. In addition, the tuning procedure for the model provides estimates for the local variability (input variances) to explain how the observed FC is generated. Generalizing, the network dynamics can be studied in the context of an input-output mapping, determined by EC, for the second-order statistics of fluctuating nodal activities. The present paper focuses on the following detection paradigm: observing output covariances, how discriminative is the (estimated) network model with respect to various input covariance patterns? An application with the model fitted to experimental fMRI data (movie viewing versus resting state) illustrates that changes in local variability and changes in brain coordination go hand in hand.
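For a linear noise-diffusion (multivariate Ornstein-Uhlenbeck) network, the input-output mapping for second-order statistics is the Lyapunov equation J C + C J^T + Σ = 0, so input-variance changes map deterministically to output covariances. A toy sketch follows (connectivity values invented, not fitted fMRI estimates).

```python
# Sketch: stationary covariance of a linear noise-diffusion network via the
# Lyapunov equation, and its response to a change in input variances.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(12)
n = 8
EC = np.abs(rng.normal(scale=0.1, size=(n, n)))   # toy effective connectivity
np.fill_diagonal(EC, 0.0)
J = EC - np.eye(n)            # leaky dynamics: stable while EC stays weak

Sigma_rest = np.eye(n)                          # resting input variances
Sigma_task = np.eye(n)
Sigma_task[0, 0] = 3.0                          # task boosts one region's input

C_rest = solve_continuous_lyapunov(J, -Sigma_rest)
C_task = solve_continuous_lyapunov(J, -Sigma_task)
print("output covariance change (Frobenius norm):",
      np.linalg.norm(C_task - C_rest))
```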
Medical Geography: a Promising Field of Application for Geostatistics.
Goovaerts, P
2009-01-01
The analysis of health data and putative covariates, such as environmental, socio-economic, behavioral or demographic factors, is a promising application for geostatistics. It presents, however, several methodological challenges that arise from the fact that data are typically aggregated over irregular spatial supports and consist of a numerator and a denominator (i.e. population size). This paper presents an overview of recent developments in the field of health geostatistics, with an emphasis on three main steps in the analysis of areal health data: estimation of the underlying disease risk, detection of areas with significantly higher risk, and analysis of relationships with putative risk factors. The analysis is illustrated using age-adjusted cervix cancer mortality rates recorded over the 1970-1994 period for 118 counties of four states in the Western USA. Poisson kriging allows the filtering of noisy mortality rates computed from small population sizes, enhancing the correlation with two putative explanatory variables: the percentage of inhabitants living below the federally defined poverty line, and the percentage of Hispanic females. The area-to-point kriging formulation creates continuous maps of mortality risk, reducing the visual bias associated with the interpretation of choropleth maps. Stochastic simulation is used to generate realizations of cancer mortality maps, which allows one to quantify numerically how the uncertainty about the spatial distribution of health outcomes translates into uncertainty about the location of clusters of high values or the correlation with covariates. Last, geographically-weighted regression highlights the non-stationarity in the explanatory power of covariates: the higher mortality values along the coast are better explained by the two covariates than the lower risk recorded in Utah.
Katsube, Takayuki; Ishibashi, Toru; Kano, Takeshi; Wajima, Toshihiro
2016-11-01
The aim of this study was to develop a population pharmacokinetic (PK)/pharmacodynamic (PD) model for describing plasma lusutrombopag concentrations and platelet response following oral lusutrombopag dosing and for evaluating covariates in the PK/PD profiles. A population PK/PD model was developed using a total of 2539 plasma lusutrombopag concentration data and 1408 platelet count data from 78 healthy adult subjects following oral single and multiple (14-day once-daily) dosing. Covariates in PK and PK/PD models were explored for subject age, body weight, sex, and ethnicity. A three-compartment model with first-order rate and lag time for absorption was selected as the PK model. A three-transit and one-platelet compartment model with a sigmoid Emax model for drug effect and feedback of platelet production was selected as the PD model. The PK and PK/PD models well described the plasma lusutrombopag concentrations and the platelet response, respectively. Body weight was a significant covariate in PK. The bioavailability of non-Japanese subjects (White and Black/African American subjects) was 13% lower than that of Japanese subjects, while the simulated platelet response profiles using the PK/PD model were similar between Japanese and non-Japanese subjects. There were no significant covariates among the tested background data, including age, sex, and ethnicity (Japanese or non-Japanese), for the PD sensitivity. A population PK/PD model was developed for lusutrombopag and shown to provide good prediction for the PK/PD profiles. The model could be used as a basic PK/PD model in the drug development of lusutrombopag.
Nuclear Data Uncertainties for Typical LWR Fuel Assemblies and a Simple Reactor Core
NASA Astrophysics Data System (ADS)
Rochman, D.; Leray, O.; Hursin, M.; Ferroukhi, H.; Vasiliev, A.; Aures, A.; Bostelmann, F.; Zwermann, W.; Cabellos, O.; Diez, C. J.; Dyrda, J.; Garcia-Herranz, N.; Castro, E.; van der Marck, S.; Sjöstrand, H.; Hernandez, A.; Fleming, M.; Sublet, J.-Ch.; Fiorito, L.
2017-01-01
The impact of the covariances in current nuclear data libraries such as ENDF/B-VII.1, JEFF-3.2, JENDL-4.0, SCALE and TENDL on relevant current reactors is presented in this work. The uncertainties due to nuclear data are calculated for existing PWR and BWR fuel assemblies (with burn-up up to 40 GWd/tHM, followed by 10 years of cooling time) and for a simplified PWR full core model (without burn-up) for quantities such as k∞, macroscopic cross sections, pin power or isotope inventory. In this work, the method of propagation of uncertainties is based on random sampling of nuclear data, either from covariance files or directly from basic parameters. Additionally, possible biases on calculated quantities, such as those arising from the self-shielding treatment, are investigated. Different calculation schemes are used, based on CASMO, SCALE, DRAGON, MCNP or FISPACT-II, thus simulating real-life assignments for technical-support organizations. The outcome of such a study is a comparison of uncertainties with two consequences. One: although this study is not expected to lead to similar results between the involved calculation schemes, it provides insight into what can happen when calculating uncertainties and gives some perspective on the range of validity of these uncertainties. Two: it allows one to draw a picture of the current state of knowledge, using existing nuclear data library covariances and current methods.
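The random-sampling propagation method is easy to illustrate on a toy problem: draw cross-section sets from a covariance matrix, push each through a cheap model for k, and read the uncertainty off the spread. All numbers below are invented.

```python
# Sketch: nuclear data uncertainty propagation by random sampling.
import numpy as np

rng = np.random.default_rng(13)
groups = 5
sigma_f = np.array([1.2, 0.9, 0.7, 0.5, 0.4])    # toy group fission xs (barns)
sigma_a = np.array([1.5, 1.2, 1.0, 0.8, 0.7])    # toy group absorption xs
flux = np.array([0.10, 0.20, 0.30, 0.25, 0.15])  # toy group flux weights

# Toy relative covariance: 3% uncertainty, correlated between nearby groups.
lags = np.abs(np.subtract.outer(np.arange(groups), np.arange(groups)))
cov_f = (0.03 ** 2) * (0.5 ** lags) * np.outer(sigma_f, sigma_f)

samples = rng.multivariate_normal(sigma_f, cov_f, size=10000)
nu = 2.43                                        # neutrons per fission (toy)
k = nu * (samples @ flux) / (flux @ sigma_a)     # infinite-medium k per sample
print(f"k = {k.mean():.5f} +/- {k.std():.5f} (fission-xs uncertainty only)")
```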
On the repeated measures designs and sample sizes for randomized controlled trials.
Tango, Toshiro
2016-04-01
For the analysis of longitudinal or repeated measures data, generalized linear mixed-effects models provide a flexible and powerful tool to deal with heterogeneity among subject response profiles. However, the typical statistical design adopted in usual randomized controlled trials is an analysis of covariance type analysis using a pre-defined pair of "pre-post" data, in which pre-(baseline) data are used as a covariate for adjustment together with other covariates. Then, the major design issue is to calculate the sample size or the number of subjects allocated to each treatment group. In this paper, we propose a new repeated measures design and sample size calculations combined with generalized linear mixed-effects models that depend not only on the number of subjects but on the number of repeated measures before and after randomization per subject used for the analysis. The main advantages of the proposed design combined with the generalized linear mixed-effects models are (1) it can easily handle missing data by applying the likelihood-based ignorable analyses under the missing at random assumption and (2) it may lead to a reduction in sample size, compared with the simple pre-post design. The proposed designs and the sample size calculations are illustrated with real data arising from randomized controlled trials. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Hamrén, Bengt; Kjellsson, Maria C.; Skrtic, Stanko
2016-01-01
Aim: The integrated glucose–insulin (IGI) model is a semi‐mechanistic physiological model which can describe the glucose–insulin homeostasis system following various glucose challenge settings. The aim of the present work was to apply the model to a large and diverse population of metformin‐only‐treated type 2 diabetes mellitus (T2DM) patients and identify patient‐specific covariates. Methods: Data from four clinical studies were pooled, including glucose and insulin concentration–time profiles from T2DM patients on stable treatment with metformin alone following mixed‐meal tolerance tests. The data were collected from a wide range of patients with respect to the duration of diabetes and level of glycaemic control. Results: The IGI model was expanded by four patient‐specific covariates. The level of glycaemic control, represented by baseline glycosylated haemoglobin, was identified as a significant covariate for steady‐state glucose, insulin‐dependent glucose clearance and the magnitude of the incretin effect, while baseline body mass index was a significant covariate for steady‐state insulin levels. In addition, glucose dose was found to have an impact on glucose absorption rate. The developed model was used to simulate glucose and insulin profiles in different groups of T2DM patients, across a range of glycaemic control, and it was found accurately to characterize their response to the standard oral glucose challenge. Conclusions: The IGI model was successfully applied to characterize differences between T2DM patients across a wide range of glycaemic control. The addition of patient‐specific covariates in the IGI model might be valuable for the future development of antidiabetic treatment and for the design and simulation of clinical studies. PMID:27450071
Predictive Mapping of Topsoil Organic Carbon in an Alpine Environment Aided by Landsat TM
Yang, Renmin; Rossiter, David G.; Liu, Feng; Lu, Yuanyuan; Yang, Fan; Yang, Fei; Zhao, Yuguo; Li, Decheng; Zhang, Ganlin
2015-01-01
The objective of this study was to examine the reflectance of Landsat TM imagery for mapping soil organic carbon (SOC) content in an Alpine environment. The studied area (ca. 3×10⁴ km²) is the upper reaches of the Heihe River at the northeast edge of the Tibetan plateau, China. A set (105) of topsoil samples were analyzed for SOC. Boosted regression tree (BRT) models using Landsat TM imagery were built to predict SOC content, alone or with topography and climate covariates (temperature and precipitation). The best model, combining all covariates, was only marginally better than using only imagery. Imagery alone was sufficient to build a reasonable model; this was a bit better than only using topography and climate covariates. Lin's concordance correlation coefficient values of the imagery-only model and the full model are very close, and larger than that of the model based on topography and climate variables. In the full model, SOC was mainly explained by Landsat TM imagery (65% relative importance), followed by climate variables (20%) and topography (15% of relative importance). The good results from imagery are likely due to (1) the strong dependence of SOC on native vegetation intensity in this Alpine environment; (2) the strong correlation in this environment between imagery and environmental covariables, especially elevation (corresponding to temperature), precipitation, and slope aspect. We conclude that multispectral satellite data from Landsat TM images may be used to predict topsoil SOC with reasonable accuracy in Alpine regions, and perhaps other regions covered with natural vegetation, and that adding topography and climate covariables to the satellite data can improve the predictive accuracy. PMID:26473739
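A sketch of the BRT comparison structure on simulated data, with scikit-learn's gradient boosting standing in for whatever BRT implementation the study used; the three covariate sets mirror the abstract's comparison.

```python
# Sketch: imagery-only vs. terrain+climate vs. full boosted-tree SOC models.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(14)
n = 105                                  # topsoil samples, as in the study
bands = rng.normal(size=(n, 6))          # Landsat TM reflectance stand-ins
terrain = rng.normal(size=(n, 3))        # elevation, slope, aspect stand-ins
climate = rng.normal(size=(n, 2))        # temperature, precipitation stand-ins
soc = 2.0 - 1.0 * bands[:, 3] + 0.5 * climate[:, 0] + 0.3 * rng.normal(size=n)

brt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3, random_state=0)
for name, X in [("imagery only", bands),
                ("terrain+climate", np.hstack([terrain, climate])),
                ("full model", np.hstack([bands, terrain, climate]))]:
    r2 = cross_val_score(brt, X, soc, cv=5).mean()
    print(f"{name:16s} CV R^2 = {r2:.3f}")
```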
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
Ziyatdinov, Andrey; Vázquez-Santiago, Miquel; Brunel, Helena; Martinez-Perez, Angel; Aschard, Hugues; Soria, Jose Manuel
2018-02-27
Quantitative trait locus (QTL) mapping in genetic data often involves analysis of correlated observations, which need to be accounted for to avoid false association signals. This is commonly performed by modeling such correlations as random effects in linear mixed models (LMMs). The R package lme4 is a well-established tool that implements major LMM features using sparse matrix methods; however, it is not fully adapted for QTL mapping association and linkage studies. In particular, two LMM features are lacking in the base version of lme4: the definition of random effects by custom covariance matrices; and parameter constraints, which are essential in advanced QTL models. Apart from applications in linkage studies of related individuals, such functionalities are of high interest for association studies in situations where multiple covariance matrices need to be modeled, a scenario not covered by many genome-wide association study (GWAS) software. To address the aforementioned limitations, we developed a new R package lme4qtl as an extension of lme4. First, lme4qtl contributes new models for genetic studies within a single tool integrated with lme4 and its companion packages. Second, lme4qtl offers a flexible framework for scenarios with multiple levels of relatedness and becomes efficient when covariance matrices are sparse. We showed the value of our package using real family-based data in the Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) project. Our software lme4qtl enables QTL mapping models with a versatile structure of random effects and efficient computation for sparse covariances. lme4qtl is available at https://github.com/variani/lme4qtl .
He, Hua; McDermott, Michael P.
2012-01-01
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified. PMID:21856650
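A bare-bones sketch of the stratification idea under the stated MAR assumption: fit a verification propensity model separately for test-positives and test-negatives, cut the scores into strata, and reconstruct disease counts from the verified subjects within each stratum. The helper below is illustrative rather than the authors' estimator; for simplicity it skips strata containing no verified subjects, which a careful implementation would need to handle.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def corrected_se_sp(T, V, D, X, n_strata=5):
    """Verification-bias-corrected sensitivity/specificity via propensity-
    score stratification. T: test result (0/1); V: verified (0/1); D: true
    disease status, used only where V == 1; X: (n, k) covariate array."""
    df = pd.DataFrame({"T": T, "V": V, "D": D})
    dis = {}   # expected diseased count and group size per test result
    for t in (0, 1):
        mask = df["T"].values == t
        sub = df[mask].copy()
        ps = LogisticRegression(max_iter=1000).fit(X[mask], sub["V"])
        sub["score"] = ps.predict_proba(X[mask])[:, 1]
        sub["stratum"] = pd.qcut(sub["score"], n_strata, labels=False,
                                 duplicates="drop")
        n_dis = 0.0
        for _, grp in sub.groupby("stratum"):
            ver = grp[grp["V"] == 1]
            if len(ver):                       # MAR within a stratum
                n_dis += len(grp) * ver["D"].mean()
        dis[t] = (n_dis, len(sub))
    se = dis[1][0] / (dis[0][0] + dis[1][0])
    healthy0 = dis[0][1] - dis[0][0]
    healthy1 = dis[1][1] - dis[1][0]
    sp = healthy0 / (healthy0 + healthy1)
    return se, sp
```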
Madrasi, Kumpal; Chaturvedula, Ayyappa; Haberer, Jessica E; Sale, Mark; Fossler, Michael J; Bangsberg, David; Baeten, Jared M; Celum, Connie; Hendrix, Craig W
2017-05-01
Adherence is a major factor in the effectiveness of preexposure prophylaxis (PrEP) for HIV prevention. Modeling patterns of adherence helps to identify influential covariates of different types of adherence as well as to enable clinical trial simulation so that appropriate interventions can be developed. We developed a Markov mixed-effects model to understand the covariates influencing adherence patterns to daily oral PrEP. Electronic adherence records (date and time of medication bottle cap opening) from the Partners PrEP ancillary adherence study with a total of 1147 subjects were used. This study included once-daily dosing regimens of placebo, oral tenofovir disoproxil fumarate (TDF), and TDF in combination with emtricitabine (FTC), administered to HIV-uninfected members of serodiscordant couples. One-coin and first- to third-order Markov models were fit to the data using NONMEM® 7.2. Model selection criteria included objective function value (OFV), Akaike information criterion (AIC), visual predictive checks, and posterior predictive checks. Covariates were included based on forward addition (α = 0.05) and backward elimination (α = 0.001). Markov models described the data better than one-coin models. A third-order Markov model gave the lowest OFV and AIC, but the simpler first-order model was used for covariate model building because no additional benefit on prediction of target measures was observed for higher-order models. Female sex and older age had a positive impact on adherence, whereas Sundays, sexual abstinence, and sex with a partner other than the study partner had a negative impact on adherence. Our findings suggest adherence interventions should consider the role of these factors. © 2016, The American College of Clinical Pharmacology.
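As a toy illustration of the selected first-order Markov structure (not the NONMEM mixed-effects implementation used in the study), the sketch below writes the likelihood with two covariate-dependent logistic transition probabilities, taking a dose today given that yesterday's dose was taken versus missed, and recovers the parameters from simulated adherence records. The scalar covariate z stands in for, e.g., sex, and random effects are omitted.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(theta, seqs, covs):
    """First-order Markov likelihood for daily adherence (1 = dose taken).
    Two logistic transition probabilities, each with one covariate effect:
    P(take | took yesterday) and P(take | missed yesterday)."""
    a1, b1, a0, b0 = theta
    ll = 0.0
    for y, z in zip(seqs, covs):
        for t in range(1, len(y)):
            p = expit(a1 + b1 * z) if y[t - 1] == 1 else expit(a0 + b0 * z)
            ll += np.log(p if y[t] == 1 else 1.0 - p)
    return -ll

# simulate 50 subjects x 30 days; z is a binary covariate (e.g., sex)
rng = np.random.default_rng(0)
covs = rng.integers(0, 2, 50)
seqs = []
for z in covs:
    y = [1]
    for _ in range(29):
        p = expit(1.5 + 0.5 * z) if y[-1] else expit(0.2 + 0.5 * z)
        y.append(int(rng.random() < p))
    seqs.append(np.array(y))
fit = minimize(neg_loglik, x0=np.zeros(4), args=(seqs, covs))
print(fit.x)   # transition intercepts and covariate effects
```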
NASA Technical Reports Server (NTRS)
Wargan, K.; Stajner, I.; Pawson, S.
2003-01-01
In a data assimilation system, the forecast error covariance matrix governs the way in which information from the data is spread throughout the model grid, so implementing a correct method of assigning covariances is expected to have an impact on the analysis results. The simplest models assume that correlations are constant in time and isotropic or nearly so; in such models the analysis depends on the dynamics only through the assumed error standard deviations. In applications to atmospheric tracer data assimilation this may lead to inaccuracies, especially in regions with strong wind shear or high gradients of potential vorticity, as well as in areas where no data are available. To overcome this problem we have developed a flow-dependent covariance model that is based on the short-term evolution of error correlations. The presentation compares the performance of a static and a flow-dependent model applied to a global three-dimensional ozone data assimilation system developed at NASA's Data Assimilation Office. We will present results of validation against WMO balloon-borne sondes and the Polar Ozone and Aerosol Measurement (POAM) III instrument. Experiments show that allowing forecast error correlations to evolve with the flow has a positive impact on assimilated ozone within regions where data were not assimilated, particularly at high latitudes in both hemispheres and in the troposphere. We will also discuss the statistical characteristics of both models; in particular, we will argue that including the evolution of error correlations leads to stronger internal consistency of the data assimilation system.
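For reference, the static baseline described above fits in a few lines: an isotropic background-error covariance in which correlations depend only on distance, so the analysis feels the dynamics only through the assumed error standard deviations. A flow-dependent model would instead evolve the correlation matrix with the winds; that step is not sketched here.

```python
import numpy as np

def isotropic_background_cov(coords, sigma, length_scale):
    """Static, isotropic forecast-error covariance B = D C D with Gaussian
    correlations that depend only on distance. coords: (n, d) grid-point
    positions; sigma: (n,) error standard deviations."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    C = np.exp(-0.5 * d2 / length_scale**2)   # isotropic correlation model
    D = np.diag(sigma)
    return D @ C @ D
```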
Tao, Chenyang; Nichols, Thomas E.; Hua, Xue; Ching, Christopher R.K.; Rolls, Edmund T.; Thompson, Paul M.; Feng, Jianfeng
2017-01-01
We propose a generalized reduced rank latent factor regression model (GRRLF) for the analysis of tensor field responses and high-dimensional covariates. The model is motivated by the need in imaging-genetic studies to identify genetic variants associated with brain imaging phenotypes, often in the form of high-dimensional tensor fields. GRRLF identifies the effective dimensionality of the data from its structure, and then jointly performs dimension reduction of the covariates, dynamic identification of latent factors, and nonparametric estimation of both covariate and latent response fields. After accounting for the latent and covariate effects, GRRLF performs a nonparametric test on the remaining factor of interest. GRRLF provides a better factorization of the signals than common solutions and is less susceptible to overfitting because it exploits the effective dimensionality. The generality and flexibility of GRRLF also allow various statistical models to be handled in a unified framework, with solutions that can be computed efficiently. Within the field of neuroimaging, it improves sensitivity for weak signals and is a promising alternative to existing approaches. The operation of the framework is demonstrated with both synthetic datasets and a real-world neuroimaging example in which the effects of a set of genes on voxel-level brain structure were measured, and the results compared favorably with those from existing approaches. PMID:27666385
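A heavily simplified relative of GRRLF is classical reduced-rank regression, sketched below; it conveys the core rank-constraint idea without the latent factors, tensor structure, or nonparametric components of the full model.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression: minimize ||Y - X B||_F subject to
    rank(B) <= rank. X: (n, p) covariates; Y: (n, q) responses."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)      # unconstrained fit
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]      # projector onto the top right factors
    return B_ols @ P                 # rank-constrained coefficient matrix
```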
Lyles, Robert H.; Mitchell, Emily M.; Weinberg, Clarice R.; Umbach, David M.; Schisterman, Enrique F.
2016-01-01
Potential reductions in laboratory assay costs afforded by pooling equal aliquots of biospecimens have long been recognized in disease surveillance and epidemiological research and, more recently, have motivated design and analytic developments in regression settings. For example, Weinberg and Umbach (1999, Biometrics 55, 718–726) provided methods for fitting set-based logistic regression models to case-control data when a continuous exposure variable (e.g., a biomarker) is assayed on pooled specimens. We focus on improving estimation efficiency by utilizing available subject-specific information at the pool allocation stage. We find that a strategy that we call “(y,c)-pooling,” which forms pooling sets of individuals within strata defined jointly by the outcome and other covariates, provides more precise estimation of the risk parameters associated with those covariates than does pooling within strata defined only by the outcome. We review the approach to set-based analysis through offsets developed by Weinberg and Umbach in a recent correction to their original paper. We propose a method for variance estimation under this design and use simulations and a real-data example to illustrate the precision benefits of (y,c)-pooling relative to y-pooling. We also note and illustrate that set-based models permit estimation of covariate interactions with exposure. PMID:26964741
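The allocation step is easy to picture in code. This illustrative helper (ours, not the authors') forms pools of a fixed size within strata defined jointly by the outcome and a covariate, assuming a data frame with columns named y and c; passing strata_cols=('y',) reduces it to plain y-pooling.

```python
import numpy as np
import pandas as pd

def assign_pools(df, pool_size, strata_cols=("y", "c")):
    """Label each specimen with a pool id, pooling only within strata
    defined jointly by the outcome y and covariate c ("(y,c)-pooling").
    strata_cols=("y",) gives plain y-pooling."""
    out = df.copy()
    out["pool"] = -1
    next_id = 0
    for _, idx in out.groupby(list(strata_cols)).groups.items():
        labels = np.asarray(list(idx))
        for start in range(0, len(labels), pool_size):
            out.loc[labels[start:start + pool_size], "pool"] = next_id
            next_id += 1
    return out   # trailing pools may be smaller than pool_size
```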
Calus, Mario PL; Bijma, Piter; Veerkamp, Roel F
2004-01-01
Covariance functions have been proposed to predict breeding values and genetic (co)variances as a function of phenotypic within-herd-year averages (environmental parameters), in order to include genotype-by-environment interaction. The objective of this paper was to investigate the influence of the definition of environmental parameters and of non-random use of sires on expected breeding values and estimated genetic variances across environments. Breeding values were simulated as a linear function of simulated herd effects. The definition of environmental parameters hardly influenced the results. In situations with random use of sires, estimated genetic correlations between the trait expressed in different environments were 0.93, 0.93 and 0.97, whereas the simulated value was 0.89, and estimated genetic variances deviated by up to 30% from the simulated values. Non-random use of sires, poor genetic connectedness, and small herd size had a large impact on the estimated covariance functions, expected breeding values, and calculated environmental parameters. Estimated genetic correlations between a trait expressed in different environments were biased upwards, and breeding values were more biased, as genetic connectedness became poorer and herd composition more diverse. The best available solution at this stage is to compute environmental parameters from large numbers of animals per herd, at the cost of some information on genotype-by-environment interaction in the data. PMID:15339629
Large-region acoustic source mapping using a movable array and sparse covariance fitting.
Zhao, Shengkui; Tuna, Cagdas; Nguyen, Thi Ngoc Tho; Jones, Douglas L
2017-01-01
Large-region acoustic source mapping is important for city-scale noise monitoring. Approaches that scan a large region from a single measurement position with a small array cannot provide clean acoustic source maps, while deploying large arrays spanning the entire region of interest is prohibitively expensive. Here, a multiple-position measurement scheme is applied, scanning the region at multiple spatial positions with a movable array of small size. Based on this scheme, a sparse-constrained multiple-position vectorized covariance matrix fitting approach is presented. In the proposed approach, the overall sample covariance matrix of the incoherent virtual array is first estimated from the multiple-position array data and then vectorized using the Khatri-Rao (KR) product. A linear model is then constructed for fitting the vectorized covariance matrix, and a sparse-constrained reconstruction algorithm is proposed for recovering the source powers from the model. User parameter settings are discussed. The proposed approach is tested on a 30 m × 40 m region and a 60 m × 40 m region using simulated and measured data. Much cleaner acoustic source maps and lower sound pressure level errors are obtained compared with beamforming approaches and a previous sparse approach [Zhao, Tuna, Nguyen, and Jones, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP) (2016)].
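The vectorized fitting step can be sketched compactly for a single measurement position. Under the standard array model, R is approximately A diag(p) A^H plus noise; row-major vectorization then gives vec(R) = (A ⊙ conj(A)) p, with ⊙ the column-wise Khatri-Rao product, and nonnegative least squares recovers the grid of source powers. The paper's multiple-position, explicitly sparsity-constrained algorithm is more elaborate than this sketch.

```python
import numpy as np
from scipy.optimize import nnls

def sparse_cov_fit(R_hat, A):
    """Recover nonnegative source powers from a sample covariance matrix.
    R_hat: (m, m) sample covariance of an m-sensor array; A: (m, g)
    steering matrix over g candidate source locations. Fits, in the
    least-squares sense, vec(R_hat) ~ KR @ p where column k of KR is
    kron(a_k, conj(a_k)) (row-major vectorization); the noise term is
    omitted for brevity."""
    m, g = A.shape
    KR = np.empty((m * m, g), dtype=complex)
    for k in range(g):
        KR[:, k] = np.kron(A[:, k], np.conj(A[:, k]))
    M = np.vstack([KR.real, KR.imag])          # stack so NNLS applies
    b = np.concatenate([R_hat.real.ravel(), R_hat.imag.ravel()])
    p, _ = nnls(M, b)                          # nonnegativity promotes sparsity
    return p
```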
Multivariate Analysis of Longitudinal Rates of Change
Bryan, Matthew; Heagerty, Patrick J.
2016-01-01
Longitudinal data allow direct comparison of the change in patient outcomes associated with treatment or exposure. Frequently, several longitudinal measures are collected that either reflect a common underlying health status, or characterize processes that are influenced in a similar way by covariates such as exposure or demographic characteristics. Statistical methods that can combine multivariate response variables into common measures of covariate effects have been proposed by Roy and Lin [1]; Proust-Lima, Letenneur and Jacqmin-Gadda [2]; and Gray and Brookmeyer [3] among others. Current methods for characterizing the relationship between covariates and the rate of change in multivariate outcomes are limited to select models. For example, Gray and Brookmeyer [3] introduce an “accelerated time” method which assumes that covariates rescale time in longitudinal models for disease progression. In this manuscript we detail an alternative multivariate model formulation that directly structures longitudinal rates of change, and that permits a common covariate effect across multiple outcomes. We detail maximum likelihood estimation for a multivariate longitudinal mixed model. We show via asymptotic calculations the potential gain in power that may be achieved with a common analysis of multiple outcomes. We apply the proposed methods to the analysis of a trivariate outcome for infant growth and compare rates of change for HIV infected and uninfected infants. PMID:27417129
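To illustrate the shared rate-of-change idea with off-the-shelf tools (this is not the authors' estimator), one can stack the outcomes in long format and let a single time-by-exposure slope be common to all of them, as in the statsmodels sketch below with simulated trivariate growth data; the variable names are invented and the random-effects structure is deliberately minimal.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate trivariate growth curves: each outcome has its own intercept and
# baseline slope, but exposure changes all rates of change by the same
# amount (-0.2 per unit time), which is the effect we want to share.
rng = np.random.default_rng(0)
rows = []
for i in range(100):
    exposure = int(rng.integers(0, 2))
    u = rng.normal()                           # shared subject-level effect
    for k, outcome in enumerate(["weight", "length", "head_circ"]):
        for t in range(5):
            mu = k + (0.5 + 0.1 * k) * t - 0.2 * exposure * t
            rows.append({"id": i, "outcome": outcome, "time": t,
                         "exposure": exposure, "y": mu + u + rng.normal(0, 0.5)})
df = pd.DataFrame(rows)

# Outcome-specific intercepts and slopes, one common time:exposure effect,
# and a random intercept per subject.
model = smf.mixedlm("y ~ C(outcome) + C(outcome):time + time:exposure",
                    df, groups=df["id"]).fit()
print(model.params.filter(like="time:exposure"))
```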
Thomson, James R; Kimmerer, Wim J; Brown, Larry R; Newman, Ken B; Mac Nally, Ralph; Bennett, William A; Feyrer, Frederick; Fleishman, Erica
2010-07-01
We examined trends in abundance of four pelagic fish species (delta smelt, longfin smelt, striped bass, and threadfin shad) in the upper San Francisco Estuary, California, USA, over 40 years using Bayesian change point models. Change point models identify times of abrupt or unusual changes in absolute abundance (step changes) or in rates of change in abundance (trend changes). We coupled Bayesian model selection with linear regression splines to identify the biotic or abiotic covariates with the strongest associations with the abundance of each species. We then refitted the change point models conditional on the selected covariates to explore whether those covariates could explain the statistical trends or change points in species abundances. We also fitted a multispecies change point model that identified change points common to all species. All models included hierarchical structures to model data uncertainties, including observation errors and missing covariate values. There were step declines in the abundances of all four species in the early 2000s, with a likely common decline in 2002. Abiotic variables, including water clarity, the position of the 2‰ isohaline (X2), and the volume of freshwater exported from the estuary, explained some variation in species' abundances over the time series, but no selected covariate could statistically explain the post-2000 change points for any species.
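A bare-bones, non-Bayesian stand-in for a step-change model: scan every candidate change time and keep the split that most reduces the residual sum of squares. The paper's hierarchical, multispecies machinery is far richer; this only shows what a step change means operationally.

```python
import numpy as np

def step_change_scan(y):
    """Return the change time that minimizes the two-segment residual sum
    of squares for a single series (a step change in mean level)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best_t, best_rss = None, np.inf
    for t in range(2, n - 1):          # at least two points per segment
        rss = ((y[:t] - y[:t].mean()) ** 2).sum() \
            + ((y[t:] - y[t:].mean()) ** 2).sum()
        if rss < best_rss:
            best_t, best_rss = t, rss
    return best_t, best_rss
```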
Kernel Equating Under the Non-Equivalent Groups With Covariates Design.
Wiberg, Marie; Bränberg, Kenny
2015-07-01
When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating, with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent groups (EG) design, that is, equating with no covariates, as well as with equated scores obtained when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors than an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The finding that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates is useful in practice, because not all standardized tests have anchor tests.
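The backbone that kernel equating smooths is equipercentile equating: each X score is mapped to the Y score with the same percentile rank. A minimal sketch, ignoring both the Gaussian kernel smoothing and the NEC post-stratification on covariates, follows.

```python
import numpy as np

def equipercentile_equate(x_scores, y_scores, grid):
    """Map each score in `grid` on test X to the test-Y score with the same
    percentile rank, using raw empirical distributions."""
    x_scores = np.asarray(x_scores)
    px = np.array([(x_scores <= g).mean() for g in grid])  # empirical CDF of X
    return np.quantile(y_scores, px)                       # inverse CDF of Y
```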
Lin, Li-An; Luo, Sheng; Davis, Barry R
2018-01-01
In the course of hypertension, cardiovascular disease events (e.g., stroke, heart failure) occur frequently and recurrently. The scientific interest in such studies may lie in the estimation of treatment effects while accounting for the correlation among event times. The correlation among recurrent event times comes from two sources: subject-specific heterogeneity (e.g., varied lifestyles, genetic variations, and other unmeasurable effects) and event dependence (i.e., event incidences may change the risk of future recurrent events). Moreover, event incidences may change the disease progression, so that there may exist event-varying covariate effects (the covariate effects may change after each event) and an event effect (the effect of prior events on future events). In this article, we propose a Bayesian regression model that not only accommodates correlation among recurrent events from both sources, but also explicitly characterizes the event-varying covariate effects and the event effect. This model is especially useful in quantifying how the incidences of events change the effects of covariates and the risk of future events. We compare the proposed model with several commonly used recurrent event models and apply our model to the motivating lipid-lowering trial (LLT) component of the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT-LLT).
Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection
NASA Astrophysics Data System (ADS)
Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd
2015-02-01
Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has received increasing attention among researchers. Its widespread use is largely due to its ability to analyze thousands of genes simultaneously, in a massively parallel manner, within a single experiment, providing valuable knowledge on gene interaction and function. A microarray data set typically contains tens of thousands of genes (variables) from just dozens of samples, owing to various constraints. As a result, the sample covariance matrix in Hotelling's T2 statistic is singular rather than positive definite and cannot be inverted. In this research, Hotelling's T2 statistic is combined with a shrinkage approach as an alternative estimator of the covariance matrix for detecting significant gene sets. The shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator into an improved, biased estimator of the covariance matrix. A robust trimmed mean is integrated into the shrinkage estimator to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs, and the results are expected to outperform existing techniques under many of the tested conditions.
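A compact sketch of the two ingredients named above, a robust trimmed mean inside a Hotelling-type T2 with a shrinkage covariance, is given below. The shrinkage intensity and trimming fraction are illustrative defaults rather than the authors' tuned choices, and the pooled covariance is approximated from group-centered data.

```python
import numpy as np
from scipy.stats import trim_mean

def trimmed_shrinkage_T2(X1, X2, lam=0.2, trim=0.1):
    """Hotelling-type T2 for a gene set comparing two groups, using
    trimmed means and a shrinkage covariance
    S* = (1 - lam) * S + lam * nu * I (nu = average variance), which
    stays invertible when genes outnumber samples."""
    n1, n2 = len(X1), len(X2)
    m1 = trim_mean(X1, trim, axis=0)
    m2 = trim_mean(X2, trim, axis=0)
    centered = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    S = np.cov(centered, rowvar=False)          # approximate pooled covariance
    nu = np.trace(S) / S.shape[0]
    S_star = (1 - lam) * S + lam * nu * np.eye(S.shape[0])
    d = m1 - m2
    return (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S_star, d)
```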