Sample records for non-ignorable missing values

  1. Empirical likelihood method for non-ignorable missing data problems.

    PubMed

    Guan, Zhong; Qin, Jing

    2017-01-01

    The missing response problem is ubiquitous in survey sampling, medical, social science, and epidemiology studies. It is well known that non-ignorable missingness is the most difficult missing data problem, in which whether a response is missing depends on its own value. In the statistical literature, unlike for the ignorable missing data problem, few papers on non-ignorable missing data are available apart from fully parametric model-based approaches. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method, we obtain constrained maximum empirical likelihood estimators of the parameters in the missing probability and of the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of data from a real AIDS trial shows that the missingness of CD4 counts around two years is non-ignorable and that the sample mean based on the observed data only is biased.
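
    As a concrete starting point, the sketch below implements plain Owen-style empirical likelihood for a mean on complete data; the paper's contribution, the extra constraints coming from the parametric missingness probability, is not reproduced here.

```python
# Minimal sketch of Owen's empirical likelihood for H0: E[X] = mu,
# assuming complete data. Weights are p_i = 1 / (n * (1 + lam * (x_i - mu)))
# with lam solving the profile score equation.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for H0: E[X] = mu."""
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:           # mu outside the convex hull
        return np.inf
    # lam must keep every weight positive: 1 + lam * d_i > 0
    lo = (-1 + 1e-10) / d.max()
    hi = (-1 + 1e-10) / d.min()
    score = lambda lam: np.sum(d / (1 + lam * d))
    lam = brentq(score, lo, hi)
    return 2 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=200)
stat = el_log_ratio(x, mu=1.0)
print(f"-2 log R = {stat:.3f}, p = {chi2.sf(stat, df=1):.3f}")
```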

  2. Simulation-based sensitivity analysis for non-ignorably missing data.

    PubMed

    Yin, Peng; Shi, Jian Q

    2017-01-01

    Sensitivity analysis is popular for dealing with missing data problems, particularly for non-ignorable missingness, where a full-likelihood method cannot be adopted. It analyses how sensitively the conclusions (output) depend on assumptions or parameters (input) about the missing data, i.e. the missing data mechanism. We refer to models subject to such input uncertainty as sensitivity models. To make conventional sensitivity analysis more useful in practice, we need to define simple and interpretable statistical quantities to assess the sensitivity models and enable evidence-based analysis. In this paper we propose a novel approach that investigates the plausibility of each assumed missing data mechanism by comparing datasets simulated from various MNAR models with the observed data non-parametrically, using K-nearest-neighbour distances. Some asymptotic theory is also provided. A key step of this method is to plug in a plausibility evaluation system for each sensitivity parameter, selecting plausible values and rejecting unlikely ones, instead of considering all proposed values of the sensitivity parameters as in conventional sensitivity analysis. The method is generic and has been applied successfully to several specific models in this paper, including a meta-analysis model with publication bias, the analysis of incomplete longitudinal data, and mean estimation with non-ignorable missing data.
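
    A one-dimensional toy version of the idea is sketched below; the normal outcome model, the logistic missingness form expit(a + delta * y), and all constants are illustrative assumptions, not the paper's models.

```python
# Score a grid of MNAR sensitivity-parameter values (delta) by a
# K-nearest-neighbour discrepancy between simulated observed samples
# and the real observed sample; smaller discrepancy = more plausible.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import expit

rng = np.random.default_rng(1)

def observe(y, delta, a=0.5):
    """Keep y_i with probability expit(a + delta * y_i); MNAR when delta != 0."""
    return y[rng.random(y.size) < expit(a + delta * y)]

def knn_discrepancy(ref, sim, k=5):
    """Mean distance from each ref point to its k-th nearest neighbour in sim."""
    d, _ = cKDTree(sim[:, None]).query(ref[:, None], k=k)
    return d[:, -1].mean()

y_obs = observe(rng.normal(0, 1, 5000), delta=-1.0)   # "real" data: delta = -1
m = 1000                                              # common size, fair comparison
y_ref = rng.choice(y_obs, m, replace=False)

for delta in [-2.0, -1.0, 0.0, 1.0]:
    reps = [knn_discrepancy(y_ref,
                            rng.choice(observe(rng.normal(0, 1, 5000), delta),
                                       m, replace=False))
            for _ in range(20)]                       # Monte Carlo replicates
    print(f"delta = {delta:+.1f}: mean kNN discrepancy = {np.mean(reps):.4f}")
```

    The discrepancy should bottom out near the data-generating value delta = -1, which is the sense in which unlikely sensitivity-parameter values can be rejected.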

  3. Growth Modeling with Non-Ignorable Dropout: Alternative Analyses of the STAR*D Antidepressant Trial

    PubMed Central

    Muthén, Bengt; Asparouhov, Tihomir; Hunter, Aimee; Leuchter, Andrew

    2011-01-01

    This paper uses a general latent variable framework to study a series of models for non-ignorable missingness due to dropout. Non-ignorable missing data modeling acknowledges that missingness may depend not only on covariates and observed outcomes at previous time points, as under the standard missing at random (MAR) assumption, but also on latent variables such as values that would have been observed (missing outcomes), developmental trends (growth factors), and qualitatively different types of development (latent trajectory classes). These alternative predictors of missing data can be explored in a general latent variable framework using the Mplus program. A flexible new model uses an extended pattern-mixture approach in which missingness is a function of latent dropout classes in combination with growth mixture modeling using latent trajectory classes. A new selection model not only allows an influence of the outcomes on missingness, but also allows this influence to vary across latent trajectory classes. Recommendations are given for choosing models. The missing data models are applied to longitudinal data from STAR*D, the largest antidepressant clinical trial in the U.S. to date. Despite the importance of this trial, STAR*D growth model analyses using non-ignorable missing data techniques have not been explored until now. The STAR*D data are shown to feature distinct trajectory classes, including a low class corresponding to substantial improvement in depression, a minority class with a U-shaped curve corresponding to transient improvement, and a high class corresponding to no improvement. The analyses provide a new way to assess drug efficacy in the presence of dropout. PMID:21381817
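
    The pattern-mixture intuition can be conveyed with a toy simulation (the "pattern" here is just the observed dropout wave and the growth model a straight line; the paper's actual models use latent dropout classes and growth mixtures fitted in Mplus):

```python
# Slope-dependent dropout biases the naive pooled growth estimate;
# a simple pattern-mixture estimate (pattern-specific slopes averaged
# by pattern frequency) recovers the population mean slope.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(2)
n, T = 400, 5
t = np.arange(T)
intercept = rng.normal(20, 3, n)
slope = rng.normal(-1.5, 1.0, n)               # negative slope = improvement
y = intercept[:, None] + slope[:, None] * t + rng.normal(0, 1, (n, T))

# Non-ignorable dropout: non-improving subjects tend to leave earlier.
last = np.where(rng.random(n) < expit(slope + 1.5),
                rng.integers(1, T - 1, n),     # last observed wave if dropout
                T - 1)                         # completers

def pooled_slope(subjects):
    tt = np.concatenate([t[:last[i] + 1] for i in subjects])
    yy = np.concatenate([y[i, :last[i] + 1] for i in subjects])
    X = np.column_stack([np.ones_like(tt), tt])
    return np.linalg.lstsq(X, yy, rcond=None)[0][1]

naive = pooled_slope(range(n))
patterns, counts = np.unique(last, return_counts=True)
pm = sum((c / n) * pooled_slope(np.where(last == p)[0])
         for p, c in zip(patterns, counts))
print(f"naive observed-data slope: {naive:+.2f}")
print(f"pattern-mixture slope:     {pm:+.2f}")
print(f"true mean slope:           {slope.mean():+.2f}")
```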

  4. Impact of missing data mechanism on the estimate of change: a case study on cognitive function and polypharmacy among older persons

    PubMed Central

    Lavikainen, Piia; Leskinen, Esko; Hartikainen, Sirpa; Möttönen, Jyrki; Sulkava, Raimo; Korhonen, Maarit J

    2015-01-01

    Longitudinal studies typically suffer from incompleteness of data. Attrition is a major problem in studies of older persons, since participants may die during the study or become too frail to participate in follow-up examinations. Attrition is typically related to an individual’s health; therefore, ignoring it may lead to overly optimistic inferences, for example, about cognitive decline or changes in polypharmacy. The objective of this study is to compare the estimates of the level and slope of change in 1) cognitive function and 2) number of drugs in use between the assumptions of ignorable and non-ignorable missingness. This study demonstrates the usefulness of the latent variable modeling framework. The results suggest that when the missing data mechanism is not known, it is preferable to conduct analyses under both ignorable and non-ignorable missing data assumptions. PMID:25678815

  5. Impact of missing data mechanism on the estimate of change: a case study on cognitive function and polypharmacy among older persons.

    PubMed

    Lavikainen, Piia; Leskinen, Esko; Hartikainen, Sirpa; Möttönen, Jyrki; Sulkava, Raimo; Korhonen, Maarit J

    2015-01-01

    Longitudinal studies typically suffer from incompleteness of data. Attrition is a major problem in studies of older persons, since participants may die during the study or become too frail to participate in follow-up examinations. Attrition is typically related to an individual's health; therefore, ignoring it may lead to overly optimistic inferences, for example, about cognitive decline or changes in polypharmacy. The objective of this study is to compare the estimates of the level and slope of change in 1) cognitive function and 2) number of drugs in use between the assumptions of ignorable and non-ignorable missingness. This study demonstrates the usefulness of the latent variable modeling framework. The results suggest that when the missing data mechanism is not known, it is preferable to conduct analyses under both ignorable and non-ignorable missing data assumptions.

  6. Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

    PubMed

    Liu, Chen-Wei; Wang, Wen-Chung

    2017-11-01

    The examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for the observed data and another for the nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery for the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
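
    A tiny simulation illustrates the "ESI data are often non-ignorable" point (purely illustrative; the paper's two-dimensional IRT model is fitted with the mirt package and is not reproduced here):

```python
# When examinees pick the item they feel stronger on, the observed
# per-item percent correct is inflated relative to random assignment.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(3)
n = 20000
theta = rng.normal(0, 1, n)                    # ability
b = np.array([0.0, 0.5])                       # item difficulties
p = expit(theta[:, None] - b)                  # P(correct), shape (n, 2)
correct = rng.random((n, 2)) < p

# Examinees tend to pick the item they are more likely to answer
# correctly (a noisy, propensity-dependent choice, hence MNAR).
choice = np.argmax(p + rng.normal(0, 0.5, (n, 2)), axis=1)

for j in range(2):
    print(f"item {j}: P(correct) if assigned to all = {correct[:, j].mean():.3f}, "
          f"observed among choosers = {correct[choice == j, j].mean():.3f}")
```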

  7. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponse and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using the simple case of logistic regression, we quantify the bias in the regression estimates and show that the observed likelihood is non-identifiable under a non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in the estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used, as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to the follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
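
    The selection-model sensitivity analysis can be sketched in a few lines (a maximum-likelihood stand-in for the paper's Bayesian fitting; the parameter names psi0/psi1 and all data-generating values are assumptions for illustration). The non-identified parameter psi1 controls how strongly the outcome drives nonresponse, and tracing the fit over a grid of psi1 values is the sensitivity analysis:

```python
# Observed-data likelihood under the selection factorisation
# P(y | x) * P(observed | y), with P(observed | y) = expit(psi0 + psi1 * y).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(4)
n = 4000
x = rng.normal(0, 1, n)
y = (rng.random(n) < expit(-0.5 + 1.0 * x)).astype(float)
observed = rng.random(n) < expit(-0.3 - 1.5 * y)      # y = 1 responds less often

def negloglik(par, psi1):
    b0, b1, psi0 = par
    p1 = expit(b0 + b1 * x)                           # P(y = 1 | x)
    pi1, pi0 = expit(psi0 + psi1), expit(psi0)        # P(observed | y)
    ll_obs = np.where(y == 1, np.log(p1) + np.log(pi1),
                      np.log(1 - p1) + np.log(pi0))[observed].sum()
    pm = p1[~observed]
    ll_mis = np.log(pm * (1 - pi1) + (1 - pm) * (1 - pi0)).sum()
    return -(ll_obs + ll_mis)

for psi1 in [0.0, -0.75, -1.5]:                       # 0.0 = ignorable benchmark
    b0, b1, psi0 = minimize(negloglik, [0.0, 0.0, 0.0], args=(psi1,),
                            method="Nelder-Mead").x
    print(f"assumed psi1 = {psi1:+.2f}: intercept {b0:+.3f}, slope {b1:+.3f}")
print("data generated with intercept -0.500, slope +1.000, psi1 = -1.50")
```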

  8. Power Analysis for Anticipated Non-Response in Randomized Block Designs

    ERIC Educational Resources Information Center

    Pustejovsky, James E.

    2011-01-01

    Recent guidance on the treatment of missing data in experiments advocates the use of sensitivity analysis and worst-case bounds analysis for addressing non-ignorable missing data mechanisms; moreover, plans for the analysis of missing data should be specified prior to data collection (Puma et al., 2009). While these authors recommend only that…

  9. A Correlated Random Effects Model for Nonignorable Missing Data in Value-Added Assessment of Teacher Effects

    ERIC Educational Resources Information Center

    Karl, Andrew T.; Yang, Yan; Lohr, Sharon L.

    2013-01-01

    Value-added models have been widely used to assess the contributions of individual teachers and schools to students' academic growth based on longitudinal student achievement outcomes. There is concern, however, that ignoring the presence of missing values, which are common in longitudinal studies, can bias teachers' value-added scores.…

  10. Strategies for Handling Missing Data with Maximum Likelihood Estimation in Career and Technical Education Research

    ERIC Educational Resources Information Center

    Lee, In Heok

    2012-01-01

    Researchers in career and technical education often ignore more effective ways of reporting and treating missing data and instead implement traditional, but ineffective, missing data methods (Gemici, Rojewski, & Lee, 2012). The recent methodological, and even the non-methodological, literature has increasingly emphasized the importance of…

  11. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
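
    The marginalization idea (as opposed to KSC's soft constraints, which are not reproduced here) can be sketched as a NaN-aware k-means: distances are computed over each item's observed features only, and centroid updates average only the observed entries.

```python
# k-means that never imputes: missing entries are simply left out of
# both the distance computation and the centroid update.
import numpy as np

def kmeans_missing(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    obs_count = (~np.isnan(X)).sum(axis=1)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    C = np.where(np.isnan(C), np.nanmean(X, axis=0), C)  # complete the seeds
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # squared distance over observed dims, normalized by their count
        d2 = np.stack([np.nansum((X - c) ** 2, axis=1) / obs_count for c in C])
        labels = d2.argmin(axis=0)
        for j in range(k):
            if (labels == j).any():
                C[j] = np.nanmean(X[labels == j], axis=0)
    return labels, C

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 4)), rng.normal(3, 0.5, (100, 4))])
miss = rng.random(X.shape) < 0.3
miss[miss.all(axis=1), 0] = False          # keep at least one observed feature
X[miss] = np.nan
labels, _ = kmeans_missing(X, k=2)
print("cluster sizes:", np.bincount(labels))   # expect roughly 100 / 100
```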

  12. A Two-Step Approach for Analysis of Nonignorable Missing Outcomes in Longitudinal Regression: an Application to Upstate KIDS Study.

    PubMed

    Liu, Danping; Yeung, Edwina H; McLain, Alexander C; Xie, Yunlong; Buck Louis, Germaine M; Sundaram, Rajeshwari

    2017-09-01

    Imperfect follow-up in longitudinal studies commonly leads to missing outcome data that can potentially bias the inference when the missingness is nonignorable; that is, when the propensity of missingness depends on missing values in the data. In the Upstate KIDS Study, we seek to determine whether the missingness of child development outcomes is nonignorable, and how a simple model assuming ignorable missingness would compare with more complicated models for a nonignorable mechanism. To correct for nonignorable missingness, the shared random effects model (SREM) jointly models the outcome and the missingness mechanism. However, its computational complexity and the lack of software packages have limited its practical applications. This paper proposes a novel two-step approach to handle nonignorable missing outcomes in generalized linear mixed models. We first analyse the missingness mechanism with a generalized linear mixed model and predict values of the random effects; then, the outcome model is fitted adjusting for the predicted random effects to account for heterogeneity in the missingness propensity. Extensive simulation studies suggest that the proposed method is a reliable approximation to SREM, with much faster computation. The nonignorability of missing data in the Upstate KIDS Study is estimated to be mild to moderate, and the analyses using the two-step approach or SREM are similar to those from the model assuming ignorable missingness. The two-step approach is a computationally straightforward method that can be used in sensitivity analyses in longitudinal studies to examine violations of the ignorable missingness assumption and the implications relative to health outcomes. © 2017 John Wiley & Sons Ltd.
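
    A deliberately crude sketch of the two-step flavour follows (the paper's step 1 fits a logistic generalized linear mixed model and predicts its random effects; here the "predicted random effect" is just an empirical-Bayes-shrunken, logit-transformed per-subject missingness rate, an assumption made only to keep the sketch dependency-free):

```python
# Step 1: proxy for the subject-level random effect driving missingness.
# Step 2: adjust the outcome regression for that proxy.
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(5)
n, T = 500, 6
t = np.arange(T)
u = rng.normal(0, 1, n)                              # shared random effect

y = 1.0 + 0.5 * t + u[:, None] + rng.normal(0, 1, (n, T))
# Missingness grows over time, faster for high-u subjects (nonignorable).
miss = rng.random((n, T)) < expit(-3.0 + 0.6 * t + 2.0 * u[:, None])

# Step 1: shrunken per-subject missingness rate as the predicted effect.
u_hat = logit((miss.sum(axis=1) + 1) / (T + 2))

# Step 2: regress observed outcomes on time, with and without u_hat.
rows, cols = np.where(~miss)
yy = y[rows, cols]
X1 = np.column_stack([np.ones(rows.size), cols, u_hat[rows]])
b1 = np.linalg.lstsq(X1, yy, rcond=None)[0]
b0 = np.linalg.lstsq(X1[:, :2], yy, rcond=None)[0]
print(f"time slope ignoring missingness: {b0[1]:+.3f}")
print(f"time slope adjusting for u_hat:  {b1[1]:+.3f}  (truth +0.500)")
```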

  13. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as on other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and to individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations on statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the not missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  14. Explicating the Conditions Under Which Multilevel Multiple Imputation Mitigates Bias Resulting from Random Coefficient-Dependent Missing Longitudinal Data.

    PubMed

    Gottfredson, Nisha C; Sterba, Sonya K; Jackson, Kristina M

    2017-01-01

    Random coefficient-dependent (RCD) missingness is a non-ignorable mechanism through which missing data can arise in longitudinal designs. RCD, which we cannot test for, is a problematic form of missingness that occurs when subject-specific random effects correlate with the propensity for missingness or dropout. Particularly when covariate missingness is a problem, investigators typically handle missing longitudinal data using single-level multiple imputation procedures implemented with long-format data, which ignores within-person dependency entirely, or with wide-format (i.e., multivariate) data, which ignores some aspects of within-person dependency. When either of these standard approaches to handling missing longitudinal data is used, RCD missingness leads to parameter bias and incorrect inference. We explain why multilevel multiple imputation (MMI) should alleviate bias induced by an RCD missing data mechanism under conditions that contribute to stronger determinacy of the random coefficients. We evaluate our hypothesis with a simulation study. Three design factors are considered: intraclass correlation (ICC; ranging from .25 to .75), number of waves (ranging from 4 to 8), and percentage of missing data (ranging from 20 to 50%). We find that MMI greatly outperforms the single-level wide-format (multivariate) method for imputation under an RCD mechanism. For the MMI analyses, bias was most alleviated when the ICC was high, when there were more waves of data, and when there was less missing data. Practical recommendations for handling longitudinal missing data are suggested.
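
    The mechanism and its consequences are easy to see in a toy simulation: under RCD dropout, wide-format column-mean imputation flattens trajectories, while imputing from each subject's own fitted line (a toy stand-in for MMI, not the real algorithm) preserves the within-person dependency:

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(6)
n, T = 500, 6
t = np.arange(T)
slope = rng.normal(1.0, 0.8, n)                    # random coefficients
y = rng.normal(5, 1, n)[:, None] + slope[:, None] * t + rng.normal(0, 0.7, (n, T))

# RCD dropout: steeper subjects are more likely to drop out early.
early = rng.random(n) < expit(1.5 * (slope - 1.0))
drop_after = np.where(early, rng.integers(2, T, n), T)
y_mis = np.where(t >= drop_after[:, None], np.nan, y)

# (a) wide-format column-mean imputation flattens everyone's trajectory
col = np.where(np.isnan(y_mis), np.nanmean(y_mis, axis=0), y_mis)

# (b) impute each subject's missing waves from their own fitted line
sub = y_mis.copy()
for i in range(n):
    o = ~np.isnan(sub[i])
    if o.sum() < T:
        sub[i, ~o] = np.polyval(np.polyfit(t[o], sub[i, o], 1), t[~o])

print(f"true mean at final wave:     {y[:, -1].mean():.2f}")
print(f"column-mean imputation:      {col[:, -1].mean():.2f}")
print(f"per-subject line imputation: {sub[:, -1].mean():.2f}")
```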

  15. Addressing missing covariates for the regression analysis of competing risks: Prognostic modelling for triaging patients diagnosed with prostate cancer.

    PubMed

    Escarela, Gabriel; Ruiz-de-Chavez, Juan; Castillo-Morales, Alberto

    2016-08-01

    Competing risks arise in medical research when subjects are exposed to various types or causes of death. Data from large cohort studies usually exhibit subsets of regressors that are missing for some study subjects. Furthermore, such studies often give rise to censored data. In this article, a carefully formulated likelihood-based technique for the regression analysis of right-censored competing risks data when two of the covariates are discrete and partially missing is developed. The approach envisaged here comprises two models: one describes the covariate effects on both long-term incidence and conditional latencies for each cause of death, whilst the other deals with the observation process by which the covariates are missing. The former is formulated with a well-established mixture model and the latter is characterised by copula-based bivariate probability functions for both the missing covariates and the missing data mechanism. The resulting formulation lends itself to the empirical assessment of non-ignorability by performing sensitivity analyses using models with and without a non-ignorable component. The methods are illustrated on a 20-year follow-up of a prostate cancer cohort from the National Cancer Institute's Surveillance, Epidemiology, and End Results program. © The Author(s) 2013.

  16. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider estimation of a Cox regression in which some covariates are subject to missingness and there exists additional information (including the observed event time, the censoring indicator, and fully observed covariates) that may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missingness probabilities. For each missing covariate observation, these two working models are used to define a nearest-neighbour imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon completion of the imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into the estimation of the Cox regression, and with the predictive mean matching (PMM) imputation method. We show that all of these approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and to mis-specification of their link functions. In contrast, the PMM method is sensitive to misspecification of the covariates included in the imputation, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
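
    A compact sketch of the two-working-model, nearest-neighbour multiple imputation idea, transplanted to the simplest possible setting (estimating a mean with covariate-dependent missingness rather than the paper's Cox regression):

```python
# Working model 1 predicts the missing covariate; working model 2
# predicts the missingness probability. Donors for each missing case
# are the complete cases nearest in the 2-D score space.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import expit
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(7)
n = 1500
z = rng.normal(0, 1, n)                       # fully observed covariate
x = 0.8 * z + rng.normal(0, 1, n)             # covariate with missing values
r = rng.random(n) < expit(0.3 + 0.9 * z)      # r = True means x is observed

Z = z.reshape(-1, 1)
s1 = LinearRegression().fit(Z[r], x[r]).predict(Z)          # predicts x
s2 = LogisticRegression().fit(Z, r).predict_proba(Z)[:, 1]  # P(x observed)
S = np.column_stack([(s1 - s1.mean()) / s1.std(),
                     (s2 - s2.mean()) / s2.std()])

tree, donors = cKDTree(S[r]), x[r]
_, nn = tree.query(S[~r], k=10)               # 10-nearest-neighbour imputing sets
estimates = []
for _ in range(10):                           # 10 multiple imputations
    xi = x.copy()
    pick = nn[np.arange(nn.shape[0]), rng.integers(0, 10, nn.shape[0])]
    xi[~r] = donors[pick]
    estimates.append(xi.mean())               # the analysis step, here a mean
print(f"complete-case mean: {x[r].mean():+.3f}")
print(f"MI estimate:        {np.mean(estimates):+.3f}   (truth 0)")
```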

  17. Missing data imputation: focusing on single imputation.

    PubMed

    Zhang, Zhongheng

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and useful information is omitted from the analysis. Therefore, many imputation methods have been developed to fill this gap. The present article focuses on single imputation. Imputation with the mean, median, or mode is simple but, like complete case analysis, can bias the estimated mean and standard deviation. Furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between the variable with missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
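
    The article's examples are in R; an equivalent sketch in Python (pandas/scikit-learn, with invented variable names) covers the same strategies:

```python
# Mean/median imputation versus regression imputation on a toy dataset.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
df = pd.DataFrame({"age": rng.normal(60, 10, 200),
                   "sbp": rng.normal(130, 15, 200)})
df["sbp"] += 0.5 * (df["age"] - 60)                 # correlated variables
df.loc[rng.random(200) < 0.2, "sbp"] = np.nan       # 20% missing

mean_imp = df["sbp"].fillna(df["sbp"].mean())       # shrinks the variance
median_imp = df["sbp"].fillna(df["sbp"].median())

# Regression imputation: predict sbp from age for the missing rows,
# preserving the sbp-age relationship (but still understating noise).
obs = df["sbp"].notna()
model = LinearRegression().fit(df.loc[obs, ["age"]], df.loc[obs, "sbp"])
reg_imp = df["sbp"].copy()
reg_imp[~obs] = model.predict(df.loc[~obs, ["age"]])

for name, s in [("mean", mean_imp), ("median", median_imp),
                ("regression", reg_imp)]:
    print(f"{name:>10}: mean = {s.mean():6.2f}, sd = {s.std():5.2f}")
```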

  18. Missing data imputation: focusing on single imputation

    PubMed Central

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and useful information is omitted from the analysis. Therefore, many imputation methods have been developed to fill this gap. The present article focuses on single imputation. Imputation with the mean, median, or mode is simple but, like complete case analysis, can bias the estimated mean and standard deviation. Furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between the variable with missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations. PMID:26855945

  19. Subjective Prior Distributions for Modeling Longitudinal Continuous Outcomes with Non-Ignorable Dropout

    PubMed Central

    Paddock, Susan M.; Ebener, Patricia

    2010-01-01

    Substance abuse treatment research is complicated by the pervasive problem of non-ignorable missing data – i.e., the occurrence of the missing data is related to the unobserved outcomes. Missing data frequently arise due to early client departure from treatment. Pattern-mixture models (PMMs) are often employed in such situations to jointly model the outcome and the missing data mechanism. PMMs require non-testable assumptions to identify model parameters. Several approaches to parameter identification have therefore been explored for longitudinal modeling of continuous outcomes, and informative priors have been developed in other contexts. In this paper, we describe an expert interview conducted with five substance abuse treatment clinical experts who have familiarity with the Therapeutic Community modality of substance abuse treatment and with treatment process scores collected using the Dimensions of Change Instrument. The goal of the interviews was to obtain expert opinion about the rate of change in continuous client-level treatment process scores for clients who leave before completing two assessments and whose rate of change (slope) in treatment process scores is unidentified by the data. We find that the experts’ opinions differed dramatically from widely-utilized assumptions used to identify parameters in the PMM. Further, subjective prior assessment allows one to properly address the uncertainty inherent in the subjective decisions required to identify parameters in the PMM and to measure their effect on conclusions drawn from the analysis. PMID:19012279
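
    Numerically, the idea reduces to placing a prior on the unidentified pattern slope and propagating it. In the toy calculation below, the prior's location and spread are placeholders, not the elicited values from the interviews:

```python
# Pattern-mixture mean slope with a subjective prior on the
# unidentified dropout-pattern slope, propagated by Monte Carlo.
import numpy as np

rng = np.random.default_rng(9)

p_drop = 0.35                  # share of clients leaving before 2 assessments
slope_completers = 0.40        # identified from the observed data
se_completers = 0.05

draws = 5000
slope_drop = rng.normal(0.05, 0.20, draws)   # expert-informed prior (placeholder)
slope_comp = rng.normal(slope_completers, se_completers, draws)
overall = (1 - p_drop) * slope_comp + p_drop * slope_drop

lo, hi = np.percentile(overall, [2.5, 97.5])
print(f"overall slope: {overall.mean():+.3f}  (95% interval {lo:+.3f} to {hi:+.3f})")
print(f"MAR-style identification (dropouts = completers): {slope_completers:+.3f}")
```

    The width of the interval makes the cost of the unverifiable identification assumption explicit, which is the point of the subjective-prior approach.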

  20. Exact Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data.

    PubMed

    Lin, Yan; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett; Lipshultz, Steven

    2017-01-01

    Altham (Altham PME. Exact Bayesian analysis of a 2 × 2 contingency table, and Fisher's "exact" significance test. J R Stat Soc B 1969; 31: 261-269) showed that a one-sided p-value from Fisher's exact test of independence in a 2 × 2 contingency table is equal to the posterior probability of negative association in the 2 × 2 contingency table under a Bayesian analysis using an improper prior. We derive an extension of Fisher's exact test p-value in the presence of missing data, assuming the missing data mechanism is ignorable (i.e., missing at random or completely at random). Further, we propose Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data using alternative priors; we also present results from a simulation study exploring the Type I error rate and power of the proposed exact test p-values. An example, using data on the association between blood pressure and a cardiac enzyme, is presented to illustrate the methods.
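
    Altham's identity for the complete-data case is easy to verify numerically (the paper's missing-data extension is not reproduced here): with improper priors Beta(0, 1) for p1 and Beta(1, 0) for p2, the posterior probability of negative association matches the one-sided Fisher p-value up to Monte Carlo error.

```python
import numpy as np
from scipy.stats import fisher_exact

a, b, c, d = 12, 5, 7, 14                 # rows: groups; columns: success/failure
_, p_fisher = fisher_exact([[a, b], [c, d]], alternative="greater")

rng = np.random.default_rng(10)
draws = 2_000_000
p1 = rng.beta(a, b + 1, draws)            # posterior given Beta(0, 1) prior
p2 = rng.beta(c + 1, d, draws)            # posterior given Beta(1, 0) prior
print(f"one-sided Fisher exact p-value: {p_fisher:.4f}")
print(f"posterior P(p1 <= p2 | data):   {(p1 <= p2).mean():.4f}")
```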

  21. Longitudinal data analysis with non-ignorable missing data.

    PubMed

    Tseng, Chi-hong; Elashoff, Robert; Li, Ning; Li, Gang

    2016-02-01

    A common problem in longitudinal data analysis is missing data. Two types of missing patterns are generally considered in the statistical literature: monotone and non-monotone missing data. Non-monotone missing data occur when study participants intermittently miss scheduled visits, while monotone missing data can result from discontinued participation, loss to follow-up, and mortality. Although many novel statistical approaches have been developed to handle missing data in recent years, few methods are available to provide inferences that handle both types of missing data simultaneously. In this article, a latent random effects model is proposed to analyze longitudinal outcomes with both monotone and non-monotone missingness in the context of missing not at random. Another significant contribution of this article is a new computational algorithm for latent random effects models: to reduce the computational burden of the high-dimensional integration problem, we develop an algorithm that uses an adaptive quadrature approach in conjunction with a Taylor series approximation of the likelihood function to simplify the E-step computation in the expectation-maximization algorithm. A simulation study is performed, and data from the scleroderma lung study are used to demonstrate the effectiveness of this method. © The Author(s) 2012.
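
    The computational core being accelerated is the integral over the latent random effects. A plain (non-adaptive) Gauss-Hermite version of that building block, for a logistic random-intercept model, looks like this; the paper's adaptive-quadrature-plus-Taylor refinement is not reproduced:

```python
# Marginal log-likelihood of a logistic random-intercept model,
# integrating u_i ~ N(0, sigma_u^2) out by Gauss-Hermite quadrature.
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit

nodes, weights = hermgauss(30)                 # integrates against e^{-x^2}

def marginal_loglik(y, beta0, sigma_u):
    u = np.sqrt(2.0) * sigma_u * nodes         # change of variables
    ll = 0.0
    for yi in y:                               # loop over subjects
        p = expit(beta0 + u)                   # (Q,) success prob per node
        like = np.prod(np.where(yi[:, None] == 1, p, 1 - p), axis=0)
        ll += np.log(np.sum(weights * like) / np.sqrt(np.pi))
    return ll

rng = np.random.default_rng(11)
n, T = 200, 5
u_true = rng.normal(0, 1.2, n)
y = (rng.random((n, T)) < expit(-0.4 + u_true)[:, None]).astype(int)

for s in [0.6, 1.2, 2.4]:                      # likelihood peaks near the truth
    print(f"sigma_u = {s:.1f}: log-likelihood = {marginal_loglik(y, -0.4, s):.1f}")
```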

  22. A novel approach for incremental uncertainty rule generation from databases with missing values handling: application to dynamic medical databases.

    PubMed

    Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos

    2005-09-01

    Current approaches for mining association rules usually assume that the mining is performed on a static database, where the problem of missing attribute values does not practically exist. However, these assumptions do not hold in some medical databases, such as those underlying home care systems. In this paper, a novel uncertainty rule algorithm, URG-2 (Uncertainty Rule Generator), is illustrated, which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass over the initial dataset in order to generate the itemsets, and new metrics corresponding to the notions of support and confidence are used. URG-2 was evaluated over two medical databases, randomly introducing multiple missing values into each record's attributes (rates of 5-20% in 5% increments) in the initial dataset. Compared with the classical approach (ignoring records with missing values), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.
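
    A generic illustration of fractional ("uncertainty") support counting, not URG-2's actual metrics: a record missing an attribute contributes to an itemset's support in proportion to the attribute's observed value frequency, instead of being discarded outright.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "fever": ["y", "y", None, "y", "n", None, "y", "n"],
    "cough": ["y", None, "y", "y", "n", "y", None, "n"],
})

def fractional_support(df, itemset):
    """itemset: dict column -> value. Missing cells count P(col == value)."""
    w = np.ones(len(df))
    for col, val in itemset.items():
        p_val = (df[col] == val).sum() / df[col].notna().sum()
        w *= np.where(df[col].isna(), p_val, (df[col] == val).astype(float))
    return w.sum() / len(df)

strict = ((df["fever"] == "y") & (df["cough"] == "y")).mean()  # drops missing
print(f"support ignoring missing cells: {strict:.3f}")
print(f"fractional support:             "
      f"{fractional_support(df, {'fever': 'y', 'cough': 'y'}):.3f}")
```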

  23. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    PubMed

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead, the measurement of a variable such as blood glucose may depend on its prior values as well as those of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships together with multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time-lagged correlations both within and across variables by combining two imputation methods, based on an extension of k-NN and on the Fourier transform. This enables imputation of missing values even when all data at a time point are missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring), the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values were a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.
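
    The Fourier half of the idea can be sketched as follows (FLk-NN additionally computes a lagged k-NN estimate and fuses the two; the signal, gap pattern, and frequency budget below are illustrative assumptions):

```python
# Iterative Fourier imputation: start from a linear interpolation, then
# repeatedly refit the gap values with a band-limited FFT reconstruction.
import numpy as np

def fourier_impute(x, n_freq=60, iters=25):
    x = x.astype(float).copy()
    gaps = np.isnan(x)
    idx = np.arange(x.size)
    x[gaps] = np.interp(idx[gaps], idx[~gaps], x[~gaps])  # initial fill
    for _ in range(iters):
        spec = np.fft.rfft(x)
        spec[n_freq:] = 0                  # keep only the lowest frequencies
        smooth = np.fft.irfft(spec, n=x.size)
        x[gaps] = smooth[gaps]             # update the gaps only
    return x

rng = np.random.default_rng(12)
t = np.arange(500)
truth = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 11)
x = truth + rng.normal(0, 0.1, t.size)
x[120:160] = np.nan                        # a long consecutive gap
x[rng.random(t.size) < 0.1] = np.nan       # plus scattered misses

imp = fourier_impute(x)
miss = np.isnan(x)
rmse = np.sqrt(np.mean((imp[miss] - truth[miss]) ** 2))
print(f"RMSE on imputed points: {rmse:.3f}")
```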

  24. Predicting missing values in a home care database using an adaptive uncertainty rule method.

    PubMed

    Konias, S; Gogou, G; Bamidis, P D; Vlahavas, I; Maglaveras, N

    2005-01-01

    Contemporary literature offers an abundance of adaptive algorithms for mining association rules. However, most of this literature cannot deal with the peculiarities, such as missing values and dynamic data creation, that are frequently encountered in fields like medicine. This paper proposes an uncertainty rule method that uses an adaptive threshold for filling missing values in newly added records. The new approach for mining uncertainty rules and filling missing values is particularly suitable for dynamic databases, like the ones used in home care systems. In this study, a new data mining method named FiMV (Filling Missing Values) is illustrated, based on the mined uncertainty rules. Uncertainty rules have a structure quite similar to association rules and are extracted by an algorithm proposed in previous work, namely AURG (Adaptive Uncertainty Rule Generation). The main target was to implement an appropriate method for recovering missing values in a dynamic database, where new records are continuously added, without needing to specify any kind of thresholds beforehand. The method was applied to a home care monitoring system database. Multiple missing values were randomly introduced into each record's attributes (rates of 5-20% in 5% increments) in the initial dataset. FiMV demonstrated 100% completion rates with over 90% success in each case, while the usual approaches, in which all records with missing values are ignored or thresholds are required, experienced significantly reduced completion and success rates. It is concluded that the proposed method is appropriate for the data-cleaning step of the knowledge discovery process in databases. This step, which is of considerable significance for the efficiency of any data mining technique, can improve the quality of the mined information.

  25. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer.

    PubMed

    Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes are set missing at random under two types of missingness mechanisms, ignorable and non-ignorable, with various missing rates. Next, 10 well-known imputation methods were applied to the resulting incomplete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers, to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of the different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlap between the significant genes detected in the original data and those detected in the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways using ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in performance among the various imputation methods tested. The source code and selected datasets are available at http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.

  26. The Missing Dimension of Modern Education: Values Education

    ERIC Educational Resources Information Center

    Kenan, Seyfi

    2009-01-01

    Modern education today, some argue, easily integrates and adjusts to new technological developments through flexible curricula in the areas where these developments are taking place such as in the field of information technology or in the widespread use of the Internet. However, modern education can be criticized for ignoring or failing to lead…

  27. LinkImputeR: user-guided genotype calling and imputation for non-model organisms.

    PubMed

    Money, Daniel; Migicovsky, Zoë; Gardner, Kyle; Myles, Sean

    2017-07-10

    Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms. Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis. By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from NGS technologies. It enables the user to quickly and easily examine the effects of varying thresholds and filters on the number and quality of the resulting genotype calls. In this manner, users can decide on thresholds that are most suitable for their purposes. We show that LinkImputeR can significantly augment the value and utility of NGS data sets, especially in non-model organisms with poor genomic resources.

  28. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  29. Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity.

    PubMed

    Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki

    2018-04-01

    Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device non-wear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine the missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance-weighting algorithms to account for missing data effects when fitting regression models. The bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve the analysis of accelerometer data.

  30. Improving Long-term Quality and Continuity of Landsat-7 Data Through Inpainting of Lost Data Based on the Nonconvex Model of Dynamic Dictionary Learning

    NASA Astrophysics Data System (ADS)

    Miao, J.; Zhou, Z.; Zhou, X.; Huang, T.

    2017-12-01

    On May 31, 2003, the scan line corrector (SLC) of the Enhanced Thematic Mapper Plus (ETM+) on board the Landsat-7 satellite failed, resulting in strips of lost data in Landsat-7 images, which has seriously affected the quality and continuity of applications of the ETM+ data for space and Earth science. This paper proposes a new inpainting method for repairing Landsat-7 ETM+ images that takes into account the physical characteristics and geometric features of the ground area for which data are missing. Firstly, the two geometric slopes of the boundaries of each missing stripe in the georeferenced ETM+ image are calculated by the Hough transform, ignoring the slopes of the parts of the missing stripes that lie on the edges of the whole image. Secondly, an adaptive dictionary was developed and trained using a large number of Landsat-7 ETM+ SLC-on images; when this adaptive dictionary is used to restore an image with missing data, the dictionary is effectively dynamic. The missing-data strips were then repaired along their slope directions by using the logdet(.) low-rank non-convex model together with the dynamic dictionary. Imperfect points are defined as pixels whose values differ markedly from the surrounding pixel values; they can be real values but are most likely noise. Lastly, the imperfect points remaining after the second step were replaced using the method of sparse restoration of overlapping groups. We take the Landsat ETM+ image of June 10, 2002 as the test image for our algorithm evaluation. There are no missing data in this image. We therefore extract the missing stripes of images with the same WRS path and row as the 2002 image but acquired after 2003 to form a missing-stripe model, and overlay this model on the 2002 image to obtain a simulated missing image. Fig. 1(a)-(c) show the simulated missing images of Bands 1, 3, and 5 of the 2002 ETM+ image data. We apply the algorithm to restore the missing stripes. Fig. 1(d)-(f) show the restored images of Bands 1, 3, and 5, corresponding to images (a)-(c). The repaired images were then compared with the original images band by band, and the algorithm was found to work very well. We will show the application of the algorithm to other images and the details of the comparison.

  31. Bayesian analysis of longitudinal dyadic data with informative missing data using a dyadic shared-parameter model.

    PubMed

    Ahn, Jaeil; Morita, Satoshi; Wang, Wenyi; Yuan, Ying

    2017-01-01

    Analyzing longitudinal dyadic data is a challenging task due to the complicated correlations from repeated measurements and within-dyad interdependence, as well as potentially informative (or non-ignorable) missing data. We propose a dyadic shared-parameter model to analyze longitudinal dyadic data with ordinal outcomes and informative intermittent missing data and dropouts. We model the longitudinal measurement process using a proportional odds model, which accommodates the within-dyad interdependence using the concept of the actor-partner interdependence effects, as well as dyad-specific random effects. We model informative dropouts and intermittent missing data using a transition model, which shares the same set of random effects as the longitudinal measurement model. We evaluate the performance of the proposed method through extensive simulation studies. As our approach relies on some untestable assumptions on the missing data mechanism, we perform sensitivity analyses to evaluate how the analysis results change when the missing data mechanism is misspecified. We demonstrate our method using a longitudinal dyadic study of metastatic breast cancer.

  32. Predicting the Effects of Longitudinal Variables on Cost and Schedule Performance

    DTIC Science & Technology

    2007-03-01

    budget so that as cost growth occurs, it can be absorbed (Moore, 2003:2). This number padding is very tempting since it relieves the program...presence of a value, zero was entered for the missing variables because without any value assigned, the analysis software would ignore all data for the...program in question, reducing the already small dataset. Second, if we considered the variable in isolation, we removed the zero and left the field

  33. Reverse engineering gene regulatory networks from measurement with missing values.

    PubMed

    Ogundijo, Oyetunji E; Elmas, Abdulkadir; Wang, Xiaodong

    2016-12-01

    Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: either the expression values of some genes at some time points, or the entire expression values of a single time point or of sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take as input the complete matrix of gene expression measurements. For instance, previous work has shown that gene regulatory interactions can be estimated from a complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for the inference of gene regulatory networks from gene expression data with missing values. We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear nature of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule, for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as part of the DREAM project, and a real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of the yeast Saccharomyces cerevisiae. PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we propose a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces better inference of the model parameters and hence more accurate prediction of the underlying GRN than conventional Gaussian approximation (GA) filters that ignore the missing data points.

  34. Identifying Heat Waves in Florida: Considerations of Missing Weather Data

    PubMed Central

    Leary, Emily; Young, Linda J.; DuClos, Chris; Jordan, Melissa M.

    2015-01-01

    Background: Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information. Objectives: To identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida. Methods: In addition to ignoring missing data, temporal, spatial, and spatio-temporal models are described and utilized to impute missing historical weather data from 1973 to 2012 from 43 Florida weather monitors. Calculated thresholds are used to define periods of extreme heat across Florida. Results: Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September). Conclusions: Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised. PMID:26619198

  35. Identifying Heat Waves in Florida: Considerations of Missing Weather Data.

    PubMed

    Leary, Emily; Young, Linda J; DuClos, Chris; Jordan, Melissa M

    2015-01-01

    Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information. To identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida. In addition to ignoring missing data, temporal, spatial, and spatio-temporal models are described and utilized to impute missing historical weather data from 1973 to 2012 from 43 Florida weather monitors. Calculated thresholds are used to define periods of extreme heat across Florida. Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September). Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised.

  36. Approach to addressing missing data for electronic medical records and pharmacy claims data research.

    PubMed

    Bounthavong, Mark; Watanabe, Jonathan H; Sullivan, Kevin M

    2015-04-01

    The complete capture of all values for each variable of interest in pharmacy research studies remains aspirational. The absence of these possibly influential values is a common problem for pharmacist investigators. Failure to account for missing data may translate into biased study findings and conclusions. Our goal in this analysis was to apply validated statistical methods for missing data to a previously analyzed data set and to compare the results when missing data methods were implemented versus standard analytics that ignore missing data effects. Using data from a retrospective cohort study, the statistical method of multiple imputation was used to provide regression-based estimates of the missing values, increasing the data available for measuring study outcomes. These findings were then contrasted with a complete-case analysis that restricted estimation to the subjects in the cohort with no missing values. Odds ratios were compared to assess differences in the findings of the analyses, and a nonadjusted regression analysis ("crude analysis") was also performed as a reference for potential bias. The setting was a Veterans Integrated Service Network that includes VA facilities in the Southern California and Nevada regions; the patients were new statin users between November 30, 2006, and December 2, 2007, with a diagnosis of dyslipidemia. We compared the odds ratios (ORs) and 95% confidence intervals (CIs) from the crude, complete-case, and multiple imputation analyses for the end points of a 25% or greater reduction in atherogenic lipids. Data were missing for 21.5% of identified patients (1665 of 7739 subjects). Regression model results were similar for the crude, complete-case, and multiple imputation analyses, with overlap of the 95% confidence limits at each end point. The crude, complete-case, and multiple imputation ORs (95% CIs) for a 25% or greater reduction in low-density lipoprotein cholesterol were 3.5 (95% CI 3.1-3.9), 4.3 (95% CI 3.8-4.9), and 4.1 (95% CI 3.7-4.6), respectively. The corresponding ORs for a 25% or greater reduction in non-high-density lipoprotein cholesterol were 3.5 (95% CI 3.1-3.9), 4.5 (95% CI 4.0-5.2), and 4.4 (95% CI 3.9-4.9), and those for a 25% or greater reduction in triglycerides were 3.1 (95% CI 2.8-3.6), 4.0 (95% CI 3.5-4.6), and 4.1 (95% CI 3.6-4.6), respectively. The use of the multiple imputation method to account for missing data did not alter the conclusions based on the complete-case analysis. Given the frequency of missing data in research using electronic health records and pharmacy claims data, multiple imputation may play an important role in the validation of study findings. © 2015 Pharmacotherapy Publications, Inc.
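
    The contrast between complete-case and multiply imputed odds ratios can be sketched with simulated data (scikit-learn's IterativeImputer stands in for the study's multiple-imputation procedure; variable names and effect sizes are invented, and only the point estimates are pooled, omitting Rubin's variance combination). In this MAR-given-covariate setting the two analyses agree, echoing the study's finding:

```python
import numpy as np
from scipy.special import expit
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)
n = 3000
age = rng.normal(60, 8, n)
adherent = (rng.random(n) < 0.6).astype(float)
outcome = (rng.random(n) < expit(-1.0 + 1.4 * adherent
                                 + 0.02 * (age - 60))).astype(int)
adherent[(rng.random(n) < 0.2) & (age > 60)] = np.nan    # MAR given age
X = np.column_stack([age, adherent])

def log_or(X, y):
    # near-unregularized logistic fit; coefficient of the adherence column
    return LogisticRegression(C=1e6).fit(X, y).coef_[0][1]

cc = ~np.isnan(adherent)
print(f"complete-case OR: {np.exp(log_or(X[cc], outcome[cc])):.2f}")

betas = []
for m in range(20):                                      # 20 imputations
    imp = IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    betas.append(log_or(imp, outcome))
print(f"MI pooled OR:     {np.exp(np.mean(betas)):.2f}  (truth {np.exp(1.4):.2f})")
```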

  37. EEG and Eye Tracking Signatures of Target Encoding during Structured Visual Search

    PubMed Central

    Brouwer, Anne-Marie; Hogervorst, Maarten A.; Oudejans, Bob; Ries, Anthony J.; Touryan, Jonathan

    2017-01-01

    EEG and eye tracking variables are potential sources of information about the underlying processes of target detection and storage during visual search. Fixation duration, pupil size and event related potentials (ERPs) locked to the onset of fixation or saccade (saccade-related potentials, SRPs) have been reported to differ dependent on whether a target or a non-target is currently fixated. Here we focus on the question of whether these variables also differ between targets that are subsequently reported (hits) and targets that are not (misses). Observers were asked to scan 15 locations that were consecutively highlighted for 1 s in pseudo-random order. Highlighted locations displayed either a target or a non-target stimulus with two, three or four targets per trial. After scanning, participants indicated which locations had displayed a target. To induce memory encoding failures, participants concurrently performed an aurally presented math task (high load condition). In a low load condition, participants ignored the math task. As expected, more targets were missed in the high compared with the low load condition. For both conditions, eye tracking features distinguished better between hits and misses than between targets and non-targets (with larger pupil size and shorter fixations for missed compared with correctly encoded targets). In contrast, SRP features distinguished better between targets and non-targets than between hits and misses (with average SRPs showing larger P300 waveforms for targets than for non-targets). Single trial classification results were consistent with these averages. This work suggests complementary contributions of eye and EEG measures in potential applications to support search and detect tasks. SRPs may be useful to monitor what objects are relevant to an observer, and eye variables may indicate whether the observer should be reminded of them later. PMID:28559807

  38. Network sampling coverage II: The effect of non-random missing data on network measurement

    PubMed Central

    Smith, Jeffrey A.; Moody, James; Morgan, Jonathan

    2016-01-01

    Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publically available java application) to gauge the bias in their own data. PMID:27867254

  20. Sensitivity to imputation models and assumptions in receiver operating characteristic analysis with incomplete data

    PubMed Central

    Karakaya, Jale; Karabulut, Erdem; Yucel, Recai M.

    2015-01-01

    Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also become increasingly popular in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. In particular, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms. PMID:26379316

  1. Accounting for informatively missing data in logistic regression by means of reassessment sampling.

    PubMed

    Lin, Ji; Lyles, Robert H

    2015-05-20

    We explore the 'reassessment' design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non-ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
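
    The joint-likelihood idea can be sketched for a simplified case with no reassessment wave: a logistic outcome model combined with a missingness model that depends on the possibly unobserved outcome, maximized numerically. This is a hypothetical illustration, not the authors' program; without the second sampling wave the missingness parameters are only weakly identified through the parametric outcome model.

```python
# Minimal sketch of a joint likelihood for a binary outcome with
# non-ignorable missingness. The data-generating step and all names are
# illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = rng.binomial(1, expit(-0.5 + 1.0 * x))   # outcome model
r = rng.binomial(1, expit(1.0 - 1.5 * y))    # missingness depends on y itself

def neg_joint_loglik(theta):
    b0, b1, g0, g1 = theta
    p1 = expit(b0 + b1 * x)                  # P(Y=1 | x)
    obs = r == 1
    # observed cases: P(y|x) * P(R=1|y)
    ll_obs = (np.log(np.where(y[obs] == 1, p1[obs], 1 - p1[obs]))
              + np.log(expit(g0 + g1 * y[obs])))
    # missing cases: sum over y of P(y|x) * P(R=0|y)
    ll_mis = np.log(p1[~obs] * (1 - expit(g0 + g1))
                    + (1 - p1[~obs]) * (1 - expit(g0)))
    return -(ll_obs.sum() + ll_mis.sum())

fit = minimize(neg_joint_loglik, x0=np.zeros(4), method="BFGS")
print("estimates (b0, b1, g0, g1):", np.round(fit.x, 2))
```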

  2. Comparison of Missing Data Treatments in Producing Factor Scores.

    ERIC Educational Resources Information Center

    Witta, E. Lea

    Because ignoring the missing data in an evaluation may lead to results that are questionable, this study investigated the effects of use of four missing data handling techniques on a survey instrument. A questionnaire containing 35 5-point Likert-style questions was completed by 384 respondents. Of these, 166 (43%) questionnaires contained 1 or…

  3. Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

    ERIC Educational Resources Information Center

    Pohl, Steffi; Gräfe, Linda; Rose, Norman

    2014-01-01

    Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…

  4. Modeling Nonignorable Missing Data in Speeded Tests

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; Pimentel, Jonald L.

    2008-01-01

    In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and…

  5. Estimation of Item Response Theory Parameters in the Presence of Missing Data

    ERIC Educational Resources Information Center

    Finch, Holmes

    2008-01-01

    Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…

  6. Allowing for uncertainty due to missing continuous outcome data in pairwise and network meta-analysis.

    PubMed

    Mavridis, Dimitris; White, Ian R; Higgins, Julian P T; Cipriani, Andrea; Salanti, Georgia

    2015-02-28

    Missing outcome data are commonly encountered in randomized controlled trials and hence may need to be addressed in a meta-analysis of multiple trials. A common and simple approach to deal with missing data is to restrict analysis to individuals for whom the outcome was obtained (complete case analysis). However, estimated treatment effects from complete case analyses are potentially biased if informative missing data are ignored. We develop methods for estimating meta-analytic summary treatment effects for continuous outcomes in the presence of missing data for some of the individuals within the trials. We build on a method previously developed for binary outcomes, which quantifies the degree of departure from a missing at random assumption via the informative missingness odds ratio. Our new model quantifies the degree of departure from missing at random using either an informative missingness difference of means or an informative missingness ratio of means, both of which relate the mean value of the missing outcome data to that of the observed data. We propose estimating the treatment effects, adjusted for informative missingness, and their standard errors by a Taylor series approximation and by a Monte Carlo method. We apply the methodology to examples of both pairwise and network meta-analysis with multi-arm trials. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
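
    A minimal sketch of the informative-missingness-difference-of-means (IMDoM) adjustment under assumed, illustrative numbers: the unobserved participants are taken to differ from observed ones by a sensitivity parameter delta, and delta = 0 recovers the missing-at-random analysis.

```python
# IMDoM idea in miniature: shift the mean of the missing portion of an
# arm by a sensitivity parameter delta before pooling. Numbers are made
# up for illustration.
def imdom_adjusted_mean(mean_obs, prop_missing, delta):
    """Mean over all randomized patients if the missing ones differ from
    observed ones by delta on average (delta=0 corresponds to MAR)."""
    return (1 - prop_missing) * mean_obs + prop_missing * (mean_obs + delta)

for delta in (-2.0, 0.0, 2.0):
    adj = imdom_adjusted_mean(mean_obs=10.0, prop_missing=0.25, delta=delta)
    print(f"delta={delta:+.1f}: adjusted arm mean = {adj:.2f}")
```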

  7. Purposeful Variable Selection and Stratification to Impute Missing FAST Data in Trauma Research

    PubMed Central

    Fuchs, Paul A.; del Junco, Deborah J.; Fox, Erin E.; Holcomb, John B.; Rahbar, Mohammad H.; Wade, Charles A.; Alarcon, Louis H.; Brasel, Karen J.; Bulger, Eileen M.; Cohen, Mitchell J.; Myers, John G.; Muskat, Peter; Phelan, Herb A.; Schreiber, Martin A.; Cotton, Bryan A.

    2013-01-01

    Background The Focused Assessment with Sonography for Trauma (FAST) exam is an important variable in many retrospective trauma studies. The purpose of this study was to devise an imputation method to overcome missing data for the FAST exam. Due to variability in patients’ injuries and trauma care, these data are unlikely to be missing completely at random (MCAR), raising concern for validity when analyses exclude patients with missing values. Methods Imputation was conducted under a less restrictive, more plausible missing at random (MAR) assumption. Patients with missing FAST exams had available data on alternate, clinically relevant elements that were strongly associated with FAST results in complete cases, especially when considered jointly. Subjects with missing data (32.7%) were divided into eight mutually exclusive groups based on selected variables that both described the injury and were associated with missing FAST values. Additional variables were selected within each group to classify missing FAST values as positive or negative, and correct FAST exam classification based on these variables was determined for patients with non-missing FAST values. Results Severe head/neck injury (odds ratio, OR=2.04), severe extremity injury (OR=4.03), severe abdominal injury (OR=1.94), no injury (OR=1.94), other abdominal injury (OR=0.47), other head/neck injury (OR=0.57) and other extremity injury (OR=0.45) groups had significant ORs for missing data; the other group odds ratio was not significant (OR=0.84). All 407 missing FAST values were imputed, with 109 classified as positive. Correct classification of non-missing FAST results using the alternate variables was 87.2%. Conclusions Purposeful imputation for missing FAST exams based on interactions among selected variables assessed by simple stratification may be a useful adjunct to sensitivity analysis in the evaluation of imputation strategies under different missing data mechanisms. This approach has the potential for widespread application in clinical and translational research and validation is warranted. Level of Evidence Level II Prognostic or Epidemiological PMID:23778515

  8. Evaluation of Approaches to Deal with Low-Frequency Nuisance Covariates in Population Pharmacokinetic Analyses.

    PubMed

    Lagishetty, Chakradhar V; Duffull, Stephen B

    2015-11-01

    Clinical studies include occurrences of rare variables, such as genotypes, whose low frequency and effect strength make their effects difficult to estimate from a dataset. Variables that influence the estimated value of a model-based parameter are termed covariates. It is often difficult to determine whether such an effect is significant, since type I error can be inflated when the covariate is rare. Their presence may have either an insubstantial effect on the parameters of interest, and hence be ignorable, or conversely they may be influential and therefore non-ignorable. When these covariate effects cannot be estimated due to limited power yet are non-ignorable, they are considered nuisance effects, in that they must be accounted for but, because of type I error, are of limited interest. This study assesses methods of handling nuisance covariate effects. The specific objectives include (1) calibrating the frequency of a covariate that is associated with type I error inflation, (2) calibrating the strength that renders it non-ignorable and (3) evaluating methods for handling these non-ignorable covariates in a nonlinear mixed effects model setting. Type I error was determined for the Wald test. Methods considered for handling the nuisance covariate effects were case deletion, Box-Cox transformation and inclusion of a specific fixed effects parameter. Non-ignorable nuisance covariates were found to be effectively handled through addition of a fixed effect parameter.

  9. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the observation with missing values. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369
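
    A rough Python sketch of the clustering-based donor idea, loosely following the description above (clustering on randomly selected attributes, then copying real values from the nearest neighbour). This is a simplification, not the authors' algorithm, and all data and settings are illustrative.

```python
# Donor-based imputation via agglomerative clustering, repeated to give
# "multiple" imputations. Crude zero-fill for distance computation and
# the cluster count are simplifying assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
X[rng.random(X.shape) < 0.1] = np.nan            # introduce missingness

def impute_once(X, n_clusters=5):
    Xi = X.copy()
    cols = rng.choice(X.shape[1], size=3, replace=False)  # random attributes
    filled = np.nan_to_num(X[:, cols])           # crude fill, distances only
    labels = fcluster(linkage(filled, method="average"), n_clusters, "maxclust")
    for i, j in zip(*np.where(np.isnan(X))):
        mates = np.where((labels == labels[i]) & ~np.isnan(X[:, j]))[0]
        mates = mates[mates != i]
        if mates.size:                           # copy from nearest cluster-mate
            d = np.abs(filled[mates] - filled[i]).sum(axis=1)
            Xi[i, j] = X[mates[np.argmin(d)], j]
    return Xi                                    # unmatched cells stay NaN

imputations = [impute_once(X) for _ in range(5)]
```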

  11. Gaussian mixture clustering and imputation of microarray data.

    PubMed

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.

  12. Accommodating Missing Data in Mixture Models for Classification by Opinion-Changing Behavior.

    ERIC Educational Resources Information Center

    Hill, Jennifer L.

    2001-01-01

    Explored the assumptions implicit in models reflecting three different approaches to missing survey response data using opinion data collected from Swiss citizens at four time points over nearly 2 years. Results suggest that the latently ignorable model has the least restrictive structural assumptions. Discusses the idea of "durable…

  13. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    PubMed

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with other methods for both time series and non-time series data, for the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. The CMVE software is available upon request from the authors.

  14. Responsiveness-informed multiple imputation and inverse probability-weighting in cohort studies with missing data that are non-monotone or not missing at random.

    PubMed

    Doidge, James C

    2018-02-01

    Population-based cohort studies are invaluable to health research because of the breadth of data collection over time, and the representativeness of their samples. However, they are especially prone to missing data, which can compromise the validity of analyses when data are not missing at random. Having many waves of data collection presents opportunity for participants' responsiveness to be observed over time, which may be informative about missing data mechanisms and thus useful as an auxiliary variable. Modern approaches to handling missing data such as multiple imputation and maximum likelihood can be difficult to implement with the large numbers of auxiliary variables and large amounts of non-monotone missing data that occur in cohort studies. Inverse probability-weighting can be easier to implement but conventional wisdom has stated that it cannot be applied to non-monotone missing data. This paper describes two methods of applying inverse probability-weighting to non-monotone missing data, and explores the potential value of including measures of responsiveness in either inverse probability-weighting or multiple imputation. Simulation studies are used to compare methods and demonstrate that responsiveness in longitudinal studies can be used to mitigate bias induced by missing data, even when data are not missing at random.
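
    A minimal inverse probability-weighting sketch under an assumed MAR-given-responsiveness mechanism: model each unit's probability of being observed from a responsiveness-style auxiliary variable, then weight complete cases by the inverse of that probability. Names and the simulated mechanism are illustrative, not the paper's data.

```python
# IPW in miniature: the complete-case mean is biased because responsive
# participants have higher outcomes; reweighting corrects this.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
responsiveness = rng.uniform(size=n)   # e.g. share of prior waves completed
y = 2.0 + 3.0 * responsiveness + rng.normal(size=n)
p_obs = 1 / (1 + np.exp(-(-1.0 + 2.5 * responsiveness)))
observed = rng.random(n) < p_obs       # data are MAR given responsiveness

model = LogisticRegression().fit(responsiveness.reshape(-1, 1), observed)
w = 1 / model.predict_proba(responsiveness[observed].reshape(-1, 1))[:, 1]

print("complete-case mean:", y[observed].mean().round(3))
print("IPW mean:          ", np.average(y[observed], weights=w).round(3))
print("true mean:         ", y.mean().round(3))
```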

  15. Covariance Structure Model Fit Testing under Missing Data: An Application of the Supplemented EM Algorithm

    ERIC Educational Resources Information Center

    Cai, Li; Lee, Taehun

    2009-01-01

    We apply the Supplemented EM algorithm (Meng & Rubin, 1991) to address a chronic problem with the "two-stage" fitting of covariance structure models in the presence of ignorable missing data: the lack of an asymptotically chi-square distributed goodness-of-fit statistic. We show that the Supplemented EM algorithm provides a…

  16. Assessing Agreement between Multiple Raters with Missing Rating Information, Applied to Breast Cancer Tumour Grading

    PubMed Central

    Ellis, Ian O.; Green, Andrew R.; Hanka, Rudolf

    2008-01-01

    Background We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only ‘moderate’ agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24177 grades, on a discrete 1–3 scale, provided by 732 pathologists for 52 samples. Methodology/Principal Findings We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1–2 and 2–3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively ‘easy’ set of samples. Conclusions/Significance Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the ‘true’ grade of many of the breast cancer tumours, a fact often ignored in clinical studies. PMID:18698346

  17. A Comparison of Factor Score Estimation Methods in the Presence of Missing Data: Reliability and an Application to Nicotine Dependence

    ERIC Educational Resources Information Center

    Estabrook, Ryne; Neale, Michael

    2013-01-01

    Factor score estimation is a controversial topic in psychometrics, and the estimation of factor scores from exploratory factor models has historically received a great deal of attention. However, both confirmatory factor models and the existence of missing data have generally been ignored in this debate. This article presents a simulation study…

  18. A nonparametric multiple imputation approach for missing categorical data.

    PubMed

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
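
    A simplified sketch of the two-working-model idea for a three-level outcome: a predictive score combines predictions from a multinomial outcome model and a logistic missingness model, and each missing value is imputed by drawing a donor from its nearest neighbours on that score. The weighting constant w and all data below are assumptions for illustration.

```python
# Nearest-neighbour imputation driven by two working models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=(n, 2))
y = rng.integers(0, 3, size=n)                     # 3-level categorical outcome
miss = rng.random(n) < 1 / (1 + np.exp(-x[:, 0]))  # MAR missingness

outcome_model = LogisticRegression().fit(x[~miss], y[~miss])  # multinomial
miss_model = LogisticRegression().fit(x, miss)                # binary

# predictive scores: expected outcome level and missingness probability
s1 = outcome_model.predict_proba(x) @ np.arange(3)
s2 = miss_model.predict_proba(x)[:, 1]
w = 0.5                                # assumed balance between the two models
score = w * s1 / s1.std() + (1 - w) * s2 / s2.std()

def impute_one(i, k=5):
    donors = np.where(~miss)[0]
    nearest = donors[np.argsort(np.abs(score[donors] - score[i]))[:k]]
    return y[rng.choice(nearest)]      # randomly draw one of the k nearest

imputed = y.copy()
for i in np.where(miss)[0]:
    imputed[i] = impute_one(i)
```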

  19. Modelling variable dropout in randomised controlled trials with longitudinal outcomes: application to the MAGNETIC study.

    PubMed

    Kolamunnage-Dona, Ruwanthi; Powell, Colin; Williamson, Paula Ruth

    2016-04-28

    Clinical trials with longitudinally measured outcomes are often plagued by missing data due to patients withdrawing or dropping out from the trial before completing the measurement schedule. The reasons for dropout are sometimes clearly known and recorded during the trial, but in many instances these reasons are unknown or unclear. Often such reasons for dropout are non-ignorable. However, the standard methods for analysing longitudinal outcome data assume that missingness is non-informative and ignore the reasons for dropout, which could result in a biased comparison between the treatment groups. In this article, as a post hoc analysis, we explore the impact of informative dropout due to competing reasons on the evaluation of treatment effect in the MAGNETIC trial, the largest randomised placebo-controlled study to date comparing the addition of nebulised magnesium sulphate to standard treatment in acute severe asthma in children. We jointly model longitudinal outcome and informative dropout process to incorporate the information regarding the reasons for dropout by treatment group. The effect of nebulised magnesium sulphate compared with standard treatment is evaluated more accurately using a joint longitudinal-competing risk model by taking account of such complexities. The corresponding estimates indicate that the rate of dropout due to good prognosis is about twice as high in the magnesium group compared with standard treatment. We emphasise the importance of identifying reasons for dropout and undertaking an appropriate statistical analysis accounting for such dropout. The joint modelling approach accounting for competing reasons for dropout is proposed as a general approach for evaluating the sensitivity of conclusions to assumptions regarding missing data in clinical trials with longitudinal outcomes. EudraCT number 2007-006227-12. Registration date: 18 Mar 2008.

  20. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods

    PubMed Central

    Shara, Nawar; Yassin, Sayf A.; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V.; Wang, Wenyu; Lee, Elisa T.; Umans, Jason G.

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989–1991), 2 (1993–1995), and 3 (1998–1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results. PMID:26414328

  2. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
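
    A simplified, non-Bayesian version of the same pipeline (the paper uses Bayesian variable selection, where this sketch simply takes the most correlated genes): select predictor genes, fit a least-squares regression on the arrays where the target gene is observed, and predict the missing entry. All data are simulated for illustration.

```python
# Regression-based missing-value estimation with correlation-based gene
# selection standing in for the paper's Bayesian selection step.
import numpy as np

rng = np.random.default_rng(4)
expr = rng.normal(size=(200, 20))                # genes x arrays
expr[0, 5] = np.nan                              # a missing entry in gene 0

target, miss_col = 0, 5
obs_cols = [c for c in range(expr.shape[1]) if c != miss_col]
others = np.delete(np.arange(expr.shape[0]), target)

# select the 5 genes most correlated with the target across observed arrays
corr = [abs(np.corrcoef(expr[g, obs_cols], expr[target, obs_cols])[0, 1])
        for g in others]
chosen = others[np.argsort(corr)[-5:]]

# ordinary least squares (the paper discusses QR decomposition for speed)
A = np.column_stack([np.ones(len(obs_cols)), expr[chosen][:, obs_cols].T])
beta, *_ = np.linalg.lstsq(A, expr[target, obs_cols], rcond=None)
estimate = np.concatenate([[1.0], expr[chosen, miss_col]]) @ beta
print("imputed value:", round(float(estimate), 3))
```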

  3. NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES

    PubMed Central

    He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.

    2017-01-01

    Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225

  4. Improving record linkage performance in the presence of missing linkage data.

    PubMed

    Ong, Toan C; Mannino, Michael V; Schilling, Lisa M; Kahn, Michael G

    2014-12-01

    Existing record linkage methods do not handle missing linking field values in an efficient and effective manner. The objective of this study is to investigate three novel methods for improving the accuracy and efficiency of record linkage when record linkage fields have missing values. By extending the Fellegi-Sunter scoring implementations available in the open-source Fine-grained Record Linkage (FRIL) software system we developed three novel methods to solve the missing data problem in record linkage, which we refer to as: Weight Redistribution, Distance Imputation, and Linkage Expansion. Weight Redistribution removes fields with missing data from the set of quasi-identifiers and redistributes the weight from the missing attribute based on relative proportions across the remaining available linkage fields. Distance Imputation imputes the distance between the missing data fields rather than imputing the missing data value. Linkage Expansion adds previously considered non-linkage fields to the linkage field set to compensate for the missing information in a linkage field. We tested the linkage methods using simulated data sets with varying field value corruption rates. The methods developed had sensitivity ranging from .895 to .992 and positive predictive values (PPV) ranging from .865 to 1 in data sets with low corruption rates. Increased corruption rates lead to decreased sensitivity for all methods. These new record linkage algorithms show promise in terms of accuracy and efficiency and may be valuable for combining large data sets at the patient level to support biomedical and clinical research. Copyright © 2014 Elsevier Inc. All rights reserved.
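
    Of the three methods, Weight Redistribution is the easiest to sketch: drop fields with missing values from the comparison and rescale the remaining field weights so the total attainable score is preserved. Field names and weights below are illustrative assumptions, and real Fellegi-Sunter scoring uses agreement/disagreement log-likelihood ratios rather than this simplified agreement-only form.

```python
# Weight Redistribution in miniature: a missing linking field's weight is
# spread proportionally over the fields that are present in both records.
def match_score(rec_a, rec_b, weights):
    usable = {f: w for f, w in weights.items()
              if rec_a.get(f) is not None and rec_b.get(f) is not None}
    if not usable:
        return 0.0
    scale = sum(weights.values()) / sum(usable.values())  # redistribute weight
    return sum(w * scale for f, w in usable.items() if rec_a[f] == rec_b[f])

weights = {"last_name": 4.0, "birth_year": 3.0, "zip": 2.0}
a = {"last_name": "smith", "birth_year": 1970, "zip": None}   # missing zip
b = {"last_name": "smith", "birth_year": 1970, "zip": "80045"}
print(match_score(a, b, weights))   # 9.0: zip's weight spread over the rest
```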

  5. The treatment of missing data in a large cardiovascular clinical outcomes study.

    PubMed

    Little, Roderick J; Wang, Julia; Sun, Xiang; Tian, Hong; Suh, Eun-Young; Lee, Michael; Sarich, Troy; Oppenheimer, Leonard; Plotnikov, Alexei; Wittes, Janet; Cook-Bruns, Nancy; Burton, Paul; Gibson, C Michael; Mohanty, Surya

    2016-06-01

    The potential impact of missing data on the results of clinical trials has received heightened attention recently. A National Research Council study provides recommendations for limiting missing data in clinical trial design and conduct, and principles for analysis, including the need for sensitivity analyses to assess robustness of findings to alternative assumptions about the missing data. A Food and Drug Administration advisory committee raised missing data as a serious concern in their review of results from the ATLAS ACS 2 TIMI 51 study, a large clinical trial that assessed rivaroxaban for its ability to reduce the risk of cardiovascular death, myocardial infarction or stroke in patients with acute coronary syndrome. This case study describes a variety of measures that were taken to address concerns about the missing data. A range of analyses are described to assess the potential impact of missing data on conclusions. In particular, measures of the amount of missing data are discussed, and the fraction of missing information from multiple imputation is proposed as an alternative measure. The sensitivity analysis in the National Research Council study is modified in the context of survival analysis where some individuals are lost to follow-up. The impact of deviations from ignorable censoring is assessed by differentially increasing the hazard of the primary outcome in the treatment groups and multiply imputing events between dropout and the end of the study. Tipping-point analyses are described, where the deviation from ignorable censoring that results in a reversal of significance of the treatment effect is determined. A study to determine the vital status of participants lost to follow-up was also conducted, and the results of including this additional information are assessed. Sensitivity analyses suggest that findings of the ATLAS ACS 2 TIMI 51 study are robust to missing data; this robustness is reinforced by the follow-up study, since inclusion of data from this study had little impact on the study conclusions. Missing data are a serious problem in clinical trials. The methods presented here, namely, the sensitivity analyses, the follow-up study to determine survival of missing cases, and the proposed measurement of missing data via the fraction of missing information, have potential application in other studies involving survival analysis where missing data are a concern. © The Author(s) 2016.
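
    A toy tipping-point analysis in the same spirit, but for a continuous endpoint rather than the survival setting of the study: multiply-imputed values in the treated arm are penalized by an increasing delta until the treatment effect loses significance. The data, the delta grid, and the median-p summary (a crude stand-in for Rubin's combining rules) are all illustrative assumptions.

```python
# Tipping-point sketch: how unfavourable must the missing treated
# outcomes be before the conclusion reverses?
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
treat = rng.normal(1.0, 2.0, 120)
ctrl = rng.normal(0.0, 2.0, 120)
n_miss = 30                              # treated patients lost to follow-up

for delta in np.arange(0.0, 4.1, 0.5):
    pvals = []
    for _ in range(50):                  # crude multiple imputation
        imp = rng.choice(treat, n_miss) - delta  # worse than observed by delta
        pvals.append(stats.ttest_ind(np.r_[treat, imp], ctrl).pvalue)
    p = np.median(pvals)
    print(f"delta={delta:.1f}  median p={p:.3f}")
    if p > 0.05:
        print("tipping point reached: conclusions would reverse here")
        break
```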

  6. Improving cluster-based missing value estimation of DNA microarray data.

    PubMed

    Brás, Lígia P; Menezes, José C

    2007-06-01

    We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
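
    A minimal iterative KNN imputation sketch in the spirit of IKNNimpute: start from a simple column-mean fill, then repeatedly re-impute each missing entry from its K nearest rows, so that later passes reuse earlier estimates. This is a simplification of the published method; data and settings are illustrative.

```python
# Iterative KNN imputation: estimates are refined across passes because
# distances in later iterations use the previously imputed values.
import numpy as np

def iknn_impute(X, k=5, n_iter=3):
    X = X.astype(float)
    miss = np.isnan(X)
    Xi = np.where(miss, np.nanmean(X, axis=0), X)  # initial column-mean fill
    for _ in range(n_iter):
        for i, j in zip(*np.where(miss)):
            d = np.sqrt(((Xi - Xi[i]) ** 2).sum(axis=1))
            d[i] = np.inf                          # exclude the row itself
            neighbours = np.argsort(d)[:k]
            weights = 1 / (d[neighbours] + 1e-9)   # closer rows count more
            Xi[i, j] = np.average(Xi[neighbours, j], weights=weights)
    return Xi

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 8))
X[rng.random(X.shape) < 0.05] = np.nan
print(iknn_impute(X)[np.isnan(X)][:5])   # first few imputed values
```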

  7. MVIAeval: a web tool for comprehensively evaluating the performance of a new missing value imputation algorithm.

    PubMed

    Wu, Wei-Sheng; Jhou, Meng-Jhun

    2017-01-13

    Missing value imputation is important for microarray data analyses because microarray data with missing values would significantly degrade the performance of the downstream analyses. Although many microarray missing value imputation algorithms have been developed, an objective and comprehensive performance comparison framework is still lacking. To solve this problem, we previously proposed a framework which can perform a comprehensive performance comparison of different existing algorithms. Also the performance of a new algorithm can be evaluated by our performance comparison framework. However, constructing our framework is not an easy task for the interested researchers. To save researchers' time and efforts, here we present an easy-to-use web tool named MVIAeval (Missing Value Imputation Algorithm evaluator) which implements our performance comparison framework. MVIAeval provides a user-friendly interface allowing users to upload the R code of their new algorithm and select (i) the test datasets among 20 benchmark microarray (time series and non-time series) datasets, (ii) the compared algorithms among 12 existing algorithms, (iii) the performance indices among three available options, (iv) the comprehensive performance scores among two possible choices, and (v) the number of simulation runs. The comprehensive performance comparison results are then generated and shown as both figures and tables. MVIAeval is a useful tool for researchers to easily conduct a comprehensive and objective performance evaluation of their newly developed missing value imputation algorithm for microarray data or any data which can be represented as a matrix form (e.g. NGS data or proteomics data). Thus, MVIAeval will greatly expedite the progress in the research of missing value imputation algorithms.

  8. Do people treat missing information adaptively when making inferences?

    PubMed

    Garcia-Retamero, Rocio; Rieskamp, Jörg

    2009-10-01

    When making inferences, people are often confronted with situations with incomplete information. Previous research has led to a mixed picture about how people react to missing information. Options include ignoring missing information, treating it as either positive or negative, using the average of past observations for replacement, or using the most frequent observation of the available information as a placeholder. The accuracy of these inference mechanisms depends on characteristics of the environment. When missing information is uniformly distributed, it is most accurate to treat it as the average, whereas when it is negatively correlated with the criterion to be judged, treating missing information as if it were negative is most accurate. Whether people treat missing information adaptively according to the environment was tested in two studies. The results show that participants were sensitive to how missing information was distributed in an environment and most frequently selected the mechanism that was most adaptive. From these results the authors conclude that reacting to missing information in different ways is an adaptive response to environmental characteristics.

  9. Modeling missing data in knowledge space theory.

    PubMed

    de Chiusole, Debora; Stefanutti, Luca; Anselmi, Pasquale; Robusto, Egidio

    2015-12-01

    Missing data are a well known issue in statistical inference, because some responses may be missing, even when data are collected carefully. The problem that arises in these cases is how to deal with missing data. In this article, the missingness is analyzed in knowledge space theory, and in particular when the basic local independence model (BLIM) is applied to the data. Two extensions of the BLIM to missing data are proposed: The former, called ignorable missing BLIM (IMBLIM), assumes that missing data are missing completely at random; the latter, called missing BLIM (MissBLIM), introduces specific dependencies of the missing data on the knowledge states, thus assuming that the missing data are missing not at random. The IMBLIM and the MissBLIM modeled the missingness in a satisfactory way, in both a simulation study and an empirical application, depending on the process that generates the missingness: If the missing data-generating process is of type missing completely at random, then either IMBLIM or MissBLIM provide adequate fit to the data. However, if the pattern of missingness is functionally dependent upon unobservable features of the data (e.g., missing answers are more likely to be wrong), then only a correctly specified model of the missingness distribution provides an adequate fit to the data. (c) 2015 APA, all rights reserved.

  10. Aviation Trainer Technology Test Plan. Volume II. Software Development

    DTIC Science & Technology

    1991-11-25

  11. Power in randomized group comparisons: the value of adding a single intermediate time point to a traditional pretest-posttest design.

    PubMed

    Venter, Anre; Maxwell, Scott E; Bolig, Erika

    2002-06-01

    Adding a pretest as a covariate to a randomized posttest-only design increases statistical power, as does the addition of intermediate time points to a randomized pretest-posttest design. Although typically 5 waves of data are required in this instance to produce meaningful gains in power, a 3-wave intensive design allows the evaluation of the straight-line growth model and may reduce the effect of missing data. The authors identify the statistically most powerful method of data analysis in the 3-wave intensive design. If straight-line growth is assumed, the pretest-posttest slope must assume fairly extreme values for the intermediate time point to increase power beyond the standard analysis of covariance on the posttest with the pretest as covariate, ignoring the intermediate time point.
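
    The "standard analysis" referred to above is ANCOVA on the posttest with the pretest as covariate. A sketch with simulated two-wave data, assuming statsmodels; the effect size and sample size are illustrative:

```python
# ANCOVA on the posttest with the pretest as covariate.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
group = rng.integers(0, 2, n)                    # randomized arm
pre = rng.normal(50, 10, n)
post = pre + 2.0 * group + rng.normal(0, 5, n)   # small treatment effect

df = pd.DataFrame({"post": post, "pre": pre, "group": group})
fit = smf.ols("post ~ group + pre", data=df).fit()
print("effect:", fit.params["group"].round(2),
      " p-value:", fit.pvalues["group"].round(4))
```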

  12. Sensitivity Analysis of Multiple Informant Models When Data are Not Missing at Random

    PubMed Central

    Blozis, Shelley A.; Ge, Xiaojia; Xu, Shu; Natsuaki, Misaki N.; Shaw, Daniel S.; Neiderhiser, Jenae; Scaramella, Laura; Leve, Leslie; Reiss, David

    2014-01-01

    Missing data are common in studies that rely on multiple informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, and so all groups may be retained even if only one member of a group contributes data. Statistical inference is based on the assumption that data are missing completely at random or missing at random. Importantly, whether or not data are missing is assumed to be independent of the missing data. A saturated correlates model that incorporates correlates of the missingness or the missing data into an analysis and multiple imputation that may also use such correlates offer advantages over the standard implementation of SEM when data are not missing at random because these approaches may result in a data analysis problem for which the missingness is ignorable. This paper considers these approaches in an analysis of family data to assess the sensitivity of parameter estimates to assumptions about missing data, a strategy that may be easily implemented using SEM software. PMID:25221420

  13. Reading "The Fountainhead": The Missing Self in Ayn Rand's Ethical Individualism

    ERIC Educational Resources Information Center

    Fand, Roxanne J.

    2009-01-01

    Ayn Rand's novel "The Fountainhead" can be a useful text in an undergraduate English class, helping students think through issues of individualism. Rand's own concept of the self, however, ignores its social dimensions. (Contains 7 notes.)

  14. Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data

    PubMed Central

    Welch, Catherine A; Petersen, Irene; Bartlett, Jonathan W; White, Ian R; Marston, Louise; Morris, Richard W; Nazareth, Irwin; Walters, Kate; Carpenter, James

    2014-01-01

    Most implementations of multiple imputation (MI) of missing data are designed for simple rectangular data structures ignoring temporal ordering of data. Therefore, when applying MI to longitudinal data with intermittent patterns of missing data, some alternative strategies must be considered. One approach is to divide data into time blocks and implement MI independently at each block. An alternative approach is to include all time blocks in the same MI model. With increasing numbers of time blocks, this approach is likely to break down because of co-linearity and over-fitting. The new two-fold fully conditional specification (FCS) MI algorithm addresses these issues, by only conditioning on measurements, which are local in time. We describe and report the results of a novel simulation study to critically evaluate the two-fold FCS algorithm and its suitability for imputation of longitudinal electronic health records. After generating a full data set, approximately 70% of selected continuous and categorical variables were made missing completely at random in each of ten time blocks. Subsequently, we applied a simple time-to-event model. We compared efficiency of estimated coefficients from a complete records analysis, MI of data in the baseline time block and the two-fold FCS algorithm. The results show that the two-fold FCS algorithm maximises the use of data available, with the gain relative to baseline MI depending on the strength of correlations within and between variables. Using this approach also increases plausibility of the missing at random assumption by using repeated measures over time of variables whose baseline values may be missing. PMID:24782349

  15. iVAR: a program for imputing missing data in multivariate time series using vector autoregressive models.

    PubMed

    Liu, Siwei; Molenaar, Peter C M

    2014-12-01

    This article introduces iVAR, an R program for imputing missing data in multivariate time series on the basis of vector autoregressive (VAR) models. We conducted a simulation study to compare iVAR with three methods for handling missing data: listwise deletion, imputation with sample means and variances, and multiple imputation ignoring time dependency. The results showed that iVAR produces better estimates for the cross-lagged coefficients than do the other three methods. We demonstrate the use of iVAR with an empirical example of time series electrodermal activity data and discuss the advantages and limitations of the program.
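
    iVAR itself is an R program; a rough Python analogue of the underlying idea, assuming statsmodels, is to initialize the missing points, fit a VAR, replace the missing points with fitted values, and iterate. The lag order and simulated data below are illustrative.

```python
# VAR-based imputation sketch for a bivariate time series.
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(8)
T = 300
y = np.zeros((T, 2))
for t in range(1, T):                            # simple VAR(1) process
    y[t] = 0.6 * y[t - 1] + rng.normal(0, 1, 2)
miss = rng.random(y.shape) < 0.05
y_obs = np.where(miss, np.nan, y)

Yi = np.where(miss, np.nanmean(y_obs, axis=0), y_obs)  # initial mean fill
for _ in range(5):
    fitted = VAR(Yi).fit(1).fittedvalues         # fitted[t] predicts Yi[t+1]
    rows, cols = np.where(miss[1:])              # points at t=0 keep mean fill
    Yi[rows + 1, cols] = fitted[rows, cols]

print("RMSE at imputed points:",
      np.sqrt(((Yi - y)[miss] ** 2).mean()).round(3))
```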

  16. Artful Dodges Principals Use to Beat Bureaucracy.

    ERIC Educational Resources Information Center

    Ficklen, Ellen

    1982-01-01

    A study of Chicago (Illinois) principals revealed many ways principals practiced "creative insubordination"--avoiding following instructions but still getting things done. Among the dodges are deliberately missing deadlines, following orders literally, ignoring channels to procure teachers or materials, and using community members to…

  17. Analysis of cigarette purchase task instrument data with a left-censored mixed effects model.

    PubMed

    Liao, Wenjie; Luo, Xianghua; Le, Chap T; Chu, Haitao; Epstein, Leonard H; Yu, Jihnhee; Ahluwalia, Jasjit S; Thomas, Janet L

    2013-04-01

    The drug purchase task is a frequently used instrument for measuring the relative reinforcing efficacy (RRE) of a substance, a central concept in psychopharmacological research. Although a purchase task instrument, such as the cigarette purchase task (CPT), provides a comprehensive and inexpensive way to assess various aspects of a drug's RRE, the application of conventional statistical methods to data generated from such an instrument may not be adequate by simply ignoring or replacing the extra zeros or missing values in the data with arbitrary small consumption values, for example, 0.001. We applied the left-censored mixed effects model to CPT data from a smoking cessation study of college students and demonstrated its superiority over the existing methods with simulation studies. Theoretical implications of the findings, limitations of the proposed method, and future directions of research are also discussed.
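
    The alternative to replacing zeros or missing consumption values with an arbitrary small number is to model the censoring directly. Below is a minimal left-censored (Tobit-style) likelihood, without the random effects of the full mixed model and with simulated data; the mixed-effects extension would add subject-level terms.

```python
# One-sample left-censored likelihood: observed values contribute a
# normal density, values at the floor contribute the censored mass.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(9)
latent = rng.normal(1.0, 2.0, 400)       # latent demand
floor = 0.0
y = np.maximum(latent, floor)            # observed: left-censored at 0
cens = y <= floor

def neg_loglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    ll = norm.logpdf(y[~cens], mu, sigma).sum()              # observed part
    ll += cens.sum() * norm.logcdf((floor - mu) / sigma)     # censored mass
    return -ll

fit = minimize(neg_loglik, x0=[0.0, 0.0])
print("mu, sigma:", fit.x[0].round(2), np.exp(fit.x[1]).round(2))
```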

  19. Accounting for dropout bias using mixed-effects models.

    PubMed

    Mallinckrodt, C H; Clark, W S; David, S R

    2001-01-01

    Treatment effects are often evaluated by comparing change over time in outcome measures. However, valid analyses of longitudinal data can be problematic when subjects discontinue (dropout) prior to completing the study. This study assessed the merits of likelihood-based repeated measures analyses (MMRM) compared with fixed-effects analysis of variance where missing values were imputed using the last observation carried forward approach (LOCF) in accounting for dropout bias. Comparisons were made in simulated data and in data from a randomized clinical trial. Subject dropout was introduced in the simulated data to generate ignorable and nonignorable missingness. Estimates of treatment group differences in mean change from baseline to endpoint from MMRM were, on average, markedly closer to the true value than estimates from LOCF in every scenario simulated. Standard errors and confidence intervals from MMRM accurately reflected the uncertainty of the estimates, whereas standard errors and confidence intervals from LOCF underestimated uncertainty.
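
    The two analyses compared above, in miniature and with illustrative data: LOCF fills each subject's missing visits with the last observed value, while the likelihood-based analysis fits all available repeated measures with a mixed model (a random-intercept model standing in for the full MMRM covariance structure).

```python
# LOCF vs likelihood-based repeated measures, sketched on simulated
# longitudinal data with dropout.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
rows = []
for subj in range(60):
    arm = subj % 2
    dropout = rng.integers(2, 5)          # visit after which subject drops out
    for visit in range(4):
        y = 10 - (1.0 + 0.5 * arm) * visit + rng.normal(0, 1)
        rows.append((subj, arm, visit, y if visit < dropout else np.nan))
df = pd.DataFrame(rows, columns=["subj", "arm", "visit", "y"])

# LOCF: carry each subject's last observation forward, analyze endpoint
df["y_locf"] = df.groupby("subj")["y"].ffill()
print(df[df["visit"] == 3].groupby("arm")["y_locf"].mean())

# MMRM-style analysis: mixed model on all available data
fit = smf.mixedlm("y ~ visit * arm", data=df.dropna(subset=["y"]),
                  groups="subj").fit()
print("treatment-by-time effect:", fit.params["visit:arm"].round(3))
```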

  20. College drinking behaviors: mediational links between parenting styles, impulse control, and alcohol-related outcomes.

    PubMed

    Patock-Peckham, Julie A; Morgan-Lopez, Antonio A

    2006-06-01

    Mediational links between parenting styles (authoritative, authoritarian, permissive), impulsiveness (general control), drinking control (specific control), and alcohol use and abuse were tested. A pattern-mixture approach (for modeling non-ignorable missing data) with multiple-group structural equation models with 421 (206 female, 215 male) college students was used. Gender was examined as a potential moderator of parenting styles on control processes related to drinking. Specifically, the parent-child gender match was found to have implications for increased levels of impulsiveness (a significant mediator of parenting effects on drinking control). These findings suggest that a parent with a permissive parenting style who is the same gender as the respondent can directly influence control processes and indirectly influence alcohol use and abuse.

  1. Learning from Non-Reported Data: Interpreting Missing Body Mass Index Values in Young Children

    ERIC Educational Resources Information Center

    Arbour-Nicitopoulos, Kelly P.; Faulkner, Guy E.; Leatherdale, Scott T.

    2010-01-01

    The objective of this study was to examine the pattern of relations between missing weight and height (BMI) data and a range of demographic, physical activity, sedentary behavior, and academic measures in a young sample of elementary school children. A secondary analysis of a large cross-sectional study, PLAY-On, was conducted using self-reported…

  2. The effect of missing data on linkage disequilibrium mapping and haplotype association analysis in the GAW14 simulated datasets

    PubMed Central

    McCaskie, Pamela A; Carter, Kim W; McCaskie, Simon R; Palmer, Lyle J

    2005-01-01

    We used our newly developed linkage disequilibrium (LD) plotting software, JLIN, to plot linkage disequilibrium between pairs of single-nucleotide polymorphisms (SNPs) for three chromosomes of the Genetic Analysis Workshop 14 Aipotu simulated population to assess the effect of missing data on LD calculations. Our haplotype analysis program, SIMHAP, was used to assess the effect of missing data on haplotype-phenotype association. Genotype data was removed at random, at levels of 1%, 5%, and 10%, and the LD calculations and haplotype association results for these levels of missingness were compared to those for the complete dataset. It was concluded that ignoring individuals with missing data substantially affects the number of regions of LD detected which, in turn, could affect tagging SNPs chosen to generate haplotypes. PMID:16451612

  3. Association Between Breast Cancer Disease Progression and Workplace Productivity in the United States.

    PubMed

    Yin, Wesley; Horblyuk, Ruslan; Perkins, Julia Jane; Sison, Steve; Smith, Greg; Snider, Julia Thornton; Wu, Yanyu; Philipson, Tomas J

    2017-02-01

    Determine workplace productivity losses attributable to breast cancer progression. A longitudinal analysis linked 2005 to 2012 US medical and pharmacy claims with workplace absence data. Patients were commercially insured women aged 18 to 64 diagnosed with breast cancer. Productivity was measured as employment status and total quarterly workplace hours missed, and valued using average US wages. Six thousand four hundred and nine women were included. Breast cancer progression was associated with a lower probability of employment (hazard ratio [HR] = 0.65, P < 0.01) and increased workplace hours missed. The annual value of missed work was $24,166 for non-metastatic and $30,666 for metastatic patients. Thus, progression to metastatic disease is associated with an additional $6,500 in lost work time (P < 0.05), or 14% of average US wages. Breast cancer progression leads to diminished likelihood of employment, increased workplace hours missed, and increased cost burden.

  4. Analyzing communication skills of Pediatric Postgraduate Residents in Clinical Encounter by using video recordings

    PubMed Central

    Bari, Attia; Khan, Rehan Ahmed; Jabeen, Uzma; Rathore, Ahsan Waheed

    2017-01-01

    Objective: To analyze communication skills of pediatric postgraduate residents in clinical encounter by using video recordings. Methods: This qualitative exploratory research was conducted through video recording at The Children’s Hospital Lahore, Pakistan. Residents who had attended the mandatory communication skills workshop offered by CPSP were included. The video recording of the clinical encounter was done by a trained audiovisual person while the resident was interacting with the patient. Data were analyzed by thematic analysis. Results: Initially, 36 codes emerged on open coding, and through axial and selective coding these were condensed to 17 subthemes. Out of these, four main themes emerged: (1) Courteous and polite attitude, (2) Marginal nonverbal communication skills, (3) Power game/Ignoring child participation and (4) Patient as medical object/Instrumental behaviour. All residents treated the patient as a medical object to reach the right diagnosis and ignored them as a human being. Doctors played a dominant role, and the residents displayed marginal nonverbal communication skills in the form of a lack of social touch and of appropriate eye contact while documenting notes. A brief non-medical interaction for rapport building at the beginning of the encounter was missing, and there was a lack of child involvement. Conclusion: Paediatric postgraduate residents were polite while communicating with parents and children but lacked good nonverbal communication skills. The communication pattern in our study was mostly one-way, showing the doctors’ instrumental behaviour and ignoring child participation. PMID:29492050

  5. A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

    PubMed

    De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne

    2017-07-25

    Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat repeated measurements of the same time-dependent variable as just another 'distinct' variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks where the imputation model includes measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies. We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms (missing completely at random (MCAR), and weak and strong missing at random (MAR) scenarios) were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% of missing data when assessing the association between childhood obesity and sleep problems. The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one. We recommend the use of FCS or MVNI in a similar longitudinal setting and, when convergence issues arise due to a large number of time points or variables with missing values, the two-fold FCS with exploration of a suitable time window.
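
    The adjacent-wave restriction at the heart of two-fold FCS can be sketched with scikit-learn's FCS-style IterativeImputer, imputing each wave from a block of its neighbouring waves only (the column names, window width and single-imputation shortcut are all illustrative; proper MI would rerun with sample_posterior=True and pool the results):

        import numpy as np
        import pandas as pd
        from sklearn.experimental import enable_iterative_imputer  # activates the class
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(2)
        waves = [f"bmiz_t{t}" for t in range(5)]
        data = pd.DataFrame(rng.normal(size=(500, 5)).cumsum(axis=1), columns=waves)
        mask = pd.DataFrame(rng.random(data.shape) < 0.3, columns=waves)
        data_mis = data.mask(mask)               # 30% of wave values go missing

        imputed = data_mis.copy()
        for t in range(len(waves)):
            block = waves[max(0, t - 1): t + 2]  # wave t plus its adjacent waves
            imp = IterativeImputer(max_iter=10, random_state=0)
            block_imp = imp.fit_transform(data_mis[block])
            imputed[waves[t]] = block_imp[:, block.index(waves[t])]
        print(imputed.isna().sum().sum(), "missing values remain")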

  6. Manifold regularized matrix completion for multi-label learning with ADMM.

    PubMed

    Liu, Bin; Li, Yingming; Xu, Zenglin

    2018-05-01

    Multi-label learning is a common machine learning problem arising from numerous real-world applications in diverse fields, e.g., natural language processing, bioinformatics, information retrieval and so on. Among various multi-label learning methods, the matrix completion approach has been regarded as a promising approach to transductive multi-label learning. By constructing a joint matrix comprising the feature matrix and the label matrix, the missing labels of test samples are regarded as missing values of the joint matrix. With the low-rank assumption of the constructed joint matrix, the missing labels can be recovered by minimizing its rank. Despite its success, most matrix completion based approaches ignore the smoothness assumption of unlabeled data, i.e., neighboring instances should also share a similar set of labels. Thus they may underexploit the intrinsic structures of data. In addition, solving the matrix completion problem can be computationally inefficient. To this end, we propose to efficiently solve the multi-label learning problem as an enhanced matrix completion model with manifold regularization, where the graph Laplacian is used to ensure label smoothness over the data. To speed up the convergence of our model, we develop an efficient iterative algorithm, which solves the resulting nuclear norm minimization problem with the alternating direction method of multipliers (ADMM). Experiments on both synthetic and real-world data have shown the promising results of the proposed approach. Copyright © 2018 Elsevier Ltd. All rights reserved.
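
    The low-rank recovery step can be illustrated with a bare soft-impute iteration in Python: singular value soft-thresholding alternated with re-imposing the observed entries. This is a simplified relative of the nuclear norm minimization above; it omits the manifold regularization and the full ADMM splitting:

        import numpy as np

        rng = np.random.default_rng(3)
        U, V = rng.normal(size=(50, 3)), rng.normal(size=(3, 40))
        M = U @ V                                  # true low-rank "joint" matrix
        observed = rng.random(M.shape) > 0.4       # ~60% of entries observed

        def soft_impute(M, observed, lam=1.0, iters=100):
            Z = np.where(observed, M, 0.0)
            for _ in range(iters):
                u, s, vt = np.linalg.svd(Z, full_matrices=False)
                s = np.maximum(s - lam, 0.0)         # soft-threshold the spectrum
                low_rank = (u * s) @ vt
                Z = np.where(observed, M, low_rank)  # keep observed entries fixed
            return low_rank

        M_hat = soft_impute(M, observed)
        err = np.linalg.norm((M_hat - M)[~observed]) / np.linalg.norm(M[~observed])
        print(f"relative error on missing entries: {err:.3f}")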

  7. Maximum Likelihood Estimation of Nonlinear Structural Equation Models with Ignorable Missing Data

    ERIC Educational Resources Information Center

    Lee, Sik-Yum; Song, Xin-Yuan; Lee, John C. K.

    2003-01-01

    The existing maximum likelihood theory and its computer software in structural equation modeling are established on the basis of linear relationships among latent variables with fully observed data. However, in social and behavioral sciences, nonlinear relationships among the latent variables are important for establishing more meaningful models…

  8. Pulling Together: Civic Capacity and Urban School Reform

    ERIC Educational Resources Information Center

    Shipps, Dorothy

    2003-01-01

    Educators often ignore the political requirements of urban reform in their focus on the research and models that guide it. Conversely, political scientists frequently miss the differences among reforms in their focus on coalitions and resources. Integrating Clarence N. Stone's concept of "civic capacity" with an educator's view of reform…

  9. Diversity and Equity in Educational Administration: Missing in Theory and in Action.

    ERIC Educational Resources Information Center

    Gosetti, Penny Poplin; Rusch, Edith A.

    This paper argues that the texts, conversations, writings, and professional activities that construct our understanding of leadership come from an embedded, privileged perspective that has largely ignored issues of status, gender, and race. This perspective insidiously perpetuates a view of leadership that discourages diversity and equity. Two…

  10. The impact of loss to follow-up on hypothesis tests of the treatment effect for several statistical methods in substance abuse clinical trials.

    PubMed

    Hedden, Sarra L; Woolson, Robert F; Carter, Rickey E; Palesch, Yuko; Upadhyaya, Himanshu P; Malcolm, Robert J

    2009-07-01

    "Loss to follow-up" can be substantial in substance abuse clinical trials. When extensive losses to follow-up occur, one must cautiously analyze and interpret the findings of a research study. Aims of this project were to introduce the types of missing data mechanisms and describe several methods for analyzing data with loss to follow-up. Furthermore, a simulation study compared Type I error and power of several methods when missing data amount and mechanism varies. Methods compared were the following: Last observation carried forward (LOCF), multiple imputation (MI), modified stratified summary statistics (SSS), and mixed effects models. Results demonstrated nominal Type I error for all methods; power was high for all methods except LOCF. Mixed effect model, modified SSS, and MI are generally recommended for use; however, many methods require that the data are missing at random or missing completely at random (i.e., "ignorable"). If the missing data are presumed to be nonignorable, a sensitivity analysis is recommended.

  11. Ignoring Intermarker Linkage Disequilibrium Induces False-Positive Evidence of Linkage for Consanguineous Pedigrees when Genotype Data Is Missing for Any Pedigree Member

    PubMed Central

    Li, Bingshan; Leal, Suzanne M.

    2008-01-01

    Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Previously it was demonstrated by Huang et al. [1] that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or genotype data is available for two additional unaffected siblings when parental genotypes are missing. However, this is not the case for autosomal recessive consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence of linkage. False-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage, and which family members aid in its reduction, is highly dependent on which family members are genotyped. When parental genotype data is available, the false-positive evidence for linkage is usually not as strong as when parental genotype data is unavailable. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence of linkage can be obtained by including genotype data from additional affected siblings of the proband or genotype data from the proband's sibling-grandparents. For the situation, when parental genotypes are unavailable, false-positive evidence for linkage can be reduced by including genotype data from either unaffected siblings of the proband or the proband's married-in-grandparents in the analysis. PMID:18073490

  12. Handling missing Mini-Mental State Examination (MMSE) values: Results from a cross-sectional long-term-care study.

    PubMed

    Godin, Judith; Keefe, Janice; Andrew, Melissa K

    2017-04-01

    Missing values are commonly encountered on the Mini Mental State Examination (MMSE), particularly when administered to frail older people. This presents challenges for MMSE scoring in research settings. We sought to describe missingness in MMSEs administered in long-term-care facilities (LTCF) and to compare and contrast approaches to dealing with missing items. As part of the Care and Construction project in Nova Scotia, Canada, LTCF residents completed an MMSE. Different methods of dealing with missing values (e.g., use of raw scores, raw scores/number of items attempted, scale-level multiple imputation [MI], and blended approaches) are compared to item-level MI. The MMSE was administered to 320 residents living in 23 LTCF. The sample was predominately female (73%), and 38% of participants were aged >85 years. At least one item was missing from 122 (38.2%) of the MMSEs. Data were not Missing Completely at Random (MCAR), χ2(1110) = 1,351, p < 0.001. Using raw scores for those missing <6 items in combination with scale-level MI resulted in the regression coefficients and standard errors closest to item-level MI. Patterns of missing items often suggest systematic problems, such as trouble with manual dexterity, literacy, or visual impairment. While these observations may be relatively easy to take into account in clinical settings, non-random missingness presents challenges for research and must be considered in statistical analyses. We present suggestions for dealing with missing MMSE data based on the extent of missingness and the goal of analyses. Copyright © 2016 The Authors. Production and hosting by Elsevier B.V. All rights reserved.
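
    The blended rule favoured in the study (raw scores when fewer than 6 items are missing, scale-level MI otherwise) is easy to operationalize; the sketch below uses toy 0/1 item data and an illustrative 5% item-missingness rate:

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(4)
        items = pd.DataFrame(rng.integers(0, 2, size=(320, 30)).astype(float))
        items[pd.DataFrame(rng.random(items.shape) < 0.05)] = np.nan  # missing items

        n_missing = items.isna().sum(axis=1)
        raw = items.sum(axis=1)            # pandas skips NaN by default
        score = raw.where(n_missing < 6)   # keep the raw score if < 6 items missing
        needs_mi = score.isna()            # heavier missingness goes to scale-level MI
        print(f"{needs_mi.sum()} of {len(items)} MMSEs left for multiple imputation")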

  13. Missed rib fractures on evaluation of initial chest CT for trauma patients: pattern analysis and diagnostic value of coronal multiplanar reconstruction images with multidetector row CT.

    PubMed

    Cho, S H; Sung, Y M; Kim, M S

    2012-10-01

    The objective of this study was to review the prevalence and radiological features of rib fractures missed on initial chest CT evaluation, and to examine the diagnostic value of additional coronal images in a large series of trauma patients. 130 patients who presented to an emergency room for blunt chest trauma underwent multidetector row CT of the thorax within the first hour during their stay, and had follow-up CT or bone scans as diagnostic gold standards. Images were evaluated on two separate occasions: once with axial images and once with both axial and coronal images. The detection rates of missed rib fractures were compared between readings using a non-parametric method for clustered data. In the cases of missed rib fractures, the shapes, locations and associated fractures were evaluated. 58 rib fractures were missed with axial images only and 52 were missed with both axial and coronal images (p=0.088). The most common shape of missed rib fractures was buckled (56.9%), and the anterior arc (55.2%) was most commonly involved. 21 (36.2%) missed rib fractures had combined fractures on the same ribs, and 38 (65.5%) were accompanied by fracture on neighbouring ribs. Missed rib fractures are not uncommon, and radiologists should be familiar with buckle fractures, which are frequently missed. Additional coronal images can be helpful in the diagnosis of rib fractures that are not seen on axial images.

  14. How shared preferences in music create bonds between people: values as the missing link.

    PubMed

    Boer, Diana; Fischer, Ronald; Strack, Micha; Bond, Michael H; Lo, Eva; Lam, Jason

    2011-09-01

    How can shared music preferences create social bonds between people? A process model is developed in which music preferences as value-expressive attitudes create social bonds via conveyed value similarity. The musical bonding model links two research streams: (a) music preferences as indicators of similarity in value orientations and (b) similarity in value orientations leading to social attraction. Two laboratory experiments and one dyadic field study demonstrated that music can create interpersonal bonds between young people because music preferences can be cues for similar or dissimilar value orientations, with similarity in values then contributing to social attraction. One study tested and ruled out an alternative explanation (via personality similarity), illuminating the differential impact of perceived value similarity versus personality similarity on social attraction. Value similarity is the missing link in explaining the musical bonding phenomenon, which seems to hold for Western and non-Western samples and in experimental and natural settings.

  15. Understanding identifiability as a crucial step in uncertainty assessment

    NASA Astrophysics Data System (ADS)

    Jakeman, A. J.; Guillaume, J. H. A.; Hill, M. C.; Seo, L.

    2016-12-01

    The topic of identifiability analysis offers concepts and approaches to identify why unique model parameter values cannot be identified, and can suggest possible responses that either increase uniqueness or help to understand the effect of non-uniqueness on predictions. Identifiability analysis typically involves evaluation of the model equations and the parameter estimation process. Non-identifiability can have a number of undesirable effects. In terms of model parameters these effects include: parameters not being estimated uniquely even with ideal data; wildly different values being returned for different initialisations of a parameter optimisation algorithm; and parameters not being physically meaningful in a model attempting to represent a process. This presentation illustrates some of the drastic consequences of ignoring model identifiability analysis. It argues for a more cogent framework and use of identifiability analysis as a way of understanding model limitations and systematically learning about sources of uncertainty and their importance. The presentation specifically distinguishes between five sources of parameter non-uniqueness (and hence uncertainty) within the modelling process, pragmatically capturing key distinctions within existing identifiability literature. It enumerates many of the various approaches discussed in the literature. Admittedly, improving identifiability is often non-trivial. It requires thorough understanding of the cause of non-identifiability, and the time, knowledge and resources to collect or select new data, modify model structures or objective functions, or improve conditioning. But ignoring these problems is not a viable solution. Even simple approaches such as fixing parameter values or naively using a different model structure may have significant impacts on results which are too often overlooked because identifiability analysis is neglected.

  16. Private Practice: Exploring the Missing Social Dimension in "Reflective Practice"

    ERIC Educational Resources Information Center

    Kotzee, Ben

    2012-01-01

    In professional education today, Schon's concept of "reflective practice" underpins much thinking about learning at work. This approach--with its emphasis on the inner life of the professional and on her own interpretations of her learning experiences--is increasingly being challenged: often cited objections are that the model ignores factors like…

  17. A Unified Approach to Measurement Error and Missing Data: Overview and Applications

    ERIC Educational Resources Information Center

    Blackwell, Matthew; Honaker, James; King, Gary

    2017-01-01

    Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model…

  18. Retrospective Mining of Toxicology Data to Discover Multispecies Effects: Anemia as a Case Study (SOT)

    EPA Science Inventory

    In vivo toxicology data is subject to multiple sources of uncertainty: observer severity bias (a pathologist may record only more severe effects and ignore less severe ones); dose spacing issues (this can lead to missing data, e.g. if a severe effect has a less severe precursor, ...

  19. Taking the Missing Propensity Into Account When Estimating Competence Scores

    PubMed Central

    Pohl, Steffi; Carstensen, Claus H.

    2014-01-01

    When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically made when using these models: (1) The missing propensity is unidimensional and (2) the missing propensity and the ability are bivariate normally distributed. These assumptions may, however, be violated in real data sets and could, thus, pose a threat to the validity of this approach. The present study focuses on modeling competencies in various domains, using data from a school sample (N = 15,396) and an adult sample (N = 7,256) from the National Educational Panel Study. Our interest was to investigate whether violations of unidimensionality and the normal distribution assumption severely affect the performance of the model-based approach in terms of differences in ability estimates. We propose a model with a competence dimension, a unidimensional missing propensity and a distributional assumption more flexible than a multivariate normal. Using this model for ability estimation results in different ability estimates compared with a model ignoring missing responses. Implications for ability estimation in large-scale assessments are discussed. PMID:29795844

  20. Missed rib fractures on evaluation of initial chest CT for trauma patients: pattern analysis and diagnostic value of coronal multiplanar reconstruction images with multidetector row CT

    PubMed Central

    Cho, S H; Sung, Y M; Kim, M S

    2012-01-01

    Objective The objective of this study was to review the prevalence and radiological features of rib fractures missed on initial chest CT evaluation, and to examine the diagnostic value of additional coronal images in a large series of trauma patients. Methods 130 patients who presented to an emergency room for blunt chest trauma underwent multidetector row CT of the thorax within the first hour during their stay, and had follow-up CT or bone scans as diagnostic gold standards. Images were evaluated on two separate occasions: once with axial images and once with both axial and coronal images. The detection rates of missed rib fractures were compared between readings using a non-parametric method for clustered data. In the cases of missed rib fractures, the shapes, locations and associated fractures were evaluated. Results 58 rib fractures were missed with axial images only and 52 were missed with both axial and coronal images (p=0.088). The most common shape of missed rib fractures was buckled (56.9%), and the anterior arc (55.2%) was most commonly involved. 21 (36.2%) missed rib fractures had combined fractures on the same ribs, and 38 (65.5%) were accompanied by fracture on neighbouring ribs. Conclusion Missed rib fractures are not uncommon, and radiologists should be familiar with buckle fractures, which are frequently missed. Additional coronal images can be helpful in the diagnosis of rib fractures that are not seen on axial images. PMID:22514102

  1. Replacing missing values using trustworthy data values from web data sources

    NASA Astrophysics Data System (ADS)

    Izham Jaya, M.; Sidi, Fatimah; Mat Yusof, Sharmila; Suriani Affendey, Lilly; Ishak, Iskandar; Jabar, Marzanah A.

    2017-09-01

    In practice, collected data are usually incomplete and contain missing values. Existing approaches to managing missing values overlook the importance of trustworthy data values in replacing missing values. Given that trusted complete data are very important in data analysis, we propose a framework for missing value replacement using trustworthy data values from web data sources. The proposed framework adopts ontology to map data values from web data sources to the incomplete dataset. As data from the web may conflict with each other, we propose a trust score measurement based on data accuracy and data reliability. The trust score is then used to select trustworthy data values from web data sources for missing value replacement. We implemented the proposed framework using a financial dataset and present the findings in this paper. Our experiment shows that replacing missing values with trustworthy data values is important for solving the missing value problem, especially when sources conflict.
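
    Since the abstract does not spell out the scoring formula, the toy sketch below simply combines accuracy and reliability as a weighted sum and fills the missing cell from the highest-scoring source; the source names, values and weights are all made up for illustration:

        # Pick the replacement value from the web source with the best trust score.
        sources = {
            "source_a": {"value": 41.2, "accuracy": 0.90, "reliability": 0.70},
            "source_b": {"value": 39.8, "accuracy": 0.75, "reliability": 0.95},
            "source_c": {"value": 44.0, "accuracy": 0.60, "reliability": 0.60},
        }
        w_acc, w_rel = 0.6, 0.4          # illustrative weights
        trust = {name: w_acc * s["accuracy"] + w_rel * s["reliability"]
                 for name, s in sources.items()}
        best = max(trust, key=trust.get)
        print(f"fill missing cell with {sources[best]['value']} from {best} "
              f"(trust = {trust[best]:.2f})")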

  2. Improving the recognition of near-miss events on NASA missions

    NASA Astrophysics Data System (ADS)

    Dillon, R. L.; Rogers, E. W.; Madsen, P.; Tinsley, C. H.

    Organizations that ignore near-miss data may be inappropriately rewarding risky behavior. If managers engage in risky behavior and succeed, research shows that these managers are likely to be promoted without close scrutiny of their risky decisions, even if the success is because of good fortune. Over time such risk taking compounds as similar near-misses are repeatedly observed and the ability to recognize anomalies and document the events decreases (i.e., normalization of deviance). History from the shuttle program shows that only the occasional large failure increases attention to anomalies again. This research demonstrates the presence of normalization of deviance in NASA missions and also examines a factor (the significance of the project) that may increase people's awareness of near-misses to counter this trend. Increasing awareness of chance success should increase the likelihood that significant learning can occur from the mission regardless of outcome. We conclude with prescriptions for project managers based on several on-going activities at NASA Goddard Space Flight Center (GSFC) to improve organizational learning. We discuss how these efforts can contribute to reducing near-miss bias and the normalization of deviance. This research should help organizations design learning processes that draw lessons from near-misses.

  3. The Case of the Missing Organizations: Co-operatives and the Textbooks.

    ERIC Educational Resources Information Center

    Hill, Roderick

    2000-01-01

    States that co-operative (co-op) economics organizations are ignored in introductory economics textbooks in North America and provides evidence for this assertion. Addresses how to deal with this form of economic organization. Argues that asking who makes the decisions in firms and why, using co-ops as an example, raises important questions. (CMK)

  4. A Missing Piece of the Contemporary Character Education Puzzle: The Individualisation of Moral Character

    ERIC Educational Resources Information Center

    Chen, Yi-Lin

    2013-01-01

    The different sorts of virtuous people who display various virtues to a remarkable degree have brought the issue of individualisation of moral character to the forefront. It signals a more personal dimension of character development which is notoriously ignored in the current discourse on character education. The case is made that since in…

  5. Toxin constraint explains diet choice, survival and population dynamics in a molluscivore shorebird

    PubMed Central

    van Gils, Jan A.; van der Geest, Matthijs; Leyrer, Jutta; Oudman, Thomas; Lok, Tamar; Onrust, Jeroen; de Fouw, Jimmy; van der Heide, Tjisse; van den Hout, Piet J.; Spaans, Bernard; Dekinga, Anne; Brugge, Maarten; Piersma, Theunis

    2013-01-01

    Recent insights suggest that predators should include (mildly) toxic prey when non-toxic food is scarce. However, the assumption that toxic prey is energetically as profitable as non-toxic prey misses the possibility that non-toxic prey have other ways to avoid being eaten, such as the formation of an indigestible armature. In that case, predators face a trade-off between avoiding toxins and minimizing indigestible ballast intake. Here, we report on the trophic interactions between a shorebird (red knot, Calidris canutus canutus) and its two main bivalve prey, one being mildly toxic but easily digestible, and the other being non-toxic but harder to digest. A novel toxin-based optimal diet model is developed and tested against an existing one that ignores toxin constraints on the basis of data on prey abundance, diet choice, local survival and numbers of red knots at Banc d'Arguin (Mauritania) over 8 years. Observed diet and annual survival rates closely fit the predictions of the toxin-based model, with survival and population size being highest in years when the non-toxic prey is abundant. In the 6 of 8 years when the non-toxic prey is not abundant enough to satisfy the energy requirements, red knots must rely on the toxic alternative. PMID:23740782

  6. Toxin constraint explains diet choice, survival and population dynamics in a molluscivore shorebird.

    PubMed

    van Gils, Jan A; van der Geest, Matthijs; Leyrer, Jutta; Oudman, Thomas; Lok, Tamar; Onrust, Jeroen; de Fouw, Jimmy; van der Heide, Tjisse; van den Hout, Piet J; Spaans, Bernard; Dekinga, Anne; Brugge, Maarten; Piersma, Theunis

    2013-07-22

    Recent insights suggest that predators should include (mildly) toxic prey when non-toxic food is scarce. However, the assumption that toxic prey is energetically as profitable as non-toxic prey misses the possibility that non-toxic prey have other ways to avoid being eaten, such as the formation of an indigestible armature. In that case, predators face a trade-off between avoiding toxins and minimizing indigestible ballast intake. Here, we report on the trophic interactions between a shorebird (red knot, Calidris canutus canutus) and its two main bivalve prey, one being mildly toxic but easily digestible, and the other being non-toxic but harder to digest. A novel toxin-based optimal diet model is developed and tested against an existing one that ignores toxin constraints on the basis of data on prey abundance, diet choice, local survival and numbers of red knots at Banc d'Arguin (Mauritania) over 8 years. Observed diet and annual survival rates closely fit the predictions of the toxin-based model, with survival and population size being highest in years when the non-toxic prey is abundant. In the 6 of 8 years when the non-toxic prey is not abundant enough to satisfy the energy requirements, red knots must rely on the toxic alternative.

  7. Estimating a structured covariance matrix from multi-lab measurements in high-throughput biology.

    PubMed

    Franks, Alexander M; Csárdi, Gábor; Drummond, D Allan; Airoldi, Edoardo M

    2015-03-01

    We consider the problem of quantifying the degree of coordination between transcription and translation, in yeast. Several studies have reported a surprising lack of coordination over the years, in organisms as different as yeast and human, using diverse technologies. However, a close look at this literature suggests that the lack of reported correlation may not reflect the biology of regulation. These reports do not control for between-study biases and structure in the measurement errors, ignore key aspects of how the data connect to the estimand, and systematically underestimate the correlation as a consequence. Here, we design a careful meta-analysis of 27 yeast data sets, supported by a multilevel model, full uncertainty quantification, a suite of sensitivity analyses and novel theory, to produce a more accurate estimate of the correlation between mRNA and protein levels-a proxy for coordination. From a statistical perspective, this problem motivates new theory on the impact of noise, model mis-specifications and non-ignorable missing data on estimates of the correlation between high dimensional responses. We find that the correlation between mRNA and protein levels is quite high under the studied conditions, in yeast, suggesting that post-transcriptional regulation plays a less prominent role than previously thought.

  8. Estimating a structured covariance matrix from multi-lab measurements in high-throughput biology

    PubMed Central

    Franks, Alexander M.; Csárdi, Gábor; Drummond, D. Allan; Airoldi, Edoardo M.

    2015-01-01

    We consider the problem of quantifying the degree of coordination between transcription and translation, in yeast. Several studies have reported a surprising lack of coordination over the years, in organisms as different as yeast and human, using diverse technologies. However, a close look at this literature suggests that the lack of reported correlation may not reflect the biology of regulation. These reports do not control for between-study biases and structure in the measurement errors, ignore key aspects of how the data connect to the estimand, and systematically underestimate the correlation as a consequence. Here, we design a careful meta-analysis of 27 yeast data sets, supported by a multilevel model, full uncertainty quantification, a suite of sensitivity analyses and novel theory, to produce a more accurate estimate of the correlation between mRNA and protein levels—a proxy for coordination. From a statistical perspective, this problem motivates new theory on the impact of noise, model mis-specifications and non-ignorable missing data on estimates of the correlation between high dimensional responses. We find that the correlation between mRNA and protein levels is quite high under the studied conditions, in yeast, suggesting that post-transcriptional regulation plays a less prominent role than previously thought. PMID:25954056

  9. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying the development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian networks (static Bayesian network, parameter EM, and structural EM), are compared on children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, the static Bayesian network is used for our data, which contain around 5% missing data, providing adequate accuracy at low computational cost.

  10. Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns.

    PubMed

    Wolfe, Edward W; McGill, Michael T

    2011-01-01

    This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.

  11. Cultural values embodying universal norms: a critique of a popular assumption about cultures and human rights.

    PubMed

    Jing-Bao, Nie

    2005-09-01

    In Western and non-Western societies, it is a widely held belief that the concept of human rights is, by and large, a Western cultural norm, often at odds with non-Western cultures and, therefore, not applicable in non-Western societies. The Universal Draft Declaration on Bioethics and Human Rights reflects this deep-rooted and popular assumption. By using Chinese culture(s) as an illustration, this article points out the problems of this widespread misconception and stereotypical view of cultures and human rights. It highlights the often ignored positive elements in Chinese cultures that promote and embody universal human values such as human dignity and human rights. It concludes, accordingly, with concrete suggestions on how to modify the Declaration.

  12. Multiple imputation in the presence of non-normal data.

    PubMed

    Lee, Katherine J; Carlin, John B

    2017-02-20

    Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
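
    Type 1 predictive mean matching is simple to sketch: regress the incomplete variable on its covariates, then fill each missing value with an observed value borrowed from a donor whose prediction is closest. The version below is deliberately minimal (one covariate, five donors, no posterior draw of the regression coefficients):

        import numpy as np

        rng = np.random.default_rng(5)
        n = 300
        x = rng.normal(size=n)
        y = np.exp(1 + 0.5 * x + rng.normal(scale=0.3, size=n))  # skewed outcome
        miss = rng.random(n) < 0.5
        y_obs = np.where(miss, np.nan, y)

        obs = ~np.isnan(y_obs)
        b1, b0 = np.polyfit(x[obs], y_obs[obs], 1)  # regression on complete cases
        pred = b0 + b1 * x                          # predicted means for everyone

        y_imp = y_obs.copy()
        for i in np.where(~obs)[0]:
            donors = np.argsort(np.abs(pred[obs] - pred[i]))[:5]  # 5 nearest donors
            y_imp[i] = rng.choice(y_obs[obs][donors])  # borrow an observed value
        print(np.nanmean(y_obs), y_imp.mean(), y.mean())

    Because imputed values are always real observed values, PMM cannot produce impossible values for a skewed variable, which is why it is attractive for non-normal data.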

  13. Climate change and coastal vulnerability assessment: Scenarios for integrated assessment

    USGS Publications Warehouse

    Nicholls, R.J.; Wong, P.P.; Burkett, V.; Woodroffe, C.D.; Hay, J.

    2008-01-01

    Coastal vulnerability assessments still focus mainly on sea-level rise, with less attention paid to other dimensions of climate change. The influence of non-climatic environmental change or socio-economic change is even less considered, and is often completely ignored. Given that the profound coastal changes of the twentieth century are likely to continue through the twenty-first century, this is a major omission, which may overstate the importance of climate change, and may also miss significant interactions of climate change with other non-climate drivers. To better support climate and coastal management policy development, more integrated assessments of climatic change in coastal areas are required, including the significant non-climatic changes. This paper explores the development of relevant climate and non-climate drivers, with an emphasis on the non-climate drivers. While these issues are applicable within any scenario framework, our ideas are illustrated using the widely used SRES scenarios, with both impacts and adaptation being considered. Importantly, scenario development is a process, and the assumptions that are made about future conditions concerning the coast need to be explicit, transparent and open to scientific debate concerning their realism and likelihood. These issues are generic across other sectors. © Integrated Research System for Sustainability Science and Springer 2008.

  14. Biological Sciences Division 1991 Programs

    DTIC Science & Technology

    1991-08-01

    missing offending polysaccharides and 2) identify monosaccharide peaks in gas chromatography that we know are not holdfast-derived and can ignore. 3-On... ACCOMPLISHMENTS: 1. The polysaccharidic component of the extracellular slime of Flexibacter maritimus is predominantly a glucose polymer. In collaboration... are due to the presence of polypeptide(s), not polysaccharide as predicted. W.H. Schwarz (Johns Hopkins) has performed rheological analysis of this

  15. "The Great Unmentionable": Exploring the Pleasures and Benefits of Ecstasy from the Perspectives of Drug Users

    ERIC Educational Resources Information Center

    Hunt, Geoffrey P.; Evans, Kristin

    2008-01-01

    Although legal and illegal drugs have throughout history given pleasure to those who consume them, research in the drug field has ignored this central and fundamental feature. The absence of any discussion of pleasure is striking when one considers the contemporary literature on ecstasy and the dance scene. Pleasure is still missing within much of…

  16. Multivariate test power approximations for balanced linear mixed models in studies with missing data.

    PubMed

    Ringham, Brandy M; Kreidler, Sarah M; Muller, Keith E; Glueck, Deborah H

    2016-07-30

    Multilevel and longitudinal studies are frequently subject to missing data. For example, biomarker studies for oral cancer may involve multiple assays for each participant. Assays may fail, resulting in missing data values that can be assumed to be missing completely at random. Catellier and Muller proposed a data analytic technique to account for data missing at random in multilevel and longitudinal studies. They suggested modifying the degrees of freedom for both the Hotelling-Lawley trace F statistic and its null case reference distribution. We propose parallel adjustments to approximate power for this multivariate test in studies with missing data. The power approximations use a modified non-central F statistic, which is a function of (i) the expected number of complete cases, (ii) the expected number of non-missing pairs of responses, or (iii) the trimmed sample size, which is the planned sample size reduced by the anticipated proportion of missing data. The accuracy of the method is assessed by comparing the theoretical results to the Monte Carlo simulated power for the Catellier and Muller multivariate test. Over all experimental conditions, the closest approximation to the empirical power of the Catellier and Muller multivariate test is obtained by adjusting power calculations with the expected number of complete cases. The utility of the method is demonstrated with a multivariate power analysis for a hypothetical oral cancer biomarkers study. We describe how to implement the method using standard, commercially available software products and give example code. Copyright © 2015 John Wiley & Sons, Ltd.
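
    The trimmed-sample-size adjustment can be sketched for the simpler univariate noncentral-F case with scipy (the Hotelling-Lawley machinery of the paper is more involved; the effect size, group count and alpha below are illustrative):

        import numpy as np
        from scipy.stats import f, ncf

        def power_trimmed(n_planned, p_missing, n_groups=2, f2=0.15, alpha=0.05):
            # One-way ANOVA power with N shrunk by the expected missing fraction.
            n = int(np.floor(n_planned * (1 - p_missing)))  # trimmed sample size
            dfn, dfd = n_groups - 1, n - n_groups
            nc = f2 * n                                     # noncentrality parameter
            fcrit = f.ppf(1 - alpha, dfn, dfd)
            return 1 - ncf.cdf(fcrit, dfn, dfd, nc)

        for pm in (0.0, 0.1, 0.2, 0.3):
            print(pm, round(power_trimmed(120, pm), 3))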

  17. The Value of Non-Work Time in Cross-National Quality of Life Comparisons: The Case of the United States vs. the Netherlands

    ERIC Educational Resources Information Center

    Verbakel, Ellen; DiPrete, Thomas A.

    2008-01-01

    Comparisons of wellbeing between the United States and Western Europe generally show that most Americans have higher standards of living than do Western Europeans at comparable locations in their national income distributions. These comparisons of wellbeing typically privilege disposable income and cash transfers while ignoring other aspects of…

  18. Large Scale Crop Classification in Ukraine using Multi-temporal Landsat-8 Images with Missing Data

    NASA Astrophysics Data System (ADS)

    Kussul, N.; Skakun, S.; Shelestov, A.; Lavreniuk, M. S.

    2014-12-01

    At present, there are no globally available Earth observation (EO) derived products on crop maps. This issue is being addressed within the Sentinel-2 for Agriculture initiative, where a number of test sites (including from JECAM) participate to provide coherent protocols and best practices for various global agriculture systems, and subsequently crop maps from Sentinel-2. One of the problems in dealing with optical images for large territories (more than 10,000 sq. km) is the presence of clouds and shadows that result in missing values in the data sets. In this abstract, a new approach to classification of multi-temporal optical satellite imagery with missing data due to clouds and shadows is proposed. First, self-organizing Kohonen maps (SOMs) are used to restore missing pixel values in a time series of satellite imagery. SOMs are trained for each spectral band separately using non-missing values. Missing values are restored through a special procedure that substitutes an input sample's missing components with the neuron's weight coefficients. After missing data restoration, a supervised classification is performed for the multi-temporal satellite images. For this, an ensemble of neural networks, in particular multilayer perceptrons (MLPs), is proposed. Ensembling of neural networks is done by the technique of average committee, i.e. the average class probability is calculated over classifiers and the class with the highest average posterior probability is selected for the given input sample. The proposed approach is applied for large-scale crop classification using multi-temporal Landsat-8 images for the JECAM test site in Ukraine [1-2]. It is shown that an ensemble of MLPs provides better performance than a single neural network in terms of overall classification accuracy and kappa coefficient. The obtained classification map is also validated through estimated crop and forest areas and comparison to official statistics.
    1. A.Yu. Shelestov et al., "Geospatial information system for agricultural monitoring," Cybernetics Syst. Anal., vol. 49, no. 1, pp. 124-132, 2013.
    2. J. Gallego et al., "Efficiency Assessment of Different Approaches to Crop Classification Based on Satellite and Ground Observations," J. Autom. Inform. Scie., vol. 44, no. 5, pp. 67-80, 2012.
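
    The average-committee rule itself is a one-liner over stacked class probabilities. Below is a hedged sketch with scikit-learn MLPs on synthetic features (the real pipeline would first restore cloud-masked pixels with the SOM step):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier

        X, y = make_classification(n_samples=2000, n_features=24, n_classes=5,
                                   n_informative=12, random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

        committee = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                   random_state=seed).fit(Xtr, ytr)
                     for seed in range(5)]
        avg_proba = np.mean([m.predict_proba(Xte) for m in committee], axis=0)
        y_pred = avg_proba.argmax(axis=1)  # class with highest mean posterior
        print("committee accuracy:", (y_pred == yte).mean())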

  19. Shrinkage regression-based methods for microarray missing value imputation.

    PubMed

    Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng

    2013-01-01

    Missing values commonly occur in the microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation results in reducing the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. To further improve the performances of the regression-based methods, we propose shrinkage regression-based methods. Our methods take the advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. Besides, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
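
    A minimal sketch of the regression-based idea in Python: pick the genes most correlated with the target gene, fit a penalized least-squares model on the samples where the target is observed, and predict the missing entries (ridge shrinkage stands in here for the paper's shrinkage estimator, and the toy expression matrix is random):

        import numpy as np
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(7)
        expr = rng.normal(size=(100, 60))        # samples x genes (toy matrix)
        target = 18
        miss = rng.random(100) < 0.1             # ~10% missing in the target gene
        y = expr[:, target].copy()
        y[miss] = np.nan

        others = np.delete(np.arange(expr.shape[1]), target)
        corr = [abs(np.corrcoef(expr[~miss, g], y[~miss])[0, 1]) for g in others]
        similar = others[np.argsort(corr)[-10:]]  # 10 most correlated genes

        model = Ridge(alpha=1.0).fit(expr[~miss][:, similar], y[~miss])
        y[miss] = model.predict(expr[miss][:, similar])
        rmse = np.sqrt(np.mean((y[miss] - expr[miss, target]) ** 2))
        print(f"imputed {miss.sum()} values, rmse = {rmse:.3f}")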

  20. The consequences of ignoring measurement invariance for path coefficients in structural equation models

    PubMed Central

    Guenole, Nigel; Brown, Anna

    2014-01-01

    We report a Monte Carlo study examining the effects of two strategies for handling measurement non-invariance – modeling and ignoring non-invariant items – on structural regression coefficients between latent variables measured with item response theory models for categorical indicators. These strategies were examined across four levels and three types of non-invariance – non-invariant loadings, non-invariant thresholds, and combined non-invariance on loadings and thresholds – in simple, partial, mediated and moderated regression models where the non-invariant latent variable occupied predictor, mediator, and criterion positions in the structural regression models. When non-invariance is ignored in the latent predictor, the focal group regression parameters are biased in the opposite direction to the difference in loadings and thresholds relative to the referent group (i.e., lower loadings and thresholds for the focal group lead to overestimated regression parameters). With criterion non-invariance, the focal group regression parameters are biased in the same direction as the difference in loadings and thresholds relative to the referent group. While unacceptable levels of parameter bias were confined to the focal group, bias occurred at considerably lower levels of ignored non-invariance than was previously recognized in referent and focal groups. PMID:25278911

  1. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method, among repeated measures ANOVA, generalized estimating equations (GEE) and mixed effects model repeated measures (MMRM), is most appropriate for the analysis of longitudinal outcome data of TKA with missing values. Data sets with missing values were generated with different proportions of missing data, sample sizes and missing-data generation mechanisms. Each data set was analyzed with the three statistical methods. The influence of missing data was greater with a higher proportion of missing data and a smaller sample size. MMRM tended to show the least change in the statistics. When missing values were generated by a 'missing not at random' mechanism, no statistical method could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Effects of correcting missing daily feed intake values on the genetic parameters and estimated breeding values for feeding traits in pigs.

    PubMed

    Ito, Tetsuya; Fukawa, Kazuo; Kamikawa, Mai; Nikaidou, Satoshi; Taniguchi, Masaaki; Arakawa, Aisaku; Tanaka, Genki; Mikawa, Satoshi; Furukawa, Tsutomu; Hirose, Kensuke

    2018-01-01

    Daily feed intake (DFI) is an important consideration for improving feed efficiency, but measurements using electronic feeder systems contain many missing and incorrect values. Therefore, we evaluated three methods for correcting missing DFI data (quadratic, orthogonal polynomial, and locally weighted (Loess) regression equations) and assessed the effects of these missing values on the genetic parameters and the estimated breeding values (EBV) for feeding traits. DFI records were obtained from 1622 Duroc pigs, comprising 902 individuals without missing DFI and 720 individuals with missing DFI. The Loess equation was the most suitable of the three equations for correcting the missing DFI values in datasets with 5-50% of records randomly deleted. Neither the variance components nor the heritability for average DFI (ADFI) changed with the proportion of missing DFI or with Loess correction. In terms of rank correlation and information criteria, Loess correction improved the accuracy of EBV for ADFI compared to randomly deleted cases. These findings indicate that the Loess equation is useful for correcting missing DFI values for individual pigs and that the correction of missing DFI values could be effective for the estimation of breeding values and genetic improvement using EBV for feeding traits. © 2017 The Authors. Animal Science Journal published by John Wiley & Sons Australia, Ltd on behalf of Japanese Society of Animal Science.
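
    A Loess-style correction for one animal can be sketched with the lowess smoother in statsmodels: smooth the observed days, then interpolate the fit at the missing days (the test-period length, trend and missingness rate are illustrative):

        import numpy as np
        from statsmodels.nonparametric.smoothers_lowess import lowess

        rng = np.random.default_rng(8)
        days = np.arange(1, 71)                        # a 70-day test period
        dfi = 1.2 + 0.02 * days + rng.normal(0, 0.15, days.size)  # rising intake
        missing = rng.random(days.size) < 0.2          # ~20% of records missing

        fit = lowess(dfi[~missing], days[~missing], frac=0.3)  # columns: day, fit
        dfi_filled = dfi.copy()
        dfi_filled[missing] = np.interp(days[missing], fit[:, 0], fit[:, 1])
        print(f"corrected {missing.sum()} missing days")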

  3. Missing Data in the Field of Otorhinolaryngology and Head & Neck Surgery: Need for Improvement.

    PubMed

    Netten, Anouk P; Dekker, Friedo W; Rieffe, Carolien; Soede, Wim; Briaire, Jeroen J; Frijns, Johan H M

    Clinical studies often face missing data. Data can be missing for various reasons; for example, patients moved, certain measurements are only administered in high-risk groups, or patients are unable to attend the clinic because of their health status. There are various ways to handle these missing data (e.g., complete case analyses, mean substitution). Each of these techniques potentially influences both the analyses and the results of a study. The first aim of this structured review was to analyze how often researchers in the field of otorhinolaryngology/head & neck surgery report missing data. The second aim was to systematically describe how researchers handle missing data in their analyses. The third aim was to provide a solution for dealing with missing data by means of the multiple imputation technique. With this review, we aim to contribute to a higher quality of reporting in otorhinolaryngology research. Clinical studies among the 398 most recently published research articles in three major journals in the field of otorhinolaryngology/head & neck surgery were analyzed based on how researchers reported and handled missing data. Of the 316 clinical studies, 85 studies reported some form of missing data. Of those 85, only a small number (12 studies, 3.8%) actively handled the missingness in their data. The majority of researchers exclude incomplete cases, which results in biased outcomes and a drop in statistical power. Within otorhinolaryngology research, missing data are largely ignored and underreported, and consequently, handled inadequately. This has a major impact on the results and conclusions drawn from this research. Based on the outcomes of this review, we provide solutions for dealing with missing data. To illustrate, we clarify the use of multiple imputation techniques, which recently became widely available in standard statistical programs.

  4. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets.

    PubMed

    Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find out the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation over the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results over the mixed data type of medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.
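
    The overall recipe, filter suspicious instances first and then impute from the cleaned pool, can be sketched with scikit-learn. A simple z-score filter stands in below for the paper's instance selection algorithms (DROP3, GA, IB3), and k-NN imputation for KNNI:

        import numpy as np
        from sklearn.impute import KNNImputer

        rng = np.random.default_rng(9)
        X = rng.normal(size=(400, 6))
        X[rng.integers(0, 400, 20), rng.integers(0, 6, 20)] += 8.0  # inject outliers
        X[rng.random(X.shape) < 0.1] = np.nan                       # 10% missing

        # instance selection stand-in: drop rows with any observed |z| above 4
        z = np.abs((X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0))
        clean = X[np.nanmax(z, axis=1) < 4]

        imputer = KNNImputer(n_neighbors=5).fit(clean)  # donors come from clean rows
        X_imputed = imputer.transform(X)                # impute the full dataset
        print(X_imputed.shape, np.isnan(X_imputed).sum())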

  5. Missing value imputation strategies for metabolomics data.

    PubMed

    Armitage, Emily Grace; Godzien, Joanna; Alonso-Herranz, Vanesa; López-Gonzálvez, Ángeles; Barbas, Coral

    2015-12-01

    Missing values can arise for different reasons, and depending on their origin they should be considered and dealt with differently. In this research, four methods of imputation have been compared with respect to their effects on the normality and variance of the data, on statistical significance, and on the approximation of a suitable threshold for accepting missing data as truly missing. Additionally, different strategies for controlling the familywise error rate or false discovery rate have been evaluated, together with how they interact with the different strategies for missing value imputation. Missing values were found to affect the normality and variance of the data, and k-nearest neighbour imputation was the best method tested for restoring them. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that as little as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a "gray area", and a strategy has therefore been proposed that balances the optimal imputation strategy (k-nearest neighbour) against the best approximation for positioning real zeros. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
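
    The proposed strategy can be pictured as a per-feature routing rule. The sketch below applies the 40% and 70% thresholds from the abstract to synthetic data; the function name is hypothetical, and the exact handling of the 40-70% "gray area" is left as an inspection step since the paper's hybrid rule is more involved.

        # Routing features by their missingness fraction (thresholds from the abstract).
        import numpy as np

        def route_features(X):
            """Classify each column of X by its fraction of missing values."""
            frac = np.isnan(X).mean(axis=0)
            decisions = np.where(frac < 0.4, "impute (kNN)",
                        np.where(frac > 0.7, "treat as true zeros", "gray area: inspect"))
            return frac, decisions

        rng = np.random.default_rng(2)
        X = rng.lognormal(size=(50, 6))
        for j, p in enumerate([0.1, 0.3, 0.45, 0.6, 0.75, 0.9]):
            X[rng.random(50) < p, j] = np.nan   # varying missingness per feature

        for j, (f, d) in enumerate(zip(*route_features(X))):
            print(f"feature {j}: {f:.0%} missing -> {d}")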

  6. Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.

    PubMed

    Wei, Runmin; Wang, Jingye; Su, Mingming; Jia, Erik; Chen, Shaoqiu; Chen, Tianlu; Ni, Yan

    2018-01-12

    Missing values exist widely in mass spectrometry (MS)-based metabolomics data. Various methods have been applied to handle missing values, but the choice of method can significantly affect subsequent data analyses. Typically, there are three types of missing values: missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC)) for the different types of missing values using four metabolomics datasets. Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate imputation accuracy. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes analysis was used to evaluate the overall sample distribution. Student's t-test followed by correlation analysis was conducted to evaluate the effects on univariate statistics. Our findings demonstrated that RF performed best for MCAR/MAR and that QRILC was favored for left-censored MNAR. Finally, we proposed a comprehensive strategy and developed a publicly accessible web tool for the application of missing value imputation in metabolomics ( https://metabolomics.cc.hawaii.edu/software/MetImp/ ).
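
    The evaluation protocol, masking known entries and scoring imputations by NRMSE, is easy to reproduce in outline. The Python sketch below uses one common definition of NRMSE and compares three of the simpler methods from the study (half-minimum, mean, and kNN) on synthetic data; the paper's exact NRMSE variant and datasets may differ.

        # Mask-and-score evaluation of imputation accuracy with NRMSE.
        import numpy as np
        from sklearn.impute import KNNImputer, SimpleImputer

        def nrmse(true, imputed, mask):
            diff = (true - imputed)[mask]
            return np.sqrt(np.mean(diff ** 2) / np.var(true[mask]))

        rng = np.random.default_rng(3)
        X = rng.lognormal(size=(100, 8))        # positive "intensity" data
        mask = rng.random(X.shape) < 0.15       # entries we pretend are missing
        X_miss = X.copy()
        X_miss[mask] = np.nan

        for name, imp in [("half-min", SimpleImputer(strategy="constant",
                                                      fill_value=np.nanmin(X_miss) / 2)),
                          ("mean", SimpleImputer(strategy="mean")),
                          ("kNN", KNNImputer(n_neighbors=5))]:
            score = nrmse(X, imp.fit_transform(X_miss), mask)
            print(f"{name:>8}: NRMSE = {score:.3f}")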

  7. An Exploratory Analysis of Societal Preferences for Research-Driven Quality of Life Improvements in Canada

    ERIC Educational Resources Information Center

    Rudd, Murray A.

    2011-01-01

    Research in the humanities, arts, and social sciences (HASS) tends to have impacts that enhance quality of life (QOL) but that are not amenable to pricing in established markets. If the economic value of "non-market" research impacts is ignored when making the business case for HASS research, society will under-invest in it. My goal in…

  8. Early pregnancy factor as a marker for assessing embryonic viability in threatened and missed abortions.

    PubMed

    Shahani, S K; Moniz, C L; Bordekar, A D; Gupta, S M; Naik, K

    1994-01-01

    It is now well recognized that the presence of early pregnancy factor (EPF) can signify the occurrence of fertilization, the continuation of pregnancy and the existence of a viable embryo. With this in view, a study was undertaken to assess the potential of EPF as a marker of embryo viability in cases complicated by vaginal bleeding during early pregnancy. The results indicated that the sensitivity of EPF as a marker for predicting threatened or missed abortion was 78.9% and the specificity 95.6%; the positive predictive value was 93.8% and the negative predictive value 84.6%. Our studies have shown that, since EPF is present in viable but absent in non-viable pregnancies, it could be a useful marker of prognostic value in threatened abortions.
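
    The four reported metrics follow directly from a 2x2 classification table. The sketch below defines them and evaluates one table that approximately reproduces the abstract's figures; the counts are a reconstruction for illustration, not the study's actual data.

        # Diagnostic metrics from a 2x2 table (reconstructed counts, not study data).
        def diagnostics(tp, fp, fn, tn):
            return {
                "sensitivity": tp / (tp + fn),
                "specificity": tn / (tn + fp),
                "PPV": tp / (tp + fp),
                "NPV": tn / (tn + fn),
            }

        # tp=15, fp=1, fn=4, tn=22 gives roughly 78.9%, 95.7%, 93.8%, 84.6%
        for name, value in diagnostics(tp=15, fp=1, fn=4, tn=22).items():
            print(f"{name}: {value:.1%}")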

  9. Error mitigation for CCSDS compressed imager data

    NASA Astrophysics Data System (ADS)

    Gladkova, Irina; Grossberg, Michael; Gottipati, Srikanth; Shahriar, Fazlul; Bonev, George

    2009-08-01

    To make efficient use of the limited bandwidth available on the downlink from satellite to ground station, imager data are usually compressed before transmission. Transmission introduces unavoidable errors, which are only partially removed by forward error correction and packetization. In the case of the commonly used CCSDS Rice-based compression, the result is a contiguous sequence of dummy values along scan lines in a band of the imager data. We have developed a method capable of using the image statistics to provide a principled estimate of the missing data. Our method outperforms interpolation yet can be performed fast enough to provide uninterrupted data flow. The estimation of the lost data provides significant value to end users who may use only part of the data, may not have statistical tools, or lack the expertise to mitigate the impact of the lost data. Since the locations of the lost data will be clearly marked as metadata in the HDF or NetCDF header, experts who prefer to handle error mitigation themselves will be free to use or ignore our estimates as they see fit.

  10. Assessment of score- and Rasch-based methods for group comparison of longitudinal patient-reported outcomes with intermittent missing data (informative and non-informative).

    PubMed

    de Bock, Élodie; Hardouin, Jean-Benoit; Blanchin, Myriam; Le Neel, Tanguy; Kubis, Gildas; Sébille, Véronique

    2015-01-01

    The purpose of this study was to identify the most adequate strategy for group comparison of longitudinal patient-reported outcomes in the presence of possibly informative intermittent missing data. Models from classical test theory (CTT) and item response theory (IRT) were compared. Responses of two groups of patients to dichotomous items at three assessment times were simulated. Different cases were considered: presence or absence of a group effect and/or a time effect, a total of 100 or 200 patients, 4 or 7 items, and two different values for the correlation coefficient of the latent trait between two consecutive times (0.4 or 0.9). Cases including informative and non-informative intermittent missing data were compared at different rates (15% and 30%). These simulated data were analyzed within CTT using scores and a mixed model (SM) and within IRT using a longitudinal Rasch mixed model (LRM). The type I error, the power and the bias of the group effect estimates were compared between the two methods. This study showed that LRM performs better than SM. When the rate of missing data rose to 30%, SM estimates were biased, mainly for informative missing data; otherwise, the two methods were comparable with respect to bias. However, regardless of the rate of intermittent missing data, LRM had higher power than SM. In conclusion, LRM should be favored when the rate of missing data is higher than 15%; in other cases, SM and LRM provide similar results.

  11. The Effects of Methods of Imputation for Missing Values on the Validity and Reliability of Scales

    ERIC Educational Resources Information Center

    Cokluk, Omay; Kayri, Murat

    2011-01-01

    The main aim of this study is the comparative examination of the factor structures, corrected item-total correlations, and Cronbach-alpha internal consistency coefficients obtained by different methods used in imputation for missing values in conditions of not having missing values, and having missing values of different rates in terms of testing…

  12. Multiple imputation to deal with missing EQ-5D-3L data: Should we impute individual domains or the actual index?

    PubMed

    Simons, Claire L; Rivero-Arias, Oliver; Yu, Ly-Mee; Simon, Judit

    2015-04-01

    Missing data are a well-known and widely documented problem in cost-effectiveness analyses alongside clinical trials using individual patient-level data. Current methodological research recommends multiple imputation (MI) to deal with missing health outcome data, but there is little guidance on whether MI for multi-attribute questionnaires, such as the EQ-5D-3L, should be carried out at the domain level or at the summary score level. In this paper, we evaluated the impact of imputing individual domains versus imputing index values to deal with missing EQ-5D-3L data using a simulation study, and we developed recommendations for future practice. We simulated missing data in a patient-level dataset with complete EQ-5D-3L data at one point in time from a large multinational clinical trial (n = 1,814). Different proportions of missing data were generated using a missing at random (MAR) mechanism, and three different scenarios were studied. The performance of each method was evaluated using the root mean squared error and the mean absolute error of the actual versus predicted EQ-5D-3L indices. In large sample sizes (n > 500) with a missing data pattern that follows mainly unit non-response, imputing domains or the index produced similar results. However, domain imputation became more accurate than index imputation when the pattern of missingness followed item non-response. For smaller sample sizes (n < 100), index imputation was more accurate. When MI models were misspecified, both domain and index imputation were inaccurate for any proportion of missing data. The decision between imputing the domains or the EQ-5D-3L index scores therefore depends on the observed missing data pattern and the sample size available for analysis. Analysts conducting this type of exercise should also evaluate the sensitivity of the analysis to the MAR assumption and check whether the imputation model is correctly specified.

  13. Handling missing values in the MDS-UPDRS.

    PubMed

    Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T

    2015-10-01

    This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
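
    Two computations from this validation are easy to state in code: Lin's concordance correlation coefficient and a prorated part score. The sketch below is a minimal Python illustration assuming Part III's 33 item scores; the permissible-missing thresholds themselves come from the datasets analyzed above, not from these formulas.

        # Lin's CCC and a prorated part score (illustrative, not the study's code).
        import numpy as np

        def lins_ccc(x, y):
            """Concordance correlation coefficient between two score vectors."""
            x, y = np.asarray(x, float), np.asarray(y, float)
            sxy = np.cov(x, y, bias=True)[0, 1]
            return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

        def prorate(item_scores, n_items):
            """Scale the sum of observed items up to the full item count."""
            obs = [s for s in item_scores if s is not None]
            return sum(obs) * n_items / len(obs)

        rng = np.random.default_rng(4)
        full = rng.integers(0, 5, size=33)          # 33 Part III item scores, each 0-4
        observed = list(full[:-3]) + [None] * 3     # three items missing
        print("prorated:", round(prorate(observed, 33), 1), "true:", full.sum())
        print("CCC of two noisy ratings:",
              round(lins_ccc(full, full + rng.normal(0, 0.5, 33)), 3))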

  14. Statistical Discourse Analysis: A Method for Modelling Online Discussion Processes

    ERIC Educational Resources Information Center

    Chiu, Ming Ming; Fujita, Nobuko

    2014-01-01

    Online forums (synchronous and asynchronous) offer exciting data opportunities to analyze how people influence one another through their interactions. However, researchers must address several analytic difficulties involving the data (missing values, nested structure [messages within topics], non-sequential messages), outcome variables (discrete…

  15. Autoregressive-model-based missing value estimation for DNA microarray time series data.

    PubMed

    Choong, Miew Keen; Charbit, Maurice; Yan, Hong

    2009-01-01

    Missing value estimation is important in DNA microarray data analysis. A number of algorithms have been developed to solve this problem, but they have several limitations. Most existing algorithms are not able to deal with the situation where a particular time point (column) of the data is missing entirely. In this paper, we present an autoregressive-model-based missing value estimation method (ARLSimpute) that takes into account the dynamic properties of microarray temporal data and the local similarity structures in the data. ARLSimpute is especially effective for situations where a particular time point contains many missing values or where an entire time point is missing. Experimental results suggest that the proposed algorithm is an accurate missing value estimator in comparison with other imputation methods on simulated as well as real microarray time series datasets.

  16. Missing data within a quantitative research study: How to assess it, treat it, and why you should care.

    PubMed

    Bannon, William

    2015-04-01

    Missing data typically refer to the absence of one or more values within a study variable contained in a dataset, often because a study participant chose not to respond to a survey item. In general, a greater number of missing values within a dataset presents a greater challenge to the data analyst. However, if researchers are armed with just a few basic tools, they can quite effectively diagnose how serious the issue of missing data is within a dataset, as well as prescribe the most appropriate solution. Specifically, the keys to effectively assessing and treating missing data involve specifying how missing data will be defined in a study, assessing the amount of missing data, identifying the pattern of the missing data, and selecting the best way to treat the missing values. I will touch on each of these processes and provide a brief illustration of how the validity of study findings is at great risk if missing data values are not treated effectively. ©2015 American Association of Nurse Practitioners.
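
    The first three steps named above (defining, quantifying, and inspecting the pattern of missingness) can be sketched in a few lines of pandas. The data and the -99 missing-data code below are hypothetical.

        # Assessing missing data: define it, count it, tabulate its patterns.
        import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            "age":    [34, 51, np.nan, 44, 29, np.nan],
            "income": [52_000, np.nan, np.nan, 61_000, 48_000, 55_000],
            "score":  [7, 6, 5, np.nan, 8, 7],
        })
        df = df.replace({-99: np.nan})          # step 1: map study missing codes to NaN

        print(df.isna().mean().round(2))        # step 2: amount missing per variable
        pattern = df.isna().astype(int)
        print(pattern.value_counts())           # step 3: distinct missing-data patterns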

  17. Missing persons-missing data: the need to collect antemortem dental records of missing persons.

    PubMed

    Blau, Soren; Hill, Anthony; Briggs, Christopher A; Cordner, Stephen M

    2006-03-01

    The subject of missing persons is of great concern to the community, with numerous associated emotional, financial, and health costs. This paper examines the forensic medical issues raised by the delayed identification of individuals classified as "missing" and highlights the importance of including dental data in the investigation of missing persons. Focusing on Australia, the current approaches employed in missing persons investigations are outlined. Of particular significance is the fact that each of the eight Australian states and territories has its own Missing Persons Unit that operates within distinct state and territory legislation. Consequently, there is a lack of uniformity within Australia about the legal and procedural framework within which investigations of missing persons are conducted, and the interaction of that framework with coronial law procedures. One of the main problems in missing persons investigations is the lack of forensic medical, particularly odontological, input. Forensic odontology has been employed in numerous cases in Australia where identity is unknown or uncertain because of remains being skeletonized, incinerated, or partly burnt. The routine employment of the forensic odontologist to assist in missing person inquiries has, however, been ignored. The failure to routinely employ forensic odontology in missing persons inquiries has resulted in numerous delays in identification. Three Australian cases are presented where the investigation of individuals whose identity was uncertain or unknown was prolonged due to the failure to utilize the appropriate (and available) dental resources. In light of the outcomes of these cases, we suggest that a national missing persons dental records database be established for future missing persons investigations. Such a database could be easily managed between a coronial system and a forensic medical institute. In Australia, a national missing persons dental records database could be incorporated into the National Coroners Information System (NCIS) managed, on behalf of Australia's Coroners, by the Victorian Institute of Forensic Medicine. The existence of the NCIS would ensure operational collaboration in the implementation of the system and cost savings to Australian policing agencies involved in missing person inquiries. The implementation of such a database would facilitate timely and efficient reconciliation of clinical and postmortem dental records and have subsequent social and financial benefits.

  18. Optimal simultaneous superpositioning of multiple structures with missing data.

    PubMed

    Theobald, Douglas L; Steindel, Phillip A

    2012-08-01

    Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org. dtheobald@brandeis.edu Supplementary data are available at Bioinformatics online.

  19. Toward a hybrid brain-computer interface based on repetitive visual stimuli with missing events.

    PubMed

    Wu, Yingying; Li, Man; Wang, Jing

    2016-07-26

    Steady-state visually evoked potentials (SSVEPs) can be elicited by repetitive stimuli and extracted in the frequency domain with satisfactory performance. However, the temporal information of such stimuli is often ignored. In this study, we utilized repetitive visual stimuli with missing events to present a novel hybrid BCI paradigm based on the SSVEP and the omitted stimulus potential (OSP). Four discs flickering from black to white with missing flickers served as visual stimulators to simultaneously elicit subjects' SSVEPs and OSPs. Key parameters of the new paradigm, including flicker frequency, optimal electrodes, missing flicker duration, and intervals between missing events, were qualitatively assessed with offline data. Two omitted flicker patterns, missing black disc and missing white disc, were proposed and compared. Averaging times were optimized for information transfer rate (ITR) in online experiments, where SSVEPs and OSPs were identified using canonical correlation analysis in the frequency domain and Support Vector Machine (SVM)-Bayes fusion in the time domain, respectively. The online accuracy and ITR (mean ± standard deviation) over nine healthy subjects were 79.29 ± 18.14% and 19.45 ± 11.99 bits/min with the missing black disc pattern, and 86.82 ± 12.91% and 24.06 ± 10.95 bits/min with the missing white disc pattern. The proposed BCI paradigm demonstrated, for the first time, that SSVEPs and OSPs can be simultaneously elicited by a single visual stimulus pattern and recognized in real time with satisfactory performance. Besides frequency features such as the SSVEP elicited by repetitive stimuli, we found a new time-domain feature (the OSP) with which to design a novel hybrid BCI paradigm by adding missing events to repetitive stimuli.

  20. A Spatiotemporal Prediction Framework for Air Pollution Based on Deep RNN

    NASA Astrophysics Data System (ADS)

    Fan, J.; Li, Q.; Hou, J.; Feng, X.; Karimian, H.; Lin, S.

    2017-10-01

    Time series data in practical applications frequently contain missing values due to sensor malfunction, network failure, outliers, etc. To handle missing values in time series, and to account for temporal properties that conventional machine learning models neglect, we propose a spatiotemporal prediction framework based on missing value processing algorithms and a deep recurrent neural network (DRNN). By using a missing tag and a missing interval to represent time series patterns, we implement three different missing value fixing algorithms, which are further incorporated into a deep neural network consisting of LSTM (Long Short-Term Memory) layers and fully connected layers. Real-world air quality and meteorological datasets (Jingjinji area, China) are used for model training and testing. Deep feed-forward neural networks (DFNN) and gradient boosting decision trees (GBDT) are trained as baseline models against the proposed DRNN. The performance of the three missing value fixing algorithms, as well as of the different machine learning models, is evaluated and analysed. Experiments show that the proposed DRNN framework outperforms both DFNN and GBDT, validating the capacity of the proposed framework. Our results also provide useful insights for better understanding the different strategies that handle missing values.
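
    The missing tag and missing interval representation can be computed ahead of the recurrent network. The NumPy sketch below derives both features, together with a last-observation-carried-forward fill, for a single series; the function name and the exact fixing rule are illustrative assumptions, not the paper's algorithms.

        # Missing tag and missing interval features for a time series.
        import numpy as np

        def missing_features(series):
            """Return (carried-forward values, missing tag, missing interval)."""
            series = np.asarray(series, float)
            tag = np.isnan(series).astype(float)    # 1 where the value is missing
            filled = series.copy()
            interval = np.zeros_like(series)
            last, gap = np.nan, 0.0
            for t, v in enumerate(series):
                if np.isnan(v):
                    gap += 1.0
                    filled[t] = last                # leading gaps stay NaN
                else:
                    last, gap = v, 0.0
                interval[t] = gap                   # steps since the last observation
            return filled, tag, interval

        x = [1.0, np.nan, np.nan, 4.0, np.nan, 6.0]
        for row in missing_features(x):
            print(row)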

  1. Comparing multiple imputation methods for systematically missing subject-level data.

    PubMed

    Kline, David; Andridge, Rebecca; Kaizar, Eloise

    2017-06-01

    When conducting research synthesis, the studies to be combined often do not measure the same set of variables, which creates missing data. When the studies are longitudinal, data can be missing at the observation level (time-varying) or at the subject level (non-time-varying). Traditionally, missing data methods for longitudinal data have focused on missing observation-level variables. In this paper, we focus on missing subject-level variables and compare two multiple imputation approaches: a joint modeling approach and a sequential conditional modeling approach. We find the joint modeling approach preferable to the sequential conditional approach, except when the covariance structure of the repeated outcome for each individual has homogeneous variance and exchangeable correlation. Specifically, the regression coefficient estimates from an analysis incorporating imputed values based on the sequential conditional method are attenuated and less efficient than those from the joint method. Remarkably, the estimates from the sequential conditional method are often less efficient than a complete case analysis, which, in the context of research synthesis, implies that we lose efficiency by combining studies. Copyright © 2015 John Wiley & Sons, Ltd.

  2. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    PubMed

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.

  3. Facing uncertainty in ecosystem services-based resource management.

    PubMed

    Grêt-Regamey, Adrienne; Brunner, Sibyl H; Altwegg, Jürg; Bebi, Peter

    2013-09-01

    The concept of ecosystem services is increasingly used as a support for natural resource management decisions. While the science for assessing ecosystem services is improving, appropriate methods to address uncertainties in a quantitative manner are missing. Ignoring parameter uncertainties, modeling uncertainties and uncertainties related to human-environment interactions can modify decisions and lead to overlooking important management possibilities. In this contribution, we present a new approach for mapping the uncertainties in the assessment of multiple ecosystem services. The spatially explicit risk approach links Bayesian networks to a Geographic Information System for forecasting the value of a bundle of ecosystem services and quantifies the uncertainties related to the outcomes in a spatially explicit manner. We demonstrate that mapping uncertainties in ecosystem services assessments provides key information for decision-makers seeking critical areas in the delivery of ecosystem services in a case study in the Swiss Alps. The results suggest that not only the total value of the bundle of ecosystem services is highly dependent on uncertainties, but the spatial pattern of the ecosystem services values changes substantially when considering uncertainties. This is particularly important for the long-term management of mountain forest ecosystems, which have long rotation stands and are highly sensitive to pressing climate and socio-economic changes. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Integrative missing value estimation for microarray data.

    PubMed

    Hu, Jianjun; Li, Haifeng; Waterman, Michael S; Zhou, Xianghong Jasmine

    2006-10-12

    Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples. We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art local least squares (LLS) imputation algorithm, by up to 15% in our benchmark tests. We demonstrated that order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS and are especially well suited to imputing microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.

  5. A Comparison of Male and Female Addicts and Non-Addicts on the Tennessee Self Concept Scale.

    ERIC Educational Resources Information Center

    Jarka, Joyce M.

    Many mental health professionals ignore chemical addiction, whereas many chemical dependency professionals see addiction as the entire problem and ignore everything else. This study investigated differences between addicts and non-addicts on the Tennessee Self Concept Scale. Subjects were undergraduate and graduate students, selected from a…

  6. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  7. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  8. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  9. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  10. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  11. Missing data exploration: highlighting graphical presentation of missing pattern.

    PubMed

    Zhang, Zhongheng

    2015-12-01

    Functions shipped with base R can fulfill many missing data handling tasks. However, because the data volume of electronic medical record (EMR) systems is typically very large, more sophisticated methods may be helpful in data management. This article focuses on missing data handling using advanced techniques. There are three types of missing data: missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how the missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore the missing data pattern. In particular, the VIM package is especially helpful for visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missingness on other variables. Such information is useful in subsequent imputations.
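
    The closing point, correlating missingness with other variables, is straightforward to reproduce. The pandas sketch below (a Python stand-in for the R packages cited) simulates a lab value whose missingness depends on age, so the indicator-age correlation flags the data as MAR rather than MCAR; the variable names are hypothetical.

        # Checking whether missingness depends on another variable.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(8)
        n = 500
        age = rng.normal(60, 10, n)
        lab = rng.normal(100, 15, n)
        # older patients are more often missing the lab value -> MAR, not MCAR
        lab[rng.random(n) < 1 / (1 + np.exp(-(age - 60) / 5))] = np.nan

        df = pd.DataFrame({"age": age, "lab": lab})
        df["lab_missing"] = df["lab"].isna().astype(int)
        print(df[["age", "lab_missing"]].corr().round(2))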

  12. Counting missing values in a metabolite-intensity data set for measuring the analytical performance of a metabolomics platform.

    PubMed

    Huan, Tao; Li, Liang

    2015-01-20

    Metabolomics requires quantitative comparison of the individual metabolites present in an entire sample set. Unfortunately, missing intensity values in one or more samples are very common. Because missing values can have a profound influence on metabolomic results, the extent of missing values found in a metabolomic data set should be treated as an important parameter for measuring the analytical performance of a technique. In this work, we report a study on the scope of missing values and a robust method of filling the missing values in a chemical isotope labeling (CIL) LC-MS metabolomics platform. Unlike conventional LC-MS, CIL LC-MS quantifies the concentration differences of individual metabolites in two comparative samples based on the mass spectral peak intensity ratio of a peak pair from a mixture of differentially labeled samples. We show that this peak-pair feature can be explored as a unique means of extracting metabolite intensity information from raw mass spectra. In our approach, a peak-pair picking algorithm, IsoMS, is initially used to process the LC-MS data set to generate a CSV file or table that contains metabolite ID and peak ratio information (i.e., a metabolite-intensity table). A zero-fill program, freely available from MyCompoundID.org, is developed to automatically find a missing value in the CSV file, go back to the raw LC-MS data to find the peak pair, and then calculate the intensity ratio and enter the ratio value into the table. Most of the missing values are found to be low-abundance peak pairs. We demonstrate the performance of this method in analyzing an experimental and technical replicate data set of the human urine metabolome. Furthermore, we propose a standardized approach of counting missing values in a replicate data set as a way of gauging the extent of missing values in a metabolomics platform. Finally, we illustrate that applying the zero-fill program, in conjunction with dansylation CIL LC-MS, can lead to a marked improvement in finding significant metabolites that differentiate bladder cancer patients from their controls in a metabolomics study of 109 subjects.

  13. Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.

    PubMed

    Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R

    2018-05-26

    Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses: the analyst assumes the data are representative of the desired population, and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non-ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations, we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design. Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occupancy estimates for some bat species, and the simulations showed that the maximum likelihood estimator of species-environment relationships under non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study, designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors, likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases the spatial extent of statistical inferences. Our results suggest that ignoring the mechanism by which locations were selected for data collection (i.e., the sampling design) can result in erroneous model-based conclusions. Therefore, to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information, in addition to the data themselves, must be available to analysts. Details for constructing the weights used in estimation and code for implementation are provided. This article is protected by copyright. All rights reserved.
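
    The weighting idea can be demonstrated with a deliberately simplified stand-in: a design-weighted logistic regression, where the weights are inverse inclusion probabilities. The sketch below only illustrates why ignoring a non-ignorable design biases a species-environment slope; it omits the detection component of a true occupancy model and is not the authors' estimator.

        # Inverse-inclusion-probability weighting under a non-ignorable design.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(5)
        n = 20000
        elev = rng.normal(size=n)                           # environmental covariate
        occ = rng.random(n) < 1 / (1 + np.exp(-elev))       # true slope = 1.0

        # Non-ignorable design: occupied high-elevation sites are far more likely
        # to be surveyed (e.g., legacy sites chosen where the species was known).
        pi = np.where(occ & (elev > 0), 0.8, 0.1)           # inclusion probabilities
        sampled = rng.random(n) < pi

        X, y, w = elev[sampled][:, None], occ[sampled], 1.0 / pi[sampled]
        unweighted = LogisticRegression().fit(X, y)
        weighted = LogisticRegression().fit(X, y, sample_weight=w)
        # the weighted fit should land near the true slope of 1.0
        print("unweighted slope:", round(unweighted.coef_[0][0], 2))
        print("  weighted slope:", round(weighted.coef_[0][0], 2))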

  14. Brucella endocarditis in a non-endemic area presenting as pyrexia of unknown origin

    PubMed Central

    Manade, Vivek Vilas; Kakrani, Arjun; Gadage, Siddharth Narayan; Misra, Rabindra

    2014-01-01

    A 67-year-old man with type 2 diabetes mellitus and hypertension of 7 years' duration presented with a 3-month history of low-grade fever and malaise. Cardiac auscultation revealed an ejection systolic murmur in the primary aortic area. Most of the investigations for febrile illness were reported as normal. A two-dimensional (2D) echocardiogram revealed a calcified aortic valve with mild aortic stenosis. In view of the prolonged fever and the calcified aortic valve with mild aortic stenosis, a transoesophageal echocardiogram was performed, which showed a small (about 2.2 mm), freely mobile vegetation on the right coronary cusp. Blood cultures were positive for Brucella spp. from all three venepuncture sites. Medical therapy for brucellosis with ciprofloxacin, doxycycline, co-trimoxazole and streptomycin resulted in complete recovery. Brucella endocarditis is a rare, often overlooked and missed clinical infection. It requires a high index of clinical suspicion for prompt diagnosis and treatment. PMID:25239983

  15. A Review On Missing Value Estimation Using Imputation Algorithm

    NASA Astrophysics Data System (ADS)

    Armina, Roslan; Zain, Azlan Mohd; Azizah Ali, Nor; Sallehuddin, Roselina

    2017-09-01

    The presence of missing values in a data set has always been a major problem for precise prediction. A method for imputing missing values needs to minimize the effect of incomplete data sets on the prediction model. Many algorithms have been proposed to counter the missing value problem. In this review, we provide a comprehensive analysis of existing imputation algorithms, focusing on the techniques used and on whether global or local information in the data set is exploited for missing value estimation. Validation methods for imputation results and ways to measure the performance of imputation algorithms are also described. The objective of this review is to highlight possible improvements to existing methods, and it is hoped that it gives readers a better understanding of trends in imputation methods.

  16. Missing Value Monitoring Enhances the Robustness in Proteomics Quantitation.

    PubMed

    Matafora, Vittoria; Corno, Andrea; Ciliberto, Andrea; Bachi, Angela

    2017-04-07

    In global proteomic analysis, protein abundances are estimated to span from millions of copies per cell to fewer than 100. The challenge of protein quantitation by classic shotgun proteomic techniques lies in the missing values among peptides belonging to low-abundance proteins, which lower inter-run reproducibility and compromise downstream statistical analysis. Here, we present a new analytical workflow, MvM (missing value monitoring), able to recover the quantitation of missing values generated by shotgun analysis. In particular, we used confident data-dependent acquisition (DDA) quantitation only for proteins measured in all runs, while we filled the missing values with data-independent acquisition analysis using the library previously generated in DDA. We analyzed cell cycle regulated proteins, as they are low-abundance proteins with highly dynamic expression levels. Indeed, we found that cell cycle related proteins are the major components of the missing-value-rich proteome. Using the MvM workflow, we doubled the number of robustly quantified cell cycle related proteins and reduced the number of missing values, achieving robust quantitation for proteins over ∼50 molecules per cell. MvM allows lower quantification variance among replicates for low-abundance proteins with respect to DDA analysis, which demonstrates the potential of this novel workflow to measure low-abundance, dynamically regulated proteins.

  17. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  18. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  19. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  20. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  1. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  2. Family Annualized Cost of Leaving: The Household as the Decision Unit in Military Retention

    DTIC Science & Technology

    1990-05-01

    the equation shown in the text. Note that the advantage of this approach is that the family is "on" its demand curve for leisure or non... the nonmember spouse while the family remains in the Army, HA. It is the spouse’s labor supply equation and is a function of the spouse’s market wage...typically, overstate losses because it ignores the spouse’s value

  3. Missing data exploration: highlighting graphical presentation of missing pattern

    PubMed Central

    2015-01-01

    Functions shipped with base R can fulfill many missing data handling tasks. However, because the data volume of electronic medical record (EMR) systems is typically very large, more sophisticated methods may be helpful in data management. This article focuses on missing data handling using advanced techniques. There are three types of missing data: missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how the missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore the missing data pattern. In particular, the VIM package is especially helpful for visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missingness on other variables. Such information is useful in subsequent imputations. PMID:26807411

  4. Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.

    PubMed

    Ritz, Cecilia; Edén, Patrik

    2008-01-19

    For 2-dye microarray platforms, some missing values may arise from un-measurably low RNA expression in one channel only. Information about such "one-channel depletion" has so far not been included in algorithms for imputing missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation systematically biases the imputed expression values of one-channel-depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates was reduced by up to 51%, depending on the dataset. By including more information in the imputation step, we more accurately estimate missing expression values.

  5. Order-restricted inference for means with missing values.

    PubMed

    Wang, Heng; Zhong, Ping-Shou

    2017-09-01

    Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.

  6. GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

    PubMed Central

    Jia, Erik; Chen, Tianlu

    2018-01-01

    Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics; thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large-scale metabolomics datasets. The R code for GSimp, the evaluation pipeline, a tutorial, and the real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
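
    A bare-bones version of left-censored imputation can be written with a truncated normal draw below the detection limit. The sketch below is far simpler than GSimp's Gibbs sampler (the mean and SD are estimated naively from the observed values, which is upward-biased), and is meant only to make the MNAR setting concrete.

        # Left-censored (below-LOD) imputation via a truncated normal draw.
        import numpy as np
        from scipy.stats import truncnorm

        rng = np.random.default_rng(9)
        true = rng.normal(5.0, 1.0, 300)
        lod = 4.0
        x = np.where(true < lod, np.nan, true)      # values below the LOD are censored

        mu, sd = np.nanmean(x), np.nanstd(x)        # naive, upward-biased estimates
        n_miss = int(np.isnan(x).sum())
        a, b = -np.inf, (lod - mu) / sd             # truncate to (-inf, LOD)
        draws = truncnorm.rvs(a, b, loc=mu, scale=sd, size=n_miss, random_state=rng)

        x_imp = x.copy()
        x_imp[np.isnan(x_imp)] = draws
        print("true mean:", true.mean().round(2), "| imputed mean:", x_imp.mean().round(2))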

  7. New Insights into Handling Missing Values in Environmental Epidemiological Studies

    PubMed Central

    Roda, Célina; Nicolis, Ioannis; Momas, Isabelle; Guihenneuc, Chantal

    2014-01-01

    Missing data are unavoidable in environmental epidemiologic surveys. The aim of this study was to compare methods for handling large amounts of missing values: omission of missing values, single and multiple imputation (through linear regression or partial least squares regression), and a fully Bayesian approach. These methods were applied to the PARIS birth cohort, where indoor domestic pollutant measurements were performed in a random sample of babies' dwellings. A simulation study was conducted to assess the performance of the different approaches with a high proportion of missing values (from 50% to 95%). Different simulation scenarios were carried out, controlling the true value of the association (odds ratios of 1.0, 1.2, and 1.4) and varying the health outcome prevalence. When a large amount of data was missing, omitting the missing data reduced statistical power and inflated standard errors, which affected the significance of the association. Single imputation underestimated the variability and considerably increased the risk of type I error. All approaches were conservative, except the Bayesian joint model. In the case of a common health outcome, the fully Bayesian approach was the most efficient (low root mean square error, reasonable type I error, and high statistical power); for a less prevalent event, however, its type I error was increased and its statistical power reduced. The estimated posterior distribution of the OR is useful for refining the conclusion. Among the methods for handling missing values, no approach is universally best, but when the usual approaches (e.g., single imputation) are not sufficient, jointly modelling the missingness process and the health association is more efficient when large amounts of data are missing. PMID:25226278

  8. Data estimation and prediction for natural resources public data

    Treesearch

    Hans T. Schreuder; Robin M. Reich

    1998-01-01

    A key product of both Forest Inventory and Analysis (FIA) of the USDA Forest Service and the Natural Resources Inventory (NRI) of the Natural Resources Conservation Service is a scientific data base that should be defensible in court. Multiple imputation procedures (MIPs) have been proposed both for missing value estimation and prediction of non-remeasured cells in...

  9. Instrumental Variable Methods for Continuous Outcomes That Accommodate Nonignorable Missing Baseline Values.

    PubMed

    Ertefaie, Ashkan; Flory, James H; Hennessy, Sean; Small, Dylan S

    2017-06-15

    Instrumental variable (IV) methods provide unbiased treatment effect estimation in the presence of unmeasured confounders under certain assumptions. To provide valid estimates of the treatment effect, treatment effect confounders that are associated with the IV (IV-confounders) must be included in the analysis, and excluding observations with missing values may lead to bias. Missing covariate data are particularly problematic when the probability that a value is missing is related to the value itself, which is known as nonignorable missingness; in such cases, imputation-based methods are biased. Using health-care provider preference as the IV, we propose a 2-step procedure for estimating a valid treatment effect in the presence of baseline variables with nonignorable missing values. First, the provider preference IV value is estimated by performing a complete-case analysis using a random-effects model that includes the IV-confounders. Second, the treatment effect is estimated using a 2-stage least squares IV approach that excludes IV-confounders with missing values. Simulation results are presented, and the method is applied to an analysis comparing the effects of sulfonylureas versus metformin on body mass index, where the baseline body mass index and glycosylated hemoglobin variables have missing values. Our result supports the association of sulfonylureas with weight gain. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
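
    The second stage of the procedure is standard two-stage least squares. The NumPy sketch below shows 2SLS on synthetic data with a binary provider-preference instrument; the data-generating values are assumptions for illustration, not the paper's application.

        # Two-stage least squares versus a naive (confounded) regression.
        import numpy as np

        rng = np.random.default_rng(6)
        n = 5000
        u = rng.normal(size=n)                              # unmeasured confounder
        z = rng.integers(0, 2, size=n).astype(float)        # instrument: provider preference
        treat = (0.8 * z + u + rng.normal(size=n) > 0.8).astype(float)
        y = 2.0 * treat + 1.5 * u + rng.normal(size=n)      # true treatment effect = 2.0

        def ols(X, y):
            return np.linalg.lstsq(X, y, rcond=None)[0]

        X1 = np.column_stack([np.ones(n), z])
        treat_hat = X1 @ ols(X1, treat)                     # stage 1: predict treatment from IV
        X2 = np.column_stack([np.ones(n), treat_hat])
        beta = ols(X2, y)                                   # stage 2: outcome on prediction

        naive = ols(np.column_stack([np.ones(n), treat]), y)
        print("naive estimate:", naive[1].round(2), "| 2SLS estimate:", beta[1].round(2))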

  10. Dealing with gene expression missing data.

    PubMed

    Brás, L P; Menezes, J C

    2006-05-01

    A comparative evaluation of different methods for estimating missing values in microarray data is presented: weighted K-nearest neighbours imputation (KNNimpute), regression-based methods such as local least squares imputation (LLSimpute) and partial least squares imputation (PLSimpute), and Bayesian principal component analysis (BPCA). The influence on prediction accuracy of several factors, such as the methods' parameters, the type of data relationships used in the estimation process (i.e., row-wise, column-wise or both), the missing rate and pattern, and the type of experiment [time series (TS), non-time series (NTS) or mixed (MIX) experiments], is elucidated. Improvements based on the iterative use of data (iterative LLS and PLS imputation: ILLSimpute and IPLSimpute), on the need to perform initial imputations (modified PLS and Helland PLS imputation: MPLSimpute and HPLSimpute), and on the type of relationships employed (KNNarray, LLSarray, HPLSarray and alternating PLS, APLSimpute) are proposed. Overall, it is shown that dataset properties (type of experiment, missing rate and pattern) affect the data similarity structure and therefore influence the methods' performance. LLSimpute and ILLSimpute are preferable in the presence of data with a stronger similarity structure (TS and MIX experiments), whereas PLS-based methods (MPLSimpute, IPLSimpute and APLSimpute) are preferable when estimating NTS missing data.

  11. Methods for using clinical laboratory test results as baseline confounders in multi-site observational database studies when missing data are expected.

    PubMed

    Raebel, Marsha A; Shetterly, Susan; Lu, Christine Y; Flory, James; Gagne, Joshua J; Harrell, Frank E; Haynes, Kevin; Herrinton, Lisa J; Patorno, Elisabetta; Popovic, Jennifer; Selvan, Mano; Shoaibi, Azadeh; Wang, Xingmei; Roy, Jason

    2016-07-01

    Our purpose was to quantify missing baseline laboratory results, assess predictors of missingness, and examine performance of missing data methods. Using the Mini-Sentinel Distributed Database from three sites, we selected three exposure-outcome scenarios with laboratory results as baseline confounders. We compared hazard ratios (HRs) or risk differences (RDs) and 95% confidence intervals (CIs) from models that omitted laboratory results, included only available results (complete cases), and included results after applying missing data methods (multiple imputation [MI] regression, MI predictive mean matching [PMM] indicator). Scenario 1 considered glucose among second-generation antipsychotic users and diabetes. Across sites, glucose was available for 27.7-58.9%. Results differed between complete case and missing data models (e.g., olanzapine: HR 0.92 [CI 0.73, 1.12] vs 1.02 [0.90, 1.16]). Across-site models employing different MI approaches provided similar HR and CI; site-specific models provided differing estimates. Scenario 2 evaluated creatinine among individuals starting high versus low dose lisinopril and hyperkalemia. Creatinine availability: 44.5-79.0%. Results differed between complete case and missing data models (e.g., HR 0.84 [CI 0.77, 0.92] vs. 0.88 [0.83, 0.94]). HR and CI were identical across MI methods. Scenario 3 examined international normalized ratio (INR) among warfarin users starting interacting versus noninteracting antimicrobials and bleeding. INR availability: 20.0-92.9%. Results differed between ignoring INR versus including INR using missing data methods (e.g., RD 0.05 [CI -0.03, 0.13] vs 0.09 [0.00, 0.18]). Indicator and PMM methods gave similar estimates. Multi-site studies must consider site variability in missing data. Different missing data methods performed similarly. Copyright © 2016 John Wiley & Sons, Ltd.
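
    Predictive mean matching, one of the MI methods compared above, can be sketched compactly: regress the incomplete variable on observed covariates, then fill each missing entry with an observed value borrowed from a donor whose prediction is closest. The Python sketch below is a single-pass illustration; real MI-PMM repeats this across imputations with parameter draws.

        # A compact predictive-mean-matching (PMM) pass for one incomplete variable.
        import numpy as np

        def pmm_impute(x, y, k=5, rng=None):
            """Fill missing y using donors whose predicted y is closest."""
            rng = rng or np.random.default_rng()
            obs = ~np.isnan(y)
            X = np.column_stack([np.ones_like(x), x])
            beta = np.linalg.lstsq(X[obs], y[obs], rcond=None)[0]
            pred = X @ beta
            y_out = y.copy()
            for i in np.flatnonzero(~obs):
                donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
                y_out[i] = y[obs][rng.choice(donors)]   # borrow an observed value
            return y_out

        rng = np.random.default_rng(7)
        x = rng.normal(size=200)
        y = 3 + 2 * x + rng.normal(size=200)
        y[rng.random(200) < 0.25] = np.nan
        print("imputed mean:", np.nanmean(pmm_impute(x, y, rng=rng)).round(2))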

  12. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    PubMed

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
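
    The contrast the authors draw can be made concrete: complete case analysis silently discards rows, while multiple imputation retains them all. A minimal sketch using scikit-learn's IterativeImputer as a stand-in MICE-style engine; the column names and the summary statistic are placeholders, and all columns are assumed numeric.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the import below)
from sklearn.impute import IterativeImputer

def compare_strategies(df, cols=("albumin", "hematocrit"), m=20):
    """Contrast complete-case analysis with m stochastic imputations."""
    cc = df.dropna()  # complete cases: every row with any NaN is lost
    print(f"complete cases retain {len(cc)} of {len(df)} rows")

    draws = []
    for seed in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=seed)
        filled = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
        draws.append(filled[list(cols)].mean())  # stand-in for a fitted model
    # average across imputations (a real analysis would pool with Rubin's rules)
    return pd.concat(draws, axis=1).mean(axis=1)
```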

  13. A context-intensive approach to imputation of missing values in data sets from networks of environmental monitors.

    PubMed

    Larsen, Lawrence C; Shah, Mena

    2016-01-01

    Although networks of environmental monitors are constantly improving through advances in technology and management, instances of missing data still occur. Many methods of imputing values for missing data are available, but they are often difficult to use or produce unsatisfactory results. I-Bot (short for "Imputation Robot") is a context-intensive approach to the imputation of missing data in data sets from networks of environmental monitors. I-Bot is easy to use and routinely produces imputed values that are highly reliable. I-Bot is described and demonstrated using more than 10 years of California data for daily maximum 8-hr ozone, 24-hr PM2.5 (particulate matter with an aerodynamic diameter <2.5 μm), mid-day average surface temperature, and mid-day average wind speed. I-Bot performance is evaluated by imputing values for observed data as if they were missing, and then comparing the imputed values with the observed values. In many cases, I-Bot is able to impute values for long periods with missing data, such as a week, a month, a year, or even longer. Qualitative visual methods and standard quantitative metrics demonstrate the effectiveness of the I-Bot methodology. Many resources are expended every year to analyze and interpret data sets from networks of environmental monitors. A large fraction of those resources is used to cope with difficulties due to the presence of missing data. The I-Bot method of imputing values for such missing data may help convert incomplete data sets into virtually complete data sets that facilitate the analysis and reliable interpretation of vital environmental data.

  14. Accounting for Dependence Induced by Weighted KNN Imputation in Paired Samples, Motivated by a Colorectal Cancer Study

    PubMed Central

    Suyundikov, Anvar; Stevens, John R.; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K.; Slattery, Martha L.

    2015-01-01

    Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control. PMID:25849489

  15. Model specification and bootstrapping for multiply imputed data: An application to count models for the frequency of alcohol use

    PubMed Central

    Comulada, W. Scott

    2015-01-01

    Stata’s mi commands provide powerful tools to conduct multiple imputation in the presence of ignorable missing data. In this article, I present Stata code to extend the capabilities of the mi commands to address two areas of statistical inference where results are not easily aggregated across imputed datasets. First, mi commands are restricted to covariate selection. I show how to address model fit to correctly specify a model. Second, the mi commands readily aggregate model-based standard errors. I show how standard errors can be bootstrapped for situations where model assumptions may not be met. I illustrate model specification and bootstrapping on frequency counts for the number of times that alcohol was consumed in data with missing observations from a behavioral intervention. PMID:26973439
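
    The article's code is written in Stata; purely as a language-neutral illustration, the sketch below shows one common boot-then-impute recipe (resample rows, impute within each resample, refit, take percentile intervals). The function names and the division of labour between `impute` and `fit` are our own assumptions.

```python
import numpy as np

def bootstrap_mi(df, impute, fit, n_boot=200, seed=0):
    """Bootstrap with imputation: resample rows, impute each resample, refit.

    `impute` maps an incomplete DataFrame to a completed one; `fit` maps a
    completed DataFrame to a scalar estimate (e.g., a count-model coefficient).
    """
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        boot = df.sample(len(df), replace=True,
                         random_state=int(rng.integers(2**31)))
        stats.append(fit(impute(boot)))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return float(np.mean(stats)), (lo, hi)
```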

  16. Recurrent Neural Networks for Multivariate Time Series with Missing Values.

    PubMed

    Che, Zhengping; Purushotham, Sanjay; Cho, Kyunghyun; Sontag, David; Liu, Yan

    2018-04-17

    Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments of time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
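
    GRU-D's two representations of missingness can be precomputed directly from the raw series: a binary observation mask and, per variable, the time elapsed since that variable was last observed. A numpy sketch under our own naming:

```python
import numpy as np

def grud_inputs(X, timestamps):
    """Build GRU-D style inputs: masking matrix and time-interval matrix.

    X: (T, D) array with NaN for missing entries; timestamps: (T,) array.
    mask[t, d] = 1 if variable d is observed at step t; delta[t, d] = time
    since variable d was last observed (0 at t = 0).
    """
    T, D = X.shape
    mask = (~np.isnan(X)).astype(float)
    delta = np.zeros((T, D))
    for t in range(1, T):
        gap = timestamps[t] - timestamps[t - 1]
        # if the previous value was observed, the interval resets to one gap;
        # otherwise it keeps accumulating
        delta[t] = np.where(mask[t - 1] == 1, gap, gap + delta[t - 1])
    return mask, delta
```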

  17. Lonely Skies: Air-to-Air Training for a 5th Generation Fighter Force

    DTIC Science & Technology

    2015-06-01

    [List-of-figures fragments from the source document: "Missing Attitude Indicator"; "Lt James Doolittle during Blind Flight Test"; "An Early Link Trainer Cockpit"] ...during visual flight because it deceived pilots about the actual aircraft attitude and acceleration. 1st Lt James Doolittle used Doctor David Meyers...flying pioneer and leader, Doolittle believed that pilots should learn to ignore their physical sense of motion while flying blind and to trust their

  18. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    PubMed

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
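
    The similarity measure at the core of this approach is dynamic time warping. A textbook dynamic-programming DTW distance is sketched below as that building block; the paper's actual DTW-GO measure additionally folds in Gene Ontology similarity.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance between two
    expression profiles, the similarity used to pick imputation candidates
    and candidate regulatory partners for time-series genes."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```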

  19. [Study on correction of data bias caused by different missing mechanisms in survey of medical expenditure among students enrolling in Urban Resident Basic Medical Insurance].

    PubMed

    Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong

    2015-05-01

    A study of medical expenditure and its influencing factors among students enrolling in Urban Resident Basic Medical Insurance (URBMI) in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing mechanism, this study proposes a two-stage method that deals with both mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1 190 questionnaires were returned by the students (or their parents) selected in child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. In the returned questionnaires, the dependent variable was not missing at random (NMAR) in 2.52% of cases and missing at random (MAR) in 7.14%. First, multiple imputation was conducted for the MAR values using the completed data; then a sample selection model was used to correct for NMAR in the multiply imputed data, and a multi-factor analysis model was established. Based on 1 000 resamplings, the best scheme for filling the randomly missing values was the predictive mean matching (PMM) method under the observed missing proportion. With this optimal scheme, the two-stage analysis was conducted. It was found that the influencing factors on annual medical expenditure among the students enrolling in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for certain reasons, self-medication and acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can effectively address non-response bias and selection bias in the dependent variable of survey data.

  20. Optical design for consumer products

    NASA Astrophysics Data System (ADS)

    Gupta, Anurag

    2014-10-01

    Optical engineers often limit their focus to meeting the provided targets on performance and geometry and assume that the specifications are largely non-negotiable. Such an approach ignores the value proposition behind the product and the challenges associated with overall product design, manufacturing, business development and legal issues. As a result, the design effort can be expensive and time consuming and can result in product failure. We discuss a product-based systems engineering approach that leads to an application-specific optical design that is more effective and efficient to implement.

  1. Self-reported weight and predictors of missing responses in youth.

    PubMed

    Aceves-Martins, Magaly; Whitehead, Ross; Inchley, Jo; Giralt, Montse; Currie, Candace; Solà, Rosa

    2018-02-12

    The aims of the present manuscript are to analyse self-reported data on weight, including the missing data, from the 2014 Scottish Health Behaviour in School-Aged Children (HBSC) Study, and to investigate whether behavioural factors related to overweight and obesity, namely dietary habits, physical activity and sedentary behaviour, are associated with weight non-response. 10839 11-, 13- and 15-year-olds participated in the cross-national 2014 Scottish HBSC Study. Missingness in the weight data was evaluated using Little's Missing Completely at Random (MCAR) test. A multivariate logistic regression model was then fitted to determine the multivariate associations between weight response and each of the behavioural factors related to obesity. Self-reported weight was missing for 58.9% of participants, and the data were not missing completely at random (MCAR test p < 0.001). Weight was self-reported less frequently by girls (19.2%) than by boys (21.9%). Participants who reported low physical activity (OR 1.2, p < 0.001), low vegetable consumption (OR 1.24, p < 0.001) and high computer gaming on weekdays (OR 1.18, p = 0.003) were more likely not to report their weight. There are groups of young people in Scotland who are less likely to report their weight. Their weight status may be of the greatest concern because of their poorer health profile, based on the key behaviours associated with their non-response. Furthermore, knowing the value of a healthy weight and reinforcing healthy lifestyle messages may help raise youth awareness of how diet, physical activity and sedentary behaviours can influence weight. Copyright © 2018 Elsevier Inc. All rights reserved.
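
    Modelling non-response itself as the outcome, as done here, takes only a few lines in any statistics stack. A sketch with statsmodels; the file name and predictor columns are hypothetical stand-ins for the HBSC variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hbsc_2014.csv")  # hypothetical extract of the survey data
df["weight_missing"] = df["weight"].isna().astype(int)

# Logistic regression of the missingness indicator on behavioural factors;
# exponentiate model.params to read off odds ratios like those in the abstract.
model = smf.logit(
    "weight_missing ~ sex + low_physical_activity + low_veg_intake + weekday_gaming",
    data=df,
).fit()
print(model.summary())
```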

  2. Adapting to an Uncertain World: Cognitive Capacity and Causal Reasoning with Ambiguous Observations

    PubMed Central

    Shou, Yiyun; Smithson, Michael

    2015-01-01

    Ambiguous causal evidence in which the covariance of the cause and effect is partially known is pervasive in real life situations. Little is known about how people reason about causal associations with ambiguous information and the underlying cognitive mechanisms. This paper presents three experiments exploring the cognitive mechanisms of causal reasoning with ambiguous observations. Results revealed that the influence of ambiguous observations manifested by missing information on causal reasoning depended on the availability of cognitive resources, suggesting that processing ambiguous information may involve deliberative cognitive processes. Experiment 1 demonstrated that subjects did not ignore the ambiguous observations in causal reasoning. They also had a general tendency to treat the ambiguous observations as negative evidence against the causal association. Experiment 2 and Experiment 3 included a causal learning task requiring a high cognitive demand in which paired stimuli were presented to subjects sequentially. Both experiments revealed that processing ambiguous or missing observations can depend on the availability of cognitive resources. Experiment 2 suggested that the contribution of working memory capacity to the comprehensiveness of evidence retention was reduced when there were ambiguous or missing observations. Experiment 3 demonstrated that an increase in cognitive demand due to a change in the task format reduced subjects’ tendency to treat ambiguous-missing observations as negative cues. PMID:26468653

  3. Bayesian sensitivity analysis methods to evaluate bias due to misclassification and missing data using informative priors and external validation data.

    PubMed

    Luta, George; Ford, Melissa B; Bondy, Melissa; Shields, Peter G; Stamey, James D

    2013-04-01

    Recent research suggests that the Bayesian paradigm may be useful for modeling biases in epidemiological studies, such as those due to misclassification and missing data. We used Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to the potential effect of these two important sources of bias. We used data from a study of the joint associations of radiotherapy and smoking with primary lung cancer among breast cancer survivors. We used Bayesian methods to provide an operational way to combine both validation data and expert opinion to account for misclassification of the two risk factors and missing data. For comparative purposes we considered a "full model" that allowed for both misclassification and missing data, along with alternative models that considered only misclassification or missing data, and the naïve model that ignored both sources of bias. We identified noticeable differences between the four models with respect to the posterior distributions of the odds ratios that described the joint associations of radiotherapy and smoking with primary lung cancer. Despite those differences we found that the general conclusions regarding the pattern of associations were the same regardless of the model used. Overall our results indicate a nonsignificantly decreased lung cancer risk due to radiotherapy among nonsmokers, and a mildly increased risk among smokers. We described easy to implement Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to misclassification and missing data. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Meta‐analysis of test accuracy studies using imputation for partial reporting of multiple thresholds

    PubMed Central

    Deeks, J.J.; Martin, E.C.; Riley, R.D.

    2017-01-01

    Introduction For tests reporting continuous results, primary studies usually provide test performance at multiple but often different thresholds. This creates missing data when performing a meta‐analysis at each threshold. A standard meta‐analysis (no imputation [NI]) ignores such missing data. A single imputation (SI) approach was recently proposed to recover missing threshold results. Here, we propose a new method that performs multiple imputation of the missing threshold results using discrete combinations (MIDC). Methods The new MIDC method imputes missing threshold results by randomly selecting from the set of all possible discrete combinations which lie between the results for 2 known bounding thresholds. Imputed and observed results are then synthesised at each threshold. This is repeated multiple times, and the multiple pooled results at each threshold are combined using Rubin's rules to give final estimates. We compared the NI, SI, and MIDC approaches via simulation. Results Both imputation methods outperform the NI method in simulations. There was generally little difference in the SI and MIDC methods, but the latter was noticeably better in terms of estimating the between‐study variances and generally gave better coverage, due to slightly larger standard errors of pooled estimates. Given selective reporting of thresholds, the imputation methods also reduced bias in the summary receiver operating characteristic curve. Simulations demonstrate the imputation methods rely on an equal threshold spacing assumption. A real example is presented. Conclusions The SI and, in particular, MIDC methods can be used to examine the impact of missing threshold results in meta‐analysis of test accuracy studies. PMID:29052347
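
    Combining the pooled results "using Rubin's rules", as described above, reduces to a short formula: the total variance is the mean within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). A sketch:

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m imputed-data estimates with Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B the between-imputation variance of the estimates.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()
    W = u.mean()              # within-imputation variance
    B = q.var(ddof=1)         # between-imputation variance
    T = W + (1 + 1 / m) * B
    return qbar, T
```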

  5. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review

    PubMed Central

    Mercieca-Bebber, Rebecca; Palmer, Michael J; Brundage, Michael; Stockler, Martin R; King, Madeleine T

    2016-01-01

    Objectives Patient-reported outcomes (PROs) provide important information about the impact of treatment from the patients' perspective. However, missing PRO data may compromise the interpretability and value of the findings. We aimed to report: (1) a non-technical summary of problems caused by missing PRO data; and (2) a systematic review by collating strategies to: (A) minimise rates of missing PRO data, and (B) facilitate transparent interpretation and reporting of missing PRO data in clinical research. Our systematic review does not address statistical handling of missing PRO data. Data sources MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases (inception to 31 March 2015), and citing articles and reference lists from relevant sources. Eligibility criteria English articles providing recommendations for reducing missing PRO data rates, or strategies to facilitate transparent interpretation and reporting of missing PRO data were included. Methods 2 reviewers independently screened articles against eligibility criteria. Discrepancies were resolved with the research team. Recommendations were extracted and coded according to framework synthesis. Results 117 sources (55% discussion papers, 26% original research) met the eligibility criteria. Design and methodological strategies for reducing rates of missing PRO data included: incorporating PRO-specific information into the protocol; carefully designing PRO assessment schedules and defining termination rules; minimising patient burden; appointing a PRO coordinator; PRO-specific training for staff; ensuring PRO studies are adequately resourced; and continuous quality assurance. Strategies for transparent interpretation and reporting of missing PRO data include utilising auxiliary data to inform analysis; transparently reporting baseline PRO scores, rates and reasons for missing data; and methods for handling missing PRO data. Conclusions The instance of missing PRO data and its potential to bias clinical research can be minimised by implementing thoughtful design, rigorous methodology and transparent reporting strategies. All members of the research team have a responsibility in implementing such strategies. PMID:27311907

  6. "Wish You Were Here": Examining Characteristics, Outcomes, and Statistical Solutions for Missing Cases in Web-Based Psychotherapeutic Trials.

    PubMed

    Karin, Eyal; Dear, Blake F; Heller, Gillian Z; Crane, Monique F; Titov, Nickolai

    2018-04-19

    Missing cases following treatment are common in Web-based psychotherapy trials. Without the ability to directly measure the outcomes of missing cases, evaluating the effects of treatment is challenging. Although common, little is known about the characteristics of Web-based psychotherapy participants who present as missing cases, their likely clinical outcomes, or the suitability of different statistical assumptions that can characterize missing cases. Using a large sample of individuals who underwent Web-based psychotherapy for depressive symptoms (n=820), the aim of this study was to explore the characteristics of cases who present as missing cases at posttreatment (n=138) and their likely treatment outcomes, and to compare statistical methods for replacing their missing data. First, common participant and treatment features were tested through binary logistic regression models, evaluating the ability to predict missing cases. Second, the same variables were screened for their ability to increase or impede the rate of symptom change observed following treatment. Third, using recontacted cases at 3-month follow-up to proximally represent the outcomes of missing cases following treatment, various simulated replacement scores were compared and evaluated against observed clinical follow-up scores. Missing cases were dominantly predicted by lower treatment adherence and increased symptoms at pretreatment. Statistical methods that ignore these characteristics can overlook an important clinical phenomenon and consequently produce inaccurate replacement outcomes, with symptom estimates that can deviate by -32% to 70% from the observed outcomes of recontacted cases. In contrast, longitudinal statistical methods that adjusted their estimates for the outcomes of missing cases by treatment adherence rates and baseline symptom scores resulted in minimal measurement bias (<8%). Certain variables can characterize and predict the likelihood of missing cases and jointly predict lesser clinical improvement. Under such circumstances, individuals with potentially the worst treatment outcomes can become concealed, and failure to adjust for this can lead to substantial clinical measurement bias. Together, this preliminary research suggests that missing cases in Web-based psychotherapeutic interventions may not occur as random events and can be systematically predicted. Critically, at the same time, missing cases may experience outcomes that are distinct and important for a complete understanding of the treatment effect. ©Eyal Karin, Blake F Dear, Gillian Z Heller, Monique F Crane, Nickolai Titov. Originally published in JMIR Mental Health (http://mental.jmir.org), 19.04.2018.

  7. “Wish You Were Here”: Examining Characteristics, Outcomes, and Statistical Solutions for Missing Cases in Web-Based Psychotherapeutic Trials

    PubMed Central

    Dear, Blake F; Heller, Gillian Z; Crane, Monique F; Titov, Nickolai

    2018-01-01

    Background Missing cases following treatment are common in Web-based psychotherapy trials. Without the ability to directly measure the outcomes of missing cases, evaluating the effects of treatment is challenging. Although common, little is known about the characteristics of Web-based psychotherapy participants who present as missing cases, their likely clinical outcomes, or the suitability of different statistical assumptions that can characterize missing cases. Objective Using a large sample of individuals who underwent Web-based psychotherapy for depressive symptoms (n=820), the aim of this study was to explore the characteristics of cases who present as missing cases at posttreatment (n=138) and their likely treatment outcomes, and to compare statistical methods for replacing their missing data. Methods First, common participant and treatment features were tested through binary logistic regression models, evaluating the ability to predict missing cases. Second, the same variables were screened for their ability to increase or impede the rate of symptom change observed following treatment. Third, using recontacted cases at 3-month follow-up to proximally represent the outcomes of missing cases following treatment, various simulated replacement scores were compared and evaluated against observed clinical follow-up scores. Results Missing cases were dominantly predicted by lower treatment adherence and increased symptoms at pretreatment. Statistical methods that ignore these characteristics can overlook an important clinical phenomenon and consequently produce inaccurate replacement outcomes, with symptom estimates that can deviate by −32% to 70% from the observed outcomes of recontacted cases. In contrast, longitudinal statistical methods that adjusted their estimates for the outcomes of missing cases by treatment adherence rates and baseline symptom scores resulted in minimal measurement bias (<8%). Conclusions Certain variables can characterize and predict the likelihood of missing cases and jointly predict lesser clinical improvement. Under such circumstances, individuals with potentially the worst treatment outcomes can become concealed, and failure to adjust for this can lead to substantial clinical measurement bias. Together, this preliminary research suggests that missing cases in Web-based psychotherapeutic interventions may not occur as random events and can be systematically predicted. Critically, at the same time, missing cases may experience outcomes that are distinct and important for a complete understanding of the treatment effect. PMID:29674311

  8. Probabilistic atlas based labeling of the cerebral vessel tree

    NASA Astrophysics Data System (ADS)

    Van de Giessen, Martijn; Janssen, Jasper P.; Brouwer, Patrick A.; Reiber, Johan H. C.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke

    2015-03-01

    Preoperative imaging of the cerebral vessel tree is essential for planning therapy on intracranial stenoses and aneurysms. Usually, a magnetic resonance angiography (MRA) or computed tomography angiography (CTA) is acquired from which the cerebral vessel tree is segmented. Accurate analysis is helped by the labeling of the cerebral vessels, but labeling is non-trivial due to anatomical topological variability and missing branches due to acquisition issues. In recent literature, labeling the cerebral vasculature around the Circle of Willis has mainly been approached as a graph-based problem. The most successful method, however, requires the definition of all possible permutations of missing vessels, which limits application to subsets of the tree and ignores spatial information about the vessel locations. This research aims to perform labeling using probabilistic atlases that model spatial vessel and label likelihoods. A cerebral vessel tree is aligned to a probabilistic atlas and subsequently each vessel is labeled by computing the maximum label likelihood per segment from label-specific atlases. The proposed method was validated on 25 segmented cerebral vessel trees. Labeling accuracies were close to 100% for large vessels, but dropped to 50-60% for small vessels that were only present in less than 50% of the set. With this work we showed that using solely spatial information of the vessel labels, vessel segments from stable vessels (>50% presence) were reliably classified. This spatial information will form the basis for a future labeling strategy with a very loose topological model.

  9. Missing value imputation for gene expression data by tailored nearest neighbors.

    PubMed

    Faisal, Shahla; Tutz, Gerhard

    2017-04-25

    High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed, but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes, the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods such as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.

  10. Missing data in FFQs: making assumptions about item non-response.

    PubMed

    Lamb, Karen E; Olstad, Dana Lee; Nguyen, Cattram; Milte, Catherine; McNaughton, Sarah A

    2017-04-01

    FFQs are a popular method of capturing dietary information in epidemiological studies and may be used to derive dietary exposures such as nutrient intake or overall dietary patterns and diet quality. As FFQs can involve large numbers of questions, participants may fail to respond to all questions, leaving researchers to decide how to deal with missing data when deriving intake measures. The aim of the present commentary is to discuss the current practice for dealing with item non-response in FFQs and to propose a research agenda for reporting and handling missing data in FFQs. Single imputation techniques, such as zero imputation (assuming no consumption of the item) or mean imputation, are commonly used to deal with item non-response in FFQs. However, single imputation methods make strong assumptions about the missing data mechanism and do not reflect the uncertainty created by the missing data. This can lead to incorrect inference about associations between diet and health outcomes. Although the use of multiple imputation methods in epidemiology has increased, these have seldom been used in the field of nutritional epidemiology to address missing data in FFQs. We discuss methods for dealing with item non-response in FFQs, highlighting the assumptions made under each approach. Researchers analysing FFQs should ensure that missing data are handled appropriately and clearly report how missing data were treated in analyses. Simulation studies are required to enable systematic evaluation of the utility of various methods for handling item non-response in FFQs under different assumptions about the missing data mechanism.
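
    The two single-imputation conventions the commentary questions encode very different assumptions about non-response, which a two-line contrast makes plain (the file and item layout are hypothetical, with numeric item columns assumed):

```python
import pandas as pd

ffq = pd.read_csv("ffq_items.csv")     # hypothetical item-level FFQ data
zero_filled = ffq.fillna(0)            # assumes non-response = never consumed
mean_filled = ffq.fillna(ffq.mean())   # assumes non-response = average intake
# Neither single imputation reflects uncertainty about the missing answers;
# multiple imputation would draw several plausible completions and pool results.
```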

  11. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    PubMed

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  12. Missing value imputation in DNA microarrays based on conjugate gradient method.

    PubMed

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the most similar ones. The CG algorithm with this subset as its input is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with the sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE across data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
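
    The NRMSE criterion used for these comparisons is computed over entries that were hidden for evaluation. Normalization conventions vary; the sketch below divides by the standard deviation of the true values, one common choice.

```python
import numpy as np

def nrmse(imputed, truth, mask):
    """Normalized RMSE over artificially masked entries, the usual benchmark
    score for imputation methods; here normalized by the standard deviation
    of the true values (one common convention among several)."""
    err = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])
```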

  13. Meaning of Missing Values in Eyewitness Recall and Accident Records

    PubMed Central

    Uttl, Bob; Kisinger, Kelly

    2010-01-01

    Background Eyewitness recalls and accident records frequently do not mention the conditions and behaviors of interest to researchers and lead to missing values and to uncertainty about the prevalence of these conditions and behaviors surrounding accidents. Missing values may occur because eyewitnesses report the presence but not the absence of obvious clues/accident features. We examined this possibility. Methodology/Principal Findings Participants watched car accident videos and were asked to recall as much information as they could remember about each accident. The results showed that eyewitnesses were far more likely to report the presence of present obvious clues than the absence of absent obvious clues even though they were aware of their absence. Conclusions One of the principal mechanisms causing missing values may be eyewitnesses' tendency to not report the absence of obvious features. We discuss the implications of our findings for both retrospective and prospective analyses of accident records, and illustrate the consequences of adopting inappropriate assumptions about the meaning of missing values using the Avaluator Avalanche Accident Prevention Card. PMID:20824054

  14. Meaning of missing values in eyewitness recall and accident records.

    PubMed

    Uttl, Bob; Kisinger, Kelly

    2010-09-02

    Eyewitness recalls and accident records frequently do not mention the conditions and behaviors of interest to researchers and lead to missing values and to uncertainty about the prevalence of these conditions and behaviors surrounding accidents. Missing values may occur because eyewitnesses report the presence but not the absence of obvious clues/accident features. We examined this possibility. Participants watched car accident videos and were asked to recall as much information as they could remember about each accident. The results showed that eyewitnesses were far more likely to report the presence of present obvious clues than the absence of absent obvious clues even though they were aware of their absence. One of the principal mechanisms causing missing values may be eyewitnesses' tendency to not report the absence of obvious features. We discuss the implications of our findings for both retrospective and prospective analyses of accident records, and illustrate the consequences of adopting inappropriate assumptions about the meaning of missing values using the Avaluator Avalanche Accident Prevention Card.

  15. Selection-Fusion Approach for Classification of Datasets with Missing Values

    PubMed Central

    Ghannad-Rezaie, Mostafa; Soltanian-Zadeh, Hamid; Ying, Hao; Dong, Ming

    2010-01-01

    This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that classification accuracy of the proposed method is superior to those of the widely used multiple imputations method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. PMID:20212921
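
    The first step of the selection-fusion approach, grouping samples by which features they are missing, can be sketched directly; the paper then clusters these patterns and trains one classifier per resulting subset.

```python
import numpy as np

def missingness_patterns(X):
    """Group sample indices by their pattern of missing features, the
    starting point for training one classifier per (clustered) pattern."""
    patterns = {}
    for i, row in enumerate(X):
        key = tuple(np.isnan(row))        # boolean mask as a hashable key
        patterns.setdefault(key, []).append(i)
    return patterns
```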

  16. Accounting for unknown foster dams in the genetic evaluation of embryo transfer progeny.

    PubMed

    Suárez, M J; Munilla, S; Cantet, R J C

    2015-02-01

    Animals born by embryo transfer (ET) are usually not included in the genetic evaluation of beef cattle for preweaning growth if the recipient dam is unknown. This is primarily to avoid potential bias in the estimation of the unknown age of dam. We present a method that allows including records of calves with unknown age of dam. Assumptions are as follows: (i) foster cows belong to the same breed being evaluated, (ii) there is no correlation between the breeding value (BV) of the calf and the maternal BV of the recipient cow, and (iii) cows of all ages are used as recipients. We examine the issue of bias for the fixed level of unknown age of dam (AOD) and propose an estimator of the effect based on classical measurement error theory (MEM) and a Bayesian approach. Using stochastic simulation under random mating or selection, the MEM estimating equations were compared with BLUP in two situations as follows: (i) full information (FI); (ii) missing AOD information on some dams. Predictions of breeding value (PBV) from the FI situation had the smallest empirical average bias followed by PBV obtained without taking measurement error into account. In turn, MEM displayed the highest bias, although the differences were small. On the other hand, MEM showed the smallest MSEP, for either random mating or selection, followed by FI, whereas ignoring measurement error produced the largest MSEP. As a consequence from the smallest MSEP with a relatively small bias, empirical accuracies of PBV were larger for MEM than those for full information, which in turn showed larger accuracies than the situation ignoring measurement error. It is concluded that MEM equations are a useful alternative for analysing weaning weight data when recipient cows are unknown, as it mitigates the effects of bias in AOD by decreasing MSEP. © 2014 Blackwell Verlag GmbH.

  17. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.

  18. Economic values under inappropriate normal distribution assumptions.

    PubMed

    Sadeghi-Sefidmazgi, A; Nejati-Javaremi, A; Moradi-Shahrbabak, M; Miraei-Ashtiani, S R; Amer, P R

    2012-08-01

    The objectives of this study were to quantify the errors in economic values (EVs) for traits affected by cost or price thresholds when skewed or kurtotic distributions of varying degree are assumed to be normal and when data with a normal distribution is subject to censoring. EVs were estimated for a continuous trait with dichotomous economic implications because of a price premium or penalty arising from a threshold ranging between -4 and 4 standard deviations from the mean. In order to evaluate the impacts of skewness, positive and negative excess kurtosis, standard skew normal, Pearson and the raised cosine distributions were used, respectively. For the various evaluable levels of skewness and kurtosis, the results showed that EVs can be underestimated or overestimated by more than 100% when price determining thresholds fall within a range from the mean that might be expected in practice. Estimates of EVs were very sensitive to censoring or missing data. In contrast to practical genetic evaluation, economic evaluation is very sensitive to lack of normality and missing data. Although in some special situations, the presence of multiple thresholds may attenuate the combined effect of errors at each threshold point, in practical situations there is a tendency for a few key thresholds to dominate the EV, and there are many situations where errors could be compounded across multiple thresholds. In the development of breeding objectives for non-normal continuous traits influenced by value thresholds, it is necessary to select a transformation that will resolve problems of non-normality or consider alternative methods that are less sensitive to non-normality.

  19. 77 FR 60089 - Approval and Promulgation of Air Quality Implementation Plans; Delaware, New Jersey, and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-02

    ... quarter substitution test. "Collocated" indicates that the collocated data was substituted for missing... 24-hour standard design value is greater than the level of the standard. EPA addresses missing data... substituted for the missing data. In the maximum quarter test, maximum recorded values are substituted for the...

  20. Effect of data gaps on correlation dimension computed from light curves of variable stars

    NASA Astrophysics Data System (ADS)

    George, Sandip V.; Ambika, G.; Misra, R.

    2015-11-01

    Observational data, especially astrophysical data, is often limited by gaps that arise from a lack of observations for a variety of reasons. Such inadvertent gaps are usually smoothed over using interpolation techniques. However, the smoothing techniques can introduce artificial effects, especially when non-linear analysis is undertaken. We investigate how gaps can affect the computed values of the correlation dimension of the system, without using any interpolation. For this we introduce gaps artificially in synthetic data derived from standard chaotic systems, like the Rössler and Lorenz, with the frequency of occurrence and size of missing data drawn from two Gaussian distributions. Then we study the changes in correlation dimension with changes in the distributions of position and size of gaps. We find that for a considerable range of mean gap frequency and size, the value of the correlation dimension is not significantly affected, indicating that in such specific cases the calculated values can still be reliable and acceptable. Thus our study introduces a method of checking the reliability of computed correlation dimension values by calculating the distribution of gaps with respect to their size and position. This is illustrated for the data from light curves of three variable stars, R Scuti, U Monocerotis and SU Tauri. We also demonstrate how a cubic spline interpolation can cause a time series of Gaussian noise with missing data to be misinterpreted as being chaotic in origin. This is demonstrated for the non-chaotic light curve of variable star SS Cygni, which gives a saturated D2 value when interpolated using a cubic spline. We also find that a careful choice of binning, in addition to reducing noise, can help in shifting the gap distribution to the reliable range for D2 values.
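
    The paper's masking scheme, with gap counts and gap lengths drawn from two Gaussian distributions, can be mimicked as below; the parameter names and defaults are our own.

```python
import numpy as np

def inject_gaps(x, n_gaps_mean=10, n_gaps_sd=2, len_mean=20, len_sd=5, rng=None):
    """Return a copy of series x with NaN gaps whose count and lengths are
    drawn from Gaussian distributions, mimicking observational dropouts."""
    rng = rng or np.random.default_rng()
    y = x.astype(float).copy()
    n_gaps = max(0, int(round(rng.normal(n_gaps_mean, n_gaps_sd))))
    for _ in range(n_gaps):
        length = max(1, int(round(rng.normal(len_mean, len_sd))))
        start = rng.integers(0, max(1, len(y) - length))
        y[start:start + length] = np.nan
    return y
```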

  1. Treatment of Missing Data in Workforce Education Research

    ERIC Educational Resources Information Center

    Gemici, Sinan; Rojewski, Jay W.; Lee, In Heok

    2012-01-01

    Most quantitative analyses in workforce education are affected by missing data. Traditional approaches to remedy missing data problems often result in reduced statistical power and biased parameter estimates due to systematic differences between missing and observed values. This article examines the treatment of missing data in pertinent…

  2. Results of Database Studies in Spine Surgery Can Be Influenced by Missing Data.

    PubMed

    Basques, Bryce A; McLynn, Ryan P; Fice, Michael P; Samuel, Andre M; Lukasiewicz, Adam M; Bohl, Daniel D; Ahn, Junyoung; Singh, Kern; Grauer, Jonathan N

    2017-12-01

    National databases are increasingly being used for research in spine surgery; however, one limitation of such databases that has received sparse mention is the frequency of missing data. Studies using these databases often do not emphasize the percentage of missing data for each variable used and do not specify how patients with missing data are incorporated into analyses. This study uses the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database to examine whether different treatments of missing data can influence the results of spine studies. (1) What is the frequency of missing data fields for demographics, medical comorbidities, preoperative laboratory values, operating room times, and length of stay recorded in ACS-NSQIP? (2) Using three common approaches to handling missing data, how frequently do those approaches agree in terms of finding particular variables to be associated with adverse events? (3) Do different approaches to handling missing data influence the outcomes and effect sizes of an analysis testing for an association of these variables with the occurrence of adverse events? Patients who underwent spine surgery between 2005 and 2013 were identified from the ACS-NSQIP database. A total of 88,471 patients undergoing spine surgery were identified. The most common procedures were anterior cervical discectomy and fusion, lumbar decompression, and lumbar fusion. Demographics, comorbidities, and perioperative laboratory values were tabulated for each patient, and the percent of missing data was noted for each variable. These variables were tested for an association with "any adverse event" using three separate multivariate regressions that used the most common treatments for missing data. In the first regression, patients with any missing data were excluded. In the second regression, missing data were treated as a negative or "reference" value; for continuous variables, the mean of each variable's reference range was computed and imputed. In the third regression, any variables with > 10% rate of missing data were removed from the regression; among variables with ≤ 10% missing data, individual cases with missing values were excluded. The results of these regressions were compared to determine how the different treatments of missing data could affect the results of spine studies using the ACS-NSQIP database. Of the 88,471 patients, as many as 4441 (5%) had missing elements among demographic data, 69,184 (72%) among comorbidities, 70,892 (80%) among preoperative laboratory values, and 56,551 (64%) among operating room times. Considering the three different treatments of missing data, we found different risk factors for adverse events. Of 44 risk factors found to be associated with adverse events in any analysis, only 15 (34%) of these risk factors were common among the three regressions. The second treatment of missing data (assuming "normal" value) found the most risk factors (40) to be associated with any adverse event, whereas the first treatment (deleting patients with missing data) found the fewest associations at 20. Among the risk factors associated with any adverse event, the 10 with the greatest effect size (odds ratio) by each regression were ranked. Of the 15 variables in the top 10 for any regression, six of these were common among all three lists. Differing treatments of missing data can influence the results of spine studies using the ACS-NSQIP. The current study highlights the importance of considering how such missing data are handled. Until there are better guidelines on the best approaches to handle missing data, investigators should report how missing data were handled to increase the quality and transparency of orthopaedic database research. Readers of large database studies should note whether handling of missing data was addressed and consider potential bias with high rates or unspecified or weak methods for handling missing data.
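
    The study's three treatments map onto simple DataFrame operations, sketched below with hypothetical column names; the column median stands in for the "mean of the reference range" that the authors imputed.

```python
import pandas as pd

def three_treatments(df, outcome="any_adverse_event", max_missing=0.10):
    """Build the three analysis datasets whose regressions are compared."""
    covariates = df.drop(columns=[outcome])

    # Treatment 1: exclude any patient with any missing value.
    d1 = df.dropna()

    # Treatment 2: impute a "normal"/reference value; mode for categorical
    # columns, column median as a stand-in for the reference-range midpoint.
    fill = {c: (covariates[c].mode()[0] if covariates[c].dtype == object
                else covariates[c].median())
            for c in covariates.columns}
    d2 = df.fillna(value=fill)

    # Treatment 3: drop variables >10% missing, then drop rows missing the rest.
    keep = [c for c in covariates.columns
            if covariates[c].isna().mean() <= max_missing]
    d3 = df[keep + [outcome]].dropna()
    return d1, d2, d3
```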

  3. [The Attitude to One's Own Health Among Native Residents of Yakutia].

    PubMed

    Ammosova, E P; Zakharova, R N; Klimova, T M; Timofeieva, A V; Fedorov, A I; Baltakhinova, M E

    2017-07-01

    The study examined value orientations and attitudes toward health preservation among the native population of the Republic of Sakha (Yakutia). The analysis covered the answers of 292 respondents residing in rural areas of Yakutia. Analysis of the value-motivation section demonstrated that for most respondents health is a priority value in life. At the same time, most respondents misunderstand and underestimate the role of health in their lives because they are unaware of health as an instrumental value. As an instrumental value, health gives way to persistence and diligence and hence ranks lower than it does as a terminal value. The respondents assume that health mainly depends on diet, lifestyle and ecology. The answers to the questions of the "emotional" and "behavioral" sections indicate that most respondents lack a sense of responsibility for their own health, show inadequate commitment to a healthy lifestyle and take a passive attitude toward health. A more active attitude toward health is observed among respondents of mature age, which is likely related to deteriorating well-being and the presence of chronic diseases. At younger ages, most respondents consider themselves healthy and hence ignore disease prevention. Thus, although health is mentioned by respondents as one of their priority values, most respondents lack both attitudes supporting health preservation and a clear-cut strategy of health-preserving behavior. Under the cardinal alteration of the traditional way of life and lifestyle of the native population of the North, the habits of self-preserving behavior that helped people survive in severe climate conditions have become inadequate. A new behavioral strategy of health preservation therefore needs to be developed.

  4. Extended resource allocation index for link prediction of complex network

    NASA Astrophysics Data System (ADS)

    Liu, Shuxin; Ji, Xinsheng; Liu, Caixia; Bai, Yi

    2017-08-01

    Recently, a number of similarity-based methods have been proposed to predict missing links in complex networks. Among these indices, the resource allocation index performs very well with lower time complexity. However, it ignores potential resources transferred along local paths between two endpoints. Motivated by the resource exchange taking place between endpoints, an extended resource allocation index is proposed. An empirical study on twelve real networks and three synthetic dynamic networks shows that the proposed index achieves good performance compared with eight mainstream baselines.
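
    The baseline index being extended has a one-line definition, RA(x, y) = sum over common neighbours z of 1/degree(z), and ships with networkx; the paper's extension additionally counts resources carried along local paths between the endpoints.

```python
import networkx as nx

# Resource allocation: RA(x, y) = sum of 1/degree(z) over common neighbours z.
G = nx.karate_club_graph()
for u, v, score in nx.resource_allocation_index(G, [(0, 33), (5, 24)]):
    print(f"RA({u},{v}) = {score:.3f}")

# The same score written out by hand, to make the formula explicit:
def ra(G, u, v):
    return sum(1 / G.degree(z) for z in nx.common_neighbors(G, u, v))
```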

  5. Quantifying cerebral asymmetries for language in dextrals and adextrals with random-effects meta analysis

    PubMed Central

    Carey, David P.; Johnstone, Leah T.

    2014-01-01

    Speech and language-related functions tend to depend on the left hemisphere more than the right in most right-handed (dextral) participants. This relationship is less clear in non-right handed (adextral) people, resulting in surprisingly polarized opinion on whether or not they are as lateralized as right handers. The present analysis investigates this issue by largely ignoring methodological differences between the different neuroscientific approaches to language lateralization, as well as discrepancies in how dextral and adextral participants were recruited or defined. Here we evaluate the tendency for dextrals to be more left hemisphere dominant than adextrals, using random effects meta analyses. In spite of several limitations, including sample size (in the adextrals in particular), missing details on proportions of groups who show directional effects in many experiments, and so on, the different paradigms all point to proportionally increased left hemispheric dominance in the dextrals. These results are analyzed in light of the theoretical importance of these subtle differences for understanding the cognitive neuroscience of language, as well as the unusual asymmetry in most adextrals. PMID:25408673

  6. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  7. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  8. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  9. Optimal simultaneous superpositioning of multiple structures with missing data

    PubMed Central

    Theobald, Douglas L.; Steindel, Phillip A.

    2012-01-01

    Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually ‘missing’ from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Results: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation–maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. Availability and implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org. Contact: dtheobald@brandeis.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22543369

  10. Establishing a threshold for the number of missing days using 7 d pedometer data.

    PubMed

    Kang, Minsoo; Hart, Peter D; Kim, Youngdeok

    2012-11-01

    The purpose of this study was to examine the threshold of the number of missing days of recovery using the individual information (II)-centered approach. Data for this study came from 86 participants, aged from 17 to 79 years old, who had 7 consecutive days of complete pedometer (Yamax SW 200) wear. Missing datasets (1 d through 5 d missing) were created by a SAS random process 10,000 times each. All missing values were replaced using the II-centered approach. A 7 d average was calculated for each dataset, including the complete dataset. Repeated measure ANOVA was used to determine the differences between 1 d through 5 d missing datasets and the complete dataset. Mean absolute percentage error (MAPE) was also computed. Mean (SD) daily step count for the complete 7 d dataset was 7979 (3084). Mean (SD) values for the 1 d through 5 d missing datasets were 8072 (3218), 8066 (3109), 7968 (3273), 7741 (3050) and 8314 (3529), respectively (p > 0.05). The lower MAPEs were estimated for 1 d missing (5.2%, 95% confidence interval (CI) 4.4-6.0) and 2 d missing (8.4%, 95% CI 7.0-9.8), while all others were greater than 10%. The results of this study show that the 1 d through 5 d missing datasets, with replaced values, were not significantly different from the complete dataset. Based on the MAPE results, it is not recommended to replace more than two days of missing step counts.
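
    The error measure used above is easy to reproduce; the toy sketch below computes the absolute percentage error of a 7 d average after one missing day has been replaced. The step counts and the replacement value are invented, and the study averaged such errors over 10,000 simulated datasets.

      import numpy as np

      complete = np.array([8200.0, 7600.0, 9100.0, 6800.0, 7900.0, 8400.0, 7500.0])
      imputed = complete.copy()
      imputed[2] = 8000.0                     # day 3 missing, value replaced

      ape = abs(imputed.mean() - complete.mean()) / complete.mean() * 100.0
      print(f"absolute percentage error = {ape:.2f}%")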

  11. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... measured parameters used in the GHG emissions calculations is required (e.g., carbon content values, etc... such estimates. (a) For each missing value of the monthly carbon content of calcined petroleum coke the substitute data value shall be the arithmetic average of the quality-assured values of carbon contents for...

  12. Improvement of Parameter Estimations in Tumor Growth Inhibition Models on Xenografted Animals: Handling Sacrifice Censoring and Error Caused by Experimental Measurement on Larger Tumor Sizes.

    PubMed

    Pierrillas, Philippe B; Tod, Michel; Amiel, Magali; Chenel, Marylore; Henin, Emilie

    2016-09-01

    The purpose of this study was to explore the impact of censoring due to animal sacrifice on parameter estimates and tumor volume calculated from two diameters in larger tumors during tumor growth experiments in preclinical studies. The type of measurement error that can be expected was also investigated. Different scenarios were challenged using the stochastic simulation and estimation process. One thousand datasets were simulated under the design of a typical tumor growth study in xenografted mice, and then eight approaches were used for parameter estimation with the simulated datasets. The distribution of estimates and simulation-based diagnostics were computed for comparison. The different approaches were robust regarding the choice of residual error and gave equivalent results. However, by not considering missing data induced by sacrificing the animal, parameter estimates were biased and led to false inferences in terms of compound potency; the threshold concentration for tumor eradication when ignoring censoring was 581 ng/ml, but the true value was 240 ng/ml.

  13. A pattern-mixture model approach for handling missing continuous outcome data in longitudinal cluster randomized trials.

    PubMed

    Fiero, Mallorie H; Hsu, Chiu-Hsieh; Bell, Melanie L

    2017-11-20

    We extend the pattern-mixture approach to handle missing continuous outcome data in longitudinal cluster randomized trials, which randomize groups of individuals to treatment arms, rather than the individuals themselves. Individuals who drop out at the same time point are grouped into the same dropout pattern. We approach extrapolation of the pattern-mixture model by applying multilevel multiple imputation, which imputes missing values while appropriately accounting for the hierarchical data structure found in cluster randomized trials. To assess parameters of interest under various missing data assumptions, imputed values are multiplied by a sensitivity parameter, k, which increases or decreases imputed values. Using simulated data, we show that estimates of parameters of interest can vary widely under differing missing data assumptions. We conduct a sensitivity analysis using real data from a cluster randomized trial by increasing k until the treatment effect inference changes. By performing a sensitivity analysis for missing data, researchers can assess whether certain missing data assumptions are reasonable for their cluster randomized trial. Copyright © 2017 John Wiley & Sons, Ltd.
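
    The sensitivity-parameter idea lends itself to a compact illustration. In the sketch below, imputed outcome values in one arm are multiplied by k and a test is re-run as k moves away from 1; all numbers are invented, and a real analysis would use multilevel multiple imputation rather than a single set of draws.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      control = rng.normal(50, 10, 80)          # fully observed control arm
      observed_trt = rng.normal(45, 10, 70)     # observed treatment outcomes
      imputed_trt = rng.normal(46, 10, 30)      # stand-in for imputed values

      for k in (1.0, 1.1, 1.2, 1.3):
          trt = np.concatenate([observed_trt, k * imputed_trt])
          _, p = stats.ttest_ind(trt, control)
          print(f"k = {k:.1f}  p = {p:.4f}")    # find the k where inference flips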

  14. 78 FR 49403 - Approval and Promulgation of Air Quality Implementation Plans; Pennsylvania; Determination of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-14

    ... requirement for one or more quarters during 2010-2012 monitoring period. EPA has addressed missing data from... recorded values are substituted for the missing data, and the resulting 24-hour design value is compared to... missing data from the Greensburg monitor by performing a statistical analysis of the data, in which a...

  15. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  16. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

    PubMed Central

    Palmer, Cameron; Pe’er, Itsik

    2016-01-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603
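
    The MI paradigm pools one analysis per imputed dataset with Rubin's rules, which the sketch below implements for a single scalar parameter; the example estimates and standard errors are invented.

      import numpy as np

      def rubin_pool(estimates, std_errors):
          """Pool M estimates and standard errors from M imputed datasets."""
          m = len(estimates)
          qbar = np.mean(estimates)                 # pooled point estimate
          w = np.mean(np.square(std_errors))        # within-imputation variance
          b = np.var(estimates, ddof=1)             # between-imputation variance
          total = w + (1.0 + 1.0 / m) * b           # total variance
          return qbar, float(np.sqrt(total))

      est, se = rubin_pool([0.42, 0.45, 0.40, 0.47, 0.43],
                           [0.14, 0.15, 0.14, 0.14, 0.15])
      print(f"pooled estimate {est:.3f} (SE {se:.3f})")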

  17. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review.

    PubMed

    Mercieca-Bebber, Rebecca; Palmer, Michael J; Brundage, Michael; Calvert, Melanie; Stockler, Martin R; King, Madeleine T

    2016-06-15

    Patient-reported outcomes (PROs) provide important information about the impact of treatment from the patients' perspective. However, missing PRO data may compromise the interpretability and value of the findings. We aimed to report: (1) a non-technical summary of problems caused by missing PRO data; and (2) a systematic review by collating strategies to: (A) minimise rates of missing PRO data, and (B) facilitate transparent interpretation and reporting of missing PRO data in clinical research. Our systematic review does not address statistical handling of missing PRO data. MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases (inception to 31 March 2015), and citing articles and reference lists from relevant sources. English articles providing recommendations for reducing missing PRO data rates, or strategies to facilitate transparent interpretation and reporting of missing PRO data were included. 2 reviewers independently screened articles against eligibility criteria. Discrepancies were resolved with the research team. Recommendations were extracted and coded according to framework synthesis. 117 sources (55% discussion papers, 26% original research) met the eligibility criteria. Design and methodological strategies for reducing rates of missing PRO data included: incorporating PRO-specific information into the protocol; carefully designing PRO assessment schedules and defining termination rules; minimising patient burden; appointing a PRO coordinator; PRO-specific training for staff; ensuring PRO studies are adequately resourced; and continuous quality assurance. Strategies for transparent interpretation and reporting of missing PRO data include utilising auxiliary data to inform analysis; transparently reporting baseline PRO scores, rates and reasons for missing data; and methods for handling missing PRO data. The instance of missing PRO data and its potential to bias clinical research can be minimised by implementing thoughtful design, rigorous methodology and transparent reporting strategies. All members of the research team have a responsibility in implementing such strategies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  18. mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.

    PubMed

    Lee, Geunho; Lee, Hyun Beom; Jung, Byung Hwa; Nam, Hojung

    2017-07-01

    Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. However, these data often contain unexpected duplicate records and missing values due to technical or biological factors. These 'dirty data' problems increase the difficulty of performing MS analyses because they lead to performance degradation when statistical or machine-learning tests are applied to the data. Thus, we have developed the missing values preprocessor (mvp), an open-source software tool for preprocessing data that may include duplicate records and missing values. mvp uses the property of MS data whereby identical chemical species present the same or similar values for key identifiers, such as the mass-to-charge ratio and intensity signal, and forms cliques via graph theory to process dirty data. We evaluated the validity of the mvp process via quantitative and qualitative analyses and compared the results from a statistical test that analyzed the original and mvp-applied data. This analysis showed that using mvp reduces problems associated with duplicate records and missing values. We also examined the effects of using unprocessed data in statistical tests and the improved statistical test results obtained with data preprocessed using mvp.

  19. Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

    PubMed

    Chiu, Chia-Chun; Chan, Shih-Yao; Wang, Chung-Ching; Wu, Wei-Sheng

    2013-01-01

    Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. The studies about the performance comparison of different algorithms are still incomprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of a dataset but not on the species where the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.

  20. Depth inpainting by tensor voting.

    PubMed

    Kulkarni, Mandar; Rajagopalan, Ambasamudram N

    2013-06-01

    Depth maps captured by range scanning devices or by using optical cameras often suffer from missing regions due to occlusions, reflectivity, limited scanning area, sensor imperfections, etc. In this paper, we propose a fast and reliable algorithm for depth map inpainting using the tensor voting (TV) framework. For less complex missing regions, local edge and depth information is utilized for synthesizing missing values. The depth variations are modeled by local planes using 3D TV, and missing values are estimated using plane equations. For large and complex missing regions, we collect and evaluate depth estimates from self-similar (training) datasets. We align the depth maps of the training set with the target (defective) depth map and evaluate the goodness of depth estimates among candidate values using 3D TV. We demonstrate the effectiveness of the proposed approaches on real as well as synthetic data.

  1. Modified Dempster-Shafer approach using an expected utility interval decision rule

    NASA Astrophysics Data System (ADS)

    Cheaito, Ali; Lecours, Michael; Bosse, Eloi

    1999-03-01

    The combination operation of the conventional Dempster-Shafer algorithm has a tendency to increase exponentially the number of propositions involved in bodies of evidence by creating new ones. The aim of this paper is to explore a 'modified Dempster-Shafer' approach to fusing identity declarations emanating from different sources, including a number of radar, IFF and ESM systems, in order to limit the explosion of the number of propositions. We use a non-ad hoc decision rule based on the expected utility interval to select the most probable object in a comprehensive Platform Data Base containing all the possible identity values that a potential target may take. We study the effect of the redistribution of the confidence levels of the eliminated propositions, which would otherwise overload the real-time data fusion system; these eliminated confidence levels can in particular be assigned to ignorance, or uniformly added to the remaining propositions and to ignorance. A scenario has been selected to demonstrate the performance of our modified Dempster-Shafer method of evidential reasoning.
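
    For readers unfamiliar with the combination operation that drives the proposition explosion, here is a minimal sketch of Dempster's rule for two bodies of evidence over a small frame of discernment; the sources and mass values are invented, and the paper's expected-utility-interval decision rule is not reproduced.

      from itertools import product

      def combine(m1: dict, m2: dict) -> dict:
          """Dempster's rule: intersect focal elements, renormalize conflict."""
          out, conflict = {}, 0.0
          for (a, wa), (b, wb) in product(m1.items(), m2.items()):
              inter = a & b
              if inter:
                  out[inter] = out.get(inter, 0.0) + wa * wb
              else:
                  conflict += wa * wb
          return {k: v / (1.0 - conflict) for k, v in out.items()}

      frame = frozenset({"fighter", "bomber", "civil"})
      radar = {frozenset({"fighter"}): 0.6, frame: 0.4}           # 0.4 = ignorance
      esm = {frozenset({"fighter", "bomber"}): 0.7, frame: 0.3}
      print(combine(radar, esm))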

  2. Robust Coefficients Alpha and Omega and Confidence Intervals With Outlying Observations and Missing Data: Methods and Software.

    PubMed

    Zhang, Zhiyong; Yuan, Ke-Hai

    2016-06-01

    Cronbach's coefficient alpha is a widely used reliability measure in social, behavioral, and education sciences. It is reported in nearly every study that involves measuring a construct through multiple items. With non-tau-equivalent items, McDonald's omega has been used as a popular alternative to alpha in the literature. Traditional estimation methods for alpha and omega often implicitly assume that data are complete and normally distributed. This study proposes robust procedures to estimate both alpha and omega as well as corresponding standard errors and confidence intervals from samples that may contain potential outlying observations and missing values. The influence of outlying observations and missing data on the estimates of alpha and omega is investigated through two simulation studies. Results show that the newly developed robust method yields substantially improved alpha and omega estimates as well as better coverage rates of confidence intervals than the conventional nonrobust method. An R package coefficientalpha is developed and demonstrated to obtain robust estimates of alpha and omega.
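
    For contrast with the robust estimators the paper develops, the sketch below computes the conventional coefficient alpha from a complete items-by-respondents matrix; the simulated data are invented and no missing values or outliers are handled.

      import numpy as np

      def cronbach_alpha(items: np.ndarray) -> float:
          """items: rows = respondents, columns = scale items (complete data)."""
          k = items.shape[1]
          item_vars = items.var(axis=0, ddof=1).sum()
          total_var = items.sum(axis=1).var(ddof=1)
          return k / (k - 1) * (1.0 - item_vars / total_var)

      rng = np.random.default_rng(1)
      latent = rng.normal(size=(200, 1))
      items = latent + rng.normal(scale=0.8, size=(200, 4))   # four noisy items
      print(round(cronbach_alpha(items), 3))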

  3. Robust Coefficients Alpha and Omega and Confidence Intervals With Outlying Observations and Missing Data

    PubMed Central

    Zhang, Zhiyong; Yuan, Ke-Hai

    2015-01-01

    Cronbach’s coefficient alpha is a widely used reliability measure in social, behavioral, and education sciences. It is reported in nearly every study that involves measuring a construct through multiple items. With non-tau-equivalent items, McDonald’s omega has been used as a popular alternative to alpha in the literature. Traditional estimation methods for alpha and omega often implicitly assume that data are complete and normally distributed. This study proposes robust procedures to estimate both alpha and omega as well as corresponding standard errors and confidence intervals from samples that may contain potential outlying observations and missing values. The influence of outlying observations and missing data on the estimates of alpha and omega is investigated through two simulation studies. Results show that the newly developed robust method yields substantially improved alpha and omega estimates as well as better coverage rates of confidence intervals than the conventional nonrobust method. An R package coefficientalpha is developed and demonstrated to obtain robust estimates of alpha and omega. PMID:29795870

  4. Nearest neighbor imputation using spatial–temporal correlations in wireless sensor networks

    PubMed Central

    Li, YuanYuan; Parker, Lynne E.

    2016-01-01

    Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network’s performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a kd-tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using traditional mean and variance of each dimension for kd-tree construction, and Euclidean distance for kd-tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. Our experimental results show that our proposed K-NN imputation method has a competitive accuracy with state-of-the-art Expectation–Maximization (EM) techniques, while using much simpler computational techniques, thus making it suitable for use in resource-constrained WSNs. PMID:28435414
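
    The paper's distance computation can be sketched compactly: dimensions missing in either vector are skipped, and the remaining dimensions are weighted, for example by the measured fraction of non-missing data per channel. The vectors and weights below are invented.

      import numpy as np

      def masked_weighted_distance(a, b, weights):
          a, b, w = (np.asarray(v, dtype=float) for v in (a, b, weights))
          usable = ~(np.isnan(a) | np.isnan(b))          # skip missing dimensions
          return float(np.sqrt(np.sum(w[usable] * (a[usable] - b[usable]) ** 2)))

      x = [21.5, np.nan, 0.92]      # a reading with one missing channel
      y = [20.9, 18.0, 0.88]
      w = [0.98, 0.60, 0.95]        # per-channel fraction of non-missing data
      print(masked_weighted_distance(x, y, w))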

  5. Non-invasive collection and analysis of semen in wild macaques.

    PubMed

    Thomsen, Ruth

    2014-04-01

    Assessments of primate male fertility via semen analyses are so far restricted to captivity. This study describes a non-invasive method to collect and analyse semen in wild primates, based on fieldwork with Yakushima macaques (Macaca fuscata yakui). Over nine mating seasons between 1993 and 2010, 128 masturbatory ejaculations were recorded in 21 males of 5 study troops, and in 11 non-troop males. In 55%, ejaculate volume was directly estimated, and in 37%, pH-value, sperm vitality, numbers, morphology and swimming velocity could also be determined. This approach of assessing semen production rates and individual male fertility can be applied to other primate taxa, in particular to largely terrestrial populations where males masturbate frequently, such as macaques and baboons. Furthermore, since explanations of male reproductive skew in non-human primate populations have until now ignored the potential role of semen quality, the method presented here will also help to answer this question.

  6. 29 CFR Appendix B to Part 4050 - Examples of Benefit Payments for Missing Participants Under §§ 4050.8 Through 4050.10

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ...) of the definition of “missing participant annuity assumptions” in § 4050.2, the present value as of... Plan B's deemed distribution date (and using the missing participant annuity assumptions), the present value per dollar of annual benefit (payable monthly as a joint and 50 percent survivor annuity...

  7. Effects of cloud cover and meteorology in estimating ground-level air pollution using MAIAC AOD in the Yangtze River Delta

    NASA Astrophysics Data System (ADS)

    Xiao, Q.; Liu, Y.

    2017-12-01

    Satellite aerosol optical depth (AOD) has been used to assess fine particulate matter (PM2.5) pollution worldwide. However, non-random missing AOD due to cloud cover or high surface reflectance can cause up to 80% data loss and bias model-estimated spatial and temporal trends of PM2.5. Previous studies filled the data gap largely by spatial smoothing, which ignores the impact of cloud cover and meteorology on aerosol loadings and has been shown to perform poorly when monitoring stations are sparse or when there is seasonal large-scale missingness. Using the Yangtze River Delta of China as an example, we present a flexible Multiple Imputation (MI) method that combines cloud fraction, elevation, humidity, temperature, and spatiotemporal trends to impute the missing AOD. A two-stage statistical model driven by gap-filled AOD, meteorology and land use information was then fitted to estimate daily ground PM2.5 concentrations in 2013 and 2014 at 1 km resolution with complete coverage in space and time. The daily MI models have an average R2 of 0.77, with an inter-quartile range of 0.71 to 0.82 across days. The overall 10-fold cross-validation R2 of the model was 0.81 for 2013 and 0.73 for 2014. Predictions with only observational AOD or only imputed AOD showed similar accuracy. This method provides reliable PM2.5 predictions with complete coverage at high resolution. By including all the pixels of all days in model development, this method corrects the sampling bias in satellite-driven air pollution modelling due to non-random missingness in AOD. Compared with previously reported gap-filling methods, the MI method has the strength of not relying on ground PM2.5 measurements, and therefore allows the prediction of historical PM2.5 levels prior to the establishment of regular ground monitoring networks.

  8. Dealing with missing data in remote sensing images within land and crop classification

    NASA Astrophysics Data System (ADS)

    Skakun, Sergii; Kussul, Nataliia; Basarab, Ruslan

    Optical remote sensing images from space provide valuable data for environmental monitoring, disaster management [1], agriculture mapping [2], and so forth. In many cases, a time-series of satellite images is used to discriminate or estimate particular land parameters. One of the factors that influences the efficiency of satellite imagery is the presence of clouds. This leads to the occurrence of missing data that need to be addressed. Numerous approaches have been proposed to fill in missing data (or gaps) and can be categorized into inpainting-based, multispectral-based, and multitemporal-based. In [3], ancillary MODIS data are utilized for filling gaps and predicting Landsat data. In this paper we propose to use self-organizing Kohonen maps (SOMs) for missing data restoration in time-series of satellite imagery. Such an approach was previously used for MODIS data [4], but applying it to finer spatial resolution data such as Sentinel-2 and Landsat-8 represents a challenge. Moreover, the data for training the SOMs are selected manually in [4], which complicates the use of the method in an automatic mode. A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a discretised representation of the input space of the training samples, called a map. The map seeks to preserve the topological properties of the input space. The reconstruction of satellite images is performed for each spectral band separately, i.e. a separate SOM is trained for each spectral band. Pixels that have no missing values in the time-series are selected for training. Selecting the number of training pixels represents a trade-off: increasing the number of training samples increases the time of SOM training while improving the quality of restoration. Also, training data sets should be selected automatically. As such, we propose to select training samples on a regular grid of pixels. The SOM thus seeks to project a large number of non-missing data onto the subspace vectors in the map. Restoration of the missing values is performed in the following way. The multi-temporal pixel values (with gaps) are fed to the neural network. A neuron-winner (or best matching unit, BMU) in the SOM is selected based on a distance metric (for example, Euclidean). It should be noted that missing values are omitted from the metric estimation when selecting the BMU. Once the BMU is selected, missing values are substituted by the corresponding components of the BMU values. The efficiency of the proposed approach was tested on a time-series of Landsat-8 images over the JECAM test site in Ukraine and Sich-2 images over Crimea (Sich-2 is a Ukrainian remote sensing satellite acquiring images at 8 m spatial resolution). Landsat-8 images were first converted to TOA reflectance, and then atmospherically corrected so that each pixel value represents surface reflectance in the range from 0 to 1. The error of reconstruction (error of quantization) on training data was: band-2: 0.015; band-3: 0.020; band-4: 0.026; band-5: 0.070; band-6: 0.060; band-7: 0.055. The reconstructed images were also used for crop classification using a multi-layer perceptron (MLP). Overall accuracy was 85.98% and Cohen's kappa was 0.83. References: 1. Skakun, S., Kussul, N., Shelestov, A. and Kussul, O., "Flood Hazard and Flood Risk Assessment Using a Time Series of Satellite Images: A Case Study in Namibia," Risk Analysis, 2013, doi: 10.1111/risa.12156. 2. Gallego, F.J., Kussul, N., Skakun, S., Kravchenko, O., Shelestov, A., Kussul, O., "Efficiency assessment of using satellite data for crop area estimation in Ukraine," International Journal of Applied Earth Observation and Geoinformation, vol. 29, pp. 22-30, 2014. 3. Roy, D.P., Ju, J., Lewis, P., Schaaf, C., Gao, F., Hansen, M., and Lindquist, E., "Multi-temporal MODIS-Landsat data fusion for relative radiometric normalization, gap filling, and prediction of Landsat data," Remote Sensing of Environment, 112(6), pp. 3112-3130, 2008. 4. Latif, B.A., and Mercier, G., "Self-Organizing maps for processing of data with missing values and outliers: application to remote sensing images," Self-Organizing Maps, InTech, pp. 189-210, 2010.
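
    The BMU-based restoration step described above is simple to sketch: the best matching unit is found with a distance that omits missing components, and those components are then copied from the BMU's weight vector. The codebook below stands in for a trained map and is randomly generated.

      import numpy as np

      rng = np.random.default_rng(0)
      codebook = rng.random((100, 6))            # e.g. a 10x10 map, 6-date series

      def restore(pixel: np.ndarray, codebook: np.ndarray) -> np.ndarray:
          mask = ~np.isnan(pixel)
          d = np.sum((codebook[:, mask] - pixel[mask]) ** 2, axis=1)
          bmu = codebook[np.argmin(d)]           # neuron-winner for this pixel
          out = pixel.copy()
          out[~mask] = bmu[~mask]                # fill gaps from the BMU
          return out

      pixel = np.array([0.10, 0.12, np.nan, 0.35, np.nan, 0.20])
      print(restore(pixel, codebook))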

  9. [Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM].

    PubMed

    Jiang, Z; Dou, Z; Song, W L; Xu, J; Wu, Z Y

    2017-11-10

    Objective: To compare the results of different methods for handling HIV viral load (VL) data under different missing-value mechanisms. Methods: SPSS 17.0 was used to simulate complete and missing datasets, under different missing-value mechanisms, from HIV viral load data collected from MSM in 16 cities in China in 2013. Maximum likelihood estimation using the expectation-maximization algorithm (EM), a regression method, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were each used to fill in the missing data. The results of the different methods were compared with respect to distribution characteristics, accuracy, and precision. Results: The HIV VL data could not be transformed to a normal distribution. All methods performed well for data missing completely at random (MCAR). For the other types of missing data, the regression and MCMC methods preserved the main characteristics of the original data. The means of the datasets completed by the different methods were all close to that of the original data. EM, the regression method, mean imputation, and deletion under-estimated VL, while MCMC overestimated it. Conclusion: MCMC can serve as the main imputation method for missing HIV viral load data, and the completed data can be used as a reference for estimating mean HIV VL in the investigated population.

  10. Accounting for Attribute-Level Non-Attendance in a Health Choice Experiment: Does it Matter?

    PubMed

    Erdem, Seda; Campbell, Danny; Hole, Arne Risa

    2015-07-01

    An extensive literature has established that it is common for respondents to ignore attributes of the alternatives within choice experiments. In most of the studies on attribute non-attendance, it is assumed that respondents consciously (or unconsciously) ignore one or more attributes of the alternatives, regardless of their levels. In this paper, we present a new line of enquiry and approach for modelling non-attendance in the context of investigating preferences for health service innovations. This approach recognises that non-attendance may not just be associated with attributes but may also apply to the attribute's levels. Our results show that respondents process each level of an attribute differently: while attending to the attribute, they ignore a subset of the attribute's levels. In such cases, the usual approach of assuming that respondents either attend to the attribute or not, irrespective of its levels, is erroneous and could lead to misguided policy recommendations. Our results indicate that allowing for attribute-level non-attendance leads to substantial improvements in the model fit and has an impact on estimated marginal willingness to pay and choice predictions. Copyright © 2014 John Wiley & Sons, Ltd.

  11. Kalman Filtering for Genetic Regulatory Networks with Missing Values

    PubMed Central

    Liu, Qiuhua; Lai, Tianyue; Wang, Wu

    2017-01-01

    The filtering problem with missing values for genetic regulatory networks (GRNs) is addressed, in which noise exists in both the state dynamics and the measurement equations; furthermore, the correlation between process noise and measurement noise is also taken into consideration. In order to deal with the filtering problem, a class of discrete-time GRNs with missing values, noise correlation, and time delays is established. Then a new observation model is proposed to decrease the adverse effect caused by the missing values and to decouple the correlation between process noise and measurement noise in theory. Finally, a Kalman filter is used to estimate the states of the GRNs. A typical example is provided to verify the effectiveness of the proposed method, showing that the concentrations of mRNA and protein can be estimated accurately. PMID:28814967
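
    A common way to handle missing measurements in Kalman filtering is to skip the update step when no observation arrives, as in the scalar sketch below; the model here is invented and does not include the noise correlation or time delays treated in the paper.

      import numpy as np

      def kalman_1d(zs, a=1.0, q=0.01, h=1.0, r=0.25, x0=0.0, p0=1.0):
          x, p, estimates = x0, p0, []
          for z in zs:
              x, p = a * x, a * p * a + q                  # predict
              if not np.isnan(z):                          # update only if observed
                  k = p * h / (h * p * h + r)
                  x, p = x + k * (z - h * x), (1.0 - k * h) * p
              estimates.append(x)
          return estimates

      zs = [1.1, np.nan, 0.9, 1.2, np.nan, np.nan, 1.0]
      print([round(v, 3) for v in kalman_1d(zs)])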

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pichara, Karim; Protopapas, Pavlos

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  13. The detection of the imprint of filaments on cosmic microwave background lensing

    NASA Astrophysics Data System (ADS)

    He, Siyu; Alam, Shadab; Ferraro, Simone; Chen, Yen-Chi; Ho, Shirley

    2018-05-01

    Galaxy redshift surveys, such as the 2-Degree-Field Survey (2dF) [1], Sloan Digital Sky Survey (SDSS) [2], 6-Degree-Field Survey (6dF) [3], Galaxy And Mass Assembly survey (GAMA) [4] and VIMOS Public Extragalactic Redshift Survey (VIPERS) [5], have shown that the spatial distribution of matter forms a rich web, known as the cosmic web [6]. Most galaxy survey analyses measure the amplitude of galaxy clustering as a function of scale, ignoring information beyond a small number of summary statistics. Because the matter density field becomes highly non-Gaussian as structure evolves under gravity, we expect other statistical descriptions of the field to provide us with additional information. One way to study the non-Gaussianity is to study filaments, which evolve non-linearly from the initial density fluctuations produced in the primordial Universe. In our study, we report the detection of lensing of the cosmic microwave background (CMB) by filaments, and we apply a null test to confirm our detection. Furthermore, we propose a phenomenological model to interpret the detected signal, and we measure how filaments trace the matter distribution on large scales through filament bias, which we measure to be around 1.5. Our study provides new scope to understand the environmental dependence of galaxy formation. In the future, the joint analysis of lensing and Sunyaev-Zel'dovich observations might reveal the properties of 'missing baryons', the vast majority of the gas that resides in the intergalactic medium, which has so far evaded most observations.

  14. Identification of significant features by the Global Mean Rank test.

    PubMed

    Klammer, Martin; Dybowski, J Nikolaj; Hoffmann, Daniel; Schaab, Christoph

    2014-01-01

    With the introduction of omics-technologies such as transcriptomics and proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to successfully deal with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions of features. With the MeanRank test we aimed at developing a test that performs robustly under these conditions, while favorably scaling with the number of replicates. The test proposed here is a global one-sample location test, which is based on the mean ranks across replicates, and internally estimates and controls the false discovery rate. Furthermore, missing data is accounted for without the need of imputation. In extensive simulations comparing MeanRank to other frequently used methods, we found that it performs well with small and large numbers of replicates, feature dependent variance between replicates, and variable regulation across features on simulation data and a recent two-color microarray spike-in dataset. The tests were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies. MeanRank outperformed the other global rank-based methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed similar or better. Furthermore, MeanRank exhibits more consistent behavior regarding the degree of regulation and is robust against the choice of preprocessing methods. MeanRank does not require any imputation of missing values, is easy to understand, and yields results that are easy to interpret. The software implementing the algorithm is freely available for academic and commercial use.

  15. Benefits of rebuilding global marine fisheries outweigh costs.

    PubMed

    Sumaila, Ussif Rashid; Cheung, William; Dyck, Andrew; Gueye, Kamal; Huang, Ling; Lam, Vicky; Pauly, Daniel; Srinivasan, Thara; Swartz, Wilf; Watson, Reginald; Zeller, Dirk

    2012-01-01

    Global marine fisheries are currently underperforming, largely due to overfishing. An analysis of global databases finds that resource rent net of subsidies from rebuilt world fisheries could increase from the current negative US$13 billion to positive US$54 billion per year, resulting in a net gain of US$600 to US$1,400 billion in present value over fifty years after rebuilding. To realize this gain, governments need to implement a rebuilding program at a cost of about US$203 (US$130-US$292) billion in present value. We estimate that it would take just 12 years after rebuilding begins for the benefits to surpass the cost. Even without accounting for the potential boost to recreational fisheries, and ignoring ancillary and non-market values that would likely increase, the potential benefits of rebuilding global fisheries far outweigh the costs.

  16. Benefits of Rebuilding Global Marine Fisheries Outweigh Costs

    PubMed Central

    Sumaila, Ussif Rashid; Cheung, William; Dyck, Andrew; Gueye, Kamal; Huang, Ling; Lam, Vicky; Pauly, Daniel; Srinivasan, Thara; Swartz, Wilf; Watson, Reginald; Zeller, Dirk

    2012-01-01

    Global marine fisheries are currently underperforming, largely due to overfishing. An analysis of global databases finds that resource rent net of subsidies from rebuilt world fisheries could increase from the current negative US$13 billion to positive US$54 billion per year, resulting in a net gain of US$600 to US$1,400 billion in present value over fifty years after rebuilding. To realize this gain, governments need to implement a rebuilding program at a cost of about US$203 (US$130–US$292) billion in present value. We estimate that it would take just 12 years after rebuilding begins for the benefits to surpass the cost. Even without accounting for the potential boost to recreational fisheries, and ignoring ancillary and non-market values that would likely increase, the potential benefits of rebuilding global fisheries far outweigh the costs. PMID:22808187

  17. Evaluation of missing value methods for predicting ambient BTEX concentrations in two neighbouring cities in Southwestern Ontario Canada

    NASA Astrophysics Data System (ADS)

    Miller, Lindsay; Xu, Xiaohong; Wheeler, Amanda; Zhang, Tianchu; Hamadani, Mariam; Ejaz, Unam

    2018-05-01

    High density air monitoring campaigns provide spatial patterns of pollutant concentrations which are integral in exposure assessment. Such analysis can assist with the determination of links between air quality and health outcomes; however, problems due to missing data can threaten to compromise these studies. This research evaluates four methods: mean value imputation, inverse distance weighting (IDW), inter-species ratios, and regression, to address missing spatial concentration data ranging from one missing data point up to 50% missing data. BTEX (benzene, toluene, ethylbenzene, and xylenes) concentrations were measured in Windsor and Sarnia, Ontario in the fall of 2005. Concentrations and inter-species ratios were generally similar between the two cities. Benzene (B) was observed to be higher in Sarnia, whereas toluene (T) and the T/B ratios were higher in Windsor. Using these urban, industrialized cities as case studies, this research demonstrates that using inter-species ratios or regression of the data for which there is complete information, along with one measured concentration (i.e. benzene) to predict missing concentrations (i.e. TEX), results in good agreement between predicted and measured values. In both cities, the general trend was that the best agreement was observed for the leave-one-out scenario, followed by 10% and 25% missing, with the least agreement for the 50% missing cases. In the absence of any known concentrations, IDW can provide reasonable agreement between observed and estimated concentrations for the BTEX species, and was superior to mean value imputation, which was not able to preserve the spatial trend. The proposed methods can be used to fill in missing data while preserving the general characteristics and rank order of the data, which are sufficient for epidemiologic studies.
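
    Of the four methods, IDW is the most self-contained; the sketch below estimates a missing concentration at one site from the sites that did report values, with invented coordinates and concentrations.

      import numpy as np

      def idw(target_xy, sites_xy, values, power=2.0):
          """Inverse distance weighting: closer sites get larger weights."""
          d = np.linalg.norm(np.asarray(sites_xy, float) - np.asarray(target_xy, float), axis=1)
          w = 1.0 / d ** power
          return float(np.sum(w * np.asarray(values)) / np.sum(w))

      sites = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0)]
      benzene = [0.82, 0.95, 1.40]                   # measured concentrations
      print(idw((1.0, 1.0), sites, benzene))         # estimate at the missing site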

  18. How much of the income inequality effect can be explained by public policy? Evidence from oral health in Brazil.

    PubMed

    Celeste, Roger Keller; Nadanovsky, Paulo

    2010-10-01

    To evaluate the association between income inequality, a public policy scale, and oral health. The analysis, using the Brazilian oral health survey of 2002-2003, included 23,573 15-19-year-old subjects clustered in 330 municipalities. Missing and decayed teeth and malocclusion assessments were the outcomes. The Gini coefficient and a novel Scale of Municipal Public Policies were the main exposure variables. Individual-level covariates were used as controls in multilevel regressions. An increase from the lowest to the highest Gini value in Brazil was associated with an increase in the number of missing (rate ratio, RR = 2.11, 95% confidence interval 1.18-3.77) and decayed teeth (RR = 2.92, 95% CI 1.83-4.65). After adjustment for public policies and water fluoridation, the Gini effect was non-significant and public policies explained most of the variation in missing and decayed teeth. The public policy scale remained significant after adjustment, with a rate ratio of 0.64 for missing and 0.72 for decayed teeth. Neither Gini nor public policies were significantly related to malocclusion. The public policy effect on missing and decayed teeth was stronger among those with higher education and income. The income inequality effect was explained mainly by public policies, which had an independent effect that was greater among the better-off. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.

  19. Evidence and Clinical Trials.

    NASA Astrophysics Data System (ADS)

    Goodman, Steven N.

    1989-11-01

    This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis--the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, methods which summarize evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance- -uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.

  20. The Neuroendocrine System and Stress, Emotions, Thoughts and Feelings

    PubMed Central

    Vaillant, George E.

    2011-01-01

    The philosophy of mind is intimately connected with the philosophy of action. Therefore, concepts like free will, motivation, emotions (especially positive emotions), and also the ethical issues related to these concepts are of abiding interest. However, the concepts of consciousness and free will are usually discussed solely in linguistic, ideational and cognitive (i.e. “left brain”) terms. Admittedly, consciousness requires language and the left-brain, but the aphasic right brain is equally conscious; however, what it “hears” are more likely to be music and emotions. Joy can be as conscious as the conscious motivation produced by the left-brain reading a sign that says, “Danger mines!!” However, look in the index of a Western textbook of psychology, psychiatry or philosophy for positive emotions located in the limbic system. Notice how discussion of positive spiritual/emotional issues in consciousness and motivation is scrupulously ignored. For example, the popular notions of “love” being either Eros (raw, amoral instinct) or agape (noble, non-specific valuing of all other people) miss the motivational forest for the trees. Neither Eros (hypothalamic) nor agape (cortical) has a fraction of the power to relieve stress as attachment (limbic love), yet until the 1950s attachment was neither appreciated nor discussed by academic minds. This paper will point out that the prosocial, “spiritual” positive emotions like hope, faith, forgiveness, joy, compassion and gratitude are extremely important in the relief of stress and in regulation of the neuroendocrine system, protecting us against stress. The experimental work reviewed by Antonio Damasio and Barbara Fredrickson, and the clinical example of Alcoholics Anonymous, will be used to illustrate these points. PMID:21694965

  1. Methods for Estimating Kidney Disease Stage Transition Probabilities Using Electronic Medical Records

    PubMed Central

    Luo, Lola; Small, Dylan; Stewart, Walter F.; Roy, Jason A.

    2013-01-01

    Chronic diseases are often described by stages of severity. Clinical decisions about what to do are influenced by the stage, whether a patient is progressing, and the rate of progression. For chronic kidney disease (CKD), relatively little is known about the transition rates between stages. To address this, we used electronic health records (EHR) data on a large primary care population, which should have the advantage of having both sufficient follow-up time and sample size to reliably estimate transition rates for CKD. However, EHR data have some features that threaten the validity of any analysis. In particular, the timing and frequency of laboratory values and clinical measurements are not determined a priori by research investigators, but rather, depend on many factors, including the current health of the patient. We developed an approach for estimating CKD stage transition rates using hidden Markov models (HMMs), when the level of information and observation time vary among individuals. To estimate the HMMs in a computationally manageable way, we used a “discretization” method to transform daily data into intervals of 30 days, 90 days, or 180 days. We assessed the accuracy and computation time of this method via simulation studies. We also used simulations to study the effect of informative observation times on the estimated transition rates. Our simulation results showed good performance of the method, even when missing data are non-ignorable. We applied the methods to EHR data from over 60,000 primary care patients who have chronic kidney disease (stage 2 and above). We estimated transition rates between six underlying disease states. The results were similar for men and women. PMID:25848580
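
    The discretization step is straightforward to illustrate: daily, irregularly observed values are collapsed into fixed-width intervals before the HMM is fitted. The sketch below keeps the last observation per 30-day interval; the day/value pairs are invented, and the paper's HMM estimation itself is not reproduced.

      def discretize(days, values, width=30):
          """Map day-stamped observations to interval index -> observation."""
          out = {}
          for day, value in zip(days, values):
              out[int(day // width)] = value       # later observations overwrite
          return out

      days = [3, 12, 95, 101, 260]
      egfr = [88, 85, 72, 70, 61]                  # hypothetical lab values
      print(discretize(days, egfr))                # {0: 85, 3: 70, 8: 61}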

  2. Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information.

    PubMed

    Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor

    2013-01-01

    In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
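
    For reference, the complete-case test that the paper improves upon can be sketched as a one-degree-of-freedom chi-square test on genotype counts; under multiple imputation this test would be run on each completed dataset and the results combined. The counts below are invented.

      from scipy import stats

      def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
          """Chi-square test of Hardy-Weinberg proportions from genotype counts."""
          n = n_aa + n_ab + n_bb
          p = (2 * n_aa + n_ab) / (2 * n)                    # allele A frequency
          expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) * (1 - p))
          chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
          return chi2, stats.chi2.sf(chi2, df=1)

      print(hwe_chi2(298, 489, 213))                         # (statistic, p-value)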

  3. Dental health state utility values associated with tooth loss in two contrasting cultures.

    PubMed

    Nassani, M Z; Locker, D; Elmesallati, A A; Devlin, H; Mohammadi, T M; Hajizamani, A; Kay, E J

    2009-08-01

    The study aimed to assess the value placed on oral health states by measuring the utility of mouths in which teeth had been lost, and to explore variations in utility values within and between two contrasting cultures, the UK and Iran. One hundred and fifty-eight patients, 84 from the UK and 74 from Iran, were recruited from clinics at university-based faculties of dentistry. All had experienced tooth loss and had restored or unrestored dental spaces. They were presented with 19 different scenarios of mouths with missing teeth. Fourteen involved the loss of one tooth and five involved shortened dental arches (SDAs) with varying numbers of missing posterior teeth. Each written description was accompanied by a verbal explanation and digital pictures of mouth models. Participants were asked to indicate on a standardized visual analogue scale how they would value the health of their mouth if they had lost the tooth or teeth described and the resulting space was left unrestored. With a utility value of 0.0 representing the worst possible health state for a mouth and 1.0 representing the best, the mouth with the upper central incisor missing attracted the lowest utility value in both samples (UK = 0.16; Iran = 0.06), while the one with a missing upper second molar attracted the highest (0.42 and 0.39, respectively). In both countries the utility value increased as the tooth in the scenario moved from the anterior towards the posterior aspect of the mouth. There were significant differences in utility values between the UK and Iranian samples for four scenarios, all involving the loss of anterior teeth. These differences remained after controlling for gender, age and the state of the dentition. With respect to the SDA scenarios, a mouth with an SDA in which only the second molar teeth were missing in all quadrants attracted the highest utility values, while a mouth with an extreme SDA with both molar and premolar teeth missing in all quadrants attracted the lowest. The study provided further evidence of the validity of the scaling approach to utility measurement in mouths with missing teeth. Some cross-cultural variations in values were observed, but these should be viewed with caution because the magnitude of the differences was small.

  4. Missing value imputation for microarray data: a comprehensive comparison study and a web tool

    PubMed Central

    2013-01-01

    Background: Microarray data are usually peppered with missing values for various reasons. However, most downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed to improve the performance of microarray data analyses. Although many algorithms have been developed, there is still debate over the selection of the optimal algorithm. Studies comparing the performance of different algorithms have so far not been comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. Results: In this paper, we performed a comprehensive comparison using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether datasets from different species have a different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using the three types of measures. Our results indicate that the performance of an imputation algorithm depends mainly on the type of dataset rather than on the species from which the samples come. In addition to the statistical measure, two other measures with biological meaning are useful for reflecting the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices for handling missing values in most microarray datasets. Conclusions: In this work, we carried out a comprehensive comparison of algorithms for microarray missing value imputation. Based on such a comparison, researchers can easily choose the optimal algorithm for their datasets. Moreover, new imputation algorithms can be compared with existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based, easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA determines the optimal algorithm for the users' data through a series of simulations, and the imputed results can then be downloaded for downstream data analyses. PMID:24565220
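
    Not the local-least-squares method recommended above, but a simpler neighbour-based stand-in to illustrate the setting: genes in rows, arrays in columns, NaN for missing intensities. scikit-learn's KNNImputer averages the k most similar rows, where LLS would instead solve a small least-squares problem over them. All names and the masking scheme are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.impute import KNNImputer

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))             # 200 genes x 10 arrays
    mask = rng.random(X.shape) < 0.05          # ~5% entries masked for the demo
    X_missing = np.where(mask, np.nan, X)

    imputer = KNNImputer(n_neighbors=10)       # impute from k most similar genes
    X_hat = imputer.fit_transform(X_missing)

    # Because we masked known values, we can score the imputation directly:
    rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))
    print(f"RMSE on the masked entries: {rmse:.3f}")
    ```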

  5. A comparison of model-based imputation methods for handling missing predictor values in a linear regression model: A simulation study

    NASA Astrophysics Data System (ADS)

    Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah

    2017-08-01

    In regression analysis, missing covariate data are a common problem. Many researchers use ad hoc methods to overcome it because they are easy to implement. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as maximum likelihood (ML) via the expectation-maximization (EM) algorithm and multiple imputation (MI) are more promising for dealing with the difficulties caused by missing data. On the other hand, inappropriate imputation of missing values can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts to assist researchers in selecting appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated from an underlying multivariate normal distribution, and the dependent variable was generated as a combination of the explanatory variables. Missing values in a covariate were simulated under a missing at random (MAR) mechanism, at four levels of missingness (10%, 20%, 30% and 40%). The ML and MI techniques available within SAS software were investigated. A linear regression model was fitted and the model performance measures, MSE and R-squared, were obtained. The results showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE, when the percentage of missingness is below 30%. Neither method handled levels of missingness above 30% well.
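
    A hedged sketch of the simulation design described above (the exact coefficients and covariance are assumptions): covariates drawn from a multivariate normal, y as a linear combination plus noise, and MAR missingness in one covariate whose probability depends only on fully observed quantities (x1 and y), imposed at the four stated rates.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    cov = [[1.0, 0.3], [0.3, 1.0]]
    x1, x2 = rng.multivariate_normal([0, 0], cov, size=n).T
    y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)

    for rate in (0.10, 0.20, 0.30, 0.40):        # the four missingness levels
        # MAR: the chance that x2 is missing depends on observed x1 and y only,
        # never on the (possibly missing) value of x2 itself.
        score = 0.8 * x1 + 0.8 * y
        threshold = np.quantile(score, 1 - rate)
        missing = score > threshold              # top `rate` fraction set missing
        x2_observed = np.where(missing, np.nan, x2)
        print(rate, round(np.isnan(x2_observed).mean(), 3))
    ```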

  6. Settlers Unsettled: Using Field Schools and Digital Stories to Transform Geographies of Ignorance about Indigenous Peoples in Canada

    ERIC Educational Resources Information Center

    Castleden, Heather; Daley, Kiley; Sloan Morgan, Vanessa; Sylvestre, Paul

    2013-01-01

    Geography is a product of colonial processes, and in Canada, the exclusion from educational curricula of Indigenous worldviews and their lived realities has produced "geographies of ignorance". Transformative learning is an approach geographers can use to initiate changes in non-Indigenous student attitudes about Indigenous…

  7. Modeling Achievement Trajectories when Attrition Is Informative

    ERIC Educational Resources Information Center

    Feldman, Betsy J.; Rabe-Hesketh, Sophia

    2012-01-01

    In longitudinal education studies, assuming that dropout and missing data occur completely at random is often unrealistic. When the probability of dropout depends on covariates and observed responses (called "missing at random" [MAR]), or on values of responses that are missing (called "informative" or "not missing at random" [NMAR]),…

  8. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  9. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  10. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  11. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  12. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  13. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  14. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  15. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  16. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  17. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  18. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  19. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  20. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  1. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  2. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  3. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  4. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  5. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  6. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  7. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  8. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  9. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  10. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  11. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  12. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  13. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  14. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  15. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  16. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  17. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  18. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  19. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  20. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  1. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  2. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  3. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  4. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  5. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  7. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  8. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  9. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  10. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  11. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  12. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  13. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  14. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  15. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  16. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  17. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  18. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  19. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  20. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  1. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter must be used in the calculations as specified in paragraphs...

  2. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  3. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  4. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  5. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  6. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  7. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  8. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  9. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  10. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  11. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  12. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  13. Enhanced accessibility of ignored neutral and negative items in nonclinical dissociative individuals.

    PubMed

    Chiu, Chui-De

    2018-01-01

    While clinical studies have shown paradoxical memory phenomena, including intrusion and amnesia of stressful experiences, which are features of dissociation, the results of laboratory studies on dissociative individuals' forgetting of experimental stimuli through cognitive control have varied. Some studies demonstrated ineffective inhibition, while others found that dissociative individuals could remember fewer trauma words in a divided-attention context. Dissociative individuals may utilize superior cognitive disengagement to forget such representations. This hypothesis was tested in nonclinical individuals with high, medium, and low dissociation proneness. In the study phase, participants learned several lists of experimental words and kept updating working memory by remembering the last four items on a list (targets) and ignoring the non-target items. A recognition test was then conducted. The high dissociation group performed better at updating working memory. However, the accessibility of the representations of neutral and negative non-target items was elevated. Dissociative individuals disengaged attention effectively from items they intended to ignore, yet the representations of the ignored items were more accessible when cues were available. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Working with Missing Values

    ERIC Educational Resources Information Center

    Acock, Alan C.

    2005-01-01

    Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions. After reviewing traditional approaches (listwise, pairwise, and mean substitution), selected alternatives are covered including single imputation, multiple imputation, and full information maximum likelihood…
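
    A toy numeric illustration of one pitfall named above: mean substitution shrinks the variance (and attenuates correlations), which is one way "less than optimum" strategies produce biased estimates. The 30% MCAR rate is an arbitrary choice for the demo.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=10, scale=2, size=10_000)
    x_missing = x.copy()
    x_missing[rng.random(x.size) < 0.3] = np.nan       # 30% missing, MCAR

    filled = np.where(np.isnan(x_missing), np.nanmean(x_missing), x_missing)
    # Variance of the observed values vs. after mean substitution:
    print(np.nanvar(x_missing), filled.var())          # drops by roughly 30%
    ```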

  15. How coagulation zone size is underestimated in computer modeling of RF ablation by ignoring the cooling phase just after RF power is switched off.

    PubMed

    Irastorza, Ramiro M; Trujillo, Macarena; Berjano, Enrique

    2017-11-01

    All the numerical models developed for radiofrequency ablation so far have ignored the possible effect of the cooling phase (just after radiofrequency power is switched off) on the dimensions of the coagulation zone. Our objective was thus to quantify the differences in the minor radius of the coagulation zone computed when the cooling phase is included versus ignored. We built models of RF tumor ablation with two needle-like electrodes: a dry electrode (5 mm long and 17G in diameter) with a constant-temperature protocol (70°C) and a cooled electrode (30 mm long and 17G in diameter) with an impedance-control protocol. We observed that the computed coagulation zone dimensions were always underestimated when the cooling phase was ignored. The mean differences computed along the electrode axis were always lower than 0.15 mm for the dry electrode and 1.5 mm for the cooled electrode, in both cases less than 5% of the minor radius of the coagulation zone (3 mm for the dry electrode and 30 mm for the cooled electrode). The underestimation was found to depend on the tissue characteristics: it was more marked for higher values of specific heat and blood perfusion, and less marked for higher values of thermal conductivity. Copyright © 2017 John Wiley & Sons, Ltd.

  16. Nature Disaster Risk Evaluation with a Group Decision Making Method Based on Incomplete Hesitant Fuzzy Linguistic Preference Relations.

    PubMed

    Tang, Ming; Liao, Huchang; Li, Zongmin; Xu, Zeshui

    2018-04-13

    Because a natural disaster system is a very large and comprehensive system, disaster reduction schemes must rely on risk analysis. Experts' knowledge and experience play a critical role in disaster risk assessment. The hesitant fuzzy linguistic preference relation is an effective tool for expressing experts' preference information when comparing alternatives pairwise. Owing to a lack of knowledge or a heavy workload, information may be missing from a hesitant fuzzy linguistic preference relation, leaving it incomplete. In this paper, we first discuss some properties of the additively consistent hesitant fuzzy linguistic preference relation. Next, the incomplete, the normalized, and the acceptable hesitant fuzzy linguistic preference relations are defined. Afterwards, three procedures to estimate the missing information are proposed. The first deals with the situation in which there are only n-1 known judgments involving all the alternatives; the second is used to estimate the missing information of a hesitant fuzzy linguistic preference relation with more known judgments; the third handles ignorance situations in which at least one alternative has totally missing information. Furthermore, an algorithm for group decision making with incomplete hesitant fuzzy linguistic preference relations is given. Finally, we illustrate our model with a case study on flood disaster risk evaluation. A comparative analysis is presented to demonstrate the advantage of our method.
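
    The paper works with hesitant fuzzy *linguistic* preference relations; as a simpler numeric analogue (not the authors' procedure), this sketch estimates one missing entry of an ordinary fuzzy preference relation via additive consistency, p_ik = p_ij + p_jk - 0.5, averaged over the intermediate alternatives j for which both entries are known.

    ```python
    import numpy as np

    P = np.array([                       # P[i, j]: degree to which i beats j
        [0.5, 0.7, np.nan],
        [0.3, 0.5, 0.6],
        [np.nan, 0.4, 0.5],
    ])

    def estimate(P: np.ndarray, i: int, k: int) -> float:
        """Additive-consistency estimate of the missing entry P[i, k]."""
        ests = [P[i, j] + P[j, k] - 0.5
                for j in range(len(P))
                if j not in (i, k)
                and not np.isnan(P[i, j]) and not np.isnan(P[j, k])]
        return float(np.mean(ests))

    P[0, 2] = estimate(P, 0, 2)          # via j = 1: 0.7 + 0.6 - 0.5 = 0.8
    P[2, 0] = 1.0 - P[0, 2]              # restore reciprocity
    print(P)
    ```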

  17. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate, using a secondary data analysis study as an example, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which must be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The data source was the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostic procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to apply this technique to large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
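
    A hedged sketch of the chained-equation workflow described above, using scikit-learn's IterativeImputer (a MICE-style imputer, not the authors' software) and Rubin's rules to pool a mean estimate across m imputed datasets. The data are synthetic; the survey itself is not reproduced here.

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 4))
    X[rng.random(X.shape) < 0.15] = np.nan       # 15% missing, for the demo

    m = 5                                        # number of imputed datasets
    estimates, variances = [], []
    for seed in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed)
        Xi = imputer.fit_transform(X)
        col = Xi[:, 0]                           # analyse, e.g., variable 0
        estimates.append(col.mean())
        variances.append(col.var(ddof=1) / len(col))

    # Rubin's rules: combine within- and between-imputation variance.
    qbar = np.mean(estimates)
    ubar = np.mean(variances)
    b = np.var(estimates, ddof=1)
    total_var = ubar + (1 + 1 / m) * b
    print(f"pooled mean = {qbar:.3f}, pooled SE = {np.sqrt(total_var):.3f}")
    ```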

  18. Apparatus And Method For Reconstructing Data Using Cross-Parity Stripes On Storage Media

    DOEpatents

    Hughes, James Prescott

    2003-06-17

    An apparatus and method for reconstructing missing data using cross-parity stripes on a storage medium is provided. The apparatus and method may operate on data symbols having sizes greater than a data bit. The apparatus and method make use of a plurality of parity stripes for reconstructing missing data stripes. The parity symbol values in the parity stripes are used as a basis for determining the value of the missing data symbol in a data stripe. A correction matrix is shifted along the data stripes, correcting missing data symbols as it is shifted. The correction is performed from the outside data stripes towards the inner data stripes, so that previously reconstructed data symbols can be used to reconstruct other missing data symbols.
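
    The patent covers cross-parity stripes over multi-bit symbols; the simplest instance of the underlying idea is single-parity reconstruction, where a missing symbol equals the XOR of the parity symbol with the surviving data symbols. A minimal sketch, not the patented cross-parity scheme itself:

    ```python
    from functools import reduce

    def parity(symbols: list[int]) -> int:
        """XOR-fold a list of equal-sized data symbols."""
        return reduce(lambda a, b: a ^ b, symbols)

    data = [0x12, 0x34, 0x56, 0x78]        # one byte-sized symbol per stripe
    p = parity(data)                        # stored parity symbol

    survivors = data[:2] + data[3:]         # symbol at index 2 is lost
    recovered = parity(survivors + [p])     # XOR of parity with survivors
    assert recovered == data[2]
    print(hex(recovered))                   # 0x56
    ```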

  19. The vulnerability of being ill informed: the Trans-Pacific Partnership Agreement and Global Public Health.

    PubMed

    Greenberg, Henry; Shiau, Stephanie

    2014-09-01

    The Trans-Pacific Partnership Agreement (TPPA) is a regional trade agreement currently being negotiated by 11 Pacific Rim countries, excluding China. While the negotiations are being conducted under a veil of secrecy, substantive leaks over the past 4 years have revealed a broad view of the proposed contents. As it stands, the TPPA poses serious risks to global public health, particularly with respect to chronic, non-communicable diseases. At greatest risk are national tobacco regulations, regulations governing the emergence of generic drugs, and controls over food imports by transnational corporations. Aside from a small group of public health professionals from Australia, the academic public health community has missed these threats to the global community, although many other health-related entities, international lawyers and health-conscious politicians have voiced serious concerns. As of mid-2014 there has been no comment in the leading public health journals. This large lacuna in interest or recognition reflects the larger problem that the public health education community has all but ignored global non-communicable diseases. Without such a focus, the risks are unseen and the threats not perceived. This cautionary tale of the TPPA reflects the vulnerability of being ill informed of contemporary realities. © The Author 2014. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. Consequences of repeated discovery and benign neglect of non-interaction of waves (NIW)

    NASA Astrophysics Data System (ADS)

    Roychoudhuri, ChandraSekhar

    2017-08-01

    This paper presents the historical background behind the repeated discovery, and repeated neglect, of a generic and important property of all propagating waves, the Non-Interaction of Waves (NIW). The focus is on the implications of NIW for most of the major optical phenomena, with brief hints of their importance. We argue that the prevailing postulate of wave-particle duality becomes unnecessary once we accept NIW. A semi-classical model of light-matter interactions should be the preferred approach, since the quantumness actually arises from within the structure of the energy levels (bands) in materials. Waves, and wave equations, do not support bullet-like propagation. We follow the historical trend starting from the tenth-century physicist Alhazen, to the seventeenth-century Newton and Huygens, then to the nineteenth-century Young and Fresnel. Then we jump to the twentieth-century physicists Planck, Einstein, Bose, Dirac and Feynman. Had we recognized and appreciated the NIW property of waves from the time of Alhazen, the evolutionary history of physics would have been dramatically different from what we have today. The prevailing dominance of the postulate of wave-particle duality is keeping us from seeking out actual reality; hence, we should abandon this concept and search for better models. The paper demonstrates that NIW provides us with a platform for a deeper understanding of the nature of EM waves that we have missed; it is not just semantics.

  1. Detailed budget analysis of HONO in central London reveals a missing daytime source

    NASA Astrophysics Data System (ADS)

    Lee, J. D.; Whalley, L. K.; Heard, D. E.; Stone, D.; Dunmore, R. E.; Hamilton, J. F.; Young, D. E.; Allan, J. D.; Laufs, S.; Kleffmann, J.

    2015-08-01

    Measurements of HONO were carried out at an urban background site near central London as part of the Clean air for London (ClearfLo) project in summer 2012. Data were collected from 22 July to 18 August 2012, with peak values of up to 1.8 ppbV at night and non-zero daytime values of between 0.2 and 0.6 ppbV. A wide range of other gas-phase, aerosol, radiation and meteorological measurements were made concurrently at the same site, allowing a detailed analysis of the chemistry to be carried out. The peak HONO/NOx ratio of 0.04 is seen at ~ 02:00 UTC, and the presence of a second, daytime peak in HONO/NOx of similar magnitude to the night-time peak suggests a significant secondary daytime HONO source. A photostationary state calculation of HONO, with formation from the reaction of OH with NO and loss through photolysis, reaction with OH and dry deposition, significantly underestimates the daytime measurements: calculated values are close to zero, compared with the measurement average of 0.4 ppbV at midday. Adding further HONO sources, including postulated formation from the reaction of HO2 with NO2 and photolysis of HNO3, increases the daytime modelled HONO to 0.1 ppbV, still leaving a significant unexplained daytime source. The missing HONO is plotted against a series of parameters, including NO2 and OH reactivity, with little correlation seen. Much better correlation is observed with the products of these species with j(NO2), in particular NO2 and the product of NO2 with OH reactivity. This suggests the missing HONO source is in some way related to NO2 and also requires sunlight. The effect of the missing HONO on OH radical production is also investigated, and it is shown that the model must be constrained to measured HONO in order to reproduce the OH radical measurements accurately.
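
    A sketch of the photostationary-state (PSS) estimate described above: production by OH + NO balanced against photolysis, reaction with OH, and dry deposition. The rate constants are approximate typical literature values, and the deposition velocity and mixing height are assumptions; the numbers are illustrative only.

    ```python
    k_oh_no   = 7.4e-12   # cm3 molecule-1 s-1, OH + NO -> HONO (approx., 1 atm)
    k_oh_hono = 6.0e-12   # cm3 molecule-1 s-1, OH + HONO (approx.)
    j_hono    = 1.0e-3    # s-1, midday HONO photolysis frequency (approx.)
    v_dep     = 1.0       # cm s-1, dry deposition velocity (assumed)
    H         = 5.0e4     # cm, i.e. a 500 m mixing height (assumed)

    OH = 2.0e6            # molecule cm-3, typical midday OH
    NO = 2.5e10           # molecule cm-3, roughly 1 ppbV at surface conditions

    # [HONO]_PSS = k[OH][NO] / (j_HONO + k'[OH] + v_dep/H)
    hono_pss = (k_oh_no * OH * NO) / (j_hono + k_oh_hono * OH + v_dep / H)
    print(f"PSS HONO: {hono_pss:.2e} molecule cm-3")   # ~0.01-0.02 ppbV,
    # far below the ~0.4 ppbV measured at midday, illustrating the gap.
    ```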

  2. Implicit Valuation of the Near-Miss is Dependent on Outcome Context.

    PubMed

    Banks, Parker J; Tata, Matthew S; Bennett, Patrick J; Sekuler, Allison B; Gruber, Aaron J

    2018-03-01

    Gambling studies have described a "near-miss effect" wherein the experience of almost winning increases gambling persistence. The near-miss has been proposed to inflate the value of preceding actions through its perceptual similarity to wins. We demonstrate here, however, that it acts as a conditioned stimulus to positively or negatively influence valuation, dependent on reward expectation and cognitive engagement. When subjects are asked to choose between two simulated slot machines, near-misses increase valuation of machines with a low payout rate, whereas they decrease valuation of high payout machines. This contextual effect impairs decisions and persists regardless of manipulations to outcome feedback or financial incentive provided for good performance. It is consistent with proposals that near-misses cause frustration when wins are expected, and we propose that it increases choice stochasticity and overrides avoidance of low-valued options. Intriguingly, the near-miss effect disappears when subjects are required to explicitly value machines by placing bets, rather than choosing between them. We propose that this task increases cognitive engagement and recruits participation of brain regions involved in cognitive processing, causing inhibition of otherwise dominant systems of decision-making. Our results reveal that only implicit, rather than explicit strategies of decision-making are affected by near-misses, and that the brain can fluidly shift between these strategies according to task demands.

  3. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  4. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  5. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  6. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  7. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  8. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  9. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  10. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  11. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  12. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  13. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  14. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  15. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a) of this subpart cannot... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  16. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  17. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  18. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  19. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  20. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  1. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  2. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  3. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  4. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  5. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  7. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  8. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  9. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... all available process data or data used for accounting purposes. (b) For missing values related to the...

  10. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  11. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  12. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  13. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  14. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  15. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  16. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  17. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  18. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  19. Improving data sharing in research with context-free encoded missing data.

    PubMed

    Hoevenaar-Blom, Marieke P; Guillemont, Juliette; Ngandu, Tiia; Beishuizen, Cathrien R L; Coley, Nicola; Moll van Charante, Eric P; Andrieu, Sandrine; Kivipelto, Miia; Soininen, Hilkka; Brayne, Carol; Meiller, Yannick; Richard, Edo

    2017-01-01

    Lack of attention to missing data in research may result in biased results, loss of power and reduced generalizability. Registering reasons for missing values at the time of data collection, or, in the case of sharing existing data, before making data available to other teams, can save time and effort, improve scientific value and help to prevent erroneous assumptions and biased results. To ensure that the encoding of missing data is sufficient to understand why data are missing, it should ideally be context-free. Therefore, 11 context-free codes for missing data were carefully designed, based on three completed randomized controlled clinical trials, and tested in a new randomized controlled clinical trial by an international team consisting of clinical researchers and epidemiologists with extensive experience in designing and conducting trials, together with an information systems expert. These codes can be divided into missing due to participant and/or participation characteristics (n = 6), missing by design (n = 4), and missing due to a procedural error (n = 1). Broad implementation of context-free missing-data encoding may enhance the possibilities of data sharing and pooling, thus allowing more powerful analyses using existing data.
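
    The abstract names the three categories of the 11 codes but not the codes themselves. As a minimal sketch of what such a context-free encoding table might look like in practice, the Python mapping below uses entirely hypothetical code values and labels; only the 6/4/1 split of categories comes from the abstract.

```python
# Hypothetical sketch: the paper's 11 actual codes are not listed in the
# abstract, so these values and labels are stand-ins for the three
# categories it describes (6 participant-related, 4 by-design, 1 error).
MISSING_CODES = {
    # missing due to participant and/or participation characteristics (n = 6)
    -91: "participant refused item",
    -92: "participant unable to answer",
    -93: "participant withdrew from study",
    -94: "participant deceased",
    -95: "participant lost to follow-up",
    -96: "item not applicable to participant",
    # missing by design (n = 4)
    -81: "item not in this questionnaire version",
    -82: "item skipped by branching logic",
    -83: "visit not scheduled for this study arm",
    -84: "measurement not in protocol at this wave",
    # missing due to a procedural error (n = 1)
    -71: "value lost through procedural or equipment error",
}

def decode(value):
    """Return a human-readable reason for an encoded missing value."""
    return MISSING_CODES.get(value, "observed value")
```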

  20. METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS.

    PubMed

    Tignor, Nicole; Wang, Pei; Genes, Nicholas; Rogers, Linda; Hershman, Steven G; Scott, Erick R; Zweig, Micol; Yvonne Chan, Yu-Feng; Schadt, Eric E

    2017-01-01

    In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real-time geographic location information. The AMHS is just one of many studies now occurring in the context of the thousands of mobile health apps aimed at improving wellness and better managing chronic disease conditions by leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes, such as asthma exacerbations or hospitalizations, from these types of large, prospectively collected data sets would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time for each user, and 3) non-overlapping periods of enrollment among different users. To handle such a complicated missing-data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute the missing values through low-rank matrix completion. When applying the proposed new method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. With further validation, the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing-data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.

  1. The Political Consequences of Latino Prejudice against Blacks

    PubMed Central

    Krupnikov, Yanna; Piston, Spencer

    2016-01-01

    A good deal of scholarship examines the effects of prejudice against blacks on public opinion and vote choice in the United States. Despite producing valuable insights, this research largely ignores the attitudes of Latinos—a critical omission, since Latinos constitute a rapidly growing share of the population. Using two nationally representative survey data sets, we find that the level of racial prejudice is comparable for Latinos and non-Hispanic whites. Equally comparable are associations between prejudice and political preferences: policy opinion and support for Obama in the 2008 presidential election. Our findings suggest that despite demographic changes, efforts to enact policies intended to assist blacks and elect black candidates will continue to be undermined by prejudice. That said, Latinos are more likely than non-Hispanic whites to support policies intended to assist blacks, because Latinos are more Democratic than non-Hispanic whites, more egalitarian, and less committed to the value of limited government. PMID:27274574

  2. Exploring Social Networking: Developing Critical Literacies

    ERIC Educational Resources Information Center

    Watson, Pauline

    2012-01-01

    While schools have been using computers within their classrooms for years now, they have purposefully ignored the growing power of social networks such as Facebook and Twitter. Many schools ban students from accessing and using sites such as Facebook at school, and many English and literacy teachers ignore or deny their value as a teaching…

  3. Infilling and quality checking of discharge, precipitation and temperature data using a copula based approach

    NASA Astrophysics Data System (ADS)

    Anwar, Faizan; Bárdossy, András; Seidel, Jochen

    2017-04-01

    Estimating missing values in a time series of a hydrological variable is an everyday task for a hydrologist. Existing methods such as inverse distance weighting, multivariate regression, and kriging, though simple to apply, provide no indication of the quality of the estimated value and depend mainly on the values of neighboring stations at a given step in the time series. Copulas have the advantage of representing the pure dependence structure between two or more variables (given that the relationship between them is monotonic). They free us from questions such as how to transform the data before use or which functions to calculate to model the relationship between the considered variables. A copula-based approach is suggested to infill discharge, precipitation, and temperature data. As a first step the normal copula is used; subsequently, the necessity to use non-normal or non-symmetric dependence is investigated. Discharge and temperature are treated as regular continuous variables and can be used without processing for infilling and quality checking. Due to the mixed distribution of precipitation values, precipitation has to be treated differently: a discrete probability is assigned to the zeros and the rest is treated as a continuous distribution. Building on the work of others, along with infilling, the normal copula is also utilized to identify values in a time series that might be erroneous. This is done by treating the available value as missing, infilling it using the normal copula, and checking whether it lies within a confidence band (5 to 95% in our case) of the obtained conditional distribution. Hydrological data from two catchments, the Upper Neckar River (Germany) and the Santa River (Peru), are used to demonstrate the application for datasets with different data quality. The Python code used here is also made available on GitHub. The required input is the time series of a given variable at different stations.
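
    As a rough illustration of the normal-copula infilling and confidence-band check described above, here is a minimal Python sketch for one target station and one fully observed neighbor. The single-neighbor setup, the variable names, and the empirical back-transformation are simplifying assumptions; the authors' published GitHub code is the authoritative version.

```python
# A minimal normal-copula infilling sketch (one neighbor, empirical margins).
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(x):
    """Map observations to standard-normal scores via the empirical CDF."""
    u = rankdata(x) / (len(x) + 1.0)
    return norm.ppf(u)

def infill(target, neighbor, t_missing):
    """Estimate target[t_missing] from neighbor, with a 5-95% band."""
    obs = ~np.isnan(target)
    z_t = normal_scores(target[obs])
    z_n = normal_scores(neighbor[obs])
    rho = np.corrcoef(z_t, z_n)[0, 1]           # dependence in Gaussian space

    # normal score of the neighbor at the missing time step
    u_n = rankdata(neighbor)[t_missing] / (len(neighbor) + 1.0)
    z_mean = rho * norm.ppf(u_n)                # conditional mean
    z_sd = np.sqrt(1.0 - rho ** 2)              # conditional std. dev.

    # back-transform conditional-normal quantiles to data space via the
    # empirical quantiles of the observed target values
    qs = norm.cdf(z_mean + z_sd * norm.ppf([0.05, 0.5, 0.95]))
    return np.quantile(target[obs], qs)         # (low, estimate, high)
```

    For quality checking, an already-observed value falling outside the returned 5-95% band would be flagged as potentially erroneous, mirroring the procedure in the abstract.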

  4. 29 CFR 4050.11 - Limitations.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... missing participants. (b) Limitation on benefit value. The total actuarial present value of all benefits... Relating to Labor (Continued) PENSION BENEFIT GUARANTY CORPORATION PLAN TERMINATIONS MISSING PARTICIPANTS § 4050.11 Limitations. (a) Exclusive benefit. The benefits provided for under this part will be the only...

  5. A Review of Missing Data Handling Methods in Education Research

    ERIC Educational Resources Information Center

    Cheema, Jehanzeb R.

    2014-01-01

    Missing data are a common occurrence in survey-based research studies in education, and the way missing values are handled can significantly affect the results of analyses based on such data. Despite known problems with performance of some missing data handling methods, such as mean imputation, many researchers in education continue to use those…

  6. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  7. 77 FR 3147 - Approval and Promulgation of Air Quality Implementation Plans; Delaware, New Jersey, and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-23

    ... monitors with missing data. Maximum recorded values are substituted for the missing data. The resulting... which the incomplete site is missing data. The linear regression relationship is based on time periods... between the monitors is used to fill in missing data for the incomplete monitor, so that the normal data...

  8. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  9. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(2), a complete record of all measured parameters... process data or data used for accounting purposes. (b) For missing values related to the CaO and MgO...

  10. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  11. Examining solutions to missing data in longitudinal nursing research.

    PubMed

    Roberts, Mary B; Sullivan, Mary C; Winchester, Suzy B

    2017-04-01

    Longitudinal studies are highly valuable in pediatrics because they provide useful data about developmental patterns of child health and behavior over time. When data are missing, the value of the research is impacted. The study's purpose was to (1) introduce a three-step approach to assess and address missing data and (2) illustrate this approach using categorical and continuous-level variables from a longitudinal study of premature infants. A three-step approach with simulations was followed to assess the amount and pattern of missing data and to determine the most appropriate imputation method for the missing data. Patterns of missingness were Missing Completely at Random, Missing at Random, and Not Missing at Random. Missing continuous-level data were imputed using mean replacement, stochastic regression, multiple imputation, and fully conditional specification (FCS). Missing categorical-level data were imputed using last value carried forward, hot-decking, stochastic regression, and FCS. Simulations were used to evaluate these imputation methods under different patterns of missingness at different levels of missing data. The rate of missingness was 16-23% for continuous variables and 1-28% for categorical variables. FCS imputation provided the least difference in mean and standard deviation estimates for continuous measures. FCS imputation was acceptable for categorical measures. Results obtained through simulation reinforced and confirmed these findings. Significant investments are made in the collection of longitudinal data. The prudent handling of missing data can protect these investments and potentially improve the scientific information contained in pediatric longitudinal studies. © 2017 Wiley Periodicals, Inc.
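
    For the continuous measures, fully conditional specification is close in spirit to scikit-learn's chained-equations imputer. The following minimal sketch, on synthetic data rather than the study's infant data, illustrates that final imputation step only; the roughly 20% missingness rate is chosen to match the range reported above.

```python
# FCS-style imputation sketch using scikit-learn's chained-equations imputer.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]                      # give the imputer a signal to learn
mask = rng.random(X.shape) < 0.2              # ~20% missing, as in the study's range
X_miss = np.where(mask, np.nan, X)

# Each feature with missing values is modeled conditionally on the others,
# cycling over features until convergence (the FCS idea); sample_posterior
# adds draw-to-draw noise suitable for multiple imputation.
imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=0)
X_completed = imputer.fit_transform(X_miss)
print(np.abs(X_completed - X)[mask].mean())   # average imputation error
```

    Repeating the fit with different random states while sample_posterior=True yields the multiple completed data sets that multiple-imputation workflows then combine.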

  12. A Comparison of Methods to Analyze Aquatic Heterotrophic Flagellates of Different Taxonomic Groups.

    PubMed

    Jeuck, Alexandra; Nitsche, Frank; Wylezich, Claudia; Wirth, Olaf; Bergfeld, Tanja; Brutscher, Fabienne; Hennemann, Melanie; Monir, Shahla; Scherwaß, Anja; Troll, Nicole; Arndt, Hartmut

    2017-08-01

    Heterotrophic flagellates contribute significantly to the matter flux in aquatic and terrestrial ecosystems. Still today, their quantification and taxonomic classification pose several problems in field studies, though these methodological problems seem to be increasingly ignored in current ecological studies. Here we describe and test different methods: the live-counting technique, different fixation techniques, cultivation methods such as the liquid aliquot method (LAM), and a molecular single-cell survey called aliquot PCR (aPCR). All these methods have been tested either using aquatic field samples or cultures of freshwater and marine taxa. Each of the described methods has its advantages and disadvantages, which have to be considered in every single case. With the live-counting technique, detection of living cells up to the morphospecies level is possible. Fixation of cells and staining methods are advantageous because they allow long-term storage and observation of samples. Cultivation methods (LAM) offer the possibility of subsequent molecular analyses, and aPCR tools might compensate for a deficiency of LAM, namely its failure to detect non-cultivable flagellates. In summary, we propose combining several investigation techniques to reduce the gaps left by the individual methods. Copyright © 2017 Elsevier GmbH. All rights reserved.

  13. DESCRIBING LYMPHEDEMA IN FEMALES WITH TURNER SYNDROME.

    PubMed

    Rothbauer, J; Driver, S; Callender, L

    2015-09-01

    Turner syndrome (TS) is a chromosomal condition affecting an estimated 1 in 2,500 girls where the second X chromosome is missing, or partially formed. This abnormality affects multiple body systems and can lead to short stature, cardiac, neural, and renal abnormalities. Due to the chronic, non-life threatening nature of lymphedema in comparison to other symptoms of TS, it is often ignored by girls and women with TS and their physicians. Consequently, little is known about how lymphedema affects girls and women with TS across the lifespan. Therefore, the objective of the study was to deliver an online survey for females with TS and caregivers in the US, UK, and Canada to provide a worldwide perspective on their current experience with lymphedema within the spectrum of TS. There were 219 participants who completed the survey, and we were able to identify incidence and characteristics of lymphedema across the lifespan. In addition, we found that females with 45,X karyotyping were more likely to report lymphedema symptoms. Lymphedema is not the most significant concern of females with TS, but education, physician evaluation, and assistance with referrals for treatment and management would improve the ease of managing lymphedema in girls and women with TS.

  14. The systemic theory of living systems and relevance to CAM: the theory (Part III).

    PubMed

    Olalde Rangel, José A

    2005-09-01

    Western medical science lacks a solid philosophical and theoretical approach to disease cognition and therapeutics. My first two articles provided a framework for a humane medicine based on Modern Biophysics. Its precepts encompass modern therapeutics and CAM. Modern Biophysics and its concepts are presently missing in medicine, whether orthodox or CAM, albeit they probably provide the long sought explanation that bridges the abyss between East and West. Key points that differentiate Systemic from other systems' approaches are 'Intelligence', 'Energy' and the objective 'to survive'. The General System Theory (GST) took a forward step by proposing a departure from the mechanistic biological concept-of analyzing parts and processes in isolation-and brought us towards an organismic model. GST examines the system's components and results of their interaction. However, GST still does not go far enough. GST assumes 'Self-Organization' as a spontaneous phenomenon, ignoring a causative entity or central controller to all systems: Intelligence. It also neglects 'Survive' as the directional motivation common to any living system, and scarcely assigns 'Energy' its true inherent value. These three parameters, Intelligence, Energy and Survive, are vital variables to be considered, in our human quest, if we are to achieve a unified theory of life.

  15. Impact of Healthcare Information Technology on Nursing Practice.

    PubMed

    Piscotty, Ronald J; Kalisch, Beatrice; Gracey-Thomas, Angel

    2015-07-01

    To report additional mediation findings from a descriptive cross-sectional study to examine if nurses' perceptions of the impact of healthcare information technology on their practice mediates the relationship between electronic nursing care reminder use and missed nursing care. The study used a descriptive design. The sample (N = 165) was composed of registered nurses working on acute care hospital units. The sample was obtained from a large teaching hospital in Southeast Michigan in the fall of 2012. All eligible nursing units (n = 19) were included. The MISSCARE Survey, Nursing Care Reminders Usage Survey, and the Impact of Healthcare Information Technology Scale were used to collect data to test for mediation. Mediation was tested using the method described by Baron and Kenny. Multiple regression equations were used to analyze the data to determine if mediation occurred between the variables. Missed nursing care, the outcome variable, was regressed on the predictor variable, reminder usage, and the mediator variable, impact of technology on nursing practice. The impact of healthcare information technology (IHIT) on nursing practice negatively affected missed nursing care (t = -4.12, p < .001), explaining 9.8% of variance in missed nursing care. With IHIT present, the predictor (reminder usage) was no longer significant (t = -.70, p = .48). Thus, the reduced direct association between reminder usage and missed nursing care when IHIT was in the model supported the hypothesis that IHIT was at least one of the mediators in the relationship between reminder usage and missed nursing care. The perception of the impact of healthcare information technology mediates the relationship between nursing care reminder use and missed nursing care. The findings are beneficial to the advancement of healthcare technology in that designers of healthcare information technology systems need to keep in mind that perceptions regarding the impacts of the technology will influence usage. Many times, information technology systems are not designed to match the workflow of nurses. Systems built with redundant or irrelevant reminders may be ignored. System designers must study which reminders nurses find most useful and which reminders result in the best quality outcomes. © 2015 Sigma Theta Tau International.

  16. 29 CFR 4050.8 - Automatic lump sum.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... present value (determined as of the deemed distribution date under the missing participant lump sum... Relating to Labor (Continued) PENSION BENEFIT GUARANTY CORPORATION PLAN TERMINATIONS MISSING PARTICIPANTS § 4050.8 Automatic lump sum. This section applies to a missing participant whose designated benefit was...

  17. Full-Coverage High-Resolution Daily PM(sub 2.5) Estimation using MAIAC AOD in the Yangtze River Delta of China

    NASA Technical Reports Server (NTRS)

    Xiao, Qingyang; Wang, Yujie; Chang, Howard H.; Meng, Xia; Geng, Guannan; Lyapustin, Alexei Ivanovich; Liu, Yang

    2017-01-01

    Satellite aerosol optical depth (AOD) has been used to assess population exposure to fine particulate matter (PM(sub 2.5)). The emerging high-resolution satellite aerosol product, Multi-Angle Implementation of Atmospheric Correction (MAIAC), provides a valuable opportunity to characterize local-scale PM(sub 2.5) at 1-km resolution. However, non-random missing AOD due to cloud or snow cover or high surface reflectance makes this task challenging. Previous studies filled the data gap by spatially interpolating neighboring PM(sub 2.5) measurements or predictions. This strategy ignored the effect of cloud cover on aerosol loadings and has been shown to exhibit poor performance when monitoring stations are sparse or when there is seasonal large-scale missingness. Using the Yangtze River Delta of China as an example, we present a Multiple Imputation (MI) method that combines the MAIAC high-resolution satellite retrievals with chemical transport model (CTM) simulations to fill missing AOD. A two-stage statistical model driven by gap-filled AOD, meteorology and land use information was then fitted to estimate daily ground PM(sub 2.5) concentrations in 2013 and 2014 at 1-km resolution with complete coverage in space and time. The daily MI models have an average R(exp 2) of 0.77, with an inter-quartile range of 0.71 to 0.82 across days. The overall MI model 10-fold cross-validation R(exp 2) (root mean square error) was 0.81 (25 µg/m(exp 3)) and 0.73 (18 µg/m(exp 3)) for years 2013 and 2014, respectively. Predictions with only observational AOD or only imputed AOD showed similar accuracy. Compared with previous gap-filling methods, the MI method presented in this study performed better, with higher coverage, higher accuracy, and the ability to fill missing PM(sub 2.5) predictions without ground PM(sub 2.5) measurements. This method can provide reliable PM(sub 2.5) predictions with complete coverage that can reduce bias in exposure assessment in air pollution and health studies.
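
    A minimal sketch of that two-stage structure, with synthetic data and plain linear models standing in for the paper's statistical models; the single-regression imputation in stage 1 is a simplification of true multiple imputation, and the variable names are illustrative assumptions.

```python
# Two-stage sketch: gap-fill AOD from CTM output, then predict PM2.5.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 1000
ctm_aod = rng.gamma(2.0, 0.3, n)                  # CTM-simulated AOD (synthetic)
aod = 0.8 * ctm_aod + rng.normal(0, 0.05, n)      # satellite AOD (synthetic)
aod[rng.random(n) < 0.4] = np.nan                 # gaps from cloud/snow/bright surface

# Stage 1: fill missing AOD from the CTM simulation.
X = ctm_aod.reshape(-1, 1)
obs = ~np.isnan(aod)
stage1 = LinearRegression().fit(X[obs], aod[obs])
aod_filled = np.where(obs, aod, stage1.predict(X))

# Stage 2: predict ground PM2.5 from gap-filled AOD; the paper also uses
# meteorology and land use predictors, omitted here.
pm25 = 40.0 * aod_filled + rng.normal(0, 5.0, n)  # synthetic "ground truth"
stage2 = LinearRegression().fit(aod_filled.reshape(-1, 1), pm25)
print(stage2.score(aod_filled.reshape(-1, 1), pm25))  # stage-2 R^2
```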

  18. Striatal connectivity changes following gambling wins and near-misses: Associations with gambling severity.

    PubMed

    van Holst, Ruth J; Chase, Henry W; Clark, Luke

    2014-01-01

    Frontostriatal circuitry is implicated in the cognitive distortions associated with gambling behaviour. 'Near-miss' events, where unsuccessful outcomes are proximal to a jackpot win, recruit overlapping neural circuitry with actual monetary wins. Personal control over a gamble (e.g., via choice) is also known to increase confidence in one's chances of winning (the 'illusion of control'). Using psychophysiological interaction (PPI) analyses, we examined changes in functional connectivity as regular gamblers and non-gambling participants played a slot-machine game that delivered wins, near-misses and full-misses, and manipulated personal control. We focussed on connectivity with striatal seed regions, and associations with gambling severity, using voxel-wise regression. For the interaction term of near-misses (versus full-misses) by personal choice (participant-chosen versus computer-chosen), ventral striatal connectivity with the insula, bilaterally, was positively correlated with gambling severity. In addition, some effects for the contrast of wins compared to all non-wins were observed at an uncorrected (p < .001) threshold: there was an overall increase in connectivity between the striatal seeds and left orbitofrontal cortex and posterior insula, and a negative correlation for gambling severity with the connectivity between the right ventral striatal seed and left anterior cingulate cortex. These findings corroborate the 'non-categorical' nature of reward processing in gambling: near-misses and full-misses are objectively identical outcomes that are processed differentially. Ventral striatal connectivity with the insula correlated positively with gambling severity in the illusion of control contrast, which could be a risk factor for the cognitive distortions and loss-chasing that are characteristic of problem gambling.

  19. Non-completion and informed consent.

    PubMed

    Wertheimer, Alan

    2014-02-01

    There is a good deal of biomedical research that does not produce scientifically useful data because it fails to recruit a sufficient number of subjects. This fact is typically not disclosed to prospective subjects. In general, the guidance about consent concerns the information required to make intelligent self-interested decisions and ignores some of the information required for intelligent altruistic decisions. Bioethics has worried about the 'therapeutic misconception', but has ignored the 'completion misconception'. This article argues that, other things being equal, prospective subjects should be informed about the possibility of non-completion as part of the standard consent process if (1) it is or should be anticipatable that there is a non-trivial possibility of non-completion and (2) that information is likely to be relevant to a prospective subject's decision to consent. The article then considers several objections to the argument, including the objection that disclosing non-completion information would make recruitment even more difficult.

  20. Levels of analysis in neuroscientific studies of emotion: Comment on "The quartet theory of human emotions: an integrative and neurofunctional model" by S. Koelsch et al.

    NASA Astrophysics Data System (ADS)

    Kuiken, Don; Douglas, Shawn

    2015-06-01

    In the conduct of neuroscience research, methodological choices and theoretical claims often reveal underlying metamethodological and ontological commitments. Koelsch et al. [1] accentuate such commitments in their description of four "neuroanatomically distinct systems," each the substrate of "a specific class of affects" (p. 1). Explication of those classes of affect require theoretical integration across methodologically diverse disciplines, including "psychology, neurobiology, sociology, anthropology, and psycholinguistics" (p. 3). (Philosophy is noticeably missing from this list, but several aspects of the authors' stance indicate that it is not ignored.)

  1. Shanahan on symbolization.

    PubMed

    Lassègue, Jean

    2008-03-01

    In his article 'A New View of Language, Emotion and the Brain,' Dan Shanahan claims that the post-war Cognitive Turn focused mainly on information processing and that little attention was paid to the dramatic role played by emotion in human cognition. One key argument in his defence of a more comprehensive view of human cognition rests upon the idea that the process of symbolization--a unique capacity only developed by humans--combines, right from the start, information processing and feelings. The author argues that any theory ignoring this fact would miss the whole point, just as mainstream cognitive science has done since Noam Chomsky published Syntactic Structures, exactly 50 years ago.

  2. Meanings of being received and met by others as experienced by women with MS

    PubMed Central

    Olsson, Malin; Skär, Lisa; Söderberg, Siv

    2011-01-01

    In order to elucidate meanings of being received and met by others as experienced by women with multiple sclerosis (MS) we conducted a qualitative inquiry. We interviewed 15 women with MS and analysed the interviews with a phenomenological hermeneutic interpretation. The findings were presented in two themes: experiencing oneself as a valuable person and experiencing oneself as diminished. Meanings of being received and met by others, as experienced by women with MS, can be understood as containing two dimensions where treatment from others can mean recognising oneself through confirmation, as well as being ignored due to missing togetherness with others. PMID:21394245

  3. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... value shall be the best available estimate(s) of the parameter(s), based on all available process data or data used for accounting purposes. (c) For each missing value collected during the performance test (hourly CO2 concentration, stack gas volumetric flow rate, or average process vent flow from mine...

  4. Examining Solutions to Missing Data in Longitudinal Nursing Research

    PubMed Central

    Roberts, Mary B.; Sullivan, Mary C.; Winchester, Suzy B.

    2017-01-01

    Purpose Longitudinal studies are highly valuable in pediatrics because they provide useful data about developmental patterns of child health and behavior over time. When data are missing, the value of the research is impacted. The study’s purpose was to: (1) introduce a 3-step approach to assess and address missing data; (2) illustrate this approach using categorical and continuous level variables from a longitudinal study of premature infants. Methods A three-step approach with simulations was followed to assess the amount and pattern of missing data and to determine the most appropriate imputation method for the missing data. Patterns of missingness were Missing Completely at Random, Missing at Random, and Not Missing at Random. Missing continuous-level data were imputed using mean replacement, stochastic regression, multiple imputation, and fully conditional specification. Missing categorical-level data were imputed using last value carried forward, hot-decking, stochastic regression, and fully conditional specification. Simulations were used to evaluate these imputation methods under different patterns of missingness at different levels of missing data. Results The rate of missingness was 16–23% for continuous variables and 1–28% for categorical variables. Fully conditional specification imputation provided the least difference in mean and standard deviation estimates for continuous measures. Fully conditional specification imputation was acceptable for categorical measures. Results obtained through simulation reinforced and confirmed these findings. Practice Implications Significant investments are made in the collection of longitudinal data. The prudent handling of missing data can protect these investments and potentially improve the scientific information contained in pediatric longitudinal studies. PMID:28425202

  5. Principals' Values in School Administration

    ERIC Educational Resources Information Center

    Aslanargun, Engin

    2012-01-01

    School administration is a value-driven area, depending on emotions, cultures, and human values as well as technique and structure. Over the years, educational administration throughout the world has experienced the influence of logical positivism, which is based on rational techniques more than philosophical consideration and has ignored values and…

  6. Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    PubMed

    Taylor, Sandra L; Leiserowitz, Gary S; Kim, Kyoungmi

    2013-12-01

    Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, the data generated by mass spectrometry have many missing values, which arise when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation accuracy of a mixture model to those of an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, while estimates were unbiased with the mixture model except when all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.
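
    The censoring idea behind the AFT model can be illustrated with a left-censored normal likelihood on log-abundances. This is a generic maximum-likelihood sketch, not the authors' code; the detection limit and simulated parameters are arbitrary.

```python
# Left-censored (Tobit-style) normal likelihood: undetected compounds
# contribute P(X < detection limit) instead of a density term.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(params, y, detect_limit):
    """y: log-abundances, with np.nan where the compound was undetected."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                  # keep sigma positive
    observed = y[~np.isnan(y)]
    n_censored = np.isnan(y).sum()
    ll = norm.logpdf(observed, mu, sigma).sum()
    ll += n_censored * norm.logcdf(detect_limit, mu, sigma)
    return -ll

rng = np.random.default_rng(1)
true = rng.normal(2.0, 1.0, size=300)
limit = 1.0
y = np.where(true < limit, np.nan, true)       # censor below the detection limit

fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(y, limit))
print(fit.x[0], np.exp(fit.x[1]))              # estimates close to (2.0, 1.0)
```

    A mixture model would add a point-mass probability for truly absent compounds, which is what lets it estimate without bias when missingness is not purely due to censoring.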

  7. Making the most of missing values : object clustering with partial data in astronomy

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Laidler, Victoria G.

    2004-01-01

    We demonstrate a clustering analysis algorithm, KSC, that a) uses all observed values and b) does not discard the partially observed objects. KSC uses soft constraints defined by the fully observed objects to assist in the grouping of objects with missing values. We present an analysis of objects taken from the Sloan Digital Sky Survey to demonstrate how imputing the values can be misleading and why the KSC approach can produce more appropriate results.
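
    KSC itself derives soft constraints from the fully observed objects; the sketch below is a simplified stand-in, not the KSC algorithm. It clusters the complete objects and then assigns each partially observed object using distances over its observed dimensions only, so no object is discarded and no value is imputed.

```python
# Simplified partial-data clustering sketch (not KSC itself).
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_partial(X, k):
    complete = ~np.isnan(X).any(axis=1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[complete])

    labels = np.empty(len(X), dtype=int)
    labels[complete] = km.labels_
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])                  # use observed dimensions only
        d = ((km.cluster_centers_[:, obs] - X[i, obs]) ** 2).sum(axis=1)
        labels[i] = d.argmin()
    return labels

X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1],
              [np.nan, 4.9], [0.05, np.nan]])
print(cluster_with_partial(X, 2))              # partial rows join the right groups
```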

  8. A meta-data based method for DNA microarray imputation.

    PubMed

    Jörnsten, Rebecka; Ouyang, Ming; Wang, Hui-Yu

    2007-03-29

    DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of the microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting. We downloaded all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally much lower than that of non-significant ones. Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.

  9. Incomplete Data in Smart Grid: Treatment of Values in Electric Vehicle Charging Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Majipour, Mostafa; Chu, Peter; Gadh, Rajit

    2014-11-03

    In this paper, five imputation methods, namely Constant (zero), Mean, Median, Maximum Likelihood, and Multiple Imputation, have been applied to compensate for missing values in Electric Vehicle (EV) charging data. The outcome of each of these methods has been used as the input to a prediction algorithm to forecast the EV load in the next 24 hours at each individual outlet. The data are real-world data at the outlet level from the UCLA campus parking lots. Given the sparsity of the data, both the Median and Constant (zero) imputations improved the prediction results. Since in most missing-value cases in our database all values of that instance are missing, the multivariate imputation methods did not improve the results significantly compared to univariate approaches.
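
    A minimal sketch of the three simplest imputation choices named above, applied to a toy load series; the series values are invented, and the Maximum Likelihood and Multiple Imputation variants are omitted.

```python
# Compare constant-zero, mean, and median imputation on a toy load series.
import pandas as pd

load = pd.Series([0.0, 1.2, None, 3.3, None, 0.0, 2.1])  # kWh per interval

imputed = {
    "zero":   load.fillna(0.0),
    "mean":   load.fillna(load.mean()),
    "median": load.fillna(load.median()),
}
for name, series in imputed.items():
    print(name, series.tolist())
```

    That zero and median imputation helped is plausible for charging data, where an outlet sits idle, and hence at zero load, much of the time.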

  10. Adolescent socioeconomic status and depressive symptoms in later life: Evidence from structural equation models.

    PubMed

    Pino, Elizabeth C; Damus, Karla; Jack, Brian; Henderson, David; Milanovic, Snezana; Kalesan, Bindu

    2018-01-01

    The complex association between socioeconomic status (SES) and depressive symptoms is not entirely understood and the existing literature does not address the relationship between early-life SES and later-life depression from a life-course perspective, incorporating mediating events. Using data from the Wisconsin Longitudinal Study, we employed structural equation modeling to examine how SES measured at age 18 affects depressive symptoms at age 54 directly and through the mediating variables college graduation, marriage, and household income level at age 36. The total effect of adolescent SES on later-life depressive symptoms is largely mediated through college graduation. Our final model was driven by the effects of women. The variables contributing most to depressive symptoms in women were the direct effect of being raised in a home with a low SES and the indirect effect of low adolescent SES mediated through non-completion of college. The cohort was composed exclusively of white high school graduates born in 1939 (± 2 years). In our analysis we assume that missing values are missing at random (MAR); however, attrition both from death (excluded from our population) and from non-response could be associated with depression, i.e. missing not at random (MNAR). This study demonstrates the impact of completion of college, particularly among women, and supports the social mobility hypothesis to explain the relationship between adolescent socioeconomic circumstances and late-life health. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Homo Ignorans: Deliberately Choosing Not to Know.

    PubMed

    Hertwig, Ralph; Engel, Christoph

    2016-05-01

    Western history of thought abounds with claims that knowledge is valued and sought. Yet people often choose not to know. We call the conscious choice not to seek or use knowledge (or information) deliberate ignorance. Using examples from a wide range of domains, we demonstrate that deliberate ignorance has important functions. We systematize types of deliberate ignorance, describe their functions, discuss their normative desirability, and consider how they can be modeled. To date, psychologists have paid relatively little attention to the study of ignorance, let alone the deliberate kind. Yet the desire not to know is no anomaly. It is a choice to seek rather than reduce uncertainty whose reasons require nuanced cognitive and economic theories and whose consequences-for the individual and for society-require analyses of both actor and environment. © The Author(s) 2016.

  12. Estimating monthly streamflow values by cokriging

    USGS Publications Warehouse

    Solow, A.R.; Gorelick, S.M.

    1986-01-01

    Cokriging is applied to estimation of missing monthly streamflow values in three records from gaging stations in west central Virginia. Missing values are estimated from optimal consideration of the pattern of auto- and cross-correlation among standardized residual log-flow records. Investigation of the sensitivity of estimation to data configuration showed that when observations are available within two months of a missing value, estimation is improved by accounting for correlation. Concurrent and lag-one observations tend to screen the influence of other available observations. Three models of covariance structure in residual log-flow records are compared using cross-validation. Models differ in how much monthly variation they allow in covariance. Precision of estimation, reflected in mean squared error (MSE), proved to be insensitive to this choice. Cross-validation is suggested as a tool for choosing an inverse transformation when an initial nonlinear transformation is applied to flow values. © 1986 Plenum Publishing Corporation.
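
    The estimator at the core of (co)kriging is a weighted sum of nearby observations whose weights solve the normal equations. The single-variable sketch below assumes standardized unit-variance residuals and AR(1)-style covariances; full cokriging would also stack cross-covariances between station records into the same system.

```python
# Simple-kriging core: weights w solve C w = c; estimate is w' z_obs.
import numpy as np

def simple_kriging_estimate(z_obs, C, c):
    """z_obs: observed residuals; C: their covariance matrix; c: covariances
    between each observation and the missing value (unit-variance residuals)."""
    w = np.linalg.solve(C, c)                  # kriging weights
    estimate = w @ z_obs
    error_var = 1.0 - w @ c                    # simple-kriging error variance
    return estimate, error_var

# two neighboring months, each at lag 1 from the missing month, with
# lag-1 autocorrelation 0.6 (so lag-2 covariance 0.36 between them)
C = np.array([[1.00, 0.36],
              [0.36, 1.00]])
c = np.array([0.6, 0.6])
print(simple_kriging_estimate(np.array([0.8, -0.2]), C, c))
```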

  13. Accountability, efficiency, and the "bottom line" in non-profit organizations.

    PubMed

    Cutt, J

    1982-01-01

    Financial reporting by non-profit organizations deals only with accountability for propriety and regularity, and ignores output measurement. The development of output measures of a physical or index nature offers a means of relating dollar costs to output in the form of cost-efficiency or cost-effectiveness measures, but does not provide any measure of the absolute value or worthwhileness of such programs. This fundamental absolute value question should be asked of all non-profit programs and documented to the greatest possible extent in budgetary submissions, and subsequent control and audit. In public sector non-profit programs, the posing of this question requires information on consumer demand other than in aggregative and imprecise form through the political process, and much improved information on the cost side. Eliciting demand information is feasible in the case of public programs with separable benefits by the use of a variety of pricing techniques, direct or imputed, whether or not the service in question is ultimately financed on a user-pay basis. The problem of eliciting demand is more difficult in the case of public goods, but improved demand information can be obtained, ideally by an approach such as the use of a Clarke tax. The argument can be extended to encompass questions of income distribution, stabilization, regulation and tax policy. Recent developments in program evaluation in the federal government are important, but remain deficient in failing to address the question of absolute value.

  14. Reuse of imputed data in microarray analysis increases imputation efficiency

    PubMed Central

    Kim, Ki-Yeol; Kim, Byoung-Jin; Yi, Gwan-Su

    2004-01-01

    Background: The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the imputed values had not been fully checked. Results: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes the missing values sequentially, starting from the gene having the fewest missing values, and uses the imputed values for the later imputation. Although it reuses imputed values, the new method greatly improves on the conventional KNN-based method, and on other methods based on maximum likelihood estimation, in both accuracy and computational complexity. The performance of SKNN was in particular higher than that of other imputation methods for data with high missing rates and large numbers of experiments. Application of Expectation Maximization (EM) to the SKNN method improved the accuracy, but increased computational time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed a similarly high accuracy to the SKNN method, with slightly higher dependency on the types of data sets. Conclusions: Sequential reuse of imputed data in KNN-based imputation greatly increases the efficiency of imputation. The SKNN method should be practically useful for saving the data of microarray experiments that have high amounts of missing entries. The SKNN method generates reliable imputed values which can be used for further cluster-based analysis of microarray data. PMID:15504240
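
    A minimal sketch of the sequential-reuse idea, not the authors' implementation: rows (genes) are imputed in order of increasing missingness, and each completed row immediately joins the reference pool used to impute the rest. It assumes at least one fully observed row, and it uses an unweighted neighbor mean where the real SKNN weights neighbors by distance.

```python
# Sequential KNN-style imputation sketch: completed rows are reused.
import numpy as np

def sknn_impute(X, k=5):
    X = X.copy()
    order = np.argsort(np.isnan(X).sum(axis=1))            # fewest missing first
    pool = [i for i in order if not np.isnan(X[i]).any()]  # complete rows

    for i in order:
        miss = np.isnan(X[i])
        if not miss.any():
            continue                           # already complete, already in pool
        ref = X[pool]                          # fully observed or already imputed
        d = np.sqrt(((ref[:, ~miss] - X[i, ~miss]) ** 2).mean(axis=1))
        nearest = ref[np.argsort(d)[:k]]
        X[i, miss] = nearest[:, miss].mean(axis=0)  # unweighted neighbor mean
        pool.append(i)                              # reuse the imputed row next
    return X
```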

  15. Least-Squares Approximation of an Improper Correlation Matrix by a Proper One.

    ERIC Educational Resources Information Center

    Knol, Dirk L.; ten Berge, Jos M. F.

    1989-01-01

    An algorithm, based on a solution for C. I. Mosier's oblique Procrustes rotation problem, is presented for the best least-squares fitting correlation matrix approximating a given missing value or improper correlation matrix. Results are of interest for missing value and tetrachoric correlation, indefinite matrix correlation, and constrained…
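
    The least-squares problem in this record, finding the nearest proper (positive semidefinite, unit-diagonal) correlation matrix to an improper one, can be illustrated without the Procrustes machinery of Knol and ten Berge. The sketch below is a generic Higham-style alternating-projections routine with Dykstra's correction, an assumed stand-in rather than the article's algorithm.

    ```python
    import numpy as np

    def nearest_correlation(A, n_iter=200, tol=1e-8):
        """Alternate between projecting onto the PSD cone and onto the
        unit-diagonal matrices, starting from an indefinite matrix A."""
        Y = (A + A.T) / 2.0
        dS = np.zeros_like(Y)
        for _ in range(n_iter):
            R = Y - dS                                   # Dykstra correction
            w, V = np.linalg.eigh(R)
            X = V @ np.diag(np.clip(w, 0, None)) @ V.T   # PSD projection
            dS = X - R
            Y_new = X.copy()
            np.fill_diagonal(Y_new, 1.0)                 # unit-diagonal projection
            if np.linalg.norm(Y_new - Y) < tol:
                return Y_new
            Y = Y_new
        return Y
    ```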

  16. Assembly line inspection using neural networks

    NASA Astrophysics Data System (ADS)

    McAulay, Alastair D.; Danset, Paul; Wicker, Devert W.

    1990-09-01

    A user-friendly, flexible system for assembly-line part inspection that learns good and bad parts is described. The system detects missing rivets and springs in clutch drivers. It extracts features in a circular region of interest from a video image, processes them with a Fast Fourier Transform for rotation invariance, and uses the result as input to a neural network trained with back-propagation. The advantage of a learning system is that expensive reprogramming and delays are avoided when a part is modified. Two cases were considered. The first could use back lighting, since surface effects could be ignored. The second required front lighting because a cover prevented light from passing through the part. 100 percent classification of good and bad parts was achieved for both back-lit and front-lit cases with a limited number of training parts available. 1. BACKGROUND: A vision system to inspect clutch drivers for missing rivets and springs at the Harrison Radiator Plant of General Motors (GM) works only on parts without covers (Fig. 1) and is expensive. The system does not work when there are cover plates (Fig. 2) that rule out back light passing through the area of missing rivets and springs. Also, the system, like all such systems, must be reprogrammed at significant time and cost when it needs to classify a different fault or a…

  17. Comment on "falsification of the Atmospheric CO2 Greenhouse Effects Within the Frame of Physics"

    NASA Astrophysics Data System (ADS)

    Halpern, Joshua B.; Colose, Christopher M.; Ho-Stuart, Chris; Shore, Joel D.; Smith, Arthur P.; Zimmermann, Jörg

    In this journal, Gerhard Gerlich and Ralf D. Tscheuschner claim to have falsified the existence of an atmospheric greenhouse effect.1 Here, we show that their methods, logic, and conclusions are in error. Their most significant errors include trying to apply the Clausius statement of the Second Law of Thermodynamics to only one side of a heat transfer process rather than the entire process, and systematically ignoring most non-radiative heat flows applicable to the Earth's surface and atmosphere. They claim that radiative heat transfer from a colder atmosphere to a warmer surface is forbidden, ignoring the larger transfer in the other direction which makes the complete process allowed. Further, by ignoring heat capacity and non-radiative heat flows, they claim that radiative balance requires that the surface cool by 100 K or more at night, an obvious absurdity induced by an unphysical assumption. This comment concentrates on these two major points, while also taking note of some of Gerlich and Tscheuschner's other errors and misunderstandings.

  18. Value Contestations in Development Intervention: Community Development and Sustainable Livelihoods Approaches.

    ERIC Educational Resources Information Center

    Arce, Alberto

    2003-01-01

    Both community development and sustainable livelihood approaches ignore value contestations that underlie people's interests and experiences. A case from Bolivia demonstrates that local values, social relations, actions, and language strategies must underlie policy and method in development. (Contains 28 references.) (SK)

  19. Scanning and monitoring performance : effects of the reinforcement values of the events being monitored.

    DOT National Transportation Integrated Search

    1994-04-01

    We formulated a hypothesis suggesting that operators could make scanning and monitoring errors if they tended to concentrate on a "high value" display sub area while ignoring "low value" problems elsewhere on the display. Such "data" would have appli...

  20. Estimation of missing values in solar radiation data using piecewise interpolation methods: Case study at Penang city

    NASA Astrophysics Data System (ADS)

    Zainudin, Mohd Lutfi; Saaban, Azizan; Bakar, Mohd Nazari Abu

    2015-12-01

    Solar radiation values are recorded by an automatic weather station using a device called a pyranometer. The device records the dispersed radiation values, and these data are very useful for experimental work and for the development of solar devices. In addition, the modelling and design of solar radiation system applications require complete data observations. Unfortunately, incomplete solar radiation data frequently occur owing to several technical problems, mainly in the monitoring device. To address this, missing values are estimated so that absent values can be substituted with imputed data. This paper evaluates several piecewise interpolation techniques, namely linear, spline, cubic, and nearest-neighbour interpolation, for dealing with missing values in hourly solar radiation data. It then proposes, as an extension, an investigation of the cubic Bezier technique and the cubic Said-Ball method as estimation tools. The results show that the cubic Bezier and Said-Ball methods perform best compared with the other piecewise imputation techniques.
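
    A sketch of the comparison described above using SciPy's standard piecewise interpolants; the cubic Bezier and Said-Ball variants studied in the paper are not available in SciPy and are omitted. The toy clear-sky curve and gap positions are invented for illustration.

    ```python
    import numpy as np
    from scipy.interpolate import interp1d

    def impute_series(t, y, kind="linear"):
        """Fill NaNs in an hourly series y(t) by piecewise interpolation.
        kind: 'linear', 'nearest', 'cubic', ... (see interp1d)."""
        y = np.asarray(y, dtype=float)
        obs = ~np.isnan(y)
        f = interp1d(t[obs], y[obs], kind=kind, fill_value="extrapolate")
        out = y.copy()
        out[~obs] = f(t[~obs])
        return out

    # Hourly radiation with gaps; compare methods against the held-out truth.
    t = np.arange(24.0)
    y = np.sin(np.pi * (t - 6) / 12).clip(0) * 800      # toy clear-sky curve
    y_missing = y.copy(); y_missing[[9, 10, 15]] = np.nan
    for kind in ("linear", "nearest", "cubic"):
        err = np.abs(impute_series(t, y_missing, kind)[[9, 10, 15]] - y[[9, 10, 15]])
        print(kind, err.round(2))
    ```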

  1. The Missing Data Assumptions of the NEAT Design and Their Implications for Test Equating

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Holland, Paul W.

    2010-01-01

    The Non-Equivalent groups with Anchor Test (NEAT) design involves "missing data" that are "missing by design." Three nonlinear observed score equating methods used with a NEAT design are the "frequency estimation equipercentile equating" (FEEE), the "chain equipercentile equating" (CEE), and the "item-response-theory observed-score-equating" (IRT…

  2. Double jeopardy, the equal value of lives and the veil of ignorance: a rejoinder to Harris.

    PubMed

    McKie, J; Kuhse, H; Richardson, J; Singer, P

    1996-08-01

    Harris levels two main criticisms against our original defence of QALYs (Quality Adjusted Life Years). First, he rejects the assumption implicit in the QALY approach that not all lives are of equal value. Second, he rejects our appeal to Rawls's veil of ignorance test in support of the QALY method. In the present article we defend QALYs against Harris's criticisms. We argue that some of the conclusions Harris draws from our view that resources should be allocated on the basis of potential improvements in quality of life and quantity of life are erroneous, and that others lack the moral implications Harris claims for them. On the other hand, we defend our claim that a rational egoist, behind a veil of ignorance, could consistently choose to allocate life-saving resources in accordance with the QALY method, despite Harris's claim that a rational egoist would allocate randomly if there is no better than a 50% chance of being the recipient.

  3. Double jeopardy, the equal value of lives and the veil of ignorance: a rejoinder to Harris.

    PubMed Central

    McKie, J; Kuhse, H; Richardson, J; Singer, P

    1996-01-01

    Harris levels two main criticisms against our original defence of QALYs (Quality Adjusted Life Years). First, he rejects the assumption implicit in the QALY approach that not all lives are of equal value. Second, he rejects our appeal to Rawls's veil of ignorance test in support of the QALY method. In the present article we defend QALYs against Harris's criticisms. We argue that some of the conclusions Harris draws from our view that resources should be allocated on the basis of potential improvements in quality of life and quantity of life are erroneous, and that others lack the moral implications Harris claims for them. On the other hand, we defend our claim that a rational egoist, behind a veil of ignorance, could consistently choose to allocate life-saving resources in accordance with the QALY method, despite Harris's claim that a rational egoist would allocate randomly if there is no better than a 50% chance of being the recipient. PMID:8863144

  4. A critical issue in model-based inference for studying trait-based community assembly and a solution.

    PubMed

    Ter Braak, Cajo J F; Peres-Neto, Pedro; Dray, Stéphane

    2017-01-01

    Statistical testing of trait-environment association from data is a challenge as there is no common unit of observation: the trait is observed on species, the environment on sites, and the mediating abundance on species-site combinations. A number of correlation-based methods, such as the community-weighted trait means method (CWM), the fourth-corner correlation method and the multivariate method RLQ, have been proposed to estimate such trait-environment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one site-based and the other species-based, and by assessing significance by the larger of the two p-values (the pmax test). Recently, regression-based methods using generalized linear models (GLM) have been proposed as a promising alternative, with statistical inference via site-based resampling. We investigated the performance of this new approach along with approaches that mimicked the pmax test using GLM instead of fourth-corner. By simulation using models with additional random variation in the species response to the environment, the site-based resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set at 5%. In addition, predictive modelling of such data using site-based cross-validation very often identified trait-environment interactions that had no predictive value. The problem that we identify is not an "omitted variable bias" problem, as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLM-based pmax test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. For screening the importance of single trait-environment combinations, the fourth-corner test is shown to give almost the same results as the GLM-based tests in far less computing time.
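
    A toy version of the pmax permutation logic for a single trait-environment pair, hedged as an illustration rather than the paper's GLM machinery: the association statistic is recomputed under site-level (environment) and species-level (trait) permutations, and significance is judged by the larger of the two p-values. The abundance-weighted statistic below is a simplified fourth-corner-style correlation, an assumption for the sketch.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def fourth_corner_stat(L, E, T):
        """Simplified fourth-corner statistic: trait-environment correlation
        mediated by the abundance table L (sites x species)."""
        w = L / L.sum()
        e = (E - E.mean()) / E.std()
        t = (T - T.mean()) / T.std()
        return float(e @ w @ t)

    def pmax_test(L, E, T, n_perm=999):
        """Permute sites and species independently; report the max p-value."""
        obs = abs(fourth_corner_stat(L, E, T))
        p_site = (1 + sum(abs(fourth_corner_stat(L, rng.permutation(E), T)) >= obs
                          for _ in range(n_perm))) / (n_perm + 1)
        p_spec = (1 + sum(abs(fourth_corner_stat(L, E, rng.permutation(T))) >= obs
                          for _ in range(n_perm))) / (n_perm + 1)
        return max(p_site, p_spec)
    ```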

  5. Role of MRA in the detection of intracranial aneurysm in the acute phase of subarachnoid hemorrhage.

    PubMed

    Pierot, Laurent; Portefaix, Christophe; Rodriguez-Régent, Christine; Gallas, Sophie; Meder, Jean-François; Oppenheim, Catherine

    2013-07-01

    Magnetic resonance angiography (MRA) has been evaluated for the detection of unruptured intracranial aneurysms, with favorable results at 3 Tesla (3T) and similar diagnostic accuracy for both 3D time-of-flight (3D-TOF) MRA and contrast-enhanced MRA (CE-MRA). However, the diagnostic value and place of MRA in the detection of ruptured aneurysms has been little evaluated. Thus, the goal of this prospective single-center series was to assess the feasibility and diagnostic value of 3T 3D-TOF MRA and CE-MRA for aneurysm detection in acute non-traumatic subarachnoid hemorrhage (SAH). From March 2006 to December 2007, all consecutive patients admitted to our hospital with acute non-traumatic SAH (≤10 days) were prospectively included in this study evaluating MRA in the diagnostic workup of SAH. Feasibility of MRA and sensitivity/specificity of 3D-TOF and CE-MRA were assessed against the gold standard, DSA. In all, 84 consecutive patients (45 women, 39 men; age 23-86 years) were included. The feasibility of MRA was low (43/84, 51.2%). The reasons given for patients not undergoing magnetic resonance imaging (MRI) examination were clinical status (27 patients), potential delay in aneurysm treatment (11 patients) and contraindications to MRI (three patients). In patients explored by MRA, the sensitivity of CE-MRA (95%) was higher compared with 3D-TOF (86%), with similar specificity (80%). Also, 3D-TOF missed five aneurysms while CE-MRA missed two. The value of MRA in the diagnostic workup of ruptured aneurysms is limited due to its low feasibility during the acute phase of bleeding. Sensitivity for aneurysm detection was good for both MRA techniques, but tended to be better with CE-MRA. Copyright © 2013. Published by Elsevier Masson SAS.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herraiz, Joaquin Lopez

    Experimental coincidence cross sections and the transverse-longitudinal asymmetry A_TL have been obtained for the quasielastic (e,e'p) reaction in 16O, 12C, and 208Pb in constant q-ω kinematics in the missing momentum range -350 < p_miss < 350 MeV/c. In these experiments, performed in experimental Hall A of the Thomas Jefferson National Accelerator Facility (JLAB), the beam energy and the momentum and angle of the scattered electrons were kept fixed, while the angle between the proton momentum and the momentum transfer q was varied in order to map out the missing momentum distribution. The experimental cross section and A_TL asymmetry have been compared with Monte Carlo simulations based on Distorted Wave Impulse Approximation (DWIA) calculations with both relativistic and non-relativistic spinor structure. The spectroscopic factors obtained for both models are in agreement with previous experimental values, while the A_TL measurements favor the relativistic DWIA calculation. This thesis describes the details of the experimental setup, the calibration of the spectrometers, the techniques used in the data analysis to derive the final cross sections and the A_TL, the ingredients of the theoretical calculations employed, and the comparison of the results with the simulations based on these theoretical models.

  7. Comparison of methods for dealing with missing values in the EPV-R.

    PubMed

    Paniagua, David; Amor, Pedro J; Echeburúa, Enrique; Abad, Francisco J

    2017-08-01

    The development of an effective instrument to assess the risk of partner violence is a topic of great social relevance. This study evaluates the “Predicción del Riesgo de Violencia Grave Contra la Pareja” –Revisada– scale (EPV-R, Severe Intimate Partner Violence Risk Prediction Scale-Revised), a tool developed in Spain, which faces the problem, usual in this type of scale, of how to treat a high rate of missing values. First, responses to the EPV-R in a sample of 1215 male abusers who were reported to the police were used to analyze the patterns of occurrence of missing values, as well as the factor structure. Second, we analyzed the performance of various imputation methods using simulated data that emulate the missing data mechanism found in the empirical database. The imputation procedure originally proposed by the authors of the scale provides acceptable results, although the application of a method based on Item Response Theory could provide greater accuracy and offers some additional advantages. Item Response Theory appears to be a useful tool for imputing missing data in this type of questionnaire.

  8. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    PubMed

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  9. Geo-Engineering Climate Change with Sulfate Aerosol

    NASA Astrophysics Data System (ADS)

    Rasch, P. J.; Crutzen, P. J.

    2006-12-01

    We explore the impact of injecting a precursor of sulfate aerosols into the middle atmosphere, where they would act to increase the planetary albedo and thus counter some of the effects of greenhouse gas forcing. We use an atmospheric general circulation model (CAM, the Community Atmosphere Model) coupled to a slab ocean model for this study. Only physical effects are examined; that is, we ignore the biogeochemical and chemical implications of changes to greenhouse gases and aerosols, and do not explore the important ethical, legal, and moral issues that are associated with deliberate geo-engineering efforts. The simulations suggest that the sulfate aerosol produced from the SO2 source in the stratosphere is sufficient to counterbalance most of the warming associated with the greenhouse gas forcing. Surface temperatures return to within a few tenths of a degree (K) of present day levels. Sea ice and precipitation distributions are also much closer to their present day values. The polar region surface temperatures remain 1-3 degrees warmer in the winter hemisphere than present day values. This study is very preliminary. Only a subset of the relevant effects have been explored. The effect of such an injection of aerosols on middle atmospheric chemistry, and the effect on cirrus clouds, are obvious missing components that merit scrutiny. There are probably others that should be considered. The injection of such aerosols cannot help in ameliorating the effects of CO2 changes on ocean pH, or other effects on the biogeochemistry of the earth system.

  10. Perceptions of a fluid consensus: uniqueness bias, false consensus, false polarization, and pluralistic ignorance in a water conservation crisis.

    PubMed

    Monin, Benoît; Norton, Michael I

    2003-05-01

    A 5-day field study (N = 415) during and right after a shower ban demonstrated multifaceted social projection and the tendency to draw personality inferences from simple behavior in a time of drastic consensus change. Bathers thought showering was more prevalent than did non-bathers (false consensus) and respondents consistently underestimated the prevalence of the desirable and common behavior--be it not showering during the shower ban or showering after the ban (uniqueness bias). Participants thought that bathers and non-bathers during the ban differed greatly in their general concern for the community, but self-reports demonstrated that this gap was illusory (false polarization). Finally, bathers thought other bathers cared less than they did, whereas non-bathers thought other non-bathers cared more than they did (pluralistic ignorance). The study captures the many biases at work in social perception in a time of social change.

  11. Values: Do We or Don't We Teach Them?

    ERIC Educational Resources Information Center

    Fraenkel, Jack R.

    Many teachers attempt to ignore value questions in the social studies classroom, emphasizing intellectual development alone. Through actions and selection of topics and materials, however, a teacher suggests that he believes in certain ideas and events and, therefore, teaches values. The key issue here is not whether values should be taught, but…

  12. Cognitive and Social Values.

    ERIC Educational Resources Information Center

    Machamer, Peter; Douglas, Heather

    1999-01-01

    Criticizes Hugh Lacey's separation of cognitive values and social values in discussions of the nature of science. Claims that attempting to distinguish between cognitive and social ignores crucial complexities in the development and use of knowledge. Proposes that the proper distinction be between legitimate and illegitimate reasons in science as…

  13. Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs.

    PubMed

    Cui, Licong; Bodenreider, Olivier; Shi, Jay; Zhang, Guo-Qiang

    2018-02-01

    We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations. Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT's IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor concepts within the non-lattice subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared to existing IS-A relations in SNOMED CT. For concept pairs within each non-lattice subgraph, if a subset relation is identified but an IS-A relation is not present in SNOMED CT IS-A transitive closure, then a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation. A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the "Clinical Finding" and "Procedure" sub-hierarchies. Two domain experts confirmed 185 (among 223) suggested missing IS-A relations, a precision of 82.96%. Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT. Copyright © 2017 Elsevier Inc. All rights reserved.
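
    The stage-3 logic above reduces to a set-inclusion check, sketched below with invented toy concepts (not real SNOMED CT content): if one concept's enriched attribute set is a proper subset of another's, the more specific concept should be a descendant; if the IS-A transitive closure lacks that pair, it is reported as a candidate missing relation.

    ```python
    def suggest_missing_isa(lex, isa_closure):
        """lex: concept -> set of enriched lexical attributes (stage 2 output).
        isa_closure: set of (descendant, ancestor) pairs in the IS-A
        transitive closure. Returns candidate missing IS-A relations."""
        suggestions = []
        concepts = list(lex)
        for a in concepts:
            for b in concepts:
                # proper subset: b carries all of a's attributes plus more,
                # so lexically b should be a descendant of a
                if a != b and lex[a] < lex[b] and (b, a) not in isa_closure:
                    suggestions.append((b, a))
        return suggestions

    # Toy example with hypothetical concept names:
    lex = {"fracture of bone": {"fracture", "bone"},
           "open fracture of bone": {"fracture", "bone", "open"}}
    print(suggest_missing_isa(lex, isa_closure=set()))
    # -> [('open fracture of bone', 'fracture of bone')]
    ```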

  14. Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation

    PubMed Central

    Bryndová, Michala; Kasari, Liis; Norberg, Anna; Weiss, Matthias; Bishop, Tom R.; Luke, Sarah H.; Sam, Katerina; Le Bagousse-Pinguet, Yoann; Lepš, Jan; Götzenberger, Lars; de Bello, Francesco

    2016-01-01

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data. PMID:26881747

  15. Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation.

    PubMed

    Májeková, Maria; Paal, Taavi; Plowman, Nichola S; Bryndová, Michala; Kasari, Liis; Norberg, Anna; Weiss, Matthias; Bishop, Tom R; Luke, Sarah H; Sam, Katerina; Le Bagousse-Pinguet, Yoann; Lepš, Jan; Götzenberger, Lars; de Bello, Francesco

    2016-01-01

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package "traitor" to facilitate assessments of missing trait data.
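
    Since both records above describe the same study, one sketch suffices to illustrate the removal experiment on a small scale. It uses Rao's quadratic entropy from a single trait as the FD index (the paper considers several indices), deletes trait values pool-wise at a given rate, and scores accuracy as the Spearman correlation between plot rankings before and after deletion; the community sizes and trait values are simulated for illustration.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    def rao_q(abund, trait):
        """Rao's quadratic entropy for one plot from a single trait."""
        p = abund / abund.sum()
        d = np.abs(trait[:, None] - trait[None, :])    # pairwise trait distances
        return float(p @ d @ p)

    def ranking_accuracy(abund, trait, frac_missing, rng):
        """Delete trait values pool-wise, recompute FD per plot, and compare
        plot rankings with the complete-data ranking."""
        full = [rao_q(a[a > 0], trait[a > 0]) for a in abund]
        t = trait.copy()
        t[rng.random(t.size) < frac_missing] = np.nan
        reduced = []
        for a in abund:
            keep = (a > 0) & ~np.isnan(t)              # drop trait-less species
            reduced.append(rao_q(a[keep], t[keep]) if keep.sum() > 1 else np.nan)
        rho, _ = spearmanr(full, reduced, nan_policy="omit")
        return rho

    rng = np.random.default_rng(0)
    abund = rng.poisson(2.0, size=(12, 62))            # 12 plots x 62 species
    trait = rng.normal(size=62)
    print([round(float(ranking_accuracy(abund, trait, f, rng)), 2)
           for f in (0.1, 0.3, 0.5)])
    ```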

  16. Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example Using National Data on Drug Injection in Prisons

    PubMed Central

    Haji-Maghsoudi, Saiedeh; Haghdoost, Ali-akbar; Rastegari, Azam; Baneshi, Mohammad Reza

    2013-01-01

    Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. The presence of missing data challenges the practice of model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One issue that has received less attention, and which we address here, is the role of the pattern of missing data. Methods: We used information on 2720 prisoners. Results derived from fitting a regression model to the whole data served as the gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at the above rates, in one variable that was significant in the gold model (age). In scenario 2, a small proportion of each independent variable was dropped. Four imputation methods, under different Events Per Variable (EPV) values, were compared in terms of selection of important variables and parameter estimation. Results: In scenario 2, bias in estimates was low and the performances of all methods for handling missing data were similar. All methods at all missing rates were able to detect the significance of age. In scenario 1, biases in estimates increased, in particular at the 50% missing rate. Here, at EPVs of 10 and 5, imputation methods failed to capture the effect of age. Conclusion: In scenario 2, all imputation methods at all missing rates were able to detect age as being significant. This was not the case in scenario 1. Our results showed that the performance of imputation methods depends on the pattern of missing data. PMID:24596839
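
    A sketch of the two missingness patterns compared above, under assumed shapes (2720 records as in the paper; the 8 covariates are invented for illustration): scenario 1 concentrates all deletions in one model-relevant column, while scenario 2 spreads roughly the same number of holes thinly across every column.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def make_missing(X, cell_rate, concentrated_col=None):
        """Scenario 1: all missingness in one key column (rate applied to it).
        Scenario 2 (default): holes spread uniformly over all cells."""
        X = X.copy()
        if concentrated_col is not None:
            idx = rng.choice(len(X), size=int(cell_rate * len(X)), replace=False)
            X[idx, concentrated_col] = np.nan
        else:
            X[rng.random(X.shape) < cell_rate] = np.nan
        return X

    X = rng.normal(size=(2720, 8))
    for rate in (0.1, 0.2, 0.5):
        s1 = make_missing(X, rate, concentrated_col=0)
        s2 = make_missing(X, rate / X.shape[1])    # same expected hole count
        print(rate,
              round(float(np.isnan(s1).any(axis=1).mean()), 2),   # incomplete rows
              round(float(np.isnan(s2).any(axis=1).mean()), 2))
    ```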

  17. Performance of bias-correction methods for exposure measurement error using repeated measurements with and without missing data.

    PubMed

    Batistatou, Evridiki; McNamee, Roseanne

    2012-12-10

    It is known that measurement error leads to bias in assessing exposure effects, which can, however, be corrected if independent replicates are available. For expensive replicates, two-stage (2S) studies that produce data 'missing by design' may be preferred over a single-stage (1S) study, because in the second stage, measurement of replicates is restricted to a sample of first-stage subjects. Motivated by an occupational study on the acute effect of carbon black exposure on respiratory morbidity, we compare the performance of several bias-correction methods for both designs in a simulation study: an instrumental variable method (EVROS IV) based on grouping strategies, which had been recommended especially when measurement error is large, and the regression calibration and simulation extrapolation methods. For the 2S design, either the problem of 'missing' data was ignored or the 'missing' data were imputed using multiple imputations. In both 1S and 2S designs, in the case of small or moderate measurement error, regression calibration was shown to be the preferred approach in terms of root mean square error. For 2S designs, regression calibration as implemented by Stata software is not recommended, in contrast to our implementation of this method; the 'problematic' implementation was, however, substantially improved by the use of multiple imputations. The EVROS IV method, under a good/fairly good grouping, outperforms the regression calibration approach in both design scenarios when exposure mismeasurement is severe. In both 1S and 2S designs with moderate or large measurement error, simulation extrapolation severely failed to correct for bias. Copyright © 2012 John Wiley & Sons, Ltd.

  18. Inter-individual variation in expression: a missing link in biomarker biology?

    PubMed

    Little, Peter F R; Williams, Rohan B H; Wilkins, Marc R

    2009-01-01

    The past decade has seen an explosion of variation data demonstrating that diversity of both protein-coding sequences and of the regulatory elements of protein-coding genes is common and of functional importance. In this article, we argue that genetic diversity can no longer be ignored in studies of human biology, even in research projects without an explicit genetic experimental design, and that this knowledge can, and must, inform research. By way of illustration, we focus on the potential role of genetic data in case-control studies to identify and validate cancer protein biomarkers. We argue that a consideration of genetics, in conjunction with proteomic biomarker discovery projects, should improve the proportion of biomarkers that can accurately classify patients.

  19. Integrating multiple publics into the strategic plan. The best plans can be derailed without comprehensive up-front research.

    PubMed

    Peltier, J W; Kleimenhagen, A K; Naidu, G M

    1996-01-01

    The mission of a health care organization represents its vision for the future. The authors present an approach used to develop an organizational mission for a large multispecialty physician clinic. In implementing the strategic planning process, research objectives must be clearly stated so that how the data will be used is identified in advance. Failure to integrate strategic data from all relevant publics will likely result in a mission statement that misses the significant interests of one or more stakeholders and reduces the effectiveness of the strategic planning process. Although costly, comprehensive research can uncover some surprising differences in perception that, if ignored, might completely defeat strategic planning efforts.

  20. Anterior Cutaneous Nerve Entrapment Syndrome in a Pediatric Patient Previously Diagnosed With Functional Abdominal Pain: A Case Report.

    PubMed

    DiGiusto, Matthew; Suleman, M-Irfan

    2018-03-23

    Chronic abdominal pain is common in children and adolescents but challenging to diagnose, because practitioners may be concerned about missing serious occult disease. Abdominal wall pain is an often ignored etiology for chronic abdominal pain. Anterior cutaneous nerve entrapment syndrome causes abdominal wall pain but is frequently overlooked. Correctly diagnosing patients with anterior cutaneous nerve entrapment syndrome is important because nerve block interventions are highly successful in the remittance of pain. Here, we present the case of a pediatric patient who received a diagnosis of functional abdominal pain but experienced pain remittance after receiving a trigger-point injection and transverse abdominis plane block.

  1. A novel complete-case analysis to determine statistical significance between treatments in an intention-to-treat population of randomized clinical trials involving missing data.

    PubMed

    Liu, Wei; Ding, Jinhui

    2018-04-01

    The application of the principle of the intention-to-treat (ITT) to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missing mechanisms for all clinical trials because there are complicated reactions in the human body to drugs due to the presence of complex biological networks, leading to data missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.

  2. Analysis of variance calculations for irregular experiments

    Treesearch

    Jonathan W. Wright

    1977-01-01

    Irregular experiments may be more useful than much smaller regular experiments and can be analyzed statistically without undue expenditure of time. For a few missing plots, standard methods of calculating missing-plot values can be used. For more missing plots (up to 10 percent), seedlot means or randomly chosen plot means of the same seedlot can be substituted for...

  3. Handling Missing Data in Structural Equation Models in R: A Replication Study for Applied Researchers

    ERIC Educational Resources Information Center

    Wolgast, Anett; Schwinger, Malte; Hahnel, Carolin; Stiensmeier-Pelster, Joachim

    2017-01-01

    Introduction: Multiple imputation (MI) is one of the most highly recommended methods for replacing missing values in research data. The scope of this paper is to demonstrate missing data handling in SEM by analyzing two modified data examples from educational psychology, and to give practical recommendations for applied researchers. Method: We…

  4. A Note on the Use of Missing Auxiliary Variables in Full Information Maximum Likelihood-Based Structural Equation Models

    ERIC Educational Resources Information Center

    Enders, Craig K.

    2008-01-01

    Recent missing data studies have argued in favor of an "inclusive analytic strategy" that incorporates auxiliary variables into the estimation routine, and Graham (2003) outlined methods for incorporating auxiliary variables into structural equation analyses. In practice, the auxiliary variables often have missing values, so it is reasonable to…

  5. A Primer for Handling Missing Values in the Analysis of Education and Training Data

    ERIC Educational Resources Information Center

    Gemici, Sinan; Bednarz, Alice; Lim, Patrick

    2012-01-01

    Quantitative research in vocational education and training (VET) is routinely affected by missing or incomplete information. However, the handling of missing data in published VET research is often sub-optimal, leading to a real risk of generating results that can range from being slightly biased to being plain wrong. Given that the growing…

  6. SPSS Syntax for Missing Value Imputation in Test and Questionnaire Data

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries

    2005-01-01

    A well-known problem in the analysis of test and questionnaire data is that some item scores may be missing. Advanced methods for the imputation of missing data are available, such as multiple imputation under the multivariate normal model and imputation under the saturated logistic model (Schafer, 1997). Accompanying software was made available…

  7. Missed hepatitis b/c or syphilis diagnosis among Kurdish, Russian, and Somali origin migrants in Finland: linking a population-based survey to the national infectious disease register.

    PubMed

    Tiittala, Paula; Ristola, Matti; Liitsola, Kirsi; Ollgren, Jukka; Koponen, Päivikki; Surcel, Heljä-Marja; Hiltunen-Back, Eija; Davidkin, Irja; Kivelä, Pia

    2018-03-20

    Migrants are considered a key population at risk for sexually transmitted and blood-borne diseases in Europe. Prevalence data to support the design of infectious diseases screening protocols are scarce. We aimed to estimate the prevalence of hepatitis B and C, human immunodeficiency virus (HIV) infection and syphilis in specific migrant groups in Finland and to assess risk factors for missed diagnosis. A random sample of 3000 Kurdish, Russian, or Somali origin migrants in Finland was invited to a migrant population-based health interview and examination survey during 2010-2012. Participants in the health examination were offered screening for hepatitis B and C, HIV and syphilis. Notification prevalence in the National Infectious Diseases Register (NIDR) was compared between participants and non-participants to assess non-participation bias. Missed diagnosis was defined as a test-positive case in the survey without previous notification in NIDR. Inverse probability weighting was used to correct for non-participation. Altogether 1000 migrants were screened for infectious diseases. No difference in the notification prevalence among participants and non-participants was observed. Seroprevalence of hepatitis B surface antigen (HBsAg) was 2.3%, hepatitis C antibodies 1.7%, and Treponema pallidum antibodies 1.3%. No cases of HIV were identified. Of all test-positive cases, 61% (34/56) had no previous notification in NIDR. 48% of HBsAg-positive, 62.5% of anti-HCV-positive and 84.6% of anti-Trpa-positive cases had been missed. Among the Somali population (n = 261), the prevalence of missed hepatitis B diagnosis was 3.0%. Of the 324 Russian migrants, 3.0% had not been previously diagnosed with hepatitis C and 2.4% had a missed syphilis diagnosis. In a multivariable regression model, missed diagnosis was associated with migrant origin, living alone, poor self-perceived health, daily smoking, and previous diagnosis of another blood-borne infection. More than half of chronic hepatitis and syphilis diagnoses had been missed among migrants in Finland. Undiagnosed hepatitis B among Somali migrants implies post-migration transmission that could be prevented by enhanced screening and vaccinations. The rate of missed diagnoses among Russian migrants supports implementation of targeted hepatitis and syphilis screening upon arrival and also in later health care contacts. Coverage and up-take of current screening among migrants should be evaluated.

  8. Nonparametric Multiple Imputation for Questionnaires with Individual Skip Patterns and Constraints: The Case of Income Imputation in The National Educational Panel Study

    ERIC Educational Resources Information Center

    Aßmann, Christian; Würbach, Ariane; Goßmann, Solange; Geissler, Ferdinand; Bela, Anika

    2017-01-01

    Large-scale surveys typically exhibit data structures characterized by rich mutual dependencies between surveyed variables and individual-specific skip patterns. Despite high efforts in fieldwork and questionnaire design, missing values inevitably occur. One approach for handling missing values is to provide multiply imputed data sets, thus…

  9. Relying on Your Own Best Judgment: Imputing Values to Missing Information in Decision Making.

    ERIC Educational Resources Information Center

    Johnson, Richard D.; And Others

    Processes involved in making estimates of the value of missing information that could help in a decision making process were studied. Hypothetical purchases of ground beef were selected for the study as such purchases have the desirable property of quantifying both the price and quality. A total of 150 students at the University of Iowa rated the…

  10. Microarray missing data imputation based on a set theoretic framework and biological knowledge.

    PubMed

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2006-01-01

    Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods, a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data are dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
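
    A compact sketch of the POCS iteration described above. The data-consistency projection (restoring observed entries) is exact; the truncated-SVD step stands in for the paper's "global structure" set and is only an illustrative surrogate, since the low-rank set is not actually convex, and the paper's local-correlation and synchronization-loss sets are omitted.

    ```python
    import numpy as np

    def pocs_impute(X, mask, projections, n_iter=100):
        """POCS-style imputation: sweep over the given projections and,
        after each sweep, restore the observed entries (data-consistency set)."""
        Y = X.copy()
        Y[mask] = np.nanmean(X)                 # crude start for missing cells
        for _ in range(n_iter):
            for proj in projections:
                Y = proj(Y)
            Y[~mask] = X[~mask]                 # project back onto observed data
        return Y

    def truncated_svd_projection(rank):
        """Stand-in for the 'global structure' set: keep top singular modes.
        NOTE: the low-rank set is not convex; illustrative surrogate only."""
        def proj(Y):
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            s[rank:] = 0.0
            return (U * s) @ Vt
        return proj

    # Usage on a toy rank-8 matrix with 10% missing entries
    rng = np.random.default_rng(0)
    true = rng.normal(size=(50, 8)) @ rng.normal(size=(8, 20))
    mask = rng.random(true.shape) < 0.1
    X = true.copy(); X[mask] = np.nan
    Xhat = pocs_impute(X, mask, [truncated_svd_projection(8)])
    print(float(np.abs((Xhat - true)[mask]).mean()))    # mean absolute error
    ```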

  11. Restrained eaters show altered brain response to food odor.

    PubMed

    Kemmotsu, Nobuko; Murphy, Claire

    2006-02-28

    Do restrained and unrestrained eaters differ in their brain response to food odor? We addressed this question by examining restrained eaters' brain response to food (chocolate) and non-food (geraniol, floral) odors, both when odor was attended to and when ignored. Using olfactory event-related potentials (OERPs), we found that restrained eaters and controls responded similarly to the non-food odor; however, unlike controls, restrained eaters showed no increase in brain response to the food odor when they focused attention on it. Rather, restrained eaters showed attenuated OERP amplitudes to the food odor in both attended and ignored conditions, suggesting that the brain's response to attended food odor was abnormally suppressed.

  12. Reply to "Comment on 'Falsification of the Atmospheric CO2 Greenhouse Effects Within the Frame of Physics' by Joshua B. Halpern, Christopher M. Colose, Chris Ho-Stuart, Joel D. Shore, Arthur P. Smith, Jörg Zimmermann"

    NASA Astrophysics Data System (ADS)

    Gerlich, Gerhard; Tscheuschner, Ralf D.

    It is shown that the notorious claim by Halpern et al. recently repeated in their comment that the method, logic, and conclusions of our "Falsification Of The CO2 Greenhouse Effects Within The Frame Of Physics" would be in error has no foundation. Since Halpern et al. communicate our arguments incorrectly, their comment is scientifically vacuous. In particular, it is not true that we are "trying to apply the Clausius statement of the Second Law of Thermodynamics to only one side of a heat transfer process rather than the entire process" and that we are "systematically ignoring most non-radiative heat flows applicable to Earth's surface and atmosphere". Rather, our falsification paper discusses the violation of fundamental physical and mathematical principles in 14 examples of common pseudo-derivations of fictitious greenhouse effects that are all based on simplistic pictures of radiative transfer and their obscure relation to thermodynamics, including but not limited to those descriptions (a) that define a "Perpetuum Mobile Of The 2nd Kind", (b) that rely on incorrectly calculated averages of global temperatures, (c) that refer to incorrectly normalized spectra of electromagnetic radiation. Halpern et al. completely missed an exceptional chance to formulate a scientifically well-founded antithesis. They do not even define a greenhouse effect that they wish to defend. We take the opportunity to clarify some misunderstandings, which are communicated in the current discussion on the non-measurable, i.e., physically non-existing influence of the trace gas CO2 on the climates of the Earth.

  13. Patient Relationship Management: What the U.S. Healthcare System Can Learn from Other Industries.

    PubMed

    Poku, Michael K; Behkami, Nima A; Bates, David W

    2017-01-01

    As the U.S. healthcare system moves to value-based care, the importance of engaging patients and families continues to intensify. However, simply engaging patients and families to improve their subjective satisfaction will not be enough for providers who want to maximize value. True optimization entails developing deep and long-term relationships with patients. We suggest that healthcare organizations must build such a discipline of "patient relationship management" (PRM) just as companies in non-healthcare industries have done with the concept of customer relationship management (CRM). Some providers have already made strides in this area, but overall it has been underemphasized or ignored by most healthcare systems to date. As healthcare providers work to develop their dedicated PRM systems, tools, and processes, we suggest they may benefit from emulating companies in other industries who have been able to engage their customers in innovative ways while acknowledging the differences between healthcare and other industries.

  14. Latent Factor Models and Analyses for Operator Response Times

    DTIC Science & Technology

    1990-09-01

    …since imputation is based on a presumption of model correctness, and the missing and non-standard values are imputed using the presumed model. The same… [remainder illegible in source scan]

  15. EFFECTS OF NON-CIRCULAR MOTIONS ON AZIMUTHAL COLOR GRADIENTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martínez-García, Eric E.; González-Lópezlira, Rosa A.; Gómez, Gilberto C.

    2009-12-20

    Assuming that density waves trigger star formation, and that young stars preserve the velocity components of the molecular gas where they are born, we analyze the effects that non-circular gas orbits have on color gradients across spiral arms. We try two approaches, one involving semianalytical solutions for spiral shocks, and another with magnetohydrodynamic (MHD) numerical simulation data. We find that, if non-circular motions are ignored, the comparison between observed color gradients and stellar population synthesis models would in principle yield pattern speed values that are systematically too high for regions inside corotation, with the difference between the real and the measured pattern speeds increasing with decreasing radius. On the other hand, image processing and pixel averaging result in systematically lower measured spiral pattern speed values, regardless of the kinematics of stellar orbits. The net effect is that roughly the correct pattern speeds are recovered, although the trend of higher measured Ω_p at lower radii (as expected when non-circular motions exist but are neglected) should still be observed. We examine the Martínez-García et al. photometric data and confirm that this is indeed the case. The comparison of the size of the systematic pattern speed offset in the data with the predictions of the semianalytical and MHD models corroborates that spirals are more likely to end at the outer Lindblad resonance, as these authors had already found.

  16. Missing data and multiple imputation in clinical epidemiological research.

    PubMed

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.

  17. Missing data and multiple imputation in clinical epidemiological research

    PubMed Central

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data. PMID:28352203
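
    Since both records above describe the same article, one sketch suffices for the multiple imputation workflow it summarizes: m stochastic imputations under the MAR assumption, a model fitted to each completed dataset, and Rubin's rules for pooling. The use of scikit-learn's IterativeImputer with an OLS analysis model, the column index `y_col`, and m = 20 are illustrative assumptions, not the article's software.

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    import statsmodels.api as sm

    def mi_estimate(X, y_col, m=20):
        """Multiple imputation under MAR: m stochastic imputations, an OLS
        fit on each completed dataset, and Rubin's rules for pooling."""
        coefs, variances = [], []
        for seed in range(m):
            imp = IterativeImputer(sample_posterior=True, random_state=seed)
            Z = imp.fit_transform(X)                  # one completed dataset
            y, W = Z[:, y_col], np.delete(Z, y_col, axis=1)
            fit = sm.OLS(y, sm.add_constant(W)).fit()
            coefs.append(fit.params)
            variances.append(fit.bse ** 2)
        Q = np.mean(coefs, axis=0)                    # pooled estimate
        U = np.mean(variances, axis=0)                # within-imputation variance
        B = np.var(coefs, axis=0, ddof=1)             # between-imputation variance
        T = U + (1 + 1 / m) * B                       # total variance (Rubin)
        return Q, np.sqrt(T)
    ```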

  18. Striatal connectivity changes following gambling wins and near-misses: Associations with gambling severity

    PubMed Central

    van Holst, Ruth J.; Chase, Henry W.; Clark, Luke

    2014-01-01

    Frontostriatal circuitry is implicated in the cognitive distortions associated with gambling behaviour. ‘Near-miss’ events, where unsuccessful outcomes are proximal to a jackpot win, recruit overlapping neural circuitry with actual monetary wins. Personal control over a gamble (e.g., via choice) is also known to increase confidence in one's chances of winning (the ‘illusion of control’). Using psychophysiological interaction (PPI) analyses, we examined changes in functional connectivity as regular gamblers and non-gambling participants played a slot-machine game that delivered wins, near-misses and full-misses, and manipulated personal control. We focussed on connectivity with striatal seed regions, and associations with gambling severity, using voxel-wise regression. For the interaction term of near-misses (versus full-misses) by personal choice (participant-chosen versus computer-chosen), ventral striatal connectivity with the insula, bilaterally, was positively correlated with gambling severity. In addition, some effects for the contrast of wins compared to all non-wins were observed at an uncorrected (p < .001) threshold: there was an overall increase in connectivity between the striatal seeds and left orbitofrontal cortex and posterior insula, and a negative correlation for gambling severity with the connectivity between the right ventral striatal seed and left anterior cingulate cortex. These findings corroborate the ‘non-categorical’ nature of reward processing in gambling: near-misses and full-misses are objectively identical outcomes that are processed differentially. Ventral striatal connectivity with the insula correlated positively with gambling severity in the illusion of control contrast, which could be a risk factor for the cognitive distortions and loss-chasing that are characteristic of problem gambling. PMID:25068112

  19. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... data shall be the best available estimate based on all available process data or data used for accounting purposes (such as sales records). (b) For missing values related to the performance test...

  20. Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

    PubMed

    Peyre, Hugo; Leplège, Alain; Coste, Joël

    2011-03-01

    Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
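
    To make the "personal mean score" concrete: for scales built under classical test theory it is usually implemented with a half rule, replacing a respondent's missing items by the mean of that respondent's observed items on the same scale, provided at least half of the items were answered. The sketch below is an illustrative implementation under that assumption, not necessarily the exact rule used in this study.

    ```python
    # Personal mean score (PMS) imputation with the common "half rule"
    # (illustrative; toy 4-item scale).
    import numpy as np

    def personal_mean_score(items: np.ndarray) -> np.ndarray:
        """items: (n_subjects, n_items) array, NaN = missing answer."""
        out = items.copy()
        n_items = items.shape[1]
        n_answered = np.sum(~np.isnan(items), axis=1)
        person_mean = np.nanmean(items, axis=1)
        for i in range(items.shape[0]):
            if n_answered[i] >= n_items / 2:      # half rule satisfied
                out[i, np.isnan(out[i])] = person_mean[i]
        return out                                # other rows keep NaN

    scale = np.array([[3.0, np.nan, 4.0, 2.0],
                      [np.nan, np.nan, np.nan, 5.0]])
    print(personal_mean_score(scale))
    ```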

  1. Top-quark pair plus large missing energy at the LHC

    NASA Astrophysics Data System (ADS)

    Han, Tao; Mahbubani, Rakhi; Walker, Devin G. E.; Wang, Lian-Tao

    2009-05-01

    We study methods of extracting new physics signals in final states with a top-quark pair plus large missing energy at the LHC. We consider two typical examples of such new physics: pair production of a fermionic top partner (a T′ in Little Higgs models, for example) and of a scalar top partner (a t̃ in SUSY). With a commonly adopted discrete symmetry under which non-Standard-Model particles are odd, the top partner is assumed to decay predominantly to a top quark plus a massive neutral stable particle A⁰. We focus on the case in which one of the top quarks decays leptonically and the other decays hadronically, pp → t t̄ A⁰ A⁰ X → b j₁ j₂ b̄ ℓ⁻ ν̄ A⁰ A⁰ X + c.c., where the A⁰s escape detection. We identify a key parameter for the signal observation: the mass splitting between the top partner and the missing particle. We reconstruct a transverse mass for the lepton-missing transverse energy system to separate the real W background from the signal and propose a definition for the reconstructed top quark mass that allows it to take unphysical values as an indication of new physics. We perform a scan over the two masses to map out the discovery reach at the LHC in this channel. We also comment on the possibility of distinguishing between scalar and fermionic top partners using collider signatures.
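
    For readers unfamiliar with the variable, the transverse mass of the lepton plus missing-transverse-energy system referred to above is conventionally defined as follows (standard textbook definition; the abstract itself does not spell it out):

    ```latex
    % Transverse mass of the lepton + E_T^miss system. For on-shell
    % leptonic W decays m_T has a kinematic endpoint near m_W, so a
    % population of events well above the endpoint signals additional
    % invisible particles such as the A^0 pair.
    m_T^2 \;=\; 2\, p_T^{\ell}\, E_T^{\text{miss}}
    \left( 1 - \cos\Delta\phi_{\ell,\,\text{miss}} \right)
    ```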

  2. The Systemic Theory of Living Systems and Relevance to CAM: the Theory (Part III)

    PubMed Central

    2005-01-01

    Western medical science lacks a solid philosophical and theoretical approach to disease cognition and therapeutics. My first two articles provided a framework for a humane medicine based on Modern Biophysics. Its precepts encompass modern therapeutics and CAM. Modern Biophysics and its concepts are presently missing in medicine, whether orthodox or CAM, albeit they probably provide the long sought explanation that bridges the abyss between East and West. Key points that differentiate Systemic from other systems' approaches are ‘Intelligence’, ‘Energy’ and the objective ‘to survive’. The General System Theory (GST) took a forward step by proposing a departure from the mechanistic biological concept—of analyzing parts and processes in isolation—and brought us towards an organismic model. GST examines the system's components and results of their interaction. However, GST still does not go far enough. GST assumes ‘Self-Organization’ as a spontaneous phenomenon, ignoring a causative entity or central controller to all systems: Intelligence. It also neglects ‘Survive’ as the directional motivation common to any living system, and scarcely assigns ‘Energy’ its true inherent value. These three parameters, Intelligence, Energy and Survive, are vital variables to be considered, in our human quest, if we are to achieve a unified theory of life. PMID:16136205

  3. Calibration of Safecast dose rate measurements.

    PubMed

    Cervone, Guido; Hultquist, Carolynne

    2018-10-01

    A methodology is presented to calibrate contributed Safecast dose rate measurements acquired between 2011 and 2016 in the Fukushima prefecture of Japan. The Safecast data are calibrated using observations acquired by the U.S. Department of Energy at the time of the 2011 Fukushima Daiichi power plant nuclear accident. The methodology performs a series of interpolations between the U.S. government and contributed datasets at specific temporal windows and at corresponding spatial locations. The coefficients found for all the different temporal windows are aggregated and interpolated using quadratic regressions to generate a time dependent calibration function. Normal background radiation, decay rates, and missing values are taken into account during the analysis. Results show that the standard Safecast static transformation function overestimates the official measurements because it fails to capture the presence of two different Cesium isotopes and their changing magnitudes with time. A model is created to predict the ratio of the isotopes from the time of the accident through 2020. The proposed time dependent calibration takes into account this Cesium isotopes ratio, and it is shown to reduce the error between U.S. government and contributed data. The proposed calibration is needed through 2020, after which date the errors introduced by ignoring the presence of different isotopes will become negligible. Copyright © 2018 Elsevier Ltd. All rights reserved.
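
    The time dependence of the calibration comes mainly from the two caesium isotopes decaying at different rates. As a hedged sketch (the half-lives are physical constants; the roughly 1:1 initial activity ratio for the Fukushima release is a commonly cited figure, not taken from this paper), the ratio can be modelled as:

    ```python
    # Why a static calibration drifts: the 134Cs/137Cs activity ratio
    # decays away because the half-lives differ.
    import numpy as np

    T_HALF_CS134 = 2.065    # years (physical constant)
    T_HALF_CS137 = 30.17    # years (physical constant)

    def decay_factor(t_years: float, t_half: float) -> float:
        """Fraction of initial activity remaining after t_years."""
        return np.exp(-np.log(2.0) * t_years / t_half)

    def cs134_cs137_ratio(t_years: float, r0: float = 1.0) -> float:
        """134Cs/137Cs activity ratio t_years after the accident."""
        return r0 * (decay_factor(t_years, T_HALF_CS134)
                     / decay_factor(t_years, T_HALF_CS137))

    for t in (0, 3, 6, 9):
        print(t, round(cs134_cs137_ratio(t), 3))  # ~0.06 by year 9
    ```

    By roughly nine years after the accident (i.e., 2020) the ¹³⁴Cs contribution is a few percent of the ¹³⁷Cs activity, which is consistent with the abstract's statement that ignoring the isotope mix becomes negligible after that date.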

  4. Integrated modelling of H-mode pedestal and confinement in JET-ILW

    NASA Astrophysics Data System (ADS)

    Saarelma, S.; Challis, C. D.; Garzotti, L.; Frassinetti, L.; Maggi, C. F.; Romanelli, M.; Stokes, C.; Contributors, JET

    2018-01-01

    A pedestal prediction model, Europed, is built on the existing EPED1 model by coupling it with a core transport simulation using a Bohm-gyroBohm transport model to self-consistently predict a JET-ILW power scan for hybrid plasmas that display weaker power degradation than the IPB98(y, 2) scaling of the energy confinement time. The weak power degradation is reproduced in the coupled core-pedestal simulation. The coupled core-pedestal model is further tested for a 3.0 MA plasma with the highest stored energy achieved in JET-ILW so far, giving a prediction of the stored plasma energy within the error margins of the measured experimental value. A pedestal density prediction model based on neutral penetration is tested on a JET-ILW database, giving a prediction with an average error of 17% from the experimental data when a parameter taking the fuelling rate into account is added to the model. However, the model fails to reproduce the power dependence of the pedestal density, implying missing transport physics in the model. The future JET-ILW deuterium campaign with increased heating power is predicted to reach a plasma energy of 11 MJ, which would correspond to 11-13 MW of fusion power in an equivalent deuterium-tritium plasma, but with isotope effects on pedestal stability and core transport ignored.

  5. Considerations of multiple imputation approaches for handling missing data in clinical trials.

    PubMed

    Quan, Hui; Qi, Li; Luo, Xiaodong; Darchy, Loic

    2018-07-01

    Missing data exist in all clinical trials, and they pose a serious issue for the interpretability of trial results. There is no universally applicable solution for all missing data problems. The methods used for handling missing data depend on the circumstances, particularly the assumptions about the missing data mechanisms. In recent years, when the missing at random mechanism cannot be assumed, conservative approaches such as the control-based and return-to-baseline multiple imputation approaches have been applied to deal with missing data. In this paper, we focus on the variability in the data analysis of these approaches. As demonstrated by examples, the choice of the variability can impact the conclusion of the analysis. Besides methods for continuous endpoints, we also discuss methods for binary and time-to-event endpoints, as well as considerations for non-inferiority assessment. Copyright © 2018. Published by Elsevier Inc.

  6. Impact of Missing Data for Body Mass Index in an Epidemiologic Study.

    PubMed

    Razzaghi, Hilda; Tinker, Sarah C; Herring, Amy H; Howards, Penelope P; Waller, D Kim; Johnson, Candice Y

    2016-07-01

    Objective To assess the potential impact of missing data on body mass index (BMI) on the association between prepregnancy obesity and specific birth defects. Methods Data from the National Birth Defects Prevention Study (NBDPS) were analyzed. We assessed the factors associated with missing BMI data among mothers of infants without birth defects. Four analytic methods were then used to assess the impact of missing BMI data on the association between maternal prepregnancy obesity and three birth defects: spina bifida, gastroschisis, and cleft lip with/without cleft palate. The analytic methods were: (1) complete case analysis; (2) assignment of missing values to either obese or normal BMI; (3) multiple imputation; and (4) probabilistic sensitivity analysis. Logistic regression was used to estimate crude and adjusted odds ratios (aOR) and 95% confidence intervals (CI). Results Of the NBDPS control mothers, 4.6% were missing BMI data, and most of the missing values were attributable to missing height (~90%). Missing BMI data were associated with birth outside of the US (aOR 8.6; 95% CI 5.5, 13.4), interview in Spanish (aOR 2.4; 95% CI 1.8, 3.2), Hispanic ethnicity (aOR 2.0; 95% CI 1.2, 3.4), and <12 years of education (aOR 2.3; 95% CI 1.7, 3.1). Overall, the results of the multiple imputation and probabilistic sensitivity analysis were similar to those of the complete case analysis. Conclusions Although in some scenarios missing BMI data can bias the magnitude of association, it does not appear likely to have impacted conclusions from a traditional complete case analysis of these data.

  7. Invasiveness as a barrier to self-monitoring of blood glucose in diabetes.

    PubMed

    Wagner, Julie; Malchoff, Carl; Abbott, Gina

    2005-08-01

    This study investigated the degree to which the invasive characteristic of glucose monitoring is a barrier to self-monitoring of blood glucose (SMBG). A paper-and-pencil Measure of Invasiveness as a reason for Skipping SMBG (MISS) was created and administered to 339 people with diabetes. The correlations between MISS scores and actual SMBG frequency, percent adherence to SMBG recommendations, SMBG anxiety, SMBG burden, and knowledge of the importance of glycemic control for avoiding diabetes complications were each explored. On a scale of 0-28, the average MISS score was M = 4.3 (SD = 5.4, range 0-28). Fully 63% of respondents reported skipping SMBG because of the invasiveness of the procedure. MISS scores were negatively related to percent adherence to healthcare provider SMBG recommendations as measured by the memory function of automated meters (Spearman's r = -0.47, P < 0.01). MISS scores were also negatively related to absolute SMBG frequency regardless of SMBG recommendations (Spearman's r = -0.11, P < 0.05). The correlation between the MISS and SMBG anxiety was significant (Spearman's r = 0.50, P < 0.01). With highly anxious participants deleted, the magnitude of the correlation was attenuated but persisted (Spearman's r = 0.28, P < 0.01), suggesting that invasiveness is associated with SMBG anxiety even among patients without a blood or injection phobia. MISS scores were also correlated with the degree to which patients find routine and non-routine SMBG checks a burden (routine r = 0.38, P < 0.01; non-routine r = 0.45, P < 0.01). Results of Mann-Whitney U tests indicated higher MISS scores among participants with less knowledge about the importance of glycemic control in the development of diabetes vascular complications. Invasiveness is a common and serious barrier to SMBG. These findings suggest that people with diabetes would perform SMBG more frequently and have improved quality of life with non-invasive SMBG.

  8. Understanding repeated non-attendance in health services: a pilot analysis of administrative data and full study protocol for a national retrospective cohort.

    PubMed

    Williamson, Andrea E; Ellis, David A; Wilson, Philip; McQueenie, Ross; McConnachie, Alex

    2017-02-14

    Understanding the causes of low engagement in healthcare is a pre-requisite for improving health services' contribution to tackling health inequalities. Low engagement includes missing healthcare appointments. Serially (having a pattern of) missing general practice (GP) appointments may provide a risk marker for vulnerability and poorer health outcomes. A proof of concept pilot using GP appointment data and a focus group with GPs informed the development of missed appointment categories: patients can be classified based on the number of appointments missed each year. The full study, using a retrospective cohort design, will link routine health service and education data to determine the relationship between GP appointment attendance, health outcomes, healthcare usage, preventive health activity and social circumstances taking a life course approach and using data from the whole journey in the National Health Service (NHS) healthcare. 172 practices will be recruited (∼900 000 patients) across Scotland. The statistical analysis will focus on 2 key areas: factors that predict patients who serially miss appointments, and serial missed appointments as a predictor of future patient outcomes. Regression models will help understand how missed appointment patterns are associated with patient and practice characteristics. We shall identify key factors associated with serial missed appointments and potential interactions that might predict them. The results of the project will inform debates concerning how best to reduce non-attendance and increase patient engagement within healthcare systems. Significant non-academic beneficiaries include governments, policymakers and medical practitioners. Results will be disseminated via a combination of academic outputs (papers, conferences), social media and through collaborative public health/policy fora. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  9. How to improve breeding value prediction for feed conversion ratio in the case of incomplete longitudinal body weights.

    PubMed

    Tran, V H Huynh; Gilbert, H; David, I

    2017-01-01

    With the development of automatic self-feeders, repeated measurements of feed intake are becoming easier in an increasing number of species. However, the corresponding BW are not always recorded, and these missing values complicate the longitudinal analysis of the feed conversion ratio (FCR). Our aim was to evaluate the impact of missing BW data on estimations of the genetic parameters of FCR and ways to improve the estimations. On the basis of the missing BW profile in French Large White pigs (male pigs weighed weekly, females and castrated males weighed monthly), we compared 2 different ways of predicting missing BW, 1 using a Gompertz model and 1 using a linear interpolation. For the first part of the study, we used 17,398 weekly records of BW and feed intake recorded over 16 consecutive weeks in 1,222 growing male pigs. We performed a simulation study on this data set to mimic missing BW values according to the pattern of weekly proportions of incomplete BW data in females and castrated males. The FCR was then computed for each week using observed data (obser_FCR), data with missing BW (miss_FCR), data with BW predicted using a Gompertz model (Gomp_FCR), and data with BW predicted by linear interpolation (interp_FCR). Heritability (h²) was estimated, and the EBV was predicted for each repeated FCR using a random regression model. In the second part of the study, the full data set (males with their complete BW records, castrated males and females with missing BW) was analyzed using the same methods (miss_FCR, Gomp_FCR, and interp_FCR). Results of the simulation study showed that h² was overestimated in the case of missing BW and that predicting BW using a linear interpolation provided a more accurate estimation of h² and of the EBV than a Gompertz model. Over the 100 simulations, the correlation between obser_EBV and interp_EBV, Gomp_EBV, and miss_EBV was 0.93 ± 0.02, 0.91 ± 0.01, and 0.79 ± 0.04, respectively. The heritabilities obtained with the full data set were quite similar for miss_FCR, Gomp_FCR, and interp_FCR. In conclusion, when the proportion of missing BW is high, the genetic parameters of FCR are not well estimated. In French Large White pigs, in the growing period extending from d 65 to 168, prediction of missing BW using a Gompertz growth model slightly improved the estimations, but linear interpolation improved the estimation to a greater extent. This result is due to the linear rather than sigmoidal increase in BW over the study period.
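
    To make the two prediction strategies concrete, the sketch below fits a Gompertz curve to one pig's observed weighings and compares its prediction for a week with a missing BW against straight linear interpolation between the neighbouring records. Data, start values, and names are illustrative assumptions, not the study's actual pipeline.

    ```python
    # Gompertz prediction vs. linear interpolation for a missing BW
    # (toy weighings in kg; p0 chosen near a plausible growth curve).
    import numpy as np
    from scipy.optimize import curve_fit

    def gompertz(t, a, b, k):
        # BW(t) = a * exp(-b * exp(-k * t))
        return a * np.exp(-b * np.exp(-k * t))

    days = np.array([65.0, 93.0, 121.0, 149.0, 168.0])  # weighing days
    bw   = np.array([25.0, 58.0, 83.0, 97.0, 103.0])    # observed BW
    t_miss = 135.0                                      # missing record

    (a, b, k), _ = curve_fit(gompertz, days, bw, p0=(110.0, 10.0, 0.03))
    bw_gomp = gompertz(t_miss, a, b, k)
    bw_lin  = np.interp(t_miss, days, bw)               # linear fill-in
    print(f"Gompertz: {bw_gomp:.1f} kg, linear: {bw_lin:.1f} kg")
    ```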

  10. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... substitute data shall be the best available estimate based on all available process data or data used for accounting purposes (such as sales records). (b) For missing values related to the performance test...

  11. Impact of missing data imputation methods on gene expression clustering and classification.

    PubMed

    de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G

    2015-02-26

    Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by the mean or median values performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/.
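
    The comparison the authors describe can be reproduced in miniature: the sketch below scores simple mean/median imputation against a more elaborate KNN imputer on a toy expression matrix with values removed at random. Everything here is illustrative; the study's actual evaluation used 12 cancer datasets and downstream clustering/classification rather than RMSE alone.

    ```python
    # Simple vs. complex imputation on a toy expression matrix.
    import numpy as np
    from sklearn.impute import SimpleImputer, KNNImputer

    rng = np.random.default_rng(1)
    expr = rng.normal(size=(100, 20))          # genes x samples (toy)
    mask = rng.random(expr.shape) < 0.05       # 5% missing at random
    expr_miss = np.where(mask, np.nan, expr)

    for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                          ("median", SimpleImputer(strategy="median")),
                          ("knn", KNNImputer(n_neighbors=10))]:
        filled = imputer.fit_transform(expr_miss)
        rmse = np.sqrt(np.mean((filled[mask] - expr[mask]) ** 2))
        print(f"{name}: RMSE = {rmse:.3f}")
    ```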

  12. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research and can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, to list and compare some of the most popular approaches for handling missing data in practice, and to provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, overcoming the disadvantages of the simpler approaches, but it should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.

  13. The value of urban tree cover: A hedonic property price model in Ramsey and Dakota Counties, Minnesota, USA

    Treesearch

    Heather Sander; Stephen Polasky; Robert. Haight

    2010-01-01

    Urban tree cover benefits communities. These benefits' economic values, however, are poorly recognized and often ignored by landowners and planners. We use hedonic property price modeling to estimate urban tree cover's value in Dakota and Ramsey Counties, MN, USA, predicting housing value as a function of structural, neighborhood, and environmental variables...

  14. Investigating the variability of memory distortion for an analogue trauma.

    PubMed

    Strange, Deryn; Takarangi, Melanie K T

    2015-01-01

    In this paper, we examine whether source monitoring (SM) errors might be one mechanism that accounts for traumatic memory distortion. Participants watched a traumatic film with some critical (crux) and non-critical (non-crux) scenes removed. Twenty-four hours later, they completed a memory test. To increase the likelihood that participants would notice the film's gaps, we inserted visual static for the length of each missing scene. We then added manipulations designed to affect people's SM behaviour. To encourage systematic SM, before watching the film, we warned half the participants that we had removed some scenes. To encourage heuristic SM, some participants also saw labels describing the missing scenes. Adding static highlighting the missing scenes did not affect false recognition of those missing scenes. However, a warning decreased, while labels increased, participants' false recognition rates. We conclude that manipulations designed to affect SM behaviour also affect the degree of memory distortion in our paradigm.

  15. The Effect of Near-Miss Rate and Card Control when American Indians and Non-Indians Gamble in a Laboratory Situation: The Influence of Alcohol

    ERIC Educational Resources Information Center

    Whitton, Melissa; Weatherly, Jeffrey N.

    2009-01-01

    Twelve American Indian (AI) and 12 non-AI participants gambled on a slot-machine simulation and on video poker. Prior to the gambling sessions, half of the participants consumed alcohol while the other half consumed a placebo beverage. They then played the slot-machine simulation three times, with the percentage of programmed "near misses" varying…

  16. On piecewise interpolation techniques for estimating solar radiation missing values in Kedah

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saaban, Azizan; Zainudin, Lutfi; Bakar, Mohd Nazari Abu

    2014-12-04

    This paper discusses the use of a piecewise interpolation method based on cubic Ball and Bézier curve representations to estimate missing values of solar radiation in Kedah. An hourly solar radiation dataset was collected at Alor Setar Meteorology Station and obtained from the Malaysian Meteorology Department. The piecewise cubic Ball and Bézier functions that interpolate the data points are defined on each hourly interval of solar radiation measurement and are obtained by prescribing first-order derivatives at the start and end of each interval. We compare the performance of our proposed method with existing methods using Root Mean Squared Error (RMSE) and Coefficient of Determination (CoD), based on simulated missing-value datasets. The results show that our method outperformed the previous methods.
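
    As a rough illustration of the evaluation setup: a cubic segment with prescribed endpoint values and first derivatives is the same polynomial whatever basis (Ball, Bézier, Hermite) it is written in, so the sketch below uses scipy's CubicHermiteSpline as a stand-in and scores it on simulated gaps with RMSE and CoD. The toy irradiance curve and timestamps are assumptions.

    ```python
    # Piecewise cubic with prescribed endpoint derivatives, scored on
    # simulated missing timestamps with RMSE and CoD.
    import numpy as np
    from scipy.interpolate import CubicHermiteSpline

    hours = np.arange(7, 20)                             # daylight hours
    solar = np.maximum(0, 900 * np.sin(np.pi * (hours - 6) / 13))  # W/m^2
    deriv = np.gradient(solar, hours)        # first derivatives at knots

    spline = CubicHermiteSpline(hours, solar, deriv)
    t_miss = np.array([9.5, 12.5, 16.5])     # simulated missing times
    truth = np.maximum(0, 900 * np.sin(np.pi * (t_miss - 6) / 13))
    est = spline(t_miss)

    rmse = np.sqrt(np.mean((est - truth) ** 2))
    cod = 1 - np.sum((truth - est) ** 2) / np.sum((truth - truth.mean()) ** 2)
    print(f"RMSE = {rmse:.2f}, CoD = {cod:.3f}")
    ```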

  17. Potentiation of latent inhibition by haloperidol and clozapine is attenuated in Dopamine D2 receptor (Drd-2)-deficient mice: Do antipsychotics influence learning to ignore irrelevant stimuli via both Drd-2 and non-Drd-2 mechanisms?

    PubMed Central

    O’Callaghan, Matthew J; Bay-Richter, Cecilie; O’Tuathaigh, Colm MP; Heery, David M; Waddington, John L; Moran, Paula M

    2014-01-01

    Whether the dopamine Drd-2 receptor is necessary for the behavioural action of antipsychotic drugs is an important question, as Drd-2 antagonism is responsible for their debilitating motor side effects. Using Drd-2 null mice (Drd2 -/-) it has previously been shown that Drd-2 is not necessary for antipsychotic drugs to reverse D-amphetamine disruption of latent inhibition (LI), a behavioural measure of learning to ignore irrelevant stimuli. Weiner’s ‘two-headed’ model indicates that antipsychotics not only reverse LI disruption, ‘disrupted LI’, but also potentiate LI when low/absent in controls, ‘persistent’ LI. We investigated whether antipsychotic drugs haloperidol or clozapine potentiated LI in wild-type controls or Drd2 -/-. Both drugs potentiated LI in wild-type but not in Drd2-/- mice, suggesting moderation of this effect of antipsychotics in the absence of Drd-2. Haloperidol potentiated LI similarly in both Drd1-/- and wild-type mice, indicating no such moderation in Drd1-/-. These data suggest that antipsychotic drugs can have either Drd-2 or non-Drd-2 effects on learning to ignore irrelevant stimuli, depending on how the abnormality is produced. Identification of the non-Drd-2 mechanism may help to identify novel non-Drd2 based therapeutic strategies for psychosis. PMID:25122042

  18. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    PubMed

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.

  19. On uncertainty in information and ignorance in knowledge

    NASA Astrophysics Data System (ADS)

    Ayyub, Bilal M.

    2010-05-01

    This paper provides an overview of working definitions of knowledge, ignorance, information and uncertainty and summarises formalised philosophical and mathematical framework for their analyses. It provides a comparative examination of the generalised information theory and the generalised theory of uncertainty. It summarises foundational bases for assessing the reliability of knowledge constructed as a collective set of justified true beliefs. It discusses system complexity for ancestor simulation potentials. It offers value-driven communication means of knowledge and contrarian knowledge using memes and memetics.

  20. Selective sulfur dioxide adsorption on crystal defect sites on an isoreticular metal organic framework series

    PubMed Central

    Rodríguez-Albelo, L. Marleny; López-Maya, Elena; Hamad, Said; Ruiz-Salvador, A. Rabdel; Calero, Sofia; Navarro, Jorge A.R.

    2017-01-01

    The widespread emissions of toxic gases from fossil fuel combustion represent major welfare risks. Here we report the improvement of the selective sulfur dioxide capture from flue gas emissions of isoreticular nickel pyrazolate metal organic frameworks through the sequential introduction of missing-linker defects and extra-framework barium cations. The results and feasibility of the defect pore engineering carried out are quantified through a combination of dynamic adsorption experiments, X-ray diffraction, electron microscopy and density functional theory calculations. The increased sulfur dioxide adsorption capacities and energies as well as the sulfur dioxide/carbon dioxide partition coefficients values of defective materials compared to original non-defective ones are related to the missing linkers enhanced pore accessibility and to the specificity of sulfur dioxide interactions with crystal defect sites. The selective sulfur dioxide adsorption on defects indicates the potential of fine-tuning the functional properties of metal organic frameworks through the deliberate creation of defects. PMID:28198376

  1. Adjusting HIV prevalence estimates for non-participation: an application to demographic surveillance

    PubMed Central

    McGovern, Mark E.; Marra, Giampiero; Radice, Rosalba; Canning, David; Newell, Marie-Louise; Bärnighausen, Till

    2015-01-01

    Introduction HIV testing is a cornerstone of efforts to combat the HIV epidemic, and testing conducted as part of surveillance provides invaluable data on the spread of infection and the effectiveness of campaigns to reduce the transmission of HIV. However, participation in HIV testing can be low, and if respondents systematically select not to be tested because they know or suspect they are HIV positive (and fear disclosure), standard approaches to deal with missing data will fail to remove selection bias. We implemented Heckman-type selection models, which can be used to adjust for missing data that are not missing at random, and established the extent of selection bias in a population-based HIV survey in an HIV hyperendemic community in rural South Africa. Methods We used data from a population-based HIV survey carried out in 2009 in rural KwaZulu-Natal, South Africa. In this survey, 5565 women (35%) and 2567 men (27%) provided blood for an HIV test. We accounted for missing data using interviewer identity as a selection variable which predicted consent to HIV testing but was unlikely to be independently associated with HIV status. Our approach involved using this selection variable to examine the HIV status of residents who would ordinarily refuse to test, except that they were allocated a persuasive interviewer. Our copula model allows for flexibility when modelling the dependence structure between HIV survey participation and HIV status. Results For women, our selection model generated an HIV prevalence estimate of 33% (95% CI 27–40) for all people eligible to consent to HIV testing in the survey. This estimate is higher than the estimate of 24% generated when only information from respondents who participated in testing is used in the analysis, and the estimate of 27% when imputation analysis is used to predict missing data on HIV status. For men, we found an HIV prevalence of 25% (95% CI 15–35) using the selection model, compared to 16% among those who participated in testing, and 18% estimated with imputation. We provide new confidence intervals that correct for the fact that the relationship between testing and HIV status is unknown and requires estimation. Conclusions We confirm the feasibility and value of adopting selection models to account for missing data in population-based HIV surveys and surveillance systems. Elements of survey design, such as interviewer identity, present the opportunity to adopt this approach in routine applications. Where non-participation is high, true confidence intervals are much wider than those generated by standard approaches to dealing with missing data suggest. PMID:26613900
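
    Schematically, a Heckman-type selection model couples a consent (selection) equation and an HIV-status (outcome) equation through a correlation parameter; the notation below is mine, not the paper's. Interviewer identity enters the selection covariates z_i but is excluded from the outcome covariates x_i (the exclusion restriction), and ρ ≠ 0 captures missingness that is not at random:

    ```latex
    % Selection (consent) and outcome (HIV status) equations linked by
    % rho; y_i is observed only when s_i = 1. Interviewer identity
    % appears in z_i only (exclusion restriction).
    \begin{aligned}
    s_i^{*} &= z_i^{\top}\gamma + u_i, &\quad
    s_i &= \mathbf{1}\{s_i^{*} > 0\},\\
    y_i^{*} &= x_i^{\top}\beta + \varepsilon_i, &\quad
    y_i &= \mathbf{1}\{y_i^{*} > 0\}\ \text{observed iff } s_i = 1,\\
    &\operatorname{corr}(u_i, \varepsilon_i) = \rho. &&
    \end{aligned}
    ```

    The copula formulation the paper mentions generalizes the joint distribution of (u_i, ε_i) beyond the bivariate normal, which is what widens the corrected confidence intervals relative to standard imputation.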

  2. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    NASA Astrophysics Data System (ADS)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to direct clinical research toward possible hidden knowledge is becoming greatly influential in medical areas. Heart disease is a leading cause of death around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results, yet the remaining complete features are still capable of providing information. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The complete dataset is then used to train a classification algorithm, a decision tree. The experiment is conducted on the Heart Disease dataset and the performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the decision tree is increased after the application of FCMPSO for imputation.
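
    One simple reading of the fuzzy-c-means half of that pipeline is sketched below: cluster the mean-filled data, then replace each missing entry with its membership-weighted reconstruction from the cluster centroids. The PSO component (used in the paper to tune the clustering) is omitted, and every detail here is an assumption rather than the authors' exact algorithm.

    ```python
    # Fuzzy-c-means-style imputation sketch (PSO tuning omitted).
    import numpy as np

    def fcm_impute(X, n_clusters=3, m=2.0, n_iter=50, seed=0):
        """Fill NaNs in X: start from column means, run FCM updates,
        and refresh missing entries from the fuzzy reconstruction."""
        rng = np.random.default_rng(seed)
        miss = np.isnan(X)
        filled = np.where(miss, np.nanmean(X, axis=0), X)  # crude start
        centers = filled[rng.choice(len(filled), n_clusters, replace=False)]
        for _ in range(n_iter):
            # distances to centers, shape (n, c)
            d = np.linalg.norm(filled[:, None, :] - centers[None], axis=2) + 1e-12
            # standard FCM memberships u_ik = 1 / sum_j (d_ik/d_ij)^(2/(m-1))
            u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
            # centroid update weighted by u^m
            centers = (u.T ** m @ filled) / np.sum(u.T ** m, axis=1, keepdims=True)
            # membership-weighted reconstruction, applied to missing cells only
            filled[miss] = (u @ centers)[miss]
        return filled
    ```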

  3. Nuclear Forensics Analysis with Missing and Uncertain Data

    DOE PAGES

    Langan, Roisin T.; Archibald, Richard K.; Lamberti, Vincent

    2015-10-05

    We have applied a new imputation-based method for analyzing incomplete data, called Monte Carlo Bayesian Database Generation (MCBDG), to the Spent Fuel Isotopic Composition (SFCOMPO) database. About 60% of the entries are absent for SFCOMPO. The method estimates missing values of a property from a probability distribution created from the existing data for the property, and then generates multiple instances of the completed database for training a machine learning algorithm. Uncertainty in the data is represented by an empirical or an assumed error distribution. The method makes few assumptions about the underlying data, and compares favorably against results obtained by replacing missing information with constant values.
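
    A loose sketch of the generation step as the abstract describes it: each missing entry is drawn from the empirical distribution of that property's observed values, optionally perturbed by an error distribution, and the process is repeated to yield many completed copies of the database. Function and parameter names are mine; the actual MCBDG procedure may differ in detail.

    ```python
    # Generate multiple completed copies of a database by sampling each
    # missing entry from the column's empirical distribution.
    import numpy as np

    def generate_completed_copies(data, n_copies=100, noise_sd=0.0, seed=0):
        """data: 2D float array with NaN marking missing entries."""
        rng = np.random.default_rng(seed)
        copies = []
        for _ in range(n_copies):
            filled = data.copy()
            for j in range(data.shape[1]):
                observed = data[~np.isnan(data[:, j]), j]
                miss = np.isnan(filled[:, j])
                draws = rng.choice(observed, size=miss.sum(), replace=True)
                # optional measurement-error perturbation
                filled[miss, j] = draws + rng.normal(0, noise_sd, miss.sum())
            copies.append(filled)
        return copies
    ```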

  4. Molecular Pathways

    PubMed Central

    Lok, Benjamin H.; Powell, Simon N.

    2012-01-01

    The Rad52 protein was largely ignored in humans and other mammals when the mouse knockout revealed a largely "no-effect" phenotype. However, using synthetic lethal approaches to investigate context-dependent function, new studies have shown that Rad52 plays a key survival role in cells lacking the function of the BRCA1-BRCA2 pathway of homologous recombination. Biochemical studies also showed significant differences between yeast and human Rad52, in which yeast Rad52 can promote strand invasion of RPA-coated single-stranded DNA in the presence of Rad51, but human Rad52 cannot. This results in a paradox: how does human Rad52 provide Rad51 function? Presumably something that exists in vivo is missing from the biochemical assays, but the nature of this missing factor is currently unknown. Recent studies have suggested that Rad52 provides back-up Rad51 function for all members of the BRCA1-BRCA2 pathway, suggesting that Rad52 may be a target for therapy in BRCA-pathway-deficient cancers. Screening for ways to inhibit Rad52 would potentially provide a complementary strategy for targeting BRCA-deficient cancers in addition to PARP inhibitors. PMID:23071261

  5. Hard matching for boosted tops at two loops

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoang, Andre H.; Pathak, Aditya; Pietrulewicz, Piotr

    2015-12-10

    Here, cross sections for top quarks provide very interesting physics opportunities, being both sensitive to new physics and also perturbatively tractable due to the large top quark mass. Rigorous factorization theorems for top cross sections can be derived in several kinematic scenarios, including the boosted regime in the peak region that we consider here. In the context of the corresponding factorization theorem for e⁺e⁻ collisions we extract the last missing ingredient that is needed to evaluate the cross section differential in the jet mass at two-loop order, namely the matching coefficient at the scale μ ≃ m_t. Our extraction also yields the final ingredients needed to carry out logarithmic resummation at next-to-next-to-leading logarithmic order (or N³LL if we ignore the missing 4-loop cusp anomalous dimension). This coefficient exhibits an amplitude-level rapidity logarithm starting at O(α_s²) due to virtual top quark loops, which we treat using rapidity renormalization group (RG) evolution. Interestingly, this rapidity RG evolution appears in the matching coefficient between two effective theories around the heavy quark mass scale μ ≃ m_t.

  6. A bias-corrected estimator in multiple imputation for missing data.

    PubMed

    Tomita, Hiroaki; Fujisawa, Hironori; Henmi, Masayuki

    2018-05-29

    Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset. Copyright © 2018 John Wiley & Sons, Ltd.
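
    Schematically, and in my own notation reconstructed from the abstract, the correction weights each imputed likelihood contribution by a density ratio:

    ```latex
    % Density-ratio weight for an imputed value x_mis given observed
    % x_obs: g is the (possibly misspecified) imputation model, f the
    % true conditional density (estimated nonparametrically, but never
    % sampled from).
    w\!\left(x^{\text{mis}} \mid x^{\text{obs}}\right)
      = \frac{f\!\left(x^{\text{mis}} \mid x^{\text{obs}}\right)}
             {g\!\left(x^{\text{mis}} \mid x^{\text{obs}}\right)}
    ```

    Because the imputations are drawn from the convenient model g while the weight restores the true conditional density f, the weighted estimate remains asymptotically unbiased even when g is misspecified.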

  7. Seven Deadly Sins in Trauma Outcomes Research: An Epidemiologic Post-Mortem for Major Causes of Bias

    PubMed Central

    del Junco, Deborah J.; Fox, Erin E.; Camp, Elizabeth A.; Rahbar, Mohammad H.; Holcomb, John B.

    2013-01-01

    Background Because randomized clinical trials (RCTs) in trauma outcomes research are expensive and complex, they have rarely been the basis for the clinical care of trauma patients. Most published findings are derived from retrospective and occasionally prospective observational studies that may be particularly susceptible to bias. The sources of bias include some common to other clinical domains, such as heterogeneous patient populations with competing and interdependent short- and long-term outcomes. Other sources of bias are unique to trauma, such as rapidly changing multi-system responses to injury that necessitate highly dynamic treatment regimes like blood product transfusion. The standard research design and analysis strategies applied in published observational studies are often inadequate to address these biases. Methods Drawing on recent experience in the design, data collection, monitoring and analysis of the 10-site observational PROMMTT study, seven common and sometimes overlapping biases are described through examples and resolution strategies. Results Sources of bias in trauma research include ignoring 1) variation in patients’ indications for treatment (indication bias), 2) the dependency of intervention delivery on patient survival (survival bias), 3) time-varying treatment, 4) time-dependent confounding, 5) non-uniform intervention effects over time, 6) non-random missing data mechanisms, and 7) imperfectly defined variables. This list is not exhaustive. Conclusion The mitigation strategies to overcome these threats to validity require epidemiologic and statistical vigilance. Minimizing the highlighted types of bias in trauma research will facilitate clinical translation of more accurate and reproducible findings and improve the evidence-base that clinicians apply in their care of injured patients. PMID:23778519

  8. Global volcanic aerosol properties derived from emissions, 1990-2014, using CESM1(WACCM)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mills, Michael J.; Schmidt, Anja; Easter, Richard

    Accurate representation of global stratospheric aerosol properties from volcanic and non-volcanic sulfur emissions is key to understanding the cooling effects and ozone-loss enhancements of recent volcanic activity. Attribution of climate and ozone variability to volcanic activity is of particular interest in relation to the post-2000 slowing in the apparent rate of global average temperature increases, and variable recovery of the Antarctic ozone hole. We have developed a climatology of global aerosol properties from 1990 to 2014 calculated based on volcanic and non-volcanic emissions of sulfur sources. We have compiled a database of volcanic SO2 emissions and plume altitudes for eruptions between 1990 and 2014, and a new prognostic capability for simulating stratospheric sulfate aerosols in version 5 of the Whole Atmosphere Community Climate Model, a component of the Community Earth System Model. Our climatology shows remarkable agreement with ground-based lidar observations of stratospheric aerosol optical depth (SAOD), and with in situ measurements of aerosol surface area density (SAD). These properties are key parameters in calculating the radiative and chemical effects of stratospheric aerosols. Our SAOD climatology represents a significant improvement over satellite-based analyses, which ignore aerosol extinction below 15 km, a region that can contain the vast majority of stratospheric aerosol extinction at mid- and high-latitudes. Our SAD climatology significantly improves on that provided for the Chemistry-Climate Model Initiative, which misses 60% of the SAD measured in situ. Our climatology of aerosol properties is publicly available on the Earth System Grid.

  9. The missing discourse of development: commentary on Lerum and Dworkin.

    PubMed

    Else-Quest, Nicole M; Hyde, Janet Shibley

    2009-01-01

    Lerum and Dworkin offer a provocative interdisciplinary feminist commentary on the Report of the APA (American Psychological Association) Task Force on the Sexualization of Girls. This commentary notes limitations to their argument and evidence, which make it a less convincing critique of the report's conclusions. Most notably, Lerum and Dworkin omit a developmental approach to the topic of sexualization and media exposure. In addition, their criticism that the report over-determines the negative effects (i.e., overstates the negative effects and ignores the positive effects) of sexualization on girls is unsupported by the empirical literature. This commentary also addresses their concerns about the language used in the report and highlights the need for clear and precise language in this dialogue.

  10. Strategies for dealing with missing data in clinical trials: from design to analysis.

    PubMed

    Dziura, James D; Post, Lori A; Zhao, Qing; Fu, Zhixuan; Peduzzi, Peter

    2013-09-01

    Randomized clinical trials are the gold standard for evaluating interventions, as randomized assignment equalizes known and unknown characteristics between intervention groups. However, when participants miss visits, the ability to conduct an intent-to-treat analysis and draw conclusions about a causal link is compromised. As guidance to those performing clinical trials, this review is a non-technical overview of the consequences of missing data and a prescription for its treatment that extends beyond the typical analytic approaches to the entire research process. Examples of bias from incorrect analysis with missing data and a discussion of the advantages and disadvantages of analytic methods are given. As no single analysis is definitive when missing data occur, strategies for their prevention throughout the course of a trial are presented. We aim to convey an appreciation for how missing data influence results and an understanding of the need for careful consideration of missing data during the design, planning, conduct, and analytic stages.

  11. Reducing Missed Opportunities for Influenza Vaccination in Patients with Rheumatoid Arthritis: Evaluation of a Multisystem Intervention.

    PubMed

    Broderick, Rachel; Ventura, Iazsmin; Soroosh, Sunoz; Franco, Lourdes; Giles, Jon T

    2018-05-15

    To assess a multimodal intervention for reducing missed opportunities for outpatient influenza vaccination in individuals with rheumatoid arthritis (RA). Patients with RA were enrolled from a single center and each rheumatology outpatient visit was tracked for missed opportunities for influenza vaccination, defined as a visit in which an unvaccinated patient without contraindications remained unvaccinated or lacked documentation of a vaccine recommendation in the electronic medical record (EMR). Providers then received a multimodal intervention consisting of an education session, EMR alerts, and weekly provider-specific e-mail reminders. Missed opportunities before and after the intervention were compared, and the determinants of missed opportunities were analyzed. A total of 228 patients with RA were enrolled (904 preintervention visits) and 197 returned for at least 1 postintervention visit (721 postintervention visits). The preintervention frequency of any missed opportunity for influenza vaccination was 47%. This was reduced to 23% postintervention (p < 0.001). Among those vaccinated, the relative hazard for influenza vaccination in the post- versus preintervention period was 1.24 (p = 0.038). Younger age, less frequent office visits, higher erythrocyte sedimentation rate, and negative attitudes about vaccines were each independently associated with missed opportunities preintervention. Postintervention, these factors were no longer associated with missed opportunities; however, the intervention was not as effective in non-Hispanic black patients, non-English speakers, those residing outside of the New York City metropolitan area, and those reporting prior adverse reactions to vaccines. Improved uptake of influenza vaccination in patients with RA is possible using a multimodal approach. Certain subgroups may need a more potent intervention for equivalent efficacy.

  12. Missing data handling in non-inferiority and equivalence trials: A systematic review.

    PubMed

    Rabe, Brooke A; Day, Simon; Fiero, Mallorie H; Bell, Melanie L

    2018-05-25

    Non-inferiority (NI) and equivalence clinical trials test whether a new treatment is therapeutically no worse than, or equivalent to, an existing standard of care. Missing data in clinical trials have been shown to reduce statistical power and potentially bias estimates of effect size; however, in NI and equivalence trials, they present additional issues. For instance, they may decrease sensitivity to differences between treatment groups and bias results toward the alternative hypothesis of NI (or equivalence). Our primary aim was to review the extent of missing data and the methods used for handling them (model-based methods, single imputation, multiple imputation, complete case), the analysis sets used (Intention-To-Treat, Per-Protocol, or both), and whether sensitivity analyses were used to explore departures from assumptions about the missing data. We conducted a systematic review of NI and equivalence trials published between May 2015 and April 2016 by searching the PubMed database. Articles were reviewed primarily by 2 reviewers, with 6 articles reviewed by both reviewers to establish consensus. Of 109 selected articles, 93% reported some missing data in the primary outcome. Among those, 50% reported complete case analysis, and 28% reported single imputation approaches for handling missing data. Only 32% reported conducting analyses of both intention-to-treat and per-protocol populations. Only 11% conducted any sensitivity analyses to test assumptions with respect to missing data. Missing data are common in NI and equivalence trials, and they are often handled by methods which may bias estimates and lead to incorrect conclusions. Copyright © 2018 John Wiley & Sons, Ltd.

  13. Bias and sensitivity in the placement of fossil taxa resulting from interpretations of missing data.

    PubMed

    Sansom, Robert S

    2015-03-01

    The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic frameworks to support their interpretation. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
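
    The degradation experiment is straightforward to sketch: starting from a complete binary character matrix, replace a chosen number of a taxon's present entries (1s) with either missing ('?') or absent ('0') codings, then re-run the phylogenetic placement on each degraded matrix. The sketch below covers only the matrix-degradation step, with illustrative names and data; tree inference itself is out of scope.

    ```python
    # Degrade a taxon's row in a binary character matrix by replacing
    # present entries (1s) with '?' (missing) or '0' (absent).
    import numpy as np

    def degrade_taxon(matrix, taxon, n_replace, replacement, seed=0):
        """matrix: array of '0'/'1' strings; replacement: '?' or '0'."""
        rng = np.random.default_rng(seed)
        out = matrix.copy()
        present = np.flatnonzero(out[taxon] == "1")
        hit = rng.choice(present, size=n_replace, replace=False)
        out[taxon, hit] = replacement
        return out

    chars = np.array([["1", "0", "1", "1", "1"],
                      ["1", "1", "0", "1", "1"]])
    print(degrade_taxon(chars, taxon=0, n_replace=2, replacement="?"))
    ```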

  14. Gap-filling methods to impute eddy covariance flux data by preserving variance.

    NASA Astrophysics Data System (ADS)

    Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.

    2015-12-01

    To represent carbon dynamics, in terms of the exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data have been collected using eddy flux towers at various sites across the globe for more than two decades. However, EC measurements are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to obtain estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60 days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center. We used as our base the Michaelis-Menten and Van't Hoff functions. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This provides an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC), into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by a lower root mean square error (RMSE) of the predicted NEE values). Our next steps are to use these parameter predictions from moving windows to gap-fill the data, with and without incorporation of the potential driver variables of the traditionally used parameters. Comparisons of the predicted values from these methods and 'traditional' gap-filling methods (using 12 fixed monthly windows) will then be made to show the extent to which variance is preserved. Further, this method will be applied to impute artificially created gaps to assess whether variance is preserved.
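
    As a hedged sketch of the moving-window idea, the code below fits one common daytime light-response form of the Michaelis-Menten function (a rectangular hyperbola with an ecosystem respiration offset) inside a single window and uses the fitted parameters to fill a gap. The exact functional forms, window handling, and parameter-prediction step in the study may differ.

    ```python
    # Fit a Michaelis-Menten light-response curve in one moving window
    # and fill a daytime gap from the fitted parameters (toy data).
    import numpy as np
    from scipy.optimize import curve_fit

    def light_response(ppfd, alpha, a_max, r_eco):
        # NEE = -(alpha * PPFD * Amax) / (alpha * PPFD + Amax) + Reco
        return -(alpha * ppfd * a_max) / (alpha * ppfd + a_max) + r_eco

    rng = np.random.default_rng(2)
    ppfd = rng.uniform(0, 2000, 400)               # umol m-2 s-1 (toy)
    nee = light_response(ppfd, 0.05, 20.0, 5.0) + rng.normal(0, 1.0, 400)

    window = slice(0, 200)                         # one 30-day-style window
    params, _ = curve_fit(light_response, ppfd[window], nee[window],
                          p0=(0.03, 15.0, 3.0))
    nee_filled = light_response(1500.0, *params)   # impute a daytime gap
    print(params, round(nee_filled, 2))
    ```

    The parameter-prediction refinement then regresses the fitted (alpha, Amax, Reco) on micrometeorological drivers such as VPD and VWC, so that the gap-filling equation varies with conditions rather than staying fixed within each window.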

  15. Warpgroup: increased precision of metabolomic data processing by consensus integration bound analysis

    PubMed Central

    Mahieu, Nathaniel G.; Spalding, Jonathan L.; Patti, Gary J.

    2016-01-01

    Motivation: Current informatic techniques for processing raw chromatography/mass spectrometry data break down under several common, non-ideal conditions. Importantly, hydrophilic liquid interaction chromatography (a key separation technology for metabolomics) produces data which are especially challenging to process. We identify three critical points of failure in current informatic workflows: compound-specific drift, integration region variance, and naive missing value imputation. We implement the Warpgroup algorithm to address these challenges. Results: Warpgroup adds peak subregion detection, consensus integration bound detection, and intelligent missing value imputation steps to the conventional informatic workflow. When compared with the conventional workflow, Warpgroup made major improvements to the processed data. The coefficient of variation for peaks detected in replicate injections of a complex Escherichia coli extract was halved (a reduction of 19 percentage points). Integration regions across samples were much more robust. Additionally, many signals lost by the conventional workflow were 'rescued' by the Warpgroup refinement, thereby resulting in greater analyte coverage in the processed data. Availability and implementation: Warpgroup is an open source R package available on GitHub at github.com/nathaniel-mahieu/warpgroup. The package includes example data and XCMS compatibility wrappers for ease of use. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: nathaniel.mahieu@wustl.edu or gjpattij@wustl.edu PMID:26424859

  16. A method to estimate the additional uncertainty in gap-filled NEE resulting from long gaps in the CO2 flux record

    Treesearch

    Andrew D. Richardson; David Y. Hollinger

    2007-01-01

    Missing values in any data set create problems for researchers. The process by which missing values are replaced, and the data set is made complete, is generally referred to as imputation. Within the eddy flux community, the term "gap filling" is more commonly applied. A major challenge is that random errors in measured data result in uncertainty in the gap-...

  17. CARA Status and Upcoming Enhancements

    NASA Technical Reports Server (NTRS)

    Newman, Lauri

    2015-01-01

    RIC Miss Values in Summary Table: tabular presentation of the miss vector in the Summary Section. RIC Uncertainty Values in Details Section: numerical presentation of miss component uncertainty values in the Details Section. Green Events with Potentially Maneuverable Secondary Objects: all potentially maneuverable secondary objects will be reported out to 7 days prior to TCA for LEO events and 10 days for non-LEO events, regardless of risk (relates to MOWG Action Item 1309-11); all green events with potentially active secondary objects are included in Summary Reports, allowing more time for contacting the other O/O. Black Box Fix: sometimes a black square appeared in the summary report where the ASW RIC time history plot should be. Appendix Orbit Regime / Mission Name Mismatch. Pc 0 Plotting Bug: all Pc points less than 1e-10 (zero) are now plotted as 1e-10 (instead of not at all). Maneuver Indication Fix: the maneuver indicator is now present even if the maneuver was in the past.

  18. Performance of the CMS missing transverse momentum reconstruction in pp data at $$\sqrt{s}$$ = 8 TeV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khachatryan, Vardan

    The performance of missing transverse energy reconstruction algorithms is presented using √s = 8 TeV proton-proton (pp) data collected with the CMS detector. Events with anomalous missing transverse energy are studied, and the performance of algorithms used to identify and remove these events is presented. The scale and resolution for missing transverse energy, including the effects of multiple pp interactions (pileup), are measured using events with an identified Z boson or isolated photon, and are found to be well described by the simulation. Novel missing transverse energy reconstruction algorithms developed specifically to mitigate the effects of large numbers of pileup interactions on the missing transverse energy resolution are presented. These algorithms significantly reduce the dependence of the missing transverse energy resolution on pileup interactions. Furthermore, an algorithm that provides an estimate of the significance of the missing transverse energy is presented, which is used to estimate the compatibility of the reconstructed missing transverse energy with a zero nominal value.

  19. SU-E-T-56: A Novel Approach to Computing Expected Value and Variance of Point Dose From Non-Gated Radiotherapy Delivery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, S; Zhu, X; Zhang, M

    Purpose: Randomness in the phase of patient internal organ motion at the beginning of a non-gated radiotherapy delivery may introduce uncertainty into the dose received by the patient. Concern about this deviation from the planned dose has motivated many researchers to study the phenomenon, although a unified theoretical framework for computing it is still missing. This study was conducted to develop such a framework. Methods: Two reasonable assumptions were made: (a) patient internal organ motion is stationary and periodic; (b) no special arrangement is made to start a non-gated radiotherapy delivery at any specific phase of patient internal organ motion. A statistical ensemble was formed consisting of the patient’s non-gated radiotherapy deliveries at all equally possible initial organ motion phases. To characterize the dose received by the patient, the statistical ensemble average method was employed to derive formulae for two variables: the expected value and the variance of the dose received by an internal point from a non-gated radiotherapy delivery. Fourier series were utilized to facilitate the analysis. Results: According to our formulae, the two variables can be computed from the non-gated radiotherapy generated dose rate time sequences at the point’s corresponding locations on fixed-phase 3D CT images sampled evenly in time over one organ motion period. The expected value of the point dose is simply the average of the doses to the point’s corresponding locations on the fixed-phase CT images. The variance can be determined by time integration in terms of the Fourier series coefficients of the dose rate time sequences on the same fixed-phase 3D CT images. Conclusion: Given a non-gated radiotherapy delivery plan and the patient’s 4D CT study, this approach can predict the expected value and variance of the patient radiation dose. We expect it to play a significant role in determining both the quality and robustness of non-gated radiotherapy plans.
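
    The ensemble idea can be illustrated numerically. The toy sketch below is our gloss, not the authors' Fourier-series formulae: each equally likely initial motion phase is one ensemble member, and the expected value and variance of the accumulated point dose are taken over the ensemble. The number of phases, delivery length, and dose-rate values are invented for illustration.

    # Toy ensemble average over all equally likely initial organ-motion phases.
    import numpy as np

    P = 8                                   # phases per motion period (assumption)
    n_steps = 240                           # delivery time steps (assumption)
    rng = np.random.default_rng(0)
    # dose_rate[k, i]: dose rate to the point at delivery step i if the organ is in phase k
    dose_rate = rng.uniform(0.0, 2.0, size=(P, n_steps))

    phase_at_step = np.arange(n_steps) % P  # stationary, periodic motion
    doses = []
    for phi0 in range(P):                   # one ensemble member per initial phase
        phases = (phase_at_step + phi0) % P
        doses.append(dose_rate[phases, np.arange(n_steps)].sum())

    doses = np.asarray(doses)
    print("expected dose:", doses.mean())   # ensemble average over initial phases
    print("variance:", doses.var())         # dose uncertainty from the unknown phase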

  20. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

    PubMed Central

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision-making tasks of different natures, e.g. in economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision-making economists and engineers are usually not willing to invest too much time in the study of complex formal theories. They require decisions which can be (re)checked by human-like common-sense reasoning. One important problem related to realistic decision-making tasks is the incomplete data sets required by the chosen decision-making algorithm. This paper presents a relatively simple algorithm by which some missing input information items (III) can be generated, using mainly decision tree topologies, and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. that a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known (fuzzy) probabilities, are usually available, meaning that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles the topology-related heuristics and the additional fuzzy sets using fuzzy linear programming. A case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662
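
    The reconciliation step can be sketched as a small linear program. The example below is a hypothetical three-branch tree, not the paper's six-lottery case study: the heuristic ordering (shorter sub-paths at least as probable) becomes inequality constraints, and the single vaguely known probability is represented as an interval bound rather than a full fuzzy set.

    # Toy reconciliation of the "longer sub-path is less probable" heuristic
    # with one interval-valued probability, via linear programming.
    import numpy as np
    from scipy.optimize import linprog

    # Three branches with path lengths 1 < 2 < 3, so we require p0 >= p1 >= p2.
    # Suppose the fuzzy information pins p1 to the interval [0.20, 0.30].
    c = np.zeros(3)                      # feasibility problem: any consistent p will do
    A_ub = np.array([[-1.0, 1.0, 0.0],   # p1 - p0 <= 0 (shorter path at least as probable)
                     [0.0, -1.0, 1.0]])  # p2 - p1 <= 0
    b_ub = np.zeros(2)
    A_eq = np.ones((1, 3))               # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0, 1), (0.20, 0.30), (0, 1)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print(res.x)                         # one probability vector consistent with both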

  1. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

    PubMed

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision-making tasks of different natures, e.g. in economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision-making economists and engineers are usually not willing to invest too much time in the study of complex formal theories. They require decisions which can be (re)checked by human-like common-sense reasoning. One important problem related to realistic decision-making tasks is the incomplete data sets required by the chosen decision-making algorithm. This paper presents a relatively simple algorithm by which some missing input information items (III) can be generated, using mainly decision tree topologies, and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. that a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known (fuzzy) probabilities, are usually available, meaning that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles the topology-related heuristics and the additional fuzzy sets using fuzzy linear programming. A case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail.

  2. Socioeconomic position and occupational social class and their association with risky alcohol consumption among adolescents.

    PubMed

    Obradors-Rial, Núria; Ariza, Carles; Rajmil, Luis; Muntaner, Carles

    2018-05-01

    To compare different measures of socioeconomic position (SEP) and occupational social class (OSC) and to evaluate their association with risky alcohol consumption among adolescents attending the last year of mandatory secondary school (ages 15-17 years). This was a cross-sectional study; 1268 adolescents in Catalonia (Spain) participated. The family affluence scale (FAS), parents' OSC, parents' level of education and monthly family income were compared as socioeconomic indicators. Logistic regression analyses were conducted to evaluate factors associated with missing values in the socioeconomic variables, and to examine the relation between each SEP variable and OSC, adjusting for sociodemographic variables. Family income had more than 30% missing values; OSC had the fewest. Being an immigrant was associated with missing values in all SEP variables. All SEP measures were positively associated with risky alcohol consumption, yet the strength of these associations diminished after adjustment for sociodemographic variables. Weekly available money was the variable most strongly associated with risky alcohol consumption. OSC seems to be as good as the other indicators for assessing adolescents' SEP. Adolescents with high SEP and those belonging to upper social classes reported higher levels of risky alcohol consumption.

  3. Emotional Stress as a Risk for Hypertension in Sub-Saharan Africans: Are We Ignoring the Odds?

    PubMed

    Malan, Leoné; Malan, Nico T

    2017-01-01

    Globally, most interventions focus on improving lifestyle habits and treatment regimens to combat hypertension as a non-communicable disease (NCD). However, despite these interventions and improved medical treatments, blood pressure (BP) values are still on the rise and poorly controlled in sub-Saharan Africa (SSA). Other factors contributing to hypertension prevalence, such as chronic emotional stress, might provide some insight for future health policy approaches. Currently, Hypertension Society guidelines do not mention emotional stress as a probable cause of hypertension. Recently, the 2014 World Global Health reports suggested that African governments should consider using World Health Organization hypertension data as a proxy indicator for social well-being. However, the possibility that a stressful life and taxing environmental factors might disturb central neural control of BP regulation has largely been ignored in SSA. Linking emotional stress to vascular dysregulation is therefore one way to investigate increased cardiometabolic challenges, neurotransmitter depletion and disturbed hemodynamics. Disruption of stress response pathways, and subsequent changes in lifestyle habits as ways of coping with a stressful life, may be included in future preventive measures as probable causes of hypertension prevalence in SSA. We will provide an overview of emotional stress and central neural control of BP, including implications for clinical practice in SSA cohorts.

  4. Performance of the CMS missing transverse momentum reconstruction in pp data at $$\\sqrt{s}$$ = 8 TeV

    DOE PAGES

    Khachatryan, Vardan

    2015-02-12

    The performance of missing transverse energy reconstruction algorithms is presented by our team using√s=8 TeV proton-proton (pp) data collected with the CMS detector. Events with anomalous missing transverse energy are studied, and the performance of algorithms used to identify and remove these events is presented. The scale and resolution for missing transverse energy, including the effects of multiple pp interactions (pileup), are measured using events with an identified Z boson or isolated photon, and are found to be well described by the simulation. Novel missing transverse energy reconstruction algorithms developed specifically to mitigate the effects of large numbers of pileupmore » interactions on the missing transverse energy resolution are presented. These algorithms significantly reduce the dependence of the missing transverse energy resolution on pileup interactions. Furthermore, an algorithm that provides an estimate of the significance of the missing transverse energy is presented, which is used to estimate the compatibility of the reconstructed missing transverse energy with a zero nominal value.« less

  5. Russia’s Reactions to the Color Revolutions

    DTIC Science & Technology

    2017-03-01

    Democratic Institute; NGO: non-governmental organization; ODIHR: Office for Democratic Institutions and Human Rights; OSCE: Organization for Security and... as the United States and other Western partners continue their tradition of promoting democracy and encouraging non-governmental organization (NGO... Ukraine as genuinely popular and indigenous upheavals, largely ignoring the role of U.S. funding and U.S. non-governmental organisations in

  6. Children’s Sensitivity to the Knowledge Expressed in Pedagogical and Non-Pedagogical Contexts

    PubMed Central

    Gelman, Susan A.; Ware, Elizabeth A.; Manczak, Erika M.; Graham, Susan A.

    2013-01-01

    The present studies test two hypotheses: (1) that pedagogical contexts especially convey generic information (Csibra & Gergely, 2009), and (2) that young children are sensitive to this aspect of pedagogy. We examined generic language (e.g., “Elephants live in Africa”) in three studies, focusing on: informational versus narrative children’s books (Study 1), the language of 6-year-old children and adults assuming either a pedagogical (teacher) or non-pedagogical (friend) role (Study 2), and the language of 5-year-old children and adults speaking to either an ignorant alien (pedagogical context) or a peer (non-pedagogical context; Study 3). Results suggest that generics are more frequent in informational than narrative texts. Furthermore, both adults and young children provide more generic language in pedagogical contexts and when assuming a pedagogical role. Together, the studies demonstrate that pedagogical contexts are distinctive in conveying generic information, and that children are sensitive to this aspect of the language input. We suggest that generic knowledge is more useful in making predictions about the future, and thus more highly valued during instruction. PMID:22468565

  7. An Application on Merton Model in the Non-efficient Market

    NASA Astrophysics Data System (ADS)

    Feng, Yanan; Xiao, Qingxian

    Merton Model is one of the famous credit risk models. This model presumes that the only source of uncertainty in equity prices is the firm’s net asset value .But the above market condition holds only when the market is efficient which is often been ignored in modern research. Another, the original Merton Model is based on assumptions that in the event of default absolute priority holds, renegotiation is not permitted , liquidation of the firm is costless and in the Merton Model and most of its modified version the default boundary is assumed to be constant which don’t correspond with the reality. So these can influence the level of predictive power of the model. In this paper, we have made some extensions on some of these assumptions underlying the original model. The model is virtually a modification of Merton’s model. In a non-efficient market, we use the stock data to analysis this model. The result shows that the modified model can evaluate the credit risk well in the non-efficient market.

  8. A regressive methodology for estimating missing data in rainfall daily time series

    NASA Astrophysics Data System (ADS)

    Barca, E.; Passarella, G.

    2009-04-01

    The "presence" of gaps in environmental data time series represents a very common, but extremely critical problem, since it can produce biased results (Rubin, 1976). Missing data plagues almost all surveys. The problem is how to deal with missing data once it has been deemed impossible to recover the actual missing values. Apart from the amount of missing data, another issue which plays an important role in the choice of any recovery approach is the evaluation of "missingness" mechanisms. When data missing is conditioned by some other variable observed in the data set (Schafer, 1997) the mechanism is called MAR (Missing at Random). Otherwise, when the missingness mechanism depends on the actual value of the missing data, it is called NCAR (Not Missing at Random). This last is the most difficult condition to model. In the last decade interest arose in the estimation of missing data by using regression (single imputation). More recently multiple imputation has become also available, which returns a distribution of estimated values (Scheffer, 2002). In this paper an automatic methodology for estimating missing data is presented. In practice, given a gauging station affected by missing data (target station), the methodology checks the randomness of the missing data and classifies the "similarity" between the target station and the other gauging stations spread over the study area. Among different methods useful for defining the similarity degree, whose effectiveness strongly depends on the data distribution, the Spearman correlation coefficient was chosen. Once defined the similarity matrix, a suitable, nonparametric, univariate, and regressive method was applied in order to estimate missing data in the target station: the Theil method (Theil, 1950). Even though the methodology revealed to be rather reliable an improvement of the missing data estimation can be achieved by a generalization. A first possible improvement consists in extending the univariate technique to the multivariate approach. Another approach follows the paradigm of the "multiple imputation" (Rubin, 1987; Rubin, 1988), which consists in using a set of "similar stations" instead than the most similar. This way, a sort of estimation range can be determined allowing the introduction of uncertainty. Finally, time series can be grouped on the basis of monthly rainfall rates defining classes of wetness (i.e.: dry, moderately rainy and rainy), in order to achieve the estimation using homogeneous data subsets. We expect that integrating the methodology with these enhancements will certainly improve its reliability. The methodology was applied to the daily rainfall time series data registered in the Candelaro River Basin (Apulia - South Italy) from 1970 to 2001. REFERENCES D.B., Rubin, 1976. Inference and Missing Data. Biometrika 63 581-592 D.B. Rubin, 1987. Multiple Imputation for Nonresponce in Surveys, New York: John Wiley & Sons, Inc. D.B. Rubin, 1988. An overview of multiple imputation. In Survey Research Section, pp. 79-84, American Statistical Association, 1988. J.L., Schafer, 1997. Analysis of Incomplete Multivariate Data, Chapman & Hall. J., Scheffer, 2002. Dealing with Missing Data. Res. Lett. Inf. Math. Sci. 3, 153-160. Available online at http://www.massey.ac.nz/~wwiims/research/letters/ H. Theil, 1950. A rank-invariant method of linear and polynomial regression analysis. Indicationes Mathematicae, 12, pp.85-91.

  9. Evaluation of non-additive genetic variation in feed-related traits of broiler chickens.

    PubMed

    Li, Y; Hawken, R; Sapp, R; George, A; Lehnert, S A; Henshall, J M; Reverter, A

    2017-03-01

    Genome-wide association mapping and genomic prediction of phenotypes in livestock are predominantly based on the detection and estimation of additive genetic effects; non-additive genetic effects are largely ignored. Studies in animals, plants, and humans assessing the impact of non-additive genetic effects in genetic analyses have led to differing conclusions. In this paper, we examined the consequences of including non-additive genetic effects in genome-wide association mapping and genomic prediction of total genetic values in a commercial population of 5,658 broiler chickens genotyped for 45,176 single nucleotide polymorphism (SNP) markers. We employed mixed-model equations and restricted maximum likelihood to analyze 7 feed-related traits (TRT1-TRT7). Dominance variance accounted for a significant proportion of the total genetic variance in all 7 traits, ranging from 29.5% for TRT1 to 58.4% for TRT7. Using a 5-fold cross-validation scheme, we found that in spite of the large dominance component, including the estimated dominance effects in the prediction of total genetic values did not improve the accuracy of the predictions for any of the phenotypes. We offer some possible explanations for this counter-intuitive result, including possible confounding of dominance deviations with common environmental effects such as hatch, different directional effects of SNP additive and dominance variations, and the failure of gene-gene interactions to contribute to the variance. © 2016 Poultry Science Association Inc.

  10. Education for Multiculturalism among Arab Youth in Israel

    ERIC Educational Resources Information Center

    Abu Asbah, Khaled

    2018-01-01

    Education for multiculturalism, founded on liberal-democratic values, is a frequent topic of educational discourse that has not been ignored by Muslim Arab schools in Israel. In general, Arab society is undergoing change processes, in transition from a traditional to a modern society; traditional values are challenged, engendering social crises.…

  11. Prevalence and Correlates of Missing Meals Among High School Students-United States, 2010.

    PubMed

    Demissie, Zewditu; Eaton, Danice K; Lowry, Richard; Nihiser, Allison J; Foltz, Jennifer L

    2018-01-01

    To determine the prevalence and correlates of missing meals among adolescents. The 2010 National Youth Physical Activity and Nutrition Study, a cross-sectional study. School based. A nationally representative sample of 11 429 high school students. Breakfast, lunch, and dinner consumption; demographics; measured and perceived weight status; physical activity and sedentary behaviors; and fruit, vegetable, milk, sugar-sweetened beverage, and fast-food intake. Prevalence estimates for missing breakfast, lunch, or dinner on ≥1 day during the past 7 days were calculated. Associations between demographics and missing meals were tested. Associations of lifestyle and dietary behaviors with missing meals were examined using logistic regression controlling for sex, race/ethnicity, and grade. In 2010, 63.1% of students missed breakfast, 38.2% missed lunch, and 23.3% missed dinner; the prevalence was highest among female and non-Hispanic black students. Being overweight/obese, perceiving oneself to be overweight, and video game/computer use were associated with increased risk of missing meals. Physical activity behaviors were associated with reduced risk of missing meals. Students who missed breakfast were less likely to eat fruits and vegetables and more likely to consume sugar-sweetened beverages and fast food. Breakfast was the most frequently missed meal, and missing breakfast was associated with the greatest number of less healthy dietary practices. Intervention and education efforts might prioritize breakfast consumption.

  12. Poor description of non-pharmacological interventions: analysis of consecutive sample of randomised trials

    PubMed Central

    Erueti, Chrissy; Glasziou, Paul P

    2013-01-01

    Objectives To evaluate the completeness of descriptions of non-pharmacological interventions in randomised trials, identify which elements are most frequently missing, and assess whether authors can provide missing details. Design Analysis of consecutive sample of randomised trials of non-pharmacological interventions. Data sources and study selection All reports of randomised trials of non-pharmacological interventions published in 2009 in six leading general medical journals; 133 trial reports, with 137 interventions, met the inclusion criteria. Data collection Using an eight item checklist, two raters assessed the primary full trial report, plus any reference materials, appendices, or websites. Questions about missing details were emailed to corresponding authors, and relevant items were then reassessed. Results Of 137 interventions, only 53 (39%) were adequately described; this was increased to 81 (59%) by using 63 responses from 88 contacted authors. The most frequently missing item was the “intervention materials” (47% complete), but it also improved the most after author response (92% complete). Whereas some authors (27/70) provided materials or further information, other authors (21/70) could not; their reasons included copyright or intellectual property concerns, not having the materials or intervention details, or being unaware of their importance. Although 46 (34%) trial interventions had further information or materials readily available on a website, many were not mentioned in the report, were not freely accessible, or the URL was no longer functioning. Conclusions Missing essential information about interventions is a frequent, yet remediable, contributor to the worldwide waste in research funding. If trial reports do not have a sufficient description of interventions, other researchers cannot build on the findings, and clinicians and patients cannot reliably implement useful interventions. Improvement will require action by funders, researchers, and publishers, aided by long term repositories of materials linked to publications. PMID:24021722

  13. Neural Mechanisms of Updating under Reducible and Irreducible Uncertainty.

    PubMed

    Kobayashi, Kenji; Hsu, Ming

    2017-07-19

    Adaptive decision making depends on an agent's ability to use environmental signals to reduce uncertainty. However, because of multiple types of uncertainty, agents must take into account not only the extent to which signals violate prior expectations but also whether uncertainty can be reduced in the first place. Here we studied how human brains of both sexes respond to signals under conditions of reducible and irreducible uncertainty. We show behaviorally that subjects' value updating was sensitive to the reducibility of uncertainty, and could be quantitatively characterized by a Bayesian model where agents ignore expectancy violations that do not update beliefs or values. Using fMRI, we found that neural processes underlying belief and value updating were separable from responses to expectancy violation, and that reducibility of uncertainty in value modulated connections from belief-updating regions to value-updating regions. Together, these results provide insights into how agents use knowledge about uncertainty to make better decisions while ignoring mere expectancy violation. SIGNIFICANCE STATEMENT To make good decisions, a person must observe the environment carefully, and use these observations to reduce uncertainty about consequences of actions. Importantly, uncertainty should not be reduced purely based on how surprising the observations are, particularly because in some cases uncertainty is not reducible. Here we show that the human brain indeed reduces uncertainty adaptively by taking into account the nature of uncertainty and ignoring mere surprise. Behaviorally, we show that human subjects reduce uncertainty in a quasioptimal Bayesian manner. Using fMRI, we characterize brain regions that may be involved in uncertainty reduction, as well as the network they constitute, and dissociate them from brain regions that respond to mere surprise. Copyright © 2017 the authors.
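
    A toy Beta-Bernoulli gloss (ours, not the authors' model) of the behavioral finding: an agent updates its belief about a hidden reward probability only when the signal is informative about that probability, and leaves the belief unchanged for a surprising but uninformative signal.

    # Toy illustration: update beliefs only under reducible uncertainty.
    from dataclasses import dataclass

    @dataclass
    class Belief:
        alpha: float = 1.0   # Beta prior pseudo-counts
        beta: float = 1.0

        def mean(self):
            return self.alpha / (self.alpha + self.beta)

        def update(self, outcome, informative):
            # Expectancy violation alone is not enough: if the signal carries
            # no information about the hidden parameter, posterior == prior.
            if informative:
                self.alpha += outcome
                self.beta += 1 - outcome

    b = Belief()
    for o in [1, 1, 0, 1]:
        b.update(o, informative=True)    # reducible uncertainty: belief moves
    print(round(b.mean(), 3))            # 0.667
    b.update(0, informative=False)       # irreducible: surprising but ignored
    print(round(b.mean(), 3))            # unchanged, still 0.667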

  14. Neural Mechanisms of Updating under Reducible and Irreducible Uncertainty

    PubMed Central

    2017-01-01

    Adaptive decision making depends on an agent's ability to use environmental signals to reduce uncertainty. However, because of multiple types of uncertainty, agents must take into account not only the extent to which signals violate prior expectations but also whether uncertainty can be reduced in the first place. Here we studied how human brains of both sexes respond to signals under conditions of reducible and irreducible uncertainty. We show behaviorally that subjects' value updating was sensitive to the reducibility of uncertainty, and could be quantitatively characterized by a Bayesian model where agents ignore expectancy violations that do not update beliefs or values. Using fMRI, we found that neural processes underlying belief and value updating were separable from responses to expectancy violation, and that reducibility of uncertainty in value modulated connections from belief-updating regions to value-updating regions. Together, these results provide insights into how agents use knowledge about uncertainty to make better decisions while ignoring mere expectancy violation. SIGNIFICANCE STATEMENT To make good decisions, a person must observe the environment carefully, and use these observations to reduce uncertainty about consequences of actions. Importantly, uncertainty should not be reduced purely based on how surprising the observations are, particularly because in some cases uncertainty is not reducible. Here we show that the human brain indeed reduces uncertainty adaptively by taking into account the nature of uncertainty and ignoring mere surprise. Behaviorally, we show that human subjects reduce uncertainty in a quasioptimal Bayesian manner. Using fMRI, we characterize brain regions that may be involved in uncertainty reduction, as well as the network they constitute, and dissociate them from brain regions that respond to mere surprise. PMID:28626019

  15. Remembering the forgotten non-communicable diseases.

    PubMed

    Lopez, Alan D; Williams, Thomas N; Levin, Adeera; Tonelli, Marcello; Singh, Jasvinder A; Burney, Peter G J; Rehm, Jürgen; Volkow, Nora D; Koob, George; Ferri, Cleusa P

    2014-10-22

    The forthcoming post-Millennium Development Goals era will bring about new challenges in global health. Low- and middle-income countries will have to contend with a dual burden of infectious and non-communicable diseases (NCDs). Some of these NCDs, such as neoplasms, COPD, cardiovascular diseases and diabetes, cause much health loss worldwide and are already widely recognised as doing so. However, 55% of the global NCD burden arises from other NCDs, which tend to be ignored in terms of premature mortality and quality of life reduction. Here, experts in some of these 'forgotten NCDs' review the clinical impact of these diseases along with the consequences of ignoring their medical importance, and discuss ways in which they can be given higher global health priority in order to decrease the growing burden of disease and disability.

  16. Exploring Missing Values on Responses to Experienced and Labeled Event as Harassment in 2004 Reserves Data

    DTIC Science & Technology

    2008-07-01

    Personal Experiences of Sexual Harassment and Missing Values on Sexual Harassment Questions by Perceptions of Sexism in a Unit (Quartiles... sexism in a unit). The “worst” category indicates units with the highest levels of reported sexist behavior, and the “best” category indicates the... Education and Prevention, 19(6), 519–530. Harris, R. J., & Firestone, J. M. (1997). Subtle sexism in the U.S. Military: Individual responses to

  17. Multiple imputation for assessment of exposures to drinking water contaminants: evaluation with the Atrazine Monitoring Program.

    PubMed

    Jones, Rachael M; Stayner, Leslie T; Demirtas, Hakan

    2014-10-01

    Drinking water may contain pollutants that harm human health. Pollutant monitoring may occur quarterly, annually, or less frequently, depending upon the pollutant, the pollutant concentration, and the community water system (CWS). However, birth and other health outcomes are associated with narrow time-windows of exposure, and infrequent monitoring impedes linkage between water quality and health outcomes in epidemiological analyses. Our objective was to evaluate the performance of multiple imputation for filling in water quality values between measurements in CWSs. The multiple imputation method was implemented in a simulated setting using data from the Atrazine Monitoring Program (AMP, 2006-2009, in five Midwestern states). Values were deleted from the AMP data to leave one measurement per month. Four patterns reflecting drinking water monitoring regulations were used to delete months of data in each CWS: three patterns were missing at random and one pattern was missing not at random. Synthetic health outcome data were created using a linear and a Poisson exposure-response relationship, each with five levels of hypothesized association. The multiple imputation method was evaluated by comparing the exposure-response relationships estimated from the multiply imputed data with the hypothesized association. The four patterns deleted 65-92% of the months of atrazine observations in the AMP data. Even with these high rates of missing information, our procedure was able to recover most of the missing information when the synthetic health outcome was included, for the missing at random patterns and for the missing not at random patterns with low-to-moderate exposure-response relationships. Multiple imputation appears to be an effective method for filling in water quality values between measurements. Copyright © 2014 Elsevier Inc. All rights reserved.
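
    The general recipe can be sketched compactly: impute the unmonitored months M times, fit the exposure-response model to each completed series, and pool across imputations. The normal-draw imputation model, the toy linear exposure-response fit, and all numbers below are illustrative assumptions, not the study's actual models.

    # Compact multiple-imputation sketch for a sparsely monitored contaminant.
    import numpy as np

    rng = np.random.default_rng(1)
    conc = np.array([0.8, np.nan, np.nan, 1.4, np.nan, 2.1,
                     np.nan, np.nan, 1.1, np.nan, 0.6, np.nan])  # monthly atrazine
    outcome = rng.poisson(2.0, size=12)                           # synthetic health counts

    obs = ~np.isnan(conc)
    mu, sd = conc[obs].mean(), conc[obs].std(ddof=1)

    M, est = 20, []
    for _ in range(M):
        filled = conc.copy()
        filled[~obs] = rng.normal(mu, sd, size=(~obs).sum())      # one imputation draw
        slope = np.polyfit(filled, outcome, 1)[0]                 # toy exposure-response
        est.append(slope)

    est = np.array(est)
    print("pooled slope:", est.mean())               # Rubin's rules: average estimates
    print("between-imputation variance:", est.var(ddof=1))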

  18. Identification of pathogenic gene variants in small families with intellectually disabled siblings by exome sequencing.

    PubMed

    Schuurs-Hoeijmakers, Janneke H M; Vulto-van Silfhout, Anneke T; Vissers, Lisenka E L M; van de Vondervoort, Ilse I G M; van Bon, Bregje W M; de Ligt, Joep; Gilissen, Christian; Hehir-Kwa, Jayne Y; Neveling, Kornelia; del Rosario, Marisol; Hira, Gausiya; Reitano, Santina; Vitello, Aurelio; Failla, Pinella; Greco, Donatella; Fichera, Marco; Galesi, Ornella; Kleefstra, Tjitske; Greally, Marie T; Ockeloen, Charlotte W; Willemsen, Marjolein H; Bongers, Ernie M H F; Janssen, Irene M; Pfundt, Rolph; Veltman, Joris A; Romano, Corrado; Willemsen, Michèl A; van Bokhoven, Hans; Brunner, Han G; de Vries, Bert B A; de Brouwer, Arjan P M

    2013-12-01

    Intellectual disability (ID) is a common neurodevelopmental disorder affecting 1-3% of the general population. Mutations in more than 10% of all human genes are considered to be involved in this disorder, although the majority of these genes are still unknown. We investigated 19 small non-consanguineous families with two to five affected siblings in order to identify pathogenic gene variants in known, novel and potential ID candidate genes. Non-consanguineous families have been largely ignored in gene identification studies as small family size precludes prior mapping of the genetic defect. Using exome sequencing, we identified pathogenic mutations in three genes, DDHD2, SLC6A8, and SLC9A6, of which the latter two have previously been implicated in X-linked ID phenotypes. In addition, we identified potentially pathogenic mutations in BCORL1 on the X-chromosome and in MCM3AP, PTPRT, SYNE1, and ZNF528 on autosomes. We show that potentially pathogenic gene variants can be identified in small, non-consanguineous families with as few as two affected siblings, thus emphasising their value in the identification of syndromic and non-syndromic ID genes.

  19. A Retrospective Study of Non-Ventilator-Associated Hospital Acquired Pneumonia Incidence and Missed Opportunities for Nursing Care.

    PubMed

    Tesoro, Mary; Peyser, Diane J; Villarente, Farley

    2018-05-01

    To determine non-ventilator-associated hospital-acquired pneumonia (NV-HAP) incidence, assess negative impacts on patient outcomes and cost, and identify missed preventive nursing care opportunities. NV-HAP is inadequately studied and underreported. Missed nursing care opportunities, particularly oral care, may aid NV-HAP prevention. This descriptive, observational, retrospective chart review identified adult NV-HAP cases and associated demographic and hospital care data. Two hundred five NV-HAP cases occurred in 1 year at Montefiore Medical Center, equating to an incidence of 0.47 per 1000 patient-days and an estimated excess cost of $8.2 million. ICU transfer following pneumonia occurred in 15.6% of cases. Care requirements from specialist nursing facilities increased at discharge (26.8%), as compared with care requirements on admission (17.6%). Complete nursing care documentation was missing for most patients, with oral care undocumented 60.5% of the time. Preventable NV-HAP cases and their negative impact on cost and patient outcomes may decrease through improved basic nursing care.

  20. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    PubMed

    Voillet, Valentin; Besse, Philippe; Liaubet, Laurence; San Cristobal, Magali; González, Ignacio

    2016-10-03

    In omics data integration studies it is common, for a variety of reasons, for some individuals not to be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches, i.e. regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data, and a comprehensive graphical comparison showing how the MI-, RI- or MVI-MFA configurations diverge from the true configuration was produced. Two approaches to visualize and assess the uncertainty due to missing values, i.e. confidence ellipses and convex hulls, were also described; we showed how the areas of the ellipses and convex hulls increase with the number of missing individuals. A free and easy-to-use code was provided to implement the MI-MFA method in the R statistical environment. We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. MI-MFA configurations were close to the true configuration even when many individuals were missing from several data tables. The method takes into account the uncertainty of MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated.
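
    The MI-MFA workflow can be sketched schematically. The authors' implementation is in R and uses MFA proper; the stand-in below substitutes a truncated SVD of the concatenated tables for MFA, imputes missing rows M times, Procrustes-aligns the M configurations, and averages them into a consensus. Everything here (table sizes, imputation model, number of components) is an assumption for illustration only.

    # Schematic MI-then-factorize-then-pool sketch, with SVD standing in for MFA.
    import numpy as np
    from scipy.spatial import procrustes

    rng = np.random.default_rng(2)
    X = rng.normal(size=(30, 5))          # omics table 1 (complete)
    Y = rng.normal(size=(30, 8))          # omics table 2
    Y[5:9, :] = np.nan                    # individuals missing from table 2

    def configuration(A, k=2):
        A = A - A.mean(axis=0)            # column-centre, then truncated SVD
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] * s[:k]           # coordinates of individuals

    M, configs = 10, []
    col_mu = np.nanmean(Y, axis=0)
    col_sd = np.nanstd(Y, axis=0)
    for _ in range(M):
        Yi = Y.copy()
        miss = np.isnan(Yi)
        Yi[miss] = (col_mu + col_sd * rng.normal(size=Y.shape))[miss]  # one imputation
        configs.append(configuration(np.hstack([X, Yi])))

    # Align all M configurations to the first one, then average into a consensus.
    ref = configs[0]
    aligned = [procrustes(ref, c)[1] for c in configs]
    consensus = np.mean(aligned, axis=0)
    print(consensus.shape)                # (30, 2): one consensus configuration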
