Sample records for estimating missing features

  1. The effects of missing data on global ozone estimates

    NASA Technical Reports Server (NTRS)

    Drewry, J. W.; Robbins, J. L.

    1981-01-01

    The effects of missing data and model truncation on estimates of the global mean, zonal distribution, and global distribution of ozone are considered. It is shown that missing data can introduce biased estimates with errors that are not accounted for in the accuracy calculations of empirical modeling techniques. Data-fill techniques are introduced and used for evaluating error bounds and constraining the estimate in areas of sparse and missing data. It is found that the accuracy of the global mean estimate is more dependent on data distribution than model size. Zonal features can be accurately described by 7th order models over regions of adequate data distribution. Data variance accounted for by higher order models appears to represent climatological features of columnar ozone rather than pure error. Data-fill techniques can prevent artificial feature generation in regions of sparse or missing data without degrading high order estimates over dense data regions.

  2. Estimating Missing Features to Improve Multimedia Information Retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bagherjeiran, A; Love, N S; Kamath, C

    Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, all modalities of the data may not be available to form the query. The retrieval results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features. In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.

  3. Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

    PubMed

    Xie, Yanmei; Zhang, Biao

    2017-04-20

    Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
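    The central issue here, that estimating equations must account for missingness to stay unbiased, can be illustrated with a far simpler generic sketch. This is not the authors' empirical-likelihood estimator; the simulated model, the names, and the assumption of a known observation probability are our own:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50_000
    x = rng.normal(size=n)                         # covariate of interest (true mean 0)
    y = 2.0 * x + rng.normal(size=n)               # fully observed outcome

    # x is observed with a probability that rises with y, so complete cases
    # over-represent large x and the naive mean is biased upward.
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * y)))
    seen = rng.random(n) < p_obs

    cc_mean = x[seen].mean()                       # complete-case mean: biased
    w = 1.0 / p_obs[seen]                          # inverse-probability weights
    ipw_mean = np.sum(w * x[seen]) / np.sum(w)     # reweighted mean: close to 0
    ```

    The reweighted estimator solves an unbiased estimating equation for the mean, which is the simplest instance of the idea the paper generalizes.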

  4. Multi-test cervical cancer diagnosis with missing data estimation

    NASA Astrophysics Data System (ADS)

    Xu, Tao; Huang, Xiaolei; Kim, Edward; Long, L. Rodney; Antani, Sameer

    2015-03-01

    Cervical cancer is one of the most common types of cancer among women worldwide. Existing screening programs for cervical cancer suffer from low sensitivity. Using images of the cervix (cervigrams) as an aid in detecting pre-cancerous changes to the cervix has good potential to improve sensitivity and help reduce the number of cervical cancer cases. In this paper, we present a method that utilizes multi-modality information extracted from multiple tests of a patient's visit to classify the patient visit to be either low-risk or high-risk. Our algorithm integrates image features and text features to make a diagnosis. We also present two strategies to estimate the missing values in text features: Image Classifier Supervised Mean Imputation (ICSMI) and Image Classifier Supervised Linear Interpolation (ICSLI). We evaluate our method on a large medical dataset and compare it with several alternative approaches. The results show that the proposed method with ICSLI strategy achieves the best result of 83.03% specificity and 76.36% sensitivity. When higher specificity is desired, our method can achieve 90% specificity with 62.12% sensitivity.
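    The abstract names its two imputation strategies without detail; a hedged sketch of the general idea behind classifier-supervised mean imputation (hypothetical function, labels, and data, not the authors' code) might look like:

    ```python
    import numpy as np

    def supervised_mean_impute(values, labels, pred_label):
        """Fill one missing feature with its mean among rows sharing pred_label."""
        vals = np.asarray(values, dtype=float)
        labs = np.asarray(labels)
        usable = ~np.isnan(vals) & (labs == pred_label)
        if usable.any():
            return float(vals[usable].mean())   # class-conditional mean
        return float(np.nanmean(vals))          # fallback: global mean

    # A visit with a missing age whose image classifier predicts "high" risk
    # borrows the mean age of the other high-risk visits.
    ages = [30.0, 42.0, np.nan, 55.0, 61.0]
    risk = ["low", "low", "high", "high", "high"]
    imputed = supervised_mean_impute(ages, risk, "high")   # (55.0 + 61.0) / 2 = 58.0
    ```

    The point of supervising the imputation is that the replacement value is drawn from the predicted risk group rather than from the whole cohort.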

  5. Making an unknown unknown a known unknown: Missing data in longitudinal neuroimaging studies.

    PubMed

    Matta, Tyler H; Flournoy, John C; Byrne, Michelle L

    2017-10-28

    The analysis of longitudinal neuroimaging data within the massively univariate framework provides the opportunity to study empirical questions about neurodevelopment. Missing outcome data are an all-too-common feature of any longitudinal study, a feature that, if handled improperly, can reduce statistical power and lead to biased parameter estimates. The goal of this paper is to provide conceptual clarity of the issues and non-issues that arise from analyzing incomplete data in longitudinal studies with particular focus on neuroimaging data. This paper begins with a review of the hierarchy of missing data mechanisms and their relationship to likelihood-based methods, a review that is necessary not just for likelihood-based methods, but also for multiple-imputation methods. Next, the paper provides a series of simulation studies with designs common in longitudinal neuroimaging studies to help illustrate missing data concepts regardless of interpretation. Finally, two applied examples are used to demonstrate the sensitivity of inferences under different missing data assumptions and how this may change the substantive interpretation. The paper concludes with a set of guidelines for analyzing incomplete longitudinal data that can improve the validity of research findings in developmental neuroimaging research. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
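    A toy simulation (ours, not from the paper) of two mechanisms in that hierarchy shows why improperly handled missingness biases estimates: under MCAR the complete-case mean is fine, while under outcome-dependent (MNAR) missingness it is not. All numbers here are invented:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.normal(loc=100.0, scale=15.0, size=100_000)    # e.g. a regional outcome

    # MCAR: every observation has the same 30% chance of being missing.
    mcar = rng.random(y.size) < 0.30
    # MNAR: the larger the value, the more likely it is to be missing.
    mnar = rng.random(y.size) < np.clip((y - 70.0) / 60.0, 0.0, 1.0)

    mean_mcar = y[~mcar].mean()   # stays near the true mean of 100
    mean_mnar = y[~mnar].mean()   # pulled well below 100: complete cases are biased
    ```

    The MCAR deletion only shrinks the sample, while the MNAR deletion changes which values survive, which is exactly the distinction the paper's simulations elaborate.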

  6. Network reconstruction via graph blending

    NASA Astrophysics Data System (ADS)

    Estrada, Rolando

    2016-05-01

    Graphs estimated from empirical data are often noisy and incomplete due to the difficulty of faithfully observing all the components (nodes and edges) of the true graph. This problem is particularly acute for large networks where the number of components may far exceed available surveillance capabilities. Errors in the observed graph can render subsequent analyses invalid, so it is vital to develop robust methods that can minimize these observational errors. Errors in the observed graph may include missing and spurious components, as well as fused (multiple nodes are merged into one) and split (a single node is misinterpreted as many) nodes. Traditional graph reconstruction methods are only able to identify missing or spurious components (primarily edges, and to a lesser degree nodes), so we developed a novel graph blending framework that allows us to cast the full estimation problem as a simple edge addition/deletion problem. Armed with this framework, we systematically investigate the viability of various topological graph features, such as the degree distribution or the clustering coefficients, and existing graph reconstruction methods for tackling the full estimation problem. Our experimental results suggest that incorporating any topological feature as a source of information actually hinders reconstruction accuracy. We provide a theoretical analysis of this phenomenon and suggest several avenues for improving this estimation problem.

  7. 40 CFR 98.385 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. You must follow the procedures for estimating missing data in § 98... estimating missing data for petroleum products in § 98.395 also applies to coal-to-liquid products. ...

  8. 40 CFR 98.385 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. You must follow the procedures for estimating missing data in § 98... estimating missing data for petroleum products in § 98.395 also applies to coal-to-liquid products. ...

  9. 40 CFR 98.385 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. You must follow the procedures for estimating missing data in § 98... estimating missing data for petroleum products in § 98.395 also applies to coal-to-liquid products. ...

  10. 40 CFR 98.385 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. You must follow the procedures for estimating missing data in § 98... estimating missing data for petroleum products in § 98.395 also applies to coal-to-liquid products. ...

  11. 40 CFR 98.385 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. You must follow the procedures for estimating missing data in § 98... estimating missing data for petroleum products in § 98.395 also applies to coal-to-liquid products. ...

  12. On Estimation of the Survivor Average Causal Effect in Observational Studies when Important Confounders are Missing Due to Death

    PubMed Central

    Egleston, Brian L.; Scharfstein, Daniel O.; MacKenzie, Ellen

    2008-01-01

    We focus on estimation of the causal effect of treatment on the functional status of individuals at a fixed point in time t* after they have experienced a catastrophic event, from observational data with the following features: (1) treatment is imposed shortly after the event and is non-randomized, (2) individuals who survive to t* are scheduled to be interviewed, (3) there is interview non-response, (4) individuals who die prior to t* are missing information on pre-event confounders, (5) medical records are abstracted on all individuals to obtain information on post-event, pre-treatment confounding factors. To address the issue of survivor bias, we seek to estimate the survivor average causal effect (SACE), the effect of treatment on functional status among the cohort of individuals who would survive to t* regardless of whether or not assigned to treatment. To estimate this effect from observational data, we need to impose untestable assumptions, which depend on the collection of all confounding factors. Since pre-event information is missing on those who die prior to t*, it is unlikely that these data are missing at random (MAR). We introduce a sensitivity analysis methodology to evaluate the robustness of SACE inferences to deviations from the MAR assumption. We apply our methodology to the evaluation of the effect of trauma center care on vitality outcomes using data from the National Study on Costs and Outcomes of Trauma Care. PMID:18759833

  13. 40 CFR 98.245 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For missing feedstock and product flow rates, use the same procedures as for missing... contents and missing molecular weights for fuels as specified in § 98.35(b)(1). For missing flare data...

  14. Learning through Feature Prediction: An Initial Investigation into Teaching Categories to Children with Autism through Predicting Missing Features

    ERIC Educational Resources Information Center

    Sweller, Naomi

    2015-01-01

    Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…

  15. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...
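    The substitution rule quoted in this excerpt (use the quality-assured values immediately preceding and immediately following the missing-data incident) amounts to a simple neighbor average. A sketch, with hypothetical data, that assumes the gap is interior to the series:

    ```python
    def fill_missing(readings):
        """Replace each None with the mean of the nearest non-missing neighbors."""
        filled = list(readings)
        for i, value in enumerate(filled):
            if value is None:
                before = next(v for v in reversed(filled[:i]) if v is not None)
                after = next(v for v in readings[i + 1:] if v is not None)
                filled[i] = (before + after) / 2.0
        return filled

    # e.g. monthly inorganic-carbon measurements with one missing month
    carbon_content = [3.0, 3.5, None, 4.5]
    filled = fill_missing(carbon_content)   # [3.0, 3.5, 4.0, 4.5]
    ```

    The regulation also requires documenting and retaining records of each missing-data incident; the code only illustrates the arithmetic.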

  16. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  17. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  18. 40 CFR 98.435 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Gases Contained in Pre-Charged Equipment or Closed-Cell Foams § 98.435 Procedures for estimating missing data. Procedures for estimating missing data are not provided for importers and exporters of...

  19. 40 CFR 98.435 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Gases Contained in Pre-Charged Equipment or Closed-Cell Foams § 98.435 Procedures for estimating missing data. Procedures for estimating missing data are not provided for importers and exporters of...

  20. 40 CFR 98.435 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Gases Contained in Pre-Charged Equipment or Closed-Cell Foams § 98.435 Procedures for estimating missing data. Procedures for estimating missing data are not provided for importers and exporters of...

  21. 40 CFR 98.435 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Gases Contained in Pre-Charged Equipment or Closed-Cell Foams § 98.435 Procedures for estimating missing data. Procedures for estimating missing data are not provided for importers and exporters of...

  22. 40 CFR 98.445 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. A complete record of all measured parameters used in the GHG... following missing data procedures: (a) A quarterly flow rate of CO2 received that is missing must be...

  23. 40 CFR 98.126 - Data reporting requirements.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... fluorinated GHG emitted from equipment leaks (metric tons). (d) Reporting for missing data. Where missing data have been estimated pursuant to § 98.125, you must report the reason the data were missing, the length of time the data were missing, the method used to estimate the missing data, and the estimates of...

  24. 40 CFR 98.245 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For missing feedstock flow rates, product flow rates, and carbon contents, use the same procedures as for missing flow rates and carbon contents for fuels as specified in § 98.35. ...

  25. 40 CFR 98.245 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For missing feedstock flow rates, product flow rates, and carbon contents, use the same procedures as for missing flow rates and carbon contents for fuels as specified in § 98.35. ...

  26. 40 CFR 98.245 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For missing feedstock flow rates, product flow rates, and carbon contents, use the same procedures as for missing flow rates and carbon contents for fuels as specified in § 98.35. ...

  27. 40 CFR 98.126 - Data reporting requirements.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... fluorinated GHG emitted from equipment leaks (metric tons). (d) Reporting for missing data. Where missing data have been estimated pursuant to § 98.125, you must report the reason the data were missing, the length of time the data were missing, the method used to estimate the missing data, and the estimates of...

  28. 40 CFR 98.245 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For missing feedstock flow rates, product flow rates, and carbon contents, use the same procedures as for missing flow rates and carbon contents for fuels as specified in § 98.35. ...

  29. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION

    PubMed Central

    Allen, Genevera I.; Tibshirani, Robert

    2015-01-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility. PMID:26877823

  30. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    PubMed

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
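    As a much simpler stand-in for the EM-type imputation algorithms described above (not the authors' transposable covariance model), one can iterate between fitting a low-rank model and re-filling the missing cells; all names and the rank-1 example below are our own:

    ```python
    import numpy as np

    def iterative_svd_impute(X, rank=1, iters=100):
        """Fill NaNs with column means, then refine them with a low-rank SVD fit."""
        X = np.array(X, dtype=float)
        miss = np.isnan(X)
        col_means = np.nanmean(X, axis=0)
        X[miss] = col_means[np.where(miss)[1]]                 # crude initial fill
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # best rank-r fit
            X[miss] = low_rank[miss]                           # re-fill missing cells
        return X

    # A rank-1 matrix with one deleted entry is recovered almost exactly.
    M = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    M_obs = M.copy()
    M_obs[2, 1] = np.nan                 # true value is 15.0
    M_hat = iterative_svd_impute(M_obs)
    ```

    The fit/re-fill alternation mirrors the E-step/M-step structure of the paper's algorithms, with a rank constraint standing in for its regularized covariance model.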

  31. 40 CFR 98.235 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. A complete record of all estimated and/or measured parameters used in... sources as soon as possible, including in the subsequent calendar year if missing data are not discovered...

  32. 40 CFR 98.235 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. A complete record of all estimated and/or measured parameters used in... sources as soon as possible, including in the subsequent calendar year if missing data are not discovered...

  33. 40 CFR 98.235 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. A complete record of all estimated and/or measured parameters used in... sources as soon as possible, including in the subsequent calendar year if missing data are not discovered...

  34. 40 CFR 98.235 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. A complete record of all estimated and/or measured parameters used in... sources as soon as possible, including in the subsequent calendar year if missing data are not discovered...

  35. "Wish You Were Here": Examining Characteristics, Outcomes, and Statistical Solutions for Missing Cases in Web-Based Psychotherapeutic Trials.

    PubMed

    Karin, Eyal; Dear, Blake F; Heller, Gillian Z; Crane, Monique F; Titov, Nickolai

    2018-04-19

    Missing cases following treatment are common in Web-based psychotherapy trials. Without the ability to directly measure the outcomes of missing cases, measuring and evaluating the effects of treatment is challenging. Although common, little is known about the characteristics of Web-based psychotherapy participants who present as missing cases, their likely clinical outcomes, or the suitability of different statistical assumptions that can characterize missing cases. Using a large sample of individuals who underwent Web-based psychotherapy for depressive symptoms (n=820), the aim of this study was to explore the characteristics of cases who were missing at posttreatment (n=138) and their likely treatment outcomes, and to compare statistical methods for replacing their missing data. First, common participant and treatment features were tested through binary logistic regression models, evaluating their ability to predict missing cases. Second, the same variables were screened for their ability to increase or impede the rate of symptom change observed following treatment. Third, using recontacted cases at 3-month follow-up to proximally represent the outcomes of missing cases following treatment, various simulated replacement scores were compared and evaluated against observed clinical follow-up scores. Missing cases were predominantly predicted by lower treatment adherence and increased symptoms at pretreatment. Statistical methods that ignored these characteristics could overlook an important clinical phenomenon and consequently produce inaccurate replacement outcomes, with symptom estimates that could swing from -32% to 70% away from the observed outcomes of recontacted cases. In contrast, longitudinal statistical methods that adjusted their estimates for missing cases by treatment adherence rates and baseline symptom scores resulted in minimal measurement bias (<8%). Certain variables can characterize and predict the likelihood of missing cases and jointly predict lesser clinical improvement. Under such circumstances, individuals with potentially the worst treatment outcomes can become concealed, and failure to adjust for this can lead to substantial clinical measurement bias. Together, this preliminary research suggests that missing cases in Web-based psychotherapeutic interventions may not occur as random events and can be systematically predicted. Critically, at the same time, missing cases may experience outcomes that are distinct and important for a complete understanding of the treatment effect. ©Eyal Karin, Blake F Dear, Gillian Z Heller, Monique F Crane, Nickolai Titov. Originally published in JMIR Mental Health (http://mental.jmir.org), 19.04.2018.
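    The finding that adherence- and baseline-adjusted replacement scores reduce bias can be mimicked in a toy simulation (ours, not the trial's data or analysis; every parameter value below is invented):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000
    baseline = rng.normal(20.0, 4.0, size=n)        # pretreatment symptom score
    adherence = rng.integers(1, 6, size=n)          # lessons completed, 1..5
    post = baseline - 2.0 * adherence + rng.normal(0.0, 2.0, size=n)

    # Dropout is more likely with higher baseline symptoms and lower adherence.
    p_drop = 1.0 / (1.0 + np.exp(-(0.3 * (baseline - 20.0) - (adherence - 3.0))))
    dropped = rng.random(n) < p_drop

    # Naive replacement: completers' mean. Adjusted: regression on predictors.
    naive = np.full(dropped.sum(), post[~dropped].mean())
    X = np.column_stack([np.ones(n), baseline, adherence])
    beta, *_ = np.linalg.lstsq(X[~dropped], post[~dropped], rcond=None)
    adjusted = X[dropped] @ beta

    naive_bias = abs(naive.mean() - post[dropped].mean())
    adjusted_bias = abs(adjusted.mean() - post[dropped].mean())
    ```

    Because dropout here is driven by the same measured variables that predict outcomes, the adjusted replacement tracks the dropped cases' true scores while the completers' mean does not, which is the pattern the trial reports.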

  36. “Wish You Were Here”: Examining Characteristics, Outcomes, and Statistical Solutions for Missing Cases in Web-Based Psychotherapeutic Trials

    PubMed Central

    Karin, Eyal; Dear, Blake F; Heller, Gillian Z; Crane, Monique F; Titov, Nickolai

    2018-01-01

    Background Missing cases following treatment are common in Web-based psychotherapy trials. Without the ability to directly measure the outcomes of missing cases, measuring and evaluating the effects of treatment is challenging. Although common, little is known about the characteristics of Web-based psychotherapy participants who present as missing cases, their likely clinical outcomes, or the suitability of different statistical assumptions that can characterize missing cases. Objective Using a large sample of individuals who underwent Web-based psychotherapy for depressive symptoms (n=820), the aim of this study was to explore the characteristics of cases who were missing at posttreatment (n=138) and their likely treatment outcomes, and to compare statistical methods for replacing their missing data. Methods First, common participant and treatment features were tested through binary logistic regression models, evaluating their ability to predict missing cases. Second, the same variables were screened for their ability to increase or impede the rate of symptom change observed following treatment. Third, using recontacted cases at 3-month follow-up to proximally represent the outcomes of missing cases following treatment, various simulated replacement scores were compared and evaluated against observed clinical follow-up scores. Results Missing cases were predominantly predicted by lower treatment adherence and increased symptoms at pretreatment. Statistical methods that ignored these characteristics could overlook an important clinical phenomenon and consequently produce inaccurate replacement outcomes, with symptom estimates that could swing from −32% to 70% away from the observed outcomes of recontacted cases. In contrast, longitudinal statistical methods that adjusted their estimates for missing cases by treatment adherence rates and baseline symptom scores resulted in minimal measurement bias (<8%). Conclusions Certain variables can characterize and predict the likelihood of missing cases and jointly predict lesser clinical improvement. Under such circumstances, individuals with potentially the worst treatment outcomes can become concealed, and failure to adjust for this can lead to substantial clinical measurement bias. Together, this preliminary research suggests that missing cases in Web-based psychotherapeutic interventions may not occur as random events and can be systematically predicted. Critically, at the same time, missing cases may experience outcomes that are distinct and important for a complete understanding of the treatment effect. PMID:29674311

  37. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  38. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  39. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  40. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  1. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  2. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  3. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  4. 40 CFR 98.455 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... § 98.455 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  5. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  7. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  8. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  9. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  10. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  11. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  12. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  13. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  14. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  15. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  16. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  17. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  18. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  19. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  20. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  1. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  2. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  3. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  4. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  5. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  7. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  8. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  9. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  10. 40 CFR 98.305 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Use § 98.305 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  11. 40 CFR 98.305 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Use § 98.305 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  12. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  13. 40 CFR 98.455 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... § 98.455 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  14. 40 CFR 98.455 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... § 98.455 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  15. 40 CFR 98.305 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Use § 98.305 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  16. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  17. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  18. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  19. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  20. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  1. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  2. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  3. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  4. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  5. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  7. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  8. 40 CFR 98.305 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Use § 98.305 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  9. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  10. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  11. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  12. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  13. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  14. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  15. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  16. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  17. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  18. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  19. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  20. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  1. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  2. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter must be used in the calculations as specified in paragraphs...

  3. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  4. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  5. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  7. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  8. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  9. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  10. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  11. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  12. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  13. 40 CFR 98.455 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... § 98.455 Procedures for estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations is required. Replace missing data, if needed, based on data from...

  14. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  15. Missing data and multiple imputation in clinical epidemiological research.

    PubMed

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.
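The multiple-imputation workflow the abstract describes (impute several times, analyze each completed dataset, pool the results) can be sketched in a few lines. This is a generic illustration under a MAR assumption with a made-up toy dataset, not the authors' analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends on x; roughly 30% of x is missing at random (MAR).
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)
x_obs = np.where(rng.random(n) < 0.3, np.nan, x)

def impute_once(x_obs, y, rng):
    """One imputed dataset: regress x on y over complete cases, then fill
    each missing x with a prediction plus residual-scale noise."""
    ok = ~np.isnan(x_obs)
    b, a = np.polyfit(y[ok], x_obs[ok], 1)            # x ~ a + b*y
    resid_sd = np.std(x_obs[ok] - (a + b * y[ok]))
    x_imp = x_obs.copy()
    miss = ~ok
    x_imp[miss] = a + b * y[miss] + rng.normal(scale=resid_sd, size=miss.sum())
    return x_imp

# Impute m times, analyze each completed dataset, then pool the point
# estimates by averaging (the point-estimate half of Rubin's rules).
m = 20
slopes = [np.polyfit(impute_once(x_obs, y, rng), y, 1)[0] for _ in range(m)]
pooled = float(np.mean(slopes))   # should sit near the true slope of 2
```

The key point the abstract makes is visible here: because each imputation adds fresh residual noise, the spread of the m estimates captures the uncertainty due to the missing data, which single-value imputation ignores.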

  16. Missing data and multiple imputation in clinical epidemiological research

    PubMed Central

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data. PMID:28352203

  17. Shrinkage regression-based methods for microarray missing value imputation.

    PubMed

    Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng

    2013-01-01

Missing values commonly occur in microarray data, which usually contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, regression-based methods are very popular and have been shown to outperform other types of methods on many test microarray datasets. To further improve the performance of regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select genes similar to the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six test microarray datasets than the existing regression-based methods do. Imputation of missing values is an important aspect of microarray data analysis because most downstream analyses require a complete dataset, so exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
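The ingredients the abstract lists (select similar genes by Pearson correlation, fit least squares, shrink the estimate) can be sketched as follows. The neighbor count, shrinkage factor, and shrink-toward-the-mean form are illustrative assumptions, not the authors' exact estimator:

```python
import numpy as np

def shrinkage_impute(X, gene, sample, k=5, shrink=0.8):
    """Estimate the missing entry X[gene, sample] from the k genes most
    correlated with `gene` (Pearson), via a least-squares fit whose
    prediction is shrunk toward the target gene's mean. Sketch only;
    k and `shrink` are illustrative choices."""
    n_genes, n_samples = X.shape
    obs = [j for j in range(n_samples) if j != sample]
    target = X[gene, obs]
    # rank the other genes by |Pearson correlation| with the target gene
    scored = [(abs(np.corrcoef(X[g, obs], target)[0, 1]), g)
              for g in range(n_genes) if g != gene]
    neighbors = [g for _, g in sorted(scored, reverse=True)[:k]]
    # least squares on the samples where the target gene is observed
    A = np.column_stack([X[g, obs] for g in neighbors] + [np.ones(len(obs))])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    pred = np.array([X[g, sample] for g in neighbors] + [1.0]) @ beta
    mu = target.mean()
    return float(mu + shrink * (pred - mu))   # shrink toward the mean
```

Shrinking the prediction toward the gene's own mean trades a little bias for lower variance, which is the general motivation the abstract gives for adjusting the regression coefficients.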

  18. Semiparametric Bayesian analysis of gene-environment interactions with error in measurement of environmental covariates and missing genetic data.

    PubMed

    Lobach, Iryna; Mallick, Bani; Carroll, Raymond J

    2011-01-01

Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, or long-term toxic exposure. Measurement error causes bias in parameter estimates, masking key features of the data and leading to loss of power and to spurious or masked associations. We develop a Bayesian methodology for the analysis of case-control studies in which an environmental covariate is measured with error and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve the quality of the parameter estimates, which cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function; therefore, conventional Bayesian techniques may not be technically correct. We propose an approach using Markov chain Monte Carlo sampling, as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produces parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.
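The MCMC step the abstract mentions can be illustrated with a generic random-walk Metropolis sampler. The toy normal-mean target below is an assumption for demonstration, not the paper's pseudo-likelihood:

```python
import numpy as np

def metropolis(logpost, x0, n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler. The same machinery applies whether
    `logpost` is a true log-posterior or, as in the paper's setting, one
    built from a pseudo-likelihood. Generic illustration only."""
    rng = np.random.default_rng(seed)
    x, lp = x0, logpost(x0)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.normal()        # propose a nearby value
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept / reject
            x, lp = prop, lp_prop
        draws[i] = x
    return draws

# Toy target: normal mean with known unit variance and a flat prior,
# so the posterior is N(mean(data), 1/n). Data are made up.
data = np.array([1.8, 2.2, 2.1, 1.9, 2.0])
draws = metropolis(lambda mu: -0.5 * np.sum((data - mu) ** 2), x0=0.0)
posterior_mean = float(draws[5000:].mean())   # burn-in discarded
```

The caveat the abstract raises applies here: when `logpost` is a pseudo-likelihood rather than a genuine posterior, the sampled distribution need not have the usual Bayesian calibration guarantees.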

  19. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  20. 40 CFR 98.95 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) Except as provided in paragraph (b) of this section, a complete record of all... required. (b) If you use fluorinated heat transfer fluids at your facility and are missing data for one or...

  1. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  2. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  3. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  4. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  5. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  6. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  7. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  8. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  9. 40 CFR 98.95 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) Except as provided in paragraph (b) of this section, a complete record of all... required. (b) If you use fluorinated heat transfer fluids at your facility and are missing data for one or...

  10. 40 CFR Appendix C to Part 75 - Missing Data Estimation Procedures

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 17 2013-07-01 2013-07-01 false Missing Data Estimation Procedures C... (CONTINUED) CONTINUOUS EMISSION MONITORING Pt. 75, App. C Appendix C to Part 75—Missing Data Estimation Procedures 1. Parametric Monitoring Procedure for Missing SO2 Concentration or NOX Emission Rate Data 1...

  11. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a) of this subpart cannot... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  12. 40 CFR Appendix C to Part 75 - Missing Data Estimation Procedures

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 17 2014-07-01 2014-07-01 false Missing Data Estimation Procedures C... (CONTINUED) CONTINUOUS EMISSION MONITORING Pt. 75, App. C Appendix C to Part 75—Missing Data Estimation Procedures 1. Parametric Monitoring Procedure for Missing SO2 Concentration or NOX Emission Rate Data 1...

  13. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  14. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  15. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  16. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  17. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  18. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  19. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  20. 40 CFR Appendix C to Part 75 - Missing Data Estimation Procedures

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 17 2012-07-01 2012-07-01 false Missing Data Estimation Procedures C... (CONTINUED) CONTINUOUS EMISSION MONITORING Pt. 75, App. C Appendix C to Part 75—Missing Data Estimation Procedures 1. Parametric Monitoring Procedure for Missing SO2 Concentration or NOX Emission Rate Data 1...

  1. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  2. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  3. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  4. 40 CFR 98.95 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) Except as provided in paragraph (b) of this section, a complete record of all... required. (b) If you use fluorinated heat transfer fluids at your facility and are missing data for one or...

  5. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  6. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  7. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  8. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  9. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... all available process data or data used for accounting purposes. (b) For missing values related to the...

  10. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  11. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  12. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  13. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  14. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  15. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  16. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  17. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  18. 40 CFR Appendix C to Part 75 - Missing Data Estimation Procedures

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 16 2011-07-01 2011-07-01 false Missing Data Estimation Procedures C... (CONTINUED) CONTINUOUS EMISSION MONITORING Pt. 75, App. C Appendix C to Part 75—Missing Data Estimation Procedures 1. Parametric Monitoring Procedure for Missing SO2 Concentration or NOX Emission Rate Data 1...

19. Methods for estimating missing human skeletal element osteometric dimensions employed in the revised Fully technique for estimating stature.

    PubMed

    Auerbach, Benjamin M

    2011-05-01

One of the greatest limitations to the application of the revised Fully anatomical stature estimation method is the inability to measure some of the skeletal elements required in its calculation. These element dimensions cannot be obtained due to taphonomic factors, incomplete excavation, or disease processes, and the result is missing data. This study examines methods of imputing these missing dimensions using observable Fully measurements from the skeleton, and the accuracy of incorporating these estimated missing elements into anatomical stature reconstruction. These are further assessed against stature estimates obtained from mathematical regression formulae for the lower limb bones (femur and tibia). Two thousand seven hundred and seventeen North and South American indigenous skeletons were measured, and subsets of these with observable Fully dimensions were used to simulate missing elements and to create estimation methods and equations. Comparisons were made directly between anatomically reconstructed statures and mathematically derived statures, as well as with anatomically derived statures incorporating imputed missing dimensions. These analyses demonstrate that, while mathematical stature estimates are more accurate, anatomical statures incorporating imputed missing dimensions are not appreciably less accurate and are more precise. The anatomical stature estimation method using imputed missing dimensions is supported. Missing element estimation, however, is limited to the vertebral column (only when lumbar vertebrae are present) and to talocalcaneal height (only when femora and tibiae are present). Crania, entire vertebral columns, and femoral or tibial lengths cannot be reliably estimated. The applicability of these methods is discussed further. Copyright © 2011 Wiley-Liss, Inc.
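The kind of regression imputation the abstract evaluates, for example estimating talocalcaneal height when femora and tibiae are present, can be sketched as ordinary least squares over reference skeletons. All measurements below are synthetic placeholders, not values from the study:

```python
import numpy as np

# Reference skeletons where all elements were measurable (synthetic data;
# real values would come from the study's reference sample). Lengths in cm.
femur = np.array([42.0, 44.0, 46.0, 43.0, 45.0])
tibia = np.array([34.0, 37.0, 38.0, 33.0, 36.0])
talo = 0.05 * femur + 0.06 * tibia      # talocalcaneal height (synthetic)

# Fit talocalcaneal height ~ femur + tibia by ordinary least squares.
A = np.column_stack([femur, tibia, np.ones(len(femur))])
beta, *_ = np.linalg.lstsq(A, talo, rcond=None)

# Impute the missing talocalcaneal height of a new skeleton whose femur
# and tibia survive; the estimate then enters the anatomical stature sum.
talo_est = float(np.array([44.5, 36.5, 1.0]) @ beta)
```

The imputed dimension is then added to the other measured element heights in the anatomical reconstruction, which is why the study finds such statures more precise than purely mathematical ones even though each imputed term carries some error.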

  20. Identification of significant features by the Global Mean Rank test.

    PubMed

    Klammer, Martin; Dybowski, J Nikolaj; Hoffmann, Daniel; Schaab, Christoph

    2014-01-01

With the introduction of omics technologies such as transcriptomics and proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to deal successfully with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions of features. With the MeanRank test we aimed at developing a test that performs robustly under these conditions while scaling favorably with the number of replicates. The test proposed here is a global one-sample location test, which is based on the mean ranks across replicates and internally estimates and controls the false discovery rate. Furthermore, missing data are accounted for without the need for imputation. In extensive simulations comparing MeanRank to other frequently used methods, on simulated data and on a recent two-color microarray spike-in dataset, we found that it performs well with small and large numbers of replicates, feature-dependent variance between replicates, and variable regulation across features. The tests were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies. MeanRank outperformed the other global rank-based methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed similarly or better. Furthermore, MeanRank exhibits more consistent behavior regarding the degree of regulation and is robust against the choice of preprocessing methods. MeanRank does not require any imputation of missing values, is easy to understand, and yields results that are easy to interpret. The software implementing the algorithm is freely available for academic and commercial use.
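Two core pieces of the approach, a mean-rank statistic that simply skips missing values and false-discovery-rate control, can be sketched as follows. This is a simplified illustration, not the published MeanRank implementation:

```python
import numpy as np

def mean_ranks(X):
    """Rank features within each replicate (column), ignoring missing
    values, then average the ranks across replicates. Missing entries
    drop out of that replicate's ranking -- no imputation needed.
    Sketch of the idea only (assumes no ties)."""
    n_feat, n_rep = X.shape
    ranks = np.full_like(X, np.nan, dtype=float)
    for j in range(n_rep):
        ok = ~np.isnan(X[:, j])
        ranks[ok, j] = np.argsort(np.argsort(X[ok, j])) + 1  # 1-based ranks
    return np.nanmean(ranks, axis=1)

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up FDR control: boolean 'significant' mask."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    sig = np.zeros(m, dtype=bool)
    sig[order[:k]] = True
    return sig
```

Because each replicate is ranked independently, a feature missing in one replicate still contributes its ranks from the others, which is how the test sidesteps imputation.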

  1. Three-Dimensional Object Recognition and Registration for Robotic Grasping Systems Using a Modified Viewpoint Feature Histogram

    PubMed Central

    Chen, Chin-Sheng; Chen, Po-Chun; Hsu, Chih-Ming

    2016-01-01

This paper presents a novel 3D feature descriptor for object recognition and for identifying six-degree-of-freedom poses in mobile manipulation and grasping applications. First, a Microsoft Kinect sensor is used to capture 3D point cloud data. A viewpoint feature histogram (VFH) descriptor for the 3D point cloud data then encodes the geometry and viewpoint, so an object can be simultaneously recognized and registered in a stable pose, and the information is stored in a database. The VFH is robust to a large degree of surface noise and missing depth information, so it is reliable for stereo data. However, pose estimation fails when the object is placed symmetrically to the viewpoint. To overcome this problem, this study proposes a modified viewpoint feature histogram (MVFH) descriptor that consists of two parts: a surface shape component that comprises an extended fast point feature histogram, and an extended viewpoint direction component. The MVFH descriptor characterizes an object’s pose and enhances the system’s ability to identify objects with mirrored poses. Finally, once the object has been recognized, its pose roughly estimated by the MVFH descriptor, and the match registered in the database, the pose is refined using an iterative closest point (ICP) algorithm. The estimation results demonstrate that the MVFH feature descriptor allows more accurate pose estimation. The experiments also show that the proposed method can be applied in vision-guided robotic grasping systems. PMID:27886080
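The ICP refinement stage mentioned in the abstract rests on repeatedly solving a rigid-alignment subproblem. A minimal sketch of that alignment step (the standard SVD-based Kabsch solution), assuming corresponding points are already paired:

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rotation R and translation t mapping point set P
    onto Q (rows are paired 3D points): the core alignment step inside
    ICP. Illustration of the refinement stage only, not the MVFH
    descriptor itself."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])              # guard against reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

A full ICP loop alternates this closed-form alignment with a nearest-neighbor correspondence search until the alignment converges, which is why a good initial pose from the MVFH match matters.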

  2. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  3. 40 CFR 98.95 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) Except as provided in paragraph (b) of this section, a complete record of all... required. (b) If you use heat transfer fluids at your facility and are missing data for one or more of the...

  4. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  5. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(2), a complete record of all measured parameters... process data or data used for accounting purposes. (b) For missing values related to the CaO and MgO...

  6. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  7. Cox model with interval-censored covariate in cohort studies.

    PubMed

    Ahn, Soohyun; Lim, Johan; Paik, Myunghee Cho; Sacco, Ralph L; Elkind, Mitchell S

    2018-05-18

    In cohort studies the outcome is often time to a particular event, and subjects are followed at regular intervals. Periodic visits may also monitor a secondary irreversible event influencing the event of primary interest, and a significant proportion of subjects develop the secondary event over the period of follow-up. The status of the secondary event serves as a time-varying covariate, but is recorded only at the times of the scheduled visits, generating incomplete time-varying covariates. Whereas information on a typical time-varying covariate is missing for the entire follow-up period except at the visit times, the status of the secondary event is unknown only between the visits at which the status changed, and is thus interval-censored. One may view the interval-censored covariate of the secondary event status as a missing time-varying covariate, yet the missingness is partial since partial information is provided throughout the follow-up period. The current practice of using the latest observed status produces biased estimators, and the existing missing-covariate techniques cannot accommodate this special feature of missingness due to interval censoring. To handle interval-censored covariates in the Cox proportional hazards model, we propose an available-data estimator and a doubly robust-type estimator, as well as the maximum likelihood estimator via the EM algorithm, and present their asymptotic properties. We also present practical approaches that are valid. We demonstrate the proposed methods using our motivating example from the Northern Manhattan Study. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
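
    The "current practice" the abstract criticizes, carrying the latest observed status forward between visits, can be made concrete with a small sketch (illustrative only; the paper's point is precisely that estimators built on this convention are biased):

```python
import bisect

def locf_status(visit_times, statuses, t):
    """Secondary-event status at time t under the last-observation-
    carried-forward (LOCF) convention: use the status recorded at the
    most recent visit at or before t; before the first visit, assume 0.
    visit_times must be sorted ascending; statuses[i] is the status
    observed at visit i.
    """
    i = bisect.bisect_right(visit_times, t) - 1
    return statuses[i] if i >= 0 else 0
```

    If the secondary event actually occurred between visits, LOCF records it late, which is the source of the bias the proposed estimators are designed to avoid.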

  8. Robust Learning of High-dimensional Biological Networks with Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Nägele, Andreas; Dejori, Mathäus; Stetter, Martin

    Structure learning of Bayesian networks applied to gene expression data has become a potentially useful method to estimate interactions between genes. However, the NP-hardness of Bayesian network structure learning renders the reconstruction of the full genetic network with thousands of genes infeasible. Consequently, the maximal network size is usually restricted dramatically to a small set of genes (corresponding to variables in the Bayesian network). Although this feature-reduction step makes structure learning computationally tractable, on the downside, the learned structure might be adversely affected by the omission of genes. Additionally, gene expression data are usually very sparse with respect to the number of samples, i.e., the number of genes is much greater than the number of different observations. Given these problems, learning robust network features from microarray data is a challenging task. This chapter presents several approaches tackling the robustness issue in order to obtain a more reliable estimation of learned network features.
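
    One common way to obtain robust network features of the kind discussed here is bootstrap resampling: re-learn the structure on resampled datasets and keep only edges that recur. A generic sketch, with `learn_structure` standing in as a hypothetical user-supplied structure learner (any function mapping a dataset to a set of edges):

```python
import random

def edge_confidence(data, learn_structure, n_boot=100, seed=0):
    """Bootstrap confidence for learned network edges.

    data: list of samples; learn_structure: hypothetical callable that
    maps a dataset to a set of (parent, child) edges. Returns the
    fraction of bootstrap networks in which each edge appears.
    """
    rng = random.Random(seed)
    counts = {}
    n = len(data)
    for _ in range(n_boot):
        boot = [data[rng.randrange(n)] for _ in range(n)]  # resample with replacement
        for e in learn_structure(boot):
            counts[e] = counts.get(e, 0) + 1
    return {e: c / n_boot for e, c in counts.items()}
```

    Edges with confidence above a chosen threshold can then be reported as robust features of the network.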

  9. 40 CFR 98.145 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations is... in § 98.144 cannot be followed and data is missing, you must use the most appropriate of the missing...

  10. Impact of Missing Data on Person-Model Fit and Person Trait Estimation

    ERIC Educational Resources Information Center

    Zhang, Bo; Walker, Cindy M.

    2008-01-01

    The purpose of this research was to examine the effects of missing data on person-model fit and person trait estimation in tests with dichotomous items. Under the missing-completely-at-random framework, four missing data treatment techniques were investigated including pairwise deletion, coding missing responses as incorrect, hotdeck imputation,…

  11. 40 CFR 98.145 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations is... in § 98.144 cannot be followed and data is missing, you must use the most appropriate of the missing...

  12. 40 CFR 98.145 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations is... in § 98.144 cannot be followed and data is missing, you must use the most appropriate of the missing...

  13. Autoregressive-model-based missing value estimation for DNA microarray time series data.

    PubMed

    Choong, Miew Keen; Charbit, Maurice; Yan, Hong

    2009-01-01

    Missing value estimation is important in DNA microarray data analysis. A number of algorithms have been developed to solve this problem, but they have several limitations. Most existing algorithms are not able to deal with the situation where a particular time point (column) of the data is missing entirely. In this paper, we present an autoregressive-model-based missing value estimation method (ARLSimpute) that takes into account the dynamic property of microarray temporal data and the local similarity structures in the data. ARLSimpute is especially effective for the situation where a particular time point contains many missing values or where the entire time point is missing. Experimental results suggest that our proposed algorithm is an accurate missing value estimator in comparison with other imputation methods on simulated as well as real microarray time series datasets.
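
    The autoregressive idea can be illustrated with a deliberately simplified AR(1) version (ARLSimpute itself uses higher-order models and local similarity structure): fit the lag-1 regression on the observed consecutive pairs, then forecast forward into the gaps.

```python
import numpy as np

def ar1_impute(series):
    """Fill NaNs in a 1-D time series with AR(1) forecasts fitted on
    the observed consecutive pairs -- a toy sketch of the autoregressive
    idea behind ARLSimpute, not the published algorithm.
    """
    x = np.asarray(series, dtype=float)
    pairs = [(x[i - 1], x[i]) for i in range(1, len(x))
             if not (np.isnan(x[i - 1]) or np.isnan(x[i]))]
    prev = np.array([p for p, _ in pairs])
    nxt = np.array([q for _, q in pairs])
    A = np.vstack([prev, np.ones_like(prev)]).T   # x[t] = a*x[t-1] + b
    (a, b), *_ = np.linalg.lstsq(A, nxt, rcond=None)
    for i in range(1, len(x)):
        if np.isnan(x[i]) and not np.isnan(x[i - 1]):
            x[i] = a * x[i - 1] + b               # forecast the gap
    return x
```
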

  14. Tackling Missing Data in Community Health Studies Using Additive LS-SVM Classifier.

    PubMed

    Wang, Guanjin; Deng, Zhaohong; Choi, Kup-Sze

    2018-03-01

    Missing data is a common issue in community health and epidemiological studies. Direct removal of samples with missing data can lead to reduced sample size and information bias, which deteriorates the significance of the results. While data imputation methods are available to deal with missing data, they are limited in performance and can introduce noise into the dataset. Instead of data imputation, a novel method based on the additive least square support vector machine (LS-SVM) is proposed in this paper for predictive modeling when the input features of the model contain missing data. The method also simultaneously determines the influence of the features with missing values on the classification accuracy, using the fast leave-one-out cross-validation strategy. The performance of the method is evaluated by applying it to predict the quality of life (QOL) of elderly people using health data collected in the community. The dataset involves demographics, socioeconomic status, health history, and the outcomes of health assessments of 444 community-dwelling elderly people, with 5% to 60% of data missing in some of the input features. The QOL is measured using a standard questionnaire of the World Health Organization. Results show that the proposed method outperforms four conventional methods for handling missing data (case deletion, feature deletion, mean imputation, and K-nearest neighbor imputation), with the average QOL prediction accuracy reaching 0.7418. It is potentially a promising technique for tackling missing data in community health research and other applications.

  15. On Obtaining Estimates of the Fraction of Missing Information from Full Information Maximum Likelihood

    ERIC Educational Resources Information Center

    Savalei, Victoria; Rhemtulla, Mijke

    2012-01-01

    Fraction of missing information [lambda][subscript j] is a useful measure of the impact of missing data on the quality of estimation of a particular parameter. This measure can be computed for all parameters in the model, and it communicates the relative loss of efficiency in the estimation of a particular parameter due to missing data. It has…

  16. Missing observations in multiyear rotation sampling designs

    NASA Technical Reports Server (NTRS)

    Gbur, E. E.; Sielken, R. L., Jr. (Principal Investigator)

    1982-01-01

    Because multiyear estimation of at-harvest stratum crop proportions is more efficient than single-year estimation, the behavior of multiyear estimators in the presence of missing acquisitions was studied. Only the (worst) case in which a segment proportion cannot be estimated for the entire year is considered. The effect of these missing segments on the variance of the at-harvest stratum crop proportion estimator is considered both when missing segments are not replaced and when they are replaced by segments not sampled in previous years. The principal recommendations are to replace missing segments according to some specified strategy, and to use a sequential procedure for selecting a sampling design; i.e., choose an optimal two-year design and then, based on the observed two-year design after segment losses have been taken into account, choose the best possible three-year design having the observed two-year parent design.

  17. What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates

    PubMed Central

    Thomas, Sarah L.; Schmidt, Karen M.; Erbacher, Monica K.; Bergeman, Cindy S.

    2017-01-01

    The authors investigated the effect of Missing Completely at Random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates. PMID:26784376
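
    The degradation step used in such simulations, replacing a chosen fraction of the observed responses with missing values completely at random, is straightforward to sketch:

```python
import random

def degrade_mcar(responses, frac, seed=0):
    """Replace a fraction `frac` of the observed item responses with
    None, mimicking the MCAR degradation levels (20%, 50%, 70%) used
    in the study. Already-missing entries are left untouched.
    """
    rng = random.Random(seed)
    obs = [i for i, r in enumerate(responses) if r is not None]
    k = round(frac * len(obs))
    out = list(responses)
    for i in rng.sample(obs, k):   # pick k observed positions at random
        out[i] = None
    return out
```

    Re-estimating the model on each degraded dataset and comparing parameter estimates and standard errors to the originals reproduces the study's design in outline.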

  18. Targeted Feature Detection for Data-Dependent Shotgun Proteomics

    PubMed Central

    2017-01-01

    Label-free quantification of shotgun LC–MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification (“FFId”), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between “internal” and “external” (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the “uncertain” feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. 
We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org). PMID:28673088

  19. Targeted Feature Detection for Data-Dependent Shotgun Proteomics.

    PubMed

    Weisser, Hendrik; Choudhary, Jyoti S

    2017-08-04

    Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. 
We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS ( www.openms.org ).
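
    The decoy-based FDR estimation mentioned above follows the generic target-decoy logic, shown here in a minimal form that omits FFId's SVM scoring pipeline: the number of decoy candidates passing a score threshold approximates the number of false targets passing it.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate at a score threshold: decoys passing
    the threshold approximate the number of false positives among the
    accepted targets. A generic sketch, not FFId's exact procedure.
    """
    t = sum(s >= threshold for s in target_scores)
    d = sum(s >= threshold for s in decoy_scores)
    return d / t if t else 0.0
```
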

  20. Taking the Missing Propensity Into Account When Estimating Competence Scores

    PubMed Central

    Pohl, Steffi; Carstensen, Claus H.

    2014-01-01

    When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically made when using these models: (1) The missing propensity is unidimensional and (2) the missing propensity and the ability are bivariate normally distributed. These assumptions may, however, be violated in real data sets and could, thus, pose a threat to the validity of this approach. The present study focuses on modeling competencies in various domains, using data from a school sample (N = 15,396) and an adult sample (N = 7,256) from the National Educational Panel Study. Our interest was to investigate whether violations of unidimensionality and the normal distribution assumption severely affect the performance of the model-based approach in terms of differences in ability estimates. We propose a model with a competence dimension, a unidimensional missing propensity and a distributional assumption more flexible than a multivariate normal. Using this model for ability estimation results in different ability estimates compared with a model ignoring missing responses. Implications for ability estimation in large-scale assessments are discussed. PMID:29795844

  1. Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information.

    PubMed

    Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor

    2013-01-01

    In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
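
    The underlying quality-control test is the standard Pearson chi-square comparison of observed genotype counts with Hardy-Weinberg expectations; a minimal sketch for a biallelic marker, using complete-case counts only (i.e., without the multiple imputation the paper advocates):

```python
def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for Hardy-Weinberg proportions
    from observed genotype counts at a biallelic marker.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)              # allele-A frequency
    exp = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    obs = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))
```

    The paper's observation is that applying such a test after simply discarding missing genotypes can bias the result when the genotypes are not missing completely at random.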

  2. Missing texture reconstruction method based on error reduction algorithm using Fourier transform magnitude estimation scheme.

    PubMed

    Ogawa, Takahiro; Haseyama, Miki

    2013-03-01

    A missing-texture reconstruction method based on an error reduction (ER) algorithm, including a novel scheme for estimating Fourier transform magnitudes, is presented in this brief. In our method, the Fourier transform magnitude is estimated for a target patch including missing areas, and the missing intensities are estimated by retrieving its phase based on the ER algorithm. Specifically, by monitoring errors converged in the ER algorithm, known patches whose Fourier transform magnitudes are similar to that of the target patch are selected from the target image. Then, the Fourier transform magnitude of the target patch is estimated from those of the selected known patches and their corresponding errors. Consequently, by using the ER algorithm, we can estimate both the Fourier transform magnitude and phase to reconstruct the missing areas.
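
    The ER iteration itself is a classic alternating-projection loop. A toy sketch (assuming the Fourier magnitude of the patch has already been estimated, and ignoring the patch-selection step described above) alternates between enforcing the magnitude in the Fourier domain and the known intensities in the image domain:

```python
import numpy as np

def error_reduction(magnitude, known, mask, n_iter=200):
    """Toy ER / phase-retrieval loop.

    magnitude: estimated Fourier magnitude of the full patch;
    known: array of observed intensities; mask: True where missing.
    Each iteration enforces the magnitude in Fourier space and the
    known pixels in image space.
    """
    fill = known[~mask].mean() if (~mask).any() else 0.0
    x = np.where(mask, fill, known)               # initialize missing pixels
    for _ in range(n_iter):
        F = np.fft.fft2(x)
        F = magnitude * np.exp(1j * np.angle(F))  # enforce magnitude
        x = np.real(np.fft.ifft2(F))
        x[~mask] = known[~mask]                   # enforce known pixels
    return x
```

    The novelty of the method above lies in how `magnitude` is estimated from similar known patches; the loop itself is the standard ER machinery.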

  3. Integrative missing value estimation for microarray data.

    PubMed

    Hu, Jianjun; Li, Haifeng; Waterman, Michael S; Zhou, Xianghong Jasmine

    2006-10-12

    Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples. We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Square (LLS) imputation algorithm, by up to 15% in our benchmark tests. We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS and are especially good for imputing microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.

  4. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. For various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
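
    Once predictor genes have been selected, the linear estimation rule reduces to a least-squares fit. A minimal sketch using the QR decomposition mentioned in the abstract (the Bayesian gene-selection step is assumed to have already produced the predictor matrix `X`; this is an illustration, not the authors' code):

```python
import numpy as np

def regress_impute(X, y_obs, obs_idx, mis_idx):
    """Estimate missing target-gene values by linear regression on
    selected predictor genes, solving least squares via QR.

    X: conditions x predictor-genes matrix; y_obs: target values at
    the observed conditions obs_idx; mis_idx: conditions to impute.
    """
    A = np.column_stack([np.ones(len(obs_idx)), X[obs_idx]])
    Q, R = np.linalg.qr(A)                  # A = QR, R upper triangular
    beta = np.linalg.solve(R, Q.T @ y_obs)  # least-squares coefficients
    return np.column_stack([np.ones(len(mis_idx)), X[mis_idx]]) @ beta
```
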

  5. Estimating Missing Unit Process Data in Life Cycle Assessment Using a Similarity-Based Approach.

    PubMed

    Hou, Ping; Cai, Jiarui; Qu, Shen; Xu, Ming

    2018-05-01

    In life cycle assessment (LCA), collecting unit process data from empirical sources (i.e., meter readings, operation logs/journals) is often costly and time-consuming. We propose a new computational approach to estimate missing unit process data relying solely on limited known data, based on a similarity-based link prediction method. The intuition is that similar processes in a unit process network tend to have similar material/energy inputs and waste/emission outputs. We use the ecoinvent 3.1 unit process data sets to test our method in four steps: (1) dividing the data sets into a training set and a test set; (2) randomly removing certain numbers of data points in the test set, marked as missing; (3) using similarity-weighted means of various numbers of the most similar processes in the training set to estimate the missing data in the test set; and (4) comparing the estimated data with the original values to determine the performance of the estimation. The results show that missing data can be accurately estimated when less than 5% of the data are missing in one process. The estimation performance decreases as the percentage of missing data increases. This study provides a new approach to compiling unit process data and demonstrates the promising potential of computational approaches for LCA data compilation.
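
    The similarity-weighted estimate at the heart of this approach can be sketched directly. Cosine similarity on the observed columns is an assumption made here for illustration; the paper defines similarity via its link-prediction method on the unit-process network.

```python
import numpy as np

def similarity_impute(train, target, miss_col, k=3):
    """Estimate the missing entry of `target` as the similarity-weighted
    mean of that entry in the k most similar training processes.

    train: processes x flows matrix of known unit processes;
    target: flow vector with a missing value at column miss_col.
    """
    obs = [j for j in range(len(target)) if j != miss_col]
    t = target[obs]
    sims = np.array([
        float(row[obs] @ t /
              (np.linalg.norm(row[obs]) * np.linalg.norm(t) + 1e-12))
        for row in train])                     # cosine similarity
    top = np.argsort(sims)[-k:]                # k most similar processes
    w = sims[top]
    return float(w @ train[top, miss_col] / w.sum())
```
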

  6. More than one kind of inference: re-examining what's learned in feature inference and classification.

    PubMed

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that affect attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training, categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition, inferences were made either about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  7. Can statistical linkage of missing variables reduce bias in treatment effect estimates in comparative effectiveness research studies?

    PubMed

    Crown, William; Chang, Jessica; Olson, Melvin; Kahler, Kristijan; Swindle, Jason; Buzinec, Paul; Shah, Nilay; Borah, Bijan

    2015-09-01

    Missing data, particularly missing variables, can create serious analytic challenges in observational comparative effectiveness research studies. Statistical linkage of datasets is a potential method for incorporating missing variables. Prior studies have focused upon the bias introduced by imperfect linkage. This analysis uses a case study of hepatitis C patients to estimate the net effect of statistical linkage on bias, also accounting for the potential reduction in missing variable bias. The results show that statistical linkage can reduce bias while also enabling parameter estimates to be obtained for the formerly missing variables. The usefulness of statistical linkage will vary depending upon the strength of the correlations of the missing variables with the treatment variable, as well as the outcome variable of interest.

  8. 40 CFR 98.416 - Data reporting requirements.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    .... (16) Where missing data have been estimated pursuant to § 98.415, the reason the data were missing, the length of time the data were missing, the method used to estimate the missing data, and the... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Data reporting requirements. 98.416...

  9. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  10. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  11. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  12. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  13. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  14. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  15. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  16. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  17. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    PubMed

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and more robust estimation of missing values compared with the other methods, for both types of data and at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. The CMVE software is available upon request from the authors.
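The normalized root mean square (NRMS) error used above to compare the imputation methods can be sketched as follows. This is an illustrative helper, not the authors' code; normalizing the root-mean-square deviation by the standard deviation of the true values is one common convention, and the paper may normalize differently.

```python
import math

def nrms_error(actual, imputed):
    # Root-mean-square deviation between true and imputed values,
    # normalized by the standard deviation of the true values
    # (one common convention; other normalizers exist).
    n = len(actual)
    mse = sum((a, p) == (a, p) and (a - p) ** 2 for a, p in zip(actual, imputed)) / n
    mean = sum(actual) / n
    var = sum((a - mean) ** 2 for a in actual) / n
    return math.sqrt(mse) / math.sqrt(var)

print(nrms_error([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # perfect imputation → 0.0
```

A lower NRMS indicates imputed values closer to the withheld true values, which is how methods such as CMVE, BPCA, LSImpute and KNN are ranked across missing-value probabilities.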

  18. Built-Up Area Feature Extraction: Second Year Technical Progress Report

    DTIC Science & Technology

    1990-02-01

    Contract DACA 72-87-C-001. During this year we have built on previous research, in road network extraction and in the detection and delineation of buildings...methods to perform stereo analysis using loosely coupled techniques where comparison is deferred until each method has performed a complete estimate...or missing information. A course of action may be suggested to the user depending on the error. Although the checks do not guarantee the correctness

  19. Hitting the Goalpost: Calculating the Fine Line Between Winning and Losing a Penalty Shootout

    NASA Astrophysics Data System (ADS)

    Widenhorn, Ralf

    2016-10-01

    The Portland Timbers won their first Major League Soccer (MLS) Cup Championship in December 2015. However, if it had not been for a kind double goalpost miss during a penalty shootout a few weeks earlier, the Timbers would never have been in the finals. On Oct. 30, after what has been called "the greatest penalty kick shootout in MLS history," featuring a combined 22 penalties that included penalties by both goalkeepers, the Timbers won their first-round playoff against Sporting Kansas City. During the thrilling shootout, which can be watched, for example, on the MLS website, Sporting had two potentially game-winning penalties miss by the smallest of margins. One penalty bounced off the goalpost back into the field and another was an improbable double post miss. For a physicist, this prompts an interesting research question: can we estimate by what distance the double post penalty shown in Fig. 1 failed to be the game-winning shot?

  20. Partially linear mixed-effects joint models for skewed and missing longitudinal competing risks outcomes.

    PubMed

    Lu, Tao; Lu, Minggen; Wang, Min; Zhang, Jun; Dong, Guang-Hui; Xu, Yong

    2017-12-18

    Longitudinal competing risks data frequently arise in clinical studies. Skewness and missingness are commonly observed for these data in practice. However, most joint models do not account for these data features. In this article, we propose partially linear mixed-effects joint models to analyze skewed longitudinal competing risks data with missingness. In particular, to account for skewness, we replace the commonly assumed symmetric distributions with asymmetric distributions for the model errors. To deal with missingness, we employ an informative missing data model. We develop joint models that couple a partially linear mixed-effects model for the longitudinal process, a cause-specific proportional hazards model for the competing risks process, and a missing data process. To estimate the parameters in the joint models, we propose a fully Bayesian approach based on the joint likelihood. To illustrate the proposed model and method, we apply them to an AIDS clinical study. Some interesting findings are reported. We also conduct simulation studies to validate the proposed method.

  1. Taking the Missing Propensity into Account When Estimating Competence Scores: Evaluation of Item Response Theory Models for Nonignorable Omissions

    ERIC Educational Resources Information Center

    Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H.

    2015-01-01

    When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically…

  2. 40 CFR 98.335 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. For the carbon input procedure in § 98.333(b), a complete record of all measured parameters... average carbon contents of inputs according to the procedures in § 98.335(b) if data are missing. (b) For...

  3. 40 CFR 98.335 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. For the carbon input procedure in § 98.333(b), a complete record of all measured parameters... average carbon contents of inputs according to the procedures in § 98.335(b) if data are missing. (b) For...

  4. 40 CFR 98.335 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. For the carbon input procedure in § 98.333(b), a complete record of all measured parameters... average carbon contents of inputs according to the procedures in § 98.335(b) if data are missing. (b) For...

  5. 40 CFR 98.335 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. For the carbon input procedure in § 98.333(b), a complete record of all measured parameters... average carbon contents of inputs according to the procedures in § 98.335(b) if data are missing. (b) For...

  6. 40 CFR 98.335 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... missing data. For the carbon input procedure in § 98.333(b), a complete record of all measured parameters... average carbon contents of inputs according to the procedures in § 98.335(b) if data are missing. (b) For...

  7. Missing Data in Alcohol Clinical Trials with Binary Outcomes

    PubMed Central

    Hallgren, Kevin A.; Witkiewitz, Katie; Kranzler, Henry R.; Falk, Daniel E.; Litten, Raye Z.; O’Malley, Stephanie S.; Anton, Raymond F.

    2017-01-01

    Background Missing data are common in alcohol clinical trials for both continuous and binary endpoints. Approaches to handle missing data have been explored for continuous outcomes, yet no studies have compared missing data approaches for binary outcomes (e.g., abstinence, no heavy drinking days). The present study compares approaches to modeling binary outcomes with missing data in the COMBINE study. Method We included participants in the COMBINE Study who had complete drinking data during treatment and who were assigned to active medication or placebo conditions (N=1146). Using simulation methods, missing data were introduced under common scenarios with varying sample sizes and amounts of missing data. Logistic regression was used to estimate the effect of naltrexone (vs. placebo) in predicting any drinking and any heavy drinking outcomes at the end of treatment using four analytic approaches: complete case analysis (CCA), last observation carried forward (LOCF), the worst-case scenario of missing equals any drinking or heavy drinking (WCS), and multiple imputation (MI). In separate analyses, these approaches were compared when drinking data were manually deleted for those participants who discontinued treatment but continued to provide drinking data. Results WCS produced the greatest amount of bias in treatment effect estimates. MI usually yielded less biased estimates than WCS and CCA in the simulated data, and performed considerably better than LOCF when estimating treatment effects among individuals who discontinued treatment. Conclusions Missing data can introduce bias in treatment effect estimates in alcohol clinical trials. Researchers should utilize modern missing data methods, including MI, and avoid WCS and CCA when analyzing binary alcohol clinical trial outcomes. PMID:27254113

  8. Missing Data in Alcohol Clinical Trials with Binary Outcomes.

    PubMed

    Hallgren, Kevin A; Witkiewitz, Katie; Kranzler, Henry R; Falk, Daniel E; Litten, Raye Z; O'Malley, Stephanie S; Anton, Raymond F

    2016-07-01

    Missing data are common in alcohol clinical trials for both continuous and binary end points. Approaches to handle missing data have been explored for continuous outcomes, yet no studies have compared missing data approaches for binary outcomes (e.g., abstinence, no heavy drinking days). This study compares approaches to modeling binary outcomes with missing data in the COMBINE study. We included participants in the COMBINE study who had complete drinking data during treatment and who were assigned to active medication or placebo conditions (N = 1,146). Using simulation methods, missing data were introduced under common scenarios with varying sample sizes and amounts of missing data. Logistic regression was used to estimate the effect of naltrexone (vs. placebo) in predicting any drinking and any heavy drinking outcomes at the end of treatment using 4 analytic approaches: complete case analysis (CCA), last observation carried forward (LOCF), the worst case scenario (WCS) of missing equals any drinking or heavy drinking, and multiple imputation (MI). In separate analyses, these approaches were compared when drinking data were manually deleted for those participants who discontinued treatment but continued to provide drinking data. WCS produced the greatest amount of bias in treatment effect estimates. MI usually yielded less biased estimates than WCS and CCA in the simulated data and performed considerably better than LOCF when estimating treatment effects among individuals who discontinued treatment. Missing data can introduce bias in treatment effect estimates in alcohol clinical trials. Researchers should utilize modern missing data methods, including MI, and avoid WCS and CCA when analyzing binary alcohol clinical trial outcomes. Copyright © 2016 by the Research Society on Alcoholism.
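Two of the simpler approaches compared in the COMBINE analyses, last observation carried forward (LOCF) and the worst-case scenario (WCS), can be sketched for a binary drinking outcome as below. This is a minimal illustration of the two rules, not the study's analysis code; `None` is assumed here to mark a missing assessment and `1` the bad outcome (any drinking).

```python
def locf(values):
    # Last observation carried forward: fill None gaps with the most
    # recent observed value; leading Nones remain missing.
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def wcs(values):
    # Worst-case scenario for a binary drinking outcome: treat any
    # missing assessment as the bad outcome (1 = any drinking).
    return [1 if v is None else v for v in values]

print(locf([0, 1, None, None, 0, None]))  # → [0, 1, 1, 1, 0, 0]
print(wcs([0, None, 1]))                  # → [0, 1, 1]
```

WCS converts every missing assessment into a failure, which is why it produced the most biased treatment effect estimates in the study; multiple imputation instead draws plausible values from a model of the observed data.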

  9. Multiple imputation for handling missing outcome data when estimating the relative risk.

    PubMed

    Sullivan, Thomas R; Lee, Katherine J; Ryan, Philip; Salter, Amy B

    2017-09-06

    Multiple imputation is a popular approach to handling missing data in medical research, yet little is known about its applicability for estimating the relative risk. Standard methods for imputing incomplete binary outcomes involve logistic regression or an assumption of multivariate normality, whereas relative risks are typically estimated using log binomial models. It is unclear whether misspecification of the imputation model in this setting could lead to biased parameter estimates. Using simulated data, we evaluated the performance of multiple imputation for handling missing data prior to estimating adjusted relative risks from a correctly specified multivariable log binomial model. We considered an arbitrary pattern of missing data in both outcome and exposure variables, with missing data induced under missing at random mechanisms. Focusing on standard model-based methods of multiple imputation, missing data were imputed using multivariate normal imputation or fully conditional specification with a logistic imputation model for the outcome. Multivariate normal imputation performed poorly in the simulation study, consistently producing estimates of the relative risk that were biased towards the null. Despite outperforming multivariate normal imputation, fully conditional specification also produced somewhat biased estimates, with greater bias observed for higher outcome prevalences and larger relative risks. Deleting imputed outcomes from analysis datasets did not improve the performance of fully conditional specification. Both multivariate normal imputation and fully conditional specification produced biased estimates of the relative risk, presumably since both use a misspecified imputation model. Based on simulation results, we recommend researchers use fully conditional specification rather than multivariate normal imputation and retain imputed outcomes in the analysis when estimating relative risks. However, fully conditional specification is not without its shortcomings, and so further research is needed to identify optimal approaches for relative risk estimation within the multiple imputation framework.

  10. Stratified Mucin-Producing Intraepithelial Lesion of the Cervix: Subtle Features Not to Be Missed.

    PubMed

    Schwock, Joerg; Ko, Hyang Mi; Dubé, Valérie; Rouzbahman, Marjan; Cesari, Matthew; Ghorab, Zeina; Geddie, William R

    2016-01-01

    Stratified mucin-producing intraepithelial lesion (SMILE) is an uncommon premalignant lesion of the uterine cervix. A detailed examination of preinvasive SMILE cases including a comparison of the cytologic features with usual-type adenocarcinoma in situ (AIS) and human papillomavirus (HPV) genotyping was performed. Excisions and preceding Papanicolaou (Pap) tests were retrieved from the files of 2 tertiary care centers. Histologic review estimated the lesional SMILE proportion. Pap tests were reviewed and assessed for architectural, cellular and background features. Cobas® HPV test was performed. 13 cases were identified. Mean/median patient age was 35/33 years (range 23-51 years). Concurrent high-grade squamous intraepithelial lesion was found in 10/13 (77%) and AIS in 8/13 (62%) cases. In 6 cases, SMILE was dominant (≥50%) and represented in 5/6 corresponding Pap tests. Cytology interpretations differed more often in the SMILE-dominant group (p < 0.05). SMILE and AIS had overlapping features. Feathering and prominent nucleoli were absent in SMILE. HPV DNA was detected in all 12 cases tested. HPV 18 was most common (7/12). Excisions with positive/suspicious margins were reported in 5/6 SMILE-dominant versus 3/7 nondominant cases. SMILE is best considered as an AIS variant for cytologic, etiologic and management purposes. Cytologic features overlap with AIS, but are more subtle and easily missed. HPV testing may play a role in facilitating SMILE detection. © 2016 S. Karger AG, Basel.

  11. Using an EM Covariance Matrix to Estimate Structural Equation Models with Missing Data: Choosing an Adjusted Sample Size to Improve the Accuracy of Inferences

    ERIC Educational Resources Information Center

    Enders, Craig K.; Peugh, James L.

    2004-01-01

    Two methods, direct maximum likelihood (ML) and the expectation maximization (EM) algorithm, can be used to obtain ML parameter estimates for structural equation models with missing data (MD). Although the 2 methods frequently produce identical parameter estimates, it may be easier to satisfy missing at random assumptions using EM. However, no…

  12. Genetic Diversity Analysis of Highly Incomplete SNP Genotype Data with Imputations: An Empirical Assessment

    PubMed Central

    Fu, Yong-Bi

    2014-01-01

    Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns remain about the uniquely large proportion of missing observations in GBS genotype data. Although genotype imputation methods have been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data with up to 90% of observations missing. Here we performed an empirical assessment of accuracy in genetic diversity analysis of highly incomplete single nucleotide polymorphism genotypes with imputations. Three large single-nucleotide polymorphism genotype data sets for corn, wheat, and rice were acquired, and missing data with up to 90% of missing observations were randomly generated and then imputed for missing genotypes with three map-independent imputation methods. Estimating heterozygosity and inbreeding coefficient from original, missing, and imputed data revealed variable patterns of bias from the assessed levels of missingness and genotype imputation, but the estimation biases were smaller for missing data without genotype imputation. The estimates of genetic differentiation were rather robust up to 90% of missing observations but became substantially biased when missing genotypes were imputed. The estimates of topology accuracy for four representative samples of interested groups generally were reduced with increased levels of missing genotypes. Probabilistic principal component analysis based imputation performed better in terms of topology accuracy than analyses of missing data without genotype imputation. These findings are significant not only for understanding the reliability of genetic diversity analysis in the presence of large missing data and genotype imputation, but also for performing a proper genetic diversity analysis of highly incomplete GBS or other genotype data. PMID:24626289
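One of the diversity statistics estimated above, observed heterozygosity, simply measures the fraction of heterozygous calls among non-missing genotypes. A minimal sketch follows; the two-letter genotype encoding and `None` marker for a missing call are illustrative assumptions, not the study's data format.

```python
def observed_heterozygosity(genotypes):
    # Fraction of heterozygous calls among non-missing genotypes.
    # Genotypes are encoded here as two-allele strings such as "AG";
    # None marks a missing call (encoding is illustrative).
    called = [g for g in genotypes if g is not None]
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)

print(observed_heterozygosity(["AA", "AG", "GG", "AG", None]))  # → 0.5
```

Because missing calls are simply dropped from the denominator, the estimate is unbiased only if missingness is unrelated to genotype, which is exactly the assumption the empirical assessment above puts to the test.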

  13. 28 CFR 19.4 - Cost and percentage estimates.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... RECOVERY OF MISSING CHILDREN § 19.4 Cost and percentage estimates. It is estimated that this program will... administrative costs. It is DOJ's objective that 50 percent of DOJ penalty mail contain missing children...

  14. The Fifth Cell: Correlation Bias in U.S. Census Adjustment.

    ERIC Educational Resources Information Center

    Wachter, Kenneth W.; Freedman, David A.

    2000-01-01

    Presents a method for estimating the total national number of doubly missing people (missing from Census counts and adjusted counts as well) and their distribution by race and sex. Application to the 1990 U.S. Census yields an estimate of three million doubly-missing people. (SLD)

  15. The Impact of Missing Background Data on Subpopulation Estimation

    ERIC Educational Resources Information Center

    Rutkowski, Leslie

    2011-01-01

    Although population modeling methods are well established, a paucity of literature appears to exist regarding the effect of missing background data on subpopulation achievement estimates. Using simulated data that follows typical large-scale assessment designs with known parameters and a number of missing conditions, this paper examines the extent…

  16. Estimation of covariate-specific time-dependent ROC curves in the presence of missing biomarkers.

    PubMed

    Li, Shanshan; Ning, Yang

    2015-09-01

    Covariate-specific time-dependent ROC curves are often used to evaluate the diagnostic accuracy of a biomarker with time-to-event outcomes, when certain covariates have an impact on the test accuracy. In many medical studies, measurements of biomarkers are subject to missingness due to high cost or limitation of technology. This article considers estimation of covariate-specific time-dependent ROC curves in the presence of missing biomarkers. To incorporate the covariate effect, we assume a proportional hazards model for the failure time given the biomarker and the covariates, and a semiparametric location model for the biomarker given the covariates. In the presence of missing biomarkers, we propose a simple weighted estimator for the ROC curves where the weights are inversely proportional to the selection probability. We also propose an augmented weighted estimator which utilizes information from the subjects with missing biomarkers. The augmented weighted estimator enjoys the double-robustness property in the sense that the estimator remains consistent if either the missing data process or the conditional distribution of the missing data given the observed data is correctly specified. We derive the large sample properties of the proposed estimators and evaluate their finite sample performance using numerical studies. The proposed approaches are illustrated using the US Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. © 2015, The International Biometric Society.
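The core idea of the simple weighted estimator above, weighting each observed subject inversely to its selection probability, can be sketched in its most basic Horvitz-Thompson form for a mean. This is a sketch of the weighting principle only; the paper's estimator targets ROC curves and adds an augmentation term for double robustness.

```python
def ipw_mean(values, probs):
    # Inverse-probability-weighted (Horvitz-Thompson) mean: each
    # observed subject is up-weighted by 1/p, where p is the
    # probability its biomarker was measured; values is None
    # where the biomarker is missing.
    total = sum(v / p for v, p in zip(values, probs) if v is not None)
    return total / len(values)

# Fully observed data with p = 1 reduces to the ordinary mean:
print(ipw_mean([2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0]))  # → 5.0
```

Up-weighting subjects who were unlikely to be measured compensates for their under-representation, so the estimator remains consistent when the selection model is correct; the augmented version stays consistent even if instead the outcome model is the correctly specified one.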

  17. Ship heading and velocity analysis by wake detection in SAR images

    NASA Astrophysics Data System (ADS)

    Graziano, Maria Daniela; D'Errico, Marco; Rufino, Giancarlo

    2016-11-01

    With the aim of ship-route estimation, a wake detection method is developed and applied to COSMO/SkyMed and TerraSAR-X Stripmap SAR images over the Gulf of Naples, Italy. In order to mitigate the intrinsic limitations of the threshold logic, the algorithm identifies the wake features according to the hydrodynamic theory. A post-detection validation phase is performed to classify the features as real wake structures by means of merit indexes defined in the intensity domain. After wake reconstruction, ship heading is evaluated on the basis of turbulent wake direction and ship velocity is estimated by both techniques of azimuth shift and Kelvin pattern wavelength. The method is tested over 34 ship wakes identified by visual inspection in both HH and VV images at different incidence angles. For all wakes, no missed detections are reported and at least the turbulent and one narrow-V wakes are correctly identified, with ship heading successfully estimated. Also, the azimuth shift method is applied to estimate velocity for the 10 ships having route with sufficient angular separation from the satellite ground track. In one case ship velocity is successfully estimated with both methods, showing agreement within 14%.

  18. Research Note: The consequences of different methods for handling missing network data in Stochastic Actor Based Models

    PubMed Central

    Hipp, John R.; Wang, Cheng; Butts, Carter T.; Jose, Rupa; Lakon, Cynthia M.

    2015-01-01

    Although stochastic actor based models (e.g., as implemented in the SIENA software program) are growing in popularity as a technique for estimating longitudinal network data, a relatively understudied issue is the consequence of missing network data for longitudinal analysis. We explore this issue in our research note by utilizing data from four schools in an existing dataset (the AddHealth dataset) over three time points, assessing the substantive consequences of using four different strategies for addressing missing network data. The results indicate that whereas some measures in such models are estimated relatively robustly regardless of the strategy chosen for addressing missing network data, some of the substantive conclusions will differ based on the missing data strategy chosen. These results have important implications for this burgeoning applied research area, implying that researchers should more carefully consider how they address missing data when estimating such models. PMID:25745276

  19. Research Note: The consequences of different methods for handling missing network data in Stochastic Actor Based Models.

    PubMed

    Hipp, John R; Wang, Cheng; Butts, Carter T; Jose, Rupa; Lakon, Cynthia M

    2015-05-01

    Although stochastic actor based models (e.g., as implemented in the SIENA software program) are growing in popularity as a technique for estimating longitudinal network data, a relatively understudied issue is the consequence of missing network data for longitudinal analysis. We explore this issue in our research note by utilizing data from four schools in an existing dataset (the AddHealth dataset) over three time points, assessing the substantive consequences of using four different strategies for addressing missing network data. The results indicate that whereas some measures in such models are estimated relatively robustly regardless of the strategy chosen for addressing missing network data, some of the substantive conclusions will differ based on the missing data strategy chosen. These results have important implications for this burgeoning applied research area, implying that researchers should more carefully consider how they address missing data when estimating such models.

  20. How to improve breeding value prediction for feed conversion ratio in the case of incomplete longitudinal body weights.

    PubMed

    Tran, V H Huynh; Gilbert, H; David, I

    2017-01-01

    With the development of automatic self-feeders, repeated measurements of feed intake are becoming easier in an increasing number of species. However, the corresponding BW are not always recorded, and these missing values complicate the longitudinal analysis of the feed conversion ratio (FCR). Our aim was to evaluate the impact of missing BW data on estimations of the genetic parameters of FCR and ways to improve the estimations. On the basis of the missing BW profile in French Large White pigs (male pigs weighed weekly, females and castrated males weighed monthly), we compared 2 different ways of predicting missing BW, 1 using a Gompertz model and 1 using a linear interpolation. For the first part of the study, we used 17,398 weekly records of BW and feed intake recorded over 16 consecutive weeks in 1,222 growing male pigs. We performed a simulation study on this data set to mimic missing BW values according to the pattern of weekly proportions of incomplete BW data in females and castrated males. The FCR was then computed for each week using observed data (obser_FCR), data with missing BW (miss_FCR), data with BW predicted using a Gompertz model (Gomp_FCR), and data with BW predicted by linear interpolation (interp_FCR). Heritability (h) was estimated, and the EBV was predicted for each repeated FCR using a random regression model. In the second part of the study, the full data set (males with their complete BW records, castrated males and females with missing BW) was analyzed using the same methods (miss_FCR, Gomp_FCR, and interp_FCR). Results of the simulation study showed that h were overestimated in the case of missing BW and that predicting BW using a linear interpolation provided a more accurate estimation of h and of EBV than a Gompertz model. Over 100 simulations, the correlation between obser_EBV and interp_EBV, Gomp_EBV, and miss_EBV was 0.93 ± 0.02, 0.91 ± 0.01, and 0.79 ± 0.04, respectively. The heritabilities obtained with the full data set were quite similar for miss_FCR, Gomp_FCR, and interp_FCR. In conclusion, when the proportion of missing BW is high, genetic parameters of FCR are not well estimated. In French Large White pigs, in the growing period extending from d 65 to 168, prediction of missing BW using a Gompertz growth model slightly improved the estimations, but the linear interpolation improved the estimation to a greater extent. This result is due to the linear rather than sigmoidal increase in BW over the study period.
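The linear interpolation that performed best above simply fills each interior gap from the nearest observed weighings. A minimal sketch, assuming gaps are bounded by observed values on both sides (not the authors' code):

```python
def interpolate_missing(values):
    # Fill interior None gaps by linear interpolation between the
    # nearest observed neighbours; endpoints must be observed here.
    out = list(values)
    known = [j for j, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for j in range(a + 1, b):
            t = (j - a) / (b - a)
            out[j] = values[a] + t * (values[b] - values[a])
    return out

# Monthly weighings with two missing weekly BW records in between:
print(interpolate_missing([30.0, None, None, 36.0]))  # → [30.0, 32.0, 34.0, 36.0]
```

Over the d 65 to 168 growing period the BW trajectory is close to linear, which is consistent with the finding that straight-line interpolation recovered the genetic parameters better than the sigmoidal Gompertz curve.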

  1. A regressive methodology for estimating missing data in rainfall daily time series

    NASA Astrophysics Data System (ADS)

    Barca, E.; Passarella, G.

    2009-04-01

    The "presence" of gaps in environmental data time series represents a very common, but extremely critical problem, since it can produce biased results (Rubin, 1976). Missing data plagues almost all surveys. The problem is how to deal with missing data once it has been deemed impossible to recover the actual missing values. Apart from the amount of missing data, another issue which plays an important role in the choice of any recovery approach is the evaluation of "missingness" mechanisms. When data missing is conditioned by some other variable observed in the data set (Schafer, 1997) the mechanism is called MAR (Missing at Random). Otherwise, when the missingness mechanism depends on the actual value of the missing data, it is called NCAR (Not Missing at Random). This last is the most difficult condition to model. In the last decade interest arose in the estimation of missing data by using regression (single imputation). More recently multiple imputation has become also available, which returns a distribution of estimated values (Scheffer, 2002). In this paper an automatic methodology for estimating missing data is presented. In practice, given a gauging station affected by missing data (target station), the methodology checks the randomness of the missing data and classifies the "similarity" between the target station and the other gauging stations spread over the study area. Among different methods useful for defining the similarity degree, whose effectiveness strongly depends on the data distribution, the Spearman correlation coefficient was chosen. Once defined the similarity matrix, a suitable, nonparametric, univariate, and regressive method was applied in order to estimate missing data in the target station: the Theil method (Theil, 1950). Even though the methodology revealed to be rather reliable an improvement of the missing data estimation can be achieved by a generalization. 
A first possible improvement consists in extending the univariate technique to a multivariate approach. Another follows the paradigm of "multiple imputation" (Rubin, 1987; Rubin, 1988), which uses a set of similar stations instead of only the most similar one. In this way, a sort of estimation range can be determined, allowing the introduction of uncertainty. Finally, time series can be grouped on the basis of monthly rainfall rates into classes of wetness (i.e., dry, moderately rainy, and rainy), so that the estimation uses homogeneous data subsets. We expect that integrating the methodology with these enhancements will improve its reliability. The methodology was applied to the daily rainfall time series registered in the Candelaro River Basin (Apulia, Southern Italy) from 1970 to 2001. REFERENCES: D.B. Rubin, 1976. Inference and Missing Data. Biometrika 63, 581-592. D.B. Rubin, 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons, Inc. D.B. Rubin, 1988. An overview of multiple imputation. In Survey Research Section, pp. 79-84, American Statistical Association. J.L. Schafer, 1997. Analysis of Incomplete Multivariate Data. Chapman & Hall. J. Scheffer, 2002. Dealing with Missing Data. Res. Lett. Inf. Math. Sci. 3, 153-160. Available online at http://www.massey.ac.nz/~wwiims/research/letters/ H. Theil, 1950. A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae, 12, pp. 85-91.
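
    The core of the pipeline above — ranking candidate stations by Spearman correlation with the target, then regressing the target on the most similar station with the Theil (median-of-pairwise-slopes) estimator — can be sketched as follows. This is a minimal illustration, not the authors' code; the synthetic station data and function names are ours.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (no tie handling needed for this sketch)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

def theil_fit(x, y):
    """Theil (1950) estimator: slope = median of all pairwise slopes."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    slope = np.median((y[j] - y[i])[keep] / dx[keep])
    intercept = np.median(y - slope * x)
    return slope, intercept

def impute_target(target, donors):
    """Fill NaNs in `target` using the donor station most similar to it."""
    miss = np.isnan(target)
    obs = ~miss
    # rank similarity over days observed at both the target and the donor
    best = max(donors, key=lambda d: spearman(target[obs], d[obs]))
    slope, intercept = theil_fit(best[obs], target[obs])
    out = target.copy()
    out[miss] = slope * best[miss] + intercept
    return out
```

    With real daily rainfall series, the similarity computation would be restricted to days observed at both stations, as `impute_target` does through the `obs` mask.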

  2. An anti-disturbing real time pose estimation method and system

    NASA Astrophysics Data System (ADS)

    Zhou, Jian; Zhang, Xiao-hu

    2011-08-01

    Pose estimation relating two-dimensional (2D) images to a three-dimensional (3D) rigid object needs some known features to track. In practice, many algorithms perform this task with high accuracy, but all of them suffer when features are lost. This paper investigates pose estimation when some, or even all, of the known features are invisible. First, known features are tracked to calculate the pose in the current and the next image. Second, unknown but good features to track are automatically detected in both images. Third, those unknown features that lie on the rigid object and can be matched between the two images are retained. Because of the motion characteristics of the rigid object, the 3D information of those unknown features can be solved from the object's pose at the two moments and the features' 2D locations in the two images, except in two cases: first, when the camera and the object have no relative motion and camera parameters such as focal length and principal point do not change between the two moments; second, when there is no shared scene or no matched feature between the two images. Finally, because the formerly unknown features are now known, pose estimation can continue in the following images despite the loss of the initially known features, by repeating the process above. The robustness of pose estimation with different feature detection algorithms, such as Kanade-Lucas-Tomasi (KLT) features, the Scale Invariant Feature Transform (SIFT), and Speeded Up Robust Features (SURF), is compared, and the impact of different relative motions between the camera and the rigid object is discussed. Graphics Processing Unit (GPU) parallel computing is used to extract and match hundreds of features for real-time pose estimation, which is hard to achieve on a Central Processing Unit (CPU).
Compared with other pose estimation methods, this method can estimate the pose between camera and object even when part or all of the known features are lost, and has a quick response time thanks to GPU parallel computing. It can be used widely in vision-guided techniques to strengthen their intelligence and generalization, and can also play an important role in autonomous navigation and positioning and in robotics in unknown environments. Simulation and experimental results demonstrate that the proposed method suppresses noise effectively, extracts features robustly, and meets real-time requirements. Theoretical analysis and experiments show the method is reasonable and efficient.
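
    Recovering the 3D coordinates of newly detected features from the object's pose at two moments is classical two-view triangulation. A minimal direct-linear-transform (DLT) sketch, assuming known 3x4 projection matrices for the two moments (the matrix and variable names are illustrative, not from the paper):

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices (intrinsics @ [R | t]) at the two moments.
    uv1, uv2: pixel coordinates of the matched feature in each image."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)     # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]             # dehomogenize
```

    The two failure cases in the abstract surface here directly: with no relative motion and unchanged intrinsics the two projection matrices coincide and the linear system loses rank, and without a matched feature there is no (uv1, uv2) pair to triangulate.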

  3. 40 CFR 98.96 - Data reporting requirements.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ..., Equation I-16 of this subpart, for each fluorinated heat transfer fluid used. (s) Where missing data... § 98.95(b), the number of times missing data procedures were followed in the reporting year, the method used to estimate the missing data, and the estimates of those data. (t) A brief description of each...

  4. 40 CFR 98.96 - Data reporting requirements.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ..., Equation I-16 of this subpart, for each fluorinated heat transfer fluid used. (s) Where missing data... § 98.95(b), the number of times missing data procedures were followed in the reporting year, the method used to estimate the missing data, and the estimates of those data. (t) A brief description of each...

  5. How to deal with missing longitudinal data in cost of illness analysis in Alzheimer's disease-suggestions from the GERAS observational study.

    PubMed

    Belger, Mark; Haro, Josep Maria; Reed, Catherine; Happich, Michael; Kahle-Wrobleski, Kristin; Argimon, Josep Maria; Bruno, Giuseppe; Dodel, Richard; Jones, Roy W; Vellas, Bruno; Wimo, Anders

    2016-07-18

    Missing data are a common problem in prospective studies with a long follow-up, and the volume, pattern and reasons for missing data may be relevant when estimating the cost of illness. We aimed to evaluate the effects of different methods for dealing with missing longitudinal cost data and for costing caregiver time on total societal costs in Alzheimer's disease (AD). GERAS is an 18-month observational study of costs associated with AD. Total societal costs included patient health and social care costs, and caregiver health and informal care costs. Missing data were classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Simulation datasets were generated from baseline data with 10-40 % missing total cost data for each missing data mechanism. Datasets were also simulated to reflect the missing cost data pattern at 18 months using MAR and MNAR assumptions. Naïve and multiple imputation (MI) methods were applied to each dataset and results compared with complete GERAS 18-month cost data. Opportunity and replacement cost approaches were used for caregiver time, which was costed with and without supervision included and with time for working caregivers only being costed. Total costs were available for 99.4 % of 1497 patients at baseline. For MCAR datasets, naïve methods performed as well as MI methods. For MAR, MI methods performed better than naïve methods. All imputation approaches were poor for MNAR data. For all approaches, percentage bias increased with missing data volume. For datasets reflecting 18-month patterns, a combination of imputation methods provided more accurate cost estimates (e.g. bias: -1 % vs -6 % for single MI method), although different approaches to costing caregiver time had a greater impact on estimated costs (29-43 % increase over base case estimate). 
Methods used to impute missing cost data in AD will affect the accuracy of cost estimates, although varying approaches to costing informal caregiver time have the greatest impact on total costs. Tailoring imputation methods to the reason for missing data will further our understanding of the best analytical approach for studies involving cost outcomes.
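
    The simulation design described above — datasets with a chosen fraction of costs deleted under each missingness mechanism — can be sketched generically. The generator below is a simplified stand-in, not the GERAS analysis code; `severity` stands for any always-observed covariate driving MAR missingness.

```python
import numpy as np

def make_missing(cost, severity, frac, mechanism, rng):
    """Return a copy of `cost` with roughly `frac` of values set to NaN
    under the requested missingness mechanism (a simplified sketch)."""
    n = len(cost)
    if mechanism == "MCAR":                 # independent of everything
        p = np.full(n, frac)
    elif mechanism == "MAR":                # driven by the observed covariate
        s = (severity - severity.mean()) / severity.std()
        p = frac * np.exp(s) / np.exp(s).mean()
    elif mechanism == "MNAR":               # driven by the cost value itself
        c = (cost - cost.mean()) / cost.std()
        p = frac * np.exp(c) / np.exp(c).mean()
    else:
        raise ValueError(mechanism)
    out = cost.astype(float).copy()
    out[rng.random(n) < np.clip(p, 0.0, 1.0)] = np.nan
    return out
```

    A quick check of why the mechanism matters: the complete-case mean stays unbiased under MCAR but is pulled down under MNAR when high costs go missing more often, consistent with the poor performance of all imputation approaches on the MNAR datasets reported above.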

  6. Predicting missing biomarker data in a longitudinal study of Alzheimer disease.

    PubMed

    Lo, Raymond Y; Jagust, William J

    2012-05-01

    To investigate predictors of missing data in a longitudinal study of Alzheimer disease (AD). The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a clinic-based, multicenter, longitudinal study with blood, CSF, PET, and MRI scans repeatedly measured in 229 participants with normal cognition (NC), 397 with mild cognitive impairment (MCI), and 193 with mild AD during 2005-2007. We used univariate and multivariable logistic regression models to examine the associations between baseline demographic/clinical features and loss of biomarker follow-ups in ADNI. CSF studies tended to recruit and retain patients with MCI with more AD-like features, including lower levels of baseline CSF Aβ(42). Depression was the major predictor for MCI dropouts, while family history of AD kept more patients with AD enrolled in PET and MRI studies. Poor cognitive performance was associated with loss of follow-up in most biomarker studies, even among NC participants. The presence of vascular risk factors seemed more critical than cognitive function for predicting dropouts in AD. The missing data are not missing completely at random in ADNI and likely conditional on certain features in addition to cognitive function. Missing data predictors vary across biomarkers and even MCI and AD groups do not share the same missing data pattern. Understanding the missing data structure may help in the design of future longitudinal studies and clinical trials in AD.

  7. Predicting missing biomarker data in a longitudinal study of Alzheimer disease

    PubMed Central

    Jagust, William J.; Aisen, Paul; Jack, Clifford R.; Toga, Arthur W.; Beckett, Laurel; Gamst, Anthony; Soares, Holly; C. Green, Robert; Montine, Tom; Thomas, Ronald G.; Donohue, Michael; Walter, Sarah; Dale, Anders; Bernstein, Matthew; Felmlee, Joel; Fox, Nick; Thompson, Paul; Schuff, Norbert; Alexander, Gene; DeCarli, Charles; Bandy, Dan; Chen, Kewei; Morris, John; Lee, Virginia M.-Y.; Korecka, Magdalena; Crawford, Karen; Neu, Scott; Harvey, Danielle; Kornak, John; Saykin, Andrew J.; Foroud, Tatiana M.; Potkin, Steven; Shen, Li; Buckholtz, Neil; Kaye, Jeffrey; Dolen, Sara; Quinn, Joseph; Schneider, Lon; Pawluczyk, Sonia; Spann, Bryan M.; Brewer, James; Vanderswag, Helen; Heidebrink, Judith L.; Lord, Joanne L.; Petersen, Ronald; Johnson, Kris; Doody, Rachelle S.; Villanueva-Meyer, Javier; Chowdhury, Munir; Stern, Yaakov; Honig, Lawrence S.; Bell, Karen L.; Morris, John C.; Mintun, Mark A.; Schneider, Stacy; Marson, Daniel; Griffith, Randall; Clark, David; Grossman, Hillel; Tang, Cheuk; Marzloff, George; Toledo-Morrell, Leylade; Shah, Raj C.; Duara, Ranjan; Varon, Daniel; Roberts, Peggy; Albert, Marilyn S.; Pedroso, Julia; Toroney, Jaimie; Rusinek, Henry; de Leon, Mony J; De Santi, Susan M; Doraiswamy, P. Murali; Petrella, Jeffrey R.; Aiello, Marilyn; Clark, Christopher M.; Pham, Cassie; Nunez, Jessica; Smith, Charles D.; Given, Curtis A.; Hardy, Peter; Lopez, Oscar L.; Oakley, MaryAnn; Simpson, Donna M.; Ismail, M. 
Saleem; Brand, Connie; Richard, Jennifer; Mulnard, Ruth A.; Thai, Gaby; Mc-Adams-Ortiz, Catherine; Diaz-Arrastia, Ramon; Martin-Cook, Kristen; DeVous, Michael; Levey, Allan I.; Lah, James J.; Cellar, Janet S.; Burns, Jeffrey M.; Anderson, Heather S.; Laubinger, Mary M.; Bartzokis, George; Silverman, Daniel H.S.; Lu, Po H.; Graff-Radford MBBCH, Neill R; Parfitt, Francine; Johnson, Heather; Farlow, Martin; Herring, Scott; Hake, Ann M.; van Dyck, Christopher H.; MacAvoy, Martha G.; Benincasa, Amanda L.; Chertkow, Howard; Bergman, Howard; Hosein, Chris; Black, Sandra; Graham, Simon; Caldwell, Curtis; Hsiung, Ging-Yuek Robin; Feldman, Howard; Assaly, Michele; Kertesz, Andrew; Rogers, John; Trost, Dick; Bernick, Charles; Munic, Donna; Wu, Chuang-Kuo; Johnson, Nancy; Mesulam, Marsel; Sadowsky, Carl; Martinez, Walter; Villena, Teresa; Turner, Scott; Johnson, Kathleen B.; Behan, Kelly E.; Sperling, Reisa A.; Rentz, Dorene M.; Johnson, Keith A.; Rosen, Allyson; Tinklenberg, Jared; Ashford, Wes; Sabbagh, Marwan; Connor, Donald; Jacobson, Sandra; Killiany, Ronald; Norbash, Alexander; Nair, Anil; Obisesan, Thomas O.; Jayam-Trouth, Annapurni; Wang, Paul; Lerner, Alan; Hudson, Leon; Ogrocki, Paula; DeCarli, Charles; Fletcher, Evan; Carmichael, Owen; Kittur, Smita; Mirje, Seema; Borrie, Michael; Lee, T-Y; Bartha, Dr Rob; Johnson, Sterling; Asthana, Sanjay; Carlsson, Cynthia M.; Potkin, Steven G.; Preda, Adrian; Nguyen, Dana; Tariot, Pierre; Fleisher, Adam; Reeder, Stephanie; Bates, Vernice; Capote, Horacio; Rainka, Michelle; Hendin, Barry A.; Scharre, Douglas W.; Kataki, Maria; Zimmerman, Earl A.; Celmins, Dzintra; Brown, Alice D.; Gandy, Sam; Marenberg, Marjorie E.; Rovner, Barry W.; Pearlson, Godfrey; Anderson, Karen; Saykin, Andrew J.; Santulli, Robert B.; Englert, Jessica; Williamson, Jeff D.; Sink, Kaycee M.; Watkins, Franklin; Ott, Brian R.; Wu, Chuang-Kuo; Cohen, Ronald; Salloway, Stephen; Malloy, Paul; Correia, Stephen; Rosen, Howard J.; Miller, Bruce L.; Mintzer, Jacobo

    2012-01-01

    Objective: To investigate predictors of missing data in a longitudinal study of Alzheimer disease (AD). Methods: The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a clinic-based, multicenter, longitudinal study with blood, CSF, PET, and MRI scans repeatedly measured in 229 participants with normal cognition (NC), 397 with mild cognitive impairment (MCI), and 193 with mild AD during 2005–2007. We used univariate and multivariable logistic regression models to examine the associations between baseline demographic/clinical features and loss of biomarker follow-ups in ADNI. Results: CSF studies tended to recruit and retain patients with MCI with more AD-like features, including lower levels of baseline CSF Aβ42. Depression was the major predictor for MCI dropouts, while family history of AD kept more patients with AD enrolled in PET and MRI studies. Poor cognitive performance was associated with loss of follow-up in most biomarker studies, even among NC participants. The presence of vascular risk factors seemed more critical than cognitive function for predicting dropouts in AD. Conclusion: The missing data are not missing completely at random in ADNI and likely conditional on certain features in addition to cognitive function. Missing data predictors vary across biomarkers and even MCI and AD groups do not share the same missing data pattern. Understanding the missing data structure may help in the design of future longitudinal studies and clinical trials in AD. PMID:22491869

  8. Learning-based subject-specific estimation of dynamic maps of cortical morphology at missing time points in longitudinal infant studies.

    PubMed

    Meng, Yu; Li, Gang; Gao, Yaozong; Lin, Weili; Shen, Dinggang

    2016-11-01

    Longitudinal neuroimaging analysis of the dynamic brain development in infants has received increasing attention recently. Many studies expect a complete longitudinal dataset in order to accurately chart the brain developmental trajectories. However, in practice, a large portion of subjects in longitudinal studies often have missing data at certain time points, due to various reasons such as the absence of scan or poor image quality. To make better use of these incomplete longitudinal data, in this paper, we propose a novel machine learning-based method to estimate the subject-specific, vertex-wise cortical morphological attributes at the missing time points in longitudinal infant studies. Specifically, we develop a customized regression forest, named dynamically assembled regression forest (DARF), as the core regression tool. DARF ensures the spatial smoothness of the estimated maps for vertex-wise cortical morphological attributes and also greatly reduces the computational cost. By employing a pairwise estimation followed by a joint refinement, our method is able to fully exploit the available information from both subjects with complete scans and subjects with missing scans for estimation of the missing cortical attribute maps. The proposed method has been applied to estimating the dynamic cortical thickness maps at missing time points in an incomplete longitudinal infant dataset, which includes 31 healthy infant subjects, each having up to five time points in the first postnatal year. The experimental results indicate that our proposed framework can accurately estimate the subject-specific vertex-wise cortical thickness maps at missing time points, with the average error less than 0.23 mm. Hum Brain Mapp 37:4129-4147, 2016. © 2016 Wiley Periodicals, Inc.

  9. Estimated Environmental Exposures for MISSE-3 and MISSE-4

    NASA Technical Reports Server (NTRS)

    Finckenor, Miria M.; Pippin, Gary; Kinard, William H.

    2008-01-01

    This paper describes the estimated environmental exposures for MISSE-3 and MISSE-4. These test beds, attached to the outside of the International Space Station, were planned for 3 years of exposure; this was changed to 1 year after MISSE-1 and -2 were in space for 4 years. MISSE-3 and -4 operate in a low Earth orbit space environment, which exposes them to a variety of assaults including atomic oxygen, ultraviolet radiation, particulate radiation, thermal cycling, and meteoroid/space debris impact, as well as contamination associated with proximity to an active space station. Measurements and determinations of atomic oxygen fluences, solar UV exposure levels, molecular contamination levels, and particulate radiation are included.

  10. Evaluation of methods to estimate missing days' supply within pharmacy data of the Clinical Practice Research Datalink (CPRD) and The Health Improvement Network (THIN).

    PubMed

    Lum, Kirsten J; Newcomb, Craig W; Roy, Jason A; Carbonari, Dena M; Saine, M Elle; Cardillo, Serena; Bhullar, Harshvinder; Gallagher, Arlene M; Lo Re, Vincent

    2017-01-01

    The extent to which days' supply data are missing in pharmacoepidemiologic databases and effective methods for estimation is unknown. We determined the percentage of missing days' supply on prescription and patient levels for oral anti-diabetic drugs (OADs) and evaluated three methods for estimating days' supply within the Clinical Practice Research Datalink (CPRD) and The Health Improvement Network (THIN). We estimated the percentage of OAD prescriptions and patients with missing days' supply in each database from 2009 to 2013. Within a random sample of prescriptions with known days' supply, we measured the accuracy of three methods to estimate missing days' supply by imputing the following: (1) 28 days' supply, (2) mode number of tablets/day by drug strength and number of tablets/prescription, and (3) number of tablets/day via a machine learning algorithm. We determined incidence rates (IRs) of acute myocardial infarction (AMI) using each method to evaluate the impact on ascertainment of exposure time and outcomes. Days' supply was missing for 24 % of OAD prescriptions in CPRD and 33 % in THIN (affecting 48 and 57 % of patients, respectively). Methods 2 and 3 were very accurate in estimating days' supply for OADs prescribed at a consistent number of tablets/day. Method 3 was more accurate for OADs prescribed at varying number of tablets/day. IRs of AMI were similar across methods for most OADs. Missing days' supply is a substantial problem in both databases. Method 2 is easy and very accurate for most OADs and results in IRs comparable to those from method 3.
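
    Method 2 above — imputing the modal number of tablets per day within strata defined by drug strength and tablets per prescription — can be sketched with plain dictionaries. The field names are illustrative, not CPRD/THIN column names; method 1's flat 28-day rule serves as the fallback for strata never observed with a known days' supply.

```python
from collections import Counter, defaultdict

def fit_mode_table(records):
    """Learn the modal tablets/day within each (strength, quantity) stratum
    from prescriptions whose days' supply is known (method 2 sketch)."""
    strata = defaultdict(Counter)
    for r in records:
        if r["days_supply"]:
            tablets_per_day = r["quantity"] / r["days_supply"]
            strata[(r["strength"], r["quantity"])][tablets_per_day] += 1
    return {k: c.most_common(1)[0][0] for k, c in strata.items()}

def impute_days_supply(record, mode_table, default_days=28):
    """Estimate a missing days' supply; fall back to method 1 (28 days)."""
    key = (record["strength"], record["quantity"])
    if key in mode_table:
        return record["quantity"] / mode_table[key]
    return default_days
```

    The stratification is what makes the method accurate for drugs dispensed at a consistent tablets/day; for regimens with varying tablets/day, the mode is a poor summary, which is where the machine-learning approach (method 3) did better.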

  11. Impact of Violation of the Missing-at-Random Assumption on Full-Information Maximum Likelihood Method in Multidimensional Adaptive Testing

    ERIC Educational Resources Information Center

    Han, Kyung T.; Guo, Fanmin

    2014-01-01

    The full-information maximum likelihood (FIML) method makes it possible to estimate and analyze structural equation models (SEM) even when data are partially missing, enabling incomplete data to contribute to model estimation. The cornerstone of FIML is the missing-at-random (MAR) assumption. In (unidimensional) computerized adaptive testing…

  12. Estimation of Item Response Theory Parameters in the Presence of Missing Data

    ERIC Educational Resources Information Center

    Finch, Holmes

    2008-01-01

    Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…

  13. Inverse-Probability-Weighted Estimation for Monotone and Nonmonotone Missing Data.

    PubMed

    Sun, BaoLuo; Perkins, Neil J; Cole, Stephen R; Harel, Ofer; Mitchell, Emily M; Schisterman, Enrique F; Tchetgen Tchetgen, Eric J

    2018-03-01

    Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568-575).
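
    For a single always-observed covariate and a scalar mean as the target of inference, the IPW complete-case recipe is short: model the probability of being a complete case, then weight each complete case by the inverse of that probability. A minimal sketch under MAR with a hand-rolled Newton-Raphson logistic fit (our simplification, not the authors' software):

```python
import numpy as np

def fit_logistic(X, r, iters=25):
    """Newton-Raphson logistic fit of the complete-case indicator r on X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ b))
        W = p * (1.0 - p)
        b += np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (r - p))
    return b, X1

def ipw_mean(y, X, r):
    """IPW complete-case estimate of E[y] when y is missing at random given X."""
    b, X1 = fit_logistic(X, r)
    p = 1.0 / (1.0 + np.exp(-X1 @ b))
    w = r / p                       # incomplete cases get zero weight
    return float(np.sum(w * np.nan_to_num(y)) / np.sum(w))
```

    On simulated data where high-covariate subjects are more often missing, the unweighted complete-case mean is visibly biased while the IPW estimate is not. With a nonmonotone pattern, the complete-case probability can no longer be modeled this directly, which is the complication the paper addresses.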

  14. Inverse-Probability-Weighted Estimation for Monotone and Nonmonotone Missing Data

    PubMed Central

    Sun, BaoLuo; Perkins, Neil J; Cole, Stephen R; Harel, Ofer; Mitchell, Emily M; Schisterman, Enrique F; Tchetgen Tchetgen, Eric J

    2018-01-01

    Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568–575). PMID:29165557

  15. Treatment of Missing Data in Workforce Education Research

    ERIC Educational Resources Information Center

    Gemici, Sinan; Rojewski, Jay W.; Lee, In Heok

    2012-01-01

    Most quantitative analyses in workforce education are affected by missing data. Traditional approaches to remedy missing data problems often result in reduced statistical power and biased parameter estimates due to systematic differences between missing and observed values. This article examines the treatment of missing data in pertinent…

  16. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data*

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2016-01-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data. PMID:27777471
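
    The starting point for such estimators is a generalized sample covariance computed from incomplete data: under MCAR, each entry can be estimated from just the rows where both coordinates are observed, and the bandable or sparse structure is then imposed on top (tapering or thresholding). A small sketch of the entrywise estimator with hard thresholding (our simplification, not the paper's exact estimator):

```python
import numpy as np

def pairwise_covariance(X):
    """Entrywise covariance from incomplete data (NaN = missing): for each
    (j, k), use only the rows where both coordinates are observed (MCAR)."""
    obs = ~np.isnan(X)
    n, p = X.shape
    S = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            both = obs[:, j] & obs[:, k]
            xj, xk = X[both, j], X[both, k]
            S[j, k] = S[k, j] = np.mean((xj - xj.mean()) * (xk - xk.mean()))
    return S

def threshold(S, t):
    """Hard-threshold small off-diagonal entries: basic sparse regularization."""
    T = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T
```

    Because the missingness is independent of the values under MCAR, each pairwise-complete average is an unbiased (though noisier) estimate of the corresponding covariance entry.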

  17. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-09-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.

  18. A MAP-based image interpolation method via Viterbi decoding of Markov chains of interpolation functions.

    PubMed

    Vedadi, Farhang; Shirani, Shahram

    2014-01-01

    A new method of image resolution up-conversion (image interpolation) based on maximum a posteriori sequence estimation is proposed. Instead of making a hard decision about the value of each missing pixel, we estimate the missing pixels in groups. At each missing pixel of the high resolution (HR) image, we consider an ensemble of candidate interpolation methods (interpolation functions). The interpolation functions are interpreted as states of a Markov model. In other words, the proposed method undergoes state transitions from one missing pixel position to the next. Accordingly, the interpolation problem is translated to the problem of estimating the optimal sequence of interpolation functions corresponding to the sequence of missing HR pixel positions. We derive a parameter-free probabilistic model for this to-be-estimated sequence of interpolation functions. Then, we solve the estimation problem using a trellis representation and the Viterbi algorithm. Using directional interpolation functions and sequence estimation techniques, we classify the new algorithm as an adaptive directional interpolation using soft-decision estimation techniques. Experimental results show that the proposed algorithm yields images with higher or comparable peak signal-to-noise ratios compared with some benchmark interpolation methods in the literature while being efficient in terms of implementation and complexity considerations.
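
    The backbone of the method is the Viterbi algorithm over a chain whose states are candidate interpolation functions: the emission score at each missing-pixel position measures how well each interpolation function explains the local image evidence, and transitions encode coherence of the function sequence. A generic log-domain Viterbi sketch (the scores below are synthetic placeholders, not the paper's model):

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """MAP state sequence for a Markov chain of interpolation functions.
    log_init: (S,) initial log-probs; log_trans: (S, S) transition log-probs;
    log_emit: (T, S) per-position evidence for each candidate function."""
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans            # cand[i, j]: from i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):                    # backtrack
        path[t - 1] = back[t, path[t]]
    return path
```

    Choosing the whole sequence jointly, rather than the best function at each pixel independently, is exactly the "soft-decision" aspect the abstract emphasizes.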

  19. Effects of correcting missing daily feed intake values on the genetic parameters and estimated breeding values for feeding traits in pigs.

    PubMed

    Ito, Tetsuya; Fukawa, Kazuo; Kamikawa, Mai; Nikaidou, Satoshi; Taniguchi, Masaaki; Arakawa, Aisaku; Tanaka, Genki; Mikawa, Satoshi; Furukawa, Tsutomu; Hirose, Kensuke

    2018-01-01

    Daily feed intake (DFI) is an important consideration for improving feed efficiency, but measurements using electronic feeder systems contain many missing and incorrect values. Therefore, we evaluated three methods for correcting missing DFI data (quadratic, orthogonal polynomial, and locally weighted (Loess) regression equations) and assessed the effects of these missing values on the genetic parameters and the estimated breeding values (EBV) for feeding traits. DFI records were obtained from 1622 Duroc pigs, comprising 902 individuals without missing DFI and 720 individuals with missing DFI. The Loess equation was the most suitable method for correcting the missing DFI values in 5-50% randomly deleted datasets among the three equations. Both variance components and heritability for the average DFI (ADFI) did not change because of the missing DFI proportion and Loess correction. In terms of rank correlation and information criteria, Loess correction improved the accuracy of EBV for ADFI compared to randomly deleted cases. These findings indicate that the Loess equation is useful for correcting missing DFI values for individual pigs and that the correction of missing DFI values could be effective for the estimation of breeding values and genetic improvement using EBV for feeding traits. © 2017 The Authors. Animal Science Journal published by John Wiley & Sons Australia, Ltd on behalf of Japanese Society of Animal Science.
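
    A locally weighted (Loess-style) correction of missing DFI values can be sketched as a tricube-weighted local linear fit on the observed days, evaluated at each missing day. This is a hand-rolled approximation of the idea, not the authors' implementation:

```python
import numpy as np

def loess_fill(days, values, frac=0.2):
    """Fill NaN entries of `values` with tricube-weighted local linear fits
    on the observed points (a minimal Loess-style sketch)."""
    obs = ~np.isnan(values)
    xo, yo = days[obs], values[obs]
    k = max(2, int(frac * len(xo)))                     # local window size
    out = values.copy()
    for i in np.where(~obs)[0]:
        d = np.abs(xo - days[i])
        idx = np.argsort(d)[:k]                         # k nearest observed days
        w = (1.0 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube weights
        sw = np.sqrt(w)                                 # row scaling for WLS
        A = np.column_stack([np.ones(k), xo[idx]])
        beta = np.linalg.lstsq(A * sw[:, None], yo[idx] * sw, rcond=None)[0]
        out[i] = beta[0] + beta[1] * days[i]
    return out
```

    Unlike a single quadratic or polynomial fit over the whole trajectory, the local fit adapts to the shape of each animal's intake curve near the gap, which is consistent with Loess performing best among the three equations.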

  20. VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA

    PubMed Central

    Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu

    2009-01-01

    We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190

  1. Building a kinetic Monte Carlo model with a chosen accuracy.

    PubMed

    Bhute, Vijesh J; Chatterjee, Abhijit

    2013-06-28

    The kinetic Monte Carlo (KMC) method is a popular modeling approach for reaching large materials length and time scales. The KMC dynamics is erroneous when atomic processes that are relevant to the dynamics are missing from the KMC model. Recently, we developed the first error measure for KMC, in Bhute and Chatterjee [J. Chem. Phys. 138, 084103 (2013)]. The error measure, which is given in terms of the probability that a missing process would be selected in the correct dynamics, requires an estimate of the missing rate. In this work, we present an improved procedure for estimating the missing rate. The estimate found using the new procedure is within an order of magnitude of the correct missing rate, unlike our previous approach, where the estimate was larger by orders of magnitude. This enables one to find the error in the KMC model more accurately. In addition, we find the time for which the KMC model can be used before a maximum error in the dynamics is reached.
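
    One reading of the error measure: in standard KMC a process fires with probability proportional to its rate, so given an estimate of the total rate of the missing processes, the probability that a missing process would have been selected is that rate divided by the augmented total. A minimal sketch (the KMC step is the textbook rate-proportional selection, not the paper's materials model, and `missing_selection_prob` is our simplified rendering of the measure):

```python
import math
import random

def kmc_step(rates, rng):
    """One standard KMC step: choose process i with probability rate_i / R,
    then advance time by an exponential waiting time with mean 1/R."""
    R = sum(rates)
    u, cum = rng.random() * R, 0.0
    for i, r in enumerate(rates):
        cum += r
        if u < cum:
            return i, -math.log(rng.random()) / R
    return len(rates) - 1, -math.log(rng.random()) / R   # guard for rounding

def missing_selection_prob(rates, missing_rate_estimate):
    """Sketch of the error measure: probability that a process absent from
    the rate catalog would have fired, had it been included."""
    return missing_rate_estimate / (missing_rate_estimate + sum(rates))
```

    Under this reading, an estimate of the missing rate that is too large by orders of magnitude inflates the error measure correspondingly, which is why tightening the estimate to within an order of magnitude matters.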

  2. Instrumental Variable Methods for Continuous Outcomes That Accommodate Nonignorable Missing Baseline Values.

    PubMed

    Ertefaie, Ashkan; Flory, James H; Hennessy, Sean; Small, Dylan S

    2017-06-15

    Instrumental variable (IV) methods provide unbiased treatment effect estimation in the presence of unmeasured confounders under certain assumptions. To provide valid estimates of treatment effect, treatment effect confounders that are associated with the IV (IV-confounders) must be included in the analysis, and not including observations with missing values may lead to bias. Missing covariate data are particularly problematic when the probability that a value is missing is related to the value itself, which is known as nonignorable missingness. In such cases, imputation-based methods are biased. Using health-care provider preference as an IV, we propose a 2-step procedure with which to estimate a valid treatment effect in the presence of baseline variables with nonignorable missing values. First, the provider preference IV value is estimated by performing a complete-case analysis using a random-effects model that includes IV-confounders. Second, the treatment effect is estimated using a 2-stage least squares IV approach that excludes IV-confounders with missing values. Simulation results are presented, and the method is applied to an analysis comparing the effects of sulfonylureas versus metformin on body mass index, where the variables baseline body mass index and glycosylated hemoglobin have missing values. Our result supports the association of sulfonylureas with weight gain. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
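
    The second step described is standard two-stage least squares. A self-contained simulation (synthetic data; the coefficients and binary instrument are invented for illustration, not taken from the paper) shows how an instrument recovers a treatment effect that naive OLS misses under unmeasured confounding:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    z = rng.binomial(1, 0.5, n).astype(float)       # instrument (e.g. provider preference)
    u = rng.normal(size=n)                          # unmeasured confounder
    treat = (0.9 * z + 0.5 * u + rng.normal(size=n) > 0.7).astype(float)
    y = 2.0 * treat + 1.5 * u + rng.normal(size=n)  # true treatment effect = 2.0

    # Stage 1: regress treatment on the instrument, keep fitted values
    X1 = np.column_stack([np.ones(n), z])
    b1, *_ = np.linalg.lstsq(X1, treat, rcond=None)
    treat_hat = X1 @ b1

    # Stage 2: regress the outcome on the fitted treatment
    X2 = np.column_stack([np.ones(n), treat_hat])
    b2, *_ = np.linalg.lstsq(X2, y, rcond=None)

    # Naive OLS for comparison: biased upward by the confounder u
    naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), treat]), y, rcond=None)
    print(f"2SLS: {b2[1]:.2f}  naive OLS: {naive[1]:.2f}")
    ```

    With n = 5000 the 2SLS slope lands near the true effect of 2.0, while the naive regression is pulled well above it by the confounder.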

  3. Using generalized estimating equations and extensions in randomized trials with missing longitudinal patient reported outcome data.

    PubMed

    Bell, Melanie L; Horton, Nicholas J; Dhillon, Haryana M; Bray, Victoria J; Vardy, Janette

    2018-05-26

    Patient reported outcomes (PROs) are important in oncology research; however, missing data can pose a threat to the validity of results. Psycho-oncology researchers should be aware of the statistical options for handling missing data robustly. One rarely used set of methods, which includes extensions for handling missing data, is generalized estimating equations (GEEs). Our objective was to demonstrate use of GEEs to analyze PROs with missing data in randomized trials with assessments at fixed time points. We introduce GEEs and show, with a worked example, how to use GEEs that account for missing data: inverse probability weighted GEEs and multiple imputation with GEE. We use data from an RCT evaluating a web-based brain training for cancer survivors reporting cognitive symptoms after chemotherapy treatment. The primary outcome for this demonstration is the binary outcome of cognitive impairment. Several methods are used, and results are compared. We demonstrate that estimates can vary depending on the choice of analytical approach, with odds ratios for no cognitive impairment ranging from 2.04 to 5.74. While most of these estimates were statistically significant (P < 0.05), a few were not. Researchers using PROs should use statistical methods that handle missing data in a way that yields unbiased estimates. GEE extensions are analytic options for handling dropouts in longitudinal RCTs, particularly if the outcome is not continuous. Copyright © 2018 John Wiley & Sons, Ltd.
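
    One of the two GEE extensions mentioned, inverse probability weighting, can be sketched with scikit-learn on synthetic single-follow-up data (with an independence working correlation, a weighted logistic regression solves the same estimating equations as a weighted GEE). All variable names and coefficients below are invented for illustration:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 4000
    arm = rng.binomial(1, 0.5, n).astype(float)     # randomized arm
    base = rng.normal(size=n)                       # baseline score
    p_y = 1 / (1 + np.exp(-(-0.5 + 1.0 * arm - 0.8 * base)))
    y = rng.binomial(1, p_y)                        # binary cognitive impairment

    # An interim symptom report predicts the outcome and drives dropout (MAR given aux)
    aux = y + rng.normal(scale=0.5, size=n)
    observed = rng.random(n) < 1 / (1 + np.exp(-(1.0 - 1.5 * aux)))

    # Step 1: model the probability of remaining observed
    r_model = LogisticRegression(C=1e6).fit(aux.reshape(-1, 1), observed)
    w = 1 / r_model.predict_proba(aux[observed].reshape(-1, 1))[:, 1]

    # Step 2: weighted complete-case outcome regression (independence working correlation)
    X = np.column_stack([arm, base])
    ipw_fit = LogisticRegression(C=1e6).fit(X[observed], y[observed], sample_weight=w)
    print("IPW log-odds ratio for arm:", ipw_fit.coef_[0][0])  # true value is 1.0
    ```

    Because dropout here depends on a variable (aux) that is not in the outcome model, an unweighted complete-case analysis would be biased; the weights restore a pseudo-population representative of everyone randomized.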

  4. Depth inpainting by tensor voting.

    PubMed

    Kulkarni, Mandar; Rajagopalan, Ambasamudram N

    2013-06-01

    Depth maps captured by range scanning devices or by using optical cameras often suffer from missing regions due to occlusions, reflectivity, limited scanning area, sensor imperfections, etc. In this paper, we propose a fast and reliable algorithm for depth map inpainting using the tensor voting (TV) framework. For less complex missing regions, local edge and depth information is utilized for synthesizing missing values. The depth variations are modeled by local planes using 3D TV, and missing values are estimated using plane equations. For large and complex missing regions, we collect and evaluate depth estimates from self-similar (training) datasets. We align the depth maps of the training set with the target (defective) depth map and evaluate the goodness of depth estimates among candidate values using 3D TV. We demonstrate the effectiveness of the proposed approaches on real as well as synthetic data.
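
    The local-plane step can be illustrated with a toy least-squares version (not the actual tensor-voting implementation): fit z = a*x + b*y + c to the valid pixels and evaluate the plane at the missing ones. The function name is invented:

    ```python
    import numpy as np

    def fill_with_plane(depth, mask):
        """Fit z = a*x + b*y + c to the valid pixels and evaluate the plane
        at the missing ones. `mask` is True where depth is missing."""
        ys, xs = np.nonzero(~mask)
        A = np.column_stack([xs, ys, np.ones(len(xs))])
        coef, *_ = np.linalg.lstsq(A, depth[~mask], rcond=None)
        my, mx = np.nonzero(mask)
        out = depth.copy()
        out[my, mx] = np.column_stack([mx, my, np.ones(len(mx))]) @ coef
        return out
    ```

    In the paper this fit is local and guided by 3D tensor voting; the global fit above is only the simplest instance of "model depth variation by a plane and read missing values off the plane equation".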

  5. Missed diagnoses of acute myocardial infarction in the emergency department: variation by patient and facility characteristics.

    PubMed

    Moy, Ernest; Barrett, Marguerite; Coffey, Rosanna; Hines, Anika L; Newman-Toker, David E

    2015-02-01

    An estimated 1.2 million people in the US have an acute myocardial infarction (AMI) each year. An estimated 7% of AMI hospitalizations result in death. Most patients experiencing acute coronary symptoms, such as unstable angina, visit an emergency department (ED). Some patients hospitalized with AMI after a treat-and-release ED visit likely represent missed opportunities for correct diagnosis and treatment. The purpose of the present study is to estimate the frequency of missed AMI or its precursors in the ED by examining use of EDs prior to hospitalization for AMI. We estimated the rate of probable missed diagnoses in EDs in the week before hospitalization for AMI and examined associated factors. We used Healthcare Cost and Utilization Project State Inpatient Databases and State Emergency Department Databases for 2007 to evaluate missed diagnoses in 111,973 admitted patients aged 18 years and older. We identified missed diagnoses in the ED for 993 of 112,000 patients (0.9% of all AMI admissions). These patients had visited an ED with chest pain or cardiac conditions, were released, and were subsequently admitted for AMI within 7 days. Higher odds of having missed diagnoses were associated with being younger and of Black race. Hospital teaching status, availability of cardiac catheterization, high ED admission rates, high inpatient occupancy rates, and urban location were associated with lower odds of missed diagnoses. Administrative data provide robust information that may help EDs identify populations at risk of experiencing a missed diagnosis, address disparities, and reduce diagnostic errors.

  6. Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

    PubMed Central

    Lee, Minjung; Dignam, James J.; Han, Junhee

    2014-01-01

    We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107

  7. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... data shall be the best available estimate based on all available process data or data used for accounting purposes (such as sales records). (b) For missing values related to the performance test...

  8. 40 CFR 98.445 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... quantities calculations is required. Whenever the monitoring procedures cannot be followed, you must use the...) A quarterly mass or volume of contents in containers received that is missing must be estimated as...

  9. 40 CFR 98.445 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... quantities calculations is required. Whenever the monitoring procedures cannot be followed, you must use the...) A quarterly mass or volume of contents in containers received that is missing must be estimated as...

  10. 40 CFR 98.445 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... quantities calculations is required. Whenever the monitoring procedures cannot be followed, you must use the...) A quarterly mass or volume of contents in containers received that is missing must be estimated as...

  11. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... value shall be the best available estimate(s) of the parameter(s), based on all available process data or data used for accounting purposes. (c) For each missing value collected during the performance test (hourly CO2 concentration, stack gas volumetric flow rate, or average process vent flow from mine...

  12. Utility of inverse probability weighting in molecular pathological epidemiology.

    PubMed

    Liu, Li; Nevo, Daniel; Nishihara, Reiko; Cao, Yin; Song, Mingyang; Twombly, Tyler S; Chan, Andrew T; Giovannucci, Edward L; VanderWeele, Tyler J; Wang, Molin; Ogino, Shuji

    2018-04-01

    As a causal inference methodology, the inverse probability weighting (IPW) method has been utilized to address confounding and account for missing data when subjects with missing data cannot be included in a primary analysis. The transdisciplinary field of molecular pathological epidemiology (MPE) integrates molecular pathological and epidemiological methods, and takes advantage of improved understanding of pathogenesis to generate stronger biological evidence of causality and optimize strategies for precision medicine and prevention. Disease subtyping based on biomarker analysis of biospecimens is essential in MPE research. However, there are nearly always cases that lack subtype information due to the unavailability or insufficiency of biospecimens. To address this missing subtype data issue, we incorporated inverse probability weights into Cox proportional cause-specific hazards regression. The weight was the inverse of the probability of biomarker data availability, estimated based on a model for biomarker data availability status. The strategy was illustrated in two example studies; each assessed alcohol intake or family history of colorectal cancer in relation to the risk of developing colorectal carcinoma subtypes classified by tumor microsatellite instability (MSI) status, using a prospective cohort study, the Nurses' Health Study. Logistic regression was used to estimate the probability of MSI data availability for each cancer case with covariates of clinical features and family history of colorectal cancer. This application of IPW can reduce selection bias caused by nonrandom variation in biospecimen data availability. The integration of causal inference methods into the MPE approach has substantial potential to advance the field of epidemiology.

  13. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... substitute data shall be the best available estimate based on all available process data or data used for accounting purposes (such as sales records). (b) For missing values related to the performance test...

  14. Improving cluster-based missing value estimation of DNA microarray data.

    PubMed

    Brás, Lígia P; Menezes, José C

    2007-06-01

    We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining of the MV estimates. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
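
    The iterative reuse of freshly imputed values can be sketched as follows. This is a deliberately simplified Euclidean-distance version to show the mechanism; the published method uses weighted KNN on expression profiles:

    ```python
    import numpy as np

    def iknn_impute(X, k=3, n_iter=5):
        """Iterative KNN imputation: start from column means, then repeatedly
        re-estimate each missing entry from its k nearest rows, reusing the
        values imputed on the previous pass."""
        X = np.asarray(X, dtype=float)
        miss = np.isnan(X)
        filled = np.where(miss, np.nanmean(X, axis=0), X)
        for _ in range(n_iter):
            for i in np.unique(np.nonzero(miss)[0]):
                d = np.linalg.norm(filled - filled[i], axis=1)
                d[i] = np.inf                      # a row is not its own neighbour
                nbrs = np.argsort(d)[:k]
                filled[i, miss[i]] = filled[nbrs][:, miss[i]].mean(axis=0)
        return filled
    ```

    The first pass is essentially KNNimpute seeded by column means; later passes refine the estimates because distances are recomputed with the previously imputed values in place.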

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khachatryan, Vardan

    The performance of missing transverse energy reconstruction algorithms is presented using √s = 8 TeV proton-proton (pp) data collected with the CMS detector. Events with anomalous missing transverse energy are studied, and the performance of algorithms used to identify and remove these events is presented. The scale and resolution for missing transverse energy, including the effects of multiple pp interactions (pileup), are measured using events with an identified Z boson or isolated photon, and are found to be well described by the simulation. Novel missing transverse energy reconstruction algorithms developed specifically to mitigate the effects of large numbers of pileup interactions on the missing transverse energy resolution are presented. These algorithms significantly reduce the dependence of the missing transverse energy resolution on pileup interactions. Furthermore, an algorithm that provides an estimate of the significance of the missing transverse energy is presented, which is used to estimate the compatibility of the reconstructed missing transverse energy with a zero nominal value.

  16. Evaluation of Fuzzy-Logic Framework for Spatial Statistics Preserving Methods for Estimation of Missing Precipitation Data

    NASA Astrophysics Data System (ADS)

    El Sharif, H.; Teegavarapu, R. S.

    2012-12-01

    Spatial interpolation methods used for estimation of missing precipitation data at a site seldom check for their ability to preserve site and regional statistics. Such statistics are primarily defined by spatial correlations and other site-to-site statistics in a region. Preservation of site and regional statistics represents a means of assessing the validity of missing precipitation estimates at a site. This study evaluates the efficacy of a fuzzy-logic methodology for infilling missing historical daily precipitation data in preserving site and regional statistics. Rain gauge sites in the state of Kentucky, USA, are used as a case study for evaluation of this newly proposed method in comparison to traditional data infilling techniques. Several error and performance measures will be used to evaluate the methods and trade-offs in accuracy of estimation and preservation of site and regional statistics.

  17. A practical salient region feature based 3D multi-modality registration method for medical images

    NASA Astrophysics Data System (ADS)

    Hahn, Dieter A.; Wolz, Gabriele; Sun, Yiyong; Hornegger, Joachim; Sauer, Frank; Kuwert, Torsten; Xu, Chenyang

    2006-03-01

    We present a novel representation of 3D salient region features and its integration into a hybrid rigid-body registration framework. We adopt scale, translation and rotation invariance properties of those intrinsic 3D features to estimate a transform between underlying mono- or multi-modal 3D medical images. Our method combines advantageous aspects of both feature- and intensity-based approaches and consists of three steps: an automatic extraction of a set of 3D salient region features on each image, a robust estimation of correspondences and their sub-pixel accurate refinement with outliers elimination. We propose a region-growing based approach for the extraction of 3D salient region features, a solution to the problem of feature clustering and a reduction of the correspondence search space complexity. Results of the developed algorithm are presented for both mono- and multi-modal intra-patient 3D image pairs (CT, PET and SPECT) that have been acquired for change detection, tumor localization, and time based intra-person studies. The accuracy of the method is clinically evaluated by a medical expert with an approach that measures the distance between a set of selected corresponding points consisting of both anatomical and functional structures or lesion sites. This demonstrates the robustness of the proposed method to image overlap, missing information and artefacts. We conclude by discussing potential medical applications and possibilities for integration into a non-rigid registration framework.

  18. Estimation of Missing Water-Level Data for the Everglades Depth Estimation Network (EDEN)

    USGS Publications Warehouse

    Conrads, Paul; Petkewich, Matthew D.

    2009-01-01

    The Everglades Depth Estimation Network (EDEN) is an integrated network of real-time water-level gaging stations, ground-elevation models, and water-surface elevation models designed to provide scientists, engineers, and water-resource managers with current (2000-2009) water-depth information for the entire freshwater portion of the greater Everglades. The U.S. Geological Survey Greater Everglades Priority Ecosystems Science provides support for EDEN and their goal of providing quality-assured monitoring data for the U.S. Army Corps of Engineers Comprehensive Everglades Restoration Plan. To increase the accuracy of the daily water-surface elevation model, water-level estimation equations were developed to fill missing data. To minimize the occurrences of no estimation of data due to missing data for an input station, a minimum of three linear regression equations were developed for each station using different input stations. Of the 726 water-level estimation equations developed to fill missing data at 239 stations, more than 60 percent of the equations have coefficients of determination greater than 0.90, and 92 percent have a coefficient of determination greater than 0.70.
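
    The "at least three equations per station" design amounts to a fallback chain: use the best-fitting regression whose input station actually reported that day. A sketch, where the station identifiers and coefficients are invented for illustration:

    ```python
    # Hypothetical fitted equations for one target station, ordered best-first by R^2.
    equations = [
        ("G-620", 0.35, 0.96),   # (input station, intercept, slope)
        ("NP-46", 0.52, 0.91),
        ("P-33",  1.10, 0.80),
    ]

    def estimate_level(day_values):
        """Fill a missing daily water level: use the first regression whose
        input station reported a value that day (None means it is missing too)."""
        for station, b0, b1 in equations:
            x = day_values.get(station)
            if x is not None:
                return b0 + b1 * x
        return None  # all three input stations are missing as well

    print(estimate_level({"G-620": None, "NP-46": 4.0, "P-33": 3.8}))  # 4.16
    ```

    Here the best equation's input (G-620) is also missing, so the estimate falls back to the second equation.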

  19. Sensitivity Analysis of Multiple Informant Models When Data are Not Missing at Random

    PubMed Central

    Blozis, Shelley A.; Ge, Xiaojia; Xu, Shu; Natsuaki, Misaki N.; Shaw, Daniel S.; Neiderhiser, Jenae; Scaramella, Laura; Leve, Leslie; Reiss, David

    2014-01-01

    Missing data are common in studies that rely on multiple informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, and so all groups may be retained even if only one member of a group contributes data. Statistical inference is based on the assumption that data are missing completely at random or missing at random. Importantly, whether or not data are missing is assumed to be independent of the missing data. A saturated correlates model that incorporates correlates of the missingness or the missing data into an analysis and multiple imputation that may also use such correlates offer advantages over the standard implementation of SEM when data are not missing at random because these approaches may result in a data analysis problem for which the missingness is ignorable. This paper considers these approaches in an analysis of family data to assess the sensitivity of parameter estimates to assumptions about missing data, a strategy that may be easily implemented using SEM software. PMID:25221420

  20. Performance of the CMS missing transverse momentum reconstruction in pp data at $$\\sqrt{s}$$ = 8 TeV

    DOE PAGES

    Khachatryan, Vardan

    2015-02-12

    The performance of missing transverse energy reconstruction algorithms is presented using √s = 8 TeV proton-proton (pp) data collected with the CMS detector. Events with anomalous missing transverse energy are studied, and the performance of algorithms used to identify and remove these events is presented. The scale and resolution for missing transverse energy, including the effects of multiple pp interactions (pileup), are measured using events with an identified Z boson or isolated photon, and are found to be well described by the simulation. Novel missing transverse energy reconstruction algorithms developed specifically to mitigate the effects of large numbers of pileup interactions on the missing transverse energy resolution are presented. These algorithms significantly reduce the dependence of the missing transverse energy resolution on pileup interactions. Furthermore, an algorithm that provides an estimate of the significance of the missing transverse energy is presented, which is used to estimate the compatibility of the reconstructed missing transverse energy with a zero nominal value.

  1. A novel framework for objective detection and tracking of TC center from noisy satellite imagery

    NASA Astrophysics Data System (ADS)

    Johnson, Bibin; Thomas, Sachin; Rani, J. Sheeba

    2018-07-01

    This paper proposes a novel framework for automatically determining and tracking the center of a tropical cyclone (TC) during its entire life cycle from the thermal infrared (TIR) channel data of geostationary satellites. The proposed method handles meteorological images with noise and missing or partial information caused by seasonal variability and a lack of significant spatial or vortex features. To retrieve the cyclone center under these circumstances, a synergistic approach based on objective measures and a Numerical Weather Prediction (NWP) model is proposed. The method employs a spatial gradient scheme to process missing and noisy frames, or a spatio-temporal gradient scheme for image sequences that are continuous and contain less noise. The initial estimate of the TC center from the missing imagery is corrected by an NWP-model-based post-processing scheme. The validity of the framework is tested on infrared images of different cyclones obtained from various geostationary satellites such as Meteosat-7, INSAT-3D, and Kalpana-1. The computed track is compared with actual track data from the Joint Typhoon Warning Center (JTWC) and shows an 11% reduction in mean track error relative to other state-of-the-art methods in the presence of missing and noisy frames. The proposed method is also successfully tested for simultaneous retrieval of the TC center from images containing multiple non-overlapping cyclones.

  2. Recognition of children on age-different images: Facial morphology and age-stable features.

    PubMed

    Caplova, Zuzana; Compassi, Valentina; Giancola, Silvio; Gibelli, Daniele M; Obertová, Zuzana; Poppa, Pasquale; Sala, Remo; Sforza, Chiarella; Cattaneo, Cristina

    2017-07-01

    The situation of missing children is one of the most emotional social issues worldwide. The search for and identification of missing children is often hampered, among others, by the fact that the facial morphology of long-term missing children changes as they grow. Nowadays, the wide coverage by surveillance systems potentially provides image material for comparisons with images of missing children that may facilitate identification. The aim of the study was to determine whether facial features are stable over time and can be utilized for facial recognition by comparing facial images of children at different ages, as well as to test the possible use of moles in recognition. The study was divided into two phases: (1) morphological classification of facial features using an Anthropological Atlas; (2) development of an algorithm in MATLAB® R2014b for assessing the use of moles as age-stable features. The assessment of facial features by Anthropological Atlases showed high mismatch percentages among observers. On average, the mismatch percentages were lower for features describing shape than for those describing size. The nose tip cleft and the chin dimple showed the best agreement between observers regarding both categorization and stability over time. Using the position of moles as a reference point for recognition of the same person on age-different images seems to be a useful method in terms of objectivity, and it can be concluded that moles represent age-stable facial features that may be considered for preliminary recognition. Copyright © 2017 The Chartered Society of Forensic Sciences. Published by Elsevier B.V. All rights reserved.

  3. The Role of Trustworthiness in Teaching: An Examination of "The Prime of Miss Jean Brodie"

    ERIC Educational Resources Information Center

    Katz, Michael S.

    2014-01-01

    The purpose of this paper is to examine the role that trustworthiness plays in the ability of teachers to function as moral role models. Through exploration of Muriel Spark's novel, "The Prime of Miss Jean Brodie," I explain some of the central features of trustworthiness as a moral virtue and suggest how these features are critical…

  4. Binary variable multiple-model multiple imputation to address missing data mechanism uncertainty: Application to a smoking cessation trial

    PubMed Central

    Siddique, Juned; Harel, Ofer; Crespi, Catherine M.; Hedeker, Donald

    2014-01-01

    The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables that formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software. PMID:24634315
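
    The core idea, drawing imputations from a distribution of imputation models rather than a single model, can be sketched for one binary variable. This is a deliberately stripped-down version (the paper's imputation models condition on covariates, and the function and argument names here are invented):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    def mnar_multiple_impute(y_obs_mean, n_missing, delta_draws, m_per_model=2):
        """Nested multiple imputation for a missing binary variable. Each outer
        draw picks a log-odds offset `delta` (one imputation *model*, expressing
        missing-data-mechanism uncertainty); each inner draw imputes under it.
        delta = 0 corresponds to missing at random."""
        imputations = []
        base_logit = np.log(y_obs_mean / (1 - y_obs_mean))
        for delta in delta_draws:
            p = 1 / (1 + np.exp(-(base_logit + delta)))
            for _ in range(m_per_model):
                imputations.append(rng.binomial(1, p, n_missing))
        return imputations
    ```

    Analyses run on each imputed data set are then combined with the nested multiple imputation rules the abstract cites, so the between-model spread of the delta draws propagates into the standard errors.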

  5. Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example Using National Data on Drug Injection in Prisons

    PubMed Central

    Haji-Maghsoudi, Saiedeh; Haghdoost, Ali-akbar; Rastegari, Azam; Baneshi, Mohammad Reza

    2013-01-01

    Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. The presence of missing data challenges model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One issue that has received less attention, and which we address here, is the role of the pattern of missing data. Methods: We used information on 2720 prisoners. Results derived from fitting a regression model to the whole data set served as the gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at the above rates, in one variable that was significant in the gold-standard model (age). In scenario 2, a small proportion of each independent variable was dropped. Four imputation methods, under different Events Per Variable (EPV) values, were compared in terms of selection of important variables and parameter estimation. Results: In scenario 2, bias in estimates was low and the performances of all methods for handling missing data were similar. All methods at all missing rates were able to detect the significance of age. In scenario 1, biases in estimates increased, in particular at the 50% missing rate. Here, at EPVs of 10 and 5, imputation methods failed to capture the effect of age. Conclusion: In scenario 2, all imputation methods at all missing rates were able to detect age as significant. This was not the case in scenario 1. Our results show that the performance of imputation methods depends on the pattern of missing data. PMID:24596839
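
    The two missingness patterns being compared, concentrated in one variable versus spread across all of them, are easy to reproduce synthetically at the same overall rate (illustrative code, not the authors'):

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    n, p, rate = 1000, 5, 0.10
    X = rng.normal(size=(n, p))

    # Scenario 1: the whole missingness budget concentrated in one variable
    X1 = X.copy()
    X1[rng.random(n) < rate * p, 0] = np.nan

    # Scenario 2: the same overall rate spread over every variable
    X2 = X.copy()
    X2[rng.random((n, p)) < rate] = np.nan
    ```

    Both arrays end up with about 10% of all cells missing, but in scenario 1 roughly half of a single column is gone, which is exactly the setting in which the study found imputation methods struggled.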

  6. Impact of missing data strategies in studies of parental employment and health: Missing items, missing waves, and missing mothers.

    PubMed

    Nguyen, Cattram D; Strazdins, Lyndall; Nicholson, Jan M; Cooklin, Amanda R

    2018-07-01

    The long-term health effects of employment - a major social determinant - on population health are best understood via longitudinal cohort studies, yet missing data (attrition, item non-response) remain a ubiquitous challenge. Additionally, and unique to the work-family context, is the intermittent participation of parents, particularly mothers, in employment, yielding 'incomplete' data. Missing data are patterned by gender and social circumstances, and the extent and nature of resulting biases are unknown. This study investigates how estimates of the association between work-family conflict and mental health depend on the use of four different approaches to missing data treatment, each of which allows for progressive inclusion of more cases in the analyses. We used 5 waves of data from 4983 mothers participating in the Longitudinal Study of Australian Children. Only 23% had completely observed work-family conflict data across all waves. Participants with and without missing data differed such that complete cases were the most advantaged group. Comparison of the missing data treatments indicates the expected narrowing of confidence intervals when more of the sample was included. However, the impact on the estimated strength of association varied by level of exposure: at the lower levels of work-family conflict, estimates strengthened (were larger); at higher levels they weakened (were smaller). Our results suggest that inadequate handling of missing data in extant longitudinal studies of work-family conflict and mental health may have misestimated the adverse effects of work-family conflict, particularly for mothers. Considerable caution should be exercised in interpreting analyses that fail to explore and account for biases arising from missing data. Copyright © 2018. Published by Elsevier Ltd.

  7. Missing doses in the life span study of Japanese atomic bomb survivors.

    PubMed

    Richardson, David B; Wing, Steve; Cole, Stephen R

    2013-03-15

    The Life Span Study of atomic bomb survivors is an important source of risk estimates used to inform radiation protection and compensation. Interviews with survivors in the 1950s and 1960s provided information needed to estimate radiation doses for survivors proximal to ground zero. Because of a lack of interview or the complexity of shielding, doses are missing for 7,058 of the 68,119 proximal survivors. Recent analyses excluded people with missing doses, and despite the protracted collection of interview information necessary to estimate some survivors' doses, defined start of follow-up as October 1, 1950, for everyone. We describe the prevalence of missing doses and its association with mortality, distance from hypocenter, city, age, and sex. Missing doses were more common among Nagasaki residents than among Hiroshima residents (prevalence ratio = 2.05; 95% confidence interval: 1.96, 2.14), among people who were closer to ground zero than among those who were far from it, among people who were younger at enrollment than among those who were older, and among males than among females (prevalence ratio = 1.22; 95% confidence interval: 1.17, 1.28). Missing dose was associated with all-cancer and leukemia mortality, particularly during the first years of follow-up (all-cancer rate ratio = 2.16, 95% confidence interval: 1.51, 3.08; and leukemia rate ratio = 4.28, 95% confidence interval: 1.72, 10.67). Accounting for missing dose and late entry should reduce bias in estimated dose-mortality associations.

  8. Missing Doses in the Life Span Study of Japanese Atomic Bomb Survivors

    PubMed Central

    Richardson, David B.; Wing, Steve; Cole, Stephen R.

    2013-01-01

    The Life Span Study of atomic bomb survivors is an important source of risk estimates used to inform radiation protection and compensation. Interviews with survivors in the 1950s and 1960s provided information needed to estimate radiation doses for survivors proximal to ground zero. Because of a lack of interview or the complexity of shielding, doses are missing for 7,058 of the 68,119 proximal survivors. Recent analyses excluded people with missing doses, and despite the protracted collection of interview information necessary to estimate some survivors' doses, defined start of follow-up as October 1, 1950, for everyone. We describe the prevalence of missing doses and its association with mortality, distance from hypocenter, city, age, and sex. Missing doses were more common among Nagasaki residents than among Hiroshima residents (prevalence ratio = 2.05; 95% confidence interval: 1.96, 2.14), among people who were closer to ground zero than among those who were far from it, among people who were younger at enrollment than among those who were older, and among males than among females (prevalence ratio = 1.22; 95% confidence interval: 1.17, 1.28). Missing dose was associated with all-cancer and leukemia mortality, particularly during the first years of follow-up (all-cancer rate ratio = 2.16, 95% confidence interval: 1.51, 3.08; and leukemia rate ratio = 4.28, 95% confidence interval: 1.72, 10.67). Accounting for missing dose and late entry should reduce bias in estimated dose-mortality associations. PMID:23429722

  9. Allowing for uncertainty due to missing continuous outcome data in pairwise and network meta-analysis.

    PubMed

    Mavridis, Dimitris; White, Ian R; Higgins, Julian P T; Cipriani, Andrea; Salanti, Georgia

    2015-02-28

    Missing outcome data are commonly encountered in randomized controlled trials and hence may need to be addressed in a meta-analysis of multiple trials. A common and simple approach to deal with missing data is to restrict analysis to individuals for whom the outcome was obtained (complete case analysis). However, estimated treatment effects from complete case analyses are potentially biased if informative missing data are ignored. We develop methods for estimating meta-analytic summary treatment effects for continuous outcomes in the presence of missing data for some of the individuals within the trials. We build on a method previously developed for binary outcomes, which quantifies the degree of departure from a missing at random assumption via the informative missingness odds ratio. Our new model quantifies the degree of departure from missing at random using either an informative missingness difference of means or an informative missingness ratio of means, both of which relate the mean value of the missing outcome data to that of the observed data. We propose estimating the treatment effects, adjusted for informative missingness, and their standard errors by a Taylor series approximation and by a Monte Carlo method. We apply the methodology to examples of both pairwise and network meta-analysis with multi-arm trials. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
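
    The informative-missingness adjustment described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it shows how an arm-level mean shifts once the missing outcomes are assumed to differ from the observed ones by an informative missingness difference of means (IMDoM). The function name and interface are invented for the example.

    ```python
    def adjust_mean_for_missingness(y_obs_mean, n_obs, n_miss, imdom):
        """Arm-level mean adjusted for informative missingness.

        The mean of the missing outcomes is assumed to equal the observed
        mean plus an informative missingness difference of means (IMDoM);
        imdom = 0 recovers the missing-at-random case.
        """
        y_miss_mean = y_obs_mean + imdom
        return (n_obs * y_obs_mean + n_miss * y_miss_mean) / (n_obs + n_miss)

    # 80 observed responses (mean 10), 20 missing, assumed IMDoM of -2:
    # the missing participants are presumed to score 2 points lower.
    adjusted = adjust_mean_for_missingness(10.0, 80, 20, -2.0)  # 9.6
    ```

    Propagating the uncertainty in the IMDoM itself (via a Taylor series approximation or Monte Carlo, as in the paper) is what distinguishes the full method from this point adjustment.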

  10. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.
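
    The DTW-based selection step at the heart of DTWimpute can be illustrated with a short sketch. This is not the authors' Perl/C++ code: it computes the classic dynamic-programming DTW distance and, for one missing position, copies the value from the complete candidate profile whose observed portion is most similar. The names and the single-position interface are illustrative.

    ```python
    import math

    def dtw_distance(a, b):
        """Dynamic-programming Dynamic Time Warping distance between sequences."""
        n, m = len(a), len(b)
        D = [[math.inf] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
        return D[n][m]

    def impute_position(target, candidates, pos):
        """Fill target[pos] (marked None) from the complete candidate profile
        whose shape is closest, under DTW, to the target's observed values."""
        observed = [v for v in target if v is not None]
        best = min(candidates, key=lambda c: dtw_distance(observed, c))
        filled = list(target)
        filled[pos] = best[pos]
        return filled
    ```

    The published position-wise, neighborhood-wise, and two-pass variants differ mainly in how the candidate set is formed and how an initial rough imputation is refined, which this sketch omits.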

  11. Is using multiple imputation better than complete case analysis for estimating a prevalence (risk) difference in randomized controlled trials when binary outcome observations are missing?

    PubMed

    Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian

    2016-07-22

    Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.
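
    The simulation design can be mimicked in miniature. The sketch below is an illustration, not the published simulation code: it generates one two-arm trial with a binary outcome, deletes outcomes completely at random, and computes the complete-case risk difference, which under MCAR stays close to the true value, consistent with the study's finding.

    ```python
    import random

    def complete_case_rd(p_control=0.10, rd=0.15, n_per_arm=5000,
                         p_missing=0.20, seed=1):
        """One simulated two-arm trial with binary outcomes missing completely
        at random (MCAR), analysed by complete cases; returns the estimated
        risk difference, which under MCAR is unbiased for `rd`."""
        rng = random.Random(seed)

        def arm_risk(p_event):
            events = observed = 0
            for _ in range(n_per_arm):
                if rng.random() < p_missing:  # outcome lost, MCAR
                    continue
                observed += 1
                events += rng.random() < p_event
            return events / observed

        return arm_risk(p_control + rd) - arm_risk(p_control)
    ```

    Repeating this over thousands of replicates, and under MAR or MNAR deletion rules instead of MCAR, is what the full simulation study does to compare bias and coverage across methods.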

  12. Correcting bias due to missing stage data in the non-parametric estimation of stage-specific net survival for colorectal cancer using multiple imputation.

    PubMed

    Falcaro, Milena; Carpenter, James R

    2017-06-01

    Population-based net survival by tumour stage at diagnosis is a key measure in cancer surveillance. Unfortunately, data on tumour stage are often missing for a non-negligible proportion of patients and the mechanism giving rise to the missingness is usually anything but completely at random. In this setting, restricting analysis to the subset of complete records gives typically biased results. Multiple imputation is a promising practical approach to the issues raised by the missing data, but its use in conjunction with the Pohar-Perme method for estimating net survival has not been formally evaluated. We performed a resampling study using colorectal cancer population-based registry data to evaluate the ability of multiple imputation, used along with the Pohar-Perme method, to deliver unbiased estimates of stage-specific net survival and recover missing stage information. We created 1000 independent data sets, each containing 5000 patients. Stage data were then made missing at random under two scenarios (30% and 50% missingness). Complete records analysis showed substantial bias and poor confidence interval coverage. Across both scenarios our multiple imputation strategy virtually eliminated the bias and greatly improved confidence interval coverage. In the presence of missing stage data complete records analysis often gives severely biased results. We showed that combining multiple imputation with the Pohar-Perme estimator provides a valid practical approach for the estimation of stage-specific colorectal cancer net survival. As usual, when the percentage of missing data is high the results should be interpreted cautiously and sensitivity analyses are recommended. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. MISSE-6 hardware

    NASA Image and Video Library

    2009-09-02

    ISS020-E-037367 (1 Sept. 2009) --- A close-up view of a Materials International Space Station Experiment (MISSE-6) on the exterior of the Columbus laboratory is featured in this image photographed by a spacewalking astronaut during the STS-128 mission's first session of extravehicular activity (EVA). MISSE collects information on how different materials weather in the environment of space. MISSE was later placed in Space Shuttle Discovery's cargo bay for its return to Earth.

  14. Confidence-Based Feature Acquisition

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; desJardins, Marie; MacGlashan, James

    2010-01-01

    Confidence-based Feature Acquisition (CFA) is a novel, supervised learning method for acquiring missing feature values when there is missing data at both training (learning) and test (deployment) time. To train a machine learning classifier, data is encoded with a series of input features describing each item. In some applications, the training data may have missing values for some of the features, which can be acquired at a given cost. A relevant JPL example is that of the Mars rover exploration in which the features are obtained from a variety of different instruments, with different power consumption and integration time costs. The challenge is to decide which features will lead to increased classification performance and are therefore worth acquiring (paying the cost). To solve this problem, CFA, which is made up of two algorithms (CFA-train and CFA-predict), has been designed to greedily minimize total acquisition cost (during training and testing) while aiming for a specific accuracy level (specified as a confidence threshold). With this method, it is assumed that there is a nonempty subset of features that are free; that is, every instance in the data set includes these features initially for zero cost. It is also assumed that the feature acquisition (FA) cost associated with each feature is known in advance, and that the FA cost for a given feature is the same for all instances. Finally, CFA requires that the base-level classifiers produce not only a classification, but also a confidence (or posterior probability).
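
    The acquisition loop that CFA-predict performs at test time can be caricatured as follows. This sketch is a simplification, not the actual CFA algorithm: it buys the cheapest remaining feature until the classifier's confidence reaches the threshold, whereas the real method weighs expected classification benefit against cost. All names are illustrative.

    ```python
    def acquire_until_confident(free_features, costly, classify, threshold=0.9):
        """Buy features greedily (cheapest first, a simplification of CFA's
        benefit-vs-cost criterion) until `classify` is confident enough.

        `costly` maps feature name -> (cost, value); `classify(features)`
        returns (label, confidence).  Returns (label, total acquisition cost).
        """
        features = dict(free_features)
        remaining = dict(costly)
        total_cost = 0.0
        label, confidence = classify(features)
        while confidence < threshold and remaining:
            name = min(remaining, key=lambda k: remaining[k][0])  # cheapest
            cost, value = remaining.pop(name)
            features[name] = value
            total_cost += cost
            label, confidence = classify(features)
        return label, total_cost
    ```

    In the rover setting described above, the "costly" features would correspond to instrument measurements with known power and integration-time costs, and the loop stops as soon as the confidence threshold is met, avoiding unnecessary acquisitions.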

  15. Robust estimation of partially linear models for longitudinal data with dropouts and measurement error.

    PubMed

    Qin, Guoyou; Zhang, Jiajia; Zhu, Zhongyi; Fung, Wing

    2016-12-20

    Outliers, measurement error, and missing data are commonly seen in longitudinal data because of its data collection process. However, no method can address all three of these issues simultaneously. This paper focuses on the robust estimation of partially linear models for longitudinal data with dropouts and measurement error. A new robust estimating equation, simultaneously tackling outliers, measurement error, and missingness, is proposed. The asymptotic properties of the proposed estimator are established under some regularity conditions. The proposed method is easy to implement in practice by utilizing the existing standard generalized estimating equations algorithms. The comprehensive simulation studies show the strength of the proposed method in dealing with longitudinal data with all three features. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study and confirms the effectiveness of the intervention in producing weight loss at month 9. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Meaning of Missing Values in Eyewitness Recall and Accident Records

    PubMed Central

    Uttl, Bob; Kisinger, Kelly

    2010-01-01

    Background: Eyewitness recalls and accident records frequently do not mention the conditions and behaviors of interest to researchers and lead to missing values and to uncertainty about the prevalence of these conditions and behaviors surrounding accidents. Missing values may occur because eyewitnesses report the presence but not the absence of obvious clues/accident features. We examined this possibility. Methodology/Principal Findings: Participants watched car accident videos and were asked to recall as much information as they could remember about each accident. The results showed that eyewitnesses were far more likely to report the presence of present obvious clues than the absence of absent obvious clues even though they were aware of their absence. Conclusions: One of the principal mechanisms causing missing values may be eyewitnesses' tendency to not report the absence of obvious features. We discuss the implications of our findings for both retrospective and prospective analyses of accident records, and illustrate the consequences of adopting inappropriate assumptions about the meaning of missing values using the Avaluator Avalanche Accident Prevention Card. PMID:20824054

  17. Meaning of missing values in eyewitness recall and accident records.

    PubMed

    Uttl, Bob; Kisinger, Kelly

    2010-09-02

    Eyewitness recalls and accident records frequently do not mention the conditions and behaviors of interest to researchers and lead to missing values and to uncertainty about the prevalence of these conditions and behaviors surrounding accidents. Missing values may occur because eyewitnesses report the presence but not the absence of obvious clues/accident features. We examined this possibility. Participants watched car accident videos and were asked to recall as much information as they could remember about each accident. The results showed that eyewitnesses were far more likely to report the presence of present obvious clues than the absence of absent obvious clues even though they were aware of their absence. One of the principal mechanisms causing missing values may be eyewitnesses' tendency to not report the absence of obvious features. We discuss the implications of our findings for both retrospective and prospective analyses of accident records, and illustrate the consequences of adopting inappropriate assumptions about the meaning of missing values using the Avaluator Avalanche Accident Prevention Card.

  18. Do missing data influence the accuracy of divergence-time estimation with BEAST?

    PubMed

    Zheng, Yuchi; Wiens, John J

    2015-04-01

    Time-calibrated phylogenies have become essential to evolutionary biology. A recurrent and unresolved question for dating analyses is whether genes with missing data cells should be included or excluded. This issue is particularly unclear for the most widely used dating method, the uncorrelated lognormal approach implemented in BEAST. Here, we test the robustness of this method to missing data. We compare divergence-time estimates from a nearly complete dataset (20 nuclear genes for 32 species of squamate reptiles) to those from subsampled matrices, including those with 5 or 2 complete loci only and those with 5 or 8 incomplete loci added. In general, missing data had little impact on estimated dates (mean error of ∼5Myr per node or less, given an overall age of ∼220Myr in squamates), even when 80% of sampled genes had 75% missing data. Mean errors were somewhat higher when all genes were 75% incomplete (∼17Myr). However, errors increased dramatically when only 2 of 9 fossil calibration points were included (∼40Myr), regardless of missing data. Overall, missing data (and even numbers of genes sampled) may have only minor impacts on the accuracy of divergence dating with BEAST, relative to the dramatic effects of fossil calibrations. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Multiple imputation strategies for zero-inflated cost data in economic evaluations: which method works best?

    PubMed

    MacNeil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G; van Hout, Hein; de Rooij, Sophia E; Heymans, Martijn W; Bosmans, Judith E

    2016-11-01

    Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing cost-effectiveness data in a randomized controlled trial. Three incomplete data sets were generated from a complete reference data set with 17, 35 and 50 % missing data in effects and costs. The strategies evaluated included complete case analysis (CCA), multiple imputation with predictive mean matching (MI-PMM), MI-PMM on log-transformed costs (log MI-PMM), and a two-step MI. Mean cost and effect estimates, standard errors and incremental net benefits were compared with the results of the analyses on the complete reference data set. The CCA, MI-PMM, and the two-step MI strategy diverged from the results for the reference data set when the amount of missing data increased. In contrast, the estimates of the Log MI-PMM strategy remained stable irrespective of the amount of missing data. MI provided better estimates than CCA in all scenarios. With low amounts of missing data the MI strategies appeared equivalent but we recommend using the log MI-PMM with missing data greater than 35 %.
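
    The log MI-PMM idea can be sketched for a single imputation draw. The code below is illustrative only: it assumes the linear predictor for log costs is already fitted (a real multiple-imputation run would re-estimate it, with parameter draws, for each imputation) and copies an observed cost from one of the k donors whose predicted log cost is closest, so imputed values are always real, plausibly skewed costs.

    ```python
    import random

    def pmm_impute_log_costs(costs, x, beta0, beta1, k=3, rng=None):
        """One predictive-mean-matching draw for skewed costs on the log scale.

        `costs` contains None for missing values; `x` is a fully observed
        covariate; beta0 + beta1*x is the (given) linear predictor for log
        cost.  Each missing cost is replaced by the observed cost of one of
        the k donors with the closest predicted log cost.
        """
        rng = rng or random.Random(0)
        pred = [beta0 + beta1 * xi for xi in x]
        donors = [(pred[i], c) for i, c in enumerate(costs) if c is not None]
        out = list(costs)
        for i, c in enumerate(costs):
            if c is None:
                nearest = sorted(donors, key=lambda d: abs(d[0] - pred[i]))[:k]
                out[i] = rng.choice(nearest)[1]
        return out
    ```

    Because PMM only ever copies observed values, it avoids the negative or implausible costs that a normal-model imputation can produce, which is one reason the log MI-PMM strategy stayed stable in the comparison above.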

  20. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    NASA Astrophysics Data System (ADS)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning in directing clinical research toward possible hidden knowledge is becoming greatly influential in medical areas. Heart disease is a leading cause of death worldwide, and early prevention through efficient methods can help to reduce mortality. Medical data often contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results, although the remaining complete features are still capable of providing information. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a classification algorithm, Decision Tree. The experiment is conducted with the Heart Disease dataset, and the performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the Decision Tree increases after the application of FCMPSO for imputation.

  1. A Probability Based Framework for Testing the Missing Data Mechanism

    ERIC Educational Resources Information Center

    Lin, Johnny Cheng-Han

    2013-01-01

    Many methods exist for imputing missing data but fewer methods have been proposed to test the missing data mechanism. Little (1988) introduced a multivariate chi-square test for the missing completely at random data mechanism (MCAR) that compares observed means for each pattern with expectation-maximization (EM) estimated means. As an alternative,…

  2. Multiple Imputation for Incomplete Data in Epidemiologic Studies

    PubMed Central

    Harel, Ofer; Mitchell, Emily M; Perkins, Neil J; Cole, Stephen R; Tchetgen Tchetgen, Eric J; Sun, BaoLuo; Schisterman, Enrique F

    2018-01-01

    Epidemiologic studies are frequently susceptible to missing information. Omitting observations with missing variables remains a common strategy in epidemiologic studies, yet this simple approach can often severely bias parameter estimates of interest if the values are not missing completely at random. Even when missingness is completely random, complete-case analysis can reduce the efficiency of estimated parameters, because large amounts of available data are simply tossed out with the incomplete observations. Alternative methods for mitigating the influence of missing information, such as multiple imputation, are becoming an increasingly popular strategy to retain all available information, reduce potential bias, and improve efficiency in parameter estimation. In this paper, we describe the theoretical underpinnings of multiple imputation, and we illustrate application of this method as part of a collaborative challenge to assess the performance of various techniques for dealing with missing data (Am J Epidemiol. 2018;187(3):568–575). We detail the steps necessary to perform multiple imputation on a subset of data from the Collaborative Perinatal Project (1959–1974), where the goal is to estimate the odds of spontaneous abortion associated with smoking during pregnancy. PMID:29165547
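
    The pooling step of multiple imputation follows Rubin's rules, which can be stated in a few lines. The sketch below combines M completed-data estimates and their variances into a pooled estimate and standard error: the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/M).

    ```python
    import math

    def pool_rubin(estimates, variances):
        """Pool M completed-data analyses with Rubin's rules.

        Returns the pooled point estimate (the mean of the per-imputation
        estimates) and its standard error, where the total variance is the
        within-imputation variance plus (1 + 1/M) times the
        between-imputation variance.
        """
        m = len(estimates)
        qbar = sum(estimates) / m
        within = sum(variances) / m
        between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
        return qbar, math.sqrt(within + (1 + 1 / m) * between)
    ```

    The between-imputation term is what distinguishes multiple from single imputation: it carries the extra uncertainty due to the missing data into the final standard error.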

  3. Use of Multiple Imputation Method to Improve Estimation of Missing Baseline Serum Creatinine in Acute Kidney Injury Research

    PubMed Central

    Peterson, Josh F.; Eden, Svetlana K.; Moons, Karel G.; Ikizler, T. Alp; Matheny, Michael E.

    2013-01-01

    Background and objectives: Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improved accuracy of estimating missing BCr beyond current recommendations to apply an assumed estimated GFR (eGFR) of 75 ml/min per 1.73 m2 (eGFR 75). Design, setting, participants, & measurements: From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict the likelihood of missing BCr. Propensity scoring identified 6502 patients with the highest likelihood of missing BCr among 13,003 patients with known BCr to simulate a "missing" data scenario while preserving actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI was compared with that of eGFR 75. Results: All multiple-imputation methods except the basic one more closely approximated actual BCr than did eGFR 75. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; P<0.001). Improvements in misclassification were greater in patients with impaired kidney function (full multiple imputation + serum creatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of modestly decreasing sensitivity relative to eGFR 75. Conclusions: Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods. PMID:23037980

  4. A Two-Stage Approach to Missing Data: Theory and Application to Auxiliary Variables

    ERIC Educational Resources Information Center

    Savalei, Victoria; Bentler, Peter M.

    2009-01-01

    A well-known ad-hoc approach to conducting structural equation modeling with missing data is to obtain a saturated maximum likelihood (ML) estimate of the population covariance matrix and then to use this estimate in the complete data ML fitting function to obtain parameter estimates. This 2-stage (TS) approach is appealing because it minimizes a…

  5. A Comparison of Factor Score Estimation Methods in the Presence of Missing Data: Reliability and an Application to Nicotine Dependence

    ERIC Educational Resources Information Center

    Estabrook, Ryne; Neale, Michael

    2013-01-01

    Factor score estimation is a controversial topic in psychometrics, and the estimation of factor scores from exploratory factor models has historically received a great deal of attention. However, both confirmatory factor models and the existence of missing data have generally been ignored in this debate. This article presents a simulation study…

  6. Drogue pose estimation for unmanned aerial vehicle autonomous aerial refueling system based on infrared vision sensor

    NASA Astrophysics Data System (ADS)

    Chen, Shanjun; Duan, Haibin; Deng, Yimin; Li, Cong; Zhao, Guozhi; Xu, Yan

    2017-12-01

    Autonomous aerial refueling is a key technology that can significantly extend the endurance of unmanned aerial vehicles. A reliable method that can accurately estimate the position and attitude of the probe relative to the drogue is the key to such a capability. A drogue pose estimation method based on an infrared vision sensor is introduced with the general goal of yielding an accurate and reliable drogue state estimate. First, by employing direct least squares ellipse fitting and the convex hull in OpenCV, a feature point matching and interference point elimination method is proposed. In addition, considering conditions in which some infrared LEDs are damaged or occluded, a missing point estimation method based on perspective transformation and affine transformation is designed. Finally, an accurate and robust pose estimation algorithm improved by the runner-root algorithm is proposed. The feasibility of the designed visual measurement system is demonstrated by flight test, and the results indicate that our proposed method enables precise and reliable pose estimation of the probe relative to the drogue, even in poor conditions.

  7. 40 CFR 98.75 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data. 98.75 Section 98.75 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Ammonia Manufacturing § 98.75 Procedures for...

  8. 40 CFR 98.75 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data. 98.75 Section 98.75 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Ammonia Manufacturing § 98.75 Procedures for...

  9. 40 CFR 98.75 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data. 98.75 Section 98.75 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Ammonia Manufacturing § 98.75 Procedures for...

  10. 40 CFR 98.75 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data. 98.75 Section 98.75 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Ammonia Manufacturing § 98.75 Procedures for...

  11. 40 CFR 98.75 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data. 98.75 Section 98.75 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Ammonia Manufacturing § 98.75 Procedures for...

  12. 40 CFR 98.45 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data. 98.45 Section 98.45 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Electricity Generation § 98.45 Procedures for...

  13. 40 CFR 98.45 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data. 98.45 Section 98.45 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Electricity Generation § 98.45 Procedures for...

  14. 40 CFR 98.45 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data. 98.45 Section 98.45 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Electricity Generation § 98.45 Procedures for...

  15. 40 CFR 98.45 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data. 98.45 Section 98.45 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Electricity Generation § 98.45 Procedures for...

  16. 40 CFR 98.45 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data. 98.45 Section 98.45 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Electricity Generation § 98.45 Procedures for...

  17. Estimating monthly streamflow values by cokriging

    USGS Publications Warehouse

    Solow, A.R.; Gorelick, S.M.

    1986-01-01

    Cokriging is applied to estimation of missing monthly streamflow values in three records from gaging stations in west central Virginia. Missing values are estimated from optimal consideration of the pattern of auto- and cross-correlation among standardized residual log-flow records. Investigation of the sensitivity of estimation to data configuration showed that when observations are available within two months of a missing value, estimation is improved by accounting for correlation. Concurrent and lag-one observations tend to screen the influence of other available observations. Three models of covariance structure in residual log-flow records are compared using cross-validation. Models differ in how much monthly variation they allow in covariance. Precision of estimation, reflected in mean squared error (MSE), proved to be insensitive to this choice. Cross-validation is suggested as a tool for choosing an inverse transformation when an initial nonlinear transformation is applied to flow values. © 1986 Plenum Publishing Corporation.

  18. A Review On Missing Value Estimation Using Imputation Algorithm

    NASA Astrophysics Data System (ADS)

    Armina, Roslan; Zain, Azlan Mohd; Azizah Ali, Nor; Sallehuddin, Roselina

    2017-09-01

    The presence of missing values in a data set has always been a major obstacle to precise prediction. A method for imputing missing values needs to minimize the effect of incomplete data sets on the prediction model. Many algorithms have been proposed to counter the missing value problem. In this review, we provide a comprehensive analysis of existing imputation algorithms, focusing on the techniques used and on whether global or local information in the data set is exploited for missing value estimation. In addition, validation methods for imputation results and ways to measure the performance of imputation algorithms are also described. The objective of this review is to highlight possible improvements to existing methods, and it is hoped that it gives readers a better understanding of trends in imputation methods.
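
As a concrete illustration of the global-versus-local distinction this review draws, the sketch below imputes one missing entry first with the column mean (global information) and then from the nearest neighbour on the complete feature (local information). The toy values are hypothetical.

```python
# Toy dataset with a missing entry (None) in the last row's second feature.
data = [
    [1.0, 2.0],
    [1.1, 2.2],
    [5.0, 8.0],
    [5.2, None],
]

# Global approach: impute with the column mean of observed values.
observed = [row[1] for row in data if row[1] is not None]
mean_imputed = sum(observed) / len(observed)

# Local approach: impute from the nearest neighbour on the complete feature.
target = data[3]
neighbours = [row for row in data if row[1] is not None]
nearest = min(neighbours, key=lambda row: abs(row[0] - target[0]))
knn_imputed = nearest[1]

print(mean_imputed, knn_imputed)
```

Here the local estimate (8.0, from the nearby third row) tracks the structure of the data, while the global mean pulls the estimate toward the unrelated first two rows.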

  19. Parameter estimation in Cox models with missing failure indicators and the OPPERA study.

    PubMed

    Brownstein, Naomi C; Cai, Jianwen; Slade, Gary D; Bair, Eric

    2015-12-30

    In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorder (TMD) is a physical examination by a trained clinician. In large studies, examining all participants in this manner is infeasible. Instead, it is common to use questionnaires to screen for incidence of TMD and perform the "gold standard" examination only on participants who screen positively. Unfortunately, some participants may leave the study before receiving the "gold standard" examination. Within the framework of survival analysis, this results in missing failure indicators. Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose a method for parameter estimation in survival models with missing failure indicators. We estimate the probability of being an incident case for those lacking a "gold standard" examination using logistic regression. These estimated probabilities are used to generate multiple imputations of case status for each missing examination that are combined with observed data in appropriate regression models. The variance introduced by the procedure is estimated using multiple imputation. The method can be used to estimate both regression coefficients in Cox proportional hazard models as well as incidence rates using Poisson regression. We simulate data with missing failure indicators and show that our method performs as well as or better than competing methods. Finally, we apply the proposed method to data from the OPPERA study. Copyright © 2015 John Wiley & Sons, Ltd.
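
A heavily simplified sketch of the imputation scheme described above, with an intercept-only estimate standing in for the paper's logistic regression: the probability of being a case is estimated from examined participants, missing failure indicators are multiply imputed, and the imputed incidence estimates are pooled. All counts are hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical cohort: 1 = confirmed case, 0 = non-case, None = screened
# positive but never received the "gold standard" examination.
status = [1, 0, 1, 1, 0, 0, 1, 0, None, None, None, 1, 0, None]

# Step 1: estimate P(case) for the unexamined from the examined participants
# (an intercept-only stand-in for the paper's logistic regression).
examined = [s for s in status if s is not None]
p_case = sum(examined) / len(examined)

# Step 2: multiply impute case status and pool across imputations.
M = 50
rates = []
for _ in range(M):
    completed = [s if s is not None else int(random.random() < p_case)
                 for s in status]
    rates.append(sum(completed) / len(completed))

pooled = statistics.mean(rates)           # combined incidence estimate
between_var = statistics.variance(rates)  # between-imputation variance
print(round(pooled, 3))
```

In the actual method the imputation model conditions on covariates, and Rubin's rules combine the within- and between-imputation variances for inference.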

  20. Comparing the accuracy and precision of three techniques used for estimating missing landmarks when reconstructing fossil hominin crania.

    PubMed

    Neeser, Rudolph; Ackermann, Rebecca Rogers; Gain, James

    2009-09-01

    Various methodological approaches have been used for reconstructing fossil hominin remains in order to increase sample sizes and to better understand morphological variation. Among these, morphometric quantitative techniques for reconstruction are increasingly common. Here we compare the accuracy of three approaches--mean substitution, thin plate splines, and multiple linear regression--for estimating missing landmarks of damaged fossil specimens. Comparisons are made varying the number of missing landmarks, sample sizes, and the reference species of the population used to perform the estimation. The testing is performed on landmark data from individuals of Homo sapiens, Pan troglodytes and Gorilla gorilla, and nine hominin fossil specimens. Results suggest that when a small, same-species fossil reference sample is available to guide reconstructions, thin plate spline approaches perform best. However, if no such sample is available (or if the species of the damaged individual is uncertain), estimates of missing morphology based on a single individual (or even a small sample) of close taxonomic affinity are less accurate than those based on a large sample of individuals drawn from more distantly related extant populations using a technique (such as a regression method) able to leverage the information (e.g., variation/covariation patterning) contained in this large sample. Thin plate splines also show an unexpectedly large amount of error in estimating landmarks, especially over large areas. Recommendations are made for estimating missing landmarks under various scenarios. Copyright 2009 Wiley-Liss, Inc.
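
The two simplest of the compared techniques can be sketched in a few lines: mean substitution ignores the damaged specimen entirely, while a regression estimate exploits covariation between landmarks in the reference sample. The coordinates below are hypothetical and one-dimensional for clarity.

```python
# Reference sample: x-coordinates of two landmarks for five complete specimens.
ref = [(10.0, 20.1), (11.0, 22.0), (12.0, 23.9), (13.0, 26.1), (14.0, 28.0)]

# Damaged specimen: landmark 1 observed, landmark 2 missing.
observed_l1 = 12.5

# (a) Mean substitution: ignore the specimen, use the reference mean.
mean_sub = sum(l2 for _, l2 in ref) / len(ref)

# (b) Regression: exploit covariation between landmarks in the reference
# sample (the property that lets a large extant sample guide reconstruction).
n = len(ref)
mx = sum(l1 for l1, _ in ref) / n
my = sum(l2 for _, l2 in ref) / n
slope = (sum((l1 - mx) * (l2 - my) for l1, l2 in ref)
         / sum((l1 - mx) ** 2 for l1, _ in ref))
reg_est = my + slope * (observed_l1 - mx)

print(round(mean_sub, 2), round(reg_est, 2))
```

The regression estimate responds to the specimen's observed landmark, while mean substitution returns the same value for every damaged individual; thin plate splines (the third technique) generalize this to smooth deformations of a whole landmark configuration.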

  1. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider the situation of estimating Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching (PMM) imputation method. We show that all approaches can reduce bias due to a non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and to mis-specification of their link functions. In contrast, the PMM method is sensitive to mis-specification of the covariates included in imputation, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
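
The nearest-neighbour imputing set described above can be sketched as follows, with the two working-model scores taken as given (in the paper they come from fitted regression models). Subjects and scores are hypothetical.

```python
import random

random.seed(1)

# Hypothetical subjects: (score from the covariate-prediction working model,
# score from the missingness-probability working model, covariate or None).
subjects = [
    (0.9, 0.2, 3.1), (1.0, 0.25, 3.3), (1.1, 0.3, 3.0),
    (2.0, 0.7, 6.1), (2.1, 0.75, 5.9), (1.05, 0.28, None),
]

donors = [s for s in subjects if s[2] is not None]
target = subjects[-1]

# Distance combines both working-model scores, so the imputing set is close
# in terms of both the predicted covariate and the predicted missingness.
def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Nearest-neighbour imputing set: the 3 closest donors.
nn_set = sorted(donors, key=lambda s: dist(s, target))[:3]

# Multiple imputation: draw repeatedly from the imputing set.
imputations = [random.choice(nn_set)[2] for _ in range(5)]
print(imputations)
```

Because each imputation is an observed donor value rather than a model prediction, mild mis-specification of either working model only perturbs which donors enter the set, which is the intuition behind the method's robustness.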

  2. Integrated built-in-test false and missed alarms reduction based on forward infinite impulse response & recurrent finite impulse response dynamic neural networks

    NASA Astrophysics Data System (ADS)

    Cui, Yiqian; Shi, Junyou; Wang, Zili

    2017-11-01

    Built-in tests (BITs) are widely used in mechanical systems to perform state identification, but BIT false and missed alarms can mislead operators or other users into incorrect judgments. Artificial neural networks (ANN), which have features such as self-organization and self-learning, have previously been used to identify false and missed alarms. However, these ANN models generally do not incorporate the temporal effect of the bottom-level threshold comparison outputs, and historical temporal features are not fully considered. To improve the situation, this paper proposes a new integrated BIT design methodology incorporating a novel type of dynamic neural network (DNN) model. The new DNN model is termed the Forward IIR & Recurrent FIR DNN (FIRF-DNN); its component neurons, network structure, and input/output relationships are discussed. The condition monitoring false and missed alarms reduction scheme based on the FIRF-DNN model is also illustrated, which is composed of three stages: model training, false and missed alarms detection, and false and missed alarms suppression. Finally, the proposed methodology is demonstrated in an application study and the experimental results are analyzed.

  3. Fault-tolerant feature-based estimation of space debris rotational motion during active removal missions

    NASA Astrophysics Data System (ADS)

    Biondi, Gabriele; Mauro, Stefano; Pastorelli, Stefano; Sorli, Massimo

    2018-05-01

    One of the key functionalities required by an Active Debris Removal mission is the assessment of the target's kinematics and inertial properties. Passive sensors, such as stereo cameras, are often included in the onboard instrumentation of a chaser spacecraft for capturing sequential photographs and for tracking features of the target surface. Many methods, based on Kalman filtering, are available for the estimation of the target's state from feature positions; however, to guarantee filter convergence, they typically require continuity of measurements and the capability of tracking a fixed set of pre-defined features of the object. These requirements clash with actual tracking conditions: failures in feature detection often occur, and the assumption of a-priori knowledge about the shape of the target can be restrictive. The aim of the present work is to propose a fault-tolerant alternative method for estimating the angular velocity and the relative magnitudes of the principal moments of inertia of the target. Raw data regarding the positions of the tracked features are processed to evaluate corrupted values of a three-dimensional parameter which entirely describes the finite screw motion of the debris and which is largely invariant to the particular set of considered features of the object. Missing values of the parameter are completely restored by exploiting the typical periodicity of the rotational motion of an uncontrolled satellite: compressed sensing techniques, typically adopted for recovering images or for prognostic applications, are herein used in a completely original fashion for retrieving a kinematic signal that is sparse in the frequency domain. Owing to this invariance to the feature set, no assumptions are needed about the target's shape or the continuity of tracking.
The obtained signal is useful for the indirect evaluation of an attitude signal that feeds an unscented Kalman filter for the estimation of the global rotational state of the target. The results of the computer simulations showed a good robustness of the method and its potential applicability for general motion conditions of the target.
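
A much-simplified stand-in for the compressed-sensing recovery step: because the kinematic signal is sparse in the frequency domain, a single matching-pursuit atom fitted to the observed samples suffices to restore the gaps in this toy example (cosine/sine cross terms are ignored, so the fit is approximate). The signal, frequency, and gap locations are hypothetical.

```python
import math

# Hypothetical periodic kinematic signal sampled at 40 instants, with gaps
# (None) where feature tracking failed.
N = 40
true_freq = 3  # cycles over the window; the signal is sparse in frequency
signal = [math.cos(2 * math.pi * true_freq * t / N + 0.4) for t in range(N)]
for t in (5, 6, 17, 28, 29, 30):
    signal[t] = None

obs = [(t, v) for t, v in enumerate(signal) if v is not None]

# One-atom matching pursuit: over candidate frequencies, least-squares fit
# a*cos + b*sin to the observed samples and keep the best-fitting atom.
best = None
for k in range(1, N // 2):
    cc = sum(v * math.cos(2 * math.pi * k * t / N) for t, v in obs)
    cs = sum(v * math.sin(2 * math.pi * k * t / N) for t, v in obs)
    scc = sum(math.cos(2 * math.pi * k * t / N) ** 2 for t, _ in obs)
    sss = sum(math.sin(2 * math.pi * k * t / N) ** 2 for t, _ in obs)
    energy = cc ** 2 / scc + cs ** 2 / sss
    if best is None or energy > best[0]:
        best = (energy, k, cc / scc, cs / sss)

_, k, a, b = best

# Restore the missing samples from the recovered sparse representation.
restored = [a * math.cos(2 * math.pi * k * t / N)
            + b * math.sin(2 * math.pi * k * t / N)
            if v is None else v for t, v in enumerate(signal)]
print(k)
```

Full compressed-sensing recovery would solve an L1-regularized problem over many frequency atoms at once; with a single dominant frequency, one greedy atom already restores the gaps closely.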

  4. Handling Missing Data in Educational Research Using SPSS

    ERIC Educational Resources Information Center

    Cheema, Jehanzeb

    2012-01-01

    This study looked at the effect of a number of factors such as the choice of analytical method, the handling method for missing data, sample size, and proportion of missing data, in order to evaluate the effect of missing data treatment on accuracy of estimation. In order to accomplish this a methodological approach involving simulated data was…

  5. MISSE-6 hardware

    NASA Image and Video Library

    2009-09-02

    ISS020-E-037372 (1 Sept. 2009) --- A close-up view of a Materials International Space Station Experiment (MISSE-6) on the exterior of the Columbus laboratory is featured in this image photographed by a space walking astronaut during the STS-128 mission’s first session of extravehicular activity (EVA). MISSE collects information on how different materials weather in the environment of space. MISSE was later placed in Space Shuttle Discovery’s payload bay for its return to Earth. A portion of a payload bay door is visible in the background.

  6. MISSE-6 hardware

    NASA Image and Video Library

    2009-09-02

    ISS020-E-037369 (1 Sept. 2009) --- A close-up view of a Materials International Space Station Experiment (MISSE-6) on the exterior of the Columbus laboratory is featured in this image photographed by a space walking astronaut during the STS-128 mission’s first session of extravehicular activity (EVA). MISSE collects information on how different materials weather in the environment of space. MISSE was later placed in Space Shuttle Discovery’s payload bay for its return to Earth. A portion of a payload bay door is visible in the background.

  7. Classification and data acquisition with incomplete data

    NASA Astrophysics Data System (ADS)

    Williams, David P.

    In remote-sensing applications, incomplete data can result when only a subset of sensors (e.g., radar, infrared, acoustic) are deployed at certain regions. The limitations of single sensor systems have spurred interest in employing multiple sensor modalities simultaneously. For example, in land mine detection tasks, different sensor modalities are better-suited to capture different aspects of the underlying physics of the mines. Synthetic aperture radar sensors may be better at detecting surface mines, while infrared sensors may be better at detecting buried mines. By employing multiple sensor modalities to address the detection task, the strengths of the disparate sensors can be exploited in a synergistic manner to improve performance beyond that which would be achievable with either single sensor alone. When multi-sensor approaches are employed, however, incomplete data can be manifested. If each sensor is located on a separate platform ( e.g., aircraft), each sensor may interrogate---and hence collect data over---only partially overlapping areas of land. As a result, some data points may be characterized by data (i.e., features) from only a subset of the possible sensors employed in the task. Equivalently, this scenario implies that some data points will be missing features. Increasing focus in the future on using---and fusing data from---multiple sensors will make such incomplete-data problems commonplace. In many applications involving incomplete data, it is possible to acquire the missing data at a cost. In multi-sensor remote-sensing applications, data is acquired by deploying sensors to data points. Acquiring data is usually an expensive, time-consuming task, a fact that necessitates an intelligent data acquisition process. Incomplete data is not limited to remote-sensing applications, but rather, can arise in virtually any data set. In this dissertation, we address the general problem of classification when faced with incomplete data. 
We also address the closely related problem of active data acquisition, which develops a strategy to acquire missing features and labels that will most benefit the classification task. We first address the general problem of classification with incomplete data, maintaining the view that all data (i.e., information) is valuable. We employ a logistic regression framework within which we formulate a supervised classification algorithm for incomplete data. This principled, yet flexible, framework permits several interesting extensions that allow all available data to be utilized. One extension incorporates labeling error, which permits the usage of potentially imperfectly labeled data in learning a classifier. A second major extension converts the proposed algorithm to a semi-supervised approach by utilizing unlabeled data via graph-based regularization. Finally, the classification algorithm is extended to the case in which (image) data---from which features are extracted---are available from multiple resolutions. Taken together, this family of incomplete-data classification algorithms exploits all available data in a principled manner by avoiding explicit imputation. Instead, missing data is integrated out analytically with the aid of an estimated conditional density function (conditioned on the observed features). This feat is accomplished by invoking only mild assumptions. We also address the problem of active data acquisition by determining which missing data should be acquired to most improve performance. Specifically, we examine this data acquisition task when the data to be acquired can be either labels or features. The proposed approach is based on a criterion that accounts for the expected benefit of the acquisition. This approach, which is applicable for any general missing data problem, exploits the incomplete-data classification framework introduced in the first part of this dissertation. 
This data acquisition approach allows for the acquisition of both labels and features. Moreover, several types of feature acquisition are permitted, including the acquisition of individual or multiple features for individual or multiple data points, which may be either labeled or unlabeled. Furthermore, if different types of data acquisition are feasible for a given application, the algorithm will automatically determine the most beneficial type of data to acquire. Experimental results on both benchmark machine learning data sets and real (i.e., measured) remote-sensing data demonstrate the advantages of the proposed incomplete-data classification and active data acquisition algorithms.
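
The marginalization idea in the first part of the dissertation can be sketched by Monte Carlo: rather than imputing a single value, the class probability is averaged over an (assumed) conditional density of the missing feature. The weights, the Gaussian conditional, and all numbers below are hypothetical.

```python
import math
import random

random.seed(2)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained logistic-regression weights for two features plus bias.
w = [1.5, -2.0]
bias = 0.1

# A data point with feature 0 observed and feature 1 missing.
x0 = 0.8

# Assumed conditional density of the missing feature given the observed one
# (in the dissertation this density is estimated from the data).
cond_mean = 0.5 * x0
cond_sd = 0.3

# Integrate the missing feature out by Monte Carlo instead of imputing a
# single value: average the class probability over the conditional density.
draws = [random.gauss(cond_mean, cond_sd) for _ in range(5000)]
p = sum(sigmoid(w[0] * x0 + w[1] * x1 + bias) for x1 in draws) / len(draws)

print(round(p, 3))
```

Averaging probabilities over the conditional density, rather than evaluating the classifier at a single imputed value, propagates the uncertainty about the missing feature into the prediction.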

  8. TV Crime Reporter Missed Clues | NIH MedlinePlus the Magazine

    MedlinePlus

    ... Feature: Women and Heart Disease. TV Crime Reporter Missed Clues. Past Issues / Spring 2016 Table ... heart attack at the age of 36. A crime reporter for WJLA-TV in Washington, D.C., ...

  9. Estimating a Missing Examination Score

    ERIC Educational Resources Information Center

    Loui, Michael C.; Lin, Athena

    2017-01-01

    In science and engineering courses, instructors administer multiple examinations as major assessments of students' learning. When a student is unable to take an exam, the instructor might estimate the missing exam score to calculate the student's course grade. Using exam score data from multiple offerings of two large courses at a public…

  10. Applied Missing Data Analysis. Methodology in the Social Sciences Series

    ERIC Educational Resources Information Center

    Enders, Craig K.

    2010-01-01

    Walking readers step by step through complex concepts, this book translates missing data techniques into something that applied researchers and graduate students can understand and utilize in their own research. Enders explains the rationale and procedural details for maximum likelihood estimation, Bayesian estimation, multiple imputation, and…

  11. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... to determine combined process and combustion CO2 emissions, the missing data procedures in § 98.35 apply. (b) For CO2 process emissions from cement manufacturing facilities calculated according to § 98... best available estimate of the monthly clinker production based on information used for accounting...

  12. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... measured parameters used in the GHG emissions calculations is required (e.g., carbon content values, etc... such estimates. (a) For each missing value of the monthly carbon content of calcined petroleum coke the substitute data value shall be the arithmetic average of the quality-assured values of carbon contents for...

  13. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying the development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian networks (static Bayesian network, parameter EM, and structural EM), are compared on children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, the static Bayesian network is used for our data, which contain around 5% missing values, as it provides adequate accuracy at low computational cost.

  14. A sensitivity analysis for missing outcomes due to truncation by death under the matched-pairs design.

    PubMed

    Imai, Kosuke; Jiang, Zhichao

    2018-04-29

    The matched-pairs design enables researchers to efficiently infer causal effects from randomized experiments. In this paper, we exploit the key feature of the matched-pairs design and develop a sensitivity analysis for missing outcomes due to truncation by death, in which the outcomes of interest (e.g., quality of life measures) are not even well defined for some units (e.g., deceased patients). Our key idea is that if 2 nearly identical observations are paired prior to the randomization of the treatment, the missingness of one unit's outcome is informative about the potential missingness of the other unit's outcome under an alternative treatment condition. We consider the average treatment effect among always-observed pairs (ATOP) whose units exhibit no missing outcome regardless of their treatment status. The naive estimator based on available pairs is unbiased for the ATOP if 2 units of the same pair are identical in terms of their missingness patterns. The proposed sensitivity analysis characterizes how the bounds of the ATOP widen as the degree of the within-pair similarity decreases. We further extend the methodology to the matched-pairs design in observational studies. Our simulation studies show that informative bounds can be obtained under some scenarios when the proportion of missing data is not too large. The proposed methodology is also applied to the randomized evaluation of the Mexican universal health insurance program. An open-source software package is available for implementing the proposed research. Copyright © 2018 John Wiley & Sons, Ltd.

  15. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369

  17. Moderation analysis with missing data in the predictors.

    PubMed

    Zhang, Qian; Wang, Lijuan

    2017-12-01

    The most widely used statistical model for conducting moderation analysis is the moderated multiple regression (MMR) model. In MMR modeling, missing data can pose a challenge, mainly because the interaction term is a product of two or more variables and thus is a nonlinear function of the involved variables. In this study, we consider a simple MMR model, where the effect of the focal predictor X on the outcome Y is moderated by a moderator U. The primary interest is to find ways of estimating and testing the moderation effect in the presence of missing data in X. We mainly focus on cases where X is missing completely at random (MCAR) or missing at random (MAR). Three methods are compared: (a) normal-distribution-based maximum likelihood estimation (NML); (b) normal-distribution-based multiple imputation (NMI); and (c) Bayesian estimation (BE). Via simulations, we found that NML and NMI can lead to biased estimates of moderation effects under the MAR missingness mechanism. The BE method outperformed NMI and NML for MMR modeling with missing data in the focal predictor, missingness depending on the moderator and/or auxiliary variables, and a correctly specified distribution for the focal predictor. More robust BE methods are still needed to address mis-specification of the focal predictor's distribution. An empirical example is used to illustrate the application of the methods with a simple sensitivity analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
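
The nonlinearity problem noted above, the interaction term being a product of variables, can be seen in a small simulation: single mean imputation of X (a deliberately naive stand-in for the methods compared in the paper) flattens the X*U term for imputed rows and attenuates its covariance with the outcome. The data-generating model and numbers are hypothetical.

```python
import random
import statistics

random.seed(3)

# Simulate a moderated relationship: Y = X + U + 0.5*X*U + noise.
n = 2000
X = [random.gauss(0, 1) for _ in range(n)]
U = [random.gauss(0, 1) for _ in range(n)]
Y = [x + u + 0.5 * x * u + random.gauss(0, 0.5) for x, u in zip(X, U)]

# Make 40% of X missing completely at random, then mean-impute.
missing = [random.random() < 0.4 for _ in range(n)]
x_obs = [x for x, m in zip(X, missing) if not m]
x_mean = statistics.mean(x_obs)
X_imp = [x_mean if m else x for x, m in zip(X, missing)]

def cov(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# The interaction term is a nonlinear function of X, so single mean
# imputation makes X*U nearly constant times U for the imputed rows,
# attenuating its relationship with Y.
xu_true = [x * u for x, u in zip(X, U)]
xu_imp = [x * u for x, u in zip(X_imp, U)]
c_true = cov(xu_true, Y)
c_imp = cov(xu_imp, Y)
print(round(c_true, 2), round(c_imp, 2))
```

This is why missing-data methods for MMR must handle the interaction term explicitly (as the likelihood-based and Bayesian methods in the paper do) rather than imputing X in isolation.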

  18. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets.

    PubMed

    Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem: estimates for the missing values are provided by a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates of the missing values may not be reliable, or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation over the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results over the mixed data type of medical datasets. However, instance selection does not have a definitively positive impact on the imputation result for categorical medical datasets.
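
The combination studied above can be sketched with a toy rule: a z-score filter (a deliberately crude stand-in for DROP3/GA/IB3) removes an outlying donor record before k-nearest-neighbour imputation. Records and thresholds are hypothetical.

```python
import statistics

# Hypothetical medical records: (age, blood_pressure); one donor is noisy.
donors = [(40, 120.0), (42, 124.0), (44, 126.0), (41, 300.0), (43, 125.0)]
patient_age, patient_bp = 42, None  # blood pressure to impute

# Instance selection (simple z-score stand-in for DROP3/GA/IB3): drop donors
# whose attribute lies far from the rest before imputing from them.
bps = [bp for _, bp in donors]
mu, sd = statistics.mean(bps), statistics.stdev(bps)
clean = [(a, bp) for a, bp in donors if abs(bp - mu) < 1.5 * sd]

# KNNI-style imputation: average the k nearest donors by age.
def knn_impute(pool, age, k=3):
    nearest = sorted(pool, key=lambda d: abs(d[0] - age))[:k]
    return sum(bp for _, bp in nearest) / k

print(knn_impute(donors, patient_age), knn_impute(clean, patient_age))
```

Without instance selection the 300.0 outlier enters the neighbour set and drags the imputed value far from the patient's peers; filtering it first yields a plausible estimate.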

  19. Should multiple imputation be the method of choice for handling missing data in randomized trials?

    PubMed Central

    Sullivan, Thomas R; White, Ian R; Salter, Amy B; Ryan, Philip; Lee, Katherine J

    2016-01-01

    The use of multiple imputation has increased markedly in recent years, and journal reviewers may expect to see multiple imputation used to handle missing data. However in randomized trials, where treatment group is always observed and independent of baseline covariates, other approaches may be preferable. Using data simulation we evaluated multiple imputation, performed both overall and separately by randomized group, across a range of commonly encountered scenarios. We considered both missing outcome and missing baseline data, with missing outcome data induced under missing at random mechanisms. Provided the analysis model was correctly specified, multiple imputation produced unbiased treatment effect estimates, but alternative unbiased approaches were often more efficient. When the analysis model overlooked an interaction effect involving randomized group, multiple imputation produced biased estimates of the average treatment effect when applied to missing outcome data, unless imputation was performed separately by randomized group. Based on these results, we conclude that multiple imputation should not be seen as the only acceptable way to handle missing data in randomized trials. In settings where multiple imputation is adopted, we recommend that imputation is carried out separately by randomized group. PMID:28034175
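
The paper's recommendation can be illustrated with single mean imputation (a deliberately simple stand-in for multiple imputation): imputing the outcome overall shrinks the treatment contrast toward zero, while imputing separately by randomized group preserves it. The outcomes are hypothetical.

```python
import statistics

# Hypothetical trial outcomes by randomized group; None = missing outcome.
control = [5.0, 6.0, 5.5, None, 6.5]
treated = [8.0, 9.0, None, 8.5, 9.5]

def mean_obs(xs):
    return statistics.mean([x for x in xs if x is not None])

# Imputation performed overall ignores the randomized group...
overall = mean_obs(control + treated)
c_overall = [overall if x is None else x for x in control]
t_overall = [overall if x is None else x for x in treated]

# ...while imputation performed separately by group preserves the contrast.
c_sep = [mean_obs(control) if x is None else x for x in control]
t_sep = [mean_obs(treated) if x is None else x for x in treated]

effect_overall = statistics.mean(t_overall) - statistics.mean(c_overall)
effect_sep = statistics.mean(t_sep) - statistics.mean(c_sep)
print(round(effect_overall, 3), round(effect_sep, 3))
```

Proper multiple imputation would also draw residual noise and pool over several imputed datasets, but the same group-versus-overall logic applies: an overall imputation model that omits the group term biases the treatment effect toward zero.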

  1. Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    PubMed

    Taylor, Sandra L; Leiserowitz, Gary S; Kim, Kyoungmi

    2013-12-01

    Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, the data generated by mass spectrometry contain many missing values, which result when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation of a mixture model to those of an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, while estimates were unbiased with the mixture model except when all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.

  2. View of MISSE PEC taken during STS-118/Expedition 15 Joint Operations

    NASA Image and Video Library

    2007-08-13

    ISS015-E-22410 (13 Aug. 2007) --- Backdropped by a blue and white Earth, a Materials International Space Station Experiment (MISSE) on the exterior of the station is featured in this image photographed by a crewmember during the STS-118 mission's second planned session of extravehicular activity (EVA). MISSE collects information on how different materials weather in the environment of space.

  3. Fatality estimator user’s guide

    USGS Publications Warehouse

    Huso, Manuela M.; Som, Nicholas; Ladd, Lew

    2012-12-11

    Only carcasses judged to have been killed after the previous search should be included in the fatality data set submitted to this estimator software. This estimator already corrects for carcasses missed in previous searches, so carcasses judged to have been missed at least once should be considered “incidental” and not included in the fatality data set used to estimate fatality. Note: When observed carcass count is <5 (including 0 for species known to be at risk, but not observed), USGS Data Series 881 (http://pubs.usgs.gov/ds/0881/) is recommended for fatality estimation.

  4. A Bayesian approach to assess heart disease mortality among persons with diabetes in the presence of missing data.

    PubMed

    Cadwell, Betsy L; Boyle, James P; Tierney, Edward F; Thompson, Theodore J

    2007-09-01

    Some states' death certificate form includes a diabetes yes/no check box that enables policy makers to investigate the change in heart disease mortality rates by diabetes status. Because the check boxes are sometimes unmarked, a method accounting for missing data is needed when estimating heart disease mortality rates by diabetes status. Using North Dakota's data (1992-2003), we generate the posterior distribution of diabetes status to estimate diabetes status among those with heart disease and an unmarked check box using Monte Carlo methods. Combining this estimate with the number of death certificates with known diabetes status provides a numerator for heart disease mortality rates. Denominators for rates were estimated from the North Dakota Behavioral Risk Factor Surveillance System. Accounting for missing data, age-adjusted heart disease mortality rates (per 1,000) among women with diabetes were 8.6 during 1992-1998 and 6.7 during 1999-2003. Among men with diabetes, rates were 13.0 during 1992-1998 and 10.0 during 1999-2003. The Bayesian approach accounted for the uncertainty due to missing diabetes status as well as the uncertainty in estimating the populations with diabetes.

  5. A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy).

    PubMed

    Lo Presti, Rossella; Barca, Emanuele; Passarella, Giuseppe

    2010-01-01

    Environmental time series are often affected by missing data, and when analyzing the data statistically, the need to fill in the gaps by estimating the missing values must be considered. At present, a large number of statistical techniques are available to achieve this objective; they range from very simple methods, such as using the sample mean, to very sophisticated ones, such as multiple imputation. A new methodology for missing data estimation is proposed, which merges the obvious advantages of the simplest techniques (e.g. their ease of implementation) with the strength of the newest ones. The proposed method consists of two consecutive stages: once it has been ascertained that a specific monitoring station is affected by missing data, the "most similar" monitoring stations are identified among neighbouring stations on the basis of a suitable similarity coefficient; in the second stage, a regressive method is applied in order to estimate the missing data. In this paper, four different regressive methods are applied and compared, in order to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin located in South Italy.
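    The two-stage idea can be sketched as follows. The similarity coefficient and the regressive method are deliberately generic here (Pearson correlation and simple linear regression); the paper compares four specific regressive methods, none of which this sketch claims to reproduce, and the station data below are synthetic.

```python
import numpy as np

def fill_gaps(target, neighbours):
    """Two-stage gap filling (generic sketch): pick the 'most similar'
    neighbouring station by correlation on jointly observed days, then
    fill the gaps by simple linear regression on that station."""
    filled = target.copy()
    gaps = np.isnan(target)
    # Stage 1: similarity = Pearson correlation over common observations
    best, best_r = None, -np.inf
    for s in neighbours:
        both = ~np.isnan(target) & ~np.isnan(s)
        r = np.corrcoef(target[both], s[both])[0, 1]
        if r > best_r:
            best, best_r = s, r
    # Stage 2: regress the target series on the most similar station
    both = ~np.isnan(target) & ~np.isnan(best)
    X = np.column_stack([np.ones(both.sum()), best[both]])
    beta, *_ = np.linalg.lstsq(X, target[both], rcond=None)
    usable = gaps & ~np.isnan(best)
    filled[usable] = beta[0] + beta[1] * best[usable]
    return filled, best_r

# Synthetic daily rainfall-like series: one informative neighbour, one not
rng = np.random.default_rng(1)
s1 = rng.gamma(2.0, 3.0, 365)                 # highly correlated neighbour
s2 = rng.gamma(2.0, 3.0, 365)                 # unrelated neighbour
target = 2.0 * s1 + rng.normal(0, 1, 365)
target[[10, 50, 200]] = np.nan                # gaps to reconstruct
filled, r = fill_gaps(target, [s1, s2])
```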

  6. Inferential Precision in Single-Case Time-Series Data Streams: How Well Does the EM Procedure Perform When Missing Observations Occur in Autocorrelated Data?

    PubMed Central

    Smith, Justin D.; Borckardt, Jeffrey J.; Nash, Michael R.

    2013-01-01

    The case-based time-series design is a viable methodology for treatment outcome research. However, the literature has not fully addressed the problem of missing observations with such autocorrelated data streams. Mainly, to what extent do missing observations compromise inference when observations are not independent? Do the available missing data replacement procedures preserve inferential integrity? Does the extent of autocorrelation matter? We use Monte Carlo simulation modeling of a single-subject intervention study to address these questions. We find power sensitivity to be within acceptable limits across four proportions of missing observations (10%, 20%, 30%, and 40%) when missing data are replaced using the Expectation-Maximization Algorithm, more commonly known as the EM Procedure (Dempster, Laird, & Rubin, 1977). This applies to data streams with lag-1 autocorrelation estimates under 0.80. As autocorrelation estimates approach 0.80, the replacement procedure yields an unacceptable power profile. The implications of these findings and directions for future research are discussed. PMID:22697454
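    The flavor of EM-style replacement in an autocorrelated stream can be sketched as below. This is a simplified alternation, not the exact Dempster-Laird-Rubin formulation: re-estimate the lag-1 autocorrelation from the completed series, then replace each missing point with its conditional expectation given its neighbours under a stationary AR(1) model.

```python
import numpy as np

def em_impute_ar1(y, n_iter=50):
    """Simplified EM-style imputation for an AR(1) data stream (sketch):
    alternate between estimating the lag-1 autocorrelation phi from the
    completed series and filling missing points with their conditional
    mean given the neighbouring values."""
    z = y.copy()
    mis = np.isnan(y)
    z[mis] = np.nanmean(y)                    # crude initial fill
    for _ in range(n_iter):
        zc = z - z.mean()
        phi = np.sum(zc[1:] * zc[:-1]) / np.sum(zc[:-1] ** 2)
        for i in np.flatnonzero(mis):
            if 0 < i < len(z) - 1:
                # E[z_i | z_{i-1}, z_{i+1}] for a stationary AR(1)
                z[i] = z.mean() + phi * (zc[i - 1] + zc[i + 1]) / (1 + phi ** 2)
            elif i > 0:
                z[i] = z.mean() + phi * zc[i - 1]
    return z, phi

# Simulate an AR(1) stream (phi = 0.6) and knock out 10% of observations
rng = np.random.default_rng(2)
n, phi_true = 500, 0.6
e = rng.normal(size=n)
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + e[t]
obs = y.copy()
obs[rng.choice(n, 50, replace=False)] = np.nan

z, phi_hat = em_impute_ar1(obs)
```

    A full EM treatment would also propagate the imputation uncertainty; conditional-mean filling, as here, slightly smooths the series, which is consistent with the entry's caution at high autocorrelation.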

  7. Modeling missing data in knowledge space theory.

    PubMed

    de Chiusole, Debora; Stefanutti, Luca; Anselmi, Pasquale; Robusto, Egidio

    2015-12-01

    Missing data are a well-known issue in statistical inference, because some responses may be missing, even when data are collected carefully. The problem that arises in these cases is how to deal with missing data. In this article, the missingness is analyzed in knowledge space theory, and in particular when the basic local independence model (BLIM) is applied to the data. Two extensions of the BLIM to missing data are proposed: The former, called ignorable missing BLIM (IMBLIM), assumes that missing data are missing completely at random; the latter, called missing BLIM (MissBLIM), introduces specific dependencies of the missing data on the knowledge states, thus assuming that the missing data are missing not at random. The IMBLIM and the MissBLIM modeled the missingness in a satisfactory way, in both a simulation study and an empirical application, depending on the process that generates the missingness: If the missing data-generating process is of type missing completely at random, then either IMBLIM or MissBLIM provide adequate fit to the data. However, if the pattern of missingness is functionally dependent upon unobservable features of the data (e.g., missing answers are more likely to be wrong), then only a correctly specified model of the missingness distribution provides an adequate fit to the data. (c) 2015 APA, all rights reserved.

  8. Causal inference with missing exposure information: Methods and applications to an obstetric study.

    PubMed

    Zhang, Zhiwei; Liu, Wei; Zhang, Bo; Tang, Li; Zhang, Jun

    2016-10-01

    Causal inference in observational studies is frequently challenged by the occurrence of missing data, in addition to confounding. Motivated by the Consortium on Safe Labor, a large observational study of obstetric labor practice and birth outcomes, this article focuses on the problem of missing exposure information in a causal analysis of observational data. This problem can be approached from different angles (i.e. missing covariates and causal inference), and useful methods can be obtained by drawing upon the available techniques and insights in both areas. In this article, we describe and compare a collection of methods based on different modeling assumptions, under standard assumptions for missing data (i.e. missing-at-random and positivity) and for causal inference with complete data (i.e. no unmeasured confounding and another positivity assumption). These methods involve three models: one for treatment assignment, one for the dependence of outcome on treatment and covariates, and one for the missing data mechanism. In general, consistent estimation of causal quantities requires correct specification of at least two of the three models, although there may be some flexibility as to which two models need to be correct. Such flexibility is afforded by doubly robust estimators adapted from the missing covariates literature and the literature on causal inference with complete data, and by a newly developed triply robust estimator that is consistent if any two of the three models are correct. The methods are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor. © The Author(s) 2013.

  9. Feature Inference Learning and Eyetracking

    ERIC Educational Resources Information Center

    Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.

    2009-01-01

    Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…

  10. Neural Correlates of User-initiated Motor Success and Failure - A Brain-Computer Interface Perspective.

    PubMed

    Yazmir, Boris; Reiner, Miriam

    2018-05-15

    Any motor action is, by nature, potentially accompanied by human errors. In order to facilitate development of error-tailored Brain-Computer Interface (BCI) correction systems, we focused on internal, human-initiated errors, and investigated EEG correlates of user outcome successes and errors during a continuous 3D virtual tennis game against a computer player. We used a multisensory, 3D, highly immersive environment. Missing and repelling the tennis ball were considered as 'error' (miss) and 'success' (repel), respectively. Unlike most previous studies, where the environment "encouraged" the participant to make a mistake, here errors happened naturally, resulting from motor-perceptual-cognitive processes of incorrect estimation of the ball kinematics, and can be regarded as user internal, self-initiated errors. Results show distinct and well-defined Event-Related Potentials (ERPs), embedded in the ongoing EEG, that differ across conditions by waveforms, scalp signal distribution maps, source estimation results (sLORETA) and time-frequency patterns, establishing a series of typical features that allow valid discrimination between user internal outcome success and error. The significant delay in latency between positive peaks of error- and success-related ERPs suggests a cross-talk between top-down and bottom-up processing, represented by an outcome recognition process, in the context of the game world. Success-related ERPs had a central scalp distribution, while error-related ERPs were centro-parietal. The unique characteristics and sharp differences between EEG correlates of error/success provide the crucial components for an improved BCI system. The features of the EEG waveform can be used to detect user action outcome, to be fed into the BCI correction system. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.

  11. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry-driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets.
This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
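    A rank-product statistic that tolerates missing values can be sketched as follows. This is an illustrative version of the general idea (the referenced R script implements the authors' improved algorithm, which this does not claim to match): rank features within each replicate and take the geometric mean of the ranks over only the replicates in which a feature was observed.

```python
import numpy as np

def rank_products(data):
    """Rank-product statistic per feature (illustrative sketch): rank the
    features within each replicate (rank 1 = largest value), then take the
    geometric mean of the ranks over the replicates in which the feature
    was actually observed, so missing values simply drop out."""
    n_feat, n_rep = data.shape
    ranks = np.full_like(data, np.nan)
    for j in range(n_rep):
        col = data[:, j]
        obs = ~np.isnan(col)
        order = np.argsort(-col[obs])              # descending order
        r = np.empty(obs.sum())
        r[order] = np.arange(1, obs.sum() + 1)     # rank 1 = largest
        ranks[obs, j] = r
    # Geometric mean of the available ranks (a feature with no observed
    # replicate at all comes out as NaN)
    return np.exp(np.nanmean(np.log(ranks), axis=1))

# Simulated triplicate data: one strongly up-regulated feature with a
# missing replicate, plus ~20% missing values elsewhere
rng = np.random.default_rng(3)
data = rng.normal(0, 1, (100, 3))
data[rng.random((100, 3)) < 0.2] = np.nan
data[0] = [5.0, 5.0, np.nan]

rp = rank_products(data)
best = int(np.nanargmin(rp))     # smallest rank product = strongest feature
```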

  12. Missing data imputation and haplotype phase inference for genome-wide association studies

    PubMed Central

    Browning, Sharon R.

    2009-01-01

    Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance. PMID:18850115

  13. Accounting for missing data in the estimation of contemporary genetic effective population size (N(e) ).

    PubMed

    Peel, D; Waples, R S; Macbeth, G M; Do, C; Ovenden, J R

    2013-03-01

    Theoretical models are often applied to population genetic data sets without fully considering the effect of missing data. Researchers can deal with missing data by removing individuals that have failed to yield genotypes and/or by removing loci that have failed to yield allelic determinations, but despite their best efforts, most data sets still contain some missing data. As a consequence, realized sample size differs among loci, and this poses a problem for unbiased methods that must explicitly account for random sampling error. One commonly used solution for the calculation of contemporary effective population size (N(e) ) is to calculate the effective sample size as an unweighted mean or harmonic mean across loci. This is not ideal because it fails to account for the fact that loci with different numbers of alleles have different information content. Here we consider this problem for genetic estimators of contemporary effective population size (N(e) ). To evaluate bias and precision of several statistical approaches for dealing with missing data, we simulated populations with known N(e) and various degrees of missing data. Across all scenarios, one method of correcting for missing data (fixed-inverse variance-weighted harmonic mean) consistently performed the best for both single-sample and two-sample (temporal) methods of estimating N(e) and outperformed some methods currently in widespread use. The approach adopted here may be a starting point to adjust other population genetics methods that include per-locus sample size components. © 2012 Blackwell Publishing Ltd.
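    The weighting scheme that performed best is, at its core, a weighted harmonic mean of the per-locus realized sample sizes. The sketch below uses hypothetical weights (here simply alleles minus one per locus, as a stand-in for information content); the paper's fixed-inverse-variance weights are derived more carefully.

```python
import numpy as np

def weighted_harmonic_mean(s, w):
    """Weighted harmonic mean of per-locus sample sizes s with fixed
    weights w: sum(w) / sum(w / s)."""
    s, w = np.asarray(s, float), np.asarray(w, float)
    return w.sum() / np.sum(w / s)

# Hypothetical per-locus realized sample sizes after dropping missing
# genotypes, weighted by (number of alleles - 1) as a crude proxy for
# each locus's information content
sizes = np.array([50, 48, 45, 30])
weights = np.array([9, 7, 5, 3])
n_eff = weighted_harmonic_mean(sizes, weights)
```

    Because the locus with the most missing data (realized n = 30) also carries the smallest weight, the weighted harmonic mean (about 44.7) sits above the unweighted one (about 41.5), reflecting the extra information in the well-genotyped, allele-rich loci.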

  14. Gaussian mixture clustering and imputation of microarray data.

    PubMed

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
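    The first evaluation metric, RMSE over imputed entries, can be sketched directly. The Gaussian mixture imputation itself is not reproduced here; instead, hypothetical correlated data and a crude structure-aware imputer (row means) illustrate why methods that exploit correlation beat plain column means on this metric.

```python
import numpy as np

def rmse(true, imputed, mask):
    """Root mean squared error over the imputed entries only."""
    return np.sqrt(np.mean((true[mask] - imputed[mask]) ** 2))

# Hypothetical genes-by-conditions matrix with strong within-row correlation
rng = np.random.default_rng(4)
base = rng.normal(0, 1, (200, 1))
true = base + rng.normal(0, 0.3, (200, 10))
data = true.copy()
mask = rng.random(data.shape) < 0.05          # ~5% missing entries
data[mask] = np.nan

# Column-mean imputation ignores the correlation structure...
mean_imp = np.where(np.isnan(data), np.nanmean(data, axis=0), data)
# ...whereas row means exploit it (a crude stand-in for model-based methods)
row_imp = np.where(np.isnan(data), np.nanmean(data, axis=1, keepdims=True), data)
```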

  15. Sensitivity analysis for missing dichotomous outcome data in multi-visit randomized clinical trial with randomization-based covariance adjustment.

    PubMed

    Li, Siying; Koch, Gary G; Preisser, John S; Lam, Diana; Sanchez-Kam, Matilde

    2017-01-01

    Dichotomous endpoints in clinical trials have only two possible outcomes, either directly or via categorization of an ordinal or continuous observation. It is common to have missing data for one or more visits during a multi-visit study. This paper presents a closed form method for sensitivity analysis of a randomized multi-visit clinical trial that possibly has missing not at random (MNAR) dichotomous data. Counts of missing data are redistributed to the favorable and unfavorable outcomes mathematically to address possibly informative missing data. Adjusted proportion estimates and their closed form covariance matrix estimates are provided. Treatment comparisons over time are addressed with Mantel-Haenszel adjustment for a stratification factor and/or randomization-based adjustment for baseline covariables. The application of such sensitivity analyses is illustrated with an example. An appendix outlines an extension of the methodology to ordinal endpoints.
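    The core redistribution idea can be shown in a few lines. The counts and the redistribution fraction below are hypothetical, and this sketch omits the paper's covariance estimates and Mantel-Haenszel/covariate adjustments; it only illustrates how sweeping the fraction of missing outcomes assigned to 'favorable' traces out a sensitivity range for the estimated proportion.

```python
def adjusted_proportion(n_fav, n_unfav, n_miss, frac_fav):
    """Sensitivity-analysis sketch: redistribute the n_miss missing
    outcomes, sending frac_fav of them to 'favorable' and the rest to
    'unfavorable', then recompute the proportion favorable."""
    n = n_fav + n_unfav + n_miss
    return (n_fav + frac_fav * n_miss) / n

# Sweep the MNAR assumption from 'all missing unfavorable' (0.0)
# to 'all missing favorable' (1.0)
props = [adjusted_proportion(60, 30, 10, f) for f in (0.0, 0.5, 1.0)]
```

    Reporting the whole swept range, rather than a single value, is what makes this a sensitivity analysis for possibly informative missingness.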

  16. A nonparametric multiple imputation approach for missing categorical data.

    PubMed

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method.
We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
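    The donor-selection step can be sketched as below. The two predictive scores are assumed precomputed (in the paper they come from the fitted multinomial and binary logistic working models, which this sketch does not refit), and the data and weights are hypothetical.

```python
import numpy as np

def nn_impute(values, scores, w, n_donors=5, rng=None):
    """Nearest-neighbour hot-deck sketch: 'scores' is a 2-column array of
    predictive scores per subject (column 0 from an outcome model, column 1
    from a missingness model, both assumed precomputed). Each missing value
    is imputed by drawing at random from the n_donors observed cases with
    the smallest weighted score distance."""
    rng = rng if rng is not None else np.random.default_rng()
    out = values.copy()
    mis = np.isnan(values)
    donors = np.flatnonzero(~mis)
    for i in np.flatnonzero(mis):
        d = np.sqrt((w * (scores[donors] - scores[i]) ** 2).sum(axis=1))
        nearest = donors[np.argsort(d)[:n_donors]]
        out[i] = rng.choice(values[nearest])
    return out

# Hypothetical 3-category outcome driven by the outcome-model score
rng = np.random.default_rng(5)
n = 300
scores = np.column_stack([rng.random(n), rng.random(n)])
truth = np.floor(scores[:, 0] * 3)
values = truth.copy()
values[rng.random(n) < 0.2] = np.nan       # ~20% missing outcomes

# Weight the outcome-model score more heavily than the missingness score
imp = nn_impute(values, scores, w=np.array([0.8, 0.2]), rng=rng)
```

    Because a donor's observed category is copied rather than a model prediction, the imputed variable keeps its categorical support automatically, one appeal of hot-deck schemes for multi-level outcomes.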

  17. Marginalized zero-inflated Poisson models with missing covariates.

    PubMed

    Benecha, Habtamu K; Preisser, John S; Divaris, Kimon; Herring, Amy H; Das, Kalyan

    2018-05-11

    Unlike zero-inflated Poisson regression, marginalized zero-inflated Poisson (MZIP) models for counts with excess zeros provide estimates with direct interpretations for the overall effects of covariates on the marginal mean. In the presence of missing covariates, MZIP and many other count data models are ordinarily fitted using complete case analysis methods due to lack of appropriate statistical methods and software. This article presents an estimation method for MZIP models with missing covariates. The method, which is applicable to other missing data problems, is illustrated and compared with complete case analysis by using simulations and dental data on the caries preventive effects of a school-based fluoride mouthrinse program. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. A Comparison of Missing-Data Procedures for Arima Time-Series Analysis

    ERIC Educational Resources Information Center

    Velicer, Wayne F.; Colby, Suzanne M.

    2005-01-01

    Missing data are a common practical problem for longitudinal designs. Time-series analysis is a longitudinal method that involves a large number of observations on a single unit. Four different missing-data methods (deletion, mean substitution, mean of adjacent observations, and maximum likelihood estimation) were evaluated. Computer-generated…
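    One of the four procedures named, the mean of adjacent observations, is simple enough to sketch directly on a toy series (the series below is hypothetical; the study's Monte Carlo comparison is not reproduced).

```python
import numpy as np

def mean_of_adjacent(y):
    """Replace each interior missing point with the average of its nearest
    observed neighbours on either side; leading/trailing gaps are left as-is."""
    z = y.copy()
    obs = np.flatnonzero(~np.isnan(y))
    for i in np.flatnonzero(np.isnan(y)):
        left = obs[obs < i]
        right = obs[obs > i]
        if left.size and right.size:
            z[i] = (y[left[-1]] + y[right[0]]) / 2
    return z

series = np.array([3.0, np.nan, 5.0, 6.0, np.nan, np.nan, 9.0])
filled = mean_of_adjacent(series)
```

    Note that a run of consecutive gaps (positions 4 and 5 here) all receive the same bridging value, one reason such simple substitution can distort autocorrelation relative to maximum likelihood estimation.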

  19. They Remember the "Lost" People.

    ERIC Educational Resources Information Center

    Klages, Karen

    Estimates of the number of children currently missing in the United States are only approximate because there is no effective central data bank to collect information on missing persons and unidentified bodies. However, the problem appears to have reached epidemic proportions. Some parents of missing persons have formed organizations in different…

  20. LIMITATIONS ON THE USES OF MULTIMEDIA EXPOSURE MEASUREMENTS FOR MULTIPATHWAY EXPOSURE ASSESSMENT - PART II: EFFECTS OF MISSING DATA AND IMPRECISION

    EPA Science Inventory

    Multimedia data from two probability-based exposure studies were investigated in terms of how missing data and measurement-error imprecision affected estimation of population parameters and associations. Missing data resulted mainly from individuals' refusing to participate in c...

  1. The Empirical Nature and Statistical Treatment of Missing Data

    ERIC Educational Resources Information Center

    Tannenbaum, Christyn E.

    2009-01-01

    Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…

  2. 40 CFR 98.96 - Data reporting requirements.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... of this subpart, for each fluorinated GHG used. (s) Where missing data procedures were used to... missing data procedures were followed in the reporting year, the method used to estimate the missing data... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Data reporting requirements. 98.96...

  3. 40 CFR 98.456 - Data reporting requirements.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ..., of Equation SS-6 of this subpart. (t) For any missing data, you must report the reason the data were missing, the parameters for which the data were missing, the substitute parameters used to estimate... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Data reporting requirements. 98.456...

  4. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    USDA-ARS?s Scientific Manuscript database

    Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...

  5. Toward a hybrid brain-computer interface based on repetitive visual stimuli with missing events.

    PubMed

    Wu, Yingying; Li, Man; Wang, Jing

    2016-07-26

    Steady-state visually evoked potentials (SSVEPs) can be elicited by repetitive stimuli and extracted in the frequency domain with satisfied performance. However, the temporal information of such stimulus is often ignored. In this study, we utilized repetitive visual stimuli with missing events to present a novel hybrid BCI paradigm based on SSVEP and omitted stimulus potential (OSP). Four discs flickering from black to white with missing flickers served as visual stimulators to simultaneously elicit subject's SSVEPs and OSPs. Key parameters in the new paradigm, including flicker frequency, optimal electrodes, missing flicker duration and intervals of missing events were qualitatively discussed with offline data. Two omitted flicker patterns including missing black/white disc were proposed and compared. Averaging times were optimized with Information Transfer Rate (ITR) in online experiments, where SSVEPs and OSPs were identified using Canonical Correlation Analysis in the frequency domain and Support Vector Machine (SVM)-Bayes fusion in the time domain, respectively. The online accuracy and ITR (mean ± standard deviation) over nine healthy subjects were 79.29 ± 18.14 % and 19.45 ± 11.99 bits/min with missing black disc pattern, and 86.82 ± 12.91 % and 24.06 ± 10.95 bits/min with missing white disc pattern, respectively. The proposed BCI paradigm, for the first time, demonstrated that SSVEPs and OSPs can be simultaneously elicited in single visual stimulus pattern and recognized in real-time with satisfied performance. Besides the frequency features such as SSVEP elicited by repetitive stimuli, we found a new feature (OSP) in the time domain to design a novel hybrid BCI paradigm by adding missing events in repetitive stimuli.
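    The frequency-domain side of such a system can be sketched with a simplified detector. This stands in for the full Canonical Correlation Analysis used in the study: a single synthetic channel is projected onto sine/cosine references at each candidate flicker frequency (plus its second harmonic), and the frequency explaining the most variance wins. Sampling rate, frequencies, and noise level below are assumptions for illustration.

```python
import numpy as np

def ssvep_score(x, fs, freq):
    """Fraction of signal variance explained by sine/cosine references at
    'freq' and its 2nd harmonic (simplified stand-in for CCA)."""
    t = np.arange(len(x)) / fs
    refs = [f(2 * np.pi * k * freq * t) for k in (1, 2) for f in (np.sin, np.cos)]
    R = np.column_stack(refs)
    beta, *_ = np.linalg.lstsq(R, x - x.mean(), rcond=None)
    return (R @ beta).var() / x.var()

# Synthetic 2-second single-channel recording: a 10 Hz SSVEP in noise
rng = np.random.default_rng(7)
fs, dur = 250, 2.0
t = np.arange(int(fs * dur)) / fs
x = np.sin(2 * np.pi * 10.0 * t) + rng.normal(0, 1.0, t.size)

scores = {f: ssvep_score(x, fs, f) for f in (8.0, 10.0, 12.0, 15.0)}
best = max(scores, key=scores.get)
```

    The omitted-stimulus potential (OSP) branch of the hybrid paradigm is a time-domain classification problem and is not sketched here.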

  6. Bi-level Multi-Source Learning for Heterogeneous Block-wise Missing Data

    PubMed Central

    Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M.; Ye, Jieping

    2013-01-01

    Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer’s Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified “bi-level” learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches. PMID:23988272

  7. Bi-level multi-source learning for heterogeneous block-wise missing data.

    PubMed

    Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M; Ye, Jieping

    2014-11-15

    Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified "bi-level" learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches. © 2013 Elsevier Inc. All rights reserved.

  8. MULTI-SOURCE FEATURE LEARNING FOR JOINT ANALYSIS OF INCOMPLETE MULTIPLE HETEROGENEOUS NEUROIMAGING DATA

    PubMed Central

    Yuan, Lei; Wang, Yalin; Thompson, Paul M.; Narayan, Vaibhav A.; Ye, Jieping

    2012-01-01

    Analysis of incomplete data is a big challenge when integrating large-scale brain imaging datasets from different imaging modalities. In the Alzheimer’s Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. In this paper, we address this problem by proposing an incomplete Multi-Source Feature (iMSF) learning method where all the samples (with at least one available data source) can be used. To illustrate the proposed approach, we classify patients from the ADNI study into groups with Alzheimer’s disease (AD), mild cognitive impairment (MCI) and normal controls, based on the multi-modality data. At baseline, ADNI’s 780 participants (172 AD, 397 MCI, 211 NC), have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithm. Depending on the problem being solved, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. To build a practical and robust system, we construct a classifier ensemble by combining our method with four other methods for missing value estimation. Comprehensive experiments with various parameters show that our proposed iMSF method and the ensemble model yield stable and promising results. PMID:22498655

  9. Morphological and wavelet features towards sonographic thyroid nodules evaluation.

    PubMed

    Tsantis, Stavros; Dimitropoulos, Nikos; Cavouras, Dionisis; Nikiforidis, George

    2009-03-01

    This paper presents a computer-based classification scheme that utilizes various morphological and novel wavelet-based features for malignancy risk evaluation of thyroid nodules in ultrasonography. The study comprised 85 ultrasound images from patients with cytologically confirmed nodules (54 low-risk and 31 high-risk). A set of 20 features (12 based on the nodule's boundary shape and 8 based on wavelet local maxima located within each nodule) was generated. Two powerful pattern recognition algorithms (support vector machines and probabilistic neural networks) were designed and developed in order to quantify the discriminatory power of the introduced features. A comparative study was also carried out to estimate the impact of speckle on the classification procedure. The diagnostic sensitivity and specificity of both classifiers were assessed by means of receiver operating characteristic (ROC) analysis. In the speckle-free feature set, the area under the ROC curve was 0.96 for the support vector machine classifier, whereas for the probabilistic neural network it was 0.91. In the feature set with speckle, the corresponding areas under the ROC curves were 0.88 and 0.86, respectively, for the two classifiers. The proposed features can increase the classification accuracy and decrease the rates of missed diagnosis and misdiagnosis in thyroid cancer control.
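
    The ROC areas reported above can be computed directly from the two groups' classifier scores via the Mann-Whitney statistic. This is a generic sketch of that calculation, not the authors' pipeline; the score values are made up.

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive (high-risk) case
    scores higher than a randomly chosen negative case; ties count half."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

# Perfectly separated scores give AUC = 1.0; identical scores give 0.5.
auc = auc_mann_whitney([0.9, 0.8, 0.7], [0.2, 0.1, 0.3])
```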

  10. [Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM].

    PubMed

    Jiang, Z; Dou, Z; Song, W L; Xu, J; Wu, Z Y

    2017-11-10

    Objective: To compare the results of different methods for handling HIV viral load (VL) data under different missing value mechanisms. Methods: We used SPSS 17.0 to simulate complete and missing data, with different missing value mechanisms, from HIV viral load data collected from MSM in 16 cities in China in 2013. Maximum likelihood estimation using the expectation-maximization algorithm (EM), a regression method, mean imputation, deletion, and Markov chain Monte Carlo (MCMC) were used to fill in the missing data. The results of the different methods were compared according to distribution characteristics, accuracy and precision. Results: The HIV VL data could not be transformed into a normal distribution. All the methods performed well on data missing completely at random (MCAR). For the other types of missing data, the regression and MCMC methods best preserved the main characteristics of the original data. The means of the imputed databases under the different methods were all close to the original one. The EM, regression, mean imputation, and deletion methods under-estimated VL, while MCMC over-estimated it. Conclusion: MCMC can be used as the main imputation method for missing HIV viral load data. The imputed data can be used as a reference for estimating the mean HIV VL in the investigated population.
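
    The evaluation design described above (start from complete data, inject missingness, impute, and score the result) can be sketched generically. The skewed synthetic data and simple mean imputation below are illustrative assumptions, not the study's SPSS/EM/MCMC procedures.

```python
import numpy as np

rng = np.random.default_rng(0)
complete = rng.lognormal(mean=8, sigma=2, size=1000)   # skewed, VL-like values

# Missing completely at random (MCAR): drop roughly 20% of entries.
mask = rng.random(complete.size) < 0.2
observed = complete.copy()
observed[mask] = np.nan

# Mean imputation: fill every missing entry with the observed mean.
imputed = observed.copy()
imputed[np.isnan(imputed)] = np.nanmean(observed)

# Accuracy: RMSE of imputed vs. true values on the missing positions only.
rmse = np.sqrt(np.mean((imputed[mask] - complete[mask]) ** 2))
```

    Other mechanisms (e.g. missingness that depends on the value itself) can be simulated by making `mask` a function of `complete`, which is how MNAR-type scenarios distort the comparison.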

  11. Matched samples logistic regression in case-control studies with missing values: when to break the matches.

    PubMed

    Hansson, Lisbeth; Khamis, Harry J

    2008-12-01

    Simulated data sets are used to evaluate conditional and unconditional maximum likelihood estimation in an individual case-control design with continuous covariates when there are different rates of excluded cases and different levels of other design parameters. The effectiveness of the estimation procedures is measured by method bias, variance of the estimators, root mean square error (RMSE) for logistic regression and the percentage of explained variation. Conditional estimation leads to higher RMSE than unconditional estimation in the presence of missing observations, especially for 1:1 matching. The RMSE is higher for the smaller stratum size, especially for the 1:1 matching. The percentage of explained variation appears to be insensitive to missing data, but is generally higher for the conditional estimation than for the unconditional estimation. It is particularly good for the 1:2 matching design. For minimizing RMSE, a high matching ratio is recommended; in this case, conditional and unconditional logistic regression models yield comparable levels of effectiveness. For maximizing the percentage of explained variation, the 1:2 matching design with the conditional logistic regression model is recommended.

  12. Dealing with gene expression missing data.

    PubMed

    Brás, L P; Menezes, J C

    2006-05-01

    Compared evaluation of different methods is presented for estimating missing values in microarray data: weighted K-nearest neighbours imputation (KNNimpute), regression-based methods such as local least squares imputation (LLSimpute) and partial least squares imputation (PLSimpute) and Bayesian principal component analysis (BPCA). The influence in prediction accuracy of some factors, such as methods' parameters, type of data relationships used in the estimation process (i.e. row-wise, column-wise or both), missing rate and pattern and type of experiment [time series (TS), non-time series (NTS) or mixed (MIX) experiments] is elucidated. Improvements based on the iterative use of data (iterative LLS and PLS imputation--ILLSimpute and IPLSimpute), the need to perform initial imputations (modified PLS and Helland PLS imputation--MPLSimpute and HPLSimpute) and the type of relationships employed (KNNarray, LLSarray, HPLSarray and alternating PLS--APLSimpute) are proposed. Overall, it is shown that data set properties (type of experiment, missing rate and pattern) affect the data similarity structure, therefore influencing the methods' performance. LLSimpute and ILLSimpute are preferable in the presence of data with a stronger similarity structure (TS and MIX experiments), whereas PLS-based methods (MPLSimpute, IPLSimpute and APLSimpute) are preferable when estimating NTS missing data.
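
    A weighted K-nearest-neighbours imputation in the spirit of KNNimpute can be sketched as follows; this is a minimal generic version, not the implementation evaluated in the paper.

```python
import numpy as np

def knn_impute(X, k=2):
    """Impute NaNs row-wise: for each row with missing entries, find the k
    donor rows closest in the shared observed columns (Euclidean distance)
    and fill the gaps with an inverse-distance-weighted average."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        dists, vals = [], []
        for j, other in enumerate(X):
            # Donors must be complete in both the observed and missing columns.
            if j == i or np.isnan(other[obs]).any() or np.isnan(other[miss]).any():
                continue
            dists.append(np.sqrt(((row[obs] - other[obs]) ** 2).mean()))
            vals.append(other[miss])
        order = np.argsort(dists)[:k]
        w = 1.0 / (np.array(dists)[order] + 1e-12)   # inverse-distance weights
        out[i, miss] = (w[:, None] * np.array(vals)[order]).sum(axis=0) / w.sum()
    return out
```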

  13. Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level

    PubMed Central

    Savalei, Victoria; Rhemtulla, Mijke

    2017-01-01

    In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data—that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study. PMID:29276371

  14. Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level.

    PubMed

    Savalei, Victoria; Rhemtulla, Mijke

    2017-08-01

    In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data-that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study.

  15. Should genes with missing data be excluded from phylogenetic analyses?

    PubMed

    Jiang, Wei; Chen, Si-Yun; Wang, Hong; Li, De-Zhu; Wiens, John J

    2014-11-01

    Phylogeneticists often design their studies to maximize the number of genes included but minimize the overall amount of missing data. However, few studies have addressed the costs and benefits of adding characters with missing data, especially for likelihood analyses of multiple loci. In this paper, we address this topic using two empirical data sets (in yeast and plants) with well-resolved phylogenies. We introduce varying amounts of missing data into varying numbers of genes and test whether the benefits of excluding genes with missing data outweigh the costs of excluding the non-missing data that are associated with them. We also test if there is a proportion of missing data in the incomplete genes at which they cease to be beneficial or harmful, and whether missing data consistently bias branch length estimates. Our results indicate that adding incomplete genes generally increases the accuracy of phylogenetic analyses relative to excluding them, especially when there is a high proportion of incomplete genes in the overall dataset (and thus few complete genes). Detailed analyses suggest that adding incomplete genes is especially helpful for resolving poorly supported nodes. Given that we find that excluding genes with missing data often decreases accuracy relative to including these genes (and that decreases are generally of greater magnitude than increases), there is little basis for assuming that excluding these genes is necessarily the safer or more conservative approach. We also find no evidence that missing data consistently bias branch length estimates. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of total knee arthroplasty (TKA) with missing values, among repeated measures ANOVA, generalized estimating equations (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportions of missing data, sample sizes and missing-data generation mechanisms. Each data set was analyzed with the three statistical methods. The influence of missing data was greater with a higher proportion of missing data and a smaller sample size. MMRM tended to show the least changes in the statistics. When missing values were generated by a 'missing not at random' mechanism, no statistical method could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. Effective gene prediction by high resolution frequency estimator based on least-norm solution technique

    PubMed Central

    2014-01-01

    The linear algebraic concept of subspace plays a significant role in recent spectrum estimation techniques. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is rising. Several DNA feature extraction techniques spanning various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not share this feature. One of the most important subspace-based spectrum analysis techniques is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions while completely eliminating background noise. A comparison of the proposed method with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, is drawn on several genes from various organisms, and the results show that the proposed method offers a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895

  18. Structural Effects of Network Sampling Coverage I: Nodes Missing at Random.

    PubMed

    Smith, Jeffrey A; Moody, James

    2013-10-01

    Network measures assume a census of a well-bounded population. This level of coverage is rarely achieved in practice, however, and we have only limited information on the robustness of network measures to incomplete coverage. This paper examines the effect of node-level missingness on 4 classes of network measures: centrality, centralization, topology and homophily across a diverse sample of 12 empirical networks. We use a Monte Carlo simulation process to generate data with known levels of missingness and compare the resulting network scores to their known starting values. As with past studies (Borgatti et al 2006; Kossinets 2006), we find that measurement bias generally increases with more missing data. The exact rate and nature of this increase, however, varies systematically across network measures. For example, betweenness and Bonacich centralization are quite sensitive to missing data while closeness and in-degree are robust. Similarly, while the tau statistic and distance are difficult to capture with missing data, transitivity shows little bias even with very high levels of missingness. The results are also clearly dependent on the features of the network. Larger, more centralized networks are generally more robust to missing data, but this is especially true for centrality and centralization measures. More cohesive networks are robust to missing data when measuring topological features but not when measuring centralization. Overall, the results suggest that missing data may have quite large or quite small effects on network measurement, depending on the type of network and the question being posed.

  19. Estimating missing daily temperature extremes in Jaffna, Sri Lanka

    NASA Astrophysics Data System (ADS)

    Thevakaran, A.; Sonnadara, D. U. J.

    2018-04-01

    The accuracy of reconstructing missing daily temperature extremes at the Jaffna climatological station, situated in the northern part of the dry zone of Sri Lanka, is presented. The adopted method utilizes standard departures of daily maximum and minimum temperature values at four neighbouring stations, Mannar, Anuradhapura, Puttalam and Trincomalee, to estimate the standard departures of daily maximum and minimum temperatures at the target station, Jaffna. The daily maximum and minimum temperatures from 1966 to 1980 (15 years) were used to test the validity of the method. The accuracy of the estimation is higher for daily maximum temperature than for daily minimum temperature. About 95% of the estimated daily maximum temperatures are within ±1.5 °C of the observed values; for daily minimum temperature, the percentage is about 92. By calculating the standard deviation of the difference between estimated and observed values, we have shown that the error in estimating the daily maximum and minimum temperatures is ±0.7 and ±0.9 °C, respectively. To obtain the best accuracy when estimating the missing daily temperature extremes, it is important to include Mannar, the nearest station to the target station, Jaffna. We conclude from the analysis that the method can be applied successfully to reconstruct the missing daily temperature extremes in Jaffna, where no data are available due to frequent disruptions caused by civil unrest and hostilities in the region during the period 1984 to 2000.
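
    The standard-departure method described above can be sketched as follows. The climatological means, standard deviations, and the unweighted averaging are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def estimate_missing_temp(neighbour_values, neighbour_means, neighbour_stds,
                          target_mean, target_std):
    """Estimate a missing daily temperature at a target station from
    standard departures (z-scores) at neighbouring stations.

    neighbour_values : observed temperatures at the neighbours on that day
    neighbour_means, neighbour_stds : climatological means/stds at the neighbours
    target_mean, target_std : climatological mean/std at the target station
    """
    z = ((np.asarray(neighbour_values) - np.asarray(neighbour_means))
         / np.asarray(neighbour_stds))
    # Average the standard departures, then rescale to the target station's climate.
    return target_mean + z.mean() * target_std

# Example with made-up climatology for four neighbouring stations:
est = estimate_missing_temp(
    neighbour_values=[33.0, 34.5, 33.8, 32.6],
    neighbour_means=[32.0, 33.0, 33.0, 31.5],
    neighbour_stds=[1.0, 1.5, 1.2, 1.1],
    target_mean=31.0, target_std=1.2)
```

    A distance-weighted mean of the departures (giving the nearest station, here Mannar, more weight) would be a natural refinement.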

  20. Why do we miss rare targets? Exploring the boundaries of the low prevalence effect

    PubMed Central

    Rich, Anina N.; Kunar, Melina A.; Van Wert, Michael J.; Hidalgo-Sotelo, Barbara; Horowitz, Todd S.; Wolfe, Jeremy M.

    2011-01-01

    Observers tend to miss a disproportionate number of targets in visual search tasks with rare targets. This ‘prevalence effect’ may have practical significance since many screening tasks (e.g., airport security, medical screening) are low prevalence searches. It may also shed light on the rules used to terminate search when a target is not found. Here, we use perceptually simple stimuli to explore the sources of this effect. Experiment 1 shows a prevalence effect in inefficient spatial configuration search. Experiment 2 demonstrates this effect occurs even in a highly efficient feature search. However, the two prevalence effects differ. In spatial configuration search, misses seem to result from ending the search prematurely, while in feature search, they seem due to response errors. In Experiment 3, a minimum delay before response eliminated the prevalence effect for feature but not spatial configuration search. In Experiment 4, a target was present on each trial in either two (2AFC) or four (4AFC) orientations. With only two response alternatives, low prevalence produced elevated errors. Providing four response alternatives eliminated this effect. Low target prevalence puts searchers under pressure that tends to increase miss errors. We conclude that the specific source of those errors depends on the nature of the search. PMID:19146299

  1. The Missing Data Assumptions of the NEAT Design and Their Implications for Test Equating

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Holland, Paul W.

    2010-01-01

    The Non-Equivalent groups with Anchor Test (NEAT) design involves "missing data" that are "missing by design." Three nonlinear observed score equating methods used with a NEAT design are the "frequency estimation equipercentile equating" (FEEE), the "chain equipercentile equating" (CEE), and the "item-response-theory observed-score-equating" (IRT…

  2. Effects of Missing Data Methods in Structural Equation Modeling with Nonnormal Longitudinal Data

    ERIC Educational Resources Information Center

    Shin, Tacksoo; Davison, Mark L.; Long, Jeffrey D.

    2009-01-01

    The purpose of this study is to investigate the effects of missing data techniques in longitudinal studies under diverse conditions. A Monte Carlo simulation examined the performance of 3 missing data methods in latent growth modeling: listwise deletion (LD), maximum likelihood estimation using the expectation and maximization algorithm with a…

  3. Strategies for Handling Missing Data with Maximum Likelihood Estimation in Career and Technical Education Research

    ERIC Educational Resources Information Center

    Lee, In Heok

    2012-01-01

    Researchers in career and technical education often ignore more effective ways of reporting and treating missing data and instead implement traditional, but ineffective, missing data methods (Gemici, Rojewski, & Lee, 2012). The recent methodological, and even the non-methodological, literature has increasingly emphasized the importance of…

  4. Inferential precision in single-case time-series data streams: how well does the em procedure perform when missing observations occur in autocorrelated data?

    PubMed

    Smith, Justin D; Borckardt, Jeffrey J; Nash, Michael R

    2012-09-01

    The case-based time-series design is a viable methodology for treatment outcome research. However, the literature has not fully addressed the problem of missing observations with such autocorrelated data streams. Specifically, to what extent do missing observations compromise inference when observations are not independent? Do the available missing data replacement procedures preserve inferential integrity? Does the extent of autocorrelation matter? We use Monte Carlo simulation modeling of a single-subject intervention study to address these questions. We find power sensitivity to be within acceptable limits across four proportions of missing observations (10%, 20%, 30%, and 40%) when missing data are replaced using the Expectation-Maximization Algorithm, more commonly known as the EM Procedure (Dempster, Laird, & Rubin, 1977). This applies to data streams with lag-1 autocorrelation estimates under 0.80. As autocorrelation estimates approach 0.80, the replacement procedure yields an unacceptable power profile. The implications of these findings and directions for future research are discussed. Copyright © 2011. Published by Elsevier Ltd.
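
    For a lag-1 autocorrelated stream, an EM-flavoured replacement can be sketched as alternating between estimating the autocorrelation and refilling missing points with their conditional means. This is a simplified zero-mean AR(1) illustration, not the SPSS EM Procedure used in the study.

```python
import numpy as np

def em_ar1_impute(x, n_iter=50):
    """EM-flavoured imputation for a (roughly) zero-mean AR(1) series.
    E-step: replace each missing point by its AR(1) conditional mean,
    phi * (x[t-1] + x[t+1]) / (1 + phi**2) for interior points.
    M-step: re-estimate phi (lag-1 autocorrelation) from the completed series."""
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    filled = x.copy()
    filled[miss] = np.nanmean(x)          # crude starting values
    for _ in range(n_iter):
        phi = np.corrcoef(filled[:-1], filled[1:])[0, 1]   # lag-1 estimate
        for t in np.flatnonzero(miss):
            # Boundary points fall back on the series mean (0 under the model).
            left = filled[t - 1] if t > 0 else 0.0
            right = filled[t + 1] if t < len(filled) - 1 else 0.0
            filled[t] = phi * (left + right) / (1 + phi ** 2)
    return filled
```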

  5. A bias-corrected estimator in multiple imputation for missing data.

    PubMed

    Tomita, Hiroaki; Fujisawa, Hironori; Henmi, Masayuki

    2018-05-29

    Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset. Copyright © 2018 John Wiley & Sons, Ltd.

  6. Robust Coefficients Alpha and Omega and Confidence Intervals With Outlying Observations and Missing Data: Methods and Software.

    PubMed

    Zhang, Zhiyong; Yuan, Ke-Hai

    2016-06-01

    Cronbach's coefficient alpha is a widely used reliability measure in social, behavioral, and education sciences. It is reported in nearly every study that involves measuring a construct through multiple items. With non-tau-equivalent items, McDonald's omega has been used as a popular alternative to alpha in the literature. Traditional estimation methods for alpha and omega often implicitly assume that data are complete and normally distributed. This study proposes robust procedures to estimate both alpha and omega as well as corresponding standard errors and confidence intervals from samples that may contain potential outlying observations and missing values. The influence of outlying observations and missing data on the estimates of alpha and omega is investigated through two simulation studies. Results show that the newly developed robust method yields substantially improved alpha and omega estimates as well as better coverage rates of confidence intervals than the conventional nonrobust method. An R package coefficientalpha is developed and demonstrated to obtain robust estimates of alpha and omega.
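
    For reference, the conventional (nonrobust, complete-data) Cronbach's alpha that the robust procedure generalizes can be computed directly; this sketch is generic and is not the proposed robust estimator or the coefficientalpha package.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_subjects, k_items) array of scores with no missing values.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Three perfectly parallel items yield alpha = 1.0.
alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

    Outliers inflate both the item variances and the total-score variance, which is why the paper replaces these sample moments with robust counterparts.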

  7. Robust Coefficients Alpha and Omega and Confidence Intervals With Outlying Observations and Missing Data

    PubMed Central

    Zhang, Zhiyong; Yuan, Ke-Hai

    2015-01-01

    Cronbach’s coefficient alpha is a widely used reliability measure in social, behavioral, and education sciences. It is reported in nearly every study that involves measuring a construct through multiple items. With non-tau-equivalent items, McDonald’s omega has been used as a popular alternative to alpha in the literature. Traditional estimation methods for alpha and omega often implicitly assume that data are complete and normally distributed. This study proposes robust procedures to estimate both alpha and omega as well as corresponding standard errors and confidence intervals from samples that may contain potential outlying observations and missing values. The influence of outlying observations and missing data on the estimates of alpha and omega is investigated through two simulation studies. Results show that the newly developed robust method yields substantially improved alpha and omega estimates as well as better coverage rates of confidence intervals than the conventional nonrobust method. An R package coefficientalpha is developed and demonstrated to obtain robust estimates of alpha and omega. PMID:29795870

  8. MISSE-6 hardware

    NASA Image and Video Library

    2009-09-02

    ISS020-E-037371 (1 Sept. 2009) --- A close-up view of a Materials International Space Station Experiment (MISSE-6) on the exterior of the Columbus laboratory is featured in this image photographed by a spacewalking astronaut during the STS-128 mission’s first session of extravehicular activity (EVA). MISSE collects information on how different materials weather in the environment of space. MISSE was later placed in Space Shuttle Discovery’s payload bay for its return to Earth. A portion of a payload bay door is visible in the background. The blackness of space and Earth’s horizon provide the backdrop for the scene.

  9. Virtual reconstruction of very large skull defects featuring partly and completely missing midsagittal planes.

    PubMed

    Senck, Sascha; Coquerelle, Michael; Weber, Gerhard W; Benazzi, Stefano

    2013-05-01

    Despite the development of computer-based methods, cranial reconstruction of very large skull defects remains a challenge particularly if the damage affects the midsagittal region hampering the usage of mirror imaging techniques. This pilot study aims to deliver a new method that goes beyond mirror imaging, giving the possibility to reconstruct crania characterized by large missing areas, which might be useful in the fields of paleoanthropology, bioarcheology, and forensics. We test the accuracy of digital reconstructions in cases where two-thirds or more of a human cranium were missing. A three-dimensional (3D) virtual model of a human cranium was virtually damaged twice to compare two destruction-reconstruction scenarios. In the first case, a small fraction of the midsagittal region was still preserved, allowing the application of mirror imaging techniques. In the second case, the damage affected the complete midsagittal region, which demands a new approach to estimate the position of the midsagittal plane. Reconstructions were carried out using CT scans from a sample of modern humans (12 males and 13 females), to which 3D digital modeling techniques and geometric morphometric methods were applied. As expected, the second simulation showed a larger variability than the first one, which underlines the fact that the individual midsagittal plane is of course preferable in order to minimize the reconstruction error. However, in both simulations the Procrustes mean shape was an effective reference for the reconstruction of the entire cranium, producing models that showed a remarkably low error of about 3 mm, given the extent of missing data. Copyright © 2013 Wiley Periodicals, Inc.
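
    Mirror imaging, the baseline technique the paper extends, reflects preserved landmarks across the midsagittal plane. A toy sketch, assuming the skull has already been aligned so that the plane is x = 0:

```python
import numpy as np

def mirror_landmarks(landmarks):
    """Reflect 3D landmark coordinates across the midsagittal plane,
    assumed here to be x = 0 after alignment of the skull."""
    mirrored = np.asarray(landmarks, dtype=float).copy()
    mirrored[:, 0] *= -1.0        # flip the left-right axis
    return mirrored

# A preserved right-side landmark at x = 2.0 estimates its missing
# left-side counterpart at x = -2.0:
est = mirror_landmarks([[2.0, 1.0, 3.0]])
```

    When the midsagittal region itself is destroyed, no such plane can be fitted from the specimen alone, which is why the paper instead positions a reference (Procrustes mean) shape to supply the missing geometry.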

  10. Near-Miss Effects on Response Latencies and Win Estimations of Slot Machine Players

    ERIC Educational Resources Information Center

    Dixon, Mark R.; Schreiber, James E.

    2004-01-01

    The present study examined the degree to which slot machine near-miss trials, or trials that displayed 2 of 3 winning symbols on the payoff line, affected response times and win estimations of 12 recreational slot machine players. Participants played a commercial slot machine in a casino-like laboratory for course extra-credit points. Videotaped…

  11. The Impact of Missing Data on Species Tree Estimation.

    PubMed

    Xi, Zhenxiang; Liu, Liang; Davis, Charles C

    2016-03-01

    Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and that a sufficiently large number of genes are sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of missing data that are randomly distributed require exhaustive levels of gene sampling, likely exceeding most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, which becomes further exacerbated when combined with high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. [Biometric method for the description of the head of an unrecognized corpse for the purpose of personality individualization and identification].

    PubMed

    Zviagin, V N; Galitskaia, O I; Negasheva, M A

    2012-01-01

    We have determined absolute dimensions of the head and the relationships between the dimensions of its selected parts. The study enrolled adult subjects (mostly of Russian ethnicity) aged from 17 to 22 years (1108 men and 1153 women). We calculated the normal values for the estimation of real dimensional characteristics and the frequency of their occurrence in the population. The proposed approach makes it possible to reliably identify the dimensional features of human appearance in terms of a quantitative verbal description (categories 1-5) and to reveal its most characteristic features. The results of this biometric study of the heads of unrecognized corpses, obtained with the specially developed technology, may be used in operational and search investigations, in corpse identification procedures, and in the forensic medical identification of missing persons.

  13. Using linked educational attainment data to reduce bias due to missing outcome data in estimates of the association between the duration of breastfeeding and IQ at 15 years.

    PubMed

    Cornish, Rosie P; Tilling, Kate; Boyd, Andy; Davies, Amy; Macleod, John

    2015-06-01

    Most epidemiological studies have missing information, leading to reduced power and potential bias. Estimates of exposure-outcome associations will generally be biased if the outcome variable is missing not at random (MNAR). Linkage to administrative data containing a proxy for the missing study outcome allows assessment of whether this outcome is MNAR and the evaluation of bias. We examined this in relation to the association between infant breastfeeding and IQ at 15 years, where a proxy for IQ was available through linkage to school attainment data. Subjects were those who enrolled in the Avon Longitudinal Study of Parents and Children in 1990-91 (n = 13 795), of whom 5023 had IQ measured at age 15. For those with missing IQ, 7030 (79%) had information on educational attainment at age 16 obtained through linkage to the National Pupil Database. The association between duration of breastfeeding and IQ was estimated using a complete case analysis, multiple imputation and inverse probability-of-missingness weighting; these estimates were then compared with those derived from analyses informed by the linkage. IQ at 15 was MNAR: individuals with higher attainment were less likely to have missing IQ data, even after adjusting for socio-demographic factors. All the approaches underestimated the association between breastfeeding and IQ compared with analyses informed by linkage. Linkage to administrative data containing a proxy for the outcome variable allows the MNAR assumption to be tested and more efficient analyses to be performed. Under certain circumstances, this may produce unbiased results. © The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association.

  14. Morphological Deficits of Children with SLI: Evaluation of Number Marking and Agreement.

    ERIC Educational Resources Information Center

    Rice, Mabel L.; Oetting, Janna B.

    1993-01-01

    Grammatical deficits (e.g., missing feature, surface account, and missing agreement) reported for children with specific language impairment (SLI) were evaluated in spontaneous language transcripts from 108 preschool children. Results indicated that children with SLI do control number marking but find number agreement across clausal boundaries…

  15. Application of SEASAT-1 Synthetic Aperture Radar (SAR) data to enhance and detect geological lineaments and to assist LANDSAT landcover classification mapping. [Appalachian Region, West Virginia]

    NASA Technical Reports Server (NTRS)

    Sekhon, R.

    1981-01-01

    Digital SEASAT-1 synthetic aperture radar (SAR) data were used to enhance linear features to extract geologically significant lineaments in the Appalachian region. Comparison of lineaments thus mapped with an existing lineament map based on LANDSAT MSS images shows that appropriately processed SEASAT-1 SAR data can significantly improve the detection of lineaments. Merged MSS and SAR data sets were more useful for lineament detection and landcover classification than LANDSAT or SEASAT data alone. About 20 percent of the lineaments plotted from the SEASAT SAR image did not appear on the LANDSAT image. About 6 percent of minor lineaments or parts of lineaments present in the LANDSAT map were missing from the SEASAT map. Improvement in the landcover classification (acreage and spatial estimation accuracy) was attained by using MSS-SAR merged data. The areal estimation of residential/built-up and forest categories was improved. Accuracy in estimating the agricultural and water categories was slightly reduced.

  16. Missing value imputation in DNA microarrays based on conjugate gradient method.

    PubMed

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the best similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE in different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
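
    The neighbour-selection step described above can be sketched in a few lines of NumPy. This is a simplified illustration only: it ranks candidate rows by absolute Pearson correlation over the shared observed columns, as in CGimpute's first stage, but then imputes with a plain neighbour average rather than the paper's conjugate-gradient solve; the function name and the default k are invented for the sketch.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in a genes-x-samples matrix by averaging, per missing
    entry, the k rows most correlated (absolute Pearson correlation over
    shared observed columns) with the row containing the gap."""
    X = X.astype(float)
    out = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        scores = []
        for j, other in enumerate(X):
            if j == i:
                continue
            shared = ~np.isnan(row) & ~np.isnan(other)
            # a neighbour must observe the missing columns to help fill them
            if shared.sum() < 2 or np.isnan(other[miss]).any():
                continue
            r = np.corrcoef(row[shared], other[shared])[0, 1]
            if not np.isnan(r):
                scores.append((abs(r), j))
        scores.sort(reverse=True)
        nbrs = [j for _, j in scores[:k]]
        if nbrs:
            out[i, miss] = np.nanmean(X[nbrs][:, miss], axis=0)
    return out
```

Rows that do not observe the missing columns are excluded as neighbours, mirroring the requirement that the inputs to the estimation step be complete.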

  17. Bayesian Inference for Growth Mixture Models with Latent Class Dependent Missing Data

    ERIC Educational Resources Information Center

    Lu, Zhenqiu Laura; Zhang, Zhiyong; Lubke, Gitta

    2011-01-01

    "Growth mixture models" (GMMs) with nonignorable missing data have drawn increasing attention in research communities but have not been fully studied. The goal of this article is to propose and to evaluate a Bayesian method to estimate the GMMs with latent class dependent missing data. An extended GMM is first presented in which class…

  18. A Note on the Use of Missing Auxiliary Variables in Full Information Maximum Likelihood-Based Structural Equation Models

    ERIC Educational Resources Information Center

    Enders, Craig K.

    2008-01-01

    Recent missing data studies have argued in favor of an "inclusive analytic strategy" that incorporates auxiliary variables into the estimation routine, and Graham (2003) outlined methods for incorporating auxiliary variables into structural equation analyses. In practice, the auxiliary variables often have missing values, so it is reasonable to…

  19. The Impact of Different Missing Data Handling Methods on DINA Model

    ERIC Educational Resources Information Center

    Sünbül, Seçil Ömür

    2018-01-01

    In this study, it was aimed to investigate the impact of different missing data handling methods on DINA model parameter estimation and classification accuracy. In the study, simulated data were used and the data were generated by manipulating the number of items and sample size. In the generated data, two different missing data mechanisms…

  20. Missing the Boat--Impact of Just Missing Identification as a High-Performing School

    ERIC Educational Resources Information Center

    Weiner, Jennie; Donaldson, Morgaen; Dougherty, Shaun M.

    2017-01-01

    This study capitalizes on the performance identification system under the No Child Left Behind waivers to estimate the school-level impact of just missing formal state recognition as a high-performing school. Using a fuzzy regression-discontinuity design and data from the early years of waiver implementation in Rhode Island, we find that, when…

  1. Photoanthropometric face iridial proportions for age estimation: An investigation using features selected via a joint mutual information criterion.

    PubMed

    Borges, Díbio L; Vidal, Flávio B; Flores, Marta R P; Melani, Rodolfo F H; Guimarães, Marco A; Machado, Carlos E P

    2018-03-01

    Age assessment from images is of high interest in the forensic community because of the need for formal protocols to identify child pornography, missing children, and abuse, in which visual evidence is often the most admissible. Recently, photoanthropometric methods have been found useful for age estimation by correlating facial proportions in image databases with samples of some age groups. Notwithstanding these advances, new facial features and further analysis are needed to improve accuracy and establish wider applicability. In this investigation, frontal images of 1000 individuals (500 females, 500 males), equally distributed in five age groups (6, 10, 14, 18, 22 years old), were used in a 10-fold cross-validated experiment for three age threshold classifications (<10, <14, <18 years old). A set of 40 novel features, based on the relation between landmark distances and the iris diameter, is proposed, and joint mutual information is used to select the most relevant and complementary features for the classification task. In a civil image identification database with diverse ancestry, receiver operating characteristic (ROC) curves were plotted to verify accuracy, and the resulting AUCs reached 0.971, 0.969, and 0.903 for the age classifications (<10, <14, <18 years old), respectively. These results add support to continuing research in age assessment from images using the metric approach. Still, larger samples are necessary to evaluate reliability under extensive conditions. Copyright © 2017 Elsevier B.V. All rights reserved.
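
    As a toy illustration of information-theoretic feature screening, the sketch below ranks discrete features by their plain, per-feature mutual information with the class label, computed from empirical counts. Note the simplification: the paper uses a joint mutual information criterion, which additionally rewards complementarity among selected features; that extension is not implemented here.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """I(X;Y) in nats for two discrete 1-D arrays, from empirical counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # (c/n) * log( (c/n) / ((px/n)*(py/n)) ) simplified:
        mi += (c / n) * np.log(c * n / (px[a] * py[b]))
    return mi

def rank_features(X, y):
    """Rank the columns of a discrete feature matrix by MI with the label."""
    scores = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1], scores
```

A feature identical to the label scores H(Y); independent or constant features score zero, so the ranking front-loads the informative columns.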

  2. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
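
    The marginalization baseline the abstract contrasts with can be made concrete: compute distances only over the features observed in both items, rescaled to the full dimensionality. The sketch below is that baseline, not the KSC algorithm itself; the soft-constraint machinery is the paper's contribution and is omitted here.

```python
import numpy as np

def masked_distance(a, b):
    """Euclidean distance over the features observed in both vectors,
    rescaled to the full dimensionality (marginalizing missing values)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf          # no overlap: the pair is incomparable
    d2 = np.sum((a[shared] - b[shared]) ** 2)
    return np.sqrt(d2 * a.size / shared.sum())
```

A clustering routine such as k-medoids can use this distance directly; the rescaling keeps sparsely observed pairs roughly comparable to fully observed ones.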

  3. Influence function based variance estimation and missing data issues in case-cohort studies.

    PubMed

    Mark, S D; Katki, H

    2001-12-01

    Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results our useful in solving design and analytic issues that arise in practice.

  4. Kalman Filtering for Genetic Regulatory Networks with Missing Values

    PubMed Central

    Liu, Qiuhua; Lai, Tianyue; Wang, Wu

    2017-01-01

    The filtering problem with missing values for genetic regulatory networks (GRNs) is addressed, in which noise exists in both the state dynamics and the measurement equations; furthermore, the correlation between process noise and measurement noise is also taken into consideration. In order to deal with the filtering problem, a class of discrete-time GRNs with missing values, noise correlation, and time delays is established. Then a new observation model is proposed to decrease the adverse effect caused by the missing values and to decouple the correlation between process noise and measurement noise in theory. Finally, a Kalman filter is used to estimate the states of the GRNs. Meanwhile, a typical example is provided to verify the effectiveness of the proposed method, and it turns out that the concentrations of mRNA and protein can be estimated accurately. PMID:28814967
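
    The core idea of filtering with missing measurements can be sketched with a textbook linear Kalman filter that simply skips the update step when the observation is absent, carrying the prediction forward. This is a generic illustration, not the paper's model: the decoupling of correlated process and measurement noise, the time delays, and the GRN-specific dynamics are all omitted, and the scalar example values are invented.

```python
import numpy as np

def kalman_filter(F, H, Q, R, x0, P0, zs):
    """Linear Kalman filter; a NaN measurement triggers prediction only."""
    x, P = x0, P0
    states = []
    for z in zs:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update only when the measurement is present
        if not np.isnan(z).any():
            y = z - H @ x                      # innovation
            S = H @ P @ H.T + R                # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
            x = x + K @ y
            P = (np.eye(len(x)) - K @ H) @ P
        states.append(x)
    return np.array(states)

# scalar random walk with the middle measurement missing
F = H = np.array([[1.0]])
Q = R = np.array([[0.01]])
zs = [np.array([1.0]), np.array([np.nan]), np.array([3.0])]
est = kalman_filter(F, H, Q, R, np.array([0.0]), np.array([[1.0]]), zs)
```

At the missing step the state estimate is unchanged while its covariance grows by Q, so the next available measurement is weighted more heavily.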

  5. Estimation of missing water-level data for the Everglades Depth Estimation Network (EDEN), 2013 update

    USGS Publications Warehouse

    Petkewich, Matthew D.; Conrads, Paul

    2013-01-01

    The Everglades Depth Estimation Network is an integrated network of real-time water-level gaging stations, a ground-elevation model, and a water-surface elevation model designed to provide scientists, engineers, and water-resource managers with water-level and water-depth information (1991-2013) for the entire freshwater portion of the Greater Everglades. The U.S. Geological Survey Greater Everglades Priority Ecosystems Science provides support for the Everglades Depth Estimation Network in order for the Network to provide quality-assured monitoring data for the U.S. Army Corps of Engineers Comprehensive Everglades Restoration Plan. In a previous study, water-level estimation equations were developed to fill in missing data to increase the accuracy of the daily water-surface elevation model. During this study, those equations were updated because of the addition and removal of water-level gaging stations, the consistent use of water-level data relative to the North American Vertical Datum of 1988, and availability of recent data (March 1, 2006, to September 30, 2011). Up to three linear regression equations were developed for each station by using three different input stations to minimize the occurrences of missing data for an input station. Of the 667 water-level estimation equations developed to fill missing data at 223 stations, more than 72 percent of the equations have coefficients of determination greater than 0.90, and 97 percent have coefficients of determination greater than 0.70.
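
    The backup-equation scheme described above, with several simple regressions per station ranked by coefficient of determination and the first equation whose input station is reporting used to fill a gap, can be sketched as follows. The station names, ranking rule, and fallback logic here are illustrative, not the EDEN implementation.

```python
import numpy as np

def fit_backups(target, inputs):
    """Fit target ~ slope*x + intercept against each candidate input
    station; return (name, slope, intercept, r2) sorted by descending r2."""
    eqs = []
    for name, x in inputs.items():
        ok = ~np.isnan(target) & ~np.isnan(x)
        slope, intercept = np.polyfit(x[ok], target[ok], 1)
        pred = slope * x[ok] + intercept
        ss_res = np.sum((target[ok] - pred) ** 2)
        ss_tot = np.sum((target[ok] - target[ok].mean()) ** 2)
        eqs.append((name, slope, intercept, 1 - ss_res / ss_tot))
    return sorted(eqs, key=lambda e: -e[-1])

def estimate(t, eqs, inputs):
    """Fill index t from the best-ranked equation whose input is present."""
    for name, slope, intercept, _ in eqs:
        if not np.isnan(inputs[name][t]):
            return slope * inputs[name][t] + intercept
    return np.nan
```

Keeping several ranked equations per station is what minimizes unfilled gaps: a missing input simply hands off to the next-best regression.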

  6. Visual search for features and conjunctions in development.

    PubMed

    Lobaugh, N J; Cole, S; Rovet, J F

    1998-12-01

    Visual search performance was examined in three groups of children 7 to 12 years of age and in young adults. Colour and orientation feature searches and a conjunction search were conducted. Reaction time (RT) showed expected improvements in processing speed with age. Comparisons of RTs on target-present and target-absent trials were consistent with parallel search in the two feature conditions and with serial search in the conjunction condition. The RT results indicated that searches for features and conjunctions were treated similarly by children and adults. However, the youngest children missed more targets at the largest array sizes, most strikingly in conjunction search. Based on an analysis of speed/accuracy trade-offs, we suggest that low target-distractor discriminability leads to an undersampling of array elements, and is responsible for the high number of misses in the youngest children.

  7. Replacing missing data between airborne SAR coherent image pairs

    DOE PAGES

    Musgrove, Cameron H.; West, James C.

    2017-07-31

    For synthetic aperture radar systems, missing data samples can cause severe image distortion. When multiple, coherent data collections exist and the missing data samples do not overlap between collections, there exists the possibility of replacing data samples between collections. For airborne radar, the known and unknown motion of the aircraft prevents direct data sample replacement to repair image features. This paper presents a method to calculate the necessary phase corrections to enable data sample replacement using only the collected radar data.

  9. Hazard Function Estimation with Cause-of-Death Data Missing at Random.

    PubMed

    Wang, Qihua; Dinse, Gregg E; Liu, Chunling

    2012-04-01

    Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.

  10. It's What's on the Outside that Matters: An Advantage for External Features in Children's Word Recognition

    ERIC Educational Resources Information Center

    Webb, Tessa M.; Beech, John R.; Mayall, Kate M.; Andrews, Antony S.

    2006-01-01

    The relative importance of internal and external letter features of words in children's developing reading was investigated to clarify further the nature of early featural analysis. In Experiment 1, 72 6-, 8-, and 10-year-olds read aloud words displayed as wholes, external features only (central features missing, thereby preserving word shape…

  11. Melancholic depression prediction by identifying representative features in metabolic and microarray profiles with missing values.

    PubMed

    Nie, Zhi; Yang, Tao; Liu, Yashu; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2015-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. 
In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed method give rise to significantly improved sensitivity scores, suggesting that the learned features allow prediction with high accuracy of disease status in those who are diagnosed with melancholic depression. To the best of our knowledge, this is the first work that applies sparse coding to deal with high feature correlations and missing values, which are common challenges in many biomedical applications. The proposed method can be readily adapted to other biomedical applications involving incomplete and high-dimensional data.

  12. Feature Masking in Computer Game Promotes Visual Imagery

    ERIC Educational Resources Information Center

    Smith, Glenn Gordon; Morey, Jim; Tjoe, Edwin

    2007-01-01

    Can learning of mental imagery skills for visualizing shapes be accelerated with feature masking? Chemistry, physics, fine arts, military tactics, and laparoscopic surgery often depend on mentally visualizing shapes in their absence. Does working with "spatial feature-masks" (skeletal shapes, missing key identifying portions) encourage people to…

  13. Postmodeling Sensitivity Analysis to Detect the Effect of Missing Data Mechanisms

    ERIC Educational Resources Information Center

    Jamshidian, Mortaza; Mata, Matthew

    2008-01-01

    Incomplete or missing data is a common problem in almost all areas of empirical research. It is well known that simple and ad hoc methods such as complete case analysis or mean imputation can lead to biased and/or inefficient estimates. The method of maximum likelihood works well; however, when the missing data mechanism is not one of missing…

  14. Comparing multiple imputation methods for systematically missing subject-level data.

    PubMed

    Kline, David; Andridge, Rebecca; Kaizar, Eloise

    2017-06-01

    When conducting research synthesis, the studies to be combined often do not measure the same set of variables, which creates missing data. When the studies to combine are longitudinal, missing data can occur on the observation level (time-varying) or the subject level (non-time-varying). Traditionally, the focus of missing data methods for longitudinal data has been on missing observation-level variables. In this paper, we focus on missing subject-level variables and compare two multiple imputation approaches: a joint modeling approach and a sequential conditional modeling approach. We find the joint modeling approach to be preferable to the sequential conditional approach, except when the covariance structure of the repeated outcome for each individual has homogeneous variance and exchangeable correlation. Specifically, the regression coefficient estimates from an analysis incorporating imputed values based on the sequential conditional method are attenuated and less efficient than those from the joint method. Remarkably, the estimates from the sequential conditional method are often less efficient than a complete case analysis, which, in the context of research synthesis, implies that we lose efficiency by combining studies. Copyright © 2015 John Wiley & Sons, Ltd.

  15. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under a non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
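
    The bias the abstract quantifies is easy to reproduce in a small simulation. The sketch below fits a logistic regression by Newton-Raphson on fully observed data and again after outcome-dependent (MNAR) dropout. One subtlety, assumed here rather than taken from the paper: when nonresponse depends only on the outcome, the logistic slope remains consistent and the bias concentrates in the intercept (the case-control sampling property). All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta0, beta1 = -0.5, 1.0
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(beta0 + beta1 * x))))

def fit_logistic(x, y, iters=25):
    """Intercept + slope logistic regression via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ b))
        H = X.T @ (X * (mu * (1 - mu))[:, None])   # observed information
        b = b + np.linalg.solve(H, X.T @ (y - mu))
    return b

full = fit_logistic(x, y)

# MNAR dropout: subjects with y = 1 respond roughly half as often as y = 0
respond = rng.random(n) < np.where(y == 1, 0.5, 0.9)
cc = fit_logistic(x[respond], y[respond])          # complete-case fit
```

Here the complete-case intercept is shifted by roughly log(0.5/0.9) while the slope is nearly unchanged; when nonresponse depends on the outcome and covariates jointly, the slope is distorted as well, which is what motivates the paper's sensitivity analysis.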

  16. A guide to missing data for the pediatric nephrologist.

    PubMed

    Larkins, Nicholas G; Craig, Jonathan C; Teixeira-Pinto, Armando

    2018-03-13

    Missing data is an important and common source of bias in clinical research. Readers should be alert to and consider the impact of missing data when reading studies. Beyond preventing missing data in the first place, through good study design and conduct, there are different strategies available to handle data containing missing observations. Complete case analysis is often biased unless data are missing completely at random. Better methods of handling missing data include multiple imputation and models using likelihood-based estimation. With advancing computing power and modern statistical software, these methods are within the reach of clinician-researchers under guidance of a biostatistician. As clinicians reading papers, we need to continue to update our understanding of statistical methods, so that we understand the limitations of these techniques and can critically interpret literature.

  17. An alternative empirical likelihood method in missing response problems and causal inference.

    PubMed

    Ren, Kaili; Drummond, Christopher A; Brewster, Pamela S; Haller, Steven T; Tian, Jiang; Cooper, Christopher J; Zhang, Biao

    2016-11-30

    Missing responses are a common problem in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debias method is inverse probability weighting proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood-based estimator as an alternative to Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied in the estimation of average treatment effect in observational causal inferences. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
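
    The augmented inverse probability weighting (AIPW) estimator that the paper takes as its point of comparison can be sketched on simulated data. The sketch below estimates a mean outcome under responses missing at random, using the true propensity in place of a fitted propensity model and a linear outcome model; all distributions and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)          # true E[y] = 2.0
pi = 1 / (1 + np.exp(-(0.5 + x)))               # response probability (MAR)
r = rng.random(n) < pi                          # responders

# complete-case mean is biased because high-x subjects respond more often
mu_cc = y[r].mean()

# outcome ("regression") model fit on responders only
A = np.column_stack([np.ones(r.sum()), x[r]])
coef, *_ = np.linalg.lstsq(A, y[r], rcond=None)
m = coef[0] + coef[1] * x                       # predicted y for everyone

# augmented inverse probability weighting (doubly robust)
mu_aipw = np.mean(np.where(r, y, 0.0) / pi - (r - pi) / pi * m)
```

The estimator is consistent if either the propensity or the outcome model is correct (here both are), so it recovers E[y] even though the complete-case mean is visibly biased upward.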

  18. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
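
    The mechanics described above — draw several completed datasets, analyze each, then combine — can be sketched for the simple case of estimating a mean with a normal linear imputation model. One labelled simplification: a fully proper implementation would also draw the regression coefficients and residual variance from their posterior for each imputation; the sketch below reuses the point estimates, which slightly understates the between-imputation variance.

```python
import numpy as np

rng = np.random.default_rng(2)

def multiple_impute_mean(y, x, M=20):
    """Estimate mean(y) when y is MAR given x: impute M times from a
    linear model fit on observed pairs, then pool with Rubin's rules."""
    obs = ~np.isnan(y)
    A = np.column_stack([np.ones(obs.sum()), x[obs]])
    coef, res, *_ = np.linalg.lstsq(A, y[obs], rcond=None)
    sigma = np.sqrt(res[0] / (obs.sum() - 2))      # residual std. dev.
    ests, variances = [], []
    for _ in range(M):
        yi = y.copy()
        miss = ~obs
        yi[miss] = coef[0] + coef[1] * x[miss] + rng.normal(0, sigma, miss.sum())
        ests.append(yi.mean())
        variances.append(yi.var(ddof=1) / len(yi))
    qbar = np.mean(ests)                           # pooled point estimate
    b = np.var(ests, ddof=1)                       # between-imputation variance
    t = np.mean(variances) + (1 + 1 / M) * b       # Rubin's total variance
    return qbar, np.sqrt(t)
```

With an informative covariate and MAR missingness, the pooled estimate recovers the true mean that a complete-case average misses, matching the abstract's point about MI's consistency under MAR.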

  19. DAMBE7: New and Improved Tools for Data Analysis in Molecular Biology and Evolution.

    PubMed

    Xia, Xuhua

    2018-06-01

    DAMBE is a comprehensive software package for genomic and phylogenetic data analysis on Windows, Linux, and Macintosh computers. New functions include imputing missing distances and phylogeny simultaneously (paving the way to build large phage and transposon trees), new bootstrapping/jackknifing methods for PhyPA (phylogenetics from pairwise alignments), and an improved function for fast and accurate estimation of the shape parameter of the gamma distribution for fitting rate heterogeneity over sites. The previous method corrected multiple hits at each site independently; DAMBE's new method uses all sites simultaneously for the correction. DAMBE, featuring a user-friendly graphic interface, is freely available from http://dambe.bio.uottawa.ca (last accessed, April 17, 2018).

  20. Probing Higgs self-coupling of a classically scale invariant model in e+e- → Zhh: Evaluation at physical point

    NASA Astrophysics Data System (ADS)

    Fujitani, Y.; Sumino, Y.

    2018-04-01

    A classically scale invariant extension of the standard model predicts large anomalous Higgs self-interactions. We compute contributions missing from previous studies that probe the Higgs triple coupling of a minimal model using the process e+e- → Zhh. Employing a proper order counting, we compute the total and differential cross sections at the leading order, which incorporate the one-loop corrections between zero external momenta and their physical values. The discovery/exclusion potential of a future e+e- collider for this model is estimated. We also find a unique feature in the momentum dependence of the Higgs triple vertex for this class of models.

  1. "Ersatz" and "hybrid" NMR spectral estimates using the filter diagonalization method.

    PubMed

    Ridge, Clark D; Shaka, A J

    2009-03-12

    The filter diagonalization method (FDM) is an efficient and elegant way to make a spectral estimate purely in terms of Lorentzian peaks. As NMR spectral peaks of liquids conform quite well to this model, the FDM spectral estimate can be accurate with far fewer time-domain points than conventional discrete Fourier transform (DFT) processing. However, noise is not efficiently characterized by a finite number of Lorentzian peaks, or by any other analytical form, for that matter. As a result, noise can affect the FDM spectrum in different ways than it does the DFT spectrum, and the effect depends on the dimensionality of the spectrum. Regularization to suppress (or control) the influence of noise to give an "ersatz", or EFDM, spectrum is shown to sometimes miss weak features, prompting a more conservative implementation of filter diagonalization. The spectra obtained, called "hybrid" or HFDM spectra, are acquired by using regularized FDM to obtain an "infinite time" spectral estimate and then adding to it the difference between the DFT of the data and the finite-time FDM estimate over the same time interval. HFDM spectra have a number of advantages over EFDM spectra, in which all features must be Lorentzian, and they also show better resolution than DFT spectra. The HFDM spectrum is a reliable and robust way to try to extract more information from noisy, truncated data records and is less sensitive to the choice of regularization parameter. In multidimensional NMR of liquids, HFDM is a conservative way to handle the problems of noise, truncation, and spectral peaks that depart significantly from the model of a multidimensional Lorentzian peak.

  2. Working with Missing Values

    ERIC Educational Resources Information Center

    Acock, Alan C.

    2005-01-01

    Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions. After reviewing traditional approaches (listwise, pairwise, and mean substitution), selected alternatives are covered including single imputation, multiple imputation, and full information maximum likelihood…
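
    The pitfall of the traditional approaches reviewed above can be seen on toy data (illustrative numbers, not from the article): mean substitution preserves the mean but artificially shrinks the variance, which deflates standard errors and distorts statistical power.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=1000)
x[rng.random(1000) < 0.3] = np.nan               # 30% missing completely at random

obs = x[~np.isnan(x)]                            # listwise deletion: drop missing cases
mean_sub = np.where(np.isnan(x), obs.mean(), x)  # mean substitution: fill with observed mean

# The filled values sit exactly at the mean, so the spread is understated.
print(f"listwise  mean={obs.mean():.2f}  sd={obs.std(ddof=1):.2f}")
print(f"mean-sub  mean={mean_sub.mean():.2f}  sd={mean_sub.std(ddof=1):.2f}")
```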

  3. Linear regression analysis of survival data with missing censoring indicators.

    PubMed

    Wang, Qihua; Dinse, Gregg E

    2011-04-01

    Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.

  4. Novel Methods for Optically Measuring Whitecaps Under Natural Wave Breaking Conditions in the Southern Ocean

    NASA Astrophysics Data System (ADS)

    Randolph, K. L.; Dierssen, H. M.; Cifuentes-Lorenzen, A.; Balch, W. M.; Monahan, E. C.; Zappa, C. J.; Drapeau, D.; Bowler, B.

    2016-02-01

    Breaking waves on the ocean surface mark areas of significant importance to air-sea flux estimates of gas, aerosols, and heat. Traditional methods of measuring whitecap coverage using digital photography can miss features that are small in size or do not show high enough contrast against the background. The geometry of the images collected captures the near-surface, bright manifestations of the whitecap feature and misses a portion of the bubble plume that is responsible for the production of sea salt aerosols and the transfer of lower-solubility gases. Here, a novel method is presented for accurately measuring both the fractional coverage of whitecaps and the intensity and decay rate of whitecap events using above-water radiometry. The methodology was developed using data collected during the austral summer in the Atlantic sector of the Southern Ocean, under a large range of wind (speeds of 1 to 15 m s-1) and wave (significant wave heights of 2 to 8 m) conditions, as part of the Southern Ocean Gas Exchange experiment. Whitecap metrics were retrieved by employing a magnitude threshold based on the interquartile range (IQR) of the radiance or reflectance signal for a single channel (411 nm) after a baseline removal determined using a moving minimum/maximum filter. Breaking intensity and decay rate metrics were produced from the integration of, and the exponential fit to, radiance or reflectance over the lifetime of the whitecap. When compared to fractional whitecap coverage measurements obtained from high-resolution digital images, radiometric estimates were consistently higher because they capture more of the decaying bubble-plume area that is difficult to detect with photography. Radiometrically retrieved whitecap measurements are presented in the context of concurrently measured meteorological (e.g., wind speed) and oceanographic (e.g., wave) data. The optimal fit of the radiometrically estimated whitecap coverage to the instantaneous wind speed, determined using ordinary least squares, showed a cubic dependence. Increasing the magnitude threshold for whitecap detection from 2 to 3 IQR produced a wind speed-whitecap relationship most comparable to previously published and widely accepted wind speed-whitecap parameterizations.

  5. Hazard Function Estimation with Cause-of-Death Data Missing at Random

    PubMed Central

    Wang, Qihua; Dinse, Gregg E.; Liu, Chunling

    2010-01-01

    Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data. PMID:22267874
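
    As a generic illustration of the inverse-probability-weighting idea behind the third estimator, here is a toy mean estimate under MAR missingness. This is not the paper's kernel hazard estimator, and the observation probabilities are assumed known, whereas in practice they are modeled from the data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x = rng.random(n)                        # fully observed covariate
y = 1.0 + 2.0 * x + rng.normal(size=n)   # outcome of interest
pi = 0.3 + 0.6 * x                       # P(y observed | x): the MAR mechanism
r = rng.random(n) < pi                   # r=True if y is observed

naive = y[r].mean()        # complete-case mean: biased, observed cases over-represent high x
ipw = np.mean(r * y / pi)  # inverse probability weighted (Horvitz-Thompson) estimate
print(f"true=2.000  naive={naive:.3f}  ipw={ipw:.3f}")
```

Weighting each observed case by 1/pi restores the contribution of cases like those that went missing, recovering the full-population mean of 2.0.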

  6. Case Report: Congenital Erythroleukemia in a Premature Infant with Dysmorphic Features.

    PubMed

    Helin, Heidi; van der Walt, Jon; Holder, Muriel; George, Simi

    2016-01-01

    We present a case of pure erythroleukemia, diagnosed at autopsy, in a dysmorphic premature infant who died of multiorgan failure within 24 hours of birth. Dysmorphic features included facial and limb abnormalities with a long philtrum, micrognathia, downturned mouth, and short neck, as well as abnormal and missing nails, a missing distal phalanx of the second toe, and overlapping toes. Internal findings included gross hepatomegaly with patchy hemorrhages in the liver, splenomegaly, and cardiomegaly, and subdural, intracerebral, and intraventricular hemorrhages. Histology revealed infiltration of bone marrow, kidney, heart, liver, adrenal, lung, spleen, pancreas, thyroid, testis, thymus, and placenta by pure erythroleukemia. Only 6 cases of congenital erythroleukemia have been previously reported with autopsy findings similar to those of this case. The dysmorphic features, although not fitting any specific syndrome, make this case unique. Congenital erythroleukemia and possible syndromes suggested by the dysmorphic features are discussed.

  7. Meta‐analysis of test accuracy studies using imputation for partial reporting of multiple thresholds

    PubMed Central

    Deeks, J.J.; Martin, E.C.; Riley, R.D.

    2017-01-01

    Introduction For tests reporting continuous results, primary studies usually provide test performance at multiple but often different thresholds. This creates missing data when performing a meta‐analysis at each threshold. A standard meta‐analysis (no imputation [NI]) ignores such missing data. A single imputation (SI) approach was recently proposed to recover missing threshold results. Here, we propose a new method that performs multiple imputation of the missing threshold results using discrete combinations (MIDC). Methods The new MIDC method imputes missing threshold results by randomly selecting from the set of all possible discrete combinations which lie between the results for 2 known bounding thresholds. Imputed and observed results are then synthesised at each threshold. This is repeated multiple times, and the multiple pooled results at each threshold are combined using Rubin's rules to give final estimates. We compared the NI, SI, and MIDC approaches via simulation. Results Both imputation methods outperform the NI method in simulations. There was generally little difference between the SI and MIDC methods, but the latter was noticeably better at estimating the between‐study variances and generally gave better coverage, due to slightly larger standard errors of pooled estimates. Given selective reporting of thresholds, the imputation methods also reduced bias in the summary receiver operating characteristic curve. Simulations demonstrate that the imputation methods rely on an equal threshold spacing assumption. A real example is presented. Conclusions The SI and, in particular, MIDC methods can be used to examine the impact of missing threshold results in meta‐analysis of test accuracy studies. PMID:29052347

  8. Improving link prediction in complex networks by adaptively exploiting multiple structural features of networks

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng

    2017-10-01

    So far, many network-structure-based link prediction methods have been proposed. However, these methods typically highlight only one or two structural features of a network and then apply the same features to predict missing links in different networks. The performance of such methods is not always satisfactory, since each network has its own underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks differ remarkably; even within a single network, local structural features can differ markedly. Therefore, more structural features should be considered. However, owing to these remarkable differences, the contributions of the different features are hard to specify in advance. Motivated by these facts, we propose an adaptive fusion model for link prediction that incorporates multiple structural features. In the model, a logistic function combining multiple structural features is defined, and the weight of each feature in the logistic function is adaptively determined by exploiting the known structure information. Finally, we use the "learnt" logistic function to predict the connection probabilities of missing links. Our experimental results show that the adaptive fusion model performs better than many similarity indices.
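
    A minimal sketch in the spirit of the adaptive fusion model: learn the weights of a logistic function over structural features from the known structure, then score node pairs. The toy graph, the choice of two features (common neighbors and preferential attachment), and the plain gradient-descent fit are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy undirected graph as an adjacency matrix.
n = 30
A = (rng.random((n, n)) < 0.15).astype(float)
A = np.triu(A, 1); A = A + A.T

# Two classic structural similarity features for a node pair (i, j):
cn = A @ A                   # common neighbors (paths of length 2)
deg = A.sum(axis=1)
pa = np.outer(deg, deg)      # preferential attachment

# Node pairs: known edges (label 1) and non-edges (label 0).
iu = np.triu_indices(n, 1)
X = np.column_stack([cn[iu], pa[iu]])
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)  # standardize features
y = A[iu]

# Fit the logistic combination of features by gradient descent on the log-loss,
# so the weight of each feature is determined from the known structure.
w = np.zeros(2); b = 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

scores = 1 / (1 + np.exp(-(X @ w + b)))  # predicted connection probabilities
print("learned feature weights:", w)
```

A real evaluation would hide a fraction of the edges, fit on the rest, and rank the hidden pairs by score rather than scoring the training pairs as done here.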

  9. A model for incomplete longitudinal multivariate ordinal data.

    PubMed

    Liu, Li C

    2008-12-30

    In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for the analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (a missing item at any time point and/or a missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration over the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-thirds of all items at any given time point. Copyright 2008 John Wiley & Sons, Ltd.

  10. A Simple Method for Estimating Informative Node Age Priors for the Fossil Calibration of Molecular Divergence Time Analyses

    PubMed Central

    Nowak, Michael D.; Smith, Andrew B.; Simpson, Carl; Zwickl, Derrick J.

    2013-01-01

    Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates. PMID:23755303

  11. Daily reference crop evapotranspiration with reduced data sets in the humid environments of Azores islands using estimates of actual vapor pressure, solar radiation, and wind speed

    NASA Astrophysics Data System (ADS)

    Paredes, P.; Fontes, J. C.; Azevedo, E. B.; Pereira, L. S.

    2017-11-01

    Reference crop evapotranspiration (ETo) estimation using the FAO Penman-Monteith equation (PM-ETo) requires a set of weather data including maximum and minimum air temperatures (T max, T min), actual vapor pressure (e a), solar radiation (R s), and wind speed (u 2). However, those data are often not available, or data sets are incomplete due to missing values. A set of procedures to overcome these limitations was proposed in FAO56 (Allen et al. 1998), and this study assesses their accuracy for estimating daily ETo in the humid climate of the Azores islands. Results show that after locally and seasonally calibrating the temperature adjustment factor a d used to compute dew point temperature (T dew) from mean temperature, ETo estimations showed small bias and small RMSE, ranging from 0.15 to 0.53 mm day-1. When R s data are missing, their estimation from the temperature difference (T max-T min), using a locally and seasonally calibrated radiation adjustment coefficient (k Rs), yielded highly accurate ETo estimates, with RMSE averaging 0.41 mm day-1 and ranging from 0.33 to 0.58 mm day-1. If wind speed observations are missing, the use of the default u 2 = 2 m s-1, or 3 m s-1 in the case of weather measurements over clipped grass at airports, proved appropriate even for windy locations (u 2 > 4 m s-1), with RMSE < 0.36 mm day-1. The appropriateness of the procedures for estimating the missing values of e a, R s, and u 2 was thus confirmed.
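
    The FAO56 fallback formulas assessed above can be sketched directly. The numeric inputs below are illustrative, and a_d and k_rs stand in for the locally and seasonally calibrated coefficients the abstract mentions:

```python
import math

def svp(t_c):
    """Saturation vapor pressure e0(T) in kPa at air temperature t_c (deg C), per FAO56."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def ea_missing(t_source, a_d=0.0):
    """Actual vapor pressure (kPa) when humidity data are missing, taking
    Tdew ~ t_source - a_d; a_d is the temperature adjustment factor calibrated
    locally in the study (FAO56 itself uses Tdew ~ Tmin in humid climates)."""
    return svp(t_source - a_d)

def rs_missing(t_max, t_min, ra, k_rs=0.16):
    """Solar radiation (MJ m-2 day-1) from the daily temperature range via the
    Hargreaves radiation formula; k_rs is ~0.16 for interior and ~0.19 for
    coastal sites, calibrated locally and seasonally in the study. ra is the
    extraterrestrial radiation for the site and day."""
    return k_rs * math.sqrt(t_max - t_min) * ra

U2_DEFAULT = 2.0  # m s-1: FAO56 default when wind speed observations are missing

# Illustrative inputs (not from the Azores data set):
print(f"ea = {ea_missing(14.0):.2f} kPa")
print(f"Rs = {rs_missing(24.0, 14.0, ra=30.0):.2f} MJ m-2 day-1")
```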

  12. Semiparametric Estimation of Treatment Effect in a Pretest–Posttest Study with Missing Data

    PubMed Central

    Davidian, Marie; Tsiatis, Anastasios A.; Leon, Selene

    2008-01-01

    The pretest–posttest study is commonplace in numerous applications. Typically, subjects are randomized to two treatments, and response is measured at baseline, prior to intervention with the randomized treatment (pretest), and at prespecified follow-up time (posttest). Interest focuses on the effect of treatments on the change between mean baseline and follow-up response. Missing posttest response for some subjects is routine, and disregarding missing cases can lead to invalid inference. Despite the popularity of this design, a consensus on an appropriate analysis when no data are missing, let alone for taking into account missing follow-up, does not exist. Under a semiparametric perspective on the pretest–posttest model, in which limited distributional assumptions on pretest or posttest response are made, we show how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class. We then describe how the theoretical results translate into practice. The development not only shows how a unified framework for inference in this setting emerges from the Robins, Rotnitzky and Zhao theory, but also provides a review and demonstration of the key aspects of this theory in a familiar context. The results are also relevant to the problem of comparing two treatment means with adjustment for baseline covariates. PMID:19081743

  13. Semiparametric Estimation of Treatment Effect in a Pretest-Posttest Study with Missing Data.

    PubMed

    Davidian, Marie; Tsiatis, Anastasios A; Leon, Selene

    2005-08-01

    The pretest-posttest study is commonplace in numerous applications. Typically, subjects are randomized to two treatments, and response is measured at baseline, prior to intervention with the randomized treatment (pretest), and at prespecified follow-up time (posttest). Interest focuses on the effect of treatments on the change between mean baseline and follow-up response. Missing posttest response for some subjects is routine, and disregarding missing cases can lead to invalid inference. Despite the popularity of this design, a consensus on an appropriate analysis when no data are missing, let alone for taking into account missing follow-up, does not exist. Under a semiparametric perspective on the pretest-posttest model, in which limited distributional assumptions on pretest or posttest response are made, we show how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class. We then describe how the theoretical results translate into practice. The development not only shows how a unified framework for inference in this setting emerges from the Robins, Rotnitzky and Zhao theory, but also provides a review and demonstration of the key aspects of this theory in a familiar context. The results are also relevant to the problem of comparing two treatment means with adjustment for baseline covariates.

  14. Analysis of cohort studies with multivariate and partially observed disease classification data.

    PubMed

    Chatterjee, Nilanjan; Sinha, Samiran; Diver, W Ryan; Feigelson, Heather Spencer

    2010-09-01

    Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

  15. Adjusting HIV prevalence estimates for non-participation: an application to demographic surveillance

    PubMed Central

    McGovern, Mark E.; Marra, Giampiero; Radice, Rosalba; Canning, David; Newell, Marie-Louise; Bärnighausen, Till

    2015-01-01

    Introduction HIV testing is a cornerstone of efforts to combat the HIV epidemic, and testing conducted as part of surveillance provides invaluable data on the spread of infection and the effectiveness of campaigns to reduce the transmission of HIV. However, participation in HIV testing can be low, and if respondents systematically select not to be tested because they know or suspect they are HIV positive (and fear disclosure), standard approaches to deal with missing data will fail to remove selection bias. We implemented Heckman-type selection models, which can be used to adjust for missing data that are not missing at random, and established the extent of selection bias in a population-based HIV survey in an HIV hyperendemic community in rural South Africa. Methods We used data from a population-based HIV survey carried out in 2009 in rural KwaZulu-Natal, South Africa. In this survey, 5565 women (35%) and 2567 men (27%) provided blood for an HIV test. We accounted for missing data using interviewer identity as a selection variable which predicted consent to HIV testing but was unlikely to be independently associated with HIV status. Our approach involved using this selection variable to examine the HIV status of residents who would ordinarily refuse to test, except that they were allocated a persuasive interviewer. Our copula model allows for flexibility when modelling the dependence structure between HIV survey participation and HIV status. Results For women, our selection model generated an HIV prevalence estimate of 33% (95% CI 27–40) for all people eligible to consent to HIV testing in the survey. This estimate is higher than the estimate of 24% generated when only information from respondents who participated in testing is used in the analysis, and the estimate of 27% when imputation analysis is used to predict missing data on HIV status. 
For men, we found an HIV prevalence of 25% (95% CI 15–35) using the selection model, compared to 16% among those who participated in testing, and 18% estimated with imputation. We provide new confidence intervals that correct for the fact that the relationship between testing and HIV status is unknown and requires estimation. Conclusions We confirm the feasibility and value of adopting selection models to account for missing data in population-based HIV surveys and surveillance systems. Elements of survey design, such as interviewer identity, present the opportunity to adopt this approach in routine applications. Where non-participation is high, true confidence intervals are much wider than those generated by standard approaches to dealing with missing data suggest. PMID:26613900

  16. Features of pedestrian behavior in car-to-pedestrian contact situations in near-miss incidents in Japan.

    PubMed

    Matsui, Yasuhiro; Hitosugi, Masahito; Doi, Tsutomu; Oikawa, Shoko; Takahashi, Kunio; Ando, Kenichi

    2013-01-01

    The objective of this study is to evaluate severe conditions in car-to-pedestrian near-miss situations using the pedestrian time-to-vehicle (pedestrian TTV), the time at which the pedestrian would reach the path of the forward-moving car. Since the information available from real-world accidents was limited, the authors focused on near-miss situations captured by driving recorders installed in passenger cars. In a previous study, the authors found similarities between accidents and near-miss incidents, establishing that the situations in pedestrian accidents could be estimated from near-miss incident data that included motion pictures capturing pedestrian behavior; that study also investigated the vehicle time-to-collision (vehicle TTC) from the near-miss incident data. The authors analyzed data for 101 near-miss car-to-pedestrian incident events in which pedestrians were crossing the road in front of a forward-moving car at an intersection or on a straight road. Using video of near-miss car-to-pedestrian incidents captured by drive recorders and collected by the Society of Automotive Engineers of Japan (J-SAE) from 2005 to 2009, the pedestrian TTV was calculated. Based on the calculated pedestrian TTV, one of the severe conditions in car-to-pedestrian near-miss situations was evaluated for pedestrians who emerged from behind an obstruction such as a building, a parked vehicle, or a moving vehicle. For these pedestrians, the averages of the vehicle TTC and pedestrian TTV were 1.31 and 1.05 seconds, respectively, and did not show a significant difference. Because these averages were similar, there would be a high possibility of contact between car and pedestrian if neither the driver nor the pedestrian were paying attention. 
The authors propose that the moving speed of a pedestrian surrogate "dummy" used to evaluate a CDMBS for pedestrian detection should be determined with the near-miss incident situations in mind. They also propose that the time-to-collision of the dummy to the tested car during evaluation of CDMBS performance should be determined considering times such as the vehicle TTC in this study. Additionally or alternatively, the pedestrian TTV should be considered, assuming the worst situation: a car moving toward a pedestrian without braking due to driver inattentiveness, with the pedestrian neither slowing their walking speed nor stopping.

  17. The impact of different strategies to handle missing data on both precision and bias in a drug safety study: a multidatabase multinational population-based cohort study

    PubMed Central

    Martín-Merino, Elisa; Calderón-Larrañaga, Amaia; Hawley, Samuel; Poblador-Plou, Beatriz; Llorente-García, Ana; Petersen, Irene; Prieto-Alhambra, Daniel

    2018-01-01

    Background Missing data are often an issue in electronic medical records (EMRs) research. However, there are many ways that people deal with missing data in drug safety studies. Aim To compare the risk estimates resulting from different strategies for the handling of missing data in the study of venous thromboembolism (VTE) risk associated with antiosteoporotic medications (AOM). Methods New users of AOM (alendronic acid, other bisphosphonates, strontium ranelate, selective estrogen receptor modulators, teriparatide, or denosumab) aged ≥50 years during 1998–2014 were identified in two Spanish (the Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria [BIFAP] and EpiChron cohort) and one UK (Clinical Practice Research Datalink [CPRD]) EMR. Hazard ratios (HRs) according to AOM (with alendronic acid as reference) were calculated adjusting for VTE risk factors, body mass index (that was missing in 61% of patients included in the three databases), and smoking (that was missing in 23% of patients) in the year of AOM therapy initiation. HRs and standard errors obtained using cross-sectional multiple imputation (MI) (reference method) were compared to complete case (CC) analysis – using only patients with complete data – and longitudinal MI – adding to the cross-sectional MI model the body mass index/smoking values as recorded in the year before and after therapy initiation. Results Overall, 422/95,057 (0.4%), 19/12,688 (0.1%), and 2,051/161,202 (1.3%) VTE cases/participants were seen in BIFAP, EpiChron, and CPRD, respectively. HRs moved from 100.00% underestimation to 40.31% overestimation in CC compared with cross-sectional MI, while longitudinal MI methods provided similar risk estimates compared with cross-sectional MI. Precision for HR improved in cross-sectional MI versus CC by up to 160.28%, while longitudinal MI improved precision (compared with cross-sectional) only minimally (up to 0.80%). 
Conclusion CC may substantially affect relative risk estimation in EMR-based drug safety studies, since missing data are not often completely at random. Little improvement was seen in these data in terms of power with the inclusion of longitudinal MI compared with cross-sectional MI. The strategy for handling missing data in drug safety studies can have a large impact on both risk estimates and precision.

  18. The impact of different strategies to handle missing data on both precision and bias in a drug safety study: a multidatabase multinational population-based cohort study.

    PubMed

    Martín-Merino, Elisa; Calderón-Larrañaga, Amaia; Hawley, Samuel; Poblador-Plou, Beatriz; Llorente-García, Ana; Petersen, Irene; Prieto-Alhambra, Daniel

    2018-01-01

    Missing data are often an issue in electronic medical records (EMRs) research. However, there are many ways that people deal with missing data in drug safety studies. To compare the risk estimates resulting from different strategies for the handling of missing data in the study of venous thromboembolism (VTE) risk associated with antiosteoporotic medications (AOM). New users of AOM (alendronic acid, other bisphosphonates, strontium ranelate, selective estrogen receptor modulators, teriparatide, or denosumab) aged ≥50 years during 1998-2014 were identified in two Spanish (the Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria [BIFAP] and EpiChron cohort) and one UK (Clinical Practice Research Datalink [CPRD]) EMR. Hazard ratios (HRs) according to AOM (with alendronic acid as reference) were calculated adjusting for VTE risk factors, body mass index (that was missing in 61% of patients included in the three databases), and smoking (that was missing in 23% of patients) in the year of AOM therapy initiation. HRs and standard errors obtained using cross-sectional multiple imputation (MI) (reference method) were compared to complete case (CC) analysis - using only patients with complete data - and longitudinal MI - adding to the cross-sectional MI model the body mass index/smoking values as recorded in the year before and after therapy initiation. Overall, 422/95,057 (0.4%), 19/12,688 (0.1%), and 2,051/161,202 (1.3%) VTE cases/participants were seen in BIFAP, EpiChron, and CPRD, respectively. HRs moved from 100.00% underestimation to 40.31% overestimation in CC compared with cross-sectional MI, while longitudinal MI methods provided similar risk estimates compared with cross-sectional MI. Precision for HR improved in cross-sectional MI versus CC by up to 160.28%, while longitudinal MI improved precision (compared with cross-sectional) only minimally (up to 0.80%). 
CC analysis may substantially affect relative risk estimation in EMR-based drug safety studies, since missing data are often not completely at random. Little improvement was seen in these data in terms of power with longitudinal MI compared with cross-sectional MI. The strategy for handling missing data in drug safety studies can have a large impact on both risk estimates and precision.
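
The central pitfall described here, that complete-case (CC) analysis is biased when missingness depends on the value itself, can be illustrated with a toy simulation. Everything below (the variable, its distribution, the missingness probabilities) is an invented illustration, not the study's data:

```python
import random

random.seed(7)

# Simulate a covariate (BMI-like values) whose chance of being recorded
# falls with the value itself -- missingness that is NOT completely at
# random. All numbers here are illustrative.
values = [random.gauss(27.0, 4.0) for _ in range(20000)]
observed = [v for v in values if random.random() < (0.9 if v < 27.0 else 0.4)]

true_mean = sum(values) / len(values)
cc_mean = sum(observed) / len(observed)  # "complete case" estimate

# High values are under-represented among observed records, so the
# complete-case estimate is biased low.
print(round(true_mean, 2), round(cc_mean, 2))
```

Multiple imputation addresses this by modelling the incomplete variable from the observed data rather than discarding incomplete records.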

  19. Technical Operations Support (TOPS) II. Delivery Order 0011: Summary Status of MISSE-1 and MISSE-2 Experiments and Details of Estimated Environmental Exposures for MISSE-1 and MISSE-2

    DTIC Science & Technology

    2006-07-01

    distinct types of devices were used that respond to radiation differently: TLDs (thermoluminescent dosimeters), Charge-Coupled Devices (CCDs), and ... optocouplers. The TLDs respond to total ionizing dose, most of which is contributed by the trapped electrons for locations with less than 100 mils of ... electrons, slab config. Space Station Calc., 1 yr dose, protons + electrons. 7th TLD on W2-15 gave anomalously low reading. BREB = Boeing Radiation

  20. The Estimation of Gestational Age at Birth in Database Studies.

    PubMed

    Eberg, Maria; Platt, Robert W; Filion, Kristian B

    2017-11-01

    Studies on the safety of prenatal medication use require valid estimation of the pregnancy duration. However, gestational age is often incompletely recorded in administrative and clinical databases. Our objective was to compare different approaches to estimating the pregnancy duration. Using data from the Clinical Practice Research Datalink and Hospital Episode Statistics, we examined the following four approaches to estimating missing gestational age: (1) generalized estimating equations for longitudinal data; (2) multiple imputation; (3) estimation based on fetal birth weight and sex; and (4) conventional approaches that assigned a fixed value (39 weeks for all or 39 weeks for full term and 35 weeks for preterm). The gestational age recorded in Hospital Episode Statistics was considered the gold standard. We conducted a simulation study comparing the described approaches in terms of estimated bias and mean square error. A total of 25,929 infants from 22,774 mothers were included in our "gold standard" cohort. The smallest average absolute bias was observed for the generalized estimating equation that included birth weight, while the largest absolute bias occurred when assigning 39-week gestation to all those with missing values. The smallest mean square errors were detected with generalized estimating equations while multiple imputation had the highest mean square errors. The use of generalized estimating equations resulted in the most accurate estimation of missing gestational age when birth weight information was available. In the absence of birth weight, assignment of fixed gestational age based on term/preterm status may be the optimal approach.
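
A hedged sketch of why estimation from birth weight can beat assigning a fixed 39 weeks: in simulated data where birth weight carries information about gestational age, a simple regression predictor has lower mean square error than the fixed value (the coefficients and noise levels are illustrative assumptions, not values from the study):

```python
import random

random.seed(1)

# Illustrative simulation: gestational age (weeks) and a birth weight (kg)
# that increases with it.
ga = [random.gauss(39.0, 2.0) for _ in range(5000)]
bw = [3.4 + 0.15 * (g - 39.0) + random.gauss(0, 0.3) for g in ga]

# Closed-form simple linear regression of gestational age on birth weight.
n = len(ga)
mx, my = sum(bw) / n, sum(ga) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(bw, ga))
         / sum((x - mx) ** 2 for x in bw))
intercept = my - slope * mx

mse_reg = sum((intercept + slope * x - y) ** 2 for x, y in zip(bw, ga)) / n
mse_fixed = sum((39.0 - y) ** 2 for y in ga) / n  # assign 39 weeks to all

print(round(mse_reg, 2), round(mse_fixed, 2))
```

When birth weight is uninformative (slope near zero), the two approaches converge, which matches the study's advice to fall back on term/preterm-based fixed values when birth weight is unavailable.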

  1. Empirical likelihood method for non-ignorable missing data problems.

    PubMed

    Guan, Zhong; Qin, Jing

    2017-01-01

    The missing response problem is ubiquitous in survey sampling, medical, social science, and epidemiology studies. It is well known that non-ignorable missingness is the most difficult missing data problem, in which whether a response is missing depends on its own value. In the statistical literature, unlike for the ignorable missing data problem, few papers on non-ignorable missing data are available beyond the fully parametric model-based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method, we can obtain constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of data from a real AIDS trial shows that the missingness of CD4 counts around two years is non-ignorable and that the sample mean based on observed data only is biased.
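
A minimal numerical sketch of the non-ignorable setting, assuming (as in the semiparametric model above) that the selection probability is known up to parameters. Here we simply plug in the true parameters and use inverse-probability weighting as a stand-in for the constrained empirical-likelihood estimator, which would additionally estimate those parameters; all numbers are illustrative:

```python
import math
import random

random.seed(3)

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Non-ignorable missingness: the chance that a response is observed
# depends on the response itself (alpha, beta are illustrative).
alpha, beta = 0.5, -0.8
y_all = [random.gauss(2.0, 1.0) for _ in range(50000)]
obs = [y for y in y_all if random.random() < expit(alpha + beta * y)]

# The observed-data sample mean is biased (high responses drop out more).
naive_mean = sum(obs) / len(obs)

# Weighting each observed response by the inverse of its selection
# probability corrects the bias.
w = [1.0 / expit(alpha + beta * y) for y in obs]
ipw_mean = sum(wi * yi for wi, yi in zip(w, obs)) / sum(w)

print(round(naive_mean, 2), round(ipw_mean, 2))
```

The paper's contribution is precisely that alpha and beta need not be known: they are estimated jointly with the mean under the empirical likelihood constraints.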

  2. The Candy Crush Sweet Tooth: How 'Near-misses' in Candy Crush Increase Frustration, and the Urge to Continue Gameplay.

    PubMed

    Larche, Chanel J; Musielak, Natalia; Dixon, Mike J

    2017-06-01

    Like many gambling games, the exceedingly popular and lucrative smartphone game "Candy Crush" features near-miss outcomes. In slot machines, a near-miss involves getting two of the needed three high-paying symbols on the pay-line (i.e., just missing the big win). In Candy Crush, the game signals when you just miss getting to the next level by one or two moves. Because near-misses in gambling games have consistently been shown to invigorate play despite being frustrating outcomes, the goal of the present study was to examine whether such near-misses trigger increases in player arousal, frustration and urge to continue play in Candy Crush. Sixty avid Candy Crush players were recruited to play the game for 30 min while having their Heart Rate, Skin Conductance Level, subjective arousal, frustration and urge to play recorded for three types of outcomes: wins (where they level up), losses (where they don't come close to levelling up), and near-misses (where they just miss levelling up). Near-misses were more arousing than losses as indexed by increased heart rate and greater subjective arousal. Near-misses were also subjectively rated as the most frustrating of all outcomes. Most importantly, of any type of outcome, near-misses triggered the most substantial urge to continue play. These findings suggest that near-misses in Candy Crush play a role in player commitment to the game, and may contribute to players playing longer than intended.

  3. The effect of maternal near miss on adverse infant nutritional outcomes.

    PubMed

    Zanardi, Dulce M; Moura, Erly C; Santos, Leonor P; Leal, Maria C; Cecatti, Jose G

    2016-10-01

    To evaluate the association between self-reported maternal near miss and adverse nutritional status in children under one year of age. This study is a secondary analysis of a study in which women who took their children under one year of age to the national vaccine campaign were interviewed. Self-reported maternal near miss was defined by the criteria of Intensive Care Unit admission, eclampsia, blood transfusion, and hysterectomy; its potential associations with any type of nutritional disorder in children, including deficits in weight-for-age, deficits in height-for-age, obesity, and breastfeeding, were assessed. The rates of near miss for the country, regions, and states were initially estimated. The relative risks of infant adverse nutritional status according to near miss and maternal/childbirth characteristics were estimated with their 95% CIs using bivariate and multiple analyses. The overall prevalence of near miss was 2.9% and was slightly higher for the Legal Amazon than for other regions. No significant associations were found with nutritional disorders in children. Only a 12% decrease in overall maternal breastfeeding was associated with near miss. Living in the countryside and child age over 6 months increased the risk of altered nutritional status by approximately 15%, while female gender of the child decreased this risk by 30%. Maternal near miss was not associated with an increased risk of any alteration in infant nutritional status. There was no association between maternal near miss and altered nutritional status in children up to one year of age. The risk of infant adverse nutritional status was greater for women living in the countryside, for children over 6 months of age, and for male children.

  4. 75 FR 61784 - Proposed Collection; Comment Request for Review of a Revised Information Collection

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-06

    ... response time of ten minutes per form reporting a missing check is estimated; the same amount of time is needed to report the missing checks or electronic funds transfer (EFT) payments using the telephone. The...

  5. Comparing ground-penetrating radar (GPR) techniques in 18th-century yard spaces

    NASA Astrophysics Data System (ADS)

    Carducci, Christiane M.

    Yards surrounding historical homesteads are the liminal space between private houses and public space, and contain artifactual and structural remains that help us understand how the residents interfaced with the world. Comparing different yards means collecting reliable evidence, and what is missing is just as important as what is found. Excavations can rely on randomly placed 50-cm shovel test pits to locate features, but this approach can miss important ones. Shallow geophysics, in particular ground-penetrating radar (GPR), can be used to identify features and to collect evidence reliably and efficiently. GPR is becoming more integrated into archaeological investigations due to its potential to quickly and nondestructively identify archaeological features and to recent advancements in processing software that make these methods more user-friendly. The most efficacious GPR surveys must take into consideration what is expected to be below the surface, what features look like in GPR outputs, the best methods for detecting features, and the limitations of GPR surveys. Man-made landscape features are expected to have existed within yard spaces, and the alteration of these features shows how the domestic economy of the residence changed through time. This study creates an inventory of these features. By producing a standardized sampling method for GPR in yard spaces, archaeologists can quickly map subsurface features and carry out broad comparisons between yards. To determine the most effective sampling method, several GPR surveys were conducted at the 18th-century Durant-Kenrick House in Newton, Massachusetts, using varied line spacing, line direction, and bin size. Examples of the GPR signatures of features, obtained using GPR-Slice software, from the Durant-Kenrick House and similar sites were analyzed. The efficacy of each method was determined based on the number of features distinguished, the clarity of the results, and the time involved. 
The survey at Newton showed that ground surface conditions are extremely important when using GPR. Furthermore, GPR and archaeological excavations together provide the most complete interpretation because GPR has the ability to detect large-scale features that might be missed with test units, while excavation provides more detailed information, finds small-scale objects, and can be used to test false negatives seen in GPR surveys.

  6. A novel application of the Intent to Attend assessment to reduce bias due to missing data in a randomized controlled clinical trial

    PubMed Central

    Rabideau, Dustin J; Nierenberg, Andrew A; Sylvia, Louisa G; Friedman, Edward S.; Bowden, Charles L.; Thase, Michael E.; Ketter, Terence; Ostacher, Michael J.; Reilly-Harrington, Noreen; Iosifescu, Dan V.; Calabrese, Joseph R.; Leon, Andrew C.; Schoenfeld, David A

    2014-01-01

    Background Missing data are unavoidable in most randomized controlled clinical trials, especially when measurements are taken repeatedly. If strong assumptions about the missing data are not accurate, crude statistical analyses are biased and can lead to false inferences. Furthermore, if we fail to measure all predictors of missing data, we may not be able to model the missing data process sufficiently. In longitudinal randomized trials, measuring a patient's intent to attend future study visits may help to address both of these problems. Leon et al. developed and included the Intent to Attend assessment in the Lithium Treatment—Moderate dose Use Study (LiTMUS), aiming to remove bias due to missing data from the primary study hypothesis [1]. Purpose The purpose of this study is to assess the performance of the Intent to Attend assessment with regard to its use in a sensitivity analysis of missing data. Methods We fit marginal models to assess whether a patient's self-rated intent predicted actual study adherence. We applied inverse probability of attrition weighting (IPAW) coupled with patient intent to assess whether there existed treatment group differences in response over time. We compared the IPAW results to those obtained using other methods. Results Patient-rated intent predicted missed study visits, even when adjusting for other predictors of missing data. On average, the hazard of retention increased by 19% for every one-point increase in intent. We also found that more severe mania, male gender, and a previously missed visit predicted subsequent absence. Although we found no difference in response between the randomized treatment groups, IPAW increased the estimated group difference over time. Limitations LiTMUS was designed to limit missed study visits, which may have attenuated the effects of adjusting for missing data. 
Additionally, IPAW can be less efficient and less powerful than maximum likelihood or Bayesian estimators when the parametric model underlying those estimators is correctly specified. Conclusions In LiTMUS, the Intent to Attend assessment predicted missed study visits. This item was incorporated into our IPAW models and helped reduce bias due to informative missing data. This analysis should both encourage and facilitate future use of the Intent to Attend assessment along with IPAW to address missing data in a randomized trial. PMID:24872362
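
The IPAW idea can be sketched in miniature. This is an illustrative simulation, not the LiTMUS analysis: patients with low "intent to attend" both drop out more and have worse outcomes, so the unweighted mean of observed outcomes is too optimistic, and weighting by the inverse probability of attendance corrects it:

```python
import math
import random

random.seed(11)

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Illustrative data: intent predicts both the outcome and attendance.
n = 40000
intent = [random.uniform(0.0, 1.0) for _ in range(n)]
outcome = [5.0 + 3.0 * it + random.gauss(0, 1.0) for it in intent]
p_attend = [expit(-1.0 + 3.0 * it) for it in intent]
attended = [random.random() < p for p in p_attend]

obs = [(o, p) for o, p, a in zip(outcome, p_attend, attended) if a]
unweighted = sum(o for o, _ in obs) / len(obs)

# Weight each observed outcome by 1 / P(attend); in practice this
# probability is itself estimated from predictors such as intent.
wsum = sum(1.0 / p for _, p in obs)
ipaw = sum(o / p for o, p in obs) / wsum

true_mean = sum(outcome) / n
print(round(unweighted, 2), round(ipaw, 2), round(true_mean, 2))
```

In a real trial the attendance model is fitted (e.g., by logistic regression on intent and other covariates), which is where measuring intent pays off.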

  7. Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution.

    PubMed

    Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip

    2018-06-01

    Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
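
The beta estimation procedure at issue is commonly summarized as imputing a correlation from a standardized regression weight via r ≈ β + .05λ, with λ = 1 when β is non-negative and 0 otherwise. A minimal sketch of that rule (our paraphrase of the published procedure, with illustrative inputs):

```python
# Sketch of the beta estimation procedure (BEP) attributed to Peterson and
# Brown (2005): impute a missing correlation from a beta coefficient as
# r = beta + 0.05 * lambda, lambda = 1 if beta >= 0 else 0.

def bep_impute(beta):
    lam = 1.0 if beta >= 0 else 0.0
    return beta + 0.05 * lam

# A beta from a multiple regression generally differs from the zero-order
# correlation, especially with intercorrelated predictors, which is why
# this additive correction can miss badly -- the bias the study documents.
print(bep_impute(0.30), bep_impute(-0.10))
```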

  8. Statistical theory and methodology for remote sensing data analysis with special emphasis on LACIE

    NASA Technical Reports Server (NTRS)

    Odell, P. L.

    1975-01-01

    Crop proportion estimators for determining crop acreage through the use of remote sensing were evaluated. Several studies of these estimators were conducted, including an empirical comparison of the different estimators (using actual data) and an empirical study of the sensitivity (robustness) of the class of mixture estimators. The effect of missing data upon crop classification procedures is discussed in detail including a simulation of the missing data effect. The final problem addressed is that of taking yield data (bushels per acre) gathered at several yield stations and extrapolating these values over some specified large region. Computer programs developed in support of some of these activities are described.

  9. Missing data handling in non-inferiority and equivalence trials: A systematic review.

    PubMed

    Rabe, Brooke A; Day, Simon; Fiero, Mallorie H; Bell, Melanie L

    2018-05-25

    Non-inferiority (NI) and equivalence clinical trials test whether a new treatment is therapeutically no worse than, or equivalent to, an existing standard of care. Missing data in clinical trials have been shown to reduce statistical power and potentially bias estimates of effect size; however, in NI and equivalence trials, they present additional issues. For instance, they may decrease sensitivity to differences between treatment groups and bias toward the alternative hypothesis of NI (or equivalence). Our primary aim was to review the extent of and methods for handling missing data (model-based methods, single imputation, multiple imputation, complete case), the analysis sets used (Intention-To-Treat, Per-Protocol, or both), and whether sensitivity analyses were used to explore departures from assumptions about the missing data. We conducted a systematic review of NI and equivalence trials published between May 2015 and April 2016 by searching the PubMed database. Articles were reviewed primarily by 2 reviewers, with 6 articles reviewed by both reviewers to establish consensus. Of 109 selected articles, 93% reported some missing data in the primary outcome. Among those, 50% reported complete case analysis, and 28% reported single imputation approaches for handling missing data. Only 32% reported conducting analyses of both intention-to-treat and per-protocol populations. Only 11% conducted any sensitivity analyses to test assumptions with respect to missing data. Missing data are common in NI and equivalence trials, and they are often handled by methods which may bias estimates and lead to incorrect conclusions. Copyright © 2018 John Wiley & Sons, Ltd.

  10. Measuring the Association Between Body Mass Index and All-Cause Mortality in the Presence of Missing Data: Analyses From the Scottish National Diabetes Register.

    PubMed

    Read, Stephanie H; Lewis, Steff C; Halbesma, Nynke; Wild, Sarah H

    2017-04-15

    Incorrectly handling missing data can lead to imprecise and biased estimates. We describe the effect of applying different approaches to handling missing data in an analysis of the association between body mass index and all-cause mortality among people with type 2 diabetes. We used data from the Scottish diabetes register that were linked to hospital admissions data and death registrations. The analysis was based on people diagnosed with type 2 diabetes between 2004 and 2011, with follow-up until May 31, 2014. The association between body mass index and mortality was investigated using Cox proportional hazards models. Findings were compared using 4 different missing-data methods: complete-case analysis, 2 multiple-imputation models, and nearest-neighbor imputation. There were 124,451 cases of type 2 diabetes, among which there were 17,085 deaths during 787,275 person-years of follow-up. Patients with missing data (24.8%) had higher mortality than those without missing data (adjusted hazard ratio = 1.36, 95% confidence interval: 1.31, 1.41). A U-shaped relationship between body mass index and mortality was observed, with the lowest hazard ratios occurring among moderately obese people, regardless of the chosen approach for handling missing data. Missing data may affect absolute and relative risk estimates differently and should be considered in analyses of routinely collected data. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
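
Nearest-neighbour imputation, one of the four approaches this study compares, can be sketched in miniature (the records and the one-dimensional distance on a single covariate are invented for illustration; a real implementation would match on several covariates):

```python
# Minimal nearest-neighbour imputation sketch: a missing BMI is borrowed
# from the record closest on a fully observed covariate (here, age).

def nn_impute(records):
    """records: (age, bmi) pairs, with bmi possibly None (missing)."""
    donors = [(a, b) for a, b in records if b is not None]
    filled = []
    for age, bmi in records:
        if bmi is None:
            # Borrow the BMI of the donor closest on the observed covariate.
            bmi = min(donors, key=lambda d: abs(d[0] - age))[1]
        filled.append((age, bmi))
    return filled

data = [(50, 31.0), (52, 30.5), (70, 24.0), (69, None)]
print(nn_impute(data))
```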

  11. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    PubMed

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operator Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and identify disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset because they lack effective means of reducing redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by measuring the distances between misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that classifiers built on the feature subset selected by our approach achieve the minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our new approach selects fewer features and effectively improves classification performance.
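
The complementarity trick can be sketched as follows. This is our own reconstruction of the idea, not the authors' code, and it adopts the Relief-style convention that the "nearest miss" of an instance is its closest neighbour from the opposite class:

```python
# Reconstruction of the complementarity idea: if a misclassified instance
# and its nearest miss found on feature i lie far apart on feature j, then
# features i and j are treated as complementary.

def complementarity(X, y, misclassified, i, j):
    """Mean distance on feature j between each misclassified instance and
    its nearest opposite-class neighbour on feature i alone."""
    total, count = 0.0, 0
    for a in misclassified:
        opp = [b for b in range(len(X)) if y[b] != y[a]]
        if not opp:
            continue
        nm = min(opp, key=lambda b: abs(X[b][i] - X[a][i]))
        total += abs(X[nm][j] - X[a][j])
        count += 1
    return total / count if count else 0.0

# Toy data: instance 0's nearest miss on feature 0 is far away on
# feature 1, so features 0 and 1 come out as strongly complementary.
X = [[0.0, 0.0], [1.0, 0.0], [0.1, 5.0], [2.0, 0.2]]
y = [0, 0, 1, 1]
print(complementarity(X, y, misclassified=[0], i=0, j=1))
```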

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pichara, Karim; Protopapas, Pavlos

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks, a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  13. Breast Cancer and Modifiable Lifestyle Factors in Argentinean Women: Addressing Missing Data in a Case-Control Study

    PubMed Central

    Coquet, Julia Becaria; Tumas, Natalia; Osella, Alberto Ruben; Tanzi, Matteo; Franco, Isabella; Diaz, Maria Del Pilar

    2016-01-01

    A number of studies have evidenced the effect of modifiable lifestyle factors such as diet, breastfeeding, and nutritional status on breast cancer risk. However, none have addressed the missing data problem in nutritional epidemiologic research in South America. Missing data are a frequent problem in breast cancer studies and epidemiological settings in general. Estimates of effect obtained from these studies may be biased if no appropriate method for handling missing data is applied. We performed multiple imputation for missing values on covariates in a breast cancer case-control study in Córdoba (Argentina) to optimize risk estimates. Data were obtained from a breast cancer case-control study conducted from 2008 to 2015 (318 cases, 526 controls). Complete case analysis and multiple imputation using chained equations were the methods applied to estimate the effects of a Traditional dietary pattern and other recognized factors associated with breast cancer. Physical activity and socioeconomic status were imputed. Logistic regression models were performed. When complete case analysis was performed, only 31% of women were included. Although a positive association of the Traditional dietary pattern with breast cancer was observed with both approaches (complete case analysis OR=1.3, 95%CI=1.0-1.7; multiple imputation OR=1.4, 95%CI=1.2-1.7), effects of other covariates, like BMI and breastfeeding, were only identified when multiple imputation was used. A Traditional dietary pattern, BMI, and breastfeeding are associated with the occurrence of breast cancer in this Argentinean population when multiple imputation is appropriately performed. Multiple imputation is suggested for future Latin American epidemiologic studies to optimize effect estimates. PMID:27892664
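
A minimal regression-imputation sketch in the spirit of chained equations (illustrative, not the study's MICE model): a covariate z is missing at random given a fully observed x, so the complete-case mean of z is biased while imputing z from x recovers it. With several incomplete variables, MICE would cycle such regressions across variables and add proper noise draws to produce multiple imputed datasets:

```python
import random

random.seed(5)

# Illustrative data: z depends on x, and z is more likely to be missing
# when x is large (missing at random given x).
n = 4000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + random.gauss(0, 0.6) for xi in x]
miss = [random.random() < (0.5 if xi > 0 else 0.1) for xi in x]

xo = [xi for xi, mi in zip(x, miss) if not mi]
zo = [zi for zi, mi in zip(z, miss) if not mi]

# Simple linear regression of z on x fitted to the observed cases.
mx, mz = sum(xo) / len(xo), sum(zo) / len(zo)
b = (sum((a - mx) * (c - mz) for a, c in zip(xo, zo))
     / sum((a - mx) ** 2 for a in xo))
a0 = mz - b * mx

z_filled = [a0 + b * xi if mi else zi for xi, zi, mi in zip(x, z, miss)]
mean_cc = sum(zo) / len(zo)        # complete-case mean (biased low)
mean_imputed = sum(z_filled) / n   # imputation-based mean (near truth, 0)
print(round(mean_cc, 3), round(mean_imputed, 3))
```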

  14. IDENTIFICATION AND CHARACTERIZATION OF MISSING AND UNACCOUNTED FOR AREA SOURCE CATEGORIES

    EPA Science Inventory

    The report identifies and characterizes missing or unaccounted for area source categories. Area source emissions of particulate matter (TSP), sulfur dioxide (SO2), oxides of nitrogen (NOx), reactive volatile organic compounds (VOCs), and carbon monoxide (CO) are estimated annuall...

  15. A comprehensive literature review of haplotyping software and methods for use with unrelated individuals.

    PubMed

    Salem, Rany M; Wessel, Jennifer; Schork, Nicholas J

    2005-03-01

    Interest in the assignment and frequency analysis of haplotypes in samples of unrelated individuals has increased immeasurably as a result of the emphasis placed on haplotype analyses by, for example, the International HapMap Project and related initiatives. Although there are many available computer programs for haplotype analysis applicable to samples of unrelated individuals, many of these programs have limitations and/or very specific uses. In this paper, the key features of available haplotype analysis software for use with unrelated individuals, as well as with pooled DNA samples from unrelated individuals, are summarised. Programs for haplotype analysis were identified through keyword searches on PUBMED and various internet search engines, a review of citations from retrieved papers, and personal communications, up to June 2004. Priority was given to functioning computer programs, rather than theoretical models and methods. The available software was considered in light of a number of factors: the algorithm(s) used, algorithm accuracy, assumptions, the accommodation of genotyping error, implementation of hypothesis testing, handling of missing data, software characteristics and web-based implementations. Review papers comparing specific methods and programs are also summarised. Forty-six haplotyping programs were identified and reviewed. The programs were divided into two groups: those designed for individual genotype data (a total of 43 programs) and those designed for use with pooled DNA samples (a total of three programs). The accuracy of the programs is assessed using various criteria, and the programs are categorised and discussed in light of: algorithm and method, accuracy, assumptions, genotyping error, hypothesis testing, missing data, software characteristics and web implementation. 
Many available programs have limitations (e.g., some cannot accommodate missing data) and/or are designed with specific tasks in mind (e.g., estimating haplotype frequencies rather than assigning the most likely haplotypes to individuals). It is concluded that the selection of an appropriate haplotyping program for analysis purposes should be guided by what is known about the accuracy of estimation, as well as by the limitations and assumptions built into a program.

  16. Estimated Environmental Exposures for MISSE-7B

    NASA Technical Reports Server (NTRS)

    Finckenor, Miria M.; Moore, Chip; Norwood, Joseph K.; Henrie, Ben; DeGroh, Kim

    2012-01-01

    This paper details the 18-month environmental exposure for Materials International Space Station Experiment 7B (MISSE-7B) ram and wake sides. This includes atomic oxygen, ultraviolet radiation, particulate radiation, thermal cycling, meteoroid/space debris impacts, and observed contamination. Atomic oxygen fluence was determined by measured mass and thickness loss of polymers of known reactivity. Diodes sensitive to ultraviolet light actively measured solar radiation incident on the experiment. Comparisons to earlier MISSE flights are discussed.

  17. Characteristics of patients with missing information on stage: a population-based study of patients diagnosed with colon, lung or breast cancer in England in 2013.

    PubMed

    Di Girolamo, Chiara; Walters, Sarah; Benitez Majano, Sara; Rachet, Bernard; Coleman, Michel P; Njagi, Edmund Njeru; Morris, Melanie

    2018-05-02

    Stage is a key predictor of cancer survival. Complete cancer staging is vital for understanding outcomes at population level and for monitoring the efficacy of early diagnosis initiatives. Cancer registries usually collect details of the disease extent, but staging information may be missing because a stage was never assigned to a patient or because it was not included in cancer registration records. Missing stage information introduces methodological difficulties for analysis and interpretation of results. We describe the associations between missing stage and socio-demographic and clinical characteristics of patients diagnosed with colon, lung or breast cancer in England in 2013. We assess how these associations change when completeness is high and administrative issues are assumed to be minimal. We estimate the amount of avoidable missing stage data if the high levels of completeness reached by some Clinical Commissioning Groups (CCGs) were achieved nationally. Individual cancer records were retrieved from the National Cancer Registration and linked to the Routes to Diagnosis and Hospital Episode Statistics datasets to obtain additional clinical information. We used multivariable beta binomial regression models to estimate the strength of the association between socio-demographic and clinical characteristics of patients and missing stage, and to derive the amount of avoidable missing stage. Multivariable modelling showed that old age was associated with missing stage irrespective of cancer site and independent of comorbidity score, short-term mortality, and patient characteristics. This remained true for patients in the CCGs with high completeness. Applying the results from these CCGs to the whole cohort showed that approximately 70% of missing stage information was potentially avoidable. Missing stage was more frequent in older patients, including those residing in CCGs with high completeness. 
This disadvantage for older patients was not explained fully by the presence of comorbidity. A substantial gain in completeness could have been achieved if administrative practices were improved to the level of the highest performing areas. Reasons for missing stage information should be carefully assessed before any study, and potential distortions introduced by how missing stage is handled should be considered in order to draw the most correct inference from available statistics.

  18. Link prediction based on local weighted paths for complex networks

    NASA Astrophysics Data System (ADS)

    Yao, Yabing; Zhang, Ruisheng; Yang, Fan; Yuan, Yongna; Hu, Rongjing; Zhao, Zhili

    As a significant problem in complex networks, link prediction aims to find missing and future links between two unconnected nodes by estimating the existence likelihood of potential links. It plays an important role in understanding the evolution mechanism of networks and has broad applications in practice. In order to improve prediction performance, a variety of structural similarity-based methods that rely on different topological features have been put forward. As one topological feature, the path information between node pairs is utilized to calculate node similarity. However, many path-dependent methods neglect the different contributions that different paths make for a pair of nodes. In this paper, a local weighted path (LWP) index is proposed to differentiate the contributions between paths. The LWP index considers the effect of the degrees of intermediate links and the connectivity influence of intermediate nodes on paths to quantify the path weight in the prediction procedure. The experimental results on 12 real-world networks show that the LWP index outperforms seven other prediction baselines.
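
The family of path-based indices this work builds on can be sketched with the plain local path index S = A² + εA³, which counts length-2 and (down-weighted) length-3 paths between node pairs. The LWP index additionally weights each path by properties of its intermediate links and nodes; that weighting is omitted in this simplified sketch:

```python
# Local path index sketch: score unconnected node pairs by counts of
# short paths between them, S = A^2 + eps * A^3.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def local_path_scores(A, eps=0.01):
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    n = len(A)
    return [[A2[i][j] + eps * A3[i][j] for j in range(n)] for i in range(n)]

# Toy undirected graph on 4 nodes; the unobserved pair (0, 2) is joined by
# two length-2 paths, so its score is comparatively high.
A = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 1, 0]]
S = local_path_scores(A)
print(S[0][2])
```

Ranking all unconnected pairs by such scores and predicting the top-ranked ones as missing links is the standard evaluation protocol the paper's experiments follow.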

  19. Survival analysis for the missing censoring indicator model using kernel density estimation techniques

    PubMed Central

    Subramanian, Sundarraman

    2008-01-01

    This article concerns asymptotic theory for a new estimator of a survival function in the missing censoring indicator model of random censorship. Specifically, the large sample results for an inverse probability-of-non-missingness weighted estimator of the cumulative hazard function, so far not available, are derived, including an almost sure representation with rate for a remainder term, and uniform strong consistency with rate of convergence. The estimator is based on a kernel estimate for the conditional probability of non-missingness of the censoring indicator. Expressions for its bias and variance, in turn leading to an expression for the mean squared error as a function of the bandwidth, are also obtained. The corresponding estimator of the survival function, whose weak convergence is derived, is asymptotically efficient. A numerical study, comparing the performances of the proposed and two other currently existing efficient estimators, is presented. PMID:18953423
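The estimator's core ingredient, a kernel estimate of the conditional probability that the censoring indicator is non-missing, can be sketched with a Nadaraya-Watson smoother; the Gaussian kernel, bandwidth, and simulated missingness model below are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def kernel_nonmissing_prob(t_eval, t_obs, xi, h):
    """Nadaraya-Watson estimate of p(t) = P(censoring indicator
    observed | T = t), with a Gaussian kernel of bandwidth h."""
    K = np.exp(-0.5 * ((t_eval[:, None] - t_obs[None, :]) / h) ** 2)
    return (K * xi[None, :]).sum(axis=1) / K.sum(axis=1)

rng = np.random.default_rng(0)
n = 500
t = rng.exponential(1.0, n)                    # observed (censored) times
p_true = 1 / (1 + np.exp(-(1 - t)))            # true non-missingness prob
xi = (rng.random(n) < p_true).astype(float)    # 1 = indicator not missing

p_hat = kernel_nonmissing_prob(t, t, xi, h=0.3)
# Inverse-probability-of-non-missingness weights used by such estimators
weights = xi / np.clip(p_hat, 1e-6, None)
```

The paper plugs this kind of kernel estimate into a weighted cumulative hazard estimator; the bandwidth governs the bias-variance trade-off discussed in the abstract.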

  20. Survival analysis for the missing censoring indicator model using kernel density estimation techniques.

    PubMed

    Subramanian, Sundarraman

    2006-01-01

    This article concerns asymptotic theory for a new estimator of a survival function in the missing censoring indicator model of random censorship. Specifically, the large sample results for an inverse probability-of-non-missingness weighted estimator of the cumulative hazard function, so far not available, are derived, including an almost sure representation with rate for a remainder term, and uniform strong consistency with rate of convergence. The estimator is based on a kernel estimate for the conditional probability of non-missingness of the censoring indicator. Expressions for its bias and variance, in turn leading to an expression for the mean squared error as a function of the bandwidth, are also obtained. The corresponding estimator of the survival function, whose weak convergence is derived, is asymptotically efficient. A numerical study, comparing the performances of the proposed and two other currently existing efficient estimators, is presented.

  1. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    PubMed

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
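Of the approaches compared, multiple imputation with a fixed effect for each cluster is the easiest to sketch from scratch. Below is a minimal numpy illustration on simulated cluster-randomised data, with Rubin's rules used to pool the estimates; a real analysis would draw the imputation-model parameters properly (e.g. from their posterior), handle the bivariate outcome jointly, and usually prefer multilevel imputation software:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy cluster-randomised trial: 20 clusters of 25, treatment at cluster level
n_clust, m_per = 20, 25
cluster = np.repeat(np.arange(n_clust), m_per)
treat = (cluster % 2).astype(float)           # alternate clusters treated
u = rng.normal(0, 0.5, n_clust)[cluster]      # cluster-level random effects
y = 1.0 + 2.0 * treat + u + rng.normal(0, 1, cluster.size)

y_obs = y.copy()
y_obs[rng.random(y.size) < 0.3] = np.nan      # ~30% outcomes missing (MCAR)

X = np.eye(n_clust)[cluster]                  # one dummy per cluster
                                              # (fixed cluster effects)

def impute_once(rng):
    """Stochastic regression imputation with fixed cluster effects."""
    obs = ~np.isnan(y_obs)
    beta, res, *_ = np.linalg.lstsq(X[obs], y_obs[obs], rcond=None)
    sigma = np.sqrt(res[0] / (obs.sum() - n_clust))
    y_imp = y_obs.copy()
    y_imp[~obs] = X[~obs] @ beta + rng.normal(0, sigma, (~obs).sum())
    return y_imp

M = 10
ests, wvars = [], []
for _ in range(M):
    y_imp = impute_once(rng)
    # Analysis model: difference in arm means (ignoring clustering, for brevity)
    d = y_imp[treat == 1].mean() - y_imp[treat == 0].mean()
    v = (y_imp[treat == 1].var(ddof=1) / (treat == 1).sum()
         + y_imp[treat == 0].var(ddof=1) / (treat == 0).sum())
    ests.append(d)
    wvars.append(v)

# Rubin's rules: pool the M estimates and their variances
qbar = np.mean(ests)                          # pooled treatment effect
W, B = np.mean(wvars), np.var(ests, ddof=1)   # within / between variance
T = W + (1 + 1 / M) * B                       # total variance
```

The over-coverage the paper reports for fixed-effects imputation stems from the cluster dummies inflating the imputation variance relative to a multilevel model.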

  2. SAMPLE SIZE FOR SEASONAL MEAN CONCENTRATION, DEPOSITION VELOCITY AND DEPOSITION: A RESAMPLING STUDY

    EPA Science Inventory

    Methodologies are described to assign confidence statements to seasonal means of concentration (C), deposition velocity (V J, and deposition categorized by species/parameters, sites, and seasons in the presence of missing data. Estimators of seasonal means with missing weekly dat...

  3. 40 CFR 75.1 - Purpose and scope.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... monitoring systems and provisions to account for missing data from certified continuous emission monitoring... estimation procedures for missing data are included in appendix C to this part. Optional protocols for...), and carbon dioxide (CO2) emissions, volumetric flow, and opacity data from affected units under the...

  4. 40 CFR 75.1 - Purpose and scope.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... monitoring systems and provisions to account for missing data from certified continuous emission monitoring... estimation procedures for missing data are included in appendix C to this part. Optional protocols for...), and carbon dioxide (CO2) emissions, volumetric flow, and opacity data from affected units under the...

  5. 40 CFR 75.1 - Purpose and scope.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... monitoring systems and provisions to account for missing data from certified continuous emission monitoring... estimation procedures for missing data are included in appendix C to this part. Optional protocols for...), and carbon dioxide (CO2) emissions, volumetric flow, and opacity data from affected units under the...

  6. 76 FR 12999 - Submission for OMB Review; Comment Request for Review of a Revised Information Collection: (OMB...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-09

    ...,600 are reported by telephone. A response time of ten minutes per form reporting a missing check is estimated; the same amount of time is needed to report the missing checks or electronic funds transfer (EFT...

  7. Estimating Interaction Effects With Incomplete Predictor Variables

    PubMed Central

    Enders, Craig K.; Baraldi, Amanda N.; Cham, Heining

    2014-01-01

    The existing missing data literature does not provide a clear prescription for estimating interaction effects with missing data, particularly when the interaction involves a pair of continuous variables. In this article, we describe maximum likelihood and multiple imputation procedures for this common analysis problem. We outline 3 latent variable model specifications for interaction analyses with missing data. These models apply procedures from the latent variable interaction literature to analyses with a single indicator per construct (e.g., a regression analysis with scale scores). We also discuss multiple imputation for interaction effects, emphasizing an approach that applies standard imputation procedures to the product of 2 raw score predictors. We thoroughly describe the process of probing interaction effects with maximum likelihood and multiple imputation. For both missing data handling techniques, we outline centering and transformation strategies that researchers can implement in popular software packages, and we use a series of real data analyses to illustrate these methods. Finally, we use computer simulations to evaluate the performance of the proposed techniques. PMID:24707955
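The imputation approach emphasized here, applying standard imputation to the product of the two raw-score predictors as its own variable, can be sketched with simple stochastic regression imputation (a single imputation for brevity; the article uses full multiple imputation, and the data-generating values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 * x + 0.5 * z + 0.7 * x * z + rng.normal(size=n)

xz = x * z
miss = rng.random(n) < 0.25                    # x (and hence x*z) 25% missing
x_m, xz_m = x.copy(), xz.copy()
x_m[miss] = np.nan
xz_m[miss] = np.nan

def reg_impute(target, predictors, rng):
    """Stochastic regression imputation of `target` from `predictors`."""
    obs = ~np.isnan(target)
    D = np.column_stack([np.ones(n)] + predictors)
    beta, res, *_ = np.linalg.lstsq(D[obs], target[obs], rcond=None)
    sigma = np.sqrt(res[0] / (obs.sum() - D.shape[1]))
    out = target.copy()
    out[~obs] = D[~obs] @ beta + rng.normal(0, sigma, (~obs).sum())
    return out

# Impute the raw product x*z as "just another variable" (using y and z,
# which belong in the imputation model), rather than multiplying imputed x by z
x_imp = reg_impute(x_m, [z, y], rng)
xz_imp = reg_impute(xz_m, [z, y], rng)

D = np.column_stack([np.ones(n), x_imp, z, xz_imp])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)   # beta[3] ~ interaction (0.7)
```

Imputing the product directly preserves its covariance with the outcome, which is what keeps the interaction estimate from being attenuated.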

  8. The estimated cost of "no-shows" in an academic pediatric neurology clinic.

    PubMed

    Guzek, Lindsay M; Gentry, Shelley D; Golomb, Meredith R

    2015-02-01

    Missed appointments ("no-shows") represent an important source of lost revenue for academic medical centers. The goal of this study was to examine the costs of "no-shows" at an academic pediatric neurology outpatient clinic. This was a retrospective cohort study of patients who missed appointments at an academic pediatric neurology outpatient clinic during 1 academic year. Revenue lost was estimated based on average reimbursement for different insurance types and visit types. The yearly "no-show" rate was 26%. Yearly revenue lost from missed appointments was $257,724.57, and monthly losses ranged from $15,652.33 in October 2013 to $27,042.44 in January 2014. The yearly revenue lost from missed appointments at the academic pediatric neurology clinic represents funds that could have been used to improve patient access and care. Further work is needed to develop strategies to decrease the no-show rate to decrease lost revenue and improve patient care and access. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. A Method for Estimating Missing Hourly Temperatures Using Daily Maximum and Minimum Temperatures

    DTIC Science & Technology

    1991-08-01

    work documented by USAFETAC/PR-90/006, Short-Term Hourly Temperature Interpolation, by Maj Walter F. Miller, December 1990. In his study, Miller...temperatures for the missing hours and concluded that the best model was one developed by Hoogenboom and Huck (1986). The Hoogenboom/Huck model uses a...mean of the error estimate, was determined from the following equation: BIAS = (1/N) Σ (O_i − P_i)   (14), where the difference between the observed hourly

  10. Do word-problem features differentially affect problem difficulty as a function of students' mathematics difficulty with and without reading difficulty?

    PubMed

    Powell, Sarah R; Fuchs, Lynn S; Fuchs, Douglas; Cirino, Paul T; Fletcher, Jack M

    2009-01-01

    This study examined whether and, if so, how word-problem features differentially affect problem difficulty as a function of mathematics difficulty (MD) status: no MD (n = 109), MD only (n = 109), or MD in combination with reading difficulties (MDRD; n = 109). The problem features were problem type (total, difference, or change) and position of missing information in the number sentence representing the word problem (first, second, or third position). Students were assessed on 14 word problems near the beginning of third grade. Consistent with the hypothesis that mathematical cognition differs as a function of MD subtype, problem type affected problem difficulty differentially for MDRD versus MD-only students; however, the position of missing information in word problems did not. Implications for MD subtyping and for instruction are discussed.

  11. Missed and Delayed Diagnosis of Dementia in Primary Care: Prevalence and Contributing Factors

    PubMed Central

    Bradford, Andrea; Kunik, Mark E.; Schulz, Paul; Williams, Susan P.; Singh, Hardeep

    2009-01-01

    Dementia is a growing public health problem for which early detection may be beneficial. Currently, the diagnosis of dementia in primary care is dependent mostly on clinical suspicion based on patient symptoms or caregivers’ concerns and is prone to be missed or delayed. We conducted a systematic review of the literature to ascertain the prevalence and contributing factors for missed and delayed dementia diagnoses in primary care. Prevalence of missed and delayed diagnosis was estimated by abstracting quantitative data from studies of diagnostic sensitivity among primary care providers. Possible predictors and contributory factors were determined from the text of quantitative and qualitative studies of patient-, caregiver-, provider-, and system-related barriers. Overall estimates of diagnostic sensitivity varied among studies and appeared to be in part a function of dementia severity, degree of patient impairment, dementia subtype, and frequency of patient-provider contact. Major contributory factors included problems with attitudes and patient-provider communication, educational deficits, and system resource constraints. The true prevalence of missed and delayed diagnoses of dementia is unknown but appears to be high. Until the case for dementia screening becomes more compelling, efforts to promote timely detection should focus on removing barriers to diagnosis. PMID:19568149

  12. A Statistical Method for Estimating Missing GHG Emissions in Bottom-Up Inventories: The Case of Fossil Fuel Combustion in Industry in the Bogota Region, Colombia

    NASA Astrophysics Data System (ADS)

    Jimenez-Pizarro, R.; Rojas, A. M.; Pulido-Guio, A. D.

    2012-12-01

    The development of environmentally, socially and financially suitable greenhouse gas (GHG) mitigation portfolios requires detailed disaggregation of emissions by activity sector, preferably at the regional level. Bottom-up (BU) emission inventories are intrinsically disaggregated, but although detailed, they are frequently incomplete. Missing and erroneous activity data are rather common in emission inventories of GHG, criteria and toxic pollutants, even in developed countries. The fraction of missing and erroneous data can be rather large in developing country inventories. In addition, the cost and time for obtaining or correcting this information can be prohibitive or can delay the inventory development. This is particularly true for regional BU inventories in the developing world. Moreover, a rather common practice is to disregard or to arbitrarily impute low default activity or emission values to missing data, which typically leads to significant underestimation of the total emissions. Our investigation focuses on GHG emissions by fossil fuel combustion in industry in the Bogota Region, composed of Bogota and its adjacent, semi-rural area of influence, the Province of Cundinamarca. We found that the BU inventories for this sub-category substantially underestimate emissions when compared to top-down (TD) estimations based on sub-sector specific national fuel consumption data and regional energy intensities. Although both BU inventories have a substantial number of missing and evidently erroneous entries, i.e. information on fuel consumption per combustion unit per company, the validated energy use and emission data display clear and smooth frequency distributions, which can be adequately fitted to bimodal log-normal distributions. This is not unexpected as industrial plant sizes are typically log-normally distributed. 
Moreover, our statistical tests suggest that industrial sub-sectors, as classified by the International Standard Industrial Classification (ISIC), are also well represented by log-normal distributions. Using the validated data, we tested several missing data estimation procedures, including Monte Carlo sampling of the real and fitted distributions, and a per-ISIC estimation based on bootstrap-calculated mean values. These results will be presented and discussed in detail. Our results suggest that the accuracy of sub-sector BU emission inventories, particularly in developing regions, could be significantly improved if they are designed and carried out to be representative sub-samples (surveys) of the actual universe of emitters. A large fraction of the missing data could be subsequently estimated by robust statistical procedures provided that most of the emitters were accounted for by number and ISIC.
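The Monte Carlo imputation idea described above can be sketched in a few lines: fit a log-normal to the validated entries and sample the fit once per missing entry. The simulated fuel-use data and 30% missingness rate below are illustrative, and the real procedure would work per ISIC sub-sector with the fitted bimodal distributions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated plant-level fuel consumption: log-normal, as plant sizes are
true_logs = rng.normal(3.0, 1.0, 400)
fuel = np.exp(true_logs)                       # arbitrary energy units
reported = rng.random(400) >= 0.3              # ~30% of entries missing

# Fit the log-normal to validated (reported) entries only
logs = np.log(fuel[reported])
mu, sd = logs.mean(), logs.std(ddof=1)

# Monte Carlo imputation: sample the fitted distribution per missing plant
imputed = np.exp(rng.normal(mu, sd, (~reported).sum()))

total_estimate = fuel[reported].sum() + imputed.sum()
naive_total = fuel[reported].sum()             # the underestimate obtained by
                                               # simply disregarding missing data
```

Contrasting `total_estimate` with `naive_total` shows why disregarding missing entries, or imputing low defaults, systematically underestimates the sector total.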

  13. Evaluation of techniques for handling missing cost-to-charge ratios in the USA Nationwide Inpatient Sample: a simulation study.

    PubMed

    Yu, Tzy-Chyi; Zhou, Huanxue

    2015-09-01

    Evaluate performance of techniques used to handle missing cost-to-charge ratio (CCR) data in the USA Healthcare Cost and Utilization Project's Nationwide Inpatient Sample. Four techniques to replace missing CCR data were evaluated: deleting discharges with missing CCRs (complete case analysis), reweighting as recommended by Healthcare Cost and Utilization Project, reweighting by adjustment cells and hot deck imputation by adjustment cells. Bias and root mean squared error of these techniques on hospital cost were evaluated in five disease cohorts. Similar mean cost estimates would be obtained with any of the four techniques when the percentage of missing data is low (<10%). When total cost is the outcome of interest, a reweighting technique to avoid underestimation from dropping observations with missing data should be adopted.
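Hot deck imputation by adjustment cells, one of the four techniques evaluated, replaces each missing CCR with a value drawn from an observed donor in the same cell. A minimal sketch (the cell definition and CCR range are illustrative, not HCUP's):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy discharges: adjustment cell ~ (region, bed-size band), coded 0-5
cells = rng.integers(0, 6, 300)
ccr = rng.uniform(0.3, 0.8, 300)               # cost-to-charge ratios
ccr[rng.random(300) < 0.2] = np.nan            # ~20% missing CCRs

def hot_deck(ccr, cells, rng):
    """Replace each missing CCR with a random observed donor value
    drawn (with replacement) from the same adjustment cell."""
    out = ccr.copy()
    for c in np.unique(cells):
        in_cell = cells == c
        miss_here = in_cell & np.isnan(out)
        donors = out[in_cell & ~np.isnan(out)]
        if miss_here.any() and donors.size:
            out[miss_here] = rng.choice(donors, miss_here.sum())
    return out

ccr_imp = hot_deck(ccr, cells, rng)
```

Because donors come from the same cell, the imputed values respect between-cell differences in the CCR distribution, unlike a single overall donor pool.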

  14. Estimation of missing values in solar radiation data using piecewise interpolation methods: Case study at Penang city

    NASA Astrophysics Data System (ADS)

    Zainudin, Mohd Lutfi; Saaban, Azizan; Bakar, Mohd Nazari Abu

    2015-12-01

    Solar radiation values are recorded by an automatic weather station using a device called a pyranometer. The device records the dispersed radiation values, and these data are very useful for experimental work and the development of solar devices. In addition, complete data observations are needed for modelling and designing solar radiation system applications. Unfortunately, incomplete solar radiation records frequently occur due to several technical problems, mainly attributable to the monitoring device. To address this, missing values are estimated so that absent values can be replaced with imputed data. This paper evaluates several piecewise interpolation techniques, namely linear, spline, cubic, and nearest neighbour, for dealing with missing values in hourly solar radiation data. It then proposes, as extended work, investigating the potential use of the cubic Bezier technique and the cubic Said-Ball method as estimation tools. The results show that the cubic Bezier and Said-Ball methods perform best compared with the other piecewise imputation techniques.
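Two of the simplest piecewise techniques evaluated, linear and nearest-neighbour interpolation, can be sketched with numpy alone (the irradiance profile and gap positions are illustrative; spline, cubic, Bezier and Said-Ball variants need dedicated routines, e.g. scipy.interpolate for splines):

```python
import numpy as np

# One day of hourly irradiance with sensor dropouts (NaN = missing)
hours = np.arange(24, dtype=float)
irr = np.maximum(0.0, 800 * np.sin(np.pi * (hours - 6) / 12))
irr_gappy = irr.copy()
irr_gappy[[9, 10, 15]] = np.nan

miss = np.isnan(irr_gappy)

# Piecewise-linear imputation of the missing hours
linear = irr_gappy.copy()
linear[miss] = np.interp(hours[miss], hours[~miss], irr_gappy[~miss])

# Nearest-neighbour imputation: copy the closest observed hour's value
nearest = irr_gappy.copy()
idx = np.abs(hours[~miss][:, None] - hours[miss][None, :]).argmin(axis=0)
nearest[miss] = irr_gappy[~miss][idx]
```

Both fills leave the observed hours untouched; the comparison in the paper is over how closely each scheme reproduces the withheld true values.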

  15. Effects of conflict alerting system reliability and task difficulty on pilots' conflict detection with cockpit display of traffic information.

    PubMed

    Xu, Xidong; Wickens, Christopher D; Rantanen, Esa M

    2007-01-15

    A total of 24 pilots viewed dynamic encounters between their own aircraft and an intruder aircraft on a 2-D cockpit display of traffic information (CDTI) and estimated the point and time of closest approach. A three-level alerting system provided a correct categorical estimate of the projected miss distance on 83% of the trials. The remaining 17% of alerts were equally divided between misses and false alarms, of large and small magnitude. Roughly half the pilots depended on automation to improve estimation of miss distance relative to the baseline pilots, who viewed identical trials without the aid of automated alerts. Moreover, they did so more on the more difficult traffic trials resulting in improved performance on the 83% correct automation trials without causing harm on the 17% automation-error trials, compared to the baseline group. The automated alerts appeared to lead pilots to inspect the raw data more closely. While assisting the accurate prediction of miss distance, the automation led to an underestimate of the time remaining until the point of closest approach. The results point to the benefits of even imperfect automation in the strategic alerts characteristic of the CDTI, at least as long as this reliability remains high (above 80%).

  16. Prevalence and Correlates of Missing Meals Among High School Students-United States, 2010.

    PubMed

    Demissie, Zewditu; Eaton, Danice K; Lowry, Richard; Nihiser, Allison J; Foltz, Jennifer L

    2018-01-01

    To determine the prevalence and correlates of missing meals among adolescents. The 2010 National Youth Physical Activity and Nutrition Study, a cross-sectional study. School based. A nationally representative sample of 11 429 high school students. Breakfast, lunch, and dinner consumption; demographics; measured and perceived weight status; physical activity and sedentary behaviors; and fruit, vegetable, milk, sugar-sweetened beverage, and fast-food intake. Prevalence estimates for missing breakfast, lunch, or dinner on ≥1 day during the past 7 days were calculated. Associations between demographics and missing meals were tested. Associations of lifestyle and dietary behaviors with missing meals were examined using logistic regression controlling for sex, race/ethnicity, and grade. In 2010, 63.1% of students missed breakfast, 38.2% missed lunch, and 23.3% missed dinner; the prevalence was highest among female and non-Hispanic black students. Being overweight/obese, perceiving oneself to be overweight, and video game/computer use were associated with increased risk of missing meals. Physical activity behaviors were associated with reduced risk of missing meals. Students who missed breakfast were less likely to eat fruits and vegetables and more likely to consume sugar-sweetened beverages and fast food. Breakfast was the most frequently missed meal, and missing breakfast was associated with the greatest number of less healthy dietary practices. Intervention and education efforts might prioritize breakfast consumption.

  17. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

    PubMed Central

    Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2016-01-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885

  18. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

    PubMed

    Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2017-06-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
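The mechanics of cluster mean imputation, which the paper shows is unbiased only under restrictive conditions, are simple to illustrate (the simulated trial below is MCAR with a common mechanism across arms, one of the benign cases):

```python
import numpy as np

rng = np.random.default_rng(5)

n_clust, m = 10, 30
cluster = np.repeat(np.arange(n_clust), m)
arm = (cluster < 5).astype(float)              # clusters 0-4 = intervention
y = 2.0 + 1.5 * arm + rng.normal(0, 1, cluster.size)
y_obs = y.copy()
y_obs[rng.random(y.size) < 0.2] = np.nan       # 20% missing outcomes (MCAR)

def cluster_mean_impute(y_obs, cluster):
    """Replace each missing outcome with its own cluster's observed mean."""
    out = y_obs.copy()
    for c in np.unique(cluster):
        in_c = cluster == c
        out[in_c & np.isnan(out)] = np.nanmean(y_obs[in_c])
    return out

y_imp = cluster_mean_impute(y_obs, cluster)
effect = y_imp[arm == 1].mean() - y_imp[arm == 0].mean()
```

Under arm-specific missingness mechanisms, or a baseline-covariate-by-arm interaction, this same procedure yields the biased estimates the paper warns about.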

  19. Advanced Issues in Propensity Scores: Longitudinal and Missing Data

    ERIC Educational Resources Information Center

    Kupzyk, Kevin A.; Beal, Sarah J.

    2017-01-01

    In order to investigate causality in situations where random assignment is not possible, propensity scores can be used in regression adjustment, stratification, inverse-probability treatment weighting, or matching. The basic concepts behind propensity scores have been extensively described. When data are longitudinal or missing, the estimation and…

  20. Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data

    ERIC Educational Resources Information Center

    Lee, Sik-Yum

    2006-01-01

    A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…

  1. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... measurements of heat content and carbon content of spent pulping liquor. A re-test must be performed if the data from any annual measurements are determined to be invalid. (b) For missing measurements of the... accounting records, production rates). The owner or operator shall document and keep records of the...

  2. 40 CFR 98.475 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... is required. (a) Whenever the monitoring procedures for all facilities that used flow meters covered...) Whenever the monitoring procedures of this subpart cannot be followed to measure quarterly quantity of CO2 received in containers, the most appropriate of the following missing data procedures must be followed: (1...

  3. 40 CFR 98.475 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... is required. (a) Whenever the monitoring procedures for all facilities that used flow meters covered...) Whenever the monitoring procedures of this subpart cannot be followed to measure quarterly quantity of CO2 received in containers, the most appropriate of the following missing data procedures must be followed: (1...

  4. 40 CFR 98.475 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... is required. (a) Whenever the monitoring procedures for all facilities that used flow meters covered...) Whenever the monitoring procedures of this subpart cannot be followed to measure quarterly quantity of CO2 received in containers, the most appropriate of the following missing data procedures must be followed: (1...

  5. 40 CFR 98.475 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... is required. (a) Whenever the monitoring procedures for all facilities that used flow meters covered...) Whenever the monitoring procedures of this subpart cannot be followed to measure quarterly quantity of CO2 received in containers, the most appropriate of the following missing data procedures must be followed: (1...

  6. Loose fusion based on SLAM and IMU for indoor environment

    NASA Astrophysics Data System (ADS)

    Zhu, Haijiang; Wang, Zhicheng; Zhou, Jinglin; Wang, Xuejing

    2018-04-01

    The simultaneous localization and mapping (SLAM) method based on the RGB-D sensor has been widely researched in recent years. However, the accuracy of RGB-D SLAM relies heavily on corresponding feature points, and the position can be lost in scenes with sparse textures. Therefore, many fusion methods combining RGB-D information with inertial measurement unit (IMU) data have been investigated to improve the accuracy of SLAM systems. However, these fusion methods usually do not take into account the number of matched feature points: the pose estimated from RGB-D information may be inaccurate when there are too few correct matches. Thus, considering the impact of matches on the SLAM system and the problem of lost position in scenes with few textures, a loose fusion method combining RGB-D with IMU is proposed in this paper. In the proposed method, we design a loose fusion strategy based on RGB-D camera information and IMU data, in which the IMU data are used for position estimation when the corresponding point matches are too few, while the RGB-D information is used to estimate the position when there are many matches. The final pose is optimized with the General Graph Optimization (g2o) framework to reduce error. The experimental results show that the proposed method outperforms the RGB-D-only method and works stably in indoor environments with sparse textures.
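The loose fusion rule itself reduces to a threshold switch between the two pose sources; the sketch below is only that selection step (the 4x4 pose matrices, match counts, and min_matches threshold are illustrative, and the paper's pipeline additionally refines the result with g2o):

```python
import numpy as np

def select_pose(rgbd_pose, imu_pose, n_matches, min_matches=30):
    """Loose fusion rule: trust the RGB-D pose when enough feature
    matches support it, otherwise fall back to the IMU-derived pose.
    (min_matches is an illustrative threshold, not the paper's value.)"""
    return rgbd_pose if n_matches >= min_matches else imu_pose

rgbd = np.eye(4)                               # 4x4 homogeneous transforms
imu = np.eye(4)
imu[0, 3] = 0.1                                # IMU: moved 10 cm along x

pose_rich = select_pose(rgbd, imu, n_matches=120)   # texture-rich frame
pose_sparse = select_pose(rgbd, imu, n_matches=8)   # sparse-texture frame
```

The design choice here is "loose" coupling: each sensor produces its own pose estimate and only the outputs are arbitrated, as opposed to tight coupling where raw measurements enter one joint estimator.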

  7. Assessment of score- and Rasch-based methods for group comparison of longitudinal patient-reported outcomes with intermittent missing data (informative and non-informative).

    PubMed

    de Bock, Élodie; Hardouin, Jean-Benoit; Blanchin, Myriam; Le Neel, Tanguy; Kubis, Gildas; Sébille, Véronique

    2015-01-01

    The purpose of this study was to identify the most adequate strategy for group comparison of longitudinal patient-reported outcomes in the presence of possibly informative intermittent missing data. Models coming from classical test theory (CTT) and item response theory (IRT) were compared. Two groups of patients' responses to dichotomous items with three times of assessment were simulated. Different cases were considered: presence or absence of a group effect and/or a time effect, a total of 100 or 200 patients, 4 or 7 items and two different values for the correlation coefficient of the latent trait between two consecutive times (0.4 or 0.9). Cases including informative and non-informative intermittent missing data were compared at different rates (15, 30 %). These simulated data were analyzed with CTT using score and mixed model (SM) and with IRT using longitudinal Rasch mixed model (LRM). The type I error, the power and the bias of the group effect estimations were compared between the two methods. This study showed that LRM performs better than SM. When the rate of missing data rose to 30 %, estimations were biased with SM mainly for informative missing data. Otherwise, LRM and SM methods were comparable concerning biases. However, regardless of the rate of intermittent missing data, power of LRM was higher compared to power of SM. In conclusion, LRM should be favored when the rate of missing data is higher than 15 %. For other cases, SM and LRM provide similar results.

  8. Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity.

    PubMed

    Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki

    2018-04-01

    Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve analysis of accelerometer data.
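    A minimal sketch of the two ingredients named above, subject-level imputation and variance-style weighting, on invented accelerometer-like data; the paper's actual algorithms are more elaborate, and all names and sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: activity counts for 5 subjects over 7 days,
# with NaN marking device non-wear periods.
activity = rng.gamma(shape=2.0, scale=100.0, size=(5, 7))
activity[rng.random(activity.shape) < 0.2] = np.nan  # ~20% missing

# Subject-level imputation: replace each subject's missing days
# with that subject's own observed mean.
subject_means = np.nanmean(activity, axis=1, keepdims=True)
imputed = np.where(np.isnan(activity), subject_means, activity)

# Variance-style weights: subjects with more observed days get more
# weight, a simple stand-in for the paper's precision weighting.
n_observed = np.sum(~np.isnan(activity), axis=1)
weights = n_observed / n_observed.sum()

weighted_mean = np.sum(weights * imputed.mean(axis=1))
print(f"weighted cohort mean activity: {weighted_mean:.1f}")
```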

  9. Using Tensor Completion Method to Achieving Better Coverage of Traffic State Estimation from Sparse Floating Car Data

    PubMed Central

    Ran, Bin; Song, Li; Cheng, Yang; Tan, Huachun

    2016-01-01

    Traffic state estimation from the floating car system is a challenging problem. The low penetration rate and random distribution mean that the available floating car samples usually cover only part of the road network's space and time points. To obtain a wide range of traffic state from the floating car system, many methods have been proposed to estimate the traffic state for the uncovered links. However, these methods cannot provide the traffic state of the entire road network. In this paper, traffic state estimation is transformed into a missing data imputation problem, and a tensor completion framework is proposed to estimate the missing traffic state. A tensor is constructed to model the traffic state, in which observed entries are directly derived from the floating car system and unobserved traffic states are modeled as missing entries of the constructed tensor. The constructed traffic state tensor can represent spatial and temporal correlations of traffic data and encode the multi-way properties of the traffic state. The advantage of the proposed approach is that it can fully mine and utilize the multi-dimensional inherent correlations of the traffic state. We tested the proposed approach on a well-calibrated simulation network. Experimental results demonstrated that the proposed approach yields reliable traffic state estimation from very sparse floating car data, particularly when the floating car penetration rate is below 1%. PMID:27448326

  10. Using Tensor Completion Method to Achieving Better Coverage of Traffic State Estimation from Sparse Floating Car Data.

    PubMed

    Ran, Bin; Song, Li; Zhang, Jian; Cheng, Yang; Tan, Huachun

    2016-01-01

    Traffic state estimation from the floating car system is a challenging problem. The low penetration rate and random distribution mean that the available floating car samples usually cover only part of the road network's space and time points. To obtain a wide range of traffic state from the floating car system, many methods have been proposed to estimate the traffic state for the uncovered links. However, these methods cannot provide the traffic state of the entire road network. In this paper, traffic state estimation is transformed into a missing data imputation problem, and a tensor completion framework is proposed to estimate the missing traffic state. A tensor is constructed to model the traffic state, in which observed entries are directly derived from the floating car system and unobserved traffic states are modeled as missing entries of the constructed tensor. The constructed traffic state tensor can represent spatial and temporal correlations of traffic data and encode the multi-way properties of the traffic state. The advantage of the proposed approach is that it can fully mine and utilize the multi-dimensional inherent correlations of the traffic state. We tested the proposed approach on a well-calibrated simulation network. Experimental results demonstrated that the proposed approach yields reliable traffic state estimation from very sparse floating car data, particularly when the floating car penetration rate is below 1%.
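    As a rough illustration of the completion idea (not the authors' algorithm), the sketch below builds a synthetic low-rank "traffic" tensor of links × days × time-of-day, hides half its entries, and recovers the rest with a SoftImpute-style iterative soft-thresholded SVD on one unfolding; all names and sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical low-rank traffic state tensor, built as a rank-2 CP
# model purely for illustration: links x days x time-of-day.
links, days, times = 8, 10, 12
U, V, W = (rng.random((n, 2)) for n in (links, days, times))
tensor = np.einsum('ir,jr,kr->ijk', U, V, W)

mask = rng.random(tensor.shape) < 0.5        # only half the entries observed
truth = tensor.reshape(links, days * times)  # mode-1 unfolding
M = mask.reshape(truth.shape)

# Iterative soft-thresholded SVD on the unfolding: fill the holes with
# the current low-rank estimate while keeping observed entries fixed.
X = np.where(M, truth, 0.0)
for _ in range(300):
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    low_rank = (u * np.maximum(s - 0.01, 0.0)) @ vt  # shrink singular values
    X = np.where(M, truth, low_rank)

rmse = np.sqrt(np.mean((low_rank[~M] - truth[~M]) ** 2))
print(f"RMSE on missing entries: {rmse:.3f}")
```

    A full tensor-completion method would shrink the ranks of several unfoldings jointly to exploit all three modes at once; this single-unfolding version only conveys the mechanism.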

  11. Methods for using clinical laboratory test results as baseline confounders in multi-site observational database studies when missing data are expected.

    PubMed

    Raebel, Marsha A; Shetterly, Susan; Lu, Christine Y; Flory, James; Gagne, Joshua J; Harrell, Frank E; Haynes, Kevin; Herrinton, Lisa J; Patorno, Elisabetta; Popovic, Jennifer; Selvan, Mano; Shoaibi, Azadeh; Wang, Xingmei; Roy, Jason

    2016-07-01

    Our purpose was to quantify missing baseline laboratory results, assess predictors of missingness, and examine performance of missing data methods. Using the Mini-Sentinel Distributed Database from three sites, we selected three exposure-outcome scenarios with laboratory results as baseline confounders. We compared hazard ratios (HRs) or risk differences (RDs) and 95% confidence intervals (CIs) from models that omitted laboratory results, included only available results (complete cases), and included results after applying missing data methods (multiple imputation [MI] regression, MI predictive mean matching [PMM], indicator). Scenario 1 considered glucose among second-generation antipsychotic users and diabetes. Across sites, glucose was available for 27.7-58.9% of patients. Results differed between complete case and missing data models (e.g., olanzapine: HR 0.92 [CI 0.73, 1.12] vs 1.02 [0.90, 1.16]). Across-site models employing different MI approaches provided similar HRs and CIs; site-specific models provided differing estimates. Scenario 2 evaluated creatinine among individuals starting high- versus low-dose lisinopril and hyperkalemia. Creatinine availability: 44.5-79.0%. Results differed between complete case and missing data models (e.g., HR 0.84 [CI 0.77, 0.92] vs. 0.88 [0.83, 0.94]). HRs and CIs were identical across MI methods. Scenario 3 examined international normalized ratio (INR) among warfarin users starting interacting versus noninteracting antimicrobials and bleeding. INR availability: 20.0-92.9%. Results differed between ignoring INR and including INR using missing data methods (e.g., RD 0.05 [CI -0.03, 0.13] vs 0.09 [0.00, 0.18]). Indicator and PMM methods gave similar estimates. Multi-site studies must consider site variability in missing data. Different missing data methods performed similarly. Copyright © 2016 John Wiley & Sons, Ltd.

  12. Estimation of the frequency of occult mutations for an autosomal recessive disease in the presence of genetic heterogeneity: application to genetic hearing loss disorders.

    PubMed

    Kimberling, William J

    2005-11-01

    The routine testing for pathologic mutation(s) in a patient's DNA has become the foundation of modern molecular genetic diagnosis. It is especially valuable when the phenotype shows genetic heterogeneity, and its importance will grow as treatments become genotype specific. However, the technology of mutation detection is imperfect and mutations are often missed. This can be especially troublesome when dealing with a recessive disorder where the combination of genetic heterogeneity and missed mutation creates an imprecision in the genotypic assessment of individuals who do not appear to have the expected complement of two pathologic mutations. This article describes a statistical approach to the estimation of the likelihood of a genetic diagnosis under these conditions. In addition to providing a means of testing for missed mutations, it also provides a method of estimating and testing for the presence of genetic heterogeneity in the absence of linkage data. Gene frequencies as well as estimates of sensitivity and specificity can be obtained as well. The test is applied to GJB2 recessive nonsyndromic deafness, Usher syndrome types Ib and IIa, and Pendred-enlarged vestibular aqueduct syndrome. Copyright 2005 Wiley-Liss, Inc.

  13. Missed rib fractures on evaluation of initial chest CT for trauma patients: pattern analysis and diagnostic value of coronal multiplanar reconstruction images with multidetector row CT.

    PubMed

    Cho, S H; Sung, Y M; Kim, M S

    2012-10-01

    The objective of this study was to review the prevalence and radiological features of rib fractures missed on initial chest CT evaluation, and to examine the diagnostic value of additional coronal images in a large series of trauma patients. 130 patients who presented to an emergency room for blunt chest trauma underwent multidetector row CT of the thorax within the first hour during their stay, and had follow-up CT or bone scans as diagnostic gold standards. Images were evaluated on two separate occasions: once with axial images and once with both axial and coronal images. The detection rates of missed rib fractures were compared between readings using a non-parametric method for clustered data. In the cases of missed rib fractures, the shapes, locations and associated fractures were evaluated. 58 rib fractures were missed with axial images only and 52 were missed with both axial and coronal images (p=0.088). The most common shape of missed rib fractures was the buckle fracture (56.9%), and the anterior arc (55.2%) was most commonly involved. 21 (36.2%) missed rib fractures had combined fractures on the same ribs, and 38 (65.5%) were accompanied by fracture on neighbouring ribs. Missed rib fractures are not uncommon, and radiologists should be familiar with buckle fractures, which are frequently missed. Additional coronal images can be helpful in the diagnosis of rib fractures that are not seen on axial images.

  14. Missed doses of oral antihyperglycemic medications in US adults with type 2 diabetes mellitus: prevalence and self-reported reasons.

    PubMed

    Vietri, Jeffrey T; Wlodarczyk, Catherine S; Lorenzo, Rose; Rajpathak, Swapnil

    2016-09-01

    Adherence to antihyperglycemic medication is thought to be suboptimal, but the proportion of patients missing doses, the number of doses missed, and reasons for missing are not well described. This survey was conducted to estimate the prevalence of and reasons for missed doses of oral antihyperglycemic medications among US adults with type 2 diabetes mellitus, and to explore associations between missed doses and health outcomes. The study was a cross-sectional patient survey. Respondents were contacted via a commercial survey panel and completed an on-line questionnaire via the Internet. Respondents provided information about their use of oral antihyperglycemic medications including doses missed in the prior 4 weeks, personal characteristics, and health outcomes. Weights were calculated to project the prevalence to the US adult population with type 2 diabetes mellitus. Outcomes were compared according to number of doses missed in the past 4 weeks using bivariate statistics and generalized linear models. Approximately 30% of adult patients with type 2 diabetes mellitus reported missing or reducing ≥1 dose of oral antihyperglycemic medication in the prior 4 weeks. Accidental missing was more commonly reported than purposeful skipping, with forgetting the most commonly reported reason. The timing of missed doses suggested respondents had also forgotten about doses missed, so the prevalence of missed doses is likely higher than reported. Outcomes were poorer among those who reported missing three or more doses in the prior 4 weeks. A substantial number of US adults with type 2 diabetes mellitus miss doses of their oral antihyperglycemic medications.

  15. To what degree does the missing-data technique influence the estimated growth in learning strategies over time? A tutorial example of sensitivity analysis for longitudinal data.

    PubMed

    Coertjens, Liesje; Donche, Vincent; De Maeyer, Sven; Vanthournout, Gert; Van Petegem, Peter

    2017-01-01

    Longitudinal data is almost always burdened with missing data. However, in educational and psychological research, there is a large discrepancy between methodological suggestions and research practice. The former suggests applying sensitivity analysis in order to assess the robustness of the results in terms of varying assumptions regarding the mechanism generating the missing data. However, in research practice, participants with missing data are usually discarded by relying on listwise deletion. To help bridge the gap between methodological recommendations and applied research in the educational and psychological domain, this study provides a tutorial example of sensitivity analysis for latent growth analysis. The example data concern students' changes in learning strategies during higher education. One cohort of students in a Belgian university college was asked to complete the Inventory of Learning Styles-Short Version, in three measurement waves. A substantial number of students did not participate on each occasion. Change over time in student learning strategies was assessed using eight missing data techniques, which assume different mechanisms for missingness. The results indicated that, for some learning strategy subscales, growth estimates differed between the models. Guidelines in terms of reporting the results from sensitivity analysis are synthesised and applied to the results from the tutorial example.

  16. Analyzing semi-competing risks data with missing cause of informative terminal event.

    PubMed

    Zhou, Renke; Zhu, Hong; Bondy, Melissa; Ning, Jing

    2017-02-28

    Cancer studies frequently yield multiple event times that correspond to landmarks in disease progression, including non-terminal events (i.e., cancer recurrence) and an informative terminal event (i.e., cancer-related death). Hence, we often observe semi-competing risks data. Work on such data has focused on scenarios in which the cause of the terminal event is known. However, in some circumstances, the information on cause for patients who experience the terminal event is missing; consequently, we are not able to differentiate an informative terminal event from a non-informative terminal event. In this article, we propose a method to handle missing data regarding the cause of an informative terminal event when analyzing the semi-competing risks data. We first consider the nonparametric estimation of the survival function for the terminal event time given missing cause-of-failure data via the expectation-maximization algorithm. We then develop an estimation method for semi-competing risks data with missing cause of the terminal event, under a pre-specified semiparametric copula model. We conduct simulation studies to investigate the performance of the proposed method. We illustrate our methodology using data from a study of early-stage breast cancer. Copyright © 2016 John Wiley & Sons, Ltd.

  17. On piecewise interpolation techniques for estimating solar radiation missing values in Kedah

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saaban, Azizan; Zainudin, Lutfi; Bakar, Mohd Nazari Abu

    2014-12-04

    This paper discusses the use of a piecewise interpolation method based on cubic Ball and Bézier curve representations to estimate missing values of solar radiation in Kedah. An hourly solar radiation dataset was collected at the Alor Setar Meteorology Station, obtained from the Malaysian Meteorology Department. The piecewise cubic Ball and Bézier functions that interpolate the data points are defined on each hourly interval of solar radiation measurement and are obtained by prescribing first-order derivatives at the start and end of each interval. We compare the performance of our proposed method with existing methods using the Root Mean Squared Error (RMSE) and Coefficient of Determination (CoD), based on simulated missing-value datasets. The results show that our method outperformed the previous methods.
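    The interval-wise construction described above can be sketched with an ordinary cubic Hermite segment, since a cubic with prescribed endpoint values and first derivatives is the same curve whether written in Ball, Bézier, or Hermite form; the hourly readings and the missing hour below are invented for illustration, not from the Kedah dataset:

```python
import numpy as np

def hermite_segment(t, y0, y1, d0, d1):
    """Cubic on [0, 1] with endpoint values y0, y1 and endpoint
    derivatives d0, d1 (scaled to the unit parameter)."""
    h00 = 2*t**3 - 3*t**2 + 1
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*y0 + h10*d0 + h01*y1 + h11*d1

# Hypothetical hourly solar radiation (W/m^2) with hour 3 missing.
hours = np.array([0, 1, 2, 4, 5], dtype=float)
values = np.array([0.0, 120.0, 310.0, 520.0, 480.0])

# Estimate the missing value at hour 3 from the bracketing interval
# [2, 4], using finite-difference slopes as the prescribed derivatives.
d = np.gradient(values, hours)          # slopes at the data points
i = 2                                   # segment starting at hour 2
h = hours[i + 1] - hours[i]             # interval width (2 hours here)
t = (3.0 - hours[i]) / h
estimate = hermite_segment(t, values[i], values[i + 1], d[i]*h, d[i + 1]*h)
print(f"estimated radiation at hour 3: {estimate:.1f}")
```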

  18. On analyzing ordinal data when responses and covariates are both missing at random.

    PubMed

    Rana, Subrata; Roy, Surupa; Das, Kalyan

    2016-08-01

    On many occasions, particularly in biomedical studies, data are unavailable for some responses and covariates. This leads to biased inference in the analysis when a substantial proportion of responses, a covariate, or both are missing. Except in a few situations, methods for missing data have earlier been considered either for missing responses or for missing covariates, but comparatively little attention has been directed to accounting for both missing responses and missing covariates, which is partly attributable to complexity in modeling and computation. This seems important, as the precise impact of substantial missing data depends on the association between the two missing data processes as well. The real difficulty arises when the responses are ordinal by nature. We develop a joint model to take into account simultaneously the association between the ordinal response variable and covariates and also that between the missing data indicators. Such a complex model has been analyzed here by using the Markov chain Monte Carlo approach and also by the Monte Carlo relative likelihood approach. Their performance in estimating the model parameters in finite samples has been looked into. We illustrate the application of these two methods using data from an orthodontic study. Analysis of such data provides some interesting information on human habits. © The Author(s) 2013.

  19. Quantifying the Consequences of Missing School: Linking School Nurses to Student Absences to Standardized Achievement

    ERIC Educational Resources Information Center

    Gottfried, Michael A.

    2013-01-01

    Background/Context: Parents, policymakers, and researchers uphold that missing school has negative implications on schooling success, particularly for students in urban schools. However, it has thus far been an empirical challenge within educational research to estimate the true effect that absences have on achievement outcomes. This study…

  20. A hierarchical model of daily stream temperature using air-water temperature synchronization, autocorrelation, and time lags

    USGS Publications Warehouse

    Letcher, Benjamin; Hocking, Daniel; O'Neil, Kyle; Whiteley, Andrew R.; Nislow, Keith H.; O'Donnell, Matthew

    2016-01-01

    Water temperature is a primary driver of stream ecosystems and commonly forms the basis of stream classifications. Robust models of stream temperature are critical as the climate changes, but estimating daily stream temperature poses several important challenges. We developed a statistical model that accounts for many challenges that can make stream temperature estimation difficult. Our model identifies the yearly period when air and water temperature are synchronized, accommodates hysteresis, incorporates time lags, deals with missing data and autocorrelation and can include external drivers. In a small stream network, the model performed well (RMSE = 0.59 °C), identified a clear warming trend (0.63 °C decade⁻¹) and a widening of the synchronized period (29 d decade⁻¹). We also carefully evaluated how missing data influenced predictions. Missing data within a year had a small effect on performance (∼0.05% average drop in RMSE with 10% fewer days with data). Missing all data for a year decreased performance (∼0.6 °C jump in RMSE), but this decrease was moderated when data were available from other streams in the network.

  1. Higgs boson pair production at NNLO with top quark mass effects

    NASA Astrophysics Data System (ADS)

    Grazzini, M.; Heinrich, G.; Jones, S.; Kallweit, S.; Kerner, M.; Lindert, J. M.; Mazzitelli, J.

    2018-05-01

    We consider QCD radiative corrections to Higgs boson pair production through gluon fusion in proton collisions. We combine the exact next-to-leading order (NLO) contribution, which features two-loop virtual amplitudes with the full dependence on the top quark mass M_t, with the next-to-next-to-leading order (NNLO) corrections computed in the large-M_t approximation. The latter are improved with different reweighting techniques in order to account for finite-M_t effects beyond NLO. Our reference NNLO result is obtained by combining one-loop double-real corrections with full M_t dependence with suitably reweighted real-virtual and double-virtual contributions evaluated in the large-M_t approximation. We present predictions for inclusive cross sections in pp collisions at √s = 13, 14, 27 and 100 TeV and discuss their uncertainties due to missing M_t effects. Our approximated NNLO corrections increase the NLO result by an amount ranging from +12% at √s = 13 TeV to +7% at √s = 100 TeV, and the residual uncertainty of the inclusive cross section from missing M_t effects is estimated to be at the few percent level. Our calculation is fully differential in the Higgs boson pair and the associated jet activity: we also present predictions for various differential distributions at √s = 14 and 100 TeV, and discuss the size of the missing M_t effects, which can be larger, especially in the tails of certain observables. Our results represent the most advanced perturbative prediction available to date for this process.

  2. Morphological feature detection for cervical cancer screening

    NASA Astrophysics Data System (ADS)

    Narayanswamy, Ramkumar; Sharpe, John P.; Duke, Heather J.; Stewart, Rosemary J.; Johnson, Kristina M.

    1995-03-01

    An optoelectronic system has been designed to pre-screen Pap-smear slides and detect suspicious cells using the hit/miss transform. Computer simulation of the algorithm, tested on 184 Pap-smear images, detected 95% of the suspicious regions while tagging just 5% of the normal regions as suspect. An optoelectronic implementation of the hit/miss transform using a 4f VanderLugt correlator architecture is proposed and demonstrated with experimental results.

  3. Management of Esophageal Food Impaction Varies Among Gastroenterologists and Affects Identification of Eosinophilic Esophagitis.

    PubMed

    Hiremath, Girish; Vaezi, Michael F; Gupta, Sandeep K; Acra, Sari; Dellon, Evan S

    2018-06-01

    Esophageal food impaction (EFI) is a gastrointestinal emergency requiring immediate evaluation in the emergency room (ER) and an esophagogastroduodenoscopy (EGD) for disimpaction. EFI is also a distinct presenting feature of eosinophilic esophagitis (EoE). This study aimed at understanding the management of EFI among gastroenterologists (GIs) and estimating its impact on identification of EoE in the USA. GIs associated with three major gastroenterology societies based in the USA were invited to participate in a web-based survey. Information on the resources available and utilized, and the clinical decision-making process related to management of EFI cases, was collected and analyzed. Of 428 responses, 49% were from pediatric GIs, 86% practiced in the USA, and 78% practiced in an academic setting. Compared to the pediatric GIs, adult GIs were more likely to perform EGD in the emergency room [OR 87.96 (25.43-304.16)] and advance the food bolus into the stomach [5.58 (3.08-10.12)]. Only 34% of respondents obtained esophageal biopsies during EGD, and pediatric GIs were more likely to obtain esophageal biopsies [3.49 (1.12-10.84)] compared to adult GIs. In the USA, by our conservative estimates, 10,494 patients presenting to the ER with EFI and at risk of EoE are likely being missed each year. EFI management varies substantially among GIs associated with three major gastroenterology societies in the USA. Based on their practice patterns, the GIs in the USA are likely to miss numerous EoE patients presenting to the ER with EFI. Our findings highlight the need for developing and disseminating evidence-based EFI management practice guidelines.

  4. 40 CFR 98.145 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Glass Production § 98.145 Procedures for estimating... carbonate-based raw materials charged to any continuous glass melting furnace use the best available...

  5. 40 CFR 98.145 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... PROGRAMS (CONTINUED) MANDATORY GREENHOUSE GAS REPORTING Glass Production § 98.145 Procedures for estimating... carbonate-based raw materials charged to any continuous glass melting furnace use the best available...

  6. Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study

    PubMed Central

    Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-01-01

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914

  7. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

    PubMed

    Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-03-15

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
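    The comparison can be sketched with scikit-learn's IterativeImputer, a MICE-style chained-equations imputer, swapping a parametric linear model for a random forest as the per-variable regression; this is an illustrative stand-in, not the authors' code, and the simulated nonlinear dependence loosely mirrors their second study:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 ** 2 + 0.1 * rng.normal(size=n)       # nonlinear dependence on x1
X_full = np.column_stack([x1, x2])
X_miss = X_full.copy()
X_miss[rng.random(n) < 0.3, 1] = np.nan       # make x2 missing at random

results = {}
for name, est in [("parametric", BayesianRidge()),
                  ("random_forest",
                   RandomForestRegressor(n_estimators=50, random_state=0))]:
    # Chained-equations imputation with the chosen per-variable model.
    X_imp = IterativeImputer(estimator=est, random_state=0).fit_transform(X_miss)
    hole = np.isnan(X_miss[:, 1])
    results[name] = np.sqrt(np.mean((X_imp[hole, 1] - X_full[hole, 1]) ** 2))
    print(f"{name}: imputation RMSE {results[name]:.2f}")
```

    Because the dependence is purely nonlinear, the linear imputer has little to work with, while the forest captures the curvature, reproducing the paper's qualitative finding.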

  8. Inverse probability weighting and doubly robust methods in correcting the effects of non-response in the reimbursed medication and self-reported turnout estimates in the ATH survey.

    PubMed

    Härkänen, Tommi; Kaikkonen, Risto; Virtala, Esa; Koskinen, Seppo

    2014-11-06

    To assess the nonresponse rates in a questionnaire survey with respect to administrative register data, and to correct the bias statistically. The Finnish Regional Health and Well-being Study (ATH) in 2010 was based on a national sample and several regional samples. Missing data analysis was based on socio-demographic register data covering the whole sample. Inverse probability weighting (IPW) and doubly robust (DR) methods were estimated using the logistic regression model, which was selected using the Bayesian information criteria. The crude, weighted and true self-reported turnout in the 2008 municipal election and prevalences of entitlements to specially reimbursed medication, and the crude and weighted body mass index (BMI) means were compared. The IPW method appeared to remove a relatively large proportion of the bias compared to the crude prevalence estimates of the turnout and the entitlements to specially reimbursed medication. Several demographic factors were shown to be associated with missing data, but few interactions were found. Our results suggest that the IPW method can improve the accuracy of results of a population survey, and the model selection provides insight into the structure of missing data. However, health-related missing data mechanisms are beyond the scope of statistical methods, which mainly rely on socio-demographic information to correct the results.
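    The IPW idea itself is easy to demonstrate on invented data where the response probability depends on a covariate; here that probability is known by construction, whereas the study estimates it with logistic regression on register covariates:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
old = rng.random(n) < 0.4                      # 40% of the sample is "old"
turnout = np.where(old, rng.random(n) < 0.8,   # older people vote more ...
                        rng.random(n) < 0.5)

p_respond = np.where(old, 0.7, 0.4)            # ... and respond more often
responded = rng.random(n) < p_respond

true_mean = turnout.mean()                     # full-sample turnout
naive = turnout[responded].mean()              # complete-case: biased upward
w = 1.0 / p_respond[responded]                 # inverse-probability weights
ipw = np.average(turnout[responded], weights=w)

print(f"true {true_mean:.3f}, naive {naive:.3f}, IPW {ipw:.3f}")
```

    Respondents over-represent the high-turnout group, so the naive mean is too high; weighting each respondent by the inverse of their response probability restores the population mix.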

  9. Analyzing time-ordered event data with missed observations.

    PubMed

    Dokter, Adriaan M; van Loon, E Emiel; Fokkema, Wimke; Lameris, Thomas K; Nolet, Bart A; van der Jeugd, Henk P

    2017-09-01

    A common problem with observational datasets is that not all events of interest may be detected. For example, observing animals in the wild can be difficult when animals move, hide, or cannot be closely approached. We consider time series of events recorded in conditions where events are occasionally missed by observers or observational devices. These time series are not restricted to behavioral protocols, but can be any cyclic or recurring process where discrete outcomes are observed. Undetected events cause biased inferences on the process of interest, and statistical analyses are needed that can identify and correct for compromised detection processes. Missed observations in a time series cause the observed intervals between events to fall at multiples of the true inter-event time, which conveys information on the detection probability. We derive the theoretical probability density function for observed intervals between events that includes a probability of missed detection. Methodology and software tools are provided for the analysis of event data with potential observation bias and its removal. The methodology was applied to simulated data and to a case study of defecation rate estimation in geese, which is commonly used to estimate their digestive throughput and energetic uptake, or to calculate goose usage of a feeding site from dropping density. Simulations indicate that at a moderate chance of missing arrival events (p = 0.3), uncorrected arrival intervals were biased upward by up to a factor of 3, while parameter values corrected for missed observations were within 1% of their true simulated values. A field case study shows that not accounting for missed observations leads to substantial underestimates of the true defecation rate in geese, and to spurious rate differences between sites introduced by differences in observational conditions. These results show that the derived methodology can be used to effectively remove observational biases in time-ordered event data.
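    The interval-multiples effect described above can be illustrated with a short simulation (a minimal sketch in plain Python, assuming a strictly periodic event series and a known, constant miss probability; the paper instead derives the full probability density for observed intervals and does not require the miss probability to be known in advance):

    ```python
    import random

    def simulate_observed_intervals(true_interval=60.0, n_events=10_000,
                                    p_miss=0.3, seed=42):
        """Simulate a periodic event series in which each event is
        independently missed with probability p_miss, and return the
        intervals between consecutively *detected* events."""
        rng = random.Random(seed)
        detected_times = [i * true_interval for i in range(n_events)
                          if rng.random() >= p_miss]
        return [b - a for a, b in zip(detected_times, detected_times[1:])]

    intervals = simulate_observed_intervals()
    # Missed events make observed intervals fall at integer multiples of
    # the true interval, so the naive mean is inflated by roughly
    # 1 / (1 - p_miss).
    naive_mean = sum(intervals) / len(intervals)
    # A simple correction, assuming the miss probability p = 0.3 is known:
    corrected_mean = naive_mean * (1 - 0.3)
    ```

    With p_miss = 0.3 the naive mean interval comes out near 60/0.7 ≈ 86 rather than the true 60, matching the direction of the bias reported in the abstract.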

  10. "Testing-only" visits: an assessment of missed diagnoses in clients attending sexually transmitted disease clinics.

    PubMed

    Xu, Fujie; Stoner, Bradley P; Taylor, Stephanie N; Mena, Leandro; Martin, David H; Powell, Suzanne; Markowitz, Lauri E

    2013-01-01

    At sexually transmitted disease (STD) clinics, advances in testing technology, coupled with diminishing resources, have promoted the use of testing-only visits (clinic visits with testing for STDs but no full examination) to meet increasing demands for STD services. The aims of the present study were to estimate the prevalence of STD diagnoses that could become "missed diagnoses" if patients used testing-only visits, and to examine patient characteristics associated with these potential missed diagnoses. We conducted a self-administered survey of STD-related symptoms and sexual risk behaviors in patients seeking routine clinical care at 3 STD clinics. Medical charts were abstracted to estimate the prevalence of viral STDs, trichomoniasis, and other diagnoses from standard clinical services that could become missed diagnoses. Of 2582 patients included, the median age was 24 years and 50% were women. In women, overall, 3.2% were diagnosed as having a viral STD; 9.6%, trichomoniasis; and 41.0%, vulvovaginal candidiasis or symptomatic bacterial vaginosis. The prevalence of these potential missed diagnoses varied by patient characteristics, but in women who reported no symptoms, the prevalence of trichomoniasis was still 6.3%. In men, 19.3% received a diagnosis of urethritis but tested negative for both gonorrhea and chlamydia; this prevalence varied from 15.7% in those who reported no symptoms to 32.6% in those who reported malodor. A high proportion of STD clients received diagnoses from standard care visits that would be missed by testing-only visits. When patients, even asymptomatic ones, use testing-only visits, missed diagnoses of STDs or related genital tract conditions can be substantial. The potential disadvantages of testing-only visits should be weighed against their advantages.

  11. Injury risk and noise exposure in firefighter training operations

    PubMed Central

    Neitzel, Richard L.; Long, Rachel; Sun, Kan; Sayler, Stephanie; von Thaden, Terry L.

    2016-01-01

    Introduction Firefighters have high rates of injuries and illnesses, as well as exposures to high levels of noise. This study explored the relationship between noise exposure and injury among firefighters. Methods We recruited firefighters undergoing vehicle extrication and structural collapse emergency response training at a highly realistic training facility. Demographics, health status, body mass index, and history of serious injuries (i.e., injuries requiring first aid treatment, treatment in a medical clinic or office, or treatment at a hospital) were assessed at baseline, and daily activities, injury events, and near-misses were assessed daily using surveys. Participants' noise exposures were monitored for one 24-hour period using noise dosimeters. We used a mixed-effects logistic regression model to estimate the odds of injury events and near-misses associated with noise exposure as an independent variable. Results Of 56 subjects, 20 (36%) reported that they had ever suffered a serious injury during firefighting activities, and 9 (16%) reported a serious injury within the past year. We estimated rates of 6.6 lifetime serious injuries per 100 FTE and 16.1 serious injuries per 100 FTE within the past year. Our models indicated a significant increase in injury events and near-misses among those with higher BMI, as well as a dose-response relationship between near-misses/injuries and increasing noise levels. Noise levels >90 dBA in the 30 min prior to the time of injury or near-miss were associated with substantially increased odds ratios for injury or near-miss. Our models further indicated that perceived job demands were significantly associated with increased risk of injury or near-miss. Conclusion Our results suggest that noise exposures may need to be incorporated into injury prevention programs for firefighters to reduce injuries among this high-risk occupational group. PMID:26712895

  12. Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline.

    PubMed

    Zhang, Jie; Li, Qingyang; Caselli, Richard J; Thompson, Paul M; Ye, Jieping; Wang, Yalin

    2017-06-01

    Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics problems. It aims to improve generalization performance by exploiting the features shared among different tasks. However, most of the existing algorithms are formulated as supervised learning schemes, which suffer from either insufficient feature numbers or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms.
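    The alternating structure underlying dictionary learning can be sketched as follows (a single-source simplification using the classic MOD dictionary update and a crude thresholding coder; the function names and parameters are illustrative, and this is not the two-stage MMDL algorithm itself):

    ```python
    import numpy as np

    def sparse_code(X, D, k):
        """Crude sparse coder: solve least squares, then keep only the k
        largest-magnitude coefficients in each column (an OMP stand-in)."""
        A = np.linalg.lstsq(D, X, rcond=None)[0]
        for j in range(A.shape[1]):
            small = np.argsort(np.abs(A[:, j]))[:-k]
            A[small, j] = 0.0
        return A

    def dictionary_learning(X, n_atoms=8, k=3, n_iter=20, seed=0):
        """Alternate sparse coding with a least-squares dictionary update
        (the MOD algorithm); X holds one sample per column."""
        rng = np.random.default_rng(seed)
        D = rng.standard_normal((X.shape[0], n_atoms))
        D /= np.linalg.norm(D, axis=0)
        for _ in range(n_iter):
            A = sparse_code(X, D, k)
            # MOD update: D = argmin ||X - D A||_F via least squares
            D = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
            D /= np.linalg.norm(D, axis=0) + 1e-12
        return D, sparse_code(X, D, k)

    rng = np.random.default_rng(1)
    X = rng.standard_normal((16, 200))    # 16-dim features, 200 samples
    D, A = dictionary_learning(X)         # dictionary and sparse codes
    ```

    Each column of the returned code matrix has at most k nonzero entries, which is the "sparse feature" representation exploited across tasks in the abstract's framework.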

  13. H∞ filtering for discrete-time systems subject to stochastic missing measurements: a decomposition approach

    NASA Astrophysics Data System (ADS)

    Gu, Zhou; Fei, Shumin; Yue, Dong; Tian, Engang

    2014-07-01

    This paper deals with the problem of H∞ filtering for discrete-time systems with stochastic missing measurements. A new missing measurement model is developed by decomposing the interval of the missing rate into several segments. The probability of the missing rate in each subsegment is governed by its corresponding random variable. We aim to design a linear full-order filter such that the estimation error converges to zero exponentially in the mean square with less conservatism, while the disturbance rejection attenuation is constrained to a given level by means of an H∞ performance index. Based on Lyapunov theory, the reliable filter parameters are characterised in terms of the feasibility of a set of linear matrix inequalities. Finally, a numerical example is provided to demonstrate the effectiveness and applicability of the proposed design approach.
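    For intuition about the missing-measurement setting, the sketch below simulates Bernoulli-distributed measurement dropouts and runs a standard Kalman filter that skips the update step when a measurement is missing (an assumption-laden stand-in: the paper designs a linear full-order H∞ filter via LMIs, not a Kalman filter, and models the missing rate by interval decomposition rather than a single Bernoulli variable):

    ```python
    import numpy as np

    def kalman_with_missing(y, gamma, A, C, Q, R, x0, P0):
        """Kalman filter that performs the measurement update only when
        the Bernoulli indicator gamma[k] == 1 (measurement received);
        otherwise only the time update runs."""
        x, P = x0.astype(float).copy(), P0.astype(float).copy()
        out = []
        for k in range(len(y)):
            x = A @ x                          # time update
            P = A @ P @ A.T + Q
            if gamma[k]:
                S = C @ P @ C.T + R            # innovation covariance
                K = P @ C.T @ np.linalg.inv(S)
                x = x + K @ (y[k] - C @ x)     # measurement update
                P = (np.eye(len(x)) - K @ C) @ P
            out.append(x.copy())
        return np.array(out)

    # Demo: constant-velocity target, position measured, ~30% dropouts.
    rng = np.random.default_rng(0)
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    C = np.array([[1.0, 0.0]])
    Q, R = 0.01 * np.eye(2), np.array([[0.25]])
    x, ys, truth = np.array([0.0, 0.1]), [], []
    gamma = (rng.random(100) > 0.3).astype(int)
    for _ in range(100):
        x = A @ x
        truth.append(x.copy())
        ys.append(C @ x + rng.normal(scale=0.5, size=1))
    est = kalman_with_missing(np.array(ys), gamma, A, C, Q, R,
                              np.zeros(2), np.eye(2))
    ```

    Even with roughly a third of the measurements missing, propagating the prediction through the dropouts keeps the state estimate on track, which is the behaviour the paper's filter guarantees with an explicit H∞ disturbance-attenuation level.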

  14. Examining solutions to missing data in longitudinal nursing research.

    PubMed

    Roberts, Mary B; Sullivan, Mary C; Winchester, Suzy B

    2017-04-01

    Longitudinal studies are highly valuable in pediatrics because they provide useful data about developmental patterns of child health and behavior over time. When data are missing, the value of the research is impacted. The study's purpose was to (1) introduce a three-step approach to assess and address missing data and (2) illustrate this approach using categorical and continuous-level variables from a longitudinal study of premature infants. A three-step approach with simulations was followed to assess the amount and pattern of missing data and to determine the most appropriate imputation method for the missing data. Patterns of missingness were Missing Completely at Random, Missing at Random, and Not Missing at Random. Missing continuous-level data were imputed using mean replacement, stochastic regression, multiple imputation, and fully conditional specification (FCS). Missing categorical-level data were imputed using last value carried forward, hot-decking, stochastic regression, and FCS. Simulations were used to evaluate these imputation methods under different patterns of missingness at different levels of missing data. The rate of missingness was 16-23% for continuous variables and 1-28% for categorical variables. FCS imputation provided the least difference in mean and standard deviation estimates for continuous measures. FCS imputation was acceptable for categorical measures. Results obtained through simulation reinforced and confirmed these findings. Significant investments are made in the collection of longitudinal data. The prudent handling of missing data can protect these investments and potentially improve the scientific information contained in pediatric longitudinal studies. © 2017 Wiley Periodicals, Inc.
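    The contrast between mean replacement and stochastic regression imputation, two of the methods compared above, can be sketched with synthetic MCAR data (illustrative NumPy code; the 20% missingness rate and the linear model are assumptions for the demo, not the study's data):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Complete data: y depends linearly on x; some y values go missing (MCAR).
    n = 500
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(scale=0.5, size=n)
    missing = rng.random(n) < 0.2          # ~20% missing completely at random
    y_obs = np.where(missing, np.nan, y)

    # Mean replacement: shrinks the variance and flattens the x-y relationship.
    y_mean_imp = np.where(missing, np.nanmean(y_obs), y_obs)

    # Stochastic regression imputation: fit y ~ x on complete cases, then
    # impute the prediction plus a random residual draw.
    mask = ~missing
    beta, alpha = np.polyfit(x[mask], y_obs[mask], 1)
    resid_sd = np.std(y_obs[mask] - (alpha + beta * x[mask]))
    draws = alpha + beta * x + rng.normal(scale=resid_sd, size=n)
    y_sreg_imp = np.where(missing, draws, y_obs)

    # Stochastic regression preserves the variance of y far better than
    # mean fill, which is one reason simulation studies like the one above
    # penalize naive single imputation.
    var_true = np.var(y)
    var_mean = np.var(y_mean_imp)
    var_sreg = np.var(y_sreg_imp)
    ```

    FCS (fully conditional specification), the method favoured in the study, generalizes this idea by cycling such conditional draws over every incomplete variable in turn.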

  15. The Costs of Supervised Classification: The Effect of Learning Task on Conceptual Flexibility

    ERIC Educational Resources Information Center

    Hoffman, Aaron B.; Rehder, Bob

    2010-01-01

    Research has shown that learning a concept via standard supervised classification leads to a focus on diagnostic features, whereas learning by inferring missing features promotes the acquisition of within-category information. Accordingly, we predicted that classification learning would produce a deficit in people's ability to draw "novel…

  16. Autism spectrum disorders and fetal hypoxia in a population-based cohort: Accounting for missing exposures via Estimation-Maximization algorithm

    PubMed Central

    2011-01-01

    Background Autism spectrum disorders (ASD) are associated with complications of pregnancy that implicate fetal hypoxia (FH); the excess of ASD among males is poorly understood. We tested the hypothesis that risk of ASD is related to fetal hypoxia and investigated whether this effect is greater among males. Methods Provincial delivery records (PDR) identified the cohort of all 218,890 singleton live births in the province of Alberta, Canada, between 01-01-98 and 12-31-04. These were followed up for ASD via ICD-9 diagnostic codes assigned by physician billing until 03-31-08. Maternal and obstetric risk factors, including FH determined from blood tests of acidity (pH), were extracted from PDR. The binary FH status was missing in approximately half of subjects. Assuming that characteristics of mothers and pregnancies would be correlated with FH, we used an Estimation-Maximization algorithm to estimate the FH-ASD association, allowing for both missing-at-random (MAR) and specific not-missing-at-random (NMAR) mechanisms. Results Data indicated that there was excess risk of ASD among males who were hypoxic at birth, not materially affected by adjustment for potential confounding due to birth year and socio-economic status: OR 1.13, 95%CI: 0.96, 1.33 (MAR assumption). Limiting analysis to full-term males, the adjusted OR under specific NMAR assumptions spanned a 95%CI of 1.0 to 1.6. Conclusion Our results are consistent with a weak effect of fetal hypoxia on risk of ASD among males. The E-M algorithm is an efficient and flexible tool for modeling missing data in the studied setting. PMID:21208442
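    A minimal version of EM for a binary exposure that is missing at random given the observed outcome can be sketched as follows (a didactic toy model with hypothetical parameters and no covariates, not the covariate-rich model fitted in the study):

    ```python
    import numpy as np

    def em_binary_exposure(x_obs, y, n_iter=200):
        """EM estimation of P(X=1) and P(Y=1 | X) when the binary exposure
        X is missing at random (missingness may depend on the observed Y).
        x_obs uses np.nan for missing exposures."""
        y = np.asarray(y, dtype=float)
        x = np.asarray(x_obs, dtype=float)
        miss = np.isnan(x)
        # Initial values from complete cases.
        pi = np.nanmean(x)                 # P(X = 1)
        q1 = y[x == 1].mean()              # P(Y = 1 | X = 1)
        q0 = y[x == 0].mean()              # P(Y = 1 | X = 0)
        for _ in range(n_iter):
            # E-step: posterior P(X=1 | Y) where the exposure is missing.
            lik1 = pi * np.where(y == 1, q1, 1 - q1)
            lik0 = (1 - pi) * np.where(y == 1, q0, 1 - q0)
            w = np.where(miss, lik1 / (lik1 + lik0), x)
            # M-step: update parameters from expected complete-data counts.
            pi = w.mean()
            q1 = (w * y).sum() / w.sum()
            q0 = ((1 - w) * y).sum() / (1 - w).sum()
        odds_ratio = (q1 / (1 - q1)) / (q0 / (1 - q0))
        return pi, q1, q0, odds_ratio
    ```

    Because MAR missingness is ignorable for likelihood inference, iterating the E- and M-steps recovers the exposure prevalence and exposure-outcome odds ratio even when the chance of a missing exposure depends on the outcome.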

  17. Missed rib fractures on evaluation of initial chest CT for trauma patients: pattern analysis and diagnostic value of coronal multiplanar reconstruction images with multidetector row CT

    PubMed Central

    Cho, S H; Sung, Y M; Kim, M S

    2012-01-01

    Objective The objective of this study was to review the prevalence and radiological features of rib fractures missed on initial chest CT evaluation, and to examine the diagnostic value of additional coronal images in a large series of trauma patients. Methods 130 patients who presented to an emergency room for blunt chest trauma underwent multidetector row CT of the thorax within the first hour during their stay, and had follow-up CT or bone scans as diagnostic gold standards. Images were evaluated on two separate occasions: once with axial images and once with both axial and coronal images. The detection rates of missed rib fractures were compared between readings using a non-parametric method for clustered data. In cases of missed rib fractures, the shapes, locations and associated fractures were evaluated. Results 58 rib fractures were missed with axial images only and 52 were missed with both axial and coronal images (p=0.088). The most common shape of missed rib fractures was buckled (56.9%), and the anterior arc (55.2%) was most commonly involved. 21 (36.2%) missed rib fractures had combined fractures on the same ribs, and 38 (65.5%) were accompanied by fractures on neighbouring ribs. Conclusion Missed rib fractures are not uncommon, and radiologists should be familiar with buckle fractures, which are frequently missed. Additional coronal images can be helpful in the diagnosis of rib fractures that are not seen on axial images. PMID:22514102

  18. Sample size considerations for paired experimental design with incomplete observations of continuous outcomes.

    PubMed

    Zhu, Hong; Xu, Xiaohan; Ahn, Chul

    2017-01-01

    Paired experimental design is widely used in clinical and health behavioral studies, where each study unit contributes a pair of observations. Investigators often encounter incomplete observations of paired outcomes in the data collected. Some study units contribute complete pairs of observations, while the others contribute either pre- or post-intervention observations. Statistical inference for paired experimental design with incomplete observations of continuous outcomes has been extensively studied in the literature. However, sample size methods for such study designs are sparsely available. We derive a closed-form sample size formula based on the generalized estimating equation approach by treating the incomplete observations as missing data in a linear model. The proposed method properly accounts for the impact of the mixed structure of the observed data: a combination of paired and unpaired outcomes. The sample size formula is flexible enough to accommodate different missing patterns, magnitudes of missingness, and correlation parameter values. We demonstrate that under complete observations, the proposed generalized estimating equation sample size estimate is the same as that based on the paired t-test. In the presence of missing data, the proposed method leads to a more accurate sample size estimate compared with the crude adjustment. Simulation studies are conducted to evaluate the finite-sample performance of the generalized estimating equation sample size formula. A real application example is presented for illustration.
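    The complete-data benchmark mentioned above, together with the crude missing-data adjustment that the GEE-based formula improves upon, can be sketched as follows (the standard normal-approximation paired sample size formula; the effect size, SD of differences, and 80% completion rate are assumed example values, and the paper's own GEE formula is not reproduced here):

    ```python
    from math import ceil
    from statistics import NormalDist

    def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.80):
        """Normal-approximation sample size for a paired comparison:
        n = (z_{1-alpha/2} + z_{1-beta})^2 * sd_diff^2 / delta^2.
        Under complete observations this matches the paired t-test basis
        that the GEE estimate reduces to."""
        z = NormalDist()
        z_alpha = z.inv_cdf(1 - alpha / 2)
        z_beta = z.inv_cdf(power)
        return ceil((z_alpha + z_beta) ** 2 * sd_diff ** 2 / delta ** 2)

    # Detect a mean difference of 0.5 with SD of differences 1.0.
    n_complete = paired_sample_size(delta=0.5, sd_diff=1.0)

    # Crude adjustment for 20% incomplete pairs: inflate by the completion
    # rate. A GEE-based formula refines this by also using the information
    # in the partially observed pairs.
    n_crude = ceil(n_complete / 0.8)
    ```

    The crude inflation treats every incomplete pair as worthless, which is why it tends to overstate the required sample size relative to the GEE approach described in the abstract.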

  19. Missing Data Treatments at the Second Level of Hierarchical Linear Models

    ERIC Educational Resources Information Center

    St. Clair, Suzanne W.

    2011-01-01

    The current study evaluated the performance of traditional versus modern MDTs in the estimation of fixed effects and variance components for data missing at the second level of a hierarchical linear model (HLM) across 24 different study conditions. Variables manipulated in the analysis included (a) number of Level-2 variables with missing…

  20. Toward Best Practices in Analyzing Datasets with Missing Data: Comparisons and Recommendations

    ERIC Educational Resources Information Center

    Johnson, David R.; Young, Rebekah

    2011-01-01

    Although several methods have been developed to allow for the analysis of data in the presence of missing values, no clear guide exists to help family researchers in choosing among the many options and procedures available. We delineate these options and examine the sensitivity of the findings in a regression model estimated in three random…
