Sample records for missing values due

  1. Treatment of Missing Data in Workforce Education Research

    ERIC Educational Resources Information Center

    Gemici, Sinan; Rojewski, Jay W.; Lee, In Heok

    2012-01-01

    Most quantitative analyses in workforce education are affected by missing data. Traditional approaches to remedy missing data problems often result in reduced statistical power and biased parameter estimates due to systematic differences between missing and observed values. This article examines the treatment of missing data in pertinent…

  2. A Spatiotemporal Prediction Framework for Air Pollution Based on Deep RNN

    NASA Astrophysics Data System (ADS)

    Fan, J.; Li, Q.; Hou, J.; Feng, X.; Karimian, H.; Lin, S.

    2017-10-01

Time series data in practical applications always contain missing values due to sensor malfunction, network failure, outliers, etc. In order to handle missing values in time series, as well as the failure of standard machine learning models to consider temporal properties, we propose a spatiotemporal prediction framework based on missing value processing algorithms and a deep recurrent neural network (DRNN). By using a missing tag and missing interval to represent time series patterns, we implement three different missing value fixing algorithms, which are further incorporated into a deep neural network consisting of LSTM (Long Short-Term Memory) layers and fully connected layers. Real-world air quality and meteorological datasets (Jingjinji area, China) are used for model training and testing. Deep feed-forward neural networks (DFNN) and gradient boosting decision trees (GBDT) are trained as baseline models against the proposed DRNN. The performances of the three missing value fixing algorithms, as well as of the different machine learning models, are evaluated and analysed. Experiments show that the proposed DRNN framework outperforms both DFNN and GBDT, validating the capacity of the proposed framework. Our results also provide useful insights for a better understanding of different strategies for handling missing values.
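The "missing tag" and "missing interval" inputs described in this abstract can be sketched roughly as follows (an illustrative reconstruction in Python, not the authors' code; the exact encoding fed to the LSTM layers may differ):

```python
import math

def missing_features(series):
    """For each time step, return (value-or-0, missing tag, interval since
    the last observed value). A sketch of the 'missing tag' and 'missing
    interval' representation described in the abstract."""
    feats = []
    gap = 0
    for x in series:
        missing = x is None or (isinstance(x, float) and math.isnan(x))
        gap = gap + 1 if missing else 0
        feats.append((0.0 if missing else x, 1 if missing else 0, gap))
    return feats
```

Each triple would then be one input vector per time step, so the network can distinguish observed zeros from filled-in gaps and learn from the gap length.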

  3. Improving data sharing in research with context-free encoded missing data.

    PubMed

    Hoevenaar-Blom, Marieke P; Guillemont, Juliette; Ngandu, Tiia; Beishuizen, Cathrien R L; Coley, Nicola; Moll van Charante, Eric P; Andrieu, Sandrine; Kivipelto, Miia; Soininen, Hilkka; Brayne, Carol; Meiller, Yannick; Richard, Edo

    2017-01-01

Lack of attention to missing data in research may result in biased results, loss of power and reduced generalizability. Registering reasons for missing values at the time of data collection, or, in the case of sharing existing data, before making the data available to other teams, can save time and effort, improve scientific value and help to prevent erroneous assumptions and biased results. To ensure that the encoding of missing data is sufficient to understand the reason why data are missing, it should ideally be context-free. Therefore, 11 context-free codes for missing data were carefully designed based on three completed randomized controlled clinical trials and tested in a new randomized controlled clinical trial by an international team consisting of clinical researchers and epidemiologists with extensive experience in designing and conducting trials and an Information System expert. These codes can be divided into missing due to participant and/or participation characteristics (n = 6), missing by design (n = 4), and missing due to a procedural error (n = 1). Broad implementation of context-free missing data encoding may enhance the possibilities of data sharing and pooling, thus allowing more powerful analyses using existing data.

  4. Depth inpainting by tensor voting.

    PubMed

    Kulkarni, Mandar; Rajagopalan, Ambasamudram N

    2013-06-01

    Depth maps captured by range scanning devices or by using optical cameras often suffer from missing regions due to occlusions, reflectivity, limited scanning area, sensor imperfections, etc. In this paper, we propose a fast and reliable algorithm for depth map inpainting using the tensor voting (TV) framework. For less complex missing regions, local edge and depth information is utilized for synthesizing missing values. The depth variations are modeled by local planes using 3D TV, and missing values are estimated using plane equations. For large and complex missing regions, we collect and evaluate depth estimates from self-similar (training) datasets. We align the depth maps of the training set with the target (defective) depth map and evaluate the goodness of depth estimates among candidate values using 3D TV. We demonstrate the effectiveness of the proposed approaches on real as well as synthetic data.

  5. Missing value imputation for gene expression data by tailored nearest neighbors.

    PubMed

    Faisal, Shahla; Tutz, Gerhard

    2017-04-25

High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed, but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes, the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods such as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
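A minimal sketch of distance-weighted nearest-neighbour imputation, the general idea behind this family of methods (the published procedure additionally restricts and weights the genes entering the distance, which is not reproduced here; all names are illustrative):

```python
import math

def knn_impute(rows, k=2):
    """Impute NaNs by a distance-weighted average over the k nearest rows,
    with distances computed only on coordinates observed in both rows."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if not math.isnan(x) and not math.isnan(y)]
        if not shared:
            return float("inf")
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    out = [row[:] for row in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if math.isnan(v):
                cand = [(dist(row, other), other[j])
                        for t, other in enumerate(rows)
                        if t != i and not math.isnan(other[j])]
                cand.sort(key=lambda p: p[0])
                near = [(d, x) for d, x in cand[:k] if d < float("inf")]
                if near:
                    w = [1.0 / (d + 1e-9) for d, _ in near]
                    out[i][j] = (sum(wi * x for wi, (_, x) in zip(w, near))
                                 / sum(w))
    return out
```

Restricting the genes that enter `dist`, as the paper proposes, is what counteracts the curse of dimensionality that plain KNNimpute suffers from.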

  6. A novel approach for incremental uncertainty rule generation from databases with missing values handling: application to dynamic medical databases.

    PubMed

    Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos

    2005-09-01

Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values does not practically exist. However, these assumptions are not preserved in some medical databases, like in a home care system. In this paper, a novel uncertainty rule algorithm is illustrated, namely URG-2 (Uncertainty Rule Generator), which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass over the initial dataset in order to generate the item set, while new metrics corresponding to the notions of Support and Confidence are used. URG-2 was evaluated over two medical databases, randomly introducing multiple missing values for each record's attribute (rate: 5-20% in 5% increments) in the initial dataset. Compared with the classical approach (records with missing values are ignored), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.

  7. Purposeful Variable Selection and Stratification to Impute Missing FAST Data in Trauma Research

    PubMed Central

    Fuchs, Paul A.; del Junco, Deborah J.; Fox, Erin E.; Holcomb, John B.; Rahbar, Mohammad H.; Wade, Charles A.; Alarcon, Louis H.; Brasel, Karen J.; Bulger, Eileen M.; Cohen, Mitchell J.; Myers, John G.; Muskat, Peter; Phelan, Herb A.; Schreiber, Martin A.; Cotton, Bryan A.

    2013-01-01

    Background The Focused Assessment with Sonography for Trauma (FAST) exam is an important variable in many retrospective trauma studies. The purpose of this study was to devise an imputation method to overcome missing data for the FAST exam. Due to variability in patients’ injuries and trauma care, these data are unlikely to be missing completely at random (MCAR), raising concern for validity when analyses exclude patients with missing values. Methods Imputation was conducted under a less restrictive, more plausible missing at random (MAR) assumption. Patients with missing FAST exams had available data on alternate, clinically relevant elements that were strongly associated with FAST results in complete cases, especially when considered jointly. Subjects with missing data (32.7%) were divided into eight mutually exclusive groups based on selected variables that both described the injury and were associated with missing FAST values. Additional variables were selected within each group to classify missing FAST values as positive or negative, and correct FAST exam classification based on these variables was determined for patients with non-missing FAST values. Results Severe head/neck injury (odds ratio, OR=2.04), severe extremity injury (OR=4.03), severe abdominal injury (OR=1.94), no injury (OR=1.94), other abdominal injury (OR=0.47), other head/neck injury (OR=0.57) and other extremity injury (OR=0.45) groups had significant ORs for missing data; the other group odds ratio was not significant (OR=0.84). All 407 missing FAST values were imputed, with 109 classified as positive. Correct classification of non-missing FAST results using the alternate variables was 87.2%. Conclusions Purposeful imputation for missing FAST exams based on interactions among selected variables assessed by simple stratification may be a useful adjunct to sensitivity analysis in the evaluation of imputation strategies under different missing data mechanisms. 
This approach has the potential for widespread application in clinical and translational research and validation is warranted. Level of Evidence Level II Prognostic or Epidemiological PMID:23778515
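The stratified classification step can be illustrated by a much-simplified sketch that imputes a missing binary result with the most common observed result in the record's stratum (group keys and data are hypothetical; the paper's eight groups and auxiliary variables are not reproduced):

```python
from collections import Counter, defaultdict

def stratified_impute(records):
    """records: list of (stratum, result) pairs, result None if missing.
    Fill each missing result with the modal observed result of its
    stratum - a simplification of purposeful stratified imputation."""
    by_group = defaultdict(list)
    for g, result in records:
        if result is not None:
            by_group[g].append(result)
    mode = {g: Counter(v).most_common(1)[0][0] for g, v in by_group.items()}
    return [(g, mode.get(g) if result is None else result)
            for g, result in records]
```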

  8. A context-intensive approach to imputation of missing values in data sets from networks of environmental monitors.

    PubMed

    Larsen, Lawrence C; Shah, Mena

    2016-01-01

    Although networks of environmental monitors are constantly improving through advances in technology and management, instances of missing data still occur. Many methods of imputing values for missing data are available, but they are often difficult to use or produce unsatisfactory results. I-Bot (short for "Imputation Robot") is a context-intensive approach to the imputation of missing data in data sets from networks of environmental monitors. I-Bot is easy to use and routinely produces imputed values that are highly reliable. I-Bot is described and demonstrated using more than 10 years of California data for daily maximum 8-hr ozone, 24-hr PM2.5 (particulate matter with an aerodynamic diameter <2.5 μm), mid-day average surface temperature, and mid-day average wind speed. I-Bot performance is evaluated by imputing values for observed data as if they were missing, and then comparing the imputed values with the observed values. In many cases, I-Bot is able to impute values for long periods with missing data, such as a week, a month, a year, or even longer. Qualitative visual methods and standard quantitative metrics demonstrate the effectiveness of the I-Bot methodology. Many resources are expended every year to analyze and interpret data sets from networks of environmental monitors. A large fraction of those resources is used to cope with difficulties due to the presence of missing data. The I-Bot method of imputing values for such missing data may help convert incomplete data sets into virtually complete data sets that facilitate the analysis and reliable interpretation of vital environmental data.

  9. mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.

    PubMed

    Lee, Geunho; Lee, Hyun Beom; Jung, Byung Hwa; Nam, Hojung

    2017-07-01

    Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. However, these data often contain unexpected duplicate records and missing values due to technical or biological factors. These 'dirty data' problems increase the difficulty of performing MS analyses because they lead to performance degradation when statistical or machine-learning tests are applied to the data. Thus, we have developed missing values preprocessor (mvp), an open-source software for preprocessing data that might include duplicate records and missing values. mvp uses the property of MS data in which identical chemical species present the same or similar values for key identifiers, such as the mass-to-charge ratio and intensity signal, and forms cliques via graph theory to process dirty data. We evaluated the validity of the mvp process via quantitative and qualitative analyses and compared the results from a statistical test that analyzed the original and mvp-applied data. This analysis showed that using mvp reduces problems associated with duplicate records and missing values. We also examined the effects of using unprocessed data in statistical tests and examined the improved statistical test results obtained with data preprocessed using mvp.

  10. Multiple imputation of missing passenger boarding data in the national census of ferry operators

    DOT National Transportation Integrated Search

    2008-08-01

    This report presents findings from the 2006 National Census of Ferry Operators (NCFO) augmented : with imputed values for passengers and passenger miles. Due to the imputation procedures used to calculate missing data, totals in Table 1 may not corre...

  11. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including the standard t test, the moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets.
This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.

  12. Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information.

    PubMed

    Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor

    2013-01-01

    In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
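For reference, the basic chi-square test for Hardy-Weinberg proportions that such quality-control checks are built on can be sketched as follows (illustrative only; the paper's inference uses exact tests and multiple imputation rather than this plain statistic):

```python
def hwe_chisq(n_AA, n_AB, n_BB):
    """Pearson chi-square statistic for Hardy-Weinberg proportions
    computed from observed genotype counts."""
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)  # estimated frequency of allele A
    exp = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    obs = (n_AA, n_AB, n_BB)
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))
```

A lack of heterozygotes, the deviation the abstract reports, inflates this statistic: counts (30, 40, 30) give a larger value than the in-equilibrium counts (25, 50, 25). Discarding records with missing genotypes changes the counts fed to such a test, which is the source of the bias the paper addresses.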

  13. Estimation of missing values in solar radiation data using piecewise interpolation methods: Case study at Penang city

    NASA Astrophysics Data System (ADS)

    Zainudin, Mohd Lutfi; Saaban, Azizan; Bakar, Mohd Nazari Abu

    2015-12-01

Solar radiation values are recorded by an automatic weather station using a device called a pyranometer. The device records the dispersed radiation values, and these data are very useful for experimental work and the development of solar devices. In addition, complete data observations are needed for modelling and designing solar radiation system applications. Unfortunately, complete solar radiation data are frequently unavailable due to several technical problems, mainly involving the monitoring device. To address this, missing values are estimated in an effort to substitute absent values with imputed data. This paper evaluates several piecewise interpolation techniques, namely linear, spline, cubic, and nearest-neighbour interpolation, for dealing with missing values in hourly solar radiation data. It then proposes, as an extension, an investigation of the potential use of the cubic Bezier technique and the cubic Said-Ball method as estimators. The results show that the cubic Bezier and Said-Ball methods perform best compared with the other piecewise imputation techniques.
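The simplest of the piecewise techniques compared here, linear interpolation between the nearest observed neighbours, can be sketched as (an illustrative reconstruction; the spline, cubic, Bezier and Said-Ball variants require additional machinery):

```python
def linear_fill(times, values):
    """Fill None gaps by piecewise-linear interpolation between the
    nearest observed neighbours on either side of each gap."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None or right is None:
                continue  # leading/trailing gaps left unfilled
            frac = (times[i] - times[left]) / (times[right] - times[left])
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled
```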

  14. Evaluation of missing value methods for predicting ambient BTEX concentrations in two neighbouring cities in Southwestern Ontario Canada

    NASA Astrophysics Data System (ADS)

    Miller, Lindsay; Xu, Xiaohong; Wheeler, Amanda; Zhang, Tianchu; Hamadani, Mariam; Ejaz, Unam

    2018-05-01

    High density air monitoring campaigns provide spatial patterns of pollutant concentrations which are integral in exposure assessment. Such analysis can assist with the determination of links between air quality and health outcomes, however, problems due to missing data can threaten to compromise these studies. This research evaluates four methods; mean value imputation, inverse distance weighting (IDW), inter-species ratios, and regression, to address missing spatial concentration data ranging from one missing data point up to 50% missing data. BTEX (benzene, toluene, ethylbenzene, and xylenes) concentrations were measured in Windsor and Sarnia, Ontario in the fall of 2005. Concentrations and inter-species ratios were generally similar between the two cities. Benzene (B) was observed to be higher in Sarnia, whereas toluene (T) and the T/B ratios were higher in Windsor. Using these urban, industrialized cities as case studies, this research demonstrates that using inter-species ratios or regression of the data for which there is complete information, along with one measured concentration (i.e. benzene) to predict for missing concentrations (i.e. TEX) results in good agreement between predicted and measured values. In both cities, the general trend remains that best agreement is observed for the leave-one-out scenario, followed by 10% and 25% missing, and the least agreement for the 50% missing cases. In the absence of any known concentrations IDW can provide reasonable agreement between observed and estimated concentrations for the BTEX species, and was superior over mean value imputation which was not able to preserve the spatial trend. The proposed methods can be used to fill in missing data, while preserving the general characteristics and rank order of the data which are sufficient for epidemiologic studies.
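A minimal sketch of the IDW estimator evaluated in this study (illustrative; the coordinates, power parameter and handling of coincident sites are assumptions, not the authors' exact implementation):

```python
def idw(known, x0, y0, power=2):
    """Inverse-distance-weighted estimate at (x0, y0) from a list of
    (x, y, value) observations at monitoring sites."""
    num = den = 0.0
    for x, y, v in known:
        d2 = (x - x0) ** 2 + (y - y0) ** 2
        if d2 == 0:
            return v  # coincident site: return the observed value
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den
```

Unlike mean imputation, nearby sites dominate the weighted sum, which is why IDW preserves the spatial trend the abstract describes.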

  15. A comparison of model-based imputation methods for handling missing predictor values in a linear regression model: A simulation study

    NASA Astrophysics Data System (ADS)

    Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah

    2017-08-01

In regression analysis, missing covariate data is a common problem. Many researchers use ad hoc methods to overcome it due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation-maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. However, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts that can assist researchers in selecting appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated from an underlying multivariate normal distribution and the dependent variable was generated as a combination of the explanatory variables. Missing values in a covariate were simulated under a missing at random (MAR) mechanism. Four levels of missingness (10%, 20%, 30% and 40%) were imposed. The ML and MI techniques available within SAS software were investigated. A linear regression model was fitted and the performance measures, MSE and R-squared, were obtained. Results of the analysis showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE, when the percentage of missingness is less than 30%. Both methods are unable to handle more than a 30% level of missingness.
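A missing-at-random mechanism of the kind simulated here can be sketched as follows (illustrative; the dependence of the missingness probability on the observed covariate is an invented choice, not the study's exact design):

```python
import random

def impose_mar(data, rate, seed=0):
    """Set the second covariate to None with a probability that depends
    only on the always-observed first covariate - a missing-at-random
    (MAR) mechanism. `rate` is the target overall missing fraction."""
    rng = random.Random(seed)
    out = []
    for x1, x2 in data:
        # MAR: probability depends on the observed x1, never on x2 itself
        p = rate * (1.5 if x1 > 0 else 0.5)
        out.append((x1, None if rng.random() < min(p, 1.0) else x2))
    return out
```

Because the deletion probability depends only on observed values, methods that assume MAR (ML via EM, MI) remain valid, whereas a dependence on the unobserved `x2` itself would make the data missing not at random.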

  16. Spatial Copula Model for Imputing Traffic Flow Data from Remote Microwave Sensors.

    PubMed

    Ma, Xiaolei; Luan, Sen; Du, Bowen; Yu, Bin

    2017-09-21

Issues of missing data have become increasingly serious with the rapid increase in usage of traffic sensors. Analyses of the Beijing ring expressway have shown that up to 50% of microwave sensors have missing values. The imputation of missing traffic data urgently needs to be solved, although a precise solution cannot be easily achieved due to the significant number of missing portions. In this study, copula-based models are proposed for the spatial interpolation of traffic flow from remote traffic microwave sensors. Most existing interpolation methods rely only on covariance functions to depict spatial correlation and are unsuitable for coping with anomalies due to the Gaussian assumption. Copula theory overcomes this issue and provides a connection between the correlation function and the marginal distribution function of traffic flow. To validate the copula-based models, a comparison with three kriging methods is conducted. Results indicate that copula-based models outperform kriging methods, especially on roads with irregular traffic patterns. Copula-based models demonstrate significant potential to impute missing data in large-scale transportation networks.

  17. Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

    PubMed

    Chiu, Chia-Chun; Chan, Shih-Yao; Wang, Chung-Ching; Wu, Wei-Sheng

    2013-01-01

    Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. The studies about the performance comparison of different algorithms are still incomprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of a dataset but not on the species where the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. 
Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.

  18. Dental health state utility values associated with tooth loss in two contrasting cultures.

    PubMed

    Nassani, M Z; Locker, D; Elmesallati, A A; Devlin, H; Mohammadi, T M; Hajizamani, A; Kay, E J

    2009-08-01

The study aimed to assess the value placed on oral health states by measuring the utility of mouths in which teeth had been lost and to explore variations in utility values within and between two contrasting cultures, the UK and Iran. One hundred and fifty-eight patients, 84 from the UK and 74 from Iran, were recruited from clinics at University-based faculties of dentistry. All had experienced tooth loss and had restored or unrestored dental spaces. They were presented with 19 different scenarios of mouths with missing teeth. Fourteen involved the loss of one tooth and five involved shortened dental arches (SDAs) with varying numbers of missing posterior teeth. Each written description was accompanied by a verbal explanation and digital pictures of mouth models. Participants were asked to indicate on a standardized Visual Analogue Scale how they would value the health of their mouth if they had lost the tooth/teeth described and the resulting space was left unrestored. With a utility value of 0.0 representing the worst possible health state for a mouth and 1.0 representing the best, the mouth with the upper central incisor missing attracted the lowest utility value in both samples (UK = 0.16; Iran = 0.06), while the mouth with a missing upper second molar attracted the highest utility values (0.42 and 0.39, respectively). In both countries the utility value increased as the tooth in the scenario moved from the anterior towards the posterior aspect of the mouth. There were significant differences in utility values between the UK and Iranian samples for four scenarios, all involving the loss of anterior teeth. These differences remained after controlling for gender, age and the state of the dentition. With respect to the SDA scenarios, a mouth with a SDA with only the second molar teeth missing in all quadrants attracted the highest utility values, while a mouth with an extreme SDA with both molar and premolar teeth missing in all quadrants attracted the lowest utility values.
The study provided further evidence of the validity of the scaling approach to utility measurement in mouths with missing teeth. Some cross-cultural variations in values were observed but these should be viewed with due caution because the magnitude of the differences was small.

  19. Spatial Copula Model for Imputing Traffic Flow Data from Remote Microwave Sensors

    PubMed Central

    Ma, Xiaolei; Du, Bowen; Yu, Bin

    2017-01-01

Issues of missing data have become increasingly serious with the rapid increase in usage of traffic sensors. Analyses of the Beijing ring expressway have shown that up to 50% of microwave sensors have missing values. The imputation of missing traffic data urgently needs to be solved, although a precise solution cannot be easily achieved due to the significant number of missing portions. In this study, copula-based models are proposed for the spatial interpolation of traffic flow from remote traffic microwave sensors. Most existing interpolation methods rely only on covariance functions to depict spatial correlation and are unsuitable for coping with anomalies due to the Gaussian assumption. Copula theory overcomes this issue and provides a connection between the correlation function and the marginal distribution function of traffic flow. To validate the copula-based models, a comparison with three kriging methods is conducted. Results indicate that copula-based models outperform kriging methods, especially on roads with irregular traffic patterns. Copula-based models demonstrate significant potential to impute missing data in large-scale transportation networks. PMID:28934164

  20. Missing value imputation for microarray data: a comprehensive comparison study and a web tool

    PubMed Central

    2013-01-01

    Background Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. The studies about the performance comparison of different algorithms are still incomprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. Results In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of a dataset but not on the species where the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. Conclusions In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. 
Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses. PMID:24565220
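    The local-least-squares idea recommended in the conclusions can be sketched in a few lines (a minimal illustration, not the paper's or MissVIA's implementation; the function name and neighbour count are hypothetical):

```python
import numpy as np

def lls_impute(X, k=2):
    """Local-least-squares-style imputation sketch for a gene-expression
    matrix X (genes x samples): regress each incomplete gene on its k
    nearest complete genes over the observed columns, then predict the
    missing entries from that regression."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]           # candidate predictor genes
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])                        # observed columns of gene i
        # rank complete genes by Euclidean distance on the observed columns
        d = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        A = complete[np.argsort(d)[:k]]              # k nearest neighbour genes
        coef, *_ = np.linalg.lstsq(A[:, obs].T, X[i, obs], rcond=None)
        X[i, ~obs] = coef @ A[:, ~obs]               # predict the missing entries
    return X
```

Real implementations additionally tune k and handle genes with very few observed values; this sketch only shows the core regression step.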

  1. Potential value of health information exchange for people with epilepsy: crossover patterns and missing clinical data.

    PubMed

    Grinspan, Zachary M; Abramson, Erika L; Banerjee, Samprit; Kern, Lisa M; Kaushal, Rainu; Shapiro, Jason S

    2013-01-01

    For people with epilepsy, the potential value of health information exchange (HIE) is unknown. We reviewed two years of clinical encounters for 8055 people with epilepsy from seven Manhattan hospitals. We created network graphs illustrating crossover among these hospitals for multiple encounter types, and calculated a novel metric of care fragmentation: "encounters at risk for missing clinical data." Given two hospitals, a median of 109 [range 46 - 588] patients with epilepsy had visited both. Due to this crossover, recent, relevant clinical data may frequently be missing at the time of care (44.8% of ED encounters, 34.5% inpatient, 24.9% outpatient, and 23.2% radiology). Though a smaller percentage of outpatient encounters were at risk for missing data than ED encounters, the absolute number of outpatient encounters at risk was three times higher (14,579 vs. 5041). People with epilepsy may benefit from HIE. Future HIE initiatives should prioritize outpatient access.

  2. Assessment of BSRN radiation records for the computation of monthly means

    NASA Astrophysics Data System (ADS)

    Roesch, A.; Wild, M.; Ohmura, A.; Dutton, E. G.; Long, C. N.; Zhang, T.

    2011-02-01

    The integrity of the Baseline Surface Radiation Network (BSRN) radiation monthly averages is assessed by investigating the impact on monthly means of the frequency of data gaps caused by missing or discarded high time resolution data. The monthly statistics, especially means, are considered to be important and useful values for climate research, model performance evaluations and for assessing the quality of satellite (time- and space-averaged) data products. The study investigates the spread among different algorithms that have been applied for the computation of monthly means from 1-min values. The paper reveals that the computation of monthly means from 1-min observations distinctly depends on the method utilized to account for the missing data. The inter-method difference generally increases with an increasing fraction of missing data. We found that a substantial fraction of the radiation fluxes observed at BSRN sites is either missing or flagged as questionable. The percentage of missing data is 4.4%, 13.0%, and 6.5% for global radiation, direct shortwave radiation, and downwelling longwave radiation, respectively. Most flagged data in the shortwave are due to nighttime instrumental noise and can reasonably be set to zero after correcting for thermal offsets in the daytime data. The study demonstrates that the handling of flagged data clearly impacts monthly mean estimates obtained with different methods. We showed that the spread of monthly shortwave fluxes is generally clearly higher than that for downwelling longwave radiation. Overall, BSRN observations provide sufficient accuracy and completeness for reliable estimates of monthly mean values. However, the value of future data could be further increased by reducing the frequency of data gaps and the number of outliers.
It is shown that two independent methods for accounting for the diurnal and seasonal variations in the missing data permit consistent monthly means to within less than 1 W m-2 in most cases. The authors suggest using a standardized method for the computation of monthly means which addresses diurnal variations in the missing data in order to avoid a mismatch of future published monthly mean radiation fluxes from BSRN. The application of robust statistics would probably lead to less biased results for data records with frequent gaps and/or flagged data and outliers. The currently applied empirical methods should, therefore, be completed by the development of robust methods.
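    One simple way to account for the diurnal cycle when averaging over gappy records, as the abstract recommends, is to average within hour-of-day bins first and then average the bin means (an illustration of the general idea, not one of the BSRN methods verbatim):

```python
import numpy as np

def monthly_mean_diurnal(values, hours):
    """Monthly mean that accounts for the diurnal cycle: average the
    available values within each hour-of-day bin first, then average the
    24 bin means, so unevenly distributed gaps do not bias the result.
    `values` may contain NaN for missing or flagged records; `hours`
    gives each record's hour of day (0-23)."""
    values = np.asarray(values, float)
    hours = np.asarray(hours)
    bin_means = [np.nanmean(values[hours == h]) for h in range(24)]
    return np.mean(bin_means)
```

With strongly diurnal quantities such as shortwave radiation, a naive mean over the surviving records is biased whenever daytime data are missing more often than nighttime data; the binned mean is not.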

  3. Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity.

    PubMed

    Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki

    2018-04-01

    Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve analysis of accelerometer data.

  4. Replacing missing values using trustworthy data values from web data sources

    NASA Astrophysics Data System (ADS)

    Izham Jaya, M.; Sidi, Fatimah; Mat Yusof, Sharmila; Suriani Affendey, Lilly; Ishak, Iskandar; Jabar, Marzanah A.

    2017-09-01

    In practice, collected data are usually incomplete and contain missing values. Existing approaches to managing missing values overlook the importance of trustworthy data values in replacing missing values. Given that trusted, complete data are very important in data analysis, we propose a framework for missing value replacement using trustworthy data values from web data sources. The proposed framework adopts an ontology to map data values from web data sources to the incomplete dataset. As data from web sources conflict with each other, we propose a trust score measurement based on data accuracy and data reliability. The trust score is then used to select trustworthy data values from web data sources for missing value replacement. We successfully implemented the proposed framework using a financial dataset and present the findings in this paper. Our experiment shows that replacing missing values with trustworthy data values is important for solving the missing value problem, especially in cases of conflicting data.
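    The trust-score selection step can be sketched as follows (the weighted-sum score, the weights, and the field/candidate layout are hypothetical illustrations, not the paper's exact measurement):

```python
def trust_score(accuracy, reliability, w_acc=0.6, w_rel=0.4):
    """Illustrative trust score: a weighted combination of a source's
    data accuracy and data reliability, both assumed to lie in [0, 1]."""
    return w_acc * accuracy + w_rel * reliability

def replace_missing(record, candidates):
    """Fill each missing (None) field of `record` with the value offered
    by the most trustworthy web source among conflicting candidates.
    `candidates` maps field -> list of (value, accuracy, reliability)."""
    for field, offers in candidates.items():
        if record.get(field) is None:
            best = max(offers, key=lambda o: trust_score(o[1], o[2]))
            record[field] = best[0]
    return record
```

When two sources disagree, the framework thus prefers the conflicting value backed by the higher-scoring source instead of, say, an average of the two.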

  5. Upgraded automotive gas turbine engine design and development program, volume 2

    NASA Technical Reports Server (NTRS)

    Wagner, C. E. (Editor); Pampreen, R. C. (Editor)

    1979-01-01

    Results are presented for the design and development of an upgraded engine. The design incorporated technology advancements which resulted from development testing on the Baseline Engine. The final engine performance with all retrofitted components from the development program showed a value of 91 HP at design speed, in contrast to the design value of 104 HP. The design speed SFC was 0.53 versus the goal value of 0.44. The power shortfall was primarily due to missed efficiency targets for the small-size turbomachinery. Most of the SFC deficit was attributed to missed goals in the heat recovery system relative to regenerator effectiveness and expected values of heat loss. Vehicular fuel consumption, as measured on a chassis dynamometer for a vehicle inertia weight of 3500 lbs., was 15 MPG for combined urban and highway driving cycles. The baseline engine achieved 8 MPG with a 4500 lb. vehicle. Even though the goal of 18.3 MPG was not achieved with the upgraded engine, there was an improvement in fuel economy of 46% over the baseline engine for comparable vehicle inertia weight.

  6. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying the development of individuality in handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian networks (static Bayesian network, parameter EM, and structural EM), are compared on children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data, and useful conclusions are given. Specifically, the static Bayesian network is used for our data, which contain around 5% missing values, to provide adequate accuracy and low computational cost.

  7. Shrinkage regression-based methods for microarray missing value imputation.

    PubMed

    Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng

    2013-01-01

    Missing values commonly occur in microarray data, which usually contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods on many testing microarray datasets. To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
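    The regress-then-shrink step described above can be sketched for a single target gene (a minimal illustration under the assumption of a fixed scalar shrinkage factor; the paper estimates its shrinkage adjustment rather than fixing it):

```python
import numpy as np

def shrinkage_impute(target, predictors, shrink=0.9):
    """One-gene sketch of shrinkage regression imputation: fit least
    squares of the target gene on pre-selected similar genes over the
    observed samples, shrink the coefficients toward zero, then predict
    the missing entries with the shrunken coefficients."""
    target = np.asarray(target, float)
    P = np.asarray(predictors, float)            # similar genes (rows)
    obs = ~np.isnan(target)
    coef, *_ = np.linalg.lstsq(P[:, obs].T, target[obs], rcond=None)
    coef *= shrink                               # shrinkage adjustment
    out = target.copy()
    out[~obs] = coef @ P[:, ~obs]
    return out
```

In the full method the predictor genes would first be chosen by Pearson correlation with the target, and the shrinkage factor derived from the data.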

  8. Large Scale Crop Classification in Ukraine using Multi-temporal Landsat-8 Images with Missing Data

    NASA Astrophysics Data System (ADS)

    Kussul, N.; Skakun, S.; Shelestov, A.; Lavreniuk, M. S.

    2014-12-01

    At present, there are no globally available Earth observation (EO) derived products on crop maps. This issue is being addressed within the Sentinel-2 for Agriculture initiative, in which a number of test sites (including from JECAM) participate to provide coherent protocols and best practices for various global agriculture systems, and subsequently crop maps from Sentinel-2. One of the problems in dealing with optical images for large territories (more than 10,000 sq. km) is the presence of clouds and shadows, which results in missing values in the data sets. In this abstract, a new approach to the classification of multi-temporal optical satellite imagery with missing data due to clouds and shadows is proposed. First, self-organizing Kohonen maps (SOMs) are used to restore missing pixel values in a time series of satellite imagery. SOMs are trained for each spectral band separately using non-missing values. Missing values are restored through a special procedure that substitutes an input sample's missing components with the neuron's weight coefficients. After missing data restoration, a supervised classification is performed for the multi-temporal satellite images. For this, an ensemble of neural networks, in particular multilayer perceptrons (MLPs), is proposed. Ensembling of neural networks is done by the average committee technique, i.e. the average class probability is calculated over the classifiers and the class with the highest average posterior probability is selected for the given input sample. The proposed approach is applied to large scale crop classification using multi-temporal Landsat-8 images for the JECAM test site in Ukraine [1-2]. It is shown that the ensemble of MLPs provides better performance than a single neural network in terms of overall classification accuracy and kappa coefficient. The obtained classification map is also validated through estimated crop and forest areas and comparison to official statistics. 1. A.Yu. Shelestov et al., "Geospatial information system for agricultural monitoring," Cybernetics Syst. Anal., vol. 49, no. 1, pp. 124-132, 2013. 2. J. Gallego et al., "Efficiency Assessment of Different Approaches to Crop Classification Based on Satellite and Ground Observations," J. Autom. Inform. Sci., vol. 44, no. 5, pp. 67-80, 2012.
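    The average-committee rule used for ensembling the MLPs is straightforward to state in code (a direct sketch of the rule described in the abstract; the classifiers themselves are assumed to already output class-probability vectors):

```python
import numpy as np

def average_committee(prob_list):
    """Average-committee ensembling: average the class-probability
    vectors produced by several classifiers for one input sample and
    return the class with the highest mean posterior probability,
    together with the averaged probabilities."""
    mean_p = np.mean(prob_list, axis=0)
    return int(np.argmax(mean_p)), mean_p
```

For example, three member networks voting [0.6, 0.4], [0.2, 0.8] and [0.4, 0.6] over two crop classes yield an averaged posterior of [0.4, 0.6] and thus the second class.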

  9. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of total knee arthroplasty (TKA) with missing values, among repeated measures ANOVA, generalized estimating equations (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportions of missing data, sample sizes and missing-data generation mechanisms. Each data set was analyzed with the three statistical methods. The influence of missing data was greater with a higher proportion of missing data and a smaller sample size. MMRM tended to show the least change in the statistics. When missing values were generated by a 'missing not at random' mechanism, no statistical method could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    PubMed

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both types of data series, at the same order of computational complexity.
A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. The CMVE software is available upon request from the authors.

  11. Effects of correcting missing daily feed intake values on the genetic parameters and estimated breeding values for feeding traits in pigs.

    PubMed

    Ito, Tetsuya; Fukawa, Kazuo; Kamikawa, Mai; Nikaidou, Satoshi; Taniguchi, Masaaki; Arakawa, Aisaku; Tanaka, Genki; Mikawa, Satoshi; Furukawa, Tsutomu; Hirose, Kensuke

    2018-01-01

    Daily feed intake (DFI) is an important consideration for improving feed efficiency, but measurements using electronic feeder systems contain many missing and incorrect values. Therefore, we evaluated three methods for correcting missing DFI data (quadratic, orthogonal polynomial, and locally weighted (Loess) regression equations) and assessed the effects of these missing values on the genetic parameters and the estimated breeding values (EBV) for feeding traits. DFI records were obtained from 1622 Duroc pigs, comprising 902 individuals without missing DFI and 720 individuals with missing DFI. Of the three equations, the Loess equation was the most suitable method for correcting the missing DFI values in datasets with 5-50% of values randomly deleted. Both the variance components and the heritability for average DFI (ADFI) were unaffected by the proportion of missing DFI and the Loess correction. In terms of rank correlation and information criteria, the Loess correction improved the accuracy of EBV for ADFI compared to the randomly deleted cases. These findings indicate that the Loess equation is useful for correcting missing DFI values for individual pigs and that the correction of missing DFI values could be effective for the estimation of breeding values and genetic improvement using EBV for feeding traits. © 2017 The Authors. Animal Science Journal published by John Wiley & Sons Australia, Ltd on behalf of Japanese Society of Animal Science.
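    A Loess-style correction of missing daily values can be sketched with a tiny locally weighted linear fit (an illustration of the technique, assuming tricube weights and a nearest-fraction window; not the paper's exact equation or smoothing parameters):

```python
import numpy as np

def loess_fill(days, intake, frac=0.5):
    """Fill missing daily feed intake (NaN) by a Loess-style local fit:
    for each missing day, fit a tricube-weighted linear regression over
    the nearest `frac` of observed days and evaluate it at that day."""
    days, intake = np.asarray(days, float), np.asarray(intake, float)
    obs = ~np.isnan(intake)
    xo, yo = days[obs], intake[obs]
    k = max(2, int(frac * len(xo)))
    out = intake.copy()
    for i in np.where(~obs)[0]:
        d = np.abs(xo - days[i])
        idx = np.argsort(d)[:k]                              # nearest observed days
        w = np.sqrt((1 - (d[idx] / d[idx].max()) ** 3) ** 3)  # tricube weights
        A = np.vstack([np.ones(k), xo[idx]]).T               # local linear model
        coef, *_ = np.linalg.lstsq(A * w[:, None], yo[idx] * w, rcond=None)
        out[i] = coef[0] + coef[1] * days[i]
    return out
```

Observed days are left untouched; only the NaN entries are replaced by the locally fitted values.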

  12. Planning a Study for Testing the Rasch Model given Missing Values due to the use of Test-booklets.

    PubMed

    Yanagida, Takuya; Kubinger, Klaus D; Rasch, Dieter

    2015-01-01

    Though calibration of an achievement test within a psychological and educational context is very often carried out using the Rasch model, data sampling is hardly ever designed according to statistical foundations. However, Kubinger, Rasch, and Yanagida (2009, 2011) suggested an approach for determining sample size according to given Type-I and Type-II risks and a certain effect of model contradiction when testing the Rasch model. The approach uses a three-way analysis of variance design with mixed classification. So far, their simulation studies have dealt with complete data, meaning every examinee is administered all of the items of an item pool. The simulation study presented in this paper deals with the practically relevant case, in particular for large-scale assessments, in which item presentation uses several test-booklets. As a consequence, there are missing values by design. Therefore, the question to be considered is whether this approach works in this case as well. Besides the fact that the data are not normally distributed but dichotomous (an examinee either solves an item or fails to solve it), only a single entry for each cell exists in the given three-way analysis of variance design, if at all, due to missing values. Hence, the obligatory test statistic's distribution may not be retained, in contrast to the case of having no missing values. The result of our simulation study, despite applying only to a very special scenario, is that this approach does work: whether test-booklets are used or every examinee is administered all of the items changes nothing with respect to the actual Type-I risk or the power of the test, given almost the same amount of information from examinees per item. However, as the results are limited to a special scenario, we currently recommend that interested researchers simulate the appropriate scenario in advance themselves.

  13. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets.

    PubMed

    Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to identify the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results for the mixed data type of medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.

  14. Missing value imputation strategies for metabolomics data.

    PubMed

    Armitage, Emily Grace; Godzien, Joanna; Alonso-Herranz, Vanesa; López-Gonzálvez, Ángeles; Barbas, Coral

    2015-12-01

    Missing values can arise for different reasons, and depending on their origin they should be considered and dealt with in different ways. In this research, four imputation methods were compared with respect to their effects on the normality and variance of the data, on statistical significance, and on the approximation of a suitable threshold for accepting missing data as truly missing. Additionally, we evaluated different strategies for controlling the familywise error rate or false discovery rate and how they interact with the different strategies for missing value imputation. Missing values were found to affect the normality and variance of the data, and k-means nearest neighbour imputation was the best method tested for restoring these. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that as little as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a "gray area", and a strategy has therefore been proposed that provides a balance between the optimal imputation strategy, k-means nearest neighbour, and the best approximation for positioning real zeros. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.

    PubMed

    Wei, Runmin; Wang, Jingye; Su, Mingming; Jia, Erik; Chen, Shaoqiu; Chen, Tianlu; Ni, Yan

    2018-01-12

    Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the selection can significantly affect subsequent data analyses. Typically, there are three types of missing values: missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC)) for different types of missing values using four metabolomics datasets. Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate imputation accuracy. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes analysis were used to evaluate the overall sample distribution. Student's t-test followed by correlation analysis was conducted to evaluate the effects on univariate statistics. Our findings demonstrated that RF performed best for MCAR/MAR and QRILC was favored for left-censored MNAR. Finally, we proposed a comprehensive strategy and developed a publicly accessible web-tool for the application of missing value imputation in metabolomics ( https://metabolomics.cc.hawaii.edu/software/MetImp/ ).
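    The two evaluation measures named above are simple to compute; a minimal sketch (normalizing the RMSE by the standard deviation of the true values, one common convention, and ranking methods per dataset before summing):

```python
import numpy as np

def nrmse(true_vals, imputed_vals):
    """Normalized root mean squared error between true and imputed
    entries, normalized by the standard deviation of the true values."""
    t, p = np.asarray(true_vals, float), np.asarray(imputed_vals, float)
    return np.sqrt(np.mean((t - p) ** 2)) / np.std(t)

def sum_of_ranks(nrmse_by_method):
    """NRMSE-based sum of ranks (SOR): rank the methods within each
    dataset (1 = lowest NRMSE), then sum the ranks across datasets;
    lower totals indicate better overall methods."""
    methods = list(nrmse_by_method)
    totals = {m: 0 for m in methods}
    n_datasets = len(next(iter(nrmse_by_method.values())))
    for j in range(n_datasets):
        order = sorted(methods, key=lambda m: nrmse_by_method[m][j])
        for rank, m in enumerate(order, start=1):
            totals[m] += rank
    return totals
```

SOR aggregates per-dataset performance without letting one easy dataset dominate the comparison the way averaged raw NRMSE values can.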

  16. The Effects of Methods of Imputation for Missing Values on the Validity and Reliability of Scales

    ERIC Educational Resources Information Center

    Cokluk, Omay; Kayri, Murat

    2011-01-01

    The main aim of this study is the comparative examination of the factor structures, corrected item-total correlations, and Cronbach-alpha internal consistency coefficients obtained by different methods used in imputation for missing values in conditions of not having missing values, and having missing values of different rates in terms of testing…

  17. Handling missing values in the MDS-UPDRS.

    PubMed

    Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T

    2015-10-01

    This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
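    The surrogate scoring implied by these thresholds can be sketched as mean-scaling of the observed items (an assumption about the proration form; the per-part thresholds themselves come from the CCC analysis described above and are passed in as `max_missing`):

```python
def prorated_score(item_scores, max_missing):
    """Prorate a rating-scale part score when a few items are missing:
    scale the sum of observed items up to the full item count, provided
    no more than `max_missing` items are absent (None); otherwise the
    rating is rejected as too incomplete (returns None)."""
    n_items = len(item_scores)
    observed = [s for s in item_scores if s is not None]
    n_missing = n_items - len(observed)
    if n_missing > max_missing:
        return None                      # below threshold: reject rating
    return sum(observed) * n_items / len(observed)
```

With a hypothetical four-item part and one allowed missing item, scores [2, 3, None, 1] prorate to 6 * 4 / 3 = 8.0, while two missing items would cause rejection.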

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zainudin, Mohd Lutfi, E-mail: mdlutfi07@gmail.com; Institut Matematik Kejuruteraan; Saaban, Azizan, E-mail: azizan.s@uum.edu.my

    Solar radiation values are recorded by automatic weather stations using a device called a pyranometer. The device records the dispersed radiation values, and these data are very useful for experimental work and solar device development. In addition, complete data records are needed for modeling and designing solar radiation system applications. Unfortunately, complete solar radiation data are frequently unavailable due to several technical problems, mainly attributable to the monitoring device. To counter this, missing values are estimated in an effort to substitute absent values with imputed data. This paper aims to evaluate several piecewise interpolation techniques, such as linear, spline, cubic, and nearest neighbor, for dealing with missing values in hourly solar radiation data. It then proposes an extension of this work investigating the potential use of the cubic Bezier technique and the cubic Said-Ball method as estimation tools. The results show that the cubic Bezier and Said-Ball methods perform best compared with the other piecewise imputation techniques.
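    The simplest of the compared techniques, piecewise linear interpolation over the observed hours, can be sketched directly (cubic, spline, and nearest-neighbor variants follow the same fill-the-NaN pattern with a different interpolant):

```python
import numpy as np

def interp_missing(hours, radiation):
    """Fill NaN gaps in an hourly solar-radiation series by piecewise
    linear interpolation between the surrounding observed points."""
    h = np.asarray(hours, float)
    r = np.asarray(radiation, float)
    obs = ~np.isnan(r)
    filled = r.copy()
    filled[~obs] = np.interp(h[~obs], h[obs], r[obs])
    return filled
```

For instance, a gap between observations of 10 and 30 at the adjacent hours is filled with 20.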

  19. Autoregressive-model-based missing value estimation for DNA microarray time series data.

    PubMed

    Choong, Miew Keen; Charbit, Maurice; Yan, Hong

    2009-01-01

    Missing value estimation is important in DNA microarray data analysis. A number of algorithms have been developed to solve this problem, but they have several limitations. Most existing algorithms are not able to deal with the situation where a particular time point (column) of the data is missing entirely. In this paper, we present an autoregressive-model-based missing value estimation method (ARLSimpute) that takes into account the dynamic properties of microarray temporal data and the local similarity structures in the data. ARLSimpute is especially effective for the situation where a particular time point contains many missing values or where the entire time point is missing. Experimental results suggest that our proposed algorithm is an accurate missing value estimator in comparison with other imputation methods on simulated as well as real microarray time series datasets.

  20. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts.

    PubMed

    Jakobsen, Janus Christian; Gluud, Christian; Wetterslev, Jørn; Winkel, Per

    2017-12-06

    Missing data may seriously compromise inferences from randomised clinical trials, especially if missing data are not handled appropriately. The potential bias due to missing data depends on the mechanism causing the data to be missing, and the analytical methods applied to amend the missingness. Therefore, the analysis of trial data with missing values requires careful planning and attention. The authors had several meetings and discussions considering optimal ways of handling missing data to minimise the bias potential. We also searched PubMed (key words: missing data; randomi*; statistical analysis) and reference lists of known studies for papers (theoretical papers; empirical studies; simulation studies; etc.) on how to deal with missing data when analysing randomised clinical trials. Handling missing data is an important, yet difficult and complex task when analysing the results of randomised clinical trials. We consider how to optimise the handling of missing data during the planning stage of a randomised clinical trial and recommend analytical approaches which may prevent bias caused by unavoidable missing data. We consider the strengths and limitations of using best-worst and worst-best sensitivity analyses, multiple imputation, and full information maximum likelihood. We also present practical flowcharts on how to deal with missing data and an overview of the steps that always need to be considered during the analysis stage of a trial. We present a practical guide and flowcharts describing when and how multiple imputation should be used to handle missing data in randomised clinical trials.

  1. Identifying the Relationships between Motivational Features of High and Low Performing Students and Science Literacy Achievement in PISA 2015 Turkey

    ERIC Educational Resources Information Center

    Kartal, Seval Kula; Kutlu, Ömer

    2017-01-01

    In this study, the predictive roles of intrinsic and instrumental motivation and science self-efficacy on success in the lower and upper quartiles of the score distribution are analyzed in the scientific domain of the PISA 2015 Turkey sample. Since their index values cannot be calculated due to missing values, some students are excluded from the sample and the…

  2. Missing data within a quantitative research study: How to assess it, treat it, and why you should care.

    PubMed

    Bannon, William

    2015-04-01

    Missing data typically refer to the absence of one or more values within a study variable(s) contained in a dataset. Missing values are often the result of a study participant choosing not to provide a response to a survey item. In general, a greater number of missing values within a dataset reflects a greater challenge to the data analyst. However, if researchers are armed with just a few basic tools, they can quite effectively diagnose how serious the issue of missing data is within a dataset, as well as prescribe the most appropriate solution. Specifically, the keys to effectively assessing and treating missing data values within a dataset involve specifying how missing data will be defined in a study, assessing the amount of missing data, identifying the pattern of the missing data, and selecting the best way to treat the missing data values. I will touch on each of these processes and provide a brief illustration of how the validity of study findings is at great risk if missing data values are not treated effectively. ©2015 American Association of Nurse Practitioners.
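
    Two of the steps described above, assessing the amount of missing data and identifying its pattern, need only a few lines of code. A Python sketch on a hypothetical dataset (all records invented):

```python
from collections import Counter

# Hypothetical survey records: None marks a missing response.
rows = [
    {"age": 34, "income": 52000, "score": 7},
    {"age": None, "income": 48000, "score": 6},
    {"age": 29, "income": None, "score": None},
    {"age": 41, "income": None, "score": 8},
]

def missing_fraction(rows, var):
    """Share of records in which `var` is missing."""
    return sum(r[var] is None for r in rows) / len(rows)

def missing_patterns(rows):
    """Tally each distinct set of missing variables per record."""
    return Counter(
        tuple(sorted(v for v in r if r[v] is None)) for r in rows
    )

fractions = {v: missing_fraction(rows, v) for v in rows[0]}
patterns = missing_patterns(rows)
```

    The fractions flag which variables are most affected, while the pattern counts show whether missingness is concentrated in a few records or spread across many.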

  3. How to improve breeding value prediction for feed conversion ratio in the case of incomplete longitudinal body weights.

    PubMed

    Tran, V H Huynh; Gilbert, H; David, I

    2017-01-01

    With the development of automatic self-feeders, repeated measurements of feed intake are becoming easier in an increasing number of species. However, the corresponding BW are not always recorded, and these missing values complicate the longitudinal analysis of the feed conversion ratio (FCR). Our aim was to evaluate the impact of missing BW data on estimations of the genetic parameters of FCR and ways to improve the estimations. On the basis of the missing BW profile in French Large White pigs (male pigs weighed weekly, females and castrated males weighed monthly), we compared 2 different ways of predicting missing BW, 1 using a Gompertz model and 1 using a linear interpolation. For the first part of the study, we used 17,398 weekly records of BW and feed intake recorded over 16 consecutive weeks in 1,222 growing male pigs. We performed a simulation study on this data set to mimic missing BW values according to the pattern of weekly proportions of incomplete BW data in females and castrated males. The FCR was then computed for each week using observed data (obser_FCR), data with missing BW (miss_FCR), data with BW predicted using a Gompertz model (Gomp_FCR), and data with BW predicted by linear interpolation (interp_FCR). Heritability (h) was estimated, and the EBV was predicted for each repeated FCR using a random regression model. In the second part of the study, the full data set (males with their complete BW records, castrated males and females with missing BW) was analyzed using the same methods (miss_FCR, Gomp_FCR, and interp_FCR). Results of the simulation study showed that h were overestimated in the case of missing BW and that predicting BW using a linear interpolation provided a more accurate estimation of h and of EBV than a Gompertz model. Over 100 simulations, the correlation between obser_EBV and interp_EBV, Gomp_EBV, and miss_EBV was 0.93 ± 0.02, 0.91 ± 0.01, and 0.79 ± 0.04, respectively. 
The heritabilities obtained with the full data set were quite similar for miss_FCR, Gomp_FCR, and interp_FCR. In conclusion, when the proportion of missing BW is high, genetic parameters of FCR are not well estimated. In French Large White pigs, in the growing period extending from d 65 to 168, prediction of missing BW using a Gompertz growth model slightly improved the estimations, but the linear interpolation improved the estimation to a greater extent. This result is due to the linear rather than sigmoidal increase in BW over the study period.
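
    The linear interpolation that performed best above can be sketched as follows; the body weights are invented toy values, not the French Large White records:

```python
def interpolate_missing(series):
    """Fill internal None gaps by linear interpolation between the
    nearest observed values; leading/trailing gaps are left as None."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out

# Weekly body weights (kg) with monthly-style gaps (invented values).
bw_series = [30.0, None, None, None, 45.0, None, None, None, 62.0]
filled = interpolate_missing(bw_series)
```

    Because growth over the studied period is close to linear, straight-line prediction between monthly weighings recovers the weekly weights well, which matches the study's conclusion.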

  4. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    PubMed

    Voillet, Valentin; Besse, Philippe; Liaubet, Laurence; San Cristobal, Magali; González, Ignacio

    2016-10-03

    In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches i.e., regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data and a comprehensive graphical comparison showing how the MI-, RI- or MVI-MFA configurations diverge from the true configuration was produced. Two approaches i.e., confidence ellipses and convex hulls, to visualize and assess the uncertainty due to missing values were also described. We showed how the areas of ellipses and convex hulls increased with the number of missing individuals. A free and easy-to-use code was proposed to implement the MI-MFA method in the R statistical environment. We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. 
MI-MFA configurations were close to the true configuration even when many individuals were missing in several data tables. This method takes into account the uncertainty of MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated.

  5. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    PubMed

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.
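
    A deliberately crude stand-in for multiple imputation of below-detection values: draw each censored entry uniformly between zero and the detection limit, producing M completed datasets. The paper's actual models are far more sophisticated; all numbers here are hypothetical:

```python
import random

def impute_censored(values, lod, m=5, seed=0):
    """Return m completed copies of `values`; entries below the limit
    of detection (recorded as None) are drawn uniformly on (0, lod)."""
    rng = random.Random(seed)
    return [
        [v if v is not None else rng.uniform(0.0, lod) for v in values]
        for _ in range(m)
    ]

# Weekly concentrations with two below-detection weeks; LOD = 0.5 (toy).
concs = [0.8, None, 1.3, None, 2.1]
datasets = impute_censored(concs, lod=0.5)
# Rubin's rules would pool estimates computed on each completed dataset.
means = [sum(d) / len(d) for d in datasets]
```

    The point of creating several completed datasets rather than one is that the spread of the per-dataset estimates carries the extra uncertainty due to the censoring.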

  6. Allowing for uncertainty due to missing continuous outcome data in pairwise and network meta-analysis.

    PubMed

    Mavridis, Dimitris; White, Ian R; Higgins, Julian P T; Cipriani, Andrea; Salanti, Georgia

    2015-02-28

    Missing outcome data are commonly encountered in randomized controlled trials and hence may need to be addressed in a meta-analysis of multiple trials. A common and simple approach to deal with missing data is to restrict analysis to individuals for whom the outcome was obtained (complete case analysis). However, estimated treatment effects from complete case analyses are potentially biased if informative missing data are ignored. We develop methods for estimating meta-analytic summary treatment effects for continuous outcomes in the presence of missing data for some of the individuals within the trials. We build on a method previously developed for binary outcomes, which quantifies the degree of departure from a missing at random assumption via the informative missingness odds ratio. Our new model quantifies the degree of departure from missing at random using either an informative missingness difference of means or an informative missingness ratio of means, both of which relate the mean value of the missing outcome data to that of the observed data. We propose estimating the treatment effects, adjusted for informative missingness, and their standard errors by a Taylor series approximation and by a Monte Carlo method. We apply the methodology to examples of both pairwise and network meta-analysis with multi-arm trials. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
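
    The informative missingness difference of means (IMDoM) relates the mean of the missing outcomes to the mean of the observed ones; the arm mean then becomes the observed mean plus the missing fraction times the IMDoM. A toy sketch of that adjustment (all counts and means invented, and the Taylor-series/Monte Carlo standard errors are omitted):

```python
def adjusted_mean(obs_mean, n_obs, n_miss, imdom):
    """Arm mean when missing outcomes are assumed to average
    obs_mean + imdom (imdom = 0 recovers missing at random)."""
    p_miss = n_miss / (n_obs + n_miss)
    return obs_mean + p_miss * imdom

def adjusted_effect(t, c, imdom_t, imdom_c):
    """Adjusted mean difference; t and c are (obs_mean, n_obs, n_miss)."""
    return adjusted_mean(*t, imdom_t) - adjusted_mean(*c, imdom_c)

treat = (12.0, 80, 20)   # observed mean, observed n, missing n (invented)
ctrl = (10.0, 90, 10)
effect_mar = adjusted_effect(treat, ctrl, 0.0, 0.0)
# Pessimistic: treatment dropouts averaged 2 points below completers.
effect_mnar = adjusted_effect(treat, ctrl, -2.0, 0.0)
```

    Varying the IMDoM over a plausible range shows how far the estimated effect can drift from its missing-at-random value.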

  7. Estimating missing daily temperature extremes in Jaffna, Sri Lanka

    NASA Astrophysics Data System (ADS)

    Thevakaran, A.; Sonnadara, D. U. J.

    2018-04-01

    The accuracy of reconstructing missing daily temperature extremes in the Jaffna climatological station, situated in the northern part of the dry zone of Sri Lanka, is presented. The adopted method utilizes standard departures of daily maximum and minimum temperature values at four neighbouring stations, Mannar, Anuradhapura, Puttalam and Trincomalee to estimate the standard departures of daily maximum and minimum temperatures at the target station, Jaffna. The daily maximum and minimum temperatures from 1966 to 1980 (15 years) were used to test the validity of the method. The accuracy of the estimation is higher for daily maximum temperature compared to daily minimum temperature. About 95% of the estimated daily maximum temperatures are within ±1.5 °C of the observed values. For daily minimum temperature, the figure is about 92%. By calculating the standard deviation of the difference between estimated and observed values, we have shown that the error in estimating the daily maximum and minimum temperatures is ±0.7 and ±0.9 °C, respectively. To obtain the best accuracy when estimating the missing daily temperature extremes, it is important to include Mannar, which is the nearest station to the target station, Jaffna. We conclude from the analysis that the method can be applied successfully to reconstruct the missing daily temperature extremes in Jaffna, where no data are available due to frequent disruptions caused by civil unrest and hostilities in the region during the period 1984 to 2000.
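
    The standard-departure method can be sketched as: standardize each neighbour's reading against its own climatology, average the departures, and map the average back through the target station's mean and standard deviation. The climatological values below are invented, not the Sri Lankan station statistics:

```python
def estimate_from_neighbours(readings, neighbour_stats, target_stats):
    """Average the neighbours' standard departures and rescale by the
    target station's climatological mean and standard deviation."""
    departures = [(x - m) / s for x, (m, s) in zip(readings, neighbour_stats)]
    z = sum(departures) / len(departures)
    t_mean, t_sd = target_stats
    return t_mean + z * t_sd

# Invented climatology (mean, sd of daily maximum, deg C) for four
# neighbouring stations and the target station.
neighbour_stats = [(33.0, 1.5), (34.0, 2.0), (32.5, 1.8), (31.0, 1.2)]
today = [34.5, 36.0, 34.3, 32.2]   # each neighbour sits about 1 sd above normal
estimate = estimate_from_neighbours(today, neighbour_stats, (31.5, 1.4))
```

    Working in standard departures rather than raw temperatures removes each station's own climatological offset, which is why distant stations can still be informative.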

  8. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
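
    The KSC algorithm itself is not reproduced here, but the contrast with imputation can be illustrated with the common partial-distance trick, which compares items only on jointly observed features instead of filling in the gaps (toy vectors):

```python
def partial_distance(a, b):
    """Squared Euclidean distance over jointly observed features,
    rescaled to the full dimensionality (None marks a missing value)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("inf")
    return sum((x - y) ** 2 for x, y in pairs) * len(a) / len(pairs)

p = [1.0, None, 3.0]
q = [2.0, 5.0, None]
r = [1.1, 4.9, 3.2]
d_pr = partial_distance(p, r)   # two shared features
d_pq = partial_distance(p, q)   # only one shared feature
```

    Any distance-based clustering algorithm can run on such distances without inventing values, though unlike KSC's soft constraints this treats the rescaled distances as fully reliable.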

  9. Integrative missing value estimation for microarray data.

    PubMed

    Hu, Jianjun; Li, Haifeng; Waterman, Michael S; Zhou, Xianghong Jasmine

    2006-10-12

    Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples. We present the integrative Missing Value Estimation method (iMISS) by incorporating information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking reference data sets into consideration. To determine whether the given reference data sets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Square (LLS) imputation algorithm by up to 15% in our benchmark tests. We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over the state-of-the-art missing value estimation approaches such as LLS and is especially good for imputing microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.

  10. Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data

    NASA Astrophysics Data System (ADS)

    Yin, Shen; Wang, Guang; Yang, Xu

    2014-07-01

    In practical industrial applications, the key performance indicator (KPI)-related prediction and diagnosis are quite important for the product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial process. As PLS is totally based on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS to deal with outliers and missing values, simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of the KPI-related prediction and diagnosis on an industrial benchmark of Tennessee Eastman process.

  11. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  12. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  13. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  14. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  15. 40 CFR 98.35 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. Whenever a quality-assured value of a required parameter is... substitute data value for the missing parameter shall be used in the calculations. (a) For all units subject...

  16. Missing data exploration: highlighting graphical presentation of missing pattern.

    PubMed

    Zhang, Zhongheng

    2015-12-01

    Functions shipped with base R can fulfill many missing data handling tasks. However, because the data volume of electronic medical record (EMR) systems is typically very large, more sophisticated methods may be helpful in data management. The article focuses on missing data handling by using advanced techniques. There are three types of missing data, that is, missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore missing data pattern. In particular, the VIM package is especially helpful in visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missing data on other variables. Such information is useful in subsequent imputations.
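
    The dependence of missingness on other variables, mentioned at the end of the abstract, can be checked without any package: compare the mean of a covariate between records where a variable is missing and where it is observed. A Python sketch with invented patient records (MICE and VIM are R packages; this is not their API):

```python
def missingness_shift(rows, var, other):
    """Mean of `other` among records where `var` is missing, minus its
    mean where `var` is observed; a large shift suggests the data are
    not missing completely at random."""
    miss = [r[other] for r in rows if r[var] is None and r[other] is not None]
    obs = [r[other] for r in rows if r[var] is not None and r[other] is not None]
    return sum(miss) / len(miss) - sum(obs) / len(obs)

# Invented records: older patients tend to skip the income question.
patients = [
    {"age": 30, "income": 40000},
    {"age": 65, "income": None},
    {"age": 70, "income": None},
    {"age": 35, "income": 50000},
]
shift = missingness_shift(patients, "income", "age")
```

    A shift this large relative to the spread of age suggests income is at best missing at random given age, so age should enter the imputation model.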

  17. Counting missing values in a metabolite-intensity data set for measuring the analytical performance of a metabolomics platform.

    PubMed

    Huan, Tao; Li, Liang

    2015-01-20

    Metabolomics requires quantitative comparison of individual metabolites present in an entire sample set. Unfortunately, missing intensity values in one or more samples are very common. Because missing values can have a profound influence on metabolomic results, the extent of missing values found in a metabolomic data set should be treated as an important parameter for measuring the analytical performance of a technique. In this work, we report a study on the scope of missing values and a robust method of filling the missing values in a chemical isotope labeling (CIL) LC-MS metabolomics platform. Unlike conventional LC-MS, CIL LC-MS quantifies the concentration differences of individual metabolites in two comparative samples based on the mass spectral peak intensity ratio of a peak pair from a mixture of differentially labeled samples. We show that this peak-pair feature can be explored as a unique means of extracting metabolite intensity information from raw mass spectra. In our approach, a peak-pair picking algorithm, IsoMS, is initially used to process the LC-MS data set to generate a CSV file or table that contains metabolite ID and peak ratio information (i.e., metabolite-intensity table). A zero-fill program, freely available from MyCompoundID.org, is developed to automatically find a missing value in the CSV file and go back to the raw LC-MS data to find the peak pair and, then, calculate the intensity ratio and enter the ratio value into the table. Most of the missing values are found to be low abundance peak pairs. We demonstrate the performance of this method in analyzing an experimental and technical replicate data set of human urine metabolome. Furthermore, we propose a standardized approach of counting missing values in a replicate data set as a way of gauging the extent of missing values in a metabolomics platform. 
Finally, we illustrate that applying the zero-fill program, in conjunction with dansylation CIL LC-MS, can lead to a marked improvement in finding significant metabolites that differentiate bladder cancer patients and their controls in a metabolomics study of 109 subjects.

  18. A Review On Missing Value Estimation Using Imputation Algorithm

    NASA Astrophysics Data System (ADS)

    Armina, Roslan; Zain, Azlan Mohd; Azizah Ali, Nor; Sallehuddin, Roselina

    2017-09-01

    The presence of missing values in a data set has always been a major obstacle to precise prediction. Methods for imputing missing values need to minimize the effect of incomplete data sets on the prediction model, and many algorithms have been proposed to counter the missing value problem. In this review, we provide a comprehensive analysis of existing imputation algorithms, focusing on the technique used and on whether global or local information in the data set is exploited for missing value estimation. In addition, validation methods for imputation results and ways to measure the performance of imputation algorithms are described. The objective of this review is to highlight possible improvements to existing methods, and it is hoped that it gives readers a better understanding of trends in imputation methods.

  19. Missing Value Monitoring Enhances the Robustness in Proteomics Quantitation.

    PubMed

    Matafora, Vittoria; Corno, Andrea; Ciliberto, Andrea; Bachi, Angela

    2017-04-07

    In global proteomic analysis, it is estimated that proteins span from millions to fewer than 100 copies per cell. The challenge of protein quantitation by classic shotgun proteomic techniques stems from missing values for peptides belonging to low-abundance proteins, which lower intra-run reproducibility and affect downstream statistical analysis. Here, we present a new analytical workflow, MvM (missing value monitoring), able to recover quantitation of missing values generated by shotgun analysis. In particular, we used confident data-dependent acquisition (DDA) quantitation only for proteins measured in all the runs, while we filled the missing values with data-independent acquisition analysis using the library previously generated in DDA. We analyzed cell cycle regulated proteins, as they are low abundance proteins with highly dynamic expression levels. Indeed, we found that cell cycle related proteins are the major components of the missing values-rich proteome. Using the MvM workflow, we doubled the number of robustly quantified cell cycle related proteins, and we reduced the number of missing values, achieving robust quantitation for proteins over ∼50 molecules per cell. MvM allows lower quantification variance among replicates for low abundance proteins with respect to DDA analysis, which demonstrates the potential of this novel workflow to measure low abundance, dynamically regulated proteins.

  20. Performance study of the application of Artificial Neural Networks to the completion and prediction of data retrieved by underwater sensors.

    PubMed

    Baladrón, Carlos; Aguiar, Javier M; Calavia, Lorena; Carro, Belén; Sánchez-Esguevillas, Antonio; Hernández, Luis

    2012-01-01

    This paper presents a proposal for an Artificial Neural Network (ANN)-based architecture for completion and prediction of data retrieved by underwater sensors. Due to the specific conditions under which these sensors operate, it is not uncommon for them to fail, and maintenance operations are difficult and costly. Therefore, completion and prediction of the missing data can greatly improve the quality of the underwater datasets. A performance study using real data is presented to validate the approach, concluding that the proposed architecture achieves very low errors. The results also show that the solution is especially suitable for cases where large portions of data are missing, while in situations where the missing values are isolated the improvement over other simple interpolation methods is limited.

  1. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  2. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  3. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  4. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  5. 40 CFR 98.285 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.283(b), a complete record of all...) For each missing value of the monthly carbon content of petroleum coke, the substitute data value...

  6. Missing data exploration: highlighting graphical presentation of missing pattern

    PubMed Central

    2015-01-01

    Functions shipped with base R can fulfill many missing data handling tasks. However, because the data volume of electronic medical record (EMR) systems is typically very large, more sophisticated methods may be helpful in data management. The article focuses on missing data handling by using advanced techniques. There are three types of missing data, that is, missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification system depends on how missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore missing data pattern. In particular, the VIM package is especially helpful in visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missing data on other variables. Such information is useful in subsequent imputations. PMID:26807411

  7. Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.

    PubMed

    Ritz, Cecilia; Edén, Patrik

    2008-01-19

    For 2-dye microarray platforms, some missing values may arise from an un-measurably low RNA expression in one channel only. Information about such "one-channel depletion" has so far not been included in algorithms for imputation of missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation gives a systematic bias of the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates was reduced by up to 51%, depending on dataset. By including more information in the imputation step, we more accurately estimate missing expression values.
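
    A minimal version of the KNN-based imputation whose bias the authors correct: find the k rows most similar to the incomplete row on jointly observed columns, then average their values in the missing column. The expression matrix below is an invented toy, not microarray data:

```python
def knn_impute(matrix, row, col, k=2):
    """Impute matrix[row][col] as the mean of that column over the k
    rows most similar to `row` (mean squared difference on columns
    observed in both rows)."""
    def dist(a, b):
        pairs = [(x, y) for x, y in zip(a, b)
                 if x is not None and y is not None]
        if not pairs:
            return float("inf")
        return sum((x - y) ** 2 for x, y in pairs) / len(pairs)

    target = matrix[row]
    candidates = sorted(
        (dist(target, r), r[col])
        for i, r in enumerate(matrix)
        if i != row and r[col] is not None
    )
    neighbours = [v for _, v in candidates[:k]]
    return sum(neighbours) / len(neighbours)

expressions = [
    [1.0, 2.0, None],   # spot with a missing value
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 2.8],
    [5.0, 5.0, 9.0],    # dissimilar row, effectively ignored
]
imputed = knn_impute(expressions, row=0, col=2, k=2)
```

    Because the neighbour average reflects typical expression, it systematically overestimates spots that are missing due to one-channel depletion, which is exactly the bias the paper sets out to correct.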

  8. Order-restricted inference for means with missing values.

    PubMed

    Wang, Heng; Zhong, Ping-Shou

    2017-09-01

    Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.

  9. GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

    PubMed Central

    Jia, Erik; Chen, Tianlu

    2018-01-01

    Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulated dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large-scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130

  10. New Insights into Handling Missing Values in Environmental Epidemiological Studies

    PubMed Central

    Roda, Célina; Nicolis, Ioannis; Momas, Isabelle; Guihenneuc, Chantal

    2014-01-01

    Missing data are unavoidable in environmental epidemiologic surveys. The aim of this study was to compare methods for handling large amounts of missing values: omission of missing values, single and multiple imputation (through linear regression or partial least squares regression), and a fully Bayesian approach. These methods were applied to the PARIS birth cohort, where indoor domestic pollutant measurements were performed in a random sample of babies' dwellings. A simulation study was conducted to assess the performance of the different approaches with a high proportion of missing values (from 50% to 95%). Different simulation scenarios were carried out, controlling the true value of the association (odds ratios of 1.0, 1.2, and 1.4) and varying the health outcome prevalence. When a large amount of data was missing, omitting the missing data reduced statistical power and inflated standard errors, which affected the significance of the association. Single imputation underestimated the variability and considerably increased the risk of type I error. All approaches were conservative, except the Bayesian joint model. In the case of a common health outcome, the fully Bayesian approach was the most efficient (low root mean square error, reasonable type I error, and high statistical power). Nevertheless, for a less prevalent event, the type I error was increased and the statistical power reduced. The estimated posterior distribution of the OR is useful for refining the conclusion. Among the methods for handling missing values, no approach is universally best, but when the usual approaches (e.g., single imputation) are insufficient and large amounts of data are missing, jointly modelling the missingness process and the health association is more efficient. PMID:25226278

  11. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider estimation of Cox regression in which some covariates are subject to missingness, and additional information (including observed event time, censoring indicator, and fully observed covariates) may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missingness probabilities. For each observation with a missing covariate, these two working models are used to define a nearest-neighbor imputing set. This set is then used to nonparametrically impute covariate values for that observation. Upon completion of the imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into the estimation of the Cox regression, and with the predictive mean matching (PMM) imputation method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to misspecification of either one of the two working models and to misspecification of their link functions. In contrast, the PMM method is sensitive to misspecification of the covariates included in the imputation, and the AIPW method is sensitive to the specification of the selection probabilities. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
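
    The nearest-neighbour imputing-set idea can be sketched as follows: given two working-model scores per subject (one predicting the missing covariate, one predicting the missingness probability), the donors for a subject with a missing value are the k complete cases closest in the two-dimensional score space. The function name and the Euclidean metric are illustrative assumptions, not the authors' exact construction:

```python
import numpy as np

def nn_imputing_set(score_pred, score_miss, observed, i, k=5):
    """Return indices of the k complete cases nearest to case i in the
    plane spanned by the two working-model scores. Multiple imputation
    would then repeatedly draw the missing covariate from this set."""
    donors = np.where(observed)[0]
    d = np.hypot(score_pred[donors] - score_pred[i],
                 score_miss[donors] - score_miss[i])
    return donors[np.argsort(d)[:k]]
```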

  12. Suspended Education in Massachusetts: Using Days of Lost Instruction Due to Suspension to Evaluate Our Schools

    ERIC Educational Resources Information Center

    Losen, Daniel J.; Sun, Wei-Ling; Keith, Michael A., II

    2017-01-01

    Missed instruction can have a devastating impact on educational outcomes. Some reasons for missed instruction are beyond the control of schools and districts: some students miss school due to mental or physical illness or injury, and transportation problems sometimes are to blame. One major reason for missed instruction that schools can directly…

  13. Instrumental Variable Methods for Continuous Outcomes That Accommodate Nonignorable Missing Baseline Values.

    PubMed

    Ertefaie, Ashkan; Flory, James H; Hennessy, Sean; Small, Dylan S

    2017-06-15

    Instrumental variable (IV) methods provide unbiased treatment effect estimation in the presence of unmeasured confounders under certain assumptions. To provide valid estimates of the treatment effect, treatment effect confounders that are associated with the IV (IV-confounders) must be included in the analysis, and excluding observations with missing values may lead to bias. Missing covariate data are particularly problematic when the probability that a value is missing is related to the value itself, which is known as nonignorable missingness. In such cases, imputation-based methods are biased. Using health-care provider preference as an IV, we propose a 2-step procedure to estimate a valid treatment effect in the presence of baseline variables with nonignorable missing values. First, the provider preference IV value is estimated by performing a complete-case analysis using a random-effects model that includes the IV-confounders. Second, the treatment effect is estimated using a 2-stage least squares IV approach that excludes IV-confounders with missing values. Simulation results are presented, and the method is applied to an analysis comparing the effects of sulfonylureas versus metformin on body mass index, where the baseline body mass index and glycosylated hemoglobin variables have missing values. Our result supports the association of sulfonylureas with weight gain. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
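
    The second step relies on standard two-stage least squares, which can be sketched generically (this is the textbook 2SLS estimator with a single instrument, not the authors' full procedure with IV-confounders):

```python
import numpy as np

def two_stage_ls(z, t, y):
    """Minimal 2SLS: regress treatment t on instrument z (first stage),
    then regress outcome y on the fitted treatment (second stage).
    Returns the IV estimate of the treatment effect."""
    Z = np.column_stack([np.ones_like(z), z])
    t_hat = Z @ np.linalg.lstsq(Z, t, rcond=None)[0]   # first stage
    X = np.column_stack([np.ones_like(t_hat), t_hat])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # second stage
    return beta[1]
```

    With a valid instrument, this recovers the treatment effect even when an unmeasured confounder drives both treatment and outcome, where ordinary least squares would be biased.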

  14. A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis.

    PubMed

    Wallace, Meredith L; Anderson, Stewart J; Mazumdar, Sati

    2010-12-20

    Missing covariate data present a challenge to tree-structured methodology because a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults. Copyright © 2010 John Wiley & Sons, Ltd.

  15. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    PubMed

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
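
    The contrast between the two strategies can be sketched as follows. The helper names are hypothetical, and the toy imputation model fills each column independently from a fitted normal, which is far simpler than the multiple imputation used in the study; the point is only that complete case analysis discards rows while imputation retains the full cohort:

```python
import numpy as np

def complete_case(X):
    """Keep only rows with no missing entries (complete case analysis)."""
    return X[~np.isnan(X).any(axis=1)]

def multiply_impute(X, m=5, rng=0):
    """Toy multiple imputation: each of the m completed copies fills every
    missing entry with a draw from a normal fitted to the observed values
    of that column. Real MI would also model between-column associations."""
    rng = np.random.default_rng(rng)
    copies = []
    for _ in range(m):
        Xi = X.copy()
        for j in range(X.shape[1]):
            col = X[:, j]
            miss = np.isnan(col)
            Xi[miss, j] = rng.normal(np.nanmean(col), np.nanstd(col),
                                     miss.sum())
        copies.append(Xi)
    return copies
```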

  16. Predicting missing values in a home care database using an adaptive uncertainty rule method.

    PubMed

    Konias, S; Gogou, G; Bamidis, P D; Vlahavas, I; Maglaveras, N

    2005-01-01

    The literature offers an abundance of adaptive algorithms for mining association rules. However, most of these algorithms cannot deal with peculiarities, such as missing values and dynamic data creation, that are frequently encountered in fields like medicine. This paper proposes an uncertainty rule method that uses an adaptive threshold for filling missing values in newly added records, an approach particularly suitable for dynamic databases such as those used in home care systems. The new data mining method, named FiMV (Filling Missing Values), builds on mined uncertainty rules. Uncertainty rules have quite a similar structure to association rules and are extracted by an algorithm proposed in previous work, namely AURG (Adaptive Uncertainty Rule Generation). The main goal was to implement an appropriate method for recovering missing values in a dynamic database, where new records are continuously added, without needing to specify any thresholds beforehand. The method was applied to a home care monitoring system database. Multiple missing values were introduced at random into each record's attributes (at rates of 5-20%, in 5% increments) in the initial dataset. FiMV demonstrated 100% completion rates with over 90% success in each case, while the usual approaches, in which all records with missing values are ignored or thresholds are required, experienced significantly reduced completion and success rates. We conclude that the proposed method is appropriate for the data-cleaning step of the knowledge discovery process in databases. This step strongly affects the output of any data mining technique and can improve the quality of the mined information.

  17. Recurrent Neural Networks for Multivariate Time Series with Missing Values.

    PubMed

    Che, Zhengping; Purushotham, Sanjay; Cho, Kyunghyun; Sontag, David; Liu, Yan

    2018-04-17

    Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a. informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on the Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks with real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
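
    The two missing-pattern representations that GRU-D consumes, masking and time interval, can be constructed as follows. This sketch builds only the input features (following the usual GRU-D definitions: the mask flags observed entries, and delta accumulates elapsed time since each variable was last observed), not the recurrent model itself:

```python
import numpy as np

def masking_and_delta(x, times):
    """Given a (T, D) series x with NaN for missing entries and a length-T
    vector of observation times, return the binary mask (1 = observed) and
    the per-variable time-since-last-observation matrix delta."""
    mask = (~np.isnan(x)).astype(float)
    delta = np.zeros_like(x, dtype=float)
    for t in range(1, len(times)):
        gap = times[t] - times[t - 1]
        # if the variable was observed at t-1, delta resets to the gap;
        # otherwise the previous delta accumulates
        delta[t] = np.where(mask[t - 1] == 1, gap, gap + delta[t - 1])
    return mask, delta
```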

  18. Gaussian mixture clustering and imputation of microarray data.

    PubMed

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
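
    The first evaluation metric above can be written down directly; this is a generic sketch (the function name is ours), computing the error only over the entries that were missing:

```python
import numpy as np

def imputation_rmse(true_vals, imputed_vals, miss_mask):
    """Root mean squared error between true and imputed entries,
    restricted to the positions that were missing."""
    diff = (true_vals - imputed_vals)[miss_mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```

    The second metric, the number of mis-clustered genes, additionally requires running the clustering algorithm on both the true and the imputed matrices and counting disagreements.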

  19. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    PubMed

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.

  20. Improving record linkage performance in the presence of missing linkage data.

    PubMed

    Ong, Toan C; Mannino, Michael V; Schilling, Lisa M; Kahn, Michael G

    2014-12-01

    Existing record linkage methods do not handle missing linking field values in an efficient and effective manner. The objective of this study is to investigate three novel methods for improving the accuracy and efficiency of record linkage when record linkage fields have missing values. By extending the Fellegi-Sunter scoring implementations available in the open-source Fine-grained Record Linkage (FRIL) software system we developed three novel methods to solve the missing data problem in record linkage, which we refer to as: Weight Redistribution, Distance Imputation, and Linkage Expansion. Weight Redistribution removes fields with missing data from the set of quasi-identifiers and redistributes the weight from the missing attribute based on relative proportions across the remaining available linkage fields. Distance Imputation imputes the distance between the missing data fields rather than imputing the missing data value. Linkage Expansion adds previously considered non-linkage fields to the linkage field set to compensate for the missing information in a linkage field. We tested the linkage methods using simulated data sets with varying field value corruption rates. The methods developed had sensitivity ranging from .895 to .992 and positive predictive values (PPV) ranging from .865 to 1 in data sets with low corruption rates. Increased corruption rates led to decreased sensitivity for all methods. These new record linkage algorithms show promise in terms of accuracy and efficiency and may be valuable for combining large data sets at the patient level to support biomedical and clinical research. Copyright © 2014 Elsevier Inc. All rights reserved.
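
    Of the three methods, Weight Redistribution is the easiest to sketch: the weight of each missing linkage field is spread over the remaining fields in proportion to their original weights. The function and field names below are hypothetical illustrations, not FRIL's API:

```python
def redistribute_weights(weights, missing_fields):
    """Drop linkage fields with missing values for a record pair and
    redistribute their match weight across the remaining fields in
    proportion to those fields' original weights."""
    kept = {f: w for f, w in weights.items() if f not in missing_fields}
    lost = sum(weights.values()) - sum(kept.values())
    total_kept = sum(kept.values())
    return {f: w + lost * w / total_kept for f, w in kept.items()}
```

    The total weight is preserved, so record-pair scores computed on the reduced field set remain comparable to scores from fully observed pairs.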

  1. Wane detection on rough lumber using surface approximation

    Treesearch

    Sang-Mook Lee; A. Lynn Abbott; Daniel L. Schmoldt

    2000-01-01

    The initial breakdown of hardwood logs into lumber produces boards with rough surfaces. These boards contain wane (missing wood due to the curved log exterior) that is removed by edge and trim cuts prior to sale. Because hardwood lumber value is determined using a combination of board size and quality, knowledge of wane position and defects is essential for selecting...

  2. A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

    PubMed

    De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne

    2017-07-25

    Missing data are a common problem in epidemiological studies, and are particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat repeated measurements of the same time-dependent variable as just another 'distinct' variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches that account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks where the imputation model includes measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies. We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms (missing completely at random (MCAR), and weak and strong missing at random (MAR) scenarios) were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% missing data when assessing the association between childhood obesity and sleep problems. The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one. We recommend the use of FCS or MVNI in similar longitudinal settings and, when convergence issues arise due to a large number of time points or variables with missing values, the two-fold FCS algorithm with exploration of a suitable time window.

  3. Missing value imputation in DNA microarrays based on conjugate gradient method.

    PubMed

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the best similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with the sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute), and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE on different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
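
    The neighbour-selection step (ranking candidate genes by absolute Pearson correlation with the target profile, over positions observed in both) might be sketched as follows; the CG estimation that follows it is not shown, and the function name is ours:

```python
import numpy as np

def top_k_correlated(target, candidates, k=3):
    """Return indices of the k candidate rows most similar to `target` by
    absolute Pearson correlation, computed over positions where both the
    target and the candidate are observed."""
    cors = []
    for row in candidates:
        both = ~np.isnan(target) & ~np.isnan(row)
        cors.append(abs(np.corrcoef(target[both], row[both])[0, 1]))
    return np.argsort(cors)[::-1][:k]
```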

  4. Meaning of Missing Values in Eyewitness Recall and Accident Records

    PubMed Central

    Uttl, Bob; Kisinger, Kelly

    2010-01-01

    Background Eyewitness recalls and accident records frequently do not mention the conditions and behaviors of interest to researchers and lead to missing values and to uncertainty about the prevalence of these conditions and behaviors surrounding accidents. Missing values may occur because eyewitnesses report the presence but not the absence of obvious clues/accident features. We examined this possibility. Methodology/Principal Findings Participants watched car accident videos and were asked to recall as much information as they could remember about each accident. The results showed that eyewitnesses were far more likely to report the presence of present obvious clues than the absence of absent obvious clues even though they were aware of their absence. Conclusions One of the principal mechanisms causing missing values may be eyewitnesses' tendency to not report the absence of obvious features. We discuss the implications of our findings for both retrospective and prospective analyses of accident records, and illustrate the consequences of adopting inappropriate assumptions about the meaning of missing values using the Avaluator Avalanche Accident Prevention Card. PMID:20824054

  5. Meaning of missing values in eyewitness recall and accident records.

    PubMed

    Uttl, Bob; Kisinger, Kelly

    2010-09-02

    Eyewitness recalls and accident records frequently do not mention the conditions and behaviors of interest to researchers and lead to missing values and to uncertainty about the prevalence of these conditions and behaviors surrounding accidents. Missing values may occur because eyewitnesses report the presence but not the absence of obvious clues/accident features. We examined this possibility. Participants watched car accident videos and were asked to recall as much information as they could remember about each accident. The results showed that eyewitnesses were far more likely to report the presence of present obvious clues than the absence of absent obvious clues even though they were aware of their absence. One of the principal mechanisms causing missing values may be eyewitnesses' tendency to not report the absence of obvious features. We discuss the implications of our findings for both retrospective and prospective analyses of accident records, and illustrate the consequences of adopting inappropriate assumptions about the meaning of missing values using the Avaluator Avalanche Accident Prevention Card.

  6. Selection-Fusion Approach for Classification of Datasets with Missing Values

    PubMed Central

    Ghannad-Rezaie, Mostafa; Soltanian-Zadeh, Hamid; Ying, Hao; Dong, Ming

    2010-01-01

    This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that classification accuracy of the proposed method is superior to those of the widely used multiple imputation method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. PMID:20212921

  7. MVIAeval: a web tool for comprehensively evaluating the performance of a new missing value imputation algorithm.

    PubMed

    Wu, Wei-Sheng; Jhou, Meng-Jhun

    2017-01-13

    Missing value imputation is important for microarray data analyses because microarray data with missing values would significantly degrade the performance of the downstream analyses. Although many microarray missing value imputation algorithms have been developed, an objective and comprehensive performance comparison framework is still lacking. To solve this problem, we previously proposed a framework which can perform a comprehensive performance comparison of different existing algorithms; the performance of a new algorithm can also be evaluated with it. However, constructing our framework is not an easy task for interested researchers. To save researchers' time and effort, here we present an easy-to-use web tool named MVIAeval (Missing Value Imputation Algorithm evaluator) which implements our performance comparison framework. MVIAeval provides a user-friendly interface allowing users to upload the R code of their new algorithm and to select (i) the test datasets among 20 benchmark microarray (time series and non-time series) datasets, (ii) the compared algorithms among 12 existing algorithms, (iii) the performance indices among three existing ones, (iv) the comprehensive performance scores between two possible choices, and (v) the number of simulation runs. The comprehensive performance comparison results are then generated and shown as both figures and tables. MVIAeval is a useful tool for researchers to easily conduct a comprehensive and objective performance evaluation of a newly developed missing value imputation algorithm for microarray data, or for any data which can be represented in matrix form (e.g. NGS data or proteomics data). Thus, MVIAeval will greatly expedite progress in the research of missing value imputation algorithms.

  8. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.
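
    The DTW distance at the core of these algorithms is the standard dynamic-programming recurrence; a textbook sketch (not the paper's optimized C++ implementation):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-programming DTW distance between two 1-D profiles: each cell
    D[i, j] holds the cheapest cumulative alignment cost of the prefixes
    a[:i] and b[:j], with |a[i-1] - b[j-1]| as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

    Unlike Euclidean distance, DTW tolerates local stretching of the time axis, which is why two profiles with the same shape but shifted timing still score as similar candidates.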

  9. Lumbopelvic control and days missed due to injury in professional baseball pitchers

    PubMed Central

    Chaudhari, Ajit M.W.; McKenzie, Christopher S.; Pan, Xueliang; Oñate, James A.

    2014-01-01

    Background Recently lumbopelvic control has been linked to pitching performance, kinematics and loading; however, poor lumbopelvic control has not been prospectively investigated as a risk factor for injury in baseball pitchers. Hypothesis Pitchers with poor lumbopelvic control during spring training are more likely to miss 30 or more days due to injury through an entire baseball season than pitchers with good lumbopelvic control. Study design Cohort study. Methods Three hundred forty-seven professional baseball pitchers were enrolled into the study during the last 2 weeks of spring training and stayed with the same team for the entire season. Lumbopelvic control was quantified by peak anterior-posterior deviation of the pelvis relative to starting position during a single leg raise test (APScore). Days missed due to injury through the entire season were recorded by each team's medical staff. Results Higher APScore was significantly associated with a higher likelihood of missing 30 days or more (chi-square, p=0.023). When divided into tertiles based on their APScore, participants in the highest tertile were 3.0 times and 2.2 times more likely to miss at least 30 days throughout the course of a baseball season relative to those in the lowest or middle tertiles, respectively. Higher APScore was also significantly associated with missing more days due to injury within participants who missed at least one day to injury (ANOVA, p=0.018), with the highest tertile missing significantly more days (mean=98.6 d) than the middle tertile (mean=45.8 d, p=0.017) or the lowest tertile (mean=43.8 d, p=0.017). Conclusion This study found that poor lumbopelvic control in professional pitchers was associated with increased risk of missing significant time due to injury. PMID:25159541

  10. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    PubMed

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead, the measurement of a variable such as blood glucose may depend on its prior values as well as on those of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships together with multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time-lagged correlations both within and across variables by combining two imputation methods, based on an extension of k-NN and on the Fourier transform. This enables imputation of missing values even when all data at a time point are missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring), the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.
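    As a rough, illustrative sketch of the lagged k-NN component described in this abstract (the function and parameter names here are ours, not the authors'; the Fourier component and the paper's actual algorithm are not reproduced):

```python
# Illustrative lagged k-NN imputation: a missing point is filled with the
# mean of the k values whose preceding lag-window is closest (in squared
# distance) to the window preceding the gap.

def lagged_knn_impute(series, lag=2, k=2):
    """Fill None entries using the k historical windows most similar
    to the lag-window preceding each gap."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is not None or i < lag:
            continue
        window = filled[i - lag:i]
        if any(w is None for w in window):
            continue
        # Candidate positions with a complete window and an observed value.
        candidates = []
        for j in range(lag, len(filled)):
            if j == i or filled[j] is None:
                continue
            cand = filled[j - lag:j]
            if any(c is None for c in cand):
                continue
            dist = sum((a - b) ** 2 for a, b in zip(window, cand))
            candidates.append((dist, filled[j]))
        candidates.sort(key=lambda t: t[0])
        nearest = [val for _, val in candidates[:k]]
        if nearest:
            filled[i] = sum(nearest) / len(nearest)
    return filled
```

    A sketch only: the paper additionally exploits cross-variable lags and combines this with Fourier-based reconstruction.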

  11. 77 FR 60089 - Approval and Promulgation of Air Quality Implementation Plans; Delaware, New Jersey, and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-02

    ... quarter substitution test. ``Collocated'' indicates that the collocated data was substituted for missing... 24-hour standard design value is greater than the level of the standard. EPA addresses missing data... substituted for the missing data. In the maximum quarter test, maximum recorded values are substituted for the...

  12. Results of Database Studies in Spine Surgery Can Be Influenced by Missing Data.

    PubMed

    Basques, Bryce A; McLynn, Ryan P; Fice, Michael P; Samuel, Andre M; Lukasiewicz, Adam M; Bohl, Daniel D; Ahn, Junyoung; Singh, Kern; Grauer, Jonathan N

    2017-12-01

    National databases are increasingly being used for research in spine surgery; however, one limitation of such databases that has received sparse mention is the frequency of missing data. Studies using these databases often do not emphasize the percentage of missing data for each variable used and do not specify how patients with missing data are incorporated into analyses. This study uses the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database to examine whether different treatments of missing data can influence the results of spine studies. (1) What is the frequency of missing data fields for demographics, medical comorbidities, preoperative laboratory values, operating room times, and length of stay recorded in ACS-NSQIP? (2) Using three common approaches to handling missing data, how frequently do those approaches agree in terms of finding particular variables to be associated with adverse events? (3) Do different approaches to handling missing data influence the outcomes and effect sizes of an analysis testing for an association with these variables with occurrence of adverse events? Patients who underwent spine surgery between 2005 and 2013 were identified from the ACS-NSQIP database. A total of 88,471 patients undergoing spine surgery were identified. The most common procedures were anterior cervical discectomy and fusion, lumbar decompression, and lumbar fusion. Demographics, comorbidities, and perioperative laboratory values were tabulated for each patient, and the percent of missing data was noted for each variable. These variables were tested for an association with "any adverse event" using three separate multivariate regressions that used the most common treatments for missing data. In the first regression, patients with any missing data were excluded. 
In the second regression, missing data were treated as a negative or "reference" value; for continuous variables, the mean of each variable's reference range was computed and imputed. In the third regression, any variables with > 10% rate of missing data were removed from the regression; among variables with ≤ 10% missing data, individual cases with missing values were excluded. The results of these regressions were compared to determine how the different treatments of missing data could affect the results of spine studies using the ACS-NSQIP database. Of the 88,471 patients, as many as 4441 (5%) had missing elements among demographic data, 69,184 (72%) among comorbidities, 70,892 (80%) among preoperative laboratory values, and 56,551 (64%) among operating room times. Considering the three different treatments of missing data, we found different risk factors for adverse events. Of 44 risk factors found to be associated with adverse events in any analysis, only 15 (34%) of these risk factors were common among the three regressions. The second treatment of missing data (assuming "normal" value) found the most risk factors (40) to be associated with any adverse event, whereas the first treatment (deleting patients with missing data) found the fewest associations at 20. Among the risk factors associated with any adverse event, the 10 with the greatest effect size (odds ratio) by each regression were ranked. Of the 15 variables in the top 10 for any regression, six of these were common among all three lists. Differing treatments of missing data can influence the results of spine studies using the ACS-NSQIP. The current study highlights the importance of considering how such missing data are handled. Until there are better guidelines on the best approaches to handle missing data, investigators should report how missing data were handled to increase the quality and transparency of orthopaedic database research. 
Readers of large database studies should note whether the handling of missing data was addressed, and should consider the potential for bias when rates of missing data are high or the methods for handling them are unspecified or weak.

  13. 77 FR 24436 - Approval and Promulgation of Air Quality Implementation Plans; Wisconsin; Milwaukee-Racine...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-04-24

    .... How did EPA address missing data? V. Proposed Action VI. What is the effect of this action? VII.... ** Indicates incomplete data due to monitor shut down. IV. How did EPA address missing data? Appendix N of 40... in Milwaukee, where there are missing or incomplete data due to monitor shutdown or other factors...

  14. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  15. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  16. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) For each missing value of the inorganic carbon content of phosphate rock or... immediately preceding and immediately following the missing data incident. You must document and keep records...

  17. Establishing a threshold for the number of missing days using 7 d pedometer data.

    PubMed

    Kang, Minsoo; Hart, Peter D; Kim, Youngdeok

    2012-11-01

    The purpose of this study was to examine the threshold for the number of missing days that can be recovered using the individual information (II)-centered approach. Data for this study came from 86 participants, aged 17 to 79 years, who had 7 consecutive days of complete pedometer (Yamax SW 200) wear. Missing datasets (1 d through 5 d missing) were created by a SAS random process 10,000 times each. All missing values were replaced using the II-centered approach. A 7 d average was calculated for each dataset, including the complete dataset. Repeated-measures ANOVA was used to determine the differences between the 1 d through 5 d missing datasets and the complete dataset. Mean absolute percentage error (MAPE) was also computed. The mean (SD) daily step count for the complete 7 d dataset was 7979 (3084). Mean (SD) values for the 1 d through 5 d missing datasets were 8072 (3218), 8066 (3109), 7968 (3273), 7741 (3050) and 8314 (3529), respectively (p > 0.05). The lowest MAPEs were estimated for 1 d missing (5.2%, 95% confidence interval (CI) 4.4-6.0) and 2 d missing (8.4%, 95% CI 7.0-9.8), while all others were greater than 10%. The results of this study show that the 1 d through 5 d missing datasets, with replaced values, were not significantly different from the complete dataset. Based on the MAPE results, replacing more than two days of missing step counts is not recommended.
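    The MAPE criterion used in this study is straightforward to compute; a minimal sketch (variable names are illustrative):

```python
# Mean absolute percentage error between estimates (e.g., weekly averages
# from datasets with imputed days) and the corresponding true values.

def mape(estimates, truths):
    """Return the mean absolute percentage error, in percent."""
    errors = [abs(e - t) / t * 100 for e, t in zip(estimates, truths)]
    return sum(errors) / len(errors)
```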

  18. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... measured parameters used in the GHG emissions calculations is required (e.g., carbon content values, etc... such estimates. (a) For each missing value of the monthly carbon content of calcined petroleum coke the substitute data value shall be the arithmetic average of the quality-assured values of carbon contents for...

  19. A pattern-mixture model approach for handling missing continuous outcome data in longitudinal cluster randomized trials.

    PubMed

    Fiero, Mallorie H; Hsu, Chiu-Hsieh; Bell, Melanie L

    2017-11-20

    We extend the pattern-mixture approach to handle missing continuous outcome data in longitudinal cluster randomized trials, which randomize groups of individuals to treatment arms, rather than the individuals themselves. Individuals who drop out at the same time point are grouped into the same dropout pattern. We approach extrapolation of the pattern-mixture model by applying multilevel multiple imputation, which imputes missing values while appropriately accounting for the hierarchical data structure found in cluster randomized trials. To assess parameters of interest under various missing data assumptions, imputed values are multiplied by a sensitivity parameter, k, which increases or decreases imputed values. Using simulated data, we show that estimates of parameters of interest can vary widely under differing missing data assumptions. We conduct a sensitivity analysis using real data from a cluster randomized trial by increasing k until the treatment effect inference changes. By performing a sensitivity analysis for missing data, researchers can assess whether certain missing data assumptions are reasonable for their cluster randomized trial. Copyright © 2017 John Wiley & Sons, Ltd.
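    The sensitivity step described above, scaling imputed values by a parameter k, can be sketched as follows (a simplified, non-multilevel illustration; names are ours):

```python
# Sensitivity analysis for missing data: imputed entries are multiplied by a
# sensitivity parameter k, observed entries are left untouched, and the
# analysis is repeated over a grid of k values until inference changes.

def apply_sensitivity(values, imputed_mask, k):
    """Scale imputed entries (mask=True) by k; observed entries pass through."""
    return [v * k if m else v for v, m in zip(values, imputed_mask)]
```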

  20. 78 FR 49403 - Approval and Promulgation of Air Quality Implementation Plans; Pennsylvania; Determination of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-14

    ... requirement for one or more quarters during 2010-2012 monitoring period. EPA has addressed missing data from... recorded values are substituted for the missing data, and the resulting 24-hour design value is compared to... missing data from the Greensburg monitor by performing a statistical analysis of the data, in which a...

  1. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  2. Exact Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data.

    PubMed

    Lin, Yan; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett; Lipshultz, Steven

    2017-01-01

    Altham (Altham PME. Exact Bayesian analysis of a 2 × 2 contingency table, and Fisher's "exact" significance test. J R Stat Soc B 1969; 31: 261-269) showed that a one-sided p-value from Fisher's exact test of independence in a 2 × 2 contingency table is equal to the posterior probability of negative association in the 2 × 2 contingency table under a Bayesian analysis using an improper prior. We derive an extension of Fisher's exact test p-value in the presence of missing data, assuming the missing data mechanism is ignorable (i.e., missing at random or completely at random). Further, we propose Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data using alternative priors; we also present results from a simulation study exploring the Type I error rate and power of the proposed exact test p-values. An example, using data on the association between blood pressure and a cardiac enzyme, is presented to illustrate the methods.
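    For reference, the one-sided Fisher exact p-value that Altham's result concerns can be computed directly from the hypergeometric distribution; a minimal sketch for a complete-data 2 × 2 table (the paper's missing-data extension and Bayesian priors are not reproduced):

```python
# One-sided Fisher exact test for a 2x2 table [[a, b], [c, d]]:
# sum hypergeometric probabilities of tables with cell (1,1) >= a,
# holding the row and column margins fixed.
from math import comb

def fisher_one_sided_p(a, b, c, d):
    """Probability of a table at least as extreme as observed (cell a or larger)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    hi = min(row1, col1)
    return sum(comb(row1, x) * comb(row2, col1 - x)
               for x in range(a, hi + 1)) / comb(n, col1)
```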

  3. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
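    A much-simplified sketch of regression-based estimation of a missing expression value, assuming a single already-selected predictor gene (the Bayesian gene selection and QR-based implementation in the paper are omitted; names are illustrative):

```python
# Fit a least-squares line from a predictor gene to the target gene on
# samples where both are observed, then predict the missing target entries.

def regress_impute(target, predictor):
    """Fill None entries of `target` via simple linear regression on `predictor`."""
    pairs = [(x, y) for x, y in zip(predictor, target) if y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y if y is not None else intercept + slope * x
            for x, y in zip(predictor, target)]
```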

  4. Finding missing heritability in less significant Loci and allelic heterogeneity: genetic variation in human height.

    PubMed

    Zhang, Ge; Karns, Rebekah; Sun, Guangyun; Indugula, Subba Rao; Cheng, Hong; Havas-Augustin, Dubravka; Novokmet, Natalija; Durakovic, Zijad; Missoni, Sasa; Chakraborty, Ranajit; Rudan, Pavao; Deka, Ranjan

    2012-01-01

    Genome-wide association studies (GWAS) have identified many common variants associated with complex traits in human populations. Thus far, most reported variants have relatively small effects and explain only a small proportion of phenotypic variance, leading to the issues of 'missing' heritability and its explanation. Using height as an example, we examined two possible sources of missing heritability: first, variants with smaller effects whose associations with height failed to reach genome-wide significance and second, allelic heterogeneity due to the effects of multiple variants at a single locus. Using a novel analytical approach we examined allelic heterogeneity of height-associated loci selected from SNPs of different significance levels based on the summary data of the GIANT (stage 1) studies. In a sample of 1,304 individuals collected from an island population of the Adriatic coast of Croatia, we assessed the extent of height variance explained by incorporating the effects of less significant height loci and multiple effective SNPs at the same loci. Our results indicate that approximately half of the 118 loci that achieved stringent genome-wide significance (p-value < 5×10^-8) showed evidence of allelic heterogeneity. Additionally, including less significant loci (i.e., p-value < 5×10^-4) and accounting for effects of allelic heterogeneity substantially improved the variance explained in height.

  5. Comparison of the Results of MISSE 6 Atomic Oxygen Erosion Yields of Layered Kapton H Films with Monte Carlo Computational Predictions

    NASA Technical Reports Server (NTRS)

    Banks, Bruce A.; Groh, Kim De; Kneubel, Christian A.

    2014-01-01

    A space experiment flown as part of the Materials International Space Station Experiment 6B (MISSE 6B) was designed to compare the atomic oxygen erosion yield (Ey) of layers of Kapton H polyimide with no spacers between layers with that of layers of Kapton H with spacers between layers. The results were compared to a solid Kapton H (DuPont, Wilmington, DE) sample. Monte Carlo computational modeling was performed to optimize atomic oxygen interaction parameter values to match the results of both the MISSE 6B multilayer experiment and the undercut erosion profile from a crack defect in an aluminized Kapton H sample flown on the Long Duration Exposure Facility (LDEF). The Monte Carlo modeling produced credible agreement with space results of increased Ey for all samples with spacers as well as predicting the space-observed enhancement in erosion near the edges of samples due to scattering from the beveled edges of the sample holders.

  6. Decision analysis and drug development portfolio management: uncovering the real options value of your projects.

    PubMed

    Rosati, Nicoletta

    2002-04-01

    Project selection and portfolio management are particularly challenging in the pharmaceutical industry due to the high-risk, high-stakes nature of the drug development process. In recent years, scholars and industry experts have agreed that traditional net-present-value evaluation of projects fails to capture the value of managerial flexibility, and they have encouraged adopting a real options approach to recover the missed value. In this paper, we take a closer look at the drug development process and at the indices currently used to rank projects. We discuss the economic value of information and of real options arising in drug development, and we present decision analysis as an ideal framework for the implementation of real options valuation.

  7. An alternative data filling approach for prediction of missing data in soft sets (ADFIS).

    PubMed

    Sadiq Khan, Muhammad; Al-Garadi, Mohammed Ali; Wahab, Ainuddin Wahid Abdul; Herawan, Tutut

    2016-01-01

    Soft set theory is a mathematical approach that provides solutions for dealing with uncertain data. As a standard soft set can be represented as a Boolean-valued information system, it has been used in hundreds of useful applications. These applications become worthless, however, if the Boolean information system contains missing data due to error, security, or mishandling. Few studies have focused on handling partially incomplete soft sets, and none achieves a high accuracy rate in predicting missing data. The data filling approach for incomplete soft sets (DFIS) has been shown to perform best among previous approaches, but its accuracy remains its main limitation. In this paper, we propose an alternative data filling approach for prediction of missing data in soft sets, namely ADFIS. The novelty of ADFIS is that, unlike the previous probability-based approach, it focuses on the reliability of associations among parameters in the soft set. Experimental results on a small dataset, four UCI benchmark datasets, and the causality workbench lung cancer (LUCAP2) dataset show that ADFIS achieves better accuracy than DFIS.

  8. 29 CFR Appendix B to Part 4050 - Examples of Benefit Payments for Missing Participants Under §§ 4050.8 Through 4050.10

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ...) of the definition of “missing participant annuity assumptions” in § 4050.2, the present value as of... Plan B's deemed distribution date (and using the missing participant annuity assumptions), the present value per dollar of annual benefit (payable monthly as a joint and 50 percent survivor annuity...

  9. [Health status, use of health services and reported morbidity: application of correspondence analysis].

    PubMed

    Espinàs, J A; Riba, M D; Borràs, J M; Sánchez, V

    1995-01-01

    The study of the relationship between self-reported morbidity, health status and health care utilization presents methodological problems due to the variety of illnesses and medical conditions that one individual may report. In this article, correspondence analysis was used to analyse these relationships. Data from the Spanish National Health Survey pertaining to the region of Catalonia were studied. Statistical analysis included multi-way correspondence analysis (MCA) followed by cluster analysis. The first factor extracted is defined by self-assessed health perception; the second, by limitation of activities; and the third is related to self-reported morbidity caused by chronic and acute health problems. The fourth and fifth factors capture residual variability and missing values. Acute problems are more related to a perception of poor health, while chronic problems are related to a perception of fair health. Also, it may be possible to distinguish self-reported morbidity due to relapses of chronic diseases from true acute health problems. Cluster analysis classified individuals into four groups: 1) healthy people; 2) people who assess their health as being poor and those with acute health problems; 3) people with chronic health problems, limited activity and a perception of fair health; and 4) missing values. Correspondence analysis is a useful tool when analyzing qualitative variables like those in a health survey.

  10. Comparison of DIGE and post-stained gel electrophoresis with both traditional and SameSpots analysis for quantitative proteomics.

    PubMed

    Karp, Natasha A; Feret, Renata; Rubtsov, Denis V; Lilley, Kathryn S

    2008-03-01

    2-DE is an important tool in quantitative proteomics. Here, we compare the deep purple (DP) system with DIGE using both a traditional and the SameSpots approach to gel analysis. Missing values in the traditional approach were found to be a significant issue for both systems. SameSpots attempts to address the missing value problem. SameSpots was found to increase the proportion of low volume data for DP but not for DIGE. For all the analysis methods applied in this study, the assumptions of parametric tests were met. Analysis of the same images gave significantly lower noise with SameSpots (over traditional) for DP, but no difference for DIGE. We propose that SameSpots gave lower noise with DP due to the stabilisation of the spot area by the common spot outline, but this was not seen with DIGE due to the co-detection process which stabilises the area selected. For studies where measurement of small abundance changes is required, a cost-benefit analysis highlights that DIGE was significantly cheaper regardless of the analysis methods. For studies analysing large changes, DP with SameSpots could be an effective alternative to DIGE but this will be dependent on the biological noise of the system under investigation.

  11. [Comparison of different methods in dealing with HIV viral load data with diversified missing value mechanism on HIV positive MSM].

    PubMed

    Jiang, Z; Dou, Z; Song, W L; Xu, J; Wu, Z Y

    2017-11-10

    Objective: To compare the results of different methods for handling HIV viral load (VL) data with missing values under different missing-value mechanisms. Methods: We used SPSS 17.0 to simulate complete and missing data with different missing-value mechanisms from HIV viral load data collected from MSM in 16 cities in China in 2013. Maximum likelihood estimation using the expectation-maximization algorithm (EM), the regression method, mean imputation, the delete method, and Markov chain Monte Carlo (MCMC) were each used to fill in missing data. The results of the different methods were compared according to distribution characteristics, accuracy, and precision. Results: The HIV VL data could not be transformed into a normal distribution. All methods performed well on data that were missing completely at random (MCAR). For the other types of missing data, the regression and MCMC methods preserved the main characteristics of the original data. The means of the imputed databases from the different methods were all close to that of the original data. EM, the regression method, mean imputation, and the delete method underestimated VL, while MCMC overestimated it. Conclusion: MCMC can be used as the main imputation method for missing HIV viral load data. The imputed data can serve as a reference for estimating mean HIV VL in the investigated population.
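    Two of the simpler strategies compared in this study, mean imputation and the delete method, can be sketched in a few lines (EM, regression, and MCMC imputation are substantially more involved and are not reproduced here):

```python
# Two baseline strategies for a single variable with missing entries (None).

def mean_impute(values):
    """Replace each None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in values]

def delete_missing(values):
    """Delete method: drop records with missing entries."""
    return [v for v in values if v is not None]
```

    Both are known to distort the distribution when data are not MCAR, which is consistent with the comparison reported above.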

  12. Kalman Filtering for Genetic Regulatory Networks with Missing Values

    PubMed Central

    Liu, Qiuhua; Lai, Tianyue; Wang, Wu

    2017-01-01

    The filter problem with missing values for genetic regulatory networks (GRNs) is addressed, in which noises exist in both the state dynamics and the measurement equations; furthermore, the correlation between process noise and measurement noise is also taken into consideration. In order to deal with the filter problem, a class of discrete-time GRNs with missing values, noise correlation, and time delays is established. Then a new observation model is proposed to decrease the adverse effect caused by the missing values and to decouple the correlation between process noise and measurement noise in theory. Finally, a Kalman filter is used to estimate the states of the GRNs. A typical example is provided to verify the effectiveness of the proposed method, showing that the concentrations of mRNA and protein can be estimated accurately. PMID:28814967
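    As a hedged illustration of the filtering idea only, here is a one-dimensional random-walk Kalman filter that skips the measurement update when an observation is missing (the paper's GRN model, noise correlation, and time delays are far richer than this sketch; all names are ours):

```python
# Scalar Kalman filter with missing measurements: on a missing observation
# (None), only the prediction step runs, so the state estimate carries over
# and its uncertainty grows.

def kalman_1d(measurements, q=0.01, r=0.1, x0=0.0, p0=1.0):
    """q: process-noise variance, r: measurement-noise variance."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                      # predict (random-walk state model)
        if z is not None:              # update only when a measurement exists
            k = p / (p + r)            # Kalman gain
            x = x + k * (z - x)
            p = (1 - k) * p
        estimates.append(x)
    return estimates
```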

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pichara, Karim; Protopapas, Pavlos

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  14. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, the normal distribution model, the normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369

  15. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    PubMed

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, the normal distribution model, the normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.

  16. A nonparametric multiple imputation approach for missing categorical data.

    PubMed

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. 
We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
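    A minimal sketch of the two-working-model idea, assuming scikit-learn and summarising the multinomial outcome model by an expected class label; the scalar score, the weight `w`, and all names here are illustrative simplifications of the paper's predictive score, not its exact specification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def nn_mi_categorical(X, y, w=0.5, k=5, n_imp=5, seed=0):
    """Nearest-neighbour multiple imputation for a categorical outcome
    y (np.nan = missing), with distances built from two working models."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    # Missingness working model: P(y observed | X)
    p_obs = LogisticRegression().fit(X, obs.astype(int)).predict_proba(X)[:, 1]
    # Outcome working model: multinomial P(y = c | X), summarised here as
    # the expected class label (a scalar stand-in for the predictive score)
    out = LogisticRegression().fit(X[obs], y[obs].astype(int))
    e_y = out.predict_proba(X) @ out.classes_
    # Weighted predictive score combining the two working models
    score = w * e_y / max(e_y.max(), 1e-12) + (1 - w) * p_obs
    imputations = []
    for _ in range(n_imp):
        y_imp = y.copy()
        for i in np.where(~obs)[0]:
            d = np.abs(score[obs] - score[i])
            donors = y[obs][np.argsort(d)[:k]]  # k closest observed values
            y_imp[i] = rng.choice(donors)       # random draw from donor set
        imputations.append(y_imp)
    return imputations
```

Repeating the random donor draw `n_imp` times is what makes this *multiple* imputation: the between-copy variability reflects imputation uncertainty.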

  17. Modeling Achievement Trajectories when Attrition Is Informative

    ERIC Educational Resources Information Center

    Feldman, Betsy J.; Rabe-Hesketh, Sophia

    2012-01-01

    In longitudinal education studies, assuming that dropout and missing data occur completely at random is often unrealistic. When the probability of dropout depends on covariates and observed responses (called "missing at random" [MAR]), or on values of responses that are missing (called "informative" or "not missing at random" [NMAR]),…

  18. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  19. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  20. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  1. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  2. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  3. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  4. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  5. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  6. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  7. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  8. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  9. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  10. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  11. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  12. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  13. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  14. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  15. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  16. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  17. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  18. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  19. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  20. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  1. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  2. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  3. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  4. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  5. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  6. 40 CFR 98.275 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  7. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  8. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  9. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  10. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  11. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  12. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  13. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  14. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  15. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  16. 40 CFR 98.115 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  17. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  18. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  19. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  20. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  1. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  2. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  3. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  4. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  5. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  6. 40 CFR 98.125 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... unavailable, a substitute data value for the missing parameter must be used in the calculations as specified...

  7. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  8. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  9. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  10. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  11. 40 CFR 98.265 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter must be used in the calculations as specified in paragraphs...

  12. 40 CFR 98.355 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter must be used in the calculations, according to the following...

  13. 40 CFR 98.345 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... for estimating missing data. A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  14. 40 CFR 98.215 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  15. 40 CFR 98.325 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  16. 40 CFR 98.465 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, in accordance with...

  17. 40 CFR 98.175 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  18. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  19. 40 CFR 98.155 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG...), a substitute data value for the missing parameter shall be used in the calculations, according to...

  20. 40 CFR 98.65 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations, according to the following...

  1. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... substitute data value for the missing parameter shall be used in the calculations as specified in paragraphs...

  2. 40 CFR 98.365 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emissions... substitute data value for the missing parameter shall be used in the calculations, according to the...

  3. Working with Missing Values

    ERIC Educational Resources Information Center

    Acock, Alan C.

    2005-01-01

    Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions. After reviewing traditional approaches (listwise, pairwise, and mean substitution), selected alternatives are covered including single imputation, multiple imputation, and full information maximum likelihood…
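    The traditional strategies the article reviews are easy to demonstrate. A small pandas example with hypothetical MAR income data (income missing more often for younger respondents) shows why they are less than optimal: listwise deletion shrinks the sample, and mean substitution deflates the variance:

```python
import numpy as np
import pandas as pd

# Hypothetical data: income depends on age, and is missing more often
# for younger respondents (a missing-at-random mechanism).
rng = np.random.default_rng(0)
age = rng.integers(20, 70, size=500)
income = 1000 * age + rng.normal(0, 5000, size=500)
income[rng.random(500) < (70 - age) / 100] = np.nan
df = pd.DataFrame({"age": age, "income": income})

listwise = df.dropna()                     # traditional: discard incomplete rows
mean_sub = df.fillna(df["income"].mean())  # traditional: mean substitution

print(len(df), len(listwise))              # reduced sample size (lost power)
print(df["income"].std(), mean_sub["income"].std())  # deflated variance
```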

  4. A Correlated Random Effects Model for Nonignorable Missing Data in Value-Added Assessment of Teacher Effects

    ERIC Educational Resources Information Center

    Karl, Andrew T.; Yang, Yan; Lohr, Sharon L.

    2013-01-01

    Value-added models have been widely used to assess the contributions of individual teachers and schools to students' academic growth based on longitudinal student achievement outcomes. There is concern, however, that ignoring the presence of missing values, which are common in longitudinal studies, can bias teachers' value-added scores.…

  5. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate, with the example of a secondary data analysis study, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. Data were drawn from the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
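    A compact way to reproduce the workflow described above, several completed datasets followed by pooling, is scikit-learn's chained-equation imputer. This is an illustrative stand-in for the software used in the study; `multiply_impute` and `pooled_mean` are our names, and the pooling shown is only the point-estimate half of Rubin's rules:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def multiply_impute(X, m=5, seed=0):
    """Create m completed copies of X via chained equations.
    sample_posterior=True draws imputations rather than using point
    predictions, so the m copies differ (as multiple imputation requires)."""
    return [
        IterativeImputer(sample_posterior=True, random_state=seed + i).fit_transform(X)
        for i in range(m)
    ]

def pooled_mean(datasets, col):
    # Rubin's rules, point-estimate part: the pooled estimate of a simple
    # estimand (here a column mean) averages the per-dataset estimates.
    return float(np.mean([d[:, col].mean() for d in datasets]))
```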

  6. Apparatus And Method For Reconstructing Data Using Cross-Parity Stripes On Storage Media

    DOEpatents

    Hughes, James Prescott

    2003-06-17

    An apparatus and method for reconstructing missing data using cross-parity stripes on a storage medium is provided. The apparatus and method may operate on data symbols having sizes greater than a data bit. The apparatus and method makes use of a plurality of parity stripes for reconstructing missing data stripes. The parity symbol values in the parity stripes are used as a basis for determining the value of the missing data symbol in a data stripe. A correction matrix is shifted along the data stripes, correcting missing data symbols as it is shifted. The correction is performed from the outside data stripes towards the inner data stripes to thereby use previously reconstructed data symbols to reconstruct other missing data symbols.
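
    The recovery principle can be illustrated with a single XOR parity stripe, a simplified stand-in for the patent's multiple cross-parity stripes: the parity symbol XORed with the surviving data symbols yields the missing one.

```python
from functools import reduce

# A stripe of data symbols (bytes here, though the patent allows symbols
# larger than one bit) plus one XOR parity symbol computed across them.
data = [0x1A, 0x2B, 0x3C, 0x4D]
parity = reduce(lambda a, b: a ^ b, data)

# Simulate losing one data symbol; XORing the parity with the survivors
# recovers it, because every symbol appears an even number of times except
# the missing one.
missing_index = 2
survivors = [s for i, s in enumerate(data) if i != missing_index]
recovered = reduce(lambda a, b: a ^ b, survivors, parity)
```

    With several parity stripes laid diagonally across the data stripes, as in the patent, more than one missing symbol per stripe can be recovered by shifting a correction matrix from the outer stripes inward.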

  7. Growth Modeling with Non-Ignorable Dropout: Alternative Analyses of the STAR*D Antidepressant Trial

    PubMed Central

    Muthén, Bengt; Asparouhov, Tihomir; Hunter, Aimee; Leuchter, Andrew

    2011-01-01

    This paper uses a general latent variable framework to study a series of models for non-ignorable missingness due to dropout. Non-ignorable missing data modeling acknowledges that missingness may depend not only on covariates and observed outcomes at previous time points, as under the standard missing at random (MAR) assumption, but also on latent variables such as values that would have been observed (missing outcomes), developmental trends (growth factors), and qualitatively different types of development (latent trajectory classes). These alternative predictors of missing data can be explored in a general latent variable framework using the Mplus program. A flexible new model uses an extended pattern-mixture approach where missingness is a function of latent dropout classes in combination with growth mixture modeling using latent trajectory classes. A new selection model not only allows an influence of the outcomes on missingness but also allows this influence to vary across latent trajectory classes. Recommendations are given for choosing models. The missing data models are applied to longitudinal data from STAR*D, the largest antidepressant clinical trial in the U.S. to date. Despite the importance of this trial, STAR*D growth model analyses using non-ignorable missing data techniques have not been explored until now. The STAR*D data are shown to feature distinct trajectory classes, including a low class corresponding to substantial improvement in depression, a minority class with a U-shaped curve corresponding to transient improvement, and a high class corresponding to no improvement. The analyses provide a new way to assess drug efficacy in the presence of dropout. PMID:21381817

  8. Implicit Valuation of the Near-Miss is Dependent on Outcome Context.

    PubMed

    Banks, Parker J; Tata, Matthew S; Bennett, Patrick J; Sekuler, Allison B; Gruber, Aaron J

    2018-03-01

    Gambling studies have described a "near-miss effect" wherein the experience of almost winning increases gambling persistence. The near-miss has been proposed to inflate the value of preceding actions through its perceptual similarity to wins. We demonstrate here, however, that it acts as a conditioned stimulus to positively or negatively influence valuation, dependent on reward expectation and cognitive engagement. When subjects are asked to choose between two simulated slot machines, near-misses increase valuation of machines with a low payout rate, whereas they decrease valuation of high payout machines. This contextual effect impairs decisions and persists regardless of manipulations to outcome feedback or financial incentive provided for good performance. It is consistent with proposals that near-misses cause frustration when wins are expected, and we propose that it increases choice stochasticity and overrides avoidance of low-valued options. Intriguingly, the near-miss effect disappears when subjects are required to explicitly value machines by placing bets, rather than choosing between them. We propose that this task increases cognitive engagement and recruits participation of brain regions involved in cognitive processing, causing inhibition of otherwise dominant systems of decision-making. Our results reveal that only implicit, rather than explicit strategies of decision-making are affected by near-misses, and that the brain can fluidly shift between these strategies according to task demands.

  9. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  10. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  11. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  12. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  13. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  14. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  15. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  16. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  17. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  18. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  19. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  20. 40 CFR 98.85 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations. The owner or operator must...

  1. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a) of this subpart cannot... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  2. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  3. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  4. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  5. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  6. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  7. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  8. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  9. 40 CFR 98.205 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) A complete record of all measured parameters used in the GHG emission... substitute data value for the missing parameter will be used in the calculations as specified in paragraph (b...

  10. 40 CFR 98.315 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the petroleum coke input procedure in § 98.313(b), a complete record of all... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  11. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  12. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  13. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  14. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  15. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... all available process data or data used for accounting purposes. (b) For missing values related to the...

  16. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  17. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  18. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  19. 40 CFR 98.185 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... missing data. A complete record of all measured parameters used in the GHG emissions calculations in § 98... substitute data value for the missing parameter shall be used in the calculations as specified in the...

  20. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  1. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 21 2014-07-01 2014-07-01 false Procedures for estimating missing data... estimating missing data. For the emission calculation methodologies in § 98.293(b)(2) and (b)(3), a complete... unavailable, a substitute data value for the missing parameter shall be used in the calculations as specified...

  2. 40 CFR 98.255 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. A complete record of all measured parameters used in the GHG emissions calculations... during unit operation or if a required fuel sample is not taken), a substitute data value for the missing...

  3. 40 CFR 98.425 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. (a) Whenever the quality assurance procedures in § 98.424(a)(1) of this subpart... following missing data procedures shall be followed: (1) A quarterly CO2 mass flow or volumetric flow value...

  4. 40 CFR 98.415 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... Procedures for estimating missing data. (a) A complete record of all measured parameters used in the GHG... unavailable (e.g., if a meter malfunctions), a substitute data value for the missing parameter shall be used...

  5. METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS.

    PubMed

    Tignor, Nicole; Wang, Pei; Genes, Nicholas; Rogers, Linda; Hershman, Steven G; Scott, Erick R; Zweig, Micol; Yvonne Chan, Yu-Feng; Schadt, Eric E

    2017-01-01

    In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real-time geographic location information. The AMHS is just one of many studies now occurring in the context of thousands of mobile health apps aimed at improving wellness and better managing chronic disease conditions, leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes such as asthma exacerbations or hospitalizations from these types of large, prospectively collected data sets would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time for each user, and 3) non-overlapping periods of enrollment among different users. To handle such a complicated missing data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute missing values through low-rank matrix completion. When applying the proposed new method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. Further validation of the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.
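
    The impute-then-cluster consensus idea can be sketched under illustrative assumptions (hot-deck imputation draws and a 1-D 2-means split, rather than the paper's probability imputation model): each imputation run produces a clustering, and repeated runs are summarized in a co-assignment matrix, which is invariant to cluster-label switching across runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: two well-separated groups, with some values missing.
true_vals = np.concatenate([rng.normal(0, 0.3, 20), rng.normal(5, 0.3, 20)])
x = true_vals.copy()
x[rng.choice(40, 6, replace=False)] = np.nan

def impute_and_cluster(x, rng):
    """One imputation draw (hot-deck from observed values) followed by a
    simple 1-D 2-means clustering."""
    x = x.copy()
    m = np.isnan(x)
    x[m] = rng.choice(x[~m], size=m.sum())   # crude hot-deck draw
    centers = np.array([x.min(), x.max()])
    for _ in range(10):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([x[labels == k].mean() for k in (0, 1)])
    return labels

# Consensus across imputations: how often does each pair of subjects
# land in the same cluster?
n, runs = len(x), 10
co = np.zeros((n, n))
for s in range(runs):
    lab = impute_and_cluster(x, np.random.default_rng(s))
    co += lab[:, None] == lab[None, :]
consensus = co / runs
```

    High off-diagonal consensus values indicate pairs that cluster together regardless of how their missing entries were imputed, which is the stability the consensus strategy is after.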

  6. Reverse engineering gene regulatory networks from measurement with missing values.

    PubMed

    Ogundijo, Oyetunji E; Elmas, Abdulkadir; Wang, Xiaodong

    2016-12-01

    Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: either the expression values of some genes at some time points, or the entire expression values of a single time point or of some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take as input the complete matrix of gene expression measurements. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for the inference of gene regulatory networks from gene expression data with missing values. We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule, and the central difference rule, for computing the related posteriors. The proposed algorithm is evaluated with satisfactory results for synthetic networks, in silico networks released as part of the DREAM project, and a real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of the yeast Saccharomyces cerevisiae. PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence a more accurate prediction of the underlying GRN, compared to conventional Gaussian approximation (GA) filters that ignore the missing data points.
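
    The unscented transform named above can be sketched for the scalar case: a Gaussian is represented by weighted sigma points, which are propagated through a nonlinearity and reweighted to approximate the output mean and variance. This is a generic UT sketch, not the authors' PBGA implementation; for a linear function the transform is exact, which makes it easy to check.

```python
import numpy as np

def unscented_transform(mean, var, f, kappa=2.0):
    """Propagate a 1-D Gaussian N(mean, var) through f using the
    unscented transform (3 sigma points for dimension n = 1)."""
    n = 1
    spread = np.sqrt((n + kappa) * var)
    sigma_pts = np.array([mean, mean + spread, mean - spread])
    weights = np.array([kappa / (n + kappa),
                        0.5 / (n + kappa),
                        0.5 / (n + kappa)])
    y = f(sigma_pts)
    y_mean = np.dot(weights, y)
    y_var = np.dot(weights, (y - y_mean) ** 2)
    return y_mean, y_var

# For linear f(x) = 2x + 1 the result is exact: mean 2*1+1 = 3, var 4*0.25 = 1.
m_out, v_out = unscented_transform(1.0, 0.25, lambda x: 2 * x + 1)
```

    Inside a filter, the same propagation step is applied to the state-transition and measurement functions in turn, with missing measurements handled by the modified measurement model the abstract describes.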

  7. 29 CFR 4050.11 - Limitations.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... missing participants. (b) Limitation on benefit value. The total actuarial present value of all benefits... Relating to Labor (Continued) PENSION BENEFIT GUARANTY CORPORATION PLAN TERMINATIONS MISSING PARTICIPANTS § 4050.11 Limitations. (a) Exclusive benefit. The benefits provided for under this part will be the only...

  8. A Review of Missing Data Handling Methods in Education Research

    ERIC Educational Resources Information Center

    Cheema, Jehanzeb R.

    2014-01-01

    Missing data are a common occurrence in survey-based research studies in education, and the way missing values are handled can significantly affect the results of analyses based on such data. Despite known problems with performance of some missing data handling methods, such as mean imputation, many researchers in education continue to use those…

  9. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 22 2012-07-01 2012-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  10. 77 FR 3147 - Approval and Promulgation of Air Quality Implementation Plans; Delaware, New Jersey, and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-23

    ... monitors with missing data. Maximum recorded values are substituted for the missing data. The resulting... which the incomplete site is missing data. The linear regression relationship is based on time periods... between the monitors is used to fill in missing data for the incomplete monitor, so that the normal data...
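
    The substitution scheme described in this notice, fitting a linear regression between a complete monitor and an incomplete one over their common reporting periods and using it to fill the gaps, can be sketched on synthetic readings (the monitor values and gap pattern below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two collocated monitors; the second has ~15% of readings missing.
complete = rng.normal(50, 10, size=100)
incomplete_true = 0.9 * complete + 5 + rng.normal(0, 2, size=100)
gap = rng.random(100) < 0.15
y = incomplete_true.copy()
y[gap] = np.nan

# Fit the regression only on periods where both monitors report...
obs = ~np.isnan(y)
slope, intercept = np.polyfit(complete[obs], y[obs], 1)

# ...then fill the incomplete monitor's gaps from the complete monitor.
y[gap] = slope * complete[gap] + intercept
```

    Because the fill values come from the between-monitor relationship rather than from the incomplete monitor's own history, the method works even for extended outages, which is why the notice pairs it with maximum-value substitution as a fallback.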

  11. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 21 2011-07-01 2011-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  12. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(2), a complete record of all measured parameters... process data or data used for accounting purposes. (b) For missing values related to the CaO and MgO...

  13. 40 CFR 98.195 - Procedures for estimating missing data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 22 2013-07-01 2013-07-01 false Procedures for estimating missing data... estimating missing data. For the procedure in § 98.193(b)(1), a complete record of all measured parameters... available process data or data used for accounting purposes. (b) For missing values related to the CaO and...

  14. Daily reference crop evapotranspiration with reduced data sets in the humid environments of Azores islands using estimates of actual vapor pressure, solar radiation, and wind speed

    NASA Astrophysics Data System (ADS)

    Paredes, P.; Fontes, J. C.; Azevedo, E. B.; Pereira, L. S.

    2017-11-01

    Reference crop evapotranspiration (ETo) estimation with the FAO Penman-Monteith equation (PM-ETo) requires a set of weather data including maximum and minimum air temperatures (T_max, T_min), actual vapor pressure (e_a), solar radiation (R_s), and wind speed (u_2). However, these data are often unavailable, or data sets are incomplete due to missing values. A set of procedures was proposed in FAO56 (Allen et al. 1998) to overcome these limitations, and this study assesses their accuracy for estimating daily ETo in the humid climate of the Azores islands. Results show that after locally and seasonally calibrating the temperature adjustment factor a_d used to compute dew point temperature (T_dew) from mean temperature, ETo estimates showed small bias and small RMSE, ranging from 0.15 to 0.53 mm day-1. When R_s data are missing, estimating them from the temperature difference (T_max - T_min) with a locally and seasonally calibrated radiation adjustment coefficient (k_Rs) yielded highly accurate ETo estimates, with RMSE averaging 0.41 mm day-1 and ranging from 0.33 to 0.58 mm day-1. If wind speed observations are missing, the default u_2 = 2 m s-1 (or 3 m s-1 for weather measurements over clipped grass at airports) proved appropriate even for windy locations (u_2 > 4 m s-1), with RMSE < 0.36 mm day-1. The appropriateness of the procedures for estimating missing values of e_a, R_s, and u_2 was thus confirmed.
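
    The missing-R_s procedure referenced here is, to my understanding, the Hargreaves radiation formula of FAO56: R_s = k_Rs * sqrt(T_max - T_min) * R_a, where R_a is extraterrestrial radiation. A direct transcription with illustrative input values (the R_a figure below is invented for the example, not taken from the study):

```python
import math

def solar_radiation_from_temperature(t_max, t_min, ra, k_rs=0.16):
    """Estimate solar radiation R_s (same units as ra) from the daily
    temperature range when R_s measurements are missing.
    k_rs is roughly 0.16 for interior and 0.19 for coastal locations;
    the study above calibrates it locally and seasonally."""
    return k_rs * math.sqrt(t_max - t_min) * ra

# Example day: T_max = 24 C, T_min = 15 C, R_a = 25 MJ m-2 day-1 (illustrative).
rs = solar_radiation_from_temperature(t_max=24.0, t_min=15.0, ra=25.0, k_rs=0.19)
```

    The temperature range acts as a proxy for cloudiness: clear-sky days have large diurnal swings, so the estimated R_s rises with T_max - T_min.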

  15. Examining solutions to missing data in longitudinal nursing research.

    PubMed

    Roberts, Mary B; Sullivan, Mary C; Winchester, Suzy B

    2017-04-01

    Longitudinal studies are highly valuable in pediatrics because they provide useful data about developmental patterns of child health and behavior over time. When data are missing, the value of the research is impacted. The study's purpose was to (1) introduce a three-step approach to assess and address missing data and (2) illustrate this approach using categorical and continuous-level variables from a longitudinal study of premature infants. A three-step approach with simulations was followed to assess the amount and pattern of missing data and to determine the most appropriate imputation method for the missing data. Patterns of missingness were Missing Completely at Random, Missing at Random, and Not Missing at Random. Missing continuous-level data were imputed using mean replacement, stochastic regression, multiple imputation, and fully conditional specification (FCS). Missing categorical-level data were imputed using last value carried forward, hot-decking, stochastic regression, and FCS. Simulations were used to evaluate these imputation methods under different patterns of missingness at different levels of missing data. The rate of missingness was 16-23% for continuous variables and 1-28% for categorical variables. FCS imputation provided the least difference in mean and standard deviation estimates for continuous measures. FCS imputation was acceptable for categorical measures. Results obtained through simulation reinforced and confirmed these findings. Significant investments are made in the collection of longitudinal data. The prudent handling of missing data can protect these investments and potentially improve the scientific information contained in pediatric longitudinal studies. © 2017 Wiley Periodicals, Inc.
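
    Last value carried forward, one of the categorical imputation methods compared above, can be sketched in a few lines (the visit labels are invented for illustration):

```python
def last_value_carried_forward(series, missing=None):
    """Fill gaps in a longitudinal categorical series with the most
    recently observed value; leading gaps stay missing."""
    filled, last = [], missing
    for v in series:
        if v is missing:
            filled.append(last)
        else:
            filled.append(v)
            last = v
    return filled

visits = ["mild", None, "moderate", None, None, "severe"]
filled = last_value_carried_forward(visits)
# → ["mild", "mild", "moderate", "moderate", "moderate", "severe"]
```

    Its simplicity is also its weakness: it assumes the subject's state was unchanged across the gap, which is one reason the study found FCS imputation preferable for categorical measures.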

  16. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.

  17. Method variation in the impact of missing data on response shift detection.

    PubMed

    Schwartz, Carolyn E; Sajobi, Tolulope T; Verdam, Mathilde G E; Sebille, Veronique; Lix, Lisa M; Guilleux, Alice; Sprangers, Mirjam A G

    2015-03-01

    Missing data due to attrition or item non-response can result in biased estimates and loss of power in longitudinal quality-of-life (QOL) research. The impact of missing data on response shift (RS) detection is relatively unknown. This overview article synthesizes the findings of three methods tested in this special section regarding the impact of missing data patterns on RS detection in incomplete longitudinal data. The RS detection methods investigated include: (1) relative importance analysis to detect reprioritization RS in stroke caregivers; (2) Oort's structural equation modeling (SEM) to detect recalibration, reprioritization, and reconceptualization RS in cancer patients; and (3) Rasch-based item response theory (IRT) models, compared with SEM models, to detect recalibration and reprioritization RS in hospitalized chronic disease patients. Each method dealt with missing data differently: with imputation (1), attrition-based multi-group analysis (2), or probabilistic analysis that is robust to missingness owing to the specific objectivity property (3). Relative importance analyses were sensitive to the type and amount of missing data and to the imputation method, with multiple imputation showing the largest RS effects. The attrition-based multi-group SEM revealed differential effects of both the changes in health-related QOL and the occurrence of response shift by attrition stratum, and enabled a more complete interpretation of findings. The IRT RS algorithm found evidence of small recalibration and reprioritization effects in General Health, whereas SEM mostly evidenced small recalibration effects. These differences may be due to differences between the two methods in the handling of missing data. Missing data imputation techniques result in different conclusions about the presence of reprioritization RS using the relative importance method, while the attrition-based SEM approach highlighted different recalibration and reprioritization RS effects by attrition group. The IRT analyses detected more recalibration and reprioritization RS effects than SEM, presumably due to IRT's robustness to missing data. Future research should apply simulation techniques in order to make conclusive statements about the impact of missing data according to the type and amount of RS.

  18. A novel application of the Intent to Attend assessment to reduce bias due to missing data in a randomized controlled clinical trial

    PubMed Central

    Rabideau, Dustin J; Nierenberg, Andrew A; Sylvia, Louisa G; Friedman, Edward S.; Bowden, Charles L.; Thase, Michael E.; Ketter, Terence; Ostacher, Michael J.; Reilly-Harrington, Noreen; Iosifescu, Dan V.; Calabrese, Joseph R.; Leon, Andrew C.; Schoenfeld, David A

    2014-01-01

    Background Missing data are unavoidable in most randomized controlled clinical trials, especially when measurements are taken repeatedly. If strong assumptions about the missing data are not accurate, crude statistical analyses are biased and can lead to false inferences. Furthermore, if we fail to measure all predictors of missing data, we may not be able to model the missing data process sufficiently. In longitudinal randomized trials, measuring a patient's intent to attend future study visits may help to address both of these problems. Leon et al. developed and included the Intent to Attend assessment in the Lithium Treatment—Moderate dose Use Study (LiTMUS), aiming to remove bias due to missing data from the primary study hypothesis [1]. Purpose The purpose of this study is to assess the performance of the Intent to Attend assessment with regard to its use in a sensitivity analysis of missing data. Methods We fit marginal models to assess whether a patient's self-rated intent predicted actual study adherence. We applied inverse probability of attrition weighting (IPAW) coupled with patient intent to assess whether there existed treatment group differences in response over time. We compared the IPAW results to those obtained using other methods. Results Patient-rated intent predicted missed study visits, even when adjusting for other predictors of missing data. On average, the hazard of retention increased by 19% for every one-point increase in intent. We also found that more severe mania, male gender, and a previously missed visit predicted subsequent absence. Although we found no difference in response between the randomized treatment groups, IPAW increased the estimated group difference over time. Limitations LiTMUS was designed to limit missed study visits, which may have attenuated the effects of adjusting for missing data. Additionally, IPAW can be less efficient and less powerful than maximum likelihood or Bayesian estimators, given that the parametric model is well-specified. Conclusions In LiTMUS, the Intent to Attend assessment predicted missed study visits. This item was incorporated into our IPAW models and helped reduce bias due to informative missing data. This analysis should both encourage and facilitate future use of the Intent to Attend assessment along with IPAW to address missing data in a randomized trial. PMID:24872362
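    The core of IPAW is simple once the retention model is fitted: each observed visit is up-weighted by the inverse of its predicted probability of being observed. A minimal sketch of the (stabilized) weight computation, assuming per-visit retention probabilities have already been estimated from a model that includes the Intent to Attend item; the function name and interface are illustrative, not from the paper:

```python
def ipaw_weights(attend_probs, marginal_prob):
    """Stabilized inverse-probability-of-attrition weights for one patient:
    w_t = P(attend through t) / p_t(attend through t), where the denominator
    comes from a covariate model (e.g. including self-rated intent) and the
    numerator from the marginal retention rate. Cumulative products handle
    monotone attrition over repeated visits."""
    weights, cum_p, cum_marg = [], 1.0, 1.0
    for p in attend_probs:
        cum_p *= p          # model-predicted probability of still being observed
        cum_marg *= marginal_prob
        weights.append(cum_marg / cum_p)
    return weights
```

    A patient whose model (driven by low intent scores) predicts a low chance of attending receives a weight above 1, so the patients who do attend stand in for similar patients who dropped out.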

  19. NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES

    PubMed Central

    He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.

    2017-01-01

    Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225

  20. A study on the mortality patterns of missing and deceased persons with dementia who died due to wandering.

    PubMed

    Kikuchi, Kazunori; Ijuin, Mutsuo; Awata, Shuichi; Suzuki, Takao

    2016-01-01

    To clarify the mortality patterns derived from differences in the causes of death, and thereby to promote search activity and prevent the death of missing persons. The Ministry of Health, Labour and Welfare (MHLW) performed a mail survey using a self-administered questionnaire. The subjects of this survey were the families of all 388 deceased dementia patients from among the missing-person reports involving dementia patients (10,322 missing persons with dementia or suspected dementia) submitted to the police in 2013. The survey was conducted from January 5 to February 2, 2015. We analyzed the data provided by the MHLW on 61 cases in which the cause of death was recorded; the factors related to the differences in the causes of death were examined using a chi-squared test (Fisher's direct method) and a residual analysis. Based on previous studies, we classified the causes of death into three categories: "drowning," "hypothermia," and "others (e.g., traumatic injury, disease progression)." When the cause of death was hypothermia, death often occurred three to four days after the deceased individual went missing. A significantly higher number of patients who died of other causes were found to have died on the day that they went missing. More than 40% of the drowning cases occurred on the day that the deceased individual went missing. We identified three patterns of mortality: (1) death on the day that the deceased individual went missing, due to traumatic injury, disease progression, drowning, and other causes; (2) death due to hypothermia within a few days after the deceased individual went missing; and (3) patterns other than (1) and (2).

  1. 29 CFR 4050.8 - Automatic lump sum.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... present value (determined as of the deemed distribution date under the missing participant lump sum... Relating to Labor (Continued) PENSION BENEFIT GUARANTY CORPORATION PLAN TERMINATIONS MISSING PARTICIPANTS § 4050.8 Automatic lump sum. This section applies to a missing participant whose designated benefit was...

  2. Handling missing Mini-Mental State Examination (MMSE) values: Results from a cross-sectional long-term-care study.

    PubMed

    Godin, Judith; Keefe, Janice; Andrew, Melissa K

    2017-04-01

    Missing values are commonly encountered on the Mini Mental State Examination (MMSE), particularly when administered to frail older people. This presents challenges for MMSE scoring in research settings. We sought to describe missingness in MMSEs administered in long-term-care facilities (LTCF) and to compare and contrast approaches to dealing with missing items. As part of the Care and Construction project in Nova Scotia, Canada, LTCF residents completed an MMSE. Different methods of dealing with missing values (e.g., use of raw scores, raw scores/number of items attempted, scale-level multiple imputation [MI], and blended approaches) are compared to item-level MI. The MMSE was administered to 320 residents living in 23 LTCF. The sample was predominantly female (73%), and 38% of participants were aged >85 years. At least one item was missing from 122 (38.2%) of the MMSEs. Data were not Missing Completely at Random (MCAR), χ²(1110) = 1,351, p < 0.001. Using raw scores for those missing <6 items in combination with scale-level MI resulted in the regression coefficients and standard errors closest to item-level MI. Patterns of missing items often suggest systematic problems, such as trouble with manual dexterity, literacy, or visual impairment. While these observations may be relatively easy to take into account in clinical settings, non-random missingness presents challenges for research and must be considered in statistical analyses. We present suggestions for dealing with missing MMSE data based on the extent of missingness and the goal of analyses. Copyright © 2016 The Authors. Production and hosting by Elsevier B.V. All rights reserved.
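    One of the compared strategies, scoring the raw total over the items actually attempted, amounts to prorating to the full 30 points. A minimal sketch, assuming a per-item representation in which unattempted items are None; the function name and item layout are illustrative, and the article's preferred blend (raw scores when <6 items are missing, plus scale-level MI) is not reproduced here:

```python
def prorated_mmse(item_scores, max_scores):
    """Prorate an MMSE total over attempted items: points earned divided by
    points attempted, rescaled to the full 30-point range. Unattempted items
    are None. Returns None if nothing was attempted."""
    attempted = [(s, m) for s, m in zip(item_scores, max_scores) if s is not None]
    earned = sum(s for s, _ in attempted)
    possible = sum(m for _, m in attempted)
    if possible == 0:
        return None
    return round(30 * earned / possible, 1)
```

    Prorating assumes the skipped items would have been answered at the same rate as the attempted ones, which is exactly what fails when missingness is systematic (e.g. dexterity or vision problems), hence the article's interest in imputation-based alternatives.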

  3. 40 CFR 98.295 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... value shall be the best available estimate(s) of the parameter(s), based on all available process data or data used for accounting purposes. (c) For each missing value collected during the performance test (hourly CO2 concentration, stack gas volumetric flow rate, or average process vent flow from mine...

  4. On Obtaining Estimates of the Fraction of Missing Information from Full Information Maximum Likelihood

    ERIC Educational Resources Information Center

    Savalei, Victoria; Rhemtulla, Mijke

    2012-01-01

    Fraction of missing information [lambda][subscript j] is a useful measure of the impact of missing data on the quality of estimation of a particular parameter. This measure can be computed for all parameters in the model, and it communicates the relative loss of efficiency in the estimation of a particular parameter due to missing data. It has…

  5. Examining Solutions to Missing Data in Longitudinal Nursing Research

    PubMed Central

    Roberts, Mary B.; Sullivan, Mary C.; Winchester, Suzy B.

    2017-01-01

    Purpose Longitudinal studies are highly valuable in pediatrics because they provide useful data about developmental patterns of child health and behavior over time. When data are missing, the value of the research is impacted. The study’s purpose was to: (1) introduce a three-step approach to assess and address missing data; (2) illustrate this approach using categorical and continuous-level variables from a longitudinal study of premature infants. Methods A three-step approach with simulations was followed to assess the amount and pattern of missing data and to determine the most appropriate imputation method for the missing data. Patterns of missingness were Missing Completely at Random, Missing at Random, and Not Missing at Random. Missing continuous-level data were imputed using mean replacement, stochastic regression, multiple imputation, and fully conditional specification. Missing categorical-level data were imputed using last value carried forward, hot-decking, stochastic regression, and fully conditional specification. Simulations were used to evaluate these imputation methods under different patterns of missingness at different levels of missing data. Results The rate of missingness was 16–23% for continuous variables and 1–28% for categorical variables. Fully conditional specification imputation provided the least difference in mean and standard deviation estimates for continuous measures. Fully conditional specification imputation was acceptable for categorical measures. Results obtained through simulation reinforced and confirmed these findings. Practice Implications Significant investments are made in the collection of longitudinal data. The prudent handling of missing data can protect these investments and potentially improve the scientific information contained in pediatric longitudinal studies. PMID:28425202

  6. Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    PubMed

    Taylor, Sandra L; Leiserowitz, Gary S; Kim, Kyoungmi

    2013-12-01

    Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, the data generated by mass spectrometry contain many missing values, which result when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare the power and estimation of a mixture model to those of an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point-mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, while estimates were unbiased with the mixture model except when all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.

  7. Simulation-based sensitivity analysis for non-ignorably missing data.

    PubMed

    Yin, Peng; Shi, Jian Q

    2017-01-01

    Sensitivity analysis is popular for dealing with missing data problems, particularly for non-ignorable missingness, where the full-likelihood method cannot be adopted. It analyses how sensitively the conclusions (output) depend on assumptions or parameters (input) about the missing data, i.e. the missing data mechanism. We call models subject to this uncertainty sensitivity models. To make conventional sensitivity analysis more useful in practice, we need to define simple and interpretable statistical quantities to assess the sensitivity models and support evidence-based analysis. In this paper we propose a novel approach that investigates the plausibility of each missing data mechanism model assumption by comparing simulated datasets from various MNAR models with the observed data non-parametrically, using K-nearest-neighbour distances. Some asymptotic theory is also provided. A key step of this method is to plug in a plausibility evaluation system for each sensitivity parameter, to select plausible values and reject unlikely ones, instead of considering all proposed values of the sensitivity parameters as in the conventional sensitivity analysis method. The method is generic and has been applied successfully to several specific models in this paper, including a meta-analysis model with publication bias, analysis of incomplete longitudinal data, and mean estimation with non-ignorable missing data.
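    The nearest-neighbour comparison at the heart of this approach is easy to illustrate in one dimension: data simulated under a plausible MNAR model should sit close to the observed data. A toy Python sketch of the distance summary only (the paper's full method, including the plausibility evaluation system and asymptotics, is much richer; the function name is ours):

```python
def mean_nn_distance(observed, simulated):
    """Mean distance from each observed point to its nearest simulated point
    (1-nearest-neighbour, 1-D). Smaller values suggest the MNAR model that
    generated `simulated` is more compatible with the observed data."""
    return sum(min(abs(o - s) for s in simulated) for o in observed) / len(observed)
```

    Comparing this statistic across candidate sensitivity-parameter values lets one rank MNAR assumptions by plausibility instead of treating them all as equally likely.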

  8. 29 CFR 4050.6 - Payment and required documentation.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... MISSING PARTICIPANTS § 4050.6 Payment and required documentation. (a) Time of payment and filing. The plan... administrator and the plan's enrolled actuary) specified in the missing participant forms and instructions, by the time the post-distribution certification is due. Except as otherwise provided in the missing...

  9. 29 CFR 4050.6 - Payment and required documentation.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... MISSING PARTICIPANTS § 4050.6 Payment and required documentation. (a) Time of payment and filing. The plan... administrator and the plan's enrolled actuary) specified in the missing participant forms and instructions, by the time the post-distribution certification is due. Except as otherwise provided in the missing...

  10. Making the most of missing values : object clustering with partial data in astronomy

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Laidler, Victoria G.

    2004-01-01

    We demonstrate a clustering analysis algorithm, KSC, that a) uses all observed values and b) does not discard the partially observed objects. KSC uses soft constraints defined by the fully observed objects to assist in the grouping of objects with missing values. We present an analysis of objects taken from the Sloan Digital Sky Survey to demonstrate how imputing the values can be misleading and why the KSC approach can produce more appropriate results.

  11. A meta-data based method for DNA microarray imputation.

    PubMed

    Jörnsten, Rebecka; Ouyang, Ming; Wang, Hui-Yu

    2007-03-29

    DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of the microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting. We download all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally a lot less than that of non-significant ones. Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.

  12. Incomplete Data in Smart Grid: Treatment of Values in Electric Vehicle Charging Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Majipour, Mostafa; Chu, Peter; Gadh, Rajit

    2014-11-03

    In this paper, five imputation methods, namely Constant (zero), Mean, Median, Maximum Likelihood, and Multiple Imputation, have been applied to compensate for missing values in Electric Vehicle (EV) charging data. The outcome of each of these methods has been used as the input to a prediction algorithm to forecast the EV load over the next 24 hours at each individual outlet. The data are real-world data at the outlet level from the UCLA campus parking lots. Given the sparsity of the data, both Median and Constant (zero) imputation improved the prediction results. Since in most missing-value cases in our database all values of an instance are missing, the multivariate imputation methods did not improve the results significantly compared to univariate approaches.
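    The three univariate schemes that the paper finds most useful for this data are one-liners. A minimal sketch, assuming missing readings are represented as None in an hourly load series; the function name is illustrative and the paper's Maximum Likelihood and Multiple Imputation variants are omitted:

```python
import statistics

def impute(series, method="median"):
    """Fill missing (None) entries in an hourly charging-load series with a
    univariate constant. 'constant' uses zero, matching the intuition that a
    missing outlet reading often corresponds to no load."""
    observed = [x for x in series if x is not None]
    fill = {"constant": 0.0,
            "mean": statistics.mean(observed),
            "median": statistics.median(observed)}[method]
    return [fill if x is None else x for x in series]
```

    Because whole instances are usually missing at once in this data set, multivariate methods have nothing to condition on, which is why these simple fills perform competitively.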

  13. Estimating monthly streamflow values by cokriging

    USGS Publications Warehouse

    Solow, A.R.; Gorelick, S.M.

    1986-01-01

    Cokriging is applied to estimation of missing monthly streamflow values in three records from gaging stations in west central Virginia. Missing values are estimated from optimal consideration of the pattern of auto- and cross-correlation among standardized residual log-flow records. Investigation of the sensitivity of estimation to data configuration showed that when observations are available within two months of a missing value, estimation is improved by accounting for correlation. Concurrent and lag-one observations tend to screen the influence of other available observations. Three models of covariance structure in residual log-flow records are compared using cross-validation. Models differ in how much monthly variation they allow in covariance. Precision of estimation, reflected in mean squared error (MSE), proved to be insensitive to this choice. Cross-validation is suggested as a tool for choosing an inverse transformation when an initial nonlinear transformation is applied to flow values. © 1986 Plenum Publishing Corporation.

  14. Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

    ERIC Educational Resources Information Center

    Pohl, Steffi; Gräfe, Linda; Rose, Norman

    2014-01-01

    Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…

  15. Reuse of imputed data in microarray analysis increases imputation efficiency

    PubMed Central

    Kim, Ki-Yeol; Kim, Byoung-Jin; Yi, Gwan-Su

    2004-01-01

    Background The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the imputed values had not been fully checked. Results We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. This imputes the missing values sequentially, starting from the gene with the fewest missing values, and reuses the imputed values for later imputations. Despite this reuse of imputed values, the new method's accuracy and computational complexity are greatly improved over the conventional KNN-based method and other methods based on maximum likelihood estimation. SKNN performed particularly well relative to other imputation methods on data with high missing rates and a large number of experiments. Applying Expectation Maximization (EM) to the SKNN method improved the accuracy, but increased computation time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed accuracy similar to the SKNN method, with slightly higher dependency on the type of data set. Conclusions Sequential reuse of imputed data in KNN-based imputation greatly increases the efficiency of imputation. The SKNN method should be practically useful for salvaging microarray experiments that have high numbers of missing entries, and it generates reliable imputed values that can be used for further cluster-based analysis of microarray data. PMID:15504240
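    The sequential-reuse idea can be sketched compactly: impute rows in order of increasing missingness, and let each imputed row join the reference pool for later rows. This is an illustrative Python sketch of the scheme described in the abstract, not the authors' implementation; distances here use only the columns observed in both rows, and the function name is ours:

```python
import math

def sknn_impute(matrix, k=2):
    """Sequential KNN imputation sketch: rows (genes) with fewer missing
    values (None) are imputed first; once imputed, a row is reused as a
    neighbor candidate for later rows."""
    rows = [list(r) for r in matrix]
    order = sorted(range(len(rows)), key=lambda i: sum(v is None for v in rows[i]))
    pool = [i for i in order if not any(v is None for v in rows[i])]  # complete rows
    for i in order:
        missing = [j for j, v in enumerate(rows[i]) if v is None]
        if not missing:
            continue
        def dist(c):
            # Euclidean distance over columns observed in both rows
            shared = [(rows[i][j], rows[c][j]) for j in range(len(rows[i]))
                      if rows[i][j] is not None and rows[c][j] is not None]
            return math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        neighbours = sorted(pool, key=dist)[:k]
        for j in missing:
            rows[i][j] = sum(rows[c][j] for c in neighbours) / len(neighbours)
        pool.append(i)  # sequential reuse of the freshly imputed row
    return rows
```

    Growing the neighbor pool is what lets SKNN cope with high missing rates, where a plain KNN imputer would run out of fully observed reference genes.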

  16. Least-Squares Approximation of an Improper Correlation Matrix by a Proper One.

    ERIC Educational Resources Information Center

    Knol, Dirk L.; ten Berge, Jos M. F.

    1989-01-01

    An algorithm, based on a solution for C. I. Mosier's oblique Procrustes rotation problem, is presented for the best least-squares fitting correlation matrix approximating a given missing value or improper correlation matrix. Results are of interest for missing value and tetrachoric correlation, indefinite matrix correlation, and constrained…
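    The problem this entry addresses, turning an improper (indefinite) correlation matrix into a proper one, can be illustrated with a simpler alternating-projections heuristic: clip negative eigenvalues, then restore the unit diagonal. Note this sketch is Higham-style alternating projection, not Knol and ten Berge's least-squares Procrustes-based algorithm; it shows the goal of the method, not the method itself:

```python
import numpy as np

def nearest_correlation(a, n_iter=200):
    """Force an indefinite 'correlation' matrix to be a proper one by
    alternating two projections: onto the positive semidefinite cone
    (clip negative eigenvalues) and onto the set of unit-diagonal matrices."""
    x = np.array(a, dtype=float)
    for _ in range(n_iter):
        w, v = np.linalg.eigh(x)               # symmetric eigendecomposition
        x = (v * np.clip(w, 0, None)) @ v.T    # clip negative eigenvalues
        np.fill_diagonal(x, 1.0)               # restore the correlation diagonal
    return x
```

    Matrices like these arise from pairwise-deletion missing-value correlations or tetrachoric estimates, which is exactly the use case the abstract cites.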

  17. Missed rib fractures on evaluation of initial chest CT for trauma patients: pattern analysis and diagnostic value of coronal multiplanar reconstruction images with multidetector row CT.

    PubMed

    Cho, S H; Sung, Y M; Kim, M S

    2012-10-01

    The objective of this study was to review the prevalence and radiological features of rib fractures missed on initial chest CT evaluation, and to examine the diagnostic value of additional coronal images in a large series of trauma patients. 130 patients who presented to an emergency room for blunt chest trauma underwent multidetector row CT of the thorax within the first hour during their stay, and had follow-up CT or bone scans as diagnostic gold standards. Images were evaluated on two separate occasions: once with axial images and once with both axial and coronal images. The detection rates of missed rib fractures were compared between readings using a non-parametric method for clustered data. In the cases of missed rib fractures, the shapes, locations and associated fractures were evaluated. 58 rib fractures were missed with axial images only and 52 were missed with both axial and coronal images (p=0.088). The most common shape of missed rib fractures was buckled (56.9%), and the anterior arc (55.2%) was most commonly involved. 21 (36.2%) missed rib fractures had combined fractures on the same ribs, and 38 (65.5%) were accompanied by fracture on neighbouring ribs. Missed rib fractures are not uncommon, and radiologists should be familiar with buckle fractures, which are frequently missed. Additional coronal images can be helpful in the diagnosis of rib fractures that are not seen on axial images.

  18. [Study on correction of data bias caused by different missing mechanisms in survey of medical expenditure among students enrolling in Urban Resident Basic Medical Insurance].

    PubMed

    Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong

    2015-05-01

    A study of medical expenditure and its influencing factors among students enrolled in Urban Resident Basic Medical Insurance (URBMI) in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing mechanism, a two-stage method handling both missing mechanisms simultaneously is suggested in this study, combining multiple imputation with a sample selection model. A total of 1,190 questionnaires were returned by the students (or their parents) selected in child-care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. In the returned questionnaires, 2.52% of the dependent-variable values were not missing at random (NMAR) and 7.14% were missing at random (MAR). First, multiple imputation was conducted for the MAR values using the completed data; then a sample selection model was used to correct for NMAR in the multiple imputation, and a model of multiple influencing factors was established. Based on 1,000 resamples, the best scheme for filling the randomly missing values was the predictive mean matching (PMM) method at the observed missing proportion. With this optimal scheme, the two-stage analysis was conducted. Finally, it was found that the influencing factors on annual medical expenditure among students enrolled in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in a hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for some reason, self-medication, and the acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can effectively correct non-response bias and selection bias in the dependent variable of survey data.

  19. Missing data imputation: focusing on single imputation.

    PubMed

    Zhang, Zhongheng

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information will be omitted from analysis. Therefore, many imputation methods have been developed to fill this gap. The present article focuses on single imputation. Imputation with the mean, median or mode is simple but, like complete case analysis, can bias the mean and the standard deviation; furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
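    The article's central contrast, mean-type fills versus regression imputation, is easy to demonstrate. The article itself uses R; this is an analogous sketch in Python (function name ours), showing a deterministic regression imputation of a variable y from a fully observed covariate x, fitted on the complete pairs only:

```python
def regression_impute(x, y):
    """Single (deterministic) regression imputation: fill missing y values
    (None) from a fully observed covariate x via simple least squares on the
    complete pairs. Unlike mean/median fills, this preserves the x-y relation,
    though it still understates variability (no stochastic residual added)."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    intercept = my - slope * mx
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]
```

    Adding a random draw from the residual distribution to each filled value would turn this into stochastic regression imputation, one of the refinements the article discusses.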

  20. Missing data imputation: focusing on single imputation

    PubMed Central

    2016-01-01

Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and useful information is omitted from the analysis. Therefore, many imputation methods have been developed to bridge this gap. The present article focuses on single imputation. Imputation with the mean, median or mode is simple but, like complete case analysis, can bias estimates of the mean and standard deviation; furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many more sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations. PMID:26855945

  1. Decoding the Effect of Isobaric Substitutions on Identifying Missing Proteins and Variant Peptides in Human Proteome.

    PubMed

    Choong, Wai-Kok; Lih, Tung-Shing Mamie; Chen, Yu-Ju; Sung, Ting-Yi

    2017-12-01

To confirm the existence of missing proteins, we need to identify at least two unique peptides 9-40 amino acids long from a missing protein in bottom-up mass-spectrometry-based proteomic experiments. However, an identified unique peptide of a missing protein, even one identified with a high level of confidence, could coincide with a peptide of a commonly observed protein due to isobaric substitutions, mass modifications, alternative splice isoforms, or single amino acid variants (SAAVs). Besides unique peptides of missing proteins, identified variant peptides (SAAV-containing peptides) could also map to peptides of other proteins due to the aforementioned issues. Therefore, we conducted a thorough comparative analysis on data sets in the PeptideAtlas Tiered Human Integrated Search Proteome (THISP, 2017-03 release), including neXtProt (2017-01 release), to systematically investigate how unique peptides of missing proteins (PE2-4), unique peptides of dubious proteins, and variant peptides are affected by isobaric substitutions, causing doubtful identification results. In this study, we considered 11 isobaric substitutions. From our analysis, we found that <5% of the unique peptides of missing proteins and >6% of the variant peptides became shared with peptides of PE1 proteins after isobaric substitutions.
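The ambiguity the authors quantify can be illustrated by comparing peptide monoisotopic masses: Leu/Ile and Gly-Gly/Asn are classic isobaric cases, while Gln/Lys is near-isobaric. A simplified sketch (standard monoisotopic residue masses; the tolerance is an arbitrary assumption, and real search engines are far more sophisticated):

```python
# Monoisotopic residue masses (Da) for a few amino acids.
MASS = {
    'G': 57.02146, 'A': 71.03711, 'N': 114.04293, 'Q': 128.05858,
    'K': 128.09496, 'L': 113.08406, 'I': 113.08406, 'R': 156.10111,
}

def peptide_mass(seq):
    # Sum of residue masses plus one water for the peptide termini.
    return sum(MASS[a] for a in seq) + 18.01056

def isobaric(p1, p2, tol=0.01):
    """True if two peptides are indistinguishable by mass within tol Da."""
    return abs(peptide_mass(p1) - peptide_mass(p2)) < tol

print(isobaric('ALK', 'AIK'))             # L<->I: exactly isobaric, True
print(isobaric('AGGK', 'ANK'))            # GG<->N: True (differ by ~1e-5 Da)
print(isobaric('AQK', 'AKK', tol=0.001))  # Q vs K: ~0.036 Da apart, False
```

Note that the Gly-Gly/Asn case changes the peptide length, which is why such substitutions can silently map a "unique" peptide onto a different protein's sequence.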

  2. What are we missing? Scope 3 greenhouse gas emissions accounting in the metals and minerals industry

    NASA Astrophysics Data System (ADS)

    Greene, Suzanne E.

    2018-05-01

Metal and mineral companies have significant greenhouse gas emissions in their upstream and downstream value chains due to outsourced extraction, beneficiation and transportation activities, depending on a firm's business model. While many companies are moving towards more transparent reporting of corporate greenhouse gas emissions, value chain emissions remain difficult to capture, particularly in the global supply chain. Incomplete reports make it difficult for companies to track emissions reduction goals or implement sustainable supply chain improvements, especially for commodity products that form the base of many other sectors' value chains. Using voluntarily reported CDP data, this paper sheds light on hotspots in value chain emissions for individual metal and mineral companies, and for the sector as a whole. The state of value chain emissions reporting for the industry is discussed in general, with a focus on where emissions may be underestimated and how estimates could be improved.

  3. Comparison of methods for dealing with missing values in the EPV-R.

    PubMed

    Paniagua, David; Amor, Pedro J; Echeburúa, Enrique; Abad, Francisco J

    2017-08-01

The development of an effective instrument to assess the risk of partner violence is a topic of great social relevance. This study evaluates the “Predicción del Riesgo de Violencia Grave Contra la Pareja” –Revisada– (EPV-R, Severe Intimate Partner Violence Risk Prediction Scale-Revised), a tool developed in Spain, which faces the problem, common to this type of scale, of how to treat a high rate of missing values. First, responses to the EPV-R in a sample of 1,215 male abusers who were reported to the police were used to analyze the patterns of occurrence of missing values, as well as the factor structure. Second, we analyzed the performance of various imputation methods using simulated data that emulate the missing-data mechanism found in the empirical database. The imputation procedure originally proposed by the authors of the scale provides acceptable results, although a method based on Item Response Theory could provide greater accuracy and offers some additional advantages. Item Response Theory appears to be a useful tool for imputing missing data in this type of questionnaire.

  4. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    PubMed

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
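Two of the four methods evaluated (mean and k-nearest neighbours) are available off the shelf in scikit-learn, so the comparison setup can be sketched on synthetic data (not the study's cardiac dataset; the structure of the toy data and the 20% missing rate are assumptions):

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(0)
full = rng.normal(size=(200, 4))
full[:, 3] = full[:, :3].sum(axis=1)       # column 3 depends on the others

data = full.copy()
mask = rng.random(data.shape) < 0.2        # ~20% missing completely at random
data[mask] = np.nan

mean_filled = SimpleImputer(strategy="mean").fit_transform(data)
knn_filled = KNNImputer(n_neighbors=5).fit_transform(data)

def rmse(a):
    # Error measured only on the entries that were actually deleted.
    return np.sqrt(np.mean((a[mask] - full[mask]) ** 2))

print(f"mean RMSE={rmse(mean_filled):.3f}  kNN RMSE={rmse(knn_filled):.3f}")
```

Because the ground truth is known here, imputation accuracy can be scored directly; in the study's setting the comparison instead ran through downstream classifier performance.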

  5. Tensor completion for estimating missing values in visual data.

    PubMed

    Liu, Ji; Musialski, Przemyslaw; Wonka, Peter; Ye, Jieping

    2013-01-01

    In this paper, we propose an algorithm to estimate missing values in tensors of visual data. The values can be missing due to problems in the acquisition process or because the user manually identified unwanted outliers. Our algorithm works even with a small amount of samples and it can propagate structure to fill larger missing regions. Our methodology is built on recent studies about matrix completion using the matrix trace norm. The contribution of our paper is to extend the matrix case to the tensor case by proposing the first definition of the trace norm for tensors and then by building a working algorithm. First, we propose a definition for the tensor trace norm that generalizes the established definition of the matrix trace norm. Second, similarly to matrix completion, the tensor completion is formulated as a convex optimization problem. Unfortunately, the straightforward problem extension is significantly harder to solve than the matrix case because of the dependency among multiple constraints. To tackle this problem, we developed three algorithms: simple low rank tensor completion (SiLRTC), fast low rank tensor completion (FaLRTC), and high accuracy low rank tensor completion (HaLRTC). The SiLRTC algorithm is simple to implement and employs a relaxation technique to separate the dependent relationships and uses the block coordinate descent (BCD) method to achieve a globally optimal solution; the FaLRTC algorithm utilizes a smoothing scheme to transform the original nonsmooth problem into a smooth one and can be used to solve a general tensor trace norm minimization problem; the HaLRTC algorithm applies the alternating direction method of multipliers (ADMMs) to our problem. Our experiments show potential applications of our algorithms and the quantitative evaluation indicates that our methods are more accurate and robust than heuristic approaches. 
The efficiency comparison indicates that FaLRTC and HaLRTC are more efficient than SiLRTC; between FaLRTC and HaLRTC, the former is more efficient for obtaining a low-accuracy solution and the latter is preferred if a high-accuracy solution is desired.
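The paper's tensor algorithms generalize matrix trace-norm (nuclear-norm) completion. For intuition, here is the matrix case via iterated singular-value soft-thresholding, a soft-impute-style sketch rather than SiLRTC/FaLRTC/HaLRTC themselves (the shrinkage level `tau` and iteration count are arbitrary assumptions):

```python
import numpy as np

def soft_impute(M, mask, tau=1.0, iters=300):
    """Matrix completion: alternately fill gaps with the current estimate
    and soft-threshold singular values (a trace-norm relaxation)."""
    X = np.zeros_like(M)
    for _ in range(iters):
        Z = np.where(mask, M, X)                  # observed entries stay fixed
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink toward low rank
    return np.where(mask, M, X)

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 30))  # rank-3 ground truth
mask = rng.random(A.shape) < 0.6                          # observe ~60%
completed = soft_impute(A, mask)
rel_err = np.linalg.norm((completed - A)[~mask]) / np.linalg.norm(A[~mask])
print(f"relative error on missing entries: {rel_err:.3f}")
```

The tensor extension in the paper is harder precisely because the trace norm is applied along multiple unfoldings at once, coupling the constraints that this matrix version handles with a single SVD.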

  6. Comprehensive evaluation of multisatellite precipitation estimates over India using gridded rainfall data

    NASA Astrophysics Data System (ADS)

    Sunilkumar, K.; Narayana Rao, T.; Saikranthi, K.; Purnachandra Rao, M.

    2015-09-01

    This study presents a comprehensive evaluation of five widely used multisatellite precipitation estimates (MPEs) against 1° × 1° gridded rain gauge data set as ground truth over India. One decade observations are used to assess the performance of various MPEs (Climate Prediction Center (CPC)-South Asia data set, CPC Morphing Technique (CMORPH), Precipitation Estimation From Remotely Sensed Information Using Artificial Neural Networks, Tropical Rainfall Measuring Mission's Multisatellite Precipitation Analysis (TMPA-3B42), and Global Precipitation Climatology Project). All MPEs have high detection skills of rain with larger probability of detection (POD) and smaller "missing" values. However, the detection sensitivity differs from one product (and also one region) to the other. While the CMORPH has the lowest sensitivity of detecting rain, CPC shows highest sensitivity and often overdetects rain, as evidenced by large POD and false alarm ratio and small missing values. All MPEs show higher rain sensitivity over eastern India than western India. These differential sensitivities are found to alter the biases in rain amount differently. All MPEs show similar spatial patterns of seasonal rain bias and root-mean-square error, but their spatial variability across India is complex and pronounced. The MPEs overestimate the rainfall over the dry regions (northwest and southeast India) and severely underestimate over mountainous regions (west coast and northeast India), whereas the bias is relatively small over the core monsoon zone. Higher occurrence of virga rain due to subcloud evaporation and possible missing of small-scale convective events by gauges over the dry regions are the main reasons for the observed overestimation of rain by MPEs. The decomposed components of total bias show that the major part of overestimation is due to false precipitation. 
The severe underestimation of rain along the west coast is attributed to the predominant occurrence of shallow rain and underestimation of moderate to heavy rain by MPEs. The decomposed components suggest that the missed precipitation and hit bias are the leading error sources for the total bias along the west coast. All evaluation metrics are found to be nearly equal in two contrasting monsoon seasons (southwest and northeast), indicating that the performance of MPEs does not change with the season, at least over southeast India. Among various MPEs, the performance of TMPA is found to be better than others, as it reproduced most of the spatial variability exhibited by the reference.
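The detection metrics used throughout (POD, false alarm ratio, missing values) come from a 2x2 contingency table of satellite versus gauge rain occurrence. A minimal sketch with a toy series (the rain/no-rain threshold is an arbitrary assumption):

```python
import numpy as np

def detection_skill(sat_rain, gauge_rain, thresh=0.1):
    """Contingency-table skill of a satellite product against gauges:
    POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms)."""
    sat, obs = sat_rain >= thresh, gauge_rain >= thresh
    hits = np.sum(sat & obs)
    misses = np.sum(~sat & obs)
    false_alarms = np.sum(sat & ~obs)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return pod, far

# Toy example: the product detects 3 of 4 gauge rain events, with 1 false alarm.
gauge = np.array([0.0, 1.2, 0.0, 2.5, 0.3, 0.0, 4.1, 0.0])
sat   = np.array([0.0, 1.0, 0.5, 2.0, 0.0, 0.0, 3.5, 0.0])
pod, far = detection_skill(sat, gauge)
print(pod, far)   # POD = 0.75, FAR = 0.25
```

A product like CPC that "overdetects" rain would show up here as high POD paired with high FAR; one like CMORPH with low sensitivity as the reverse.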

  7. Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation

    PubMed Central

    Bryndová, Michala; Kasari, Liis; Norberg, Anna; Weiss, Matthias; Bishop, Tom R.; Luke, Sarah H.; Sam, Katerina; Le Bagousse-Pinguet, Yoann; Lepš, Jan; Götzenberger, Lars; de Bello, Francesco

    2016-01-01

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data. PMID:26881747
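The removal experiment can be mimicked with a toy FD index (abundance-weighted trait variance, standing in for the real FD indices) and a rank-agreement score between the full and degraded datasets. This is illustrative only; the paper's "traitor" R package implements the actual assessment:

```python
import numpy as np

rng = np.random.default_rng(2)
n_plots, n_species = 10, 40
traits = rng.normal(size=n_species)
abund = rng.random((n_plots, n_species))

def fd(traits, abund):
    """Toy FD index: abundance-weighted trait variance per plot,
    computed over the species whose trait values are known."""
    ok = ~np.isnan(traits)
    w = abund[:, ok] / abund[:, ok].sum(axis=1, keepdims=True)
    mean = w @ traits[ok]
    return (w * (traits[ok] - mean[:, None]) ** 2).sum(axis=1)

def rank_agreement(a, b):
    # Spearman-style correlation between two plot rankings (no ties expected).
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

full = fd(traits, abund)
agreement = {}
for frac in (0.1, 0.3, 0.5):
    t = traits.copy()
    t[rng.choice(n_species, int(frac * n_species), replace=False)] = np.nan
    agreement[frac] = rank_agreement(full, fd(t, abund))
print(agreement)
```

Plot rankings from degraded data generally drift away from the full-data ranking as more trait values are removed, which is the accuracy loss the study measures across indices.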

  8. Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation.

    PubMed

    Májeková, Maria; Paal, Taavi; Plowman, Nichola S; Bryndová, Michala; Kasari, Liis; Norberg, Anna; Weiss, Matthias; Bishop, Tom R; Luke, Sarah H; Sam, Katerina; Le Bagousse-Pinguet, Yoann; Lepš, Jan; Götzenberger, Lars; de Bello, Francesco

    2016-01-01

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package "traitor" to facilitate assessments of missing trait data.

  9. Recovering incomplete data using Statistical Multiple Imputations (SMI): a case study in environmental chemistry.

    PubMed

    Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D

    2011-10-15

This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and limit-of-detection levels prevent the application of statistics. A working example is taken from an environmental leaching study that was set up to determine whether there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative-treated wood waste and those containing untreated wood. Fourteen lysimeters were set up and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, due to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data were somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, the statistical tool of Statistical Multiple Imputation (SMI) was applied, and the data were re-analysed successfully. It was demonstrated that using SMI did not affect the variance in the data, but facilitated analysis of the complete dataset.
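Without reproducing the paper's exact SMI procedure, the flavour of multiple imputation for below-detection-limit values can be sketched: impute the censored values m times under a simple model, then pool the estimates in the spirit of Rubin's rules. The uniform-below-LOD model and all numbers here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
true = rng.lognormal(mean=1.0, sigma=0.5, size=60)   # e.g. leachate Cu levels
lod = 2.0
observed = np.where(true >= lod, true, np.nan)        # values below LOD are lost

m = 20
means = []
for _ in range(m):
    filled = observed.copy()
    n_miss = np.isnan(filled).sum()
    # One simple censoring model: draw each value uniformly between 0 and LOD.
    filled[np.isnan(filled)] = rng.uniform(0.0, lod, size=n_miss)
    means.append(filled.mean())

pooled = np.mean(means)            # pooled point estimate across imputations
between = np.var(means, ddof=1)    # between-imputation variance component
print(pooled, between)
```

The pooled mean sits below the complete-case mean, which is biased high because it silently discards everything under the detection limit; a full analysis would also combine the within-imputation variance for interval estimates.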

  10. Missed rib fractures on evaluation of initial chest CT for trauma patients: pattern analysis and diagnostic value of coronal multiplanar reconstruction images with multidetector row CT

    PubMed Central

    Cho, S H; Sung, Y M; Kim, M S

    2012-01-01

Objective The objective of this study was to review the prevalence and radiological features of rib fractures missed on initial chest CT evaluation, and to examine the diagnostic value of additional coronal images in a large series of trauma patients. Methods 130 patients who presented to an emergency room for blunt chest trauma underwent multidetector row CT of the thorax within the first hour of their stay, and had follow-up CT or bone scans as the diagnostic gold standard. Images were evaluated on two separate occasions: once with axial images only and once with both axial and coronal images. The detection rates of missed rib fractures were compared between readings using a non-parametric method for clustered data. For the missed rib fractures, the shapes, locations and associated fractures were evaluated. Results 58 rib fractures were missed with axial images only and 52 were missed with both axial and coronal images (p=0.088). The most common shape of missed rib fractures was buckled (56.9%), and the anterior arc (55.2%) was most commonly involved. 21 (36.2%) missed rib fractures had combined fractures on the same ribs, and 38 (65.5%) were accompanied by fractures on neighbouring ribs. Conclusion Missed rib fractures are not uncommon, and radiologists should be familiar with buckle fractures, which are frequently missed. Additional coronal images can be helpful in the diagnosis of rib fractures that are not seen on axial images. PMID:22514102

  11. Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example Using National Data on Drug Injection in Prisons

    PubMed Central

    Haji-Maghsoudi, Saiedeh; Haghdoost, Ali-akbar; Rastegari, Azam; Baneshi, Mohammad Reza

    2013-01-01

Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. The presence of missing data challenges the practice of model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One issue that has received less attention, and which is addressed here, is the role of the pattern of missing data. Methods: We used information on 2720 prisoners. Results derived from fitting a regression model to the whole data set served as the gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at the above rates, in one variable that was significant in the gold-standard model (age). In scenario 2, a small proportion of each independent variable was dropped. Four imputation methods, under different Events Per Variable (EPV) values, were compared in terms of selection of important variables and parameter estimation. Results: In scenario 2, bias in the estimates was low and the performance of all methods for handling missing data was similar. All methods at all missing rates were able to detect the significance of age. In scenario 1, biases in the estimates increased, in particular at the 50% missing rate. Here, at EPVs of 10 and 5, the imputation methods failed to capture the effect of age. Conclusion: In scenario 2, all imputation methods at all missing rates were able to detect age as significant. This was not the case in scenario 1. Our results showed that the performance of imputation methods depends on the pattern of missing data. PMID:24596839
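The two missingness patterns are easy to mimic: with the same expected number of missing cells, concentrating them in one variable or spreading them thinly over all variables yields similar complete-case rates, yet, as the study shows, very different imputation behaviour. A construction sketch only (n and the rate come from the abstract; the number of variables p is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r = 2720, 8, 0.20           # records, variables, missing rate

# Scenario 1: all missingness concentrated in one key variable (e.g. age).
s1 = np.ones((n, p), dtype=bool)  # True = observed
s1[:, 0] = rng.random(n) >= r

# Scenario 2: the same expected number of missing cells spread over all p.
s2 = rng.random((n, p)) >= r / p

cc1 = s1.all(axis=1).mean()       # complete-case fraction, scenario 1 (= 1 - r)
cc2 = s2.all(axis=1).mean()       # ~ (1 - r/p)^p, nearly the same size
print(cc1, cc2)
```

The near-identical complete-case fractions underline the paper's point: it is where the holes sit relative to the influential variables, not merely how many there are, that drives imputation performance.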

  12. Attrition in Developmental Psychology: A Review of Modern Missing Data Reporting and Practices

    ERIC Educational Resources Information Center

    Nicholson, Jody S.; Deboeck, Pascal R.; Howard, Waylon

    2017-01-01

    Inherent in applied developmental sciences is the threat to validity and generalizability due to missing data as a result of participant drop-out. The current paper provides an overview of how attrition should be reported, which tests can examine the potential of bias due to attrition (e.g., t-tests, logistic regression, Little's MCAR test,…

  13. Maternal Near-Miss Due to Unsafe Abortion and Associated Short-Term Health and Socio-Economic Consequences in Nigeria

    PubMed Central

    Prada, Elena; Bankole, Akinrinola; Oladapo, Olufemi T.; Awolude, Olutosin A.; Adewole, Isaac F.; Onda, Tsuyoshi

    2016-01-01

    Little is known about maternal near-miss (MNM) due to unsafe abortion in Nigeria. We used the WHO criteria to identify near-miss events and the proportion due to unsafe abortion among women of childbearing age in eight large secondary and tertiary hospitals across the six geo-political zones. We also explored the characteristics of women with these events, delays in seeking care and the short-term socioeconomic and health impacts on women and their families. Between July 2011 and January 2012, 137 MNM cases were identified of which 13 or 9.5% were due to unsafe abortions. Severe bleeding, pain and fever were the most common immediate abortion complications. On average, treatment of MNM due to abortion costs six times more than induced abortion procedures. Unsafe abortion and delays in care seeking are important contributors to MNM. Programs to prevent unsafe abortion and delays in seeking postabortion care are urgently needed to reduce abortion related MNM in Nigeria. PMID:26506658

  14. Maternal Near-Miss Due to Unsafe Abortion and Associated Short-Term Health and Socio-Economic Consequences in Nigeria.

    PubMed

    Prada, Elena; Bankole, Akinrinola; Oladapo, Olufemi T; Awolude, Olutosin A; Adewole, Isaac F; Onda, Tsuyoshi

    2015-06-01

    Little is known about maternal near-miss (MNM) due to unsafe abortion in Nigeria. We used the WHO criteria to identify near-miss events and the proportion due to unsafe abortion among women of childbearing age in eight large secondary and tertiary hospitals across the six geo-political zones. We also explored the characteristics of women with these events, delays in seeking care and the short-term socioeconomic and health impacts on women and their families. Between July 2011 and January 2012, 137 MNM cases were identified of which 13 or 9.5% were due to unsafe abortions. Severe bleeding, pain and fever were the most common immediate abortion complications. On average, treatment of MNM due to abortion costs six times more than induced abortion procedures. Unsafe abortion and delays in care seeking are important contributors to MNM. Programs to prevent unsafe abortion and delays in seeking postabortion care are urgently needed to reduce abortion related MNM in Nigeria.

  15. Analysis of variance calculations for irregular experiments

    Treesearch

    Jonathan W. Wright

    1977-01-01

    Irregular experiments may be more useful than much smaller regular experiments and can be analyzed statistically without undue expenditure of time. For a few missing plots, standard methods of calculating missing-plot values can be used. For more missing plots (up to 10 percent), seedlot means or randomly chosen plot means of the same seedlot can be substituted for...

  16. Handling Missing Data in Structural Equation Models in R: A Replication Study for Applied Researchers

    ERIC Educational Resources Information Center

    Wolgast, Anett; Schwinger, Malte; Hahnel, Carolin; Stiensmeier-Pelster, Joachim

    2017-01-01

    Introduction: Multiple imputation (MI) is one of the most highly recommended methods for replacing missing values in research data. The scope of this paper is to demonstrate missing data handling in SEM by analyzing two modified data examples from educational psychology, and to give practical recommendations for applied researchers. Method: We…

  17. A Note on the Use of Missing Auxiliary Variables in Full Information Maximum Likelihood-Based Structural Equation Models

    ERIC Educational Resources Information Center

    Enders, Craig K.

    2008-01-01

    Recent missing data studies have argued in favor of an "inclusive analytic strategy" that incorporates auxiliary variables into the estimation routine, and Graham (2003) outlined methods for incorporating auxiliary variables into structural equation analyses. In practice, the auxiliary variables often have missing values, so it is reasonable to…

  18. A Primer for Handling Missing Values in the Analysis of Education and Training Data

    ERIC Educational Resources Information Center

    Gemici, Sinan; Bednarz, Alice; Lim, Patrick

    2012-01-01

    Quantitative research in vocational education and training (VET) is routinely affected by missing or incomplete information. However, the handling of missing data in published VET research is often sub-optimal, leading to a real risk of generating results that can range from being slightly biased to being plain wrong. Given that the growing…

  19. SPSS Syntax for Missing Value Imputation in Test and Questionnaire Data

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries

    2005-01-01

    A well-known problem in the analysis of test and questionnaire data is that some item scores may be missing. Advanced methods for the imputation of missing data are available, such as multiple imputation under the multivariate normal model and imputation under the saturated logistic model (Schafer, 1997). Accompanying software was made available…

  20. Nonparametric Multiple Imputation for Questionnaires with Individual Skip Patterns and Constraints: The Case of Income Imputation in The National Educational Panel Study

    ERIC Educational Resources Information Center

    Aßmann, Christian; Würbach, Ariane; Goßmann, Solange; Geissler, Ferdinand; Bela, Anika

    2017-01-01

    Large-scale surveys typically exhibit data structures characterized by rich mutual dependencies between surveyed variables and individual-specific skip patterns. Despite high efforts in fieldwork and questionnaire design, missing values inevitably occur. One approach for handling missing values is to provide multiply imputed data sets, thus…

  1. Relying on Your Own Best Judgment: Imputing Values to Missing Information in Decision Making.

    ERIC Educational Resources Information Center

    Johnson, Richard D.; And Others

    Processes involved in making estimates of the value of missing information that could help in a decision making process were studied. Hypothetical purchases of ground beef were selected for the study as such purchases have the desirable property of quantifying both the price and quality. A total of 150 students at the University of Iowa rated the…

  2. Microarray missing data imputation based on a set theoretic framework and biological knowledge.

    PubMed

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2006-01-01

Gene expressions measured using microarrays usually suffer from the missing value problem. However, many data analysis methods require a complete data matrix. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimates when the data are dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon, and we construct a series of sets based on it for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
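The POCS mechanics (alternate projections onto convex constraint sets until a point in their intersection is reached) can be sketched with two generic sets: agreement with the observed entries, and a trace-norm ball standing in for a global-correlation prior. This is illustrative only, not the authors' gene-specific sets, and the ball radius is assumed known from the ground truth:

```python
import numpy as np

def proj_l1_ball(v, tau):
    """Project a nonnegative vector onto {v' >= 0 : sum(v') <= tau}."""
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - tau) / k > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def proj_nuclear_ball(X, tau):
    """Projection onto the convex set of matrices with trace norm <= tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * proj_l1_ball(s, tau)) @ Vt

rng = np.random.default_rng(5)
expr = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 10))  # low-rank "expression"
mask = rng.random(expr.shape) < 0.7                          # ~70% observed
tau = np.linalg.svd(expr, compute_uv=False).sum()            # assumed known

X = np.zeros_like(expr)
for _ in range(200):                 # POCS: alternate the two projections
    X = proj_nuclear_ball(np.where(mask, expr, X), tau)
X = np.where(mask, expr, X)          # finish inside the data-agreement set
rel_err = np.linalg.norm((X - expr)[~mask]) / np.linalg.norm(expr[~mask])
```

Each prior in the paper (local gene correlation, global array correlation, synchronization loss) would simply contribute another projection operator to this loop.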

  3. Prostate-Specific Membrane Antigen Expression in Distal Radius Fracture.

    PubMed

    Hoberück, Sebastian; Michler, Enrico; Kaiser, Daniel; Röhnert, Anne; Zöphel, Klaus; Kotzerke, Jörg

    2018-06-12

A 79-year-old man with prostate cancer under active surveillance for 5 years was referred for PSMA-PET/MRI for re-evaluation because of a rising prostate-specific antigen value. PET/MRI revealed a ribbonlike tracer accumulation in a healing fracture of the distal radius. This case illustrates that PSMA expression may occur in healing bone fractures of the distal radius. It can be assumed that benign causes of tracer accumulation in the upper extremities are missed in PET/CT due to the elevated position of the arms during image acquisition.

  4. Missing data and multiple imputation in clinical epidemiological research.

    PubMed

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.
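The practical difference between MCAR and MAR, and why imputation under MAR works, shows up in a small simulation: complete-case analysis stays unbiased under MCAR but not under MAR, while a model built from the fully observed covariate repairs the MAR bias (a single regression imputation stands in for full multiple imputation here; all numbers are synthetic):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)        # true mean of y is 0

# MCAR: missingness unrelated to anything.
mcar = rng.random(n) < 0.4
# MAR: y more likely to be missing when the fully observed x is large.
mar = rng.random(n) < 1 / (1 + np.exp(-2 * x))

cc_mcar = y[~mcar].mean()             # ~unbiased, just noisier
cc_mar = y[~mar].mean()               # biased low: large-x (large-y) cases lost

# Imputation-style fix under MAR: model y from x on the observed cases.
b1, b0 = np.polyfit(x[~mar], y[~mar], 1)
y_imp = np.where(mar, b0 + b1 * x, y)
mi_mar = y_imp.mean()                 # bias largely removed
print(cc_mcar, cc_mar, mi_mar)
```

Proper multiple imputation would additionally add residual noise to each fill and repeat the process several times to propagate the imputation uncertainty into the standard errors.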

  6. Assessing artificial neural networks and statistical methods for infilling missing soil moisture records

    NASA Astrophysics Data System (ADS)

    Dumedah, Gift; Walker, Jeffrey P.; Chik, Li

    2014-07-01

    Soil moisture information is critically important for water management operations including flood forecasting, drought monitoring, and groundwater recharge estimation. While an accurate and continuous record of soil moisture is required for these applications, the available soil moisture data are, in practice, typically fraught with missing values. A wide range of methods is available for infilling hydrologic variables, but a thorough inter-comparison between statistical methods and artificial neural networks has not been made. This study examines 5 statistical methods: monthly averages, the weighted Pearson correlation coefficient, a method based on the temporal stability of soil moisture, a weighted merging of these three methods, and a method based on the concept of rough sets. Additionally, 9 artificial neural networks, broadly categorized into feedforward, dynamic, and radial basis networks, are examined. These 14 infilling methods were used to estimate missing soil moisture records and subsequently validated against known values for 13 soil moisture monitoring stations at three soil layer depths in the Yanco region in southeast Australia. The evaluation results show that the three best-performing methods are the nonlinear autoregressive neural network, the rough sets method, and monthly replacement. High estimation accuracy (root mean square error (RMSE) of about 0.03 m/m) was found for the nonlinear autoregressive network, due to its regression-based dynamic structure, which allows feedback connections through discrete-time estimation. A comparably high accuracy (0.05 m/m RMSE) for the rough sets procedure illustrates the important role of the temporal persistence of soil moisture, with the capability to account for different soil moisture conditions.
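    The monthly-replacement baseline named above can be sketched in a few lines (the (month, value) layout and the sample values are illustrative):

```python
from statistics import fmean

def fill_by_month(series):
    """Fill gaps (None) with the mean of observed values from the same
    calendar month. `series` is a list of (month, value) pairs."""
    month_means = {}
    for month, value in series:
        if value is not None:
            month_means.setdefault(month, []).append(value)
    month_means = {m: fmean(v) for m, v in month_means.items()}
    return [(m, v if v is not None else month_means[m]) for m, v in series]

obs = [(1, 0.30), (1, None), (2, 0.20), (2, 0.24), (1, 0.34)]
filled = fill_by_month(obs)
```

    A monthly climatology like this ignores persistence, which is why the study's dynamic networks and rough-sets method can do better.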

  7. 40 CFR 98.225 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... data shall be the best available estimate based on all available process data or data used for accounting purposes (such as sales records). (b) For missing values related to the performance test...

  8. Data imputation analysis for Cosmic Rays time series

    NASA Astrophysics Data System (ADS)

    Fernandes, R. C.; Lucio, P. S.; Fernandez, J. H.

    2017-05-01

    The occurrence of missing data in Galactic Cosmic Ray (GCR) time series is inevitable, since data loss results from mechanical and human failure, technical problems, and the differing periods of operation of GCR stations. The aim of this study was to perform multiple dataset imputation in order to reconstruct the observational dataset. The study used the monthly time series of GCR Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios of 10% to 90% missing data, in 10% increments, relative to the observed ROME series, with 50 replicates; the CLMX station was used as a proxy for allocating these scenarios. Three different methods for monthly dataset imputation were selected: Amelia II, which runs a bootstrap Expectation Maximization algorithm; MICE, which runs an algorithm via Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization algorithm-based method for imputation of missing values in multivariate normal time series. The synthetic time series were compared with the observed ROME series using several skill measures, such as RMSE, NRMSE, the Agreement Index, R, R2, the F-test and the t-test. The results showed that for CLMX and ROME, the R2 and R statistics were equal to 0.98 and 0.96, respectively. Increasing the number of gaps degraded the quality of the time series. Data imputation was most efficient with the MTSDI method, with negligible errors and the best skill coefficients. The results suggest a limit of about 60% missing data for imputation of monthly averages. It is noteworthy that the CLMX, ROME and KIEL stations have no missing data in the target period. This methodology allowed 43 time series to be reconstructed.
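    The RMSE and NRMSE skill measures used above can be written directly (the series are illustrative, not GCR data):

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and imputed series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nrmse(obs, sim):
    """RMSE normalized by the observed range, for cross-series comparison."""
    return rmse(obs, sim) / (max(obs) - min(obs))

obs = [10.0, 12.0, 11.0, 14.0]
sim = [10.5, 11.5, 11.0, 13.0]
err = rmse(obs, sim)
nerr = nrmse(obs, sim)
```

    Note that NRMSE conventions vary (range, mean, or standard deviation in the denominator); the range form is one common choice.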

  9. Harnessing data structure for recovery of randomly missing structural vibration responses time history: Sparse representation versus low-rank structure

    NASA Astrophysics Data System (ADS)

    Yang, Yongchao; Nagarajaiah, Satish

    2016-06-01

    Randomly missing data in structural vibration response time histories often occur in structural dynamics and health monitoring. For example, structural vibration responses are often corrupted by outliers or erroneous measurements due to sensor malfunction; in wireless sensing platforms, data loss during wireless communication is a common issue. Besides, to alleviate the wireless data sampling or communication burden, certain amounts of data are often discarded during sampling or before transmission. In these and other applications, recovery of the randomly missing structural vibration responses from the available, incomplete data is essential for system identification and structural health monitoring; it is, however, an ill-posed inverse problem. This paper explicitly harnesses the structure of the data itself, the structural vibration responses, to address this inverse problem. What is relevant is an empirical, but often practically true, observation: typically only a few modes are active in the structural vibration responses; hence a sparse representation (in the frequency domain) of the single-channel data vector, or a low-rank structure (by singular value decomposition) of the multi-channel data matrix. Exploiting such prior knowledge of data structure (intra-channel sparse or inter-channel low-rank), the new theories of ℓ1-minimization sparse recovery and nuclear-norm-minimization low-rank matrix completion enable recovery of the randomly missing or corrupted structural vibration response data. The performance of these two alternatives, in terms of recovery accuracy and computational time under different data missing rates, is investigated on several structural vibration response data sets: the seismic responses of the super high-rise Canton Tower and the structural health monitoring accelerations of a real large-scale cable-stayed bridge. Encouraging results are obtained, and the applicability and limitations of the presented methods are discussed.
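    The low-rank route described above can be illustrated at toy scale by rank-1 matrix completion via alternating least squares over the observed entries only; this is a simplified stand-in for nuclear-norm minimization, not the paper's solver:

```python
def complete_rank1(M, iters=100):
    """Fill None entries of matrix M assuming it is (approximately) rank-1,
    i.e. M[i][j] ~ u[i] * v[j], by alternating least squares fitted to the
    observed entries only."""
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        for i in range(rows):
            num = sum(M[i][j] * v[j] for j in range(cols) if M[i][j] is not None)
            den = sum(v[j] ** 2 for j in range(cols) if M[i][j] is not None)
            if den:
                u[i] = num / den
        for j in range(cols):
            num = sum(M[i][j] * u[i] for i in range(rows) if M[i][j] is not None)
            den = sum(u[i] ** 2 for i in range(rows) if M[i][j] is not None)
            if den:
                v[j] = num / den
    return [[M[i][j] if M[i][j] is not None else u[i] * v[j]
             for j in range(cols)] for i in range(rows)]

# rank-1 ground truth (rows are multiples of [1, 2, 3]) with one entry removed
masked = [[2.0, 4.0, None], [3.0, 6.0, 9.0]]
filled = complete_rank1(masked)
```

    Real multi-channel vibration data needs a higher rank and a nuclear-norm (or similar) solver, but the principle is the same: the observed entries pin down the factors, and the factors predict the gaps.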

  10. Effect of data gaps on correlation dimension computed from light curves of variable stars

    NASA Astrophysics Data System (ADS)

    George, Sandip V.; Ambika, G.; Misra, R.

    2015-11-01

    Observational data, especially astrophysical data, are often limited by gaps that arise from a lack of observations for a variety of reasons. Such inadvertent gaps are usually smoothed over using interpolation techniques. However, smoothing can introduce artificial effects, especially when non-linear analysis is undertaken. We investigate how gaps affect the computed correlation dimension of a system, without using any interpolation. For this we introduce gaps artificially in synthetic data derived from standard chaotic systems, such as the Rössler and Lorenz systems, with the frequency of occurrence and the size of the missing data drawn from two Gaussian distributions. We then study the changes in correlation dimension with changes in the distributions of gap position and size. We find that for a considerable range of mean gap frequency and size, the value of the correlation dimension is not significantly affected, indicating that in such specific cases the calculated values can still be reliable and acceptable. Our study thus introduces a method of checking the reliability of computed correlation dimension values by calculating the distribution of gaps with respect to size and position. This is illustrated for data from the light curves of three variable stars, R Scuti, U Monocerotis and SU Tauri. We also demonstrate how cubic spline interpolation can cause a time series of Gaussian noise with missing data to be misinterpreted as chaotic in origin. This is demonstrated for the non-chaotic light curve of the variable star SS Cygni, which gives a saturated D2 value when interpolated using a cubic spline. In addition, we find that a careful choice of binning, besides reducing noise, can help shift the gap distribution into the reliable range for D2 values.
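    The correlation dimension D2 discussed above is estimated from the Grassberger-Procaccia correlation sum; a minimal sketch (the delay-embedding parameters and toy series are illustrative):

```python
import math

def correlation_sum(series, r, dim=2, lag=1):
    """Fraction of delay-vector pairs closer than r (Grassberger-Procaccia).
    D2 is estimated as the slope of log C(r) versus log r in the scaling
    region; this returns C(r) for a single radius."""
    n = len(series) - (dim - 1) * lag
    vecs = [tuple(series[i + k * lag] for k in range(dim)) for i in range(n)]
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(vecs[i], vecs[j]) < r)
    return 2 * close / (n * (n - 1))

c = correlation_sum([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], r=0.5)
```

    Gaps in the series shorten the list of valid delay vectors, which is exactly where the paper's reliability question arises.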

  11. Impact of Missing Data for Body Mass Index in an Epidemiologic Study.

    PubMed

    Razzaghi, Hilda; Tinker, Sarah C; Herring, Amy H; Howards, Penelope P; Waller, D Kim; Johnson, Candice Y

    2016-07-01

    Objective To assess the potential impact of missing body mass index (BMI) data on the association between prepregnancy obesity and specific birth defects. Methods Data from the National Birth Defects Prevention Study (NBDPS) were analyzed. We assessed the factors associated with missing BMI data among mothers of infants without birth defects. Four analytic methods were then used to assess the impact of missing BMI data on the association between maternal prepregnancy obesity and three birth defects: spina bifida, gastroschisis, and cleft lip with/without cleft palate. The analytic methods were: (1) complete case analysis; (2) assignment of missing values to either obese or normal BMI; (3) multiple imputation; and (4) probabilistic sensitivity analysis. Logistic regression was used to estimate crude and adjusted odds ratios (aOR) and 95% confidence intervals (CI). Results Of NBDPS control mothers, 4.6% were missing BMI data, and most of the missing values were attributable to missing height (~90%). Missing BMI data was associated with birth outside of the US (aOR 8.6; 95% CI 5.5, 13.4), interview in Spanish (aOR 2.4; 95% CI 1.8, 3.2), Hispanic ethnicity (aOR 2.0; 95% CI 1.2, 3.4), and <12 years of education (aOR 2.3; 95% CI 1.7, 3.1). Overall, the results of the multiple imputation and probabilistic sensitivity analyses were similar to those of the complete case analysis. Conclusions Although in some scenarios missing BMI data can bias the magnitude of an association, it does not appear likely to have affected conclusions from a traditional complete case analysis of these data.
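    Method (2) above, assigning all missing values to one extreme BMI category or the other, amounts to bounding a crude odds ratio; a sketch with hypothetical counts (not the NBDPS data):

```python
def odds_ratio(a, b, c, d):
    """Crude odds ratio from a 2x2 table: a, b = obese/normal-BMI cases;
    c, d = obese/normal-BMI controls."""
    return (a * d) / (b * c)

# hypothetical counts: 5 cases and 10 controls have missing BMI
cc    = odds_ratio(40, 60, 100, 300)            # complete-case analysis
worst = odds_ratio(40 + 5, 60, 100, 300 + 10)   # missing cases obese, missing controls normal
best  = odds_ratio(40, 60 + 5, 100 + 10, 300)   # the reverse extreme
```

    If the complete-case estimate and both extremes lead to the same conclusion, the missingness cannot be driving the result, which is the logic behind the paper's scenario analysis.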

  12. 40 CFR 98.55 - Procedures for estimating missing data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... substitute data shall be the best available estimate based on all available process data or data used for accounting purposes (such as sales records). (b) For missing values related to the performance test...

  13. Impact of missing data imputation methods on gene expression clustering and classification.

    PubMed

    de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G

    2015-02-26

    Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by the mean or the median performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/.
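    The simple replacement baseline the study found competitive can be stated in a few lines (gene = row, None = missing; the data are illustrative):

```python
from statistics import median

def impute_row_median(matrix):
    """Replace each gene's missing expression values (None) with the median
    of that gene's observed values -- one of the simple baselines the study
    found competitive with more complex imputation."""
    out = []
    for row in matrix:
        med = median(x for x in row if x is not None)
        out.append([x if x is not None else med for x in row])
    return out

expr = [[1.0, None, 3.0], [4.0, 4.5, None]]
imputed = impute_row_median(expr)
```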

  14. Improving Long-term Quality and Continuity of Landsat-7 Data Through Inpainting of Lost Data Based on the Nonconvex Model of Dynamic Dictionary Learning

    NASA Astrophysics Data System (ADS)

    Miao, J.; Zhou, Z.; Zhou, X.; Huang, T.

    2017-12-01

    On May 31, 2003, the scan line corrector (SLC) of the Enhanced Thematic Mapper Plus (ETM+) on board the Landsat-7 satellite failed, resulting in strips of lost data in Landsat-7 images, which has seriously affected the quality and continuity of ETM+ data for space and Earth science. This paper proposes a new inpainting method for repairing Landsat-7 ETM+ images that takes into account the physical characteristics and geometric features of the ground area whose data are missing. First, the two geometric slopes of the boundaries of each missing stripe of the georeferenced ETM+ image are calculated by the Hough transform, ignoring the slopes of the parts of a missing strip that lie on the edges of the whole image. Second, an adaptive dictionary is developed and trained using a large number of Landsat-7 ETM+ SLC-on images; when this dictionary is used to restore an image with missing data, it is effectively dynamic. The data-missing strips are then repaired along their slope directions using the logdet(.) low-rank non-convex model together with the dynamic dictionary. Imperfect points are defined as pixels whose values differ considerably from the surrounding pixel values; they can be real values, but are most likely noise. Lastly, the imperfect points remaining after the previous step are replaced using sparse restoration of overlapping groups. We take a Landsat ETM+ image of June 10, 2002, which contains no missing data, as the test image for algorithm evaluation. We extract the missing-stripe pattern from images of the same WRS path and row as the 2002 image but acquired after 2003, and overlay this missing-stripe model on the 2002 image to obtain a simulated missing image. Fig. 1(a)-(c) show the simulated missing images for Bands 1, 3, and 5 of the 2002 ETM+ data; Fig. 1(d)-(f) show the corresponding restored images. The repaired images were compared with the original images band by band, and the algorithm was found to work very well. We will show the application of the algorithm to other images, along with detailed comparisons.

  15. On piecewise interpolation techniques for estimating solar radiation missing values in Kedah

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saaban, Azizan; Zainudin, Lutfi; Bakar, Mohd Nazari Abu

    2014-12-04

    This paper discusses the use of a piecewise interpolation method based on cubic Ball and Bézier curve representations to estimate missing solar radiation values in Kedah. An hourly solar radiation dataset was collected at the Alor Setar Meteorology Station and obtained from the Malaysian Meteorology Department. The piecewise cubic Ball and Bézier functions that interpolate the data points are defined on each hourly interval of solar radiation measurement and are obtained by prescribing first-order derivatives at the start and end of each interval. We compare the performance of our proposed method with existing methods using the Root Mean Squared Error (RMSE) and Coefficient of Determination (CoD), based on simulated missing-value datasets. The results show that our method outperforms the previous methods.
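    The construction above, cubic segments with prescribed endpoint derivatives, maps directly onto Bézier control ordinates; a generic sketch (not the authors' code, and the sample values are illustrative):

```python
def cubic_bezier_segment(y0, y1, d0, d1, h=1.0):
    """Control ordinates of the cubic Bezier that interpolates y0 -> y1 on
    an interval of width h with prescribed first derivatives d0, d1 at the
    ends (the inner controls sit a third of the way along each tangent)."""
    return (y0, y0 + d0 * h / 3.0, y1 - d1 * h / 3.0, y1)

def de_casteljau(ctrl, u):
    """Evaluate a Bezier curve at parameter u in [0, 1] by repeated lerp."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [(1 - u) * a + u * b for a, b in zip(pts, pts[1:])]
    return pts[0]

# e.g. hourly solar radiation rising from 100 to 400 W/m^2, flat at both ends
ctrl = cubic_bezier_segment(100.0, 400.0, d0=0.0, d1=0.0)
mid = de_casteljau(ctrl, 0.5)
```

    A missing hourly value inside the interval is then estimated by evaluating the segment at the corresponding parameter.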

  16. Resilient Sensor Networks with Spatiotemporal Interpolation of Missing Sensors: An Example of Space Weather Forecasting by Multiple Satellites

    PubMed Central

    Tokumitsu, Masahiro; Hasegawa, Keisuke; Ishida, Yoshiteru

    2016-01-01

    This paper constructs a resilient sensor network model, with space weather forecasting as an example. The proposed model is based on a dynamic relational network. Space weather forecasting is vital for satellite operation, because an operational team needs to make decisions in order to provide its satellite service. The proposed model is resilient to sensor failures and to data missing due to satellite operations. In the proposed model, the missing data of a sensor are interpolated from the associated sensors. This paper demonstrates two examples of space weather forecasting that involve missing observations in some test cases. In these examples, the sensor network for space weather forecasting continues a diagnosis by replacing faulted sensors with virtual ones. The demonstrations show that the proposed model is resilient against sensor failures caused by hardware faults or technical problems. PMID:27092508
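    The interpolation idea above, filling a failed sensor's value from its associated sensors, can be sketched as a weighted average; the sensor names and weights below are hypothetical, not the paper's learned relations:

```python
def interpolate_missing(readings, weights):
    """Replace each failed sensor's reading (None) with a weighted average
    of its associated sensors, in the spirit of the dynamic relational
    network: `weights[name]` maps associated sensor names to weights."""
    filled = dict(readings)
    for name, value in readings.items():
        if value is None:
            pairs = [(w, readings[other]) for other, w in weights[name].items()
                     if readings[other] is not None]
            filled[name] = sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)
    return filled

readings = {"sat_a": 1.0, "sat_b": None, "sat_c": 3.0}
weights = {"sat_b": {"sat_a": 2.0, "sat_c": 1.0}}
filled = interpolate_missing(readings, weights)
```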

  18. Empirical likelihood method for non-ignorable missing data problems.

    PubMed

    Guan, Zhong; Qin, Jing

    2017-01-01

    The missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missingness is the most difficult missing data problem, in which the missingness of a response depends on its own value. In the statistical literature, unlike for the ignorable missing data problem, few papers on non-ignorable missing data are available beyond fully parametric model-based approaches. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method, we obtain constrained maximum empirical likelihood estimators of the parameters in the missing probability and of the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of data from a real AIDS trial shows that the missingness of CD4 counts around two years is non-ignorable and that the sample mean based on the observed data only is biased.
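    In conventional notation (the symbols are ours, not necessarily the authors' exact formulation), the non-ignorable selection model and its likelihood read:

```latex
P(R = 1 \mid Y = y) = \pi(y;\theta), \qquad
L(\theta, F) \;=\; \prod_{i:\,R_i = 1} \pi(y_i;\theta)\, dF(y_i)
\;\times\; \prod_{i:\,R_i = 0} \int \bigl\{1 - \pi(y;\theta)\bigr\}\, dF(y),
```

    where R is the response indicator and F is the unspecified distribution of Y; the empirical likelihood approach profiles out F over the observed support instead of assuming a parametric form for it.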

  19. Association Between Breast Cancer Disease Progression and Workplace Productivity in the United States.

    PubMed

    Yin, Wesley; Horblyuk, Ruslan; Perkins, Julia Jane; Sison, Steve; Smith, Greg; Snider, Julia Thornton; Wu, Yanyu; Philipson, Tomas J

    2017-02-01

    Determine workplace productivity losses attributable to breast cancer progression. Longitudinal analysis linking 2005 to 2012 medical and pharmacy claims and workplace absence data in the US; patients were commercially insured women aged 18 to 64 diagnosed with breast cancer. Productivity was measured as employment status and total quarterly workplace hours missed, and valued using average US wages. Six thousand four hundred and nine women were included. Breast cancer progression was associated with a lower probability of employment (hazard ratio [HR] = 0.65, P < 0.01) and increased workplace hours missed. The annual value of missed work was $24,166 for non-metastatic and $30,666 for metastatic patients. Thus, progression to metastatic disease is associated with an additional $6,500 in lost work time (P < 0.05), or 14% of average US wages. Breast cancer progression leads to a diminished likelihood of employment, increased workplace hours missed, and an increased cost burden.

  20. Improving cluster-based missing value estimation of DNA microarray data.

    PubMed

    Brás, Lígia P; Menezes, José C

    2007-06-01

    We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing value (MV) estimation in microarray data, based on the reuse of estimated data. The method is called iterative KNN imputation (IKNNimpute), as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM, by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining of the MV estimates. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
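    The core KNNimpute step that IKNNimpute iterates can be sketched as a single non-iterative pass (toy data; the iterative variant would add the newly completed rows back into the candidate pool and repeat):

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries: for each row with gaps, take the k nearest complete
    rows (Euclidean distance over the co-observed columns) and average their
    values, weighted by inverse distance -- the core of KNNimpute."""
    complete = [r for r in rows if None not in r]
    out = []
    for r in rows:
        if None not in r:
            out.append(list(r))
            continue
        obs = [j for j, x in enumerate(r) if x is not None]
        nearest = sorted(
            (math.dist([r[j] for j in obs], [c[j] for j in obs]), c)
            for c in complete)[:k]
        w = [1.0 / (d + 1e-9) for d, _ in nearest]
        filled = list(r)
        for j, x in enumerate(r):
            if x is None:
                filled[j] = sum(wi * c[j] for wi, (_, c) in zip(w, nearest)) / sum(w)
        out.append(filled)
    return out

data = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [5.0, 6.0, 7.0], [1.0, 2.0, None]]
imputed = knn_impute(data, k=2)
```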

  1. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    PubMed

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation, have compared them on real or simulated data sets, and have recommended a list of missing value imputation methods for proteomics applications. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of a different nature and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our view: for instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state of the art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.
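    The distinction the authors draw, MCAR-like gaps versus left-censored MNAR gaps, can be illustrated with a deliberately simplified two-regime imputer; the 90%-of-minimum fill is an arbitrary left-censoring proxy, not a method from the paper:

```python
from statistics import fmean

def impute_intensities(values, mnar=False):
    """Two-regime imputation sketch for label-free proteomics intensities:
    MCAR-like gaps get the observed mean, while MNAR-like (left-censored)
    gaps get a small value instead -- here 90% of the observed minimum."""
    obs = [v for v in values if v is not None]
    fill = 0.9 * min(obs) if mnar else fmean(obs)
    return [v if v is not None else fill for v in values]

intensities = [10.0, None, 20.0, 30.0]
mcar_filled = impute_intensities(intensities)
mnar_filled = impute_intensities(intensities, mnar=True)
```

    Applying the mean-style fill to a left-censored gap drags a low-abundance peptide toward the average, which is exactly the mismatch the paper warns against.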

  2. Bayesian sensitivity analysis methods to evaluate bias due to misclassification and missing data using informative priors and external validation data.

    PubMed

    Luta, George; Ford, Melissa B; Bondy, Melissa; Shields, Peter G; Stamey, James D

    2013-04-01

    Recent research suggests that the Bayesian paradigm may be useful for modeling biases in epidemiological studies, such as those due to misclassification and missing data. We used Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to the potential effect of these two important sources of bias. We used data from a study of the joint associations of radiotherapy and smoking with primary lung cancer among breast cancer survivors. We used Bayesian methods to provide an operational way to combine both validation data and expert opinion to account for misclassification of the two risk factors and missing data. For comparative purposes we considered a "full model" that allowed for both misclassification and missing data, along with alternative models that considered only misclassification or missing data, and the naïve model that ignored both sources of bias. We identified noticeable differences between the four models with respect to the posterior distributions of the odds ratios that described the joint associations of radiotherapy and smoking with primary lung cancer. Despite those differences we found that the general conclusions regarding the pattern of associations were the same regardless of the model used. Overall our results indicate a nonsignificantly decreased lung cancer risk due to radiotherapy among nonsmokers, and a mildly increased risk among smokers. We described easy to implement Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to misclassification and missing data. Copyright © 2012 Elsevier Ltd. All rights reserved.
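    The flavor of such a sensitivity analysis can be sketched by a Monte Carlo bias analysis of exposure misclassification; the 2x2 counts and the uniform priors on sensitivity and specificity below are illustrative, not the study's model:

```python
import random

def corrected_or(a, b, c, d, se, sp):
    """Back-correct a 2x2 table (a, b = exposed/unexposed cases; c, d =
    exposed/unexposed controls) for nondifferential exposure
    misclassification with sensitivity se and specificity sp, then
    return the corrected odds ratio."""
    a_t = (a - (1 - sp) * (a + b)) / (se + sp - 1)
    c_t = (c - (1 - sp) * (c + d)) / (se + sp - 1)
    b_t, d_t = (a + b) - a_t, (c + d) - c_t
    return (a_t * d_t) / (b_t * c_t)

# probabilistic sensitivity analysis: draw (se, sp) from hypothetical priors
rng = random.Random(42)
ors = [corrected_or(40, 60, 100, 300,
                    se=rng.uniform(0.85, 0.99),
                    sp=rng.uniform(0.90, 0.99)) for _ in range(1000)]
```

    The spread of the resulting odds ratios summarizes how much the misclassification assumptions alone could move the estimate; the Bayesian version in the record additionally conditions on validation data.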

  3. South Atlantic Anomaly

    Atmospheric Science Data Center

    2013-04-19

    ... This map was created by specially processing MISR "dark" data taken between 3 February and 16 February 2000, while the cover was still ... Individual orbit tracks are visible, and some tracks are missing due to data gaps, missing spacecraft navigation information, or other ...

  4. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    NASA Astrophysics Data System (ADS)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to guide clinical research toward hidden knowledge is becoming greatly influential in medicine. Heart disease is a major cause of death around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as null and missing values, can affect the quality of classification results; nonetheless, the remaining complete features are still capable of providing information. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a classification algorithm, a decision tree. The experiment is conducted on a heart disease dataset, and the performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the decision tree improves after the application of FCMPSO for imputation.
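    A crisp, simplified stand-in for the clustering-based imputation above (plain k-means instead of fuzzy C-means, and no PSO tuning) might look like this; the data are illustrative:

```python
import math

def centroid_impute(records, k=2, iters=20):
    """Cluster the complete records with a tiny k-means, then fill each
    incomplete record's gaps from its nearest centroid (distance taken
    over the observed features only)."""
    complete = [r for r in records if None not in r]
    cents = complete[::max(1, len(complete) // k)][:k]  # naive spread init
    for _ in range(iters):
        groups = [[] for _ in cents]
        for r in complete:
            i = min(range(len(cents)), key=lambda j: math.dist(r, cents[j]))
            groups[i].append(r)
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else c
                 for g, c in zip(groups, cents)]
    out = []
    for r in records:
        if None not in r:
            out.append(list(r))
            continue
        obs = [j for j, x in enumerate(r) if x is not None]
        best = min(cents, key=lambda c: math.dist([r[j] for j in obs],
                                                  [c[j] for j in obs]))
        out.append([x if x is not None else best[j] for j, x in enumerate(r)])
    return out

data = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.8], [1.1, None]]
imputed = centroid_impute(data, k=2)
```

    The fuzzy version replaces the hard nearest-centroid assignment with membership-weighted averages over all centroids, and PSO searches over the clustering parameters.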

  5. Does Minimally Invasive Spine Surgery Minimize Surgical Site Infections?

    PubMed

    Kulkarni, Arvind Gopalrao; Patel, Ravish Shammi; Dutta, Shumayou

    2016-12-01

    Retrospective review of prospectively collected data. To evaluate the incidence of surgical site infections (SSIs) in minimally invasive spine surgery (MISS) in a cohort of patients and compare with available historical data on SSI in open spinal surgery cohorts, and to evaluate additional direct costs incurred due to SSI. SSI can lead to prolonged antibiotic therapy, extended hospitalization, repeated operations, and implant removal. Small incisions and minimal dissection intrinsic to MISS may minimize the risk of postoperative infections. However, there is a dearth of literature on infections after MISS and their additional direct financial implications. All patients from January 2007 to January 2015 undergoing posterior spinal surgery with tubular retractor system and microscope in our institution were included. The procedures performed included tubular discectomies, tubular decompressions for spinal stenosis and minimal invasive transforaminal lumbar interbody fusion (TLIF). The incidence of postoperative SSI was calculated and compared to the range of cited SSI rates from published studies. Direct costs were calculated from medical billing for index cases and for patients with SSI. A total of 1,043 patients underwent 763 noninstrumented surgeries (discectomies, decompressions) and 280 instrumented (TLIF) procedures. The mean age was 52.2 years with male:female ratio of 1.08:1. Three infections were encountered with fusion surgeries (mean detection time, 7 days). All three required wound wash and debridement with one patient requiring unilateral implant removal. Additional direct cost due to infection was $2,678 per 100 MISS-TLIF. SSI increased hospital expenditure per patient 1.5-fold after instrumented MISS. Overall infection rate after MISS was 0.29%, with SSI rate of 0% in non-instrumented MISS and 1.07% with instrumented MISS. MISS can markedly reduce the SSI rate and can be an effective tool to minimize hospital costs.

  6. Nuclear Forensics Analysis with Missing and Uncertain Data

    DOE PAGES

    Langan, Roisin T.; Archibald, Richard K.; Lamberti, Vincent

    2015-10-05

    We have applied a new imputation-based method for analyzing incomplete data, called Monte Carlo Bayesian Database Generation (MCBDG), to the Spent Fuel Isotopic Composition (SFCOMPO) database. About 60% of the entries are absent from SFCOMPO. The method estimates missing values of a property from a probability distribution created from the existing data for the property, and then generates multiple instances of the completed database for training a machine learning algorithm. Uncertainty in the data is represented by an empirical or an assumed error distribution. The method makes few assumptions about the underlying data, and compares favorably against results obtained by replacing missing information with constant values.
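    The generation step, drawing each missing value from the empirical distribution of its column and emitting multiple completed databases, can be sketched as follows; this captures the spirit of MCBDG, not the published algorithm:

```python
import random

def generate_completions(table, m=3, seed=0):
    """Create m completed copies of `table` (rows of equal length, with
    None marking missing entries) by drawing each missing value from the
    empirical distribution of the observed values in its column."""
    rng = random.Random(seed)
    cols = list(zip(*table))
    pools = [[x for x in col if x is not None] for col in cols]
    return [[[x if x is not None else rng.choice(pools[j])
              for j, x in enumerate(row)] for row in table]
            for _ in range(m)]

table = [[1.0, None], [2.0, 10.0], [3.0, 20.0], [None, 30.0]]
dbs = generate_completions(table, m=5)
```

    Training the downstream learner on all m completions, rather than one, is what propagates the imputation uncertainty into the model.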

  7. A bias-corrected estimator in multiple imputation for missing data.

    PubMed

    Tomita, Hiroaki; Fujisawa, Hironori; Henmi, Masayuki

    2018-05-29

    Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation, by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset. Copyright © 2018 John Wiley & Sons, Ltd.
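    A heavily simplified sketch of the density-ratio idea: missing values are drawn from a deliberately crude normal imputation model, and each draw is weighted by the ratio of a nonparametric (kernel) density estimate of the observed data to the imputation density. The authors apply this correction inside the likelihood function; here, purely for illustration, the weight enters a weighted mean, and all function names and model choices are assumptions, not the paper's method.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kde_pdf(x, sample, bandwidth):
    # simple Gaussian kernel density estimate standing in for the "true" density
    return sum(normal_pdf(x, s, bandwidth) for s in sample) / len(sample)

def weighted_mi_mean(observed, n_missing, m=50, seed=0):
    """Estimate the mean of a variable with n_missing missing entries.
    Imputations come from a crude normal model; the density-ratio weight
    corrects for that model's misspecification."""
    rng = random.Random(seed)
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    bw = 1.06 * sigma * len(observed) ** -0.2  # Silverman-style bandwidth
    estimates = []
    for _ in range(m):
        values, weights = list(observed), [1.0] * len(observed)
        for _ in range(n_missing):
            z = rng.gauss(mu, sigma)  # draw from the imputation model
            w = kde_pdf(z, observed, bw) / normal_pdf(z, mu, sigma)  # density ratio
            values.append(z)
            weights.append(w)
        estimates.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return statistics.mean(estimates)

obs = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3]
est = weighted_mi_mean(obs, n_missing=3)
```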

  8. A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

    NASA Astrophysics Data System (ADS)

    Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr

    2017-08-01

    This paper mainly focuses on developing a Hadoop-based framework for feature selection and classification models to classify high-dimensionality data in heterogeneous biomedical databases. Extensive research has been performed in the fields of machine learning, big data and data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. With the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in the Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results due to outliers and missing values. In this paper, we propose a two-phase map-reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a map-reduce-based multi-class ensemble decision tree model was designed and applied to the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on complex biomedical datasets show that our proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.

  9. From receptor binding kinetics to signal transduction; a missing link in predicting in vivo drug-action.

    PubMed

    Nederpelt, Indira; Kuzikov, Maria; de Witte, Wilbert E A; Schnider, Patrick; Tuijt, Bruno; Gul, Sheraz; IJzerman, Adriaan P; de Lange, Elizabeth C M; Heitman, Laura H

    2017-10-26

    An important question in drug discovery is how to overcome the significant challenge of high drug attrition rates due to lack of efficacy and safety. A missing link in the understanding of determinants for drug efficacy is the relation between drug-target binding kinetics and signal transduction, particularly in the physiological context of (multiple) endogenous ligands. We hypothesized that the kinetic binding parameters of both drug and endogenous ligand play a crucial role in determining cellular responses, using the NK1 receptor as a model system. We demonstrated that the binding kinetics of both antagonists (DFA and aprepitant) and endogenous agonists (NKA and SP) have significantly different effects on signal transduction profiles, i.e. potency values, in vitro efficacy values and onset rate of signal transduction. The antagonistic effects were most efficacious with slowly dissociating aprepitant and slowly associating NKA while the combination of rapidly dissociating DFA and rapidly associating SP had less significant effects on the signal transduction profiles. These results were consistent throughout different kinetic assays and cellular backgrounds. We conclude that knowledge of the relationship between in vitro drug-target binding kinetics and cellular responses is important to ultimately improve the understanding of drug efficacy in vivo.

  10. A method to estimate the additional uncertainty in gap-filled NEE resulting from long gaps in the CO2 flux record

    Treesearch

    Andrew D. Richardson; David Y. Hollinger

    2007-01-01

    Missing values in any data set create problems for researchers. The process by which missing values are replaced, and the data set is made complete, is generally referred to as imputation. Within the eddy flux community, the term "gap filling" is more commonly applied. A major challenge is that random errors in measured data result in uncertainty in the gap-...

  11. CARA Status and Upcoming Enhancements

    NASA Technical Reports Server (NTRS)

    Newman, Lauri

    2015-01-01

    RIC miss values in Summary Table: tabular presentation of the miss vector in the Summary Section. RIC uncertainty values in Details Section: numerical presentation of miss-component uncertainty values in the Details Section. Green events with potentially maneuverable secondary objects: all potentially maneuverable secondary objects will be reported out to 7 days prior to TCA for LEO events and 10 days for non-LEO events, regardless of risk (relates to MOWG Action Item 1309-11); all green events with potentially active secondary objects are included in Summary Reports, allowing more time for contacting the other O/O. Black box fix: sometimes a black square appeared in the summary report where the ASW RIC time-history plot should be. Appendix orbit regime/mission name mismatch fix. Pc 0 plotting bug: all Pc points less than 1e-10 (zero) are now plotted as 1e-10 (instead of not at all). Maneuver indication fix: the maneuver indicator is now present even if the maneuver was in the past.

  12. Technical note: Validation of an automated system for monitoring and restricting water intake in group-housed beef steers.

    PubMed

    Allwardt, K; Ahlberg, C; Broocks, A; Bruno, K; Taylor, A; Place, S; Richards, C; Krehbiel, C; Calvo-Lorenzo, M; DeSilva, U; VanOverbeke, D; Mateescu, R; Goad, C; Rolf, M M

    2017-09-01

    The Insentec Roughage Intake Control (RIC) system has been validated for the collection of water intake; however, this system has not been validated for water restriction. The objective of this validation was to evaluate the agreement between direct observations and automated intakes collected by the RIC system under both ad libitum and restricted water conditions. A total of 239 crossbred steers were used in a 3-d validation trial, which assessed intake values generated by the RIC electronic intake monitoring system for both ad libitum water intake (n = 122; BASE) and restricted water intake (n = 117; RES). Direct human observations were collected on 4 Insentec water bins for three 24-h periods and three 12-h periods for BASE and RES, respectively. An intake event was noted by the observer when the electronic identification of the animal was read by the transponder and the gate lowered, and starting and ending bin weights were recorded for each intake event. Data from direct observations across each validation period were compared to automated observations generated from the RIC system. Missing beginning or ending weight values for visual observations occasionally occurred due to the observer being unable to capture the value before the monitor changed when bin activity was high. To estimate the impact of these missing values, analyses denoted as OBS were completed with the incomplete record coded as missing data. These analyses were contrasted with analyses where observations with a single missing beginning or end weight (but not both) were assumed to be identical to that which was recorded by the Insentec system (OBS). The difference in mean total intake across BASE steers was 0.60 ± 2.06 kg OBS (0.54 ± 1.99 kg OBS) greater for system observations than visual observations. The comparison of mean total intake across the 3 RES validation days was 0.53 ± 2.30 kg OBS (0.13 ± 1.83 kg OBS) greater for system observations than direct observations. Day was not a significant source of error in this study (P > 0.05). These results indicate that the system was capable of limiting water intake of individual animals with reasonable accuracy, although errors are slightly higher during water restriction than during ad libitum access. The Insentec system is a suitable resource for monitoring individual water intake of growing, group-housed steers under ad libitum and restricted water conditions.

  14. 3D-3D facial superimposition between monozygotic twins: A novel morphological approach to the assessment of differences due to environmental factors.

    PubMed

    Gibelli, Daniele; Pucciarelli, Valentina; Poppa, Pasquale; De Angelis, Danilo; Cummaudo, Marco; Pisoni, Luca; Codari, Marina; Cattaneo, Cristina; Sforza, Chiarella

    2018-03-01

    Distinction of one twin with respect to the other, based on external appearance, is challenging; nevertheless, facial morphology may provide individualizing features that may help distinguish twin siblings. This study presents an innovative method for facial assessment in monozygotic twins for personal identification, based on the registration and comparison of 3D models of faces. Ten pairs of monozygotic twins aged between 25 and 69 years were acquired twice by a stereophotogrammetric system (VECTRA-3D® M3: Canfield Scientific, Inc., Fairfield, NJ); the 3D reconstruction of each person was then registered and superimposed onto the model belonging to the same person (self-matches), the corresponding sibling (twin-matches) and unrelated participants from the other pairs (miss-matches); RMS (root mean square) point-to-point distances were automatically calculated for all 220 superimpositions. One-way ANOVA was used to evaluate the differences among miss-matches, twin-matches and self-matches (p < .05). RMS values for self-matches, twin-matches and miss-matches were respectively 1.0 mm (SD: 0.3 mm), 1.9 mm (0.5 mm) and 3.4 mm (0.7 mm). Statistically significant differences were found among the three groups (p < .01). Comparing RMS values in the three groups, mean facial variability in twin siblings was 55.9% of that assessed between unrelated persons and about twice that observed between models belonging to the same individual. The present study proposed an innovative method for the facial assessment of twin siblings, based on 3D surface analysis, which may provide additional information concerning the relation between genes and environment. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Missed Work Due to Occupational Illness among Hispanic Horse Workers.

    PubMed

    Bush, Ashley M; Westneat, Susan; Browning, Steven R; Swanberg, Jennifer

    2018-05-07

    Occupational illnesses are inadequately reported for agriculture, an industry dominated by a vulnerable Hispanic population and high fatal and nonfatal injury rates. Work-related illnesses can contribute to missed work, caused by a combination of personal and work factors, with costs to the individual, employer, and society. To better understand agricultural occupational illnesses, 225 Hispanic horse workers were interviewed via community-based convenience sampling. Descriptive statistics, bivariate analyses, and log binomial regression modeling were used to: (1) describe the prevalence of missed work due to work-related illnesses among Hispanic horse workers, (2) examine work-related and personal factors associated with missed work, and (3) identify health symptoms and work-related characteristics potentially associated with missed work. Key findings reveal that having at least one child (PR = 1.71, 95% CI = 1.03, 2.84), having poor self-reported general health (PR = 0.72, 95% CI = 0.48, 1.08), experiencing stress during a typical workday (PR = 2.58, 95% CI = 1.25, 5.32), or spending less time with horses (PR = 1.87, 95% CI = 1.15, 3.05) are significant predictors of missing work. Interventions can be designed to identify workers most susceptible to missing work and provide resources to reduce absenteeism. Future research should examine work-related illness in agricultural horse production, including personal and work-related factors, in order to diminish occupational health disparities among these workers, who are more likely to be employed in hazardous agricultural work. Copyright© by the American Society of Agricultural Engineers.

  16. Socioeconomic position and occupational social class and their association with risky alcohol consumption among adolescents.

    PubMed

    Obradors-Rial, Núria; Ariza, Carles; Rajmil, Luis; Muntaner, Carles

    2018-05-01

    To compare different measures of socioeconomic position (SEP) and occupational social class (OSC) and to evaluate their association with risky alcohol consumption among adolescents attending the last year of mandatory secondary school (ages 15-17 years). This was a cross-sectional study in which 1,268 adolescents in Catalonia (Spain) participated. The family affluence scale (FAS), parents' OSC, parents' level of education and monthly family income were used as socioeconomic indicators. Logistic regression analyses were conducted to evaluate factors associated with missing values in the socioeconomic variables, and to examine the relation of each SEP variable and OSC with risky alcohol consumption, adjusting for sociodemographic variables. Family income had more than 30% missing values; OSC had the fewest. Being an immigrant was associated with missing values on all SEP measures. All SEP measures were positively associated with risky alcohol consumption, yet the strength of these associations diminished after adjustment for sociodemographic variables. Weekly available money was the variable with the strongest association with risky alcohol consumption. OSC seems to be as good as the other indicators for assessing adolescents' SEP. Adolescents with high SEP and those belonging to upper social classes reported higher levels of risky alcohol consumption.

  17. Performance of the CMS missing transverse momentum reconstruction in pp data at $$\\sqrt{s}$$ = 8 TeV

    DOE PAGES

    Khachatryan, Vardan

    2015-02-12

    The performance of missing transverse energy reconstruction algorithms is presented, using √s = 8 TeV proton-proton (pp) data collected with the CMS detector. Events with anomalous missing transverse energy are studied, and the performance of algorithms used to identify and remove these events is presented. The scale and resolution for missing transverse energy, including the effects of multiple pp interactions (pileup), are measured using events with an identified Z boson or isolated photon, and are found to be well described by the simulation. Novel missing transverse energy reconstruction algorithms developed specifically to mitigate the effects of large numbers of pileup interactions on the missing transverse energy resolution are presented. These algorithms significantly reduce the dependence of the missing transverse energy resolution on pileup interactions. Furthermore, an algorithm that provides an estimate of the significance of the missing transverse energy is presented, which is used to estimate the compatibility of the reconstructed missing transverse energy with a zero nominal value.

  18. A regressive methodology for estimating missing data in rainfall daily time series

    NASA Astrophysics Data System (ADS)

    Barca, E.; Passarella, G.

    2009-04-01

    The "presence" of gaps in environmental data time series represents a very common but extremely critical problem, since it can produce biased results (Rubin, 1976). Missing data plague almost all surveys. The problem is how to deal with missing data once it has been deemed impossible to recover the actual missing values. Apart from the amount of missing data, another issue that plays an important role in the choice of a recovery approach is the evaluation of the "missingness" mechanism. When missingness is conditioned by some other variable observed in the data set (Schafer, 1997), the mechanism is called MAR (Missing At Random). Otherwise, when the missingness mechanism depends on the actual value of the missing data, it is called NMAR (Not Missing At Random). The latter is the most difficult condition to model. In the last decade, interest arose in the estimation of missing data by regression (single imputation). More recently, multiple imputation has also become available, which returns a distribution of estimated values (Scheffer, 2002). In this paper an automatic methodology for estimating missing data is presented. In practice, given a gauging station affected by missing data (the target station), the methodology checks the randomness of the missing data and ranks the "similarity" between the target station and the other gauging stations spread over the study area. Among the different methods available for defining the degree of similarity, whose effectiveness strongly depends on the data distribution, the Spearman correlation coefficient was chosen. Once the similarity matrix has been defined, a suitable nonparametric, univariate regression method was applied to estimate missing data in the target station: the Theil method (Theil, 1950). Even though the methodology proved to be rather reliable, the estimation of missing data can be improved by generalization. A first possible improvement consists in extending the univariate technique to a multivariate approach. Another follows the paradigm of "multiple imputation" (Rubin, 1987; Rubin, 1988), which consists in using a set of "similar stations" instead of only the most similar one. In this way, a sort of estimation range can be determined, allowing the introduction of uncertainty. Finally, time series can be grouped on the basis of monthly rainfall rates, defining classes of wetness (i.e. dry, moderately rainy and rainy), in order to perform the estimation on homogeneous data subsets. We expect that integrating these enhancements into the methodology will improve its reliability. The methodology was applied to the daily rainfall time series registered in the Candelaro River Basin (Apulia, southern Italy) from 1970 to 2001. REFERENCES: D.B. Rubin, 1976. Inference and missing data. Biometrika 63, 581-592. D.B. Rubin, 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons. D.B. Rubin, 1988. An overview of multiple imputation. In Survey Research Section, pp. 79-84, American Statistical Association. J.L. Schafer, 1997. Analysis of Incomplete Multivariate Data. Chapman & Hall. J. Scheffer, 2002. Dealing with missing data. Res. Lett. Inf. Math. Sci. 3, 153-160. Available online at http://www.massey.ac.nz/~wwiims/research/letters/ H. Theil, 1950. A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae 12, 85-91.
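    The core loop described above, ranking candidate stations by Spearman correlation with the target and then fitting the Theil (median-of-pairwise-slopes) regression against the most similar one, can be sketched as follows; the data layout and function names are illustrative, not the authors' code, and ties in ranks are not handled.

```python
import statistics

def spearman(x, y):
    """Spearman rank correlation between two equally long series (no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def theil_fit(x, y):
    """Theil estimator: median of all pairwise slopes, intercept from medians."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x)) if x[j] != x[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median(y) - slope * statistics.median(x)
    return slope, intercept

def fill_gaps(target, stations):
    """Fill None entries in target using the most Spearman-similar station."""
    complete = [i for i, v in enumerate(target) if v is not None]
    best = max(stations, key=lambda s: spearman([target[i] for i in complete],
                                                [s[i] for i in complete]))
    slope, intercept = theil_fit([best[i] for i in complete],
                                 [target[i] for i in complete])
    return [v if v is not None else slope * best[i] + intercept
            for i, v in enumerate(target)]

target = [3.0, None, 7.0, 9.0, None, 13.0]
neighbour = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # strongly similar station
noise = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]       # dissimilar station
filled = fill_gaps(target, [neighbour, noise])
```

Here the target follows the similar station as roughly y = 2x + 1, so the gaps are filled with 5.0 and 11.0.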

  19. Prevalence of plaque and dental decay in the first permanent molar in a school population of south Mexico City.

    PubMed

    Taboada-Aranza, Olga; Rodríguez-Nieto, Karen

    2018-01-01

    The first permanent molar is susceptible to tooth decay from its eruption, due to its anatomy and because it is exposed earlier than other teeth. An observational, prolective, cross-sectional and comparative study was conducted in 194 students with an average age of 9.9 ± 1.8 years. Dentobacterial plaque (DBP) was evaluated using the O'Leary index, and tooth decay experience with the DMFS (sum of decayed, missing/extracted and filled dental surfaces) and DMFT (sum of decayed, missing/extracted and filled teeth) indexes. The prevalence of DBP on the first permanent molar was 99.4% and of tooth decay 57.2%. The value of DMFT was 1.4 ± 1.4. Tooth decay experience was higher in children from 7.10 years old, with a value of 2.2 ± 2.3; these children were 7.9 times more likely to develop lesions than younger children (odds ratio: 8.9; 95% confidence interval: 4.1-19.5; p < 0.0001). We found an association between age and the values of the tooth decay experience indexes; although these were weak in the case of DMFS (r = 0.439), the model explained 19% of the association, and 22% for DMFT (r = 0.464). Tooth decay develops rapidly in the first permanent molars; however, they often do not receive the necessary care because it is usually not known that they are permanent teeth. Copyright: © 2018 Permanyer.

  20. Factors associated with rushed and missed resident care in western Canadian nursing homes: a cross-sectional survey of health care aides.

    PubMed

    Knopp-Sihota, Jennifer A; Niehaus, Linda; Squires, Janet E; Norton, Peter G; Estabrooks, Carole A

    2015-10-01

    To describe the nature, frequency and factors associated with care that was rushed or missed by health care aides in western Canadian nursing homes. The growing number of nursing home residents with dementia has created job strain for frontline health care providers, the majority of whom are health care aides. Due to the associated complexity of care, health care aides are challenged to complete more care tasks in less time. Rushed or missed resident care is associated with adverse resident outcomes (e.g. falls) and poorer quality of staff work life (e.g. burnout), making this an important quality-of-care concern. Cross-sectional survey of health care aides (n = 583) working in a representative sample of nursing homes (30 urban, six rural) in western Canada. Data were collected in 2010 as part of the Translating Research in Elder Care study. We collected data on individual health care aides (demographic characteristics, job and vocational satisfaction, physical and mental health, burnout), unit-level characteristics associated with organisational context, facility characteristics (location, size, owner/operator model), and the outcome variables of rushed and missed resident care. Most health care aides (86%) reported being rushed. Due to lack of time, 75% left at least one care task undone during their previous shift. The tasks most frequently missed were talking with residents (52% of health care aides) and assisting with mobility (51%). Health care aides working on units with higher organisational context scores were less likely to report rushed and missed care. Health care aides frequently report care that is rushed and tasks omitted due to lack of time. Considering the resident population in nursing homes today--many with advanced dementia and all with complex care needs--health care aides having enough time to provide physical and psychosocial care of high quality is a critical concern. © 2015 John Wiley & Sons Ltd.

  1. Work Productivity in Scleroderma – Analysis from the UCLA Scleroderma Quality of Life Study

    PubMed Central

    Singh, Manjit K.; Clements, Philip J.; Furst, Daniel E.; Maranian, Paul; Khanna, Dinesh

    2011-01-01

    Objective To examine the productivity of patients with scleroderma (SSc) both outside and within the home in a large observational cohort. Methods 162 patients completed the Work Productivity Survey. Patients indicated whether or not they were employed outside of the home, how many days/month they missed work (employment or household work) due to SSc, and how many days/month their productivity was decreased ≥ 50%. Patients also completed other patient-reported outcome measures. We developed binomial regression models to assess the predictors of days missed from work (paid employment or household activities). The covariates included: type of SSc, education, physician and patient global assessments, HAQ-DI, FACIT-Fatigue, and the Center for Epidemiologic Studies Depression Scale – Short Form (CESD). Results The average age of patients was 51.8 years and 51% had limited SSc. Among the 37% of patients employed outside of the home, patients reported missing 2.6 days/month of work and had productivity reduced by half on 2.5 days/month. Of the 102 patients who were not employed, 39.4% were unable to work due to their SSc. When we assessed patients for household activities (N = 162), patients missed an average of 8 days of housework/month and had productivity reduced by an average of 6 days/month. In the regression models, patients with lower education and poor assessment of overall health by the physician were more likely to miss work outside the home. Patients with limited SSc and high HAQ-DI were more likely to miss work at home. Conclusion SSc has a major impact on productivity at home and at work. Nearly 40% of patients reported disability due to their SSc. PMID:22012885

  2. Exploring Missing Values on Responses to Experienced and Labeled Event as Harassment in 2004 Reserves Data

    DTIC Science & Technology

    2008-07-01

    Personal Experiences of Sexual Harassment and Missing Values on Sexual Harassment Questions by Perceptions of Sexism in a Unit (quartiles of reported sexism in a unit). The "worst" category indicates units with the highest levels of reported sexist behavior, and the "best" category indicates units with the lowest levels of reported sexist behavior.

  3. Multiple imputation for assessment of exposures to drinking water contaminants: evaluation with the Atrazine Monitoring Program.

    PubMed

    Jones, Rachael M; Stayner, Leslie T; Demirtas, Hakan

    2014-10-01

    Drinking water may contain pollutants that harm human health. Pollutant monitoring may occur quarterly, annually, or less frequently, depending upon the pollutant, the pollutant concentration, and the community water system. However, birth and other health outcomes are associated with narrow time-windows of exposure. Infrequent monitoring impedes linkage between water quality and health outcomes for epidemiological analyses. To evaluate the performance of multiple imputation for filling in water quality values between measurements in community water systems (CWSs), the method was implemented in a simulated setting using data from the Atrazine Monitoring Program (AMP, 2006-2009, in five Midwestern states). Values were deleted from the AMP data to leave one measurement per month. Four patterns reflecting drinking water monitoring regulations were used to delete months of data in each CWS: three patterns were missing at random and one pattern was missing not at random. Synthetic health outcome data were created using a linear and a Poisson exposure-response relationship with five levels of hypothesized association, respectively. The multiple imputation method was evaluated by comparing the exposure-response relationships estimated from the multiply imputed data with the hypothesized associations. The four patterns deleted 65-92% of the months of atrazine observations in the AMP data. Even with these high rates of missing information, our procedure was able to recover most of the missing information when the synthetic health outcome was included, for the missing at random patterns and for the missing not at random patterns with low-to-moderate exposure-response relationships. Multiple imputation appears to be an effective method for filling in water quality values between measurements. Copyright © 2014 Elsevier Inc. All rights reserved.
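    Once a variable has been multiply imputed, analyses of the m completed data sets are typically combined with Rubin's rules: the pooled point estimate is the mean of the per-imputation estimates, and the total variance adds the within-imputation variance to (1 + 1/m) times the between-imputation variance. A minimal sketch with invented numbers, not tied to the AMP analysis:

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m imputation-specific estimates via Rubin's rules:
    pooled estimate, and total variance = within + (1 + 1/m) * between."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)                       # pooled point estimate
    within = statistics.mean(variances)                      # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total_var = within + (1 + 1 / m) * between               # extra term: imputation noise
    return q_bar, total_var

# five imputed data sets, each yielding a slope estimate and its variance
est, var = pool_rubin([1.2, 1.0, 1.1, 1.3, 0.9], [0.04, 0.05, 0.04, 0.06, 0.05])
```

The between-imputation term is what keeps multiply imputed analyses honest: it inflates the variance to reflect uncertainty about the missing values themselves.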

  4. How shared preferences in music create bonds between people: values as the missing link.

    PubMed

    Boer, Diana; Fischer, Ronald; Strack, Micha; Bond, Michael H; Lo, Eva; Lam, Jason

    2011-09-01

    How can shared music preferences create social bonds between people? A process model is developed in which music preferences as value-expressive attitudes create social bonds via conveyed value similarity. The musical bonding model links two research streams: (a) music preferences as indicators of similarity in value orientations and (b) similarity in value orientations leading to social attraction. Two laboratory experiments and one dyadic field study demonstrated that music can create interpersonal bonds between young people because music preferences can be cues for similar or dissimilar value orientations, with similarity in values then contributing to social attraction. One study tested and ruled out an alternative explanation (via personality similarity), illuminating the differential impact of perceived value similarity versus personality similarity on social attraction. Value similarity is the missing link in explaining the musical bonding phenomenon, which seems to hold for Western and non-Western samples and in experimental and natural settings.

  5. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    PubMed

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. 
Preoperative anemia was significantly associated with the occurrence of any adverse event, severe adverse events, and hospital readmission. Multiple imputation is a rigorous statistical procedure that is being increasingly used to address missing values in large datasets. Using this technique for ACDF avoided the loss of cases that may have affected the representativeness and power of the study and led to different results than complete case analysis. Multiple imputation should be considered for future spine studies. Copyright © 2018 Elsevier Inc. All rights reserved.
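    The complete-case versus multiple-imputation contrast described above can be sketched numerically. The NumPy sketch below is purely illustrative and is not the study's method: it invents a small synthetic dataset, uses a linear rather than logistic outcome model, and performs simple stochastic regression imputation (with the outcome included in the imputation model, as proper multiple imputation requires); all variable names, effect sizes, and missingness rates are assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    age = rng.normal(55, 10, n)
    albumin = rng.normal(4.0, 0.5, n)                 # hypothetical lab value
    y = 2.0 - 0.8 * albumin + 0.01 * age + rng.normal(0, 0.5, n)

    # Missing at random: older patients more often lack an albumin value.
    miss = rng.random(n) < np.clip(0.3 + 0.015 * (age - 55), 0.05, 0.95)
    obs = ~miss

    def ols(X, y):
        """Least-squares coefficients, intercept prepended."""
        X1 = np.column_stack([np.ones(len(X)), X])
        return np.linalg.lstsq(X1, y, rcond=None)[0]

    # Complete-case analysis: drop every row with missing albumin.
    beta_cc = ols(np.column_stack([albumin[obs], age[obs]]), y[obs])

    # Multiple imputation: M stochastic regression imputations of albumin
    # from age AND the outcome (proper imputation includes the outcome),
    # each followed by a full-sample fit; point estimates are averaged.
    g = ols(np.column_stack([age[obs], y[obs]]), albumin[obs])
    fitted = g[0] + g[1] * age + g[2] * y
    resid_sd = np.std(albumin[obs] - fitted[obs])
    M = 20
    betas = []
    for _ in range(M):
        alb = albumin.copy()
        alb[miss] = fitted[miss] + rng.normal(0, resid_sd, miss.sum())
        betas.append(ols(np.column_stack([alb, age]), y))
    beta_mi = np.mean(betas, axis=0)
    ```

    Here both analyses recover the albumin effect because the missingness depends only on a modelled covariate; the point of multiple imputation is that it does so while retaining the full sample, mirroring the abstract's contrast between 4,311 and 11,999 analysable patients.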

  6. Restoration of HST images with missing data

    NASA Technical Reports Server (NTRS)

    Adorf, Hans-Martin

    1992-01-01

Missing data are a fairly common problem when restoring Hubble Space Telescope observations of extended sources. On Wide Field and Planetary Camera images, cosmic ray hits and CCD hot spots are the prevalent causes of data losses, whereas on Faint Object Camera images data are lost due to reseau marks, blemishes, areas of saturation and the omnipresent frame edges. This contribution discusses a technique for 'filling in' missing data by statistical inference using information from the surrounding pixels. The major gain consists in minimizing adverse spill-over effects to the restoration in areas neighboring those where data are missing. When the mask delineating the support of 'missing data' is made dynamic, cosmic ray hits, etc. can be detected on the fly during restoration.
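    The "filling in from surrounding pixels" idea can be illustrated with a crude stand-in: iteratively replacing each masked pixel with the average of its four neighbours, which converges to a smooth (harmonic) fill constrained by the valid border pixels. This is an assumption-laden simplification of the statistical inference the abstract describes, not the author's algorithm.

    ```python
    import numpy as np

    def fill_missing(image, mask, n_iter=200):
        """Fill pixels flagged in `mask` (True = missing) by repeatedly
        averaging their four nearest neighbours, leaving valid pixels
        untouched. Edge wrap-around from np.roll is ignored here, so the
        sketch assumes the masked region is interior."""
        img = np.where(mask, np.nan, image.astype(float))
        img[mask] = np.nanmean(img)          # start from the valid-pixel mean
        for _ in range(n_iter):
            up = np.roll(img, -1, axis=0)
            down = np.roll(img, 1, axis=0)
            left = np.roll(img, -1, axis=1)
            right = np.roll(img, 1, axis=1)
            img[mask] = (up + down + left + right)[mask] / 4.0
        return img
    ```

    On a smooth background (e.g. a linear ramp) this fill reproduces the missing values almost exactly, which is the sense in which neighbouring pixels carry usable information about a cosmic-ray-hit region.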

  7. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

    PubMed

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2017-05-31

Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software tools exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each tool includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the amount of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change, and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other tools decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.
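    Local least squares (lls) imputation, the method this evaluation found most helpful, can be sketched as follows: each protein profile (row) with missing values is regressed on its k most correlated fully observed profiles, using only the observed samples (columns), and the fitted combination predicts the missing entries. This is a simplified rendering of the general lls idea, not the exact implementation the paper benchmarked; it assumes at least k fully observed rows exist.

    ```python
    import numpy as np

    def lls_impute(X, k=5):
        """Local least squares imputation on a proteins x samples matrix
        with NaNs marking missing intensities."""
        X = X.astype(float).copy()
        complete = ~np.isnan(X).any(axis=1)
        donors = X[complete]                 # fully observed profiles
        for i in np.where(~complete)[0]:
            obs = ~np.isnan(X[i])
            target = X[i, obs]
            # similarity of row i to each donor, on the observed columns
            cors = np.array([abs(np.corrcoef(target, d[obs])[0, 1])
                             for d in donors])
            top = np.argsort(cors)[-k:]      # k most correlated donors
            A = donors[top][:, obs].T        # observed columns x k
            B = donors[top][:, ~obs].T       # missing columns x k
            w = np.linalg.lstsq(A, target, rcond=None)[0]
            X[i, ~obs] = B @ w               # predict the missing entries
        return X
    ```

    With strongly correlated profiles the regression recovers missing intensities almost exactly, which is why lls tends to outperform naive column-wise fills in expression data.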

  8. An entropy decision approach in flash flood warning: rainfall thresholds definition

    NASA Astrophysics Data System (ADS)

    Montesarchio, V.; Napolitano, F.; Ridolfi, E.

    2009-09-01

Flash flood events are floods characterised by a very rapid response of the basin to the storm, and they often involve loss of life and damage to public and private property. Due to the specific space-time scale of this kind of flood, generally only a short lead time is available for triggering civil protection measures. Threshold values specify the precipitation amount for a given duration that generates a critical discharge in a given cross section. Exceeding these values could produce a critical situation at river sites exposed to flood risk, so observed or forecast precipitation can be compared directly with critical reference values, without running real-time forecasting systems online. This study is focused on the Mignone River basin, located in Central Italy. The critical rainfall threshold values are evaluated by minimising a utility function based on the informative entropy concept. The study concludes with a system performance analysis, in terms of correctly issued warnings, false alarms and missed alarms.
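    The performance bookkeeping at the end of the abstract (correctly issued warnings, false alarms, missed alarms) can be sketched directly. The entropy-based utility function itself is not reproduced here; the sketch substitutes a simple linear cost with assumed penalty weights, and all series are synthetic.

    ```python
    import numpy as np

    def warning_scores(rain, rain_threshold, discharge, critical_q):
        """Contingency counts for a rainfall-threshold warning system:
        a warning is issued when rainfall meets the threshold; an event
        occurs when discharge meets its critical value."""
        warn = rain >= rain_threshold
        event = discharge >= critical_q
        hits = np.sum(warn & event)
        false_alarms = np.sum(warn & ~event)
        missed = np.sum(~warn & event)
        return hits, false_alarms, missed

    def best_threshold(rain, discharge, critical_q, candidates,
                       c_false=1.0, c_missed=5.0):
        """Pick the candidate threshold minimising a linear cost of false
        and missed alarms (a stand-in for the entropy-based utility
        function; the penalty weights are illustrative assumptions)."""
        costs = []
        for thr in candidates:
            _, fa, miss = warning_scores(rain, thr, discharge, critical_q)
            costs.append(c_false * fa + c_missed * miss)
        return candidates[int(np.argmin(costs))]
    ```

    Weighting missed alarms more heavily than false alarms pushes the chosen threshold downward, which is the basic trade-off any threshold-selection criterion, entropy-based or otherwise, has to balance.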

  9. Responsiveness-informed multiple imputation and inverse probability-weighting in cohort studies with missing data that are non-monotone or not missing at random.

    PubMed

    Doidge, James C

    2018-02-01

    Population-based cohort studies are invaluable to health research because of the breadth of data collection over time, and the representativeness of their samples. However, they are especially prone to missing data, which can compromise the validity of analyses when data are not missing at random. Having many waves of data collection presents opportunity for participants' responsiveness to be observed over time, which may be informative about missing data mechanisms and thus useful as an auxiliary variable. Modern approaches to handling missing data such as multiple imputation and maximum likelihood can be difficult to implement with the large numbers of auxiliary variables and large amounts of non-monotone missing data that occur in cohort studies. Inverse probability-weighting can be easier to implement but conventional wisdom has stated that it cannot be applied to non-monotone missing data. This paper describes two methods of applying inverse probability-weighting to non-monotone missing data, and explores the potential value of including measures of responsiveness in either inverse probability-weighting or multiple imputation. Simulation studies are used to compare methods and demonstrate that responsiveness in longitudinal studies can be used to mitigate bias induced by missing data, even when data are not missing at random.
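    The basic inverse probability-weighting step can be sketched for the simple single-outcome case (the paper's contribution, extending IPW to non-monotone missingness, is not reproduced here). The response probability is modelled from a fully observed covariate, and complete cases are reweighted by its inverse; all data are synthetic.

    ```python
    import numpy as np

    def fit_logistic(X, y, n_iter=25):
        """Newton-Raphson logistic regression; an intercept is prepended."""
        X1 = np.column_stack([np.ones(len(X)), X])
        b = np.zeros(X1.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X1 @ b))
            H = X1.T @ (X1 * (p * (1 - p))[:, None])
            b = b + np.linalg.solve(H, X1.T @ (y - p))
        return b

    rng = np.random.default_rng(1)
    n = 5000
    x = rng.normal(size=n)                          # fully observed covariate
    y = 2.0 + x + rng.normal(size=n)                # outcome, missing for some
    resp = rng.random(n) < 1.0 / (1.0 + np.exp(-x)) # response indicator

    # Naive complete-case mean is biased: responders tend to have high x.
    naive = y[resp].mean()

    # IPW: model the response probability, then weight each complete case
    # by the inverse of its estimated probability (Hajek-normalised).
    b = fit_logistic(x.reshape(-1, 1), resp.astype(float))
    p_hat = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * x)))
    w = 1.0 / p_hat[resp]
    ipw = np.sum(w * y[resp]) / np.sum(w)
    ```

    The true population mean of `y` is 2.0; the naive complete-case mean overshoots it, while the weighted mean recovers it. Responsiveness measures, as the paper argues, would enter as additional predictors in the response model.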

  10. Transitioning to multiple imputation : a new method to impute missing blood alcohol concentration (BAC) values in FARS

    DOT National Transportation Integrated Search

    2002-01-01

The National Center for Statistics and Analysis (NCSA) of the National Highway Traffic Safety Administration (NHTSA) has undertaken several approaches to remedy the problem of missing blood alcohol test results in the Fatality Analysis Reporting ...

  11. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. 
idaeus, which may help to identify deleterious alleles that are the basis of inbreeding depression in the species. PMID:23324311

  12. Non-LTE H2+ as the source of missing opacity in the solar atmosphere

    NASA Technical Reports Server (NTRS)

    Swamy, K. S. K.; Stecher, T. P.

    1974-01-01

    The population of the various vibrational levels of the H2+ molecule has been calculated from the consideration of formation and destruction mechanisms. The resulting population is used in calculating the total absorption due to H2+ and is compared with the other known sources of opacity at several optical depths of the solar atmosphere. It is shown that the absorption due to H2+ can probably account for the missing ultraviolet opacity in the solar atmosphere.

  13. Testing of NASA LaRC Materials under MISSE 6 and MISSE 7 Missions

    NASA Technical Reports Server (NTRS)

    Prasad, Narasimha S.

    2009-01-01

The objective of the Materials International Space Station Experiment (MISSE) is to study the performance of novel materials when subjected to the synergistic effects of the harsh space environment for several months. MISSE missions provide an opportunity for developing space qualifiable materials. Two lasers and a few optical components from NASA Langley Research Center (LaRC) were included in the MISSE 6 mission for long term exposure. MISSE 6 items were characterized and packed inside a ruggedized Passive Experiment Container (PEC) that resembles a suitcase. The PEC was tested for survivability under launch conditions. MISSE 6 was transported to the International Space Station (ISS) via STS-123 on March 11, 2008. The astronauts successfully attached the PEC to external handrails of the ISS and opened the PEC for long term exposure to the space environment. The current plan is to bring the MISSE 6 PEC back to Earth via the STS-128 mission scheduled for launch in August 2009. Currently, preparations for launching the MISSE 7 mission are progressing. Laser and lidar components assembled on a flight-worthy platform are included from NASA LaRC. MISSE 7 is scheduled to launch on the STS-129 mission. This paper will briefly review recent efforts on the MISSE 6 and MISSE 7 missions at NASA Langley Research Center (LaRC).

  14. 75 FR 53631 - Missing Parts Practice

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-01

    ... extended missing parts pilot program is expected to benefit applicants by permitting additional time to determine if patent protection should be sought at a relatively low cost and by permitting applicants to... to reduce the costs due one year after filing a provisional application, the USPTO published a...

  15. Approach to addressing missing data for electronic medical records and pharmacy claims data research.

    PubMed

    Bounthavong, Mark; Watanabe, Jonathan H; Sullivan, Kevin M

    2015-04-01

The complete capture of all values for each variable of interest in pharmacy research studies remains aspirational. The absence of these possibly influential values is a common problem for pharmacist investigators. Failure to account for missing data may translate to biased study findings and conclusions. Our goal in this analysis was to apply validated statistical methods for missing data to a previously analyzed data set and compare results when missing data methods were implemented versus standard analytics that ignore missing data effects. Using data from a retrospective cohort study, the statistical method of multiple imputation was used to provide regression-based estimates of the missing values to improve available data usable for study outcomes measurement. These findings were then contrasted with a complete-case analysis that restricted estimation to subjects in the cohort that had no missing values. Odds ratios were compared to assess differences in findings of the analyses. A nonadjusted regression analysis ("crude analysis") was also performed as a reference for potential bias. The setting was a Veterans Integrated Systems Network that includes VA facilities in the Southern California and Nevada regions; the patients were new statin users between November 30, 2006, and December 2, 2007, with a diagnosis of dyslipidemia. We compared the odds ratios (ORs) and 95% confidence intervals (CIs) for the crude, complete-case, and multiple imputation analyses for the end points of a 25% or greater reduction in atherogenic lipids. Data were missing for 21.5% of identified patients (1665 subjects of 7739). Regression model results were similar for the crude, complete-case, and multiple imputation analyses with overlap of 95% confidence limits at each end point. The crude, complete-case, and multiple imputation ORs (95% CIs) for a 25% or greater reduction in low-density lipoprotein cholesterol were 3.5 (95% CI 3.1-3.9), 4.3 (95% CI 3.8-4.9), and 4.1 (95% CI 3.7-4.6), respectively.
The crude, complete-case, and multiple imputation ORs (95% CIs) for a 25% or greater reduction in non-high-density lipoprotein cholesterol were 3.5 (95% CI 3.1-3.9), 4.5 (95% CI 4.0-5.2), and 4.4 (95% CI 3.9-4.9), respectively. The crude, complete-case, and multiple imputation ORs (95% CIs) for 25% or greater reduction in TGs were 3.1 (95% CI 2.8-3.6), 4.0 (95% CI 3.5-4.6), and 4.1 (95% CI 3.6-4.6), respectively. The use of the multiple imputation method to account for missing data did not alter conclusions based on a complete-case analysis. Given the frequency of missing data in research using electronic health records and pharmacy claims data, multiple imputation may play an important role in the validation of study findings. © 2015 Pharmacotherapy Publications, Inc.
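    Pooling estimates across imputed datasets follows Rubin's rules: average the per-imputation estimates (here on the log-odds scale), and combine the within- and between-imputation variance into one total variance. A minimal sketch, with hypothetical log-odds-ratio inputs loosely in the range reported above:

    ```python
    import numpy as np

    def pool_rubin(estimates, variances):
        """Rubin's rules: combine per-imputation estimates (e.g. log odds
        ratios) and their variances into a pooled estimate and variance."""
        estimates = np.asarray(estimates, float)
        variances = np.asarray(variances, float)
        m = len(estimates)
        q_bar = estimates.mean()               # pooled point estimate
        u_bar = variances.mean()               # within-imputation variance
        b = estimates.var(ddof=1)              # between-imputation variance
        t = u_bar + (1 + 1 / m) * b            # total variance
        return q_bar, t

    # Hypothetical log-OR estimates from M = 5 imputed datasets.
    logors = np.log([4.0, 4.2, 4.1, 3.9, 4.3])
    vars_ = np.full(5, 0.004)
    q, t = pool_rubin(logors, vars_)
    or_pooled = np.exp(q)
    ci = np.exp(q + np.array([-1.96, 1.96]) * np.sqrt(t))
    ```

    The between-imputation term inflates the total variance relative to any single imputed dataset, which is how multiple imputation propagates the uncertainty due to the missing values into the confidence intervals.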

  16. Imputation of missing data in time series for air pollutants

    NASA Astrophysics Data System (ADS)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to deteriorate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even when data were missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.
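    The EM-under-normality idea can be sketched in a few lines: alternate between filling each row's missing entries with their conditional mean given the observed entries (E-step, simplified) and re-estimating the mean and covariance (M-step). This is not the mtsdi implementation; it omits both the temporal filtering the article discusses and the conditional-covariance correction of full EM, so the covariance is slightly understated.

    ```python
    import numpy as np

    def em_impute(X, n_iter=50):
        """Simplified EM for a multivariate normal with NaN entries."""
        X = X.astype(float).copy()
        miss = np.isnan(X)
        col_means = np.nanmean(X, axis=0)
        X[miss] = np.take(col_means, np.where(miss)[1])   # crude start
        for _ in range(n_iter):
            mu = X.mean(axis=0)                   # M-step: mean and
            S = np.cov(X, rowvar=False)           # covariance estimates
            for i in range(len(X)):
                m = miss[i]
                if not m.any():
                    continue
                o = ~m
                # E-step: conditional mean of missing given observed
                X[i, m] = mu[m] + S[np.ix_(m, o)] @ np.linalg.solve(
                    S[np.ix_(o, o)], X[i, o] - mu[o])
        return X
    ```

    For strongly correlated series (the usual case for co-located pollutant monitors) the conditional-mean fill is far more accurate than a column-mean fill, because it exploits the covariance between series.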

  17. 29 CFR 4050.9 - Annuity or elective lump sum-living missing participant.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... unloaded designated benefit divided by the present value (determined as of the deemed distribution date.... 4050.9 Section 4050.9 Labor Regulations Relating to Labor (Continued) PENSION BENEFIT GUARANTY... participant. This section applies to a missing participant whose designated benefit was determined under...

  18. What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates

    PubMed Central

    Thomas, Sarah L.; Schmidt, Karen M.; Erbacher, Monica K.; Bergeman, Cindy S.

    2017-01-01

The authors investigated the effect of Missing Completely at Random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, even MCAR missingness damages the quality and precision of PCM estimates. PMID:26784376
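    The degradation mechanism the study describes (induced MCAR missingness inflating standard errors) can be illustrated without reproducing the partial credit model itself. The sketch below induces 20%, 50%, and 70% MCAR missingness in hypothetical item-response data of the same shape (307 persons by 56 occasions) and tracks the standard error of each person's mean score; the data and the summary statistic are assumptions, not the study's estimates.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    responses = rng.integers(0, 5, size=(307, 56)).astype(float)  # scores 0-4

    def induce_mcar(data, rate, rng):
        """Replace a given fraction of entries with NaN, completely at random."""
        out = data.copy()
        out[rng.random(data.shape) < rate] = np.nan
        return out

    def se_of_person_means(data):
        """Standard error of each person's mean score, from available items."""
        n_obs = np.sum(~np.isnan(data), axis=1)
        sd = np.nanstd(data, axis=1, ddof=1)
        return sd / np.sqrt(n_obs)

    # Average person-level standard error at each induced missingness rate.
    ses = [np.nanmean(se_of_person_means(induce_mcar(responses, r, rng)))
           for r in (0.0, 0.2, 0.5, 0.7)]
    ```

    The standard errors grow roughly as one over the square root of the remaining item count, which matches the study's qualitative finding that precision degrades steadily as induced MCAR missingness increases.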

  19. Does Missed Care in Isolated Rural Hospitals Matter?

    PubMed

    Smith, Jessica G

    2018-06-01

    Missed care is associated with adverse outcomes such as patient falls and decreased nurse job satisfaction. Although studied in populations of interest such as neonates, children, and heart failure patients, there are no studies about missed care in rural hospitals. Reducing care omissions in rural hospitals might help improve rural patient outcomes and ensure that rural hospitals can remain open in an era of hospital reimbursement dependent on care outcomes, such as through value-based purchasing. Understanding the extent of missed nursing care and its implications for rural populations might provide crucial information to alert rural hospital administrators and nurses about the incidence and influence of missed care on health outcomes. Focusing on missed care within rural hospitals and other rural health care settings is important to address the specific health needs of aging rural U.S. residents who are isolated from high-volume, urban health care facilities.

  20. Cognitive Testing in Patients with CKD: The Problem of Missing Cases.

    PubMed

    Neumann, Denise; Robinski, Maxi; Mau, Wilfried; Girndt, Matthias

    2017-03-07

Cognitive testing is only valid in individuals with sufficient visual and motor skills and motivation to participate. Patients on dialysis usually suffer from limitations, such as impaired vision, motor difficulties, and depression. Hence, it is doubtful that the true value of cognitive functioning can be measured without bias. Consequently, many patients are excluded from cognitive testing. We focused on reasons for exclusion and analyzed characteristics of nontestable patients. Within the Choice of Renal Replacement Therapy Project (baseline survey: May 2014 to May 2015), n=767 patients on peritoneal dialysis (n=240) or hemodialysis (n=527) were tested with the Trail Making Test-B and the German d2-Revision Test and completed the Kidney Disease Quality of Life Short Form cognition subscale. We divided the sample into patients with missing cognitive testing data and patients with full cognitive testing data, analyzed reasons for nonfeasibility, and compared subsamples with regard to psychosocial and physical metrics. The exclusion categories were linked to patient characteristics potentially associated with missing data (age, comorbidity, depression, and education level) by calculation of λ-coefficient. The subsamples consisted of n=366 (48%) patients with missing data (peritoneal dialysis =62, hemodialysis =304) and n=401 patients with full cognitive testing data (peritoneal dialysis =178, hemodialysis =223). Patients were excluded due to visual impairment (49%), lack of motivation (31%), and motor impairment (13%). The remaining 8% did not follow instructions, suffered from medical incidents, or had language difficulties. Compared with patients with full cognitive testing data, they were more likely to have depression; be treated with hemodialysis; be older, nonworking, or more comorbid; and experience poorer shared decision making. Reasons for exclusion were not related to levels of age, comorbidity score, depression score, or education level.
We excluded almost one half of eligible patients from cognitive testing due to visual, motivational, or motor difficulties. Our findings are consistent with exclusion categories reported from the literature. We should be aware that, because of disease-related limitations, conclusions about cognitive functioning in the CKD population may be biased. In the future, nonvisual and nonverbal cognitive testing can be a valuable resource. Copyright © 2017 by the American Society of Nephrology.

  1. Cognitive Testing in Patients with CKD: The Problem of Missing Cases

    PubMed Central

    Neumann, Denise; Mau, Wilfried; Girndt, Matthias

    2017-01-01

    Background and objectives Cognitive testing is only valid in individuals with sufficient visual and motor skills and motivation to participate. Patients on dialysis usually suffer from limitations, such as impaired vision, motor difficulties, and depression. Hence, it is doubtful that the true value of cognitive functioning can be measured without bias. Consequently, many patients are excluded from cognitive testing. We focused on reasons for exclusion and analyzed characteristics of nontestable patients. Design, setting, participants & measurements Within the Choice of Renal Replacement Therapy Project (baseline survey: May 2014 to May 2015), n=767 patients on peritoneal dialysis (n=240) or hemodialysis (n=527) were tested with the Trail Making Test-B and the German d2-Revision Test and completed the Kidney Disease Quality of Life Short Form cognition subscale. We divided the sample into patients with missing cognitive testing data and patients with full cognitive testing data, analyzed reasons for nonfeasibility, and compared subsamples with regard to psychosocial and physical metrics. The exclusion categories were linked to patient characteristics potentially associated with missing data (age, comorbidity, depression, and education level) by calculation of λ-coefficient. Results The subsamples consisted of n=366 (48%) patients with missing data (peritoneal dialysis =62, hemodialysis =304) and n=401 patients with full cognitive testing data (peritoneal dialysis =178, hemodialysis =223). Patients were excluded due to visual impairment (49%), lack of motivation (31%), and motor impairment (13%). The remaining 8% did not follow instructions, suffered from medical incidents, or had language difficulties. Compared with patients with full cognitive testing data, they were more likely to have depression; be treated with hemodialysis; be older, nonworking, or more comorbid; and experience poorer shared decision making. 
Reasons for exclusion were not related to levels of age, comorbidity score, depression score, or education level. Conclusions We excluded almost one half of eligible patients from cognitive testing due to visual, motivational, or motor difficulties. Our findings are consistent with exclusion categories reported from the literature. We should be aware that, because of disease-related limitations, conclusions about cognitive functioning in the CKD population may be biased. In the future, nonvisual and nonverbal cognitive testing can be a valuable resource. PMID:28148556

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Monaghan, P; Shneor, R; Subedi, R

The five-fold differential cross section for the 12C(e,e'p)11B reaction was determined over a missing momentum range of 200-400 MeV/c, in a kinematics regime with Bjorken x > 1 and Q2 = 2.0 (GeV/c)2. A comparison of the results with theoretical models and with previous data at lower missing momentum is shown. The theoretical calculations agree well with the data up to a missing momentum value of 325 MeV/c and then diverge for larger missing momenta. The extracted distorted momentum distribution is shown to be consistent with previous data and extends the range of available data up to 400 MeV/c.

  3. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.

    PubMed

    Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S

    2018-04-10

Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that missing values and outliers in such data sets lower the performance of machine learning-based classifiers used for risk stratification. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that replacing missing values and outliers with median values will yield higher risk stratification accuracy. The ML-based risk stratification system is designed, optimized and evaluated with features extracted and optimized using six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest). The Pima Indian diabetes dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing missing values with the group median and outliers with the overall median, and then combining random forest feature selection with random forest classification, yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability.
The RF-based model showed the best performance when outliers were replaced by median values.
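    The preprocessing step the abstract credits (missing values replaced by the class-wise median, outliers by the median) can be sketched as follows. The outlier rule used here (1.5 x IQR fences) is an assumption, as the abstract does not specify one, and the downstream feature selection and random forest stages are omitted.

    ```python
    import numpy as np

    def median_clean(X, y):
        """Per column: fill NaNs with the class-wise (group) median, then
        replace IQR-fence outliers with the median of the inliers."""
        X = X.astype(float).copy()
        for j in range(X.shape[1]):
            col = X[:, j]
            # group medians for missing values
            for cls in np.unique(y):
                sel = (y == cls) & np.isnan(col)
                if sel.any():
                    col[sel] = np.nanmedian(col[y == cls])
            # inlier median for outliers beyond 1.5 * IQR
            q1, q3 = np.percentile(col, [25, 75])
            fence = 1.5 * (q3 - q1)
            out = (col < q1 - fence) | (col > q3 + fence)
            col[out] = np.median(col[~out])
            X[:, j] = col
        return X
    ```

    Using the class-wise median for missing values preserves between-class differences that a single overall median would blur, which is presumably why the group-median configuration performed best in the study.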

  4. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods

    PubMed Central

    Shara, Nawar; Yassin, Sayf A.; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V.; Wang, Wenyu; Lee, Elisa T.; Umans, Jason G.

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989–1991), 2 (1993–1995), and 3 (1998–1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results. PMID:26414328

  5. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    PubMed

    Shara, Nawar; Yassin, Sayf A; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V; Wang, Wenyu; Lee, Elisa T; Umans, Jason G

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
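    Two of the simpler methods compared above, mean of serial measures and adjacent value, can be sketched directly on a persons-by-exams matrix. The tie-breaking details (e.g. preferring the nearest earlier exam) are assumptions, since the abstract does not spell them out.

    ```python
    import numpy as np

    def mean_of_series(exams):
        """Impute each person's missing exam with the mean of that person's
        observed exams ('mean of serial measures')."""
        out = exams.astype(float).copy()
        row_means = np.nanmean(out, axis=1)
        idx = np.where(np.isnan(out))
        out[idx] = row_means[idx[0]]
        return out

    def adjacent_value(exams):
        """Impute a missing exam with the nearest earlier observed exam,
        falling back to the next later one ('adjacent value')."""
        out = exams.astype(float).copy()
        n, k = out.shape
        for i in range(n):
            for j in range(k):
                if np.isnan(out[i, j]):
                    left = out[i, :j][~np.isnan(out[i, :j])]
                    right = out[i, j + 1:][~np.isnan(out[i, j + 1:])]
                    if left.size:
                        out[i, j] = left[-1]
                    elif right.size:
                        out[i, j] = right[0]
        return out
    ```

    Both fills ignore why a value is missing, which is exactly the weakness the study probes: when morbidity drives the missingness, such single-value fills understate decline, and a pattern-mixture model fares better.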

  6. The value of testing multiple anatomic sites for gonorrhoea and chlamydia in sexually transmitted infection centres in the Netherlands, 2006-2010.

    PubMed

    Koedijk, F D H; van Bergen, J E A M; Dukers-Muijrers, N H T M; van Leeuwen, A P; Hoebe, C J P A; van der Sande, M A B

    2012-09-01

National surveillance data from 2006 to 2010 of the Dutch sexually transmitted infection (STI) centres were used to analyse current practices on testing extragenital sites for chlamydia and gonorrhoea in men who have sex with men (MSM) and women. In MSM, 76.0% and 88.9% were tested at least at one extragenital site (pharyngeal and/or anorectal) for chlamydia and gonorrhoea, respectively; for women this was 20.5% and 30.2%. The proportion tested at more than one anatomic site differed by STI centre, ranging from 2% to 100%. In MSM tested at multiple sites, 63.0% and 66.5% of chlamydia and gonorrhoea diagnoses, respectively, would have been missed if screening had been urogenital only, mainly anorectal infections. For women tested at multiple sites, the proportions of missed chlamydia and gonorrhoea diagnoses would have been 12.9% and 30.0%, respectively. Testing extragenital sites appears warranted, given the numerous infections that would otherwise have been missed. Adding anorectal screening to urogenital screening for all MSM visiting an STI centre should be recommended. Since actual testing practices differ by centre, there is a need for clearer guidelines. Routine gonorrhoea and chlamydia screening at multiple sites in STI centres should be investigated further, as this might be a more effective approach to reduce transmission than current practice.

  7. A Two-Step Approach for Analysis of Nonignorable Missing Outcomes in Longitudinal Regression: an Application to Upstate KIDS Study.

    PubMed

    Liu, Danping; Yeung, Edwina H; McLain, Alexander C; Xie, Yunlong; Buck Louis, Germaine M; Sundaram, Rajeshwari

    2017-09-01

    Imperfect follow-up in longitudinal studies commonly leads to missing outcome data that can potentially bias the inference when the missingness is nonignorable; that is, when the propensity of missingness depends on missing values in the data. In the Upstate KIDS Study, we seek to determine whether the missingness of child development outcomes is nonignorable, and how a simple model assuming ignorable missingness would compare with more complicated models for a nonignorable mechanism. To correct for nonignorable missingness, the shared random effects model (SREM) jointly models the outcome and the missingness mechanism. However, its computational complexity and the lack of software packages have limited its practical application. This paper proposes a novel two-step approach to handle nonignorable missing outcomes in generalized linear mixed models. We first analyse the missingness mechanism with a generalized linear mixed model and predict values of the random effects; then, the outcome model is fitted adjusting for the predicted random effects to account for heterogeneity in the missingness propensity. Extensive simulation studies suggest that the proposed method is a reliable approximation to SREM, with much faster computation. The nonignorability of missing data in the Upstate KIDS Study is estimated to be mild to moderate, and analyses using the two-step approach or SREM are similar to those from the model assuming ignorable missingness. The two-step approach is a computationally straightforward method that can be used in sensitivity analyses in longitudinal studies to examine violations of the ignorable missingness assumption and their implications for health outcomes. © 2017 John Wiley & Sons Ltd.
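
A deliberately simplified caricature of the two-step idea, on simulated data: a centred subject-level missingness rate stands in for the predicted random effect from a mixed logistic model of the dropout process (this is not the authors' implementation, only the shape of the approach):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_visits = 200, 4

# Simulate a nonignorable mechanism: subjects with a higher latent
# propensity to miss visits (u) also tend to have lower outcomes.
u = rng.normal(size=n_subj)                         # shared random effect
p_miss = 1 / (1 + np.exp(-(-1.0 + 1.5 * u)))        # missingness propensity
missing = rng.random((n_subj, n_visits)) < p_miss[:, None]
y = 2.0 - 0.8 * u[:, None] + rng.normal(scale=0.5, size=(n_subj, n_visits))

# Step 1: summarize each subject's missingness; the centred subject-level
# missingness rate is a crude stand-in for the predicted random effect.
u_hat = missing.mean(axis=1) - missing.mean()

# Step 2: fit the outcome model on the observed rows, adjusting for the
# step-1 proxy to absorb heterogeneity in the missingness propensity.
subj, visit = np.where(~missing)
X = np.column_stack([np.ones(subj.size), u_hat[subj]])
beta, *_ = np.linalg.lstsq(X, y[subj, visit], rcond=None)
# Under this simulation beta[1] comes out negative: more dropout-prone
# subjects have systematically lower outcomes.
```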

  8. Direct measurement of NO3 radical reactivity in a boreal forest

    NASA Astrophysics Data System (ADS)

    Liebmann, Jonathan; Karu, Einar; Sobanski, Nicolas; Schuladen, Jan; Ehn, Mikael; Schallhart, Simon; Quéléver, Lauriane; Hellen, Heidi; Hakola, Hannele; Hoffmann, Thorsten; Williams, Jonathan; Fischer, Horst; Lelieveld, Jos; Crowley, John N.

    2018-03-01

    We present the first direct measurements of NO3 reactivity (or inverse lifetime, s-1) in the Finnish boreal forest. The data were obtained during the IBAIRN campaign (Influence of Biosphere-Atmosphere Interactions on the Reactive Nitrogen budget) which took place in Hyytiälä, Finland during the summer/autumn transition in September 2016. The NO3 reactivity was generally very high with a maximum value of 0.94 s-1 and displayed a strong diel variation with a campaign-averaged nighttime mean value of 0.11 s-1 compared to a daytime value of 0.04 s-1. The highest nighttime NO3 reactivity was accompanied by major depletion of canopy level ozone and was associated with strong temperature inversions and high levels of monoterpenes. The daytime reactivity was sufficiently large that reactions of NO3 with organic trace gases could compete with photolysis and reaction with NO. There was no significant reduction in the measured NO3 reactivity between the beginning and end of the campaign, indicating that any seasonal reduction in canopy emissions of reactive biogenic trace gases was offset by emissions from the forest floor. Observations of biogenic hydrocarbons (BVOCs) suggested a dominant role for monoterpenes in determining the NO3 reactivity. Reactivity not accounted for by in situ measurement of NO and BVOCs was variable across the diel cycle with, on average, ≈ 30 % missing during nighttime and ≈ 60 % missing during the day. Measurement of the NO3 reactivity at various heights (8.5 to 25 m) both above and below the canopy, revealed a strong nighttime, vertical gradient with maximum values closest to the ground. The gradient disappeared during the daytime due to efficient vertical mixing.

  9. Reliability of dietary information from surrogate respondents.

    PubMed

    Hislop, T G; Coldman, A J; Zheng, Y Y; Ng, V T; Labo, T

    1992-01-01

    A self-administered food frequency questionnaire was included as part of a case-control study of breast cancer in 1980-82. In 1986-87, a second food frequency questionnaire was sent to surviving cases and husbands of deceased cases; 30 spouses (86% response rate) and 263 surviving cases (88% response rate) returned questionnaires. The dietary questions concerned consumption of specific food items by the case before diagnosis of breast cancer. Missing values were less common in the second questionnaire; there was no significant difference in missing values between surviving cases and spouses of deceased cases. Kappa statistics comparing responses in the first and second questionnaires were significantly lower for spouses of deceased cases than for surviving cases. Reported level of confidence by the husbands regarding knowledge about their wives' eating habits did not influence the kappa statistics or the frequencies of missing values. The lack of good agreement has important implications for the use of proxy interviews from husbands in retrospective dietary studies.
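
The agreement measure used above, Cohen's kappa, corrects observed agreement for the agreement expected by chance; a compact sketch with made-up food-frequency categories (the ratings below are illustrative only):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two paired categorical ratings."""
    n = len(ratings_a)
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical responses to one food item on the two questionnaires
first  = ["daily", "weekly", "never", "daily", "weekly", "daily"]
second = ["daily", "weekly", "daily", "daily", "never", "daily"]
kappa = cohens_kappa(first, second)   # modest agreement despite 4/6 matches
```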

  10. Infilling and quality checking of discharge, precipitation and temperature data using a copula based approach

    NASA Astrophysics Data System (ADS)

    Anwar, Faizan; Bárdossy, András; Seidel, Jochen

    2017-04-01

    Estimating missing values in a time series of a hydrological variable is an everyday task for a hydrologist. Existing methods such as inverse distance weighting, multivariate regression, and kriging, though simple to apply, provide no indication of the quality of the estimated value and depend mainly on the values of neighbouring stations at a given step in the time series. Copulas have the advantage of representing the pure dependence structure between two or more variables (given that the relationship between them is monotonic). They remove the need to transform the data before use or to specify functions that model the relationship between the considered variables. A copula-based approach is suggested to infill discharge, precipitation, and temperature data. As a first step the normal copula is used; subsequently, the necessity of non-normal or non-symmetrical dependence is investigated. Discharge and temperature are treated as regular continuous variables and can be used without preprocessing for infilling and quality checking. Precipitation, due to its mixed distribution, has to be treated differently: a discrete probability is assigned to the zeros and the rest is treated as a continuous distribution. Building on the work of others, the normal copula is also used, alongside infilling, to identify values in a time series that might be erroneous. This is done by treating the available value as missing, infilling it using the normal copula, and checking whether it lies within a confidence band (5 to 95% in our case) of the obtained conditional distribution. Hydrological data from two catchments, the Upper Neckar River (Germany) and the Santa River (Peru), are used to demonstrate the application on datasets of different data quality. The Python code used here is also made available on GitHub. The required input is the time series of a given variable at different stations.
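
Under a bivariate normal copula, the conditional normal score at the target station given a neighbour's score z is N(ρz, 1 − ρ²), which yields both an infill value and the 5–95% band used for flagging. A stdlib-only sketch (ρ and the toy scores below are assumptions, not values from the paper):

```python
from statistics import NormalDist

std_normal = NormalDist()

def normal_scores(values):
    """Rank-transform a sample to standard-normal scores (empirical margins)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        z[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return z

def conditional_band(z_neighbor, rho, lo=0.05, hi=0.95):
    """5-95% band of the target station's conditional normal score given a
    neighbour's score, under a bivariate normal copula with correlation rho."""
    cond = NormalDist(rho * z_neighbor, (1 - rho ** 2) ** 0.5)
    return cond.inv_cdf(lo), cond.inv_cdf(hi)

rho = 0.9            # assumed known here; estimated from the data in practice
z_nb = 1.5           # neighbour reports an unusually high value
lo, hi = conditional_band(z_nb, rho)
z_obs = -2.0         # the target's recorded score
suspect = not (lo <= z_obs <= hi)   # flagged as possibly erroneous
```

The conditional mean ρz would serve as the infill value; back-transforming through the empirical margin recovers the variable's original units.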

  11. Adaptation of clinical prediction models for application in local settings.

    PubMed

    Kappen, Teus H; Vergouwe, Yvonne; van Klei, Wilton A; van Wolfswinkel, Leo; Kalkman, Cor J; Moons, Karel G M

    2012-01-01

    When planning to use a validated prediction model in new patients, adequate performance is not guaranteed. For example, changes in clinical practice over time or a different case mix from the original validation population may result in inaccurate risk predictions. Our aim was to demonstrate how clinical information can direct the updating of a prediction model and the development of a strategy for handling missing predictor values in clinical practice. A previously derived and validated prediction model for postoperative nausea and vomiting was updated using a data set of 1847 patients. The update consisted of 1) changing the definition of an existing predictor, 2) reestimating the regression coefficient of a predictor, and 3) adding a new predictor to the model. The updated model was then validated in a new series of 3822 patients. Furthermore, several imputation models were considered for handling real-time missing values, so that possible missing predictor values could be anticipated during actual model use. Differences in clinical practice between our local population and the original derivation population guided the update strategy of the prediction model. The predictive accuracy of the updated model was better (c statistic, 0.68; calibration slope, 1.0) than that of the original model (c statistic, 0.62; calibration slope, 0.57). Inclusion of logistical variables in the imputation models, besides observed patient characteristics, contributed to a strategy for dealing with missing predictor values at the time of risk calculation. Extensive knowledge of local clinical processes provides crucial information to guide the process of adapting a prediction model to new clinical practices.
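
The simplest form of model updating, recalibrating an existing linear predictor with a new intercept and calibration slope, can be sketched on simulated data (this is generic logistic recalibration, not the authors' model; the simulated slope of 0.6 mimics the miscalibration reported above):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 2000
lp_old = rng.normal(size=n)                        # original linear predictor
p_true = 1 / (1 + np.exp(-(0.5 + 0.6 * lp_old)))   # local truth: slope < 1
y = (rng.random(n) < p_true).astype(float)

# Newton-Raphson fit of logit P(y = 1) = a + b * lp_old; b is the
# calibration slope (well below 1 signals an overfit or outdated model).
X = np.column_stack([np.ones(n), lp_old])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

intercept, slope = beta
```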

  12. Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data.

    PubMed

    Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard

    2014-04-01

    Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at the species level and on the subsequent calculation of functional diversity indices at the community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At the community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypotheses was more accurate for all traits than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities by their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with up to 30% missing data without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypotheses and adapted to the distribution of the trait values for the functional identity and range of the communities.
Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypotheses and concludes that they could be useful when studying the ranking of communities by their functional diversity indices.
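
The "functional proximity" hypothesis, imputing a species' missing trait from the most functionally similar species, can be sketched as follows (trait names and values are hypothetical):

```python
def impute_by_proximity(traits, species, missing_trait):
    """Fill `missing_trait` for `species` from its nearest neighbour in the
    space of the traits observed for both species.
    traits: {species: {trait: value or None}}."""
    target = traits[species]
    best, best_dist = None, float("inf")
    for other, vals in traits.items():
        if other == species or vals.get(missing_trait) is None:
            continue
        shared = [t for t in target
                  if t != missing_trait
                  and target[t] is not None and vals.get(t) is not None]
        if not shared:
            continue
        dist = sum((target[t] - vals[t]) ** 2 for t in shared) ** 0.5
        if dist < best_dist:
            best, best_dist = other, dist
    return None if best is None else traits[best][missing_trait]

db = {
    "sp_a": {"height": 0.30, "sla": 22.0, "seed_mass": None},
    "sp_b": {"height": 0.32, "sla": 21.0, "seed_mass": 1.4},
    "sp_c": {"height": 1.80, "sla": 8.0, "seed_mass": 55.0},
}
imputed = impute_by_proximity(db, "sp_a", "seed_mass")  # nearest is sp_b
```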

  13. Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data

    PubMed Central

    Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard

    2014-01-01

    Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at the species level and on the subsequent calculation of functional diversity indices at the community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At the community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypotheses was more accurate for all traits than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities by their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with up to 30% missing data without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypotheses and adapted to the distribution of the trait values for the functional identity and range of the communities.
Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypotheses and concludes that they could be useful when studying the ranking of communities by their functional diversity indices. PMID:24772273

  14. Dealing with missing data in remote sensing images within land and crop classification

    NASA Astrophysics Data System (ADS)

    Skakun, Sergii; Kussul, Nataliia; Basarab, Ruslan

    Optical remote sensing images from space provide valuable data for environmental monitoring, disaster management [1], agriculture mapping [2], and so forth. In many cases, a time series of satellite images is used to discriminate or estimate particular land parameters. One of the factors that influences the efficiency of satellite imagery is the presence of clouds. This leads to the occurrence of missing data that need to be addressed. Numerous approaches have been proposed to fill in missing data (or gaps) and can be categorized into inpainting-based, multispectral-based, and multitemporal-based. In [3], ancillary MODIS data are utilized for filling gaps and predicting Landsat data. In this paper we propose to use self-organizing Kohonen maps (SOMs) for missing data restoration in time series of satellite imagery. Such an approach was previously used for MODIS data [4], but applying it to finer spatial resolution data such as Sentinel-2 and Landsat-8 represents a challenge. Moreover, the data for training the SOMs are selected manually in [4], which complicates the use of the method in an automatic mode. A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a discretised representation of the input space of the training samples, called a map. The map seeks to preserve the topological properties of the input space. The reconstruction of satellite images is performed for each spectral band separately, i.e. a separate SOM is trained for each spectral band. Pixels that have no missing values in the time series are selected for training. Selecting the number of training pixels represents a trade-off: increasing the number of training samples improves the quality of restoration but increases the SOM training time. Also, training data sets should be selected automatically. As such, we propose to select training samples on a regular grid of pixels. 
Therefore, the SOM seeks to project a large number of non-missing data onto the subspace vectors of the map. Restoration of the missing values is performed in the following way. The multi-temporal pixel values (with gaps) are fed to the neural network. A neuron-winner (or best matching unit, BMU) in the SOM is selected based on a distance metric (for example, Euclidean). It should be noted that missing values are omitted from the metric when selecting the BMU. When the BMU is selected, missing values are substituted by the corresponding components of the BMU. The efficiency of the proposed approach was tested on a time series of Landsat-8 images over the JECAM test site in Ukraine and Sich-2 images over Crimea (Sich-2 is a Ukrainian remote sensing satellite acquiring images at 8 m spatial resolution). Landsat-8 images were first converted to TOA reflectance and then atmospherically corrected, so that each pixel value represents a surface reflectance in the range from 0 to 1. The error of reconstruction (error of quantization) on training data was: band-2: 0.015; band-3: 0.020; band-4: 0.026; band-5: 0.070; band-6: 0.060; band-7: 0.055. The reconstructed images were also used for crop classification using a multi-layer perceptron (MLP). Overall accuracy was 85.98% and Cohen's kappa was 0.83. References. 1. Skakun, S., Kussul, N., Shelestov, A. and Kussul, O. "Flood Hazard and Flood Risk Assessment Using a Time Series of Satellite Images: A Case Study in Namibia," Risk Analysis, 2013, doi: 10.1111/risa.12156. 2. Gallego, F.J., Kussul, N., Skakun, S., Kravchenko, O., Shelestov, A., Kussul, O. "Efficiency assessment of using satellite data for crop area estimation in Ukraine," International Journal of Applied Earth Observation and Geoinformation, vol. 29, pp. 22-30, 2014. 3. Roy, D.P., Ju, J., Lewis, P., Schaaf, C., Gao, F., Hansen, M., and Lindquist, E., "Multi-temporal MODIS-Landsat data fusion for relative radiometric normalization, gap filling, and prediction of Landsat data," Remote Sensing of Environment, 112(6), pp. 3112-3130, 2008. 4. Latif, B.A., and Mercier, G., "Self-Organizing maps for processing of data with missing values and outliers: application to remote sensing images," Self-Organizing Maps, InTech, pp. 189-210, 2010.
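
The BMU substitution step described above, with missing components omitted from the distance, can be sketched as follows (the codebook values are hypothetical, not trained on real imagery; a trained SOM is simply a codebook of prototype vectors):

```python
import numpy as np

def fill_from_bmu(codebook, pixel):
    """Fill NaN gaps in a multi-temporal pixel vector from its best matching
    unit (BMU), with missing components omitted from the distance."""
    mask = ~np.isnan(pixel)
    dist = np.sqrt(((codebook[:, mask] - pixel[mask]) ** 2).sum(axis=1))
    bmu = codebook[np.argmin(dist)]
    return np.where(mask, pixel, bmu)

# Hypothetical codebook of three prototype reflectance time series (4 dates)
codebook = np.array([[0.1, 0.2, 0.3, 0.2],
                     [0.5, 0.5, 0.6, 0.5],
                     [0.9, 0.8, 0.9, 0.8]])
pixel = np.array([0.48, np.nan, 0.62, np.nan])   # two cloudy dates
restored = fill_from_bmu(codebook, pixel)        # gaps taken from the BMU
```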

  15. Cambro-ordovician sea-level fluctuations and sequence boundaries: The missing record and the evolution of new taxa

    USGS Publications Warehouse

    Lehnert, O.; Miller, J.F.; Leslie, Stephen A.; Repetski, J.E.; Ethington, Raymond L.

    2005-01-01

    The evolution of early Palaeozoic conodont faunas shows a clear connection to sea-level changes. One way that this connection manifests itself is that thick successions of carbonates are missing beneath major sequence boundaries due to karstification and erosion. From this observation arises the question of how many taxa have been lost from different conodont lineages in these incomplete successions. Although many taxa suffered extinction due to the environmental stresses associated with falling sea levels, some must have survived in these extreme conditions. The number of taxa missing in the early Palaeozoic tropics will always be unclear, but it will be even more difficult to evaluate the missing record in detrital successions of higher latitudes. A common pattern in the evolution of Cambrian-Ordovician conodont lineages is the appearance of new species at sea-level rises and their disappearance at sea-level drops. This simple picture can be complicated by intervals that consistently have no representatives of a particular lineage, even after extensive sampling of the most complete sections. Presumably the lineages survived in undocumented refugia. In this paper, we give examples of evolution in Cambrian-Ordovician shallow-marine conodont faunas and highlight problems of undiscovered or truly missing segments of lineages. © The Palaeontological Association.

  16. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    NASA Astrophysics Data System (ADS)

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods for completing the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria, including accuracy, robustness, precision, and efficiency, for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas the multilayer perceptron type neural network and the multiple imputation strategy based on Markov chain Monte Carlo with expectation-maximization (EM-MCMC) are computationally intensive. In addition, we propose a modification of the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performance. Based on detailed graphical and quantitative analysis, it can be said that although the computational methods, particularly EM-MCMC, are computationally expensive, they appear preferable for imputing meteorological time series across the different missingness periods, with respect to both measures and both series studied. To conclude, using the EM-MCMC algorithm to impute missing values before conducting statistical analyses of meteorological data will decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be recommended for evaluating the performance of missing data imputation, particularly with computational methods, since it gives more precise results in meteorological time series.
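
Of the simple methods compared above, the normal ratio (NR) method scales each neighbour's observation by the ratio of long-term station normals before averaging; a minimal sketch with hypothetical precipitation normals:

```python
def normal_ratio(target_normal, neighbor_values, neighbor_normals):
    """NR estimate: average of neighbour observations, each scaled by the
    ratio of the target's long-term normal to the neighbour's."""
    return sum(target_normal / nn * x
               for x, nn in zip(neighbor_values, neighbor_normals)) / len(neighbor_values)

# Hypothetical monthly precipitation normals (mm) and one month's observations
estimate = normal_ratio(
    target_normal=60.0,
    neighbor_values=[45.0, 80.0, 52.0],
    neighbor_normals=[50.0, 75.0, 55.0],
)
```

The correlation-weighted NR variant mentioned above replaces the equal weights with weights proportional to each neighbour's correlation with the target station.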

  17. Hydrological and glaciological balances on Antizana Volcano, Ecuador

    NASA Astrophysics Data System (ADS)

    Favier, V.; Cadier, E.; Coudrain, A.; Francou, B.; Maisincho, L.; Praderio, E.; Villacis, M.; Wagnon, P.

    2006-12-01

    Water supply for Quito, the capital of Ecuador, is partly fed by the water collected at the piedmont of the ice-covered Antizana stratovolcano. In order to assess the contribution of glaciers to the local water resources, hydrological and glaciological datasets collected over the 1995-2005 period on the Antizana Glacier 15 watershed were compared. Over the study period, Antizana Glacier 15 retreated quickly, providing an important water contribution to lower-altitude discharges. However, comparison of the hydrological and glaciological balances revealed substantial missing runoff due to underground circulation. Subsuperficial circulation was initially suspected because of the total disappearance of surface streams at the level of the frontal moraine, a surface stream being observed again downstream of the moraine. Brine injections were performed upstream of the moraine and in a small lake located on the moraine, and the restitution rates of salt were computed. The tracer experiments demonstrated a complete restitution of discharges, implying that the missing runoff was not involved in subsuperficial circulation but in deeper circulation that may have flowed through the fractured rock of the stratovolcano. The experiments also demonstrated that infiltration occurred directly at the bedrock of the glaciers. Relying only on the weak discharges observed at the glacier front would therefore lead to a strongly underestimated value of the actual water contribution from glaciers to lower-altitude discharges. Finally, assessing the water contribution from the glaciers of Ecuador requires a comparison of glaciological and hydrological data.

  18. Relatively high prevalence of pox-like lesions in Henslow's Sparrow (Ammodramus henslowii) among nine species of migratory grassland passerines in Wisconsin, USA

    USGS Publications Warehouse

    Ellison, Kevin S.; Hofmeister, Erik K.; Ribic, Christine A.; Sample, David W.

    2014-01-01

    Globally, Avipoxvirus species affect over 230 species of wild birds and can significantly impair survival. During banding of nine grassland songbird species (n = 346 individuals) in southwestern Wisconsin, USA, we noted species with a 2–6% prevalence of pox-like lesions (possible evidence of current infection) and 4–10% missing digits (potential evidence of past infection). These prevalences approach those recorded among island endemic birds (4–9% and 9–20% for the Galapagos and Hawaii, respectively) for which Avipoxvirus species have been implicated as contributing to dramatic population declines. Henslow's Sparrow Ammodramus henslowii (n = 165 individuals) had the highest prevalence of lesions (6.1%) and missing digits (9.7%). Among a subset of 26 Henslow's Sparrows from which blood samples were obtained, none had detectable antibody reactive to fowlpox virus antigen. However, four samples (18%) had antibody to canarypox virus antigen with test sample and negative control ratios (P/N values) ranging from 2.4 to 6.5 (median 4.3). Of four antibody-positive birds, two had lesions recorded (one was also missing a digit), one had digits missing, and one had no signs. Additionally, the birds with lesions or missing digits had higher P/N values than did the antibody-positive bird without missing digits or recorded lesions. This study represents an impetus for considering the impacts and dynamics of disease caused by Avipoxvirus among North American grassland bird species.

  19. Relatively high prevalence of pox-like lesions in Henslow's sparrow (Ammodrammus henslowii) among nine species of migratory grassland passerines in Wisconsin, USA.

    PubMed

    Ellison, Kevin S; Hofmeister, Erik K; Ribic, Christine A; Sample, David W

    2014-10-01

    Globally, Avipoxvirus species affect over 230 species of wild birds and can significantly impair survival. During banding of nine grassland songbird species (n=346 individuals) in southwestern Wisconsin, USA, we noted species with a 2-6% prevalence of pox-like lesions (possible evidence of current infection) and 4-10% missing digits (potential evidence of past infection). These prevalences approach those recorded among island endemic birds (4-9% and 9-20% for the Galapagos and Hawaii, respectively) for which Avipoxvirus species have been implicated as contributing to dramatic population declines. Henslow's Sparrow Ammodramus henslowii (n=165 individuals) had the highest prevalence of lesions (6.1%) and missing digits (9.7%). Among a subset of 26 Henslow's Sparrows from which blood samples were obtained, none had detectable antibody reactive to fowlpox virus antigen. However, four samples (18%) had antibody to canarypox virus antigen with test sample and negative control ratios (P/N values) ranging from 2.4 to 6.5 (median 4.3). Of four antibody-positive birds, two had lesions recorded (one was also missing a digit), one had digits missing, and one had no signs. Additionally, the birds with lesions or missing digits had higher P/N values than did the antibody-positive bird without missing digits or recorded lesions. This study represents an impetus for considering the impacts and dynamics of disease caused by Avipoxvirus among North American grassland bird species.

  20. The Effects and Costs of Intimate Partner Violence for Work Organizations

    ERIC Educational Resources Information Center

    Reeves, Carol; OLeary-Kelly, Anne M.

    2007-01-01

    This study examines the productivity-related effects and costs of intimate partner violence (IPV) on the workplace. Specifically, it explores whether IPV victims and nonvictims differ in the number of work hours missed due to absenteeism, tardiness, and work distraction and the costs for employers from these missed work hours. The research…

  1. Some Activities of MISSE 6 Mission

    NASA Technical Reports Server (NTRS)

    Prasad, Narasimha S.

    2009-01-01

    The objective of the Materials International Space Station Experiment (MISSE) is to study the performance of novel materials when subjected to the synergistic effects of the harsh space environment for several months. In this paper, a few laser and optical elements from NASA Langley Research Center (LaRC) that have been flown on the MISSE 6 mission will be discussed. These items were characterized and packed inside a ruggedized Passive Experiment Container (PEC) that resembles a suitcase. The PEC was tested for survivability under launch conditions. Subsequently, the MISSE 6 PEC was transported to the International Space Station (ISS) by the STS-123 mission on March 11, 2008. The astronauts successfully attached the PEC to external handrails and opened it for long-term exposure to the space environment. The plan is to retrieve the MISSE 6 PEC on the STS-128 mission in August 2009.

  2. MISSE 6-Testing Materials in Space

    NASA Technical Reports Server (NTRS)

    Prasad, Narasimha S; Kinard, William H.

    2008-01-01

    The objective of the Materials International Space Station Experiment (MISSE) is to study the performance of novel materials when subjected to the synergistic effects of the harsh space environment by placing them in space for several months. In this paper, a few materials and components from NASA Langley Research Center (LaRC) that have been flown on the MISSE 6 mission will be discussed. These include laser and optical elements for photonic devices. The pre-characterized MISSE 6 materials were packed inside a ruggedized Passive Experiment Container (PEC) that resembles a suitcase. The PEC was tested for survivability under launch conditions. Subsequently, the MISSE 6 PEC was transported to the International Space Station (ISS) by the STS-123 mission on March 11, 2008. The astronauts successfully attached the PEC to external handrails and opened it for long-term exposure to the space environment.

  3. The Missing Link: Teaching the Dispositions to Lead

    ERIC Educational Resources Information Center

    Allen, James G.; Wasicsko, M. Mark; Chirichello, Michael

    2014-01-01

    In this article the authors contend that the element that is typically missing or underdeveloped in the education and development of most leaders is the intentional integration of the research and practices for assessing and developing the deeply held core beliefs, attitudes, and values (what we will call leadership dispositions) that play a…

  4. Comparison of Modern Methods for Analyzing Repeated Measures Data with Missing Values

    ERIC Educational Resources Information Center

    Vallejo, G.; Fernandez, M. P.; Livacic-Rojas, P. E.; Tuero-Herrero, E.

    2011-01-01

    Missing data are a pervasive problem in many psychological applications in the real world. In this article we study the impact of dropout on the operational characteristics of several approaches that can be easily implemented with commercially available software. These approaches include the covariance pattern model based on an unstructured…

  5. A Comparison of Joint Model and Fully Conditional Specification Imputation for Multilevel Missing Data

    ERIC Educational Resources Information Center

    Mistler, Stephen A.; Enders, Craig K.

    2017-01-01

    Multiple imputation methods can generally be divided into two broad frameworks: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution, whereas FCS imputes variables one at a time from a series of univariate conditional…

  6. Variable Selection in the Presence of Missing Data: Imputation-based Methods.

    PubMed

    Zhao, Yize; Long, Qi

    2017-01-01

    Variable selection plays an essential role in regression analysis, as it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of the resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and for the statistical techniques used to handle missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains underdeveloped and offers fertile ground for further research.
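    The first strategy above — select variables within each imputed dataset, then pool the selections — can be sketched in a few lines. This is a minimal pure-Python illustration in which a toy correlation-threshold selector stands in for a real method such as the lasso; the datasets, threshold, and majority-vote rule are assumptions for demonstration only:

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def select_vars(X, y, threshold=0.3):
    """Toy selector: keep column indices whose |correlation| with y exceeds threshold."""
    return {j for j in range(len(X[0])) if abs(corr([row[j] for row in X], y)) > threshold}

def combine_selections(imputed_datasets, y, vote_frac=0.5):
    """Strategy 1: select within each imputed dataset, then keep variables
    chosen in more than vote_frac of the imputations."""
    m = len(imputed_datasets)
    votes = {}
    for X in imputed_datasets:
        for j in select_vars(X, y):
            votes[j] = votes.get(j, 0) + 1
    return {j for j, v in votes.items() if v / m > vote_frac}

# Toy example: column 0 drives y, column 1 is noise; the three "imputed"
# copies share the observed column and differ in the noise column.
random.seed(1)
n = 200
x0 = [random.gauss(0, 1) for _ in range(n)]
y = [a + random.gauss(0, 0.3) for a in x0]
datasets = []
for _ in range(3):
    x1 = [random.gauss(0, 1) for _ in range(n)]
    datasets.append([[a, b] for a, b in zip(x0, x1)])
print(combine_selections(datasets, y))  # selects only the true predictor, x0
```

    The same voting scheme works with any per-dataset selector; only `select_vars` would change.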

  7. Inverse-Probability-Weighted Estimation for Monotone and Nonmonotone Missing Data.

    PubMed

    Sun, BaoLuo; Perkins, Neil J; Cole, Stephen R; Harel, Ofer; Mitchell, Emily M; Schisterman, Enrique F; Tchetgen Tchetgen, Eric J

    2018-03-01

    Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568-575).
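    The IPW complete-case idea — reweight each observed case by the inverse of its probability of being observed — can be illustrated with a toy missing-at-random simulation. The missingness model below is assumed known for clarity; in practice it would be estimated (e.g., by logistic regression), and all numbers are illustrative rather than from the study:

```python
import random

random.seed(7)

def p_observe(x):
    """Probability that y is observed given x (assumed known in this sketch)."""
    return 0.9 if x > 0 else 0.4

# Simulate a missing-at-random outcome: y depends on x, and so does missingness.
n = 50_000
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = 2.0 * x + random.gauss(0, 1)
    observed = random.random() < p_observe(x)
    data.append((x, y if observed else None))

complete = [(x, y) for x, y in data if y is not None]

# Unweighted complete-case mean is biased: high-x (hence high-y) cases
# are over-represented among the observed rows.
cc_mean = sum(y for _, y in complete) / len(complete)

# IPW: weight each complete case by 1 / P(observed | x).
weights = [1.0 / p_observe(x) for x, _ in complete]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, complete)) / sum(weights)

print(round(cc_mean, 2), round(ipw_mean, 2))  # true mean of y is 0
```

    The IPW estimate recovers the population mean because each observed case stands in for the 1/P(observed) cases like it, some of which went unobserved.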

  8. Missing data may lead to changes in hip fracture database studies: a study of the American College of Surgeons National Surgical Quality Improvement Program.

    PubMed

    Basques, B A; McLynn, R P; Lukasiewicz, A M; Samuel, A M; Bohl, D D; Grauer, J N

    2018-02-01

    The aims of this study were to characterize the frequency of missing data in the National Surgical Quality Improvement Program (NSQIP) database and to determine how missing data can influence the results of studies dealing with elderly patients with a fracture of the hip. Patients who underwent surgery for a fracture of the hip between 2005 and 2013 were identified from the NSQIP database and the percentage of missing data was noted for demographics, comorbidities and laboratory values. These variables were tested for association with 'any adverse event' using multivariate regressions based on common ways of handling missing data. A total of 26 066 patients were identified. The rate of missing data was up to 77.9% for many variables. Multivariate regressions comparing three methods of handling missing data found different risk factors for postoperative adverse events. Only seven of 35 identified risk factors (20%) were common to all three analyses. Missing data is an important issue in national database studies that researchers must consider when evaluating such investigations. Cite this article: Bone Joint J 2018;100-B:226-32. ©2018 The British Editorial Society of Bone & Joint Surgery.

  9. Inverse-Probability-Weighted Estimation for Monotone and Nonmonotone Missing Data

    PubMed Central

    Sun, BaoLuo; Perkins, Neil J; Cole, Stephen R; Harel, Ofer; Mitchell, Emily M; Schisterman, Enrique F; Tchetgen Tchetgen, Eric J

    2018-01-01

    Abstract Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568–575). PMID:29165557

  10. Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

    PubMed

    Resche-Rigon, Matthieu; White, Ian R

    2018-06-01

    In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.
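    A minimal single-level sketch of the chained-equations (MICE) cycle for two incomplete variables, assuming simple linear models and deterministic fitted-value fills — real MICE draws imputations from predictive distributions, and the multilevel version described above adds cluster means and heteroscedastic errors:

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
    return my - b * mx, b

def mice_two_vars(x, y, n_iter=10):
    """Chained-equations cycle for two lists containing None:
    start from mean fills, then repeatedly regress each variable on the
    other (over rows where it was observed) and refill its missing slots
    with fitted values."""
    xf = [v if v is not None else statistics.fmean([u for u in x if u is not None]) for v in x]
    yf = [v if v is not None else statistics.fmean([u for u in y if u is not None]) for v in y]
    for _ in range(n_iter):
        # Impute x from y, fitting only on rows where x was observed.
        a, b = fit_line([yf[i] for i in range(len(x)) if x[i] is not None],
                        [x[i] for i in range(len(x)) if x[i] is not None])
        xf = [x[i] if x[i] is not None else a + b * yf[i] for i in range(len(x))]
        # Impute y from x, symmetrically.
        a, b = fit_line([xf[i] for i in range(len(y)) if y[i] is not None],
                        [y[i] for i in range(len(y)) if y[i] is not None])
        yf = [y[i] if y[i] is not None else a + b * xf[i] for i in range(len(y))]
    return xf, yf

# Toy data: y = 1 + 2x plus noise, with ~20% of each variable knocked out.
random.seed(3)
x_true = [random.gauss(0, 1) for _ in range(300)]
y_true = [1.0 + 2.0 * v + random.gauss(0, 0.2) for v in x_true]
x_obs = [v if random.random() < 0.8 else None for v in x_true]
y_obs = [v if random.random() < 0.8 else None for v in y_true]
xf, yf = mice_two_vars(x_obs, y_obs)
print(fit_line(xf, yf)[1])  # slope on the completed data stays close to 2
```

    Each pass conditions on the other variable's current fills, which is exactly the "chained" structure that the multilevel algorithm extends with random cluster effects.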

  11. Analyzing time-ordered event data with missed observations.

    PubMed

    Dokter, Adriaan M; van Loon, E Emiel; Fokkema, Wimke; Lameris, Thomas K; Nolet, Bart A; van der Jeugd, Henk P

    2017-09-01

    A common problem with observational datasets is that not all events of interest may be detected. For example, observing animals in the wild can be difficult when animals move, hide, or cannot be closely approached. We consider time series of events recorded in conditions where events are occasionally missed by observers or observational devices. These time series are not restricted to behavioral protocols, but can be any cyclic or recurring process where discrete outcomes are observed. Undetected events cause biased inferences on the process of interest, and statistical analyses are needed that can identify and correct for compromised detection processes. Missed observations in a time series lead to observed intervals between events at multiples of the true inter-event time, which conveys information on the detection probability. We derive the theoretical probability density function for observed intervals between events that includes a probability of missed detection. Methodology and software tools are provided for the analysis of event data with potential observation bias and its removal. The methodology was applied to simulation data and to a case study of defecation rate estimation in geese, which is commonly used to estimate their digestive throughput and energetic uptake, or to calculate goose usage of a feeding site from dropping density. Simulations indicate that at a moderate chance of missing arrival events (p = 0.3), uncorrected arrival intervals were biased upward by up to a factor of 3, while parameter values corrected for missed observations were within 1% of their true simulated values. A field case study shows that not accounting for missed observations leads to substantial underestimates of the true defecation rate in geese, and to spurious rate differences between sites introduced by differences in observational conditions. These results show that the derived methodology can be used to effectively remove observational biases in time-ordered event data.
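    The key observation — that a missed detection turns the observed interval between events into a multiple k of the true interval, with k geometrically distributed in the detection probability — can be demonstrated with a toy simulation. Regular (jitter-free) events and a crude smallest-interval estimate of the true interval are assumptions made for illustration; the paper instead derives a full probability density for the observed intervals:

```python
import random
import statistics

random.seed(11)

T = 5.0          # true inter-event time
p_detect = 0.7   # chance that any given event is actually observed

# Regular events, each independently detected with probability p_detect.
times = [i * T for i in range(2000)]
seen = [t for t in times if random.random() < p_detect]
intervals = [b - a for a, b in zip(seen, seen[1:])]

# Naive estimate: the mean observed interval, biased upward by ~1/p.
naive_T = statistics.fmean(intervals)

# Correction: observed intervals cluster at multiples k*T with k geometric.
# Use the smallest observed interval as a first guess of T, assign each
# interval its multiple k, then re-estimate T and the detection probability.
T0 = min(intervals)
ks = [round(d / T0) for d in intervals]
corrected_T = statistics.fmean(d / k for d, k in zip(intervals, ks))
p_hat = 1.0 / statistics.fmean(ks)

print(round(naive_T, 2), round(corrected_T, 2), round(p_hat, 2))
```

    With p = 0.7 the naive mean interval lands near T/p ≈ 7.1, while the multiple-corrected estimate recovers T = 5 and p ≈ 0.7, mirroring the upward bias and its removal reported above.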

  12. The MISSE-9 Polymers and Composites Experiment Being Flown on the MISSE-Flight Facility

    NASA Technical Reports Server (NTRS)

    De Groh, Kim K.; Banks, Bruce A.

    2017-01-01

    Materials on the exterior of spacecraft in low Earth orbit (LEO) are subject to extremely harsh environmental conditions, including various forms of radiation (cosmic rays, ultraviolet, x-ray, and charged particle radiation), micrometeoroids and orbital debris, temperature extremes, thermal cycling, and atomic oxygen (AO). These environmental exposures can result in erosion, embrittlement and optical property degradation of susceptible materials, threatening spacecraft performance and durability. To increase our understanding of space environmental effects such as AO erosion and radiation induced embrittlement of spacecraft materials, NASA Glenn has developed a series of experiments flown as part of the Materials International Space Station Experiment (MISSE) missions on the exterior of the International Space Station (ISS). These experiments have provided critical LEO space environment durability data such as AO erosion yield values for many materials and mechanical property changes after long term space exposure. In continuing these studies, a new Glenn experiment has been proposed, and accepted, for flight on the new MISSE-Flight Facility (MISSE-FF). This experiment, called the Polymers and Composites Experiment, will be flown as part of the MISSE-9 mission, the inaugural mission of MISSE-FF. Figure 1 provides an artist's rendition of the MISSE-FF ISS external platform. The MISSE-FF is manifested for launch on SpaceX-13.

  13. Identifying Heat Waves in Florida: Considerations of Missing Weather Data

    PubMed Central

    Leary, Emily; Young, Linda J.; DuClos, Chris; Jordan, Melissa M.

    2015-01-01

    Background: Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information. Objectives: To identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida. Methods: In addition to ignoring missing data, temporal, spatial, and spatio-temporal models are described and utilized to impute missing historical weather data from 1973 to 2012 from 43 Florida weather monitors. Calculated thresholds are used to define periods of extreme heat across Florida. Results: Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September). Conclusions: Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised. PMID:26619198
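    As a toy illustration of the threshold idea (not the paper's actual definition), one might flag extreme-heat periods as runs of consecutive days above a station's 95th-percentile daily maximum. The percentile rule, minimum run length, and simulated temperature series below are all assumptions for demonstration:

```python
import random

def percentile(values, q):
    """Nearest-rank percentile, q in [0, 100]."""
    s = sorted(values)
    idx = max(0, min(len(s) - 1, round(q / 100 * len(s)) - 1))
    return s[idx]

def heat_waves(temps, threshold, min_run=2):
    """(start, length) of every run of >= min_run consecutive days above threshold."""
    waves, start = [], None
    for i, t in enumerate(temps + [float("-inf")]):  # sentinel closes the last run
        if t > threshold and start is None:
            start = i
        elif t <= threshold and start is not None:
            if i - start >= min_run:
                waves.append((start, i - start))
            start = None
    return waves

# Toy warm-season series of daily maxima (deg C) with an injected 4-day spike.
random.seed(5)
temps = [random.gauss(33, 1.5) for _ in range(180)]
for d in range(90, 94):
    temps[d] += 9.0
thr = percentile(temps, 95)
waves = heat_waves(temps, thr)
print(waves)
```

    Note how the threshold itself is computed from the data: imputing (or ignoring) missing days would shift `thr` and hence which runs qualify, which is exactly the sensitivity the study examines.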

  14. Identifying Heat Waves in Florida: Considerations of Missing Weather Data.

    PubMed

    Leary, Emily; Young, Linda J; DuClos, Chris; Jordan, Melissa M

    2015-01-01

    Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information. The objective of this study was to identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida. In addition to ignoring missing data, temporal, spatial, and spatio-temporal models are described and utilized to impute missing historical weather data from 1973 to 2012 from 43 Florida weather monitors. Calculated thresholds are used to define periods of extreme heat across Florida. Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September). Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised.

  15. Missing data imputation of solar radiation data under different atmospheric conditions.

    PubMed

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-10-29

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions was 13.37% for the MICE algorithm, compared with 28.19% for MLR and 31.68% for IDW.
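    The IDW baseline mentioned above estimates a missing sensor reading as an inverse-distance-weighted average of the other stations' simultaneous readings. A minimal sketch with hypothetical station coordinates and values (the power-2 weighting is the common default, not necessarily the study's setting):

```python
def idw_impute(stations, target, power=2):
    """Estimate the value at `target` (x, y) as the inverse-distance-weighted
    mean of (x, y, value) neighbour readings."""
    num = den = 0.0
    for x, y, v in stations:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return v  # a colocated station gives the value directly
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den

# Four surrounding stations; the failed sensor sits at the origin.
neighbours = [(1, 0, 10.0), (-1, 0, 14.0), (0, 1, 12.0), (0, 2, 20.0)]
print(idw_impute(neighbours, (0, 0)))  # -> ~12.62; the far station barely counts
```

    MICE replaces this purely geometric weighting with regression models fitted across the network, which is why it can exploit inter-sensor correlations that IDW ignores.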

  16. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    PubMed Central

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions was 13.37% for the MICE algorithm, compared with 28.19% for MLR and 31.68% for IDW. PMID:25356644

  17. A measurement-based study of concurrency in a multiprocessor

    NASA Technical Reports Server (NTRS)

    Mcguire, Patrick John

    1987-01-01

    A systematic measurement-based methodology for characterizing the amount of concurrency present in a workload, and the effect of concurrency on system performance indices such as cache miss rate and bus activity, is developed. Hardware and software instrumentation of an Alliant FX/8 was used to obtain data from a real workload environment. Results show that 35% of the workload is concurrent, with the concurrent periods typically using all available processors. Measurements of periods of change in concurrency show uneven usage of processors during these times. Other system measures, including cache miss rate and processor bus activity, are analyzed with respect to the concurrency measures. The probability of a cache miss is seen to increase with concurrency. The change in cache miss rate is much more sensitive to the fraction of concurrent code in the workload than to the number of processors active during concurrency. Regression models are developed to quantify the relationships between cache miss rate, bus activity, and the concurrency measures. The model for cache miss rate predicts an increase in the median miss rate value of as much as 300% for a 100% increase in concurrency in the workload.

  18. Missing Data in Clinical Studies: Issues and Methods

    PubMed Central

    Ibrahim, Joseph G.; Chu, Haitao; Chen, Ming-Hui

    2012-01-01

    Missing data are a prevailing problem in any type of data analyses. A participant variable is considered missing if the value of the variable (outcome or covariate) for the participant is not observed. In this article, various issues in analyzing studies with missing data are discussed. Particularly, we focus on missing response and/or covariate data for studies with discrete, continuous, or time-to-event end points in which generalized linear models, models for longitudinal data such as generalized linear mixed effects models, or Cox regression models are used. We discuss various classifications of missing data that may arise in a study and demonstrate in several situations that the commonly used method of throwing out all participants with any missing data may lead to incorrect results and conclusions. The methods described are applied to data from an Eastern Cooperative Oncology Group phase II clinical trial of liver cancer and a phase III clinical trial of advanced non–small-cell lung cancer. Although the main area of application discussed here is cancer, the issues and methods we discuss apply to any type of study. PMID:22649133

  19. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  20. Preliminary Climate Uncertainty Quantification Study on Model-Observation Test Beds at Earth Systems Grid Federation Repository

    NASA Astrophysics Data System (ADS)

    Lin, G.; Stephan, E.; Elsethagen, T.; Meng, D.; Riihimaki, L. D.; McFarlane, S. A.

    2012-12-01

    Uncertainty quantification (UQ) is the science of quantitative characterization and reduction of uncertainties in applications. It determines how likely certain outcomes are if some aspects of the system are not exactly known. UQ studies of datasets such as atmospheric observations have greatly increased in size and complexity because they now comprise additional complex iterative steps, involve numerous simulation runs, and can include additional analytical products such as charts, reports, and visualizations to explain levels of uncertainty. These new requirements greatly expand the need for metadata support beyond the NetCDF convention and vocabulary, and as a result an additional formal data provenance ontology is required to provide a historical explanation of the origin of the dataset, including references between the explanations and components within the dataset. This work shares a climate observation data UQ science use case and illustrates how to reduce climate observation data uncertainty and use a linked science application called Provenance Environment (ProvEn) to enable and facilitate scientific teams to publish, share, link, and discover knowledge about UQ research results. UQ results include terascale datasets that are published to an Earth Systems Grid Federation (ESGF) repository. Uncertainty exists in observation datasets due to sensor data processing (such as time averaging), sensor failure in extreme weather conditions, sensor manufacturing error, etc. To reduce the uncertainty in the observation datasets, a method based on Principal Component Analysis (PCA) was proposed to recover the missing values in observation data. Several large principal components (PCs) of the data with missing values are computed from the available values using an iterative method. The computed PCs can approximate the true PCs with high accuracy provided a condition on the missing values is met; the iterative method greatly improves the computational efficiency of computing the PCs. Moreover, noise removal is performed at the same time as the missing values are computed, by using only the several largest PCs. The uncertainty quantification is done through statistical analysis of the distribution of the different PCs. To record the above UQ process, and to provide an explanation of the uncertainty before and after the UQ process on the observation datasets, an additional data provenance ontology, such as ProvEn, is necessary. In this study, we demonstrate how to reduce observation data uncertainty on climate model-observation test beds and how to use ProvEn to record the UQ process on ESGF. ProvEn demonstrates how a scientific team conducting UQ studies can discover dataset links using its domain knowledgebase, allowing them to better understand and convey the UQ study research objectives, the experimental protocol used, the resulting dataset lineage, related analytical findings, and ancillary literature citations, along with the social network of scientists associated with the study. Climate scientists will not only benefit from understanding a particular dataset within a knowledge context, but will also benefit from the cross-referencing of knowledge among the numerous UQ studies stored in ESGF.
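    The PCA-based recovery described above can be sketched as an iterative "fill, fit leading components, refill" loop. Below is a pure-Python rank-1 version using power iteration; the original work uses several PCs and its own efficiency-oriented iterative solver, and the toy rank-1 data here are an assumption for illustration:

```python
import random

def pca_impute(X, mask, n_iter=30, power_steps=50):
    """Rank-1 iterative PCA imputation sketch.
    X: list of rows; mask[i][j] is True where the value is observed.
    Missing cells start at column means, then are repeatedly replaced
    by their reconstruction from the leading principal component."""
    n, m = len(X), len(X[0])
    col_mean = [sum(X[i][j] for i in range(n) if mask[i][j]) /
                sum(1 for i in range(n) if mask[i][j]) for j in range(m)]
    Z = [[X[i][j] if mask[i][j] else col_mean[j] for j in range(m)] for i in range(n)]
    for _ in range(n_iter):
        mu = [sum(row[j] for row in Z) / n for j in range(m)]
        A = [[Z[i][j] - mu[j] for j in range(m)] for i in range(n)]
        # Power iteration on A^T A for the leading right singular vector.
        v = [1.0] * m
        for _ in range(power_steps):
            s = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]   # A v
            v = [sum(A[i][j] * s[i] for i in range(n)) for j in range(m)]   # A^T s
            norm = sum(c * c for c in v) ** 0.5
            v = [c / norm for c in v]
        s = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Refill only the missing cells from the rank-1 reconstruction.
        for i in range(n):
            for j in range(m):
                if not mask[i][j]:
                    Z[i][j] = mu[j] + s[i] * v[j]
    return Z

# Toy rank-1 data with noise; knock out ~20% of the cells and recover them.
random.seed(2)
u = [random.gauss(0, 1) for _ in range(40)]
w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
truth = [[ui * wj for wj in w] for ui in u]
noisy = [[t + random.gauss(0, 0.05) for t in row] for row in truth]
mask = [[random.random() < 0.8 for _ in w] for _ in u]
Z = pca_impute(noisy, mask)
miss = [(i, j) for i in range(40) for j in range(6) if not mask[i][j]]
rmse = (sum((Z[i][j] - truth[i][j]) ** 2 for i, j in miss) / len(miss)) ** 0.5
print(round(rmse, 3))
```

    Because only the leading components are used for the refill, high-frequency noise is smoothed away at the same time as the gaps are filled, which is the denoising side effect noted above.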

  1. Structural performance of space station trusses with missing members

    NASA Technical Reports Server (NTRS)

    Dorsey, J. T.

    1986-01-01

    The structural performance of orthogonal tetrahedral and Warren-type full truss beams and platforms is compared. In addition, degradation of truss structural performance is determined for beams, platforms and a space station when individual struts are removed from the trusses. The truss beam, space station, and truss platform analytical models used in the studies are described. Stiffness degradation of the trusses due to single strut failures is determined using flexible body vibration modes. Ease of strut replacement is assessed by removing a strut and examining the truss deflection at the resulting gap due to applied forces. Finally, the reduction in truss beam strength due to a missing longeron is determined for a space station transverse boom model.

  2. Bayesian analysis of longitudinal dyadic data with informative missing data using a dyadic shared-parameter model.

    PubMed

    Ahn, Jaeil; Morita, Satoshi; Wang, Wenyi; Yuan, Ying

    2017-01-01

    Analyzing longitudinal dyadic data is a challenging task due to the complicated correlations from repeated measurements and within-dyad interdependence, as well as potentially informative (or non-ignorable) missing data. We propose a dyadic shared-parameter model to analyze longitudinal dyadic data with ordinal outcomes and informative intermittent missing data and dropouts. We model the longitudinal measurement process using a proportional odds model, which accommodates the within-dyad interdependence using the concept of the actor-partner interdependence effects, as well as dyad-specific random effects. We model informative dropouts and intermittent missing data using a transition model, which shares the same set of random effects as the longitudinal measurement model. We evaluate the performance of the proposed method through extensive simulation studies. As our approach relies on some untestable assumptions on the missing data mechanism, we perform sensitivity analyses to evaluate how the analysis results change when the missing data mechanism is misspecified. We demonstrate our method using a longitudinal dyadic study of metastatic breast cancer.

  3. The HCUP SID Imputation Project: Improving Statistical Inferences for Health Disparities Research by Imputing Missing Race Data.

    PubMed

    Ma, Yan; Zhang, Wei; Lyman, Stephen; Huang, Yihe

    2018-06-01

    To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. HCUP SID. A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conditional MI substantially improved statistical inferences for racial health disparities research with the SID. © Health Research and Educational Trust.

  4. Under conditions of large geometric miss, tumor control probability can be higher for static gantry intensity-modulated radiation therapy compared to volume-modulated arc therapy for prostate cancer.

    PubMed

    Balderson, Michael; Brown, Derek; Johnson, Patricia; Kirkby, Charles

    2016-01-01

    The purpose of this work was to compare static gantry intensity-modulated radiation therapy (IMRT) with volume-modulated arc therapy (VMAT) in terms of tumor control probability (TCP) under scenarios involving large geometric misses, i.e., those beyond what are accounted for when margin expansion is determined. Using a planning approach typical for these treatments, a linear-quadratic-based model for TCP was used to compare mean TCP values for a population of patients who experiences a geometric miss (i.e., systematic and random shifts of the clinical target volume within the planning target dose distribution). A Monte Carlo approach was used to account for the different biological sensitivities of a population of patients. Interestingly, for errors consisting of coplanar systematic target volume offsets and three-dimensional random offsets, static gantry IMRT appears to offer an advantage over VMAT in that larger shift errors are tolerated for the same mean TCP. For example, under the conditions simulated, erroneous systematic shifts of 15 mm directly between or directly into static gantry IMRT fields result in mean TCP values between 96% and 98%, whereas the same errors on VMAT plans result in mean TCP values between 45% and 74%. Random geometric shifts of the target volume were characterized using normal distributions in each Cartesian dimension. When the standard deviations were doubled from those values assumed in the derivation of the treatment margins, our model showed a 7% drop in mean TCP for the static gantry IMRT plans but a 20% drop in TCP for the VMAT plans. Although adding a margin for error to a clinical target volume is perhaps the best approach to account for expected geometric misses, this work suggests that static gantry IMRT may offer a treatment that is more tolerant to geometric miss errors than VMAT. Copyright © 2016 American Association of Medical Dosimetrists. Published by Elsevier Inc. All rights reserved.
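    The Poisson/linear-quadratic TCP machinery underlying such comparisons can be sketched as follows. The α, β, clonogen number, and dose values below are assumed toy parameters (not the study's), and a geometric miss is reduced here to a fraction of the target receiving a lower dose:

```python
import math

def tcp_lq(n_clonogens, dose_per_fx, n_fx, alpha=0.15, beta=0.05):
    """Poisson TCP under the linear-quadratic model:
    per-fraction surviving fraction SF = exp(-(alpha*d + beta*d^2)),
    TCP = exp(-N * SF^n), the chance that zero clonogens survive."""
    sf = math.exp(-(alpha * dose_per_fx + beta * dose_per_fx ** 2))
    return math.exp(-n_clonogens * sf ** n_fx)

def tcp_with_miss(n_clonogens, d_planned, n_fx, missed_frac, d_missed):
    """Split the target: (1 - missed_frac) of clonogens get the planned
    fractional dose, missed_frac get a reduced dose; overall TCP is the
    product, since every clonogen must be sterilized."""
    n_in = n_clonogens * (1 - missed_frac)
    n_out = n_clonogens * missed_frac
    return tcp_lq(n_in, d_planned, n_fx) * tcp_lq(n_out, d_missed, n_fx)

# 2 Gy x 39 fractions; suppose 5% of the target slips into a 1 Gy/fx region.
print(round(tcp_with_miss(1e7, 2.0, 39, 0.05, 1.0), 3))
print(round(tcp_with_miss(1e7, 2.0, 39, 0.00, 1.0), 3))  # no miss, for comparison
```

    Even a small underdosed fraction collapses TCP in this model, which is why the dose actually delivered to the shifted target volume, rather than the plan alone, drives the IMRT-versus-VMAT differences reported above.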

  5. Factors associated with tooth loss and prosthodontic status among Sudanese adults.

    PubMed

    Khalifa, Nadia; Allen, Patrick F; Abu-bakr, Neamat H; Abdel-Rahman, Manar E

    2012-01-01

    A study was conducted to determine the degree of tooth loss, factors influencing tooth loss, and the extent of prosthodontic rehabilitation in Sudanese adults (≥ 16 years old) attending outpatient clinics in Khartoum State. Pearson and multivariate analyses were used to examine the relationships between tooth loss and specific characteristics determined through interviews and clinical examinations. The mean number of missing teeth was 3.6 (SD, 4.9) and the prevalence of edentulism was 0.1%. The prevalence of tooth loss (missing at least one tooth) was 78%; 66.9% of tooth loss was due to caries, and 11.2% was attributable to other reasons. Prosthetic replacement of missing teeth was evident in 3%, whereas a need for prosthetic replacement was evident in 57%. Having < 20 teeth was associated with age, gender, and socioeconomic status; tooth loss due to caries was associated with age, tribe, frequency of tooth-brushing, and a low rate of dental consultation. Tooth loss due to other reasons was associated with age, tribe, education, periodontal pocketing, tobacco use, tooth wear, and prosthetic status. The results of the present study indicated that the major cause of tooth loss was dental caries, thus emphasizing the importance of a public prevention-based healthcare program. Replacement of missing teeth was uncommon in the study subjects, which may reflect lack of access to this type of oral healthcare.

  6. Growth Modeling with Nonignorable Dropout: Alternative Analyses of the STAR*D Antidepressant Trial

    ERIC Educational Resources Information Center

    Muthen, Bengt; Asparouhov, Tihomir; Hunter, Aimee M.; Leuchter, Andrew F.

    2011-01-01

    This article uses a general latent variable framework to study a series of models for nonignorable missingness due to dropout. Nonignorable missing data modeling acknowledges that missingness may depend not only on covariates and observed outcomes at previous time points as with the standard missing at random assumption, but also on latent…

  7. An Analysis of the Effect of Lecture Capture Initiatives on Student-Athletes at an NCAA Division I Institution

    ERIC Educational Resources Information Center

    Smith, Gregory Allen

    2012-01-01

    Student-athletes often miss class due to travel and competitions (Diersen, 2005; F. Wiseman, personal communication, September 30, 2010; Hosick, 2010; NCAA On-line, 2008; Rhatigan, 1984). Missing class is negatively associated with grades (Park & Kerr, 1990; Romer, 1993; Schmidt, 1983). Therefore, as classroom instruction time is replaced by…

  8. Corrupting Learning: Evidence from Missing Federal Education Funds in Brazil. NBER Working Paper No. 18150

    ERIC Educational Resources Information Center

    Ferraz, Claudio; Finan, Frederico; Moreira, Diana B.

    2012-01-01

    This paper examines if money matters in education by looking at whether missing resources due to corruption affect student outcomes. We use data from the auditing of Brazil's local governments to construct objective measures of corruption involving educational block grants transferred from the central government to municipalities. Using variation…

  9. The effect of dental scaling noise during intravenous sedation on acoustic respiration rate (RRa™).

    PubMed

    Kim, Jung Ho; Chi, Seong In; Kim, Hyun Jeong; Seo, Kwang-Suk

    2018-04-01

    Respiration monitoring is necessary during sedation for dental treatment. Recently, acoustic respiration rate (RRa™), an acoustics-based respiration monitoring method, has been used in addition to auscultation or capnography. The accuracy of this method may be compromised in an environment with excessive noise. This study evaluated whether noise from the ultrasonic scaler affects the performance of RRa in respiratory rate measurement. We analyzed data from 49 volunteers who underwent scaling under intravenous sedation. Clinical tests were divided into preparation, sedation, and scaling periods; respiratory rate was measured at 2-s intervals for 3 min in each period. Missing value ratios of RRa during each period were measured; correlation analysis and Bland-Altman analysis were performed on respiratory rates measured by RRa and capnogram. The missing value ratios of RRa were 5.62%, 8.03%, and 23.95% in the preparation, sedation, and scaling periods, respectively, indicating an increased missing value ratio in the scaling period (P < 0.001). Correlation coefficients of the respiratory rates measured with the two methods were 0.692, 0.677, and 0.562 in the respective periods. Mean capnography-RRa biases in Bland-Altman analyses were -0.03, -0.27, and -0.61 in the respective periods (P < 0.001); limits of agreement were -4.84 to 4.45, -4.89 to 4.15, and -6.18 to 4.95 (P < 0.001). The probability of missing respiratory rate values was higher during scaling when RRa was used for measurement. Therefore, the use of RRa alone for respiration monitoring during ultrasonic scaling may not be safe.
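    The Bland-Altman quantities reported above (mean bias and 95% limits of agreement) follow a standard formula, sketched here with hypothetical paired readings rather than the study's data:

```python
import statistics

def bland_altman(a, b):
    """Bland-Altman agreement statistics for two paired measurement methods:
    mean bias and 95% limits of agreement (bias +/- 1.96 * SD of differences)."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired respiratory rates (breaths/min): capnogram vs. RRa.
capno = [12, 14, 16, 18, 20]
rra = [12, 15, 15, 19, 20]
bias, (lo, hi) = bland_altman(capno, rra)
```

    A near-zero bias with narrow limits indicates close agreement; the widening limits reported during scaling reflect degraded agreement under noise.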

  10. Reliability and Validity in Measuring the Value Added of Schools

    ERIC Educational Resources Information Center

    van de Grift, Wim

    2009-01-01

    Instability in the school population between school entrance and school leaving is not "just a problem of missing data" but often the visible result of the educational problems in some schools and is, therefore, not merely to be treated as missing data but as indicator for the quality of educational processes. Even the most superior…

  11. Toward Best Practices in Analyzing Datasets with Missing Data: Comparisons and Recommendations

    ERIC Educational Resources Information Center

    Johnson, David R.; Young, Rebekah

    2011-01-01

    Although several methods have been developed to allow for the analysis of data in the presence of missing values, no clear guide exists to help family researchers in choosing among the many options and procedures available. We delineate these options and examine the sensitivity of the findings in a regression model estimated in three random…

  12. Ionosphere dynamics in the auroral zone during the magnetic storm of March 17-18, 2015

    NASA Astrophysics Data System (ADS)

    Blagoveshchensky, D. V.; Sergeeva, M. A.

    2016-11-01

    A comprehensive study of the ionospheric processes encountered during the superstorm that started on March 17th, 2015 has been carried out using magnetometer, ionosonde, riometer, ionospheric tomography and an all-sky camera installed in the observatory of Sodankylä, Finland. The storm manifested a number of interesting features. From 12:00 on March 17 there was a significant decrease in the critical frequencies foF2, and intensive sporadic Es layers were observed. During the disturbance, there was at times a lack of variation in the X-component of the magnetic field, but the absorption level measured by the riometer was high. The electron density distributions for the quiet and disturbed days, as shown in the tomography data, were very different. Where results were available at the same times, the tomographic foF2 values coincided with the 'real' foF2 values from the ionosonde. Where the ionosonde data were missing due to absorption, the tomographic foF2 values were used instead. The keograms from the all-sky camera showed that during disturbed days the aurorae manifested themselves as bright discrete forms. It was shown that the peaks of absorption due to particle precipitation seen by the riometer coincided in time with the brightenings of aurorae seen on the keograms.

  13. Marginalized zero-inflated Poisson models with missing covariates.

    PubMed

    Benecha, Habtamu K; Preisser, John S; Divaris, Kimon; Herring, Amy H; Das, Kalyan

    2018-05-11

    Unlike zero-inflated Poisson regression, marginalized zero-inflated Poisson (MZIP) models for counts with excess zeros provide estimates with direct interpretations for the overall effects of covariates on the marginal mean. In the presence of missing covariates, MZIP and many other count data models are ordinarily fitted using complete case analysis methods due to lack of appropriate statistical methods and software. This article presents an estimation method for MZIP models with missing covariates. The method, which is applicable to other missing data problems, is illustrated and compared with complete case analysis by using simulations and dental data on the caries preventive effects of a school-based fluoride mouthrinse program. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Rainfall threshold definition using an entropy decision approach and radar data

    NASA Astrophysics Data System (ADS)

    Montesarchio, V.; Ridolfi, E.; Russo, F.; Napolitano, F.

    2011-07-01

    Flash flood events are floods characterised by a very rapid response of basins to storms, often resulting in loss of life and property damage. Due to the specific space-time scale of this type of flood, the lead time available for triggering civil protection measures is typically short. Rainfall threshold values specify the amount of precipitation for a given duration that generates a critical discharge in a given river cross section. If the threshold values are exceeded, it can produce a critical situation in river sites exposed to alluvial risk. It is therefore possible to directly compare the observed or forecasted precipitation with critical reference values, without running online real-time forecasting systems. The focus of this study is the Mignone River basin, located in Central Italy. The critical rainfall threshold values are evaluated by minimising a utility function based on the informative entropy concept and by using a simulation approach based on radar data. The study concludes with a system performance analysis, in terms of correctly issued warnings, false alarms and missed alarms.
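    The system performance analysis in terms of correctly issued warnings, false alarms and missed alarms amounts to a contingency count over threshold exceedances. The sketch below is illustrative (function names and the single-threshold semantics are assumptions, not the paper's entropy-based procedure):

```python
def warning_scores(rain, critical, threshold):
    """Contingency counts for a rainfall-threshold warning system:
    a warning is issued when rain >= threshold; 'critical' flags whether a
    critical discharge actually occurred for that event."""
    hits = misses = false_alarms = 0
    for r, event in zip(rain, critical):
        warned = r >= threshold
        if warned and event:
            hits += 1
        elif warned and not event:
            false_alarms += 1
        elif event:
            misses += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms,
            "POD": pod, "FAR": far}
```

    Minimising a utility function over such counts (weighting missed alarms against false alarms) is one way the critical threshold can be tuned, consistent with the decision approach described above.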

  15. A modified discrete algebraic reconstruction technique for multiple grey image reconstruction for limited angle range tomography.

    PubMed

    Liang, Zhiting; Guan, Yong; Liu, Gang; Chen, Xiangyu; Li, Fahu; Guo, Pengfei; Tian, Yangchao

    2016-03-01

    The 'missing wedge', which is due to a restricted rotation range, is a major challenge for quantitative analysis of an object using tomography. With prior knowledge of the grey levels, the discrete algebraic reconstruction technique (DART) is able to reconstruct objects accurately with projections in a limited angle range. However, the quality of the reconstructions declines as the number of grey levels increases. In this paper, a modified DART (MDART) was proposed, in which each independent region of homogeneous material was chosen as a research object, instead of the grey values. The grey values of each discrete region were estimated according to the solution of the linear projection equations. The iterative process of boundary pixels updating and correcting the grey values of each region was executed alternately. Simulation experiments of binary phantoms as well as multiple grey phantoms show that MDART is capable of achieving high-quality reconstructions with projections in a limited angle range. The interesting advancement of MDART is that neither prior knowledge of the grey values nor the number of grey levels is necessary.

  16. Comparison of data analysis strategies for intent-to-treat analysis in pre-test-post-test designs with substantial dropout rates.

    PubMed

    Salim, Agus; Mackinnon, Andrew; Christensen, Helen; Griffiths, Kathleen

    2008-09-30

    The pre-test-post-test design (PPD) is predominant in trials of psychotherapeutic treatments. Missing data due to withdrawals present an even bigger challenge in assessing treatment effectiveness under the PPD than under designs with more observations since dropout implies an absence of information about response to treatment. When confronted with missing data, often it is reasonable to assume that the mechanism underlying missingness is related to observed but not to unobserved outcomes (missing at random, MAR). Previous simulation and theoretical studies have shown that, under MAR, modern techniques such as maximum-likelihood (ML) based methods and multiple imputation (MI) can be used to produce unbiased estimates of treatment effects. In practice, however, ad hoc methods such as last observation carried forward (LOCF) imputation and complete-case (CC) analysis continue to be used. In order to better understand the behaviour of these methods in the PPD, we compare the performance of traditional approaches (LOCF, CC) and theoretically sound techniques (MI, ML), under various MAR mechanisms. We show that the LOCF method is seriously biased and conclude that its use should be abandoned. Complete-case analysis produces unbiased estimates only when the dropout mechanism does not depend on pre-test values even when dropout is related to fixed covariates including treatment group (covariate-dependent: CD). However, CC analysis is generally biased under MAR. The magnitude of the bias is largest when the correlation of post- and pre-test is relatively low.
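    The LOCF mechanic that this abstract concludes is seriously biased can be sketched in a few lines (illustrative only, not the authors' code). The bias arises because a carried-forward pre-test value ignores any response to treatment after dropout:

```python
def locf(series, sentinel=None):
    """Last observation carried forward: replace each missing value (marked by
    the sentinel) with the most recent observed value. Leading missing values
    have nothing to carry forward and stay missing."""
    out, last = [], sentinel
    for v in series:
        if v is not sentinel:
            last = v
        out.append(last)
    return out
```

    In a pre-test-post-test design a dropout's record becomes [pre, None], which LOCF turns into [pre, pre], i.e., an assumed zero treatment effect for that participant.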

  17. Multiple Imputation of Completely Missing Repeated Measures Data within Person from a Complex Sample: Application to Accelerometer Data in the National Health and Nutrition Examination Survey

    PubMed Central

    Liu, Benmei; Yu, Mandi; Graubard, Barry I; Troiano, Richard P; Schenker, Nathaniel

    2016-01-01

    The Physical Activity Monitor (PAM) component was introduced into the 2003-2004 National Health and Nutrition Examination Survey (NHANES) to collect objective information on physical activity including both movement intensity counts and ambulatory steps. Due to an error in the accelerometer device initialization process, the steps data were missing for all participants in several primary sampling units (PSUs), typically a single county or group of contiguous counties, who had intensity count data from their accelerometers. To avoid potential bias and loss in efficiency in estimation and inference involving the steps data, we considered methods to accurately impute the missing values for steps collected in the 2003-2004 NHANES. The objective was to come up with an efficient imputation method which minimized model-based assumptions. We adopted a multiple imputation approach based on Additive Regression, Bootstrapping and Predictive mean matching (ARBP) methods. This method fits alternative conditional expectation (ace) models, which use an automated procedure to estimate optimal transformations for both the predictor and response variables. This paper describes the approaches used in this imputation and evaluates the methods by comparing the distributions of the original and the imputed data. A simulation study using the observed data is also conducted as part of the model diagnostics. Finally some real data analyses are performed to compare the before and after imputation results. PMID:27488606
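    Predictive mean matching, one ingredient of the ARBP approach named above, can be sketched as a single imputation draw with one predictor. The least-squares fit and donor-pool size here are simplified assumptions, not the NHANES implementation (which uses ace-transformed additive regressions and bootstrapping):

```python
import random

def pmm_impute(x, y, k=3, seed=0):
    """Single predictive-mean-matching draw: fit y ~ x by least squares on the
    complete cases, then impute each missing y with the observed y of a donor
    whose predicted value is among the k nearest to the case's own prediction."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mx = sum(xi for xi, _ in obs) / n
    my = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    b = sum((xi - mx) * (yi - my) for xi, yi in obs) / sxx
    a = my - b * mx
    preds = [(a + b * xi, yi) for xi, yi in obs]  # (prediction, donor value)
    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
        else:
            target = a + b * xi
            donors = sorted(preds, key=lambda p: abs(p[0] - target))[:k]
            out.append(rng.choice(donors)[1])
    return out
```

    Because imputed values are always real observed values from donors, PMM preserves the empirical distribution of the variable better than plugging in raw regression predictions; repeating the draw yields the multiple imputations.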

  18. Teaching the Value of Science

    ERIC Educational Resources Information Center

    Shumow, Lee; Schmidt, Jennifer A.

    2015-01-01

    Why and under what conditions might students value their science learning? To find out, the authors observed approximately 400 science classes. They found that although several teachers were amazingly adept at regularly promoting the value of science, many others missed out on important opportunities to promote the value of science. The authors…

  19. Network sampling coverage II: The effect of non-random missing data on network measurement

    PubMed Central

    Smith, Jeffrey A.; Moody, James; Morgan, Jonathan

    2016-01-01

    Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available java application) to gauge the bias in their own data. PMID:27867254

  20. Network sampling coverage II: The effect of non-random missing data on network measurement.

    PubMed

    Smith, Jeffrey A; Moody, James; Morgan, Jonathan

    2017-01-01

    Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available java application) to gauge the bias in their own data.
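    The non-random missingness scenario studied above can be illustrated with a toy sketch: remove the most central node from a hypothetical graph (plain edge lists, nothing from the paper's data) and compare a simple statistic before and after.

```python
def degrees(edges, nodes):
    """Degree of each node in an undirected edge list."""
    d = {v: 0 for v in nodes}
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d

def drop_most_central(edges, nodes):
    """Mimic non-random missingness: remove the highest-degree node, as when a
    central actor is unobserved, and return the remaining graph."""
    d = degrees(edges, nodes)
    star = max(nodes, key=lambda v: d[v])
    kept = [v for v in nodes if v != star]
    return [(u, v) for u, v in edges if star not in (u, v)], kept

# Hypothetical star-like graph: node 0 is the hub.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
nodes = [0, 1, 2, 3, 4]
```

    Dropping the hub removes many edges at once, so estimates such as mean degree fall sharply; this is the mechanism behind the larger bias the authors report when central nodes are missing.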

  1. Restoring method for missing data of spatial structural stress monitoring based on correlation

    NASA Astrophysics Data System (ADS)

    Zhang, Zeyu; Luo, Yaozhi

    2017-07-01

    Long-term monitoring of spatial structures is of great importance for fully understanding their performance and safety. Missing segments in the monitoring data affect data analysis and safety assessment of the structure. Based on long-term monitoring data from the steel structure of the Hangzhou Olympic Center Stadium, the correlation between the stress changes at the measuring points is studied, and an interpolation method for the missing stress data is proposed. To fit the correlations, stress data from correlated measuring points are selected over the three months of the season in which data are missing. Daytime and nighttime data are fitted separately for interpolation. For simple linear regression, when a single point's correlation coefficient is 0.9 or more, the average interpolation error is about 5%. For multiple linear regression, interpolation accuracy does not increase significantly once the number of correlated points exceeds 6. The stress baseline value of each construction step should be calculated before interpolating missing data in the construction stage, and the average error is within 10%. The interpolation error for continuous missing data is slightly larger than that for discrete missing data. The data missing rate should not exceed 30% for this method. Finally, a measuring point's missing monitoring data are restored to verify the validity of the method.
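    The correlation-screened regression interpolation described above can be sketched as follows for the simple (single-point) case. The 0.9 correlation cutoff comes from the abstract; the function layout, gap marker, and candidate-selection details are illustrative assumptions:

```python
def _stats(xs, ys):
    """Means and sums of squares/cross-products for a paired sample."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def restore_missing(target, candidates, r_min=0.9):
    """Fill gaps (None) in a stress series from the most correlated intact
    series: pick the candidate with the strongest correlation over the jointly
    observed samples, require |r| >= r_min, and impute via the fitted simple
    linear regression. Returns None if no candidate is correlated enough."""
    best, best_r = None, 0.0
    for cand in candidates:
        xs = [c for c, t in zip(cand, target) if t is not None]
        ys = [t for t in target if t is not None]
        _, _, sxx, syy, sxy = _stats(xs, ys)
        r = sxy / (sxx * syy) ** 0.5
        if abs(r) > abs(best_r):
            best, best_r = cand, r
    if abs(best_r) < r_min:
        return None
    xs = [c for c, t in zip(best, target) if t is not None]
    ys = [t for t in target if t is not None]
    mx, my, sxx, _, sxy = _stats(xs, ys)
    b = sxy / sxx
    a = my - b * mx
    return [t if t is not None else a + b * c for c, t in zip(best, target)]
```

    In the paper's setting the regression would be fitted separately for daytime and nighttime data, and extended to multiple correlated points via multiple linear regression.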

  2. What impact do assumptions about missing data have on conclusions? A practical sensitivity analysis for a cancer survival registry.

    PubMed

    Smuk, M; Carpenter, J R; Morris, T P

    2017-02-06

    Within epidemiological and clinical research, missing data are a common issue and often overlooked in publications. When the issue of missing observations is addressed, it is usually assumed that the missing data are 'missing at random' (MAR). This assumption should be checked for plausibility; however, it is untestable, so inferences should be assessed for robustness to departures from missing at random. We highlight the method of pattern mixture sensitivity analysis after multiple imputation using colorectal cancer data as an example. We focus on the Dukes' stage variable, which has the highest proportion of missing observations. First, we find the probability of being in each Dukes' stage given the MAR imputed dataset. We use these probabilities in a questionnaire to elicit prior beliefs from experts on what they believe the probability would be in the missing data. The questionnaire responses are then used in a Dirichlet draw to create a Bayesian 'missing not at random' (MNAR) prior to impute the missing observations. The model of interest is applied and inferences are compared to those from the MAR imputed data. The inferences were largely insensitive to departures from MAR. Inferences under MNAR suggested a smaller association between Dukes' stage and death, though the association remained positive, with similarly low p values. We conclude by discussing the positives and negatives of our method and highlight the importance of making people aware of the need to test the MAR assumption.
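    The Dirichlet-draw step for building an MNAR imputation distribution over a categorical variable can be sketched as follows. The category labels, prior counts, and helper names are hypothetical, not the elicited expert values from the study:

```python
import random

def dirichlet_draw(alphas, rng):
    """One draw from a Dirichlet distribution, built from normalized Gamma
    variates (the standard stdlib-only construction)."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

def impute_mnar(stages, prior_counts, seed=0):
    """Impute missing categorical values (None) under elicited MNAR beliefs:
    draw stage probabilities from a Dirichlet over the experts' prior counts,
    then sample a category for each missing observation."""
    rng = random.Random(seed)
    probs = dirichlet_draw(prior_counts, rng)
    cats = list(range(len(prior_counts)))
    return [s if s is not None else rng.choices(cats, weights=probs)[0]
            for s in stages]
```

    Repeating the draw per imputed dataset propagates uncertainty about the experts' beliefs; fitting the model of interest to each draw and comparing against the MAR imputations is the sensitivity analysis the abstract describes.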

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saaban, Azizan; Zainudin, Lutfi; Bakar, Mohd Nazari Abu

    This paper examines the ability of the linear interpolation method to predict missing values in solar radiation time series. A reliable dataset depends on a complete observed time series: the absence of radiation data alters the long-term variation of solar radiation measurements, and such gaps increase the likelihood of biased results in modelling and validation. The completeness of the observed dataset is therefore important for data analysis. Gaps and unreliable records in solar radiation time series are a widespread and central problem, yet only a limited number of studies have focused on estimating missing values in solar radiation datasets.
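    Gap-filling by linear interpolation, the method evaluated above, can be sketched minimally as follows (the edge-gap handling, leaving unbounded gaps unfilled, is an assumption):

```python
def interpolate_gaps(series):
    """Fill internal gaps (None) in a time series by linear interpolation
    between the nearest observed neighbours; gaps at either edge, which lack a
    bounding observation, are left missing."""
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            j = i
            while j < n and out[j] is None:
                j += 1
            if i > 0 and j < n:  # gap bounded on both sides: interpolate
                lo, hi = out[i - 1], out[j]
                span = j - (i - 1)
                for k in range(i, j):
                    out[k] = lo + (hi - lo) * (k - (i - 1)) / span
            i = j
        else:
            i += 1
    return out
```

    For hourly solar radiation, each missing reading is replaced by the value on the straight line joining the last and next observations, which works well for short gaps but flattens genuine variability across long ones.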

  4. Reducing Noise in the MSU Daily Lower-Tropospheric Global Temperature Dataset

    NASA Technical Reports Server (NTRS)

    Christy, John R.; Spencer, Roy W.; McNider, Richard T.

    1996-01-01

    The daily global-mean values of the lower-tropospheric temperature determined from microwave emissions measured by satellites are examined in terms of their signal, noise, and signal-to-noise ratio. Daily and 30-day average noise estimates are reduced by almost 50% and 35%, respectively, by analyzing and adjusting (if necessary) for errors due to 1) missing data, 2) residual harmonics of the annual cycle unique to particular satellites, 3) lack of filtering, and 4) spurious trends. After adjustments, the decadal trend of the lower-tropospheric global temperature from January 1979 through February 1994 becomes -0.058 C, or about 0.03 C per decade cooler than previously calculated.

  5. Reducing Noise in the MSU Daily Lower-Tropospheric Global Temperature Dataset

    NASA Technical Reports Server (NTRS)

    Christy, John R.; Spencer, Roy W.; McNider, Richard T.

    1995-01-01

    The daily global-mean values of the lower-tropospheric temperature determined from microwave emissions measured by satellites are examined in terms of their signal, noise, and signal-to-noise ratio. Daily and 30-day average noise estimates are reduced by almost 50% and 35%, respectively, by analyzing and adjusting (if necessary) for errors due to (1) missing data, (2) residual harmonics of the annual cycle unique to particular satellites, (3) lack of filtering, and (4) spurious trends. After adjustments, the decadal trend of the lower-tropospheric global temperature from January 1979 through February 1994 becomes -0.058 C, or about 0.03 C per decade cooler than previously calculated.

  6. Strangeness contribution to the proton spin from lattice QCD.

    PubMed

    Bali, Gunnar S; Collins, Sara; Göckeler, Meinulf; Horsley, Roger; Nakamura, Yoshifumi; Nobile, Andrea; Pleiter, Dirk; Rakow, P E L; Schäfer, Andreas; Schierholz, Gerrit; Zanotti, James M

    2012-06-01

    We compute the strangeness and light-quark contributions Δs, Δu, and Δd to the proton spin in n(f)=2 lattice QCD at a pion mass of about 285 MeV and at a lattice spacing a≈0.073 fm, using the nonperturbatively improved Sheikholeslami-Wohlert Wilson action. We carry out the renormalization of these matrix elements, which involves mixing between contributions from different quark flavors. Our main result is the small negative value Δs(MS)(√(7.4) GeV)=-0.020(10)(4) of the strangeness contribution to the nucleon spin. The second error is an estimate of the uncertainty, due to the missing extrapolation to the physical point.

  7. A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

    ERIC Educational Resources Information Center

    Wolkowitz, Amanda A.; Skorupski, William P.

    2013-01-01

    When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…

  8. Learning from Non-Reported Data: Interpreting Missing Body Mass Index Values in Young Children

    ERIC Educational Resources Information Center

    Arbour-Nicitopoulos, Kelly P.; Faulkner, Guy E.; Leatherdale, Scott T.

    2010-01-01

    The objective of this study was to examine the pattern of relations between missing weight and height (BMI) data and a range of demographic, physical activity, sedentary behavior, and academic measures in a young sample of elementary school children. A secondary analysis of a large cross-sectional study, PLAY-On, was conducted using self-reported…

  9. Database Comparison of the Adult-to-Adult Living Donor Liver Transplantation Cohort Study (A2ALL) and the SRTR U.S. Transplant Registry

    PubMed Central

    Gillespie, BW; Merion, RM; Ortiz-Rios, E; Tong, L; Shaked, A; Brown, RS; Ojo, AO; Hayashi, PH; Berg, CL; Abecassis, MM; Ashworth, AS; Friese, CE; Hong, JC; Trotter, JF; Everhart, JE

    2010-01-01

    Data submitted by transplant programs to the Organ Procurement and Transplantation Network (OPTN) are used by the Scientific Registry of Transplant Recipients (SRTR) for policy development, performance evaluation, and research. This study compared OPTN/SRTR data with data extracted from medical records by research coordinators from the nine-center A2ALL study. A2ALL data were collected independently of OPTN data submission (48 data elements among 785 liver transplant candidates/recipients; 12 data elements among 386 donors). At least 90% agreement occurred between OPTN/SRTR and A2ALL for 11/29 baseline recipient elements, 4/19 recipient transplant or follow-up elements, and 6/12 donor elements. For the remaining recipient and donor elements, >10% of values were missing in OPTN/SRTR but present in A2ALL, confirming that missing data were largely avoidable. Other than variables required for allocation, the percentage missing varied widely by center. These findings support an expanded focus on data quality control by OPTN/SRTR for a broader variable set than those used for allocation. Center-specific monitoring of missing values could substantially improve the data. PMID:20199501

  10. Role of MRA in the detection of intracranial aneurysm in the acute phase of subarachnoid hemorrhage.

    PubMed

    Pierot, Laurent; Portefaix, Christophe; Rodriguez-Régent, Christine; Gallas, Sophie; Meder, Jean-François; Oppenheim, Catherine

    2013-07-01

    Magnetic resonance angiography (MRA) has been evaluated for the detection of unruptured intracranial aneurysms with favorable results at 3 Tesla (3T) and with similar diagnostic accuracy as both 3D time-of-flight (3D-TOF) and contrast-enhanced (CE-MRA) MRA. However, the diagnostic value and place of MRA in the detection of ruptured aneurysms has been little evaluated. Thus, the goal of this prospective single-center series was to assess the feasibility and diagnostic value of 3T 3D-TOF MRA and CE-MRA for aneurysm detection in acute non-traumatic subarachnoid hemorrhage (SAH). From March 2006 to December 2007, all consecutive patients admitted to our hospital with acute non-traumatic SAH (≤10 days) were prospectively included in this study evaluating MRA in the diagnostic workup of SAH. Feasibility of MRA and sensitivity/specificity of 3D-TOF and CE-MRA were assessed compared with gold standard DSA. In all, 84 consecutive patients (45 women, 39 men; age 23-86 years) were included. The feasibility of MRA was low (43/84, 51.2%). The reasons given for patients not undergoing magnetic resonance imaging (MRI) examination were clinical status (27 patients), potential delay in aneurysm treatment (11 patients) and contraindications to MRI (three patients). In patients explored by MRA, the sensitivity of CE-MRA (95%) was higher compared with 3D-TOF (86%) with similar specificity (80%). Also, 3D-TOF missed five aneurysms while CE-MRA missed two. The value of MRA in the diagnostic workup of ruptured aneurysms is limited due to its low feasibility during the acute phase of bleeding. Sensitivity for aneurysm detection was good for both MRA techniques, but tended to be better with CE-MRA. Copyright © 2013. Published by Elsevier Masson SAS.

  11. Dissolved organic nitrogen dynamics in the North Sea: A time series analysis (1995-2005)

    NASA Astrophysics Data System (ADS)

    Van Engeland, T.; Soetaert, K.; Knuijt, A.; Laane, R. W. P. M.; Middelburg, J. J.

    2010-09-01

    Dissolved organic nitrogen (DON) dynamics in the North Sea were explored by means of long-term time series of nitrogen parameters from the Dutch national monitoring program. Generally, the data quality was good with few missing data points. Different imputation methods were used to verify the robustness of the patterns against these missing data. No long-term trends in DON concentrations were found over the sampling period (1995-2005). Inter-annual variability in the different time series showed both common and station-specific behavior. The stations could be divided into two regions, based on absolute concentrations and the dominant time scales of variability. Average DON concentrations were 11 μmol l⁻¹ in the coastal region and 5 μmol l⁻¹ in the open sea. Organic fractions of total dissolved nitrogen (TDN) averaged 38 and 71% in the coastal zone and open sea, respectively, but increased over time due to decreasing dissolved inorganic nitrogen (DIN) concentrations. In both regions intra-annual variability dominated over inter-annual variability, but DON variation in the open sea was markedly shifted towards shorter time scales relative to coastal stations. In the coastal zone a consistent seasonal DON cycle existed with high values in spring-summer and low values in autumn-winter. In the open sea seasonality was weak. A marked shift in the seasonality was found at the Dogger Bank, with DON accumulation towards summer and low values in winter prior to 1999, and accumulation in spring and decline throughout summer after 1999. This study clearly shows that DON is a dynamic actor in the North Sea and should be monitored systematically to enable us to understand fully the functioning of this ecosystem.

  12. Emergency medical dispatch priority in chest pain patients due to life threatening conditions: A cohort study examining circadian variations and impact of the education.

    PubMed

    Rawshani, Araz; Rawshani, Nina; Gelang, Carita; Andersson, Jan-Otto; Larsson, Anna; Bång, Angela; Herlitz, Johan; Gellerstedt, Martin

    2017-06-01

    We examined the accuracy of emergency dispatchers' assessments according to their education and the time of day. We examined this in chest pain patients who were diagnosed with a potentially life-threatening condition (LTC) or died within 30 days. Among 2205 persons, 482 died, 1631 experienced an acute coronary syndrome (ACS), and 1914 had an LTC. Multivariable logistic regression was used to study how the time of the call and the dispatcher's education were associated with the risk of failing to give priority 1 (the highest). Among patients who died, a 7-fold increase in the odds of failing to give priority 1 was noted at 1.00 p.m., as compared with midnight. Compared with assistant nurses, the odds ratio for dispatchers with no (medical) training was 0.34 (95% CI 0.14 to 0.77). Among patients with an ACS, the odds ratio for calls arriving before lunch was 2.02 (95% CI 1.22 to 3.43), compared with midnight. Compared with assistant nurses, the odds ratio for operators with no training was 0.23 (95% CI 0.13 to 0.40). Similar associations were noted for those with any LTC. The dispatcher's education was not associated with the patient's survival. In this group of patients, who experience substantial mortality and morbidity, the risk of not obtaining the highest dispatch priority was increased up to 7-fold during lunchtime. Dispatch operators without medical education had the lowest risk, compared with nurses and assistant nurses, of failing to give priority 1, at the expense of a lower positive predictive value. What is already known about this subject? Use of the emergency medical service (EMS) increases survival among patients with acute coronary syndromes. It is unknown whether the efficiency - as judged by the ability to identify life-threatening cases among patients with chest pain - varies according to the dispatcher's educational level and the time of day. What does this study add? We provide evidence that the dispatcher's education does not influence survival among patients calling the EMS due to chest discomfort. However, medically educated dispatchers are at greatest risk of failing to identify life-threatening cases, which is explained by more parsimonious use of the highest dispatch priority. We also show that the risk of missing life-threatening cases is highest around lunchtime. How might this impact on clinical practice? Dispatch centers are operated differently all over the world, and chest discomfort is one of the most frequent symptoms encountered; we provide evidence that it is safe to operate a dispatch center without medically trained personnel, who actually miss fewer cases of acute coronary syndromes. However, non-medically trained dispatchers consume more pre-hospital resources. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Missed therapeutic and prevention opportunities in women with BRCA-mutated epithelial ovarian cancer and their families due to low referral rates for genetic counseling and BRCA testing: A review of the literature.

    PubMed

    Hoskins, Paul J; Gotlieb, Walter H

    2017-11-01

    Fifteen percent of women with epithelial ovarian cancer have inherited mutations in the BRCA breast cancer susceptibility genes. Knowledge of her BRCA status has value both for the woman and for her family. A therapeutic benefit exists for the woman with cancer, because a new family of oral drugs, the poly ADP-ribose polymerase (PARP) inhibitors, has recently been approved, and these drugs have the greatest efficacy in women who carry the mutation. For her family, there is the potential to prevent ovarian cancer in those carrying the mutation by using risk-reducing surgery. Such surgery significantly reduces the chance of developing this, for the most part, incurable cancer. Despite these potential benefits, referral rates for genetic counseling and subsequent BRCA testing are low, ranging from 10% to 30%, indicating that these therapeutic and prevention opportunities are being missed. The authors have reviewed the relevant available literature. Topics discussed are BRCA and its relation to ovarian cancer, the rates of referral for genetic counseling/BRCA testing, reasons for these low rates, potential strategies to improve on those rates, lack of effectiveness of current screening strategies, the pros and cons of risk-reducing surgery, other prevention options, and the role and value of PARP inhibitors. CA Cancer J Clin 2017;67:493-506. © 2017 American Cancer Society.

  14. Searching for missing heritability: Designing rare variant association studies

    PubMed Central

    Zuk, Or; Schaffner, Stephen F.; Samocha, Kaitlin; Do, Ron; Hechter, Eliana; Kathiresan, Sekar; Daly, Mark J.; Neale, Benjamin M.; Sunyaev, Shamil R.; Lander, Eric S.

    2014-01-01

    Genetic studies have revealed thousands of loci predisposing to hundreds of human diseases and traits, revealing important biological pathways and defining novel therapeutic hypotheses. However, the genes discovered to date typically explain less than half of the apparent heritability. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants. Studies of common variants are typically referred to as genomewide association studies, whereas studies of rare variants are often simply called sequencing studies. Because they are actually closely related, we use the terms common variant association study (CVAS) and rare variant association study (RVAS). In this paper, we outline the similarities and differences between RVAS and CVAS and describe a conceptual framework for the design of RVAS. We apply the framework to address key questions about the sample sizes needed to detect association, the relative merits of testing disruptive alleles vs. missense alleles, frequency thresholds for filtering alleles, the value of predictors of the functional impact of missense alleles, the potential utility of isolated populations, the value of gene-set analysis, and the utility of de novo mutations. The optimal design depends critically on the selection coefficient against deleterious alleles and thus varies across genes. The analysis shows that common variant and rare variant studies require similarly large sample collections. In particular, a well-powered RVAS should involve discovery sets with at least 25,000 cases, together with a substantial replication set. PMID:24443550

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balderson, Michael, E-mail: michael.balderson@rmp.uhn.ca; Brown, Derek; Johnson, Patricia

    The purpose of this work was to compare static gantry intensity-modulated radiation therapy (IMRT) with volume-modulated arc therapy (VMAT) in terms of tumor control probability (TCP) under scenarios involving large geometric misses, i.e., those beyond what are accounted for when margin expansion is determined. Using a planning approach typical for these treatments, a linear-quadratic–based model for TCP was used to compare mean TCP values for a population of patients who experiences a geometric miss (i.e., systematic and random shifts of the clinical target volume within the planning target dose distribution). A Monte Carlo approach was used to account for the different biological sensitivities of a population of patients. Interestingly, for errors consisting of coplanar systematic target volume offsets and three-dimensional random offsets, static gantry IMRT appears to offer an advantage over VMAT in that larger shift errors are tolerated for the same mean TCP. For example, under the conditions simulated, erroneous systematic shifts of 15 mm directly between or directly into static gantry IMRT fields result in mean TCP values between 96% and 98%, whereas the same errors on VMAT plans result in mean TCP values between 45% and 74%. Random geometric shifts of the target volume were characterized using normal distributions in each Cartesian dimension. When the standard deviations were doubled from those values assumed in the derivation of the treatment margins, our model showed a 7% drop in mean TCP for the static gantry IMRT plans but a 20% drop in TCP for the VMAT plans. Although adding a margin for error to a clinical target volume is perhaps the best approach to account for expected geometric misses, this work suggests that static gantry IMRT may offer a treatment that is more tolerant to geometric miss errors than VMAT.

  16. Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns.

    PubMed

    Wolfe, Edward W; McGill, Michael T

    2011-01-01

    This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.

  17. Incorporating molecular breeding values with variable call rates into genetic evaluations

    USDA-ARS?s Scientific Manuscript database

    A partial genotype for an animal can result from panels with low call rates used to calculate a molecular breeding value. A molecular breeding value can still be calculated using a partial genotype by replacing the missing marker covariates with their mean value. This approach is expected to chang...
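
    The mean-substitution step described above is straightforward to sketch. The code below is a minimal illustration (not the USDA implementation; the 0/1/2 covariate coding and the effect estimates are hypothetical): each missing marker covariate is replaced by that marker's mean over the animals with a successful call, and the molecular breeding value is the usual weighted sum of covariates and marker effects.

```python
def marker_means(genotypes):
    """Per-marker mean covariate, computed over animals with a call
    (non-None value) at that marker."""
    n_markers = len(genotypes[0])
    means = []
    for j in range(n_markers):
        calls = [g[j] for g in genotypes if g[j] is not None]
        means.append(sum(calls) / len(calls))
    return means

def molecular_breeding_value(genotype, effects, means):
    """Molecular breeding value as the sum of marker effects times
    covariates, with missing calls replaced by the marker mean."""
    return sum((x if x is not None else m) * b
               for x, m, b in zip(genotype, means, effects))
```

    A panel with a low call rate simply contributes more mean-substituted terms, which pulls the resulting breeding value toward the population average.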

  18. Using decision trees to understand structure in missing data

    PubMed Central

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-01-01

    Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509
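
    The study above fits CART and BRT models with the R packages 'rpart' and 'gbm'. As a language-agnostic sketch of the underlying idea, the toy code below fits a one-split decision stump to a missingness indicator: it searches for the predictor and value that best separate records with missing versus observed values of a target field. The field names are invented for illustration and a real CART would of course grow a full tree.

```python
def missingness_indicator(rows, target):
    """1 if the target field is missing in the row, else 0."""
    return [1 if r[target] is None else 0 for r in rows]

def best_stump(rows, target, predictors):
    """Find the single predictor/value split that best separates rows
    with missing vs observed target values, by classification accuracy.
    A one-node stand-in for a CART fit to the missingness indicator."""
    y = missingness_indicator(rows, target)
    best = (0.0, None, None)
    for p in predictors:
        for v in {r[p] for r in rows}:
            pred = [1 if r[p] == v else 0 for r in rows]
            acc = sum(a == b for a, b in zip(pred, y)) / len(y)
            acc = max(acc, 1 - acc)  # either side of the split may predict missingness
            if acc > best[0]:
                best = (acc, p, v)
    return best  # (accuracy, predictor, split value)
```

    If, say, a medical test result is only ever missing at one site, the stump recovers the site variable as the driver of missingness, which is exactly the kind of structure the CART analysis surfaces.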

  19. Missing persons-missing data: the need to collect antemortem dental records of missing persons.

    PubMed

    Blau, Soren; Hill, Anthony; Briggs, Christopher A; Cordner, Stephen M

    2006-03-01

    The subject of missing persons is of great concern to the community, with numerous associated emotional, financial, and health costs. This paper examines the forensic medical issues raised by the delayed identification of individuals classified as "missing" and highlights the importance of including dental data in the investigation of missing persons. Focusing on Australia, the current approaches employed in missing persons investigations are outlined. Of particular significance is the fact that each of the eight Australian states and territories has its own Missing Persons Unit that operates within distinct state and territory legislation. Consequently, there is a lack of uniformity within Australia about the legal and procedural framework within which investigations of missing persons are conducted, and the interaction of that framework with coronial law procedures. One of the main investigative problems in missing persons investigations is the lack of forensic medical, particularly odontological, input. Forensic odontology has been employed in numerous cases in Australia where identity is unknown or uncertain because of remains being skeletonized, incinerated, or partly burnt. The routine employment of forensic odontologists to assist in missing persons inquiries has, however, been ignored. The failure to routinely employ forensic odontology in missing persons inquiries has resulted in numerous delays in identification. Three Australian cases are presented where the investigation of individuals whose identity was uncertain or unknown was prolonged due to the failure to utilize the appropriate (and available) dental resources. In light of the outcomes of these cases, we suggest that a national missing persons dental records database be established for future missing persons investigations. Such a database could be easily managed between a coronial system and a forensic medical institute. 
In Australia, a national missing persons dental records database could be incorporated into the National Coroners Information System (NCIS) managed, on behalf of Australia's Coroners, by the Victorian Institute of Forensic Medicine. The existence of the NCIS would ensure operational collaboration in the implementation of the system and cost savings to Australian policing agencies involved in missing person inquiries. The implementation of such a database would facilitate timely and efficient reconciliation of clinical and postmortem dental records and have subsequent social and financial benefits.

  20. Application of pattern mixture models to address missing data in longitudinal data analysis using SPSS.

    PubMed

    Son, Heesook; Friedmann, Erika; Thomas, Sue A

    2012-01-01

    Longitudinal studies are used in nursing research to examine changes over time in health indicators. Traditional approaches to longitudinal analysis of means, such as analysis of variance with repeated measures, are limited to analyzing complete cases. This limitation can lead to biased results due to withdrawal or data omission bias or to imputation of missing data, which can lead to bias toward the null if data are not missing completely at random. Pattern mixture models are useful to evaluate the informativeness of missing data and to adjust linear mixed model (LMM) analyses if missing data are informative. The aim of this study was to provide an example of statistical procedures for applying a pattern mixture model to evaluate the informativeness of missing data and conduct analyses of data with informative missingness in longitudinal studies using SPSS. The data set from the Patients' and Families' Psychological Response to Home Automated External Defibrillator Trial was used as an example to examine informativeness of missing data with pattern mixture models and to use a missing data pattern in analysis of longitudinal data. Prevention of withdrawal bias, omitted data bias, and bias toward the null in longitudinal LMMs requires the assessment of the informativeness of the occurrence of missing data. Missing data patterns can be incorporated as fixed effects into LMMs to evaluate the contribution of the presence of informative missingness to and control for the effects of missingness on outcomes. Pattern mixture models are a useful method to address the presence and effect of informative missingness in longitudinal studies.
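
    The first step of the pattern mixture approach described above can be sketched outside SPSS. The toy code below (a schematic, not the article's procedure) labels each subject by which follow-up assessments are missing and compares baseline outcome means across those patterns; diverging means suggest informative missingness, in which case the pattern would enter the linear mixed model as a fixed effect.

```python
def dropout_pattern(followups):
    """Label a subject by which follow-up assessments are missing,
    e.g. (False, True) = completed visit 1, missed visit 2."""
    return tuple(s is None for s in followups)

def means_by_pattern(subjects):
    """Baseline outcome means within each missing-data pattern.
    Each subject is a (baseline_score, followup_scores) pair."""
    groups = {}
    for baseline, followups in subjects:
        groups.setdefault(dropout_pattern(followups), []).append(baseline)
    return {p: sum(v) / len(v) for p, v in groups.items()}
```

    If completers and dropouts differ systematically at baseline, as in the assertions below, ignoring the pattern would bias a complete-case analysis.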

  1. Predictors of the frequency and subjective experience of cycling near misses: Findings from the first two years of the UK Near Miss Project.

    PubMed

    Aldred, Rachel; Goodman, Anna

    2018-01-01

    Using 2014 and 2015 data from the UK Near Miss Project, this paper examines the stability of self-reported incident rates for cycling near misses across these two years. It further examines the stability of the individual-level predictors of experiencing a near miss, including what influences the scariness of an incident. The paper uses three questions asked only in 2015, which allow further exploration of the factors shaping near miss rates and the impact of incidents: firstly, a respondent's level of cycling experience; secondly, whether an incident was perceived as deliberate; and finally, whether the respondents themselves described the incident as a 'near miss' (as opposed to only a frightening and/or annoying non-injury incident). Using these data, we find a decline of almost a third in incident rates in 2015 compared to 2014, which we believe is likely to be largely an artefact due to differences in reporting rates. This suggests caution about interpreting small fluctuations in subjectively reported near miss rates. However, in both years near misses are many times more frequent than injury collisions. In both years of data collection our findings are very similar in terms of the patterning of incident types, and how frightening different incident categories are, which increases confidence in these findings. We find that new cyclists experience very high incident rates compared to other cyclists, and test a conceptual model explaining how perceived deliberateness, near-miss status, and scariness are connected. For example, incidents that are perceived to be deliberate are more likely to be experienced as very frightening, independent of their 'near miss' status. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Missed losses loom larger than missed gains: Electrodermal reactivity to decision choices and outcomes in a gambling task.

    PubMed

    Wu, Yin; Van Dijk, Eric; Aitken, Mike; Clark, Luke

    2016-04-01

    Loss aversion is a defining characteristic of prospect theory, whereby responses are stronger to losses than to equivalently sized gains (Kahneman & Tversky Econometrica, 47, 263-291, 1979). By monitoring electrodermal activity (EDA) during a gambling task, in this study we examined physiological activity during risky decisions, as well as to both obtained (e.g., gains and losses) and counterfactual (e.g., narrowly missed gains and losses) outcomes. During the bet selection phase, EDA increased linearly with bet size, highlighting the role of somatic signals in decision-making under uncertainty in a task without any learning requirement. Outcome-related EDA scaled with the magnitudes of monetary wins and losses, and losses had a stronger impact on EDA than did equivalently sized wins. Narrowly missed wins (i.e., near-wins) and narrowly missed losses (i.e., near-losses) also evoked EDA responses, and the change of EDA as a function of the size of the missed outcome was modestly greater for near-losses than for near-wins, suggesting that near-losses have more impact on subjective value than do near-wins. Across individuals, the slope for choice-related EDA (as a function of bet size) correlated with the slope for outcome-related EDA as a function of both the obtained and counterfactual outcome magnitudes, and these correlations were stronger for loss and near-loss conditions than for win and near-win conditions. Taken together, these asymmetrical EDA patterns to objective wins and losses, as well as to near-wins and near-losses, provide a psychophysiological instantiation of the value function curve in prospect theory, which is steeper in the negative than in the positive domain.

  3. Pragmatic criteria of the definition of neonatal near miss: a comparative study.

    PubMed

    Kale, Pauline Lorena; Jorge, Maria Helena Prado de Mello; Laurenti, Ruy; Fonseca, Sandra Costa; Silva, Kátia Silveira da

    2017-12-04

    The objective of this study was to test the validity of the pragmatic criteria of the definitions of neonatal near miss, extending them throughout the infant period, and to estimate the indicators of perinatal care in public maternity hospitals. A cohort of live births from six maternity hospitals in the municipalities of São Paulo, Niterói, and Rio de Janeiro, Brazil, was carried out in 2011. We carried out interviews and checked prenatal cards and medical records. We compared the pragmatic criteria (birth weight, gestational age, and 5' Apgar score) of the definitions of near miss of Pileggi et al., Pileggi-Castro et al., Souza et al., and Silva et al. We calculated sensitivity, specificity (gold standard: infant mortality), percentage of deaths among newborns with life-threatening conditions, and rates of near miss, mortality, and severe outcomes per 1,000 live births. A total of 7,315 newborns were analyzed (completeness of information > 99%). The sensitivity of the definition of Pileggi-Castro et al. was higher, resulting in a higher number of cases of near miss; the definition of Souza et al. presented a lower value, and those of Pileggi et al. and Silva et al. presented intermediate values. Sensitivity increases when the period is extended from 0-6 to 0-27 days, and decreases when it is extended to 0-364 days. Specificities were high (≥ 97%) and higher than the sensitivities (54% to 77%). One maternity hospital in São Paulo and one in Niterói presented, respectively, the lowest and highest rates of infant mortality, near miss, and frequency of births with life-threatening conditions, regardless of the definition. The definitions of near miss based exclusively on pragmatic criteria are valid and can be used for monitoring purposes. Based on the perinatal literature, the cutoff points adopted by Silva et al. were more appropriate. 
Periodic studies could apply a more complete definition, incorporating clinical, laboratory, and management criteria, including congenital anomalies predictive of infant mortality.

  4. Reconstruction of Missing Pixels in Satellite Images Using the Data Interpolating Empirical Orthogonal Function (DINEOF)

    NASA Astrophysics Data System (ADS)

    Liu, X.; Wang, M.

    2016-02-01

    For coastal and inland waters, spatially complete and frequent satellite measurements are important in order to monitor and understand coastal biological and ecological processes and phenomena, such as diurnal variations. High-frequency images of the water diffuse attenuation coefficient at the wavelength of 490 nm (Kd(490)) derived from the Korean Geostationary Ocean Color Imager (GOCI) provide a unique opportunity to study diurnal variation of water turbidity in coastal regions of the Bohai Sea, Yellow Sea, and East China Sea. However, there are many missing pixels in the original GOCI-derived Kd(490) images due to clouds and various other reasons. The Data Interpolating Empirical Orthogonal Function (DINEOF) method reconstructs missing data in geophysical datasets based on Empirical Orthogonal Functions (EOFs). In this study, DINEOF is applied to GOCI-derived Kd(490) data in the Yangtze River mouth and the Yellow River mouth regions, the DINEOF-reconstructed Kd(490) data are used to fill in the missing pixels, and the spatial patterns and temporal functions of the first three EOF modes are also used to investigate the sub-diurnal variation due to tidal forcing. In addition, the DINEOF method is also applied to data from the Visible Infrared Imaging Radiometer Suite (VIIRS) on board the Suomi National Polar-orbiting Partnership (SNPP) satellite to reconstruct missing pixels in the daily Kd(490) and chlorophyll-a concentration images, and some application examples in the Chesapeake Bay and the Gulf of Mexico will be presented.
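
    The core DINEOF iteration is simple to illustrate on a tiny grid. The sketch below is a toy, single-mode analogue (real DINEOF retains several EOF modes, chosen by cross-validation, and works on large satellite fields): gaps are initialised with the field mean, the leading EOF mode is fitted by alternating least squares, only the gaps are rewritten with the reconstruction, and the process repeats.

```python
def dineof_rank1(grid, n_iter=200):
    """Fill missing (None) entries of a 2-D field with an iterated
    rank-1 EOF reconstruction, DINEOF-style."""
    rows, cols = len(grid), len(grid[0])
    observed = [(i, j) for i in range(rows) for j in range(cols)
                if grid[i][j] is not None]
    mean = sum(grid[i][j] for i, j in observed) / len(observed)
    # initialise gaps with the mean of the observed pixels
    X = [[grid[i][j] if grid[i][j] is not None else mean
          for j in range(cols)] for i in range(rows)]
    u = [1.0] * rows  # leading spatial mode (initial guess)
    for _ in range(n_iter):
        # one alternating-least-squares pass for X ~ outer(u, v)
        v = [sum(u[i] * X[i][j] for i in range(rows)) /
             sum(ui * ui for ui in u) for j in range(cols)]
        u = [sum(v[j] * X[i][j] for j in range(cols)) /
             sum(vj * vj for vj in v) for i in range(rows)]
        for i in range(rows):
            for j in range(cols):
                if grid[i][j] is None:  # rewrite the gaps only
                    X[i][j] = u[i] * v[j]
    return X
```

    On data that are truly low-rank (e.g. a missing pixel in an outer-product field), the iteration converges to the value consistent with the leading mode while leaving observed pixels untouched.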

  5. Missed nursing care: a concept analysis.

    PubMed

    Kalisch, Beatrice J; Landstrom, Gay L; Hinshaw, Ada Sue

    2009-07-01

    This paper is a report of the analysis of the concept of missed nursing care. According to patient safety literature, missed nursing care is an error of omission. This concept has been conspicuously absent in quality and patient safety literature, with individual aspects of nursing care left undone given only occasional mention. An 8-step method of concept analysis - select concept, determine purpose, identify uses, define attributes, identify model case, describe related and contrary cases, identify antecedents and consequences and define empirical referents - was used to examine the concept of missed nursing care. The sources for the analysis were identified by systematic searches of the World Wide Web, MEDLINE, CINAHL and reference lists of related journal articles with a timeline of 1970 to April 2008. Missed nursing care, conceptualized within the Missed Nursing Care Model, is defined as any aspect of required patient care that is omitted (either in part or in whole) or delayed. Various attribute categories reported by nurses in acute care settings contribute to missed nursing care: (1) antecedents that catalyse the need for a decision about priorities; (2) elements of the nursing process and (3) internal perceptions and values of the nurse. Multiple elements in the nursing environment and internal to nurses influence whether needed nursing care is provided. Missed care as conceptualized within the Missed Care Model is a universal phenomenon. The concept is expected to occur across all cultures and countries, thus being international in scope.

  6. Tackling Missing Data in Community Health Studies Using Additive LS-SVM Classifier.

    PubMed

    Wang, Guanjin; Deng, Zhaohong; Choi, Kup-Sze

    2018-03-01

    Missing data is a common issue in community health and epidemiological studies. Direct removal of samples with missing data can lead to reduced sample size and information bias, which degrades the significance of the results. While data imputation methods are available to deal with missing data, they are limited in performance and can introduce noise into the dataset. Instead of data imputation, a novel method based on an additive least square support vector machine (LS-SVM) is proposed in this paper for predictive modeling when the input features of the model contain missing data. The method also determines simultaneously the influence of the features with missing values on the classification accuracy using the fast leave-one-out cross-validation strategy. The performance of the method is evaluated by applying it to predict the quality of life (QOL) of elderly people using health data collected in the community. The dataset involves demographics, socioeconomic status, health history, and the outcomes of health assessments of 444 community-dwelling elderly people, with 5% to 60% of data missing in some of the input features. The QOL is measured using a standard questionnaire of the World Health Organization. Results show that the proposed method outperforms four conventional methods for handling missing data (case deletion, feature deletion, mean imputation, and K-nearest neighbor imputation), with the average QOL prediction accuracy reaching 0.7418. It is potentially a promising technique for tackling missing data in community health research and other applications.

  7. The Effect of the Number of Carries on Injury Risk and Subsequent Season's Performance Among Running Backs in the National Football League.

    PubMed

    Kraeutler, Matthew J; Belk, John W; McCarty, Eric C

    2017-02-01

    In recent years, several studies have correlated pitch count with an increased risk for injury among baseball pitchers. However, no studies have attempted to draw a similar conclusion based on number of carries by running backs (RBs) in football. To determine whether there is a correlation between number of carries by RBs in the National Football League (NFL) and risk of injury or worsened performance in the subsequent season. Cohort study; Level of evidence, 3. The ESPN NFL statistics archives were searched from the 2004 through 2014 regular seasons. During each season, data were collected on RBs with 150 to 250 carries (group A) and 300+ carries (group B). The following data were collected for each player and compared between groups: number of carries and mean yards per carry during the regular season of interest and the subsequent season, number of games missed due to injury during the season of interest and the subsequent season, and the specific injuries resulting in missed playing time during the subsequent season. Matched-pair t tests were used to compare changes within each group from one season to the next in terms of number of carries, mean yards per carry, and games missed due to injury. During the seasons studied, a total of 275 RBs were included (group A, 212; group B, 63). In group A, 140 RBs (66%) missed at least 1 game the subsequent season due to injury, compared with 31 RBs (49%) in group B ( P = .016). In fact, players in group B missed significantly fewer games due to injury during the season of interest ( P < .0001) as well as the subsequent season ( P < .01). Mean yards per carry was not significantly different between groups in the preceding season ( P = .073) or the subsequent season ( P = .24). NFL RBs with a high number of carries are not placed at greater risk of injury or worsened performance during the subsequent season. These RBs may be generally less injury prone compared with other NFL RBs.

  8. Missed Diagnosis of Syrinx

    PubMed Central

    Oh, Chang Hyun; Kim, Chan Gyu; Lee, Jae-Hwan; Park, Hyeong-Chun; Park, Chong Oon

    2012-01-01

    Study Design Prospective, randomized, controlled human study. Purpose We checked the proportion of missed syrinx diagnoses among the examinees of the Korean military conscription. Overview of Literature A syrinx is a fluid-filled cavity within the spinal cord or brain stem and causes various neurological symptoms. A syrinx can easily be diagnosed by magnetic resonance imaging (MRI), yet missed diagnoses still occur. Methods In this study, we reviewed 103 cases using cervical images, cervical MRI, or whole spine sagittal MRI, and a syrinx was observed in 18 of these cases. A review of medical certificates or interviews was conducted, and the proportion of syrinx diagnoses was calculated. Results The proportion of syrinx diagnoses was about 66.7% (12 cases among 18). Missed diagnoses were related not to the length of the syrinx but to the type of image used for the initial diagnosis. Conclusions The proportion of missed syrinx diagnoses is relatively high; therefore, a more careful imaging review is recommended. PMID:22439081

  9. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review

    PubMed Central

    Mercieca-Bebber, Rebecca; Palmer, Michael J; Brundage, Michael; Stockler, Martin R; King, Madeleine T

    2016-01-01

    Objectives Patient-reported outcomes (PROs) provide important information about the impact of treatment from the patients' perspective. However, missing PRO data may compromise the interpretability and value of the findings. We aimed to report: (1) a non-technical summary of problems caused by missing PRO data; and (2) a systematic review by collating strategies to: (A) minimise rates of missing PRO data, and (B) facilitate transparent interpretation and reporting of missing PRO data in clinical research. Our systematic review does not address statistical handling of missing PRO data. Data sources MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases (inception to 31 March 2015), and citing articles and reference lists from relevant sources. Eligibility criteria English articles providing recommendations for reducing missing PRO data rates, or strategies to facilitate transparent interpretation and reporting of missing PRO data were included. Methods 2 reviewers independently screened articles against eligibility criteria. Discrepancies were resolved with the research team. Recommendations were extracted and coded according to framework synthesis. Results 117 sources (55% discussion papers, 26% original research) met the eligibility criteria. Design and methodological strategies for reducing rates of missing PRO data included: incorporating PRO-specific information into the protocol; carefully designing PRO assessment schedules and defining termination rules; minimising patient burden; appointing a PRO coordinator; PRO-specific training for staff; ensuring PRO studies are adequately resourced; and continuous quality assurance. Strategies for transparent interpretation and reporting of missing PRO data include utilising auxiliary data to inform analysis; transparently reporting baseline PRO scores, rates and reasons for missing data; and methods for handling missing PRO data. 
Conclusions The instance of missing PRO data and its potential to bias clinical research can be minimised by implementing thoughtful design, rigorous methodology and transparent reporting strategies. All members of the research team have a responsibility in implementing such strategies. PMID:27311907

  10. [Study of the relationship between congenital missing of the third molar and the development of the mandibular angle].

    PubMed

    Chen, Yan-Na; Zheng, Bo-Wen; Liu, Yi

    2017-02-01

    Based on a survey of congenital absence of third molars and of the number missing, the relationship between congenitally missing third molars and the development of the mandibular angle was evaluated. Patients were divided into an experimental group and a control group: the experimental group included 227 patients, each with at least one congenitally missing third molar; 227 patients with all four third molars present were selected as the control group. Winceph software was used to measure the lateral cephalograms, and the SPSS 17.0 software package was used to perform statistical analysis. Gonial angle, upper gonial angle and lower gonial angle differed significantly between the experimental and control groups, with significantly smaller values in the experimental group, but there was no gender difference between the two groups. Gonial angle, upper gonial angle and lower gonial angle did not vary with the number of missing third molars. There is a close relationship between congenitally missing third molars and the gonial angle, upper gonial angle and lower gonial angle, but no significant association with gender; patients with congenitally missing third molars have a shorter craniofacial structure. The number of congenitally missing third molars has no significant association with the gonial angle, upper gonial angle or lower gonial angle.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langan, Roisin T.; Archibald, Richard K.; Lamberti, Vincent

    We have applied a new imputation-based method for analyzing incomplete data, called Monte Carlo Bayesian Database Generation (MCBDG), to the Spent Fuel Isotopic Composition (SFCOMPO) database. About 60% of the entries are absent from SFCOMPO. The method estimates missing values of a property from a probability distribution created from the existing data for that property, and then generates multiple instances of the completed database for training a machine learning algorithm. Uncertainty in the data is represented by an empirical or an assumed error distribution. The method makes few assumptions about the underlying data, and compares favorably against results obtained by replacing missing information with constant values.
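The sampling step at the heart of such a scheme can be sketched briefly. This is not MCBDG itself (the actual method also models error distributions and feeds the completed databases to a learning algorithm); it only illustrates drawing each missing entry from the empirical distribution of the observed values in its column and generating several completed copies of the table. All names are illustrative:

```python
import random

def complete_table(rows, n_copies=3, rng=None):
    """Fill None entries by sampling from the observed values in the
    same column; return several independently completed copies."""
    rng = rng or random.Random(0)
    n_cols = len(rows[0])
    # Empirical distribution per column: the list of observed values.
    observed = [[r[c] for r in rows if r[c] is not None]
                for c in range(n_cols)]
    copies = []
    for _ in range(n_copies):
        copies.append([[v if v is not None else rng.choice(observed[c])
                        for c, v in enumerate(row)] for row in rows])
    return copies

table = [[1.0, None], [2.0, 5.0], [None, 7.0]]
copies = complete_table(table)
```

Training a model on each completed copy and pooling the results then propagates the imputation uncertainty, which is the point of generating multiple instances.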

  12. Melancholic depression prediction by identifying representative features in metabolic and microarray profiles with missing values.

    PubMed

    Nie, Zhi; Yang, Tao; Liu, Yashu; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2015-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. 
In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed method give rise to significantly improved sensitivity scores, suggesting that the learned features allow prediction with high accuracy of disease status in those who are diagnosed with melancholic depression. To the best of our knowledge, this is the first work that applies sparse coding to deal with high feature correlations and missing values, which are common challenges in many biomedical applications. The proposed method can be readily adapted to other biomedical applications involving incomplete and high-dimensional data.
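The paper's adapted sparse coding is considerably richer (a dictionary with sparsity constraints); as a toy analogue of its core idea, learning a low-dimensional representative feature from only the observed entries, one can fit a single factor by masked alternating least squares, which then also predicts the missing entries:

```python
def masked_rank1(X, mask, iters=200):
    """Fit X[i][j] ~ u[i] * v[j] using only entries where mask[i][j] is
    True (alternating least squares restricted to observed entries)."""
    n, m = len(X), len(X[0])
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        for i in range(n):
            num = sum(X[i][j] * v[j] for j in range(m) if mask[i][j])
            den = sum(v[j] ** 2 for j in range(m) if mask[i][j])
            if den:
                u[i] = num / den
        for j in range(m):
            num = sum(X[i][j] * u[i] for i in range(n) if mask[i][j])
            den = sum(u[i] ** 2 for i in range(n) if mask[i][j])
            if den:
                v[j] = num / den
    return u, v
```

Because correlated variables share the learned factor, an unobserved entry is predicted from the rows and columns that were observed, which is the sense in which correlation can "benefit" the missing-value problem.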

  13. A RESEARCH DATABASE FOR IMPROVED DATA MANAGEMENT AND ANALYSIS IN LONGITUDINAL STUDIES

    PubMed Central

    BIELEFELD, ROGER A.; YAMASHITA, TOYOKO S.; KEREKES, EDWARD F.; ERCANLI, EHAT; SINGER, LYNN T.

    2014-01-01

    We developed a research database for a five-year prospective investigation of the medical, social, and developmental correlates of chronic lung disease during the first three years of life. We used the Ingres database management system and the Statit statistical software package. The database includes records containing 1300 variables each, the results of 35 psychological tests, each repeated five times (providing longitudinal data on the child, the parents, and behavioral interactions), both raw and calculated variables, and both missing and deferred values. The four-layer menu-driven user interface incorporates automatic activation of complex functions to handle data verification, missing and deferred values, static and dynamic backup, determination of calculated values, display of database status, reports, bulk data extraction, and statistical analysis. PMID:7596250

  14. A novel complete-case analysis to determine statistical significance between treatments in an intention-to-treat population of randomized clinical trials involving missing data.

    PubMed

    Liu, Wei; Ding, Jinhui

    2018-04-01

    The application of the principle of the intention-to-treat (ITT) to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missing mechanisms for all clinical trials because there are complicated reactions in the human body to drugs due to the presence of complex biological networks, leading to data missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.
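The GCC analysis itself is more involved, but its starting point, comparing arms using only the completers, can be illustrated with a generic complete-case permutation test. This is a stand-in, not the authors' method; names and data are illustrative, and None marks a non-completer:

```python
import random
from statistics import mean

def complete_case_pvalue(arm_a, arm_b, n_perm=2000, rng=None):
    """Two-sided permutation test on completers only: drop None
    (non-completers), then compare the observed difference in means
    against its permutation distribution."""
    rng = rng or random.Random(42)
    a = [x for x in arm_a if x is not None]
    b = [x for x in arm_b if x is not None]
    observed = abs(mean(a) - mean(b))
    pooled, n_a = a + b, len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The paper's contribution is the argument for when such a completers-only significance carries over to the ITT population; the test above only supplies the complete-case comparison.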

  15. Point of care experience with pneumococcal and influenza vaccine documentation among persons aged ≥65 years: high refusal rates and missing information.

    PubMed

    Brownfield, Elisha; Marsden, Justin E; Iverson, Patty J; Zhao, Yumin; Mauldin, Patrick D; Moran, William P

    2012-09-01

    Missed opportunities to vaccinate and refusal of vaccine by patients have hindered the achievement of national health care goals. The meaningful use of electronic medical records should improve vaccination rates, but few studies have examined the content of these records. In our vaccine intervention program using an electronic record with physician prompts, paper prompts, and nursing standing orders, we were unable to achieve national vaccine goals, due in large part to missing information and patient refusal. Copyright © 2012 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.

  16. Values Acquisition and Moral Development: An Integration of Freudian, Eriksonian, Kohlbergian and Gilliganian Viewpoints

    ERIC Educational Resources Information Center

    Herman, William E.

    2005-01-01

    Consider the following important questions: Should values be transmitted or developed? As children grow up, what, if anything, should change in values acquisition? How important are locus of control issues in moral development? and Why might process versus product elements be crucial in the development of values? One key element missing in the…

  17. Estimating Missing Unit Process Data in Life Cycle Assessment Using a Similarity-Based Approach.

    PubMed

    Hou, Ping; Cai, Jiarui; Qu, Shen; Xu, Ming

    2018-05-01

    In life cycle assessment (LCA), collecting unit process data from empirical sources (e.g., meter readings, operation logs/journals) is often costly and time-consuming. We propose a new computational approach that estimates missing unit process data solely from limited known data, using a similarity-based link prediction method. The intuition is that similar processes in a unit process network tend to have similar material/energy inputs and waste/emission outputs. We use the ecoinvent 3.1 unit process data sets to test our method in four steps: (1) dividing the data sets into a training set and a test set; (2) randomly removing certain numbers of data points in the test set, indicated as missing; (3) using similarity-weighted means of various numbers of the most similar processes in the training set to estimate the missing data in the test set; and (4) comparing the estimated data with the original values to determine the performance of the estimation. The results show that missing data can be accurately estimated when less than 5% of the data are missing in one process. The estimation performance decreases as the percentage of missing data increases. This study provides a new approach to compiling unit process data and demonstrates the promising potential of computational approaches for LCA data compilation.
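Step (3), the similarity-weighted mean over the most similar processes, can be sketched directly. The snippet below is a simplified illustration using cosine similarity over jointly observed coordinates; the paper's actual link-prediction similarity measure may differ, and all names are illustrative:

```python
from math import sqrt

def cosine(p, q):
    """Cosine similarity over coordinates observed in both vectors."""
    idx = [i for i in range(len(p))
           if p[i] is not None and q[i] is not None]
    dot = sum(p[i] * q[i] for i in idx)
    na = sqrt(sum(p[i] ** 2 for i in idx))
    nb = sqrt(sum(q[i] ** 2 for i in idx))
    return dot / (na * nb) if na and nb else 0.0

def estimate_missing(target, known, j, k=2):
    """Estimate target[j] as the similarity-weighted mean of the k most
    similar processes for which coordinate j is observed."""
    scored = sorted(((cosine(target, p), p)
                     for p in known if p[j] is not None),
                    key=lambda t: t[0], reverse=True)[:k]
    wsum = sum(s for s, _ in scored)
    return sum(s * p[j] for s, p in scored) / wsum if wsum else None
```

A process whose observed inputs resemble the target's thus contributes more to the estimate, matching the stated intuition.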

  18. [Missed lessons, missed opportunities: a role for public health services in medical absenteeism in young people].

    PubMed

    Vanneste, Y T M; van de Goor, L A M; Feron, F J M

    2016-01-01

    Young people who often miss school for health reasons are not only missing education, but also the daily routine of school, and social intercourse with their classmates. Medical absenteeism among students merits greater attention. For a number of years, in various regions in the Netherlands, students with extensive medical absenteeism have been invited to see a youth healthcare specialist. The MASS intervention (Medical Advice of Students reported Sick; in Dutch: Medische Advisering van de Ziekgemelde Leerling, abbreviated as M@ZL) has been developed by the West Brabant Regional Public Health Service together with secondary schools to address school absenteeism due to reporting sick. In this paper we discuss the MASS intervention and explain why attention should be paid by public health services to the problem of school absenteeism, especially absenteeism on health grounds.

  19. Toward a Better Understanding of Psychological Symptoms in People Confronted With the Disappearance of a Loved One: A Systematic Review.

    PubMed

    Lenferink, Lonneke I M; de Keijser, Jos; Wessel, Ineke; de Vries, Doety; Boelen, Paul A

    2017-01-01

    The disappearance of a loved one is claimed to be the most stressful type of loss. The present review explores the empirical evidence relating to this claim. Specifically, it summarizes studies exploring the prevalence and correlates of psychological symptoms in relatives of missing persons, as well as studies comparing levels of psychopathology in relatives of the disappeared and the deceased. Two independent reviewers performed a systematic search in PsycINFO, Web of Science, and Medline, which resulted in 15 studies meeting predefined inclusion criteria. Eligible studies included quantitative peer-reviewed articles and dissertations that assessed psychopathology in relatives of missing persons. All reviewed studies focused on disappearances due to war or state terrorism. Prevalence rates of psychopathology were mainly described in terms of post-traumatic stress disorder and depression and varied considerably among the studies. The number of experienced traumatic events and kinship to the missing person were identified as correlates of psychopathology. Comparative studies showed that psychopathology levels did not differ between relatives of missing and deceased persons. The small number of studies and their heterogeneity limit the understanding of psychopathology in those left behind. More knowledge about psychopathology postdisappearance could be gained by expanding the focus of research beyond disappearances due to war or state terrorism.

  20. Optimal and fast rotational alignment of volumes with missing data in Fourier space.

    PubMed

    Shatsky, Maxim; Arbelaez, Pablo; Glaeser, Robert M; Brenner, Steven E

    2013-11-01

    Electron tomography of intact cells has the potential to reveal the entire cellular content at a resolution corresponding to individual macromolecular complexes. Characterization of macromolecular complexes in tomograms is nevertheless an extremely challenging task due to the high level of noise, and due to the limited tilt angle that results in missing data in Fourier space. By identifying particles of the same type and averaging their 3D volumes, it is possible to obtain a structure at a more useful resolution for biological interpretation. Currently, classification and averaging of sub-tomograms is limited by the speed of computational methods that optimize alignment between two sub-tomographic volumes. The alignment optimization is hampered by the fact that the missing data in Fourier space has to be taken into account during the rotational search. A similar problem appears in single particle electron microscopy where the random conical tilt procedure may require averaging of volumes with a missing cone in Fourier space. We present a fast implementation of a method guaranteed to find an optimal rotational alignment that maximizes the constrained cross-correlation function (cCCF) computed over the actual overlap of data in Fourier space. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
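The full method performs a 3D rotational search in Fourier space; as a much-simplified 1D analogue, the idea of maximizing a correlation computed only over the actual overlap of observed data can be sketched as a circular-shift search. All names are illustrative, and masks stand in for the missing wedge:

```python
from math import sqrt

def best_shift(a, b, mask_a, mask_b):
    """1D analogue of constrained alignment: find the circular shift of
    b that maximizes normalized correlation computed only where both
    signals are observed (the overlap of the two masks)."""
    n = len(a)
    best, best_s = float("-inf"), 0
    for s in range(n):
        idx = [i for i in range(n) if mask_a[i] and mask_b[(i - s) % n]]
        if len(idx) < 2:
            continue  # not enough overlap to correlate
        x = [a[i] for i in idx]
        y = [b[(i - s) % n] for i in idx]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        den = sqrt(sum((xi - mx) ** 2 for xi in x)
                   * sum((yi - my) ** 2 for yi in y))
        if den and num / den > best:
            best, best_s = num / den, s
    return best_s
```

Normalizing over the overlap only is the key point: unobserved regions neither reward nor penalize a candidate alignment.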

  1. Cyclists' Anger As Determinant of Near Misses Involving Different Road Users.

    PubMed

    Marín Puchades, Víctor; Prati, Gabriele; Rondinella, Gianni; De Angelis, Marco; Fassina, Filippo; Fraboni, Federico; Pietrantoni, Luca

    2017-01-01

    Road anger constitutes one of the determinant factors related to safety outcomes (e.g., accidents, near misses). Although cyclists are considered vulnerable road users due to their relatively high rate of fatalities in traffic, previous research has focused solely on car drivers, and no study has yet investigated the effect of anger on cyclists' safety outcomes. The present research aims to investigate, for the first time, the effects of cycling anger toward different types of road users on near misses involving those road users and on near misses in general. Using a daily diary web-based questionnaire, we collected data on daily trips, bicycle use, near misses experienced, cyclist's anger and demographic information from 254 Spanish cyclists. Poisson regression was used to assess the association of cycling anger with near misses, which is a count variable. No relationship was found between general cycling anger and the occurrence of near misses. Anger toward specific road users had different effects on the probability of near misses with different road users. Anger toward interactions with car drivers increased the probability of near misses involving cyclists and pedestrians. Anger toward interactions with pedestrians was associated with a higher probability of near misses with pedestrians. Anger toward cyclists exerted no effect on the probability of near misses with any road user (i.e., car drivers, cyclists or pedestrians), whereas anger toward interactions with the police had a diminishing effect on the occurrence of near misses involving all types of road users. The present study demonstrated that the effect of road anger on safety outcomes among cyclists is different from that of motorists. Moreover, the target of anger played an important role in safety, both for the cyclist and for the specific road users.
Possible explanations for these differences are based on the difference in status and power with motorists, as well as on the potential displaced aggression produced by the fear of retaliation by motorized vehicle users.

  2. An interference account of the missing-VP effect

    PubMed Central

    Häussler, Jana; Bader, Markus

    2015-01-01

    Sentences with doubly center-embedded relative clauses in which a verb phrase (VP) is missing are sometimes perceived as grammatical, thus giving rise to an illusion of grammaticality. In this paper, we provide a new account of why missing-VP sentences, which are both complex and ungrammatical, lead to an illusion of grammaticality, the so-called missing-VP effect. We propose that the missing-VP effect in particular, and processing difficulties with multiply center-embedded clauses more generally, are best understood as resulting from interference during cue-based retrieval. When processing a sentence with double center-embedding, a retrieval error due to interference can cause the verb of an embedded clause to be erroneously attached into a higher clause. This can lead to an illusion of grammaticality in the case of missing-VP sentences and to processing complexity in the case of complete sentences with double center-embedding. Evidence for an interference account of the missing-VP effect comes from experiments that have investigated the missing-VP effect in German using a speeded grammaticality judgment procedure. We review this evidence and then present two new experiments showing that the missing-VP effect can also be found in German with less restrictive procedures. One experiment was a questionnaire study which required grammaticality judgments from participants without imposing any time constraints. The second experiment used a self-paced reading procedure and did not require any judgments. Both experiments confirm the prior findings of missing-VP effects in German and also show that the missing-VP effect is subject to a primacy effect as known from the memory literature. Based on this evidence, we argue that an account of missing-VP effects in terms of interference during cue-based retrieval is superior to accounts in terms of limited memory resources or in terms of experience with embedded structures. PMID:26136698

  3. Potential adjustment methodology for missing data and reporting delay in the HIV Surveillance System, European Union/European Economic Area, 2015.

    PubMed

    Rosinska, Magdalena; Pantazis, Nikos; Janiec, Janusz; Pharris, Anastasia; Amato-Gauci, Andrew J; Quinten, Chantal; Ecdc Hiv/Aids Surveillance Network

    2018-06-01

    Accurate case-based surveillance data remain the key data source for estimating HIV burden and monitoring prevention efforts in Europe. We carried out a literature review and exploratory analysis of surveillance data regarding two crucial issues affecting European surveillance for HIV: missing data and reporting delay. Initial screening showed substantial variability in these data issues, both over time and across countries. In terms of missing data, the CD4+ cell count is the most problematic variable because of the high proportion of missing values. In 20 of 31 countries of the European Union/European Economic Area (EU/EEA), CD4+ counts are systematically missing for all or some years. One of the key challenges related to reporting delays is that countries undertake specific one-off actions in an effort to capture previously unreported cases, and that these cases are subsequently reported with excessive delays. Slightly different underlying assumptions and effectively different models may be required for individual countries to adjust for missing data and reporting delays. However, using a similar methodology is recommended to foster harmonisation and to improve the accuracy and usability of HIV surveillance data at national and EU/EEA levels.

  4. Comparing multiple imputation methods for systematically missing subject-level data.

    PubMed

    Kline, David; Andridge, Rebecca; Kaizar, Eloise

    2017-06-01

    When conducting research synthesis, the studies to be combined often do not measure the same set of variables, which creates missing data. When the studies to combine are longitudinal, missing data can occur on the observation level (time-varying) or the subject level (non-time-varying). Traditionally, the focus of missing data methods for longitudinal data has been on missing observation-level variables. In this paper, we focus on missing subject-level variables and compare two multiple imputation approaches: a joint modeling approach and a sequential conditional modeling approach. We find the joint modeling approach to be preferable to the sequential conditional approach, except when the covariance structure of the repeated outcome for each individual has homogeneous variance and exchangeable correlation. Specifically, the regression coefficient estimates from an analysis incorporating imputed values based on the sequential conditional method are attenuated and less efficient than those from the joint method. Remarkably, the estimates from the sequential conditional method are often less efficient than a complete case analysis, which, in the context of research synthesis, implies that we lose efficiency by combining studies. Copyright © 2015 John Wiley & Sons, Ltd.

  5. Methods for estimating missing human skeletal element osteometric dimensions employed in the revised fully technique for estimating stature.

    PubMed

    Auerbach, Benjamin M

    2011-05-01

    One of the greatest limitations to the application of the revised Fully anatomical stature estimation method is the inability to measure some of the skeletal elements required in its calculation. These element dimensions cannot be obtained due to taphonomic factors, incomplete excavation, or disease processes, resulting in missing data. This study examines methods of imputing these missing dimensions using observable Fully measurements from the skeleton, and the accuracy of incorporating these estimated elements into anatomical stature reconstruction. These are further assessed against stature estimations obtained from mathematical regression formulae for the lower limb bones (femur and tibia). Two thousand seven hundred and seventeen North and South American indigenous skeletons were measured, and subsets of these with observable Fully dimensions were used to simulate missing elements and create estimation methods and equations. Comparisons were made directly between anatomically reconstructed statures and mathematically derived statures, as well as with anatomically derived statures incorporating imputed missing dimensions. These analyses demonstrate that, while mathematical stature estimations are more accurate, anatomical statures incorporating missing dimensions are not appreciably less accurate and are more precise. The anatomical stature estimation method using imputed missing dimensions is therefore supported. Missing element estimation, however, is limited to the vertebral column (only when lumbar vertebrae are present) and to talocalcaneal height (only when femora and tibiae are present). Crania, entire vertebral columns, and femoral or tibial lengths cannot be reliably estimated. The applicability of these methods is discussed further. Copyright © 2011 Wiley-Liss, Inc.
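In its simplest form, imputing an unmeasurable dimension from measurable ones reduces to regression on complete cases. The sketch below uses hypothetical femur and tibia lengths; the study's actual estimation equations are derived from its own skeletal samples:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x from complete cases."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical complete cases: femur vs. tibia length (mm).
femur = [430.0, 450.0, 470.0, 490.0]
tibia = [350.0, 365.0, 380.0, 395.0]
a, b = fit_line(femur, tibia)
estimate = a + b * 460.0   # impute tibia for a skeleton missing it
```

The imputed dimension can then be fed into the anatomical stature sum in place of the unmeasurable element, which is the strategy the study evaluates.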

  6. Clustering and variable selection in the presence of mixed variable types and missing data.

    PubMed

    Storlie, C B; Myers, S M; Katusic, S K; Weaver, A L; Voigt, R G; Croarkin, P E; Stoeckel, R E; Port, J D

    2018-05-17

    We consider the problem of model-based clustering in the presence of many correlated, mixed continuous, and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines. Copyright © 2018 John Wiley & Sons, Ltd.

  7. Variable selection under multiple imputation using the bootstrap in a prognostic study

    PubMed Central

    Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW

    2007-01-01

    Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection.
    Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables, between 0% and 48.1% of the data were missing. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels.
    Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined over the range of 0% (full model) to 90% variable selection, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found.
    Conclusion We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values. PMID:17629912
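The inclusion-frequency idea can be sketched with a plain bootstrap. The snippet below is a simplified stand-in: it "selects" a variable in each resample by a bare correlation threshold, whereas the study selects variables within multivariable models fitted to multiply imputed data. All names and the threshold are illustrative:

```python
import random
from math import sqrt

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)
               * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def inclusion_frequencies(X, y, n_boot=200, cutoff=0.3, rng=None):
    """Per-variable proportion of bootstrap resamples in which the
    variable is selected (here by |correlation| > cutoff)."""
    rng = rng or random.Random(1)
    n, p = len(X), len(X[0])
    freq = [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        yb = [y[i] for i in idx]
        for j in range(p):
            if abs(corr([X[i][j] for i in idx], yb)) > cutoff:
                freq[j] += 1.0 / n_boot
    return freq
```

Variables with high inclusion frequency across resamples (and, in the paper, across imputations) are retained; this stabilizes selection against both sampling and imputation variation.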

  8. Cartilage analysis by reflection spectroscopy

    NASA Astrophysics Data System (ADS)

    Laun, T.; Muenzer, M.; Wenzel, U.; Princz, S.; Hessling, M.

    2015-07-01

    A cartilage bioreactor with analytical functions for cartilage quality monitoring is being developed. For determining cartilage composition, reflection spectroscopy in the visible (VIS) and near infrared (NIR) spectral regions is evaluated. The main goal is the determination of the most abundant cartilage compounds: water, collagen I and collagen II. Therefore, VIS and NIR reflection spectra of different cartilage samples from cow, pig and lamb are recorded. Because analytical instrumentation for identifying the cartilage composition of these samples was not available, typical literature concentration values are used for the development of chemometric models. In spite of these limitations, the chemometric models provide good cross-correlation results for the prediction of collagen I, collagen II and water concentrations based on the visible and NIR reflection spectra.

  9. Empirical orthogonal function analysis of cloud-containing coastal zone color scanner images of northeastern North American coastal waters

    NASA Technical Reports Server (NTRS)

    Eslinger, David L.; O'Brien, James J.; Iverson, Richard L.

    1989-01-01

    Empirical-orthogonal-function (EOF) analyses were carried out on 36 images of the Mid-Atlantic Bight and the Gulf of Maine, obtained by the CZCS aboard Nimbus 7 for the time period from February 28 through July 9, 1979, with the purpose of determining pigment concentrations in coastal waters. The EOF procedure was modified so as to include images with significant portions of data missing due to cloud obstruction, making it possible to estimate pigment values in areas beneath clouds. The results of image analyses explained observed variances in pigment concentrations and showed a south-to-north pattern corresponding to an April Mid-Atlantic Bight bloom and a June bloom over Nantucket Shoals and Platts Bank.
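The gap-handling idea, estimating pigment values beneath clouds within an EOF analysis, is closely related to iterative EOF-based gap filling. The sketch below is a toy analogue rather than the authors' exact modification: cloud-covered entries start at their column means, the leading EOF is fitted by power iteration, and the gaps are re-filled from the rank-1 reconstruction:

```python
def leading_eof(X, iters=100):
    """Leading EOF (first right singular vector) via power iteration
    on X^T X; returns scores u and unit pattern v (approx = u_i * v_j)."""
    n, m = len(X), len(X[0])
    v = [1.0] * m
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
    return u, v

def eof_fill(X, mask, rounds=20):
    """Fill masked (e.g. cloud-covered) entries: start from column
    means of observed data, then alternate EOF fit and re-fill."""
    n, m = len(X), len(X[0])
    Y = [row[:] for row in X]
    for j in range(m):
        obs = [X[i][j] for i in range(n) if mask[i][j]]
        mu = sum(obs) / len(obs)
        for i in range(n):
            if not mask[i][j]:
                Y[i][j] = mu
    for _ in range(rounds):
        u, v = leading_eof(Y)
        for i in range(n):
            for j in range(m):
                if not mask[i][j]:
                    Y[i][j] = u[i] * v[j]  # rank-1 estimate under cloud
    return Y
```

Observed pixels are never altered; only the cloud-covered entries are pulled toward the dominant spatial pattern, which is how the EOF modes can supply estimates beneath clouds.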

  10. Left ventricular ejection fraction may not be useful as an end point of thrombolytic therapy comparative trials.

    PubMed

    Califf, R M; Harrelson-Woodlief, L; Topol, E J

    1990-11-01

    In the era of comparative and adjunctive trials of reperfusion therapy, the need to develop alternative end points to mortality reduction is clear. Left ventricular ejection fraction, which has been commonly used as a surrogate, is problematic due to missing values, technically inadequate studies, and lack of correlation with mortality results in the controlled reperfusion trials performed to date. In this paper, we present a composite clinical end point that includes, in order of severity of adverse outcome: death, hemorrhagic stroke, nonhemorrhagic stroke, poor ejection fraction (less than 30%), reinfarction, heart failure, and pulmonary edema. Such a composite index may be useful for detecting true therapeutic benefit in reperfusion trials without requiring enrollment of more than 20,000-30,000 patients.

  11. Update of the trauma risk adjustment model of the TraumaRegister DGU™: the Revised Injury Severity Classification, version II.

    PubMed

    Lefering, Rolf; Huber-Wagner, Stefan; Nienaber, Ulrike; Maegele, Marc; Bouillon, Bertil

    2014-09-05

    The TraumaRegister DGU™ (TR-DGU) has used the Revised Injury Severity Classification (RISC) score for outcome adjustment since 2003. In recent years, however, the observed mortality rate has fallen to about 2% below the prognosis, and it was felt that further prognostic factors, like pupil size and reaction, should be included as well. Finally, an increasing number of cases did not receive a RISC prognosis due to the missing values. Therefore, there was a need for an updated model for risk of death prediction in severely injured patients to be developed and validated using the most recent data. The TR-DGU has been collecting data from severely injured patients since 1993. All injuries are coded according to the Abbreviated Injury Scale (AIS, version 2008). Severely injured patients from Europe (ISS ≥ 4) documented between 2010 and 2011 were selected for developing the new score (n = 30,866), and 21,918 patients from 2012 were used for validation. Age and injury codes were required, and transferred patients were excluded. Logistic regression analysis was applied with hospital mortality as the dependent variable. Results were evaluated in terms of discrimination (area under the receiver operating characteristic curve, AUC), precision (observed versus predicted mortality), and calibration (Hosmer-Lemeshow goodness-of-fit statistic). The mean age of the development population was 47.3 years; 71.6% were males, and the average ISS was 19.3 points. Hospital mortality rate was 11.5% in this group. The new RISC II model consists of the following predictors: worst and second-worst injury (AIS severity level), head injury, age, sex, pupil reactivity and size, pre-injury health status, blood pressure, acidosis (base deficit), coagulation, haemoglobin, and cardiopulmonary resuscitation. Missing values are included as a separate category for every variable. 
In both the development and the validation dataset, the new RISC II outperformed the original RISC score; for example, the AUC in the development dataset was 0.953 versus 0.939. The updated RISC II prognostic score has several advantages over the previous RISC model. Discrimination, precision and calibration are improved, and patients with partially missing values can now be included. Results were confirmed in a validation dataset.
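A hedged sketch of the missing-as-category device the abstract describes: every predictor gets an explicit "missing" level, so incomplete cases still receive a prognosis instead of being dropped. The variable and level names below are illustrative, not taken from the RISC II specification.

```python
# Minimal sketch: one-hot encode a predictor with an explicit "missing" level,
# as RISC II does conceptually. Variable names are illustrative assumptions.
def encode_with_missing(value, levels):
    """Return a one-hot vector over `levels` plus a trailing 'missing' slot."""
    vec = [0] * (len(levels) + 1)
    if value is None:            # missing becomes its own category
        vec[-1] = 1
    else:
        vec[levels.index(value)] = 1
    return vec

# Example: pupil reactivity coded as normal / sluggish / fixed, or missing.
levels = ["normal", "sluggish", "fixed"]
row = encode_with_missing("sluggish", levels)    # [0, 1, 0, 0]
row_missing = encode_with_missing(None, levels)  # [0, 0, 0, 1]
```

The encoded vectors can then feed a standard logistic regression, so the "missing" indicator receives its own coefficient rather than forcing imputation or case deletion.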

  12. Hide and Seek: Values in Early Childhood Education and Care

    ERIC Educational Resources Information Center

    Powell, Sacha

    2010-01-01

    Early childhood education and care settings in England and the people who work in them constitute an important sphere of influence, shaping young children's characters and values. But the values and dispositions expected of the early years workforce are missing from statutory policy documentation despite its clear requirement that practitioners…

  13. Early pregnancy factor as a marker for assessing embryonic viability in threatened and missed abortions.

    PubMed

    Shahani, S K; Moniz, C L; Bordekar, A D; Gupta, S M; Naik, K

    1994-01-01

    It is now well recognized that the presence of early pregnancy factor (EPF) can signify the occurrence of fertilization, continuation of pregnancy and the existence of a viable embryo. With this in view, a study was undertaken to observe the potential of EPF as a marker in assessing embryo viability in cases complicated with vaginal bleeding during early pregnancy. The results indicated that the sensitivity of EPF as a marker in predicting threatened or missed abortion was 78.9% and the specificity 95.6%. The positive predictive value was observed to be 93.8% and the negative predictive value 84.6%. Our studies have shown that since EPF is present in viable but absent in non-viable pregnancies, it could be a useful marker of prognostic value in threatened abortions.

  14. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data*

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2016-01-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data. PMID:27777471
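The estimators in this paper are built from covariances computed directly on incomplete data under the MCAR assumption. As a minimal illustration (not the authors' exact construction), a pairwise-complete covariance uses, for each pair of coordinates, only the rows where both entries are observed:

```python
import numpy as np

def pairwise_complete_cov(X):
    """Covariance estimate using, for each (j, k), only rows where both
    entries are observed. Valid under MCAR, where missingness is
    independent of the data values; NaN marks a missing entry."""
    n, p = X.shape
    S = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            obs = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            xj, xk = X[obs, j], X[obs, k]
            S[j, k] = np.mean((xj - xj.mean()) * (xk - xk.mean()))
    return S
```

Bandable or sparse covariance estimators of the kind studied here would then taper or threshold such a raw estimate to achieve the minimax rates.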

  15. Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-09-01

    Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated. Minimax rates of convergence are established under the spectral norm loss and the proposed estimators are shown to be rate-optimal under mild regularity conditions. Simulation studies demonstrate that the estimators perform well numerically. The methods are also illustrated through an application to data from four ovarian cancer studies. The key technical tools developed in this paper are of independent interest and potentially useful for a range of related problems in high-dimensional statistical inference with missing data.

  16. Using ecological propensity score to adjust for missing confounders in small area studies.

    PubMed

    Wang, Yingbo; Pirani, Monica; Hansell, Anna L; Richardson, Sylvia; Blangiardo, Marta

    2017-11-09

    Small area ecological studies are commonly used in epidemiology to assess the impact of area level risk factors on health outcomes when data are only available in an aggregated form. However, the resulting estimates are often biased due to unmeasured confounders, which typically are not available from the standard administrative registries used for these studies. Extra information on confounders can be provided through external data sets such as surveys or cohorts, where the data are available at the individual level rather than at the area level; however, such data typically lack the geographical coverage of administrative registries. We develop a framework of analysis which combines ecological and individual level data from different sources to provide an adjusted estimate of area level risk factors which is less biased. Our method (i) summarizes all available individual level confounders into an area level scalar variable, which we call ecological propensity score (EPS), (ii) implements a hierarchical structured approach to impute the values of EPS whenever they are missing, and (iii) includes the estimated and imputed EPS into the ecological regression linking the risk factors to the health outcome. Through a simulation study, we show that integrating individual level data into small area analyses via EPS is a promising method to reduce the bias intrinsic in ecological studies due to unmeasured confounders; we also apply the method to a real case study to evaluate the effect of air pollution on coronary heart disease hospital admissions in Greater London. © The Author 2017. Published by Oxford University Press.

  17. Numerical Study on the Behaviour of Reduced Beam Section Presence in Rectangular Concrete Filled Tubes Connection

    NASA Astrophysics Data System (ADS)

    Amalia, A. R.; Suswanto, B.; Kristijanto, H.; Irawan, D.

    2018-01-01

This paper discusses the behaviour of two types of RCFT column connections with steel beams under cyclic loads, using the finite-element software ABAQUS 6.14. The comparison involves modelling RCFT connections as rigid connections that do not allow any deformation or rotation in the joint. Two models are compared: an RCFT connection to an ordinary beam without a reduced beam section (BB) and one to a beam with a reduced beam section (BRBS). The behaviours discussed in this study are the stress values, von Mises stress patterns, and rotation angles of each model. From the von Mises stress patterns, it was found that for the connection without RBS (BB) the highest stress regions occur in the beam flange near the column face. For earthquake-resistant buildings this behaviour needs to be avoided, because sudden collapse often happens at that joint. For the connection with RBS (BRBS), by contrast, the highest stress regions occur in the reduced section of the beam, meaning that failure occurs where planned. The ultimate force that can be resisted by the BB model (402 kN) is higher than that of the BRBS model (257.18 kN) because of the reduced flange area. The BRBS model has a higher rotation angle (0.057 rad) than the BB model (0.045 rad). The analysis also showed that the cyclic performance of the moment connection with RBS (BRBS) was more ductile than that of the connection with an ordinary beam (BB).

  18. Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie

PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites which monitor the performance of Internet links in around 170 countries of the world. At present, the size of the compressed PingER data set is about 60 GB, comprising 100,000 flat files. The data is publicly available for valuable Internet performance analyses. However, the data sets suffer from missing values and anomalies due to congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) using the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Lastly, anomalies are further analyzed to identify the time frame and location of the hosts generating the major percentage of the anomalies in the PingER data set, which spans 1998 to 2016.
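The imputation step of the proposed pipeline uses k-NN. A minimal numpy sketch of that idea (a toy version, not the paper's implementation): fill each incomplete row from its k nearest complete rows, measuring distance over the row's observed dimensions only.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each row with the mean of the k nearest complete rows,
    where distance is computed over the row's observed dimensions only."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        d = np.sqrt(((complete[:, ~miss] - X[i, ~miss]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]
        X[i, miss] = nearest[:, miss].mean(axis=0)
    return X
```

The completed matrix can then be handed to k-means and a CBLOF-style scorer; this sketch only covers the imputation stage.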

  19. Classification models for identification of at-risk groups for incident memory complaints.

    PubMed

    van den Kommer, Tessa N; Comijs, Hannie C; Rijs, Kelly J; Heymans, Martijn W; van Boxtel, Martin P J; Deeg, Dorly J H

    2014-02-01

Memory complaints in older adults may be a precursor of measurable cognitive decline. Causes for these complaints may vary across age groups. The goal of this study was to develop classification models for the early identification of persons at risk for memory complaints using a broad range of characteristics. Two age groups were studied, 55-65 years old (N = 1,416) and 65-75 years old (N = 471), using data from the Longitudinal Aging Study Amsterdam. Participants reporting memory complaints at baseline were excluded. Data on predictors of memory complaints were collected at baseline and analyzed using logistic regression analyses. Multiple imputation was applied to handle the missing data; missing data due to mortality were not imputed. In persons aged 55-65 years, 14.4% reported memory complaints after three years of follow-up. Persons using medication, who were former smokers and had insufficient/poor hearing, were at the highest risk of developing memory complaints, i.e., a predictive value of 33.3%. In persons 65-75 years old, the incidence of memory complaints was 22.5%. Persons with a low sense of mastery, who reported having pain, were at the highest risk of memory complaints resulting in a final predictive value of 56.9%. In the subsample of persons without a low sense of mastery who (almost) never visited organizations and had a low level of memory performance, 46.8% reported memory complaints at follow-up. The classification models led to the identification of specific target groups at risk for memory complaints. Suggestions for person-tailored interventions may be based on these risk profiles.

  20. Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

    DOE PAGES

    Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie; ...

    2018-05-28

PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites which monitor the performance of Internet links in around 170 countries of the world. At present, the size of the compressed PingER data set is about 60 GB, comprising 100,000 flat files. The data is publicly available for valuable Internet performance analyses. However, the data sets suffer from missing values and anomalies due to congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) using the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Lastly, anomalies are further analyzed to identify the time frame and location of the hosts generating the major percentage of the anomalies in the PingER data set, which spans 1998 to 2016.

  1. Persons with dementia missing in the community: is it wandering or something unique?

    PubMed

    Rowe, Meredeth A; Vandeveer, Sydney S; Greenblum, Catherine A; List, Cassandra N; Fernandez, Rachael M; Mixson, Natalie E; Ahn, Hyo C

    2011-06-05

At some point in the disease process many persons with dementia (PWD) will have a missing incident and be unable to safely return to their care setting. In previous research studies, researchers have begun to question whether this phenomenon should continue to be called wandering, since the antecedents and characteristics of a missing incident are dissimilar to accepted definitions of wandering in dementia. The purpose of this study was to confirm previous findings regarding the antecedents and characteristics of missing incidents, understand the differences between those found dead and alive, and compare the characteristics of a missing incident to those of wandering. A retrospective design was used to analyse 325 newspaper reports of PWD missing in the community. The primary antecedent to a missing incident, particularly in community-dwelling PWD, was becoming lost while conducting a normal and permitted activity alone in the community. The other common antecedent was a lapse in supervision, during which the PWD was expected to remain in a safe location but did not. Deaths most commonly occurred in unpopulated areas due to exposure and drowning. Those who died were found closer to the place last seen and took longer to find, but there were no significant differences in gender or age. The key characteristics of a missing incident were that it was unpredictable, non-repetitive, and temporally appropriate but spatially disordered, and that it occurred while using any of multiple means of movement (walking, car, public transportation). Missing incidents occurred without the discernible patterns present in wandering, such as lapping or pacing, which are repetitive and temporally disordered. This research supports the mounting evidence that the concept of wandering, in its formal sense, and missing incidents are two distinct concepts. It will be important to further develop the concept of missing incidents by identifying the differences and similarities from wandering. 
This will allow a more targeted assessment and intervention strategy for each problem.

  2. Persons with dementia missing in the community: Is it wandering or something unique?

    PubMed Central

    2011-01-01

Background At some point in the disease process many persons with dementia (PWD) will have a missing incident and be unable to safely return to their care setting. In previous research studies, researchers have begun to question whether this phenomenon should continue to be called wandering, since the antecedents and characteristics of a missing incident are dissimilar to accepted definitions of wandering in dementia. The purpose of this study was to confirm previous findings regarding the antecedents and characteristics of missing incidents, understand the differences between those found dead and alive, and compare the characteristics of a missing incident to those of wandering. Methods A retrospective design was used to analyse 325 newspaper reports of PWD missing in the community. Results The primary antecedent to a missing incident, particularly in community-dwelling PWD, was becoming lost while conducting a normal and permitted activity alone in the community. The other common antecedent was a lapse in supervision, during which the PWD was expected to remain in a safe location but did not. Deaths most commonly occurred in unpopulated areas due to exposure and drowning. Those who died were found closer to the place last seen and took longer to find, but there were no significant differences in gender or age. The key characteristics of a missing incident were that it was unpredictable, non-repetitive, and temporally appropriate but spatially disordered, and that it occurred while using any of multiple means of movement (walking, car, public transportation). Missing incidents occurred without the discernible patterns present in wandering, such as lapping or pacing, which are repetitive and temporally disordered. Conclusions This research supports the mounting evidence that the concept of wandering, in its formal sense, and missing incidents are two distinct concepts. It will be important to further develop the concept of missing incidents by identifying the differences and similarities from wandering. 
This will allow a more targeted assessment and intervention strategy for each problem. PMID:21639942

  3. Estimation of the Percentage of Newly Diagnosed HIV-Positive Persons Linked to HIV Medical Care in CDC-Funded HIV Testing Programs.

    PubMed

    Wang, Guoshen; Pan, Yi; Seth, Puja; Song, Ruiguang; Belcher, Lisa

    2017-01-01

    Missing data create challenges for determining progress made in linking HIV-positive persons to HIV medical care. Statistical methods are not used to address missing program data on linkage. In 2014, 61 health department jurisdictions were funded by Centers for Disease Control and Prevention (CDC) and submitted data on HIV testing, newly diagnosed HIV-positive persons, and linkage to HIV medical care. Missing or unusable data existed in our data set. A new approach using multiple imputation to address missing linkage data was proposed, and results were compared to the current approach that uses data with complete information. There were 12,472 newly diagnosed HIV-positive persons from CDC-funded HIV testing events in 2014. Using multiple imputation, 94.1% (95% confidence interval (CI): [93.7%, 94.6%]) of newly diagnosed persons were referred to HIV medical care, 88.6% (95% CI: [88.0%, 89.1%]) were linked to care within any time frame, and 83.6% (95% CI: [83.0%, 84.3%]) were linked to care within 90 days. Multiple imputation is recommended for addressing missing linkage data in future analyses when the missing percentage is high. The use of multiple imputation for missing values can result in a better understanding of how programs are performing on key HIV testing and HIV service delivery indicators.
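A toy sketch of the multiple-imputation idea for a binary linkage indicator. A real analysis, like the one described, would impute conditional on covariates and pool variances with Rubin's rules; here each missing value is drawn marginally from the observed linkage rate, purely for illustration (all numbers below are made up, not CDC data).

```python
import random

def mi_proportion(observed, n_missing, m=50, seed=0):
    """Toy multiple imputation for a binary outcome: draw each missing value
    from Bernoulli(p_obs), complete the data m times, and pool the m
    completed-data proportions (for a simple proportion, Rubin's point
    estimate reduces to the average of the m estimates)."""
    rng = random.Random(seed)
    p_obs = sum(observed) / len(observed)
    estimates = []
    for _ in range(m):
        imputed = [1 if rng.random() < p_obs else 0 for _ in range(n_missing)]
        completed = list(observed) + imputed
        estimates.append(sum(completed) / len(completed))
    return sum(estimates) / m

linked = [1] * 85 + [0] * 15           # 100 observed records, 85% linked
est = mi_proportion(linked, n_missing=20)  # pooled estimate near the observed 85%
```

With covariate-conditional imputation models, the pooled estimate can differ meaningfully from the complete-case rate, which is the point the abstract makes.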

  4. Reading Profiles in Multi-Site Data With Missingness.

    PubMed

    Eckert, Mark A; Vaden, Kenneth I; Gebregziabher, Mulugeta

    2018-01-01

Children with reading disability exhibit varied deficits in reading and cognitive abilities that contribute to their reading comprehension problems. Some children exhibit primary deficits in phonological processing, while others can exhibit deficits in oral language and executive functions that affect comprehension. This behavioral heterogeneity is problematic when missing data prevent the characterization of different reading profiles, which often occurs in retrospective data sharing initiatives without coordinated data collection. Here we show that reading profiles can be reliably identified based on Random Forest classification of incomplete behavioral datasets, after the missForest method is used to multiply impute missing values. Results from simulation analyses showed that reading profiles could be accurately classified across degrees of missingness (e.g., ∼5% classification error for 30% missingness across the sample). The application of missForest to a real multi-site dataset with missingness (n = 924) showed that reading disability profiles significantly and consistently differed in reading and cognitive abilities for cases with and without missing data. The results of validation analyses indicated that the reading profiles (cases with and without missing data) exhibited significant differences for an independent set of behavioral variables that were not used to classify reading profiles. Together, the results show how multiple imputation can be applied to the classification of cases with missing data and can increase the integrity of results from multi-site open access datasets.
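missForest iteratively re-predicts each variable's missing entries from the other variables with a random forest until the imputations stabilise. A minimal sketch of that loop, with ordinary least squares standing in for the forest (an assumption made to keep the example self-contained; missForest itself uses random forests and also handles categorical variables):

```python
import numpy as np

def iterative_impute(X, n_iter=10):
    """missForest-style loop with OLS standing in for the random forest:
    start from column means, then repeatedly re-predict each column's
    missing entries from the remaining columns."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # initial fill
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not mask[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.hstack([others, np.ones((len(X), 1))])  # add intercept
            obs = ~mask[:, j]
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            X[mask[:, j], j] = A[mask[:, j]] @ beta
    return X
```

Each completed dataset from such a loop can then be fed to the downstream classifier, as the paper does with Random Forest profile classification.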

  5. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review.

    PubMed

    Mercieca-Bebber, Rebecca; Palmer, Michael J; Brundage, Michael; Calvert, Melanie; Stockler, Martin R; King, Madeleine T

    2016-06-15

    Patient-reported outcomes (PROs) provide important information about the impact of treatment from the patients' perspective. However, missing PRO data may compromise the interpretability and value of the findings. We aimed to report: (1) a non-technical summary of problems caused by missing PRO data; and (2) a systematic review by collating strategies to: (A) minimise rates of missing PRO data, and (B) facilitate transparent interpretation and reporting of missing PRO data in clinical research. Our systematic review does not address statistical handling of missing PRO data. MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases (inception to 31 March 2015), and citing articles and reference lists from relevant sources. English articles providing recommendations for reducing missing PRO data rates, or strategies to facilitate transparent interpretation and reporting of missing PRO data were included. 2 reviewers independently screened articles against eligibility criteria. Discrepancies were resolved with the research team. Recommendations were extracted and coded according to framework synthesis. 117 sources (55% discussion papers, 26% original research) met the eligibility criteria. Design and methodological strategies for reducing rates of missing PRO data included: incorporating PRO-specific information into the protocol; carefully designing PRO assessment schedules and defining termination rules; minimising patient burden; appointing a PRO coordinator; PRO-specific training for staff; ensuring PRO studies are adequately resourced; and continuous quality assurance. Strategies for transparent interpretation and reporting of missing PRO data include utilising auxiliary data to inform analysis; transparently reporting baseline PRO scores, rates and reasons for missing data; and methods for handling missing PRO data. 
The instance of missing PRO data and its potential to bias clinical research can be minimised by implementing thoughtful design, rigorous methodology and transparent reporting strategies. All members of the research team have a responsibility in implementing such strategies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  6. Generating daily weather data for ecosystem modelling in the Congo River Basin

    NASA Astrophysics Data System (ADS)

    Petritsch, Richard; Pietsch, Stephan A.

    2010-05-01

Daily weather data are an important constraint for diverse applications in ecosystem research. In particular, temperature and precipitation are the main drivers for forest ecosystem productivity. Mechanistic modelling theory heavily relies on daily values for minimum and maximum temperatures, precipitation, incident solar radiation and vapour pressure deficit. Although the number of climate measurement stations increased during the last centuries, there are still regions with limited climate data. For example, in the WMO database there are only 16 stations located in Gabon with daily weather measurements. Additionally, the available time series are heavily affected by measurement errors or missing values. In the WMO record for Gabon, on average every second day is missing. Monthly means are more robust and may be estimated over larger areas. Therefore, a good alternative is to interpolate monthly mean values using a sparse network of measurement stations, and based on these monthly data generate daily weather data with defined characteristics. The weather generator MarkSim was developed to produce climatological time series for crop modelling in the tropics. It provides daily values for maximum and minimum temperature, precipitation and solar radiation. The monthly means can either be derived from the internal climate surfaces or prescribed as additional inputs. We compared the generated outputs with observations from three climate stations in Gabon (Lastourville, Moanda and Mouilla) and found that maximum temperature and solar radiation were heavily overestimated during the long dry season. This is due to the internal dependency of the solar radiation estimates on precipitation. With no precipitation a cloudless sky is assumed, and thus high incident solar radiation and a large diurnal temperature range. However, in reality it is cloudy in the Congo River Basin during the long dry season. 
Therefore, we applied a correction factor to solar radiation and temperature range based on the ratio of values on rainy days and days without rain, respectively. For assessing the impact of our correction, we simulated the ecosystem behaviour using the climate data from Lastourville, Moanda and Mouilla with the mechanistic ecosystem model Biome-BGC. Differences in terms of the carbon, nitrogen and water cycle were subsequently analysed and discussed.
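A minimal sketch of the kind of correction described, assuming the factor is the ratio of mean radiation on rainy versus rain-free days at the station (the paper's exact formulation may differ):

```python
def dry_day_correction(radiation, rain):
    """Scale solar radiation on rain-free days by the ratio of mean radiation
    on rainy vs. rain-free days, approximating persistent dry-season cloud
    cover (a sketch of the described correction, not its exact formulation)."""
    rainy = [r for r, p in zip(radiation, rain) if p > 0]
    dry = [r for r, p in zip(radiation, rain) if p == 0]
    factor = (sum(rainy) / len(rainy)) / (sum(dry) / len(dry))
    return [r * factor if p == 0 else r for r, p in zip(radiation, rain)]
```

The same ratio idea applies to the diurnal temperature range, per the abstract.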

  7. Impact of Missing Physiologic Data on Performance of the Simplified Acute Physiology Score 3 Risk-Prediction Model.

    PubMed

    Engerström, Lars; Nolin, Thomas; Mårdh, Caroline; Sjöberg, Folke; Karlström, Göran; Fredrikson, Mats; Walther, Sten M

    2017-12-01

    The Simplified Acute Physiology 3 outcome prediction model has a narrow time window for recording physiologic measurements. Our objective was to examine the prevalence and impact of missing physiologic data on the Simplified Acute Physiology 3 model's performance. Retrospective analysis of prospectively collected data. Sixty-three ICUs in the Swedish Intensive Care Registry. Patients admitted during 2011-2014 (n = 107,310). None. Model performance was analyzed using the area under the receiver operating curve, scaled Brier's score, and standardized mortality rate. We used a recalibrated Simplified Acute Physiology 3 model and examined model performance in the original dataset and in a dataset of complete records where missing data were generated (simulated dataset). One or more data were missing in 40.9% of the admissions, more common in survivors and low-risk admissions than in nonsurvivors and high-risk admissions. Discrimination did not decrease with one to two missing variables, but accuracy was highest with no missing data. Calibration was best in the original dataset with a mix of full records and records with some missing values (area under the receiver operating curve was 0.85, scaled Brier 27%, and standardized mortality rate 0.99). With zero, one, and two data missing, the scaled Brier was 31%, 26%, and 21%; area under the receiver operating curve was 0.84, 0.87, and 0.89; and standardized mortality rate was 0.92, 1.05 and 1.10, respectively. Datasets where the missing data were simulated for oxygenation or oxygenation and hydrogen ion concentration together performed worse than datasets with these data originally missing. There is a coupling between missing physiologic data, admission type, low risk, and survival. Increased loss of physiologic data reduced model performance and will deflate mortality risk, resulting in falsely high standardized mortality rates.

  8. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
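A minimal numpy sketch of the Xie and Beni index the abstract singles out: fuzzy within-cluster compactness divided by n times the minimum squared separation between centres, so lower values indicate better partitions. Inputs are assumed to come from any fuzzy clustering run (data matrix, centre matrix, membership matrix); this is the textbook index, not the paper's MI-pooled variant.

```python
import numpy as np

def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni validity index: total fuzzy within-cluster dispersion
    divided by n times the minimum squared distance between centres.
    X is (n, d); centers is (c, d); U is the (c, n) membership matrix."""
    n = len(X)
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)  # (c, n)
    compactness = ((U ** m) * d2).sum()
    sep = min(((centers[i] - centers[j]) ** 2).sum()
              for i in range(len(centers))
              for j in range(len(centers)) if i != j)
    return compactness / (n * sep)
```

In an MI setting like the one described, the index would be computed on each imputed dataset and the results synthesized across imputations.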

  9. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area, however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  10. Spatio-temporal Outlier Detection in Precipitation Data

    NASA Astrophysics Data System (ADS)

    Wu, Elizabeth; Liu, Wei; Chawla, Sanjay

    The detection of outliers from spatio-temporal data is an important task due to the increasing amount of spatio-temporal data available and the need to understand and interpret it. Due to the limitations of current data mining techniques, new techniques to handle this data need to be developed. We propose a spatio-temporal outlier detection algorithm called Outstretch, which discovers the outlier movement patterns of the top-k spatial outliers over several time periods. The top-k spatial outliers are found using the Exact-Grid Top-k and Approx-Grid Top-k algorithms, which are an extension of algorithms developed by Agarwal et al. [1]. Since they use the Kulldorff spatial scan statistic, they are capable of discovering all outliers, unaffected by neighbouring regions that may contain missing values. After generating the outlier sequences, we show one way they can be interpreted, by comparing them to the phases of the El Niño Southern Oscillation (ENSO) weather phenomenon to provide a meaningful analysis of the results.

  11. Nearest neighbor imputation using spatial–temporal correlations in wireless sensor networks

    PubMed Central

    Li, YuanYuan; Parker, Lynne E.

    2016-01-01

    Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network’s performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a kd-tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using traditional mean and variance of each dimension for kd-tree construction, and Euclidean distance for kd-tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. 
Our experimental results show that our proposed k-NN imputation method has competitive accuracy with state-of-the-art Expectation-Maximization (EM) techniques, while using much simpler computational techniques, thus making it suitable for use in resource-constrained WSNs. PMID:28435414
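
    The weighted-distance idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the kd-tree search is replaced by brute force, and the function name and the choice of averaging the neighbours' latest readings are assumptions.

```python
import numpy as np

def weighted_knn_impute(readings, weights, target, k=3):
    """Estimate a missing reading for node `target` from its k nearest
    neighbours, using a weighted Euclidean distance over past readings.

    readings: (n_nodes, n_samples) historical values per node
    weights:  per-sample weights, e.g. derived from measured
              missing-data percentages as in the paper
    target:   index of the node whose current value is missing
    """
    diffs = readings - readings[target]                # (n_nodes, n_samples)
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    dists[target] = np.inf                             # exclude the node itself
    neighbours = np.argsort(dists)[:k]
    # impute with the mean of the neighbours' most recent readings
    return readings[neighbours, -1].mean()
```

    A real deployment would organize `readings` in a kd-tree built with the weighted variances, as the paper describes, so that the neighbour search is sublinear.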

  12. Pragmatic criteria of the definition of neonatal near miss: a comparative study

    PubMed Central

    Kale, Pauline Lorena; Jorge, Maria Helena Prado de Mello; Laurenti, Ruy; Fonseca, Sandra Costa; da Silva, Kátia Silveira

    2017-01-01

    ABSTRACT OBJECTIVE The objective of this study was to test the validity of the pragmatic criteria of the definitions of neonatal near miss, extending them throughout the infant period, and to estimate the indicators of perinatal care in public maternity hospitals. METHODS A cohort of live births from six maternity hospitals in the municipalities of São Paulo, Niterói, and Rio de Janeiro, Brazil, was carried out in 2011. We carried out interviews and checked prenatal cards and medical records. We compared the pragmatic criteria (birth weight, gestational age, and 5' Apgar score) of the definitions of near miss of Pileggi et al., Pileggi-Castro et al., Souza et al., and Silva et al. We calculated sensitivity, specificity (gold standard: infant mortality), percentage of deaths among newborns with life-threatening conditions, and rates of near miss, mortality, and severe outcomes per 1,000 live births. RESULTS A total of 7,315 newborns were analyzed (completeness of information > 99%). The definition of Pileggi-Castro et al. had the highest sensitivity, resulting in a higher number of cases of near miss; Souza et al. presented the lowest value, and Pileggi et al. and Silva et al. presented intermediate values. There is an increase in sensitivity when the period goes from 0-6 to 0-27 days, and there is a decrease when it goes to 0-364 days. Specificities were high (≥ 97%) and above sensitivities (54% to 77%). One maternity hospital in São Paulo and one in Niterói presented, respectively, the lowest and highest rates of infant mortality, near miss, and frequency of births with life-threatening conditions, regardless of the definition. CONCLUSIONS The definitions of near miss based exclusively on pragmatic criteria are valid and can be used for monitoring purposes. Based on the perinatal literature, the cutoff points adopted by Silva et al. were more appropriate. 
Periodic studies could apply a more complete definition, incorporating clinical, laboratory, and management criteria, including congenital anomalies predictive of infant mortality. PMID:29211204

  13. Estimation of Return Values of Wave Height: Consequences of Missing Observations

    ERIC Educational Resources Information Center

    Ryden, Jesper

    2008-01-01

    Extreme-value statistics is often used to estimate so-called return values (actually related to quantiles) for environmental quantities like wind speed or wave height. A basic method for estimation is the method of block maxima which consists in partitioning observations in blocks, where maxima from each block could be considered independent.…
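
    The block-maxima method described above can be sketched as follows. This is an illustrative simplification, not the paper's procedure: it fits a Gumbel distribution (a special case of the generalized extreme value distribution) to the block maxima by the method of moments rather than fitting the full GEV, and the function name is invented.

```python
import numpy as np

def gumbel_return_value(series, block_size, return_period):
    """Block-maxima estimate of a return value: partition the series
    into blocks, take each block's maximum, fit a Gumbel distribution
    by moments, and return the (1 - 1/T) quantile."""
    n = len(series) // block_size
    maxima = series[: n * block_size].reshape(n, block_size).max(axis=1)
    scale = maxima.std(ddof=1) * np.sqrt(6) / np.pi
    loc = maxima.mean() - 0.5772 * scale      # Euler-Mascheroni constant
    p = 1 - 1 / return_period                 # non-exceedance probability
    return loc - scale * np.log(-np.log(p))   # Gumbel quantile function
```

    Missing observations shrink the effective block size, which tends to bias block maxima downward; that consequence is exactly what the paper examines.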

  14. Comprehensive measurements of atmospheric OH reactivity and trace species within a suburban forest near Tokyo during AQUAS-TAMA campaign

    NASA Astrophysics Data System (ADS)

    Ramasamy, Sathiyamurthi; Nagai, Yoshihide; Takeuchi, Nobuhiro; Yamasaki, Shohei; Shoji, Koki; Ida, Akira; Jones, Charlotte; Tsurumaru, Hiroshi; Suzuki, Yuhi; Yoshino, Ayako; Shimada, Kojiro; Nakashima, Yoshihiro; Kato, Shungo; Hatakeyama, Shiro; Matsuda, Kazuhide; Kajii, Yoshizumi

    2018-07-01

    Total OH reactivity, which gives the instantaneous loss rate of OH radicals due to reactive species, is an invaluable measurement for understanding regional air quality, as it gives the overall reactivity of the air mass, the fraction of each trace species reactive to OH, the fraction of missing sinks, O3 formation potential, etc. Total OH reactivity measurement was conducted in a small suburban forest located ∼30 km from Tokyo during the air quality study at field museum TAMA (AQUAS-TAMA) campaign in early autumn 2012 and summer 2013. The average measured OH reactivities during that autumn and summer were 7.4 s-1 and 11.4 s-1, respectively. In summer, isoprene was the major contributor, accounting for 28.2% of the OH reactivity, as a result of enhanced light-dependent biogenic emission, whereas NO2 was the major contributor in autumn, accounting for 19.6%, due to the diminished contribution from isoprene as a result of lower solar intensity. A higher missing OH reactivity (34%) was determined in summer, and linear regression analysis showed that oxygenated VOCs could be potential candidates for the missing OH reactivity. A lower missing OH reactivity (25%) was determined in autumn, and it was significantly reduced (to 11%) if the interference of peroxy radicals with the measured OH reactivity was considered.

  15. Imputation for multisource data with comparison and assessment techniques

    DOE PAGES

    Casleton, Emily Michele; Osthus, David Allen; Van Buren, Kendra Lu

    2017-12-27

    Missing data are a prevalent issue in analyses involving data collection. The problem of missing data is exacerbated for multisource analysis, where data from multiple sensors are combined to arrive at a single conclusion. In this scenario, missing data are more likely to occur and can lead to discarding a large amount of the data collected; however, the information from observed sensors can be leveraged to estimate those values not observed. We propose two methods for imputation of multisource data, both of which take advantage of potential correlation between data from different sensors, through ridge regression and a state-space model. These methods, as well as the common median imputation, are applied to data collected from a variety of sensors monitoring an experimental facility. Performance of imputation methods is compared with the mean absolute deviation; however, rather than using this metric to solely rank the methods, we also propose an approach to identify significant differences. Imputation techniques will also be assessed by their ability to produce appropriate confidence intervals, through coverage and length, around the imputed values. Finally, performance of imputed datasets is compared with a marginalized dataset through a weighted k-means clustering. In general, we found that imputation through a dynamic linear model tended to be the most accurate and to produce the most precise confidence intervals, and that imputing the missing values and down weighting them with respect to observed values in the analysis led to the most accurate performance.
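
    The ridge-regression route can be sketched directly from its closed form. This is a minimal illustration under assumed names, not the paper's implementation (which also covers a state-space model and confidence intervals):

```python
import numpy as np

def ridge_impute(X_obs, y_obs, x_new, lam=1.0):
    """Predict a sensor's missing reading from correlated sensors.

    X_obs: (n, p) readings from the other sensors at times when the
           target sensor also reported
    y_obs: (n,)   target sensor readings at those times
    x_new: (p,)   current readings from the other sensors
    lam:   ridge penalty controlling shrinkage
    """
    X = np.column_stack([np.ones(len(X_obs)), X_obs])   # intercept column
    # closed-form ridge solution: (X'X + lam*I)^-1 X'y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_obs)
    return np.concatenate([[1.0], x_new]) @ beta
```

    The penalty `lam` is what keeps the estimate stable when the co-observed sensors are highly collinear, which is common when several sensors monitor the same facility.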

  16. Imputation for multisource data with comparison and assessment techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Casleton, Emily Michele; Osthus, David Allen; Van Buren, Kendra Lu

    Missing data are a prevalent issue in analyses involving data collection. The problem of missing data is exacerbated for multisource analysis, where data from multiple sensors are combined to arrive at a single conclusion. In this scenario, missing data are more likely to occur and can lead to discarding a large amount of the data collected; however, the information from observed sensors can be leveraged to estimate those values not observed. We propose two methods for imputation of multisource data, both of which take advantage of potential correlation between data from different sensors, through ridge regression and a state-space model. These methods, as well as the common median imputation, are applied to data collected from a variety of sensors monitoring an experimental facility. Performance of imputation methods is compared with the mean absolute deviation; however, rather than using this metric to solely rank the methods, we also propose an approach to identify significant differences. Imputation techniques will also be assessed by their ability to produce appropriate confidence intervals, through coverage and length, around the imputed values. Finally, performance of imputed datasets is compared with a marginalized dataset through a weighted k-means clustering. In general, we found that imputation through a dynamic linear model tended to be the most accurate and to produce the most precise confidence intervals, and that imputing the missing values and down weighting them with respect to observed values in the analysis led to the most accurate performance.

  17. Indications for Pelvic Nodal Treatment in Prostate Cancer Should Change. Validation of the Roach Formula in a Large Extended Nodal Dissection Series

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abdollah, Firas; Cozzarini, Cesare; Suardi, Nazareno

    2012-06-01

    Purpose: Previous studies have criticized the predictive ability of the Roach formula in assessing the risk of lymph node invasion (LNI) in contemporary patients with prostate cancer (PCa) due to a significant overestimation of LNI rates. However, all those studies included patients treated with limited pelvic lymph node dissection (PLND), which is associated with high rates of false negative findings. We hypothesized that the Roach formula is still an accurate tool for LNI predictions if an extended PLND (ePLND) is performed. Methods and Materials: We included 3,115 consecutive patients treated with radical prostatectomy and ePLND between 2000 and 2010 at a single tertiary referral center. Extended PLND consisted of removal of obturator, external iliac, and hypogastric lymph nodes. We externally validated the Roach formula by using the area under the receiver operating characteristics curve and the calibration plot method. Moreover, we tested the performance characteristics of different formula-generated cutoff values ranging from 1% to 20%. Results: The accuracy of the Roach formula was 80.3%. The calibration showed only a minor underestimation of the LNI risk in high-risk patients (6.7%). According to the Roach formula, the use of a 15% cutoff would have allowed 74.2% (2,311/3,115) of patients to avoid nodal irradiation, while up to 32.7% (111/336) of all patients with LNI would have been missed. When the cutoff was lowered to 6%, nodal treatment would have been spared in 1,541 (49.5%) patients while missing 41 LNI patients. The sensitivity, specificity, and negative predictive values associated with the 6% cutoff were 87.9%, 54%, and 97.3%, respectively. Conclusions: The Roach formula is still accurate and does not overestimate the rate of LNI in contemporary prostate cancer patients if they are treated with ePLND. However, the recommended cutoff of 15% would miss approximately one-third of patients with LNI. Based on our results, the cutoff should be lowered to 6%.
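
    The Roach formula itself is simple: estimated LNI risk (%) = (2/3)·PSA + (Gleason score − 6)·10. A minimal sketch of applying it with the cutoff the study proposes (function names are ours, not from the paper):

```python
def roach_lni_risk(psa, gleason):
    """Roach formula: estimated percent risk of lymph node invasion
    from PSA (ng/mL) and biopsy Gleason score, floored at zero."""
    return max(0.0, (2.0 / 3.0) * psa + (gleason - 6) * 10.0)

def needs_nodal_treatment(psa, gleason, cutoff=6.0):
    """Flag patients for pelvic nodal treatment. The study argues for
    lowering the traditional 15% cutoff to 6% to avoid missing about
    one-third of LNI patients."""
    return roach_lni_risk(psa, gleason) >= cutoff
```

    For example, a patient with PSA 10 and Gleason 7 scores about 16.7%, above either cutoff; a patient with PSA 2 and Gleason 6 scores about 1.3%, below the 6% cutoff.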

  18. Likelihood analysis of spatial capture-recapture models for stratified or class structured populations

    USGS Publications Warehouse

    Royle, J. Andrew; Sutherland, Christopher S.; Fuller, Angela K.; Sun, Catherine C.

    2015-01-01

    We develop a likelihood analysis framework for fitting spatial capture-recapture (SCR) models to data collected on class structured or stratified populations. Our interest is motivated by the necessity of accommodating the problem of missing observations of individual class membership. This is particularly problematic in SCR data arising from DNA analysis of scat, hair or other material, which frequently yields individual identity but fails to identify the sex. Moreover, this can represent a large fraction of the data and, given the typically small sample sizes of many capture-recapture studies based on DNA information, utilization of the data with missing sex information is necessary. We develop the class structured likelihood for the case of missing covariate values, and then we address the scaling of the likelihood so that models with and without class structured parameters can be formally compared regardless of missing values. We apply our class structured model to black bear data collected in New York in which sex could be determined for only 62 of 169 uniquely identified individuals. The models containing sex-specificity of both the intercept of the SCR encounter probability model and the distance coefficient, and including a behavioral response are strongly favored by log-likelihood. Estimated population sex ratio is strongly influenced by sex structure in model parameters illustrating the importance of rigorous modeling of sex differences in capture-recapture models.

  19. A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy).

    PubMed

    Lo Presti, Rossella; Barca, Emanuele; Passarella, Giuseppe

    2010-01-01

    Environmental time series are often affected by the presence of missing data, but when dealing statistically with data, the need to fill in the gaps by estimating the missing values must be considered. At present, a large number of statistical techniques are available to achieve this objective; they range from very simple methods, such as using the sample mean, to very sophisticated ones, such as multiple imputation. A brand new methodology for missing data estimation is proposed, which tries to merge the obvious advantages of the simplest techniques (e.g. their vocation to be easily implemented) with the strength of the newest techniques. The proposed method consists in the application of two consecutive stages: once it has been ascertained that a specific monitoring station is affected by missing data, the "most similar" monitoring stations are identified among neighbouring stations on the basis of a suitable similarity coefficient; in the second stage, a regressive method is applied in order to estimate the missing data. In this paper, four different regressive methods are applied and compared, in order to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin located in South Italy.
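
    The two-stage procedure can be sketched as follows. The abstract does not name the similarity coefficient or the regressive method, so Pearson correlation and simple linear regression are assumed here purely for illustration:

```python
import numpy as np

def fill_gaps(target, candidates):
    """Two-stage gap filling for a rain-gauge series.

    Stage 1: pick the neighbouring station most similar to `target`
    (Pearson correlation on co-observed days, an assumed choice).
    Stage 2: regress `target` on that station and predict the gaps.
    Series are 1-D arrays with NaN marking missing values.
    """
    best, best_r = None, -np.inf
    for cand in candidates:
        mask = ~np.isnan(target) & ~np.isnan(cand)
        r = np.corrcoef(target[mask], cand[mask])[0, 1]
        if r > best_r:
            best, best_r = cand, r
    mask = ~np.isnan(target) & ~np.isnan(best)
    slope, intercept = np.polyfit(best[mask], target[mask], 1)
    filled = target.copy()
    gaps = np.isnan(target) & ~np.isnan(best)
    filled[gaps] = slope * best[gaps] + intercept
    return filled
```

    The paper's comparison of four regressive methods would slot into stage 2, swapping `np.polyfit` for the alternative estimators.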

  20. Application of a novel hybrid method for spatiotemporal data imputation: A case study of the Minqin County groundwater level

    NASA Astrophysics Data System (ADS)

    Zhang, Zhongrong; Yang, Xuan; Li, Hao; Li, Weide; Yan, Haowen; Shi, Fei

    2017-10-01

    The techniques for data analyses have been widely developed in past years; however, missing data still represent a ubiquitous problem in many scientific fields. In particular, dealing with missing spatiotemporal data presents an enormous challenge. Nonetheless, in recent years, a considerable amount of research has focused on spatiotemporal problems, making spatiotemporal missing data imputation methods increasingly indispensable. In this paper, a novel spatiotemporal hybrid method is proposed to verify and impute missing spatiotemporal values. This new method, termed SOM-FLSSVM, flexibly combines three advanced techniques: self-organizing feature map (SOM) clustering, the fruit fly optimization algorithm (FOA) and the least squares support vector machine (LSSVM). We employ a cross-validation (CV) procedure and FOA swarm intelligence optimization strategy that can search available parameters and determine the optimal imputation model. The spatiotemporal groundwater data for Minqin County, China, were selected to test the reliability and imputation ability of SOM-FLSSVM. We carried out a validation experiment and compared three well-studied models with SOM-FLSSVM using missing-data ratios from 0.1 to 0.8 in the same data set. The results demonstrate that the new hybrid method performs well in terms of both robustness and accuracy for spatiotemporal missing data.

  1. Multivariate test power approximations for balanced linear mixed models in studies with missing data.

    PubMed

    Ringham, Brandy M; Kreidler, Sarah M; Muller, Keith E; Glueck, Deborah H

    2016-07-30

    Multilevel and longitudinal studies are frequently subject to missing data. For example, biomarker studies for oral cancer may involve multiple assays for each participant. Assays may fail, resulting in missing data values that can be assumed to be missing completely at random. Catellier and Muller proposed a data analytic technique to account for data missing at random in multilevel and longitudinal studies. They suggested modifying the degrees of freedom for both the Hotelling-Lawley trace F statistic and its null case reference distribution. We propose parallel adjustments to approximate power for this multivariate test in studies with missing data. The power approximations use a modified non-central F statistic, which is a function of (i) the expected number of complete cases, (ii) the expected number of non-missing pairs of responses, or (iii) the trimmed sample size, which is the planned sample size reduced by the anticipated proportion of missing data. The accuracy of the method is assessed by comparing the theoretical results to the Monte Carlo simulated power for the Catellier and Muller multivariate test. Over all experimental conditions, the closest approximation to the empirical power of the Catellier and Muller multivariate test is obtained by adjusting power calculations with the expected number of complete cases. The utility of the method is demonstrated with a multivariate power analysis for a hypothetical oral cancer biomarkers study. We describe how to implement the method using standard, commercially available software products and give example code. Copyright © 2015 John Wiley & Sons, Ltd.
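
    The "trimmed sample size" idea (option iii above) can be illustrated with a simpler univariate noncentral-F power calculation. This is only a sketch of the general mechanism, not the Catellier and Muller Hotelling-Lawley adjustment; the function name and the noncentrality parameterization are assumptions.

```python
from scipy.stats import f as f_dist, ncf

def approx_power(n_planned, miss_rate, n_groups, effect_size, alpha=0.05):
    """Approximate power of a one-way F test after shrinking the
    planned sample size by the anticipated missing-data proportion."""
    n_trim = int(n_planned * (1 - miss_rate))  # trimmed sample size
    dfn = n_groups - 1
    dfd = n_trim - n_groups
    nc = effect_size ** 2 * n_trim             # assumed noncentrality form
    f_crit = f_dist.ppf(1 - alpha, dfn, dfd)   # critical value under the null
    return ncf.sf(f_crit, dfn, dfd, nc)        # P(F > f_crit | noncentral)
```

    Because both the denominator degrees of freedom and the noncentrality shrink with the trimmed N, anticipated missingness directly lowers the computed power, which is the effect the paper's approximations quantify.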

  2. Variability of suspended-sediment concentration at tidal to annual time scales in San Francisco Bay, USA

    USGS Publications Warehouse

    Schoellhamer, D.H.

    2002-01-01

    Singular spectrum analysis for time series with missing data (SSAM) was used to reconstruct components of a 6-yr time series of suspended-sediment concentration (SSC) from San Francisco Bay. Data were collected every 15 min and the time series contained missing values that primarily were due to sensor fouling. SSAM was applied in a sequential manner to calculate reconstructed components with time scales of variability that ranged from tidal to annual. Physical processes that controlled SSC and their contribution to the total variance of SSC were (1) diurnal, semidiurnal, and other higher frequency tidal constituents (24%), (2) semimonthly tidal cycles (21%), (3) monthly tidal cycles (19%), (4) semiannual tidal cycles (12%), and (5) annual pulses of sediment caused by freshwater inflow, deposition, and subsequent wind-wave resuspension (13%). Of the total variance 89% was explained and subtidal variability (65%) was greater than tidal variability (24%). Processes at subtidal time scales accounted for more variance of SSC than processes at tidal time scales because sediment accumulated in the water column and the supply of easily erodible bed sediment increased during periods of increased subtidal energy. This large range of time scales that each contained significant variability of SSC and associated contaminants can confound design of sampling programs and interpretation of resulting data.

  3. Inference in randomized trials with death and missingness.

    PubMed

    Wang, Chenguang; Scharfstein, Daniel O; Colantuoni, Elizabeth; Girard, Timothy D; Yan, Ying

    2017-06-01

    In randomized studies involving severely ill patients, functional outcomes are often unobserved due to missed clinic visits, premature withdrawal, or death. It is well known that if these unobserved functional outcomes are not handled properly, biased treatment comparisons can be produced. In this article, we propose a procedure for comparing treatments that is based on a composite endpoint that combines information on both the functional outcome and survival. We further propose a missing data imputation scheme and sensitivity analysis strategy to handle the unobserved functional outcomes not due to death. Illustrations of the proposed method are given by analyzing data from a recent non-small cell lung cancer clinical trial and a recent trial of sedation interruption among mechanically ventilated patients. © 2016, The International Biometric Society.

  4. Bayesian Network Structure Learning for Urban Land Use Classification from Landsat ETM+ and Ancillary Data

    NASA Astrophysics Data System (ADS)

    Park, M.; Stenstrom, M. K.

    2004-12-01

    Recognizing urban information from the satellite imagery is problematic due to the diverse features and dynamic changes of urban landuse. The use of Landsat imagery for urban land use classification involves inherent uncertainty due to its spatial resolution and the low separability among land uses. To resolve the uncertainty problem, we investigated the performance of Bayesian networks to classify urban land use since Bayesian networks provide a quantitative way of handling uncertainty and have been successfully used in many areas. In this study, we developed the optimized networks for urban land use classification from Landsat ETM+ images of Marina del Rey area based on USGS land cover/use classification level III. The networks started from a tree structure based on mutual information between variables and added the links to improve accuracy. This methodology offers several advantages: (1) The network structure shows the dependency relationships between variables. The class node value can be predicted even with particular band information missing due to sensor system error. The missing information can be inferred from other dependent bands. (2) The network structure provides information of variables that are important for the classification, which is not available from conventional classification methods such as neural networks and maximum likelihood classification. In our case, for example, bands 1, 5 and 6 are the most important inputs in determining the land use of each pixel. (3) The networks can be reduced with those input variables important for classification. This minimizes the problem without considering all possible variables. We also examined the effect of incorporating ancillary data: geospatial information such as X and Y coordinate values of each pixel and DEM data, and vegetation indices such as NDVI and Tasseled Cap transformation. 
The results showed that the locational information improved overall accuracy (81%) and kappa coefficient (76%), and lowered the omission and commission errors compared with using only spectral data (accuracy 71%, kappa coefficient 62%). Incorporating DEM data did not significantly improve overall accuracy (74%) and kappa coefficient (66%) but lowered the omission and commission errors. Incorporating NDVI did not markedly improve the overall accuracy (72%) and kappa coefficient (65%). Including the Tasseled Cap transformation reduced the accuracy (accuracy 70%, kappa 61%). Therefore, the additional information from the DEM and vegetation indices was not as useful as the locational ancillary data.
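
    The mutual-information scores used to seed the tree structure can be estimated empirically from pixel values. A minimal sketch (the binning choice and function name are ours, not from the paper):

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Empirical mutual information between two discretized variables
    (e.g. two spectral bands), the dependency score used to build a
    tree-structured Bayesian network before adding extra links."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y
    nz = pxy > 0                               # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

    A band strongly dependent on another yields a high score and becomes an edge in the initial tree; this dependency structure is also what lets the network infer a band's value when a sensor error leaves it missing.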

  5. Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

    NASA Astrophysics Data System (ADS)

    Kalantari, Mahdi; Yarmohammadi, Masoud; Hassani, Hossein; Silva, Emmanuel Sirimal

    Missing values in time series data are a well-known and important problem which many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with many other established methods. The comparison is done by applying them to various real and simulated time series. The obtained results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation in comparison to other methods.
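
    The SSA-based imputation scheme can be sketched with the standard L2 variant: iteratively fill the gaps, build the trajectory matrix, truncate its SVD, and diagonal-average back. The paper's contribution replaces the SVD step with an L1-norm low-rank fit for robustness; this sketch keeps the ordinary SVD, so it shows only the baseline the authors build on.

```python
import numpy as np

def ssa_impute(x, window, rank, n_iter=50):
    """Iterative L2 SSA-style imputation of NaN gaps in a series."""
    x = np.asarray(x, float)
    miss = np.isnan(x)
    x = np.where(miss, np.nanmean(x), x)           # crude initial fill
    n, k = len(x), len(x) - window + 1
    for _ in range(n_iter):
        # trajectory (Hankel) matrix of lagged windows
        T = np.column_stack([x[i:i + window] for i in range(k)])
        U, s, Vt = np.linalg.svd(T, full_matrices=False)
        T_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank truncation
        # diagonal averaging (Hankelization) back to a series
        recon = np.zeros(n)
        counts = np.zeros(n)
        for i in range(k):
            recon[i:i + window] += T_low[:, i]
            counts[i:i + window] += 1
        x[miss] = (recon / counts)[miss]           # update only the gaps
    return x
```

    For a signal of exact low rank, such as a sinusoid, the iteration converges to the missing values; the L1 variant changes only how the low-rank approximation is computed.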

  6. Pre-selection and assessment of green organic solvents by clustering chemometric tools.

    PubMed

    Tobiszewski, Marek; Nedyalkova, Miroslava; Madurga, Sergio; Pena-Pereira, Francisco; Namieśnik, Jacek; Simeonov, Vasil

    2018-01-01

    The study presents the results of applying chemometric tools for selection of physicochemical parameters of solvents for predicting missing variables - bioconcentration factors, water-octanol and octanol-air partitioning constants. EPI Suite software was successfully applied to predict missing values for solvents commonly considered as "green". Values for logBCF, logKOW and logKOA were modelled for 43 rather nonpolar solvents and 69 polar ones. Application of multivariate statistics also proved useful in the assessment of the obtained modelling results. The presented approach can be one of the first steps and support tools in the assessment of chemicals in terms of their greenness. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Space Environment Exposure Results from the MISSE 5 Polymer Film Thermal Control Experiment on the International Space Station

    NASA Technical Reports Server (NTRS)

    Miller, Sharon K. R.; Dever, Joyce A.

    2009-01-01

    It is known that polymer films can degrade in space due to exposure to the environment, but the magnitude of the mechanical property degradation, and the degree to which the different environmental factors play a role in it, are not well understood. This paper describes the results of an experiment flown on the Materials International Space Station Experiment (MISSE) 5 to determine the change in tensile strength and percent elongation of some typical polymer films exposed in a nadir-facing environment on the International Space Station and, where possible, to compare them with similar ram- and wake-facing experiments flown on MISSE 1, to better indicate the role the different environments play in mechanical property change.

  8. A hybrid frame concealment algorithm for H.264/AVC.

    PubMed

    Yan, Bo; Gharavi, Hamid

    2010-01-01

    In packet-based video transmissions, packet loss due to channel errors may result in the loss of a whole video frame. Recently, many error concealment algorithms have been proposed in order to combat channel errors; however, most of the existing algorithms can only deal with the loss of macroblocks and are not able to conceal the whole missing frame. In order to resolve this problem, in this paper, we have proposed a new hybrid motion vector extrapolation (HMVE) algorithm to recover the whole missing frame, and it is able to provide more accurate estimation for the motion vectors of the missing frame than other conventional methods. Simulation results show that it is highly effective and significantly outperforms other existing frame recovery methods.

  9. Implications of the Trauma Quality Improvement Project inclusion of nonsurvivable injuries in performance benchmarking.

    PubMed

    Heaney, Jiselle Bock; Schroll, Rebecca; Turney, Jennifer; Stuke, Lance; Marr, Alan B; Greiffenstein, Patrick; Robledo, Rosemarie; Theriot, Amanda; Duchesne, Juan; Hunt, John

    2017-10-01

    The Trauma Quality Improvement Project (TQIP) uses an injury prediction model for performance benchmarking. We hypothesize that at a Level I high-volume penetrating trauma center, performance outcomes will be biased due to inclusion of patients with nonsurvivable injuries. Retrospective chart review was conducted for all patients included in the institutional TQIP analysis from 2013 to 2014 with length of stay (LOS) less than 1 day to determine survivability of the injuries. Observed (O)/expected (E) mortality ratios were calculated before and after exclusion of these patients. Completeness of data reported to TQIP was examined. Eight hundred twenty-six patients were reported to TQIP, including 119 deaths. Nonsurvivable injuries accounted for 90.9% of the deaths in patients with an LOS of 1 day or less. The O/E mortality ratio for all patients was 1.061; after excluding all patients with LOS less than 1 day found to have nonsurvivable injuries, it was 0.895. Data for key variables were missing in 63.3% of patients who died in the emergency department, 50% of those taken to the operating room, and 0% of those admitted to the intensive care unit. Charts for patients who died with LOS less than 1 day were significantly more likely than those of patients who lived to be missing crucial data. This study shows that TQIP inclusion of patients with nonsurvivable injuries biases outcomes at an urban trauma center. Missing data result in imputation of values, increasing inaccuracy. Further investigation is needed to determine whether these findings exist at other institutions, and whether the current TQIP model needs revision to accurately identify and exclude patients with nonsurvivable injuries. Prognostic and epidemiological, level III.

  10. Missing energy and the measurement of the CP-violating phase in neutrino oscillations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ankowski, Artur M.; Coloma, Pilar; Huber, Patrick

    In the next generation of long-baseline neutrino oscillation experiments aiming to determine the charge-parity-violating phase δ CP in the appearance channel, fine-grained time-projection chambers are expected to play an important role. In this study, we analyze the influence of realistic detector capabilities on the δ CP sensitivity for a setup similar to that of the Deep Underground Neutrino Experiment. We find that the effect of the missing energy carried away by undetected particles is sizable. Although the reconstructed neutrino energy can be corrected for the missing energy, the accuracy of such a procedure has to exceed 20% to avoid a sizable bias in the extracted δ CP value.

  11. Missing energy and the measurement of the CP-violating phase in neutrino oscillations

    DOE PAGES

    Ankowski, Artur M.; Coloma, Pilar; Huber, Patrick; ...

    2015-11-30

    In the next generation of long-baseline neutrino oscillation experiments aiming to determine the charge-parity-violating phase δ CP in the appearance channel, fine-grained time-projection chambers are expected to play an important role. In this study, we analyze the influence of realistic detector capabilities on the δ CP sensitivity for a setup similar to that of the Deep Underground Neutrino Experiment. We find that the effect of the missing energy carried away by undetected particles is sizable. Although the reconstructed neutrino energy can be corrected for the missing energy, the accuracy of such a procedure has to exceed 20% to avoid a sizable bias in the extracted δ CP value.

  12. Edge charge asymmetry in top pair production at the LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xiao Bo; Wang Youkai; Zhou Zhongqiu

    2011-03-01

    In this brief report, we propose a new definition of charge asymmetry in top pair production at the LHC, namely, the edge charge asymmetry (ECA). ECA utilizes the information of the drifting direction only for a single top (or antitop) with hadronic decay. Therefore, ECA can be free from the uncertainty arising from the missing neutrino in the tt event reconstruction. Moreover, the rapidity Y of the top (or antitop) is required to be greater than a critical value Y{sub C} in order to suppress the symmetric tt events mainly due to the gluon-gluon fusion process. ECA is calculated up to next-to-leading order QCD in the standard model, and the choice of the optimal Y{sub C} is investigated.

  13. Higgs boson gluon-fusion production in QCD at three loops.

    PubMed

    Anastasiou, Charalampos; Duhr, Claude; Dulat, Falko; Herzog, Franz; Mistlberger, Bernhard

    2015-05-29

    We present the cross section for the production of a Higgs boson at hadron colliders at next-to-next-to-next-to-leading order (N^{3}LO) in perturbative QCD. The calculation is based on a method to perform a series expansion of the partonic cross section around the threshold limit to an arbitrary order. We perform this expansion to sufficiently high order to obtain the value of the hadronic cross section at N^{3}LO in the large top-mass limit. For renormalization and factorization scales equal to half the Higgs boson mass, the N^{3}LO corrections are of the order of +2.2%. The total scale variation at N^{3}LO is 3%, reducing the uncertainty due to missing higher-order QCD corrections by a factor of 3.

  14. Multiple imputation to deal with missing EQ-5D-3L data: Should we impute individual domains or the actual index?

    PubMed

    Simons, Claire L; Rivero-Arias, Oliver; Yu, Ly-Mee; Simon, Judit

    2015-04-01

    Missing data are a well-known and widely documented problem in cost-effectiveness analyses alongside clinical trials using individual patient-level data. Current methodological research recommends multiple imputation (MI) to deal with missing health outcome data, but there is little guidance on whether MI for multi-attribute questionnaires, such as the EQ-5D-3L, should be carried out at domain or at summary score level. In this paper, we evaluated the impact of imputing individual domains versus imputing index values to deal with missing EQ-5D-3L data using a simulation study and developed recommendations for future practice. We simulated missing data in a patient-level dataset with complete EQ-5D-3L data at one point in time from a large multinational clinical trial (n = 1,814). Different proportions of missing data were generated using a missing at random (MAR) mechanism and three different scenarios were studied. The performance of each method was evaluated using the root mean squared error and mean absolute error of the actual versus predicted EQ-5D-3L indices. In large sample sizes (n > 500) and a missing data pattern that follows mainly unit non-response, imputing domains or the index produced similar results. However, domain imputation became more accurate than index imputation when the pattern of missingness followed item non-response. For smaller sample sizes (n < 100), index imputation was more accurate. When MI models were misspecified, both domain and index imputations were inaccurate for any proportion of missing data. The decision between imputing the domains or the EQ-5D-3L index scores depends on the observed missing data pattern and the sample size available for analysis. Analysts conducting this type of exercise should also evaluate the sensitivity of the analysis to the MAR assumption and whether the imputation model is correctly specified.
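    The domain-versus-index choice can be made concrete with a toy sketch. The five-domain layout mirrors the EQ-5D-3L, but the linear scoring tariff and the use of mean imputation as a stand-in for full multiple imputation are assumptions made for brevity, not the paper's method or any published value set.

```python
# Toy comparison of domain-level vs index-level imputation for a
# five-domain questionnaire. The tariff and mean-imputation stand-in
# (real analyses would use proper multiple imputation) are assumptions.

from statistics import mean

def score(domains):
    # Hypothetical linear tariff: perfect health = 1.0, and each level
    # above 1 on any domain subtracts 0.1 (NOT a published value set).
    return 1.0 - 0.1 * sum(d - 1 for d in domains)

def impute_index(responses):
    """Index-level: score complete cases, fill missing with their mean."""
    m = mean(score(r) for r in responses if None not in r)
    return [score(r) if None not in r else m for r in responses]

def impute_domains(responses):
    """Domain-level: fill each missing domain with that domain's mean
    over observed values, then score the completed response."""
    k = len(responses[0])
    col_means = [mean(r[j] for r in responses if r[j] is not None)
                 for j in range(k)]
    return [score([d if d is not None else col_means[j]
                   for j, d in enumerate(r)]) for r in responses]
```

    With item non-response (one domain missing), the domain-level route keeps the four observed answers, which is the intuition behind the paper's finding that domain imputation is more accurate in that scenario.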

  15. Attrition Bias Related to Missing Outcome Data: A Longitudinal Simulation Study.

    PubMed

    Lewin, Antoine; Brondeel, Ruben; Benmarhnia, Tarik; Thomas, Frédérique; Chaix, Basile

    2018-01-01

    Most longitudinal studies do not address potential selection biases due to selective attrition. Using empirical data and simulating additional attrition, we investigated the effectiveness of common approaches to handle missing outcome data from attrition in the association between individual education level and change in body mass index (BMI). Using data from the two waves of the French RECORD Cohort Study (N = 7,172), we first examined how inverse probability weighting (IPW) and multiple imputation handled missing outcome data from attrition in the observed data (stage 1). Second, simulating additional missing data in BMI at follow-up under various missing-at-random scenarios, we quantified the impact of attrition and assessed how multiple imputation performed compared to complete case analysis and to a perfectly specified IPW model as a gold standard (stage 2). With the observed data in stage 1, we found an inverse association between individual education and change in BMI, with complete case analysis as well as with IPW and multiple imputation. When we simulated additional attrition under a missing-at-random pattern (stage 2), the bias increased with the magnitude of selective attrition, and multiple imputation could not correct it. Our simulations revealed that selective attrition in the outcome heavily biased the association of interest. The present article contributes to raising awareness that, for missing outcome data, multiple imputation performs no better than complete case analysis. More effort is thus needed during the design phase to understand attrition mechanisms by collecting information on the reasons for dropout.
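    Inverse probability weighting, the gold standard in the study's stage 2, reduces to a weighted mean of the observed outcomes once the response probabilities are known. In the sketch below those probabilities are simply given; the study instead estimates them from baseline covariates, so this is only the core of the idea.

```python
# Minimal inverse-probability-weighting (IPW) sketch for outcome
# attrition. Response probabilities are taken as known here; in
# practice they must be estimated (e.g. from baseline covariates).

def ipw_mean(outcomes, observed, p_observed):
    """Mean of observed outcomes, each weighted by the inverse of its
    probability of being observed, which undoes selective attrition
    under a (correctly modelled) missing-at-random mechanism."""
    num = sum(y / p for y, o, p in zip(outcomes, observed, p_observed) if o)
    den = sum(1 / p for o, p in zip(observed, p_observed) if o)
    return num / den
```

    If a subgroup responds only half the time, its respondents count double, so the IPW estimate recovers the full-sample mean where a complete-case mean would be biased toward the fully observed subgroup.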

  16. The Evolution of Work Values during the School-to-Work Transition: The Case of Young Adults in the "Missing Middle"

    ERIC Educational Resources Information Center

    Masdonati, Jonas; Fournier, Geneviève; Pinault, Mathieu; Lahrizi, Imane Z.

    2016-01-01

    Adopting a mixed method design, this paper explores the configuration and evolution of work values of 64 young adults in transition from education to employment. Qualitative analyses point out the existence of four categories of work values: interesting tasks, good relationships, self-fulfillment, and attractive work conditions. Quantitative…

  17. Computation of Weapons Systems Effectiveness

    DTIC Science & Technology

    2013-09-01

    denoted as SSPD2. SSPD = SSPD1 ∗ PNM + SSPD2 ∗ PHIT (5.13) PNM and PHIT are the weighting factors used to balance the direct hits and Gaussian miss...distribution unique for guided weapons. The sum of PNM and PHIT can be equal to or smaller than 1 due to the presence of the outliers gross...weapons to represent a zero miss distance for the PHIT component. SSPD2 Computation for Blast Effect SSPD2x = normcdf(LET/2, 0, 0) − normcdf(−LET/2
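    Eq. (5.13) in the snippet is a straightforward weighted combination, and the blast-effect fragment uses a normal CDF. The sketch below evaluates both with a CDF built from math.erf, standing in for the report's normcdf; the lethal-extent and weighting values are made up for illustration.

```python
# Sketch of the weighted single-shot probability-of-damage combination
# SSPD = SSPD1*PNM + SSPD2*PHIT (Eq. 5.13 in the report's snippet).
# normcdf is reimplemented via math.erf; all numeric inputs are made up.

import math

def normcdf(x, mu=0.0, sigma=1.0):
    """Normal CDF, the role normcdf plays in the report's fragment."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def sspd(sspd1, sspd2, pnm, phit):
    """Combine direct-hit and near-miss damage probabilities.

    PNM + PHIT may be less than 1 when gross outliers fall in
    neither class, per the snippet's remark about guided weapons.
    """
    assert pnm + phit <= 1.0 + 1e-9
    return sspd1 * pnm + sspd2 * phit

def blast_component(let, sigma):
    """One axis of the blast term: probability the miss distance falls
    within the lethal extent LET, assuming a zero-mean normal spread."""
    return normcdf(let / 2, 0.0, sigma) - normcdf(-let / 2, 0.0, sigma)
```

    The blast component is symmetric about zero, so it is just the probability mass of the miss distribution inside [-LET/2, LET/2].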

  18. Structural Effects of Network Sampling Coverage I: Nodes Missing at Random.

    PubMed

    Smith, Jeffrey A; Moody, James

    2013-10-01

    Network measures assume a census of a well-bounded population. This level of coverage is rarely achieved in practice, however, and we have only limited information on the robustness of network measures to incomplete coverage. This paper examines the effect of node-level missingness on 4 classes of network measures: centrality, centralization, topology and homophily across a diverse sample of 12 empirical networks. We use a Monte Carlo simulation process to generate data with known levels of missingness and compare the resulting network scores to their known starting values. As with past studies (Borgatti et al 2006; Kossinets 2006), we find that measurement bias generally increases with more missing data. The exact rate and nature of this increase, however, varies systematically across network measures. For example, betweenness and Bonacich centralization are quite sensitive to missing data while closeness and in-degree are robust. Similarly, while the tau statistic and distance are difficult to capture with missing data, transitivity shows little bias even with very high levels of missingness. The results are also clearly dependent on the features of the network. Larger, more centralized networks are generally more robust to missing data, but this is especially true for centrality and centralization measures. More cohesive networks are robust to missing data when measuring topological features but not when measuring centralization. Overall, the results suggest that missing data may have quite large or quite small effects on network measurement, depending on the type of network and the question being posed.
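    The paper's simulation design, deleting a known fraction of nodes at random and comparing a measure on the sampled graph with its full-graph value, can be sketched as follows. Degree centralization is used as the example measure; the graph representation, the deletion rule, and the replication count are illustrative choices, not the authors' exact protocol.

```python
# Monte Carlo sketch of node-missingness bias: remove a fraction of
# nodes at random and compare a network measure on the sampled graph
# to its full-graph value. The measure (Freeman degree centralization)
# and all parameters below are illustrative assumptions.

import random

def degree_centralization(adj):
    """Freeman degree centralization of an undirected graph given as
    {node: set_of_neighbors}; 1.0 for a star, 0.0 for regular graphs."""
    n = len(adj)
    if n < 3:
        return 0.0
    degs = [len(nb) for nb in adj.values()]
    dmax = max(degs)
    return sum(dmax - d for d in degs) / ((n - 1) * (n - 2))

def drop_nodes(adj, frac, rng):
    """Keep a random (1 - frac) share of nodes; induced subgraph."""
    keep = set(rng.sample(sorted(adj), int(round(len(adj) * (1 - frac)))))
    return {v: nb & keep for v, nb in adj.items() if v in keep}

def missingness_bias(adj, frac, reps=200, seed=0):
    """Average sampled-graph measure minus the full-graph value."""
    rng = random.Random(seed)
    true = degree_centralization(adj)
    est = [degree_centralization(drop_nodes(adj, frac, rng))
           for _ in range(reps)]
    return sum(est) / reps - true
```

    Running this over several measures, deletion fractions, and empirical graphs is essentially the study's grid: the sign and growth of the bias with frac is what distinguishes robust measures (e.g. in-degree) from fragile ones (e.g. betweenness).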

  19. Academic Procrastination in Linking Motivation and Achievement-Related Behaviours: A Perspective of Expectancy-Value Theory

    ERIC Educational Resources Information Center

    Wu, Fan; Fan, Weihua

    2017-01-01

    The objective of the present study was to investigate the relationships among college students' achievement motivation (subjective task value and academic self-efficacy), academic procrastination (delay and missing deadlines) and achievement-related behaviours (effort and persistence). More specifically, the study investigated the mediating role…

  20. Quasifree (e,e'p) reaction on ³He

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jans, E.; Barreau, P.; Bernheim, M.

    1982-10-04

    The proton momentum distribution of ³He has been determined up to momenta of 310 MeV/c by use of the reaction ³He(e,e'p). The experimental missing energy resolution, ΔE_m = 1.2 MeV, was sufficient to separate the two- and three-body breakup channels. Results for the three-body disintegration have been obtained up to missing energy values of 80 MeV. The resulting spectral function is compared with the predictions of Faddeev and variational calculations.
