Science.gov

Sample records for filtering snps imputed

  1. Efficient genomewide selection of PCA-correlated tSNPs for genotype imputation.

    PubMed

    Javed, Asif; Drineas, Petros; Mahoney, Michael W; Paschou, Peristera

    2011-11-01

    The linkage disequilibrium structure of the human genome allows identification of small sets of single nucleotide polymorphisms (SNPs) (tSNPs) that efficiently represent dense sets of markers. This structure can be translated into linear algebraic terms as evidenced by the well documented principal components analysis (PCA)-based methods. Here we apply, for the first time, PCA-based methodology for efficient genomewide tSNP selection; and explore the linear algebraic structure of the human genome. Our algorithm divides the genome into contiguous nonoverlapping windows of high linear structure. Coupling this novel window definition with a PCA-based tSNP selection method, we analyze 2.5 million SNPs from the HapMap phase 2 dataset. We show that 10-25% of these SNPs suffice to predict the remaining genotypes with over 95% accuracy. A comparison with other popular methods in the ENCODE regions indicates significant genotyping savings. We evaluate the portability of genome-wide tSNPs across a diverse set of populations (HapMap phase 3 dataset). Interestingly, African populations are good reference populations for the rest of the world. Finally, we demonstrate the applicability of our approach in a real genome-wide disease association study. The chosen tSNP panels can be used toward genotype imputation using either a simple regression-based algorithm or more sophisticated genotype imputation methods.
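
    The "simple regression-based algorithm" mentioned at the end of this abstract is not specified there. As a minimal sketch of the general idea, assuming genotypes are coded 0/1/2 and that one untyped SNP is predicted from a single tag SNP by ordinary least squares, the function names `fit_line` and `impute_from_tag` below are hypothetical and are not taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # a monomorphic tag SNP carries no information
    return slope, mean_y - slope * mean_x


def impute_from_tag(tag_ref, target_ref, tag_new):
    """Predict an untyped SNP (0/1/2) in new samples from one tag SNP.

    tag_ref, target_ref: genotypes of the tag and target SNP in a reference panel.
    tag_new: tag-SNP genotypes in samples where the target SNP was not typed.
    """
    slope, intercept = fit_line(tag_ref, target_ref)
    # Round the fitted value to the nearest valid genotype code.
    return [min(2, max(0, round(slope * t + intercept))) for t in tag_new]


# Hypothetical reference panel in which the two SNPs are perfectly correlated.
reference_tag = [0, 1, 2, 0, 1, 2]
reference_target = [0, 1, 2, 0, 1, 2]
predicted = impute_from_tag(reference_tag, reference_target, [0, 2, 1])
```

    A PCA-based panel would typically regress each untyped SNP on several tSNPs at once; the single-predictor case is used here only to keep the sketch short.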

  2. Mining SNPs from EST sequences using filters and ensemble classifiers.

    PubMed

    Wang, J; Zou, Q; Guo, M Z

    2010-05-04

    Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in publicly available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and can help reduce the cost of biological analyses.

  3. Genotype imputation efficiency in Nelore Cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotype imputation efficiency in Nelore cattle was evaluated in different scenarios of lower density (LD) chips, imputation methods and sets of animals to have their genotypes imputed. Twelve commercial and virtual custom LD chips with densities varying from 7K to 75K SNPs were tested. Customized L...

  4. Imputation of missing genotypes: an empirical evaluation of IMPUTE

    PubMed Central

    Zhao, Zhenming; Timofeev, Nadia; Hartley, Stephen W; Chui, David HK; Fucharoen, Supan; Perls, Thomas T; Steinberg, Martin H; Baldwin, Clinton T; Sebastiani, Paola

    2008-01-01

    Background: Imputation of missing genotypes is becoming a very popular solution for synchronizing genotype data collected with different microarray platforms, but the effects of ethnic background, subject ascertainment, and amount of missing data on the accuracy of imputation are not well understood. Results: We evaluated the accuracy of the program IMPUTE to generate the genotype data of partially or fully untyped single nucleotide polymorphisms (SNPs). The program uses a model-based approach to imputation that reconstructs the genotype distribution given a set of referent haplotypes and the observed data, and uses this distribution to compute the marginal probability of each missing genotype for each individual subject that is used to impute the missing data. We assembled genome-wide data from five different studies and three different ethnic groups comprising Caucasians, African Americans and Asians. We randomly removed genotype data and then compared the observed genotypes with those generated by IMPUTE. Our analysis shows 97% median accuracy in Caucasian subjects when less than 10% of the SNPs are untyped and missing genotypes are accepted regardless of their posterior probability. The median accuracy increases to 99% when we require 0.95 minimum posterior probability for an imputed genotype to be acceptable. The accuracy decreases to 86% or 94% when subjects are African Americans or Asians. We propose a strategy to improve the accuracy by leveraging the level of admixture in African Americans. Conclusion: Our analysis suggests that IMPUTE is very accurate in samples of Caucasian origin, slightly less accurate in samples of Asian background, but substantially less accurate in samples of admixed background such as African Americans. Sample size and ascertainment do not seem to affect the accuracy of imputation. PMID:19077279
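
    The trade-off described above, between accepting every imputed genotype and requiring a minimum posterior probability such as 0.95, can be illustrated with a short sketch. It assumes each imputed genotype comes as three posterior probabilities for genotypes 0, 1 and 2; the function names and the toy data are hypothetical and do not reproduce IMPUTE's own output handling.

```python
def call_genotype(probs, min_posterior=0.0):
    """Return the most probable genotype (0, 1 or 2), or None if its
    posterior probability falls below min_posterior."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= min_posterior else None


def accuracy_and_call_rate(posteriors, truth, min_posterior=0.0):
    """Concordance among accepted calls, plus the fraction of genotypes accepted."""
    calls = [call_genotype(p, min_posterior) for p in posteriors]
    kept = [(c, t) for c, t in zip(calls, truth) if c is not None]
    if not kept:
        return None, 0.0
    accuracy = sum(c == t for c, t in kept) / len(kept)
    return accuracy, len(kept) / len(truth)


# Toy data (hypothetical): four masked genotypes and their posteriors.
posteriors = [
    (0.98, 0.02, 0.00),
    (0.10, 0.85, 0.05),
    (0.01, 0.03, 0.96),
    (0.55, 0.40, 0.05),
]
truth = [0, 1, 2, 1]

no_filter = accuracy_and_call_rate(posteriors, truth)            # accept everything
strict = accuracy_and_call_rate(posteriors, truth, 0.95)         # require >= 0.95
```

    In this toy example the threshold raises accuracy among the accepted calls while lowering the call rate, which is the cost of filtering on posterior probability.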

  5. Accuracy of genotype imputation in sheep breeds.

    PubMed

    Hayes, B J; Bowman, P J; Daetwyler, H D; Kijas, J W; van der Werf, J H J

    2012-02-01

    Although genomic selection offers the prospect of improving the rate of genetic gain in meat, wool and dairy sheep breeding programs, the key constraint is likely to be the cost of genotyping. Potentially, this constraint can be overcome by genotyping selection candidates for a low density (low cost) panel of SNPs with sparse genotype coverage, imputing a much higher density of SNP genotypes using a densely genotyped reference population. These imputed genotypes would then be used with a prediction equation to produce genomic estimated breeding values. In the future, it may also be desirable to impute very dense marker genotypes or even whole genome re-sequence data from moderate density SNP panels. Such a strategy could lead to an accurate prediction of genomic estimated breeding values across breeds, for example. We used genotypes from 48 640 (50K) SNPs genotyped in four sheep breeds to investigate both the accuracy of imputation of the 50K SNPs from low density SNP panels, as well as prospects for imputing very dense or whole genome re-sequence data from the 50K SNPs (by leaving out a small number of the 50K SNPs at random). Accuracy of imputation was low if the sparse panel had less than 5000 (5K) markers. Across breeds, it was clear that the accuracy of imputing from sparse marker panels to 50K was higher if the genetic diversity within a breed was lower, such that relationships among animals in that breed were higher. The accuracy of imputation from sparse genotypes to 50K genotypes was higher when the imputation was performed within breed rather than when pooling all the data, despite the fact that the pooled reference set was much larger. For Border Leicesters, Poll Dorsets and White Suffolks, 5K sparse genotypes were sufficient to impute 50K with 80% accuracy. For Merinos, the accuracy of imputing 50K from 5K was lower at 71%, despite a large number of animals with full genotypes (2215) being used as a reference. For all breeds, the relationship of

  6. Imputation accuracy is robust to cattle reference genome updates.

    PubMed

    Milanesi, M; Vicario, D; Stella, A; Valentini, A; Ajmone-Marsan, P; Biffani, S; Biscarini, F; Jansen, G; Nicolazzi, E L

    2015-02-01

    Genotype imputation is routinely applied in a large number of cattle breeds. Imputation has become a necessity due to the large number of SNP arrays with variable density (currently, from 2900 to 777,962 SNPs). Although many authors have studied the effect of different statistical methods on imputation accuracy, the impact of a (likely) change in the reference genome assembly on imputation from lower to higher density has not been determined so far. In this work, 1021 Italian Simmental SNP genotypes were remapped on the three most recent reference genome assemblies. Four imputation methods were used to assess the impact of an update in the reference genome. As expected, the four methods behaved differently, with large differences in terms of accuracy. Updating SNP coordinates on the three tested cattle reference genome assemblies produced only a slight variation in imputation results within each method.

  7. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to inferring missing genotypes. In this research the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the untyped SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. Running times differed between methods. The ELM was always the fastest algorithm. As the sample size increases, the RBF requires a long imputation time. The tested methods can be an alternative for imputation of untyped SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods also be used for imputation.

  8. Genotype imputation via matrix completion

    PubMed Central

    Chi, Eric C.; Zhou, Hua; Chen, Gary K.; Del Vecchyo, Diego Ortega; Lange, Kenneth

    2013-01-01

    Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546

  9. SNP panels/Imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Participants from thirteen countries discussed services that Interbull can perform or recommendations that Interbull can make to promote harmonization and assist member countries in improving their genomic evaluations in regard to SNP panels and imputation. The panel recommended: A mechanism to shar...

  10. Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

    PubMed

    Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

    2015-01-01

    Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had 3 different genotype array datasets (317k, 610k and 1 million SNPs), to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
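
    The per-bin summary used in this abstract (proportion of variants with info > 0.4 and mean info score in each MAF bin) can be sketched as follows. The input layout, a list of (alternate allele frequency, info score) pairs, the bin edges, and the function names are all assumptions made for illustration; they are not the authors' pipeline or the IMPUTE2 file format.

```python
def minor_allele_frequency(alt_freq):
    """Fold an alternate-allele frequency into a minor-allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def summarize_by_maf_bin(variants, bins, info_cutoff=0.4):
    """Summarize imputation quality per MAF bin.

    variants: iterable of (alt_allele_freq, info_score) pairs.
    bins: list of (low, high) MAF bounds, interpreted as low < MAF <= high.
    Returns {bin: {"n", "well_imputed", "mean_info"}} for non-empty bins.
    """
    summary = {}
    for low, high in bins:
        infos = [info for freq, info in variants
                 if low < minor_allele_frequency(freq) <= high]
        if not infos:
            continue
        summary[(low, high)] = {
            "n": len(infos),
            "well_imputed": sum(i > info_cutoff for i in infos) / len(infos),
            "mean_info": sum(infos) / len(infos),
        }
    return summary


# Toy data (hypothetical): very rare, low-frequency and common variants.
variants = [(0.002, 0.10), (0.001, 0.50), (0.02, 0.60),
            (0.97, 0.30), (0.30, 0.95), (0.80, 0.90)]
bins = [(0.0, 0.003), (0.003, 0.05), (0.05, 0.5)]
summary = summarize_by_maf_bin(variants, bins)

# Keep only variants that pass the info filter for downstream analysis.
passing = [v for v in variants if v[1] > 0.4]
```

    The 0.4 cutoff is the one quoted in the abstract; other studies use different thresholds, so it is exposed as a parameter here.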

  11. A reference panel of 64,976 haplotypes for genotype imputation

    PubMed Central

    McCarthy, Shane; Das, Sayantan; Kretzschmar, Warren; Delaneau, Olivier; Wood, Andrew R.; Teumer, Alexander; Kang, Hyun Min; Fuchsberger, Christian; Danecek, Petr; Sharp, Kevin; Luo, Yang; Sidore, Carlo; Kwong, Alan; Timpson, Nicholas; Koskinen, Seppo; Vrieze, Scott; Scott, Laura J.; Zhang, He; Mahajan, Anubha; Veldink, Jan; Peters, Ulrike; Pato, Carlos; van Duijn, Cornelia M.; Gillies, Christopher E.; Gandin, Ilaria; Mezzavilla, Massimo; Gilly, Arthur; Cocca, Massimiliano; Traglia, Michela; Angius, Andrea; Barrett, Jeffrey; Boomsma, Dorret I.; Branham, Kari; Breen, Gerome; Brummet, Chad; Busonero, Fabio; Campbell, Harry; Chan, Andrew; Chen, Sai; Chew, Emily; Collins, Francis S.; Corbin, Laura; Davey Smith, George; Dedoussis, George; Dorr, Marcus; Farmaki, Aliki-Eleni; Ferrucci, Luigi; Forer, Lukas; Fraser, Ross M.; Gabriel, Stacey; Levy, Shawn; Groop, Leif; Harrison, Tabitha; Hattersley, Andrew; Holmen, Oddgeir L.; Hveem, Kristian; Kretzler, Matthias; Lee, James; McGue, Matt; Meitinger, Thomas; Melzer, David; Min, Josine; Mohlke, Karen L.; Vincent, John; Nauck, Matthias; Nickerson, Deborah; Palotie, Aarno; Pato, Michele; Pirastu, Nicola; McInnis, Melvin; Richards, Brent; Sala, Cinzia; Salomaa, Veikko; Schlessinger, David; Schoenheer, Sebastian; Slagboom, P Eline; Small, Kerrin; Spector, Timothy; Stambolian, Dwight; Tuke, Marcus; Tuomilehto, Jaakko; Van den Berg, Leonard; Van Rheenen, Wouter; Volker, Uwe; Wijmenga, Cisca; Toniolo, Daniela; Zeggini, Eleftheria; Gasparini, Paolo; Sampson, Matthew G.; Wilson, James F.; Frayling, Timothy; de Bakker, Paul; Swertz, Morris A.; McCarroll, Steven; Kooperberg, Charles; Dekker, Annelot; Altshuler, David; Willer, Cristen; Iacono, William; Ripatti, Samuli; Soranzo, Nicole; Walter, Klaudia; Swaroop, Anand; Cucca, Francesco; Anderson, Carl; Boehnke, Michael; McCarthy, Mark I.; Durbin, Richard; Abecasis, Gonçalo; Marchini, Jonathan

    2017-01-01

    We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1%, a large increase in the number of SNPs tested in association studies and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently. PMID:27548312

  12. A reference panel of 64,976 haplotypes for genotype imputation.

    PubMed

    McCarthy, Shane; Das, Sayantan; Kretzschmar, Warren; Delaneau, Olivier; Wood, Andrew R; Teumer, Alexander; Kang, Hyun Min; Fuchsberger, Christian; Danecek, Petr; Sharp, Kevin; Luo, Yang; Sidore, Carlo; Kwong, Alan; Timpson, Nicholas; Koskinen, Seppo; Vrieze, Scott; Scott, Laura J; Zhang, He; Mahajan, Anubha; Veldink, Jan; Peters, Ulrike; Pato, Carlos; van Duijn, Cornelia M; Gillies, Christopher E; Gandin, Ilaria; Mezzavilla, Massimo; Gilly, Arthur; Cocca, Massimiliano; Traglia, Michela; Angius, Andrea; Barrett, Jeffrey C; Boomsma, Dorrett; Branham, Kari; Breen, Gerome; Brummett, Chad M; Busonero, Fabio; Campbell, Harry; Chan, Andrew; Chen, Sai; Chew, Emily; Collins, Francis S; Corbin, Laura J; Smith, George Davey; Dedoussis, George; Dorr, Marcus; Farmaki, Aliki-Eleni; Ferrucci, Luigi; Forer, Lukas; Fraser, Ross M; Gabriel, Stacey; Levy, Shawn; Groop, Leif; Harrison, Tabitha; Hattersley, Andrew; Holmen, Oddgeir L; Hveem, Kristian; Kretzler, Matthias; Lee, James C; McGue, Matt; Meitinger, Thomas; Melzer, David; Min, Josine L; Mohlke, Karen L; Vincent, John B; Nauck, Matthias; Nickerson, Deborah; Palotie, Aarno; Pato, Michele; Pirastu, Nicola; McInnis, Melvin; Richards, J Brent; Sala, Cinzia; Salomaa, Veikko; Schlessinger, David; Schoenherr, Sebastian; Slagboom, P Eline; Small, Kerrin; Spector, Timothy; Stambolian, Dwight; Tuke, Marcus; Tuomilehto, Jaakko; Van den Berg, Leonard H; Van Rheenen, Wouter; Volker, Uwe; Wijmenga, Cisca; Toniolo, Daniela; Zeggini, Eleftheria; Gasparini, Paolo; Sampson, Matthew G; Wilson, James F; Frayling, Timothy; de Bakker, Paul I W; Swertz, Morris A; McCarroll, Steven; Kooperberg, Charles; Dekker, Annelot; Altshuler, David; Willer, Cristen; Iacono, William; Ripatti, Samuli; Soranzo, Nicole; Walter, Klaudia; Swaroop, Anand; Cucca, Francesco; Anderson, Carl A; Myers, Richard M; Boehnke, Michael; McCarthy, Mark I; Durbin, Richard

    2016-10-01

    We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

  13. Effect of reference population size and available ancestor genotypes on imputation of Mexican Holstein genotypes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The effects of reference population size and the availability of information from genotyped ancestors on the accuracy of imputation of single nucleotide polymorphisms (SNPs) were investigated for Mexican Holstein cattle. Three scenarios for reference population size were examined: (1) a local popula...

  14. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    PubMed

    Kim, Kwangwoo; Bang, So-Young; Lee, Hye-Soon; Bae, Sang-Cheol

    2014-01-01

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.
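
    Concordance at 2-digit and 4-digit resolution can be illustrated with a small sketch. It assumes alleles are written in colon-separated field notation (for example `A*02:01`, where the first field corresponds to 2-digit and the first two fields to 4-digit resolution) and that concordance is counted per allele, ignoring the order of the two alleles within an individual. The exact definition used in the study is not given in the abstract, and the function names and data are hypothetical.

```python
def truncate_allele(allele, fields):
    """Reduce an HLA allele name to its first `fields` colon-separated fields."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def concordance(imputed, typed, fields):
    """Fraction of imputed alleles matching the typed alleles at a given resolution.

    imputed, typed: lists of (allele1, allele2) pairs, one pair per individual.
    fields: 1 for 2-digit resolution, 2 for 4-digit resolution.
    """
    matched = total = 0
    for imputed_pair, typed_pair in zip(imputed, typed):
        remaining = [truncate_allele(a, fields) for a in typed_pair]
        for allele in imputed_pair:
            total += 1
            short = truncate_allele(allele, fields)
            if short in remaining:
                remaining.remove(short)  # each typed allele can be matched once
                matched += 1
    return matched / total


# Toy data (hypothetical): two individuals typed and imputed at HLA-A.
typed = [("A*02:01", "A*24:02"), ("A*33:03", "A*02:06")]
imputed = [("A*24:02", "A*02:01"), ("A*33:03", "A*02:01")]

two_digit = concordance(imputed, typed, fields=1)
four_digit = concordance(imputed, typed, fields=2)
```

    Here the second individual's `A*02:01` call agrees with the typed `A*02:06` at 2-digit but not at 4-digit resolution, which is why concordance is usually lower at the finer resolution.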

  15. The use of family relationships and linkage disequilibrium to impute phase and missing genotypes in up to whole-genome sequence density genotypic data.

    PubMed

    Meuwissen, Theo; Goddard, Mike

    2010-08-01

    A novel method, called linkage disequilibrium multilocus iterative peeling (LDMIP), for the imputation of phase and missing genotypes is developed. LDMIP performs an iterative peeling step for every locus, which accounts for the family data, and uses a forward-backward algorithm to accumulate information across loci. Marker similarity between haplotype pairs is used to impute possible missing genotypes and phases, which relies on the linkage disequilibrium between closely linked markers. After this imputation step, the combined iterative peeling/forward-backward algorithm is applied again, until convergence. The calculations per iteration scale linearly with number of markers and number of individuals in the pedigree, which makes LDMIP well suited to large numbers of markers and/or large numbers of individuals. Per iteration calculations scale quadratically with the number of alleles, which implies biallelic markers are preferred. In a situation with up to 15% randomly missing genotypes, the error rate of the imputed genotypes was <1% and approximately 99% of the missing genotypes were imputed. In another example, LDMIP was used to impute whole-genome sequence data consisting of 17,321 SNPs on a chromosome. Imputation of the sequence was based on the information of 20 (re)sequenced founder individuals and genotyping their descendants for a panel of 3000 SNPs. The error rate of the imputed SNP genotypes was 10%. However, if the parents of these 20 founders are also sequenced, >99% of missing genotypes are imputed correctly.

  16. The utility of low-density genotyping for imputation in the Thoroughbred horse

    PubMed Central

    2014-01-01

    Background: Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. Results: Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money. Conclusions: Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming

  17. Japonica array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals

    PubMed Central

    Kawai, Yosuke; Mimori, Takahiro; Kojima, Kaname; Nariai, Naoki; Danjoh, Inaho; Saito, Rumiko; Yasuda, Jun; Yamamoto, Masayuki; Nagasaki, Masao

    2015-01-01

    The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659 253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF)>5%), the genomic coverage of the Japonica array (r2>0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%imputations. PMID:26108142

  18. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    PubMed Central

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-01-01

    Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase. PMID:27762341

  19. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    NASA Astrophysics Data System (ADS)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.

  20. Genotype Imputation with Millions of Reference Samples.

    PubMed

    Browning, Brian L; Browning, Sharon R

    2016-01-07

    We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle's throughput was more than 100× greater than Impute2's throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs.
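
    The interpolation step described above can be sketched in a few lines. The sketch assumes that the hidden-state probabilities over reference haplotypes have already been computed at the two genotyped markers flanking an ungenotyped variant, and that they are interpolated linearly in some marker coordinate; whether that coordinate is a base-pair or genetic-map position is not stated in the abstract. The function name and data are hypothetical, and this is not Beagle's code.

```python
def impute_alt_probability(pos_left, probs_left, pos_right, probs_right,
                           pos_target, ref_alleles_at_target):
    """Probability that a target haplotype carries the alternate allele
    at an ungenotyped variant lying between two genotyped markers.

    probs_left, probs_right: probability of copying each reference haplotype
        at the left and right genotyped markers (each list sums to 1).
    ref_alleles_at_target: allele (0 = ref, 1 = alt) carried by each
        reference haplotype at the ungenotyped variant.
    """
    weight = (pos_target - pos_left) / (pos_right - pos_left)
    # Linear interpolation of the state probabilities between the two markers.
    state = [(1.0 - weight) * a + weight * b
             for a, b in zip(probs_left, probs_right)]
    # Sum the probability mass of reference haplotypes carrying the alt allele.
    return sum(p for p, allele in zip(state, ref_alleles_at_target) if allele == 1)


# Toy data (hypothetical): three reference haplotypes, variant midway
# between genotyped markers at positions 100 and 200.
alt_prob = impute_alt_probability(
    100, [0.8, 0.1, 0.1],
    200, [0.2, 0.7, 0.1],
    150, [0, 1, 1],
)
```

    Because the probability model only has to be evaluated at genotyped markers, the cost per ungenotyped variant reduces to this interpolation and sum, which is consistent with the speed and memory claims in the abstract.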

  1. Genotype Imputation To Improve the Cost-Efficiency of Genomic Selection in Farmed Atlantic Salmon

    PubMed Central

    Tsai, Hsin-Yuan; Matika, Oswald; Edwards, Stefan McKinnon; Antolín–Sánchez, Roberto; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Gharbi, Karim; Stear, Michael J.; Taggart, John B.; Bron, James E.; Hickey, John M.; Houston, Ross D.

    2017-01-01

    Genomic selection uses genome-wide marker information to predict breeding values for traits of economic interest, and is more accurate than pedigree-based methods. The development of high density SNP arrays for Atlantic salmon has enabled genomic selection in selective breeding programs, alongside high-resolution association mapping of the genetic basis of complex traits. However, in sibling testing schemes typical of salmon breeding programs, trait records are available on many thousands of fish with close relationships to the selection candidates. Therefore, routine high density SNP genotyping may be prohibitively expensive. One means of reducing genotyping cost is the use of genotype imputation, where selected key animals (e.g., breeding program parents) are genotyped at high density, and the majority of individuals (e.g., performance tested fish and selection candidates) are genotyped at much lower density, followed by imputation to high density. The main objectives of the current study were to assess the feasibility and accuracy of genotype imputation in the context of a salmon breeding program. The specific aims were: (i) to measure the accuracy of genotype imputation using medium (25 K) and high (78 K) density mapped SNP panels, by masking varying proportions of the genotypes and assessing the correlation between the imputed genotypes and the true genotypes; and (ii) to assess the efficacy of imputed genotype data in genomic prediction of key performance traits (sea lice resistance and body weight). Imputation accuracies of up to 0.90 were observed using the simple two-generation pedigree dataset, and moderately high accuracy (0.83) was possible even with very low density SNP data (∼250 SNPs). The performance of genomic prediction using imputed genotype data was comparable to using true genotype data, and both were superior to pedigree-based prediction. These results demonstrate that the genotype imputation approach used in this study can provide a cost
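
    Aim (i) above measures accuracy as the correlation between imputed and true genotypes at masked positions. A minimal sketch of that calculation follows, assuming genotypes coded 0/1/2 and a known list of masked indices; the function names and the toy data are hypothetical, and the imputation step itself is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def masked_accuracy(true_genotypes, imputed_genotypes, masked_indices):
    """Correlation between true and imputed genotypes, restricted to the
    positions that were masked before imputation."""
    true_masked = [true_genotypes[i] for i in masked_indices]
    imputed_masked = [imputed_genotypes[i] for i in masked_indices]
    return pearson(true_masked, imputed_masked)


# Toy data (hypothetical): six genotypes, four of which were masked.
true_genotypes = [0, 1, 2, 2, 1, 0]
imputed_genotypes = [0, 1, 2, 1, 1, 0]
masked = [0, 1, 2, 3]

accuracy = masked_accuracy(true_genotypes, imputed_genotypes, masked)
```

    Restricting the calculation to masked positions matters: including genotypes that were never hidden would inflate the correlation.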

  2. Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.

    PubMed

    Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun

    2016-05-01

    DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS).

  3. What Improves with Increased Missing Data Imputations?

    ERIC Educational Resources Information Center

    Bodner, Todd E.

    2008-01-01

    When using multiple imputation in the analysis of incomplete data, a prominent guideline suggests that more than 10 imputed data values are seldom needed. This article calls into question the optimism of this guideline and illustrates that important quantities (e.g., p values, confidence interval half-widths, and estimated fractions of missing…

  4. 16 CFR 1115.11 - Imputed knowledge.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 16 Commercial Practices 2 2010-01-01 2010-01-01 false Imputed knowledge. 1115.11 Section 1115.11... PRODUCT HAZARD REPORTS General Interpretation § 1115.11 Imputed knowledge. (a) In evaluating whether or... care to ascertain the truth of complaints or other representations. This includes the knowledge a...

  5. Imputation of missing data in time series for air pollutants

    NASA Astrophysics Data System (ADS)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are major concerns in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

  6. Fast imputation using medium- or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  7. A Study of Imputation Algorithms. Working Paper Series.

    ERIC Educational Resources Information Center

    Hu, Ming-xiu; Salvucci, Sameena

    Many imputation techniques and imputation software packages have been developed over the years to deal with missing data. Different methods may work well under different circumstances, and it is advisable to conduct a sensitivity analysis when choosing an imputation method for a particular survey. This study reviewed about 30 imputation methods…

  8. One Thousand Genomes Imputation in the National Cancer Institute Breast and Prostate Cancer Cohort Consortium Aggressive Prostate Cancer Genome-wide Association Study

    PubMed Central

    Machiela, Mitchell J.; Chen, Constance; Liang, Liming; Diver, W. Ryan; Stevens, Victoria L.; Tsilidis, Konstantinos K.; Haiman, Christopher A.; Chanock, Stephen J.; Hunter, David J.; Kraft, Peter

    2014-01-01

    BACKGROUND Genotype imputation substantially increases available markers for analysis in genome-wide association studies (GWAS) by leveraging linkage disequilibrium from a reference panel. We sought to (i) investigate the performance of imputation from the August 2010 release of the 1000 Genomes Project (1000GP) in an existing GWAS of prostate cancer, (ii) look for novel associations with prostate cancer risk, (iii) fine-map known prostate cancer susceptibility regions using an approximate Bayesian framework and stepwise regression, and (iv) compare power and efficiency of imputation and de novo sequencing. METHODS We used 2,782 aggressive prostate cancer cases and 4,458 controls from the NCI Breast and Prostate Cancer Cohort Consortium aggressive prostate cancer GWAS to infer 5.8 million well-imputed autosomal single nucleotide polymorphisms. RESULTS Imputation quality, as measured by correlation between imputed and true allele counts, was higher among common variants than rare variants. We found no novel prostate cancer associations among a subset of 1.2 million well-imputed low-frequency variants. At a genome-wide sequencing cost of $2,500, imputation from SNP arrays is a more powerful strategy than sequencing for detecting disease associations of SNPs with minor allele frequencies above 1%. CONCLUSIONS 1000GP imputation provided dense coverage of previously-identified prostate cancer susceptibility regions, highlighting its potential as an inexpensive first-pass approach to fine-mapping in regions such as 5p15 and 8q24. Our study shows 1000GP imputation can accurately identify low-frequency variants and stresses the importance of large sample size when studying these variants. PMID:23255287
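
    The quality measure named in this abstract, the correlation between imputed and true allele counts, can be illustrated with a minimal Python sketch. It assumes true genotypes are coded as allele counts (0, 1, 2) and imputed genotypes as expected allele counts (dosages) for one variant; the function name and the coding are illustrative assumptions, not taken from the study.

```python
from math import sqrt


def dosage_correlation(true_counts, imputed_dosages):
    """Pearson correlation between true allele counts and imputed dosages.

    Both arguments are equal-length sequences for a single variant,
    one entry per individual (hypothetical coding: 0, 1 or 2 copies).
    """
    n = len(true_counts)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")

    mean_true = sum(true_counts) / n
    mean_imp = sum(imputed_dosages) / n

    cov = sum((t - mean_true) * (d - mean_imp)
              for t, d in zip(true_counts, imputed_dosages))
    var_true = sum((t - mean_true) ** 2 for t in true_counts)
    var_imp = sum((d - mean_imp) ** 2 for d in imputed_dosages)

    # A monomorphic variant has no variance, so the correlation is undefined.
    if var_true == 0 or var_imp == 0:
        return float("nan")
    return cov / sqrt(var_true * var_imp)
```

    Because rare variants show little variation in allele count across individuals, a few imputation errors can move this correlation a great deal, which is consistent with the lower quality the abstract reports for rare variants.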

  9. Comparison of HLA allelic imputation programs.

    PubMed

    Karnes, Jason H; Shaffer, Christian M; Bastarache, Lisa; Gaudieri, Silvana; Glazer, Andrew M; Steiner, Heidi E; Mosley, Jonathan D; Mallal, Simon; Denny, Joshua C; Phillips, Elizabeth J; Roden, Dan M

    2017-01-01

    Imputation of human leukocyte antigen (HLA) alleles from SNP-level data is attractive due to the importance of HLA alleles in human disease, the widespread availability of genome-wide association study (GWAS) data, and the expertise required for HLA sequencing. However, comprehensive evaluations of HLA imputation programs are limited. We compared HLA imputation results of HIBAG, SNP2HLA, and HLA*IMP:02 to sequenced HLA alleles in 3,265 samples from BioVU, a de-identified electronic health record database coupled to a DNA biorepository. We performed four-digit HLA sequencing for HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1 using long-read 454 FLX sequencing. All samples were genotyped using both the Illumina HumanExome BeadChip platform and a GWAS platform. Call rates and concordance rates were compared by platform, frequency of allele, and race/ethnicity. Overall concordance rates were similar between programs in European Americans (EA) (0.975 [SNP2HLA]; 0.939 [HLA*IMP:02]; 0.976 [HIBAG]). SNP2HLA provided a significant advantage in terms of call rate and the number of alleles imputed. Concordance rates were lower overall for African Americans (AAs). These observations were consistent when accuracy was compared across HLA loci. All imputation programs performed similarly for low frequency HLA alleles. Higher concordance rates were observed when HLA alleles were imputed from GWAS platforms versus the HumanExome BeadChip, suggesting that high genomic coverage is preferred as input for HLA allelic imputation. These findings provide guidance on the best use of HLA imputation methods and elucidate their limitations.

  10. Comparison of HLA allelic imputation programs

    PubMed Central

    Shaffer, Christian M.; Bastarache, Lisa; Gaudieri, Silvana; Glazer, Andrew M.; Steiner, Heidi E.; Mosley, Jonathan D.; Mallal, Simon; Denny, Joshua C.; Phillips, Elizabeth J.; Roden, Dan M.

    2017-01-01

    Imputation of human leukocyte antigen (HLA) alleles from SNP-level data is attractive due to the importance of HLA alleles in human disease, the widespread availability of genome-wide association study (GWAS) data, and the expertise required for HLA sequencing. However, comprehensive evaluations of HLA imputation programs are limited. We compared HLA imputation results of HIBAG, SNP2HLA, and HLA*IMP:02 to sequenced HLA alleles in 3,265 samples from BioVU, a de-identified electronic health record database coupled to a DNA biorepository. We performed four-digit HLA sequencing for HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1 using long-read 454 FLX sequencing. All samples were genotyped using both the Illumina HumanExome BeadChip platform and a GWAS platform. Call rates and concordance rates were compared by platform, frequency of allele, and race/ethnicity. Overall concordance rates were similar between programs in European Americans (EA) (0.975 [SNP2HLA]; 0.939 [HLA*IMP:02]; 0.976 [HIBAG]). SNP2HLA provided a significant advantage in terms of call rate and the number of alleles imputed. Concordance rates were lower overall for African Americans (AAs). These observations were consistent when accuracy was compared across HLA loci. All imputation programs performed similarly for low frequency HLA alleles. Higher concordance rates were observed when HLA alleles were imputed from GWAS platforms versus the HumanExome BeadChip, suggesting that high genomic coverage is preferred as input for HLA allelic imputation. These findings provide guidance on the best use of HLA imputation methods and elucidate their limitations. PMID:28207879

  11. Dual imputation model for incomplete longitudinal data.

    PubMed

    Jolani, Shahab; Frank, Laurence E; van Buuren, Stef

    2014-05-01

    Missing values are a practical issue in the analysis of longitudinal data. Multiple imputation (MI) is a well-known likelihood-based method that has optimal properties in terms of efficiency and consistency if the imputation model is correctly specified. Doubly robust (DR) weighting-based methods protect against misspecification bias if one of the models, but not necessarily both, for the data or the mechanism leading to missing data is correct. We propose a new imputation method that captures the simplicity of MI and protection from the DR method. This method integrates MI and DR to protect against misspecification of the imputation model under a missing at random assumption. Our method avoids analytical complications of missing data particularly in multivariate settings, and is easy to implement in standard statistical packages. Moreover, the proposed method works very well with an intermittent pattern of missingness when other DR methods cannot be used. Simulation experiments show that the proposed approach achieves improved performance when one of the models is correct. The method is applied to data from the fireworks disaster study, a randomized clinical trial comparing therapies in disaster-exposed children. We conclude that the new method increases the robustness of imputations.

  12. Multiple imputation: dealing with missing data.

    PubMed

    de Goeij, Moniek C M; van Diepen, Merel; Jager, Kitty J; Tripepi, Giovanni; Zoccali, Carmine; Dekker, Friedo W

    2013-10-01

    In many fields, including the field of nephrology, missing data are unfortunately an unavoidable problem in clinical/epidemiological research. The most common methods for dealing with missing data are complete case analysis (excluding patients with missing data), mean substitution (replacing missing values of a variable with the average of known values for that variable), and last observation carried forward. However, these methods have severe drawbacks potentially resulting in biased estimates and/or standard errors. In recent years, a new method has arisen for dealing with missing data called multiple imputation. This method predicts missing values based on other data present in the same patient. This procedure is repeated several times, resulting in multiple imputed data sets. Thereafter, estimates and standard errors are calculated in each imputation set and pooled into one overall estimate and standard error. The main advantage of this method is that missing data uncertainty is taken into account. Another advantage is that the method of multiple imputation gives unbiased results when data are missing at random, which is the most common type of missing data in clinical practice, whereas conventional methods do not. However, the method of multiple imputation has scarcely been used in medical literature. We, therefore, encourage authors to do so in the future when possible.
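
    The pooling step described in this abstract (one estimate and one standard error per imputed data set, combined into a single overall estimate and standard error) can be sketched as follows. The sketch assumes the pooling rules commonly attributed to Rubin, which the abstract itself does not name; the function name is hypothetical.

```python
from statistics import mean, variance


def pool_estimates(estimates, standard_errors):
    """Combine per-imputation results into one estimate and standard error.

    Assumes Rubin's rules: the pooled estimate is the mean of the
    per-imputation estimates, and the total variance is the mean
    within-imputation variance plus (1 + 1/m) times the
    between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least 2 imputations with matching lengths")

    pooled = mean(estimates)
    within = mean(se ** 2 for se in standard_errors)   # average squared SE
    between = variance(estimates)                      # sample variance (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5
```

    The between-imputation term is how the uncertainty caused by the missing data enters the pooled standard error. Analysing a single imputed data set as if it were complete would omit that term.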

  13. Automatic Treatment Planning with Convex Imputing

    NASA Astrophysics Data System (ADS)

    Sayre, G. A.; Ruan, D.

    2014-03-01

    Current inverse optimization-based treatment planning for radiotherapy requires a set of complex DVH objectives to be simultaneously minimized. This process, known as multi-objective optimization, is challenging due to non-convexity in individual objectives and insufficient knowledge in the tradeoffs among the objective set. As such, clinical practice involves numerous iterations of human intervention that is costly and often inconsistent. In this work, we propose to address treatment planning with convex imputing, a new data-mining technique that explores the existence of a latent convex objective whose optimizer reflects the DVH and dose-shaping properties of previously optimized cases. Using ten clinical prostate cases as the basis for comparison, we imputed a simple least-squares problem from the optimized solutions of the prostate cases, and show that the imputed plans are more consistent than their clinical counterparts in achieving planning goals.

  14. Selecting SNPs to Identify Ancestry

    PubMed Central

    Sampson, Joshua; Kidd, Kenneth K; Kidd, Judith R; Zhao, Hongyu

    2011-01-01

    Background/Aims An individual’s genotypes at a group of Single Nucleotide Polymorphisms (SNPs) can be used to predict that individual’s ethnicity, or ancestry. In medical studies, knowledge of a subject’s ancestry can minimize possible confounding, and in forensic applications, such knowledge can help direct investigations. Our goal is to select a small subset of SNPs, from the millions already identified in the human genome, that can predict ancestry with a minimal error rate. Methods The general form for this variable selection procedure is to estimate the expected error rates for sets of SNPs using a training dataset and consider those sets with the lowest error rates given their size. The quality of the estimate for the error rate determines the quality of the resulting SNPs. As the apparent error rate performs poorly when either the number of SNPs or the number of populations is large, we propose a new estimate, the Improved Bayesian Estimate. Conclusions We demonstrate that selection procedures based on this estimate produce small sets of SNPs that can accurately predict ancestry. We also provide a list of the 100 optimal SNPs for identifying ancestry. R functions are available at http://bioinformatics.med.yale.edu/group/josh/index.html. PMID:21668909

  15. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  16. Multiple Imputation of Multilevel Missing Data-Rigor versus Simplicity

    ERIC Educational Resources Information Center

    Drechsler, Jörg

    2015-01-01

    Multiple imputation is widely accepted as the method of choice to address item-nonresponse in surveys. However, research on imputation strategies for the hierarchical structures that are typically found in the data in educational contexts is still limited. While a multilevel imputation model should be preferred from a theoretical point of view if…

  17. Marker imputation in barley association studies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Association mapping requires higher marker density than linkage mapping, potentially leading to more missing marker data and to higher genotyping costs. In human genetics, methods exist to impute missing marker data and whole markers that were typed in a reference panel but not in the experimental d...

  18. Missing value imputation strategies for metabolomics data.

    PubMed

    Armitage, Emily Grace; Godzien, Joanna; Alonso-Herranz, Vanesa; López-Gonzálvez, Ángeles; Barbas, Coral

    2015-12-01

    Missing values can arise for different reasons and, depending on their origin, should be considered and dealt with in different ways. In this research, four methods of imputation have been compared with respect to revealing their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, the effects of different strategies for controlling familywise error rate or false discovery and how they work with the different strategies for missing value imputation have been evaluated. Missing values were found to affect normality and variance of data and k-means nearest neighbor imputation was the best method tested for restoring this. Bonferroni correction was the best method for maximizing true positives and minimizing false positives and it was observed that as low as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a "gray area" and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy that was k-means nearest neighbor and the best approximation of positioning real zeros.

  19. Short communication: imputing genotypes using PedImpute fast algorithm combining pedigree and population information.

    PubMed

    Nicolazzi, E L; Biffani, S; Jansen, G

    2013-04-01

    Routine genomic evaluations frequently include a preliminary imputation step, requiring high accuracy and reduced computing time. A new algorithm, PedImpute (http://dekoppel.eu/pedimpute/), was developed and compared with findhap (http://aipl.arsusda.gov/software/findhap/) and BEAGLE (http://faculty.washington.edu/browning/beagle/beagle.html), using 19,904 Holstein genotypes from a 4-country international collaboration (United States, Canada, UK, and Italy). Different scenarios were evaluated on a sample subset that included only single nucleotide polymorphisms from the Bovine low-density (LD) Illumina BeadChip (Illumina Inc., San Diego, CA). Comparative criteria were computing time, percentage of missing alleles, percentage of wrongly imputed alleles, and the allelic squared correlation. Imputation accuracy on ungenotyped animals was also analyzed. The algorithm PedImpute was slightly more accurate and faster than findhap and BEAGLE when sire, dam, and maternal grandsire were genotyped at high density. On the other hand, BEAGLE performed better than both PedImpute and findhap for animals with at least one close relative not genotyped or genotyped at low density. However, computing time and resources using BEAGLE were incompatible with routine genomic evaluations in Italy. Error rate and allelic squared correlation attained by PedImpute ranged from 0.2 to 1.1% and from 96.6 to 99.3%, respectively. When complete genomic information on sire, dam, and maternal grandsire is available, as is expected to be the case in the near future in (at least) dairy cattle, and considering accuracies obtained and computation time required, PedImpute represents a valuable choice in routine evaluations among the algorithms tested.

  20. Data Driven Estimation of Imputation Error—A Strategy for Imputation with a Reject Option

    PubMed Central

    Bak, Nikolaj; Hansen, Lars K.

    2016-01-01

    Missing data is a common problem in many research fields and is a challenge that always needs careful consideration. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values indiscriminately. We note that the effects of imputation can be strongly dependent on what is missing. To help make decisions about which records should be imputed, we propose to use a machine learning approach to estimate the imputation error for each case with missing data. The method is thought to be a practical approach to help users using imputation after the informed choice to impute the missing data has been made. To do this all patterns of missing values are simulated in all complete cases, enabling calculation of the “true error” in each of these new cases. The error is then estimated for each case with missing values by weighting the “true errors” by similarity. The method can also be used to test the performance of different imputation methods. A universal numerical threshold of acceptable error cannot be set since this will differ according to the data, research question, and analysis method. The effect of threshold can be estimated using the complete cases. The user can set an a priori relevant threshold for what is acceptable or use cross validation with the final analysis to choose the threshold. The choice can be presented along with argumentation for the choice rather than holding to conventions that might not be warranted in the specific dataset. PMID:27723782

  1. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    PubMed Central

    McClure, Matthew C.; Sonstegard, Tad S.; Wiggans, George R.; Van Eenennaam, Alison L.; Weber, Kristina L.; Penedo, Cecilia T.; Berry, Donagh P.; Flynn, John; Garcia, Jose F.; Carmo, Adriana S.; Regitano, Luciana C. A.; Albuquerque, Milla; Silva, Marcos V. G. B.; Machado, Marco A.; Coffey, Mike; Moore, Kirsty; Boscher, Marie-Yvonne; Genestout, Lucie; Mazza, Raffaele; Taylor, Jeremy F.; Schnabel, Robert D.; Simpson, Barry; Marques, Elisa; McEwan, John C.; Cromie, Andrew; Coutinho, Luiz L.; Kuehn, Larry A.; Keele, John W.; Piper, Emily K.; Cook, Jim; Williams, Robert; Van Tassell, Curtis P.

    2013-01-01

    To assist cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could accurately be imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes, by MS genotyping animals whose imputed MS alleles fail parentage verification, and then incorporating those animals into the reference dataset. PMID:24065982

  2. Comparing performance of modern genotype imputation methods in different ethnicities

    NASA Astrophysics Data System (ADS)

    Roshyara, Nab Raj; Horn, Katrin; Kirsten, Holger; Ahnert, Peter; Scholz, Markus

    2016-10-01

    A variety of modern software packages are available for genotype imputation relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good quality genotypes were masked and re-imputed by different imputation frameworks: namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed-reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei’s GST ≤ 0.010). For small to medium datasets, frameworks using the genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population based German samples, the disadvantage of pre-phasing decreases for larger sample sizes.

  3. Comparing performance of modern genotype imputation methods in different ethnicities

    PubMed Central

    Roshyara, Nab Raj; Horn, Katrin; Kirsten, Holger; Ahnert, Peter; Scholz, Markus

    2016-01-01

    A variety of modern software packages are available for genotype imputation relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good quality genotypes were masked and re-imputed by different imputation frameworks: namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed-reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei’s GST ≤ 0.010). For small to medium datasets, frameworks using the genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population based German samples, the disadvantage of pre-phasing decreases for larger sample sizes. PMID:27698363

  4. Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs

    PubMed Central

    2012-01-01

    Background We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. The following issues were evaluated: (a) the impact of different reference panels (HapMap vs. 1000 Genomes) on imputation; (b) potential differences in imputation performance between single-step vs. two-step (phasing and imputation) approaches; (c) the effect of different INFO score thresholds on imputation performance and (d) imputation performance in common vs. rare markers. Methods The sample from Mexico City comprised 1,310 individuals genotyped with the Affymetrix 5.0 array. We randomly masked 5% of the markers directly genotyped on chromosome 12 (n = 1,046) and compared the imputed genotypes with the microarray genotype calls. Imputation was carried out with the program IMPUTE. The concordance rates between the imputed and observed genotypes were used as a measure of imputation accuracy and the proportion of non-missing genotypes as a measure of imputation efficacy. Results The single-step imputation approach produced slightly higher concordance rates than the two-step strategy (99.1% vs. 98.4% when using the HapMap phase II combined panel), but at the expense of a lower proportion of non-missing genotypes (85.5% vs. 90.1%). The 1000 Genomes reference sample produced similar concordance rates to the HapMap phase II panel (98.4% for both datasets, using the two-step strategy). However, the 1000 Genomes reference sample substantially increased the proportion of non-missing genotypes (94.7% vs. 90.1%). Rare variants (<1%) had lower imputation accuracy and efficacy than common markers. Conclusions The program IMPUTE had an excellent imputation performance for common alleles in an admixed sample from Mexico City, which has primarily Native American (62%) and European (33%) contributions. Genotype concordances were higher than 98.4% using all the imputation strategies, in spite of the fact that no Native American samples are present in the HapMap and

  5. Multiple imputation for an incomplete covariate that is a ratio.

    PubMed

    Morris, Tim P; White, Ian R; Royston, Patrick; Seaman, Shaun R; Wood, Angela M

    2014-01-15

    We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable.

  6. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

    PubMed

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-09-15

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates.
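
    The idea described above (choose the markers in strongest LD with the target marker, wherever they lie in the genome, then let the nearest samples vote on the missing genotype) can be sketched as follows. This is a simplified illustration and not LinkImpute's code: the function name `ld_knn_impute`, the parameters `n_ld` and `k`, the 0/1/2 genotype coding with NaN for missing values, the mean-absolute-difference distance and the 1/(1 + distance) weights are all assumptions made for this example.

```python
import numpy as np

def ld_knn_impute(G, n_ld=2, k=2):
    """Fill NaNs in G (samples x markers, genotypes coded 0/1/2).

    Hypothetical helper: for each marker with missing calls, use the n_ld
    markers with the highest r^2 to it, then take a distance-weighted vote
    among the k closest samples that have an observed genotype.
    """
    G = np.asarray(G, dtype=float)
    out = G.copy()
    n_markers = G.shape[1]

    def r2(a, b):
        # Squared correlation on pairwise-complete observations.
        ok = ~np.isnan(a) & ~np.isnan(b)
        if ok.sum() < 3 or np.std(a[ok]) == 0 or np.std(b[ok]) == 0:
            return 0.0
        return np.corrcoef(a[ok], b[ok])[0, 1] ** 2

    for j in range(n_markers):
        missing = np.where(np.isnan(G[:, j]))[0]
        if missing.size == 0:
            continue
        # LD of marker j with every other marker; marker order is never used.
        ld = np.array([r2(G[:, j], G[:, m]) if m != j else -1.0
                       for m in range(n_markers)])
        best = np.argsort(ld)[::-1][:n_ld]
        donors = np.where(~np.isnan(G[:, j]))[0]
        for i in missing:
            dists = []
            for d in donors:
                ok = ~np.isnan(G[i, best]) & ~np.isnan(G[d, best])
                if ok.any():
                    dists.append(np.abs(G[i, best][ok] - G[d, best][ok]).mean())
                else:
                    dists.append(np.inf)  # no shared markers: weight becomes 0
            dists = np.array(dists)
            nearest = np.argsort(dists)[:k]
            votes = np.zeros(3)
            for d, dist in zip(donors[nearest], dists[nearest]):
                votes[int(G[d, j])] += 1.0 / (1.0 + dist)
            out[i, j] = np.argmax(votes)
    return out

# Toy data: marker 0 and marker 1 are in perfect LD, marker 2 is unrelated.
G = np.array([[0, 0, 1],
              [0, 0, 2],
              [2, 2, 0],
              [2, 2, 1],
              [1, 1, 2],
              [np.nan, 2, 0]])
filled = ld_knn_impute(G)
```

    In the toy matrix the last sample carries genotype 2 at marker 1, so its two nearest donors both carry genotype 2 at marker 0 and the missing call is filled accordingly.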

  7. The identification of SNPs with indeterminate positions using the Equine SNP50 BeadChip.

    PubMed

    Corbin, L J; Blott, S C; Swinburne, J E; Vaudin, M; Bishop, S C; Woolliams, J A

    2012-06-01

    We have used linkage disequilibrium (LD) to identify single nucleotide polymorphisms (SNPs) on the Illumina Equine SNP50 BeadChip, which may be incorrectly positioned on the genome map. A total of 1201 Thoroughbred horses were genotyped using the Illumina Equine SNP50 BeadChip. LD was evaluated in a pairwise fashion between all autosomal SNPs, both within and across chromosomes. Filters were then applied to the data, firstly to identify SNPs that may have been mapped to the wrong chromosome and secondly to identify SNPs that may have been incorrectly positioned within chromosomes. We identified a single SNP on ECA28, which showed low LD with neighbouring SNPs but considerable LD with a group of SNPs on ECA10. Furthermore, a cluster of SNPs on ECA5 showed unusually low LD with surrounding SNPs. A total of 39 SNPs met the criteria for unusual within-chromosome LD. The results of this study indicate that some SNPs may be misplaced. This finding is significant, as misplaced SNPs may lead to difficulties in the application of genomic methods, such as homozygosity mapping, for which SNP order is important.
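
    A minimal sketch of the kind of LD filter described above: compute pairwise r² between all markers and flag any marker that shows low LD with its map neighbours but high LD with markers elsewhere. The function names, the window size and the `low`/`high` thresholds are assumptions chosen for illustration; the abstract does not state the criteria the authors applied.

```python
import numpy as np

def pairwise_r2(G):
    """Squared Pearson correlation between all marker columns of G
    (samples x markers, genotypes coded 0/1/2, no missing values,
    no monomorphic markers)."""
    return np.corrcoef(np.asarray(G, dtype=float), rowvar=False) ** 2

def flag_low_neighbour_ld(G, window=2, low=0.2, high=0.8):
    """Return indices of markers whose best r^2 within `window` map
    positions is below `low` while some distant marker exceeds `high`."""
    r2 = pairwise_r2(G)
    n = r2.shape[0]
    flagged = []
    for j in range(n):
        near = [m for m in range(max(0, j - window), min(n, j + window + 1))
                if m != j]
        far = [m for m in range(n) if m != j and m not in near]
        if far and r2[j, near].max() < low and r2[j, far].max() > high:
            flagged.append(j)
    return flagged

# Toy map: markers 0, 1, 3, 4 follow pattern a; markers 2, 5, 6, 7 follow
# pattern b. Marker 2 therefore sits among "a" markers but tracks the
# "b" block further along the map.
a = np.array([0, 0, 2, 2, 0, 0, 2, 2])
b = np.array([0, 2, 0, 2, 0, 2, 0, 2])
G = np.column_stack([a, a, b, a, a, b, b, b])
suspects = flag_low_neighbour_ld(G)
```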

  8. Analysis of Variance of Multiply Imputed Data.

    PubMed

    van Ginkel, Joost R; Kroonenberg, Pieter M

    2014-01-01

    As a procedure for handling missing data, multiple imputation consists of estimating the missing data multiple times to create several complete versions of an incomplete data set. All these data sets are analyzed by the same statistical procedure, and the results are pooled for interpretation. So far, no explicit rules for pooling F-tests of (repeated-measures) analysis of variance have been defined. In this paper we outline the appropriate procedure for the results of analysis of variance for multiply imputed data sets. It involves both reformulation of the ANOVA model as a regression model using effect coding of the predictors and applying already existing combination rules for regression models. The proposed procedure is illustrated using three example data sets. The pooled results of these three examples provide plausible F- and p-values.
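
    For orientation, the standard combination rules for a single regression coefficient (Rubin's rules) can be sketched as below. This only illustrates the scalar pooling step; it is not the paper's procedure for pooling F-tests, and the function name `pool_rubin` and the example numbers are made up for the illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the m point estimates; variances: their squared standard errors.
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()              # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Three imputed data sets, same coefficient estimated in each.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The total variance exceeds the average within-imputation variance whenever the estimates differ across imputations, which is how the pooled standard error reflects the uncertainty due to missing data.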

  9. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  10. When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments?

    PubMed Central

    Ramnarine, Shelina; Zhang, Juan; Chen, Li-Shiun; Culverhouse, Robert; Duan, Weimin; Hancock, Dana B.; Hartz, Sarah M.; Johnson, Eric O.; Olfson, Emily; Schwantes-An, Tae-Hwi; Saccone, Nancy L.

    2015-01-01

    Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen’s kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants. PMID:26458263
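
    A small numerical sketch of why concordance rate can flatter imputation of rare variants while a chance-corrected statistic does not. For simplicity it applies Cohen's kappa to hard genotype calls; the IQS described above works on imputed genotype probabilities, so this is an approximation of the idea rather than the published statistic. Function names and the toy data are assumptions.

```python
import numpy as np

def concordance(true, imputed):
    """Fraction of samples whose imputed genotype equals the true genotype."""
    return float(np.mean(np.asarray(true) == np.asarray(imputed)))

def cohen_kappa(true, imputed, classes=(0, 1, 2)):
    """Chance-corrected agreement: (Po - Pc) / (1 - Pc)."""
    t = np.asarray(true)
    i = np.asarray(imputed)
    p_obs = np.mean(t == i)
    # Agreement expected if calls were made independently of the truth.
    p_chance = sum(np.mean(t == c) * np.mean(i == c) for c in classes)
    if p_chance == 1.0:
        return float("nan")  # undefined when only one class ever occurs
    return float((p_obs - p_chance) / (1.0 - p_chance))

# Rare variant: 2 heterozygotes among 100 samples, and an imputation
# that simply calls everyone homozygous for the major allele.
true = np.array([0] * 98 + [1] * 2)
imputed = np.zeros(100, dtype=int)
conc = concordance(true, imputed)
kappa = cohen_kappa(true, imputed)
```

    Here the concordance is 0.98 even though not a single carrier of the rare allele was recovered, whereas kappa is 0 because the agreement is no better than chance.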

  11. Mining SNPs from EST databases.

    PubMed

    Picoult-Newberg, L; Ideker, T E; Pohl, M G; Taylor, S L; Donaldson, M A; Nickerson, D A; Boyce-Jacino, M

    1999-02-01

    There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs.

  12. Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm.

    PubMed

    Hoffmann, Thomas J; Zhan, Yiping; Kvale, Mark N; Hesselson, Stephanie E; Gollub, Jeremy; Iribarren, Carlos; Lu, Yontao; Mei, Gangwu; Purdy, Matthew M; Quesenberry, Charles; Rowell, Sarah; Shapero, Michael H; Smethurst, David; Somkin, Carol P; Van den Eeden, Stephen K; Walter, Larry; Webster, Teresa; Whitmer, Rachel A; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil

    2011-12-01

    Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies.

  13. Multiple Imputation of Missing Composite Outcomes in Longitudinal Data.

    PubMed

    O'Keeffe, Aidan G; Farewell, Daniel M; Tom, Brian D M; Farewell, Vernon T

    2016-01-01

    In longitudinal randomised trials and observational studies within a medical context, a composite outcome, which is a function of several individual patient-specific outcomes, may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has been seldom discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of a composite outcome. We consider two imputation approaches. One approach involves modelling each component of a composite outcome using standard likelihood-based models. The other approach is to use linear increments methods. A linear increments approach can provide an appealing alternative as assumptions concerning both the missingness structure within the data and the imputation models are different from the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that both approaches are comparable and that for each, separate imputation offers some improvement on the direct imputation of a composite outcome.

  14. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  15. A multiple imputation strategy for incomplete longitudinal data.

    PubMed

    Landrum, M B; Becker, M P

    Longitudinal studies are commonly used to study processes of change. Because data are collected over time, missing data are pervasive in longitudinal studies, and complete ascertainment of all variables is rare. In this paper a new imputation strategy for completing longitudinal data sets is proposed. The proposed methodology makes use of shrinkage estimators for pooling information across geographic entities, and of model averaging for pooling predictions across different statistical models. Bayes factors are used to compute weights (probabilities) for a set of models considered to be reasonable for at least some of the units for which imputations must be produced, imputations are produced by draws from the predictive distributions of the missing data, and multiple imputations are used to better reflect selected sources of uncertainty in the imputation process. The imputation strategy is developed within the context of an application to completing incomplete longitudinal variables in the so-called Area Resource File. The proposed procedure is compared with several other imputation procedures in terms of inferences derived with the imputations, and the proposed methodology is demonstrated to provide valid estimates of model parameters when the completed data are analysed. Extensions to other missing data problems in longitudinal studies are straightforward so long as the missing data mechanism can be assumed to be ignorable.

  16. Geometric median for missing rainfall data imputation

    NASA Astrophysics Data System (ADS)

    Burhanuddin, Siti Nur Zahrah Amin; Deni, Sayang Mohd; Ramli, Norazan Mohamed

    2015-02-01

    Missing data are a common problem faced by researchers in environmental studies. Environmental data, particularly rainfall data, are highly prone to missing values for several reasons, such as instrument malfunction, incorrect measurements, and relocation of stations. Rainfall data are also affected by the presence of outliers due to the temporal and spatial variability of rainfall measurements. These problems may harm the quality of rainfall data and subsequently produce inaccurate analysis results. Thus, this study proposes an imputation method for missing rainfall data that is robust to the presence of outliers. The geometric median was applied to estimate the missing values based on the available rainfall data from neighbouring stations. The method was compared with several conventional methods, such as the normal ratio and inverse distance weighting methods, in order to evaluate its performance. Thirteen rainfall stations in Peninsular Malaysia were selected for the application of the imputation methods. The results indicated that the proposed method provided the most accurate estimates compared with both conventional methods, based on the lowest mean absolute error. The normal ratio was found to be the worst method for estimating the missing rainfall values.
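
    The geometric median (the point minimising the sum of Euclidean distances to a set of points) has no closed form in more than one dimension; a common way to compute it is Weiszfeld's iteration, sketched below. How the study arranges station records into points is not stated in the abstract, so the layout used here (one row per neighbouring station, one column per day) and the rainfall values are purely hypothetical.

```python
import numpy as np

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the geometric median of the rows of `points`."""
    x = np.asarray(points, dtype=float)
    y = x.mean(axis=0)  # start from the ordinary mean
    for _ in range(max_iter):
        # Clamp distances so a data point hit exactly does not divide by zero.
        d = np.maximum(np.linalg.norm(x - y, axis=1), 1e-12)
        w = 1.0 / d
        y_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def total_distance(points, centre):
    """Sum of Euclidean distances from `centre` to each row of `points`."""
    return float(np.linalg.norm(np.asarray(points, dtype=float) - centre, axis=1).sum())

# Hypothetical two-day rainfall (mm) at five neighbouring stations;
# the last station is an outlier on day one.
stations = np.array([[10.0, 12.0],
                     [11.0, 11.0],
                     [12.0, 10.0],
                     [11.0, 12.0],
                     [200.0, 0.0]])
gm = geometric_median(stations)
mean = stations.mean(axis=0)
```

    The mean of the day-one values is pulled to 48.8 mm by the single outlying station, while the geometric median stays close to the cluster of the other four stations, which is the robustness property the abstract relies on.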

  17. How to Improve Postgenomic Knowledge Discovery Using Imputation

    PubMed Central

    2009-01-01

    While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures. PMID:19223972

  18. Meta-analysis and imputation refines the association of 15q25 with smoking quantity.

    PubMed

    Liu, Jason Z; Tozzi, Federica; Waterworth, Dawn M; Pillai, Sreekumar G; Muglia, Pierandrea; Middleton, Lefkos; Berrettini, Wade; Knouff, Christopher W; Yuan, Xin; Waeber, Gérard; Vollenweider, Peter; Preisig, Martin; Wareham, Nicholas J; Zhao, Jing Hua; Loos, Ruth J F; Barroso, Inês; Khaw, Kay-Tee; Grundy, Scott; Barter, Philip; Mahley, Robert; Kesaniemi, Antero; McPherson, Ruth; Vincent, John B; Strauss, John; Kennedy, James L; Farmer, Anne; McGuffin, Peter; Day, Richard; Matthews, Keith; Bakke, Per; Gulsvik, Amund; Lucae, Susanne; Ising, Marcus; Brueckl, Tanja; Horstmann, Sonja; Wichmann, H-Erich; Rawal, Rajesh; Dahmen, Norbert; Lamina, Claudia; Polasek, Ozren; Zgaga, Lina; Huffman, Jennifer; Campbell, Susan; Kooner, Jaspal; Chambers, John C; Burnett, Mary Susan; Devaney, Joseph M; Pichard, Augusto D; Kent, Kenneth M; Satler, Lowell; Lindsay, Joseph M; Waksman, Ron; Epstein, Stephen; Wilson, James F; Wild, Sarah H; Campbell, Harry; Vitart, Veronique; Reilly, Muredach P; Li, Mingyao; Qu, Liming; Wilensky, Robert; Matthai, William; Hakonarson, Hakon H; Rader, Daniel J; Franke, Andre; Wittig, Michael; Schäfer, Arne; Uda, Manuela; Terracciano, Antonio; Xiao, Xiangjun; Busonero, Fabio; Scheet, Paul; Schlessinger, David; St Clair, David; Rujescu, Dan; Abecasis, Gonçalo R; Grabe, Hans Jörgen; Teumer, Alexander; Völzke, Henry; Petersmann, Astrid; John, Ulrich; Rudan, Igor; Hayward, Caroline; Wright, Alan F; Kolcic, Ivana; Wright, Benjamin J; Thompson, John R; Balmforth, Anthony J; Hall, Alistair S; Samani, Nilesh J; Anderson, Carl A; Ahmad, Tariq; Mathew, Christopher G; Parkes, Miles; Satsangi, Jack; Caulfield, Mark; Munroe, Patricia B; Farrall, Martin; Dominiczak, Anna; Worthington, Jane; Thomson, Wendy; Eyre, Steve; Barton, Anne; Mooser, Vincent; Francks, Clyde; Marchini, Jonathan

    2010-05-01

    Smoking is a leading global cause of disease and mortality. We established the Oxford-GlaxoSmithKline study (Ox-GSK) to perform a genome-wide meta-analysis of SNP association with smoking-related behavioral traits. Our final data set included 41,150 individuals drawn from 20 disease, population and control cohorts. Our analysis confirmed an effect on smoking quantity at a locus on 15q25 (P = 9.45 × 10^-19) that includes CHRNA5, CHRNA3 and CHRNB4, three genes encoding neuronal nicotinic acetylcholine receptor subunits. We used data from the 1000 Genomes project to investigate the region using imputation, which allowed for analysis of virtually all common SNPs in the region and offered a fivefold increase in marker density over HapMap2 (ref. 2) as an imputation reference panel. Our fine-mapping approach identified a SNP showing the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.

  19. Multiple imputation in the presence of non-normal data.

    PubMed

    Lee, Katherine J; Carlin, John B

    2017-02-20

    Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data.

    PubMed

    Fragoso, Christopher A; Heffelfinger, Christopher; Zhao, Hongyu; Dellaporta, Stephen L

    2016-02-01

    Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.

  1. An imputation-based genome-wide association study on traits related to male reproduction in a White Duroc × Erhualian F2 population.

    PubMed

    Zhao, Xueyan; Zhao, Kewei; Ren, Jun; Zhang, Feng; Jiang, Chao; Hong, Yuan; Jiang, Kai; Yang, Qiang; Wang, Chengbin; Ding, Nengshui; Huang, Lusheng; Zhang, Zhiyan; Xing, Yuyun

    2016-05-01

    Boar reproductive traits are economically important for the pig industry. Here we conducted a genome-wide association study (GWAS) for 13 reproductive traits measured on 205 F2 boars at day 300 using 60 K single nucleotide polymorphism (SNP) data imputed from a reference panel of 1200 pigs in a White Duroc × Erhualian F2 intercross population. We identified 10 significant loci for seven traits on eight pig chromosomes (SSC). Two loci surpassed the genome-wide significance level, including one for epididymal weight around 60.25 Mb on SSC7 and one for semen temperature around 43.69 Mb on SSC4. Four of the 10 significant loci that we identified were consistent with previously reported quantitative trait loci for boar reproduction traits. We highlighted several interesting candidate genes at these loci, including APN, TEP1, PARP2, SPINK1 and PDE1C. To evaluate the imputation accuracy, we further genotyped nine GWAS top SNPs using PCR restriction fragment length polymorphism or Sanger sequencing. We found an average genotype concordance of 91.44%, an allelic concordance of 95.36% and an r² correlation of 0.85 between imputed and real genotype data. This indicates that our GWAS mapping results based on imputed SNP data are reliable, providing insights into the genetic basis of boar reproductive traits.

  2. Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs

    PubMed Central

    Pistis, Giorgio; Porcu, Eleonora; Vrieze, Scott I; Sidore, Carlo; Steri, Maristella; Danjou, Fabrice; Busonero, Fabio; Mulas, Antonella; Zoledziewska, Magdalena; Maschio, Andrea; Brennan, Christine; Lai, Sandra; Miller, Michael B; Marcelli, Marco; Urru, Maria Francesca; Pitzalis, Maristella; Lyons, Robert H; Kang, Hyun M; Jones, Chris M; Angius, Andrea; Iacono, William G; Schlessinger, David; McGue, Matt; Cucca, Francesco; Abecasis, Gonçalo R; Sanna, Serena

    2015-01-01

    The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies. PMID:25293720

  3. A SPATIOTEMPORAL APPROACH FOR HIGH RESOLUTION TRAFFIC FLOW IMPUTATION

    SciTech Connect

    Han, Lee; Chin, Shih-Miao; Hwang, Ho-Ling

    2016-01-01

    Along with the rapid development of Intelligent Transportation Systems (ITS), traffic data collection technologies have been evolving dramatically. The emergence of innovative data collection technologies such as Remote Traffic Microwave Sensor (RTMS), Bluetooth sensor, GPS-based Floating Car method, automated license plate recognition (ALPR), etc., creates an explosion of traffic data, which brings transportation engineering into the new era of Big Data. However, despite the advance of technologies, the missing data issue is still inevitable and has posed great challenges for research such as traffic forecasting, real-time incident detection and management, dynamic route guidance, and massive evacuation optimization, because the degree of success of these endeavors depends on the timely availability of relatively complete and reasonably accurate traffic data. A thorough literature review suggests that most, if not all, current imputation models focus largely on the temporal nature of the traffic data; they neither consider that traffic stream characteristics at a certain location are closely related to those at neighboring locations nor use these correlations for data imputation. To this end, this paper presents a Kriging-based spatiotemporal data imputation approach that is able to fully utilize the spatiotemporal information underlying traffic data. Imputation performance of the proposed approach was tested using simulated scenarios and achieved stable imputation accuracy. Moreover, the proposed Kriging imputation model is more flexible compared to current models.

  4. A second generation human haplotype map of over 3.1 million SNPs

    PubMed Central

    2009-01-01

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122

  5. QuickSNP: an automated web server for selection of tagSNPs

    PubMed Central

    Grover, Deepak; Woodfield, Alonzo S.; Verma, Ranjana; Zandi, Peter P.; Levinson, Douglas F.; Potash, James B.

    2007-01-01

    Although large-scale genetic association studies involving hundreds to thousands of SNPs have become feasible, the associated cost is substantial. Even with the increased efficiency introduced by the use of tagSNPs, researchers are often seeking ways to maximize resource utilization given a set of SNP-based gene-mapping goals. We have developed a web server named QuickSNP in order to provide cost-effective selection of SNPs, and to fill in some of the gaps in existing SNP selection tools. One useful feature of QuickSNP is the option to select only gene-centric SNPs from a chromosomal region in an automated fashion. Other useful features include automated selection of coding non-synonymous SNPs, SNP filtering based on inter-SNP distances and information regarding the availability of genotyping assays for SNPs and whether they are present on whole genome chips. The program produces user-friendly summary tables and results, and a link to a UCSC Genome Browser track illustrating the position of the selected tagSNPs in relation to genes and other genomic features. We hope the unique combination of features of this server will be useful for researchers aiming to select markers for their genotyping studies. The server is freely available and can be accessed at the URL http://bioinformoodics.jhmi.edu/quickSNP.pl. PMID:17517769

  6. genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools.

    PubMed

    Lemieux Perreault, Louis-Philippe; Legault, Marc-André; Asselin, Géraldine; Dubé, Marie-Pierre

    2016-12-01

    Genotype imputation is now commonly performed following genome-wide genotyping experiments. Imputation increases the density of analyzed genotypes in the dataset, enabling fine-mapping across the genome. However, the process of imputation using the most recent publicly available reference datasets can require considerable computation power and the management of hundreds of large intermediate files. We have developed genipe, a complete genome-wide imputation pipeline which includes automatic reporting, imputed data indexing and management, and a suite of statistical tests for imputed data commonly used in genetic epidemiology (Sequence Kernel Association Test, Cox proportional hazards for survival analysis, and linear mixed models for repeated measurements in longitudinal studies).

  7. Combining fractional polynomial model building with multiple imputation.

    PubMed

    Morris, Tim P; White, Ian R; Carpenter, James R; Stanworth, Simon J; Royston, Patrick

    2015-11-10

    Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only.

  8. References for Haplotype Imputation in the Big Data Era.

    PubMed

    Li, Wenzhi; Xu, Wei; Li, Qiling; Ma, Li; Song, Qing

    2015-11-01

    Imputation is a powerful in silico approach for filling in missing values in big datasets. This process requires a reference panel, which is a collection of big data from which the missing information can be extracted and imputed. Haplotype imputation requires ethnicity-matched references; a mismatched reference panel will significantly reduce the quality of imputation. However, existing big datasets cover only a small number of ethnicities, and ethnicity-matched references are lacking for many ethnic populations in the world, which has hampered the imputation of haplotypes and its downstream applications. To solve this issue, several approaches have been proposed and explored, including the mixed reference panel, the internal reference panel, and the genotype-converted reference panel. This review article describes and compares these approaches. Increasing evidence shows that gene activity and function are dictated not by just one or two genetic elements but by cis-interactions of multiple elements. Cis-interactions require the interacting elements to be on the same chromosome molecule; therefore, haplotype analysis is essential for investigating cis-interactions among multiple genetic variants at different loci, and appears to be especially important for studying common diseases. It will be valuable in a wide spectrum of applications, from academic research to clinical diagnosis, prevention, treatment, and the pharmaceutical industry.

  9. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

  10. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    PubMed

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.
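
    The contrast this abstract draws between imputing with the mean and imputing from nearest neighbors can be illustrated on a toy genotype matrix. The sketch below is not the authors' pipeline: the 0/1/2 allele-count coding, the mean-squared-difference distance, the function names, and the data are assumptions, and the singular value decomposition method is left out.

```python
def mean_impute(genotypes):
    """Replace each missing call (None) with the mean of that SNP column.

    Assumes every SNP column has at least one observed call.
    """
    n_snps = len(genotypes[0])
    col_means = []
    for j in range(n_snps):
        observed = [row[j] for row in genotypes if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    return [[col_means[j] if g is None else g for j, g in enumerate(row)]
            for row in genotypes]


def distance(a, b):
    """Mean squared difference over SNPs observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)


def knn_impute(genotypes, k=2):
    """Replace each missing call with the mean call of the k nearest donors.

    A donor is any other individual with an observed call at that SNP.
    """
    imputed = [list(row) for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(genotypes)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [call for _, call in donors[:k]]
            imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy matrix: rows are trees, columns are SNPs coded as allele counts.
toy = [
    [0, 0, 1, None],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 2],
    [2, 2, 0, 2],
]
print(mean_impute(toy)[0][3])      # column mean, ignores who the tree resembles
print(knn_impute(toy, k=2)[0][3])  # borrows from the two most similar trees
```

    In this toy case the column mean lands halfway between the two genotype groups, while the neighbor-based value follows the group that the incomplete tree resembles.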

  11. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods

    PubMed Central

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-01-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540

  12. A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

    ERIC Educational Resources Information Center

    Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

    2012-01-01

    Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…

  13. SPSS Syntax for Missing Value Imputation in Test and Questionnaire Data

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries

    2005-01-01

    A well-known problem in the analysis of test and questionnaire data is that some item scores may be missing. Advanced methods for the imputation of missing data are available, such as multiple imputation under the multivariate normal model and imputation under the saturated logistic model (Schafer, 1997). Accompanying software was made available…

  14. Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory

    PubMed Central

    Li, Haiquan; Lee, Younghee; Chen, James L; Rebman, Ellen; Li, Jianrong

    2012-01-01

    Objective: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning. Methods: Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait–trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits. Results: A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller ‘shortest distance’ in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10^−16). Further, ‘cancer traits’ were similar to one another, as were ‘metabolic syndrome traits’ (Fisher's exact test p=0.001 and 3.5×10^−7, respectively). Conclusion: An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches.

  15. Novel and efficient tag SNPs selection algorithms.

    PubMed

    Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2014-01-01

    SNPs are the most abundant forms of genetic variations amongst species; the association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, representing the rest of the SNPs. In fact, the existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection was presented, which was applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm can achieve better performance than the existing tag SNP selection algorithms; in most cases, this proposed algorithm is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands of times faster than the previously known methods. Tools and web services for haplotype block analysis integrated with the Hadoop MapReduce framework are also developed using the proposed algorithm as computation kernels.

  16. Multiple imputation for cure rate quantile regression with censored data.

    PubMed

    Wu, Yuanshan; Yin, Guosheng

    2017-03-01

    The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration.

  17. Strategies to choose from millions of imputed sequence variants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of sequence variants are known, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Variant selection and imputation strategies were tested using 26 984 simulated reference bulls, of which 1 000 had 30 million sequence variants, 773 had 600 000 markers...

  18. Fast imputation using medium or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and microarray genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updat...

  19. Impact of adding foreign genomic information on Mexican Holstein imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The impact of adding US and Canada genomic information to the imputation of Mexican Holstein genotypes was measured by comparing 3 scenarios: 1) 2,018 Mexican genotyped animals; 2) animals from scenario 1 plus 886 related North American animals; and 3) animals from scenario 1 and all North American ...

  20. Accuracy of genotype imputation in Swiss cattle breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to evaluate the accuracy of imputation from Illumina Bovine3k Bead Chip (3k) and Illumina BovineLD (6k) to 54k chip information in Swiss dairy cattle breeds. Genotype data comprised of 54k SNP chip data of Original Braunvieh (OB), Brown Swiss (BS), Swiss Fleckvieh (SF...

  1. [Imputation methods for missing data in educational diagnostic evaluation].

    PubMed

    Fernández-Alonso, Rubén; Suárez-Álvarez, Javier; Muñiz, José

    2012-02-01

    In the diagnostic evaluation of educational systems, self-reports are commonly used to collect data, both cognitive and orectic. For various reasons, in these self-reports, some of the students' data are frequently missing. The main goal of this research is to compare the performance of different imputation methods for missing data in the context of the evaluation of educational systems. On an empirical database of 5,000 subjects, 72 conditions were simulated: three levels of missing data, three types of loss mechanisms, and eight methods of imputation. The levels of missing data were 5%, 10%, and 20%. The loss mechanisms were set at: Missing completely at random, moderately conditioned, and strongly conditioned. The eight imputation methods used were: listwise deletion, replacement by the mean of the scale, by the item mean, the subject mean, the corrected subject mean, multiple regression, and Expectation-Maximization (EM) algorithm, with and without auxiliary variables. The results indicate that the recovery of the data is more accurate when using an appropriate combination of different methods of recovering lost data. When a case is incomplete, the mean of the subject works very well, whereas for completely lost data, multiple imputation with the EM algorithm is recommended. The use of this combination is especially recommended when data loss is greater and its loss mechanism is more conditioned. Lastly, the results are discussed, and some future lines of research are analyzed.

  2. Investigation of Multiple Imputation in Low-Quality Questionnaire Data

    ERIC Educational Resources Information Center

    Van Ginkel, Joost R.

    2010-01-01

    The performance of multiple imputation in questionnaire data has been studied in various simulation studies. However, in practice, questionnaire data are usually more complex than simulated data. For example, items may be counterindicative or may have unacceptably low factor loadings on every subscale, or completely missing subscales may…

  3. Imputation of Missing Categorical Data by Maximizing Internal Consistency.

    ERIC Educational Resources Information Center

    van Buuren, Stef; van Rijckevorsel, Jan L. A.

    1992-01-01

    A technique is presented to transform incomplete categorical data into complete data by imputing appropriate scores into missing cells. A solution of the optimization problem is suggested, and relevant psychometric theory is discussed. The average correlation should be at least 0.50 before the method becomes practical. (SLD)

  4. Multiple Imputation Strategies for Multiple Group Structural Equation Models

    ERIC Educational Resources Information Center

    Enders, Craig K.; Gottschall, Amanda C.

    2011-01-01

    Although structural equation modeling software packages use maximum likelihood estimation by default, there are situations where one might prefer to use multiple imputation to handle missing data rather than maximum likelihood estimation (e.g., when incorporating auxiliary variables). The selection of variables is one of the nuances associated…

  5. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
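
    The final step described in this abstract, combining the results from each imputed dataset into an overall estimate, is conventionally done with Rubin's rules. The abstract does not name its pooling method, so the sketch below is an assumption about that step; the function name `pool_rubin` and the three estimates and variances are made-up illustrative numbers, not results from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset results with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up regression coefficients from three imputed datasets.
estimates = [0.60, 0.70, 0.80]
variances = [0.04, 0.05, 0.06]
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var, total_var ** 0.5)
```

    The between-imputation term is what separates multiple imputation from single imputation: it widens the standard error to reflect uncertainty about the missing values themselves.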

  6. No genetic association between attention-deficit/hyperactivity disorder (ADHD) and Parkinson's disease in nine ADHD candidate SNPs.

    PubMed

    Geissler, Julia M; Romanos, Marcel; Gerlach, Manfred; Berg, Daniela; Schulte, Claudia

    2017-02-07

    Attention-deficit/hyperactivity disorder (ADHD) and Parkinson's disease (PD) involve pathological changes in brain structures such as the basal ganglia, which are essential for the control of motor and cognitive behavior and impulsivity. The cause of ADHD and PD remains unknown, but there is increasing evidence that both seem to result from a complicated interplay of genetic and environmental factors affecting numerous cellular processes and brain regions. To explore the possibility of common genetic pathways within the respective pathophysiologies, nine ADHD candidate single nucleotide polymorphisms (SNPs) in seven genes were tested for association with PD in 5333 cases and 12,019 healthy controls: one variant, respectively, in the genes coding for synaptosomal-associated protein 25 k (SNAP25), the dopamine (DA) transporter (SLC6A3; DAT1), DA receptor D4 (DRD4), serotonin receptor 1B (HTR1B), tryptophan hydroxylase 2 (TPH2), the norepinephrine transporter SLC6A2 and three SNPs in cadherin 13 (CDH13). Information was extracted from a recent meta-analysis of five genome-wide association studies, in which 7,689,524 SNPs in European samples were successfully imputed. No significant association was observed after correction for multiple testing. Therefore, it is reasonable to conclude that candidate variants implicated in the pathogenesis of ADHD do not play a substantial role in PD.

  7. Missing data and multiple imputation in clinical epidemiological research.

    PubMed

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.
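
    The difference between MCAR and MAR, and why complete-case analysis is only safe under the former, can be seen in a small simulation. Everything in this sketch is an assumption made for illustration: the data-generating model, the 30% MCAR rate, the rule that the outcome goes missing with probability equal to a fully observed covariate x, and the function name `simulate`.

```python
import random


def simulate(n=20000, seed=0):
    """Return the mean outcome in the full data and in the complete cases
    left after MCAR and MAR missingness are applied."""
    rng = random.Random(seed)
    full, mcar_cases, mar_cases = [], [], []
    for _ in range(n):
        x = rng.random()                 # covariate, always observed
        y = x + rng.gauss(0.0, 0.1)      # outcome, may go missing
        full.append(y)
        # MCAR: y is missing with a fixed 30% probability.
        if rng.random() >= 0.3:
            mcar_cases.append(y)
        # MAR: y is missing with probability x, which depends only on
        # an observed variable and not on y itself given x.
        if rng.random() >= x:
            mar_cases.append(y)

    def avg(values):
        return sum(values) / len(values)

    return avg(full), avg(mcar_cases), avg(mar_cases)


full_mean, mcar_mean, mar_mean = simulate()
print(full_mean, mcar_mean, mar_mean)
```

    Under this setup the complete-case mean stays close to the full-data mean when data are MCAR but is pulled downward under MAR, because subjects with high x (and therefore high y) are the ones most often dropped.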

  8. Missing data and multiple imputation in clinical epidemiological research

    PubMed Central

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data. PMID:28352203

  9. Family-based association analyses of imputed genotypes reveal genome-wide significant association of Alzheimer's disease with OSBPL6, PTPRG, and PDCL3.

    PubMed

    Herold, C; Hooli, B V; Mullin, K; Liu, T; Roehr, J T; Mattheisen, M; Parrado, A R; Bertram, L; Lange, C; Tanzi, R E

    2016-11-01

    The genetic basis of Alzheimer's disease (AD) is complex and heterogeneous. Over 200 highly penetrant pathogenic variants in the genes APP, PSEN1, and PSEN2 cause a subset of early-onset familial AD. On the other hand, susceptibility to late-onset forms of AD (LOAD) is indisputably associated with the ɛ4 allele in the gene APOE, and more recently with variants in more than two dozen additional genes identified in large-scale genome-wide association studies (GWAS) and meta-analysis reports. Taken together, however, although the heritability in AD is estimated to be as high as 80%, a large proportion of the underlying genetic factors still remain to be elucidated. In this study, we performed a systematic family-based genome-wide association and meta-analysis on close to 15 million imputed variants from three large collections of AD families (~3500 subjects from 1070 families). Using a multivariate phenotype combining affection status and onset age, meta-analysis of the association results revealed three single nucleotide polymorphisms (SNPs) that achieved genome-wide significance for association with AD risk: rs7609954 in the gene PTPRG (P-value=3.98 × 10^-8), rs1347297 in the gene OSBPL6 (P-value=4.53 × 10^-8), and rs1513625 near PDCL3 (P-value=4.28 × 10^-8). In addition, rs72953347 in OSBPL6 (P-value=6.36 × 10^-7) and two SNPs in the gene CDKAL1 showed marginally significant association with LOAD (rs10456232, P-value=4.76 × 10^-7; rs62400067, P-value=3.54 × 10^-7). In summary, family-based GWAS meta-analysis of imputed SNPs revealed novel genomic variants in (or near) PTPRG, OSBPL6, and PDCL3 that influence risk for AD with genome-wide significance.

  10. Imputation-based strategies for clinical trial longitudinal data with nonignorable missing values.

    PubMed

    Yang, Xiaowei; Li, Jinhui; Shoptaw, Steven

    2008-07-10

    Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial.

  11. Simple imputation methods versus direct likelihood analysis for missing item scores in multilevel educational data.

    PubMed

    Kadengye, Damazo T; Cools, Wilfried; Ceulemans, Eva; Van den Noortgate, Wim

    2012-06-01

    Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or simply at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.

  12. Should multiple imputation be the method of choice for handling missing data in randomized trials?

    PubMed

    Sullivan, Thomas R; White, Ian R; Salter, Amy B; Ryan, Philip; Lee, Katherine J

    2016-01-01

    The use of multiple imputation has increased markedly in recent years, and journal reviewers may expect to see multiple imputation used to handle missing data. However, in randomized trials, where treatment group is always observed and independent of baseline covariates, other approaches may be preferable. Using data simulation we evaluated multiple imputation, performed both overall and separately by randomized group, across a range of commonly encountered scenarios. We considered both missing outcome and missing baseline data, with missing outcome data induced under missing at random mechanisms. Provided the analysis model was correctly specified, multiple imputation produced unbiased treatment effect estimates, but alternative unbiased approaches were often more efficient. When the analysis model overlooked an interaction effect involving randomized group, multiple imputation produced biased estimates of the average treatment effect when applied to missing outcome data, unless imputation was performed separately by randomized group. Based on these results, we conclude that multiple imputation should not be seen as the only acceptable way to handle missing data in randomized trials. In settings where multiple imputation is adopted, we recommend that imputation is carried out separately by randomized group.
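
    However the imputed data sets are generated (overall or separately by randomized group), the treatment-effect estimates from each imputed data set are conventionally combined with Rubin's rules. The sketch below is illustrative only and is not taken from the study; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: point estimates (e.g. treatment effects), one per imputed data set
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)          # average of the point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical treatment-effect estimates from three imputed data sets.
effect, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(effect, total_var ** 0.5)  # pooled estimate and its standard error
```

    The between-imputation term is what a single imputation cannot provide: it carries the extra uncertainty due to the missing values into the pooled standard error.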

  13. Identification of high utility SNPs for population assignment and traceability purposes in the pig using high-throughput sequencing.

    PubMed

    Ramos, A M; Megens, H J; Crooijmans, R P M A; Schook, L B; Groenen, M A M

    2011-12-01

    The objectives of this study were to develop breed-specific single nucleotide polymorphisms (SNPs) in five pig breeds sequenced with Illumina's Genome Analyzer and to investigate their usefulness for breed assignment purposes. DNA pools were prepared for Duroc, Landrace, Large White, Pietrain and Wild Boar. The total number of animals used for sequencing was 153. SNP discovery was performed by aligning the filtered reads against Build 7 of the pig genome. A total of 313,964 high confidence SNPs were identified and analysed for the presence of breed-specific SNPs (defined in this context as SNPs for which one of the alleles was detected in only one breed). There were 29,146 putative breed-specific SNPs identified, of which 4441 were included in the PorcineSNP60 beadchip. Upon re-examining the genotypes obtained using the beadchip, 193 SNPs were confirmed as being breed specific. These 193 SNPs were subsequently used to assign an additional 490 individuals from the same breeds, using the sequenced individuals as reference populations. In total, four breed assignment tests were performed. Results showed that for all methods tested 99% of the animals were correctly assigned, with an average probability of assignment of at least 99.2%, indicating the high utility of breed-specific markers for breed assignment and traceability. This study provides a blueprint for the way next-generation sequencing technologies can be used for the identification of breed-specific SNPs, as well as evidence that these SNPs may be a powerful tool for breed assignment and traceability of animal products to their breeds of origin.

  14. Imputation of adverse drug reactions: Causality assessment in hospitals

    PubMed Central

    Mastroianni, Patricia de Carvalho

    2017-01-01

    Background & objectives: Different algorithms have been developed to standardize the causality assessment of adverse drug reactions (ADR). Although most share common characteristics, the results of the causality assessment vary depending on the algorithm used. Therefore, using 10 different algorithms, the study aimed to compare inter-rater and multi-rater agreement for ADR causality assessment and identify the most consistent algorithm for hospitals. Methods: Using ten causality algorithms, four judges independently assessed the first 44 cases of ADRs reported during the first year of implementation of a risk management service in a medium complexity hospital in the state of Sao Paulo (Brazil). Owing to variations in the terminology used for causality, the equivalent imputation terms were grouped into four categories: definite, probable, possible and unlikely. Inter-rater and multi-rater agreement analysis was performed by calculating the Cohen's and Light's kappa coefficients, respectively. Results: None of the algorithms showed 100% reproducibility in the causal imputation. Fair inter-rater and multi-rater agreement was found. Emanuele (1984) and WHO-UMC (2010) algorithms showed a fair rate of agreement between the judges (k = 0.36). Interpretation & conclusions: Although the ADR causality assessment algorithms were poorly reproducible, our data suggest that WHO-UMC algorithm is the most consistent for imputation in hospitals, since it allows evaluating the quality of the report. However, to improve the ability of assessing the causality using algorithms, it is necessary to include criteria for the evaluation of drug-related problems, which may be related to confounding variables that underestimate the causal association. PMID:28166274
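
    The two agreement statistics named in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' code: Cohen's kappa compares observed agreement between two raters with the agreement expected by chance, and Light's kappa is taken here as the mean of Cohen's kappa over all rater pairs. The function names and the example ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def light_kappa(ratings_by_rater):
    """Light's kappa: mean of Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical causality ratings of four cases by three judges.
judge_1 = ["definite", "definite", "possible", "possible"]
judge_2 = ["definite", "definite", "possible", "definite"]
judge_3 = ["definite", "definite", "possible", "possible"]
print(cohen_kappa(judge_1, judge_2))
print(light_kappa([judge_1, judge_2, judge_3]))
```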

  15. HIBAG--HLA genotype imputation with attribute bagging.

    PubMed

    Zheng, X; Shen, J; Cox, C; Wakefield, J C; Ehm, M G; Nelson, M R; Weir, B S

    2014-04-01

    Genotyping of classical human leukocyte antigen (HLA) alleles is an essential tool in the analysis of diseases and adverse drug reactions with associations mapping to the major histocompatibility complex (MHC). However, deriving high-resolution HLA types subsequent to whole-genome single-nucleotide polymorphism (SNP) typing or sequencing is often cost prohibitive for large samples. An alternative approach takes advantage of the extended haplotype structure within the MHC to predict HLA alleles using dense SNP genotypes, such as those available from genome-wide SNP panels. Current methods for HLA imputation are difficult to apply or may require the user to have access to large training data sets with SNP and HLA types. We propose HIBAG, HLA Imputation using attribute BAGging, that makes predictions by averaging HLA-type posterior probabilities over an ensemble of classifiers built on bootstrap samples. We assess the performance of HIBAG using our study data (n=2668 subjects of European ancestry) as a training set and HLA data from the British 1958 birth cohort study (n≈1000 subjects) as independent validation samples. Prediction accuracies for HLA-A, B, C, DRB1 and DQB1 range from 92.2% to 98.1% using a set of SNP markers common to the Illumina 1M Duo, OmniQuad, OmniExpress, 660K and 550K platforms. HIBAG performed well compared with the other two leading methods, HLA*IMP and BEAGLE. This method is implemented in a freely available HIBAG R package that includes pre-fit classifiers for European, Asian, Hispanic and African ancestries, providing a readily available imputation approach without the need to have access to large training data sets.
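
    The core idea described above, averaging posterior probabilities over classifiers trained on bootstrap samples with random subsets of SNP attributes, can be sketched with a toy example. This is not the HIBAG algorithm or its R package API: the frequency-table classifier, all function names, and the data are hypothetical stand-ins chosen to keep the sketch self-contained.

```python
import random
from collections import Counter, defaultdict


def fit_table(samples, labels, features):
    """Toy classifier: label counts per genotype pattern on a subset of SNPs."""
    table = defaultdict(Counter)
    for x, y in zip(samples, labels):
        table[tuple(x[f] for f in features)][y] += 1
    return features, table, Counter(labels)


def posterior(model, x):
    """Posterior label probabilities; falls back to label frequencies for unseen patterns."""
    features, table, prior = model
    counts = table.get(tuple(x[f] for f in features)) or prior
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


def fit_ensemble(samples, labels, n_models=25, n_features=2, seed=0):
    """Train each classifier on a bootstrap sample and a random SNP subset."""
    rng = random.Random(seed)
    n = len(samples)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]               # bootstrap sample
        feats = rng.sample(range(len(samples[0])), n_features)   # attribute subset
        models.append(fit_table([samples[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return models


def predict(models, x):
    """Average the posteriors over the ensemble and return the most probable label."""
    avg = Counter()
    for model in models:
        for label, p in posterior(model, x).items():
            avg[label] += p / len(models)
    return max(avg, key=avg.get), dict(avg)


# Hypothetical data: three SNP genotypes (0/1/2) per subject and an allele label.
snps = [(0, 0, 0)] * 5 + [(2, 2, 2)] * 5
alleles = ["A"] * 5 + ["B"] * 5
ensemble = fit_ensemble(snps, alleles)
print(predict(ensemble, (0, 0, 0)))
```

    Averaging probabilities rather than taking a majority vote keeps a per-subject confidence score, which can be used to filter out low-confidence calls.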

  16. Genetic diversity analysis of highly incomplete SNP genotype data with imputations: an empirical assessment.

    PubMed

    Fu, Yong-Bi

    2014-03-13

    Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. Although some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data, with up to 90% of observations missing. Here we performed an empirical assessment of accuracy in genetic diversity analysis of highly incomplete single nucleotide polymorphism genotypes with imputations. Three large single-nucleotide polymorphism genotype data sets for corn, wheat, and rice were acquired, and missing data with up to 90% of missing observations were randomly generated and then imputed for missing genotypes with three map-independent imputation methods. Estimating heterozygosity and inbreeding coefficient from original, missing, and imputed data revealed variable patterns of bias from assessed levels of missingness and genotype imputation, but the estimation biases were smaller for missing data without genotype imputation. The estimates of genetic differentiation were rather robust up to 90% of missing observations but became substantially biased when missing genotypes were imputed. The estimates of topology accuracy for four representative samples of interested groups generally were reduced with increased levels of missing genotypes. Probabilistic principal component analysis based imputation performed better in terms of topology accuracy than those analyses of missing data without genotype imputation. These findings are not only significant for understanding the reliability of the genetic diversity analysis with respect to large missing data and genotype imputation but also are instructive for performing a proper genetic diversity analysis of highly incomplete GBS or other genotype data.

  17. Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study

    PubMed Central

    2013-01-01

    Background: Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic. Methods: Using simulation, we examined whether the KS test could reliably identify departures from assumptions made in the imputation model. To do this we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of KS test in diagnosing issues with the imputation models under these different scenarios. Results: The KS test was able to flag differences between the observations and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random dependency, the KS p-values were very small, regardless of whether or not the MI estimates were biased; so that the KS test was not able to discriminate between imputed variables that required further investigation, and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, adding to the challenge of interpreting the results from the KS test. Conclusions: Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research. PMID:24252653
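
    The diagnostic under study compares the empirical distributions of observed and imputed values. The sketch below computes only the two-sample KS statistic (the largest gap between the two empirical CDFs), not the p-values examined in the paper. The simulated data, a skewed variable imputed from a normal model, and all names are assumptions for illustration.

```python
import random
from statistics import mean, stdev


def ks_statistic(sample_x, sample_y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(sample_x), sorted(sample_y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to v (handles ties).
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical scenario: skewed observed values, imputations drawn from a normal model.
rng = random.Random(0)
observed = [rng.lognormvariate(0.0, 1.0) for _ in range(300)]
mu, sd = mean(observed), stdev(observed)
imputed = [rng.gauss(mu, sd) for _ in range(100)]
print(ks_statistic(observed, imputed))
```

    As the abstract cautions, a large statistic signals that imputed and observed values differ in distribution, which under a missing at random mechanism can happen even when the MI estimates are unbiased.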

  18. Replication and Characterization of Association between ABO SNPs and Red Blood Cell Traits by Meta-Analysis in Europeans

    PubMed Central

    McLachlan, Stela; Giambartolomei, Claudia; Charoen, Pimphen; Wong, Andrew; Finan, Chris; Engmann, Jorgen; Shah, Tina; Hersch, Micha; Cavadino, Alana; Jefferis, Barbara J.; Dale, Caroline E.; Hypponen, Elina; Morris, Richard W.; Casas, Juan P.; Kumari, Meena; Ben-Shlomo, Yoav; Gaunt, Tom R.; Drenos, Fotios; Langenberg, Claudia; Kuh, Diana; Kivimaki, Mika; Rueedi, Rico; Waeber, Gerard; Hingorani, Aroon D.; Price, Jacqueline F.

    2016-01-01

    Red blood cell (RBC) traits are routinely measured in clinical practice as important markers of health. Deviations from the physiological ranges are usually a sign of disease, although variation between healthy individuals also occurs, at least partly due to genetic factors. Recent large scale genetic studies identified loci associated with one or more of these traits; further characterization of known loci and identification of new loci is necessary to better understand their role in health and disease and to identify potential molecular mechanisms. We performed meta-analysis of Metabochip association results for six RBC traits—hemoglobin concentration (Hb), hematocrit (Hct), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV) and red blood cell count (RCC)—in 11 093 Europeans from seven studies of the UCL-LSHTM-Edinburgh-Bristol (UCLEB) Consortium. We identified 394 non-overlapping SNPs in five loci at genome-wide significance: 6p22.1-6p21.33 (with HFE among others), 6q23.2 (with HBS1L among others), 6q23.3 (contains no genes), 9q34.3 (only ABO gene) and 22q13.1 (with TMPRSS6 among others), replicating previous findings of association with RBC traits at these loci and extending them by imputation to 1000 Genomes. We further characterized associations between ABO SNPs and three traits: hemoglobin, hematocrit and red blood cell count, replicating them in an independent cohort. Conditional analyses indicated the independent association of each of these traits with ABO SNPs and a role for blood group O in mediating the association. The 15 most significant RBC-associated ABO SNPs were also associated with five cardiometabolic traits, with discordance in the direction of effect between groups of traits, suggesting that ABO may act through more than one mechanism to influence cardiometabolic risk. PMID:27280446

  19. Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA

    PubMed Central

    Jung, Yoonsuh; Huang, Jianhua Z.

    2014-01-01

    In genome-wide association studies, the primary task is to detect biomarkers in the form of Single Nucleotide Polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits application of classical methods such as multiple logistic regression. Currently the most commonly used approach is still to analyze one SNP at a time. In this paper, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a Majorization-Minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a Multiple Sclerosis data set and simulated data sets and shows promise in biomarker detection. PMID:25642005

  20. TSPYL5 SNPs: Association with Plasma Estradiol Concentrations and Aromatase Expression

    PubMed Central

    Liu, Mohan; Ingle, James N.; Fridley, Brooke L.; Buzdar, Aman U.; Robson, Mark E.; Kubo, Michiaki; Wang, Liewei; Batzler, Anthony; Jenkins, Gregory D.; Pietrzak, Tracy L.; Carlson, Erin E.; Goetz, Matthew P.; Northfelt, Donald W.; Perez, Edith A.; Williard, Clark V.; Schaid, Daniel J.; Nakamura, Yusuke

    2013-01-01

    We performed a discovery genome-wide association study to identify genetic factors associated with variation in plasma estradiol (E2) concentrations using DNA from 772 postmenopausal women with estrogen receptor (ER)-positive breast cancer prior to the initiation of aromatase inhibitor therapy. Association analyses showed that the single nucleotide polymorphism (SNP) with the lowest P value, rs1864729 (P = 3.49E-08), mapped to chromosome 8 near TSPYL5. We also identified 17 imputed SNPs in or near TSPYL5 with P values < 5E-08, one of which, rs2583506, created a functional estrogen response element. We then used a panel of lymphoblastoid cell lines (LCLs) stably transfected with ERα with known genome-wide SNP genotypes to demonstrate that TSPYL5 expression increased after E2 exposure of cells heterozygous for variant TSPYL5 SNP genotypes, but not in those homozygous for wild-type alleles. TSPYL5 knockdown decreased, and overexpression increased, aromatase (CYP19A1) expression in MCF-7 cells, LCLs, and adipocytes through the skin/adipose (I.4) promoter. Chromatin immunoprecipitation assay showed that TSPYL5 bound to the CYP19A1 I.4 promoter. A putative TSPYL5 binding motif was identified in 43 genes, and TSPYL5 appeared to function as a transcription factor for most of those genes. In summary, genome-wide significant SNPs in TSPYL5 were associated with elevated plasma E2 in postmenopausal breast cancer patients. SNP rs2583506 created a functional estrogen response element, and LCLs with variant SNP genotypes displayed increased E2-dependent TSPYL5 expression. TSPYL5 induced CYP19A1 expression and that of many other genes. These studies have revealed a novel mechanism for regulating aromatase expression and plasma E2 concentrations in postmenopausal women with ER(+) breast cancer. PMID:23518928

  1. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  2. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  3. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  4. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  5. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  6. Multiple imputation for IPD meta-analysis: allowing for heterogeneity and studies with missing covariates.

    PubMed

    Quartagno, M; Carpenter, J R

    2016-07-30

    Recently, multiple imputation has been proposed as a tool for individual patient data meta-analysis with sporadically missing observations, and it has been suggested that within-study imputation is usually preferable. However, such within study imputation cannot handle variables that are completely missing within studies. Further, if some of the contributing studies are relatively small, it may be appropriate to share information across studies when imputing. In this paper, we develop and evaluate a joint modelling approach to multiple imputation of individual patient data in meta-analysis, with an across-study probability distribution for the study specific covariance matrices. This retains the flexibility to allow for between-study heterogeneity when imputing while allowing (i) sharing information on the covariance matrix across studies when this is appropriate, and (ii) imputing variables that are wholly missing from studies. Simulation results show both equivalent performance to the within-study imputation approach where this is valid, and good results in more general, practically relevant, scenarios with studies of very different sizes, non-negligible between-study heterogeneity and wholly missing variables. We illustrate our approach using data from an individual patient data meta-analysis of hypertension trials. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  7. Imputation of Missing Genotypes From Sparse to High Density Using Long-Range Phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals in a population share long chromosome segments which trace to a common ancestor. We describe a long-range phasing algorithm that makes use of this property to phase whole chromosomes and simultaneously impute a large number of missing markers. We test our method by imputing marke...

  8. Estimation of missing rainfall data using spatial interpolation and imputation methods

    NASA Astrophysics Data System (ADS)

    Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Azman, Muhammad Az-zuhri

    2015-02-01

    This study aims to estimate missing rainfall data at three levels of missingness, namely 5%, 10% and 20%, in order to represent various cases of missing data. In practice, spatial interpolation methods are usually the first choice for estimating missing data. These methods include the normal ratio (NR), arithmetic average (AA), coefficient of correlation (CC) and inverse distance (ID) weighting methods. The methods consider the distance between the target and the neighbouring stations as well as the correlations between them. An alternative approach to handling missing data is imputation, the process of replacing missing data with substituted values. A once-common method is single imputation, which allows parameter estimation. However, single imputation ignores the variability of the estimates, which leads to underestimated standard errors and confidence intervals. To overcome this underestimation problem, multiple imputation is used, where each missing value is estimated with a distribution of imputations that reflects the uncertainty about the missing data. In this study, spatial interpolation methods and multiple imputation are compared for estimating missing rainfall data. The performance of the estimation methods is assessed using the similarity index (S-index), mean absolute error (MAE) and coefficient of correlation (R).
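
    Three of the spatial interpolation estimators named above can be sketched in their common textbook forms. The abstract does not give the exact weighting used in the study, so the inverse-distance exponent of 2, the function names, and the station values below are assumptions for illustration only.

```python
def arithmetic_average(neighbour_values):
    """AA: plain mean of the rainfall recorded at the neighbouring stations."""
    return sum(neighbour_values) / len(neighbour_values)


def inverse_distance(neighbour_values, distances, power=2.0):
    """ID: weight each neighbour by 1 / distance**power (power=2 is an assumption)."""
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, neighbour_values)) / sum(weights)


def normal_ratio(neighbour_values, neighbour_normals, target_normal):
    """NR: rescale each neighbour by the ratio of long-term normal rainfall, then average."""
    scaled = [target_normal / normal * value
              for normal, value in zip(neighbour_normals, neighbour_values)]
    return sum(scaled) / len(scaled)


def mean_absolute_error(estimates, truths):
    """MAE between estimated and actually observed values."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


# Hypothetical daily rainfall (mm) at three neighbours of a station with a missing value.
rain = [10.0, 20.0, 30.0]
km = [5.0, 10.0, 20.0]
normals = [2000.0, 2500.0, 3000.0]   # long-term annual rainfall at the neighbours
print(arithmetic_average(rain))
print(inverse_distance(rain, km))
print(normal_ratio(rain, normals, target_normal=2400.0))
```

    To evaluate an estimator in the way the abstract describes, one would delete known values at the chosen missingness level, estimate them, and compare the estimates against the deleted values with a measure such as `mean_absolute_error`.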

  9. Variable selection for multiply-imputed data with application to dioxin exposure study.

    PubMed

    Chen, Qixuan; Wang, Sijian

    2013-09-20

    Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes it difficult to interpret the final model or draw scientific conclusions. In this paper, we propose a novel multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) variable selection method as an extension of the least absolute shrinkage and selection operator (LASSO) method to multiply-imputed data. The MI-LASSO method treats the estimated regression coefficients of the same variable across all imputed datasets as a group and applies the group LASSO penalty to yield a consistent variable selection across multiple-imputed datasets. We use a simulation study to demonstrate the advantage of the MI-LASSO method compared with the alternatives. We also apply the MI-LASSO method to the University of Michigan Dioxin Exposure Study to identify important circumstances and exposure factors that are associated with human serum dioxin concentration in Midland, Michigan.

  10. A Simplified Framework for Using Multiple Imputation in Social Work Research

    ERIC Educational Resources Information Center

    Rose, Roderick A.; Fraser, Mark W.

    2008-01-01

    Missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings. Increasingly, social science researchers are turning to multiple imputation to handle missing data. Multiple imputation, in which missing values are replaced by values repeatedly drawn from…

  11. A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

    ERIC Educational Resources Information Center

    Wolkowitz, Amanda A.; Skorupski, William P.

    2013-01-01

    When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…

  12. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  13. Imputation for semiparametric transformation models with biased-sampling data

    PubMed Central

    Liu, Hao; Qin, Jing; Shen, Yu

    2012-01-01

    Widely recognized in many fields including economics, engineering, epidemiology, health sciences, technology and wildlife management, length-biased sampling generates biased and right-censored data but often provides the best information available for statistical inference. Different from traditional right-censored data, length-biased data have unique aspects resulting from their sampling procedures. We exploit these unique aspects and propose a general imputation-based estimation method for analyzing length-biased data under a class of flexible semiparametric transformation models. We present new computational algorithms that can jointly estimate the regression coefficients and the baseline function semiparametrically. The imputation-based method under the transformation model provides an unbiased estimator regardless of whether or not the censoring is independent of the covariates. We establish large-sample properties using the empirical processes method. Simulation studies show that under small to moderate sample sizes, the proposed procedure has smaller mean square errors than two existing estimation procedures. Finally, we demonstrate the estimation procedure by a real data example. PMID:22903245

  14. RECONSTRUCTING DNA COPY NUMBER BY PENALIZED ESTIMATION AND IMPUTATION.

    PubMed

    Zhang, Zhongyang; Lange, Kenneth; Ophoff, Roel; Sabatti, Chiara

    2010-12-01

    Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18-29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost.

  15. Evaluation of Multi-parameter Test Statistics for Multiple Imputation.

    PubMed

    Liu, Yu; Enders, Craig K

    2017-03-22

    In Ordinary Least Square regression, researchers often are interested in knowing whether a set of parameters is different from zero. With complete data, this could be achieved using the gain in prediction test, hierarchical multiple regression, or an omnibus F test. However, in substantive research scenarios, missing data often exist. In the context of multiple imputation, one of the current state-of-art missing data strategies, there are several different analogous multi-parameter tests of the joint significance of a set of parameters, and these multi-parameter test statistics can be referenced to various distributions to make statistical inferences. However, little is known about the performance of these tests, and virtually no research study has compared the Type 1 error rates and statistical power of these tests in scenarios that are typical of behavioral science data (e.g., small to moderate samples, etc.). This paper uses Monte Carlo simulation techniques to examine the performance of these multi-parameter test statistics for multiple imputation under a variety of realistic conditions. We provide a number of practical recommendations for substantive researchers based on the simulation results, and illustrate the calculation of these test statistics with an empirical example.

  16. Imputation of KIR Types from SNP Variation Data

    PubMed Central

    Vukcevic, Damjan; Traherne, James A.; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H.; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-01-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR*IMP, a method for imputation of KIR copy number. We show that KIR*IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  17. Data supporting the high-accuracy haplotype imputation using unphased genotype data as the references.

    PubMed

    Li, Wenzhi; Xu, Wei; He, Shaohua; Ma, Li; Song, Qing

    2016-09-01

    The data presented in this article relate to the research article entitled "High-accuracy haplotype imputation using unphased genotype data as the references", which reports that unphased genotype data can be used as references for haplotype imputation [1]. This article describes the pipelines used to generate the different implementations and reports the results of performance comparisons among the implementations (A, B, and C) and between HiFi and three major imputation software tools. Our data showed that the three implementations are similar in accuracy, with the accuracy of implementation B slightly but consistently higher than that of A and C. HiFi performed better on haplotype imputation accuracy, and the three other software tools performed slightly better on genotype imputation accuracy. These data may provide a strategy for choosing the optimal phasing pipeline and software for different studies.

  18. [Imputing missing data in public health: general concepts and application to dichotomous variables].

    PubMed

    Hernández, Gilma; Moriña, David; Navarro, Albert

    2017-03-15

    The presence of missing data in collected variables is common in health surveys, but the subsequent imputation thereof at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still little understood by many non-statisticians, who view this process as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers ascertain its strengths. All this in the context of dichotomous variables which are commonplace in public health. To illustrate these concepts, an example in which missing data is handled by means of simple and multiple imputation is introduced.

  19. Combining multiple imputation and meta-analysis with individual participant data.

    PubMed

    Burgess, Stephen; White, Ian R; Resche-Rigon, Matthieu; Wood, Angela M

    2013-11-20

    Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration.
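
    The ordering recommended in the abstract above (apply Rubin's rules within each study first, then meta-analyze) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and all numbers are hypothetical, and a fixed-effect inverse-variance model is assumed for the meta-analysis step.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m imputation-specific estimates from ONE study with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def inverse_variance_meta(study_estimates, study_variances):
    """Fixed-effect inverse-variance weighted meta-analysis across studies."""
    weights = [1 / v for v in study_variances]
    pooled = sum(w * e for w, e in zip(weights, study_estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


# Hypothetical inputs: per study, (estimates, variances) from m = 3 imputed datasets.
studies = [
    ([0.42, 0.47, 0.45], [0.010, 0.011, 0.009]),
    ([0.30, 0.36, 0.33], [0.020, 0.022, 0.021]),
]

# Step 1: Rubin's rules at the study level.
per_study = [pool_rubin(est, var) for est, var in studies]

# Step 2: meta-analyze the study-level pooled results.
meta_estimate, meta_variance = inverse_variance_meta(
    [q for q, _ in per_study], [t for _, t in per_study]
)
print(meta_estimate, meta_variance)
```

    Reversing the two steps (meta-analyzing each imputed dataset and then pooling) would reuse the same two helpers in the opposite order, which is the alternative the abstract advises against.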

  20. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data

    PubMed Central

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the

  1. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    PubMed

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  2. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data

    PubMed Central

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-01-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models. PMID:28243594

  3. FAPI: Fast and accurate P-value Imputation for genome-wide association study.

    PubMed

    Kwan, Johnny S H; Li, Miao-Xin; Deng, Jia-En; Sham, Pak C

    2016-05-01

    Imputing individual-level genotypes (or genotype imputation) is now a standard procedure in genome-wide association studies (GWAS) to examine disease associations at untyped common genetic variants. Meta-analysis of publicly available GWAS summary statistics can allow more disease-associated loci to be discovered, but these data are usually provided for various variant sets. Thus imputing these summary statistics of different variant sets into a common reference panel for meta-analyses is impossible using traditional genotype imputation methods. Here we develop a fast and accurate P-value imputation (FAPI) method that utilizes summary statistics of common variants only. Its computational cost is linear with the number of untyped variants and has similar accuracy compared with IMPUTE2 with prephasing, one of the leading methods in genotype imputation. In addition, based on the FAPI idea, we develop a metric to detect abnormal association at a variant and showed that it had a significantly greater power compared with LD-PAC, a method that quantifies the evidence of spurious associations based on likelihood ratio. Our method is implemented in a user-friendly software tool, which is available at http://statgenpro.psychiatry.hku.hk/fapi.

  4. A multiple imputation strategy for sequential multiple assignment randomized trials

    PubMed Central

    Shortreed, Susan M.; Laber, Eric; Stroup, T. Scott; Pineau, Joelle; Murphy, Susan A.

    2014-01-01

    Sequential multiple assignment randomized trials (SMARTs) are increasingly being used to inform clinical and intervention science. In a SMART, each patient is repeatedly randomized over time. Each randomization occurs at a critical decision point in the treatment course. These critical decision points often correspond to milestones in the disease process or other changes in a patient’s health status. Thus, the timing and number of randomizations may vary across patients and depend on evolving patient-specific information. This presents unique challenges when analyzing data from a SMART in the presence of missing data. This paper presents the first comprehensive discussion of missing data issues typical of SMART studies: we describe five specific challenges, and propose a flexible imputation strategy to facilitate valid statistical estimation and inference using incomplete data from a SMART. To illustrate these contributions, we consider data from the Clinical Antipsychotic Trial of Intervention and Effectiveness (CATIE), one of the most well-known SMARTs to date. PMID:24919867

  5. Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

    PubMed

    Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

    2009-02-01

    One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.

  6. Analysis of incomplete longitudinal binary data using multiple imputation.

    PubMed

    Li, Xiaoming; Mehrotra, Devan V; Barnard, John

    2006-06-30

    We propose a propensity score-based multiple imputation (MI) method to tackle incomplete data resulting from drop-outs and/or intermittent skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.

  7. Dealing with missing values in large-scale studies: microarray data imputation and beyond.

    PubMed

    Aittokallio, Tero

    2010-03-01

    High-throughput biotechnologies, such as gene expression microarrays or mass-spectrometry-based proteomic assays, suffer from frequent missing values due to various experimental reasons. Since the missing data points can hinder downstream analyses, there exists a wide variety of ways in which to deal with missing values in large-scale data sets. Nowadays, it has become routine to estimate (or impute) the missing values prior to the actual data analysis. After nearly a decade since the publication of the first missing value imputation methods for gene expression microarray data, new imputation approaches are still being developed at an increasing rate. However, what is lagging behind is a systematic and objective evaluation of the strengths and weaknesses of the different approaches when faced with different types of data sets and experimental questions. In this review, the present strategies for missing value imputation and the measures for evaluating their performance are described. The imputation methods are first reviewed in the context of gene expression microarray data, since most of the methods have been developed for estimating gene expression levels; then, we turn to other large-scale data sets that also suffer from the problems posed by missing values, together with pointers to possible imputation approaches in these settings. Along with a description of the basic principles behind the different imputation approaches, the review tries to provide practical guidance for the users of high-throughput technologies on how to choose the imputation tool for their data and questions, and some additional research directions for the developers of imputation methodologies.

  8. Effect of reference population size and available ancestor genotypes on imputation of Mexican Holstein genotypes.

    PubMed

    García-Ruiz, A; Ruiz-Lopez, F J; Wiggans, G R; Van Tassell, C P; Montaldo, H H

    2015-05-01

    The effects of reference population size and the availability of information from genotyped ancestors on the accuracy of imputation of single nucleotide polymorphisms (SNP) were investigated for Mexican Holstein cattle. Three scenarios for reference population size were examined: (1) a local population of 2,011 genotyped Mexican Holsteins, (2) animals in scenario 1 plus 866 Holsteins in the US genotype database (GDB) with genotyped Mexican daughters, and (3) animals in scenario 1 and all US GDB Holsteins (338,073). Genotypes from 4 chip densities (2 low density, 1 mid density, and 1 high density) were imputed using findhap (version 3) to the 45,195 markers on the mid-density chip. Imputation success was determined by comparing the numbers of SNP with 1 or 2 alleles missing and the numbers of differently predicted SNP (conflicts) among the 3 scenarios. Imputation accuracy improved as chip density and numbers of genotyped ancestors increased, and the percentage of SNP with 1 missing allele was greater than that for 2 missing alleles for all scenarios. The largest numbers of conflicts were found between scenarios 1 and 3. The inclusion of information from direct ancestors (dam or sire) with US GDB genotypes in the imputation of Mexican Holstein genotypes increased imputation accuracy by 1 percentage point for low-density genotypes and by 0.5 percentage points for high-density genotypes, which was about half the gain found with information from all US GDB Holsteins. A larger reference population and the availability of genotyped ancestors improved imputation; animals with genotyped parents in a large reference population had higher imputation accuracy than those with no or few genotyped relatives in a small reference population. For small local populations, including genotypes from other related populations can aid in improving imputation accuracy.

  9. Differential network analysis with multiply imputed lipidomic data.

    PubMed

    Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

    2015-01-01

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis caused cardiovascular disease (CVD). Lipidomics analyses are characterized by large yet not a huge number of mutually correlated variables measured and their associations to outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct permutation test on association scores resulted from partial least square regression with multiple imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, particularly paying attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state of the art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients, the patients that had a fatal CVD event and the ones who remained stable during two year follow-up.

  10. Differential Network Analysis with Multiply Imputed Lipidomic Data

    PubMed Central

    Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

    2015-01-01

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis caused cardiovascular disease (CVD). Lipidomics analyses are characterized by large yet not a huge number of mutually correlated variables measured and their associations to outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct permutation test on association scores resulted from partial least square regression with multiple imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, particularly paying attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state of the art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients, the patients that had a fatal CVD event and the ones who remained stable during two year follow-up. PMID:25822937

  11. High-accuracy haplotype imputation using unphased genotype data as the references

    PubMed Central

    Li, Wenzhi; Xu, Wei; Fu, Guoxing; Ma, Li; Richards, Jendai; Rao, Weinian; Bythwood, Tameka; Guo, Shiwen; Song, Qing

    2017-01-01

    Rapidly growing genomic datasets present a new challenge for missing data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. However, to date, haplotype references are not available for the majority of populations in the world. We explored using existing unphased genotype datasets as references; if this succeeds, it will cover almost all of the populations in the world. The results showed that our HiFi software yields 99.43% accuracy with unphased genotype references. Our method provides a cost-effective solution to break through the bottleneck of limited reference availability for haplotype imputation in the big data era. PMID:26232609

  12. Comparison of methods for imputing limited-range variables: a simulation study

    PubMed Central

    2014-01-01

    Background: Multiple imputation (MI) was developed as a method to enable valid inferences to be obtained in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible for individual observations. One variable type for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, with a focus on those that restrict the range of the imputed values. Methods: Using data from a study of adolescent health, we consider three variables based on responses to the General Health Questionnaire (GHQ), a tool for detecting minor psychiatric illness. These variables, based on different scoring methods for the GHQ, resulted in three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing completely at random or missing at random; repeating this process to create 1000 datasets with incomplete data for each scenario. For each dataset, we imputed values on the raw scale and following a zero-skewness log transformation using: univariate regression with no rounding; post-imputation rounding; truncated normal regression; and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, comparing the results with complete data statistics. Results: Imputation with no rounding performed well when applied to data on the raw scale. Post-imputation rounding and imputation using truncated normal regression produced higher marginal means than the complete data estimate when data had a moderate or severe skew, and this was associated with under-coverage of the complete data estimate. Predictive mean matching also produced under-coverage of the complete data

  13. SNPs and Haplotypes in Native American Populations

    PubMed Central

    Kidd, Judith R.; Friedlaender, Françoise; Pakstis, Andrew J.; Furtado, Manohar; Fang, Rixun; Wang, Xudong; Nievergelt, Caroline M.; Kidd, Kenneth K.

    2013-01-01

    Autosomal DNA polymorphisms can provide new information and understanding of both the origins of and relationships among modern Native American populations. Although autosomal markers can be highly informative, they are also susceptible to ascertainment biases in the selection of the markers to use. Identifying markers that can be used for ancestry inference among Native American populations can be considered separate from identifying markers to further the quest for history. In the current study we use data on nine Native American populations to compare the results based on a large haplotype-based dataset with those based on relatively small independent sets of SNPs. We are interested in which types of limited datasets that an individual laboratory might be able to collect are best for addressing two different questions of interest. First, how well can we differentiate the Native American populations and/or infer ancestry by assigning an individual to her population(s) of origin? Second, how well can we infer the historical/evolutionary relationships among Native American populations and their Eurasian origins? We conclude that only a large comprehensive dataset involving multiple autosomal markers on multiple populations will be able to answer both questions; different small sets of markers are able to answer only one or the other of these questions. Using our largest dataset we see a general increasing distance from Old World populations from North to South in the New World except for an unexplained close relationship between our Maya and Quechua samples. PMID:21913176

  14. Predicting survival time for metastatic castration resistant prostate cancer: An iterative imputation approach

    PubMed Central

    Deng, Detian; Du, Yu; Ji, Zhicheng; Rao, Karthik; Wu, Zhenke; Zhu, Yuxin; Coley, R. Yates

    2016-01-01

    In this paper, we present our winning method for survival time prediction in the 2015 Prostate Cancer DREAM Challenge, a recent crowdsourced competition focused on risk and survival time predictions for patients with metastatic castration-resistant prostate cancer (mCRPC). We are interested in using a patient's covariates to predict his or her time until death after initiating standard therapy. We propose an iterative algorithm to multiply impute right-censored survival times and use ensemble learning methods to characterize the dependence of these imputed survival times on possibly many covariates. We show that by iterating over imputation and ensemble learning steps, we guide imputation with patient covariates and, subsequently, optimize the accuracy of survival time prediction. This method is generally applicable to time-to-event prediction problems in the presence of right-censoring. We demonstrate the proposed method's performance with training and validation results from the DREAM Challenge and compare its accuracy with existing methods. PMID:28299176

  15. MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS*

    PubMed Central

    BEAULIEU-JONES, BRETT K.; MOORE, JASON H.

    2016-01-01

    Electronic health records (EHRs) have become a vital source of patient outcome data but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in the EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong performance for imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogenous with time from onset being the most important predictor. PMID:27896976
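
    The evaluation idea in the abstract above (hide values that are actually known, impute them, and score the result against the truth) can be sketched as below. This is only an illustration of the masking step for the missing-completely-at-random case: the helper names and data are hypothetical, and simple mean imputation stands in for the multiple-imputation and autoencoder methods the study compares.

```python
import math
import random
from statistics import mean


def mask_mcar(values, fraction, seed=0):
    """Hide a random fraction of known values (missing completely at random)."""
    rng = random.Random(seed)
    n_hidden = round(fraction * len(values))
    hidden = set(rng.sample(range(len(values)), n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)


def mean_impute(masked):
    """Stand-in imputer: replace every missing entry with the observed mean."""
    fill = mean(v for v in masked if v is not None)
    return [fill if v is None else v for v in masked]


def rmse_on_hidden(truth, imputed, hidden):
    """Root-mean-square error, computed only on the entries that were hidden."""
    return math.sqrt(mean((truth[i] - imputed[i]) ** 2 for i in hidden))


# Hypothetical fully observed measurements.
truth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
masked, hidden = mask_mcar(truth, fraction=0.3)
imputed = mean_impute(masked)
print(rmse_on_hidden(truth, imputed, hidden))
```

    Swapping `mean_impute` for another imputer while keeping the same mask gives a like-for-like accuracy comparison; simulating missing-not-at-random data would require a mask whose probability depends on the hidden value itself.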

  16. Family-based Association Analyses of Imputed Genotypes Reveal Genome-Wide Significant Association of Alzheimer’s disease with OSBPL6, PTPRG and PDCL3

    PubMed Central

    Herold, Christine; Hooli, Basavaraj V.; Mullin, Kristina; Liu, Tian; Roehr, Johannes T; Mattheisen, Manuel; Parrado, Antonio R.; Bertram, Lars; Lange, Christoph; Tanzi, Rudolph E.

    2015-01-01

    The genetic basis of Alzheimer's disease (AD) is complex and heterogeneous. Over 200 highly penetrant pathogenic variants in the genes APP, PSEN1 and PSEN2 cause a subset of early-onset familial Alzheimer's disease (EOFAD). On the other hand, susceptibility to late-onset forms of AD (LOAD) is indisputably associated to the ε4 allele in the gene APOE, and more recently to variants in more than two-dozen additional genes identified in the large-scale genome-wide association studies (GWAS) and meta-analyses reports. Taken together however, although the heritability in AD is estimated to be as high as 80%, a large proportion of the underlying genetic factors still remain to be elucidated. In this study we performed a systematic family-based genome-wide association and meta-analysis on close to 15 million imputed variants from three large collections of AD families (~3,500 subjects from 1,070 families). Using a multivariate phenotype combining affection status and onset age, meta-analysis of the association results revealed three single nucleotide polymorphisms (SNPs) that achieved genome-wide significance for association with AD risk: rs7609954 in the gene PTPRG (P-value = 3.98·10−08), rs1347297 in the gene OSBPL6 (P-value = 4.53·10−08), and rs1513625 near PDCL3 (P-value = 4.28·10−08). In addition, rs72953347 in OSBPL6 (P-value = 6.36·10−07) and two SNPs in the gene CDKAL1 showed marginally significant association with LOAD (rs10456232, P-value: 4.76·10−07; rs62400067, P-value: 3.54·10−07). In summary, family-based GWAS meta-analysis of imputed SNPs revealed novel genomic variants in (or near) PTPRG, OSBPL6, and PDCL3 that influence risk for AD with genome-wide significance. PMID:26830138

  17. Large-scale epigenome imputation improves data quality and disease variant enrichment

    PubMed Central

    Ernst, Jason; Kellis, Manolis

    2015-01-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals, and surpass experimental datasets in consistency, recovery of gene annotations, and enrichment for disease-associated variants. We use the imputed data to detect low quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments, and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  18. Quantitative trait Loci association mapping by imputation of strain origins in multifounder crosses.

    PubMed

    Zhou, Jin J; Ghazalpour, Anatole; Sobel, Eric M; Sinsheimer, Janet S; Lange, Kenneth

    2012-02-01

    Although mapping quantitative traits in inbred strains is simpler than mapping the analogous traits in humans, classical inbred crosses suffer from reduced genetic diversity compared to experimental designs involving outbred animal populations. Multiple crosses, for example the Complex Trait Consortium's eight-way cross, circumvent these difficulties. However, complex mating schemes and systematic inbreeding raise substantial computational difficulties. Here we present a method for locally imputing the strain origins of each genotyped animal along its genome. Imputed origins then serve as mean effects in a multivariate Gaussian model for testing association between trait levels and local genomic variation. Imputation is a combinatorial process that assigns the maternal and paternal strain origin of each animal on the basis of observed genotypes and prior pedigree information. Without smoothing, imputation is likely to be ill-defined or jump erratically from one strain to another as an animal's genome is traversed. In practice, one expects to see long stretches where strain origins are invariant. Smoothing can be achieved by penalizing strain changes from one marker to the next. A dynamic programming algorithm then solves the strain imputation process in one quick pass through the genome of an animal. Imputation accuracy exceeds 99% in practical examples and leads to high-resolution mapping in simulated and real data. The previous fastest quantitative trait loci (QTL) mapping software for dense genome scans reduced compute times to hours. Our implementation further reduces compute times from hours to minutes with no loss in statistical power. Indeed, power is enhanced for full pedigree data.
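
    The penalized smoothing described above can be sketched as a Viterbi-style dynamic program. This is a minimal illustration under stated assumptions, not the authors' implementation: the cost table `mismatch_cost`, the constant `switch_penalty`, and the function name `smooth_strain_path` are hypothetical, and the sketch tracks a single strain label per marker rather than a maternal/paternal pair.

```python
def smooth_strain_path(mismatch_cost, switch_penalty):
    """Return one strain label per marker that minimises total cost.

    mismatch_cost[t][s]: hypothetical cost of assigning strain s at marker t
                         (for example, disagreement with the observed genotype).
    switch_penalty:      non-negative cost added whenever the strain changes
                         between adjacent markers.
    """
    n_markers = len(mismatch_cost)
    n_strains = len(mismatch_cost[0])
    best = list(mismatch_cost[0])   # best[s]: cheapest path ending in strain s
    back = []                       # back-pointers, one list per marker after the first

    for t in range(1, n_markers):
        # With a constant penalty, the best strain to switch from is simply
        # the cheapest previous state, so each marker costs O(n_strains).
        prev_min = min(range(n_strains), key=lambda s: best[s])
        new_best, pointers = [], []
        for s in range(n_strains):
            stay = best[s]
            switch = best[prev_min] + switch_penalty
            if stay <= switch:
                new_best.append(stay + mismatch_cost[t][s])
                pointers.append(s)
            else:
                new_best.append(switch + mismatch_cost[t][s])
                pointers.append(prev_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final state.
    state = min(range(n_strains), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

    With a large penalty the returned path stays on one strain over long stretches; with a penalty of zero it follows the per-marker minimum and can jump at every marker, which is the erratic behaviour the abstract attributes to unsmoothed imputation.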

  19. SNPs selection using support vector regression and genetic algorithms in GWAS

    PubMed Central

    2014-01-01

    Introduction: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for characterizing any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results: The suggested method showed potential in simulated database 1, which has additive effects only, and in the real database. In this simulated database, which contains 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of these, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, which initially contained 50,752 SNPs, the method reduced the set to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions: The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel eliminated many redundant markers, improving the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332

  20. Imputation of missing data using machine learning techniques

    SciTech Connect

    Lakshminarayan, Kamakshi; Harp, S.A.; Goldman, R.; Samad, T.

    1996-12-31

    A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine learning techniques. The first is an unsupervised clustering strategy which uses a Bayesian approach to cluster the data into classes. The classes so obtained are then used to predict multiple choices for the attribute of interest. The second technique involves modeling missing variables by supervised induction of a decision tree-based classifier. This predicts the most likely value for the attribute of interest. Empirical tests using extracts from industrial databases maintained by Honeywell customers have been done in order to compare the two techniques. These tests show both approaches are useful and have advantages and disadvantages. We argue that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and discuss potential applications for the procedures we are developing.

  1. Binary variable multiple-model multiple imputation to address missing data mechanism uncertainty: application to a smoking cessation trial.

    PubMed

    Siddique, Juned; Harel, Ofer; Crespi, Catherine M; Hedeker, Donald

    2014-07-30

    The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables, which formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software.
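
    The abstract does not spell out the combining rules it refers to. As a hedged reference, the form usually cited for nested multiple imputation, with M imputation models and N imputations per model, is sketched below; the symbols are assumptions introduced here (Q-hat and U with subscripts m, n are the point estimate and its variance from imputation n under model m) and should be checked against the paper before use.

```latex
\begin{align*}
\bar{Q}_m &= \frac{1}{N}\sum_{n=1}^{N}\hat{Q}_{mn}, &
\bar{Q}   &= \frac{1}{M}\sum_{m=1}^{M}\bar{Q}_m, &
\bar{U}   &= \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}U_{mn},\\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\bar{Q}_m-\bar{Q}\bigr)^2, &
W &= \frac{1}{M(N-1)}\sum_{m=1}^{M}\sum_{n=1}^{N}\bigl(\hat{Q}_{mn}-\bar{Q}_m\bigr)^2, &
T &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B + \Bigl(1-\frac{1}{N}\Bigr)W.
\end{align*}
```

    Here B captures variability between imputation models (the mechanism uncertainty), W captures variability between imputations within a model, and T is the total variance attached to the pooled estimate.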

  2. Addressing Missing Data Mechanism Uncertainty using Multiple-Model Multiple Imputation: Application to a Longitudinal Clinical Trial.

    PubMed

    Siddique, Juned; Harel, Ofer; Crespi, Catherine M

    2012-12-01

    We present a framework for generating multiple imputations for continuous data when the missing data mechanism is unknown. Imputations are generated from more than one imputation model in order to incorporate uncertainty regarding the missing data mechanism. Parameter estimates based on the different imputation models are combined using rules for nested multiple imputation. Through the use of simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal clinical trial of low-income women with depression where nonignorably missing data were a concern. We show that different assumptions regarding the missing data mechanism can have a substantial impact on inferences. Our method provides a simple approach for formalizing subjective notions regarding nonresponse so that they can be easily stated, communicated, and compared.

  3. Imputation of Variants from the 1000 Genomes Project Modestly Improves Known Associations and Can Identify Low-frequency Variant - Phenotype Associations Undetected by HapMap Based Imputation

    PubMed Central

    Wood, Andrew R.; Perry, John R. B.; Tanaka, Toshiko; Hernandez, Dena G.; Zheng, Hou-Feng; Melzer, David; Gibbs, J. Raphael; Nalls, Michael A.; Weedon, Michael N.; Spector, Tim D.; Richards, J. Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B.; Frayling, Timothy M.

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF < 5%) and rare variants (MAF < 1%)) can enhance previously identified associations and identify novel loci, we selected 93 quantitative circulating factors where data was available from the InCHIANTI population study. These phenotypes included cytokines, binding proteins, hormones, vitamins and ions. We selected these phenotypes because many have known strong genetic associations and are potentially important to help understand disease processes. We performed a genome-wide scan for these 93 phenotypes in InCHIANTI. We identified 21 signals and 33 signals that reached P < 5 × 10(-8) based on HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P < 5 × 10(-11), respectively. Imputation of 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P < 5 × 10(-8) in both analyses (17 of which represent well replicated signals in the NHGRI catalogue), six were captured by the same index SNP, five were nominally more strongly associated in 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5 × 10(-12)). Our data provide important proof of principle
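
    The frequency bins quoted in the abstract (rare: MAF below 1%; low frequency: 1% up to but not including 5%) can be written as a small helper. This is only an illustrative sketch: the function name `maf_class` and the label strings are assumptions, and the input is taken to be the frequency of either allele at a biallelic site.

```python
def maf_class(allele_freq):
    """Bin a biallelic variant by minor allele frequency (MAF)."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    # The minor allele is whichever of the two alleles is less frequent.
    maf = min(allele_freq, 1.0 - allele_freq)
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"
```

    Under these cut-offs, the MAF of 0.007 reported for rs28929474 falls in the rare bin.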

  4. GIGI: an approach to effective imputation of dense genotypes on large pedigrees.

    PubMed

    Cheung, Charles Y K; Thompson, Elizabeth A; Wijsman, Ellen M

    2013-04-04

    Recent emergence of the common-disease-rare-variant hypothesis has renewed interest in the use of large pedigrees for identifying rare causal variants. Genotyping with modern sequencing platforms is increasingly common in the search for such variants but remains expensive and often is limited to only a few subjects per pedigree. In population-based samples, genotype imputation is widely used so that additional genotyping is not needed. We now introduce an analogous approach that enables computationally efficient imputation in large pedigrees. Our approach samples inheritance vectors (IVs) from a Markov Chain Monte Carlo sampler by conditioning on genotypes from a sparse set of framework markers. Missing genotypes are probabilistically inferred from these IVs along with observed dense genotypes that are available on a subset of subjects. We implemented our approach in the Genotype Imputation Given Inheritance (GIGI) program and evaluated the approach on both simulated and real large pedigrees. With a real pedigree, we also compared imputed results obtained from this approach with those from the population-based imputation program BEAGLE. We demonstrated that our pedigree-based approach imputes many alleles with high accuracy. It is much more accurate for calling rare alleles than is population-based imputation and does not require an outside reference sample. We also evaluated the effect of varying other parameters, including the marker type and density of the framework panel, threshold for calling genotypes, and population allele frequencies. By leveraging information from existing genotypes already assayed on large pedigrees, our approach can facilitate cost-effective use of sequence data in the pursuit of rare causal variants.

  5. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed Central

    Meseck, Kristin; Jankowska, Marta M.; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset consisted of GPS data lapses. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796

  6. Symmetric smoothing filters from global consistency constraints.

    PubMed

    Haque, Sheikh Mohammadul; Pai, Gautam P; Govindu, Venu Madhav

    2015-05-01

    Many patch-based image denoising methods can be viewed as data-dependent smoothing filters that carry out a weighted averaging of similar pixels. It has recently been argued that these averaging filters can be improved using their doubly stochastic approximation, which are symmetric and stable smoothing operators. In this paper, we introduce a simple principle of consistency that argues that the relative similarities between pixels as imputed by the averaging matrix should be preserved in the filtered output. The resultant consistency filter has the theoretically desirable properties of being symmetric and stable, and is a generalized doubly stochastic matrix. In addition, we can also interpret our consistency filter as a specific form of Laplacian regularization. Thus, our approach unifies two strands of image denoising methods, i.e., symmetric smoothing filters and spectral graph theory. Our consistency filter provides high-quality image denoising and significantly outperforms the doubly stochastic version. We present a thorough analysis of the properties of our proposed consistency filter and compare its performance with that of other significant methods for image denoising in the literature.
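
    To make "doubly stochastic approximation" concrete, the sketch below uses Sinkhorn-style balancing, which alternately rescales rows and columns of a strictly positive weight matrix until both sum to one. This is an assumption chosen for illustration: the abstract does not say which algorithm is used, the function name `sinkhorn_balance` is hypothetical, and this is not the consistency filter the paper proposes.

```python
def sinkhorn_balance(weights, n_iter=200):
    """Scale a square, strictly positive matrix towards doubly stochastic form."""
    n = len(weights)
    k = [list(row) for row in weights]  # work on a copy
    for _ in range(n_iter):
        # Row step: make every row sum to 1.
        for i in range(n):
            row_sum = sum(k[i])
            k[i] = [v / row_sum for v in k[i]]
        # Column step: make every column sum to 1.
        col_sums = [sum(k[i][j] for i in range(n)) for j in range(n)]
        k = [[k[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return k
```

    A row-normalised averaging filter already has rows summing to one; balancing additionally forces the columns to sum to one, which is what "doubly stochastic" means.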

  7. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs)

    PubMed Central

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K.; Wang, Qin; Dennis, Joe; Alonso, M. Rosario; Andrulis, Irene L.; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W.; Benitez, Javier; Bogdanova, Natalia V.; Bojesen, Stig E.; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M.; Couch, Fergus J.; Cox, Angela; Cross, Simon S.; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F.; Fasching, Peter A.; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G.; Goldberg, Mark S.; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A.; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L.; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L.; Muir, Kenneth; Neuhausen, Susan L.; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J.; Schmidt, Marjanka K.; Schmutzler, Rita K.; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C.; Stram, Daniel O.; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H.; Tessier, Daniel C.; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M.; Vincent, Daniel; Winqvist, Robert; Wu, Anna H.; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D. P.; Hall, Per; Edwards, Stacey L.; Simard, Jacques; French, Juliet D.; Chenevix-Trench, Georgia; Dunning, Alison M.

    2016-01-01

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90–0.94; P = 8.96 × 10(-15))) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10(-09), r(2) = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10(-11), r(2) = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus. PMID:27600471

  8. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs).

    PubMed

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W; Benitez, Javier; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M; Couch, Fergus J; Cox, Angela; Cross, Simon S; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F; Fasching, Peter A; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G; Goldberg, Mark S; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L; Muir, Kenneth; Neuhausen, Susan L; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Stram, Daniel O; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tessier, Daniel C; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M; Vincent, Daniel; Winqvist, Robert; Wu, Anna H; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D P; Hall, Per; Edwards, Stacey L; Simard, Jacques; French, Juliet D; Chenevix-Trench, Georgia; Dunning, Alison M

    2016-09-07

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 Kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90-0.94; P = 8.96 × 10(-15))) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10(-09), r(2) = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10(-11), r(2) = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus.

  9. Are the Current Population Survey Uninsurance Estimates Too High? An Examination of the Imputation Process

    PubMed Central

    Davern, Michael; Rodin, Holly; Blewett, Lynn A; Call, Kathleen Thiede

    2007-01-01

    Research Objective: To determine whether the imputation procedure used to replace missing data by the U.S. Census Bureau produces bias in the estimates of health insurance coverage in the Current Population Survey's (CPS) Annual Social and Economic Supplement (ASEC). Data Source: 2004 CPS-ASEC. Study Design: Eleven percent of the respondents to the monthly CPS do not take the ASEC supplement and the entire supplement for these respondents is imputed by the Census Bureau. We compare the health insurance coverage of these “full-supplement imputations” with those respondents answering the ASEC supplement. We then compare demographic characteristics of the two groups and model the likelihood of having insurance coverage given the data are imputed controlling for demographic characteristics. Finally, in order to gauge the impact of imputation on the uninsurance rate we remove the full-supplement imputations and reweight the data, and we also use the multivariate regression model to simulate what the uninsurance rate would be under the counter-factual simulation that no cases had the full-supplement imputation. Population Studied: The noninstitutionalized U.S. population under 65 years of age in 2004. Data Extraction Methods: The CPS-ASEC survey was extracted from the U.S. Census Bureau's FTP web page in September of 2004 (http://www.bls.census.gov/ferretftp.htm). Principal Findings: In the 2004 CPS-ASEC, 59.3 percent of the full-supplement imputations under age 65 years had private health insurance coverage as compared with 69.1 percent of the nonfull-supplement imputations. Furthermore, full-supplement imputations have a 26.4 percent uninsurance rate while all others have an uninsurance rate of 16.6 percent. Having imputed data remains a significant predictor of health insurance coverage in multivariate models with demographic controls. Both our reweighting strategy and our counterfactual modeling show that the uninsured rate is approximately one percentage point higher

  10. Imputation method for lifetime exposure assessment in air pollution epidemiologic studies

    PubMed Central

    2013-01-01

    Background: Environmental epidemiology, when focused on the life course of exposure to a specific pollutant, requires historical exposure estimates that are difficult to obtain for the full time period due to gaps in the historical record, especially in earlier years. We show that these gaps can be filled by applying multiple imputation methods to a formal risk equation that incorporates lifetime exposure. We also address challenges that arise, including choice of imputation method, potential bias in regression coefficients, and uncertainty in age-at-exposure sensitivities. Methods: During time periods when parameters needed in the risk equation are missing for an individual, the parameters are filled by an imputation model using group level information or interpolation. A random component is added to match the variance found in the estimates for study subjects not needing imputation. The process is repeated to obtain multiple data sets, whose regressions against health data can be combined statistically to develop confidence limits using Rubin’s rules to account for the uncertainty introduced by the imputations. To test for possible recall bias between cases and controls, which can occur when historical residence location is obtained by interview, and which can lead to misclassification of imputed exposure by disease status, we introduce an “incompleteness index,” equal to the percentage of dose imputed (PDI) for a subject. “Effective doses” can be computed using different functional dependencies of relative risk on age of exposure, allowing intercomparison of different risk models. To illustrate our approach, we quantify lifetime exposure (dose) from traffic air pollution in an established case–control study on Long Island, New York, where considerable in-migration occurred over a period of many decades. Results: The major result is the described approach to imputation. The illustrative example revealed potential recall bias, suggesting that regressions
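
    The pooling step this abstract attributes to Rubin's rules can be sketched as follows. The function name `pool_rubin` and the example numbers are hypothetical; the sketch combines one regression coefficient across m imputed data sets and returns only the pooled estimate and its total variance, leaving out the degrees-of-freedom adjustment that confidence limits would also need.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: the m point estimates, one per imputed data set
    variances: the m squared standard errors of those estimates
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical coefficients from three imputed data sets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The (1 + 1/m) factor inflates the between-imputation term to account for using a finite number of imputations, so the pooled standard error is wider than any single imputed data set would suggest.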

  11. Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches

    PubMed Central

    Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer

    2014-01-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  12. Assessment of Internal Validity of Prognostic Models through Bootstrapping and Multiple Imputation of Missing Data

    PubMed Central

    Baneshi, MR; Talei, A

    2012-01-01

    Background: Prognostic models have clinical appeal as an aid to therapeutic decision making. Two main practical challenges in the development of such models are assessment of model validity and imputation of missing data. In this study, the importance of imputing missing data and of applying the bootstrap technique in the development, simplification, and assessment of internal validity of a prognostic model is highlighted. Methods: Overall, 310 breast cancer patients were recruited. Missing data were imputed 10 times. Then, to deal with the sensitivity of the model to small changes in the data (internal validity), 100 bootstrap samples were drawn from each of the 10 imputed data sets, leading to 1000 samples. A Cox regression model was fitted to each of the 1000 samples. Only variables retained in more than 50% of samples were used in development of the final model. Results: Four variables remained significant in more than 50% (i.e. 500 samples) of bootstrap samples: tumour size (91%), tumour grade (64%), history of benign breast disease (77%), and age at diagnosis (59%). Tumour size was the strongest predictor, with an inclusion frequency exceeding 90%. Number of deliveries was correlated with age at diagnosis (r=0.35, P<0.001). These two variables together remained significant in more than 90% of samples. Conclusion: We addressed two important methodological issues using a cohort of breast cancer patients. The algorithm combines multiple imputation of missing data and bootstrapping and has the potential to be applied in all kinds of regression modelling exercises so as to address the internal validity of models. PMID:23113185
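
    The resampling scheme in the Methods (bootstrap samples drawn from each imputed data set, then variables kept by inclusion frequency) can be sketched as below. The model-fitting step is deliberately abstracted: `select_variables` is a hypothetical callback standing in for the Cox regression with variable selection, and the function names and defaults are assumptions made for illustration.

```python
import random
from collections import Counter


def inclusion_frequencies(imputed_datasets, select_variables, n_boot=100, seed=0):
    """Fraction of bootstrap samples in which each variable is retained.

    imputed_datasets: list of completed data sets, each a list of rows
    select_variables: hypothetical callback; takes a list of rows and returns
                      the names of the variables retained by the fitted model
    """
    rng = random.Random(seed)
    counts = Counter()
    total = 0
    for data in imputed_datasets:
        for _ in range(n_boot):
            # Bootstrap: resample rows with replacement, same size as the data.
            sample = rng.choices(data, k=len(data))
            counts.update(set(select_variables(sample)))
            total += 1
    return {name: counts[name] / total for name in counts}


def retained(frequencies, threshold=0.5):
    """Variables whose inclusion frequency exceeds the threshold."""
    return sorted(name for name, f in frequencies.items() if f > threshold)
```

    With 10 imputed data sets and `n_boot=100`, the frequencies are computed over 1000 samples, matching the design in the abstract; `retained` then applies the more-than-50% rule.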

  13. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    PubMed

    Livne, Oren E; Han, Lide; Alkorta-Aranburu, Gorka; Wentworth-Sheilds, William; Abney, Mark; Ober, Carole; Nicolae, Dan L

    2015-03-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  14. Multiple imputation as a flexible tool for missing data handling in clinical research.

    PubMed

    Enders, Craig K

    2016-11-18

    The last 20 years have seen an uptick in research on missing data problems, and most software applications now implement one or more sophisticated missing data handling routines (e.g., multiple imputation or maximum likelihood estimation). Despite their superior statistical properties (e.g., less stringent assumptions, greater accuracy and power), the adoption of these modern analytic approaches is not uniform in psychology and related disciplines. Thus, the primary goal of this manuscript is to describe and illustrate the application of multiple imputation. Although maximum likelihood estimation is perhaps the easiest method to use in practice, psychological data sets often feature complexities that are currently difficult to handle appropriately in the likelihood framework (e.g., mixtures of categorical and continuous variables), but relatively simple to treat with imputation. The paper describes a number of practical issues that clinical researchers are likely to encounter when applying multiple imputation, including mixtures of categorical and continuous variables, item-level missing data in questionnaires, significance testing, interaction effects, and multilevel missing data. Analysis examples illustrate imputation with software packages that are freely available on the internet.

  15. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    PubMed

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  16. Multiple imputation and analysis for high-dimensional incomplete proteomics data.

    PubMed

    Yin, Xiaoyan; Levy, Daniel; Willinger, Christine; Adourian, Aram; Larson, Martin G

    2016-04-15

    Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤ 40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction.
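
    The binning step described above (shuffle the proteins at random, then split them into bins for separate imputation) can be sketched as follows. This is only an illustration of the partitioning; the function name, the bin size, and the seed are hypothetical, and the imputation and conditional logistic regression within each bin are not shown.

```python
import random

def shuffle_into_bins(names, bin_size, seed=None):
    """Randomly shuffle variable names and split them into consecutive bins."""
    rng = random.Random(seed)        # local generator, so global state is untouched
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]

# Hypothetical protein labels; one call corresponds to one random shuffle.
proteins = [f"protein_{i}" for i in range(10)]
bins = shuffle_into_bins(proteins, bin_size=4, seed=1)
```

    Repeating the call with different seeds gives the repeated random shuffles the abstract mentions.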

  17. Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data

    PubMed Central

    Edwards, Jessie K.; Cole, Stephen R.; Troester, Melissa A.; Richardson, David B.

    2013-01-01

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods. PMID:24627573

  18. Dealing with missing data in family-based association studies: a multiple imputation approach.

    PubMed

    Croiseau, Pascal; Génin, Emmanuelle; Cordell, Heather J

    2007-01-01

    To test for association between a disease and a set of linked markers, or to estimate relative risks of disease, several different methods have been developed. Many methods for family data require that individuals be genotyped at the full set of markers and that phase can be reconstructed. Individuals with missing data are excluded from the analysis. This can result in an important decrease in sample size and a loss of information. A possible solution to this problem is to use missing-data likelihood methods. We propose an alternative approach, namely the use of multiple imputation. Briefly, this method consists in estimating from the available data all possible phased genotypes and their respective posterior probabilities. These posterior probabilities are then used to generate replicate imputed data sets via a data augmentation algorithm. We performed simulations to test the efficiency of this approach for case/parent trio data and we found that the multiple imputation procedure generally gave unbiased parameter estimates with correct type 1 error and confidence interval coverage. Multiple imputation had some advantages over missing data likelihood methods with regards to ease of use and model flexibility. Multiple imputation methods represent promising tools in the search for disease susceptibility variants.

  19. Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data.

    PubMed

    Edwards, Jessie K; Cole, Stephen R; Troester, Melissa A; Richardson, David B

    2013-05-01

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods.

  20. Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

    PubMed

    Resche-Rigon, Matthieu; White, Ian R

    2016-09-19

    In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.

  1. Multiple imputation for handling systematically missing confounders in meta-analysis of individual participant data.

    PubMed

    Resche-Rigon, Matthieu; White, Ian R; Bartlett, Jonathan W; Peters, Sanne A E; Thompson, Simon G

    2013-12-10

    A variable is 'systematically missing' if it is missing for all individuals within particular studies in an individual participant data meta-analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure-disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta-analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between-study random effects in the imputation model.

  2. Kinship Testing Based on SNPs Using Microarray System

    PubMed Central

    Cho, Sohee; Seo, Hee Jin; Lee, Jihyun; Yu, Hyung Jin; Lee, Soong Deok

    2016-01-01

    Background: Kinship testing using biallelic SNP markers has been demonstrated to be a promising supplement to standard STR typing, and several systems, such as pyrosequencing and microarray, have been introduced and used in real forensic cases. The Affymetrix microarray containing 169 autosomal SNPs developed for forensic application was applied to a practical kinship analysis case that had remained inconclusive because of partial STR profiles from degraded DNA and the possibility of inbreeding within the population. Case Report: 169 autosomal SNPs were typed on the array using severely degraded DNA from two bone samples, and kinship was assessed by comparison with genotypes in a reference database of their putative family members. Results: The two bone samples remained unidentified through traditional STR typing, with partial profiles of 10 or 14 of 16 alleles. Because these samples originated from a geographically isolated population, a cautious approach was required when analyzing and declaring true paternity based only on PI values. In supplementary SNP typing, 106 and 78 SNPs were obtained, and match candidates were found in each case with higher PI values than with STRs alone and with no discrepant SNPs in the comparison. Conclusion: Our case showed that typing multiple SNPs on an array is expected to be useful in practical forensic casework once a reference database is established. PMID:27994531

  3. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation.

    PubMed

    Graffelman, Jan; Nelson, S; Gogarten, S M; Weir, B S

    2015-09-15

    This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level.
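
    The equivalence between the inbreeding coefficient and the χ² statistic mentioned above can be checked numerically for a biallelic marker with complete genotypes, where the uncorrected χ² statistic equals n·f². The sketch below illustrates only that relationship; it does not implement the exact test or the imputation procedures of the paper, and the function name and genotype counts are hypothetical.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Return (inbreeding coefficient f, chi-square statistic) for genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1 - n_ab / (2 * n * p * q)    # deficit of heterozygotes relative to HWE
    return f, chi2

# Hypothetical counts: 30 AA, 40 AB, 30 BB.
f, chi2 = hwe_summary(30, 40, 30)
n = 100
# For a biallelic marker, chi2 and n * f**2 agree up to rounding.
```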

  4. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation

    PubMed Central

    Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.

    2015-01-01

    This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level. PMID:26377959

  5. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    PubMed Central

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, missing data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate that the proposed method outperforms the MICE algorithm. PMID:26690437

  6. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background: Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results: Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions: GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which

  7. Imputation of Truncated p-Values For Meta-Analysis Methods and Its Genomic Application

    PubMed Central

    Tang, Shaowu; Ding, Ying; Sibille, Etienne; Mogil, Jeffrey; Lariviere, William R.; Tseng, George C.

    2014-01-01

    Microarray analysis to monitor expression activities in thousands of genes simultaneously has become routine in biomedical research during the past decade. A tremendous number of expression profiles are generated and stored in the public domain, and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular to obtain increased statistical power and validated findings. Methods that aggregate transformed p-value evidence have been widely used in genomic settings, among which Fisher's and Stouffer's methods are the most popular ones. In practice, raw data and p-values of DE evidence are often not available in genomic studies that are to be combined. Instead, only the detected DE gene lists under a certain p-value threshold (e.g., DE genes with p-value < 0.001) are reported in journal publications. The truncated p-value information makes the aforementioned meta-analysis methods inapplicable and researchers are forced to apply a less efficient vote counting method or naïvely drop the studies with incomplete information. The purpose of this paper is to develop effective meta-analysis methods for such situations with partially censored p-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods of which Fisher's and Stouffer's methods are special examples. The null distribution of each method was analytically derived and subsequent inference and genomic analysis frameworks were established. Simulations were performed to investigate the type I error, power and the control of false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were applied to several genomic applications in colorectal cancer, pain and liquid association analysis of major depressive disorder (MDD). The results showed that imputation methods outperformed existing naïve approaches. Mean imputation and
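
    For reference, the two aggregation methods named in this record can be sketched for fully observed, independent p-values as below. This covers only the standard Fisher and Stouffer combinations; it does not implement the paper's imputation of truncated p-values, and the function names are illustrative. The chi-square tail uses the closed form available for even degrees of freedom, so only the Python standard library is needed.

```python
import math
from statistics import NormalDist

def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    if not all(0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # half of the test statistic
    # Survival function of chi-square with 2k df: exp(-x/2) * sum_{j<k} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return 2 * half, math.exp(-half) * total

def stouffer_combine(pvalues):
    """Stouffer's method: sum of normal quantiles of (1 - p), scaled by sqrt(k)."""
    if not all(0 < p < 1 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    nd = NormalDist()
    z = sum(nd.inv_cdf(1 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return z, 1 - nd.cdf(z)

# Hypothetical one-sided p-values from three independent studies.
fisher_stat, fisher_p = fisher_combine([0.01, 0.04, 0.20])
stouffer_z, stouffer_p = stouffer_combine([0.01, 0.04, 0.20])
```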

  8. Imputation for exposure histories with gaps, under an excess relative risk model.

    PubMed

    Weinberg, C R; Moledor, E S; Umbach, D M; Sandler, D P

    1996-09-01

    In reconstructing exposure histories needed to calculate cumulative exposures, gaps often occur. Our investigation was motivated by case-control studies of residential radon exposure and lung cancer, where half or more of the targeted homes may not be measurable. Investigators have adopted various schemes for imputing exposures for such gaps. We first undertook simulations to assess the performance of five such methods under an excess relative risk model, in the presence of random missingness and under assumed independence among the true exposure levels for different epochs of exposure (houses). Assuming no other source of measurement error, one of the methods performed without bias and with coverage of nominally 95% confidence intervals that was close to 95%. This method assigns to the missing residences the arithmetic mean across all measured control residences. We show that its good properties can be explained by the fact that this approach produces approximate "Berkson errors." To take advantage of predictive information that might exist about the missing epochs of exposure, one might prefer to carry out the imputations within strata. In further simulations, we asked whether the method would still perform well if imputations were carried out within many strata. It does, and much of the lost statistical power/precision can be recovered if the stratification system is moderately predictive of the missing exposures. Thus, observed control mean imputation provides a way to impute missing exposures without corrupting the study's validity; and stratifying the imputations can enhance precision. The technique is applicable in other settings where exposure histories contain gaps.
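
    The observed control mean imputation described above, including the stratified variant, can be sketched as follows. The record layout (dictionaries with `is_control`, `stratum`, and `exposure` keys) is a hypothetical choice made for illustration; passing the same stratum label for every record reproduces the unstratified method.

```python
from collections import defaultdict
from statistics import mean

def impute_control_mean(records):
    """Fill exposure gaps with the mean of measured control residences in the stratum.

    Each record is a dict with keys 'is_control' (bool), 'stratum' (hashable)
    and 'exposure' (float, or None for an unmeasured residence).
    Raises KeyError if a stratum with a gap has no measured control residence.
    """
    measured = defaultdict(list)
    for r in records:
        if r["is_control"] and r["exposure"] is not None:
            measured[r["stratum"]].append(r["exposure"])
    stratum_mean = {s: mean(values) for s, values in measured.items()}
    return [
        r["exposure"] if r["exposure"] is not None else stratum_mean[r["stratum"]]
        for r in records
    ]

# Hypothetical residences: gaps in both cases and controls are filled
# from measured control residences only.
residences = [
    {"is_control": True, "stratum": "north", "exposure": 2.0},
    {"is_control": True, "stratum": "north", "exposure": 4.0},
    {"is_control": False, "stratum": "north", "exposure": None},
    {"is_control": True, "stratum": "south", "exposure": 1.0},
    {"is_control": True, "stratum": "south", "exposure": None},
]
filled = impute_control_mean(residences)
```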

  9. Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach.

    PubMed

    Erler, Nicole S; Rizopoulos, Dimitris; Rosmalen, Joost van; Jaddoe, Vincent W V; Franco, Oscar H; Lesaffre, Emmanuel M E H

    2016-07-30

    Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Treatments of Missing Data: A Monte Carlo Comparison of RBHDI, Iterative Stochastic Regression Imputation, and Expectation-Maximization.

    ERIC Educational Resources Information Center

    Gold, Michael Steven; Bentler, Peter M.

    2000-01-01

    Describes a Monte Carlo investigation of four methods for treating incomplete data: (1) resemblance based hot-deck imputation (RBHDI); (2) iterated stochastic regression imputation; (3) structured model expectation maximization; and (4) saturated model expectation maximization. Results favored the expectation maximization methods. (SLD)

  11. Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

    PubMed Central

    Lee, Minjung; Dignam, James J.; Han, Junhee

    2014-01-01

    We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107

  12. Disk filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  13. Disk filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  14. Analysis of mitochondrial transcription factor A SNPs in alcoholic cirrhosis

    PubMed Central

    TANG, CHUN; LIU, HONGMING; TANG, YONGLIANG; GUO, YONG; LIANG, XIANCHUN; GUO, LIPING; PI, RUXIAN; YANG, JUNTAO

    2014-01-01

    Genetic susceptibility to alcoholic cirrhosis (AC) exists. We previously demonstrated hepatic mitochondrial DNA (mtDNA) damage in patients with AC compared with chronic alcoholics without cirrhosis. Mitochondrial transcription factor A (mtTFA) is central to mtDNA expression regulation and repair; however, it is unclear whether there are specific mtTFA single nucleotide polymorphisms (SNPs) in patients with AC and whether they affect mtDNA repair. In the present study, we screened mtTFA SNPs in patients with AC and analyzed their impact on the copy number of mtDNA in AC. A total of 50 patients with AC, 50 alcoholics without AC and 50 normal subjects were enrolled in the study. SNPs of full-length mtTFA were analyzed using the polymerase chain reaction (PCR) combined with gene sequencing. The hepatic mtTFA mRNA and mtDNA copy numbers were measured using quantitative PCR (qPCR), and mtTFA protein was measured using western blot analysis. A total of 18 mtTFA SNPs specific to patients with AC with frequencies >10% were identified. Two were located in the coding region and 16 were identified in non-coding regions. Conversely, there were five SNPs that were only present in patients with AC and normal subjects and had a frequency >10%. In the AC group, the hepatic mtTFA mRNA and protein levels were significantly lower than those in the other two groups. Moreover, the hepatic mtDNA copy number was significantly lower in the AC group than in the controls and alcoholics without AC. Based on these data, we conclude that AC-specific mtTFA SNPs may be responsible for the observed reductions in mtTFA mRNA, protein levels and mtDNA copy number and they may also increase the susceptibility to AC. PMID:24348767

  15. Fine mapping of disease genes using tagging SNPs.

    PubMed

    Sjölander, Arvid; Hössjer, Ola; Hartman, Linda Werner; Humphreys, Keith

    2007-11-01

    We describe a haplotype clustering approach for localising a disease mutation within a fixed genomic region, which supplements tagging SNP (tSNP) information with (external) information on linkage disequilibrium. By applying our method to simulated data based on the coalescent, and on real haplotype data, we demonstrate that there are situations where significant gains can be made by incorporating tagged SNPs into the analysis. The issues we explore are important not only to these types of studies, but also to studies that select tSNPs based on (external) HapMap phase II data, and those that use genome-wide markers.

  16. A combined reference panel from the 1000 Genomes and UK10K projects improved rare variant imputation in European and Chinese samples.

    PubMed

    Chou, Wen-Chi; Zheng, Hou-Feng; Cheng, Chia-Ho; Yan, Han; Wang, Li; Han, Fang; Richards, J Brent; Karasik, David; Kiel, Douglas P; Hsu, Yi-Hsiang

    2016-12-22

    Imputation using the 1000 Genomes haplotype reference panel has been widely adopted to estimate genotypes in genome-wide association studies. To evaluate imputation quality with a relatively larger reference panel and a reference panel composed of different ethnic populations, we conducted imputations in the Framingham Heart Study and the North Chinese Study using a combined reference panel from the 1000 Genomes (N = 1,092) and UK10K (N = 3,781) projects. For rare variants with 0.01% < MAF ≤ 0.5%, imputation in the Framingham Heart Study with the combined reference panel increased well-imputed genotypes (with imputation quality score ≥0.4) from 62.9% to 76.1% when compared to imputation with the 1000 Genomes. For the North Chinese samples, imputation of rare variants with 0.01% < MAF ≤ 0.5% with the combined reference panel increased well-imputed genotypes from 49.8% to 61.8%. The predominant European ancestry of the UK10K and the combined reference panels may explain why there was less of an increase in imputation success in the North Chinese samples. Our results underscore the importance and potential of larger reference panels to impute rare variants, while recognizing that increasing ethnic specific variants in reference panels may result in better imputation for genotypes in some ethnic groups.
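
    The summary statistic reported above (the share of variants in a MAF bin whose imputation quality score is at least 0.4) can be computed as in the sketch below. The input layout, a list of `(maf, quality)` pairs with MAF expressed as a proportion, and the function name are assumptions made for illustration; the example values are hypothetical and are not the study's data.

```python
def well_imputed_fraction(variants, maf_low=0.0001, maf_high=0.005, min_quality=0.4):
    """Fraction of variants with maf_low < MAF <= maf_high and quality >= min_quality.

    The defaults correspond to 0.01% < MAF <= 0.5% and a quality score >= 0.4.
    Returns None when no variant falls in the MAF bin.
    """
    in_bin = [quality for maf, quality in variants if maf_low < maf <= maf_high]
    if not in_bin:
        return None
    return sum(quality >= min_quality for quality in in_bin) / len(in_bin)

# Hypothetical (MAF, quality score) pairs; the third variant is outside the bin.
example = [(0.001, 0.5), (0.002, 0.3), (0.01, 0.9), (0.004, 0.4)]
fraction = well_imputed_fraction(example)
```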

  17. A combined reference panel from the 1000 Genomes and UK10K projects improved rare variant imputation in European and Chinese samples

    PubMed Central

    Chou, Wen-Chi; Zheng, Hou-Feng; Cheng, Chia-Ho; Yan, Han; Wang, Li; Han, Fang; Richards, J. Brent; Karasik, David; Kiel, Douglas P.; Hsu, Yi-Hsiang

    2016-01-01

    Imputation using the 1000 Genomes haplotype reference panel has been widely adopted to estimate genotypes in genome-wide association studies. To evaluate imputation quality with a relatively larger reference panel and a reference panel composed of different ethnic populations, we conducted imputations in the Framingham Heart Study and the North Chinese Study using a combined reference panel from the 1000 Genomes (N = 1,092) and UK10K (N = 3,781) projects. For rare variants with 0.01% < MAF ≤ 0.5%, imputation in the Framingham Heart Study with the combined reference panel increased well-imputed genotypes (with imputation quality score ≥0.4) from 62.9% to 76.1% when compared to imputation with the 1000 Genomes. For the North Chinese samples, imputation of rare variants with 0.01% < MAF ≤ 0.5% with the combined reference panel increased well-imputed genotypes from 49.8% to 61.8%. The predominant European ancestry of the UK10K and the combined reference panels may explain why there was less of an increase in imputation success in the North Chinese samples. Our results underscore the importance and potential of larger reference panels to impute rare variants, while recognizing that increasing ethnic specific variants in reference panels may result in better imputation for genotypes in some ethnic groups. PMID:28004816

  18. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough.

    PubMed

    McMahon, George; Ring, Susan M; Davey-Smith, George; Timpson, Nicholas J

    2015-10-15

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case-control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy.

  19. Imputation for transcription factor binding predictions based on deep learning

    PubMed Central

    Qin, Qian

    2017-01-01

    Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems, for which ChIP-seq not only provides valuable data but is also considered as the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data represent only a limited percentage of ChIP-seq experiments, considering all possible combinations of TFs and cell lines. In this study, we demonstrate a method for accurately predicting cell-specific TF binding for TF-cell line combinations based on only a small fraction (4%) of the combinations using available ChIP-seq data. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting to borrow information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data; moreover, TFImpute achieves better accuracy on TF-cell line combinations without ChIP-seq data. This approach can predict cell line specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and predicts the impact of SNPs on TF binding. PMID:28234893

  20. Imputation for transcription factor binding predictions based on deep learning.

    PubMed

    Qin, Qian; Feng, Jianxing

    2017-02-01

    Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems, for which ChIP-seq not only provides valuable data but is also considered as the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data represent only a limited percentage of ChIP-seq experiments, considering all possible combinations of TFs and cell lines. In this study, we demonstrate a method for accurately predicting cell-specific TF binding for TF-cell line combinations based on only a small fraction (4%) of the combinations using available ChIP-seq data. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting to borrow information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data; moreover, TFImpute achieves better accuracy on TF-cell line combinations without ChIP-seq data. This approach can predict cell line specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and predicts the impact of SNPs on TF binding.

  1. Water Filters

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The Aquaspace H2OME Guardian Water Filter, available through Western Water International, Inc., reduces lead in water supplies. The filter is mounted on the faucet and the filter cartridge is placed in the "dead space" between sink and wall. This filter is one of several new filtration devices using the Aquaspace compound filter media, which combines company developed and NASA technology. Aquaspace filters are used in industrial, commercial, residential, and recreational environments as well as by developing nations where water is highly contaminated.

  2. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    NASA Astrophysics Data System (ADS)

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31,900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix
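
    Two of the approaches named in this abstract, species mean imputation and single imputation by kNN, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the field names (species, height, rain, temp), the choice of Euclidean distance on unscaled predictors, and the example values are all hypothetical. MICE is not shown.

```python
import math


def species_mean_impute(records, trait):
    """Fill missing `trait` values (None) with the mean of the observed
    values for the same species. Values stay None if the species has
    no observed value at all."""
    sums, counts = {}, {}
    for r in records:
        if r[trait] is not None:
            sums[r["species"]] = sums.get(r["species"], 0.0) + r[trait]
            counts[r["species"]] = counts.get(r["species"], 0) + 1
    filled = []
    for r in records:
        value = r[trait]
        if value is None and r["species"] in counts:
            value = sums[r["species"]] / counts[r["species"]]
        filled.append(value)
    return filled


def knn_impute(records, trait, predictors, k=2):
    """Fill missing `trait` values with the mean trait of the k complete
    records that are closest in the predictor space (Euclidean distance).
    Predictors are used unscaled here; standardising them first would be
    a sensible step in practice."""
    donors = [r for r in records if r[trait] is not None]
    filled = []
    for r in records:
        if r[trait] is not None or not donors:
            filled.append(r[trait])
            continue
        target = [r[p] for p in predictors]
        nearest = sorted(
            donors, key=lambda d: math.dist(target, [d[p] for p in predictors])
        )[:k]
        filled.append(sum(d[trait] for d in nearest) / len(nearest))
    return filled


# Hypothetical plots: species label, one trait, two climate predictors.
plots = [
    {"species": "A", "height": 20.0, "rain": 600.0, "temp": 12.0},
    {"species": "A", "height": 24.0, "rain": 700.0, "temp": 11.0},
    {"species": "A", "height": None, "rain": 690.0, "temp": 11.0},
    {"species": "B", "height": 10.0, "rain": 400.0, "temp": 15.0},
    {"species": "B", "height": 12.0, "rain": 680.0, "temp": 11.5},
]

by_species = species_mean_impute(plots, "height")
by_knn = knn_impute(plots, "height", ["rain", "temp"], k=2)
print("species mean:", by_species[2])
print("kNN (k=2):   ", by_knn[2])
```

    In this toy example the two methods disagree because the kNN donors are chosen by climate similarity and ignore species, which is the kind of difference that adding species identity as a predictor would change.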

  3. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  4. A model-based approach for imputing censored data in source apportionment studies

    PubMed Central

    Krall, Jenna R.; Simpson, Charles H.

    2015-01-01

    Sources of particulate matter (PM) air pollution are generally inferred from PM chemical constituent concentrations using source apportionment models. Concentrations of PM constituents are often censored below minimum detection limits (MDL) and most source apportionment models cannot handle these censored data. Frequently, censored data are first substituted by a constant proportion of the MDL or are removed to create a truncated dataset before sources are estimated. When estimating the complete data distribution, these commonly applied methods to adjust censored data perform poorly compared with model-based imputation methods. Model-based imputation has not been used in source apportionment and may lead to better source estimation. However if the censored chemical constituents are not important for estimating sources, censoring adjustment methods may have little impact on source estimation. We focus on two source apportionment models applied in the literature and provide a comprehensive assessment of how censoring adjustment methods, including model-based imputation, impact source estimation. A review of censoring adjustment methods critically informs how censored data should be handled in these source apportionment models. In a simulation study, we demonstrated that model-based multiple imputation frequently leads to better source estimation compared with commonly used censoring adjustment methods. We estimated sources of PM in New York City and found estimated source distributions differed by censoring adjustment method. In this study, we provide guidance for adjusting censored PM constituent data in common source apportionment models, which is necessary for estimation of PM sources and their subsequent health effects. PMID:26640398
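
    The two simple censoring adjustments this abstract mentions, substitution by a constant proportion of the MDL and removal of censored values, are easy to state in code. The sketch below shows only those two adjustments, which the abstract reports as performing poorly for estimating the complete data distribution; it does not implement the model-based imputation the study evaluates. Censored observations are represented as None, and the proportion 0.5, the MDL, and the concentrations are hypothetical choices for illustration.

```python
from statistics import mean


def substitute_censored(values, mdl, proportion=0.5):
    """Replace each censored observation (None) with proportion * mdl."""
    return [proportion * mdl if v is None else v for v in values]


def truncate_censored(values):
    """Drop censored observations (None), leaving a truncated dataset."""
    return [v for v in values if v is not None]


# Hypothetical constituent concentrations; None marks a value below the MDL.
concentrations = [0.8, None, 1.2, None, 2.0]
mdl = 0.4

substituted = substitute_censored(concentrations, mdl)
truncated = truncate_censored(concentrations)

print("mean after substitution:", round(mean(substituted), 3))
print("mean after truncation:  ", round(mean(truncated), 3))
```

    The two adjusted means differ noticeably even on five values, which gives a small-scale sense of why the choice of adjustment can matter downstream.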

  5. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    PubMed

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length.

  6. Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective.

    ERIC Educational Resources Information Center

    Schafer, Joseph L.; Olsen, Maren K.

    1998-01-01

    The key ideas of multiple imputation for multivariate missing data problems are reviewed. Software programs available for this analysis are described, and their use is illustrated with data from the Adolescent Alcohol Prevention Trial (W. Hansen and J. Graham, 1991). (SLD)
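
    One of the key ideas of multiple imputation is that the analysis is run once per imputed dataset and the results are then pooled. The abstract does not spell out the pooling step, so the sketch below uses the standard combining rules usually attributed to Rubin as an assumption: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance), where
    total_variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from m = 3 imputed datasets.
pooled, total = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print("pooled estimate:", pooled)
print("total variance: ", round(total, 4))
```

    The (1 + 1/m) factor inflates the between-imputation component to reflect that only a finite number of imputations was drawn.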

  7. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  8. Analysis of an incomplete longitudinal composite variable using a marginalized random effects model and multiple imputation.

    PubMed

    Gosho, Masahiko; Maruo, Kazushi; Ishii, Ryota; Hirakawa, Akihiro

    2016-11-16

    The total score, which is calculated as the sum of scores in multiple items or questions, is repeatedly measured in longitudinal clinical studies. A mixed effects model for repeated measures method is often used to analyze these data; however, if one or more individual items are not measured, the method cannot be directly applied to the total score. We develop two simple and interpretable procedures that infer fixed effects for a longitudinal continuous composite variable. These procedures consider that the items that compose the total score are multivariate longitudinal continuous data and, simultaneously, handle subject-level and item-level missing data. One procedure is based on a multivariate marginalized random effects model with a multiple of Kronecker product covariance matrices for serial time dependence and correlation among items. The other procedure is based on a multiple imputation approach with a multivariate normal model. In terms of the type-1 error rate and the bias of treatment effect in total score, the marginalized random effects model and multiple imputation procedures performed better than the standard mixed effects model for repeated measures analysis with listwise deletion and single imputations for handling item-level missing data. In particular, the mixed effects model for repeated measures with listwise deletion resulted in substantial inflation of the type-1 error rate. The marginalized random effects model and multiple imputation methods provide for a more efficient analysis by fully utilizing the partially available data, compared to the mixed effects model for repeated measures method with listwise deletion.

  9. Effects of reduced panel, reference origin, and genetic relationship on imputation of genotypes in Hereford cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to investigate alternative methods for designing and utilizing reduced single nucleotide polymorphism (SNP) panels for imputing SNP genotypes. Two purebred Hereford populations, an experimental population known as Line 1 Hereford (L1, N=240) and registered Hereford wi...

  10. Multiple Imputation to Correct for Nonresponse Bias: Application in Non-Communicable Disease Risk Factors Survey

    PubMed Central

    Miri, Hamid Heidarian; Hassanzadeh, Jafar; Rajaeefard, Abdolreza; Mirmohammadkhani, Majid; Angali, Kambiz Ahmadi

    2016-01-01

    Background: This study used multiple imputation (MI) to correct for potential nonresponse bias in measurements of fasting blood glucose (FBS) in the non-communicable disease risk factors survey conducted in Iran in 2007. Methods: Five multiple imputation methods (bootstrap expectation maximization, multivariate normal regression, univariate linear regression, MI by chained equations, and predictive mean matching) were applied to impute FBS. To make FBS consistent with the normality assumption, natural logarithm (Ln) and Box-Cox (BC) transformations were used prior to imputation. Measurements from which we intended to remove nonresponse bias included the mean of FBS and the percentage of those with high FBS. Results: For the mean of FBS, results didn’t change considerably after applying MI methods. Regarding the prevalence of high blood sugar, all methods on the original scale tended to increase the estimates, except for predictive mean matching, which, along with all methods on BC- or Ln-transformed data, didn’t change the results. Conclusions: FBS-related measurements didn’t change after applying different MI methods. It seems that nonresponse bias was not an important challenge regarding these measurements. However, use of MI methods resulted in more efficient estimations. Further studies are encouraged on the accuracy of MI methods in these settings. PMID:26234966

  11. Evaluation of an Imputed Pitch Velocity Model of the Auditory Kappa Effect

    ERIC Educational Resources Information Center

    Henry, Molly J.; McAuley, J. Devin

    2009-01-01

    Three experiments evaluated an imputed pitch velocity model of the auditory kappa effect. Listeners heard 3-tone sequences and judged the timing of the middle (target) tone relative to the timing of the 1st and 3rd (bounding) tones. Experiment 1 held pitch constant but varied the time (T) interval between bounding tones (T = 728, 1,000, or 1,600…

  12. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  13. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    ERIC Educational Resources Information Center

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  14. A model-based approach for imputing censored data in source apportionment studies.

    PubMed

    Krall, Jenna R; Simpson, Charles H; Peng, Roger D

    2015-12-01

    Sources of particulate matter (PM) air pollution are generally inferred from PM chemical constituent concentrations using source apportionment models. Concentrations of PM constituents are often censored below minimum detection limits (MDL) and most source apportionment models cannot handle these censored data. Frequently, censored data are first substituted by a constant proportion of the MDL or are removed to create a truncated dataset before sources are estimated. When estimating the complete data distribution, these commonly applied methods to adjust censored data perform poorly compared with model-based imputation methods. Model-based imputation has not been used in source apportionment and may lead to better source estimation. However if the censored chemical constituents are not important for estimating sources, censoring adjustment methods may have little impact on source estimation. We focus on two source apportionment models applied in the literature and provide a comprehensive assessment of how censoring adjustment methods, including model-based imputation, impact source estimation. A review of censoring adjustment methods critically informs how censored data should be handled in these source apportionment models. In a simulation study, we demonstrated that model-based multiple imputation frequently leads to better source estimation compared with commonly used censoring adjustment methods. We estimated sources of PM in New York City and found estimated source distributions differed by censoring adjustment method. In this study, we provide guidance for adjusting censored PM constituent data in common source apportionment models, which is necessary for estimation of PM sources and their subsequent health effects.

  15. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

    PubMed

    Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

    2017-02-27

    Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

  16. Biological Filters.

    ERIC Educational Resources Information Center

    Klemetson, S. L.

    1978-01-01

    Presents the 1978 literature review of wastewater treatment. The review is concerned with biological filters, and it covers: (1) trickling filters; (2) rotating biological contactors; and (3) miscellaneous reactors. A list of 14 references is also presented. (HM)

  17. Metallic Filters

    NASA Technical Reports Server (NTRS)

    1985-01-01

    Filtration technology originated in a mid-1960s NASA study. The results were distributed to the filter industry, and HR Textron responded, using the study as a departure for the development of 421 Filter Media. The HR system is composed of ultrafine steel fibers metallurgically bonded and compressed so that the pore structure is locked in place. The filters are used to filter polyesters, plastics, to remove hydrocarbon streams, etc. Several major companies use the product in chemical applications, pollution control, etc.

  18. Filter validation.

    PubMed

    Madsen, Russell E

    2006-01-01

    Validation of a sterilizing filtration process is critical since it is impossible with currently available technology to measure the sterility of each filled container; therefore, sterility assurance of the filtered product must be achieved through validation of the filtration process. Validating a pharmaceutical sterile filtration process involves three things: determining the effect of the liquid on the filter, determining the effect of the filter on the liquid, and demonstrating that the filter removes all microorganisms from the liquid under actual processing conditions.

  19. Analysis of accelerated failure time data with dependent censoring using auxiliary variables via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Taylor, Jeremy M G; Hu, Chengcheng

    2015-08-30

    We consider the situation of estimating the marginal survival distribution from censored data subject to dependent censoring using auxiliary variables. We had previously developed a nonparametric multiple imputation approach. The method used two working proportional hazards (PH) models, one for the event times and the other for the censoring times, to define a nearest neighbor imputing risk set. This risk set was then used to impute failure times for censored observations. Here, we adapt the method to the situation where the event and censoring times follow accelerated failure time models and propose to use the Buckley-James estimator as the two working models. Besides studying the performances of the proposed method, we also compare the proposed method with two popular methods for handling dependent censoring through the use of auxiliary variables, inverse probability of censoring weighted and parametric multiple imputation methods, to shed light on the use of them. In a simulation study with time-independent auxiliary variables, we show that all approaches can reduce bias due to dependent censoring. The proposed method is robust to misspecification of either one of the two working models and their link function. This indicates that a working proportional hazards model is preferred because it is more cumbersome to fit an accelerated failure time model. In contrast, the inverse probability of censoring weighted method is not robust to misspecification of the link function of the censoring time model. The parametric imputation methods rely on the specification of the event time model. The approaches are applied to a prostate cancer dataset.

  20. Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data.

    PubMed

    He, Dan; Han, Buhm; Eskin, Eleazar

    2013-02-01

    Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Traditionally, haplotypes are inferred from genotype data obtained from microarrays using information on population haplotype frequencies inferred from either a large sample of genotyped individuals or a reference dataset such as the HapMap. Since the availability of large reference datasets, modern approaches for haplotype phasing along these lines are closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach does not take into account that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes in such a way that minimizes the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually not able to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation and assembly method that combines information from a reference dataset with the information from the reads using a likelihood framework. Our method applies a dynamic programming algorithm to identify the predicted haplotype, which maximizes the joint likelihood of the haplotype with respect to the reference dataset and the haplotype with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy compared to the state-of-the-art imputation methods.

  1. FILTER TREATMENT

    DOEpatents

    Sutton, J.B.; Torrey, J.V.P.

    1958-08-26

    A process is described for reconditioning fused alumina filters which have become clogged by the accretion of bismuth phosphate in the filter pores. The method consists in contacting such filters with fuming sulfuric acid, and maintaining such contact for a substantial period of time.

  2. Water Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    A compact, lightweight electrolytic water filter generates silver ions in concentrations of 50 to 100 parts per billion in the water flow system. Silver ions serve as effective bactericide/deodorizers. Ray Ward requested and received from NASA a technical information package on the Shuttle filter, and used it as basis for his own initial development, a home use filter.

  3. Variational filtering.

    PubMed

    Friston, K J

    2008-07-01

    This note presents a simple Bayesian filtering scheme, using variational calculus, for inference on the hidden states of dynamic systems. Variational filtering is a stochastic scheme that propagates particles over a changing variational energy landscape, such that their sample density approximates the conditional density of hidden states and inputs. The key innovation, on which variational filtering rests, is a formulation in generalised coordinates of motion. This renders the scheme much simpler and more versatile than existing approaches, such as those based on particle filtering. We demonstrate variational filtering using simulated and real data from hemodynamic systems studied in neuroimaging and provide comparative evaluations using particle filtering and the fixed-form homologue of variational filtering, namely dynamic expectation maximisation.

  4. Identification of three SNPs in the porcine myostatin gene (MSTN).

    PubMed

    Jiang, Y L; Li, N; Plastow, G; Liu, Z L; Hu, X X; Wu, C X

    2002-05-01

    Thirteen pairs of primers were designed for the entire porcine MSTN gene to enable PCR amplification for the detection of single nucleotide polymorphisms (SNPs) by a PCR-SSCP approach. Altogether 96.5% (1089/1128) of the encoding regions and 971 bp of the non-coding regions were screened. A total of three polymorphisms were identified with PCR-SSCP. They were located in the promoter, intron one and exon three regions of the gene. These polymorphisms were then confirmed to be point mutations (T --> A transversion, G --> A transition and C --> T transition respectively) by sequencing. Allele frequencies were determined for all three SNPs in several different pig breed populations. The polymorphisms were found to be rare in Western breeds, but much more common in Chinese breeds. Whether they have any relationship with the marked difference in lean meat mass between Western and Chinese breeds requires further study.

  5. S-PRIME/TI-SNPS Conceptual Design Summary

    NASA Astrophysics Data System (ADS)

    Mills, Joseph C.; Determan, William R.; van Hagan, Tom H.

    1994-07-01

    A conceptual design for a 40-kWe thermionic space nuclear power system (TI-SNPS) known as the S-PRIME system is being developed by Rockwell and its subcontractors for the U.S. Department of Energy (DOE), United States Air Force (USAF), and Ballistic Missile Defense Organization (BMDO) under the TI-SNPS Program. Phase 1 of this program includes the development of a conceptual design of a 5- to 40-kWe range TI-SNPS and validation of key technologies supporting the design. All key technologies for the S-PRIME design have been identified along with six critical component demonstrations, which will be used to validate the S-PRIME design features. Phase 1 is scheduled for completion in September 1994 culminating in a conceptual design review. Phase 2 of the contract, which is an option, provides for the development of a system preliminary design and demonstration of technology readiness with a preliminary design review (PDR) scheduled for September 1995.

  6. Identity-by-descent graphs offer a flexible framework for imputation and both linkage and association analyses

    PubMed Central

    2014-01-01

    We demonstrate the flexibility of identity-by-descent (IBD) graphs for genotype imputation and testing relationships between genotype and phenotype. We analyzed chromosome 3 and the first replicate of simulated diastolic blood pressure. IBD graphs were obtained from complete pedigrees and full multipoint marker analysis, facilitating subsequent linkage and other analyses. For rare alleles, pedigree-based imputation using these IBD graphs had a higher call rate than did population-based imputation. Combining the two approaches improved call rates for common alleles. We found it advantageous to incorporate known, rather than estimated, pedigree relationships when testing for association. Replacing missing data with imputed alleles improved association signals as well. Analyses were performed with knowledge of the underlying model. PMID:25519371

  7. Accounting for dependence induced by weighted KNN imputation in paired samples, motivated by a colorectal cancer study.

    PubMed

    Suyundikov, Anvar; Stevens, John R; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K; Slattery, Martha L

    2015-01-01

    Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control.
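    As a rough illustration of the KNN imputation discussed in the record above, the following minimal sketch fills one missing entry from its k nearest rows using inverse-distance weights. It is not the authors' implementation: the function name `knn_impute`, the weighting scheme, and the list-of-lists layout are assumptions made for illustration only.

```python
import math


def knn_impute(rows, target_idx, col, k=2):
    """Impute rows[target_idx][col] as an inverse-distance weighted mean
    of the same column in the k nearest rows (hypothetical helper)."""
    target = rows[target_idx]
    candidates = []
    for i, row in enumerate(rows):
        # A donor must be a different row with the target column observed.
        if i == target_idx or row[col] is None:
            continue
        # Compare only columns observed in both rows, excluding `col`.
        shared = [j for j in range(len(row))
                  if j != col and row[j] is not None and target[j] is not None]
        if not shared:
            continue
        dist = math.sqrt(
            sum((row[j] - target[j]) ** 2 for j in shared) / len(shared))
        candidates.append((dist, row[col]))
    if not candidates:
        raise ValueError("no donor rows available")
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    # Small constant avoids division by zero for identical rows.
    weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

    Because several imputed entries can draw on the same donor rows, the imputed values are functions of shared observations, which is one plausible way to see why such imputations are not independent.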

  8. Comparison of SNPs and microsatellites in identifying offtypes of cacao clones from Cameroon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single Nucleotide Polymorphism (SNP) markers are increasingly being used in crop breeding programs, slowly replacing microsatellites and other markers. SNPs provide many benefits over microsatellites, including ease of analysis and unambiguous results across various platforms. We compare SNPs to m...

  9. Filtering apparatus

    DOEpatents

    Haldipur, Gaurang B.; Dilmore, William J.

    1992-01-01

    A vertical vessel having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas.

  10. Filtering apparatus

    DOEpatents

    Haldipur, G.B.; Dilmore, W.J.

    1992-09-01

    A vertical vessel is described having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas. 18 figs.

  11. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees.

    PubMed

    Saad, Mohamad; Wijsman, Ellen M

    2014-11-01

    In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.

  12. A Review of Hot Deck Imputation for Survey Non-response

    PubMed Central

    Andridge, Rebecca R.; Little, Roderick J. A.

    2011-01-01

    Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a “similar” unit. Despite being used extensively in practice, the theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and obtain inferences from the completed data set. Here we review different forms of the hot deck and existing research on its statistical properties. We describe applications of the hot deck currently in use, including the U.S. Census Bureau’s hot deck for the Current Population Survey (CPS). We also provide an extended example of variations of the hot deck applied to the third National Health and Nutrition Examination Survey (NHANES III). Some potential areas for future research are highlighted. PMID:21743766
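    The basic idea of the hot deck, replacing each missing value with an observed response from a similar unit, can be sketched as a random draw within adjustment cells. This is only one simple variant, written under stated assumptions: the function name `hot_deck_impute`, the dictionary record layout, and the use of a single categorical cell variable to define "similar" are all hypothetical choices, not taken from the review.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, cell_key, target, seed=0):
    """Return copies of `records` in which a missing `target` value is
    replaced by a random observed value from the same adjustment cell."""
    rng = random.Random(seed)
    # Collect observed donor values per adjustment cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[cell_key]].append(rec[target])
    completed = []
    for rec in records:
        filled = dict(rec)  # copy so the input records are not mutated
        if filled[target] is None:
            pool = donors.get(filled[cell_key])
            if pool:  # leave the value missing if the cell has no donors
                filled[target] = rng.choice(pool)
        completed.append(filled)
    return completed
```

    Cells without any observed donor are left missing here; in practice such cells would typically be collapsed with neighbouring cells, which this sketch does not attempt.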

  13. Imputed outage costs under a proposed curtailable rate program in Taiwan

    SciTech Connect

    Hsu, G.J.Y.; Chang, P.L.; Chen, T.Y.

    1991-01-01

    The implementation of a curtailable rate program through an appropriately designed menu, mainly determined by the customer's outage costs, is one feasible solution to a power shortage problem. In Taiwan, this issue is particularly important because Taiwan currently faces a shortage of power generation supply in the summer peak period. In this paper, the authors conducted a survey to examine the market acceptance for the proposed curtailable rate menu and investigated customers' imputed outage costs in relation to their attributes. The survey results show that the imputed outage costs range from $2.25/KW to $3.14/KW and that 5.6% to 18.1% of the high-tension power peak load supply of the surveyed customers could potentially be curtailed. The economic implications of the research results are presented and further research is recommended.

  14. Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys

    PubMed Central

    Rendall, Michael S.; Ghosh-Dastidar, Bonnie; Weden, Margaret M.; Baker, Elizabeth H.; Nazarov, Zafar

    2013-01-01

    Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally-representative cohort surveys. PMID:24223447

  15. CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data.

    PubMed

    Lin, Peijie; Troup, Michael; Ho, Joshua W K

    2017-03-28

    Most existing dimensionality reduction and clustering packages for single-cell RNA-seq (scRNA-seq) data deal with dropouts by heavy modeling and computational machinery. Here, we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner. Using a range of simulated and real data, we show that CIDR improves the standard principal component analysis and outperforms the state-of-the-art methods, namely t-SNE, ZIFA, and RaceID, in terms of clustering accuracy. CIDR typically completes within seconds when processing a data set of hundreds of cells and minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.com/VCCRI/CIDR .

  16. Variable selection in the presence of missing data: resampling and imputation.

    PubMed

    Long, Qi; Johnson, Brent A

    2015-07-01

    In the presence of missing data, variable selection methods need to be tailored to missing data mechanisms and statistical approaches used for handling missing data. We focus on the mechanism of missing at random and variable selection methods that can be combined with imputation. We investigate a general resampling approach (BI-SS) that combines bootstrap imputation and stability selection, the latter of which was developed for fully observed data. The proposed approach is general and can be applied to a wide range of settings. Our extensive simulation studies demonstrate that the performance of BI-SS is the best or close to the best and is relatively insensitive to tuning parameter values in terms of variable selection, compared with several existing methods for both low-dimensional and high-dimensional problems. The proposed approach is further illustrated using two applications, one for a low-dimensional problem and the other for a high-dimensional problem.

  17. Missing data imputation and corrected statistics for large-scale behavioral databases.

    PubMed

    Courrieu, Pierre; Rey, Arnaud

    2011-06-01

    This article presents a new methodology for solving problems resulting from missing data in large-scale item performance behavioral databases. Useful statistics corrected for missing data are described, and a new method of imputation for missing data is proposed. This methodology is applied to the Dutch Lexicon Project database recently published by Keuleers, Diependaele, and Brysbaert (Frontiers in Psychology, 1, 174, 2010), which allows us to conclude that this database fulfills the conditions of use of the method recently proposed by Courrieu, Brand-D'Abrescia, Peereman, Spieler, and Rey (2011) for testing item performance models. Two application programs in MATLAB code are provided for the imputation of missing data in databases and for the computation of corrected statistics to test models.

  18. Inference from Multiple Imputation for Missing Data Using Mixtures of Normals.

    PubMed

    Steele, Russell J; Wang, Naisyin; Raftery, Adrian E

    2010-05-01

    We consider two difficulties with standard multiple imputation methods for missing data based on Rubin's t method for confidence intervals: their often excessive width, and their instability. These problems are present most often when the number of copies is small, as is often the case when a data collection organization is making multiple completed datasets available for analysis. We suggest using mixtures of normals as an alternative to Rubin's t. We also examine the performance of improper imputation methods as an alternative to generating copies from the true posterior distribution for the missing observations. We report the results of simulation studies and analyses of data on health-related quality of life in which the methods suggested here gave narrower confidence intervals and more stable inferences, especially with small numbers of copies or non-normal posterior distributions of parameter estimates. A free R software package called MImix that implements our methods is available from CRAN.
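    For context, the standard combining rules behind Rubin's t intervals can be written in a few lines. The sketch below pools m completed-data estimates and their variances using the usual formulas (total variance T = U + (1 + 1/m)B and the classical degrees of freedom). It shows the baseline method the record refers to, not the mixture-of-normals alternative or the MImix package; the function name `rubin_pool` is an assumption.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m completed-data estimates with Rubin's combining rules.

    Returns (pooled estimate, total variance, degrees of freedom)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = u_bar + (1.0 + 1.0 / m) * b
    if b == 0:
        df = float("inf")     # no between-imputation variability
    else:
        df = (m - 1) * (1.0 + u_bar / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, total, df
```

    With few imputations the degrees of freedom can be small, so the resulting t interval is wide, which is consistent with the width problem the record describes for small numbers of copies.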

  19. Sensitivity to imputation models and assumptions in receiver operating characteristic analysis with incomplete data.

    PubMed

    Karakaya, Jale; Karabulut, Erdem; Yucel, Recai M

    Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also become an increasingly popular problem in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. Particularly, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms.

  20. Imputing historical statistics, soils information, and other land-use data to crop area

    NASA Technical Reports Server (NTRS)

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

  1. Microarray missing data imputation based on a set theoretic framework and biological knowledge

    PubMed Central

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2006-01-01

    Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods, a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance to deal with missing values, they also have their limitations. For example, some algorithms have good performance only when strong local correlation exists in data while some provide the best estimate when data is dominated by global structure. In addition, these algorithms do not take into account any biological constraint in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods. PMID:16549873
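    The generic POCS iteration described above, cycling through projections onto each convex set until the iterate settles in their intersection, can be sketched with two toy sets. The sets used here (a box constraint and a hyperplane) are illustrative assumptions only and are not the gene-wise, array-wise, or synchronization-loss sets designed in the paper; all function names are hypothetical.

```python
def project_box(x, lo, hi):
    """Project x onto the box {x : lo <= x_i <= hi} by clipping."""
    return [min(max(v, lo), hi) for v in x]


def project_hyperplane(x, a, b):
    """Project x onto the hyperplane {x : a . x = b}."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    norm_sq = sum(ai * ai for ai in a)
    step = (dot - b) / norm_sq
    return [xi - step * ai for ai, xi in zip(a, x)]


def pocs(x0, projections, iters=200):
    """Cyclically apply each projection operator for a fixed number of sweeps."""
    x = list(x0)
    for _ in range(iters):
        for proj in projections:
            x = proj(x)
    return x


# Toy usage: find a point with entries in [0, 1] whose entries sum to 1.5.
solution = pocs(
    [2.0, 2.0],
    [lambda x: project_box(x, 0.0, 1.0),
     lambda x: project_hyperplane(x, [1.0, 1.0], 1.5)],
)
```

    Each piece of prior knowledge enters only through its projection operator, so adding a further constraint amounts to appending one more function to the list.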

  2. Normalization and missing value imputation for label-free LC-MS analysis

    SciTech Connect

    Karpievitch, Yuliya; Dabney, Alan R.; Smith, Richard D.

    2012-11-05

    Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

  3. SNPs Selection using Gravitational Search Algorithm and Exhaustive Search for Association Mapping

    NASA Astrophysics Data System (ADS)

    Kusuma, W. A.; Hasibuan, L. S.; Istiadi, M. A.

    2016-01-01

    Single Nucleotide Polymorphisms (SNPs) are known to be associated with phenotypic variation. The study of linking SNPs to a phenotype of interest is referred to as Association Mapping (AM), which is classified as a combinatorial problem. An Exhaustive Search (ES) approach can select targeted SNPs exactly, since it evaluates all possible combinations of SNPs, but it is not efficient in terms of computer resources and computation time. A Heuristic Search (HS) approach is an alternative that improves on ES in those terms, but it still suffers from a high number of false positive SNPs in each combination. Gravitational Search Algorithm (GSA) is a new HS algorithm that yields better performance than other nature-inspired HS algorithms. This paper proposes a new method that combines GSA and ES to identify the most appropriate combination of SNPs linked to the phenotype of interest. Testing was conducted using a dataset without epistasis and a dataset with epistasis. Using the dataset without epistasis, with 7 targeted SNPs, the proposed method identified 7 SNPs (6 True Positive (TP) SNPs and 1 False Positive (FP) SNP) with an association value of 0.83. In addition, using the dataset with epistasis and 5 targeted SNPs, the proposed method identified 3 SNPs (2 TP SNPs and 1 FP SNP) with an association value of 0.87. The results showed that the method is robust in reducing redundant SNPs and identifying main markers.
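    To illustrate why exhaustive search is exact but expensive, the sketch below scores every size-k subset of a candidate list and keeps the best one. The scoring function is a placeholder supplied by the caller; the paper's association measure and its GSA step are not reproduced here, and the names `exhaustive_best_subset` and `count_subsets` are assumptions.

```python
from itertools import combinations
from math import comb


def count_subsets(n_snps, k):
    """Number of size-k SNP combinations an exhaustive search must score."""
    return comb(n_snps, k)


def exhaustive_best_subset(snps, k, score):
    """Score every size-k combination and return (best subset, best score).

    `score` is any caller-supplied function mapping a tuple of SNP
    identifiers to a number, where larger means stronger association."""
    best = max(combinations(snps, k), key=score)
    return best, score(best)
```

    The number of subsets grows combinatorially: `count_subsets(100, 5)` is already 75,287,520, which is why a heuristic stage is commonly used to shrink the candidate list before any exhaustive pass.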

  4. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    PubMed

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

  5. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    PubMed Central

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions was 13.37% for the MICE algorithm, 28.19% for MLR, and 31.68% for IDW. PMID:25356644
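    As a point of reference for the IDW baseline and the RMSE comparison mentioned above, the following minimal sketch estimates a missing station reading from its neighbours and computes a plain RMSE. The station coordinates, the distance exponent, and the function names are assumptions; the record reports RMSE as a percentage, and the normalisation behind that figure is not specified here, so the helper returns RMSE in the units of the data.

```python
import math


def idw_estimate(target_xy, stations, power=2.0):
    """Inverse Distance Weighting estimate at `target_xy`.

    `stations` is a list of (x, y, value) tuples for sensors with data."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target_xy[0], y - target_xy[1])
        if dist == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

    IDW uses only geometry, whereas regression-style approaches such as MLR or MICE use the joint behaviour of the sensors' measurements.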

  6. Imputation of missing covariate values in epigenome-wide analysis of DNA methylation data

    PubMed Central

    Wu, Chong; Demerath, Ellen W.; Pankow, James S.; Bressler, Jan; Fornage, Myriam; Grove, Megan L.; Chen, Wei; Guan, Weihua

    2016-01-01

    DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed. PMID:26890800

  7. Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research.

    PubMed

    Antonelli, Joseph; Zigler, Corwin; Dominici, Francesca

    2017-03-03

    In comparative effectiveness research, we are often interested in the estimation of an average causal effect from large observational data (the main study). Often this data does not measure all the necessary confounders. On many occasions, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches for missing data imputation might not be adequate due to the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and propagates the uncertainty due to the missing data imputation and confounder selection when estimating the average causal effect (ACE) in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10 396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.

  8. Using latent variable modeling and multiple imputation to calibrate rater bias in diagnosis assessment.

    PubMed

    Siddique, Juned; Crespi, Catherine M; Gibbons, Robert D; Green, Bonnie L

    2011-01-30

    We present an approach that uses latent variable modeling and multiple imputation to correct rater bias when one group of raters tends to be more lenient in assigning a diagnosis than another. Our method assumes that there exists an unobserved moderate category of patient who is assigned a positive diagnosis by one type of rater and a negative diagnosis by the other type. We present a Bayesian random effects censored ordinal probit model that allows us to calibrate the diagnoses across rater types by identifying and multiply imputing 'case' or 'non-case' status for patients in the moderate category. A Markov chain Monte Carlo algorithm is presented to estimate the posterior distribution of the model parameters and generate multiple imputations. Our method enables the calibrated diagnosis variable to be used in subsequent analyses while also preserving uncertainty in true diagnosis. We apply our model to diagnoses of posttraumatic stress disorder (PTSD) from a depression study where nurse practitioners were twice as likely as clinical psychologists to diagnose PTSD despite the fact that participants were randomly assigned to either a nurse or a psychologist. Our model appears to balance PTSD rates across raters, provides a good fit to the data, and preserves between-rater variability. After calibrating the diagnoses of PTSD across rater types, we perform an analysis looking at the effects of comorbid PTSD on changes in depression scores over time. Results are compared with an analysis that uses the original diagnoses and show that calibrating the PTSD diagnoses can yield different inferences.

  9. A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers

    PubMed Central

    Turrado, Concepción Crespo; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés-José; Melero, Manuel G.; de Cos Juez, Francisco Javier

    2016-01-01

    The storage of data is a key process in the study of electrical power networks related to the search for harmonics and the finding of a lack of balance among phases. The presence of missing data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) affects any time series study in a negative way that has to be addressed. When this occurs, missing data imputation algorithms are required. These algorithms are able to substitute the data that are missing for estimated values. This research presents a new algorithm for the missing data imputation method based on Self-Organized Maps Neural Networks and Mahalanobis distances and compares it not only with a well-known technique called Multivariate Imputation by Chained Equations (MICE) but also with an algorithm previously proposed by the authors called Adaptive Assignation Algorithm (AAA). The results obtained demonstrate how the proposed method outperforms both algorithms. PMID:27626419

  10. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION

    PubMed Central

    Allen, Genevera I.; Tibshirani, Robert

    2015-01-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility. PMID:26877823

  11. Imputation of missing covariate values in epigenome-wide analysis of DNA methylation data.

    PubMed

    Wu, Chong; Demerath, Ellen W; Pankow, James S; Bressler, Jan; Fornage, Myriam; Grove, Megan L; Chen, Wei; Guan, Weihua

    2016-01-01

    DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed.

  12. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions.

    PubMed

    Han, Ying; Hazelett, Dennis J; Wiklund, Fredrik; Schumacher, Fredrick R; Stram, Daniel O; Berndt, Sonja I; Wang, Zhaoming; Rand, Kristin A; Hoover, Robert N; Machiela, Mitchell J; Yeager, Merideth; Burdette, Laurie; Chung, Charles C; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C; Key, Timothy J; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L; Kolb, Suzanne; Gapstur, Susan M; Diver, W Ryan; Stevens, Victoria L; Strom, Sara S; Pettaway, Curtis A; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A; Yeboah, Edward D; Tettey, Yao; Biritwum, Richard B; Adjei, Andrew A; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P; Isaacs, William B; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M; Ingles, Sue A; Kittles, Rick A; Murphy, Adam B; Blot, William J; Signorello, Lisa B; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M Cristina; Wu, Suh-Yuh; Hennis, Anselm J M; Rybicki, Benjamin A; Neslund-Dudas, Christine; Hsing, Ann W; Chu, Lisa; Goodman, Phyllis J; Klein, Eric A; Zheng, S Lilly; Witte, John S; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L; Hunter, David J; Gronberg, Henrik; Cook, Michael B; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J; Easton, Douglas F; Henderson, Brian E; Coetzee, Gerhard A; Conti, David V; Haiman, Christopher A

    2015-10-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10(-4)-5.6 × 10(-3)) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10(-6)) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation.

  13. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions

    PubMed Central

    Han, Ying; Hazelett, Dennis J.; Wiklund, Fredrik; Schumacher, Fredrick R.; Stram, Daniel O.; Berndt, Sonja I.; Wang, Zhaoming; Rand, Kristin A.; Hoover, Robert N.; Machiela, Mitchell J.; Yeager, Merideth; Burdette, Laurie; Chung, Charles C.; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C.; Key, Timothy J.; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L.; Kolb, Suzanne; Gapstur, Susan M.; Diver, W. Ryan; Stevens, Victoria L.; Strom, Sara S.; Pettaway, Curtis A.; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A.; Yeboah, Edward D.; Tettey, Yao; Biritwum, Richard B.; Adjei, Andrew A.; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P.; Isaacs, William B.; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L.; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M.; Ingles, Sue A.; Kittles, Rick A.; Murphy, Adam B.; Blot, William J.; Signorello, Lisa B.; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M. Cristina; Wu, Suh-Yuh; Hennis, Anselm J. M.; Rybicki, Benjamin A.; Neslund-Dudas, Christine; Hsing, Ann W.; Chu, Lisa; Goodman, Phyllis J.; Klein, Eric A.; Zheng, S. Lilly; Witte, John S.; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L.; Hunter, David J.; Gronberg, Henrik; Cook, Michael B.; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J.; Easton, Douglas F.; Henderson, Brian E.; Coetzee, Gerhard A.; Conti, David V.; Haiman, Christopher A.

    2015-01-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10⁻⁴ to 5.6 × 10⁻³) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10⁻⁶) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation. PMID:26162851

  14. Ultraviolet filters.

    PubMed

    Shaath, Nadim A

    2010-04-01

    The chemistry, photostability and mechanism of action of ultraviolet filters are reviewed. The worldwide regulatory status of the 55 approved ultraviolet filters and their optical properties are documented. The photostability of butyl methoxydibenzoyl methane (avobenzone) is considered and methods to stabilize it in cosmetic formulations are presented.

  15. Tracing Cattle Breeds with Principal Components Analysis Ancestry Informative SNPs

    PubMed Central

    Lewis, Jamey; Abas, Zafiris; Dadousis, Christos; Lykidis, Dimitrios; Paschou, Peristera; Drineas, Petros

    2011-01-01

    The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 Single Nucleotide Polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genomewide datasets, we present a decision-tree which can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250-500 carefully selected SNPs suffice in order to achieve close to 100% prediction accuracy of individual ancestry, when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that is becoming increasingly available, have the potential to become a valuable tool and have considerable impact in worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products. PMID:21490966
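
    As a rough illustration of PCA-guided marker selection, the sketch below scores each SNP by its squared weight in the top principal components and keeps the highest-scoring ones. The scoring rule, the function names and the toy genotype matrix are assumptions made for illustration; the abstract does not spell out the authors' selection algorithm or their decision tree, so this should not be read as their method.

```python
import numpy as np

def pca_snp_scores(genotypes, k):
    """Score SNPs (columns) by their squared weights in the top-k principal components."""
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)                        # center each SNP across individuals
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return (vt[:k] ** 2).sum(axis=0)              # one score per SNP

def top_snps(genotypes, k, n):
    """Return the indices of the n highest-scoring SNPs."""
    scores = pca_snp_scores(genotypes, k)
    return np.argsort(scores)[::-1][:n]

# Toy data: 6 animals x 3 SNPs coded 0/1/2; SNP 0 separates two groups of three.
toy_genotypes = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [2, 1, 1],
    [2, 0, 1],
    [2, 1, 1],
]
selected = top_snps(toy_genotypes, k=1, n=1)
```

    In this toy matrix the first SNP carries almost all of the between-group variance, so it receives the highest score.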

  16. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti

    PubMed Central

    2014-01-01

    Background: Genetic markers are widely used to understand the biology and population dynamics of disease vectors, but often markers are limited in the resolution they provide. In particular, the delineation of population structure, fine scale movement and patterns of relatedness are often obscured unless numerous markers are available. To address this issue in the major arbovirus vector, the yellow fever mosquito (Aedes aegypti), we used double digest Restriction-site Associated DNA (ddRAD) sequencing for the discovery of genome-wide single nucleotide polymorphisms (SNPs). We aimed to characterize the new SNP set and to test the resolution against previously described microsatellite markers in detecting broad and fine-scale genetic patterns in Ae. aegypti. Results: We developed bioinformatics tools that support the customization of restriction enzyme-based protocols for SNP discovery. We showed that our approach for RAD library construction achieves unbiased genome representation that reflects true evolutionary processes. In Ae. aegypti samples from three continents we identified more than 18,000 putative SNPs. They were widely distributed across the three Ae. aegypti chromosomes, with 47.9% found in intergenic regions and 17.8% in exons of over 2,300 genes. The pattern of their imputed effects in ORFs and UTRs was consistent with those found in a recent transcriptome study. We demonstrated that individual mosquitoes from Indonesia, Australia, Vietnam and Brazil can be assigned with a very high degree of confidence to their region of origin using a large SNP panel. We also showed that familial relatedness of samples from a 0.4 km² area could be confidently established with a subset of SNPs. Conclusions: Using a cost-effective customized RAD sequencing approach supported by our bioinformatics tools, we characterized over 18,000 SNPs in field samples of the dengue fever mosquito Ae. aegypti. The variants were annotated and positioned onto the three Ae. aegypti chromosomes

  17. Molecular Beacon CNT-based Detection of SNPs

    NASA Astrophysics Data System (ADS)

    Egorova, V. P.; Krylova, H. V.; Lipnevich, I. V.; Veligura, A. A.; Shulitsky, B. G.; Y Fedotenkova, L.

    2015-11-01

    A fluorescence quenching effect due to few-walled carbon nanotubes chemically modified by carboxyl groups has been utilized to discriminate Single Nucleotide Polymorphisms (SNPs). It was shown that the complex obtained from these nanotubes and single-stranded primer DNA is formed due to stacking interactions between the hexagons of the nanotubes and aromatic rings of nucleotide bases, as well as due to the establishment of hydrogen bonds between acceptor amine groups of nucleotide bases and donor carboxyl groups of the nanotubes. It has been demonstrated that these complexes may be used to make highly effective DNA biosensors that detect SNPs and operate as molecular beacons.

  18. SNPs Array Karyotyping in Non-Hodgkin Lymphoma

    PubMed Central

    Etebari, Maryam; Navari, Mohsen; Piccaluga, Pier Paolo

    2015-01-01

    The traditional methods for detection of chromosomal aberrations, which included cytogenetic or gene candidate solutions, suffered from low sensitivity or the need for previous knowledge of the target regions of the genome. With the advent of single nucleotide polymorphism (SNP) arrays, genome screening at global level in order to find chromosomal aberrations like copy number variants, DNA amplifications, deletions, and also loss of heterozygosity became feasible. In this review, we present an update of the knowledge, gained by SNPs arrays, of the genomic complexity of the most important subtypes of non-Hodgkin lymphomas. PMID:27600240

  19. Outlier SNPs show more genetic structure between two Bay of Fundy metapopulations of Atlantic salmon than do neutral SNPs.

    PubMed

    Freamo, Heather; O'Reilly, Patrick; Berg, Paul R; Lien, Sigbjørn; Boulding, Elizabeth G

    2011-03-01

    Atlantic salmon of Eastern Canada were once of considerable importance to aboriginal, recreational, and commercial fisheries, yet many populations are now in decline, particularly those of the inner Bay of Fundy (iBoF), which were recently listed as endangered. We investigated whether nonneutral SNPs could be used to assign individual Atlantic salmon accurately to either the iBoF or the outer Bay of Fundy (oBoF) metapopulations because this has been difficult with existing neutral markers. We first searched for markers under diversifying selection by genotyping eight captively bred Bay of Fundy (BoF) populations for 320 SNP loci with the Sequenom MassARRAY™ system and then analysed the data set with four different F(ST) outlier detection programs. Three outlier loci were identified by both BayesFST and BayeScan whereas seven outlier loci, including the three previously mentioned, were identified by both Fdist and Arlequin. A subset of 14 nonneutral SNPs was more accurate (85% accuracy) than a subset of 67 neutral SNPs (75% accuracy) at assigning individual salmon back to their metapopulation. We then chose a subset of nine outlier SNP markers and used them to inexpensively genotype archived DNA samples from seven wild BoF populations using Invader™ chemistry. Hierarchical AMOVA of these independent wild samples corroborated our previous findings of significant genetic differentiation between iBoF and oBoF salmon metapopulations. Our research shows that identifying and using outlier loci is an important step towards achieving the goal of consistently and accurately distinguishing iBoF from oBoF Atlantic salmon, which will aid in their conservation.

  20. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    PubMed Central

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L.; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Al Turki, Saeed; Amuzu, Antoinette; Anderson, Carl A.; Anney, Richard; Antony, Dinu; Artigas, María Soler; Ayub, Muhammad; Bala, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Benn, Marianne; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick F.; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Pablo Casas, Juan; Chambers, John C.; Charlton, Ruth; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebahattin; Clapham, Peter; Clement, Gail; Coates, Guy; Cocca, Massimiliano; Collier, David A.; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day, Ian N. M.; Day-Williams, Aaron; Dedoussis, George; Down, Thomas; Du, Yuanping; van Duijn, Cornelia M.; Dunham, Ian; Edkins, Sarah; Ekong, Rosemary; Ellis, Peter; Evans, David M.; Farooqi, I. Sadaf; Fitzpatrick, David R.; Flicek, Paul; Floyd, James; Foley, A. 
Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Gasparini, Paolo; Gaunt, Tom R.; Geihs, Matthias; Geschwind, Daniel; Greenwood, Celia; Griffin, Heather; Grozeva, Detelina; Guo, Xiaosen; Guo, Xueqin; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey E.; Holmans, Peter; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Iotchkova, Valentina; Isaacs, Aaron; Jackson, David K.; Jamshidi, Yalda; Johnson, Jon; Joyce, Chris; Karczewski, Konrad J.; Kaye, Jane; Keane, Thomas; Kemp, John P.; Kennedy, Karen; Kent, Alastair; Keogh, Julia; Khawaja, Farrah; Kleber, Marcus E.; van Kogelenberg, Margriet; Kolb-Kokocinski, Anja; Kooner, Jaspal S.; Lachance, Genevieve; Langenberg, Claudia; Langford, Cordelia; Lawson, Daniel; Lee, Irene; van Leeuwen, Elisabeth M.; Lek, Monkol; Li, Rui; Li, Yingrui; Liang, Jieqin; Lin, Hong; Liu, Ryan; Lönnqvist, Jouko; Lopes, Luis R.; Lopes, Margarida; Luan, Jian'an; MacArthur, Daniel G.; Mangino, Massimo; Marenne, Gaëlle; März, Winfried; Maslen, John; Matchan, Angela; Mathieson, Iain; McGuffin, Peter; McIntosh, Andrew M.; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Migone, Nicola; Mitchison, Hannah M.; Moayyeri, Alireza; Morris, James; Morris, Richard; Muddyman, Dawn; Muntoni, Francesco; Nordestgaard, Børge G.; Northstone, Kate; O'Donovan, Michael C.; O'Rahilly, Stephen; Onoufriadis, Alexandros; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Payne, Stewart J.; Perry, John R. B.; Pietilainen, Olli; Plagnol, Vincent; Pollitt, Rebecca C.; Povey, Sue; Quail, Michael A.; Quaye, Lydia; Raymond, Lucy; Rehnström, Karola; Ridout, Cheryl K.; Ring, Susan; Ritchie, Graham R. 
S.; Roberts, Nicola; Robinson, Rachel L.; Savage, David B.; Scambler, Peter; Schiffels, Stephan; Schmidts, Miriam; Schoenmakers, Nadia; Scott, Richard H.; Scott, Robert A.; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shaw, Adam; Shihab, Hashem A.; Shin, So-Youn; Skuse, David; Small, Kerrin S.; Smee, Carol; Smith, George Davey; Southam, Lorraine; Spasic-Boskovic, Olivera; Spector, Timothy D.; St Clair, David; St Pourcain, Beate; Stalker, Jim; Stevens, Elizabeth; Sun, Jianping; Surdulescu, Gabriela; Suvisaari, Jaana; Syrris, Petros; Tachmazidou, Ioanna; Taylor, Rohan; Tian, Jing; Tobin, Martin D.; Toniolo, Daniela; Traglia, Michela; Tybjaerg-Hansen, Anne; Valdes, Ana M.; Vandersteen, Anthony M.; Varbo, Anette; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T. R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Eleanor; Whincup, Peter; Whyte, Tamieka; Williams, Hywel J.; Williamson, Kathleen A.; Wilson, Crispian; Wilson, Scott G.; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zaza, Gianluigi; Zeggini, Eleftheria; Zhang, Feng; Zhang, Pingbo; Zhang, Weihua; Gambaro, Giovanni; Richards, J. Brent; Durbin, Richard; Timpson, Nicholas J.; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants. PMID:26368830

  1. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    PubMed

    Shara, Nawar; Yassin, Sayf A; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V; Wang, Wenyu; Lee, Elisa T; Umans, Jason G

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
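
    Two of the simpler methods named above, mean of serial measures and adjacent value, can be sketched as follows. The abstract does not define them precisely, so the readings used here (a participant's own observed mean, and the nearest earlier visit with a fallback to the nearest later one) are assumptions, as are the function names and the toy values.

```python
import numpy as np

def impute_serial_mean(visits):
    """Fill each missing value with the mean of that participant's observed visits."""
    x = np.array(visits, dtype=float)
    row_means = np.nanmean(x, axis=1, keepdims=True)
    return np.where(np.isnan(x), row_means, x)

def impute_adjacent(visits):
    """Fill each missing value with the nearest earlier visit, else the nearest later one."""
    x = np.array(visits, dtype=float)
    for row in x:                                  # each row is a view, edited in place
        for j in range(len(row)):
            if np.isnan(row[j]):
                earlier = row[:j][~np.isnan(row[:j])]
                later = row[j + 1:][~np.isnan(row[j + 1:])]
                if earlier.size:
                    row[j] = earlier[-1]
                elif later.size:
                    row[j] = later[0]
    return x

# Rows are participants, columns are three exams; NaN marks a missing measurement.
exams = [[1.0, np.nan, 3.0],
         [np.nan, 2.0, 4.0]]
by_mean = impute_serial_mean(exams)
by_adjacent = impute_adjacent(exams)
```

    Both functions are single-value imputations and ignore why a value is missing, which is the limitation that the multiple imputation and pattern-mixture approaches in the comparison are meant to address.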

  2. Imputation of Continuous Tree Suitability over the Continental United States from Sparse Measurements Using Associative Clustering

    NASA Astrophysics Data System (ADS)

    Hargrove, W. W.; Kumar, J.; Hoffman, F. M.; Potter, K. M.; Mills, R. T.

    2012-12-01

    Up-scaling from sparse measurements to a continuous raster of estimated values is a common problem in Earth System Science. We present a new general-purpose empirical imputation method based on associative clustering, which associates sparse measurements of dependent variables with particular multivariate clustered combinations of the independent variables, and then uses several methods to estimate values for unmeasured clusters, based on directional proximity in multidimensional data space, at both the cluster and map cell levels of resolution. We demonstrate this new imputation tool on tree species range distribution maps, which describe the suitable extent and expected growth performance of a particular tree species over a wide area. Range maps having continuous estimates of tree growth performance are more useful than more classical tree range maps that simply show binary occurrence suitability. The USDA Forest Service Forest Inventory Assessment (FIA) plots provide information about the occurrence and growth performance for various tree species across the US, but such measurements are limited to FIA plots. Using Associative Clustering, we scale up the discontinuous FIA Inventory growth measurements into continuous maps that show the expected growth and suitability for individual tree species covering the Continental United States. A multivariate cluster analysis was applied to global output from a General Circulation Model (GCM) consisting of 17 variables downscaled to 4 km² resolution. Present global growing conditions were divided into 30 thousand relatively homogeneous ecoregions describing climatic and topographic conditions. At every map cell a multi-linear regression was applied in 17-dimensional hyperspace to derive the suitability of a tree species where not measured using the forest inventory data. The continuous species distribution maps obtained were compared and validated against existing tree range suitability maps. Associative Clustering is intended

  3. Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion

    PubMed Central

    Žitnik, Marinka; Zupan, Blaž

    2015-01-01

    Epistatic miniarray profile (E-MAP) is a popular large-scale genetic interaction discovery platform. E-MAPs benefit from quantitative output, which makes it possible to detect subtle interactions with greater precision. However, due to the limits of biotechnology, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational techniques for data imputation, in this way completing the interaction profiles and enabling downstream analysis algorithms that could otherwise be sensitive to missing data values. We introduce a new interaction data imputation method called network-guided matrix completion (NG-MC). The core part of NG-MC is low-rank probabilistic matrix completion that incorporates prior knowledge presented as a collection of gene networks. NG-MC assumes that interactions are transitive, such that latent gene interaction profiles inferred by NG-MC depend on the profiles of their direct neighbors in gene networks. As the NG-MC inference algorithm progresses, it propagates latent interaction profiles through each of the networks and updates gene network weights toward improved prediction. In a study with four different E-MAP data assays and considered protein–protein interaction and gene ontology similarity networks, NG-MC significantly surpassed existing alternative techniques. Inclusion of information from gene networks also allowed NG-MC to predict interactions for genes that were not included in original E-MAP assays, a task that could not be considered by current imputation approaches. PMID:25658751

  4. Multiple imputation methods for multivariate one-sided tests with missing data.

    PubMed

    Wang, Tao; Wu, Lang

    2011-12-01

    Multivariate one-sided hypothesis testing problems arise frequently in practice. Various tests have been developed. In practice, there are often missing values in multivariate data. In this case, standard testing procedures based on complete data may not be applicable or may perform poorly if the missing data are discarded. In this article, we propose several multiple imputation methods for multivariate one-sided testing problems with missing data. Some theoretical results are presented. The proposed methods are evaluated using simulations. A real data example is presented to illustrate the methods.

  5. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification

  6. The operating regimes and basic control principles of SNPS "Topaz". [Cs

    SciTech Connect

    Makarov, A.N.; Volberg, M.S.; Grayznov, G.M.; Zhabotinsky, E.E.; Serbin, V.I.

    1991-01-05

    The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include prelaunch preparation and launch into working orbit, SNPS start-up to obtain the desired electric power, the nominal regime, and SNPS shutdown. The main requirements for the SNPS in the different regimes are given, and the control algorithms that meet these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" support the conclusion that, for an SNPS of this type, the most expedient control algorithm in the start-up regime is one that provides the required thermal state of the cesium vapor supply system and excludes any possibility of discharge processes in current-conducting elements. In the nominal regime, the required electric power should be provided by maintaining the reactor current and using a fast-acting voltage regulator. A limit on the outlet coolant temperature should also be provided.

  7. Filter apparatus

    DOEpatents

    Kuban, Daniel P.; Singletary, B. Huston; Evans, John H.

    1984-01-01

    A plurality of holding tubes are respectively mounted in apertures in a partition plate fixed in a housing receiving gas contaminated with particulate material. A filter cartridge is removably held in each holding tube, and the cartridges and holding tubes are arranged so that gas passes through apertures therein and across the partition plate while particulate material is collected in the cartridges. Replacement filter cartridges are respectively held in holding canisters mounted on a support plate which can be secured to the aforesaid housing, and screws mounted on said canisters are arranged to push replacement cartridges into the cartridge holding tubes and thereby eject used cartridges therefrom.

  8. Filter apparatus

    DOEpatents

    Kuban, D.P.; Singletary, B.H.; Evans, J.H.

    A plurality of holding tubes are respectively mounted in apertures in a partition plate fixed in a housing receiving gas contaminated with particulate material. A filter cartridge is removably held in each holding tube, and the cartridges and holding tubes are arranged so that gas passes through apertures therein and across the partition plate while particulate material is collected in the cartridges. Replacement filter cartridges are respectively held in holding canisters mounted on a support plate which can be secured to the aforesaid housing, and screws mounted on said canisters are arranged to push replacement cartridges into the cartridge holding tubes and thereby eject used cartridges therefrom.

  9. Sigma Filter

    NASA Technical Reports Server (NTRS)

    Balgovind, R. C.

    1985-01-01

    The GLA Fourth-Order model is needed to smooth the topography. This is to remove the Gibbs phenomenon. The Gibbs phenomenon occurs whenever we truncate a Fourier Series. The Sigma factors were introduced to reduce the Gibbs phenomenon. It is found that the smooth Fourier series is nothing but the original Fourier series with its coefficients multiplied by corresponding sigma factors. This operator can be applied many times to obtain high order sigma filtered field and is easily applicable using FFT. It is found that this filter is beneficial in deriving the topography.
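
    The effect of sigma factors on the Gibbs phenomenon can be sketched with a truncated Fourier series of a square wave. The abstract does not give the formula for its sigma factors, so the classic Lanczos form, sigma_k = sin(pi k / m) / (pi k / m), is assumed here. The function names, the square-wave test signal, and the direct summation (rather than the FFT the abstract mentions) are illustrative choices, not taken from the report.

```python
import math


def square_wave_partial_sum(x, max_harmonic, use_sigma=False):
    """Truncated Fourier series of a unit square wave, sign(sin x).

    When use_sigma is True, each coefficient is multiplied by the
    Lanczos sigma factor (an assumed form; see the lead-in).
    """
    m = max_harmonic + 1
    total = 0.0
    # Only odd harmonics contribute to the square wave.
    for k in range(1, max_harmonic + 1, 2):
        coeff = 4.0 / (math.pi * k)
        if use_sigma:
            arg = math.pi * k / m
            coeff *= math.sin(arg) / arg
        total += coeff * math.sin(k * x)
    return total


def peak(max_harmonic, use_sigma, samples=2000):
    """Largest value of the partial sum on a grid over (0, pi)."""
    return max(
        square_wave_partial_sum(math.pi * i / samples, max_harmonic, use_sigma)
        for i in range(1, samples)
    )


if __name__ == "__main__":
    print("raw peak:     ", peak(49, use_sigma=False))
    print("smoothed peak:", peak(49, use_sigma=True))
```

    The raw partial sum overshoots the true value of 1 near the jump, while the sigma-weighted sum stays much closer to it, which is the smoothing behaviour the abstract describes.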

  10. Water Filters

    NASA Technical Reports Server (NTRS)

    1988-01-01

    Seeking to find a more effective method of filtering potable water that was highly contaminated, Mike Pedersen, founder of Western Water International, learned that NASA had conducted extensive research in methods of purifying water on board manned spacecraft. The key is Aquaspace Compound, a proprietary WWI formula that scientifically blends various types of granular activated charcoal with other active and inert ingredients. Aquaspace systems remove some substances, such as chlorine, by atomic adsorption; other types of organic chemicals by mechanical filtration; and still others by catalytic reaction. Aquaspace filters are finding wide acceptance in industrial, commercial, residential and recreational applications in the U.S. and abroad.

  11. Using full-cohort data in nested case-control and case-cohort studies by multiple imputation.

    PubMed

    Keogh, Ruth H; White, Ian R

    2013-10-15

    In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure-disease association studies are therefore often based on nested case-control or case-cohort studies in which complete information is obtained only for sampled individuals. However, in the full cohort, there may be a large amount of information on cheaply available covariates and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case-control or case-cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case-control and case-cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case-control studies and a weighted analysis for case-cohort studies, both of which use some full-cohort information. Approximate imputation models perform well except when there are interactions or non-linear terms in the outcome model, where imputation using rejection sampling works well.
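
    Once several imputed datasets have been analysed, the per-dataset results are usually combined with Rubin's rules. The abstract does not describe this pooling step, so the following is only a minimal sketch of the standard rules; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical log hazard ratios and squared standard errors.
    print(pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44], [0.010, 0.011, 0.009, 0.010, 0.012]))
```

    The between-imputation term is what carries the extra uncertainty due to the missing data into the final standard error.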

  12. Imputation of systematically missing predictors in an individual participant data meta-analysis: a generalized approach using MICE.

    PubMed

    Jolani, Shahab; Debray, Thomas P A; Koffijberg, Hendrik; van Buuren, Stef; Moons, Karel G M

    2015-05-20

    Individual participant data meta-analyses (IPD-MA) are increasingly used for developing and validating multivariable (diagnostic or prognostic) risk prediction models. Unfortunately, some predictors or even outcomes may not have been measured in each study and are thus systematically missing in some individual studies of the IPD-MA. As a consequence, it is no longer possible to evaluate between-study heterogeneity and to estimate study-specific predictor effects, or to include all individual studies, which severely hampers the development and validation of prediction models. Here, we describe a novel approach for imputing systematically missing data and adopt a generalized linear mixed model to allow for between-study heterogeneity. This approach can be viewed as an extension of Resche-Rigon's method (Stat Med 2013), relaxing their assumptions regarding variance components and allowing imputation of linear and nonlinear predictors. We illustrate our approach using a case study with IPD-MA of 13 studies to develop and validate a diagnostic prediction model for the presence of deep venous thrombosis. We compare the results after applying four methods for dealing with systematically missing predictors in one or more individual studies: complete case analysis where studies with systematically missing predictors are removed, traditional multiple imputation ignoring heterogeneity across studies, stratified multiple imputation accounting for heterogeneity in predictor prevalence, and multilevel multiple imputation (MLMI) fully accounting for between-study heterogeneity. We conclude that MLMI may substantially improve the estimation of between-study heterogeneity parameters and allow for imputation of systematically missing predictors in IPD-MA aimed at the development and validation of prediction models.

  13. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    ERIC Educational Resources Information Center

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  14. Impact of SNPs on Protein Phosphorylation Status in Rice (Oryza sativa L.).

    PubMed

    Lin, Shoukai; Chen, Lijuan; Tao, Huan; Huang, Jian; Xu, Chaoqun; Li, Lin; Ma, Shiwei; Tian, Tian; Liu, Wei; Xue, Lichun; Ai, Yufang; He, Huaqin

    2016-11-11

    Single nucleotide polymorphisms (SNPs) are widely used in functional genomics and genetics research. The high-quality sequence of the rice genome has provided a genome-wide SNP and proteome resource. However, the impact of SNPs on protein phosphorylation status in rice is not fully understood. In this paper, we first updated the rice SNP resource based on the new rice genome Ver. 7.0, then systematically analyzed the potential impact of non-synonymous SNPs (nsSNPs) on protein phosphorylation status. There were 3,897,312 SNPs in the Ver. 7.0 rice genome, of which 9.9% were nsSNPs. In addition, a total of 2,508,261 phosphorylated sites were predicted in the rice proteome. Interestingly, we observed that 150,197 (39.1%) nsSNPs could influence protein phosphorylation status, among which 52.2% might induce changes of protein kinase (PK) types for adjacent phosphorylation sites. We constructed a database, SNP_rice, to deposit the updated rice SNP resource and phosSNPs information. It was freely available to academic researchers at http://bioinformatics.fafu.edu.cn. As a case study, we detected five nsSNPs that potentially influence the phosphorylation status of heterotrimeric G proteins in rice, indicating that genetic polymorphisms affect signal transduction by influencing the phosphorylation status of heterotrimeric G proteins. The results of this work could be a useful resource for future experimental identification and provide interesting information for better rice breeding.

  15. Impact of SNPs on Protein Phosphorylation Status in Rice (Oryza sativa L.)

    PubMed Central

    Lin, Shoukai; Chen, Lijuan; Tao, Huan; Huang, Jian; Xu, Chaoqun; Li, Lin; Ma, Shiwei; Tian, Tian; Liu, Wei; Xue, Lichun; Ai, Yufang; He, Huaqin

    2016-01-01

    Single nucleotide polymorphisms (SNPs) are widely used in functional genomics and genetics research. The high-quality sequence of the rice genome has provided a genome-wide SNP and proteome resource. However, the impact of SNPs on protein phosphorylation status in rice is not fully understood. In this paper, we first updated the rice SNP resource based on the new rice genome Ver. 7.0, then systematically analyzed the potential impact of non-synonymous SNPs (nsSNPs) on protein phosphorylation status. There were 3,897,312 SNPs in the Ver. 7.0 rice genome, of which 9.9% were nsSNPs. In addition, a total of 2,508,261 phosphorylated sites were predicted in the rice proteome. Interestingly, we observed that 150,197 (39.1%) nsSNPs could influence protein phosphorylation status, among which 52.2% might induce changes of protein kinase (PK) types for adjacent phosphorylation sites. We constructed a database, SNP_rice, to deposit the updated rice SNP resource and phosSNPs information. It was freely available to academic researchers at http://bioinformatics.fafu.edu.cn. As a case study, we detected five nsSNPs that potentially influence the phosphorylation status of heterotrimeric G proteins in rice, indicating that genetic polymorphisms affect signal transduction by influencing the phosphorylation status of heterotrimeric G proteins. The results of this work could be a useful resource for future experimental identification and provide interesting information for better rice breeding. PMID:27845739

  16. Use of Multiple Imputation to Correct for Bias in Lung Cancer Incidence Trends by Histologic Subtype

    PubMed Central

    Yu, Mandi; Feuer, Eric J.; Cronin, Kathleen A.; Caporaso, Neil E.

    2014-01-01

    Background: Over the past several decades, advances in lung cancer research and practice have led to refinements of histological diagnosis of lung cancer. The differential use and subsequent alterations of non-specific morphology codes, however, may have caused artifactual fluctuations in the incidence rates for histologic subtypes, thus biasing temporal trends. Methods: We developed a multiple imputation (MI) method to correct lung cancer incidence for non-specific histology using data from the Surveillance, Epidemiology, and End Results (SEER) Program during 1975–2010. Results: For adenocarcinoma in men and squamous in both genders, the change to an increasing trend around 2005, after more than ten years of decreasing incidence, is apparently an artifact of the changes in histopathology practice and coding system. After imputation, the rates remained decreasing for adenocarcinoma and squamous in men, and became constant for squamous in women. Conclusions: As molecular features of distinct histologies are increasingly identified by new technologies, accurate histological distinctions are becoming increasingly relevant to more effective 'targeted' therapies, and therefore, are important to track in patients. However, without incorporating the coding changes, the incidence trends estimated for histologic subtypes could be misleading. Impact: The MI approach provides a valuable tool for bridging the different histology definitions, thus permitting meaningful inferences about the long-term trends of lung cancer by histological subtype. PMID:24855099

  17. Comparing multiple imputation methods for systematically missing subject-level data.

    PubMed

    Kline, David; Andridge, Rebecca; Kaizar, Eloise

    2015-12-17

    When conducting research synthesis, the collection of studies that will be combined often do not measure the same set of variables, which creates missing data. When the studies to combine are longitudinal, missing data can occur on the observation-level (time-varying) or the subject-level (non-time-varying). Traditionally, the focus of missing data methods for longitudinal data has been on missing observation-level variables. In this paper, we focus on missing subject-level variables and compare two multiple imputation approaches: a joint modeling approach and a sequential conditional modeling approach. We find the joint modeling approach to be preferable to the sequential conditional approach, except when the covariance structure of the repeated outcome for each individual has homogenous variance and exchangeable correlation. Specifically, the regression coefficient estimates from an analysis incorporating imputed values based on the sequential conditional method are attenuated and less efficient than those from the joint method. Remarkably, the estimates from the sequential conditional method are often less efficient than a complete case analysis, which, in the context of research synthesis, implies that we lose efficiency by combining studies. Copyright © 2015 John Wiley & Sons, Ltd.

  18. Notch filter

    NASA Technical Reports Server (NTRS)

    Shelton, G. B. (Inventor)

    1977-01-01

    A notch filter for the selective attenuation of a narrow band of frequencies out of a larger band was developed. A helical resonator is connected to an input circuit and an output circuit through discrete and equal capacitors, and a resistor is connected between the input and the output circuits.

  19. Gender Imputation

    ERIC Educational Resources Information Center

    National Student Clearinghouse, 2013

    2013-01-01

    In late 2007, the National Student Clearinghouse (NSC) expanded its Enrollment Reporting service to include several additional data elements (commonly referred to as the "A2" or "expanded" data elements). One of these expanded data elements is student gender. Although gender is potentially important to a number of research…

  20. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    PubMed

    Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF <5%) and rare variants (<1%)) can enhance previously identified associations and identify novel loci, we selected 93 quantitative circulating factors where data was available from the InCHIANTI population study. These phenotypes included cytokines, binding proteins, hormones, vitamins and ions. We selected these phenotypes because many have known strong genetic associations and are potentially important to help understand disease processes. We performed a genome-wide scan for these 93 phenotypes in InCHIANTI. We identified 21 signals and 33 signals that reached P<5×10(-8) based on HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P<5×10(-11) respectively. Imputation of 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P<5×10(-8) in both analyses (17 of which represent well replicated signals in the NHGRI catalogue), six were captured by the same index SNP, five were nominally more strongly associated in 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10(-12)). Our data provide important proof of principle

  1. Plasmonic filters.

    SciTech Connect

    Passmore, Brandon Scott; Shaner, Eric Arthur; Barrick, Todd A.

    2009-09-01

    Metal films perforated with subwavelength hole arrays have been shown to demonstrate an effect known as Extraordinary Transmission (EOT). In EOT devices, optical transmission passbands arise that can have up to 90% transmission and a bandwidth that is only a few percent of the designed center wavelength. By placing a tunable dielectric in proximity to the EOT mesh, one can tune the center frequency of the passband. We have demonstrated over 1 micron of passive tuning in structures designed for an 11 micron center wavelength. If a suitable midwave (3-5 micron) tunable dielectric (perhaps BaTiO3) were integrated with an EOT mesh designed for midwave operation, it is possible that a fast, voltage tunable, low temperature filter solution could be demonstrated with a several hundred nanometer passband. Such an element could, for example, replace certain components in a filter wheel solution.

  2. Water Filter

    NASA Technical Reports Server (NTRS)

    1982-01-01

    A compact, lightweight electrolytic water sterilizer, available through Ambassador Marketing, generates silver ions in concentrations of 50 to 100 parts per billion in a water flow system. The silver ions serve as an effective bactericide/deodorizer. Tap water passes through a filtering element of silver that has been chemically plated onto activated carbon. The silver inhibits bacterial growth, and the activated carbon removes objectionable tastes and odors caused by the addition of chlorine and other chemicals to the municipal water supply. The three models available are a kitchen unit, a "Tourister" unit for portable use while traveling, and a refrigerator unit that attaches to the ice cube water line. A filter will treat 5,000 to 10,000 gallons of water.

  3. Mining ESTs to determine the usefulness of SNPs across shrimp species.

    PubMed

    Gorbach, Danielle M; Hu, Zhi-Liang; Du, Zhi-Qiang; Rothschild, Max F

    2010-04-01

    Expressed sequence tag (EST) libraries from members of the Penaeidae family and brine shrimp (Artemia franciscana) are currently the primary source of sequence data for shrimp species. Penaeid shrimp are the most commonly farmed worldwide, but selection methods for improving shrimp are limited. A better understanding of shrimp genomics is needed for farmers to use genetic markers to select the best breeding animals. The ESTs from Litopenaeus vannamei have been previously mined for single nucleotide polymorphisms (SNPs). This present study took publicly available ESTs from nine shrimp species, excluding L. vannamei, clustered them with CAP3, predicted SNPs within them using SNPidentifier, and then analyzed whether the SNPs were intra- or interspecies. Major goals of the project were to predict SNPs that may distinguish shrimp species, locate SNPs that may segregate in multiple species, and determine the genetic similarities between L. vannamei and the other shrimp species based on their EST sequences. Overall, 4,597 SNPs were predicted from 4,600 contigs with 703 of them being interspecies SNPs, 735 of them possibly predicting species' differences, and 18 of them appearing to segregate in multiple species. While sequences appear relatively well conserved, SNPs do not appear to be well conserved across shrimp species.

  4. Association of CD4 SNPs with fat percentage of Holstein cattle.

    PubMed

    Usman, T; Yu, Y; Zhai, L; Liu, C; Wang, X; Wang, Y

    2016-09-16

    Cluster of differentiation 4 gene (CD4) is well known for its role in immunity, but its effects on production traits remain to be elucidated. The present study was designed to explore single nucleotide polymorphisms (SNPs) in the exons, flanking introns, and promoter of CD4, as well as to analyze their effects on milk production traits (percentage of protein, fat, and lactose; mastitis indicator traits somatic cell count; and somatic cell score). A total of 10 SNPs, including eight in the exon and two in the intron regions, were identified using pooled DNA sequencing. These SNPs were screened in a population of 258 Chinese Holstein using the SNaPshot technique. We analyzed the effects of SNPs, parity, herd, year, and season of calving on the production and mastitis indicator traits. Our analysis revealed two haplotypes and strong linkage disequilibrium (D' > 0.97) among all SNPs. All 10 SNPs were significantly associated with fat percentage (P < 0.01). Cows homozygous for the wild-type genotypes had higher fat percentages than those with the other genotypes. The dominant and additive effects were also significant for fat percentage (P < 0.05). These results suggest that CD4 plays a role in production traits as well as in immune function. The identified SNPs could be used as genetic markers for selection of dairy cows with improved fat percentage. We propose further studies of these SNPs in a larger population as well as further investigations of the function of this gene.
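
    The linkage disequilibrium measure D' quoted above can be illustrated with a short calculation. This is a minimal sketch of Lewontin's normalised D' for two biallelic loci; the function name and the example frequencies are hypothetical and are not data from the study.

```python
def d_prime(p_ab, p_a, p_b):
    """Return |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


if __name__ == "__main__":
    # Hypothetical frequencies: the A-B haplotype is over-represented.
    print(d_prime(p_ab=0.4, p_a=0.5, p_b=0.5))
```

    A value near 1, as reported in the abstract, means that at least one of the four possible haplotypes is almost absent, which is consistent with only two haplotypes being observed.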

  5. Collaborative development of SNPs for cotton research, introgression, MAS and breeding

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Extensive use of genome-wide analyses requires that molecular markers be highly abundant, informative and, once developed, extremely cost-effective to use, such as single-nucleotide polymorphisms (SNPs). The efforts toward development of cotton SNPs have been few and small-scale. The novel cotton ...

  6. Eyeglass Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Biomedical Optical Company of America's suntiger lenses eliminate more than 99% of harmful light wavelengths. NASA derived lenses make scenes more vivid in color and also increase the wearer's visual acuity. Distant objects, even on hazy days, appear crisp and clear; mountains seem closer, glare is greatly reduced, clouds stand out. Daytime use protects the retina from bleaching in bright light, thus improving night vision. Filtering helps prevent a variety of eye disorders, in particular cataracts and age related macular degeneration.

  7. 7 CFR 3017.630 - May the Department of Agriculture impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false May the Department of Agriculture impute conduct of one person to another? 3017.630 Section 3017.630 Agriculture Regulations of the Department of Agriculture (Continued) OFFICE OF THE CHIEF FINANCIAL OFFICER, DEPARTMENT OF AGRICULTURE...

  8. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    PubMed Central

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803
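
    Of the three methods compared, k-nearest neighbour imputation is the simplest to sketch. The version below is an illustration under stated assumptions, not the authors' implementation: missing values are marked with None, distance is the root mean squared difference over the columns two rows share, and the function name, k, and the toy matrix are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled from the k nearest rows."""

    def distance(a, b):
        # Compare only the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors are the other rows that observed column j.
            donors = [
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


if __name__ == "__main__":
    # Hypothetical expression matrix: genes in rows, conditions in columns.
    expression = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],
        [5.0, 6.0, 7.0],
        [0.9, 1.9, 2.8],
    ]
    print(knn_impute(expression, k=2))
```

    Distances are always computed from the original matrix, so values filled in earlier do not influence later ones.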

  9. 41 CFR 105-68.630 - May the General Services Administration impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 41 Public Contracts and Property Management 3 2010-07-01 2010-07-01 false May the General Services Administration impute conduct of one person to another? 105-68.630 Section 105-68.630 Public Contracts and Property Management Federal Property Management Regulations System (Continued) GENERAL...

  10. Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...

  11. Moment Reconstruction and Moment-Adjusted Imputation When Exposure is Generated by a Complex, Nonlinear Random Effects Modeling Process

    PubMed Central

    Potgieter, Cornelis J.; Wei, Rubin; Kipnis, Victor; Freedman, Laurence S.; Carroll, Raymond J.

    2016-01-01

    Summary For the classical, homoscedastic measurement error model, moment reconstruction (Freedman et al., 2004, 2008) and moment-adjusted imputation (Thomas et al., 2011) are appealing, computationally simple imputation-like methods for general model fitting. Like classical regression calibration, the idea is to replace the unobserved variable subject to measurement error with a proxy that can be used in a variety of analyses. Moment reconstruction and moment-adjusted imputation differ from regression calibration in that they attempt to match multiple features of the latent variable, and also to match some of the latent variable’s relationships with the response and additional covariates. In this note, we consider a problem where true exposure is generated by a complex, nonlinear random effects modeling process, and develop analogues of moment reconstruction and moment-adjusted imputation for this case. This general model includes classical measurement errors, Berkson measurement errors, mixtures of Berkson and classical errors and problems that are not measurement error problems, but also cases where the data generating process for true exposure is a complex, nonlinear random effects modeling process. The methods are illustrated using the National Institutes of Health-AARP Diet and Health Study where the latent variable is a dietary pattern score called the Healthy Eating Index - 2005. We also show how our general model includes methods used in radiation epidemiology as a special case. Simulations are used to illustrate the methods. PMID:27061196

  12. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...

  13. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    SciTech Connect

    Bogush, I.P.; Bushinsky, A.V.; Galkin, A.Y.; Serbin, V.I.; Zhabotinsky, E.E.

    1991-01-01

    Keeping the thermal state parameters of the units of a thermionic space nuclear power system (SNPS) within the required limits in all operating regimes is a factor that determines the SNPS lifetime. The requirements on unit thermal state differ markedly, and meeting them requires both an appropriate arrangement of the units in the SNPS power generating module and the use of definite control algorithms and special thermal regulation and protection. Computer codes that determine the thermal transient performance of the liquid metal loop and the main units were developed as the calculation basis for the required thermal state of the SNPS "Topaz" units. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, tests of mock-ups, power tests of ground SNPS prototypes, and flight tests of two SNPS "Topaz" units.

  14. SNPselector: a web tool for selecting SNPs for genetic association studies

    PubMed Central

    Xu, Hong; Gregory, Simon G.; Hauser, Elizabeth R.; Stenger, Judith E.; Pericak Vance, Margaret A.; Vance, Jeffery M.; Züchner, Stephan; Hauser, Michael A.

    2005-01-01

    Summary: Single nucleotide polymorphisms (SNPs) are commonly used for association studies to find genes responsible for complex genetic diseases. With the recent advance of SNP technology, researchers are able to assay thousands of SNPs in a single experiment. But the process of manually choosing thousands of genotyping SNPs for tens or hundreds of genes is time consuming. We have developed a web-based program, SNPselector, to automate the process. SNPselector takes a list of gene names or a list of genomic regions as input and searches the Ensembl genes or genomic regions for available SNPs. It prioritizes these SNPs on their tagging for linkage disequilibrium, SNP allele frequencies and source, function, regulatory potential, and repeat status. SNPselector outputs result in compressed Excel spreadsheet files for review by the user. Availability: SNPselector is freely available at http://primer.duhs.duke.edu/ Contact: hong.xu@duke.edu, mike.hauser@duke.edu PMID:16179360

  15. Inferring Alcoholism SNPs and Regulatory Chemical Compounds Based on Ensemble Bayesian Network.

    PubMed

    Chen, Huan; Sun, Jiatong; Jiang, Hong; Wang, Xianyue; Wu, Lingxiang; Wu, Wei; Wang, Qh

    2016-12-20

    The disturbance of consciousness is one of the most common symptoms in those who have alcoholism and may cause disability and mortality. Previous studies indicated that several single nucleotide polymorphisms (SNPs) increase susceptibility to alcoholism. In this study, we utilized the Ensemble Bayesian Network (EBN) method to identify causal SNPs of alcoholism based on the verified GAW14 data. Thirteen of the eighteen SNPs directly connected with alcoholism were found to be concordant with potential risk regions of alcoholism in the OMIM database. As a number of SNPs, known as expression quantitative trait loci (eQTLs), were found to contribute to alterations in gene expression, we further sought to identify chemical compounds acting as regulators of alcoholism genes captured by causal SNPs. Chloroprene and valproic acid were identified as the expression regulators for genes C11orf66 and SALL3, respectively, which were captured by alcoholism SNPs.

  16. Double Sampling with Multiple Imputation to Answer Large Sample Meta-Research Questions: Introduction and Illustration by Evaluating Adherence to Two Simple CONSORT Guidelines

    PubMed Central

    Capers, Patrice L.; Brown, Andrew W.; Dawson, John A.; Allison, David B.

    2015-01-01

    Background: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. Creation of high throughput methods (e.g., search heuristics, crowdsourcing) has improved feasibility of large meta-research questions, but possibly at the cost of accuracy. Objective: To evaluate the use of double sampling combined with multiple imputation (DS + MI) to address meta-research questions, using as an example adherence of PubMed entries to two simple consolidated standards of reporting trials guidelines for titles and abstracts. Methods: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT, human, abstract available, and English language (n = 322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower rigor, higher throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher rigor, lower throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication. Results: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title = 1.00, abstract = 0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS + MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by year: subsample RHITLO 1.050–1.174 vs. DS + MI 1.082–1.151). As evidence of improved accuracy, DS + MI coefficient estimates were closer to RHITLO than the large sample RLOTHI. Conclusion: Our results support our hypothesis that DS + MI would result in improved precision and accuracy. This method is flexible and may provide a practical

  17. Genotyping of 75 SNPs using arrays for individual identification in five population groups.

    PubMed

    Hwa, Hsiao-Lin; Wu, Lawrence Shih Hsin; Lin, Chun-Yen; Huang, Tsun-Ying; Yin, Hsiang-I; Tseng, Li-Hui; Lee, James Chun-I

    2016-01-01

    Single nucleotide polymorphism (SNP) typing offers promise to forensic genetics. Various strategies and panels for analyzing SNP markers for individual identification have been published. However, the best panels with fewer identity SNPs for all major population groups are still under discussion. This study aimed to find more autosomal SNPs with high heterozygosity for individual identification among Asian populations. Ninety-six autosomal SNPs of 502 DNA samples from unrelated individuals of five population groups (208 Taiwanese Han, 83 Filipinos, 62 Thais, 69 Indonesians, and 80 individuals with European, Near Eastern, or South Asian ancestry) were analyzed using arrays in an initial screening, and 75 SNPs (group A, 46 newly selected SNPs; group B, 29 SNPs based on a previous SNP panel) were selected for further statistical analyses. Some SNPs with high heterozygosity from Asian populations were identified. The combined random match probability of the best 40 and 45 SNPs was between 3.16 × 10^-17 and 7.75 × 10^-17 and between 2.33 × 10^-19 and 7.00 × 10^-19, respectively, in all five populations. These loci offer comparable power to short tandem repeats (STRs) for routine forensic profiling. In this study, we demonstrated the population genetic characteristics and forensic parameters of 75 SNPs with high heterozygosity from five population groups. This SNP panel can provide valuable genotypic information and can be helpful in forensic casework for individual identification among these populations.
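
    The combined random match probability quoted above is conventionally the product of per-locus match probabilities. The sketch below is illustrative only and is not the authors' calculation: it assumes Hardy-Weinberg proportions, independent loci, and a hypothetical panel of 40 SNPs that all have an allele frequency of 0.5. The function names are invented for this example.

```python
from math import prod


def snp_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions with allele frequencies p and 1 - p.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    # Two random individuals match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def combined_match_probability(allele_freqs) -> float:
    """Product of per-locus match probabilities (assumes independent loci)."""
    return prod(snp_match_probability(p) for p in allele_freqs)


# Hypothetical panel: 40 SNPs at allele frequency 0.5 (maximum heterozygosity).
panel = [0.5] * 40
cmp_40 = combined_match_probability(panel)
print(f"combined match probability: {cmp_40:.2e}")
```

    With these idealized frequencies the result is on the order of 10^-17, the same order of magnitude as the values reported for the best 40 SNPs; real panels use the allele frequencies observed in each population.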

  18. CRYSTAL FILTER TEST SET

    DTIC Science & Technology

    CRYSTAL FILTERS, *HIGH FREQUENCY, *RADIOFREQUENCY FILTERS, AMPLIFIERS, ELECTRIC POTENTIAL, FREQUENCY, IMPEDANCE MATCHING , INSTRUMENTATION, RADIOFREQUENCY, RADIOFREQUENCY AMPLIFIERS, TEST EQUIPMENT, TEST METHODS

  19. Moment Adjusted Imputation for Multivariate Measurement Error Data with Applications to Logistic Regression

    PubMed Central

    Thomas, Laine; Stefanski, Leonard A.; Davidian, Marie

    2013-01-01

    In clinical studies, covariates are often measured with error due to biological fluctuations, device error and other sources. Summary statistics and regression models that are based on mismeasured data will differ from the corresponding analysis based on the “true” covariate. Statistical analysis can be adjusted for measurement error; however, various methods exhibit a tradeoff between convenience and performance. Moment Adjusted Imputation (MAI) is a method for measurement error in a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well. PMID:24072947

  20. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation.

    PubMed

    Soler Artigas, María; Wain, Louise V; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R; Grallert, Harald; Hammond, Chris J; Harris, Sarah E; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W; Navarro, Pau; Nickle, David C; Padmanabhan, Sandosh; Raitakari, Olli T; Ried, Janina S; Ripatti, Samuli; Schulz, Holger; Scott, Robert A; Sin, Don D; Starr, John M; Viñuela, Ana; Völzke, Henry; Wild, Sarah H; Wright, Alan F; Zemunik, Tatijana; Jarvis, Deborah L; Spector, Tim D; Evans, David M; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J; Karrasch, Stefan; Probst-Hensch, Nicole M; Heinrich, Joachim; Stubbe, Beate; Wilson, James F; Wareham, Nicholas J; James, Alan L; Morris, Andrew P; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P; Hall, Ian P; Tobin, Martin D

    2015-12-04

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered.

  1. Assessing assay agreement estimation for multiple left-censored data: a multiple imputation approach.

    PubMed

    Lapidus, Nathanael; Chevret, Sylvie; Resche-Rigon, Matthieu

    2014-12-30

    Agreement between two assays is usually based on the concordance correlation coefficient (CCC), estimated from the means, standard deviations, and correlation coefficient of these assays. However, such data will often suffer from left-censoring because of lower limits of detection of these assays. To handle such data, we propose to extend a multiple imputation approach by chained equations (MICE) developed in a close setting of one left-censored assay. The performance of this two-step approach is compared with that of a previously published maximum likelihood estimation through a simulation study. Results show close estimates of the CCC by both methods, although the coverage is improved by our MICE proposal. An application to cytomegalovirus quantification data is provided.
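
    For reference, the concordance correlation coefficient mentioned above can be computed from the means, variances, and covariance of paired measurements. The sketch below uses Lin's standard formula on fully observed data; it does not handle left-censoring and is not the MICE procedure proposed in the abstract. The function name and the sample values are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).

    Uses 1/n moments. Inputs must be paired, fully observed measurements.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    mx, my = fmean(x), fmean(y)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical assay readings: the second assay reads one unit higher throughout.
assay_a = [1.0, 2.0, 3.0, 4.0]
assay_b = [2.0, 3.0, 4.0, 5.0]
print(concordance_correlation(assay_a, assay_a))
print(concordance_correlation(assay_a, assay_b))
```

    A constant offset between assays leaves the Pearson correlation at 1 but lowers the CCC, which is why the CCC is used to measure agreement and not just association.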

  2. Missing data analysis using multiple imputation: getting to the heart of the matter.

    PubMed

    He, Yulei

    2010-01-01

    Missing data are a pervasive problem in health investigations. We describe some background of missing data analysis and criticize ad hoc methods that are prone to serious problems. We then focus on multiple imputation, in which missing cases are first filled in by several sets of plausible values to create multiple completed datasets, then standard complete-data procedures are applied to each completed dataset, and finally the multiple sets of results are combined to yield a single inference. We introduce the basic concepts and general methodology and provide some guidance for application. For illustration, we use a study assessing the effect of cardiovascular diseases on hospice discussion for late stage lung cancer patients.
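
    The final combining step described above is commonly carried out with Rubin's rules. The sketch below pools one scalar estimate across completed datasets. It assumes the analysis of each imputed dataset has already produced a point estimate and its squared standard error. The function name and the numbers are made up for illustration and do not come from the study.

```python
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results for one parameter using Rubin's rules.

    estimates: point estimate from each completed dataset
    variances: squared standard error from each completed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching results from at least two imputations")
    q_bar = fmean(estimates)             # pooled point estimate
    within = fmean(variances)            # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical results from m = 3 imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

    The total variance exceeds the average within-imputation variance whenever the imputations disagree, which is how the extra uncertainty due to missing data enters the final inference.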

  3. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    PubMed Central

    Artigas, María Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria; Viñuela, Ana; Völzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  4. Impute DC link (IDCL) cell based power converters and control thereof

    DOEpatents

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with a similarly rated back-to-back inverter system.

  5. Filtered or Unfiltered?

    ERIC Educational Resources Information Center

    Curry, Ann; Haycock, Ken

    2001-01-01

    Discusses results of a survey questionnaire of public and school libraries that investigated the use of Internet filtering software. Considers filter alternatives; reasons for filtering or not filtering; brand names; satisfaction with site blocking; satisfaction with the decision to install filter software; and guidelines for considering filters.…

  6. Single nucleotide polymorphisms (SNPs) are highly conserved in rhesus (Macaca mulatta) and cynomolgus (Macaca fascicularis) macaques

    PubMed Central

    Street, Summer L; Kyes, Randall C; Grant, Richard; Ferguson, Betsy

    2007-01-01

    Background Macaca fascicularis (cynomolgus or longtail macaques) is the most commonly used non-human primate in biomedical research. Little is known about the genomic variation in cynomolgus macaques or how the sequence variants compare to those of the well-studied related species, Macaca mulatta (rhesus macaque). Previously we identified single nucleotide polymorphisms (SNPs) in portions of 94 rhesus macaque genes and reported that Indian and Chinese rhesus had largely different SNPs. Here we identify SNPs from some of the same genomic regions of cynomolgus macaques (from Indochina, Indonesia, Mauritius and the Philippines) and compare them to the SNPs found in rhesus. Results We sequenced a portion of 10 genes in 20 cynomolgus macaques. We identified 69 SNPs in these regions, compared with 71 SNPs found in the same genomic regions of 20 Indian and Chinese rhesus macaques. Thirty six (52%) of the M. fascicularis SNPs were overlapping in both species. The majority (70%) of the SNPs found in both Chinese and Indian rhesus macaque populations were also present in M. fascicularis. Of the SNPs previously found in a single rhesus population, 38% (Indian) and 44% (Chinese) were also identified in cynomolgus macaques. In an alternative approach, we genotyped 100 cynomolgus DNAs using a rhesus macaque SNP array representing 53 genes and found that 51% (29/57) of the rhesus SNPs were present in M. fascicularis. Comparisons of SNP profiles from cynomolgus macaques imported from breeding centers in China (where M. fascicularis are not native) showed they were similar to those from Indochina. Conclusion This study demonstrates a surprisingly high conservation of SNPs between M. fascicularis and M. mulatta, suggesting that the relationship of these two species is closer than that suggested by morphological and mitochondrial DNA analysis alone. These findings indicate that SNP discovery efforts in either species will generate useful resources for both macaque species

  7. Analysis of partially observed clustered data using generalized estimating equations and multiple imputation.

    PubMed

    Aloisio, Kathryn M; Swanson, Sonja A; Micali, Nadia; Field, Alison; Horton, Nicholas J

    2014-10-01

    Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options such as weighted generalized estimating equations (WEEs) are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Previously, estimation with partially observed clustered data was computationally challenging; however, recent developments in Stata have facilitated its use in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents reported by parents and adolescents as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness.

  8. On imputing function to structure from the behavioural effects of brain lesions.

    PubMed Central

    Young, M P; Hilgetag, C C; Scannell, J W

    2000-01-01

    What is the link, if any, between the patterns of connections in the brain and the behavioural effects of localized brain lesions? We explored this question in four related ways. First, we investigated the distribution of activity decrements that followed simulated damage to elements of the thalamocortical network, using integrative mechanisms that have recently been used to successfully relate connection data to information on the spread of activation, and to account simultaneously for a variety of lesion effects. Second, we examined the consequences of the patterns of decrement seen in the simulation for each type of inference that has been employed to impute function to structure on the basis of the effects of brain lesions. Every variety of conventional inference, including double dissociation, readily misattributed function to structure. Third, we tried to derive a more reliable framework of inference for imputing function to structure, by clarifying concepts of function, and exploring a more formal framework, in which knowledge of connectivity is necessary but insufficient, based on concepts capable of mathematical specification. Fourth, we applied this framework to inferences about function relating to a simple network that reproduces intact, lesioned and paradoxically restored orientating behaviour. Lesion effects could be used to recover detailed and reliable information on which structures contributed to particular functions in this simple network. Finally, we explored how the effects of brain lesions and this formal approach could be used in conjunction with information from multiple neuroscience methodologies to develop a practical and reliable approach to inferring the functional roles of brain structures. PMID:10703050

  9. Handling missing data in matched case-control studies using multiple imputation.

    PubMed

    Seaman, Shaun R; Keogh, Ruth H

    2015-12-01

    Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin's Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data.

  10. Ceramic filters

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1995-12-31

    Filters were formed from ceramic fibers, organic fibers, and a ceramic bond phase using a papermaking technique. The distribution of particulate ceramic bond phase was determined using a model silicon carbide system. As the ceramic fiber increased in length and diameter the distance between particles decreased. The calculated number of particles per area showed good agreement with the observed value. After firing, the papers were characterized using a biaxial load test. The strength of papers was proportional to the amount of bond phase included in the paper. All samples exhibited strain-tolerant behavior.

  11. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    SciTech Connect

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.; Loots, Gabriela G.; Houston, Kathryn A.; Dubchak, Inna; Speed, Terence P.; Rubin, Edward M.

    2002-01-01

    Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

  12. Rocket noise filtering system using digital filters

    NASA Technical Reports Server (NTRS)

    Mauritzen, David

    1990-01-01

    A set of digital filters is designed to filter rocket noise to various bandwidths. The filters are designed to have constant group delay and are implemented in software on a general purpose computer. The Parks-McClellan algorithm is used. Preliminary tests are performed to verify the design and implementation. An analog filter which was previously employed is also simulated.

  13. 22 CFR 208.630 - May the U.S. Agency for International Development impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... benefits derived from the conduct is evidence of knowledge, approval or acquiescence. (b) Conduct imputed... responsible for the improper conduct. Acceptance of the benefits derived from the conduct is evidence...

  14. SNP uniqueness problem: a proof-of-principle in HapMap SNPs.

    PubMed

    Doron, Shany; Shweiki, Dorit

    2011-04-01

    SNP-based research strongly affects our biomedical and clinically associated knowledge. Nonunique and false-positive SNP existence in commonly used datasets may thus lead to biased, inaccurate clinically associated conclusions. We designed a computational study to reveal the degree of nonunique/false-positive SNPs in the HapMap dataset. Two sets of SNP flanking sequences were used as queries for BLAT analysis against the human genome. 4.2% and 11.9% of HapMap SNPs align to the genome nonuniquely (long and short, respectively). Furthermore, an average of 7.9% nonunique SNPs are included in common commercial genotyping arrays (according to our designed probes). Nonunique SNPs identified in this study are represented to various degrees in clinically associated databases, stressing the consequence of inaccurate SNP annotation and hence SNP utilization. Unfortunately, our results question some disease-related genotyping analyses, raising a worrisome concern on their validity.

  15. Identification of common carp (Cyprinus carpio) microRNAs and microRNA-related SNPs

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) exist pervasively across viruses, plants and animals and play important roles in the post-transcriptional regulation of genes. In the common carp, miRNA targets have not been investigated. In model species, single-nucleotide polymorphisms (SNPs) have been reported to impair or enhance miRNA regulation as well as to alter miRNA biogenesis. SNPs are often associated with diseases or traits. To date, no studies into the effects of SNPs on miRNA biogenesis and regulation in the common carp have been reported. Results Using homology-based prediction combined with small RNA sequencing, we have identified 113 common carp mature miRNAs, including 92 conserved miRNAs and 21 common carp specific miRNAs. The conserved miRNAs had significantly higher expression levels than the specific miRNAs. The miRNAs were clustered into three phylogenetic groups. In total, 394 potential miRNA binding sites in 206 target mRNAs were predicted for 83 miRNAs. We identified 13 SNPs in the miRNA precursors. Among them, nine SNPs had the potential to either increase or decrease the energy of the predicted secondary structures of the precursors. Further, two SNPs in the 3’ untranslated regions of target genes were predicted to either disturb or create miRNA-target interactions. Conclusions The common carp miRNAs and their target genes reported here will help further our understanding of the role of miRNAs in gene regulation. The analysis of the miRNA-related SNPs and their effects provided insights into the effects of SNPs on miRNA biogenesis and function. The resource data generated in this study will help advance the study of miRNA function and phenotype-associated miRNA identification. PMID:22908890

  16. Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer

    PubMed Central

    Suling, Anna; Neuser, Petra; Reuss, Alexander; Canzler, Ulrich; Fehm, Tanja; Luyten, Alexander; Hellriegel, Martin; Woelber, Linn; Mahner, Sven

    2016-01-01

    Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. The combination of multiple imputation of missing values and different propensity scoring techniques is addressed in this work. For a sample of lymph node-positive vulvar cancer patients, we re-analyze associations between the application of radiotherapy and disease-related and non-related survival. Inverse-probability-of-treatment-weighting (IPTW) and PS stratification are applied after multiple imputation by chained equation (MICE). Methodological issues are described in detail. Interpretation of the results and methodological limitations are discussed. PMID:27802342

  17. A multiple imputation approach to the analysis of clustered interval-censored failure time data with the additive hazards model

    PubMed Central

    Chen, Ling; Sun, Jianguo; Xiong, Chengjie

    2016-01-01

    Clustered interval-censored failure time data can occur when the failure time of interest is collected from several clusters and known only within certain time intervals. Regression analysis of clustered interval-censored failure time data is discussed assuming that the data arise from the semiparametric additive hazards model. A multiple imputation approach is proposed for inference. A major advantage of the approach is its simplicity because it avoids estimating the correlation within clusters by implementing a resampling-based method. The presented approach can be easily implemented by using the existing software packages for right-censored failure time data. Extensive simulation studies are conducted, indicating that the proposed imputation approach performs well for practical situations. The proposed approach also performs well compared to the existing methods and can be more conveniently applied to various types of data representation. The proposed methodology is further demonstrated by applying it to a lymphatic filariasis study. PMID:27773956

  18. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.

    PubMed

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

    2013-04-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.
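
    The idea behind stratification can be illustrated with a much simpler procedure than the one used in the study: run the Benjamini-Hochberg step-up procedure separately within each annotation stratum and compare with a pooled analysis. The sketch below is an assumption-laden illustration, not the authors' sFDR method (which uses linkage disequilibrium-weighted annotations and TDR estimates). The p-values, stratum labels, and function names are all hypothetical.

```python
from collections import defaultdict


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the test is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg independently within each stratum."""
    groups = defaultdict(list)
    for i, label in enumerate(strata):
        groups[label].append(i)
    rejected = [False] * len(pvalues)
    for indices in groups.values():
        flags = benjamini_hochberg([pvalues[i] for i in indices], alpha)
        for i, flag in zip(indices, flags):
            rejected[i] = flag
    return rejected


# Hypothetical SNP p-values: an enriched "coding" stratum and a mostly null "intergenic" one.
pvals = [0.001, 0.02, 0.04, 0.03, 0.5, 0.8, 0.9, 0.7]
labels = ["coding"] * 3 + ["intergenic"] * 5
pooled_hits = benjamini_hochberg(pvals)
stratified_hits = stratified_fdr(pvals, labels)
print(sum(pooled_hits), sum(stratified_hits))
```

    In this toy example the pooled analysis rejects one test while the stratified analysis rejects all three in the enriched stratum, mirroring the abstract's point that enrichment within a category raises the true discovery rate for a given p-value.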

  19. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon

    PubMed Central

    Tsai, Hsin Y.; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Bishop, Steve C.; Houston, Ross D.

    2015-01-01

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth. PMID:26703584

  20. A novel method for in silico identification of regulatory SNPs in human genome.

    PubMed

    Li, Rong; Zhong, Dexing; Liu, Ruiling; Lv, Hongqiang; Zhang, Xinman; Liu, Jun; Han, Jiuqiang

    2017-02-21

    Regulatory single nucleotide polymorphisms (rSNPs), a kind of functional noncoding genetic variant, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibility to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Unlike most other rSNP-finding methods, which are based on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory roles, we use a set of documented, experimentally verified rSNPs and nonfunctional background SNPs to train classifiers and thereby find the discriminating features. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure and evolutionary conservation, is analyzed. A support vector machine is adopted to build the classifier model, together with an ensemble method to deal with unbalanced data. The 10-fold cross-validation result shows that our method can achieve a sensitivity of ~78% and a specificity of ~82%. Furthermore, our method performs better than some other algorithms based on the aforementioned hypothesis in handling false positives. The original data and the source MATLAB code are available at https://sourceforge.net/projects/rsnppredict/.

  1. Mining for SNPs and SSRs using SNPServer, dbSNP and SSR taxonomy tree.

    PubMed

    Batley, Jacqueline; Edwards, David

    2009-01-01

    Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequence data sets permits mining for these molecular markers, which may then be used for applications such as genetic trait mapping, diversity analysis and marker assisted selection in agriculture. Here we describe web-based automated methods for the discovery of SSRs using SSR taxonomy tree, the discovery of SNPs from sequence data using SNPServer and the identification of validated SNPs from within the dbSNP database. SSR taxonomy tree identifies pre-determined SSR amplification primers for virtually all species represented within the GenBank database. SNPServer uses a redundancy based approach to identify SNPs within DNA sequences. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms. The NCBI dbSNP database is a catalogue of molecular variation, hosting validated SNPs for several species within a public-domain archive.

  2. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon.

    PubMed

    Tsai, Hsin Y; Hamilton, Alastair; Guy, Derrick R; Tinch, Alan E; Bishop, Steve C; Houston, Ross D

    2015-12-22

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth.

  3. netview p: a network visualization tool to unravel complex population structure using genome-wide SNPs.

    PubMed

    Steinig, Eike J; Neuditschko, Markus; Khatkar, Mehar S; Raadsma, Herman W; Zenger, Kyall R

    2016-01-01

    Network-based approaches are emerging as valuable tools for the analysis of complex genetic structure in wild and captive populations. netview p combines data quality control with the construction of population networks through mutual k-nearest neighbours thresholds applied to genome-wide SNPs. The program is cross-platform compatible, open-source and efficiently operates on data ranging from hundreds to hundreds of thousands of SNPs. The pipeline was used for the analysis of pedigree data from simulated (n = 750, SNPs = 1279) and captive silver-lipped pearl oysters (n = 415, SNPs = 1107), wild populations of the European hake from the Atlantic and Mediterranean (n = 834, SNPs = 380) and grey wolves from North America (n = 239, SNPs = 78 255). The population networks effectively visualize large- and fine-scale genetic structure within and between populations, including family-level structure and relationships. netview p comprises a network-based addition to other population analysis tools and provides user-friendly access to a complex network analysis pipeline through implementation in python.
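
    The mutual k-nearest neighbours thresholding mentioned in this abstract can be sketched as follows. This is a generic illustration, not the netview p implementation: the distance measure (sum of absolute allele-count differences), the function names and the genotype vectors are all assumptions made for the example.

```python
def allele_distance(a, b):
    """Sum of absolute differences between genotypes coded as 0/1/2 allele counts."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mutual_knn_edges(genotypes, k):
    """Return edges (i, j), i < j, where i and j are each among the other's k nearest."""
    n = len(genotypes)
    neighbours = []
    for i in range(n):
        # Rank all other samples by distance to sample i; ties broken by index.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (allele_distance(genotypes[i], genotypes[j]), j),
        )
        neighbours.append(set(others[:k]))
    return sorted(
        (i, j)
        for i in range(n)
        for j in neighbours[i]
        if i < j and i in neighbours[j]
    )


# Invented genotypes: two tight pairs plus one intermediate sample.
samples = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 1],
    [1, 1, 0, 0],
]
edges = mutual_knn_edges(samples, k=1)
print(edges)
```

    Requiring the neighbour relation to hold in both directions leaves the intermediate sample unconnected at k = 1, which is why the mutual variant gives sparser networks than a plain k-nearest-neighbour graph.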

  4. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

    DOE PAGES

    Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; ...

    2015-04-09

    In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.

  5. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; Brown, Joseph N.; Wang, Jing; McDermott, Jason E.; Smith, Richard D.; Rodland, Karin D.; Metz, Thomas O.; Pounds, Joel G.; Waters, Katrina M.

    2015-04-09

    In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.

  6. Multiple imputation of completely missing repeated measures data within person from a complex sample: application to accelerometer data in the National Health and Nutrition Examination Survey.

    PubMed

    Liu, Benmei; Yu, Mandi; Graubard, Barry I; Troiano, Richard P; Schenker, Nathaniel

    2016-12-10

    The Physical Activity Monitor component was introduced into the 2003-2004 National Health and Nutrition Examination Survey (NHANES) to collect objective information on physical activity including both movement intensity counts and ambulatory steps. Because of an error in the accelerometer device initialization process, the steps data were missing for all participants in several primary sampling units, typically a single county or group of contiguous counties, who had intensity count data from their accelerometers. To avoid potential bias and loss in efficiency in estimation and inference involving the steps data, we considered methods to accurately impute the missing values for steps collected in the 2003-2004 NHANES. The objective was to come up with an efficient imputation method that minimized model-based assumptions. We adopted a multiple imputation approach based on additive regression, bootstrapping and predictive mean matching methods. This method fits alternative conditional expectation (ace) models, which use an automated procedure to estimate optimal transformations for both the predictor and response variables. This paper describes the approaches used in this imputation and evaluates the methods by comparing the distributions of the original and the imputed data. A simulation study using the observed data is also conducted as part of the model diagnostics. Finally, some real data analyses are performed to compare the before and after imputation results. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  7. Gap-filling methods to impute eddy covariance flux data by preserving variance.

    NASA Astrophysics Data System (ADS)

    Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.

    2015-12-01

    To represent carbon dynamics, in terms of exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data has been collected using eddy flux towers from various sites across the globe for more than two decades. However, measurements from EC data are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to have estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60 days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center. We used as our base the Michaelis-Menten and Van't Hoff functions. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This provides us an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC), into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by lower root mean square error (RMSE) of the predicted values of NEE). Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables.

  8. Genome-wide association study SNPs in the human genome diversity project populations: does selection affect unlinked SNPs with shared trait associations?

    PubMed

    Casto, Amanda M; Feldman, Marcus W

    2011-01-06

    Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS.

  9. Using multiple imputation to efficiently correct cerebral MRI whole brain lesion and atrophy data in patients with multiple sclerosis.

    PubMed

    Chua, Alicia S; Egorova, Svetlana; Anderson, Mark C; Polgar-Turcsanyi, Mariann; Chitnis, Tanuja; Weiner, Howard L; Guttmann, Charles R G; Bakshi, Rohit; Healy, Brian C

    2015-10-01

    Automated segmentation of brain MRI scans into tissue classes is commonly used for the assessment of multiple sclerosis (MS). However, manual correction of the resulting brain tissue label maps by an expert reader remains necessary in many cases. Since automated segmentation data awaiting manual correction are "missing", we proposed to use multiple imputation (MI) to fill in the missing manually corrected MRI data for measures of normalized whole brain volume (brain parenchymal fraction, BPF) and T2 hyperintense lesion volume (T2LV). Automated and manually corrected MRI measures from 1300 patients enrolled in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB) were identified. Simulation studies were conducted to assess the performance of MI with missing data both missing completely at random and missing at random. An imputation model including the concurrent automated data as well as clinical and demographic variables explained a high proportion of the variance in the manually corrected BPF (R² = 0.97) and T2LV (R² = 0.89), demonstrating the potential to accurately impute the missing data. Further, our results demonstrate that MI allows for the accurate estimation of group differences with little to no bias and with similar precision compared to an analysis with no missing data. We believe that our findings provide important insights for efficient correction of automated MRI measures to obviate the need to perform manual correction on all cases.

  10. Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect

    PubMed Central

    Wan, Y.; Datta, S.; Conklin, D.J.; Kong, M.

    2015-01-01

    The statistical methods for variable selection and prediction could be challenging when missing covariates exist. Although multiple imputation (MI) is a universally accepted technique for solving the missing data problem, how to combine the MI results for variable selection is not quite clear, because different imputations may result in different selections. The widely applied variable selection methods include the sparse partial least-squares (SPLS) method and the penalized least-squares method, e.g. the elastic net (ENet) method. In this paper, we propose an MI-based weighted elastic net (MI-WENet) method that is based on stacked MI data and a weighting scheme for each observation in the stacked data set. In the MI-WENet method, MI accounts for sampling and imputation uncertainty for missing values, and the weight accounts for the observed information. Extensive numerical simulations are carried out to compare the proposed MI-WENet method with the other competing alternatives, such as the SPLS and ENet. In addition, we applied the MI-WENet method to examine the predictor variables for the endothelial function that can be characterized by median effective dose (ED50) and maximum effect (Emax) in an ex-vivo phenylephrine-induced extension and acetylcholine-induced relaxation experiment. PMID:26412909

  11. Explicating the Conditions Under Which Multilevel Multiple Imputation Mitigates Bias Resulting from Random Coefficient-Dependent Missing Longitudinal Data.

    PubMed

    Gottfredson, Nisha C; Sterba, Sonya K; Jackson, Kristina M

    2017-01-01

    Random coefficient-dependent (RCD) missingness is a non-ignorable mechanism through which missing data can arise in longitudinal designs. RCD, for which we cannot test, is a problematic form of missingness that occurs if subject-specific random effects correlate with propensity for missingness or dropout. Particularly when covariate missingness is a problem, investigators typically handle missing longitudinal data by using single-level multiple imputation procedures implemented with long-format data, which ignores within-person dependency entirely, or implemented with wide-format (i.e., multivariate) data, which ignores some aspects of within-person dependency. When either of these standard approaches to handling missing longitudinal data is used, RCD missingness leads to parameter bias and incorrect inference. We explain why multilevel multiple imputation (MMI) should alleviate bias induced by a RCD missing data mechanism under conditions that contribute to stronger determinacy of random coefficients. We evaluate our hypothesis with a simulation study. Three design factors are considered: intraclass correlation (ICC; ranging from .25 to .75), number of waves (ranging from 4 to 8), and percent of missing data (ranging from 20 to 50%). We find that MMI greatly outperforms the single-level wide-format (multivariate) method for imputation under a RCD mechanism. For the MMI analyses, bias was most alleviated when the ICC is high, there were more waves of data, and when there was less missing data. Practical recommendations for handling longitudinal missing data are suggested.

  12. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    NASA Astrophysics Data System (ADS)

    Riggi, S.; Riggi, D.; Riggi, F.

    2015-04-01

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixtures' models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layers' Silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.

  13. NOVEL MICROWAVE FILTER DESIGN TECHNIQUES.

    DTIC Science & Technology

    ELECTROMAGNETIC WAVE FILTERS, MICROWAVE FREQUENCY, PHASE SHIFT CIRCUITS, BANDPASS FILTERS, TUNED CIRCUITS, NETWORKS, IMPEDANCE MATCHING, LOW PASS FILTERS, MULTIPLEXING, MICROWAVE EQUIPMENT, WAVEGUIDE FILTERS, WAVEGUIDE COUPLERS.

  14. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study.

    PubMed

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.

  15. Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps

    PubMed Central

    Iotchkova, Valentina; Huang, Jie; Morris, John A.; Jain, Deepti; Barbieri, Caterina; Walter, Klaudia; Min, Josine L.; Chen, Lu; Astle, William; Cocca, Massimilian; Deelen, Patrick; Elding, Heather; Farmaki, Aliki-Eleni; Franklin, Christopher S.; Franberg, Mattias; Gaunt, Tom R.; Hofman, Albert; Jiang, Tao; Kleber, Marcus E.; Lachance, Genevieve; Luan, Jian'an; Malerba, Giovanni; Matchan, Angela; Mead, Daniel; Memari, Yasin; Ntalla, Ioanna; Panoutsopoulou, Kalliope; Pazoki, Raha; Perry, John R.B.; Rivadeneira, Fernando; Sabater-Lleal, Maria; Sennblad, Bengt; Shin, So-Youn; Southam, Lorraine; Traglia, Michela; van Dijk, Freerk; van Leeuwen, Elisabeth M.; Zaza, Gianluigi; Zhang, Weihua; Amin, Najaf; Butterworth, Adam; Chambers, John C.; Dedoussis, George; Dehghan, Abbas; Franco, Oscar H.; Franke, Lude; Frontini, Mattia; Gambaro, Giovanni; Gasparini, Paolo; Hamsten, Anders; Issacs, Aaron; Kooner, Jaspal S.; Kooperberg, Charles; Langenberg, Claudia; Marz, Winfried; Scott, Robert A.; Swertz, Morris A.; Toniolo, Daniela; Uitterlinden, Andre G.; van Duijn, Cornelia M.; Watkins, Hugh; Zeggini, Eleftheria; Maurano, Mathew T.; Timpson, Nicholas J.

    2017-01-01

    Large-scale whole genome sequence datasets offer novel opportunities to identify genetic variation underlying human traits. Here we apply genotype imputation based on whole genome sequence data from the UK10K and the 1000 Genomes Projects into 35,981 study participants of European ancestry, followed by association analysis with twenty quantitative cardiometabolic and hematologic traits. We describe 17 novel associations, including six rare (minor allele frequency [MAF]<1%) or low frequency variants (1%

  16. Impact of Supported Housing on Clinical Outcomes Analysis of a Randomized Trial Using Multiple Imputation Technique

    PubMed Central

    Cheng, An-Lin; Lin, Haiqun; Kasprow, Wesley; Rosenheck, Robert A.

    2011-01-01

    In 1992, the US Department of Housing and Urban Development (HUD) and the US Department of Veterans Affairs (VA) established the HUD-VA Supported Housing (HUD-VASH) Program to provide integrated clinical and housing services to homeless veterans with psychiatric and/or substance abuse disorders at 19 sites. At four sites, 460 subjects were randomly assigned to one of the three groups: (1) HUD-VASH, with both Section 8 vouchers and intensive case management; (2) case management only; and (3) standard VA care. A previous publication found HUD-VASH resulted in superior housing outcomes but yielded no benefits on clinical outcomes. Since many participants missed prescheduled visits during the follow-up period and follow-up rates were quite different across the groups, we reanalyzed these data using multiple imputation statistical methods to account for the missing observations. Significant benefits were found for HUD-VASH in drug and alcohol abuse outcomes that had not previously been identified. PMID:17220745

  17. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values.

    PubMed

    White, Ian R; Carlin, John B

    2010-12-10

    When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete-case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO).
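
    The FICO index defined in the last sentence of this abstract is simple enough to compute directly. The sketch below follows that one-line definition (among cases with the covariate observed, the fraction that are incomplete on any variable); the record layout with `None` marking a missing value, the function name and the data are invented for illustration.

```python
def fico(rows, covariate):
    """Fraction of incomplete cases among rows where `covariate` is observed."""
    observed = [row for row in rows if row[covariate] is not None]
    if not observed:
        raise ValueError(f"covariate {covariate!r} is never observed")
    # A case is incomplete if any of its variables is missing.
    incomplete = [row for row in observed if any(v is None for v in row.values())]
    return len(incomplete) / len(observed)


# Invented data set: outcome y with two partially observed covariates x and z.
records = [
    {"x": 1.0, "z": 0.5, "y": 2.0},
    {"x": 2.0, "z": None, "y": 3.0},
    {"x": None, "z": 1.5, "y": 1.0},
    {"x": 0.5, "z": None, "y": 2.5},
    {"x": 1.5, "z": 2.0, "y": 4.0},
]
fico_x = fico(records, "x")
fico_z = fico(records, "z")
print(f"FICO(x) = {fico_x:.2f}, FICO(z) = {fico_z:.2f}")
```

    Here half of the cases with x observed would be discarded by a complete-case analysis, so x has the higher index of the two covariates.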

  18. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

    PubMed Central

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules. PMID:27199552

  19. Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations

    PubMed Central

    Band, Gavin; Le, Quang Si; Jostins, Luke; Pirinen, Matti; Kivinen, Katja; Jallow, Muminatou; Sisay-Joof, Fatoumatta; Bojang, Kalifa; Pinder, Margaret; Sirugo, Giorgio; Conway, David J.; Nyirongo, Vysaul; Kachala, David; Molyneux, Malcolm; Taylor, Terrie; Ndila, Carolyne; Peshu, Norbert; Marsh, Kevin; Williams, Thomas N.; Alcock, Daniel; Andrews, Robert; Edkins, Sarah; Gray, Emma; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Schuldt, Kathrin; Clark, Taane G.; Small, Kerrin S.; Teo, Yik Ying; Kwiatkowski, Dominic P.; Rockett, Kirk A.; Barrett, Jeffrey C.; Spencer, Chris C. A.

    2013-01-01

    Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles. PMID:23717212
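
    The standard fixed-effects meta-analysis that this abstract takes as its starting point is commonly computed by inverse-variance weighting of the per-study effect estimates. The sketch below shows only that textbook calculation, not the Bayesian or region-based extensions the authors propose; the per-site effect sizes and standard errors are invented.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se


# Invented per-site log odds ratios and standard errors for a single SNP.
site_betas = [0.20, 0.30, 0.10]
site_ses = [0.10, 0.20, 0.10]

pooled_beta, pooled_se = fixed_effects_meta(site_betas, site_ses)
z_score = pooled_beta / pooled_se
print(f"pooled beta = {pooled_beta:.4f}, SE = {pooled_se:.4f}, z = {z_score:.2f}")
```

    The calculation assumes one shared true effect across sites, which is exactly the assumption the abstract says may fail when haplotype structure differs between populations.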

  20. Multiple regression based imputation for individualizing template human model from a small number of measured dimensions.

    PubMed

    Nohara, Ryuki; Endo, Yui; Murai, Akihiko; Takemura, Hiroshi; Kouchi, Makiko; Tada, Mitsunori

    2016-08-01

    Individual human models are usually created by direct 3D scanning or by deforming a template model according to measured dimensions. In this paper, we propose a method to estimate all the dimensions necessary for human model individualization (full set) from a small number of measured dimensions (subset) and a human dimension database. For this purpose, we solved a multiple regression equation from the dimension database, with the full set dimensions as the objective variables and the subset dimensions as the explanatory variables. The full set dimensions are then obtained by simply multiplying the subset dimensions by the coefficient matrix of the regression equation. We verified the accuracy of our method by imputing hand, foot, and whole body dimensions from their dimension databases. Leave-one-out cross validation was employed in this evaluation. The mean absolute errors (MAE) between the measured and the estimated dimensions were computed from 4 dimensions (hand length, breadth, middle finger breadth at proximal, and middle finger depth at proximal) in the hand, 2 dimensions (foot length, breadth, and lateral malleolus height) in the foot, and 1 dimension (height) and weight in the whole body. The average MAE of the non-measured dimensions was 4.58% in the hand, 4.42% in the foot, and 3.54% in the whole body, while that of the measured dimensions was 0.00%.
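
    The regression-based imputation and leave-one-out evaluation described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are made up and noise-free, the function names (`solve`, `fit`, `predict`, `loo_mae_percent`) are hypothetical, and a single target dimension is fitted at a time rather than the full coefficient matrix the authors describe.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coef, row):
    """Estimate the unmeasured dimension from the measured subset."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_mae_percent(X, y):
    """Leave-one-out mean absolute error, in percent of the measured value."""
    errors = []
    for i in range(len(X)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(predict(coef, X[i]) - y[i]) / y[i] * 100.0)
    return sum(errors) / len(errors)


# Hypothetical database: (hand length, hand breadth) in mm as the subset,
# and one unmeasured dimension generated from an exact linear rule.
X = [(180.0, 80.0), (190.0, 78.0), (175.0, 85.0),
     (200.0, 90.0), (185.0, 74.0), (170.0, 82.0)]
y = [2.0 + 0.5 * a + 0.3 * b for a, b in X]

print(fit(X, y))              # coefficients close to [2.0, 0.5, 0.3]
print(loo_mae_percent(X, y))  # close to 0 because the toy data have no noise
```

    With real anthropometric data the leave-one-out error would be nonzero; the abstract reports averages of roughly 3.5% to 4.6%.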

  1. SNPs for Parentage Testing and Traceability in Globally Diverse Breeds of Sheep

    PubMed Central

    Heaton, Michael P.; Leymaster, Kreg A.; Kalbfleisch, Theodore S.; Kijas, James W.; Clarke, Shannon M.; McEwan, John; Maddox, Jillian F.; Basnayake, Veronica; Petrik, Dustin T.; Simpson, Barry; Smith, Timothy P. L.; Chitko-McKown, Carol G.

    2014-01-01

    DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF≥0.3) in 48±5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent’s genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1×10^−39 and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world’s sheep breeds.
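
    To give a feel for how a combined match probability (PI) of this magnitude can arise, here is a minimal sketch using the textbook formula for a biallelic locus. It assumes Hardy-Weinberg proportions, unrelated animals, and independent loci; the abstract does not state which estimator the authors used, and the function names are hypothetical.

```python
import math


def locus_pi(maf):
    """Probability that two unrelated animals share a genotype at one SNP."""
    p, q = 1.0 - maf, maf
    genotype_freqs = (p * p, 2.0 * p * q, q * q)  # Hardy-Weinberg proportions
    return sum(f * f for f in genotype_freqs)


def combined_log10_pi(mafs):
    """log10 of the product of per-locus PI values (loci assumed independent)."""
    return sum(math.log10(locus_pi(m)) for m in mafs)


# Hypothetical panel: 109 SNPs, all with MAF = 0.5 (the most informative case).
print(locus_pi(0.5))                   # 0.375
print(combined_log10_pi([0.5] * 109))  # about -46.4, i.e. PI near 10^-46
```

    Real panels have lower and uneven MAFs, so the combined PI is larger than this idealized bound, which is consistent with the reported median of 1.1×10^−39.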

  2. Filter quality of pleated filter cartridges.

    PubMed

    Chen, Chun-Wan; Huang, Sheng-Hsiu; Chiang, Che-Ming; Hsiao, Ta-Chih; Chen, Chih-Chieh

    2008-04-01

    The performance of dust cartridge filters commonly used in dust masks and in room ventilation depends both on the collection efficiency of the filter material and the pressure drop across the filter. Currently, the optimization of filter design is based only on minimizing the pressure drop at a set velocity chosen by the manufacturer. The collection efficiency, an equally important factor, is rarely considered in the optimization process. In this work, a filter quality factor, which combines the collection efficiency and the pressure drop, is used as the optimization criterion for filter evaluation. Most respirator manufacturers pleat the filter to various extents to increase the filtration area in the limited space within the dust cartridge. Six sizes of filter holders were fabricated to hold just one pleat of filter, simulating six different pleat counts, ranging from 0.5 to 3.33 pleats cm^-1. The possible electrostatic charges on the filter were removed by dipping in isopropyl alcohol, and the air velocity was fixed at 100 cm s^-1. Liquid dioctyl phthalate particles generated by a constant output atomizer were used as challenge aerosols to minimize particle loading effects. A scanning mobility particle sizer was used to measure the challenge aerosol number concentrations and size distributions upstream and downstream of the pleated filter. The pressure drop across the filter was monitored by using a calibrated pressure transducer. The results showed that the performance of pleated filters depends not only on the size of the particle but also on the pleat count of the pleated filter. Based on filter quality factor, the optimal pleat count (OPC) is always higher than that based on pressure drop by about 0.3-0.5 pleats cm^-1. For example, the OPC is 2.15 pleats cm^-1 from the standpoint of pressure drop, but for the highest filter quality factor, the pleated filter needed to have a pleat count of 2.65 pleats cm^-1 at particle diameter of 122 nm. From the aspect of
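
    The idea of ranking pleat counts by a quality factor rather than by pressure drop alone can be illustrated with the commonly used definition q_F = -ln(P) / Δp, where P is the aerosol penetration (downstream over upstream concentration). Whether the authors used exactly this definition is an assumption, and every measurement value below is invented for illustration.

```python
import math


def quality_factor(c_up, c_down, pressure_drop):
    """Filter quality factor: -ln(penetration) divided by pressure drop."""
    penetration = c_down / c_up
    return -math.log(penetration) / pressure_drop


# Hypothetical measurements per pleat count (pleats per cm):
# (upstream concentration, downstream concentration, pressure drop in Pa)
measurements = {
    1.50: (1000.0, 200.0, 30.0),
    2.15: (1000.0, 150.0, 25.0),
    2.65: (1000.0, 80.0, 27.0),
    3.33: (1000.0, 60.0, 40.0),
}

best_by_pressure = min(measurements, key=lambda k: measurements[k][2])
best_by_quality = max(measurements, key=lambda k: quality_factor(*measurements[k]))

print(best_by_pressure)  # 2.15: lowest pressure drop in this toy data
print(best_by_quality)   # 2.65: highest quality factor in this toy data
```

    The toy numbers were chosen so that the two criteria disagree, mirroring the qualitative finding that the optimum shifts to a higher pleat count once collection efficiency is taken into account.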

  3. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass

    PubMed Central

    Ramstein, Guillaume P.; Lipka, Alexander E.; Lu, Fei; Costich, Denise E.; Cherney, Jerome H.; Buckler, Edward S.; Casler, Michael D.

    2015-01-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100
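
    Combining marker-effect estimates across several imputed datasets is usually done with Rubin's rules. The sketch below shows that pooling step only; the abstract does not say which combining rule the authors applied, so treating it as Rubin's rules is an assumption, and the function name and numbers are hypothetical.

```python
from statistics import mean, variance


def pool(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total


# Hypothetical effect of one marker estimated on three imputed datasets.
effect, total_variance = pool([0.5, 0.7, 0.6], [0.04, 0.04, 0.04])
print(effect)          # about 0.6
print(total_variance)  # about 0.0533: larger than 0.04 because imputes disagree
```

    The between-imputation term is what carries imputation uncertainty into the test statistic, so a marker with heavily missing data needs a stronger signal to reach significance.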

  4. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

    PubMed

    Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D

    2015-03-12

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data.

  5. Studies on interaction of colloidal silver nanoparticles (SNPs) with five different bacterial species.

    PubMed

    Khan, S Sudheer; Mukherjee, Amitava; Chandrasekaran, N

    2011-10-01

    Silver nanoparticles (SNPs) are being increasingly used in many consumer products like textile fabrics, cosmetics, washing machines, food and drug products owing to their excellent antimicrobial properties. Here we have studied the adsorption and toxicity of SNPs on bacterial species such as Pseudomonas aeruginosa, Micrococcus luteus, Bacillus subtilis, Bacillus barbaricus and Klebsiella pneumoniae. The influence of zeta potential on the adsorption of SNPs on the bacterial cell surface was investigated at acidic, neutral and alkaline pH and with varying salt (NaCl) concentrations (0.05, 0.1, 0.5, 1 and 1.5 M). The survival rate of bacterial species decreased with increase in adsorption of SNPs. Maximum adsorption and toxicity were observed at pH 5 and NaCl concentration of <0.5 M. Very little adsorption was observed at pH 9 and NaCl concentration >0.5 M, thereby resulting in less toxicity. The zeta potential study suggests that the adsorption of SNPs on the cell surface was related to electrostatic force of attraction. The equilibrium and kinetics of the adsorption process were also studied. The adsorption equilibrium isotherms fitted well to the Langmuir model. The kinetics of adsorption fitted best to a pseudo-first-order model. These findings form a basis for interpreting the interaction of nanoparticles with environmental bacterial species.

  6. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    SciTech Connect

    Yang, Jing; Li, Yuan-Yuan; Li, Yi-Xue; Ye, Zhi-Qiang

    2012-03-02

    Highlights: ► Proper dataset partition can improve the prediction of deleterious nsSNPs. ► Partition according to original residue type at nsSNP is a good criterion. ► Similar strategy is supposed promising in other machine learning problems. -- Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  7. Portability of tag SNPs across isolated population groups: an example from India.

    PubMed

    Sarkar Roy, N; Farheen, S; Roy, N; Sengupta, S; Majumder, P P

    2008-01-01

    Isolated population groups are useful in conducting association studies of complex diseases to avoid various pitfalls, including those arising from population stratification. Since DNA resequencing is expensive, it is recommended that genotyping be carried out at tagSNP (tSNP) loci. For this, tSNPs identified in one isolated population need to be used in another. Unless tSNPs are highly portable across populations this strategy may result in loss of information in association studies. We examined the issue of tSNP portability by sampling individuals from 10 isolated ethnic groups from India. We generated DNA resequencing data pertaining to 3 genomic regions and identified tSNPs in each population. We defined an index of tSNP portability and showed that portability is low across isolated Indian ethnic groups. The extent of portability did not significantly correlate with genetic similarity among the populations studied here. We also analyzed our data with sequence data from individuals of African and European descent. Our results indicated that it may be necessary to carry out resequencing in a small number of individuals to discover SNPs and identify tSNPs in the specific isolated population in which a disease association study is to be conducted.

  8. [Analysis of population stratification using random SNPs in genome-wide association studies].

    PubMed

    Cao, Zong-Fu; Ma, Chuan-Xiang; Wang, Lei; Cai, Bin

    2010-09-01

    Since population genetic structure can increase the false-positive rate in genome-wide association studies (GWAS) for complex diseases, the effect of population stratification should be taken into account in GWAS. However, the effect of randomly selected SNPs in population stratification analysis is undetermined. In this study, based on the genotype data generated on Genome-Wide Human SNP Array 6.0 from unrelated individuals of HapMap Phase2, we randomly selected SNPs that were evenly distributed across the whole genome, and acquired Ancestry Informative Markers (AIMs) by the method of f value and allelic Fisher exact test. F-statistics and STRUCTURE analysis based on the selected different sets of SNPs were used to evaluate the effect of distinguishing the populations from HapMap Phase3. We found that randomly selected SNPs that were evenly distributed across the whole genome could be used to identify the population structure. This study further indicated that more than 3,000 randomly selected SNPs that were evenly distributed across the whole genome could be substituted for AIMs in population stratification analysis when there were no available AIMs for specific populations.

  9. A computational method for prediction of rSNPs in human genome.

    PubMed

    Li, Rong; Han, Jiuqiang; Liu, Jun; Zheng, Jiguang; Liu, Ruiling

    2016-06-01

    Regulatory single nucleotide polymorphisms (rSNPs) in human genomes are thought to be responsible for phenotypic differences, including susceptibility to diseases and treatment outcomes, even though they do not change any gene product. However, a genome-wide search for rSNPs has not been properly addressed so far. In this work, a computational method for rSNP identification is proposed. As background SNPs far outnumber rSNPs, an ensemble method is applied to handle imbalanced data, which first converts an unbalanced dataset into several balanced ones and then builds a model for every balanced dataset. Two major types of features are extracted, namely sequence-based features and allele-specific features. Then random forest is applied to build the recognition model for each balanced dataset. Finally, ensemble strategies are adopted to combine the results of the models. We have tested our method on a set of experimentally verified rSNPs, and leave-one-out cross-validation results showed that our method can achieve a sensitivity of 73.8%, a specificity of 71.8% and an area under the ROC curve (AUC) of 0.756. In addition, our method is threshold free and doesn't rely on data of regulatory elements, thus it will have better adaptability when facing different data scenarios. The original data and the source matlab codes involved are available at https://sourceforge.net/projects/rsnpdect/.
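
    The step of turning one imbalanced dataset into several balanced ones can be sketched as below. This is a generic undersampling scheme, not the authors' code (which is in MATLAB at the URL above): the chunking rule, the majority-vote combiner, and both function names are assumptions made for illustration.

```python
import random


def balanced_subsets(positives, negatives, seed=0):
    """Pair all positives with disjoint, equally sized chunks of negatives."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    k = len(positives)
    # Leftover negatives that cannot fill a whole chunk are dropped.
    return [(positives, shuffled[i:i + k])
            for i in range(0, len(shuffled) - k + 1, k)]


def majority_vote(votes):
    """Combine per-model boolean predictions; ties count as negative."""
    return sum(votes) * 2 > len(votes)


# Hypothetical identifiers: 3 rSNPs against 10 background SNPs.
subsets = balanced_subsets(["r1", "r2", "r3"], [f"b{i}" for i in range(10)])
print(len(subsets))                        # 3 balanced training sets
print(majority_vote([True, True, False]))  # True
```

    Each balanced subset would then be used to train one classifier (random forest in the abstract), and the per-model predictions combined by whatever ensemble strategy is chosen.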

  10. Profiling deleterious non-synonymous SNPs of smoker's gene CYP1A1.

    PubMed

    Ramesh, A Sai; Khan, Imran; Farhan, Md; Thiagarajan, Padma

    2013-01-01

    CYP1A1 gene belongs to the cytochrome P450 family and is known better as smokers' gene due to its hyperactivation as a consequence of long term smoking. The expression of CYP1A1 induces polycyclic aromatic hydrocarbon production in the lungs, which when over expressed, is known to cause smoking related diseases, such as cardiovascular pathologies, cancer, and diabetes. Single nucleotide polymorphisms (SNPs) are the simplest form of genetic variations that occur at a higher frequency, and are denoted as synonymous and non-synonymous SNPs on the basis of their effects on the amino acids. This study adopts a systematic in silico approach to predict the deleterious SNPs that are associated with disease conditions. It is inferred that four SNPs are highly deleterious, among which the SNP with rs17861094 is commonly predicted to be harmful by all tools. Hydrophobic (isoleucine) to hydrophilic (serine) amino acid variation was observed in the candidate gene. Hence, this investigation aims to characterize a candidate gene from 159 SNPs of CYP1A1.

  11. Identification of SNPs in growth-related genes in Colombian creole cattle.

    PubMed

    Martinez, R; Rocha, J F; Bejarano, D; Gomez, Y; Abuabara, Y; Gallego, J

    2016-09-19

    Colombian creole cattle have important adaptation traits related to heat tolerance and reproductive and productive efficiency. Romosinuano (ROMO) and Blanco Orejinegro (BON) are the most common breeds used by Colombian cattle breeders. Growth traits are of prime importance in these animals, which are mainly raised for beef production. Genes encoding growth hormone, growth hormone receptor, homeobox protein, insulin growth factor binding protein 3, leptin, and myostatin have been associated with physiological growth pathways in cattle and other species. We therefore aimed to identify single nucleotide polymorphisms (SNPs) within these genes in ROMO, BON, and Zebu cattle. DNA regions of these genes were sequenced in 386 animals; 47 new SNPs were found, of which 14 were located in the exonic regions, thereby changing the protein sequence. An association of SNPs with weaning weight (WW), daily weight gain at weaning (DWG), and weight at 16 months (W16M) traits was deduced. The genetic analysis revealed several SNPs related to these traits. The SNP GhRE06.2 had a significant association with WW and the SNP Lep03.4 was highly associated with DWG and W16M. Other polymorphisms were significantly associated with WW and DWG, although they did not surpass the Bonferroni significance threshold. The new mutations identified may indicate important points of genetic control in the DNA that could be responsible for changes in the expression of the analyzed traits. These SNPs might be used in future breeding programs to improve the productive performance of cattle in beef farms.

  12. Association of MHC region SNPs with irritant susceptibility in healthcare workers

    PubMed Central

    Yucesoy, Berran; Talzhanov, Yerkebulan; Barmada, M. Michael; Johnson, Victor J.; Kashon, Michael L.; Baron, Elma; Wilson, Nevin W.; Frye, Bonnie; Wang, Wei; Fluharty, Kara; Gharib, Rola; Meade, Jean; Germolec, Dori; Luster, Michael I.; Nedorost, Susan

    2017-01-01

    Irritant contact dermatitis is the most common work-related skin disease, especially affecting workers in “wet-work” occupations. This study was conducted to investigate the association between single nucleotide polymorphisms (SNPs) within the major histocompatibility complex (MHC) and skin irritant response in a group of healthcare workers. 585 volunteer healthcare workers were genotyped for MHC SNPs and patch tested with three different irritants: sodium lauryl sulfate (SLS), sodium hydroxide (NaOH) and benzalkonium chloride (BKC). Genotyping was performed using Illumina Goldengate MHC panels. A number of SNPs within the MHC Class I (OR2B3, TRIM31, TRIM10, TRIM40 and IER3), Class II (HLA-DPA1, HLA-DPB1) and Class III (C2) genes were associated (p < 0.001) with skin response to tested irritants in different genetic models. Linkage disequilibrium patterns and functional annotations identified two SNPs in the TRIM40 (rs1573298) and HLA-DPB1 (rs9277554) genes, with a potential impact on gene regulation. In addition, SNPs in PSMB9 (rs10046277) and ITPR3 (rs499384) were associated with hand dermatitis. The results are of interest as they demonstrate that genetic variations in inflammation-related genes within the MHC can influence chemical-induced skin irritation and may explain the connection between inflamed skin and propensity to subsequent allergic contact sensitization. PMID:27258892

  13. Functional annotation of sixty-five type-2 diabetes risk SNPs and its application in risk prediction

    PubMed Central

    Wu, Yiming; Jing, Runyu; Dong, Yongcheng; Kuang, Qifan; Li, Yan; Huang, Ziyan; Gan, Wei; Xue, Yue; Li, Yizhou; Li, Menglong

    2017-01-01

    Genome-wide association studies (GWAS) have identified more than sixty single nucleotide polymorphisms (SNPs) associated with increased risk for type 2 diabetes (T2D). However, the identification of causal risk SNPs for T2D pathogenesis was complicated by the fact that each risk SNP is a surrogate for hundreds of SNPs, most of which reside in non-coding regions. Here we provide a comprehensive annotation of 65 known T2D related SNPs and inspect putative functional SNPs probably causing protein dysfunction, response element disruptions of known transcription factors related to T2D genes and regulatory response element disruption of four histone marks in pancreas and pancreas islet. Among the newly identified risk SNPs, some were reported as T2D related SNPs in recent studies. Further, we found that accumulation of modest effects of single sites markedly enhanced the risk prediction based on 1989 T2D samples and 3000 healthy controls. The AROC value increased from 0.58 to 0.62 by only using genotype score when putative risk SNPs were added. In addition, the net reclassification improvement is 10.03% on the addition of new risk SNPs. Taken together, functional annotation could provide a list of prioritized potential risk SNPs for the further estimation of the T2D susceptibility of individuals. PMID:28262806

  14. HEPA filter dissolution process

    DOEpatents

    Brewer, K.N.; Murphy, J.A.

    1994-02-22

    A process is described for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal. 4 figures.

  15. Hepa filter dissolution process

    DOEpatents

    Brewer, Ken N.; Murphy, James A.

    1994-01-01

    A process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  16. Recirculating electric air filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  17. HEPA filter dissolution process

    SciTech Connect

    Brewer, K.N.; Murphy, J.A.

    1992-12-31

    This invention is comprised of a process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  18. Recirculating electric air filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  19. All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

    PubMed Central

    Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
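
    The effect of stratifying SNPs before controlling the false discovery rate can be illustrated with a much simpler stand-in than the authors' method: running the Benjamini-Hochberg procedure separately within each annotation stratum. The LD-weighted annotation and TDR estimation described in the abstract are not reproduced here, and the function names, strata labels, and p-values are hypothetical.

```python
from collections import defaultdict


def bh_reject(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    return set(order[:cutoff])


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg within each stratum and merge the rejections."""
    groups = defaultdict(list)
    for i, label in enumerate(strata):
        groups[label].append(i)
    rejected = set()
    for indices in groups.values():
        local = bh_reject([pvalues[i] for i in indices], alpha)
        rejected |= {indices[j] for j in local}
    return rejected


# Hypothetical p-values: three coding SNPs followed by five intergenic SNPs.
pvalues = [0.001, 0.02, 0.03, 0.5, 0.6, 0.7, 0.8, 0.9]
strata = ["coding"] * 3 + ["intergenic"] * 5

print(sorted(bh_reject(pvalues)))              # [0]
print(sorted(stratified_bh(pvalues, strata)))  # [0, 1, 2]
```

    In this toy example the enriched stratum gains two rejections at the same nominal FDR level, which is the kind of power increase the abstract attributes to stratification.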

  20. Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs

    NASA Astrophysics Data System (ADS)

    Watson, Corey T.; Disanto, Giulio; Breden, Felix; Giovannoni, Gavin; Ramagopalan, Sreeram V.

    2012-10-01

    Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.

  1. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants.

    PubMed

    Burton, Paul R; Clayton, David G; Cardon, Lon R; Craddock, Nick; Deloukas, Panos; Duncanson, Audrey; Kwiatkowski, Dominic P; McCarthy, Mark I; Ouwehand, Willem H; Samani, Nilesh J; Todd, John A; Donnelly, Peter; Barrett, Jeffrey C; Davison, Dan; Easton, Doug; Evans, David M; Leung, Hin-Tak; Marchini, Jonathan L; Morris, Andrew P; Spencer, Chris C A; Tobin, Martin D; Attwood, Antony P; Boorman, James P; Cant, Barbara; Everson, Ursula; Hussey, Judith M; Jolley, Jennifer D; Knight, Alexandra S; Koch, Kerstin; Meech, Elizabeth; Nutland, Sarah; Prowse, Christopher V; Stevens, Helen E; Taylor, Niall C; Walters, Graham R; Walker, Neil M; Watkins, Nicholas A; Winzer, Thilo; Jones, Richard W; McArdle, Wendy L; Ring, Susan M; Strachan, David P; Pembrey, Marcus; Breen, Gerome; St Clair, David; Caesar, Sian; Gordon-Smith, Katharine; Jones, Lisa; Fraser, Christine; Green, Elaine K; Grozeva, Detelina; Hamshere, Marian L; Holmans, Peter A; Jones, Ian R; Kirov, George; Moskivina, Valentina; Nikolov, Ivan; O'Donovan, Michael C; Owen, Michael J; Collier, David A; Elkin, Amanda; Farmer, Anne; Williamson, Richard; McGuffin, Peter; Young, Allan H; Ferrier, I Nicol; Ball, Stephen G; Balmforth, Anthony J; Barrett, Jennifer H; Bishop, Timothy D; Iles, Mark M; Maqbool, Azhar; Yuldasheva, Nadira; Hall, Alistair S; Braund, Peter S; Dixon, Richard J; Mangino, Massimo; Stevens, Suzanne; Thompson, John R; Bredin, Francesca; Tremelling, Mark; Parkes, Miles; Drummond, Hazel; Lees, Charles W; Nimmo, Elaine R; Satsangi, Jack; Fisher, Sheila A; Forbes, Alastair; Lewis, Cathryn M; Onnie, Clive M; Prescott, Natalie J; Sanderson, Jeremy; Matthew, Christopher G; Barbour, Jamie; Mohiuddin, M Khalid; Todhunter, Catherine E; Mansfield, John C; Ahmad, Tariq; Cummings, Fraser R; Jewell, Derek P; Webster, John; Brown, Morris J; Lathrop, Mark G; Connell, John; Dominiczak, Anna; Marcano, Carolina A Braga; Burke, Beverley; Dobson, Richard; Gungadoo, Johannie; Lee, Kate L; Munroe, Patricia B; Newhouse, 
Stephen J; Onipinla, Abiodun; Wallace, Chris; Xue, Mingzhan; Caulfield, Mark; Farrall, Martin; Barton, Anne; Bruce, Ian N; Donovan, Hannah; Eyre, Steve; Gilbert, Paul D; Hilder, Samantha L; Hinks, Anne M; John, Sally L; Potter, Catherine; Silman, Alan J; Symmons, Deborah P M; Thomson, Wendy; Worthington, Jane; Dunger, David B; Widmer, Barry; Frayling, Timothy M; Freathy, Rachel M; Lango, Hana; Perry, John R B; Shields, Beverley M; Weedon, Michael N; Hattersley, Andrew T; Hitman, Graham A; Walker, Mark; Elliott, Kate S; Groves, Christopher J; Lindgren, Cecilia M; Rayner, Nigel W; Timpson, Nicolas J; Zeggini, Eleftheria; Newport, Melanie; Sirugo, Giorgio; Lyons, Emily; Vannberg, Fredrik; Hill, Adrian V S; Bradbury, Linda A; Farrar, Claire; Pointon, Jennifer J; Wordsworth, Paul; Brown, Matthew A; Franklyn, Jayne A; Heward, Joanne M; Simmonds, Matthew J; Gough, Stephen C L; Seal, Sheila; Stratton, Michael R; Rahman, Nazneen; Ban, Maria; Goris, An; Sawcer, Stephen J; Compston, Alastair; Conway, David; Jallow, Muminatou; Newport, Melanie; Sirugo, Giorgio; Rockett, Kirk A; Bumpstead, Suzannah J; Chaney, Amy; Downes, Kate; Ghori, Mohammed J R; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Keniry, Andrew; King, Emma; McGinnis, Ralph; Potter, Simon; Ravindrarajah, Rathi; Whittaker, Pamela; Widden, Claire; Withers, David; Cardin, Niall J; Davison, Dan; Ferreira, Teresa; Pereira-Gale, Joanne; Hallgrimsdo'ttir, Ingeleif B; Howie, Bryan N; Su, Zhan; Teo, Yik Ying; Vukcevic, Damjan; Bentley, David; Brown, Matthew A; Compston, Alastair; Farrall, Martin; Hall, Alistair S; Hattersley, Andrew T; Hill, Adrian V S; Parkes, Miles; Pembrey, Marcus; Stratton, Michael R; Mitchell, Sarah L; Newby, Paul R; Brand, Oliver J; Carr-Smith, Jackie; Pearce, Simon H S; McGinnis, R; Keniry, A; Deloukas, P; Reveille, John D; Zhou, Xiaodong; Sims, Anne-Marie; Dowling, Alison; Taylor, Jacqueline; Doan, Tracy; Davis, John C; Savage, Laurie; Ward, Michael M; Learch, Thomas L; Weisman, Michael H; Brown, 
Mathew

    2007-11-01

    We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.

  2. SNPs of hemocyanin C-terminal fragment in shrimp Litopenaeus vannamei.

    PubMed

    Zhao, Xianliang; Guo, Lingling; Zhang, Yueling; Liu, Yao; Zhang, Xiaoyu; Lun, Jingsheng; Chen, Jiehui; Li, Yuanyou

    2012-02-17

    In this study, we identified a variable region in the C-terminus of hemocyanin from the shrimp Litopenaeus vannamei (2288-2503 bp, HcSC) by sequence alignments. A total of 13 SNPs were identified by PCR-SSCP and HcSC clone sequencing. The SSCP patterns of HcSC could be modulated in Vibrio parahaemolyticus-treated shrimps. A novel SSCP band with four SNP sites was identified in V. parahaemolyticus-resistant shrimps. More importantly, three of these four SNPs introduced variations in amino acid sequence and possibly secondary structure of the HcSC polypeptide and resulted in a higher agglutinative activity against seven pathogenic bacteria. These results suggest that the C-terminus of shrimp L. vannamei hemocyanin possesses SNPs, which may be related to shrimp resistance to different pathogens.

  3. S-PRIME/TI-SNPS program activities in FY94 critical components testing

    NASA Astrophysics Data System (ADS)

    Brown, Colette; Dale Rogers, R.; Determan, William R.; Van Hagan, Tom

    1995-01-01

    A conceptual design for a 40-kWe thermionic space nuclear power system (TI-SNPS) known as the S-PRIME system is being developed by Rockwell and its subcontractors for the U.S. Department of Energy (DOE), United States Air Force (USAF), and Ballistic Missile Defense Organization (BMDO) under the TI-SNPS Program. Phase 1 of this program includes developing a conceptual design of a 5- to 40-kWe range TI-SNPS and validating key technologies that support the design. All key technologies for the S-PRIME design have been identified along with six critical component demonstrations, which will be used to validate the S-PRIME design features.

  4. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris)

    PubMed Central

    Villordo-Pineda, Emiliano; González-Chavira, Mario M.; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A.; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5), which was characterized by 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. Phenotypes most associated were days to flowering and seed biomass with 58 and 44 associated SNPs, respectively. Thirty-seven out of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs such as SNP28 and SNP128 are related to starch biosynthesis, a common osmotic protector; and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755
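
    The Bonferroni threshold quoted in this abstract can be checked with a short calculation. The sketch below assumes a family-wise alpha of 0.05 and one test per SNP (169 tests); neither value is stated in the abstract, and the authors may have counted tests differently. The function name is illustrative only.

```python
# Bonferroni correction: divide the family-wise error rate by the number of tests.
# ASSUMPTIONS (not stated in the abstract): alpha = 0.05, one test per SNP.


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold for n_tests comparisons."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(alpha=0.05, n_tests=169)
print(f"{threshold:.6f}")  # prints 0.000296
```

    Under these assumptions the per-test threshold is roughly 0.000296, which is consistent with the reported cutoff of p < 0.0003.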

  5. Silver sulfide nanoparticles (Ag2S-NPs) are taken up by plants and are phytotoxic.

    PubMed

    Wang, Peng; Menzies, Neal W; Lombi, Enzo; Sekine, Ryo; Blamey, F Pax C; Hernandez-Soriano, Maria C; Cheng, Miaomiao; Kappen, Peter; Peijnenburg, Willie J G M; Tang, Caixian; Kopittke, Peter M

    2015-01-01

    Silver nanoparticles (NPs) are used in more consumer products than any other nanomaterial and their release into the environment is unavoidable. Of primary concern is the wastewater stream in which most silver NPs are transformed to silver sulfide NPs (Ag2S-NPs) before being applied to agricultural soils within biosolids. While Ag2S-NPs are assumed to be biologically inert, nothing is known of their effects on terrestrial plants. The phytotoxicity of Ag and its accumulation was examined in short-term (24 h) and longer-term (2-week) solution culture experiments with cowpea (Vigna unguiculata L. Walp.) and wheat (Triticum aestivum L.) exposed to Ag2S-NPs (0-20 mg Ag L⁻¹), metallic Ag-NPs (0-1.6 mg Ag L⁻¹), or ionic Ag (AgNO3; 0-0.086 mg Ag L⁻¹). Although not inducing any effects during 24-h exposure, Ag2S-NPs reduced growth by up to 52% over a 2-week period. This toxicity did not result from their dissolution and release of toxic Ag⁺ in the rooting medium, with soluble Ag concentrations remaining below 0.001 mg Ag L⁻¹. Rather, Ag accumulated as Ag2S in the root and shoot tissues when plants were exposed to Ag2S-NPs, consistent with their direct uptake. Importantly, this differed from the form of Ag present in tissues of plants exposed to AgNO3. For the first time, our findings have shown that Ag2S-NPs exert toxic effects through their direct accumulation in terrestrial plant tissues. These findings need to be considered to ensure high yield of food crops, and to avoid increasing Ag in the food chain.

  6. SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences

    PubMed Central

    Han, Areum; Kang, Hyo Jin; Cho, Yoobok; Lee, Sunghoon; Kim, Young Joo; Gong, Sungsam

    2006-01-01

    The single nucleotide polymorphisms (SNPs) in conserved protein regions have been thought to be strong candidates that alter protein functions. Thus, we have developed SNP@Domain, a web resource, to identify SNPs within human protein domains. We annotated SNPs from dbSNP with protein structure-based as well as sequence-based domains: (i) structure-based using SCOP and (ii) sequence-based using Pfam to avoid conflicts from two domain assignment methodologies. Users can investigate SNPs within protein domains with 2D and 3D maps. We expect this visual annotation of SNPs within protein domains will help scientists select and interpret SNPs associated with diseases. A web interface for the SNP@Domain is freely available at and from . PMID:16845090

  7. Imputation approach for deducing a complete mitogenome sequence from low-depth-coverage next-generation sequencing data: application to ancient remains from the Moon Pyramid, Mexico.

    PubMed

    Mizuno, Fuzuki; Kumagai, Masahiko; Kurosaki, Kunihiko; Hayashi, Michiko; Sugiyama, Saburo; Ueda, Shintaroh; Wang, Li

    2017-02-16

    A depth of coverage greater than 15 is generally considered necessary for next-generation sequencing (NGS) data to yield reliable complete nucleotide sequences of the mitogenome. However, it is difficult to satisfy this requirement for all nucleotide positions because of problems obtaining a uniform depth of coverage for poorly preserved materials. Thus, we propose an imputation approach that allows a complete mitogenome sequence to be deduced from low-depth-coverage NGS data. We used different types of mitogenome data files as panels for imputation: a worldwide panel comprising all the major haplogroups, a worldwide panel comprising sequences belonging to the estimated haplogroup alone, a panel comprising sequences from the population most closely related to an individual under investigation, and a panel comprising sequences belonging to the estimated haplogroup from the population most closely related to an individual under investigation. The number of missing nucleotides was drastically reduced in all the panels, but the contents obtained by imputation were quite different among the panels. The efficiency of the imputation method differed according to the panels used. The missing nucleotides were most credibly imputed using sequences of the estimated haplogroup from the population most closely related to the individual under investigation as a panel. Journal of Human Genetics advance online publication, 16 February 2017; doi:10.1038/jhg.2017.14.

  8. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references.

    PubMed

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-12-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mappings of disease association signals from case-control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5-98.8% obtained with the original HIBAG references to 95.2-99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets.
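
    The role of a call threshold can be illustrated with a generic sketch: a prediction is kept only if the posterior probability of the best candidate reaches the threshold, and is otherwise left uncalled. This is not the HIBAG API; the function name, the dictionary layout, and the posterior probabilities below are hypothetical and chosen only to show how a CT of 0.4 and a CT of 0.5 can differ on the same sample.

```python
from typing import Dict, Optional


def call_with_threshold(posteriors: Dict[str, float],
                        call_threshold: float) -> Optional[str]:
    """Return the most probable candidate, or None if it falls below the threshold."""
    best = max(posteriors, key=posteriors.get)
    if posteriors[best] < call_threshold:
        return None  # treated as a no-call (missing)
    return best


# Hypothetical posterior probabilities for one sample (illustration only).
sample = {"HLA-DRB1*15:01": 0.45, "HLA-DRB1*04:05": 0.35, "HLA-DRB1*09:01": 0.20}

call_at_04 = call_with_threshold(sample, call_threshold=0.4)
call_at_05 = call_with_threshold(sample, call_threshold=0.5)
print(call_at_04, call_at_05)
```

    A lower threshold retains more calls at the cost of admitting less certain ones, which is the trade-off behind recommending different CT values for different reference classifiers.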

  9. Residential proximity to electromagnetic field sources and birth weight: Minimizing residual confounding using multiple imputation and propensity score matching.

    PubMed

    de Vocht, Frank; Lee, Brian

    2014-08-01

    Studies have suggested that residential exposure to extremely low frequency (50 Hz) electromagnetic fields (ELF-EMF) from high voltage cables, overhead power lines, electricity substations or towers is associated with reduced birth weight and may be associated with adverse birth outcomes or even miscarriages. We previously conducted a study of 140,356 singleton live births between 2004 and 2008 in Northwest England, which suggested that close residential proximity (≤ 50 m) to ELF-EMF sources was associated with reduced average birth weight of 212 g (95% CI: -395 to -29 g) but not with statistically significant increased risks for other adverse perinatal outcomes. However, the cohort was limited by missing data for most potentially confounding variables, including maternal smoking during pregnancy, which was only available for a small subgroup, and residual confounding could not be excluded. This study, using the same cohort, was conducted to minimize the effects of these problems using multiple imputation to address missing data and propensity score matching to minimize residual confounding. Missing data were imputed using multiple imputation by chained equations to generate five datasets. For each dataset, 115 exposed women (residing ≤ 50 m from a residential ELF-EMF source) were propensity score matched to 1150 unexposed women. After doubly robust confounder adjustment, close proximity to a residential ELF-EMF source remained associated with a reduction in birth weight of 116 g (95% confidence interval: -224 to -7 g). No effect was found for proximity ≤ 100 m compared to women living further away. These results indicate that although the effect size was about half of the effect previously reported, close maternal residential proximity to sources of ELF-EMF remained associated with suboptimal fetal growth.
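
    When an analysis is repeated on several imputed datasets, the per-dataset estimates are commonly combined with Rubin's rules. The abstract does not say how the five sets of results were pooled, so the sketch below is only an illustration of that standard technique; the estimates and variances are made-up numbers, not values from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and their variances using Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = mean(estimates)                 # average of the point estimates
    within = mean(variances)                 # average within-imputation variance
    between = variance(estimates)            # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between   # total variance
    return pooled, total


# Hypothetical results from five imputed datasets (illustration only).
estimates = [-100.0, -110.0, -120.0, -130.0, -140.0]
variances = [3000.0, 3000.0, 3000.0, 3000.0, 3000.0]

pooled, total = pool_rubin(estimates, variances)
print(pooled, total)
```

    The between-imputation term inflates the variance to reflect uncertainty about the missing values, so pooled confidence intervals are wider than those from any single completed dataset.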

  10. Estimation of the Incidence of Hepatocellular Carcinoma and Cholangiocarcinoma in Songkhla, Thailand, 1989-2013, Using Multiple Imputation Method

    PubMed Central

    Yeesoonsang, Seesai; Bilheem, Surichai; McNeil, Edward; Iamsirithaworn, Sophon; Jiraphongsa, Chuleeporn; Sriplung, Hutcha

    2017-01-01

    Purpose: Histological specimens are not required for diagnosis of liver and bile duct (LBD) cancer, resulting in a high percentage of unknown histologies. We compared estimates of hepatocellular carcinoma (HCC) and cholangiocarcinoma (CCA) incidences by imputing these unknown histologies. Materials and Methods: A retrospective study was conducted using data from the Songkhla Cancer Registry, southern Thailand, from 1989 to 2013. Multivariate imputation by chained equations (mice) was used in re-classification of the unknown histologies. Age-standardized rates (ASR) of HCC and CCA by sex were calculated and the trends were compared. Results: Of 2,387 LBD cases, 61% had unknown histology. After imputation, the ASR of HCC in males during 1989 to 2007 increased from 4 to 10 per 100,000 and then decreased after 2007. The ASR of CCA increased from 2 to 5.5 per 100,000, and the ASR of HCC in females decreased from 1.5 in 2009 to 1.3 in 2013 and that of CCA increased from less than 1 to 1.9 per 100,000 by 2013. Results of complete case analysis showed somewhat similar, although less dramatic, trends. Conclusion: In Songkhla, the incidence of CCA appears to be stable after increasing for 20 years whereas the incidence of HCC is now declining. The decline in incidence of HCC among males since 2007 is probably due to implementation of the hepatitis B virus vaccine in the 1990s. The rise in incidence of CCA is a concern and highlights the need for case control studies to elucidate the risk factors. PMID:27188200

  11. Imputation of DNA Methylation Levels in the Brain Implicates a Risk Factor for Parkinson’s Disease

    PubMed Central

    Rawlik, Konrad; Rowlatt, Amy; Tenesa, Albert

    2016-01-01

    Understanding how genetic variation affects intermediate phenotypes, like DNA methylation or gene expression, and how these in turn vary with complex human disease provides valuable insight into disease etiology. However, intermediate phenotypes are typically tissue and developmental stage specific, making relevant phenotypes difficult to assay. Assembling large case–control cohorts, necessary to achieve sufficient statistical power to assess associations between complex traits and relevant intermediate phenotypes, has therefore remained challenging. Imputation of such intermediate phenotypes represents a practical alternative in this context. We used a mixed linear model to impute DNA methylation (DNAm) levels of four brain tissues at up to 1826 methylome-wide sites in 6259 patients with Parkinson’s disease and 9452 controls from across five genome-wide association studies (GWAS). Six sites, in two regions, were found to associate with Parkinson’s disease for at least one tissue. While a majority of identified sites were within an established risk region for Parkinson’s disease, suggesting a role of DNAm in mediating previously observed genetic effects at this locus, we also identify an association with four CpG sites in chromosome 16p11.2. Direct measures of DNAm in the substantia nigra of 39 cases and 13 control samples were used to independently replicate these four associations. Only the association at cg10917602 replicated with a concordant direction of effect (P = 0.02). cg10917602 is 87 kb away from the closest reported GWAS hit. The employed imputation methodology implies that variation of DNAm levels at cg10917602 is predictive for Parkinson’s disease risk, suggesting a possible causal role for methylation at this locus. More generally this study demonstrates the feasibility of identifying predictive epigenetic markers of disease risk from readily available data sets. PMID:27466229

  12. ARRANGEMENT FOR REPLACING FILTERS

    DOEpatents

    Blomgren, R.A.; Bohlin, N.J.C.

    1957-08-27

    An improved filtered air exhaust system which may be continually operated during the replacement of the filters without the escape of unfiltered air is described. This is accomplished by hermetically sealing the box-like filter containers in a rectangular tunnel with neoprene-covered sponge rubber sealing rings coated with a silicone-impregnated pneumatic grease. The tunnel through which the filters are pushed is normal to the exhaust air duct. A number of unused filters are in line behind the filters in use, and are moved by a hydraulic ram so that a fresh filter is positioned in the air duct. The used filter is pushed into a waiting receptacle and is suitably disposed of. This device permits a rapid and safe replacement of a radiation-contaminated filter without interruption to the normal flow of exhaust air.

  13. Method of securing filter elements

    SciTech Connect

    Brown, Erik P.; Haslam, Jeffery L.; Mitchell, Mark A.

    2016-10-04

    A filter securing system including a filter unit body housing; at least one tubular filter element positioned in the filter unit body housing, the tubular filter element having a closed top and an open bottom; a dimple in either the filter unit body housing or the top of the tubular filter element; and a socket in either the filter unit body housing or the top of the tubular filter element that receives the dimple in either the filter unit body housing or the top of the tubular filter element to secure the tubular filter element to the filter unit body housing.

  14. Rigid porous filter

    DOEpatents

    Chiang, Ta-Kuan; Straub, Douglas L.; Dennis, Richard A.

    2000-01-01

    The present invention involves a porous rigid filter including a plurality of concentric filtration elements having internal flow passages and forming external flow passages there between. The present invention also involves a pressure vessel containing the filter for the removal of particulates from high pressure particulate containing gases, and further involves a method for using the filter to remove such particulates. The present filter has the advantage of requiring fewer filter elements due to the high surface area-to-volume ratio provided by the filter, requires a reduced pressure vessel size, and exhibits enhanced mechanical design properties, improved cleaning properties, configuration options, modularity and ease of fabrication.

  15. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    PubMed

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.

  16. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, Harry S.; Thompson, Robert C.; Hubbard, Charles W.; Perkins, Richard W.

    1997-01-01

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant.

  17. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, H.S.; Thompson, R.C.; Hubbard, C.W.; Perkins, R.W.

    1997-03-25

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant. 5 figs.

  18. Blind Prediction of Deleterious Amino Acid Variations with SNPs&GO.

    PubMed

    Capriotti, Emidio; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2017-01-19

    SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) to disease, considering protein functional annotation. The method is a binary classifier that implements a Support Vector Machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from protein sequence with functional annotation encoded by Gene Ontology terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve (AUC) of 0.88 with low false positive rate. In almost all the editions of the Critical Assessment of Genome Interpretation (CAGI) experiments, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper we summarize the best results obtained by SNPs&GO on disease related variations of four CAGI challenges relative to the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013) and NAGLU (CAGI 2016). Result evaluation provides insights about the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs.

  19. No prognostic value added by vitamin D pathway SNPs to current prognostic system for melanoma survival

    PubMed Central

    Orlow, Irene; Kanetsky, Peter A.; Thomas, Nancy E.; Fang, Shenying; Lee, Jeffrey E.; Berwick, Marianne; Lee, Ji-Hyun

    2017-01-01

    The prognostic improvement attributed to genetic markers over the current prognostic system has not been well studied for melanoma. The goal of this study is to evaluate the added prognostic value of Vitamin D Pathway (VitD) SNPs to currently known clinical and demographic factors such as age, sex, Breslow thickness, mitosis and ulceration (CDF). We utilized two large independent well-characterized melanoma studies: the Genes, Environment, and Melanoma (GEM) and MD Anderson studies, and performed variable selection of VitD pathway SNPs and CDF using the Random Survival Forest (RSF) method in addition to Cox proportional hazards models. Harrell’s C-index was used to compare the predictive performance of the models. The population-based GEM study enrolled 3,578 incident cases of cutaneous melanoma (CM), and the hospital-based MD Anderson study consisted of 1,804 CM patients. Including both VitD SNPs and CDF yielded a C-index of 0.85, a slight but not significant improvement over CDF alone (C-index = 0.83) in the GEM study. Similar results were observed in the independent MD Anderson study (C-index = 0.84 and 0.83, respectively). The Cox model identified no significant associations after adjusting for multiplicity. Our results do not support clinically significant prognostic improvements attributable to VitD pathway SNPs over the current prognostic system for melanoma survival. PMID:28323902
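
    Harrell's C-index measures how often a model assigns the higher risk score to the subject who experiences the event first. The following is a minimal sketch of the usual pairwise definition for right-censored data, not the software used in the study. As a simplifying assumption, pairs with tied survival times are skipped; the function name and the four-subject example data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event at a time
    strictly earlier than subject j's time. It is concordant when subject i
    also has the higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up time, event indicator (1 = event, 0 = censored), risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index = harrell_c_index(times, events, risk_scores)
print(c_index)  # prints 0.8
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a change from 0.83 to 0.85 represents only a modest gain in discrimination.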

  20. Assessing polar bear (Ursus maritimus) population structure in the Hudson Bay region using SNPs.

    PubMed

    Viengkone, Michelle; Derocher, Andrew Edward; Richardson, Evan Shaun; Malenfant, René Michael; Miller, Joshua Moses; Obbard, Martyn E; Dyck, Markus G; Lunn, Nick J; Sahanatien, Vicki; Davis, Corey S

    2016-12-01

    Defining subpopulations using genetics has traditionally used data from microsatellite markers to investigate population structure; however, single-nucleotide polymorphisms (SNPs) have emerged as a tool for detection of fine-scale structure. In Hudson Bay, Canada, three polar bear (Ursus maritimus) subpopulations (Foxe Basin (FB), Southern Hudson Bay (SH), and Western Hudson Bay (WH)) have been delineated based on mark-recapture studies, radiotelemetry and satellite telemetry, return of marked animals in the subsistence harvest, and population genetics using microsatellites. We used SNPs to detect fine-scale population structure in polar bears from the Hudson Bay region and compared our results to the current designations using 414 individuals genotyped at 2,603 SNPs. Analyses based on discriminant analysis of principal components (DAPC) and STRUCTURE support the presence of four genetic clusters: (i) Western: including individuals sampled in WH, SH (excluding Akimiski Island in James Bay), and southern FB (south of Southampton Island); (ii) Northern: individuals sampled in northern FB (Baffin Island) and Davis Strait (DS) (Labrador coast); (iii) Southeast: individuals from SH (Akimiski Island in James Bay); and (iv) Northeast: individuals from DS (Baffin Island). Population structure differed from microsatellite studies and current management designations, demonstrating the value of using SNPs for fine-scale population delineation in polar bears.

  1. Bootstrap aggregating of alternating decision trees to detect sets of SNPs that associate with disease.

    PubMed

    Guy, Richard T; Santago, Peter; Langefeld, Carl D

    2012-02-01

    Complex genetic disorders are a result of a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is order nk², where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of the method using simulated data as well as data from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.

  2. Cross-Amplification and Validation of SNPs Conserved over 44 Million Years between Seals and Dogs

    PubMed Central

    Hoffman, Joseph I.; Thorne, Michael A. S.; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  3. Alteration of Antiviral Signalling by Single Nucleotide Polymorphisms (SNPs) of Mitochondrial Antiviral Signalling Protein (MAVS)

    PubMed Central

    Xing, Fei; Matsumiya, Tomoh; Hayakari, Ryo; Yoshida, Hidemi; Kawaguchi, Shogo; Takahashi, Ippei; Nakaji, Shigeyuki; Imaizumi, Tadaatsu

    2016-01-01

    Genetic variation is associated with diseases. As a type of genetic variation occurring with certain regularity and frequency, the single nucleotide polymorphism (SNP) is attracting more and more attention because of its great value for research and real-life application. Mitochondrial antiviral signalling protein (MAVS) acts as a common adaptor molecule for retinoic acid-inducible gene-I (RIG-I)-like receptors (RLRs), which can recognize foreign RNA, including viral RNA, leading to the induction of type I interferons (IFNs). Therefore, MAVS is thought to be a crucial molecule in antiviral innate immunity. We speculated that genetic variation of MAVS may result in susceptibility to infectious diseases. To assess the risk of viral infection based on MAVS variation, we tested the effects of twelve non-synonymous MAVS coding-region SNPs from the National Center for Biotechnology Information (NCBI) database that result in amino acid substitutions. We found that five of these SNPs exhibited functional alterations. Additionally, four resulted in an inhibitory immune response, and one had the opposite effect. In total, 1,032 human genomic samples obtained from a mass examination were genotyped at these five SNPs. However, no homozygous or heterozygous variation was detected. We hypothesized that these five SNPs are not present in the Japanese population and that such MAVS variations may result in serious immune diseases. PMID:26954674

  4. Large-scale enrichment and discovery of gene-associated SNPs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...

  5. Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA Segments

    PubMed Central

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  6. SNP2TFBS – a database of regulatory SNPs affecting predicted transcription factor binding site affinity

    PubMed Central

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-01

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/. PMID:27899579
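
    The PWM-based estimate described above can be illustrated with a small Python sketch. It is not the SNP2TFBS pipeline: the function names (`log_odds_pwm`, `best_score`, `snp_effect`), the pseudocount, the uniform background of 0.25, the forward-strand-only scan and the toy count matrix are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores (hypothetical helper)."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((col[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def best_score(pwm, seq, snp_pos):
    """Best PWM score over all forward-strand windows that overlap snp_pos."""
    width = len(pwm)
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        best = max(best, score)
    return best

def snp_effect(pwm, ref_seq, snp_pos, alt_base):
    """Score change (alt minus ref); negative values suggest a weakened site."""
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    return best_score(pwm, alt_seq, snp_pos) - best_score(pwm, ref_seq, snp_pos)

# Toy motif "TATA" built from made-up counts (assumption, not real TF data).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
delta = snp_effect(toy_pwm, "GGTATAGG", 3, "C")  # A>C inside the motif
```

    In this toy case the substitution lowers the best score, which is the kind of change the database would annotate as a reduced or abolished binding site; a real analysis would also scan the reverse strand and apply a score threshold.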

  7. Angiogenic, neurotrophic, and inflammatory system SNPs moderate the association between birth weight and ADHD symptom severity.

    PubMed

    Smith, Taylor F; Anastopoulos, Arthur D; Garrett, Melanie E; Arias-Vasquez, Alejandro; Franke, Barbara; Oades, Robert D; Sonuga-Barke, Edmund; Asherson, Philip; Gill, Michael; Buitelaar, Jan K; Sergeant, Joseph A; Kollins, Scott H; Faraone, Stephen V; Ashley-Koch, Allison

    2014-12-01

    Low birth weight is associated with increased risk for Attention-Deficit/Hyperactivity Disorder (ADHD); however, the etiological underpinnings of this relationship remain unclear. This study investigated if genetic variants in angiogenic, dopaminergic, neurotrophic, kynurenine, and cytokine-related biological pathways moderate the relationship between birth weight and ADHD symptom severity. A total of 398 youth from two multi-site, family-based studies of ADHD were included in the analysis. The sample consisted of 360 ADHD probands, 21 affected siblings, and 17 unaffected siblings. A set of 164 SNPs from 31 candidate genes, representing five biological pathways, were included in our analyses. Birth weight and gestational age data were collected from a state birth registry, medical records, and parent report. Generalized Estimating Equations tested for main effects and interactions between individual SNPs and birth weight centile in predicting ADHD symptom severity. SNPs within neurotrophic (NTRK3) and cytokine genes (CNTFR) were associated with ADHD inattentive symptom severity. There was no main effect of birth weight centile on ADHD symptom severity. SNPs within angiogenic (NRP1 & NRP2), neurotrophic (NTRK1 & NTRK3), cytokine (IL16 & S100B), and kynurenine (CCBL1 & CCBL2) genes moderate the association between birth weight centile and ADHD symptom severity. The SNP main effects and SNP × birth weight centile interactions remained significant after adjusting for multiple testing. Genetic variability in angiogenic, neurotrophic, and inflammatory systems may moderate the association between restricted prenatal growth, a proxy for an adverse prenatal environment, and risk to develop ADHD.

  8. Haplotype sharing analysis with SNPs in candidate genes: the Genetic Analysis Workshop 12 example.

    PubMed

    Fischer, Christine; Beckmann, Lars; Majoram, Paul; te Meerman, Gerard; Chang-Claude, Jenny

    2003-01-01

    Haplotype sharing analysis was used to investigate the association of affection status with single nucleotide polymorphism (SNP) haplotypes within candidate gene 1 in one sample each from the isolated and the general population of Genetic Analysis Workshop (GAW) 12 simulated data. Gene 1 has direct influence on affection and harbors more than 70 SNPs. Haplotype sharing analysis depends heavily on previous haplotype estimation. Using GENEHUNTER haplotypes, strong evidence was found for most SNPs in the isolated population sample, thus providing evidence for an involvement of this gene, but the maximum -log10(p) values for the haplotype sharing statistics (HSS) test statistic did not correspond to the location of the true variant in either population. In comparison, transmission disequilibrium test (TDT) analysis showed the strongest results at the disease-causing variant in both populations, and these were outstanding in the general population. In this example, TDT analysis appears to perform better than HSS in identifying the disease-causing variant, using SNPs within a candidate gene in an outbred population. Simulations showed that the performance of HSS is hampered by closely spaced SNPs in strong linkage disequilibrium with the functional variant and by ambiguous haplotypes.

  9. SNPs for parentage testing and traceability in globally diverse breeds of sheep

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA-based parentage determination accelerates genetic improvement by increasing pedigree accuracy. However, the utility of any “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities...

  10. SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity.

    PubMed

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-04

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.

  11. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing

    PubMed Central

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, which are abundant in the human genome and have a lower mutation rate, are attractive for genetic applications such as forensic, anthropological and evolutionary studies. Universal SNPs, which show little allelic frequency variation among populations while remaining highly informative for human identification, were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneously tested using Ampliseq technology on the Ion Torrent PGM platform. A concordance study between NGS and Sanger sequencing was performed with 2 reference samples, 9947A and 9948. Full concordance was obtained except for the genotype of rs576261 in 9947A. The parameter FMAR (%) was introduced for NGS data analysis for the first time and used to evaluate allelic performance, sensitivity testing and mixture testing. FMAR values should range from 50% to 60% for accurate heterozygotes and should be above 90% for homozygotes or Y-SNPs. The SNPs rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, with either allelic imbalance or lower coverage. Sensitivity testing demonstrated that all correct genotypes were obtained with DNA inputs ranging from 10 ng to 0.5 ng. For mixture testing, a clear linear correlation (R² = 0.9429) between the expected and observed FMAR values of mixtures was observed. PMID:26691610

  12. Validation of 58 autosomal individual identification SNPs in three Chinese populations

    PubMed Central

    Wei, Yi-Liang; Qin, Cui-Jiao; Liu, Hai-Bo; Jia, Jing; Hu, Lan; Li, Cai-Xia

    2014-01-01

    Aim: To genotype and evaluate a panel of single-nucleotide polymorphisms for individual identification (IISNPs) in three Chinese populations: Chinese Han, Uyghur, and Tibetan. Methods: Two previously identified panels of IISNPs, 86 unlinked IISNPs and SNPforID 52-plex markers, were pooled and analyzed. Four SNPs were included in both panels. In total, 132 SNPs were typed on Sequenom MassARRAY® platform in 330 individuals from Han Chinese, Uyghur, and Tibetan populations. Population genetic indices and forensic parameters were determined for all studied markers. Results: No significant deviation from Hardy-Weinberg equilibrium was observed for any of the SNPs in 3 populations. Expected heterozygosity (He) ranged from 0.144 to 0.500 in Han Chinese, from 0.197 to 0.500 in Uyghur, and from 0.018 to 0.500 in Tibetan population. Wright's Fst values ranged from 0.0001 to 0.1613. Pairwise linkage disequilibrium (LD) calculations for all 132 SNPs showed no significant LD across the populations (r²<0.147). A subset of 58 unlinked IISNPs (r²<0.094) with He>0.450 and Fst values from 0.0002 to 0.0536 gave match probabilities of 10⁻²⁵ and a cumulative probability of exclusion of 0.999992. Conclusion: The 58 unlinked IISNPs with high heterozygosity have low allele frequency variation among 3 Chinese populations, which makes them excellent candidates for the development of multiplex assays for individual identification and paternity testing. PMID:24577821
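
    To make the link between heterozygosity and match probability concrete, here is a small Python sketch. It assumes biallelic SNPs in Hardy-Weinberg equilibrium and independent (unlinked) loci, which is in line with what the abstract reports; the function names and the allele frequencies of exactly 0.5 are hypothetical and are not the study's data.

```python
def expected_heterozygosity(p):
    """He for a biallelic SNP with allele frequencies p and 1 - p."""
    return 2.0 * p * (1.0 - p)

def locus_match_probability(p):
    """Chance that two unrelated people share a genotype at one SNP (HWE assumed)."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)

def cumulative_match_probability(allele_freqs):
    """Product over loci, valid only if the loci are independent."""
    result = 1.0
    for p in allele_freqs:
        result *= locus_match_probability(p)
    return result

# Idealised panel: 58 SNPs, each with allele frequency 0.5 (assumption).
ideal_panel = [0.5] * 58
ideal_mp = cumulative_match_probability(ideal_panel)
```

    For this idealised panel each locus has He = 0.5 and a per-locus match probability of 0.375, so the product over 58 loci is on the order of 10⁻²⁵, the same order of magnitude as the figure quoted in the abstract.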

  13. Generalized Hampel Filters

    NASA Astrophysics Data System (ADS)

    Pearson, Ronald K.; Neuvo, Yrjö; Astola, Jaakko; Gabbouj, Moncef

    2016-12-01

    The standard median filter based on a symmetric moving window has only one tuning parameter: the window width. Despite this limitation, this filter has proven extremely useful and has motivated a number of extensions: weighted median filters, recursive median filters, and various cascade structures. The Hampel filter is a member of the class of decision filters that replaces the central value in the data window with the median if it lies far enough from the median to be deemed an outlier. This filter depends on both the window width and an additional tuning parameter t, reducing to the median filter when t=0, so it may be regarded as another median filter extension. This paper adopts this view, defining and exploring the class of generalized Hampel filters obtained by applying the median filter extensions listed above: weighted Hampel filters, recursive Hampel filters, and their cascades. An important concept introduced here is that of an implosion sequence, a signal for which generalized Hampel filter performance is independent of the threshold parameter t. These sequences are important because the added flexibility of the generalized Hampel filters offers no practical advantage for implosion sequences. Partial characterization results are presented for these sequences, as are useful relationships between root sequences for generalized Hampel filters and their median-based counterparts. To illustrate the performance of this filter class, two examples are considered: one is simulation-based, providing a basis for quantitative evaluation of signal recovery performance as a function of t, while the other is a sequence of monthly Italian industrial production index values that exhibits glaring outliers.
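
    The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration of the standard (non-generalized) Hampel filter, not the authors' code: the function names, the MAD-based scale estimate with the Gaussian consistency factor 1.4826, and the choice to leave end points without a full window unchanged are assumptions made for the example.

```python
from statistics import median

def median_filter(x, half_width):
    """Symmetric moving-window median; end points are left unchanged (assumption)."""
    y = list(x)
    for i in range(half_width, len(x) - half_width):
        y[i] = median(x[i - half_width:i + half_width + 1])
    return y

def hampel_filter(x, half_width, t, scale=1.4826):
    """Replace x[i] by the window median when it lies more than t scaled MADs away."""
    y = list(x)
    for i in range(half_width, len(x) - half_width):
        window = x[i - half_width:i + half_width + 1]
        m = median(window)
        # Scaled median absolute deviation of the window (assumed scale estimate).
        mad = scale * median([abs(v - m) for v in window])
        if abs(x[i] - m) > t * mad:
            y[i] = m
    return y

signal = [1, 2, 3, 100, 5, 6, 7]
cleaned = hampel_filter(signal, half_width=1, t=3)
```

    With `t=0` every centre value that differs from its window median is replaced, so the output coincides with that of `median_filter`, matching the reduction to the median filter noted in the abstract. Larger values of `t` leave more of the signal untouched and replace only pronounced outliers.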

  14. lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse

    PubMed Central

    Gong, Jing; Liu, Wei; Zhang, Jiayou; Miao, Xiaoping; Guo, An-Yuan

    2015-01-01

    Long non-coding RNAs (lncRNAs) play key roles in various cellular contexts and diseases by diverse mechanisms. With the rapid growth of identified lncRNAs and disease-associated single nucleotide polymorphisms (SNPs), there is a great demand to study SNPs in lncRNAs. Aiming to provide a useful resource about lncRNA SNPs, we systematically identified SNPs in lncRNAs and analyzed their potential impacts on lncRNA structure and function. In total, we identified 495 729 and 777 095 SNPs in more than 30 000 lncRNA transcripts in human and mouse, respectively. A large number of SNPs were predicted with the potential to impact on the miRNA–lncRNA interaction. The experimental evidence and conservation of miRNA–lncRNA interaction, as well as miRNA expressions from TCGA were also integrated to prioritize the miRNA–lncRNA interactions and SNPs on the binding sites. Furthermore, by mapping SNPs to GWAS results, we found that 142 human lncRNA SNPs are GWAS tagSNPs and 197 827 lncRNA SNPs are in the GWAS linkage disequilibrium regions. All these data for human and mouse lncRNAs were imported into lncRNASNP database (http://bioinfo.life.hust.edu.cn/lncRNASNP/), which includes two sub-databases lncRNASNP-human and lncRNASNP-mouse. The lncRNASNP database has a user-friendly interface for searching and browsing through the SNP, lncRNA and miRNA sections. PMID:25332392

  15. Prioritization of candidate SNPs in colon cancer using bioinformatics tools: an alternative approach for a cancer biologist.

    PubMed

    George Priya Doss, C; Rajasekaran, R; Arjun, P; Sethumadhavan, Rao

    2010-12-01

    The genetics of human phenotype variation, and especially the genetic basis of human complex diseases, could be understood by knowing the functions of Single Nucleotide Polymorphisms (SNPs). The main goal of this work is to predict the deleterious non-synonymous SNPs (nsSNPs), so that the number of SNPs screened for association with disease can be reduced to those most likely to alter gene function. In this work, using computational tools, we analyzed the SNPs that can alter the expression and function of cancerous genes involved in colon cancer. To explore possible relationships between genetic mutation and phenotypic variation, different computational algorithm tools like Sorting Intolerant from Tolerant (evolutionary-based approach), Polymorphism Phenotyping (structure-based approach), PupaSuite, UTRScan and FASTSNP were used for prioritization of high-risk SNPs in coding regions (exonic nonsynonymous SNPs) and non-coding regions (intronic and exonic 5' and 3'-untranslated region (UTR) SNPs). We developed a semi-quantitative relative ranking strategy (for cases where a 3D structure is not available) that can be adapted to a priori SNP selection or post hoc evaluation of variants identified in whole genome scans or within haplotype blocks associated with disease. Lastly, we analyzed haplotype tagging SNPs (htSNPs) in the coding and untranslated regions of all the genes by selecting the force tag SNPs selection using iHAP analysis. The computational architecture proposed in this review is based on integrating relevant biomedical information sources to provide a systematic analysis of complex diseases. We have shown a "real world" application of interesting existing bioinformatics tools for SNP analysis in colon cancer.

  16. TNFα and IL10 SNPs act together to predict disease behaviour in Crohn's disease

    PubMed Central

    Fowler, E; Eri, R; Hume, G; Johnstone, S; Pandeya, N; Lincoln, D; Templeton, D; Radford-Smith, G

    2005-01-01

    Background: The cytokines tumour necrosis factor (TNF)α and interleukin (IL)10 have been implicated in the pathogenesis of Crohn's disease (CD), with increased concentrations reported in patients with active disease. However, limited data exist on their effects on disease phenotype in the same population. Certain single nucleotide polymorphisms (SNPs) within the promoter region of the IL10 (-1082G/A, -592C/A) and TNFα (-308G/A, -857C/T) genes have been associated with altered levels of circulating IL10 and TNFα. Methods: We conducted an Australian based case–control study (304 CD patients; 231 healthy controls) of these four SNPs. Further investigation of two SNPs was conducted using a logistic regression analysis. Results: We identified a possible association of both IL10 SNPs and TNFα-857 with CD. Further investigation of a relationship with disease severity showed a significant association of higher producing IL10-1082G and TNFα-857C alleles with stricturing behaviour, which was strongest when these alleles were combined and persisted after multivariate analysis (p = 0.007; odds ratio (OR) 2.37, 95% CI 1.26 to 4.43). In addition, the TNFα-857CC genotype was independently associated with familial CD (p = 0.03; OR 3.12; 95% CI 1.15 to 8.46). Conclusion: These two SNPs may help to predict disease behaviour in CD patients, which may be clinically useful in shaping treatment of the disease at an earlier stage. PMID:15937090

  17. Altered Transmission of HOX and Apoptotic SNPs Identify a Potential Common Pathway for Clubfoot

    PubMed Central

    Ester, Audrey R.; Weymouth, Katelyn S.; Burt, Amber; Wise, Carol; Scott, Allison; Gurnett, Christina A; Dobbs, Matthew B.; Blanton, Susan H.; Hecht, Jacqueline T.

    2009-01-01

    Clubfoot is a common birth defect that affects 135,000 newborns each year worldwide. It is characterized by equinus deformity of one or both feet and hypoplastic calf muscles. Despite numerous study approaches, the cause(s) remains poorly understood although a multifactorial etiology is generally accepted. We considered the HOXA and HOXD gene clusters and insulin-like growth factor binding protein 3 (IGFBP3) as candidate genes because of their important roles in limb and muscle morphogenesis. Twenty SNPs from the HOXA and HOXD gene clusters and 12 SNPs in IGFBP3 were genotyped in a sample composed of nonHispanic white and Hispanic multiplex and simplex families (discovery samples) and a second sample of nonHispanic white simplex trios (validation sample). Four SNPs (rs6668, rs2428431, rs3801776 and rs3779456) in the HOXA cluster demonstrated altered transmission in the discovery sample, but only rs3801776, located in the HOXA basal promoter region, showed altered transmission in both the discovery and validation samples (p=0.004 and p=0.028). Interestingly, HOXA9 is expressed in muscle during development. A SNP in IGFBP3, rs13223993, also showed altered transmission (p=0.003) in the discovery sample. Gene-gene interactions were identified between variants in HOXA, HOXD and IGFBP3 and with previously associated SNPs in mitochondrial-mediated apoptotic genes. The most significant interactions were found between CASP3 SNPs and variants in HOXA, HOXD and IGFBP3. These results suggest a biologic model for clubfoot in which perturbation of HOX and apoptotic genes together affects muscle and limb development, which may cause the downstream failure of limb rotation into a plantar grade position. PMID:19938081

  18. A comprehensive meta-analysis of genetic associations between five key SNPs and colorectal cancer risk

    PubMed Central

    Li, Wei; Liu, Dahai; He, Kan

    2016-01-01

    Genome-wide association studies (GWAS) on colorectal cancer (CRC) have identified dozens of single nucleotide polymorphisms (SNPs) in more than 19 independent loci associated with CRC. Due to the heterogeneity of the studied subjects and the contrary results, it is challenging to verify the certainty of the association between these loci and CRC. We conducted a critical review of the published studies of SNPs associated with CRC. The five most frequently reported SNPs, which are rs6983267/8q24.21, rs4939827/18q21.1, rs10795668/10p14, rs4444235/14q22.2 and rs4779584/15q13.3, were selected for the current study from the qualified studies. Then meta-analyses based on larger sample sizes, with an average of 33,000 CRC cases and 34,000 controls, were performed to assess the association between SNPs and CRC risk. Heterogeneity among studies and publication bias were assessed by the χ²-based Q statistic test and by Begg's funnel plot or Egger's test, respectively. Our meta-analysis confirmed significant associations of the five SNPs with CRC risk under different genetic models. Two risk variants at rs6983267 (Odds Ratio (OR) 1.388, 95% Confidence Interval (CI) 1.180-1.8633) and rs10795668 (OR 1.323, 95% CI 1.062-1.648) had the highest ORs in homogeneous model. The ORs of the other three variants at rs4939827 (OR 1.298, 95% CI 1.135-1.483), rs4779584 (OR 1.261, 95% CI 1.146-1.386) and rs4444235 (OR 1.160, 95% CI 1.106-1.216) were also statistically significant. Sensitivity analyses and publication bias assessment indicated the robust stability and reliability of the results. PMID:27661122
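
    As a reading aid for the odds ratios and confidence intervals quoted above, the following Python sketch recovers an approximate standard error and z statistic from a reported OR and its 95% CI. It assumes a Wald-type interval that is symmetric on the log scale, which the abstract does not state; the function name `z_from_or_ci` is hypothetical and this is not the authors' meta-analysis procedure.

```python
import math

def z_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE of log(OR) and the z statistic from a 95% CI (Wald assumption)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return se, z

# Values quoted in the abstract for rs10795668.
se, z = z_from_or_ci(1.323, 1.062, 1.648)
```

    Under this assumption the z statistic for rs10795668 is about 2.5, which exceeds the two-sided 5% critical value of 1.96 and agrees with the interval excluding 1.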

  19. Counting digital filter

    NASA Technical Reports Server (NTRS)

    Zohar, S.

    1977-01-01

    Overall design of filter combines radix converter with ADC in single functional unit that directly converts analog input to its negative binary representation. Four basic elements of filter are fixed register, shift register, counter, and accumulator.

  20. Bag filters for TPP

    SciTech Connect

    L.V. Chekalov; Yu.I. Gromov; V.V. Chekalov

    2007-05-15

    Cleaning of TPP flue gases with bag filters capable of pulsed regeneration is examined. A new filtering element with a three-dimensional filtering material formed from a needle-broached cloth in which the filtration area, as compared with a conventional smooth bag, is increased by more than two times, is proposed. The design of a new FRMI type of modular filter is also proposed. A standard series of FRMI filters with a filtration area ranging from 800 to 16,000 m² is designed for an output of more than 1 million m³/h of cleaned gas. The new bag filter permits dry collection of sulfur oxides from waste gases at TPP operating on high-sulfur coals. The design of the filter makes it possible to replace filter elements without taking the entire unit out of service.

  1. MST Filterability Tests

    SciTech Connect

    Poirier, M. R.; Burket, P. R.; Duignan, M. R.

    2015-03-12

    The Savannah River Site (SRS) is currently treating radioactive liquid waste with the Actinide Removal Process (ARP) and the Modular Caustic Side Solvent Extraction Unit (MCU). The low filter flux through the ARP has limited the rate at which radioactive liquid waste can be treated. Recent filter flux has averaged approximately 5 gallons per minute (gpm). Salt Batch 6 has had a lower processing rate and required frequent filter cleaning. Savannah River Remediation (SRR) has a desire to understand the causes of the low filter flux and to increase ARP/MCU throughput. In addition, at the time the testing started, SRR was assessing the impact of replacing the 0.1 micron filter with a 0.5 micron filter. This report describes testing of MST filterability to investigate the impact of filter pore size and MST particle size on filter flux and testing of filter enhancers to attempt to increase filter flux. The authors constructed a laboratory-scale crossflow filter apparatus with two crossflow filters operating in parallel. One filter was a 0.1 micron Mott sintered SS filter and the other was a 0.5 micron Mott sintered SS filter. The authors also constructed a dead-end filtration apparatus to conduct screening tests with potential filter aids and body feeds, referred to as filter enhancers. The original baseline for ARP was 5.6 M sodium salt solution with a free hydroxide concentration of approximately 1.7 M. ARP has been operating with a sodium concentration of approximately 6.4 M and a free hydroxide concentration of approximately 2.5 M. SRNL conducted tests varying the concentration of sodium and free hydroxide to determine whether those changes had a significant effect on filter flux. The feed slurries for the MST filterability tests were composed of simple salts (NaOH, NaNO2, and NaNO3) and MST (0.2 – 4.8 g/L). The feed slurry for the filter enhancer tests contained simulated salt batch 6 supernate, MST, and filter enhancers.

  2. Survey of digital filtering

    NASA Technical Reports Server (NTRS)

    Nagle, H. T., Jr.

    1972-01-01

    A three part survey is made of the state-of-the-art in digital filtering. Part one presents background material including sampled data transformations and the discrete Fourier transform. Part two, digital filter theory, gives an in-depth coverage of filter categories, transfer function synthesis, quantization and other nonlinear errors, filter structures and computer aided design. Part three presents hardware mechanization techniques. Implementations by general purpose, mini-, and special-purpose computers are presented.

  3. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.

  4. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    PubMed Central

    Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses. PMID:22013517

  5. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island

    PubMed Central

    Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A.; Shouche, Yogesh S.; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1–40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1–20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25–40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity. PMID:26066038

  6. Multiple imputation of missing covariates in NONMEM and evaluation of the method's sensitivity to η-shrinkage.

    PubMed

    Johansson, Åsa M; Karlsson, Mats O

    2013-10-01

    Multiple imputation (MI) is an approach widely used in statistical analysis of incomplete data. However, its application to missing data problems in nonlinear mixed-effects modelling is limited. The objective was to implement a four-step MI method for handling missing covariate data in NONMEM and to evaluate the method's sensitivity to η-shrinkage. Four steps were needed; (1) estimation of empirical Bayes estimates (EBEs) using a base model without the partly missing covariate, (2) a regression model for the covariate values given the EBEs from subjects with covariate information, (3) imputation of covariates using the regression model and (4) estimation of the population model. Steps (3) and (4) were repeated several times. The procedure was automated in PsN and is now available as the mimp functionality ( http://psn.sourceforge.net/ ). The method's sensitivity to shrinkage in EBEs was evaluated in a simulation study where the covariate was missing according to a missing at random type of missing data mechanism. The η-shrinkage was increased in steps from 4.5 to 54%. Two hundred datasets were simulated and analysed for each scenario. When shrinkage was low the MI method gave unbiased and precise estimates of all population parameters. With increased shrinkage the estimates became less precise but remained unbiased.
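
    The four steps described in this abstract can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the PsN mimp implementation: the EBEs from step (1) are taken as given input, step (2) uses a simple linear regression, and step (4) is represented by a user-supplied `estimate` function standing in for the population-model fit. All function and parameter names are hypothetical, and pooling by a plain average is an assumption.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / max(len(x) - 2, 1)) ** 0.5
    return a, b, sd


def multiple_imputation(ebes, covariates, estimate, n_imputations=20, seed=0):
    """Pool `estimate(completed_covariates)` over several imputed datasets.

    ebes       : one empirical Bayes estimate per subject (step 1, assumed done)
    covariates : covariate value per subject, or None where it is missing
    estimate   : hypothetical stand-in for the population-model fit (step 4)
    """
    rng = random.Random(seed)
    observed = [(e, c) for e, c in zip(ebes, covariates) if c is not None]
    # Step 2: regress the covariate on the EBEs in subjects with covariate data.
    a, b, sd = fit_line([e for e, _ in observed], [c for _, c in observed])
    estimates = []
    for _ in range(n_imputations):
        # Step 3: impute missing covariates from the regression plus residual noise.
        completed = [
            c if c is not None else a + b * e + rng.gauss(0.0, sd)
            for e, c in zip(ebes, covariates)
        ]
        # Step 4: re-estimate on the completed dataset.
        estimates.append(estimate(completed))
    # Steps 3 and 4 were repeated above; pool with a simple mean (assumption).
    return statistics.fmean(estimates)


# Toy example (hypothetical data): the covariate follows 1 + 2*EBE exactly.
pooled_mean = multiple_imputation(
    ebes=[0.0, 1.0, 2.0, 3.0],
    covariates=[1.0, 3.0, 5.0, None],
    estimate=statistics.fmean,
)
```

    A fuller treatment would also draw the regression coefficients for each imputation and combine the estimates and their variances with Rubin's rules; the sketch omits both for brevity.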

  7. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island.

    PubMed

    Kumbhare, Shreyas V; Dhotre, Dhiraj P; Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A; Shouche, Yogesh S; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1-40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1-20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25-40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity.

  8. Nonlinear Attitude Filtering Methods

    NASA Technical Reports Server (NTRS)

    Markley, F. Landis; Crassidis, John L.; Cheng, Yang

    2005-01-01

    This paper provides a survey of modern nonlinear filtering methods for attitude estimation. Early applications relied mostly on the extended Kalman filter for attitude estimation. Since these applications, several new approaches have been developed that have proven to be superior to the extended Kalman filter. Several of these approaches maintain the basic structure of the extended Kalman filter, but employ various modifications in order to provide better convergence or improve other performance characteristics. Examples of such approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters that propagate and update a discrete set of sigma points rather than using linearized equations for the mean and covariance are also reviewed. A two-step approach is discussed with a first-step state that linearizes the measurement model and an iterative second step to recover the desired attitude states. These approaches are all based on the Gaussian assumption that the probability density function is adequately specified by its mean and covariance. Other approaches that do not require this assumption are reviewed, including particle filters and a Bayesian filter based on a non-Gaussian, finite-parameter probability density function on SO(3). Finally, the predictive filter, nonlinear observers and adaptive approaches are shown. The strengths and weaknesses of the various approaches are discussed.

  9. Filtering by nonlinear systems.

    PubMed

    Campos Cantón, E; González Salas, J S; Urías, J

    2008-12-01

    Synchronization of nonlinear systems forced by external signals is formalized as the response of a nonlinear filter. Sufficient conditions for a nonlinear system to behave as a filter are given. Some examples of generalized chaos synchronization are shown to actually be special cases of nonlinear filtering.

  10. The Ribosome Filter Redux

    PubMed Central

    Mauro, Vincent P.; Edelman, Gerald M.

    2010-01-01

    The ribosome filter hypothesis postulates that ribosomes are not simply translation machines but also function as regulatory elements that differentially affect or filter the translation of particular mRNAs. On the basis of new information, we take the opportunity here to review the ribosome filter hypothesis, suggest specific mechanisms of action, and discuss recent examples from the literature that support it. PMID:17890902

  11. HEPA filter encapsulation

    DOEpatents

    Gates-Anderson, Dianne D.; Kidd, Scott D.; Bowers, John S.; Attebery, Ronald W.

    2003-01-01

    A low viscosity resin is delivered into a spent HEPA filter or other waste. The resin is introduced into the filter or other waste using a vacuum to assist in the mass transfer of the resin through the filter media or other waste.

  12. Filter service system

    DOEpatents

    Sellers, Cheryl L.; Nordyke, Daniel S.; Crandell, Richard A.; Tomlins, Gregory; Fei, Dong; Panov, Alexander; Lane, William H.; Habeger, Craig F.

    2008-12-09

    According to an exemplary embodiment of the present disclosure, a system for removing matter from a filtering device includes a gas pressurization assembly. An element of the assembly is removably attachable to a first orifice of the filtering device. The system also includes a vacuum source fluidly connected to a second orifice of the filtering device.

  13. Associations of OCA2-HERC2 SNPs and haplotypes with human pigmentation characteristics in the Brazilian population.

    PubMed

    Andrade, Edilene S; Fracasso, Nádia C A; Strazza Júnior, Paulo S; Simões, Aguinaldo L; Mendes-Junior, Celso T

    2017-01-01

    Panels composed of Single Nucleotide Polymorphisms (SNPs) in genes related to pigmentation, when associated with different phenotypes, may assist in predicting the physical appearance of an individual, being very useful in forensic casework. We evaluated the association of seven OCA2-HERC2 SNPs and haplotypes with pigmentation characteristics (eye, skin, hair and freckles) in the highly admixed and phenotypically heterogeneous Brazilian population. All seven SNPs evaluated presented one allele associated with phenotypes from at least two pigmentation features and the alternative allele associated with the opposite phenotypes from the same trait. The genotypic associations followed the same pattern for all seven SNPs. Nine haplotypes were observed in our sample and eight were associated with at least two pigmentation traits. Such SNPs and haplotypes could be deemed as good predictors for the presence of freckles and for skin, eye and hair pigmentation in the Brazilian population.

  14. Regenerative particulate filter development

    NASA Technical Reports Server (NTRS)

    Descamp, V. A.; Boex, M. W.; Hussey, M. W.; Larson, T. P.

    1972-01-01

    Development, design, and fabrication of a prototype filter regeneration unit for regenerating clean fluid particle filter elements by using a backflush/jet impingement technique are reported. Development tests were also conducted on a vortex particle separator designed for use in zero gravity environment. A maintainable filter was designed, fabricated and tested that allows filter element replacement without any leakage or spillage of system fluid. Also described are spacecraft fluid system design and filter maintenance techniques with respect to inflight maintenance for the space shuttle and space station.

  15. A unified Kalman filter

    NASA Astrophysics Data System (ADS)

    Stubberud, Allen R.

    2017-01-01

    When considering problems of linear sequential estimation, two versions of the Kalman filter, the continuous-time version and the discrete-time version, are often used. (A hybrid filter also exists.) In many applications in which the Kalman filter is used, the system to which the filter is applied is a linear continuous-time system, but the Kalman filter is implemented on a digital computer, a discrete-time device. The two general approaches for developing a discrete-time filter for implementation on a digital computer are: (1) approximate the continuous-time system by a discrete-time system (called discretization of the continuous-time system) and develop a filter for the discrete-time approximation; and (2) develop a continuous-time filter for the system and then discretize the continuous-time filter. Generally, the two discrete-time filters will be different, that is, it can be said that discretization and filter generation are not, in general, commutative operations. As a result, any relationship between the discrete-time and continuous-time versions of the filter for the same continuous-time system is often obfuscated. This is particularly true when an attempt is made to generate the continuous-time version of the Kalman filter through a simple limiting process (the sample period going to zero) applied to the discrete-time version. The correct result is, generally, not obtained. In a 1961 research report, Kalman showed that the continuous-time Kalman filter can be obtained from the discrete-time Kalman filter by taking limits as the sample period goes to zero if the white noise process for the continuous-time version is appropriately defined. Using this basic concept, a discrete-time Kalman filter can be developed for a continuous-time system as follows: (1) discretize the continuous-time system using Kalman's technique; and (2) develop a discrete-time Kalman filter for that discrete-time system. 
    Kalman's results show that the discrete-time filter generated in ...
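
    Approach (1) in this abstract, discretizing the continuous-time system and then building a discrete-time filter for it, can be illustrated for a scalar system. The sketch below assumes the model dx/dt = a*x + w with white-noise intensity q, sampled every T seconds with measurement-noise variance r, and uses the standard exact discretization of the process noise. It is not claimed to be the technique from Kalman's 1961 report, which the abstract does not spell out. All names are hypothetical.

```python
import math


def discretize(a, q, T):
    """Exact discretization of dx/dt = a*x + w for sample period T.

    Returns (phi, qd): the state transition factor and the discrete
    process-noise variance. For a == 0 the integral reduces to q*T.
    """
    phi = math.exp(a * T)
    if a == 0:
        qd = q * T
    else:
        qd = q * (math.exp(2 * a * T) - 1) / (2 * a)
    return phi, qd


def kalman_step(x, p, z, phi, qd, r):
    """One predict/update cycle of a scalar discrete-time Kalman filter."""
    # Predict: propagate the estimate and its variance through the model.
    x_pred = phi * x
    p_pred = phi * p * phi + qd
    # Update: blend the prediction with measurement z using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1 - gain) * p_pred
    return x_new, p_new


# Toy run (hypothetical numbers): a random-walk state (a = 0) sampled at T = 0.5.
phi, qd = discretize(a=0.0, q=2.0, T=0.5)
x_est, p_est = kalman_step(x=0.0, p=1.0, z=2.0, phi=phi, qd=qd, r=2.0)
```

    As T shrinks, `qd` approaches q*T, which gives a feel for why the way the continuous-time white-noise process is defined matters when taking the limit.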

  16. Developmental validation of a custom panel including 273 SNPs for forensic application using Ion Torrent PGM.

    PubMed

    Zhang, Suhua; Bian, Yingnan; Chen, Anqi; Zheng, Hancheng; Gao, Yuzhen; Hou, Yiping; Li, Chengtao

    2017-03-01

    Utilizing massively parallel sequencing (MPS) technology for SNP testing in forensic genetics is becoming attractive because of the shortcomings of STR markers, such as their high mutation rates and disadvantages associated with the current PCR-CE method as well as its limitations regarding multiplex capabilities. MPS offers the potential to genotype hundreds to thousands of SNPs from multiple samples in a single experimental run. In this study, we designed a customized SNP panel that includes 273 forensically relevant identity SNPs chosen from SNPforID, IISNP, and the HapMap database as well as previous related studies and evaluated the levels of genotyping precision, sequence coverage, sensitivity and SNP performance using the Ion Torrent PGM. In a concordance study of the custom MPS-SNP panel, only four MPS calls were missing due to coverage reads that were too low (<20), whereas the others were fully concordant with Sanger sequencing results across the two control samples, that is, 9947A and 9948. The analyses indicated a balanced coverage among the included loci, with the exception of the 16 SNPs that were used to detect an inconsistent allele balance and/or lower coverage reads among 50 tested individuals from the Chinese Han population and the above controls. With the exception of the 16 poorly performing SNPs, the sequence coverage obtained was extensive for the bulk of the SNPs, and only three Y-SNPs (rs16980601, rs11096432, rs3900) showed a mean coverage below 1000. Analyses of the dilution series of control DNA 9948 yielded reproducible results down to 1 ng of DNA input. In addition, we provide an analysis tool for automated data quality control and genotyping checks, and we conclude that the SNP targets are polymorphic and independent in the Chinese Han population. In summary, the evaluation of the sensitivity, accuracy and genotyping performance provides strong support for the application of MPS technology in forensic SNP analysis, and the assay ...

  17. Ceramic fiber filter technology

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1996-06-01

    Fibrous filters have been used for centuries to protect individuals from dust, disease, smoke, and other gases or particulates. In the 1970s and 1980s ceramic filters were developed for filtration of hot exhaust gases from diesel engines. Tubular, or candle, filters have been made to remove particles from gases in pressurized fluidized-bed combustion and gasification-combined-cycle power plants. Very efficient filtration is necessary in power plants to protect the turbine blades. The limited lifespan of ceramic candle filters has been a major obstacle in their development. The present work is focused on forming fibrous ceramic filters using a papermaking technique. These filters are highly porous and therefore very lightweight. The papermaking process consists of filtering a slurry of ceramic fibers through a steel screen to form paper. Papermaking and the selection of materials will be discussed, as well as preliminary results describing the geometry of papers and relative strengths.

  18. Compact planar microwave blocking filters

    NASA Technical Reports Server (NTRS)

    U-Yen, Kongpop (Inventor); Wollack, Edward J. (Inventor)

    2012-01-01

    A compact planar microwave blocking filter includes a dielectric substrate and a plurality of filter unit elements disposed on the substrate. The filter unit elements are interconnected in a symmetrical series cascade with filter unit elements being organized in the series based on physical size. In the filter, a first filter unit element of the plurality of filter unit elements includes a low impedance open-ended line configured to reduce the shunt capacitance of the filter.

  19. Collective effects of SNPs on transgenerational inheritance in Caenorhabditis elegans and budding yeast.

    PubMed

    Zhu, Zuobin; Man, Xian; Xia, Mengying; Huang, Yimin; Yuan, Dejian; Huang, Shi

    2015-07-01

    We studied the collective effects of single nucleotide polymorphisms (SNPs) on transgenerational inheritance in Caenorhabditis elegans recombinant inbred advanced intercross lines (RIAILs) and yeast segregants. We divided the RIAILs and segregants into two groups of high and low minor allele content (MAC). RIAILs with higher MAC needed fewer generations of benzaldehyde training to gain a stable olfactory imprint and showed a greater change from normal after benzaldehyde training. Yeast segregants with higher MAC showed a more dramatic shortening of the lag phase length after ethanol exposure. The short lag phase as acquired by ethanol training was more dramatically lost after recovery in ethanol-free medium for the high MAC group. We also found a preferential association between MAC and traits linked with a higher number of additive QTLs. These results suggest a role for the collective effects of SNPs in transgenerational inheritance, and may help explain human variations in disease susceptibility.
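
    To make the grouping by minor allele content concrete, the sketch below computes a MAC value per line. It takes MAC to be the fraction of scored SNPs at which a line carries the less common allele in the panel; this definition is an assumption, since the abstract does not define MAC. It further assumes homozygous lines with one allele call per SNP and that every SNP is polymorphic in the panel. The names and toy genotypes are hypothetical.

```python
from collections import Counter


def minor_allele_content(genotypes):
    """Return {line_name: MAC} for homozygous lines.

    genotypes : dict mapping line name -> list of allele calls, one per SNP.
    Assumes every SNP shows at least two alleles in the panel; if two alleles
    are tied in frequency, the one seen first is treated as minor.
    """
    names = list(genotypes)
    n_snps = len(genotypes[names[0]])
    minor = []
    for s in range(n_snps):
        counts = Counter(genotypes[name][s] for name in names)
        # The least frequent allele in the panel is the minor allele at this SNP.
        minor.append(min(counts, key=counts.get))
    return {
        name: sum(call == m for call, m in zip(genotypes[name], minor)) / n_snps
        for name in names
    }


# Toy panel (hypothetical): five lines scored at four SNPs.
toy_panel = {
    "L1": ["A", "A", "T", "T"],
    "L2": ["A", "G", "T", "T"],
    "L3": ["C", "G", "T", "T"],
    "L4": ["A", "G", "C", "G"],
    "L5": ["A", "G", "T", "T"],
}
toy_mac = minor_allele_content(toy_panel)
```

    Lines could then be split into high and low MAC groups, for example at the median of `toy_mac.values()`; the cut-off used in the study is not given in the abstract.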

  20. SNPs: At the origins of the databases of an innovative biotechnology tool.

    PubMed

    Corfield, Anthony; Meyer, Peter; Kassam, Shelina; Mikuz, Gregor; Sergi, Consolato

    2010-01-01

    The discovery that DNA sequence variations can influence the response of an individual to a drug or can predict the outcome of a disease has added a new dimension to evidence-based medicine. It is clear that the goals, risks, and benefits of drug therapy can be better assessed if the underlying genome of the patient is known. The relevance of identifying patients at increased risk of adverse drug reactions, the application of genomic technologies to drug development and the clarification of the mechanisms of drug action on cells will be important targets in the therapeutic approach to medicine in the 21st century. In this review, we summarize the development of single nucleotide polymorphisms (SNPs) and give computational biological data for SNPs databases.

  1. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  2. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...

  3. Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...

  4. Five known tagging DLL3 SNPs are not associated with congenital scoliosis

    PubMed Central

    Yang, Yong; Wang, Bing-Qiang; Wu, Zhi-Hong; Zhang, Hai-Yan; Qiu, Gui-Xing; Shen, Jian-Xiong; Zhang, Jian-Guo; Zhao, Yu; Wang, Yi-Peng; Fei, Qi

    2016-01-01

    The genetic etiology hypothesis is widely accepted in the development of congenital scoliosis (CS). The delta-like 3 (DLL3) gene, a member of the Notch signaling pathway, has been implicated in human CS. In this study, a case–control association study was conducted to determine the association of single nucleotide polymorphisms (SNPs) in the DLL3 gene with CS in a Chinese Han population. Five known tagging SNPs of the DLL3 gene were genotyped among 270 Chinese Han subjects (128 nonsyndromic CS patients and 142 matched controls). CS patients were divided into 3 types: type I, failure of formation (29 cases); type II, failure of segmentation (50 cases); and type III, mixed defects (49 cases). The 5 SNPs were analyzed by the allelic and genotypic association analysis, genotype–phenotype association analysis, and haplotype analysis. Allele frequencies of 5 tagging SNPs (SNP1: rs1110627, SNP2: rs3212276, SNP3: rs2304223, SNP4: rs2304222, and SNP5: rs2304214) in CS cases and controls were comparable and there were no available inheritance models. The SNPs were not associated with clinical phenotypes. Moreover, the 5 markers in the DLL3 gene were found to be in strong linkage disequilibrium (LD). Both global haplotype and individual haplotype analyses showed that the haplotypes of SNP1/SNP2/SNP3/SNP4/SNP5 did not correlate with the disease (P >0.05). Together, these data suggest that genetic variants of the DLL3 gene are not associated with CS in the Chinese Han population. PMID:27472720

  5. Rare SNPs in receptor tyrosine kinases are negative outcome predictors in multiple myeloma

    PubMed Central

    Langer, Christian; Knop, Stefan; Pischimarov, Jordan; Kull, Miriam; Stühmer, Thorsten; Steinbrunn, Torsten; Bargou, Ralf; Einsele, Hermann; Rosenwald, Andreas; Leich, Ellen

    2016-01-01

    Multiple myeloma (MM) is a plasma cell disorder that is characterized by a great genetic heterogeneity. Recent next generation sequencing studies revealed an accumulation of tumor-associated mutations in receptor tyrosine kinases (RTKs) which may also contribute to the activation of survival pathways in MM. To investigate the clinical role of RTK-mutations in MM, we deep-sequenced the coding DNA-sequence of EGFR, EPHA2, ERBB3, IGF1R, NTRK1 and NTRK2 which were previously found to be mutated in MM, in 75 uniformly treated MM patients of the “Deutsche Studiengruppe Multiples Myelom”. Subsequently, we correlated the detected mutations with common cytogenetic alterations and clinical parameters. We identified 11 novel non-synonymous SNVs or rare patient-specific SNPs, not listed in the SNP databases 1000 genomes and dbSNP, in 10 primary MM cases. The mutations predominantly affected the tyrosine-kinase and ligand-binding domains and no correlation with cytogenetic parameters was found. Interestingly, however, patients with RTK-mutations, specifically those with rare patient-specific SNPs, showed a significantly lower overall, event-free and progression-free survival. This indicates that RTK SNVs and rare patient-specific RTK SNPs are of prognostic relevance and suggests that MM patients with RTK-mutations could potentially profit from treatment with RTK-inhibitors. PMID:27246973

  6. Impact of Single Nucleotide Polymorphisms (SNPs) on Immunosuppressive Therapy in Lung Transplantation

    PubMed Central

    Ruiz, Jesus; Herrero, María José; Bosó, Virginia; Megías, Juan Eduardo; Hervás, David; Poveda, Jose Luis; Escrivá, Juan; Pastor, Amparo; Solé, Amparo; Aliño, Salvador Francisco

    2015-01-01

    Lung transplant patients present important variability in immunosuppressant blood concentrations during the first months after transplantation. Pharmacogenetics could explain part of this interindividual variability. We evaluated SNPs in genes that have previously shown correlations in other kinds of solid organ transplantation, namely ABCB1 and CYP3A5 genes with tacrolimus (Tac) and ABCC2, UGT1A9 and SLCO1B1 genes with mycophenolic acid (MPA), during the first six months after lung transplantation (51 patients). The genotype was correlated to the trough blood drug concentrations corrected for dose and body weight (C0/Dc). The ABCB1 variant in rs1045642 was associated with significantly higher Tac concentration, at six months post-transplantation (CT vs. CC). In the MPA analysis, CT patients in ABCC2 rs3740066 presented significantly lower blood concentrations than CC or TT, three months after transplantation. Other tendencies, confirming previously expected results, were found associated with the rest of studied SNPs. An interesting trend was recorded for the incidence of acute rejection according to NOD2/CARD15 rs2066844 (CT: 27.9%; CC: 12.5%). Relevant SNPs related to Tac and MPA in other solid organ transplants also seem to be related to the efficacy and safety of treatment in the complex setting of lung transplantation. PMID:26307985

  7. Association of SNPs in the PPARγ gene and hypertension in a Mongolian population.

    PubMed

    Yang, L; Tian, R G; Chang, P Y; Yan, M R; Su, X L

    2015-12-29

    The association of single nucleotide polymorphisms (SNPs) in PPARγ with hypertension is controversial. The aim of the present study was to clarify the contributions of PPARγ genetic variants to hypertension through an association study. A total of 414 unrelated Mongolian herdsmen and 524 Han farmers were included in this study. Fourteen intronic SNPs were analyzed and genotyped using a polymerase chain reaction/ligase detection reaction assay. Prior to correction for multiple testing, the SNPs rs6802898 and rs12633551 were significantly associated with the prevalence of hypertension in the Han and Mongolian populations, respectively. The genetic association of each SNP with hypertension was individually tested using logistic regression. The SNP rs6802898 was associated with hypertension in both dominant (P = 0.033) and additive models (P = 0.026) in the Han population, whereas the SNP rs12633551 was associated with hypertension in both dominant (P = 0.014) and additive models (P = 0.0073) in the Mongolian population. Moreover, SNP rs12633551 had a significant effect on systolic and diastolic blood pressure response. However, none of these associations were statistically significant after Bonferroni correction for multiple testing, although there was a significant difference among the haplotypes in the Han and Mongolian populations. Interestingly, there was an association of the PPARγ haplotypes with hypertension even after Bonferroni correction. Thus, determination of the PPARγ haplotypes in different populations may prove informative for assessment of the genetic risk for hypertension.
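
    The effect of the Bonferroni correction reported in this abstract can be checked with a few lines of Python. The sketch below assumes the correction is applied across the 14 genotyped SNPs at a family-wise level of 0.05; the abstract does not state the exact number of tests, so `n_tests=14` is an assumption, and the function name is hypothetical. The P values are those quoted in the abstract.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag each P value as significant if it falls below alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p < threshold for name, p in p_values.items()}


# P values quoted in the abstract, keyed by SNP and genetic model.
reported = {
    "rs6802898 dominant (Han)": 0.033,
    "rs6802898 additive (Han)": 0.026,
    "rs12633551 dominant (Mongolian)": 0.014,
    "rs12633551 additive (Mongolian)": 0.0073,
}
# Assumed number of tests: one per genotyped SNP (14), giving a threshold of about 0.0036.
flags = bonferroni_significant(reported, n_tests=14)
```

    Under this assumption every entry in `flags` is False, which matches the abstract's statement that none of the single-SNP associations remained significant after correction.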

  8. RNAsnp: Efficient Detection of Local RNA Secondary Structure Changes Induced by SNPs

    PubMed Central

    Sabarinathan, Radhakrishnan; Tafer, Hakim; Seemann, Stefan E; Hofacker, Ivo L; Stadler, Peter F; Gorodkin, Jan

    2013-01-01

    Structural characteristics are essential for the functioning of many noncoding RNAs and cis-regulatory elements of mRNAs. SNPs may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into structural effects of SNPs. The global measures employed so far suffer from limited accuracy of folding programs on large RNAs and are computationally too demanding for genome-wide applications. Here, we present a strategy that focuses on the local regions of maximal structural change between mutant and wild-type. These local regions are approximated in a “screening mode” that is intended for genome-wide applications. Furthermore, localized regions are identified as those with maximal discrepancy. The mutation effects are quantified in terms of empirical P values. To this end, the RNAsnp software uses extensive precomputed tables of the distribution of SNP effects as function of length and GC content. RNAsnp thus achieves both a noise reduction and speed-up of several orders of magnitude over shuffling-based approaches. On a data set comprising 501 SNPs associated with human-inherited diseases, we predict 54 to have significant local structural effect in the untranslated region of mRNAs. RNAsnp is available at http://rth.dk/resources/rnasnp. PMID:23315997

  9. PrimerZ: streamlined primer design for promoters, exons and human SNPs.

    PubMed

    Tsai, Ming-Fang; Lin, Yi-Jung; Cheng, Yu-Chang; Lee, Kuo-Hsi; Huang, Cheng-Chih; Chen, Yuan-Tsong; Yao, Adam

    2007-07-01

    PrimerZ (http://genepipe.ngc.sinica.edu.tw/primerz/) is a web application dedicated primarily to primer design for genes and human SNPs. PrimerZ accepts genes by gene name or Ensembl accession code, and SNPs by dbSNP rs or AFFY_Probe IDs. The promoter and exon sequence information of all gene transcripts fetched from the Ensembl database (http://www.ensembl.org) are processed before being passed on to Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) for individual primer design. All results returned from Primer3 are organized and integrated in a specially designed web page for easy browsing. Besides the web page presentation, csv text file export is also provided for enhanced user convenience. PrimerZ automates highly standard but tedious gene primer design to improve the success rate of PCR experiments. More than 2000 primers have been designed with PrimerZ at our institute since 2004 and the success rate is over 70%. The addition of several new features has made PrimerZ even more useful to the research community in facilitating primer design for promoters, exons and SNPs.

  10. Y-chromosomal SNPs in Finno-Ugric-speaking populations analyzed by minisequencing on microarrays.

    PubMed

    Raitio, M; Lindroos, K; Laukkanen, M; Pastinen, T; Sistonen, P; Sajantila, A; Syvänen, A C

    2001-03-01

    An increasing number of single nucleotide polymorphisms (SNPs) on the Y chromosome are being identified. To utilize the full potential of the SNP markers in population genetic studies, new genotyping methods with high throughput are required. We describe a microarray system based on the minisequencing single nucleotide primer extension principle for multiplex genotyping of Y-chromosomal SNP markers. The system was applied for screening a panel of 25 Y-chromosomal SNPs in a unique collection of samples representing five Finno-Ugric populations. The specific minisequencing reaction provides 5-fold to infinite discrimination between the Y-chromosomal genotypes, and the microarray format of the system allows parallel and simultaneous analysis of large numbers of SNPs and samples. In addition to the SNP markers, five Y-chromosomal microsatellite loci were typed. Altogether 10,000 genotypes were generated to assess the genetic diversity in these population samples. Six of the 25 SNP markers (M9, Tat, SRY10831, M17, M12, 92R7) were polymorphic in the analyzed populations, yielding six distinct SNP haplotypes. The microsatellite data were used to study the genetic structure of two major SNP haplotypes in the Finns and the Saami in more detail. We found that the most common haplotypes are shared between the Finns and the Saami, and that the SNP haplotypes show regional differences within the Finns and the Saami, which supports the hypothesis of two separate settlement waves to Finland.

  11. Functional classification of 15 million SNPs detected from diverse chicken populations

    PubMed Central

    Gheyas, Almas A.; Boschiero, Clarissa; Eory, Lel; Ralph, Hannah; Kuo, Richard; Woolliams, John A.; Burt, David W.

    2015-01-01

    Next-generation sequencing has prompted a surge of discovery of millions of genetic variants from vertebrate genomes. Besides applications in genetic association and linkage studies, a fraction of these variants will have functional consequences. This study describes detection and characterization of 15 million SNPs from chicken genome with the goal to predict variants with potential functional implications (pfVars) from both coding and non-coding regions. The study reports: 183K amino acid-altering SNPs of which 48% predicted as evolutionary intolerant, 13K splicing variants, 51K likely to alter RNA secondary structures, 500K within most conserved elements and 3K from non-coding RNAs. Regions of local fixation within commercial broiler and layer lines were investigated as potential selective sweeps using genome-wide SNP data. Relationships with phenotypes, if any, of the pfVars were explored by overlaying the sweep regions with known QTLs. Based on this, the candidate genes and/or causal mutations for a number of important traits are discussed. Although the fixed variants within sweep regions were enriched with non-coding SNPs, some non-synonymous-intolerant mutations reached fixation, suggesting their possible adaptive advantage. The results presented in this study are expected to have important implications for future genomic research to identify candidate causal mutations and in poultry breeding. PMID:25926514

  12. SNPs of melanocortin 4 receptor (MC4R) associated with body weight in Beagle dogs.

    PubMed

    Zeng, Ruixia; Zhang, Yibo; Du, Peng

    2014-01-01

    Melanocortin 4 receptor (MC4R), which is associated with inherited human obesity, is involved in food intake and body weight of mammals. To study the relationships between MC4R gene polymorphism and body weight in Beagle dogs, we detected and compared the nucleotide sequence of the whole coding region and 3'- and 5'-flanking regions of the dog MC4R gene (1214 bp). In 120 Beagle dogs, two SNPs (A420C, C895T) were identified and their relation with body weight was analyzed with the RFLP-PCR method. The results showed that the SNP at A420C, which changes amino acid 101 of the MC4R protein from asparagine to threonine, was significantly associated with the canine body weight trait, while body weight variation associated with the MC4R nonsense mutation at C895T was significant in female dogs. These results suggest that the two SNPs might affect the function of the MC4R gene as it relates to body weight in Beagle dogs. Therefore, MC4R is a candidate gene for selecting dogs of different sizes, with the MC4R SNPs (A420C, C895T) being potentially valuable as genetic markers.

  13. Loss and Gain of Human Acidic Mammalian Chitinase Activity by Nonsynonymous SNPs

    PubMed Central

    Okawa, Kazuaki; Ohno, Misa; Kashimura, Akinori; Kimura, Masahiro; Kobayashi, Yuki; Sakaguchi, Masayoshi; Sugahara, Yasusato; Kamaya, Minori; Kino, Yoshihiro; Bauer, Peter O.; Oyama, Fumitaka

    2016-01-01

    Acidic mammalian chitinase (AMCase) is implicated in asthma, allergic inflammation, and food processing. Little is known about genetic and evolutional regulation of chitinolytic activity of AMCase. Here, we relate human AMCase polymorphisms to the mouse AMCase, and show that the highly active variants encoded by nonsynonymous single-nucleotide polymorphisms (nsSNPs) are consistent with the mouse AMCase sequence. The chitinolytic activity of the recombinant human AMCase was significantly lower than that of the mouse counterpart. By creating mouse-human chimeric AMCase protein we found that the presence of the N-terminal region of human AMCase containing conserved active site residues reduced the enzymatic activity of the molecule. We were able to significantly increase the activity of human AMCase by amino acid substitutions encoded by nsSNPs (N45, D47, and R61) with those conserved in the mouse homologue (D45, N47, and M61). For abolition of the mouse AMCase activity, introduction of M61R mutation was sufficient. M61 is conserved in most of primates other than human and orangutan as well as in other mammals. Orangutan has I61 substitution, which also markedly reduced the activity of the mouse AMCase, indicating that the M61 is a crucial residue for the chitinolytic activity. Altogether, our data suggest that human AMCase has lost its chitinolytic activity by integration of nsSNPs during evolution and that the enzyme can be reactivated by introducing amino acids conserved in the mouse counterpart. PMID:27702777

  14. Do SNPs of DRD4 gene predict adult persistence of ADHD in a Chinese sample?

    PubMed

    Li, Yueling; Baker-Ericzen, Mary; Ji, Ning; Chang, Weili; Guan, Lili; Qian, Qiujin; Zhang, Yujuan; Faraone, Stephen V; Wang, Yufeng

    2013-01-30

    The dopamine D4 receptor (DRD4) gene has been frequently studied in relation to attention deficit hyperactivity disorder (ADHD) but little is known about the contribution of single nucleotide polymorphisms (SNPs) of the DRD4 gene to the development and persistence of ADHD. In the present study, we examined the association between two SNPs in DRD4 (rs1800955, rs916455) and adult ADHD persistence in a Chinese sample. Subjects (n=193) were diagnosed with ADHD in childhood and reassessed in young adulthood at an affiliated clinic of Peking University Sixth Hospital. Kaplan-Meier survival analyses and Cox proportional hazard models were used to test the association between ADHD remission and alleles of the two SNPs. DRD4 rs916455 C allele carriers were more likely to have persistent ADHD symptoms in adulthood. No significant association was found between rs1800955 allele and the course of ADHD. These newly detected associations between DRD4 polymorphisms and ADHD prognosis in adulthood may help to predict the persistence of childhood ADHD into adulthood.

  15. Impact of Single Nucleotide Polymorphisms (SNPs) on Immunosuppressive Therapy in Lung Transplantation.

    PubMed

    Ruiz, Jesus; Herrero, María José; Bosó, Virginia; Megías, Juan Eduardo; Hervás, David; Poveda, Jose Luis; Escrivá, Juan; Pastor, Amparo; Solé, Amparo; Aliño, Salvador Francisco

    2015-08-25

    Lung transplant patients present important variability in immunosuppressant blood concentrations during the first months after transplantation. Pharmacogenetics could explain part of this interindividual variability. We evaluated SNPs in genes that have previously shown correlations in other kinds of solid organ transplantation, namely ABCB1 and CYP3A5 genes with tacrolimus (Tac) and ABCC2, UGT1A9 and SLCO1B1 genes with mycophenolic acid (MPA), during the first six months after lung transplantation (51 patients). The genotype was correlated to the trough blood drug concentrations corrected for dose and body weight (C0/Dc). The ABCB1 variant in rs1045642 was associated with significantly higher Tac concentration, at six months post-transplantation (CT vs. CC). In the MPA analysis, CT patients in ABCC2 rs3740066 presented significantly lower blood concentrations than CC or TT, three months after transplantation. Other tendencies, confirming previously expected results, were found associated with the rest of studied SNPs. An interesting trend was recorded for the incidence of acute rejection according to NOD2/CARD15 rs2066844 (CT: 27.9%; CC: 12.5%). Relevant SNPs related to Tac and MPA in other solid organ transplants also seem to be related to the efficacy and safety of treatment in the complex setting of lung transplantation.

  16. 5-HT2A SNPs and the Temperament and Character Inventory.

    PubMed

    Serretti, Alessandro; Calati, Raffaella; Giegling, Ina; Hartmann, Annette M; Möller, Hans-Jürgen; Colombo, Cristina; Rujescu, Dan

    2007-08-15

    Temperamental traits, the most basic part of personality, have been largely correlated with neurotransmitter systems and are under genetic control. Among serotonin candidates, the 2A receptor (5-HT(2A)) received considerable attention. We analyzed four SNPs (rs643627, rs594242, rs6311 and rs6313) in the 5-HT(2A) gene and their association with personality traits, as measured with the Temperament and Character Inventory (TCI). The sample was composed of three sub-groups: two German sub-samples, consisting of a healthy group of 289 subjects (42.6% males, mean age: 45.2+/-14.9) and a psychiatric patient group of 111 suicide attempters (38.7% males, mean age: 39.2+/-13.6), and an Italian sub-sample, composed of 60 mood disorder patients (35.0% males, mean age: 44.0+/-14.8). Controlling for sex, age and educational level, the SNPs were not strongly associated with personality dimensions. Only the rs594242 showed an association with Self-Directedness (p=0.003) in the German sample, while rs6313 was marginally associated with Novelty Seeking (p=0.01) in the Italian sample. We conclude that 5-HT(2A) SNPs may marginally modulate personality traits but further studies are required.

  17. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments

    PubMed Central

    Taylor, Ben; Delaney, Aidan J.; Soares, Jorge; Seemann, Torsten; Keane, Jacqueline A.; Harris, Simon R.

    2016-01-01

    Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory inefficient and are installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open source license GNU GPL version 3. PMID:28348851
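
    The core task described above, finding the alignment columns that vary between samples, can be illustrated with a short sketch. This is not the SNP-sites implementation (which is written in C) and makes no claim about its internals or options; the function names, the choice to ignore `N` and `-`, and the toy alignment are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a dict {sample name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def variable_sites(alignment, ignore="N-"):
    """Return 0-based columns where at least two distinct bases occur.

    Characters listed in `ignore` (assumed here: unknown base and gap)
    do not count as alleles.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have equal length")
    sites = []
    for col in range(length):
        bases = {s[col] for s in seqs} - set(ignore)
        if len(bases) > 1:
            sites.append(col)
    return sites


def snp_alignment(alignment, sites):
    """Keep only the variable columns for every sample."""
    return {name: "".join(seq[i] for i in sites) for name, seq in alignment.items()}


# Toy alignment, invented for this example.
EXAMPLE = """>s1
ACGTAC
>s2
ACGTAT
>s3
AGGTA-
"""

if __name__ == "__main__":
    aln = parse_fasta(EXAMPLE)
    sites = variable_sites(aln)
    print(sites)
    print(snp_alignment(aln, sites))
```

    The sketch holds every sequence in memory, which is the kind of cost a tool built for multi-gigabyte alignments would need to avoid; it is meant only to show what "extracting SNPs" means at the column level.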

  18. Hierarchical analysis of 30 Y-chromosome SNPs in European populations.

    PubMed

    Brion, M; Sobrino, B; Blanco-Verea, A; Lareu, M V; Carracedo, A

    2005-01-01

    Analysis of Y-chromosome haplogroups defined by binary polymorphisms has become a standard approach for studying the origin of modern human populations and for measuring the variability between them. Furthermore, the simplicity and population specificity of binary polymorphisms allows inferences to be drawn about the population origin of any male sample of interest for forensic purposes. From the 245 binary polymorphisms that can be analysed by PCR described in the Y Chromosome Consortium tree, we have selected 30 markers. The set of 30 has been grouped into 4 multiplexes in order to determine the most frequent haplogroups in Europe, using only 1 or 2 multiplexes. In this way, we avoid typing unnecessary SNPs to define the final haplogroup, saving effort and cost, since we only need to type 9 SNPs in the best case and, in the worst case, no more than 17 SNPs to define the haplogroup. The selected method for allele discrimination was a single base extension reaction using the SNaPshot multiplex kit. A total of 292 samples from 8 different districts of Galicia (northwest Spain) were analysed with this strategy. No significant differences were detected among the different districts, except for the population from Marina Lucense, which showed a distant haplogroup frequency but not higher Phi(st) values.

  19. Evaluating the transferability of 15 European-derived fasting plasma glucose SNPs in Mexican children and adolescents

    PubMed Central

    Langlois, Christine; Abadi, Arkan; Peralta-Romero, Jesus; Alyass, Akram; Suarez, Fernando; Gomez-Zamudio, Jaime; Burguete-Garcia, Ana I.; Yazdi, Fereshteh T.; Cruz, Miguel; Meyre, David

    2016-01-01

    Genome wide association studies (GWAS) have identified single-nucleotide polymorphisms (SNPs) that are associated with fasting plasma glucose (FPG) in adult European populations. The contribution of these SNPs to FPG in non-Europeans and children is unclear. We studied the association of 15 GWAS SNPs and a genotype score (GS) with FPG and 7 metabolic traits in 1,421 Mexican children and adolescents from Mexico City. Genotyping of the 15 SNPs was performed using TaqMan Open Array. We used multivariate linear regression models adjusted for age, sex, body mass index standard deviation score, and recruitment center. We identified significant associations between 3 SNPs (G6PC2 (rs560887), GCKR (rs1260326), MTNR1B (rs10830963)), the GS and FPG level. The FPG risk alleles of 11 out of the 15 SNPs (73.3%) displayed significant or non-significant beta values for FPG directionally consistent with those reported in adult European GWAS. The risk allele frequencies for 11 of 15 (73.3%) SNPs differed significantly in Mexican children and adolescents compared to European adults from the 1000G Project, but no significant enrichment in FPG risk alleles was observed in the Mexican population. Our data support a partial transferability of European GWAS FPG association signals in children and adolescents from the admixed Mexican population. PMID:27782183

  20. Rank and Order: Evaluating the Performance of SNPs for Individual Assignment in a Non-Model Organism

    PubMed Central

    Storer, Caroline G.; Pascal, Carita E.; Roberts, Steven B.; Templin, William D.; Seeb, Lisa W.; Seeb, James E.

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: FST, informativeness (In), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from FST, In, and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290

  1. Generic Kalman Filter Software

    NASA Technical Reports Server (NTRS)

    Lisano, Michael E., II; Crues, Edwin Z.

    2005-01-01

    The Generic Kalman Filter (GKF) software provides a standard basis for the development of application-specific Kalman-filter programs. Historically, Kalman filters have been implemented by customized programs that must be written, coded, and debugged anew for each unique application, then tested and tuned with simulated or actual measurement data. Total development times for typical Kalman-filter application programs have ranged from months to weeks. The GKF software can simplify the development process and reduce the development time by eliminating the need to re-create the fundamental implementation of the Kalman filter for each new application. The GKF software is written in the ANSI C programming language. It contains a generic Kalman-filter-development directory that, in turn, contains a code for a generic Kalman filter function; more specifically, it contains a generically designed and generically coded implementation of linear, linearized, and extended Kalman filtering algorithms, including algorithms for state- and covariance-update and -propagation functions. The mathematical theory that underlies the algorithms is well known and has been reported extensively in the open technical literature. Also contained in the directory are a header file that defines generic Kalman-filter data structures and prototype functions and template versions of application-specific subfunction and calling navigation/estimation routine code and headers. Once the user has provided a calling routine and the required application-specific subfunctions, the application-specific Kalman-filter software can be compiled and executed immediately. During execution, the generic Kalman-filter function is called from a higher-level navigation or estimation routine that preprocesses measurement data and post-processes output data. 
The generic Kalman-filter function uses the aforementioned data structures and five implementation-specific subfunctions, which have been developed by the user on
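
    The propagation and update steps that the abstract mentions can be illustrated with a scalar linear Kalman filter. This is a minimal sketch, not the GKF software (which is written in ANSI C) and not its interface; the function names `kalman_step` and `run_filter`, the parameter names, and the default model values are all assumptions chosen for illustration.

```python
def kalman_step(x, p, z, f=1.0, q=0.0, h=1.0, r=1.0):
    """One propagate-then-update cycle of a scalar linear Kalman filter.

    x, p : prior state estimate and its variance
    z    : new measurement
    f, q : state-transition coefficient and process-noise variance
    h, r : measurement coefficient and measurement-noise variance
    """
    # Propagation: push the state and its variance through the model.
    x_pred = f * x
    p_pred = f * p * f + q

    # Update: blend the prediction with the measurement.
    s = h * p_pred * h + r          # innovation variance
    k = p_pred * h / s              # Kalman gain
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


def run_filter(measurements, x0=0.0, p0=1.0, **model):
    """Apply kalman_step to each measurement; return the (x, p) history."""
    x, p = x0, p0
    history = []
    for z in measurements:
        x, p = kalman_step(x, p, z, **model)
        history.append((x, p))
    return history


if __name__ == "__main__":
    # Three identical measurements of a constant quantity (invented data).
    for x, p in run_filter([5.0, 5.0, 5.0]):
        print(round(x, 4), round(p, 4))
```

    In a reusable framework of the kind described, the model terms (`f`, `q`, `h`, `r` here) would be supplied by application-specific code while the step function itself stays generic; the keyword arguments above stand in for that separation.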

  2. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

    PubMed Central

    2013-01-01

    Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482

  3. Optically tunable optical filter

    NASA Astrophysics Data System (ADS)

    James, Robert T. B.; Wah, Christopher; Iizuka, Keigo; Shimotahira, Hiroshi

    1995-12-01

    We experimentally demonstrate an optically tunable optical filter that uses photorefractive barium titanate. With our filter we implement a spectrum analyzer at 632.8 nm with a resolution of 1.2 nm. We simulate a wavelength-division multiplexing system by separating two semiconductor laser diodes, at 1560 nm and 1578 nm, with the same filter. The filter has a bandwidth of 6.9 nm. We also use the same filter to take 2.5-nm-wide slices out of a 20-nm-wide superluminescent diode centered at 840 nm. As a result, we experimentally demonstrate a phenomenal tuning range from 632.8 to 1578 nm with a single filtering device.

  4. Contactor/filter improvements

    DOEpatents

    Stelman, D.

    1988-06-30

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream is described. The filter includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. A cover screen isolates the filter element from contact with the moving granular bed. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses. 6 figs.

  5. Concentric Split Flow Filter

    NASA Technical Reports Server (NTRS)

    Stapleton, Thomas J. (Inventor)

    2015-01-01

    A concentric split flow filter may be configured to remove odor and/or bacteria from pumped air used to collect urine and fecal waste products. For instance, the filter may be designed to effectively fill the volume that was previously considered wasted surrounding the transport tube of a waste management system. The concentric split flow filter may be configured to split the air flow, with substantially half of the air flow to be treated traveling through a first bed of filter media and substantially the other half of the air flow to be treated traveling through the second bed of filter media. This split flow design reduces the air velocity by 50%. In this way, the pressure drop of the filter may be reduced by as much as a factor of 4 as compared to the conventional design.
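
    The factor of 4 is consistent with a pressure drop that scales with the square of the air velocity, which is an assumption here (the abstract does not state the flow regime). Under that assumption, and with the bed depth unchanged, halving the velocity gives:

```latex
\[
\Delta p \propto v^{2}
\quad\Longrightarrow\quad
\frac{\Delta p_{\text{split}}}{\Delta p_{\text{conv}}}
= \frac{(v/2)^{2}}{v^{2}}
= \frac{1}{4}.
\]
```

    If the pressure drop instead scaled linearly with velocity, the same halving would give only a factor of 2, which may be why the abstract says "as much as" a factor of 4.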

  6. Backward multiple imputation estimation of the conditional lifetime expectancy function with application to censored human longevity data

    PubMed Central

    Kong, Jing; Klein, Barbara E. K.; Klein, Ronald; Wahba, Grace

    2015-01-01

    The conditional lifetime expectancy function (LEF) is the expected lifetime of a subject given survival past a certain time point and the values of a set of explanatory variables. This function is attractive to researchers because it summarizes the entire residual life distribution and has an easy interpretation compared with the popularly used hazard function. In this paper, we propose a general framework of backward multiple imputation for estimating the conditional LEF and the variance of the estimator in the right-censoring setting. Simulation studies are conducted to investigate the empirical properties of the proposed estimator and the corresponding variance estimator. We demonstrate the method on the Beaver Dam Eye Study data, where the expected human lifetime is modeled with smoothing-spline ANOVA given the covariates information including sex, lifestyle factors, and disease variables. PMID:26371300

  7. A set of EST-SNPs for map saturation and cultivar identification in melon

    PubMed Central

    Deleu, Wim; Esteras, Cristina; Roig, Cristina; González-To, Mireia; Fernández-Silva, Iria; Gonzalez-Ibeas, Daniel; Blanca, José; Aranda, Miguel A; Arús, Pere; Nuez, Fernando; Monforte, Antonio J; Picó, Maria Belén; Garcia-Mas, Jordi

    2009-01-01

    Background There are few genomic tools available in melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed mainly using marker types such as simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and amplified fragment length polymorphisms (AFLP) in different mapping populations. There is a growing need for saturating the genetic map with single nucleotide polymorphisms (SNP), more amenable for high throughput analysis, especially if these markers are located in gene coding regions, to provide functional markers. Expressed sequence tags (ESTs) from melon are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs. Results EST-based SNPs were discovered after resequencing ESTs between the parental lines of the PI 161375 (SC) × 'Piel de sapo' (PS) genetic map or using in silico SNP information from EST databases. In total 200 EST-based SNPs were mapped in the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the genetic diversity of the species. SNP analysis correctly reflected the genetic relationships compared with other marker systems, being able to distinguish all the accessions and cultivars. Conclusion This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 × 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon

  8. Germline hereditary, somatic mutations and microRNAs targeting-SNPs in congenital heart defects.

    PubMed

    Sabina, Saverio; Pulignani, Silvia; Rizzo, Milena; Cresci, Monica; Vecoli, Cecilia; Foffa, Ilenia; Ait-Ali, Lamia; Pitto, Letizia; Andreassi, Maria Grazia

    2013-07-01

    Somatic mutations and dysregulation by microRNAs (miRNAs) may have a pivotal role in the Congenital Heart Defects (CHDs). The purpose of the study was to assess both somatic and germline mutations in the GATA4 and NKX2.5 genes as well as to identify 3'UTR single nucleotide polymorphisms (SNPs) in the miRNA target sites. We enrolled 30 patients (13 males; 13.4±8.3 years) with non-syndromic CHD. GATA4 and NKX2.5 genes were screened in cardiac tissue of sporadic and in blood samples of familial cases. Computational methods were used to detect putative miRNAs in the 3'UTR region and to assess the Minimum Free Energy of hybridization (MFE, kcal/mol). Difference of MFEs (ΔMFE) ≥4 kcal/mol between alleles was considered biologically relevant on miRNA binding. The sum of all ΔMFEs (|ΔMFEtot|=∑|ΔMFE|) was calculated in order to predict the biological importance of SNPs binding more miRNAs. No evidence of novel GATA4 and NKX2.5 mutations was found both in sporadic and familial patients. Bioinformatic analysis revealed 27 putative miRNAs binding to identified SNPs in the 3'UTR of GATA4. ΔMFE ≥4 kcal/mol between alleles was obtained for the +354A>C (miR-4299), +587A>G (miR-604), +1355G>A (miR-548v, miR-139-5p) and +1521C>G (miR-583, miR-3125, miR-3928) SNPs. The +1521C>G SNP showed the highest ΔMFEtot (21.66 kcal/mol). Luciferase reporter assays indicated that miR-583 was dose-dependently effective in regulating +1521 C allele compared with +1521 G allele. Based on the analysis of 100 CHD cases and 204 healthy newborns, the +1521 G allele was also associated with a lower risk of CHD (OR=0.5, 95% CI 0.3-0.9, p=0.03), likely due to the relatively low binding of the miRNA and high levels of protein. These results suggest that common SNPs in the 3'UTR of GATA4 alter miRNA gene regulation contributing to the pathogenesis of CHDs.
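
    The two quantities defined in the abstract, the per-miRNA ΔMFE with its 4 kcal/mol relevance threshold and the summed |ΔMFEtot|, can be computed as in the sketch below. The MFE values themselves would come from a hybridization-prediction tool that the abstract does not name; the function name, the miRNA labels, the numbers in the example, and the use of the absolute value when applying the threshold are all assumptions made for illustration, not data from the study.

```python
THRESHOLD_KCAL_PER_MOL = 4.0  # relevance cutoff stated in the abstract


def summarize_snp(mfe_by_mirna, threshold=THRESHOLD_KCAL_PER_MOL):
    """Summarize the predicted effect of one SNP on miRNA binding.

    mfe_by_mirna maps a miRNA name to a pair (MFE with allele 1,
    MFE with allele 2), both in kcal/mol.

    Returns (relevant, total) where `relevant` is the sorted list of
    miRNAs whose |dMFE| meets the threshold and `total` is the sum of
    |dMFE| over all miRNAs, i.e. |dMFEtot|.
    """
    deltas = {m: a2 - a1 for m, (a1, a2) in mfe_by_mirna.items()}
    relevant = sorted(m for m, d in deltas.items() if abs(d) >= threshold)
    total = sum(abs(d) for d in deltas.values())
    return relevant, total


if __name__ == "__main__":
    # Hypothetical miRNAs and MFE values, invented for this example.
    example = {
        "miR-a": (-20.0, -14.5),   # dMFE = +5.5
        "miR-b": (-18.0, -16.0),   # dMFE = +2.0
        "miR-c": (-22.0, -26.5),   # dMFE = -4.5
    }
    relevant, total = summarize_snp(example)
    print(relevant, total)
```

    Summing absolute values means a SNP that strengthens binding for one miRNA and weakens it for another still scores highly, which matches the stated aim of ranking SNPs that bind several miRNAs.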

  9. Hybrid Filter Membrane

    NASA Technical Reports Server (NTRS)

    Laicer, Castro; Rasimick, Brian; Green, Zachary

    2012-01-01

    Cabin environmental control is an important issue for a successful Moon mission. Due to the unique environment of the Moon, lunar dust control is one of the main problems that significantly diminishes the air quality inside spacecraft cabins. Therefore, this innovation was motivated by NASA's need to minimize the negative health impact that air-suspended lunar dust particles have on astronauts in spacecraft cabins. It is based on fabrication of a hybrid filter comprising nanofiber nonwoven layers coated on porous polymer membranes with uniform cylindrical pores. This design results in a high-efficiency gas particulate filter with low pressure drop and the ability to be easily regenerated to restore filtration performance. A hybrid filter was developed consisting of a porous membrane with uniform, micron-sized, cylindrical pore channels coated with a thin nanofiber layer. Compared to conventional filter media such as a high-efficiency particulate air (HEPA) filter, this filter is designed to provide high particle efficiency, low pressure drop, and the ability to be regenerated. These membranes have well-defined micron-sized pores and can be used independently as air filters with discrete particle size cut-off, or coated with nanofiber layers for filtration of ultrafine nanoscale particles. The filter consists of a thin design intended to facilitate filter regeneration by localized air pulsing. The main feature of this invention is the concept of combining a micro-engineered straight-pore membrane with nanofibers. The micro-engineered straight-pore membrane can be prepared with extremely high precision. Because the resulting membrane pores are straight and not tortuous like those found in conventional filters, the pressure drop across the filter is significantly reduced. The nanofiber layer is applied as a very thin coating to enhance filtration efficiency for fine nanoscale particles.
Additionally, the thin nanofiber coating is designed to promote capture of

  10. Total Variation Electrocardiogram Filtering

    DTIC Science & Technology

    2011-03-01

    hand, the TV smoothing is still a low-pass filter, which effectively filters out high-frequency noise. Results We compared the performance of the TV...resulting signal to make the ECG samples positive and to amplify the high-frequency components. Finally, in the last stage, it uses a low-pass filter to...collected during the study on glycemic control in young adults performed at the USDA Beltsville Human Nutrition Center. The study has been approved by

  11. Filter vapor trap

    DOEpatents

    Guon, Jerold

    1976-04-13

    A sintered filter trap is adapted for insertion in a gas stream of sodium vapor to condense and deposit sodium thereon. The filter is heated and operated above the melting temperature of sodium, resulting in a more efficient means to remove sodium particulates from the effluent inert gas emanating from the surface of a liquid sodium pool. Preferably the filter leaves are precoated with a natrophobic coating such as tetracosane.

  12. Smart Filter Design.

    DTIC Science & Technology

    1991-06-01

    polarity relative to phase-state polarity. It was found that zero-state leakage (about 3% in intensity as mentioned) limited useful TPAF performance to...resources. Our first efforts used polar-formatted filters having 32 sectors, of which only 16 were independent since the filter was trained as a... polar plane. One common choice for the angle of this line, for example, corresponds to thresholding on the real part of the transform. Fourier filters

  13. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

    PubMed Central

    van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimäki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

    2015-01-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  14. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer

    PubMed Central

    Al-Tassan, Nada A.; Whiffin, Nicola; Hosking, Fay J.; Palles, Claire; Farrington, Susan M.; Dobbins, Sara E.; Harris, Rebecca; Gorman, Maggie; Tenesa, Albert; Meyer, Brian F.; Wakil, Salma M.; Kinnersley, Ben; Campbell, Harry; Martin, Lynn; Smith, Christopher G.; Idziaszczyk, Shelley; Barclay, Ella; Maughan, Timothy S.; Kaplan, Richard; Kerr, Rachel; Kerr, David; Buchannan, Daniel D.; Ko Win, Aung; Hopper, John; Jenkins, Mark; Lindor, Noralane M.; Newcomb, Polly A.; Gallinger, Steve; Conti, David; Schumacher, Fred; Casey, Graham; Dunlop, Malcolm G.; Tomlinson, Ian P.; Cheadle, Jeremy P.; Houlston, Richard S.

    2015-01-01

    Genome-wide association studies (GWAS) of colorectal cancer (CRC) have identified 23 susceptibility loci thus far. Analyses of previously conducted GWAS indicate additional risk loci are yet to be discovered. To identify novel CRC susceptibility loci, we conducted a new GWAS and performed a meta-analysis with five published GWAS (totalling 7,577 cases and 9,979 controls of European ancestry), imputing genotypes utilising the 1000 Genomes Project. The combined analysis identified new, significant associations with CRC at 1p36.2 marked by rs72647484 (minor allele frequency [MAF] = 0.09) near CDC42 and WNT4 (P = 1.21 × 10^-8, odds ratio [OR] = 1.21) and at 16q24.1 marked by rs16941835 (MAF = 0.21, P = 5.06 × 10^-8; OR = 1.15) within the long non-coding RNA (lncRNA) RP11-58A18.1 and ~500 kb from the nearest coding gene FOXL1. Additionally, we identified a promising association at 10p13 with rs10904849 intronic to CUBN (MAF = 0.32, P = 7.01 × 10^-8; OR = 1.14). These findings provide further insights into the genetic and biological basis of inherited genetic susceptibility to CRC. Additionally, our analysis further demonstrates that imputation can be used to exploit GWAS data to identify novel disease-causing variants. PMID:25990418

  15. Linear phase compressive filter

    DOEpatents

    McEwan, Thomas E.

    1995-01-01

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line.

  16. Linear phase compressive filter

    DOEpatents

    McEwan, T.E.

    1995-06-06

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line. 2 figs.

  17. Nanofiber Filters Eliminate Contaminants

    NASA Technical Reports Server (NTRS)

    2009-01-01

    With support from Phase I and II SBIR funding from Johnson Space Center, Argonide Corporation of Sanford, Florida tested and developed its proprietary nanofiber water filter media. Capable of removing more than 99.99 percent of dangerous particles like bacteria, viruses, and parasites, the media was incorporated into the company's commercial NanoCeram water filter, an inductee into the Space Foundation's Space Technology Hall of Fame. In addition to its drinking water filters, Argonide now produces large-scale nanofiber filters used as part of the reverse osmosis process for industrial water purification.

  18. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, T.E.; Alvin, M.A.; Bruck, G.J.; Smeltzer, E.E.

    1999-03-02

    A filter holder and gasket assembly are disclosed for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut. 9 figs.

  19. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, Thomas Edwin; Alvin, Mary Anne; Bruck, Gerald Joseph; Smeltzer, Eugene E.

    1999-03-02

    A filter holder and gasket assembly for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut.

  20. Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The dissection of complex traits of economic importance for the pig industry requires the availability of a significant number of genetic markers, such as SNPs. This study was conducted in order to discover thousands of porcine SNPs using next generation sequencing technologies and use those SNPs, a...

  1. Large-scale characterization of public database SNPs causing non-synonymous changes in three ethnic groups.

    PubMed

    Ireland, James; Carlton, Victoria E H; Falkowski, Matthew; Moorhead, Martin; Tran, Karen; Useche, Francisco; Hardenbol, Paul; Erbilgin, Ayca; Fitzgerald, Ron; Willis, Thomas D; Faham, Malek

    2006-03-01

    Single nucleotide polymorphisms (SNPs) that lead to non-synonymous changes in proteins may have functional effects and be subject to selection. Hence they are of particular interest in the study of genetic diseases. We have genotyped approximately 28,000 such SNPs in three ethnic populations (the HapMap plates) and ten primate species and analyzed these data for evidence of selection. We find that SNPs predicted by PolyPhen to be damaging have lower allele frequencies and are particularly likely to be population-specific. We have also grouped SNPs by molecular function or biological process of the associated genes and find evidence that selection may be acting in concert on classes of genes.

  2. In silico model-driven assessment of the effects of single nucleotide polymorphisms (SNPs) on human red blood cell metabolism.

    PubMed

    Jamshidi, Neema; Wiback, Sharon J; Palsson B, Bernhard Ø

    2002-11-01

    The completion of the human genome project and the construction of single nucleotide polymorphism (SNP) maps have led to significant efforts to find SNPs that can be linked to pathophysiology. In silico models of complete biochemical reaction networks relate a cell's individual reactions to the function of the entire network. Sequence variations can in turn be related to kinetic properties of individual enzymes, thus allowing an in silico model-driven assessment of the effects of defined SNPs on overall cellular functions. This process is applied to defined SNPs in two key enzymes of human red blood cell metabolism: glucose-6-phosphate dehydrogenase and pyruvate kinase. The results demonstrate the utility of in silico models in providing insight into differences between red cell function in patients with chronic and nonchronic anemia. In silico models of complex cellular processes are thus likely to aid in defining and understanding key SNPs in human pathophysiology.

  3. Extended range harmonic filter

    NASA Technical Reports Server (NTRS)

    Jankowski, H.; Geia, A. J.; Allen, C. C.

    1973-01-01

    Two types of filters, leaky-wall and open-guide, are combined into single component. Combination gives 10 dB or greater additional attenuation to fourth and higher harmonics, at expense of increasing loss of fundamental frequency by perhaps 0.05 to 0.08 dB. Filter is applicable to all high power microwave transmitters, but is especially desirable for satellite transmitters.
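
    The decibel figures quoted above can be put on a linear scale. The following is a minimal sketch, assuming only the standard power-ratio definition of the decibel (ratio = 10^(dB/10)); the function and variable names are illustrative and do not come from the record.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a decibel value to a linear power ratio, 10**(dB/10)."""
    return 10.0 ** (db / 10.0)


# Extra attenuation of the fourth and higher harmonics quoted in the abstract.
harmonic_factor = db_to_power_ratio(10.0)

# Added loss at the fundamental quoted in the abstract (0.05 to 0.08 dB),
# expressed as the fraction of transmitted power that is lost.
loss_low = 1.0 - db_to_power_ratio(-0.05)
loss_high = 1.0 - db_to_power_ratio(-0.08)

print(f"harmonic power reduced by a factor of {harmonic_factor:.0f}")
print(f"fundamental power lost: {loss_low:.2%} to {loss_high:.2%}")
```

    Under these assumptions, the trade described in the abstract is roughly a tenfold reduction in harmonic power for a loss of between one and two percent of the fundamental.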

  4. Tunable acoustical optical filter

    NASA Technical Reports Server (NTRS)

    Lane, A. L.

    1977-01-01

    Solid state filter with active crystal element increases sensitivity and resolution of passive and active spectrometers. Filter is capable of ranging through infrared and visible spectra, can be built as portable device for field use, and is suitable for ecological surveying, for pollution detection, and for pollutant classification.

  5. Filtering reprecipitated slurry

    SciTech Connect

    Morrissey, M.F.

    1992-01-01

    As part of the Late Washing Demonstration at Savannah River Technology Center, Interim Waste Technology has filtered reprecipitated and non reprecipitated slurry with the Experimental Laboratory Filter (ELF) at TNX. Reprecipitated slurry generates higher permeate fluxes than non reprecipitated slurry. Washing reprecipitated slurry may require a defoamer because reprecipitation encourages foaming.

  6. Filtering reprecipitated slurry

    SciTech Connect

    Morrissey, M.F.

    1992-12-31

    As part of the Late Washing Demonstration at Savannah River Technology Center, Interim Waste Technology has filtered reprecipitated and non reprecipitated slurry with the Experimental Laboratory Filter (ELF) at TNX. Reprecipitated slurry generates higher permeate fluxes than non reprecipitated slurry. Washing reprecipitated slurry may require a defoamer because reprecipitation encourages foaming.

  7. Durability of ceramic filters

    SciTech Connect

    Alvin, M.A.; Tressler, R.E.; Lippert, T.E.; Diaz, E.S.; Smeltzer, E.E.

    1994-10-01

    The objectives of this program are to identify the potential long-term thermal/chemical effects that advanced coal-based power generating systems have on the stability of porous ceramic filter materials, as well as to assess the influence of these effects on filter operating performance and life.

  8. Powerful Identification of Cis-regulatory SNPs in Human Primary Monocytes Using Allele-Specific Gene Expression

    PubMed Central

    Almlöf, Jonas Carlsson; Lundmark, Per; Lundmark, Anders; Ge, Bing; Maouche, Seraya; Göring, Harald H. H.; Liljedahl, Ulrika; Enström, Camilla; Brocheton, Jessy; Proust, Carole; Godefroy, Tiphaine; Sambrook, Jennifer G.; Jolley, Jennifer; Crisp-Hihn, Abigail; Foad, Nicola; Lloyd-Jones, Heather; Stephens, Jonathan; Gwilliam, Rhian; Rice, Catherine M.; Hengstenberg, Christian; Samani, Nilesh J.; Erdmann, Jeanette; Schunkert, Heribert; Pastinen, Tomi; Deloukas, Panos; Goodall, Alison H.; Ouwehand, Willem H.; Cambien, François; Syvänen, Ann-Christine

    2012-01-01

    A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straight-forward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function while allele-specific gene expression (ASE) analysis has not yet gained a wide-spread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers. PMID:23300628

  9. How many single nucleotide polymorphisms (SNPs) are needed to replace short tandem repeats (STRs) in forensic applications?

    PubMed

    Lee, Hyo-Jung; Lee, Jae Won; Jeong, Su Jin; Park, Mira

    2017-02-27

    Short tandem repeats (STRs) are the most commonly used forms of genetic information in forensic identification. In recent times, advances in the information on single nucleotide polymorphisms (SNPs) have raised the possibility that these markers could replace the forensically established STRs. In this work, we conducted comparative simulation studies that allowed us to estimate the number of SNPs needed if these markers were used instead of STRs in criminal cases and paternity investigations.

  10. Implicit Kalman filtering

    NASA Technical Reports Server (NTRS)

    Skliar, M.; Ramirez, W. F.

    1997-01-01

    For an implicitly defined discrete system, a new algorithm for Kalman filtering is developed and an efficient numerical implementation scheme is proposed. Unlike the traditional explicit approach, the implicit filter can be readily applied to ill-conditioned systems and allows for generalization to descriptor systems. The implementation of the implicit filter depends on the solution of the congruence matrix equation $A_1 P_x A_1^T = P_y$. We develop a general iterative method for the solution of this equation, and prove necessary and sufficient conditions for convergence. It is shown that when the system matrices of an implicit system are sparse, the implicit Kalman filter requires significantly less computer time and storage to implement as compared to the traditional explicit Kalman filter. Simulation results are presented to illustrate and substantiate the theoretical developments.
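
    To make the congruence equation in the abstract concrete, here is a small sketch that solves A Px A^T = Py directly for a 2x2 invertible A. This is not the iterative method the authors develop (the abstract does not describe it); it is a plain inverse-based check, and all names and numeric values are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]


def inverse_2x2(M):
    """Invert a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def solve_congruence_2x2(A, Py):
    """Return Px such that A Px A^T = Py, assuming A is invertible."""
    A_inv = inverse_2x2(A)
    return matmul(matmul(A_inv, Py), transpose(A_inv))


# Hypothetical example: build Py from a known Px, then recover Px.
A = [[2.0, 1.0], [0.0, 1.0]]
Px_true = [[1.0, 0.5], [0.5, 2.0]]
Py = matmul(matmul(A, Px_true), transpose(A))
Px = solve_congruence_2x2(A, Py)
print(Px)
```

    A direct inverse like this is only reasonable for small, well-conditioned matrices; the abstract's point is that an iterative scheme is preferable when the system is ill-conditioned or sparse.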

  11. Sintered composite filter

    DOEpatents

    Bergman, W.

    1986-05-02

    A particulate filter medium formed of a sintered composite of 0.5 micron diameter quartz fibers and 2 micron diameter stainless steel fibers is described. Preferred composition is about 40 vol.% quartz and about 60 vol.% stainless steel fibers. The media is sintered at about 1100 °C to bond the stainless steel fibers into a cage network which holds the quartz fibers. High filter efficiency and low flow resistance are provided by the smaller quartz fibers. High strength is provided by the stainless steel fibers. The resulting media has a high efficiency and low pressure drop similar to the standard HEPA media, with tensile strength at least four times greater, and a maximum operating temperature of about 550 °C. The invention also includes methods to form the composite media and a HEPA filter utilizing the composite media. The filter media can be used to filter particles in both liquids and gases.

  12. Sub-micron filter

    DOEpatents

    Tepper, Frederick; Kaledin, Leonid

    2009-10-13

    Aluminum hydroxide fibers approximately 2 nanometers in diameter and with surface areas ranging from 200 to 650 m²/g have been found to be highly electropositive. When dispersed in water they are able to attach to and retain electronegative particles. When combined into a composite filter with other fibers or particles they can filter bacteria and nano size particulates such as viruses and colloidal particles at high flux through the filter. Such filters can be used for purification and sterilization of water, biological, medical and pharmaceutical fluids, and as a collector/concentrator for detection and assay of microbes and viruses. The alumina fibers are also capable of filtering sub-micron inorganic and metallic particles to produce ultra pure water. The fibers are suitable as a substrate for growth of cells. Macromolecules such as proteins may be separated from each other based on their electronegative charges.

  13. Molecular genetics of nicotine dependence and abstinence: whole genome association using 520,000 SNPs

    PubMed Central

    Uhl, George R; Liu, Qing-Rong; Drgon, Tomas; Johnson, Catherine; Walther, Donna; Rose, Jed E

    2007-01-01

    Background: Classical genetic studies indicate that nicotine dependence is a substantially heritable complex disorder. Genetic vulnerabilities to nicotine dependence largely overlap with genetic vulnerabilities to dependence on other addictive substances. Successful abstinence from nicotine displays substantial heritable components as well. Some of the heritability for the ability to quit smoking appears to overlap with the genetics of nicotine dependence and some does not. We now report genome wide association studies of nicotine dependent individuals who were successful in abstaining from cigarette smoking, nicotine dependent individuals who were not successful in abstaining and ethnically-matched control subjects free from substantial lifetime use of any addictive substance. Results: These data, and their comparison with data that we have previously obtained from comparisons of four other substance dependent vs control samples support two main ideas: 1) Single nucleotide polymorphisms (SNPs) whose allele frequencies distinguish nicotine-dependent from control individuals identify a set of genes that overlaps significantly with the set of genes that contain markers whose allelic frequencies distinguish the four other substance dependent vs control groups (p < 0.018). 2) SNPs whose allelic frequencies distinguish successful vs unsuccessful abstainers cluster in small genomic regions in ways that are highly unlikely to be due to chance (Monte Carlo p < 0.00001). Conclusion: These clustered SNPs nominate candidate genes for successful abstinence from smoking that are implicated in interesting functions: cell adhesion, enzymes, transcriptional regulators, neurotransmitters and receptors and regulation of DNA, RNA and proteins. As these observations are replicated, they will provide an increasingly-strong basis for understanding mechanisms of successful abstinence, for identifying individuals more or less likely to succeed in smoking cessation efforts and for tailoring

  14. Tag SNPs detect association of the CYP1B1 gene with primary open angle glaucoma

    PubMed Central

    Hewitt, Alex W.; Mackey, David A.; Mitchell, Paul; Craig, Jamie E.

    2010-01-01

    Purpose: The cytochrome p450 family 1 subfamily B (CYP1B1) gene is a well known cause of autosomal recessive primary congenital glaucoma. It has also been postulated as a modifier of disease severity in primary open angle glaucoma (POAG), particularly in juvenile onset families. However, the role of common variation in the gene in relation to POAG has not been thoroughly explored. Methods: Seven tag single nucleotide polymorphisms (SNPs), including two coding variants (L432V and N543S), were genotyped in 860 POAG cases and 898 examined normal controls. Each SNP and haplotype was assessed for association with disease. In addition, a subset of 396 severe cases and 452 elderly controls were analyzed separately. Results: There was no association of any individual SNP in the full data set. Two SNPs (rs162562 and rs10916) were nominally associated under a dominant model in the severe cases (p<0.05). A common haplotype (AGCAGCC) was also found to be nominally associated in both the full data set (p=0.048, OR [95%CI]=0.83 [0.69–0.90]) and more significantly in the severe cases (p=0.004, OR [95%CI]=0.68 [0.52–0.89]) which survives correction for multiple testing. Conclusions: Although no major effect of common variation at the CYP1B1 locus on POAG was found, there could be an effect of SNPs tagged by rs162562 and represented on the AGCAGCC haplotype. PMID:21139974

  15. Genetic Association of Recovery from Eating Disorders: The Role of GABA Receptor SNPs

    PubMed Central

    Bloss, Cinnamon S; Berrettini, Wade; Bergen, Andrew W; Magistretti, Pierre; Duvvuri, Vikas; Strober, Michael; Brandt, Harry; Crawford, Steve; Crow, Scott; Fichter, Manfred M; Halmi, Katherine A; Johnson, Craig; Kaplan, Allan S; Keel, Pamela; Klump, Kelly L; Mitchell, James; Treasure, Janet; Woodside, D Blake; Marzola, Enrica; Schork, Nicholas J; Kaye, Walter H

    2011-01-01

    Follow-up studies of eating disorders (EDs) suggest outcomes ranging from recovery to chronic illness or death, but predictors of outcome have not been consistently identified. We tested 5151 single-nucleotide polymorphisms (SNPs) in approximately 350 candidate genes for association with recovery from ED in 1878 women. Initial analyses focused on a strictly defined discovery cohort of women who were over age 25 years, carried a lifetime diagnosis of an ED, and for whom data were available regarding the presence (n=361 ongoing symptoms in the past year, i.e., 'ill') or absence (n=115 no symptoms in the past year, i.e., 'recovered') of ED symptoms. An intronic SNP (rs17536211) in GABRG1 showed the strongest statistical evidence of association (p=4.63 × 10^-6, false discovery rate (FDR)=0.021, odds ratio (OR)=0.46). We replicated these findings in a more liberally defined cohort of women age 25 years or younger (n=464 ill, n=107 recovered; p=0.0336, OR=0.68; combined sample p=4.57 × 10^-6, FDR=0.0049, OR=0.55). Enrichment analyses revealed that GABA (γ-aminobutyric acid) SNPs were over-represented among SNPs associated at p<0.05 in both the discovery (Z=3.64, p=0.0003) and combined cohorts (Z=2.07, p=0.0388). In follow-up phenomic association analyses with a third independent cohort (n=154 ED cases, n=677 controls), rs17536211 was associated with trait anxiety (p=0.049), suggesting a possible mechanism through which this variant may influence ED outcome. These findings could provide new insights into the development of more effective interventions for the most treatment-resistant patients. PMID:21750581

  16. Discovering Genome-Wide Tag SNPs Based on the Mutual Information of the Variants

    PubMed Central

    Elmas, Abdulkadir; Ou Yang, Tai-Hsien; Wang, Xiaodong

    2016-01-01

    Exploring linkage disequilibrium (LD) patterns among the single nucleotide polymorphism (SNP) sites can improve the accuracy and cost-effectiveness of genomic association studies, whereby representative (tag) SNPs are identified to sufficiently represent the genomic diversity in populations. There has been a considerable amount of effort in developing efficient algorithms to select tag SNPs from the growing large-scale data sets. Methods using the classical pairwise-LD and multi-locus LD measures have been proposed that aim to reduce the computational complexity and to increase the accuracy, respectively. The present work solves the tag SNP selection problem by efficiently balancing the computational complexity and accuracy, and improves the coverage in genomic diversity in a cost-effective manner. The employed algorithm makes use of mutual information to explore the multi-locus association between SNPs and can handle different data types and conditions. Experiments with benchmark HapMap data sets show comparable or better performance against the state-of-the-art algorithms. In particular, as a novel application, genome-wide SNP tagging is performed on the 1000 Genomes Project data sets, producing a well-annotated database of tagging variants that capture the common genotype diversity in 2,504 samples from 26 human populations. Compared to conventional methods, the algorithm requires as input only the genotype (or haplotype) sequences, can scale up to genome-wide analyses, and produces accurate solutions with more information-rich output, providing an improved platform for researchers towards the subsequent association studies. PMID:27992465
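
    The abstract above relies on mutual information between SNPs. As a rough illustration only, the sketch below computes the empirical pairwise mutual information between two genotype vectors coded 0/1/2. The authors' multi-locus algorithm is not described in the abstract, so this pairwise estimator, the genotype coding, and the toy data are all assumptions.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information, in bits, of two equal-length genotype vectors."""
    if len(x) != len(y):
        raise ValueError("genotype vectors must have the same length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical genotypes (0, 1, 2 = copies of the minor allele) for 8 samples.
snp_a = [0, 1, 2, 0, 1, 2, 0, 1]
snp_b = [0, 1, 2, 0, 1, 2, 0, 1]   # identical to snp_a, so fully informative
snp_c = [0, 0, 0, 0, 1, 1, 1, 1]   # only partly related to snp_a

print(mutual_information(snp_a, snp_b))
print(mutual_information(snp_a, snp_c))
```

    A SNP that shares high mutual information with many neighbours is a natural tag candidate in this simplified picture, since its genotype says a lot about theirs.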

  17. Regulatory SNPs Alter the Gene Expression of Diabetic Retinopathy Associated Secretary Factors

    PubMed Central

    Chen, Chian-Feng; Liou, Shiow-Wen; Wu, Hsin-Han; Lin, Chin-Hui; Huang, Li-Shan; Woung, Lin-Chung; Tsai, Ching-Yao

    2016-01-01

    Objectives: Diabetic retinopathy (DR) is a common microvascular complication in both type I and type II diabetes. Several previous reports indicated that the serum concentrations of some secretory factors were highly associated with DR. Therefore, we hypothesized that regulatory SNP (rSNP) genotypes in secretory factors may alter the expression of these genes and lead to DR. Methods: First, pyrosequencing was applied to screen for SNPs whose allele frequencies differed between DR and DNR. Individual genotyping was then performed with TaqMan assays in Taiwanese DR and DNR patients. To evaluate the effect of the SNP allele on transcriptional activity, we measured promoter activity using luciferase reporter constructs. Results: We found that the frequencies of the CC, CG, and GG genotypes of the rs2010963 polymorphism were 15.09%, 47.14%, and 37.74% in DR and 12.90%, 19.35%, and 67.74% in DNR, respectively (p = 0.0205). The prevalence of DR was higher (p = 0.00793) in patients with the CC or CG genotype (62.26% and 32.26% for DR and DNR, respectively) compared with patients with the GG genotype. To evaluate the effect of the rs2010963-C allele on transcriptional activity, we measured promoter activity using luciferase reporter constructs. The rs2010963-C reporter showed 1.6- to 2-fold higher luciferase activity than rs2010963-G in 3 cell lines. Conclusion: Our data suggest that rs2010963-C alters the expression level of VEGFA in different tissues. We suggest that a small but long-term increase in exposure to VEGFA may eventually lead to DR. PMID:27648002
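
    The carrier frequencies reported above (62.26% of DR and 32.26% of DNR patients carrying CC or CG) imply an odds ratio, which the abstract does not state. The sketch below derives it from the rounded percentages alone; the result is therefore an approximation for illustration, not a figure reported by the authors, and the names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds, p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1.0 - p)


# Fraction of each group carrying the CC or CG genotype, from the abstract.
carrier_dr = 0.6226    # diabetic retinopathy (DR) group
carrier_dnr = 0.3226   # comparison (DNR) group

# Odds ratio for carrying at least one C allele, DR versus DNR.
odds_ratio = odds(carrier_dr) / odds(carrier_dnr)
print(f"approximate odds ratio for C-allele carriers: {odds_ratio:.2f}")
```

    Because only percentages are available here, no confidence interval can be attached; that would need the underlying genotype counts.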

  18. A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

    PubMed Central

    Bakir-Gungor, Burcu; Sezerman, Osman Ugur

    2011-01-01

    Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs, found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease related pathways. We have tested our methodology on WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Even though GWAS analysis identifies 14 out of 83 of those drug target genes, newly found functionally important KEGG pathways led to the discovery of 25 out of 83 genes, known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2 and RA were verified by either OMIM database or by literature retrieved from the NCBI PubMed module. With the whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network

  19. Differences in allele frequencies of autosomal dominant hypercholesterolemia SNPs in the Malaysian population.

    PubMed

    Alex, Livy; Chahil, Jagdish Kaur; Lye, Say Hean; Bagali, Pramod; Ler, Lian Wee

    2012-06-01

    Hypercholesterolemia is caused by different interactions of lifestyle and genetic determinants. At the genetic level, it can be attributed to the interactions of multiple polymorphisms, or as in the example of familial hypercholesterolemia (FH), it can be the result of a single mutation. A large number of genetic markers, mostly single nucleotide polymorphisms (SNP) or mutations in three genes, implicated in autosomal dominant hypercholesterolemia (ADH), viz APOB (apolipoprotein B), LDLR (low density lipoprotein receptor) and PCSK9 (proprotein convertase subtilisin/kexin type-9), have been identified and characterized. However, such studies have been insufficiently undertaken specifically in Malaysia and Southeast Asia in general. The main objective of this study was to identify ADH variants, specifically ADH-causing mutations and hypercholesterolemia-associated polymorphisms in multiethnic Malaysian population. We aimed to evaluate published SNPs in ADH causing genes, in this population and to report any unusual trends. We examined a large number of selected SNPs from previous studies of APOB, LDLR, PCSK9 and other genes, in clinically diagnosed ADH patients (n=141) and healthy control subjects (n=111). Selection of SNPs was initiated by searching within genes reported to be associated with ADH from known databases. The important finding was 137 mono-allelic markers (44.1%) and 173 polymorphic markers (55.8%) in both subject groups. By comparing to publicly available data, out of the 137 mono-allelic markers, 23 markers showed significant differences in allele frequency among Malaysians, European Whites, Han Chinese, Yoruba and Gujarati Indians. Our data can serve as reference for others in related fields of study during the planning of their experiments.

  20. A Joint Association Test for Multiple SNPs in Genetic Case-Control Studies

    PubMed Central

    Wang, Tao; Jacob, Howard; Ghosh, Soumitra; Wang, Xujing; Zeng, Zhao-Bang

    2009-01-01

    For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) on high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive in reducing the data complexity and increasing the statistical power. However, due to unknown status of the underlying disease variant, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptible locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, through using of a retrospective likelihood to adjust for the case-control sampling ascertainment and appropriately handle the Hardy-Weinberg equilibrium constraint, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. With the latent variable to describe a flexible role of the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing of multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios comparing with classical haplotype association methods. PMID:18770519
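
    The EM step mentioned in this abstract can be illustrated with a much simpler, classical case: estimating haplotype frequencies for two biallelic SNPs from unphased genotypes under Hardy-Weinberg equilibrium. The sketch below is not the authors' latent-variable penetrance model or their retrospective likelihood. The function name, the 0/1/2 genotype coding, and the toy data are assumptions made for illustration only.

```python
def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Classical EM for two-SNP haplotype frequencies (illustrative sketch).

    genotypes: list of (g1, g2) pairs, each the count (0, 1 or 2) of the
    alternate allele at SNP 1 and SNP 2 (an assumed coding).
    Returns frequencies in the order [00, 01, 10, 11].
    """
    if not genotypes:
        raise ValueError("need at least one genotype")

    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    fixed = [0.0] * 4      # haplotype counts whose phase is unambiguous
    n_double_het = 0       # (1, 1) genotypes: phase is ambiguous
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        # With at most one heterozygous site the two haplotypes are determined.
        for a, b in zip(alleles[g1], alleles[g2]):
            fixed[2 * a + b] += 1

    total = 2 * len(genotypes)
    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11
        # rather than 01/10, given the current frequencies.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = [
            fixed[0] + n_double_het * w,
            fixed[1] + n_double_het * (1 - w),
            fixed[2] + n_double_het * (1 - w),
            fixed[3] + n_double_het * w,
        ]
        new_freqs = [c / total for c in counts]
        converged = max(abs(n - o) for n, o in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Toy data (hypothetical): strong linkage between the two SNPs.
toy_genotypes = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 4
toy_freqs = em_haplotype_freqs(toy_genotypes)
```

    In this toy data set only the 00 and 11 haplotypes are ever observed unambiguously, so the EM iterations assign the double heterozygotes almost entirely to the 00/11 configuration.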

  1. A genome-wide association study of heat stress-associated SNPs in catfish.

    PubMed

    Jin, Y; Zhou, T; Geng, X; Liu, S; Chen, A; Yao, J; Jiang, C; Tan, S; Su, B; Liu, Z

    2017-04-01

    Heat tolerance is a complex and economically important trait for catfish genetic breeding programs. With global climate change, it is becoming an increasingly important trait. To better understand the molecular basis of heat stress, a genome-wide association study (GWAS) was carried out using the 250 K catfish SNP array with interspecific backcross progenies, which derived from crossing female channel catfish with male F1 hybrid catfish (female channel catfish × male blue catfish). Three significant associated SNPs were detected by performing an EMMAX approach for GWAS. The SNP located on linkage group 14 explained 12.1% of phenotypical variation. The other two SNPs, located on linkage group 16, explained 11.3 and 11.5% of phenotypical variation respectively. A total of 14 genes with heat stress related functions were detected within the significant associated regions. Among them, five genes (TRAF2, FBXW5, ANAPC2, UBR1 and KLHL29) have known functions in the protein degradation process through the ubiquitination pathway. Other genes related to heat stress include genes involved in protein biosynthesis (PRPF4 and SYNCRIP), protein folding (DNAJC25), molecule and iron transport (SLC25A46 and CLIC5), cytoskeletal reorganization (COL12A1) and energy metabolism (COX7A2, PLCB1 and PLCB4) processes. The results provide fundamental information about genes and pathways that is useful for further investigation into the molecular mechanisms of heat stress. The associated SNPs could be promising candidates for selecting heat-tolerant catfish lines after validating their effects on larger and various catfish populations.

  2. Re-evaluating data quality of dog mitochondrial, Y chromosomal, and autosomal SNPs genotyped by SNP array.

    PubMed

    O Otecko, Newton; Peng, Min-Sheng; Yang, He-Chuan; Zhang, Ya-Ping; Wang, Guo-Dong

    2016-11-18

    Quality deficiencies in single nucleotide polymorphism (SNP) analyses have important implications. We used missingness rates to investigate the quality of a recently published dataset containing 424 mitochondrial, 211 Y chromosomal, and 160 432 autosomal SNPs generated by a semicustom Illumina SNP array from 5 392 dogs and 14 grey wolves. Overall, the individual missingness rate for mitochondrial SNPs was ~43.8%, with 980 (18.1%) individuals completely missing mitochondrial SNP genotyping (missingness rate=1). In males, the genotype missingness rate was ~28.8% for Y chromosomal SNPs, with 374 males recording rates above 0.96. These 374 males also exhibited completely failed mitochondrial SNPs genotyping, indicative of a batch effect. Individual missingness rates for autosomal markers were greater than zero, but less than 0.5. Neither mitochondrial nor Y chromosomal SNPs achieved complete genotyping (locus missingness rate=0), whereas 5.9% of autosomal SNPs had a locus missingness rate=1. The high missingness rates and possible batch effect show that caution and rigorous measures are vital when genotyping and analyzing SNP array data for domestic animals. Further improvements of these arrays will be helpful to future studies.
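
    The two quantities used in this abstract, the individual missingness rate and the locus missingness rate, are the row-wise and column-wise fractions of missing calls in a genotype matrix. A minimal sketch follows, assuming rows are individuals, columns are SNPs, and missing calls are coded as NaN. The function name and toy matrix are hypothetical and are not taken from the study.

```python
import numpy as np


def missingness_rates(genotypes):
    """Return (per-individual, per-locus) missingness rates.

    genotypes: 2-D array-like, rows = individuals, columns = SNPs,
    with np.nan marking a missing call (an assumed encoding).
    """
    missing = np.isnan(np.asarray(genotypes, dtype=float))
    individual_rate = missing.mean(axis=1)  # fraction of SNPs missing per individual
    locus_rate = missing.mean(axis=0)       # fraction of individuals missing per SNP
    return individual_rate, locus_rate


# Toy matrix (hypothetical): 4 individuals x 3 SNPs.
toy = np.array([
    [0.0, 1.0, np.nan],
    [np.nan, np.nan, np.nan],   # individual with completely failed genotyping
    [2.0, 0.0, np.nan],
    [1.0, 1.0, np.nan],
])
ind_rate, loc_rate = missingness_rates(toy)
```

    Here the second individual has a missingness rate of 1 and the third SNP has a locus missingness rate of 1, the two extreme cases the abstract reports.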

  3. Re-evaluating data quality of dog mitochondrial, Y chromosomal, and autosomal SNPs genotyped by SNP array

    PubMed Central

    OTECKO, Newton O.; PENG, Min-Sheng; YANG, He-Chuan; ZHANG, Ya-Ping; WANG, Guo-Dong

    2016-01-01

    Quality deficiencies in single nucleotide polymorphism (SNP) analyses have important implications. We used missingness rates to investigate the quality of a recently published dataset containing 424 mitochondrial, 211 Y chromosomal, and 160 432 autosomal SNPs generated by a semicustom Illumina SNP array from 5 392 dogs and 14 grey wolves. Overall, the individual missingness rate for mitochondrial SNPs was ~43.8%, with 980 (18.1%) individuals completely missing mitochondrial SNP genotyping (missingness rate=1). In males, the genotype missingness rate was ~28.8% for Y chromosomal SNPs, with 374 males recording rates above 0.96. These 374 males also exhibited completely failed mitochondrial SNPs genotyping, indicative of a batch effect. Individual missingness rates for autosomal markers were greater than zero, but less than 0.5. Neither mitochondrial nor Y chromosomal SNPs achieved complete genotyping (locus missingness rate=0), whereas 5.9% of autosomal SNPs had a locus missingness rate=1. The high missingness rates and possible batch effect show that caution and rigorous measures are vital when genotyping and analyzing SNP array data for domestic animals. Further improvements of these arrays will be helpful to future studies. PMID:28105800

  4. Association of polymorphisms in 9p21 region with CAD in North Indian population: replication of SNPs identified through GWAS.

    PubMed

    Kumar, J; Yumnam, S; Basu, T; Ghosh, A; Garg, G; Karthikeyan, G; Sengupta, S

    2011-06-01

    Coronary artery disease (CAD) is one of the leading causes of death worldwide that is influenced by both environmental as well as genetic factors. Several recent genome-wide association studies (GWAS) have reported the association of multiple single nucleotide polymorphisms (SNPs) mainly in the 9p21 region with CAD. However, the association of these SNPs with CAD has not been rigorously tested in Indian population, which accounts for the largest incidences of CAD in the world. Herein, we genotyped six such SNPs (rs10116277, rs10757274, rs1333040, rs2383206, rs2383207 and rs1994016) identified through GWAS, in 754 individuals (311 angiography-confirmed CAD patients and 443 treadmill test controls) recruited mainly from North India to evaluate if these SNPs were associated with CAD. The minor allele frequency of these six SNPs was comparable to that reported in the respective GWAS. We found that three of these SNPs (rs10116277, rs1333040 and rs2383206) present at the locus 9p21 were significantly associated with CAD even after controlling for the confounding factors such as age, sex, body mass index, homocysteine, hypertension, diabetes, smoking, diet, etc. In conclusion, the locus 9p21 found to be significantly associated with cardiovascular diseases in the Caucasian populations seems to be also important in North Indian population.

  5. Association of Genome-Wide Association Study (GWAS) Identified SNPs and Risk of Breast Cancer in an Indian Population

    PubMed Central

    Nagrani, Rajini; Mhatre, Sharayu; Rajaraman, Preetha; Chatterjee, Nilanjan; Akbari, Mohammad R.; Boffetta, Paolo; Brennan, Paul; Badwe, Rajendra; Gupta, Sudeep; Dikshit, Rajesh

    2017-01-01

    To date, no studies have investigated the association of the GWAS-identified SNPs with BC risk in Indian population. We investigated the association of 30 previously reported and replicated BC susceptibility SNPs in 1,204 cases and 1,212 controls from a hospital based case-control study conducted at the Tata Memorial Hospital, Mumbai. As a measure of total susceptibility burden, the polygenic risk score (PRS) for each individual was defined by the weighted sum of genotypes from 21 independent SNPs with weights derived from previously published estimates of association odds-ratios. Logistic regression models were used to assess risk associated with individual SNPs and overall PRS, and stratified by menopausal and receptor status. A total of 11 SNPs from eight genomic regions (FGFR2, 9q31.2, MAP3K, CCND1, ZM1Z1, RAD51L11, ESR1 and UST) showed statistically significant (p-value ≤ 0.05) evidence of association, either overall or when stratified by menopausal status or hormone receptor status. BC SNPs previously identified in Caucasian population showed evidence of replication in the Indian population mainly with respect to risk of postmenopausal and hormone receptor positive BC. PMID:28098224
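
    The polygenic risk score described here is a weighted sum of genotypes. A minimal sketch is given below, assuming each genotype is coded as the number of risk alleles (0, 1 or 2) and each weight is the natural log of a published odds ratio. The abstract does not state the exact weighting, so the log transform, the function name, and the example values are all assumptions.

```python
import math


def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts, weights = ln(odds ratio).

    Both arguments are equal-length sequences, one entry per SNP.
    The log-odds-ratio weighting is an assumption, not taken from the study.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


# Hypothetical individual genotyped at three SNPs.
example_counts = [2, 1, 0]
example_odds_ratios = [1.26, 1.10, 0.90]
example_score = polygenic_risk_score(example_counts, example_odds_ratios)
```

    A SNP with zero risk alleles contributes nothing to the score, and a SNP with an odds ratio below 1 contributes a negative weight.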

  6. In Silico Analysis of SNPs in PARK2 and PINK1 Genes That Potentially Cause Autosomal Recessive Parkinson Disease

    PubMed Central

    Ibrahim, Mohamed Osama Mirghani; Mirghani, Yousra Abdelazim; Hassan, Mohamed Ahmed Salih

    2016-01-01

    Introduction. Parkinson's disease (PD) is a common neurodegenerative disorder. Mutations in PINK1 are the second most common agents causing autosomal recessive, early onset PD. We aimed to identify the pathogenic SNPs in PARK2 and PINK1 using in silico prediction software and their effect on the structure, function, and regulation of the proteins. Materials and Methods. We carried out in silico prediction of structural effect of each SNP using different bioinformatics tools to predict substitution influence on protein structure and function. Result. Twenty-one SNPs in PARK2 gene were found to affect transcription factor binding activity. 185 SNPs were found to affect splicing. Ten SNPs were found to affect the miRNA binding site. Two SNPs rs55961220 and rs56092260 affected the structure, function, and stability of Parkin protein. In PINK1 gene only one SNP (rs7349186) was found to affect the structure, function, and stability of the PINK1 protein. Ten SNPs were found to affect the microRNA binding site. Conclusion. Better understanding of Parkinson's disease caused by mutations in PARK2 and PINK1 genes was achieved using in silico prediction. Further studies should be conducted with a special consideration of the ethnic diversity of the different populations. PMID:28127307

  7. Identification and validation of regulatory SNPs that modulate transcription factor chromatin binding and gene expression in prostate cancer

    PubMed Central

    Jin, Hong-Jian; Jung, Segun; DebRoy, Auditi R.; Davuluri, Ramana V.

    2016-01-01

    Prostate cancer (PCa) is the second most common solid tumor for cancer related deaths in American men. Genome wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with the increased risk of PCa. Because most of the susceptibility SNPs are located in noncoding regions, little is known about their functional mechanisms. We hypothesize that functional SNPs reside in cell type-specific regulatory elements that mediate the binding of critical transcription factors (TFs), which in turn result in changes in target gene expression. Using PCa-specific functional genomics data, here we identify 38 regulatory candidate SNPs and their target genes in PCa. Through risk analysis by incorporating gene expression and clinical data, we identify 6 target genes (ZG16B, ANKRD5, RERE, FAM96B, NAALADL2 and GTPBP10) as significant predictors of PCa biochemical recurrence. In addition, 5 SNPs (rs2659051, rs10936845, rs9925556, rs6057110 and rs2742624) are selected for experimental validation using Chromatin immunoprecipitation (ChIP), dual-luciferase reporter assay in LNCaP cells, showing allele-specific enhancer activity. Furthermore, we delete the rs2742624-containing region using CRISPR/Cas9 genome editing and observe the drastic downregulation of its target gene UPK3A. Taken together, our results illustrate that this new methodology can be applied to identify regulatory SNPs and their target genes that likely impact PCa risk. We suggest that similar studies can be performed to characterize regulatory variants in other diseases. PMID:27409348

  8. Sequence Diversity and Large-Scale Typing of SNPs in the Human Apolipoprotein E Gene

    PubMed Central

    Nickerson, Deborah A.; Taylor, Scott L.; Fullerton, Stephanie M.; Weiss, Kenneth M.; Clark, Andrew G.; Stengård, Jari H.; Salomaa, Veikko; Boerwinkle, Eric; Sing, Charles F.

    2000-01-01

    A common strategy for genotyping large samples begins with the characterization of human single nucleotide polymorphisms (SNPs) by sequencing candidate regions in a small sample for SNP discovery. This is usually followed by typing in a large sample those sites observed to vary in a smaller sample. We present results from a systematic investigation of variation at the human apolipoprotein E locus (APOE), as well as the evaluation of the two-tiered sampling strategy based on these data. We sequenced 5.5 kb spanning the entire APOE genomic region in a core sample of 72 individuals, including 24 each of African-Americans from Jackson, Mississippi; European-Americans from Rochester, Minnesota; and Europeans from North Karelia, Finland. This sequence survey detected 21 SNPs and 1 multiallelic indel, 14 of which had not been previously reported. Alleles varied in relative frequency among the populations, and 10 sites were polymorphic in only a single population sample. Oligonucleotide ligation assays (OLA) were developed for 20 of these sites (omitting the indel and a closely-linked SNP). These were then scored in 2179 individuals sampled from the same three populations (n = 843, 884, and 452, respectively). Relative allele frequencies were generally consistent with estimates from the core sample, although variation was found in some populations in the larger sample at SNPs that were monomorphic in the corresponding smaller core sample. Site variation in the larger samples showed no systematic deviation from Hardy-Weinberg expectation. The large OLA sample clearly showed that variation in many, but not all, of OLA-typed SNPs is significantly correlated with the classical protein-coding variants, implying that there may be important substructure within the classical ɛ2, ɛ3, and ɛ4 alleles. Comparison of the levels and patterns of polymorphism in the core samples with those estimated for the OLA-typed samples shows how nucleotide diversity is underestimated when

  9. Association between SNPs in genes involved in folate metabolism and preterm birth risk.

    PubMed

    Wang, B J; Liu, M J; Wang, Y; Dai, J R; Tao, J Y; Wang, S N; Zhong, N; Chen, Y

    2015-02-02

    We investigated the association between 12 single nucleotide polymorphisms (SNPs) in 11 genes involved in folate metabolism and preterm birth. A subset of SNPs selected from 11 genes/loci involved in the folic acid metabolism pathway were subjected to SNaPshot analysis in a case-control study. Twelve SNPs (CBS-C699T, DHFR-c594+59del19, GST01-C428T, MTHFD-G1958A, MTHFR-C677T, MTHFR-A1298C, MTR-A2756G, MTRR-A66G, NFE2L2-ins1+C11108T, RFC1-G80A, TCN2-C776G, and TYMS-1494del6) in 503 DNA samples were simultaneously tested, and included 315 preterm births and 188 controls. None of the 12 SNP genotype distributions related to the folic acid metabolism pathway showed a significant difference between preterm and term babies. The frequency of the compound mutation genotype of MTHFD-G1958A, MTR-A2756G and RFC1-G80A in preterm babies was 7.3%, which was significantly higher than the 2.7% in term babies. Seven babies carried the compound mutation genotype of MTHFD-G1958A, MTR-A2756G, and CBS-C699T, but this was not observed in term babies. The frequency of the combined wild-type genotype of MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A in preterm babies was 3.17%, which was significantly lower than the 7.4% in term babies. The 12 SNPs screened in this study were not independent risk factors of preterm birth. Compound mutation genotypes, including MTHFD-G1958A, MTR-A2756G, and RFC1-G80A and MTHFD-G1958A, MTR-A2756G, and CBS-C699T, may increase the risk of preterm birth. The combined wild-type genotype MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A may decrease the risk of preterm birth.

  10. Ceramic fiber reinforced filter

    DOEpatents

    Stinton, David P.; McLaughlin, Jerry C.; Lowden, Richard A.

    1991-01-01

    A filter for removing particulate matter from high temperature flowing fluids, and in particular gases, that is reinforced with ceramic fibers. The filter has a ceramic base fiber material in the form of a fabric, felt, paper or the like, with the refractory fibers thereof coated with a thin layer of a protective and bonding refractory applied by chemical vapor deposition techniques. This coating causes each fiber to be physically joined to adjoining fibers so as to prevent movement of the fibers during use and to increase the strength and toughness of the composite filter. Further, the coating can be selected to minimize any reactions between the constituents of the fluids and the fibers. A description is given of the formation of a composite filter using a felt preform of commercial silicon carbide fibers together with the coating of these fibers with pure silicon carbide. Filter efficiency approaching 100% has been demonstrated with these filters. The fiber base material is alternately made from aluminosilicate fibers, zirconia fibers and alumina fibers. Coating with Al2O3 is also described. Advanced configurations for the composite filter are suggested.

  11. Solc filter engineering

    NASA Technical Reports Server (NTRS)

    Rosenberg, W. J.; Title, A. M.

    1982-01-01

    A Solc (1965) filter configuration is presented which is both tunable and spectrally variable, since it possesses an adjustable bandwidth, and which although less efficient than a Lyot filter is attractive because of its spectral versatility. The lossless design, using only an entrance and exit polarizer, improves throughput generally and especially in the IR, where polarizers are less convenient than dichroic sheet polarizers. Attention is given to the transmission profiles of Solc filters with different numbers of elements and split elements, as well as their mechanical design features.

  12. Multilevel filtering elliptic preconditioners

    NASA Technical Reports Server (NTRS)

    Kuo, C. C. Jay; Chan, Tony F.; Tong, Charles

    1989-01-01

    A class of preconditioners is presented for elliptic problems built on ideas borrowed from the digital filtering theory and implemented on a multilevel grid structure. They are designed to be both rapidly convergent and highly parallelizable. The digital filtering viewpoint allows the use of filter design techniques for constructing elliptic preconditioners and also provides an alternative framework for understanding several other recently proposed multilevel preconditioners. Numerical results are presented to assess the convergence behavior of the new methods and to compare them with other preconditioners of multilevel type, including the usual multigrid method as preconditioner, the hierarchical basis method and a recent method proposed by Bramble-Pasciak-Xu.

  13. Common SNPs explain some of the variation in the personality dimensions of neuroticism and extraversion.

    PubMed

    Vinkhuyzen, A A E; Pedersen, N L; Yang, J; Lee, S H; Magnusson, P K E; Iacono, W G; McGue, M; Madden, P A F; Heath, A C; Luciano, M; Payton, A; Horan, M; Ollier, W; Pendleton, N; Deary, I J; Montgomery, G W; Martin, N G; Visscher, P M; Wray, N R

    2012-04-17

    The personality traits of neuroticism and extraversion are predictive of a number of social and behavioural outcomes and psychiatric disorders. Twin and family studies have reported moderate heritability estimates for both traits. Few associations have been reported between genetic variants and neuroticism/extraversion, but hardly any have been replicated. Moreover, the ones that have been replicated explain only a small proportion of the heritability (<~2%). Using genome-wide single-nucleotide polymorphism (SNP) data from ~12,000 unrelated individuals we estimated the proportion of phenotypic variance explained by variants in linkage disequilibrium with common SNPs as 0.06 (s.e. = 0.03) for neuroticism and 0.12 (s.e. = 0.03) for extraversion. In an additional series of analyses in a family-based sample, we show that while for both traits ~45% of the phenotypic variance can be explained by pedigree data (that is, expected genetic similarity) one third of this can be explained by SNP data (that is, realized genetic similarity). A part of the so-called 'missing heritability' has now been accounted for, but some of the reported heritability is still unexplained. Possible explanations for the remaining missing heritability are that: (i) rare variants that are not captured by common SNPs on current genotype platforms make a major contribution; and/or (ii) the estimates of narrow sense heritability from twin and family studies are biased upwards, for example, by not properly accounting for nonadditive genetic factors and/or (common) environmental factors.

  14. SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association.

    PubMed

    Dai, James Y; Leblanc, Michael; Smith, Nicholas L; Psaty, Bruce; Kooperberg, Charles

    2009-10-01

    Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region 1 single nucleotide polymorphism (SNP) at a time may not fully explore linkage disequilibrium, haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes with 1 more or less SNP in a stepwise fashion, and comparing prediction errors of different models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.

  15. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

    PubMed

    Lee, S Hong; Ripke, Stephan; Neale, Benjamin M; Faraone, Stephen V; Purcell, Shaun M; Perlis, Roy H; Mowry, Bryan J; Thapar, Anita; Goddard, Michael E; Witte, John S; Absher, Devin; Agartz, Ingrid; Akil, Huda; Amin, Farooq; Andreassen, Ole A; Anjorin, Adebayo; Anney, Richard; Anttila, Verneri; Arking, Dan E; Asherson, Philip; Azevedo, Maria H; Backlund, Lena; Badner, Judith A; Bailey, Anthony J; Banaschewski, Tobias; Barchas, Jack D; Barnes, Michael R; Barrett, Thomas B; Bass, Nicholas; Battaglia, Agatino; Bauer, Michael; Bayés, Mònica; Bellivier, Frank; Bergen, Sarah E; Berrettini, Wade; Betancur, Catalina; Bettecken, Thomas; Biederman, Joseph; Binder, Elisabeth B; Black, Donald W; Blackwood, Douglas H R; Bloss, Cinnamon S; Boehnke, Michael; Boomsma, Dorret I; Breen, Gerome; Breuer, René; Bruggeman, Richard; Cormican, Paul; Buccola, Nancy G; Buitelaar, Jan K; Bunney, William E; Buxbaum, Joseph D; Byerley, William F; Byrne, Enda M; Caesar, Sian; Cahn, Wiepke; Cantor, Rita M; Casas, Miguel; Chakravarti, Aravinda; Chambert, Kimberly; Choudhury, Khalid; Cichon, Sven; Cloninger, C Robert; Collier, David A; Cook, Edwin H; Coon, Hilary; Cormand, Bru; Corvin, Aiden; Coryell, William H; Craig, David W; Craig, Ian W; Crosbie, Jennifer; Cuccaro, Michael L; Curtis, David; Czamara, Darina; Datta, Susmita; Dawson, Geraldine; Day, Richard; De Geus, Eco J; Degenhardt, Franziska; Djurovic, Srdjan; Donohoe, Gary J; Doyle, Alysa E; Duan, Jubao; Dudbridge, Frank; Duketis, Eftichia; Ebstein, Richard P; Edenberg, Howard J; Elia, Josephine; Ennis, Sean; Etain, Bruno; Fanous, Ayman; Farmer, Anne E; Ferrier, I Nicol; Flickinger, Matthew; Fombonne, Eric; Foroud, Tatiana; Frank, Josef; Franke, Barbara; Fraser, Christine; Freedman, Robert; Freimer, Nelson B; Freitag, Christine M; Friedl, Marion; Frisén, Louise; Gallagher, Louise; Gejman, Pablo V; Georgieva, Lyudmila; Gershon, Elliot S; Geschwind, Daniel H; Giegling, Ina; Gill, Michael; Gordon, Scott D; Gordon-Smith, Katherine; Green, 
Elaine K; Greenwood, Tiffany A; Grice, Dorothy E; Gross, Magdalena; Grozeva, Detelina; Guan, Weihua; Gurling, Hugh; De Haan, Lieuwe; Haines, Jonathan L; Hakonarson, Hakon; Hallmayer, Joachim; Hamilton, Steven P; Hamshere, Marian L; Hansen, Thomas F; Hartmann, Annette M; Hautzinger, Martin; Heath, Andrew C; Henders, Anjali K; Herms, Stefan; Hickie, Ian B; Hipolito, Maria; Hoefels, Susanne; Holmans, Peter A; Holsboer, Florian; Hoogendijk, Witte J; Hottenga, Jouke-Jan; Hultman, Christina M; Hus, Vanessa; Ingason, Andrés; Ising, Marcus; Jamain, Stéphane; Jones, Edward G; Jones, Ian; Jones, Lisa; Tzeng, Jung-Ying; Kähler, Anna K; Kahn, René S; Kandaswamy, Radhika; Keller, Matthew C; Kennedy, James L; Kenny, Elaine; Kent, Lindsey; Kim, Yunjung; Kirov, George K; Klauck, Sabine M; Klei, Lambertus; Knowles, James A; Kohli, Martin A; Koller, Daniel L; Konte, Bettina; Korszun, Ania; Krabbendam, Lydia; Krasucki, Robert; Kuntsi, Jonna; Kwan, Phoenix; Landén, Mikael; Långström, Niklas; Lathrop, Mark; Lawrence, Jacob; Lawson, William B; Leboyer, Marion; Ledbetter, David H; Lee, Phil H; Lencz, Todd; Lesch, Klaus-Peter; Levinson, Douglas F; Lewis, Cathryn M; Li, Jun; Lichtenstein, Paul; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Liu, Chunyu; Lohoff, Falk W; Loo, Sandra K; Lord, Catherine; Lowe, Jennifer K; Lucae, Susanne; MacIntyre, Donald J; Madden, Pamela A F; Maestrini, Elena; Magnusson, Patrik K E; Mahon, Pamela B; Maier, Wolfgang; Malhotra, Anil K; Mane, Shrikant M; Martin, Christa L; Martin, Nicholas G; Mattheisen, Manuel; Matthews, Keith; Mattingsdal, Morten; McCarroll, Steven A; McGhee, Kevin A; McGough, James J; McGrath, Patrick J; McGuffin, Peter; McInnis, Melvin G; McIntosh, Andrew; McKinney, Rebecca; McLean, Alan W; McMahon, Francis J; McMahon, William M; McQuillin, Andrew; Medeiros, Helena; Medland, Sarah E; Meier, Sandra; Melle, Ingrid; Meng, Fan; Meyer, Jobst; Middeldorp, Christel M; Middleton, Lefkos; Milanova, Vihra; Miranda, Ana; Monaco, Anthony P; 
Montgomery, Grant W; Moran, Jennifer L; Moreno-De-Luca, Daniel; Morken, Gunnar; Morris, Derek W; Morrow, Eric M; Moskvina, Valentina; Muglia, Pierandrea; Mühleisen, Thomas W; Muir, Walter J; Müller-Myhsok, Bertram; Murtha, Michael; Myers, Richard M; Myin-Germeys, Inez; Neale, Michael C; Nelson, Stan F; Nievergelt, Caroline M; Nikolov, Ivan; Nimgaonkar, Vishwajit; Nolen, Willem A; Nöthen, Markus M; Nurnberger, John I; Nwulia, Evaristus A; Nyholt, Dale R; O'Dushlaine, Colm; Oades, Robert D; Olincy, Ann; Oliveira, Guiomar; Olsen, Line; Ophoff, Roel A; Osby, Urban; Owen, Michael J; Palotie, Aarno; Parr, Jeremy R; Paterson, Andrew D; Pato, Carlos N; Pato, Michele T; Penninx, Brenda W; Pergadia, Michele L; Pericak-Vance, Margaret A; Pickard, Benjamin S; Pimm, Jonathan; Piven, Joseph; Posthuma, Danielle; Potash, James B; Poustka, Fritz; Propping, Peter; Puri, Vinay; Quested, Digby J; Quinn, Emma M; Ramos-Quiroga, Josep Antoni; Rasmussen, Henrik B; Raychaudhuri, Soumya; Rehnström, Karola; Reif, Andreas; Ribasés, Marta; Rice, John P; Rietschel, Marcella; Roeder, Kathryn; Roeyers, Herbert; Rossin, Lizzy; Rothenberger, Aribert; Rouleau, Guy; Ruderfer, Douglas; Rujescu, Dan; Sanders, Alan R; Sanders, Stephan J; Santangelo, Susan L; Sergeant, Joseph A; Schachar, Russell; Schalling, Martin; Schatzberg, Alan F; Scheftner, William A; Schellenberg, Gerard D; Scherer, Stephen W; Schork, Nicholas J; Schulze, Thomas G; Schumacher, Johannes; Schwarz, Markus; Scolnick, Edward; Scott, Laura J; Shi, Jianxin; Shilling, Paul D; Shyn, Stanley I; Silverman, Jeremy M; Slager, Susan L; Smalley, Susan L; Smit, Johannes H; Smith, Erin N; Sonuga-Barke, Edmund J S; St Clair, David; State, Matthew; Steffens, Michael; Steinhausen, Hans-Christoph; Strauss, John S; Strohmaier, Jana; Stroup, T Scott; Sutcliffe, James S; Szatmari, Peter; Szelinger, Szabocls; Thirumalai, Srinivasa; Thompson, Robert C; Todorov, Alexandre A; Tozzi, Federica; Treutlein, Jens; Uhr, Manfred; van den Oord, Edwin J C G; Van 
Grootheest, Gerard; Van Os, Jim; Vicente, Astrid M; Vieland, Veronica J; Vincent, John B; Visscher, Peter M; Walsh, Christopher A; Wassink, Thomas H; Watson, Stanley J; Weissman, Myrna M; Werge, Thomas; Wienker, Thomas F; Wijsman, Ellen M; Willemsen, Gonneke; Williams, Nigel; Willsey, A Jeremy; Witt, Stephanie H; Xu, Wei; Young, Allan H; Yu, Timothy W; Zammit, Stanley; Zandi, Peter P; Zhang, Peng; Zitman, Frans G; Zöllner, Sebastian; Devlin, Bernie; Kelsoe, John R; Sklar, Pamela; Daly, Mark J; O'Donovan, Michael C; Craddock, Nicholas; Sullivan, Patrick F; Smoller, Jordan W; Kendler, Kenneth S; Wray, Naomi R

    2013-09-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders.
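
    As a rough reading aid for the estimates quoted above, the sketch below divides each reported genetic correlation by its standard error to get a Wald-type z statistic. This is only an illustration: the abstract does not state how significance was assessed, and the helper name `wald_z` is hypothetical.

```python
# Genetic correlations and standard errors as quoted in the abstract.
reported = {
    "schizophrenia / bipolar disorder": (0.68, 0.04),
    "schizophrenia / major depressive disorder": (0.43, 0.06),
    "bipolar disorder / major depressive disorder": (0.47, 0.06),
    "ADHD / major depressive disorder": (0.32, 0.07),
    "schizophrenia / ASD": (0.16, 0.06),
}


def wald_z(estimate, standard_error):
    """Return estimate / s.e., a Wald-type z statistic (assumed test, not from the source)."""
    return estimate / standard_error


for pair, (rg, se) in reported.items():
    print(f"{pair}: rg = {rg:.2f}, z = {wald_z(rg, se):.2f}")
```

    Under this assumed test, even the lowest reported correlation (schizophrenia and ASD) sits more than two standard errors above zero.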

  16. VEGF-A and VEGFR1 SNPs associate with preeclampsia in a Philippine population.

    PubMed

    Amosco, Melissa D; Villar, Van Anthony M; Naniong, Justin Michael A; David-Bustamante, Lara Marie G; Jose, Pedro A; Palmes-Saloma, Cynthia P

    The vascular endothelial growth factor (VEGF) family is important for establishing normal pregnancy, and related single nucleotide polymorphisms (SNPs) are implicated in abnormal placentation and preeclampsia. We evaluated the association between preeclampsia and several VEGF SNPs among Filipinos, an ethnically distinct group with a high prevalence of preeclampsia. The genotypes and allelic variants were determined in a case-control study (191 controls and 165 preeclampsia patients) through SNP analysis of VEGF-A (rs2010963, rs3025039) and VEGF-C (rs7664413) and their corresponding receptors VEGFR1 (rs722503, rs12584067, rs7335588) and VEGFR3 (rs307826) from venous blood DNA. The VEGF-A rs3025039 C allele was associated with preeclampsia (odds ratio of 1.648 (1.03-2.62)), while the T allele bestowed an additive effect for the maintenance of normal, uncomplicated pregnancy and against the development of preeclampsia (odds ratio of 0.62 (0.39-0.98)). VEGFR1 rs722503 is associated with preeclampsia occurring at or after the age of 40 years. The results showed that genetic variability of VEGF-A and VEGFR1 is important in the etiology of preeclampsia among Filipinos.
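
    The odds ratios and intervals quoted above are the kind of figure a 2x2 allele-count table produces. The sketch below shows one common way to compute an odds ratio with a Wald 95% confidence interval. The counts are hypothetical and are not taken from the study, and the abstract does not say which interval method the authors used.

```python
import math


def odds_ratio_ci(case_with, case_without, ctrl_with, ctrl_without, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table of allele counts."""
    odds_ratio = (case_with * ctrl_without) / (case_without * ctrl_with)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(
        1 / case_with + 1 / case_without + 1 / ctrl_with + 1 / ctrl_without
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts (illustration only, not study data).
or_c, lo_c, hi_c = odds_ratio_ci(30, 70, 20, 80)
# Swapping the allele roles gives the reciprocal odds ratio.
or_t, lo_t, hi_t = odds_ratio_ci(70, 30, 80, 20)
print(f"allele 1: OR = {or_c:.3f} ({lo_c:.2f}-{hi_c:.2f})")
print(f"allele 2: OR = {or_t:.3f} ({lo_t:.2f}-{hi_t:.2f})")
```

    For a biallelic SNP the two alleles yield reciprocal odds ratios from the same table, which is why a risk allele with an odds ratio above 1 is paired with a protective allele whose odds ratio is below 1.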

  17. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.

  18. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    PubMed Central

    Bowers, John E.; Pearl, Stephanie A.; Burke, John M.

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165

  19. The evolutionary history of Afrocanarian blue tits inferred from genomewide SNPs.

    PubMed

    Gohli, Jostein; Leder, Erica H; Garcia-Del-Rey, Eduardo; Johannessen, Lars Erik; Johnsen, Arild; Laskemoen, Terje; Popp, Magnus; Lifjeld, Jan T

    2015-01-01

    A common challenge in phylogenetic reconstruction is to find enough suitable genomic markers to reliably trace splitting events with short internodes. Here, we present phylogenetic analyses based on genomewide single-nucleotide polymorphisms (SNPs) of an enigmatic avian radiation, the subspecies complex of Afrocanarian blue tits (Cyanistes teneriffae). The two sister species, the Eurasian blue tit (Cyanistes caeruleus) and the azure tit (Cyanistes cyanus), constituted the out-group. We generated a large data set of SNPs for analysis of population structure and phylogeny. We also adapted our protocol to utilize degraded DNA from old museum skins from Libya. We found strong population structuring that largely confirmed subspecies monophyly and constructed a coalescent-based phylogeny with full support at all major nodes. The results are consistent with a recent hypothesis that La Palma and Libya are relic populations of an ancient Afrocanarian blue tit, although a small data set for Libya could not resolve its position relative to La Palma. The birds on the eastern islands of Fuerteventura and Lanzarote are similar to those in Morocco. Together they constitute the sister group to the clade containing the other Canary Islands (except La Palma), in which El Hierro is sister to the three central islands. Hence, extant Canary Islands populations seem to originate from multiple independent colonization events. We also found population divergences in a key reproductive trait, viz. sperm length, which may constitute reproductive barriers between certain populations. We recommend a taxonomic revision of this polytypic species, where several subspecies should qualify for species rank.

  20. Genetic association between SNPs in the DGAT1 gene and milk production traits in Murrah buffaloes.

    PubMed

    de Freitas, Ana Cláudia; de Camargo, Gregório Miguel Ferreira; Stafuzza, Nedenia Bonvino; Aspilcueta-Borquis, Rusbel Raul; Venturini, Guilherme Costa; Dias, Marina Mortati; Cardoso, Diercles Francisco; Tonhati, Humberto

    2016-10-01

    This study identified polymorphisms in the DGAT1 gene in Murrah buffaloes and investigated their associations with milk production and quality traits (milk, fat and protein yields and percentages, somatic cell count). Genomic DNA was extracted from hair follicles collected from the tail of 196 females. Three SNPs were identified in the DGAT1 gene by sequencing. Statistical analyses were performed to verify the linkage and the association between polymorphisms and traits. The estimated value of r² between two SNPs in exon 17 (g.11,783G>A and g.11,785T>C) was 0.029. SNP g.11,785T>C was significantly associated (P < 0.05) with fat and protein percentages. The dominance effect was significant for milk and fat yields and protein percentage (P < 0.05). The additive effect of SNP g.11,785T>C was significant for protein production and somatic cell count (P < 0.05). These results indicate that marker-assisted selection could be applied, with consideration given to balancing production and udder health.
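
    The linkage statistic reported above is the squared correlation between alleles at two loci. A minimal sketch of the standard calculation follows, assuming phased haplotype counts are available. The abstract does not say how the authors estimated it, and the counts below are hypothetical.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Squared allelic correlation between two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total         # frequency of allele B at locus 2
    d = p_AB - p_A * p_B                # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical haplotype counts (illustration only).
print(ld_r2(50, 0, 0, 50))    # alleles always co-occur
print(ld_r2(25, 25, 25, 25))  # alleles independent
```

    A value near zero, such as the 0.029 reported here, means the two SNPs carry almost independent information despite lying in the same exon.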

  1. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    PubMed

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-07-07

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.

  2. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs

    PubMed Central

    2013-01-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  3. Two SNPs in the SILV gene are associated with silver coat colour in ponies.

    PubMed

    Reissmann, M; Bierwolf, J; Brockmann, G A

    2007-02-01

    In horses, a pigment dilution acting only on black eumelanin is the so-called silver coat colour, which is characterized by a chocolate-to-reddish body with a white mane and tail. Using information from other species, we focused our study on SILV as a possible candidate gene for the equine silver phenotype. A 1559-bp genomic fragment was sequenced in 24 horses, and five SNPs were detected. Two of the five SNPs (DQ665301:g.697A>T and DQ665301:g.1457C>T) were genotyped in 112 horses representing eight colour phenotypes. Both mutations were completely associated with the silver phenotype: all eumelanin-producing horses (blacks and bays) with atypical white mane and tail were carriers of the [g.697T; g.1457T] haplotype. We identified this haplotype as well as the silver phenotype only in Shetland ponies and Icelandic horses. Horses without eumelanin (chestnuts) were carriers of the [g.697T; g.1457T] haplotype, but they showed no phenotypic effect. The white or flaxen mane often detected in chestnuts is presumably based on another SILV mutation or on polymorphisms in other genes.

  4. NOVEL MICROWAVE FILTER DESIGN TECHNIQUES.

    DTIC Science & Technology

    (*ELECTRIC FILTERS, MICROWAVE FREQUENCY), (*MICROWAVE EQUIPMENT, ELECTRIC FILTERS), CIRCUITS, CAPACITORS, COILS, RESONATORS, STRIP TRANSMISSION LINES, WAVEGUIDES, TUNING DEVICES, PARAMETRIC AMPLIFIERS, FREQUENCY CONVERTERS.

  5. Detection of SNPs in the TBC1D1 gene and their association with carcass traits in chicken.

    PubMed

    Wang, Yan; Xu, Heng-Yong; Gilbert, Elizabeth R; Peng, Xing; Zhao, Xiao-Ling; Liu, Yi-Ping; Zhu, Qing

    2014-09-01

    TBC1D1 plays an important role in numerous fundamental physiological processes including muscle metabolism, regulation of whole body energy homeostasis and lipid metabolism. The objective of the present study was to identify single nucleotide polymorphisms (SNPs) in chicken TBC1D1 using 128 Erlang mountainous chickens and to determine if these SNPs are associated with carcass traits. The approach consisted of sequencing TBC1D1 using a panel of DNA from different individuals, revealing twenty-two SNPs. Among these SNPs, two polymorphisms (g.69307744C>T and g.69307608T>G) of block 1, four polymorphisms (g.69322320C>T, g.69322314G>A, g.69317290A>G and g.69317276T>C) of block 2 and four polymorphisms of block 3 (g.69349746G>A, g.69349736C>G, g.69349727C>T and g.69349694C>T) exhibited a high degree of linkage disequilibrium in all test populations. An association analysis was performed between the twenty-two SNPs and seven performance traits. SNPs g.69307744C>T, g.69340192G>A and g.69355665T>C were demonstrated to have a strong effect on liveweight (BW), carcass weight (CW), semi-eviscerated weight (SEW) and eviscerated weight (EW) and g.69340070C>T polymorphism was related to BW, SEW and BMW in chicken populations. However, for the other SNPs, there were no significant correlations between different genotypes and carcass traits. Meanwhile, haplotype CT-TG of block 1 and combined genotype AG-TT-AC-CT of block 3 were significantly associated with BW, CW, SEW and EW. Overall, our results provide evidence that polymorphisms in TBC1D1 are associated with carcass traits and would be a useful candidate gene in selection programs for improving carcass traits.

  6. Active-R filter

    DOEpatents

    Soderstrand, Michael A.

    1976-01-01

    An operational amplifier-type active filter in which the only capacitor in the circuit is the compensating capacitance of the operational amplifiers, the various feedback and coupling elements being essentially solely resistive.

  7. Improved optical filter

    NASA Technical Reports Server (NTRS)

    Title, A. M.

    1978-01-01

    Filter includes partial polarizer between birefringent elements. Plastic film on partial polarizer compensates for any polarization rotation by partial polarizer. Two quarter-wave plates change incident, linearly polarized light into elliptically polarized light.

  8. HEPA air filter (image)

    MedlinePlus

    ... pet dander and other irritating allergens from the air. Along with other methods to reduce allergens, such ... controlling the amount of allergens circulating in the air. HEPA filters can be found in most air ...

  9. Compact photonic spin filters

    NASA Astrophysics Data System (ADS)

    Ke, Yougang; Liu, Zhenxing; Liu, Yachao; Zhou, Junxiao; Shu, Weixing; Luo, Hailu; Wen, Shuangchun

    2016-10-01

    In this letter, we propose and experimentally demonstrate a compact photonic spin filter formed by integrating a Pancharatnam-Berry phase lens (focal length of ±f) into a conventional plano-concave lens (focal length of -f). By choosing the input port of the filter, photons with a desired spin state, such as the right-handed component or the left-handed one, propagate along their original propagation direction, while the unwanted spin component quickly diverges after passing through the filter. One application of the filter, sorting the spin-dependent components of vector vortex beams on the higher-order Poincaré sphere, is also demonstrated. Our scheme provides a simple method to manipulate light, and thereby enables potential applications for photonic devices.
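
    One way to see why the two spin states behave so differently is to treat the two elements as thin lenses in contact, whose optical powers add. That idealisation is an assumption made here for illustration and is not stated in the abstract. The focal length value is arbitrary.

```python
import math


def combined_focal_length(f1, f2):
    """Focal length of two thin lenses in contact: 1/F = 1/f1 + 1/f2."""
    power = 1 / f1 + 1 / f2
    return math.inf if power == 0 else 1 / power


f = 0.25  # arbitrary focal length in metres (illustrative value)

# Spin state that sees the Pancharatnam-Berry lens as +f, combined with the -f concave lens.
passed = combined_focal_length(+f, -f)
# Opposite spin state sees the Pancharatnam-Berry lens as -f.
rejected = combined_focal_length(-f, -f)

print(passed)    # infinite focal length: the beam is not refocused
print(rejected)  # negative focal length: the beam diverges
```

    In this idealised picture the wanted spin component experiences zero net power and continues along its original direction, while the other component sees a diverging lens of focal length -f/2.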

  10. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
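
    For readers unfamiliar with the overlap-and-save method named above, the sketch below shows only the basic single-filter version, using a naive DFT so that it has no dependencies. It does not attempt the report's sub-convolution or parallel architecture, and the function names and block size are illustrative choices.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT or inverse DFT of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def overlap_save(signal, taps, block=8):
    """FIR-filter `signal` block by block using DFT-IDFT pairs of length `block`."""
    m = len(taps)
    step = block - (m - 1)  # new output samples produced per block
    if step <= 0:
        raise ValueError("block must be longer than len(taps) - 1")
    h_freq = dft(list(taps) + [0.0] * (block - m))
    padded = [0.0] * (m - 1) + list(signal)  # history for the first block
    out = []
    pos = 0
    while pos < len(signal):
        seg = padded[pos:pos + block]
        seg = seg + [0.0] * (block - len(seg))  # zero-fill the final block
        y = dft([a * b for a, b in zip(dft(seg), h_freq)], inverse=True)
        # The first m - 1 samples are corrupted by circular wrap-around; discard them.
        out.extend(v.real for v in y[m - 1:])
        pos += step
    return out[:len(signal)]


def direct_fir(signal, taps):
    """Reference time-domain FIR filter for comparison."""
    return [
        sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(signal))
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
h = [0.5, 0.25, 0.25]
print(overlap_save(x, h))
print(direct_fir(x, h))
```

    Each block overlaps the previous one by len(taps) - 1 samples, so the transform length is a free design parameter as long as it exceeds that overlap.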

  11. Remotely serviced filter and housing

    DOEpatents

    Ross, Maurice J.; Zaladonis, Larry A.

    1988-09-27

    A filter system for a hot cell comprises a housing adapted for input of air or other gas to be filtered, flow of the air through a filter element, and exit of filtered air. The housing is tapered at the top to make it easy to insert a filter cartridge using an overhead crane. The filter cartridge holds the filter element while the air or other gas is passed through the filter element. Captive bolts in trunnion nuts are readily operated by electromechanical manipulators operating power wrenches to secure and release the filter cartridge. The filter cartridge is adapted to make it easy to change a filter element by using a master-slave manipulator at a shielded window station.

  12. NICMOS Filter Wheel Test

    NASA Astrophysics Data System (ADS)

    Wheeler, Thomas

    2009-07-01

    This is an engineering test {described in SMOV4 Activity Description NICMOS-04} to verify the aliveness, functionality, operability, and electro-mechanical calibration of the NICMOS filter wheel motors and assembly after NCS restart in SMOV4. This test has been designed to obviate concerns over possible deformation or breakage of the filter wheel "soda-straw" shafts due to excess rotational drag torque and/or bending moments which may be imparted due to changes in the dewar metrology from warm-up/cool-down. This test should be executed after the NCS {and filter wheel housing} has reached and approximately equilibrated to its nominal operating temperature. Addition of visits G0 - G9 {9/9/09}: Ten visits copied from proposal 11868 {visits 20, 30, ..., 90, A0, B0}. Each visit moves two filter positions, takes lamp ON/OFF exposures and then moves back to the blank position. Visits G0, G1 and G2 will leave the filter wheels disabled. The remaining visits will leave the filter wheels enabled. There are sufficient in-between times to allow for data download and analysis. If a problem is encountered, the filter wheels will be disabled through a real-time command. The in-between times are all set to 22-50 hours. It is preferable to keep the in-between time as short as possible.

  13. Contactor/filter improvements

    DOEpatents

    Stelman, David

    1989-01-01

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. The housing further includes a gas inlet means, a gas outlet means, and means for moving a body of granular material through the zone. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. Disposed on the upstream face of the filter element is a cover screen which isolates the filter element from contact with the moving granular bed and collects a portion of the particulates so as to form a dust cake having openings small enough to exclude the granular material, yet large enough to receive the dust particles. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses.

  14. [Multiple imputation and complete case analysis in logistic regression models: a practical assessment of the impact of incomplete covariate data].