Science.gov

Sample records for filtering snps imputed

  1. Genetic association analysis and meta-analysis of imputed SNPs in longitudinal studies

    PubMed Central

    Subirana, Isaac; González, Juan R

    2014-01-01

    In this paper we propose a new method to analyze time-to-event data in longitudinal genetic studies. This method addresses the fundamental problem of incorporating uncertainty when analyzing survival data and imputed single nucleotide polymorphisms (SNPs) from genomewide association studies (GWAS). Our method incorporates uncertainty in the likelihood function, in contrast to existing methods, which incorporate the uncertainty in the design matrix. Through simulation studies and real data analyses, we show that our proposed method is unbiased and provides powerful results. We also show how combining results from different GWAS (meta-analysis) may lead to wrong results when effects are not estimated using our approach. The model is implemented in an R package that is designed to analyze uncertainty not only arising from imputed SNPs, but also from copy number variants (CNVs). PMID:23595425
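
    As a hedged illustration of the distinction drawn in this abstract (the notation is assumed here and is not taken from the paper): let p_{ig} be the imputed probability that subject i carries genotype g in {0, 1, 2}, and let f(y_i | g; theta) be the survival-model contribution of outcome y_i given that genotype. Placing the uncertainty in the design matrix usually means plugging in the expected dosage, whereas placing it in the likelihood means averaging the likelihood itself over the possible genotypes:

```latex
\begin{align*}
\text{design matrix (dosage):}\quad
  & L_i^{\mathrm{dos}}(\theta) = f\!\left(y_i \,\middle|\, \bar g_i;\ \theta\right),
  \qquad \bar g_i = \sum_{g=0}^{2} g\, p_{ig}, \\
\text{likelihood (mixture):}\quad
  & L_i^{\mathrm{mix}}(\theta) = \sum_{g=0}^{2} p_{ig}\, f\!\left(y_i \mid g;\ \theta\right).
\end{align*}
```

    The two expressions coincide when one p_{ig} equals 1, that is, when the genotype is known with certainty.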

  2. Genotype imputation for African Americans using data from HapMap phase II versus 1000 genomes projects.

    PubMed

    Sung, Yun J; Gu, C Charles; Tiwari, Hemant K; Arnett, Donna K; Broeckel, Ulrich; Rao, Dabeeru C

    2012-07-01

    Genotype imputation infers untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel such as those from the HapMap Project. It is popular for increasing statistical power and comparing results across studies using different platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and also because no ideal reference panel is available due to admixture. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because recent investigators are increasingly using the data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based imputations and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on Affymetrix SNP Array 6.0 genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations, about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with next highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, reducing the accuracy of the union strategy. Our findings suggest that 1KG-based imputations can facilitate discovery of

  3. Genotype imputation efficiency in Nelore Cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotype imputation efficiency in Nelore cattle was evaluated in different scenarios of lower density (LD) chips, imputation methods and sets of animals to have their genotypes imputed. Twelve commercial and virtual custom LD chips with densities varying from 7K to 75K SNPs were tested. Customized L...

  4. Current software for genotype imputation.

    PubMed

    Ellinghaus, David; Schreiber, Stefan; Franke, Andre; Nothnagel, Michael

    2009-07-01

    Genotype imputation for single nucleotide polymorphisms (SNPs) has been shown to be a powerful means to include genetic markers in exploratory genetic association studies without having to genotype them, and is becoming a standard procedure. A number of different software programs are available. In our experience, user-friendliness is often the deciding factor in the choice of software to solve a particular task. We therefore evaluated the usability of three publicly available imputation programs: BEAGLE, IMPUTE and MACH. We found all three programs to perform well with HapMap reference data, with little effort needed for data preparation and subsequent association analysis. Each of them has different strengths and weaknesses, however, and none is optimal for all situations. PMID:19706367

  5. Genotype imputation in the domestic dog.

    PubMed

    Friedenberg, S G; Meurs, K M

    2016-10-01

    Application of imputation methods to accurately predict a dense array of SNP genotypes in the dog could provide an important supplement to current analyses of array-based genotyping data. Here, we developed a reference panel of 4,885,283 SNPs in 83 dogs across 15 breeds using whole genome sequencing. We used this panel to predict the genotypes of 268 dogs across three breeds with 84,193 SNP array-derived genotypes as inputs. We then (1) performed breed clustering of the actual and imputed data; (2) evaluated several reference panel breed combinations to determine an optimal reference panel composition; and (3) compared the accuracy of two commonly used software algorithms (Beagle and IMPUTE2). Breed clustering was well preserved in the imputation process across eigenvalues representing 75 % of the variation in the imputed data. Using Beagle with a target panel from a single breed, genotype concordance was highest using a multi-breed reference panel (92.4 %) compared to a breed-specific reference panel (87.0 %) or a reference panel containing no breeds overlapping with the target panel (74.9 %). This finding was confirmed using target panels derived from two other breeds. Additionally, using the multi-breed reference panel, genotype concordance was slightly higher with IMPUTE2 (94.1 %) compared to Beagle; Pearson correlation coefficients were slightly higher for both software packages (0.946 for Beagle, 0.961 for IMPUTE2). Our findings demonstrate that genotype imputation from SNP array-derived data to whole genome-level genotypes is both feasible and accurate in the dog with appropriate breed overlap between the target and reference panels. PMID:27129452

  6. Assessment of genotype imputation performance using 1000 Genomes in African American studies.

    PubMed

    Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Bierut, Laura J; Saccone, Nancy L; Page, Grier P; Johnson, Eric O

    2012-01-01

    Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL
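
    The first two metrics defined in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, genotypes are assumed to be coded 0, 1, 2, and IQS is assumed to take the Cohen's kappa form (Po - Pc) / (1 - Pc), where Po is the observed concordance and Pc is the agreement expected by chance from the marginal genotype frequencies.

```python
from collections import Counter


def concordance_and_iqs(true_genotypes, imputed_genotypes):
    """Return (concordance, IQS) for one masked SNP.

    Both inputs are equal-length lists of genotypes coded 0, 1 or 2.
    IQS uses the kappa form (Po - Pc) / (1 - Pc); this form is an
    assumption of the sketch.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_genotypes):
        raise ValueError("inputs must be non-empty and of equal length")

    # Po: fraction of samples whose imputed genotype matches the true one.
    observed = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes)) / n

    # Pc: agreement expected by chance, from the two marginal distributions.
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    chance = sum(true_counts[g] * imputed_counts[g] for g in (0, 1, 2)) / n**2

    if chance == 1.0:
        # Both vectors are monomorphic for the same genotype: IQS is undefined.
        return observed, float("nan")
    return observed, (observed - chance) / (1.0 - chance)


# A rare variant that is always imputed as the common homozygote.
rare_true = [0] * 98 + [1] * 2
rare_imputed = [0] * 100
rare_concordance, rare_iqs = concordance_and_iqs(rare_true, rare_imputed)
```

    In this toy case the concordance is high while the IQS is zero, which illustrates why a chance-adjusted score is more informative than raw concordance for low-MAF SNPs.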

  7. Analyses and comparison of accuracy of different genotype imputation methods.

    PubMed

    Pei, Yu-Fang; Li, Jian; Zhang, Lei; Papasian, Christopher J; Deng, Hong-Wen

    2008-01-01

    The power of genetic association analyses is often compromised by missing genotypic data which contributes to lack of significant findings, e.g., in in silico replication studies. One solution is to impute untyped SNPs from typed flanking markers, based on known linkage disequilibrium (LD) relationships. Several imputation methods are available and their usefulness in association studies has been demonstrated, but factors affecting their relative performance in accuracy have not been systematically investigated. Therefore, we investigated and compared the performance of five popular genotype imputation methods, MACH, IMPUTE, fastPHASE, PLINK and Beagle, to assess and compare the effects of factors that affect imputation accuracy rates (ARs). Our results showed that a stronger LD and a lower MAF for an untyped marker produced better ARs for all the five methods. We also observed that a greater number of haplotypes in the reference sample resulted in higher ARs for MACH, IMPUTE, PLINK and Beagle, but had little influence on the ARs for fastPHASE. In general, MACH and IMPUTE produced similar results and these two methods consistently outperformed fastPHASE, PLINK and Beagle. Our study is helpful in guiding application of imputation methods in association analyses when genotype data are missing. PMID:18958166

  8. Analyses and Comparison of Accuracy of Different Genotype Imputation Methods

    PubMed Central

    Pei, Yu-Fang; Li, Jian; Zhang, Lei; Papasian, Christopher J.; Deng, Hong-Wen

    2008-01-01

    The power of genetic association analyses is often compromised by missing genotypic data which contributes to lack of significant findings, e.g., in in silico replication studies. One solution is to impute untyped SNPs from typed flanking markers, based on known linkage disequilibrium (LD) relationships. Several imputation methods are available and their usefulness in association studies has been demonstrated, but factors affecting their relative performance in accuracy have not been systematically investigated. Therefore, we investigated and compared the performance of five popular genotype imputation methods, MACH, IMPUTE, fastPHASE, PLINK and Beagle, to assess and compare the effects of factors that affect imputation accuracy rates (ARs). Our results showed that a stronger LD and a lower MAF for an untyped marker produced better ARs for all the five methods. We also observed that a greater number of haplotypes in the reference sample resulted in higher ARs for MACH, IMPUTE, PLINK and Beagle, but had little influence on the ARs for fastPHASE. In general, MACH and IMPUTE produced similar results and these two methods consistently outperformed fastPHASE, PLINK and Beagle. Our study is helpful in guiding application of imputation methods in association analyses when genotype data are missing. PMID:18958166

  9. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ general-purpose machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the untyped SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. Running times differed between methods. The Extreme Learning Machine (ELM) was always the fastest algorithm. As the sample size increases, the Radial Basis Function (RBF) method requires a long imputation time. The methods tested in this research can be an alternative for imputation of untyped SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods be used for imputation. PMID:27049046

  10. Genotype imputation via matrix completion.

    PubMed

    Chi, Eric C; Zhou, Hua; Chen, Gary K; Del Vecchyo, Diego Ortega; Lange, Kenneth

    2013-03-01

    Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546
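
    To make the idea of imputation by matrix completion concrete, here is a minimal Python sketch. It uses a generic iterative truncated-SVD scheme (often called hard-impute), which is swapped in for illustration and is not the algorithm of MENDEL-IMPUTE. The function name, the fixed rank, the iteration count and the toy dosage matrix are all assumptions, and every column is assumed to have at least one observed entry.

```python
import numpy as np


def complete_low_rank(matrix, rank=1, n_iter=200):
    """Fill NaN entries of a samples-by-SNPs matrix with a low-rank fit.

    Generic iterative truncated-SVD completion; illustrative only.
    """
    x = np.array(matrix, dtype=float)
    missing = np.isnan(x)

    # Initialise each missing entry with the mean of its column.
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Observed entries stay fixed; only the missing ones are updated.
        x[missing] = low_rank[missing]
    return x


# Toy rank-1 "dosage" matrix (4 samples x 3 SNPs) with two masked entries.
truth = np.outer([0.5, 1.0, 1.5, 2.0], [0.25, 0.5, 1.0])
observed = truth.copy()
observed[1, 1] = np.nan
observed[2, 0] = np.nan
filled = complete_low_rank(observed, rank=1)
```

    Real genotype matrices are only approximately low rank, so in practice the rank (or a regularisation strength) would have to be tuned, for example by masking known genotypes and checking how well they are recovered.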

  11. SNP panels/Imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Participants from thirteen countries discussed services that Interbull can perform or recommendations that Interbull can make to promote harmonization and assist member countries in improving their genomic evaluations in regard to SNP panels and imputation. The panel recommended: A mechanism to shar...

  12. Multiple imputation with multivariate imputation by chained equation (MICE) package

    PubMed Central

    2016-01-01

    Multiple imputation (MI) is an advanced technique for handling missing values. It is superior to single imputation in that it takes into account uncertainty in missing value imputation. However, MI is underutilized in medical literature due to lack of familiarity and computational challenges. The article provides a step-by-step approach to performing MI with the R multivariate imputation by chained equation (MICE) package. The procedure first imputes m complete datasets by calling the mice() function. Then statistical analyses such as univariate analysis and regression models can be performed within each dataset by calling the with() function, which sets the environment for statistical analysis. Lastly, the results obtained from each analysis are combined by using the pool() function. PMID:26889483
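
    The final pooling step combines the m per-dataset results with Rubin's rules. The sketch below shows that arithmetic in Python rather than R; the function name is hypothetical, and it covers only the pooled estimate and its total variance, not the degrees-of-freedom adjustment that a full implementation such as pool() also reports.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance) following Rubin's rules:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")

    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    return pooled, within + (1 + 1 / m) * between


# Three imputed datasets giving the same standard error but different estimates.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation term is what single imputation leaves out: it inflates the variance to reflect how much the estimate changes from one imputed dataset to the next.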

  13. Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.

    PubMed

    Bolormaa, S; Gore, K; van der Werf, J H J; Hayes, B J; Daetwyler, H D

    2015-10-01

    Genotyping sheep for genome-wide SNPs at lower density and imputing to a higher density would enable cost-effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low-density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50-475 kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single-breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. The mean GBLUP accuracies with imputed

  14. Rapid genotype imputation from sequence without reference panels.

    PubMed

    Davies, Robert W; Flint, Jonathan; Myers, Simon; Mott, Richard

    2016-08-01

    Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, array-based microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r² value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans. PMID:27376236

  15. Design of a bovine low-density SNP array optimized for imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where de...

  16. minimac2: faster genotype imputation

    PubMed Central

    Fuchsberger, Christian; Abecasis, Gonçalo R.; Hinds, David A.

    2015-01-01

    Summary: Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help to keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation. Availability and implementation: minimac2, including source code, documentation, and examples is available at http://genome.sph.umich.edu/wiki/Minimac2 Contact: cfuchsb@umich.edu, goncalo@umich.edu PMID:25338720

  17. The utility of low-density genotyping for imputation in the Thoroughbred horse

    PubMed Central

    2014-01-01

    Background: Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. Results: Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money. Conclusions: Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming

  18. Imputing amino acid polymorphisms in human leukocyte antigens.

    PubMed

    Jia, Xiaoming; Han, Buhm; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick J; Rich, Stephen S; Raychaudhuri, Soumya; de Bakker, Paul I W

    2013-01-01

    DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes. PMID:23762245

  19. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation.

    PubMed

    Palmer, Cameron; Pe'er, Itsik

    2016-06-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603

  20. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

    PubMed Central

    Palmer, Cameron; Pe’er, Itsik

    2016-01-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603

  1. Imputation of ungenotyped parental genotypes in dairy and beef cattle from progeny genotypes.

    PubMed

    Berry, D P; McParland, S; Kearney, J F; Sargolzaei, M; Mullen, M P

    2014-06-01

    The objective of this study was to quantify the accuracy of imputing the genotype of parents using information on the genotype of their progeny and a family-based and population-based imputation algorithm. Two separate data sets were used, one containing both dairy and beef animals (n=3122) with high-density genotypes (735 151 single nucleotide polymorphisms (SNPs)) and the other containing just dairy animals (n=5489) with medium-density genotypes (51 602 SNPs). Imputation accuracy of three different genotype density panels was evaluated, representing low (i.e. 6501 SNPs), medium and high density. The full genotypes of sires with genotyped half-sib progeny were masked and subsequently imputed. Genotyped half-sib progeny group sizes were altered from 4 up to 12 and the impact on imputation accuracy was quantified. Up to 157 and 258 sires were used to test the accuracy of imputation in the dairy plus beef data set and the dairy-only data set, respectively. The efficiency and accuracy of imputation were quantified as the proportion of genotypes that could not be imputed, and as both the genotype concordance rate and allele concordance rate. The median proportion of genotypes per animal that could not be imputed in the imputation process decreased as the number of genotyped half-sib progeny increased; values for the medium-density panel ranged from a median of 0.015 with a half-sib progeny group size of 4 to a median of 0.0014 to 0.0015 with a half-sib progeny group size of 8. The accuracy of imputation across different paternal half-sib progeny group sizes was similar in both data sets. Concordance rates increased considerably as the number of genotyped half-sib progeny increased from four (mean animal allele concordance rate of 0.94 in both data sets for the medium-density genotype panel) to five (mean animal allele concordance rate of 0.96 in both data sets for the medium-density genotype panel) after which it was relatively stable up to a half-sib progeny group size

  2. APOE is not associated with Alzheimer disease: a cautionary tale of genotype imputation.

    PubMed

    Beecham, Gary W; Martin, Eden R; Gilbert, John R; Haines, Jonathan L; Pericak-Vance, Margaret A

    2010-05-01

    With the advent of publicly available genome-wide genotyping data, the use of genotype imputation methods is becoming increasingly common. These methods are of particular use in joint analyses, where data from different genotyping platforms are imputed to a reference set and combined in a single analysis. We show here that such an analysis can miss strong genetic association signals, such as that of the apolipoprotein-e gene in late-onset Alzheimer disease. This can occur in regions of weak to moderate LD; unobserved SNPs are not imputed with confidence so there is no consensus SNP set on which to perform association tests. Both IMPUTE and Mach software are tested, with similar results. Additionally, we show that a meta-analysis that properly accounts for the genotype uncertainty can recover association signals that were lost under a joint analysis. This shows that joint analyses of imputed genotypes, particularly failure to replicate strong signals, should be considered critically and examined on a case-by-case basis. PMID:20529013

  3. Genotype Imputation with Millions of Reference Samples.

    PubMed

    Browning, Brian L; Browning, Sharon R

    2016-01-01

    We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle's throughput was more than 100× greater than Impute2's throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs. PMID:26748515
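
    The interpolation step mentioned in this abstract can be sketched as follows. This is a simplified illustration and does not reproduce Beagle's implementation: the function name, the inputs (per-reference-haplotype state probabilities at the two flanking genotyped markers, and marker positions on a common map) and the toy numbers are all assumptions.

```python
import numpy as np


def interpolate_allele_prob(pos_left, pos_right, probs_left, probs_right,
                            pos_target, ref_alleles):
    """Probability of the alternate allele at an ungenotyped marker.

    probs_left / probs_right: state probabilities, one per reference
        haplotype, at the flanking genotyped markers (hypothetical inputs).
    ref_alleles: 0/1 allele carried by each reference haplotype at the
        ungenotyped marker.
    """
    if pos_left == pos_right or not pos_left <= pos_target <= pos_right:
        raise ValueError("target must lie between two distinct flanking markers")

    # Weight of the right-hand marker grows linearly with distance from the left.
    w = (pos_target - pos_left) / (pos_right - pos_left)
    state_probs = ((1.0 - w) * np.asarray(probs_left, dtype=float)
                   + w * np.asarray(probs_right, dtype=float))

    # Sum the probability mass on reference haplotypes carrying the allele.
    return float(state_probs @ np.asarray(ref_alleles, dtype=float))


# Two reference haplotypes; the second carries the alternate allele.
alt_prob = interpolate_allele_prob(0.0, 1.0, [0.8, 0.2], [0.4, 0.6], 0.25, [0, 1])
```

    Because the probability model is evaluated only at genotyped markers, the cost of the model does not grow with the number of ungenotyped variants; each of those needs only an interpolation like the one above.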

  4. Genotype Imputation with Millions of Reference Samples

    PubMed Central

    Browning, Brian L.; Browning, Sharon R.

    2016-01-01

    We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle’s throughput was more than 100× greater than Impute2’s throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs. PMID:26748515

  5. Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression

    PubMed Central

    Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N.; Guan, Weihua; Kang, Jian; Li, Yun

    2016-01-01

    DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717

  6. Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.

    PubMed

    Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun

    2016-05-01

    DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717

  7. 16 CFR 1115.11 - Imputed knowledge.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 16 Commercial Practices 2 2010-01-01 2010-01-01 false Imputed knowledge. 1115.11 Section 1115.11... PRODUCT HAZARD REPORTS General Interpretation § 1115.11 Imputed knowledge. (a) In evaluating whether or... care to ascertain the truth of complaints or other representations. This includes the knowledge a...

  8. Imputation of missing data in time series for air pollutants

    NASA Astrophysics Data System (ADS)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are major concerns in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

  9. Fast imputation using medium- or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  10. A Study of Imputation Algorithms. Working Paper Series.

    ERIC Educational Resources Information Center

    Hu, Ming-xiu; Salvucci, Sameena

    Many imputation techniques and imputation software packages have been developed over the years to deal with missing data. Different methods may work well under different circumstances, and it is advisable to conduct a sensitivity analysis when choosing an imputation method for a particular survey. This study reviewed about 30 imputation methods…

  11. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies

    PubMed Central

    2014-01-01

    Background Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. Results In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. Conclusion GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of

  12. Enlargement of Traffic Information Coverage Area Using Selective Imputation of Floating Car Data

    NASA Astrophysics Data System (ADS)

    Kumagai, Masatoshi; Hiruta, Tomoaki; Fushiki, Takumi; Yokota, Takayoshi

    This paper discusses a real-time imputation method for sparse floating car data (FCD). Floating cars are an effective way to collect traffic information; however, because the number of floating cars is limited, a large amount of FCD is missing. In an effort to address this problem, we previously proposed a new imputation method based on feature space projection. The method consists of three major processes: (i) determination of a feature space from past FCD history; (ii) feature space projection of current FCD; and (iii) estimation of missing data performed by inverse projection from the feature space. Since estimation is achieved on each feature space axis that represents the spatially correlated component of FCD, it performs an accurate imputation and enlarges the information coverage area. However, differences in correlation among multiple road-links sometimes cause a trade-off between accuracy and coverage. Therefore, we developed an additional function to filter out the road-links that have low correlation with the others. The function uses spectral factorization as a filtering index, which is suitable for evaluating the correlation on the multidimensional feature space. Combined use of the imputation method and the filtering function decreases the maximum estimation error-rate from 0.39 to 0.24, keeping 60% coverage area against sparse FCD of 15% observations.
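
    The three processes listed in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the feature space is reduced to a single principal axis found by power iteration, the projection is a least-squares fit on the observed road-links, and all function names and data are hypothetical. The paper's spectral-factorization filtering index is not reproduced here.

```python
def column_means(rows):
    """Mean travel value per road-link across the history."""
    count = len(rows)
    return [sum(row[j] for row in rows) / count for j in range(len(rows[0]))]


def first_component(rows, iters=200):
    """(i) Feature space: leading principal axis of the history via power iteration."""
    mean = column_means(rows)
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    axis = [1.0] * dim
    for _ in range(iters):
        # Multiply the (unnormalised) covariance matrix by the current axis.
        scores = [sum(c * a for c, a in zip(row, axis)) for row in centered]
        nxt = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    return mean, axis


def impute(current, mean, axis):
    """(ii) Project observed links onto the axis, (iii) invert to fill None entries."""
    observed = [j for j, x in enumerate(current) if x is not None]
    denom = sum(axis[j] ** 2 for j in observed)
    if denom == 0.0:
        coef = 0.0
    else:
        coef = sum(axis[j] * (current[j] - mean[j]) for j in observed) / denom
    return [x if x is not None else mean[j] + coef * axis[j]
            for j, x in enumerate(current)]


# Hypothetical history for three road-links whose values move together.
history = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [30.0, 60.0, 90.0]]
mean, axis = first_component(history)
filled = impute([25.0, None, None], mean, axis)  # only the first link is observed
```

    With perfectly correlated links, as in this toy history, one observed link is enough to recover the others; links that correlate poorly with the rest would be estimated badly, which is the trade-off the filtering function in the abstract is meant to address.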

  13. Posterior predictive checking of multiple imputation models.

    PubMed

    Nguyen, Cattram D; Lee, Katherine J; Carlin, John B

    2015-07-01

    Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution. PMID:25939490
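
    As a rough illustration of the diagnostic described in this abstract, the posterior predictive p-value can be estimated as the share of replicated data sets whose test quantity is at least as large as the observed one. The sketch below assumes the replicated test quantities have already been computed from draws of the imputation model; the function name and the numbers are hypothetical.

```python
def posterior_predictive_p_value(observed_stat, replicated_stats):
    """Proportion of replicated test quantities that are >= the observed one."""
    if not replicated_stats:
        raise ValueError("at least one replicated statistic is required")
    exceed = sum(1 for stat in replicated_stats if stat >= observed_stat)
    return exceed / len(replicated_stats)


# Hypothetical test quantities from four replicated data sets.
replicates = [1.0, 2.0, 3.0, 4.0]
ppp = posterior_predictive_p_value(3.5, replicates)  # one of four replicates exceeds 3.5
```

    Values near 0 or 1 suggest misfit, but as the abstract notes, conventional p-value thresholds do not carry over, so this sketch deliberately returns the raw proportion without applying a cut-off.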

  14. Improving accuracy of rare variant imputation with a two-step imputation approach.

    PubMed

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G; Rivadeneira, Fernando; Estrada, Karol

    2015-03-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning that less frequent and rare variants were not comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach improving the quality of the 1000 Genomes imputation by genotyping only a subset of samples to create a local reference population on a dense array with many low-frequency markers. In this approach, the study sample, genotyped with a first generation array, is imputed first to the local reference sample genotyped on a dense array and thereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r² using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly, the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (P<1e-15) and rare homozygote calls (P<1e-15) in this low frequency range. The two-step approach in our setting improves imputation quality compared with traditional direct imputation, notably in the low-frequency spectrum, and is a cost-effective strategy in large epidemiological studies. PMID:24939589
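
    The abstract reports imputation quality as an r² value without saying exactly how it was computed. One common empirical version, assumed here purely for illustration, is the squared Pearson correlation between imputed allele dosages and true genotypes at a variant. The function name and the example values are hypothetical.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages."""
    if len(true_genotypes) != len(imputed_dosages) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0.0 or var_d == 0.0:
        return 0.0  # monomorphic variant: correlation is undefined, report 0 here
    return cov * cov / (var_t * var_d)


# Hypothetical variant typed in four samples.
truth = [0, 1, 2, 1]
dosage = [0.1, 0.9, 1.8, 1.2]
r2 = imputation_r2(truth, dosage)
```

    Averaging this quantity over variants within a MAF bin would give a mean quality figure comparable in spirit to the one quoted above, although the study's own definition may differ.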

  15. Improving accuracy of rare variant imputation with a two-step imputation approach

    PubMed Central

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G; Rivadeneira, Fernando; Estrada, Karol

    2015-01-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning that less frequent and rare variants were not comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach improving the quality of the 1000 Genomes imputation by genotyping only a subset of samples to create a local reference population on a dense array with many low-frequency markers. In this approach, the study sample, genotyped with a first generation array, is imputed first to the local reference sample genotyped on a dense array and thereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r² using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly, the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (P<1e-15) and rare homozygote calls (P<1e-15) in this low frequency range. The two-step approach in our setting improves imputation quality compared with traditional direct imputation, notably in the low-frequency spectrum, and is a cost-effective strategy in large epidemiological studies. PMID:24939589

  16. Accuracy of genomic prediction using imputed whole-genome sequence data in white layers.

    PubMed

    Heidaritabar, M; Calus, M P L; Megens, H-J; Vereijken, A; Groenen, M A M; Bastiaansen, J W M

    2016-06-01

    There is an increasing interest in using whole-genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole-genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic prediction for the number of eggs in white layers using imputed whole-genome resequence data including ~4.6 million SNPs. The prediction accuracies based on sequence data were compared with the accuracies from the 60 K SNP panel. Predictions were based on genomic best linear unbiased prediction (GBLUP) as well as a Bayesian variable selection model (BayesC). Moreover, the prediction accuracy from using different types of variants (synonymous, non-synonymous and non-coding SNPs) was evaluated. Genomic prediction using the 60 K SNP panel resulted in a prediction accuracy of 0.74 when GBLUP was applied. With sequence data, there was a small increase (~1%) in prediction accuracy over the 60 K genotypes. With both 60 K SNP panel and sequence data, GBLUP slightly outperformed BayesC in predicting the breeding values. Selection of SNPs more likely to affect the phenotype (i.e. non-synonymous SNPs) did not improve the accuracy of genomic prediction. The fact that sequence data were based on imputation from a small number of sequenced animals may have limited the potential to improve the prediction accuracy. A small reference population (n = 1004) and possible exclusion of many causal SNPs during quality control can be other possible reasons for limited benefit of sequence data. We expect, however, that the limited improvement is because the 60 K SNP panel was already sufficiently dense to accurately determine the relationships between animals in our data. PMID:26776363

  17. Mining SNPs From EST Databases

    PubMed Central

    Picoult-Newberg, Leslie; Ideker, Trey E.; Pohl, Mark G.; Taylor, Scott L.; Donaldson, Miriam A.; Nickerson, Deborah A.; Boyce-Jacino, Michael

    1999-01-01

    There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs. [The SNPs identified in this study can be found in the National Center of Biotechnology (NCBI) SNP database under submitter handles ORCHID (SNPS-981210-A) and debnick (SNPS-981209-A and SNPS-981209-B).] PMID:10022981

  18. Comparison of imputation methods for missing laboratory data in medicine

    PubMed Central

    Waljee, Akbar K; Mukherjee, Ashin; Singal, Amit G; Zhang, Yiwei; Warren, Jeffrey; Balis, Ulysses; Marrero, Jorge; Zhu, Ji; Higgins, Peter DR

    2013-01-01

    Objectives Missing laboratory data is a common issue, but the optimal method of imputation of missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for missing completely at random laboratory data and to compare the effect of the imputed values on the accuracy of two clinical predictive models. Design Retrospective cohort analysis of two large data sets. Setting A tertiary level care institution in Ann Arbor, Michigan. Participants The Cirrhosis cohort had 446 patients and the Inflammatory Bowel Disease cohort had 395 patients. Methods Non-missing laboratory data were randomly removed with varying frequencies from two large data sets, and we then compared the ability of four methods (missForest, mean imputation, nearest neighbour imputation, and multivariate imputation by chained equations [MICE]) to impute the simulated missing data. We characterised the accuracy of the imputation and the effect of the imputation on predictive ability in two large data sets. Results MissForest had the least imputation error for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction difference when models used imputed laboratory values. In both data sets, MICE had the second least imputation error and prediction difference, followed by the nearest neighbour and mean imputation. Conclusions MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models. PMID:23906948
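
    The evaluation design in this abstract (remove known values at random, impute them, then score the imputations against the truth) can be sketched in a few lines. Only mean imputation, the simplest of the four methods compared, is shown; the helper names, the seed, and the choice of root-mean-square error as the error measure are assumptions made for illustration rather than details taken from the study.

```python
import random


def mask_values(values, fraction, seed=0):
    """Randomly set a fraction of values to None (missing completely at random)."""
    rng = random.Random(seed)
    n_missing = int(round(fraction * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def mean_impute(masked):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def rmse_on_missing(truth, imputed, missing_idx):
    """Root-mean-square error restricted to the positions that were masked."""
    if not missing_idx:
        return 0.0
    squared = sum((truth[i] - imputed[i]) ** 2 for i in missing_idx)
    return (squared / len(missing_idx)) ** 0.5


# Hypothetical laboratory values with 20% removed.
truth = [float(v) for v in range(1, 11)]
masked, missing_idx = mask_values(truth, 0.2)
error = rmse_on_missing(truth, mean_impute(masked), missing_idx)
```

    Repeating this over several missingness fractions and swapping in other imputation functions would reproduce the shape of the comparison, though not the study's actual methods or results.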

  19. Dual imputation model for incomplete longitudinal data.

    PubMed

    Jolani, Shahab; Frank, Laurence E; van Buuren, Stef

    2014-05-01

    Missing values are a practical issue in the analysis of longitudinal data. Multiple imputation (MI) is a well-known likelihood-based method that has optimal properties in terms of efficiency and consistency if the imputation model is correctly specified. Doubly robust (DR) weighting-based methods protect against misspecification bias if one of the models, but not necessarily both, for the data or the mechanism leading to missing data is correct. We propose a new imputation method that captures the simplicity of MI and protection from the DR method. This method integrates MI and DR to protect against misspecification of the imputation model under a missing at random assumption. Our method avoids analytical complications of missing data particularly in multivariate settings, and is easy to implement in standard statistical packages. Moreover, the proposed method works very well with an intermittent pattern of missingness when other DR methods cannot be used. Simulation experiments show that the proposed approach achieves improved performance when one of the models is correct. The method is applied to data from the fireworks disaster study, a randomized clinical trial comparing therapies in disaster-exposed children. We conclude that the new method increases the robustness of imputations. PMID:23909566

  20. Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation.

    PubMed

    Enders, Craig K; Mistler, Stephen A; Keller, Brian T

    2016-06-01

    Although missing data methods have advanced in recent years, methodologists have devoted less attention to multilevel data structures where observations at level-1 are nested within higher-order organizational units at level-2 (e.g., individuals within neighborhoods; repeated measures nested within individuals; students nested within classrooms). Joint modeling and chained equations imputation are the principal imputation frameworks for single-level data, and both have multilevel counterparts. These approaches differ algorithmically and in their functionality; both are appropriate for simple random intercept analyses with normally distributed data, but they differ beyond that. The purpose of this paper is to describe multilevel imputation strategies and evaluate their performance in a variety of common analysis models. Using multiple imputation theory and computer simulations, we derive 4 major conclusions: (a) joint modeling and chained equations imputation are appropriate for random intercept analyses; (b) the joint model is superior for analyses that posit different within- and between-cluster associations (e.g., a multilevel regression model that includes a level-1 predictor and its cluster means, a multilevel structural equation model with different path values at level-1 and level-2); (c) chained equations imputation provides a dramatic improvement over joint modeling in random slope analyses; and (d) a latent variable formulation for categorical variables is quite effective. We use a real data analysis to demonstrate multilevel imputation, and we suggest a number of avenues for future research. PMID:26690775

  1. Genotype imputation in genome-wide association studies.

    PubMed

    Porcu, Eleonora; Sanna, Serena; Fuchsberger, Christian; Fritsche, Lars G

    2013-07-01

    Imputation is an in silico method that can increase the power of association studies by inferring missing genotypes, harmonizing data sets for meta-analyses, and increasing the overall number of markers available for association testing. This unit provides an introductory overview of the imputation method and describes a two-step imputation approach that consists of the phasing of the study genotypes and the imputation of reference panel genotypes into the study haplotypes. Detailed steps for data preparation and quality control illustrate how to run the computationally intensive two-step imputation with the high-density reference panels of the 1000 Genomes Project, which currently integrates more than 39 million variants. Additionally, the influence of reference panel selection, input marker density, and imputation settings on imputation quality are demonstrated with a simulated data set to give insight into crucial points of successful genotype imputation. PMID:23853078

  2. Automatic Treatment Planning with Convex Imputing

    NASA Astrophysics Data System (ADS)

    Sayre, G. A.; Ruan, D.

    2014-03-01

    Current inverse optimization-based treatment planning for radiotherapy requires a set of complex DVH objectives to be simultaneously minimized. This process, known as multi-objective optimization, is challenging due to non-convexity in individual objectives and insufficient knowledge of the tradeoffs among the objective set. As such, clinical practice involves numerous iterations of human intervention that are costly and often inconsistent. In this work, we propose to address treatment planning with convex imputing, a new data-mining technique that explores the existence of a latent convex objective whose optimizer reflects the DVH and dose-shaping properties of previously optimized cases. Using ten clinical prostate cases as the basis for comparison, we imputed a simple least-squares problem from the optimized solutions of the prostate cases, and show that the imputed plans are more consistent than their clinical counterparts in achieving planning goals.

  3. Reproducibility and imputation of air toxics data.

    PubMed

    Le, Hien Q; Batterman, Stuart A; Wahl, Robert L

    2007-12-01

    Ambient air quality datasets include missing data, values below method detection limits and outliers, and the precision and accuracy of the measurements themselves are often unknown. At the same time, many analyses require continuous data sequences and assume that measurements are error-free. While a variety of data imputation and cleaning techniques are available, the evaluation of such techniques remains limited. This study evaluates the performance of these techniques for ambient air toxics measurements, a particularly challenging application, and includes the analysis of intra- and inter-laboratory precision. The analysis uses an unusually complete dataset, consisting of daily measurements of over 70 species of carbonyls and volatile organic compounds (VOCs) collected over a one-year period in Dearborn, Michigan, including 122 pairs of replicates. Analysis was restricted to compounds found above detection limits in ≥20% of the samples. Outliers were detected using the Gumbel extreme value distribution. Error models for inter- and intra-laboratory reproducibility were derived from replicate samples. Imputation variables were selected using a generalized additive model, and the performance of two techniques, multiple imputation and optimal linear estimation, was evaluated for three missingness patterns. Many species were rarely detected or had very poor reproducibility. Error models developed for seven carbonyls showed median intra- and inter-laboratory errors of 22% and 25%, respectively. Better reproducibility was seen for the 16 VOCs meeting detection and reproducibility criteria. Imputation performance depended on the compound and missingness pattern. Data missing at random could be adequately imputed, but imputations for row-wise deletions, the most common type of missingness pattern encountered, were not informative. The analysis shows that air toxics data require significant efforts to identify and mitigate errors, outliers and missing observations

  4. Multiple Imputation of Multilevel Missing Data-Rigor versus Simplicity

    ERIC Educational Resources Information Center

    Drechsler, Jörg

    2015-01-01

    Multiple imputation is widely accepted as the method of choice to address item-nonresponse in surveys. However, research on imputation strategies for the hierarchical structures that are typically found in the data in educational contexts is still limited. While a multilevel imputation model should be preferred from a theoretical point of view if…

  5. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  6. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    PubMed Central

    McClure, Matthew C.; Sonstegard, Tad S.; Wiggans, George R.; Van Eenennaam, Alison L.; Weber, Kristina L.; Penedo, Cecilia T.; Berry, Donagh P.; Flynn, John; Garcia, Jose F.; Carmo, Adriana S.; Regitano, Luciana C. A.; Albuquerque, Milla; Silva, Marcos V. G. B.; Machado, Marco A.; Coffey, Mike; Moore, Kirsty; Boscher, Marie-Yvonne; Genestout, Lucie; Mazza, Raffaele; Taylor, Jeremy F.; Schnabel, Robert D.; Simpson, Barry; Marques, Elisa; McEwan, John C.; Cromie, Andrew; Coutinho, Luiz L.; Kuehn, Larry A.; Keele, John W.; Piper, Emily K.; Cook, Jim; Williams, Robert; Van Tassell, Curtis P.

    2013-01-01

    To help cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could be accurately imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes, by MS genotyping animals whose imputed MS alleles fail parentage verification, and then incorporating those animals into the reference dataset. PMID:24065982

  7. Marker imputation in barley association studies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Association mapping requires higher marker density than linkage mapping, potentially leading to more missing marker data and to higher genotyping costs. In human genetics, methods exist to impute missing marker data and whole markers that were typed in a reference panel but not in the experimental d...

  8. Sequential BART for imputation of missing covariates.

    PubMed

    Xu, Dandan; Daniels, Michael J; Winterstein, Almut G

    2016-07-01

    To conduct comparative effectiveness research using electronic health records (EHR), many covariates are typically needed to adjust for selection and confounding biases. Unfortunately, it is typical to have missingness in these covariates. Just using cases with complete covariates will result in considerable efficiency losses and likely bias. Here, we consider the covariates missing at random with missing data mechanism either depending on the response or not. Standard methods for multiple imputation can either fail to capture nonlinear relationships or suffer from the incompatibility and uncongeniality issues. We explore a flexible Bayesian nonparametric approach to impute the missing covariates, which involves factoring the joint distribution of the covariates with missingness into a set of sequential conditionals and applying Bayesian additive regression trees to model each of these univariate conditionals. Using data augmentation, the posterior for each conditional can be sampled simultaneously. We provide details on the computational algorithm and make comparisons to other methods, including parametric sequential imputation and two versions of multiple imputation by chained equations. We illustrate the proposed approach on EHR data from an affiliated tertiary care institution to examine factors related to hyperglycemia. PMID:26980459

  9. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

    PubMed

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-11-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960
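
    The nearest-neighbor idea described above can be illustrated with a much simplified sketch. This is not the LinkImpute implementation: it omits the LD-based selection of informative markers that gives LD-kNNi its name and simply compares samples over all markers they share. The function names, the 0/1/2 genotype coding, and the toy matrix are assumptions made for illustration.

```python
from collections import defaultdict

MISSING = None  # placeholder for an unobserved genotype call (assumed coding)


def sample_distance(a, b):
    """Mean absolute genotype difference over markers observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(genotypes, k=2):
    """Fill missing calls (coded 0/1/2) by a distance-weighted vote of the k nearest samples."""
    imputed = [row[:] for row in genotypes]  # copy so the input is left untouched
    for i, row in enumerate(genotypes):
        for j, call in enumerate(row):
            if call is not MISSING:
                continue
            # Candidate donors: every other sample with an observed call at marker j.
            donors = [
                (sample_distance(row, other), other[j])
                for n, other in enumerate(genotypes)
                if n != i and other[j] is not MISSING
            ]
            donors.sort(key=lambda pair: pair[0])
            votes = defaultdict(float)
            for dist, donor_call in donors[:k]:
                votes[donor_call] += 1.0 / (dist + 1e-6)  # closer donors count more
            if votes:
                imputed[i][j] = max(votes, key=votes.get)
    return imputed


# Hypothetical toy data: rows are samples, columns are unordered markers.
toy = [
    [0, 0, 1, 2],
    [0, 0, 1, None],
    [2, 2, 1, 0],
    [2, 2, 0, 0],
]
filled = knn_impute(toy, k=2)
```

    In this toy matrix the second sample is identical to the first at every shared marker, so the first sample dominates the vote for the missing call. No marker order is used anywhere, which is the property the abstract emphasizes for nonmodel organisms.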

  10. Mining SNPs from EST databases.

    PubMed

    Picoult-Newberg, L; Ideker, T E; Pohl, M G; Taylor, S L; Donaldson, M A; Nickerson, D A; Boyce-Jacino, M

    1999-02-01

    There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs. PMID:10022981

  11. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  12. An imputation approach for oligonucleotide microarrays.

    PubMed

    Li, Ming; Wen, Yalu; Lu, Qing; Fu, Wenjiang J

    2013-01-01

    Oligonucleotide microarrays are commonly adopted for detecting and qualifying the abundance of molecules in biological samples. Analysis of microarray data starts with recording and interpreting hybridization signals from CEL images. However, many CEL images may be blemished by noise from various sources, observed as "bright spots", "dark clouds", and "shadowy circles", etc. It is crucial that these image defects are correctly identified and properly processed. Existing approaches mainly focus on detecting defect areas and removing affected intensities. In this article, we propose to use a mixed effect model for imputing the affected intensities. The proposed imputation procedure is a single-array-based approach which does not require any biological replicate or between-array normalization. We further examine its performance by using Affymetrix high-density SNP arrays. The results show that this imputation procedure significantly reduces genotyping error rates. We also discuss the necessary adjustments for its potential extension to other oligonucleotide microarrays, such as gene expression profiling. The R source code for the implementation of the approach is freely available upon request. PMID:23505547

  13. On combining reference data to improve imputation accuracy.

    PubMed

    Chen, Jun; Zhang, Ji-Gang; Li, Jian; Pei, Yu-Fang; Deng, Hong-Wen

    2013-01-01

    Genotype imputation is an important tool in human genetics studies, which uses reference sets with known genotypes and prior knowledge on linkage disequilibrium and recombination rates to infer un-typed alleles for human genetic variations at a low cost. The reference sets used by current imputation approaches are based on HapMap data, and/or based on recently available next-generation sequencing (NGS) data such as data generated by the 1000 Genomes Project. However, with different coverage and call rates for different NGS data sets, how to integrate NGS data sets of different accuracy as well as previously available reference data as references in imputation is not an easy task and has not been systematically investigated. In this study, we performed a comprehensive assessment of three strategies on using NGS data and previously available reference data in genotype imputation for both simulated data and empirical data, in order to obtain guidelines for optimal reference set construction. Briefly, we considered three strategies: strategy 1 uses one NGS data set as a reference; strategy 2 imputes samples by using multiple individual data sets of different accuracy as independent references and then combines the imputed samples with samples based on the high accuracy reference selected when overlapping occurs; and strategy 3 combines multiple available data sets as a single reference after imputing each other. We used three software packages (MACH, IMPUTE2 and BEAGLE) for assessing the performances of these three strategies. Our results show that strategy 2 and strategy 3 have higher imputation accuracy than strategy 1. Particularly, strategy 2 is the best strategy across all the conditions that we have investigated, producing the best accuracy of imputation for rare variants. Our study is helpful in guiding application of imputation methods in next generation association analyses. PMID:23383238

  14. Short communication: Imputation of markers on the bovine X chromosome.

    PubMed

    Mao, Xiaowei; Johansson, Anna Maria; Sahana, Goutam; Guldbrandtsen, Bernt; De Koning, Dirk-Jan

    2016-09-01

    Imputation is a cost-effective approach to augment marker data for genomic selection and genome-wide association studies. However, most imputation studies have focused on autosomes. Here, we assessed the imputation of markers on the X chromosome in Holstein cattle for nongenotyped animals and animals genotyped with low-density (Illumina BovineLD, Illumina Inc., San Diego, CA) chips, using animals genotyped with medium-density (Illumina BovineSNP50) chips. A total of 26,884 Holstein individuals genotyped with medium-density chips were used in this study. Imputation was carried out using FImpute V2.2. The following parameters were examined: treating the pseudoautosomal region as autosomal or as X specific, different sizes of reference groups, different male/female proportions in the reference group, and cumulated degree of relationship between the reference group and target group. The imputation accuracy of markers on the X chromosome was improved if the pseudoautosomal region was treated as autosomal. Increasing the proportion of females in the reference group improved the imputation accuracy for the X chromosome. Imputation for nongenotyped animals in general had lower accuracy compared with animals genotyped with the low-density single nucleotide polymorphism array. In addition, higher cumulative pedigree relationships between the reference group and the target animal led to higher imputation accuracy. In the future, better marker coverage of the X chromosome should be developed to facilitate genomic studies involving the X chromosome. PMID:27423959

  15. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  16. How to Improve Postgenomic Knowledge Discovery Using Imputation

    PubMed Central

    2009-01-01

    While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures. PMID:19223972

  17. Geometric median for missing rainfall data imputation

    NASA Astrophysics Data System (ADS)

    Burhanuddin, Siti Nur Zahrah Amin; Deni, Sayang Mohd; Ramli, Norazan Mohamed

    2015-02-01

    Missing data is a common problem faced by researchers in environmental studies. Environmental data, particularly rainfall data, are highly prone to missing values for several reasons, such as instrument malfunction, incorrect measurements, and relocation of stations. Rainfall data are also affected by the presence of outliers due to the temporal and spatial variability of rainfall measurements. These problems may harm the quality of rainfall data and subsequently produce inaccurate analysis results. Thus, this study aims to propose an imputation method for missing rainfall data that is robust to the presence of outliers. The geometric median was applied to estimate the missing values based on the available rainfall data from neighbouring stations. The method was compared with several conventional methods, such as the normal ratio and inverse distance weighting methods, in order to evaluate its performance. Thirteen rainfall stations in Peninsular Malaysia were selected for the application of the imputation methods. The results indicated that the proposed method provided the most accurate estimates compared with both conventional methods, based on the lowest mean absolute error. The normal ratio was found to be the worst method for estimating the missing rainfall values.
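
    The two conventional baselines named in this abstract have standard textbook forms, sketched below together with the mean absolute error used for comparison. The abstract does not say how the geometric median estimator was constructed, so it is not reproduced here. Station values, long-term normals, distances, and the distance exponent of 2 are hypothetical choices for illustration.

```python
def normal_ratio(neighbour_rain, neighbour_normals, target_normal):
    """Normal ratio estimate: each neighbour is rescaled by the ratio of long-term normals."""
    scaled = [target_normal / n * p for p, n in zip(neighbour_rain, neighbour_normals)]
    return sum(scaled) / len(scaled)


def inverse_distance_weighting(neighbour_rain, distances_km, power=2.0):
    """Weighted mean of neighbour values with weights 1 / distance**power."""
    weights = [1.0 / d ** power for d in distances_km]
    return sum(w * p for w, p in zip(weights, neighbour_rain)) / sum(weights)


def mean_absolute_error(estimates, observed):
    """Average absolute difference between estimates and held-out observations."""
    return sum(abs(e - o) for e, o in zip(estimates, observed)) / len(observed)


# Hypothetical readings (mm) at three neighbouring stations on the missing day.
rain = [10.0, 20.0, 30.0]
normals = [1000.0, 2000.0, 1500.0]   # assumed long-term annual rainfall (mm)
distances = [10.0, 20.0, 40.0]       # assumed distances to the target station (km)

nr_estimate = normal_ratio(rain, normals, target_normal=1500.0)
idw_estimate = inverse_distance_weighting(rain, distances)
mae = mean_absolute_error([nr_estimate, idw_estimate], [18.0, 14.0])
```

    Both estimators are weighted averages of the neighbouring readings, so a single extreme neighbour value shifts them directly. That sensitivity is the motivation the abstract gives for a more outlier-robust estimator.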

  18. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

    PubMed

    Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-03-15

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914

  19. CsSNP: A Web-Based Tool for the Detecting of Comparative Segments SNPs.

    PubMed

    Wang, Yi; Wang, Shuangshuang; Zhou, Dongjie; Yang, Shuai; Xu, Yongchao; Yang, Chao; Yang, Long

    2016-07-01

    SNP (single nucleotide polymorphism) is a popular tool for the study of genetic diversity, evolution, and other areas. Therefore, it is necessary to develop a convenient, useful, robust, rapid, and open-source SNP detection tool for all researchers. Since the detection of SNPs needs special software and a series of steps including alignment, detection, analysis and presentation, the study of SNPs is limited for nonprofessional users. CsSNP (Comparative segments SNP, http://biodb.sdau.edu.cn/cssnp/ ) is a freely available web tool based on the Blat, Blast, and Perl programs to detect comparative segments SNPs and to show detailed information about SNPs. The results are filtered and presented in a statistics figure and a Gbrowse map. This platform contains the reference genomic sequences and coding sequences of 60 plant species, and also provides new opportunities for users to detect SNPs easily. CsSNP provides a convenient tool for nonprofessional users to find comparative segments SNPs in their own sequences, gives users the information and analysis of the SNPs, and displays these data in a dynamic map. It provides a new method to detect SNPs and may accelerate related studies. PMID:27347883

  20. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    PubMed

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. PMID:26990655

  1. Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx.

    PubMed

    Wang, Jiebiao; Gamazon, Eric R; Pierce, Brandon L; Stranger, Barbara E; Im, Hae Kyung; Gibbons, Robert D; Cox, Nancy J; Nicolae, Dan L; Chen, Lin S

    2016-04-01

    Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies. PMID:27040689

  2. A hybrid imputation approach for microarray missing value estimation

    PubMed Central

    2015-01-01

    Background: Missing data is an inevitable phenomenon in gene expression microarray experiments due to instrument failure or human error. It has a negative impact on the performance of downstream analysis. Technically, most existing approaches suffer from this prevalent problem. Imputation is one of the frequently used methods for processing missing data, and many developments have been achieved in the research on estimating missing values. The challenging task is how to improve imputation accuracy for data with a large missing rate. Methods: In this paper, inspired by the idea of collaborative training, we propose a novel hybrid imputation method, called Recursive Mutual Imputation (RMI). Specifically, RMI exploits global correlation information and local structure in the data, captured by two popular methods, Bayesian Principal Component Analysis (BPCA) and Local Least Squares (LLS), respectively. The mutual strategy is implemented by sharing the estimated data sequences at each recursive process. Meanwhile, we consider the imputation sequence based on the number of missing entries in the target gene. Furthermore, a weight-based integrated method is utilized in the final assembling step. Results: We evaluate RMI against three state-of-the-art algorithms (BPCA, LLS, Iterated Local Least Squares imputation (ItrLLS)) on four publicly available microarray datasets. Experimental results clearly demonstrate that RMI significantly outperforms the comparative methods in terms of Normalized Root Mean Square Error (NRMSE), especially for datasets with large missing rates and fewer complete genes. Conclusions: Our proposed hybrid imputation approach incorporates both global and local information of microarray genes, and achieves lower NRMSE values than any single approach alone. In addition, this study highlights the need to consider the imputing sequence of missing entries in imputation methods. PMID:26330180
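
    The NRMSE criterion used in this comparison can be written in a few lines. The abstract does not state which normalization the authors used; the sketch below assumes one common convention in the microarray imputation literature, in which the mean squared error over the masked entries is divided by the variance of the true values. The function name and the toy numbers are hypothetical.

```python
import math
from statistics import pvariance


def nrmse(imputed, truth):
    """Root of (mean squared imputation error / variance of the true values).

    Both arguments hold only the entries that were masked and then imputed.
    """
    mse = sum((a - b) ** 2 for a, b in zip(imputed, truth)) / len(truth)
    return math.sqrt(mse / pvariance(truth))


# Hypothetical masked entries: true expression values and their imputed counterparts.
truth = [1.0, 2.0, 3.0, 4.0]
imputed = [1.0, 2.0, 3.0, 6.0]
score = nrmse(imputed, truth)
```

    Under this convention a score of 0 means perfect recovery, and a score near 1 means the imputation is about as informative as replacing every masked entry with the mean of the true values.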

  3. An imputation-based genome-wide association study on traits related to male reproduction in a White Duroc × Erhualian F2 population.

    PubMed

    Zhao, Xueyan; Zhao, Kewei; Ren, Jun; Zhang, Feng; Jiang, Chao; Hong, Yuan; Jiang, Kai; Yang, Qiang; Wang, Chengbin; Ding, Nengshui; Huang, Lusheng; Zhang, Zhiyan; Xing, Yuyun

    2016-05-01

    Boar reproductive traits are economically important for the pig industry. Here we conducted a genome-wide association study (GWAS) for 13 reproductive traits measured on 205 F2 boars at day 300 using 60 K single nucleotide polymorphism (SNP) data imputed from a reference panel of 1200 pigs in a White Duroc × Erhualian F2 intercross population. We identified 10 significant loci for seven traits on eight pig chromosomes (SSC). Two loci surpassed the genome-wide significance level, including one for epididymal weight around 60.25 Mb on SSC7 and one for semen temperature around 43.69 Mb on SSC4. Four of the 10 significant loci that we identified were consistent with previously reported quantitative trait loci for boar reproduction traits. We highlighted several interesting candidate genes at these loci, including APN, TEP1, PARP2, SPINK1 and PDE1C. To evaluate the imputation accuracy, we further genotyped nine GWAS top SNPs using PCR restriction fragment length polymorphism or Sanger sequencing. We found an average genotype concordance of 91.44%, an allelic concordance of 95.36% and an r² correlation of 0.85 between imputed and real genotype data. This indicates that our GWAS mapping results based on imputed SNP data are reliable, providing insights into the genetic basis of boar reproductive traits. PMID:26425933
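
    The three accuracy measures reported here can be computed as in the following sketch. The abstract does not give the exact formulas, so commonly used definitions are assumed: genotypes are coded as 0/1/2 copies of one allele, genotype concordance is the share of identical calls, allelic concordance is the share of matching alleles, and r² is the squared Pearson correlation. The example vectors are made up.

```python
def genotype_concordance(imputed, typed):
    """Fraction of samples whose imputed genotype equals the typed genotype."""
    return sum(a == b for a, b in zip(imputed, typed)) / len(typed)


def allelic_concordance(imputed, typed):
    """Fraction of matching alleles; each sample contributes two alleles."""
    mismatched = sum(abs(a - b) for a, b in zip(imputed, typed))
    return 1.0 - mismatched / (2 * len(typed))


def r_squared(imputed, typed):
    """Squared Pearson correlation between imputed and typed allele counts."""
    n = len(typed)
    mean_i = sum(imputed) / n
    mean_t = sum(typed) / n
    sxy = sum((a - mean_i) * (b - mean_t) for a, b in zip(imputed, typed))
    sxx = sum((a - mean_i) ** 2 for a in imputed)
    syy = sum((b - mean_t) ** 2 for b in typed)
    return sxy ** 2 / (sxx * syy)


# Hypothetical calls for one SNP in eight animals (0/1/2 allele counts).
imputed_calls = [0, 1, 2, 1, 0, 2, 1, 1]
typed_calls = [0, 1, 2, 2, 0, 2, 1, 0]

gc = genotype_concordance(imputed_calls, typed_calls)
ac = allelic_concordance(imputed_calls, typed_calls)
r2 = r_squared(imputed_calls, typed_calls)
```

    Allelic concordance is never lower than genotype concordance under these definitions, because a heterozygote imputed as a homozygote still has one of its two alleles right. This is consistent with the ordering of the two percentages in the abstract.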

  4. Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models

    PubMed Central

    Chen, Hua Yun; Xie, Hui; Qian, Yi

    2010-01-01

    Summary: Multiple imputation is a practically useful approach to handling incompletely observed data in statistical analysis. Parameter estimation and inference based on imputed full data have been made easy by Rubin's rule for result combination. However, creating proper imputation that accommodates flexible models for statistical analysis in practice can be very challenging. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed imputation framework is more flexible and robust than the imputation approach based on the normal model. It is a compatible framework in comparison to the approach based on fully conditionally specified models. The proposed algorithms for multiple imputation through the Markov chain Monte Carlo sampling approach can be straightforwardly carried out. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data. PMID:21210771
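
    Rubin's rule for combining results, mentioned in this abstract, is short enough to sketch directly. The pooled estimate is the mean of the per-imputation estimates, and the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The function name and the three example fits are hypothetical, and the imputation model proposed in the paper is not shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    pooled = mean(estimates)             # pooled point estimate
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical coefficient estimates and variances from m = 3 imputed data sets.
pooled_estimate, total_variance = pool_rubin(
    estimates=[1.0, 1.2, 0.8],
    variances=[0.04, 0.05, 0.03],
)
```

    The between-imputation term is what separates this from averaging the standard errors: it carries the extra uncertainty caused by the missing values into the final interval.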

  5. A second generation human haplotype map of over 3.1 million SNPs

    PubMed Central

    2009-01-01

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122

  6. Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs

    PubMed Central

    Pistis, Giorgio; Porcu, Eleonora; Vrieze, Scott I; Sidore, Carlo; Steri, Maristella; Danjou, Fabrice; Busonero, Fabio; Mulas, Antonella; Zoledziewska, Magdalena; Maschio, Andrea; Brennan, Christine; Lai, Sandra; Miller, Michael B; Marcelli, Marco; Urru, Maria Francesca; Pitzalis, Maristella; Lyons, Robert H; Kang, Hyun M; Jones, Chris M; Angius, Andrea; Iacono, William G; Schlessinger, David; McGue, Matt; Cucca, Francesco; Abecasis, Gonçalo R; Sanna, Serena

    2015-01-01

    The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies. PMID:25293720

  7. Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs.

    PubMed

    Pistis, Giorgio; Porcu, Eleonora; Vrieze, Scott I; Sidore, Carlo; Steri, Maristella; Danjou, Fabrice; Busonero, Fabio; Mulas, Antonella; Zoledziewska, Magdalena; Maschio, Andrea; Brennan, Christine; Lai, Sandra; Miller, Michael B; Marcelli, Marco; Urru, Maria Francesca; Pitzalis, Maristella; Lyons, Robert H; Kang, Hyun M; Jones, Chris M; Angius, Andrea; Iacono, William G; Schlessinger, David; McGue, Matt; Cucca, Francesco; Abecasis, Gonçalo R; Sanna, Serena

    2015-07-01

    The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies. PMID:25293720
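
    The closing recommendation of this abstract, using quality cutoffs for rare variants that differ from the standard ones, can be sketched as a frequency-dependent filter. This is only an illustration: the record type, the function names, and all three numeric values (the 0.3 and 0.6 cutoffs and the 1% rare-variant boundary) are placeholder assumptions, not thresholds reported by the authors.

```python
from dataclasses import dataclass


@dataclass
class ImputedVariant:
    variant_id: str
    maf: float  # minor allele frequency
    rsq: float  # imputation quality score (an Rsq- or INFO-style metric)


def passes_filter(v, common_cutoff=0.3, rare_cutoff=0.6, rare_maf=0.01):
    """Apply a stricter quality cutoff to rare variants.

    All default values are illustrative assumptions, not published thresholds.
    """
    cutoff = rare_cutoff if v.maf < rare_maf else common_cutoff
    return v.rsq >= cutoff


def filter_variants(variants, **cutoffs):
    """Keep only the variants whose quality meets the cutoff for their frequency class."""
    return [v for v in variants if passes_filter(v, **cutoffs)]


variants = [
    ImputedVariant("var_common", maf=0.20, rsq=0.35),
    ImputedVariant("var_rare_poor", maf=0.005, rsq=0.35),
    ImputedVariant("var_rare_good", maf=0.005, rsq=0.80),
]
kept_ids = [v.variant_id for v in filter_variants(variants)]
```

    With a single cutoff of 0.3, all three hypothetical variants would be kept; the frequency-dependent rule drops the poorly imputed rare one.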

  8. Imputing gene expression from optimally reduced probe sets

    PubMed Central

    Donner, Yoni; Feng, Ting; Benoist, Christophe; Koller, Daphne

    2012-01-01

    Measuring complete gene expression profiles for a large number of experiments is costly. We propose an approach in which a small subset of probes is selected based on a preliminary set of full expression profiles. In subsequent experiments, only the subset is measured, and the missing values are imputed. We develop several algorithms to simultaneously select probes and impute missing values, and demonstrate that these probe selection for imputation (PSI) algorithms can successfully reconstruct missing gene expression values in a wide variety of applications, as evaluated using multiple metrics of biological importance. We analyze the performance of PSI methods under varying conditions, provide guidelines for choosing the optimal method based on the experimental setting, and indicate how to estimate imputation accuracy. Finally, we apply our approach to a large-scale study of immune system variation. PMID:23064520

  9. Multiple imputation for time series data with Amelia package.

    PubMed

    Zhang, Zhongheng

    2016-02-01

    Time series data are common in medical research. Many laboratory variables or study endpoints may be measured repeatedly over time. Multiple imputation (MI) that ignores the time trend of a variable may be unreliable. The article illustrates how to perform MI with the Amelia package in a clinical scenario. The Amelia package is powerful in that it allows MI for time series data. External information on the variable of interest can also be incorporated by using the prior or bound argument. Such information may be based on previously published observations, academic consensus, and personal experience. The imputation model can be diagnosed by examining the distributions of imputed and observed values, or by using the over-imputation technique. PMID:26904578

  10. A SPATIOTEMPORAL APPROACH FOR HIGH RESOLUTION TRAFFIC FLOW IMPUTATION

    SciTech Connect

    Han, Lee; Chin, Shih-Miao; Hwang, Ho-Ling

    2016-01-01

    Along with the rapid development of Intelligent Transportation Systems (ITS), traffic data collection technologies have been evolving dramatically. The emergence of innovative data collection technologies such as Remote Traffic Microwave Sensor (RTMS), Bluetooth sensor, GPS-based Floating Car method, automated license plate recognition (ALPR), etc., creates an explosion of traffic data, which brings transportation engineering into the new era of Big Data. However, despite the advance of technologies, the missing data issue is still inevitable and has posed great challenges for research such as traffic forecasting, real-time incident detection and management, dynamic route guidance, and massive evacuation optimization, because the degree of success of these endeavors depends on the timely availability of relatively complete and reasonably accurate traffic data. A thorough literature review suggests that most current imputation models, if not all, focus largely on the temporal nature of the traffic data. They fail to consider that traffic stream characteristics at a certain location are closely related to those at neighboring locations, and they do not use these correlations for data imputation. To this end, this paper presents a Kriging-based spatiotemporal data imputation approach that is able to fully utilize the spatiotemporal information underlying traffic data. Imputation performance of the proposed approach was tested using simulated scenarios and achieved stable imputation accuracy. Moreover, the proposed Kriging imputation model is more flexible compared to current models.

  11. Practical considerations for imputation of untyped markers in admixed populations.

    PubMed

    Shriner, Daniel; Adeyemo, Adebowale; Chen, Guanjie; Rotimi, Charles N

    2010-04-01

    Imputation of genotypes for markers untyped in a study sample has become a standard approach to increase genome coverage in genome-wide association studies at practically zero cost. Most methods for imputing missing genotypes extend previously described algorithms for inferring haplotype phase. These algorithms generally fall into three classes based on the underlying model for estimating the conditional distribution of haplotype frequencies: a cluster-based model, a multinomial model, or a population genetics-based model. We compared BEAGLE, PLINK, and MACH, representing the three classes of models, respectively, with specific attention to measures of imputation success and selection of the reference panel for an admixed study sample of African Americans. Based on analysis of chromosome 22 and after calibration to a fixed level of 90% concordance between experimentally determined and imputed genotypes, MACH yielded the largest absolute number of successfully imputed markers and the largest gain in coverage of the variation captured by HapMap reference panels. Following the common practice of performing imputation once, the Yoruba in Ibadan, Nigeria (YRI) reference panel outperformed other HapMap reference panels, including (1) African ancestry from Southwest USA (ASW) data, (2) an unweighted combination of the Northern and Western Europe (CEU) and YRI data into a single reference panel, and (3) a combination of the CEU and YRI data into a single reference panel with weights matching estimates of admixture proportions. For our admixed study sample, the optimal strategy involved imputing twice with the HapMap CEU and YRI reference panels separately and then merging the data sets. PMID:19918757
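
    The calibration step mentioned in this abstract, fixing concordance between experimentally determined and imputed genotypes at 90%, can be read as choosing a confidence cutoff above which the retained calls reach that concordance. The sketch below follows that reading. The 0/1/2 genotype coding, the per-call confidence score, and both function names are assumptions made for illustration; the abstract does not describe the procedure at this level of detail.

```python
def concordance(true_genotypes, imputed_genotypes):
    """Fraction of positions where the imputed genotype equals the typed one."""
    pairs = list(zip(true_genotypes, imputed_genotypes))
    if not pairs:
        raise ValueError("no genotypes to compare")
    return sum(t == i for t, i in pairs) / len(pairs)


def calibrate_threshold(calls, target=0.90):
    """Return the smallest confidence cutoff whose retained calls reach `target`.

    `calls` is a list of (confidence, true_genotype, imputed_genotype) tuples.
    Returns None when no cutoff achieves the target concordance.
    """
    for threshold in sorted({conf for conf, _, _ in calls}):
        kept = [(t, i) for conf, t, i in calls if conf >= threshold]
        if concordance(*zip(*kept)) >= target:
            return threshold
    return None


# Hypothetical calls: genotypes coded as minor-allele counts 0, 1, 2.
calls = [(0.5, 0, 1), (0.6, 1, 1), (0.7, 2, 2), (0.9, 0, 0)]
cutoff = calibrate_threshold(calls)
```

    Calibrating every method to the same concordance makes the number of markers that survive the cutoff comparable across methods.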

  12. MaCH-admix: genotype imputation for admixed populations.

    PubMed

    Liu, Eric Yi; Li, Mingyao; Wang, Wei; Li, Yun

    2013-01-01

    Imputation in admixed populations is an important problem but challenging due to the complex linkage disequilibrium (LD) pattern. The emergence of large reference panels such as that from the 1,000 Genomes Project enables more accurate imputation in general, and in particular for admixed populations and for uncommon variants. To efficiently benefit from these large reference panels, one key issue to consider in a modern genotype imputation framework is the selection of effective reference panels. In this work, we consider a number of methods for effective reference panel construction inside a hidden Markov model and specific to each target individual. These methods fall into two categories: identity-by-state (IBS) based and ancestry-weighted approaches. We evaluated the performance on individuals from recently admixed populations. Our target samples include 8,421 African Americans and 3,587 Hispanic Americans from the Women's Health Initiative, which allow assessment of imputation quality for uncommon variants. Our experiments include both large and small reference panels; large, medium, and small target samples; and genome regions of varying levels of LD. We also include BEAGLE and IMPUTE2 for comparison. Experiment results with a large reference panel suggest that our novel piecewise IBS method yields consistently higher imputation quality than other methods/software. The advantage is particularly noteworthy among uncommon variants where we observe up to 5.1% information gain with the difference being highly significant (Wilcoxon signed rank test P-value < 0.0001). Our work is the first that considers various sensible approaches for imputation in admixed populations and presents a comprehensive comparison. PMID:23074066

  13. References for Haplotype Imputation in the Big Data Era

    PubMed Central

    Li, Wenzhi; Xu, Wei; Li, Qiling; Ma, Li; Song, Qing

    2016-01-01

    Imputation is a powerful in silico approach to fill in missing values in big datasets. This process requires a reference panel, which is a collection of big data from which the missing information can be extracted and imputed. Haplotype imputation requires ethnicity-matched references; a mismatched reference panel will significantly reduce the quality of imputation. However, currently existing big datasets cover only a small number of ethnicities, and ethnicity-matched references are lacking for many ethnic populations in the world, which has hampered the imputation of haplotypes and its downstream applications. To solve this issue, several approaches have been proposed and explored, including the mixed reference panel, the internal reference panel, and the genotype-converted reference panel. This review article describes and compares these approaches. Increasing evidence shows that gene activity and function are not dictated by just one or two genetic elements; instead, cis-interactions of multiple elements dictate gene activity. Cis-interactions require the interacting elements to be on the same chromosome molecule; therefore, haplotype analysis is essential for the investigation of cis-interactions among multiple genetic variants at different loci, and appears to be especially important for studying common diseases. It will be valuable in a wide spectrum of applications, from academic research to clinical diagnosis, prevention, treatment, and the pharmaceutical industry. PMID:27274952

  14. Genotype imputation reference panel selection using maximal phylogenetic diversity.

    PubMed

    Zhang, Peng; Zhan, Xiaowei; Rosenberg, Noah A; Zöllner, Sebastian

    2013-10-01

    The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample and then to impute the rest of the study sample, using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the "most diverse reference panel", defined as the subset with the maximal "phylogenetic diversity", thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can substantially improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data. PMID:23934887

  15. Combining fractional polynomial model building with multiple imputation

    PubMed Central

    Morris, Tim P.; White, Ian R.; Carpenter, James R.; Stanworth, Simon J.; Royston, Patrick

    2016-01-01

    Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. PMID:26095614

  16. Combining fractional polynomial model building with multiple imputation.

    PubMed

    Morris, Tim P; White, Ian R; Carpenter, James R; Stanworth, Simon J; Royston, Patrick

    2015-11-10

    Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. PMID:26095614

  17. Novel and efficient tag SNPs selection algorithms.

    PubMed

    Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2014-01-01

    SNPs are the most abundant forms of genetic variations amongst species; the association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, representing the rest of the SNPs. In fact, the existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection was presented, which was applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm can achieve better performance than the existing tag SNP selection algorithms; in most cases, this proposed algorithm is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands of times faster than the previously known methods. Tools and web services for haplotype block analysis integrated by the Hadoop MapReduce framework are also developed using the proposed algorithm as computation kernels. PMID:24212035

  18. Functional annotation of colon cancer risk SNPs

    PubMed Central

    Yao, Lijing; Tak, Yu Gyoung; Berman, Benjamin P.; Farnham, Peggy J.

    2014-01-01

    Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with increased risk for CRC. A molecular understanding of the functional consequences of this genetic variation has been complicated because each GWAS SNP is a surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Here we use genomic and epigenomic information to test the hypothesis that the GWAS SNPs and/or correlated SNPs are in elements that regulate gene expression, and identify 23 promoters and 28 enhancers. Using gene expression data from normal and tumour cells, we identify 66 putative target genes of the risk-associated enhancers (10 of which were also identified by promoter SNPs). Employing CRISPR nucleases, we delete one risk-associated enhancer and identify genes showing altered expression. We suggest that similar studies be performed to characterize all CRC risk-associated enhancers. PMID:25268989

  19. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

  20. The Effect of Reference Panels and Software Tools on Genotype Imputation

    PubMed Central

    Nho, Kwangsik; Shen, Li; Kim, Sungeun; Swaminathan, Shanker; Risacher, Shannon L.; Saykin, Andrew J.

    2011-01-01

    Genotype imputation is increasingly employed in genome-wide association studies, particularly for integrative and cross-platform analysis. Several imputation algorithms use reference panels with a larger set of genotyped markers to infer genotypes at ungenotyped marker locations. Our objective was to assess which method and reference panel was more accurate when carrying out imputation. We investigated the influence of choice of two most popular imputation methods, IMPUTE and MACH, on two reference panels from the HapMap and the 1000 Genomes Project. Our results indicated that for the HapMap, MACH consistently yielded more accurate imputation results than IMPUTE, while for the 1000 Genomes Project, IMPUTE performed slightly better. The best imputation results were achieved by IMPUTE with the combined reference panel (HapMap + 1000 Genomes Project). IMPUTE with the combined reference panel is a promising strategy for genotype imputation, which should facilitate fine-mapping for discovery as well as known disease-associated candidate regions. PMID:22195161

  1. A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

    ERIC Educational Resources Information Center

    Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

    2012-01-01

    Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…

  2. The Use of SNPs in Pharmacogenomics Studies

    PubMed Central

    Alwi, Zilfalil Bin

    2005-01-01

    Pharmacogenomics is the study of how genetic makeup determines the response to a therapeutic intervention. It has the potential to revolutionize the practice of medicine by individualisation of treatment through the use of novel diagnostic tools. This new science should reduce the trial-and-error approach to the choice of treatment and thereby limit the exposure of patients to drugs that are not effective or are toxic for them. Single Nucleotide Polymorphisms (SNPs) hold the key in defining the risk of an individual’s susceptibility to various illnesses and response to drugs. There is an ongoing process of identifying the common, biologically relevant SNPs, in particular those that are associated with the risk of disease. The identification and characterization of large numbers of these SNPs are necessary before we can begin to use them extensively as genetic tools. As SNP allele frequencies vary considerably across human ethnic groups and populations, the SNP consortium has opted to use an ethnically diverse panel to maximize the chances of SNP discovery. Currently most studies are biased deliberately towards coding regions and the data generated from them therefore are unlikely to reflect the overall distribution of SNPs throughout the genome. The SNP consortium protocol was designed to identify SNPs without any bias towards these coding regions. Most pharmacogenomic studies were carried out in heterogeneous clinical trial populations, using case-control or cohort association study designs employing either candidate gene or linkage disequilibrium (LD) mapping approaches. Concerns about the required patient sample sizes, the extent of LD, the number of SNPs needed in a map, the cost of genotyping SNPs, and the interpretation of results are some of the challenges that surround this field. While LD mapping is appealing in that it is an unbiased approach and allows a comprehensive genome-wide survey, the challenges and limitations are significant. An alternative

  3. Imputation of Cow Genotypes and Adjustment of PTAs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two new techniques were introduced in April 2010 to incorporate all available information in the evaluations. The use of imputed genotypes has added over 1600 cows to the genomic database, and adjusting cow evaluations has increased accuracy. All other countries that are producing genomic evaluation...

  4. Imputing Phenotypes for Genome-wide Association Studies.

    PubMed

    Hormozdiari, Farhad; Kang, Eun Yong; Bilow, Michael; Ben-David, Eyal; Vulpe, Chris; McLachlan, Stela; Lusis, Aldons J; Han, Buhm; Eskin, Eleazar

    2016-07-01

    Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset. PMID:27292110
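
    The "weighted linear combination" described in this abstract can be illustrated with one natural choice of weights: the conditional-mean weights of a multivariate normal model, obtained by solving the linear system formed by the correlations among the related phenotypes and their correlations with the target. This is a sketch under stated assumptions. The phenotypes are taken to be standardized (mean 0, variance 1), the function names and example correlations are hypothetical, and the abstract itself does not specify how the weights are derived.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def imputation_weights(corr_related, corr_with_target):
    """Weights w satisfying corr_related @ w = corr_with_target."""
    return solve(corr_related, corr_with_target)


def impute(related_values, weights):
    """Impute the target as a weighted linear combination of related values."""
    return sum(w * v for w, v in zip(weights, related_values))


# Hypothetical correlations, as if estimated from a small complete dataset.
corr_related = [[1.0, 0.5], [0.5, 1.0]]
corr_with_target = [0.6, 0.3]
weights = imputation_weights(corr_related, corr_with_target)
imputed = impute([1.0, 2.0], weights)
```

    In this example the second related phenotype receives a weight of zero, because its correlation with the target is fully explained by its correlation with the first. The same weights could be applied to standardized summary statistics instead of individual-level phenotype values.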

  5. Fast imputation using medium or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and microarray genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updat...

  6. Strategies to choose from millions of imputed sequence variants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of sequence variants are known, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Variant selection and imputation strategies were tested using 26 984 simulated reference bulls, of which 1 000 had 30 million sequence variants, 773 had 600 000 markers...

  7. Guidebook for Imputation of Missing Data. Technical Report No. 17.

    ERIC Educational Resources Information Center

    Wise, Lauress L.; McLaughlin, Donald H.

    This guidebook is designed for data analysts who are working with computer data files that contain records with incomplete data. It indicates choices the analyst must make and the criteria for making those choices in regard to the following questions: (1) What resources are available for performing the imputation? (2) How big is the data file? (3)…

  8. Accuracy of genotype imputation in Swiss cattle breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to evaluate the accuracy of imputation from Illumina Bovine3k Bead Chip (3k) and Illumina BovineLD (6k) to 54k chip information in Swiss dairy cattle breeds. Genotype data comprised of 54k SNP chip data of Original Braunvieh (OB), Brown Swiss (BS), Swiss Fleckvieh (SF...

  9. Imputation of Missing Categorical Data by Maximizing Internal Consistency.

    ERIC Educational Resources Information Center

    van Buuren, Stef; van Rijckevorsel, Jan L. A.

    1992-01-01

    A technique is presented to transform incomplete categorical data into complete data by imputing appropriate scores into missing cells. A solution of the optimization problem is suggested, and relevant psychometric theory is discussed. The average correlation should be at least 0.50 before the method becomes practical. (SLD)

  10. Impact of adding foreign genomic information on Mexican Holstein imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The impact of adding US and Canada genomic information to the imputation of Mexican Holstein genotypes was measured by comparing 3 scenarios: 1) 2,018 Mexican genotyped animals; 2) animals from scenario 1 plus 886 related North American animals; and 3) animals from scenario 1 and all North American ...

  11. PinSnps: structural and functional analysis of SNPs in the context of protein interaction networks

    PubMed Central

    Lu, Hui-Chun; Herrera Braga, Julián; Fraternali, Franca

    2016-01-01

    Summary: We present a practical computational pipeline to readily perform data analyses of protein–protein interaction networks by using genetic and functional information mapped onto protein structures. We provide a 3D representation of the available protein structure and its regions (surface, interface, core and disordered) for the selected genetic variants and/or SNPs, and a prediction of the mutants’ impact on the protein as measured by a range of methods. We have mapped in total 2587 genetic disorder-related SNPs from OMIM, 587 873 cancer-related variants from COSMIC, and 1 484 045 SNPs from dbSNP. All result data can be downloaded by the user together with an R-script to compute the enrichment of SNPs/variants in selected structural regions. Availability and Implementation: PinSnps is available as open-access service at http://fraternalilab.kcl.ac.uk/PinSnps/ Contact: franca.fraternali@kcl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153707

  12. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
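
    The final step described in this abstract, analyzing each imputed dataset separately and then combining the results, is commonly done with the pooling formulas known as Rubin's rules. The sketch below assumes that rule; the abstract does not say which combining rule the authors used, and the function name and example numbers are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and squared standard errors with Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from at least 2 datasets")
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical regression coefficients from three imputed datasets.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation term is what carries the uncertainty due to the missing data into the final standard error; analyzing a single imputed dataset would omit it.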

  13. Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm

    PubMed Central

    Hoffmann, Thomas J.; Zhan, Yiping; Kvale, Mark N.; Hesselson, Stephanie E.; Gollub, Jeremy; Iribarren, Carlos; Lu, Yontao; Mei, Gangwu; Purdy, Matthew M.; Quesenberry, Charles; Rowell, Sarah; Shapero, Michael H.; Smethurst, David; Somkin, Carol P.; Van den Eeden, Stephen K.; Walter, Larry; Webster, Teresa; Whitmer, Rachel A.; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil

    2012-01-01

    Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies. PMID:21903159

  14. A spatial haplotype copying model with applications to genotype imputation.

    PubMed

    Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

    2015-05-01

    Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data. PMID:25526526

  15. Fitting additive hazards models for case-cohort studies: a multiple imputation approach.

    PubMed

    Jung, Jinhyouk; Harel, Ofer; Kang, Sangwook

    2016-07-30

    In this paper, we consider fitting semiparametric additive hazards models for case-cohort studies using a multiple imputation approach. In a case-cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this as a special case of a missing covariate by design. We propose to employ a popular incomplete data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on rejection sampling is developed. A simple imputation modeling procedure that can naturally be applied to a general missing-at-random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, a misspecification aspect in imputation modeling is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26194861

  16. A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes

    PubMed Central

    2011-01-01

    Background Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data. Methods A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information. Results The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available. Conclusions The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets. PMID:21388557

  17. Multiple ant colony algorithm method for selecting tag SNPs.

    PubMed

    Liao, Bo; Li, Xiong; Zhu, Wen; Li, Renfa; Wang, Shulin

    2012-10-01

    The search for the association between complex disease and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Finding a set of tag SNPs for haplotyping in a great number of samples is an important step to reduce the cost of association studies. Therefore, it is essential to select tag SNPs with more efficient algorithms. In this paper, we model the problem of selecting tag SNPs as MINIMUM TEST SET and use a multiple ant colony algorithm (MACA) to search for a smaller set of tag SNPs for haplotyping. Experimental results on various datasets show that the running time of our method is less than that of GTagger and MLR. MACA can also find the most representative SNPs for haplotyping, so that MACA is more stable and its number of tag SNPs is smaller than that of other evolutionary methods (like GTagger and NSGA-II). Our software is available upon request to the corresponding author. PMID:22480582

  18. Replication and Characterization of Association between ABO SNPs and Red Blood Cell Traits by Meta-Analysis in Europeans

    PubMed Central

    McLachlan, Stela; Giambartolomei, Claudia; Charoen, Pimphen; Wong, Andrew; Finan, Chris; Engmann, Jorgen; Shah, Tina; Hersch, Micha; Cavadino, Alana; Jefferis, Barbara J.; Dale, Caroline E.; Hypponen, Elina; Morris, Richard W.; Casas, Juan P.; Kumari, Meena; Ben-Shlomo, Yoav; Gaunt, Tom R.; Drenos, Fotios; Langenberg, Claudia; Kuh, Diana; Kivimaki, Mika; Rueedi, Rico; Waeber, Gerard; Hingorani, Aroon D.; Price, Jacqueline F.

    2016-01-01

    Red blood cell (RBC) traits are routinely measured in clinical practice as important markers of health. Deviations from the physiological ranges are usually a sign of disease, although variation between healthy individuals also occurs, at least partly due to genetic factors. Recent large scale genetic studies identified loci associated with one or more of these traits; further characterization of known loci and identification of new loci is necessary to better understand their role in health and disease and to identify potential molecular mechanisms. We performed meta-analysis of Metabochip association results for six RBC traits—hemoglobin concentration (Hb), hematocrit (Hct), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV) and red blood cell count (RCC)—in 11 093 Europeans from seven studies of the UCL-LSHTM-Edinburgh-Bristol (UCLEB) Consortium. We identified 394 non-overlapping SNPs in five loci at genome-wide significance: 6p22.1-6p21.33 (with HFE among others), 6q23.2 (with HBS1L among others), 6q23.3 (contains no genes), 9q34.3 (only ABO gene) and 22q13.1 (with TMPRSS6 among others), replicating previous findings of association with RBC traits at these loci and extending them by imputation to 1000 Genomes. We further characterized associations between ABO SNPs and three traits: hemoglobin, hematocrit and red blood cell count, replicating them in an independent cohort. Conditional analyses indicated the independent association of each of these traits with ABO SNPs and a role for blood group O in mediating the association. The 15 most significant RBC-associated ABO SNPs were also associated with five cardiometabolic traits, with discordance in the direction of effect between groups of traits, suggesting that ABO may act through more than one mechanism to influence cardiometabolic risk. PMID:27280446

  19. Multiple imputation for IPD meta-analysis: allowing for heterogeneity and studies with missing covariates.

    PubMed

    Quartagno, M; Carpenter, J R

    2016-07-30

    Recently, multiple imputation has been proposed as a tool for individual patient data meta-analysis with sporadically missing observations, and it has been suggested that within-study imputation is usually preferable. However, such within-study imputation cannot handle variables that are completely missing within studies. Further, if some of the contributing studies are relatively small, it may be appropriate to share information across studies when imputing. In this paper, we develop and evaluate a joint modelling approach to multiple imputation of individual patient data in meta-analysis, with an across-study probability distribution for the study specific covariance matrices. This retains the flexibility to allow for between-study heterogeneity when imputing while allowing (i) sharing information on the covariance matrix across studies when this is appropriate, and (ii) imputing variables that are wholly missing from studies. Simulation results show both equivalent performance to the within-study imputation approach where this is valid, and good results in more general, practically relevant, scenarios with studies of very different sizes, non-negligible between-study heterogeneity and wholly missing variables. We illustrate our approach using data from an individual patient data meta-analysis of hypertension trials. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. PMID:26681666

  20. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ...'s knowledge, approval or acquiescence. The organization's acceptance of the benefits derived from the conduct is evidence of knowledge, approval or acquiescence. (b) Conduct imputed from an... individual to whom the improper conduct is imputed either participated in, had knowledge of, or reason...

  1. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  2. A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

    ERIC Educational Resources Information Center

    Wolkowitz, Amanda A.; Skorupski, William P.

    2013-01-01

    When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…

  3. Estimation of missing rainfall data using spatial interpolation and imputation methods

    NASA Astrophysics Data System (ADS)

    Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Azman, Muhammad Az-zuhri

    2015-02-01

    This study aims to estimate missing rainfall data, dividing the analysis into three different percentages of missingness, namely 5%, 10% and 20%, in order to represent various cases of missing data. In practice, spatial interpolation methods are usually the first choice for estimating missing data. These methods include the normal ratio (NR), arithmetic average (AA), coefficient of correlation (CC) and inverse distance (ID) weighting methods. The methods consider the distance between the target and the neighbouring stations as well as the correlations between them. An alternative approach to handling missing data is imputation, the process of replacing missing data with substituted values. A once-common method is single imputation, which allows parameter estimation. However, single imputation ignores the estimation of variability, which leads to the underestimation of standard errors and confidence intervals. To overcome this underestimation problem, the multiple imputation method is used, where each missing value is estimated with a distribution of imputations that reflects the uncertainty about the missing data. In this study, a comparison of spatial interpolation methods and the multiple imputation method is presented for estimating missing rainfall data. The performance of the estimation methods is assessed using the similarity index (S-index), mean absolute error (MAE) and coefficient of correlation (R).
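
    To illustrate one of the spatial interpolation methods named above, here is a minimal sketch of inverse distance weighting together with the mean absolute error used for assessment. The exponent `power=2.0`, the function names, and the station values are assumptions for illustration; the abstract does not state them.

```python
def inverse_distance_estimate(values, distances, power=2.0):
    """Estimate a missing value at a target station from neighbouring stations.

    values:    rainfall observed at each neighbouring station
    distances: distance from the target to each neighbour (must be > 0)
    power:     distance exponent (assumed; 2 is a common default)
    """
    if not values or len(values) != len(distances):
        raise ValueError("values and distances must be non-empty and equal in length")
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def mean_absolute_error(observed, estimated):
    """Average absolute difference between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


# Hypothetical neighbours: 10 mm at distance 1 and 20 mm at distance 2.
estimate = inverse_distance_estimate([10.0, 20.0], [1.0, 2.0])
print(estimate)  # 12.0
print(mean_absolute_error([12.0, 8.0], [10.0, 9.0]))  # 1.5
```

    Closer stations dominate the estimate because their weights grow as the distance shrinks; the other listed methods differ mainly in how those weights are chosen.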

  4. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  5. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model

    PubMed Central

    Seaman, Shaun R; White, Ian R; Carpenter, James R

    2015-01-01

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available. PMID:24525487

  6. Methods of Imputation used in the USDA National Nutrient Database for Standard Reference

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Objective: To present the predominant methods of imputing used to estimate nutrient values for foods in the USDA National Nutrient Database for Standard Reference (SR20). Materials and Methods: The USDA Nutrient Data Laboratory developed standard methods for imputing nutrient values for foods wh...

  7. Imputation of Missing Genotypes From Sparse to High Density Using Long-Range Phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals in a population share long chromosome segments which trace to a common ancestor. We describe a long-range phasing algorithm that makes use of this property to phase whole chromosomes and simultaneously impute a large number of missing markers. We test our method by imputing marke...

  8. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  9. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  10. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  11. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  12. Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

    PubMed Central

    Deng, Yi; Chang, Changgee; Ido, Moges Seyoum; Long, Qi

    2016-01-01

    Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples. PMID:26868061

  13. Data supporting the high-accuracy haplotype imputation using unphased genotype data as the references.

    PubMed

    Li, Wenzhi; Xu, Wei; He, Shaohua; Ma, Li; Song, Qing

    2016-09-01

    The data presented in this article are related to the research article entitled "High-accuracy haplotype imputation using unphased genotype data as the references", which reports that unphased genotype data can be used as a reference for haplotype imputation [1]. This article reports the different implementation generation pipelines and the results of performance comparisons between different implementations (A, B, and C) and between HiFi and three major imputation software tools. Our data showed that the three implementations perform similarly in accuracy, with the accuracy of implementation-B slightly but consistently higher than that of A and C. HiFi performed better on haplotype imputation accuracy, and the three other software tools performed slightly better on genotype imputation accuracy. These data may provide a strategy for choosing the optimal phasing pipeline and software for different studies. PMID:27595130

  14. Imputation of KIR Types from SNP Variation Data.

    PubMed

    Vukcevic, Damjan; Traherne, James A; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-10-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  15. Imputation of KIR Types from SNP Variation Data

    PubMed Central

    Vukcevic, Damjan; Traherne, James A.; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H.; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-01-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  16. Imputation for semiparametric transformation models with biased-sampling data

    PubMed Central

    Liu, Hao; Qin, Jing; Shen, Yu

    2012-01-01

    Widely recognized in many fields including economics, engineering, epidemiology, health sciences, technology and wildlife management, length-biased sampling generates biased and right-censored data but often provides the best information available for statistical inference. Different from traditional right-censored data, length-biased data have unique aspects resulting from their sampling procedures. We exploit these unique aspects and propose a general imputation-based estimation method for analyzing length-biased data under a class of flexible semiparametric transformation models. We present new computational algorithms that can jointly estimate the regression coefficients and the baseline function semiparametrically. The imputation-based method under the transformation model provides an unbiased estimator regardless of whether the censoring is independent of the covariates. We establish large-sample properties using the empirical processes method. Simulation studies show that under small to moderate sample sizes, the proposed procedure has smaller mean square errors than two existing estimation procedures. Finally, we demonstrate the estimation procedure by a real data example. PMID:22903245

  17. Doubly robust multiple imputation using kernel-based techniques.

    PubMed

    Hsu, Chiu-Hsieh; He, Yulei; Li, Yisheng; Long, Qi; Friese, Randall

    2016-05-01

    We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real-data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data. PMID:26647734
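
    The donor-sampling step described above can be sketched as follows. This is a hedged illustration only: the Gaussian kernel, the Euclidean distance on the two predictive scores, the `bandwidth` value, and all names and numbers are assumptions, and the paper's actual kernel and distance may differ.

```python
import math
import random


def kernel_weights(target_scores, donor_scores, bandwidth=1.0):
    """Normalized Gaussian kernel weights for observed (donor) cases.

    Each score tuple holds (outcome-model score, missingness-model score).
    Donors whose scores are closer to the target get larger weights.
    """
    raw = []
    for scores in donor_scores:
        dist2 = sum((a - b) ** 2 for a, b in zip(target_scores, scores))
        raw.append(math.exp(-dist2 / (2.0 * bandwidth ** 2)))
    total = sum(raw)
    return [w / total for w in raw]


def impute_once(target_scores, donor_scores, donor_values, rng, bandwidth=1.0):
    """Draw one imputed value from the donors with probability equal to their weight."""
    probs = kernel_weights(target_scores, donor_scores, bandwidth)
    return rng.choices(donor_values, weights=probs, k=1)[0]


# Hypothetical predictive scores and observed outcomes for three donors.
donor_scores = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25)]
donor_values = [12.0, 40.0, 13.5]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Five draws for one incomplete case, one per imputed dataset.
draws = [
    impute_once((0.12, 0.22), donor_scores, donor_values, rng, bandwidth=0.2)
    for _ in range(5)
]
print(draws)
```

    Because imputed values are always drawn from observed cases, they stay within the range of real data, and repeating the draw gives the multiple datasets that the pooled analysis needs.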

  18. RECONSTRUCTING DNA COPY NUMBER BY PENALIZED ESTIMATION AND IMPUTATION

    PubMed Central

    Zhang, Zhongyang; Lange, Kenneth; Ophoff, Roel; Sabatti, Chiara

    2011-01-01

    Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18–29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost. PMID:21572975
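
    For orientation, the fused-lasso criterion in its standard form for a sequence of intensities $y_1,\dots,y_n$ is written below, followed by one smooth surrogate for the absolute value. The symbols $\lambda_1$, $\lambda_2$ and $\epsilon$, and the particular surrogate, are assumptions made here for illustration; the abstract does not spell them out.

```latex
\min_{\beta_1,\dots,\beta_n}\;
\frac{1}{2}\sum_{i=1}^{n}\left(y_i-\beta_i\right)^2
+\lambda_1\sum_{i=1}^{n}\lvert\beta_i\rvert
+\lambda_2\sum_{i=2}^{n}\lvert\beta_i-\beta_{i-1}\rvert,
\qquad
\lvert x\rvert \approx \sqrt{x^2+\epsilon},\quad \epsilon>0 .
```

    The first penalty shrinks individual values toward zero and the second discourages jumps between neighbouring SNPs, which is what produces piecewise-constant copy number profiles; the surrogate makes the objective differentiable so that Newton-type updates can be applied.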

  19. Performance of random forest when SNPs are in linkage disequilibrium

    PubMed Central

    Meng, Yan A; Yu, Yi; Cupples, L Adrienne; Farrer, Lindsay A; Lunetta, Kathryn L

    2009-01-01

    Background Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build a RF. Results We evaluated the performance of our alternative methods by simulation of a spectrum of complex genetics models. When a haplotype rather than an individual SNP is the risk factor, we find that the original Random Forest method performed on SNPs provides good performance. When individual, genotyped SNPs are the risk factors, we find that the stronger the genetic effect, the stronger the effect LD has on the performance of the original RF. A revised importance measure used with the original RF is relatively robust to LD among SNPs; this revised importance measure used with the revised RF is sometimes inflated. Overall, we find that the revised importance measure used with the original RF is the best choice when the genetic model and the number of SNPs in LD with risk SNPs are unknown. For the haplotype-based method, under a multiplicative heterogeneity model, we observed a decrease in the performance of RF with increasing LD among the SNPs in the haplotype. Conclusion Our results suggest that by strategically revising the Random Forest method tree-building or importance measure calculation, power can increase when LD exists between SNPs. 
    We conclude that the revised Random Forest method performed on SNPs offers an advantage of not requiring genotype phase, making it a viable tool for use in the...

  20. A two-step semiparametric method to accommodate sampling weights in multiple imputation.

    PubMed

    Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E

    2016-03-01

    Multiple imputation (MI) is a well-established method to handle item-nonresponse in sample surveys. Survey data obtained from complex sampling designs often involve features that include unequal probability of selection. MI requires imputation to be congenial, that is, for the imputations to come from a Bayesian predictive distribution and for the observed and complete data estimator to equal the posterior mean given the observed or complete data, and similarly for the observed and complete variance estimator to equal the posterior variance given the observed or complete data; more colloquially, the analyst and imputer make similar modeling assumptions. Yet multiply imputed data sets from complex sample designs with unequal sampling weights are typically imputed under simple random sampling assumptions and then analyzed using methods that account for the sampling weights. This is a setting in which the analyst assumes more than the imputer, which can lead to biased estimates and anti-conservative inference. Less commonly used alternatives such as including case weights as predictors in the imputation model typically require interaction terms for more complex estimators such as regression coefficients, and can be vulnerable to model misspecification and difficult to implement. We develop a simple two-step MI framework that accounts for sampling weights using a weighted finite population Bayesian bootstrap method to validly impute the whole population (including item nonresponse) from the observed data. In the second step, having generated posterior predictive distributions of the entire population, we use standard IID imputation to handle the item nonresponse. Simulation results show that the proposed method has good frequentist properties and is robust to model misspecification compared to alternative approaches. We apply the proposed method to accommodate missing data in the Behavioral Risk Factor Surveillance System when estimating means and parameters of...

  1. Combining multiple imputation and meta-analysis with individual participant data.

    PubMed

    Burgess, Stephen; White, Ian R; Resche-Rigon, Matthieu; Wood, Angela M

    2013-11-20

    Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. PMID:23703895
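
    The ordering recommended in this abstract (apply Rubin's rules within each study, then meta-analyze) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed-effect weighting, and the numbers in `studies` are all assumptions made for the example, and between-study heterogeneity is not modeled.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m imputed-data estimates from ONE study with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = w_bar + (1 + 1 / m) * b
    return q_bar, total


def inverse_variance_meta(study_estimates, study_variances):
    """Fixed-effect inverse-variance weighted meta-analysis (an assumed choice)."""
    weights = [1 / v for v in study_variances]
    pooled = sum(w * q for w, q in zip(weights, study_estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


# Hypothetical inputs: per study, the estimates and variances from 3 imputations.
studies = {
    "study_A": ([0.50, 0.54, 0.46], [0.010, 0.012, 0.011]),
    "study_B": ([0.30, 0.34, 0.32], [0.020, 0.019, 0.021]),
}

# Step 1: Rubin's rules at the study level.
per_study = [rubin_pool(est, var) for est, var in studies.values()]

# Step 2: meta-analyze the study-level pooled results.
meta_estimate, meta_variance = inverse_variance_meta(
    [q for q, _ in per_study], [t for _, t in per_study]
)
print(meta_estimate, meta_variance)
```

    The alternative ordering discussed in the abstract would instead call `inverse_variance_meta` once per imputation and pass those results to `rubin_pool`.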

  2. miRdSNP: a database of disease-associated SNPs and microRNA target sites on 3'UTRs of human genes

    PubMed Central

    2012-01-01

    Background Single nucleotide polymorphisms (SNPs) can lead to the susceptibility and onset of diseases through their effects on gene expression at the posttranscriptional level. Recent findings indicate that SNPs could create, destroy, or modify the efficiency of miRNA binding to the 3'UTR of a gene, resulting in gene dysregulation. With the rapidly growing number of published disease-associated SNPs (dSNPs), there is a strong need for resources specifically recording dSNPs on the 3'UTRs and their nucleotide distance from miRNA target sites. We present here miRdSNP, a database incorporating three important areas of dSNPs, miRNA target sites, and diseases. Description miRdSNP provides a unique database of dSNPs on the 3'UTRs of human genes manually curated from PubMed. The current release includes 786 dSNP-disease associations for 630 unique dSNPs and 204 disease types. miRdSNP annotates genes with experimentally confirmed targeting by miRNAs and indexes miRNA target sites predicted by TargetScan and PicTar as well as potential miRNA target sites newly generated by dSNPs. A robust web interface and search tools are provided for studying the proximity of miRNA binding sites to dSNPs in relation to human diseases. Searches can be dynamically filtered by gene name, miRBase ID, target prediction algorithm, disease, and any nucleotide distance between dSNPs and miRNA target sites. Results can be viewed at the sequence level showing the annotated locations for miRNA target sites and dSNPs on the entire 3'UTR sequences. The integration of dSNPs with the UCSC Genome browser is also supported. Conclusion miRdSNP provides a comprehensive data source of dSNPs and robust tools for exploring their distance from miRNA target sites on the 3'UTRs of human genes. miRdSNP enables researchers to further explore the molecular mechanism of gene dysregulation for dSNPs at posttranscriptional level. miRdSNP is freely available on the web at http://mirdsnp.ccr.buffalo.edu. PMID:22276777

  3. Localization of Allotetraploid Gossypium SNPs Using Physical Mapping Resources

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent efforts in Gossypium SNP development have produced thousands of putative SNPs for G. barbadense, G. mustelinum, and G. tomentosum relative to G. hirsutum. Here we report on current efforts to localize putative SNPs using physical mapping resources. Recent advances in physical mapping resour...

  4. Characterization of SNPs in strawberry cultivars in China.

    PubMed

    Ge, A J; Han, J; Li, X D; Zhao, M Z; Liu, H; Dong, Q H; Fang, J G

    2013-01-01

    Single nucleotide polymorphisms (SNPs) occur at high frequencies in both plant and animal genomes and can provide broad genome coverage and reliable estimates of genetic relationships. The availability of expressed sequence tag (EST) data has made it feasible to discover SNPs. DNA analysis is crucial in genetic studies not only for strawberry breeding programs but also for characterization of hybrids and species. We cloned 96 EST sequences, and 116 SNPs were discovered by comparing 16 strawberry cultivars grown in the region of Nanjing, China. Sequence alignment of 6 group sequences derived from 16 sample cultivars yielded 116 SNPs, within a total genomic sequence length of 1755 bp. The SNPs were discovered with a mean frequency of one SNP per 15 bp. These SNPs comprised 57% transitions, 32.7% transversions, 8.6% InDels, and 1.7% others, based on which a phylogenetic tree was constructed. Among the 116 SNPs, 75% were located within the open reading frame (ORF), while 25% were located outside the ORF. All 16 cultivars were well separated in the dendrogram derived from the SNP data, demonstrating that SNPs can be a powerful tool for cultivar identification and genetic diversity analysis in strawberries. PMID:23546945

  5. The distribution of SNPs in human gene regulatory regions

    PubMed Central

    Guo, Yongjian; Jamison, D Curtis

    2005-01-01

    Background As a result of high-throughput genotyping methods, millions of human genetic variants have been reported in recent years. To efficiently identify those with significant biological functions, a practical strategy is to concentrate on variants located in important sequence regions such as gene regulatory regions. Results Analysis of the most common type of variant, single nucleotide polymorphisms (SNPs), shows that in gene promoter regions more SNPs occur in close proximity to transcriptional start sites than in regions further upstream, and a disproportionate number of those SNPs represent nucleotide transversions. Additionally, the number of SNPs found in the predicted transcription factor binding sites is higher than in non-binding site sequences. Conclusion Current information about transcription factor binding site sequence patterns may not be exhaustive, and SNPs may be actively involved in influencing gene expression by affecting the transcription factor binding sites. PMID:16209714

  6. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    PubMed

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the
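
    The mask-and-impute evaluation described at the end of this abstract can be sketched as below. This is only an illustration under stated assumptions: the per-marker mode imputer is a deliberately simple stand-in (it is neither Beagle nor glmnet), genotype concordance is an assumed accuracy metric because the abstract is cut off before naming one, and all function names and the toy matrix are hypothetical.

```python
import random
from collections import Counter


def mask_genotypes(matrix, fraction, rng):
    """Hide a random fraction of the observed calls; return (masked copy, hidden cells)."""
    observed = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, g in enumerate(row)
        if g is not None
    ]
    n_mask = max(1, int(len(observed) * fraction))
    hidden = rng.sample(observed, n_mask)
    hidden_set = set(hidden)
    masked = [
        [None if (i, j) in hidden_set else g for j, g in enumerate(row)]
        for i, row in enumerate(matrix)
    ]
    return masked, hidden


def impute_mode(matrix):
    """Stand-in imputer: fill each missing call with the most common call at that marker."""
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        calls = [row[j] for row in matrix if row[j] is not None]
        mode = Counter(calls).most_common(1)[0][0] if calls else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def masked_concordance(truth, imputed, hidden):
    """Share of hidden cells whose imputed call equals the original call."""
    hits = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return hits / len(hidden)


# Toy matrix: rows are individuals, columns are markers, calls are 0/1/2.
genotypes = [[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
rng = random.Random(42)
masked, hidden = mask_genotypes(genotypes, fraction=0.25, rng=rng)
accuracy = masked_concordance(genotypes, impute_mode(masked), hidden)
print(accuracy)
```

    Repeating the masking with different seeds gives a cross-validation style estimate; swapping `impute_mode` for a real imputation tool leaves the rest of the loop unchanged.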

  7. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data

    PubMed Central

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the

  8. Association Studies with Imputed Variants Using Expectation-Maximization Likelihood-Ratio Tests

    PubMed Central

    Huang, Kuan-Chieh; Sun, Wei; Wu, Ying; Chen, Mengjie; Mohlke, Karen L.; Lange, Leslie A.; Li, Yun

    2014-01-01

    Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, increasingly more markers are being well or better imputed but at the same time, even more markers with relatively low minor allele frequency are being imputed with low imputation quality. Here, we propose new methods that incorporate imputation uncertainty for downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: I) when posterior probabilities of all potential genotypes are estimated; and II) when only the one-dimensional summary statistic, imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test for association based on posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages, and then apply the EM-LRT on the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is protected under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a similar level of statistical power as EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women’s Health Initiative Study, provide further support to the validity and efficiency of our proposed methods. PMID:25383782
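
    The difference between the two scenarios comes down to how much information the dosage keeps. The sketch below uses the usual definition of an imputed dosage as the expected allele count, which is an assumption here since the abstract does not spell it out; the function name and the example probabilities are hypothetical, and the EM-LRT itself is not implemented.

```python
def dosage(probs):
    """Expected allele count from posterior genotype probabilities.

    probs is (P(0 copies), P(1 copy), P(2 copies)) for one individual and marker.
    """
    p0, p1, p2 = probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2 * p2


# Two different posterior distributions (scenario I input) ...
confident_het = (0.0, 1.0, 0.0)
uncertain = (0.5, 0.0, 0.5)

# ... collapse to the same one-dimensional summary (scenario II input).
print(dosage(confident_het), dosage(uncertain))
```

    Because many probability triples map to one dosage, a method that only sees dosages has to recover or sample plausible probabilities before applying a test that works on them.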

  9. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels.

    PubMed

    Gao, Xiaoyi; Haritunians, Talin; Marjoram, Paul; McKean-Cowdin, Roberta; Torres, Mina; Taylor, Kent D; Rotter, Jerome I; Gauderman, William J; Varma, Rohit

    2012-01-01

    Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest-growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos. PMID:22754564

  10. Dealing with missing values in large-scale studies: microarray data imputation and beyond.

    PubMed

    Aittokallio, Tero

    2010-03-01

    High-throughput biotechnologies, such as gene expression microarrays or mass-spectrometry-based proteomic assays, suffer from frequent missing values due to various experimental reasons. Since the missing data points can hinder downstream analyses, there exists a wide variety of ways in which to deal with missing values in large-scale data sets. Nowadays, it has become routine to estimate (or impute) the missing values prior to the actual data analysis. After nearly a decade since the publication of the first missing value imputation methods for gene expression microarray data, new imputation approaches are still being developed at an increasing rate. However, what is lagging behind is a systematic and objective evaluation of the strengths and weaknesses of the different approaches when faced with different types of data sets and experimental questions. In this review, the present strategies for missing value imputation and the measures for evaluating their performance are described. The imputation methods are first reviewed in the context of gene expression microarray data, since most of the methods have been developed for estimating gene expression levels; then, we turn to other large-scale data sets that also suffer from the problems posed by missing values, together with pointers to possible imputation approaches in these settings. Along with a description of the basic principles behind the different imputation approaches, the review tries to provide practical guidance for the users of high-throughput technologies on how to choose the imputation tool for their data and questions, and some additional research directions for the developers of imputation methodologies. PMID:19965979

  11. Traffic Speed Data Imputation Method Based on Tensor Completion

    PubMed Central

    Ran, Bin; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches. PMID:25866501

  12. A multiple imputation strategy for sequential multiple assignment randomized trials.

    PubMed

    Shortreed, Susan M; Laber, Eric; Scott Stroup, T; Pineau, Joelle; Murphy, Susan A

    2014-10-30

    Sequential multiple assignment randomized trials (SMARTs) are increasingly being used to inform clinical and intervention science. In a SMART, each patient is repeatedly randomized over time. Each randomization occurs at a critical decision point in the treatment course. These critical decision points often correspond to milestones in the disease process or other changes in a patient's health status. Thus, the timing and number of randomizations may vary across patients and depend on evolving patient-specific information. This presents unique challenges when analyzing data from a SMART in the presence of missing data. This paper presents the first comprehensive discussion of missing data issues typical of SMART studies: we describe five specific challenges and propose a flexible imputation strategy to facilitate valid statistical estimation and inference using incomplete data from a SMART. To illustrate these contributions, we consider data from the Clinical Antipsychotic Trial of Intervention and Effectiveness, one of the most well-known SMARTs to date. PMID:24919867

  13. Data imputation through the identification of local anomalies.

    PubMed

    Ozkan, Huseyin; Pelvan, Ozgun Soner; Kozat, Suleyman S

    2015-10-01

    We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. PMID:25608311

  14. Differential Network Analysis with Multiply Imputed Lipidomic Data

    PubMed Central

    Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

    2015-01-01

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, yet not huge, number of mutually correlated measured variables, and their associations with outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients, the patients that had a fatal CVD event and the ones who remained stable during a two-year follow-up. PMID:25822937

  15. High-accuracy haplotype imputation using unphased genotype data as the references.

    PubMed

    Li, Wenzhi; Xu, Wei; Fu, Guoxing; Ma, Li; Richards, Jendai; Rao, Weinian; Bythwood, Tameka; Guo, Shiwen; Song, Qing

    2015-11-10

    Enormously growing genomic datasets present a new challenge for missing data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. However, to date, haplotype references are not available for the majority of populations in the world. We explored using existing unphased genotype datasets as references; if this succeeds, it will cover almost all of the populations in the world. The results showed that our HiFi software successfully yields 99.43% accuracy with unphased genotype references. Our method provides a cost-effective solution to break through the bottleneck of limited reference availability for haplotype imputation in the big data era. PMID:26232609

  16. In Silico Analysis of FMR1 Gene Missense SNPs.

    PubMed

    Tekcan, Akin

    2016-06-01

    The FMR1 gene, a member of the fragile X-related gene family, is responsible for fragile X syndrome (FXS). Missense single-nucleotide polymorphisms (SNPs) are responsible for many complex diseases. The effect of FMR1 gene missense SNPs is unknown. The aim of this study, using in silico techniques, was to analyze all known missense mutations that can affect the functionality of the FMR1 gene, leading to mental retardation (MR) and FXS. Data on the human FMR1 gene were collected from the Ensembl database (release 81), National Center for Biotechnology Information dbSNP Short Genetic Variations database, 1000 Genomes Browser, and NHLBI Exome Sequencing Project Exome Variant Server. In silico analysis was then performed. One hundred twenty different missense SNPs of the FMR1 gene were determined. Of these, 11.66% of the FMR1 gene missense SNPs were in highly conserved domains, and 83.33% were in domains with high variety. The results of the in silico prediction analysis showed that 31.66% of the FMR1 gene SNPs were disease related and that 50% of SNPs had a pathogenic effect. The results of the structural and functional analysis revealed that although the R138Q mutation did not seem to have a damaging effect on the protein, the G266E and I304N SNPs appeared to disturb the interaction between the domains and affect the function of the protein. This is the first study to analyze all missense SNPs of the FMR1 gene. The results indicate the applicability of a bioinformatics approach to FXS and other FMR1-related diseases. I think that the analysis of FMR1 gene missense SNPs using bioinformatics methods would help diagnosis of FXS and other FMR1-related diseases. PMID:26880065

  17. SNPs selection using support vector regression and genetic algorithms in GWAS

    PubMed Central

    2014-01-01

    Introduction This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in the simulated database 1, with additive effects only, and in the real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of this total, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, initially with 50,752 SNPs, we reduced the set to 3,073 markers, increasing the accuracy of the model. In the simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions The method suggested in this paper demonstrates its effectiveness in explaining the real phenotype (PTA for milk), because with the application of the wrapper based on the genetic algorithm and Support Vector Regression with the Pearson Universal kernel, many redundant markers were eliminated, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332

  18. Regulatory SNPs in complex diseases: their identification and functional validation.

    PubMed

    Prokunina, Ludmila; Alarcón-Riquelme, Marta E

    2004-04-01

    Finding the genetic causes for complex diseases is a challenge. Expression studies have shown that the level of expression of many genes is altered in disease compared with normal conditions, but what lies behind these changes? Linkage studies provide hints as to where in the genome the genetic triggers--the mutations--might be located. Fine-mapping and association studies can give yet more information about which genes, and which changes in the genes, are involved in the disease. Recent examples show that single-nucleotide polymorphisms (SNPs), which are variations at the single-nucleotide level within an individual's DNA, in the regulatory regions of some genes constitute susceptibility factors in many complex diseases. This article discusses the nature of regulatory SNPs (rSNPs) and techniques for their functional validation, and looks towards what rSNPs can tell us about complex diseases. PMID:15122975

  19. Joint multiple imputation for longitudinal outcomes and clinical events that truncate longitudinal follow-up.

    PubMed

    Hu, Bo; Li, Liang; Greene, Tom

    2016-07-30

    Longitudinal cohort studies often collect both repeated measurements of longitudinal outcomes and times to clinical events whose occurrence precludes further longitudinal measurements. Although joint modeling of the clinical events and the longitudinal data can be used to provide valid statistical inference for target estimands in certain contexts, the application of joint models in medical literature is currently rather restricted because of the complexity of the joint models and the intensive computation involved. We propose a multiple imputation approach to jointly impute missing data of both the longitudinal and clinical event outcomes. With complete imputed datasets, analysts are then able to use simple and transparent statistical methods and standard statistical software to perform various analyses without dealing with the complications of missing data and joint modeling. We show that the proposed multiple imputation approach is flexible and easy to implement in practice. Numerical results are also provided to demonstrate its performance. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26179943

  20. Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.

    PubMed

    Blue, Elizabeth M; Sun, Lei; Tintle, Nathan L; Wijsman, Ellen M

    2014-09-01

    When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184

  1. Model-based imputation approach for data analysis in the presence of non-detects.

    PubMed

    Krishnamoorthy, K; Mallick, Avishek; Mathew, Thomas

    2009-04-01

    A model-based multiple imputation approach for analyzing sample data with non-detects is proposed. The imputation approach involves randomly generating observations below the detection limit using the detected sample values and then analyzing the data using complete sample techniques, along with suitable adjustments to account for the imputation. The method is described for the normal case and is illustrated for making inferences for constructing prediction limits, tolerance limits, for setting an upper bound for an exceedance probability and for interval estimation of a log-normal mean. Two imputation approaches are investigated in the paper: one uses approximate maximum likelihood estimates (MLEs) of the parameters and a second approach uses simple ad hoc estimates that were developed for the specific purpose of imputations. The accuracy of the approaches is verified using Monte Carlo simulation. Simulation studies show that both approaches are very satisfactory for small to moderately large sample sizes, but only the MLE-based approach is satisfactory for large sample sizes. The MLE-based approach can be calibrated to perform very well for large samples. Applicability of the method to the log-normal distribution and the gamma distribution (via a cube root transformation) is outlined. Simulation studies also show that the imputation approach works well for constructing tolerance limits and prediction limits for a gamma distribution. The approach is illustrated using a few practical examples. PMID:19181626
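
    The core step described above, randomly generating observations below the detection limit from a fitted normal model, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the parameters mu and sigma are taken as given (the paper estimates them with approximate MLEs or ad hoc estimates, which are not reproduced here), the function name is hypothetical, and the adjustments the paper applies to account for the imputation are omitted.

```python
import random
from statistics import NormalDist

def impute_non_detects(detected, n_non_detects, detection_limit, mu, sigma, rng=None):
    """Fill in non-detects with draws from a normal truncated above at the detection limit.

    mu and sigma are assumed to be supplied by the caller (hypothetical inputs).
    """
    rng = rng or random.Random()
    dist = NormalDist(mu, sigma)
    p_below = dist.cdf(detection_limit)  # probability mass below the limit
    imputed = []
    for _ in range(n_non_detects):
        # Inverse-CDF sampling restricted to the interval (0, p_below).
        u = rng.uniform(0.0, p_below)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
        imputed.append(min(dist.inv_cdf(u), detection_limit))
    return list(detected) + imputed
```

    For a log-normal sample the same sketch would be applied to the log-transformed values and detection limit, consistent with the abstract's remark that the method is described for the normal case.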

  2. Large-scale epigenome imputation improves data quality and disease variant enrichment

    PubMed Central

    Ernst, Jason; Kellis, Manolis

    2015-01-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals, and surpass experimental datasets in consistency, recovery of gene annotations, and enrichment for disease-associated variants. We use the imputed data to detect low quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments, and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  3. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  4. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs).

    PubMed

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W; Benitez, Javier; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M; Couch, Fergus J; Cox, Angela; Cross, Simon S; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F; Fasching, Peter A; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G; Goldberg, Mark S; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L; Muir, Kenneth; Neuhausen, Susan L; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Stram, Daniel O; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tessier, Daniel C; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M; Vincent, Daniel; Winqvist, Robert; Wu, Anna H; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D P; Hall, Per; Edwards, Stacey L; Simard, Jacques; French, Juliet D; Chenevix-Trench, Georgia; Dunning, Alison M

    2016-01-01

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90-0.94; P = 8.96 × 10^-15)) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10^-09, r^2 = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10^-11, r^2 = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus. PMID:27600471

  5. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs)

    PubMed Central

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K.; Wang, Qin; Dennis, Joe; Alonso, M. Rosario; Andrulis, Irene L.; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W.; Benitez, Javier; Bogdanova, Natalia V.; Bojesen, Stig E.; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M.; Couch, Fergus J.; Cox, Angela; Cross, Simon S.; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F.; Fasching, Peter A.; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G.; Goldberg, Mark S.; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A.; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L.; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L.; Muir, Kenneth; Neuhausen, Susan L.; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J.; Schmidt, Marjanka K.; Schmutzler, Rita K.; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C.; Stram, Daniel O.; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H.; Tessier, Daniel C.; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M.; Vincent, Daniel; Winqvist, Robert; Wu, Anna H.; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D. P.; Hall, Per; Edwards, Stacey L.; Simard, Jacques; French, Juliet D.; Chenevix-Trench, Georgia; Dunning, Alison M.

    2016-01-01

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90–0.94; P = 8.96 × 10^−15)) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10^−09, r^2 = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10^−11, r^2 = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus. PMID:27600471

  6. Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection.

    PubMed

    Toghiani, S; Aggrey, S E; Rekaya, R

    2016-07-01

    Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms has provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated that are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density, low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval

  7. Shrinkage regression-based methods for microarray missing value imputation

    PubMed Central

    2013-01-01

    Background: Missing values commonly occur in microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. Results: To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Conclusions: Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods. PMID:24565159
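
    The pipeline described in the Results (select similar genes by Pearson correlation, fit by least squares, shrink the coefficients, then predict the missing value) can be illustrated with the simplified sketch below. It is not the authors' estimator: it swaps the joint regression for one single-predictor least-squares fit per neighbour gene, uses a fixed shrinkage factor `shrink` (a hypothetical parameter; the abstract does not state how the shrinkage is derived), and combines the predictions weighted by absolute correlation. Candidate genes are assumed to be complete and non-constant.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def impute_missing(target, candidates, missing_idx, k=2, shrink=0.9):
    """Estimate target[missing_idx] from the k most correlated candidate genes."""
    obs = [i for i in range(len(target)) if i != missing_idx]
    y = [target[i] for i in obs]
    # Rank candidate genes by absolute correlation over the observed samples.
    scored = []
    for gene in candidates:
        x = [gene[i] for i in obs]
        scored.append((abs(pearson(x, y)), gene, x))
    scored.sort(key=lambda item: item[0], reverse=True)
    preds, weights = [], []
    for r, gene, x in scored[:k]:
        mx, my = mean(x), mean(y)
        # Least-squares slope, then shrunk towards zero by a fixed factor.
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
        slope *= shrink
        preds.append(my + slope * (gene[missing_idx] - mx))
        weights.append(r)
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)
```

    With `shrink=1.0` the sketch reduces to ordinary per-neighbour least squares; values below 1 pull each prediction towards the target gene's observed mean.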

  8. Assets of imputation to ultra-high density for productive and functional traits.

    PubMed

    Jiménez-Montero, J A; Gianola, D; Weigel, K; Alenda, R; González-Recio, O

    2013-09-01

    The aim of this study was to evaluate different-density genotyping panels for genotype imputation and genomic prediction. Genotypes from customized Golden Gate Bovine3K BeadChip [LD3K; low-density (LD) 3,000-marker (3K); Illumina Inc., San Diego, CA] and BovineLD BeadChip [LD6K; 6,000-marker (6K); Illumina Inc.] panels were imputed to the BovineSNP50v2 BeadChip [50K; 50,000-marker; Illumina Inc.]. In addition, LD3K, LD6K, and 50K genotypes were imputed to a BovineHD BeadChip [HD; high-density 800,000-marker (800K) panel], and predictive ability was subsequently evaluated and compared. Comparisons of prediction accuracy were carried out using Random boosting and genomic BLUP. Four traits under selection in the Spanish Holstein population were used: milk yield, fat percentage (FP), somatic cell count, and days open (DO). Training sets at 50K density for imputation and prediction included 1,632 genotypes. Testing sets for imputation from LD to 50K contained 834 genotypes and testing sets for genomic evaluation included 383 bulls. The reference population genotyped at HD included 192 bulls. Imputation using BEAGLE software (http://faculty.washington.edu/browning/beagle/beagle.html) was effective for reconstruction of dense 50K and HD genotypes, even when a small reference population was used, with 98.3% of SNPs correctly imputed. Random boosting outperformed genomic BLUP in terms of prediction reliability, mean squared error, and selection effectiveness of top animals in the case of FP. For other traits, however, no clear differences existed between methods. No differences were found between imputed LD and 50K genotypes, whereas evaluation of genotypes imputed to HD was, on average across data set, method, and trait, 4% more accurate than 50K prediction, and showed smaller (2%) mean squared error of predictions. Similar bias in regression coefficients was found across data sets but regressions were 0.32 units closer to unity for DO when genotypes were imputed to HD

  9. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed Central

    Meseck, Kristin; Jankowska, Marta M.; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset consisted of GPS data lapses. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796

  10. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets

    SciTech Connect

    Torres-García, Wandaliz; Brown, Steven D; Johnson, Roger; Zhang, Weiwen; Runger, George; Meldrum, Deirdre

    2011-01-01

    Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values for proteins experimentally undetected, using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured only for the 45- and 90-min samples. With the goal of elucidating the relationship between temporal gene expression and protein abundance data, and then using it to impute missing proteomic values for the 45-min samples (which do not have cognate transcriptomic data) and the 90-min samples, we initially used nonlinear Smoothing Splines Curve Fitting (SSCF) to identify temporal relationships among transcriptomic data at different time points and then imputed missing gene expression measurements for the sample at 45 min. After the imputation was validated by biological constraints (i.e. operons), we used a data-driven Gradient Boosted Trees (GBT) model to uncover possible non-linear relationships between temporal transcriptomic and proteomic data, and to impute protein abundance for the proteins experimentally undetected in the 45- and 90-min samples, based on relevant predictors such as temporal mRNA gene expression data, cellular roles, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. The imputed protein values were validated using biological constraints such as operon, regulon and pathway information. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.

  11. 29 CFR 98.630 - May the Department of Labor impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... organization when the improper conduct occurred in connection with a partnership, joint venture, joint... impute the fraudulent, criminal, or other improper conduct of any officer, director, shareholder,...

  12. Analysis of mitochondrial transcription factor A SNPs in alcoholic cirrhosis

    PubMed Central

    TANG, CHUN; LIU, HONGMING; TANG, YONGLIANG; GUO, YONG; LIANG, XIANCHUN; GUO, LIPING; PI, RUXIAN; YANG, JUNTAO

    2014-01-01

    Genetic susceptibility to alcoholic cirrhosis (AC) exists. We previously demonstrated hepatic mitochondrial DNA (mtDNA) damage in patients with AC compared with chronic alcoholics without cirrhosis. Mitochondrial transcription factor A (mtTFA) is central to mtDNA expression regulation and repair; however, it is unclear whether there are specific mtTFA single nucleotide polymorphisms (SNPs) in patients with AC and whether they affect mtDNA repair. In the present study, we screened mtTFA SNPs in patients with AC and analyzed their impact on the copy number of mtDNA in AC. A total of 50 patients with AC, 50 alcoholics without AC and 50 normal subjects were enrolled in the study. SNPs of full-length mtTFA were analyzed using the polymerase chain reaction (PCR) combined with gene sequencing. The hepatic mtTFA mRNA and mtDNA copy numbers were measured using quantitative PCR (qPCR), and mtTFA protein was measured using western blot analysis. A total of 18 mtTFA SNPs specific to patients with AC with frequencies >10% were identified. Two were located in the coding region and 16 were identified in non-coding regions. Conversely, there were five SNPs that were only present in patients with AC and normal subjects and had a frequency >10%. In the AC group, the hepatic mtTFA mRNA and protein levels were significantly lower than those in the other two groups. Moreover, the hepatic mtDNA copy number was significantly lower in the AC group than in the controls and alcoholics without AC. Based on these data, we conclude that AC-specific mtTFA SNPs may be responsible for the observed reductions in mtTFA mRNA, protein levels and mtDNA copy number and they may also increase the susceptibility to AC. PMID:24348767

  13. Establishment of a pipeline to analyse non-synonymous SNPs in Bos taurus

    PubMed Central

    Lee, Michael A; Keane, Orla M; Glass, Belinda C; Manley, Tim R; Cullen, Neil G; Dodds, Ken G; McCulloch, Alan F; Morris, Chris A; Schreiber, Mark; Warren, Jonathan; Zadissa, Amonida; Wilson, Theresa; McEwan, John C

    2006-01-01

    Background: Single nucleotide polymorphisms (SNPs) are an abundant form of genetic variation in the genome of every species and are useful for gene mapping and association studies. Of particular interest are non-synonymous SNPs, which may alter protein function and phenotype. We therefore examined bovine expressed sequences for non-synonymous SNPs and validated and tested selected SNPs for their association with measured traits. Results: Over 500,000 public bovine expressed sequence tagged (EST) sequences were used to search for coding SNPs (cSNPs). A total of 15,353 SNPs were detected in the transcribed sequences studied, of which 6,325 were predicted to be coding SNPs with the remaining 9,028 SNPs presumed to be in untranslated regions. Of the cSNPs detected, 2,868 were predicted to result in a change in the amino acid encoded. In order to determine the actual number of non-synonymous polymorphic SNPs we designed assays for 920 of the putative SNPs. These SNPs were then genotyped through a panel of cattle DNA pools using chip-based MALDI-TOF mass spectrometry. Of the SNPs tested, 29% were found to be polymorphic with a minor allele frequency >10%. A subset of the SNPs was genotyped through animal resources in order to look for association with age of puberty, facial eczema resistance or meat yield. Three SNPs were nominally associated with resistance to the disease facial eczema (P < 0.01). Conclusion: We have identified 15,353 putative SNPs in or close to bovine genes and 2,868 of these SNPs were predicted to be non-synonymous. Approximately 29% of the non-synonymous SNPs were polymorphic and common with a minor allele frequency >10%. Of the SNPs detected in this study, 99% have not been previously reported. These novel SNPs will be useful for association studies or gene mapping. PMID:17125523

  14. Imputation method for lifetime exposure assessment in air pollution epidemiologic studies

    PubMed Central

    2013-01-01

    Background: Environmental epidemiology, when focused on the life course of exposure to a specific pollutant, requires historical exposure estimates that are difficult to obtain for the full time period due to gaps in the historical record, especially in earlier years. We show that these gaps can be filled by applying multiple imputation methods to a formal risk equation that incorporates lifetime exposure. We also address challenges that arise, including choice of imputation method, potential bias in regression coefficients, and uncertainty in age-at-exposure sensitivities. Methods: During time periods when parameters needed in the risk equation are missing for an individual, the parameters are filled by an imputation model using group level information or interpolation. A random component is added to match the variance found in the estimates for study subjects not needing imputation. The process is repeated to obtain multiple data sets, whose regressions against health data can be combined statistically to develop confidence limits using Rubin’s rules to account for the uncertainty introduced by the imputations. To test for possible recall bias between cases and controls, which can occur when historical residence location is obtained by interview, and which can lead to misclassification of imputed exposure by disease status, we introduce an “incompleteness index,” equal to the percentage of dose imputed (PDI) for a subject. “Effective doses” can be computed using different functional dependencies of relative risk on age of exposure, allowing intercomparison of different risk models. To illustrate our approach, we quantify lifetime exposure (dose) from traffic air pollution in an established case–control study on Long Island, New York, where considerable in-migration occurred over a period of many decades. Results: The major result is the described approach to imputation. The illustrative example revealed potential recall bias, suggesting that regressions
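
    The Methods mention combining regressions from the multiply imputed data sets using Rubin's rules. A minimal sketch of the standard pooling step is shown below; the function name is hypothetical, and the inputs are assumed to be one coefficient estimate and its squared standard error per imputed data set. Turning the pooled variance into confidence limits additionally requires a reference t distribution with Rubin's degrees of freedom, which is omitted here.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)         # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

    The between-imputation term is what carries the extra uncertainty introduced by the imputations; with identical estimates across data sets the total variance reduces to the within-imputation average.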

  15. Multimodal diagnosis of epilepsy using conditional dependence and multiple imputation

    PubMed Central

    Kerr, Wesley T.; Hwang, Eric S.; Raman, Kaavya R.; Barritt, Sarah E.; Patel, Akash B.; Le, Justine M.; Hori, Jessica M.; Davis, Emily C.; Braesch, Chelsea T.; Janio, Emily A.; Lau, Edward P.; Cho, Andrew Y.; Anderson, Ariana; Silverman, Daniel H.S.; Salamon, Noriko; Engel, Jerome; Stern, John M.; Cohen, Mark S.

    2014-01-01

    The definitive diagnosis of the type of epilepsy, if it exists, in medication-resistant seizure disorder is based on the efficient combination of clinical information, long-term video-electroencephalography (EEG) and neuroimaging. Diagnoses are reached by a consensus panel that combines these diverse modalities using clinical wisdom and experience. Here we compare two methods of multimodal computer-aided diagnosis, vector concatenation (VC) and conditional dependence (CD), using clinical archive data from 645 patients with medication-resistant seizure disorder, confirmed by video-EEG. CD models the clinical decision process, whereas VC allows for statistical modeling of cross-modality interactions. Due to the nature of clinical data, not all information was available in all patients. To overcome this, we multiply-imputed the missing data. Using a C4.5 decision tree, single modality classifiers achieved 53.1%, 51.5% and 51.1% average accuracy for MRI, clinical information and FDG-PET, respectively, for the discrimination between non-epileptic seizures, temporal lobe epilepsy, other focal epilepsies and generalized-onset epilepsy (vs. chance, p<0.01). Using VC, the average accuracy was significantly lower (39.2%). In contrast, the CD classifier that classified with MRI then clinical information achieved an average accuracy of 58.7% (vs. VC, p<0.01). The decrease in accuracy of VC compared to the MRI classifier illustrates how the addition of more informative features does not improve performance monotonically. The superiority of conditional dependence over vector concatenation suggests that the structure imposed by conditional dependence improved our ability to model the underlying diagnostic trends in the multimodality data. PMID:25311448

  16. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy

    PubMed Central

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption that data are missing at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by data missing not at random (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy. PMID:26283989

  17. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes

    PubMed Central

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn GA; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John RB

    2016-01-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7 879 351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P<5 × 10^−8), with minor allele frequencies of 1.3–23.9%. Novel signals included variants for progesterone (P=7.68 × 10^−12), oestradiol (P=1.63 × 10^−8) and FAI (P=1.50 × 10^−8). A genetic variant near the FSHB gene was identified which influenced both FSH (P=1.74 × 10^−8) and LH (P=3.94 × 10^−9) levels. A separate locus on chromosome 7 was associated with both DHEAS (P=1.82 × 10^−14) and progesterone (P=6.09 × 10^−14). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation. PMID:26014426

  18. Can we spin straw into gold? An evaluation of immigrant legal status imputation approaches.

    PubMed

    Van Hook, Jennifer; Bachmeier, James D; Coffman, Donna L; Harel, Ofer

    2015-02-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants' legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants' legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  19. PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

    PubMed Central

    Livne, Oren E.; Han, Lide; Alkorta-Aranburu, Gorka; Wentworth-Sheilds, William; Abney, Mark; Ober, Carole; Nicolae, Dan L.

    2015-01-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. PMID:25735005

  20. Multiple imputation and analysis for high-dimensional incomplete proteomics data.

    PubMed

    Yin, Xiaoyan; Levy, Daniel; Willinger, Christine; Adourian, Aram; Larson, Martin G

    2016-04-15

    Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤ 40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction. PMID:26565662
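    The random binning step described in this abstract can be sketched as follows. This is a minimal illustration only: the bin size of 40, the number of repeats, and the names `shuffle_into_bins` and `protein_<k>` are assumptions made for the example, and the per-bin multiple imputation and stepwise conditional logistic regression are left as a placeholder comment rather than implemented.

```python
import random


def shuffle_into_bins(items, bin_size, rng):
    """Randomly partition items into bins holding at most bin_size elements."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


# Hypothetical protein labels; 544 matches the count given in the abstract.
proteins = [f"protein_{k}" for k in range(544)]
rng = random.Random(2016)  # fixed seed so the sketch is reproducible

all_partitions = []
for repeat in range(3):  # the abstract reports hundreds of repeats
    bins = shuffle_into_bins(proteins, 40, rng)  # bin size is an assumption
    # Placeholder: impute within each bin, then run variable selection per bin.
    all_partitions.append(bins)
```

    Each repeat yields a fresh partition, so every protein is imputed and tested alongside a different random set of companions each time.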

  1. Pharmacogenomics: accessing important alleles by imputation from commercial genome-wide SNP arrays.

    PubMed

    Liboredo, R; Pena, S D J

    2014-01-01

    Personalized medicine is becoming a medical reality, as important genotype-phenotype relationships are being unraveled. The availability of pharmacogenomic data is a key element of individualized care. In this study, we explored genotype imputation as a means to infer important pharmacogenomic alleles from a regular commercially available genome-wide SNP array. Using these arrays as a starting point can reduce testing costs and increase access to these pharmacogenomic data while still retaining a larger amount of genome-wide information. IMPUTE2 and MaCH-Admix were used to perform genotype imputation with a dense reference panel from 1000 Genomes data. We were able to correctly infer genotypes for the warfarin-related loci VKORC1 and CYP2C9 alleles 2, 3, 5, and 11 and also clopidogrel-related CYP2C19 alleles 2 and 17 for a small sample of Brazilian individuals, as well as for HapMap samples. The success of an imputation approach in admixed samples using publicly available reference panels can encourage further imputation initiatives in those populations. PMID:25117329

  2. Model-based imputation of latent cigarette counts using data from a calibration study.

    PubMed

    Griffith, Sandra D; Shiffman, Saul; Li, Yimei; Heitjan, Daniel F

    2016-06-01

    In addition to dichotomous measures of abstinence, smoking studies may use daily cigarette consumption as an outcome variable. These counts hold the promise of more efficient and detailed analyses than dichotomous measures, but present serious quality issues - measurement error and heaping - if obtained by retrospective recall. A doubly-coded dataset with a retrospective recall measurement (timeline followback, TLFB) and a more precise instantaneous measurement (ecological momentary assessment, EMA) serves as a calibration dataset, allowing us to predict EMA given TLFB and baseline factors. We apply this model to multiply impute precise cigarette counts for a randomized, placebo-controlled trial of bupropion with only TLFB measurements available. To account for repeated measurements on a subject, we induce correlation in the imputed counts. Finally, we analyze the imputed data in a longitudinal model that accommodates random subject effects and zero inflation. Both raw and imputed data show a significant drug effect for reducing the odds of non-abstinence and the number of cigarettes smoked among non-abstainers, but the imputed data provide efficiency gains. This method permits the analysis of daily cigarette consumption data previously deemed suspect due to reporting error and is applicable to other self-reported count data sets for which calibration samples are available. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26081923

  3. Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods

    PubMed Central

    2012-01-01

    Background: Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X^2. In 'passive imputation' a value X* is imputed for X and then X^2 is imputed as (X*)^2. A recent proposal is to treat X^2 as 'just another variable' (JAV) and impute X and X^2 under multivariate normality. Methods: We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X^2; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study. Results: JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias. Conclusions: Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression. PMID:22489953
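    The difference between passive imputation and JAV can be illustrated with a small simulation. The sketch below is a simplified, assumption-laden example rather than the authors' procedure: it produces a single stochastic imputation with fixed regression estimates (no parameter draws, so it is not a proper multiple imputation), and the simulated data, sample size, missingness rate, and the names `fit_line` and `impute` are all hypothetical.

```python
import math
import random


def fit_line(y, t):
    """Least-squares fit t ~ a + b*y; returns (a, b, residuals)."""
    n = len(y)
    my, mt = sum(y) / n, sum(t) / n
    sxy = sum((yi - my) * (ti - mt) for yi, ti in zip(y, t))
    sxx = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = mt - b * my
    return a, b, [ti - (a + b * yi) for yi, ti in zip(y, t)]


def impute(y, x, method, rng):
    """Complete x (None = missing) and its square by 'passive' or 'jav'."""
    obs = [i for i, xi in enumerate(x) if xi is not None]
    y_obs = [y[i] for i in obs]
    x_obs = [x[i] for i in obs]
    a1, b1, r1 = fit_line(y_obs, x_obs)                     # X on Y
    a2, b2, r2 = fit_line(y_obs, [v * v for v in x_obs])    # X^2 on Y
    dof = len(obs) - 2
    s11 = sum(u * u for u in r1) / dof
    s12 = sum(u * v for u, v in zip(r1, r2)) / dof
    s22 = sum(v * v for v in r2) / dof
    # Cholesky factor of the 2x2 residual covariance, for correlated draws.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(max(s22 - l21 * l21, 0.0))
    x_out, x2_out = [], []
    for yi, xi in zip(y, x):
        if xi is not None:
            x_out.append(xi)
            x2_out.append(xi * xi)
            continue
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x_star = a1 + b1 * yi + l11 * z1
        if method == "passive":
            x2_star = x_star * x_star          # square of the imputed X
        else:
            # JAV: X^2 is drawn as a separate, jointly normal variable.
            x2_star = a2 + b2 * yi + l21 * z1 + l22 * z2
        x_out.append(x_star)
        x2_out.append(x2_star)
    return x_out, x2_out


# Simulated data: Y depends on X and X^2; X is missing completely at random.
data_rng = random.Random(0)
x_full = [data_rng.gauss(0, 1) for _ in range(200)]
y = [1 + xi + xi * xi + data_rng.gauss(0, 1) for xi in x_full]
x = [None if data_rng.random() < 0.3 else xi for xi in x_full]

xp, x2p = impute(y, x, "passive", random.Random(1))
xj, x2j = impute(y, x, "jav", random.Random(1))
```

    Passive imputation keeps the imputed square consistent with the imputed X, whereas JAV deliberately gives up that consistency (an imputed X^2 need not equal the square of the imputed X, and may even be negative) in exchange for preserving the joint moments used by the linear analysis model.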

  4. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough.

    PubMed

    McMahon, George; Ring, Susan M; Davey-Smith, George; Timpson, Nicholas J

    2015-10-15

    Whooping cough is currently seeing a resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case-control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  5. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough

    PubMed Central

    McMahon, George; Ring, Susan M.; Davey-Smith, George; Timpson, Nicholas J.

    2015-01-01

    Whooping cough is currently seeing a resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case–control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E−18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  6. Stack filters

    NASA Astrophysics Data System (ADS)

    Wendt, P. D.; Coyle, E. J.; Gallagher, N. C., Jr.

    1986-08-01

    A large class of easily implemented nonlinear filters, called stack filters, is discussed; it includes the rank-order operators as well as compositions of morphological operators. Techniques similar to those used to determine the root signal behavior of median filters are employed to study the convergence properties of the filters. Necessary conditions for a stack filter to preserve monotone regions or edges in signals, and the output distribution of the filters, are obtained. Among the stack filters of window width three are asymmetric median filters: one removes only positive-going edges and the other removes only negative-going edges, whereas the median filter removes impulses of both signs.
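    As an illustration of how a stack filter operates, the sketch below uses threshold decomposition: the signal is split into binary level signals, the same positive Boolean function is applied to each, and the binary outputs are summed. With the majority function over a width-three window this reproduces the running median. The function names and the edge handling (end samples repeated) are assumptions made for this example and are not taken from the paper.

```python
from statistics import median


def stack_filter(signal, window, boolean_fn):
    """Stack filter via threshold decomposition.

    signal: list of non-negative integers; window: odd window width;
    boolean_fn: positive Boolean function on a tuple of 0/1 values.
    Edges are padded by repeating the end samples (an assumption).
    """
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    output = [0] * len(signal)
    for level in range(1, max(signal) + 1):
        # Binary signal for this threshold level.
        binary = [1 if v >= level else 0 for v in padded]
        for i in range(len(signal)):
            output[i] += boolean_fn(tuple(binary[i:i + window]))
    return output


def majority(bits):
    """Positive Boolean function that yields the median when stacked."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def running_median(signal, window):
    """Direct running median with the same edge padding, for comparison."""
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]


signal = [3, 0, 5, 5, 1, 4, 4, 0]
filtered = stack_filter(signal, 3, majority)
print(filtered)  # [3, 3, 5, 5, 4, 4, 4, 0], same as running_median(signal, 3)
```

    Swapping `majority` for another positive Boolean function gives a different member of the stack filter class without changing the decomposition and summation steps.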

  7. Association analysis of candidate SNPs on reproductive traits in swine

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Being able to identify young females with superior reproduction traits would have a large financial impact on commercial swine producers. Previous studies have discovered SNPs associated with economically important traits such as litter size, growth rate, fat deposition, and feed intake. The objecti...

  8. Intra- and interpopulation genotype reconstruction from tagging SNPs

    PubMed Central

    Paschou, Peristera; Mahoney, Michael W.; Javed, Asif; Kidd, Judith R.; Pakstis, Andrew J.; Gu, Sheng; Kidd, Kenneth K.; Drineas, Petros

    2007-01-01

    The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for ∼2000 individuals). We also evaluate these algorithms on a second data set consisting of genotypes available from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in the setting of a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and they are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of “untyped” genotypes for most of the populations studied. Additionally, we demonstrate in a quantitative manner that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can be applied to retrieve significant SNP associations with disease, with important genotyping savings. PMID:17151345

  9. Quality assessment parameters for EST-derived SNPs from catfish

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two factors were found to be most significant for validation of EST-derived SNPs: the contig size and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate although the validation rate was reasonably high when the contig sizes were equal to or larger than...

  10. Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation.

    PubMed

    Jackson, Dan; White, Ian R; Seaman, Shaun; Evans, Hannah; Baisley, Kathy; Carpenter, James

    2014-11-30

    The Cox proportional hazards model is frequently used in medical statistics. The standard methods for fitting this model rely on the assumption of independent censoring. Although this is sometimes plausible, we often wish to explore how robust our inferences are as this untestable assumption is relaxed. We describe how this can be carried out in a way that makes the assumptions accessible to all those involved in a research project. Estimation proceeds via multiple imputation, where censored failure times are imputed under user-specified departures from independent censoring. A novel aspect of our method is the use of bootstrapping to generate proper imputations from the Cox model. We illustrate our approach using data from an HIV-prevention trial and discuss how it can be readily adapted and applied in other settings. PMID:25060703

  11. Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation

    PubMed Central

    Jackson, Dan; White, Ian R; Seaman, Shaun; Evans, Hannah; Baisley, Kathy; Carpenter, James

    2014-01-01

    The Cox proportional hazards model is frequently used in medical statistics. The standard methods for fitting this model rely on the assumption of independent censoring. Although this is sometimes plausible, we often wish to explore how robust our inferences are as this untestable assumption is relaxed. We describe how this can be carried out in a way that makes the assumptions accessible to all those involved in a research project. Estimation proceeds via multiple imputation, where censored failure times are imputed under user-specified departures from independent censoring. A novel aspect of our method is the use of bootstrapping to generate proper imputations from the Cox model. We illustrate our approach using data from an HIV-prevention trial and discuss how it can be readily adapted and applied in other settings. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:25060703

  12. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation

    PubMed Central

    Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.

    2015-01-01

    This paper addresses the issue of exact-test based statistical inference for Hardy−Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy−Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ^2 statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy−Weinberg disequilibrium. Depending on the imputation method used, 6−13% of the test results changed qualitatively at the 5% level. PMID:26377959
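    The equivalence between the inbreeding coefficient and the chi-square statistic mentioned in this abstract can be checked numerically for a biallelic marker with complete genotype data. The sketch below is only that check: it does not implement the exact test or the missing-data procedures of the paper, the function name and genotype counts are made up for illustration, and it assumes a polymorphic marker (both alleles observed).

```python
def hwe_statistics(n_aa, n_ab, n_bb):
    """Return (inbreeding coefficient f, chi-square statistic) for a
    biallelic marker, given counts of the AA, AB and BB genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # f compares observed heterozygosity with its Hardy-Weinberg expectation.
    f = 1 - n_ab / (2 * n * p * q)
    return f, chi2


f, chi2 = hwe_statistics(30, 40, 30)   # illustrative counts, n = 100
n = 100
print(round(f, 6), round(chi2, 6), round(n * f * f, 6))  # 0.2 4.0 4.0
```

    For these counts the chi-square statistic equals n times f squared, which is the sense in which the two statistics carry the same information for a two-allele marker.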

  13. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers.

    PubMed

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out in order to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  14. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    PubMed Central

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out in order to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  15. A comparison of imputation strategies in cluster randomized trials with missing binary outcomes.

    PubMed

    Caille, Agnès; Leyrat, Clémence; Giraudeau, Bruno

    2014-04-01

    In cluster randomized trials, clusters of subjects are randomized rather than subjects themselves, and missing outcomes are a concern as in individual randomized trials. We assessed strategies for handling missing data when analysing cluster randomized trials with a binary outcome; strategies included complete case, adjusted complete case, and simple and multiple imputation approaches. We performed a simulation study to assess bias and coverage rate of the population-averaged intervention-effect estimate. Multiple imputation with either a random-effects logistic regression model or classical logistic regression provided unbiased estimates of the intervention effect. Both strategies also showed good coverage properties, slightly better for multiple imputation with a random-effects logistic regression approach. Finally, this latter approach led to a slightly negatively biased intracluster correlation coefficient estimate, but the bias was smaller than that with a classical logistic regression model strategy. We applied these strategies to a real trial randomizing households and comparing ivermectin and malathion to treat head lice. PMID:24713160

  16. Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach.

    PubMed

    Erler, Nicole S; Rizopoulos, Dimitris; Rosmalen, Joost van; Jaddoe, Vincent W V; Franco, Oscar H; Lesaffre, Emmanuel M E H

    2016-07-30

    Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27042954

  17. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  18. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  19. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  20. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  1. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  2. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  3. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  4. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  5. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  6. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  7. Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments

    PubMed Central

    2010-01-01

    Background Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) for microarray data. In this study, we evaluated twelve different usable methods and their influence on the quality of gene clustering. We used several datasets, from both kinetic and non-kinetic experiments in yeast and human. Results We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially one based on expected maximization (EM_array). These improvements were also observed for the imputation of extreme values, the values most difficult to predict. We showed that the imputed MVs still have important effects on the stability of the gene clusters. The improvement in the clustering obtained by hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and the gene cluster stability. Even though comparing clustering algorithms is a complex task, we observed that the k-means approach is more efficient at conserving gene associations. Conclusions More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have thus been made since our last study. The EM_array approach constitutes an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor in gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and thus for dedicated benchmarks. A

  8. Imputation of Truncated p-Values For Meta-Analysis Methods and Its Genomic Application

    PubMed Central

    Tang, Shaowu; Ding, Ying; Sibille, Etienne; Mogil, Jeffrey; Lariviere, William R.; Tseng, George C.

    2014-01-01

    Microarray analysis to monitor expression activities in thousands of genes simultaneously has become routine in biomedical research during the past decade. A tremendous number of expression profiles have been generated and stored in the public domain, and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular as a way to obtain increased statistical power and validated findings. Methods that aggregate transformed p-value evidence have been widely used in genomic settings, among which Fisher's and Stouffer's methods are the most popular ones. In practice, raw data and p-values of DE evidence are often not available in genomic studies that are to be combined. Instead, only the detected DE gene lists under a certain p-value threshold (e.g., DE genes with p-value < 0.001) are reported in journal publications. The truncated p-value information makes the aforementioned meta-analysis methods inapplicable, and researchers are forced to apply a less efficient vote counting method or naïvely drop the studies with incomplete information. The purpose of this paper is to develop effective meta-analysis methods for such situations with partially censored p-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods of which Fisher's and Stouffer's methods are special examples. The null distribution of each method was analytically derived and subsequent inference and genomic analysis frameworks were established. Simulations were performed to investigate the type I error, power and the control of false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were applied to several genomic applications in colorectal cancer, pain and liquid association analysis of major depressive disorder (MDD). The results showed that imputation methods outperformed existing naïve approaches. Mean imputation and
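
    The two aggregation methods named in this abstract, and one possible reading of mean imputation, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a censored p-value (marked None here) is uniform on (threshold, 1) under the null and is replaced by its conditional mean, (threshold + 1) / 2; the truncated abstract does not state the formula, so that choice and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combined(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df for k fully observed null p-values."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function, valid for even degrees of freedom 2k.
    half = stat / 2.0
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, tail


def stouffer_combined(p_values):
    """Stouffer's method: sum of z-scores / sqrt(k) is standard normal for k fully observed null p-values."""
    nd = NormalDist()
    # inv_cdf requires 0 < p < 1, so p-values of exactly 0 or 1 are not accepted.
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return z, 1.0 - nd.cdf(z)


def mean_impute(p_values, threshold):
    """Replace censored entries (None) with (threshold + 1) / 2 (an assumed rule, see lead-in)."""
    fill = (threshold + 1.0) / 2.0
    return [fill if p is None else p for p in p_values]


# Two studies reported p < 0.001 for a gene; a third only reported its DE list, so p is censored.
completed = mean_impute([0.0004, 0.0008, None], threshold=0.001)
fisher_stat, _ = fisher_combined(completed)
```

    The tail probabilities returned above use the standard chi-square and normal reference distributions, which hold only when every p-value is observed. After imputation the null distribution changes (the abstract notes that it was derived analytically for each method), so only the statistic itself is meaningful for imputed input in this sketch.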

  9. A suggested approach for imputation of missing dietary data for young children in daycare

    PubMed Central

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children. PMID:26689313

  10. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for an R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which

  11. Short communication: Imputation performances of 3 low-density marker panels in beef and dairy cattle.

    PubMed

    Dassonneville, R; Fritz, S; Ducrocq, V; Boichard, D

    2012-07-01

    Low-density chips are appealing alternative tools contributing to the reduction of genotyping costs. Imputation enables researchers to predict missing genotypes to recreate the denser coverage of the standard 50K (∼50,000) genotype. Two alternative in silico chips were defined in this study that included markers selected to optimize minor allele frequency and spacing. The objective of this study was to compare the imputation accuracy of these custom low-density chips with a commercially available 3K chip. Data consisted of genotypes of 4,037 Holstein bulls, 1,219 Montbéliarde bulls, and 991 Blonde d'Aquitaine bulls. Criteria to select markers to include in low-density marker panels are described. To mimic a low-density genotype, all markers except the markers present on the low-density panel were masked in the validation population. Imputation was performed using the Beagle software. Combining the directed acyclic graph obtained with Beagle with the PHASEBOOK algorithm provides fast and accurate imputation that is suitable for routine genomic evaluations based on imputed genotypes. Overall, 95 to 99% of alleles were correctly imputed depending on the breed and the low-density chip used. The alternative low-density chips gave better results than the commercially available 3K chip. A low-density chip with 6,000 markers is a valuable genotyping tool suitable for both dairy and beef breeds. Such a tool could be used for preselection of young animals or large-scale screening of the female population. PMID:22720970

  12. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the number of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. GPS data lapses made up over 17% of the dataset. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796

  13. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    NASA Astrophysics Data System (ADS)

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31,900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix

  14. From SNPs to Genes: Disease Association at the Gene Level

    PubMed Central

    Lehne, Benjamin; Lewis, Cathryn M.; Schlitt, Thomas

    2011-01-01

    Interpreting Genome-Wide Association Studies (GWAS) at a gene level is an important step towards understanding the molecular processes that lead to disease. In order to incorporate prior biological knowledge such as pathways and protein interactions in the analysis of GWAS data it is necessary to derive one measure of association for each gene. We compare three different methods to obtain gene-wide test statistics from Single Nucleotide Polymorphism (SNP) based association data: choosing the test statistic from the most significant SNP; the mean test statistics of all SNPs; and the mean of the top quartile of all test statistics. We demonstrate that the gene-wide test statistics can be controlled for the number of SNPs within each gene and show that all three methods perform considerably better than expected by chance at identifying genes with confirmed associations. By applying each method to GWAS data for Crohn's Disease and Type 1 Diabetes we identified new potential disease genes. PMID:21738570
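
    The three gene-wide summaries compared in this abstract can be sketched in a few lines. This is a minimal illustration that assumes each gene comes with a list of per-SNP test statistics where larger means more significant; the function name and the rule of rounding the quartile size up are assumptions, and the correction for the number of SNPs per gene mentioned in the abstract is not implemented.

```python
from statistics import mean


def gene_level_statistics(snp_stats):
    """Collapse the per-SNP association statistics of one gene into three gene-wide scores."""
    if not snp_stats:
        raise ValueError("gene has no SNP statistics")
    ranked = sorted(snp_stats, reverse=True)
    # Size of the top quartile: a quarter of the SNPs, rounded up, and at least one SNP.
    n_top = max(1, -(-len(ranked) // 4))
    return {
        "max": ranked[0],                           # most significant SNP
        "mean": mean(ranked),                       # mean over all SNPs
        "top_quartile_mean": mean(ranked[:n_top]),  # mean of the top quartile
    }


# Hypothetical statistics for eight SNPs mapped to one gene.
example = gene_level_statistics([0.5, 1.0, 2.0, 3.5, 4.0, 6.0, 8.0, 9.0])
```

    With eight SNPs the top quartile holds the two largest statistics, so the three scores weight the strongest signals differently: the maximum uses one SNP, the top-quartile mean two, and the overall mean all eight.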

  15. SNP-VISTA: An Interactive SNPs Visualization Tool

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

  16. Effects of reduced panel, reference origin, and genetic relationship on imputation of genotypes in Hereford cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to investigate alternative methods for designing and utilizing reduced single nucleotide polymorphism (SNP) panels for imputing SNP genotypes. Two purebred Hereford populations, an experimental population known as Line 1 Hereford (L1, N=240) and registered Hereford wi...

  17. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 5 Administrative Personnel 2 2011-01-01 2011-01-01 false May the OPM impute conduct of one person to another? 919.630 Section 919.630 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT (CONTINUED) CIVIL SERVICE REGULATIONS (CONTINUED) GOVERNMENTWIDE DEBARMENT AND SUSPENSION...

  18. Evaluation of an Imputed Pitch Velocity Model of the Auditory Kappa Effect

    ERIC Educational Resources Information Center

    Henry, Molly J.; McAuley, J. Devin

    2009-01-01

    Three experiments evaluated an imputed pitch velocity model of the auditory kappa effect. Listeners heard 3-tone sequences and judged the timing of the middle (target) tone relative to the timing of the 1st and 3rd (bounding) tones. Experiment 1 held pitch constant but varied the time (T) interval between bounding tones (T = 728, 1,000, or 1,600…

  19. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  20. Imputation of missing genotypes from sparse to high density using long-range phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals share potentially long chromosome segments that trace to a common ancestor. A phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations was developed to phase large sections of a chromosome. In addition to phasing, ChromoPhase imputes missing genotyp...

  1. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  2. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    ERIC Educational Resources Information Center

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  3. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  4. The Effects of Methods of Imputation for Missing Values on the Validity and Reliability of Scales

    ERIC Educational Resources Information Center

    Cokluk, Omay; Kayri, Murat

    2011-01-01

    The main aim of this study is the comparative examination of the factor structures, corrected item-total correlations, and Cronbach-alpha internal consistency coefficients obtained by different methods used in imputation for missing values in conditions of not having missing values, and having missing values of different rates in terms of testing…

  5. S-PRIME/TI-SNPS Conceptual Design Summary

    NASA Astrophysics Data System (ADS)

    Mills, Joseph C.; Determan, William R.; van Hagan, Tom H.

    1994-07-01

    A conceptual design for a 40-kWe thermionic space nuclear power system (TI-SNPS) known as the S-PRIME system is being developed by Rockwell and its subcontractors for the U.S. Department of Energy (DOE), United States Air Force (USAF), and Ballistic Missile Defense Organization (BMDO) under the TI-SNPS Program. Phase 1 of this program includes the development of a conceptual design of a 5- to 40-kWe range TI-SNPS and validation of key technologies supporting the design. All key technologies for the S-PRIME design have been identified along with six critical component demonstrations, which will be used to validate the S-PRIME design features. Phase 1 is scheduled for completion in September 1994, culminating in a conceptual design review. Phase 2 of the contract, which is an option, provides for the development of a system preliminary design and demonstration of technology readiness with a preliminary design review (PDR) scheduled for September 1995.

  6. Joint Effect of Multiple Common SNPs Predicts Melanoma Susceptibility

    PubMed Central

    Fang, Shenying; Han, Jiali; Zhang, Mingfeng; Wang, Li-e; Wei, Qingyi; Amos, Christopher I.; Lee, Jeffrey E.

    2013-01-01

    Single genetic variants discovered so far have been only weakly associated with melanoma. This study aims to use multiple single nucleotide polymorphisms (SNPs) jointly to obtain a larger genetic effect and to improve the predictive value of a conventional phenotypic model. We analyzed 11 SNPs that were associated with melanoma risk in previous studies and were genotyped in MD Anderson Cancer Center (MDACC) and Harvard Medical School investigations. Participants with ≥15 risk alleles were 5-fold more likely to have melanoma compared to those carrying ≤6. Compared to a model using the most significant single variant rs12913832, the increase in predictive value for the model using a polygenic risk score (PRS) comprised of 11 SNPs was 0.07 (95% CI, 0.05-0.07). The overall predictive value of the PRS together with conventional phenotypic factors in the MDACC population was 0.69 (95% CI, 0.64-0.69). PRS significantly improved the risk prediction and reclassification in melanoma as compared with the conventional model. Our study suggests that a polygenic profile can improve the predictive value of an individual gene polymorphism and may be able to significantly improve the predictive value beyond conventional phenotypic melanoma risk factors. PMID:24392023

  7. Disk filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  8. Disk filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  9. Imputation methods for temporal radiographic texture analysis in the detection of periprosthetic osteolysis

    NASA Astrophysics Data System (ADS)

    Wilkie, Joel R.; Giger, Maryellen L.; Pesce, Lorenzo L.; Engh, Charles A., Sr.; Hopper, Robert H., Jr.; Martell, John M.

    2007-03-01

    Periprosthetic osteolysis is a disease triggered by the body's response to tiny wear fragments from total hip replacements (THR), which leads to localized bone loss and disappearance of the trabecular bone texture. We have been investigating methods of temporal radiographic texture analysis (tRTA) to help detect periprosthetic osteolysis. One method involves merging feature measurements at multiple time points using an LDA or BANN. The major drawback of this method is that several cases do not meet the inclusion criteria because of missing data, i.e., missing image data at the necessary time intervals. In this research, we investigated imputation methods to fill in missing data points using feature averaging, linear interpolation, and first and second order polynomial fitting. The database consisted of 101 THR cases with full data available from four follow-up intervals. For 200 iterations, missing data were randomly created to simulate a typical THR database, and the missing points were then filled in using the imputation methods. ROC analysis was used to assess the performance of tRTA in distinguishing between osteolysis and normal cases for the full database and each simulated database. The calculated values from the 200 iterations showed that the imputation methods produced negligible bias, and substantially decreased the variance of the AUC estimator, relative to excluding incomplete cases. The best performing imputation methods were those that heavily weighted the data points closest to the missing data. The results suggest that these imputation methods appear to be acceptable means to include cases with missing data for tRTA.
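
    Two of the gap-filling rules listed in this abstract, feature averaging and linear interpolation, can be sketched as below. This is a minimal illustration and not the authors' code. It assumes one feature measured at ascending follow-up times with None marking a missing image; carrying the nearest observed value to a missing end point is an added assumption, since the abstract does not say how end points were handled.

```python
def impute_by_average(values):
    """Fill None entries with the mean of the observed values of the same feature."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def impute_by_linear_interpolation(times, values):
    """Fill None entries from the nearest observed neighbours in time (times must be ascending)."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(tk, vk) for tk, vk in known if tk < t]
        after = [(tk, vk) for tk, vk in known if tk > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            # Assumed end-point rule: carry the nearest observed value.
            nearest = before[-1] if before else after[0]
            filled.append(nearest[1])
    return filled


# Hypothetical texture feature at four follow-up times, with two missing images.
times = [0, 1, 3, 4]
feature = [1.0, None, 4.0, None]
averaged = impute_by_average(feature)
interpolated = impute_by_linear_interpolation(times, feature)
```

    Interpolation uses only the points adjacent to the gap, which matches the abstract's observation that the best-performing methods weighted the data points closest to the missing data most heavily, while averaging weights all observed points equally.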

  10. Comparison of SNPs and microsatellites in identifying offtypes of cacao clones from Cameroon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single Nucleotide Polymorphism (SNP) markers are increasingly being used in crop breeding programs, slowly replacing microsatellites and other markers. SNPs provide many benefits over microsatellites, including ease of analysis and unambiguous results across various platforms. We compare SNPs to m...

  11. Constructing bootstrap confidence intervals for principal component loadings in the presence of missing data: a multiple-imputation approach.

    PubMed

    van Ginkel, Joost R; Kiers, Henk A L

    2011-11-01

    Earlier research has shown that bootstrap confidence intervals from principal component loadings give a good coverage of the population loadings. However, this only applies to complete data. When data are incomplete, missing data have to be handled before analysing the data. Multiple imputation may be used for this purpose. The question is how bootstrap confidence intervals for principal component loadings should be corrected for multiply imputed data. In this paper, several solutions are proposed. Simulations show that the proposed corrections for multiply imputed data give a good coverage of the population loadings in various situations. PMID:21973098

  12. In-Silico Computing of the Most Deleterious nsSNPs in HBA1 Gene

    PubMed Central

    AbdulAzeez, Sayed; Borgio, J. Francis

    2016-01-01

    Background α-Thalassemia (α-thal) is a genetic disorder caused by the substitution of a single amino acid or by large deletions in the HBA1 and/or HBA2 genes. Method Modern bioinformatics tools were used as a systematic in-silico approach to predict the deleterious SNPs in the HBA1 gene and their pathogenic impact on the function and structure of the HBA1 protein. Results and Discussion A total of 389 SNPs in HBA1 were retrieved from the dbSNP database, which includes: 201 non-coding synonymous (nsSNPs), 43 human active SNPs, 16 intronic SNPs, 11 mRNA 3′ UTR SNPs, 9 coding synonymous SNPs, 9 5′ UTR SNPs and other types. A structural homology-based method (PolyPhen), a sequence homology-based tool (SIFT), SNPs&Go, PROVEAN and PANTHER revealed that 2.4% of the nsSNPs are pathogenic. Conclusions A total of 5 nsSNPs (G60V, K17M, K17T, L92F and W15R) were predicted to be responsible for the structural and functional modifications of the HBA1 protein. The comprehensive in-silico analysis indicates that two nsSNPs in HBA1, G60V and W15R, are highly deleterious. These “2 pathogenic nsSNPs” can be considered for wet-lab confirmatory analysis. PMID:26824843

  13. SNPs Selection using Gravitational Search Algorithm and Exhaustive Search for Association Mapping

    NASA Astrophysics Data System (ADS)

    Kusuma, W. A.; Hasibuan, L. S.; Istiadi, M. A.

    2016-01-01

    Single Nucleotide Polymorphisms (SNPs) are known to be associated with phenotypic variations. The study of linking SNPs to a phenotype of interest is referred to as Association Mapping (AM), which is classified as a combinatorial problem. An Exhaustive Search (ES) approach can select targeted SNPs exactly, since it evaluates all possible combinations of SNPs, but it is not efficient in terms of computer resources and computation time. A Heuristic Search (HS) approach is an alternative that improves on the performance of ES in those terms, but it still suffers from a high number of false positive SNPs in each combination. Gravitational Search Algorithm (GSA) is a new HS algorithm that yields better performance than other nature-inspired HS algorithms. This paper proposes a new method that combines GSA and ES to identify the most appropriate combination of SNPs linked to the phenotype of interest. Testing was conducted using a dataset without epistasis and a dataset with epistasis. Using the dataset without epistasis with 7 targeted SNPs, the proposed method identified 7 SNPs, 6 True Positive (TP) SNPs and 1 False Positive (FP) SNP, with an association value of 0.83. In addition, the proposed method identified 3 SNPs, 2 TP SNPs and 1 FP SNP, with an association value of 0.87 using the dataset with epistasis and 5 targeted SNPs. The results showed that the method is robust in reducing redundant SNPs and identifying main markers.

  14. Water Filters

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The Aquaspace H2OME Guardian Water Filter, available through Western Water International, Inc., reduces lead in water supplies. The filter is mounted on the faucet and the filter cartridge is placed in the "dead space" between sink and wall. This filter is one of several new filtration devices using the Aquaspace compound filter media, which combines company developed and NASA technology. Aquaspace filters are used in industrial, commercial, residential, and recreational environments as well as by developing nations where water is highly contaminated.

  15. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions.

    PubMed

    Han, Ying; Hazelett, Dennis J; Wiklund, Fredrik; Schumacher, Fredrick R; Stram, Daniel O; Berndt, Sonja I; Wang, Zhaoming; Rand, Kristin A; Hoover, Robert N; Machiela, Mitchell J; Yeager, Merideth; Burdette, Laurie; Chung, Charles C; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C; Key, Timothy J; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L; Kolb, Suzanne; Gapstur, Susan M; Diver, W Ryan; Stevens, Victoria L; Strom, Sara S; Pettaway, Curtis A; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A; Yeboah, Edward D; Tettey, Yao; Biritwum, Richard B; Adjei, Andrew A; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P; Isaacs, William B; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M; Ingles, Sue A; Kittles, Rick A; Murphy, Adam B; Blot, William J; Signorello, Lisa B; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M Cristina; Wu, Suh-Yuh; Hennis, Anselm J M; Rybicki, Benjamin A; Neslund-Dudas, Christine; Hsing, Ann W; Chu, Lisa; Goodman, Phyllis J; Klein, Eric A; Zheng, S Lilly; Witte, John S; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L; Hunter, David J; Gronberg, Henrik; Cook, Michael B; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J; Easton, Douglas F; Henderson, Brian E; Coetzee, Gerhard A; Conti, David V; Haiman, Christopher A

    2015-10-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10^-4 to 5.6 × 10^-3) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10^-6) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation. PMID:26162851

  16. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    PubMed Central

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while

  17. SNPs Array Karyotyping in Non-Hodgkin Lymphoma

    PubMed Central

    Etebari, Maryam; Navari, Mohsen; Piccaluga, Pier Paolo

    2015-01-01

    The traditional methods for detection of chromosomal aberrations, which included cytogenetic or gene candidate solutions, suffered from low sensitivity or the need for previous knowledge of the target regions of the genome. With the advent of single nucleotide polymorphism (SNP) arrays, genome screening at global level in order to find chromosomal aberrations like copy number variants, DNA amplifications, deletions, and also loss of heterozygosity became feasible. In this review, we present an update of the knowledge, gained by SNPs arrays, of the genomic complexity of the most important subtypes of non-Hodgkin lymphomas.

  18. Molecular Beacon CNT-based Detection of SNPs

    NASA Astrophysics Data System (ADS)

    Egorova, V. P.; Krylova, H. V.; Lipnevich, I. V.; Veligura, A. A.; Shulitsky, B. G.; Y Fedotenkova, L.

    2015-11-01

    A fluorescence quenching effect due to few-walled carbon nanotubes chemically modified by carboxyl groups has been utilized to discriminate single nucleotide polymorphisms (SNPs). It was shown that the complex obtained from these nanotubes and single-stranded primer DNA is formed due to stacking interactions between the hexagons of the nanotubes and the aromatic rings of nucleotide bases, as well as due to the establishment of hydrogen bonds between acceptor amine groups of nucleotide bases and donor carboxyl groups of the nanotubes. It has been demonstrated that these complexes may be used to make highly effective DNA biosensors that detect SNPs and operate as molecular beacons.

  19. Biological Filters.

    ERIC Educational Resources Information Center

    Klemetson, S. L.

    1978-01-01

    Presents the 1978 literature review of wastewater treatment. The review is concerned with biological filters, and it covers: (1) trickling filters; (2) rotating biological contactors; and (3) miscellaneous reactors. A list of 14 references is also presented. (HM)

  20. Metallic Filters

    NASA Technical Reports Server (NTRS)

    1985-01-01

    Filtration technology originated in a mid-1960s NASA study. The results were distributed to the filter industry, and HR Textron responded, using the study as a point of departure for the development of 421 Filter Media. The HR system is composed of ultrafine steel fibers metallurgically bonded and compressed so that the pore structure is locked in place. The filters are used to filter polyesters, plastics, to remove hydrocarbon streams, etc. Several major companies use the product in chemical applications, pollution control, etc.

  1. Water Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    A compact, lightweight electrolytic water filter generates silver ions in concentrations of 50 to 100 parts per billion in the water flow system. Silver ions serve as effective bactericide/deodorizers. Ray Ward requested and received from NASA a technical information package on the Shuttle filter, and used it as basis for his own initial development, a home use filter.

  2. FILTER TREATMENT

    DOEpatents

    Sutton, J.B.; Torrey, J.V.P.

    1958-08-26

    A process is described for reconditioning fused alumina filters which have become clogged by the accretion of bismuth phosphate in the filter pores. The method consists in contacting such filters with fuming sulfuric acid, and maintaining such contact for a substantial period of time.

  3. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    PubMed Central

    Jattawa, Danai; Elzo, Mauricio A.; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-01-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. 
    PMID:26949946
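
    The accuracy measure described in this abstract (the ratio of correctly imputed SNP markers to all imputed SNP markers, within and across chromosomes) can be sketched as follows. This is only an illustrative sketch: the 0/1/2 genotype coding, the record layout, and the function names are assumptions, not details taken from the study.

```python
from collections import defaultdict


def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of imputed markers whose genotype equals the true genotype.

    Genotypes are assumed to be coded 0/1/2 (an illustrative convention).
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        raise ValueError("at least one imputed marker is required")
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)


def accuracy_by_chromosome(records):
    """Accuracy per chromosome.

    `records` is an iterable of (chromosome, true_genotype, imputed_genotype)
    tuples; this layout is hypothetical.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for chrom, true_gt, imputed_gt in records:
        total[chrom] += 1
        correct[chrom] += int(true_gt == imputed_gt)
    return {chrom: correct[chrom] / total[chrom] for chrom in total}


# Toy data: 6 of 8 imputed genotypes match the true genotypes.
true_gt = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_gt = [0, 1, 2, 2, 0, 2, 1, 0]
overall = imputation_accuracy(true_gt, imputed_gt)
```

    Reporting the per-chromosome values next to the overall value makes it easier to see whether accuracy is consistent across chromosomes, which is the comparison the abstract draws between the three packages.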

  4. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population.

    PubMed

    Jattawa, Danai; Elzo, Mauricio A; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. 
    PMID:26949946

  5. Tetra-allelic SNPs: Informative forensic markers compiled from public whole-genome sequence data.

    PubMed

    Phillips, C; Amigo, J; Carracedo, Á; Lareu, M V

    2015-11-01

    Multiple-allele single nucleotide polymorphisms (SNPs) are potentially useful for forensic DNA analysis as they can provide more discrimination power than normal binary SNPs. In addition, the presence in a profile of more than two alleles per marker provides a clearer indication of mixed DNA than assessments of imbalanced signals in the peak pairs of binary SNPs. Using the 1000 Genomes Phase III human variant data release of 2014 as the starting point, this study collated 961 tetra-allelic SNPs that pass minimum sequence quality thresholds and where four separate nucleotide substitution alleles were detected. Although most of these loci had three of the four alleles in combined frequencies of 2% or less, 160 had high heterozygosities with 50 exceeding those of 'ideal' 0.5:0.5 binary SNPs. From this set of most polymorphic tetra-allelic SNPs, we identified markers most informative for forensic purposes and explored these loci in detail. Subsets of the most polymorphic tetra-allelic SNPs will make useful additions to current panels of forensic identification SNPs and ancestry-informative SNPs. The 24 most discriminatory tetra-allelic SNPs were estimated to detect more than two alleles in at least one marker per profile in 99.9% of mixtures of African contributors. In European contributor mixtures 99.4% of profiles would show multiple allele patterns, but this drops to 92.6% of East Asian contributor mixtures due to reduced levels of polymorphism for the 24 SNPs in this population group. PMID:26209763

  6. Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'.

    PubMed

    Deelen, Patrick; Menelaou, Androniki; van Leeuwen, Elisabeth M; Kanterakis, Alexandros; van Dijk, Freerk; Medina-Gomez, Carolina; Francioli, Laurent C; Hottenga, Jouke Jan; Karssen, Lennart C; Estrada, Karol; Kreiner-Møller, Eskil; Rivadeneira, Fernando; van Setten, Jessica; Gutierrez-Achury, Javier; Westra, Harm-Jan; Franke, Lude; van Enckevort, David; Dijkstra, Martijn; Byelas, Heorhiy; van Duijn, Cornelia M; de Bakker, Paul I W; Wijmenga, Cisca; Swertz, Morris A

    2014-11-01

    Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputation genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r^2, increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r^2 improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r^2 increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results. PMID:24896149
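
    The quality measure used in this abstract, the squared Pearson correlation between imputed and 'true' genotypes, can be computed per variant roughly as below. This is a minimal sketch under assumptions: the use of fractional imputed dosages, the 0/1/2 coding of the typed genotypes, and the function name are illustrative and not taken from the study.

```python
def pearson_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed values.

    Both inputs are per-sample values for a single variant. The result is
    undefined (ValueError) when either input has zero variance, for example
    a variant that is monomorphic in the sample.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    s_td = sum((t - mean_t) * (d - mean_d)
               for t, d in zip(true_genotypes, imputed_dosages))
    s_tt = sum((t - mean_t) ** 2 for t in true_genotypes)
    s_dd = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if s_tt == 0 or s_dd == 0:
        raise ValueError("r^2 is undefined when an input has zero variance")
    return (s_td * s_td) / (s_tt * s_dd)


# Toy variant: typed genotypes and hypothetical imputed dosages for 5 samples.
typed = [0, 0, 1, 0, 2]
dosages = [0.1, 0.0, 0.8, 0.3, 1.7]
r2 = pearson_r2(typed, dosages)
```

    A mean r^2 such as the values quoted in the abstract would then be an average of this per-variant quantity over the variants in a frequency bin.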

  7. Bayesian multiple imputation for missing multivariate longitudinal data from a Parkinson’s disease clinical trial

    PubMed Central

    Luo, Sheng; Lawson, Andrew B; He, Bo; Elm, Jordan J; Tilley, Barbara C

    2013-01-01

    In Parkinson’s disease (PD) clinical trials, Parkinson’s disease is studied using multiple outcomes of various types (e.g. binary, ordinal, continuous) collected repeatedly over time. The overall treatment effects across all outcomes can be evaluated based on a global test statistic. However, missing data occur in outcomes for many reasons, e.g. dropout, death, etc., and need to be imputed in order to conduct an intent-to-treat analysis. We propose a Bayesian method based on item response theory to perform multiple imputation while accounting for multiple sources of correlation. Sensitivity analysis is performed under various scenarios. Our simulation results indicate that the proposed method outperforms standard methods such as last observation carried forward and separate random effects model for each outcome. Our method is motivated by and applied to a Parkinson’s disease clinical trial. The proposed method can be broadly applied to longitudinal studies with multiple outcomes subject to missingness. PMID:23242384

  8. Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys

    PubMed Central

    Rendall, Michael S.; Ghosh-Dastidar, Bonnie; Weden, Margaret M.; Baker, Elizabeth H.; Nazarov, Zafar

    2013-01-01

    Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally-representative cohort surveys. PMID:24223447
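
    One way to see why a higher fraction of missing values calls for more imputations is through the standard combining rules for MI (Rubin's rules), sketched below. The abstract does not state which pooling formulas the authors used, so this is a general illustration rather than their procedure; the function name and the toy numbers are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: point estimates of one coefficient, one per imputed data set.
    variances: the corresponding squared standard errors.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations and matching lengths")
    pooled = sum(estimates) / m                    # mean of the estimates
    within = sum(variances) / m                    # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between     # between part inflated by 1 + 1/m
    return pooled, total


# Toy example with three imputations of the same coefficient.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation variance grows with the fraction of missing information, and the (1 + 1/m) factor shrinks toward 1 as the number of imputations m increases, so a larger m limits the extra variance due to using a finite number of imputations.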

  9. Bayesian multiple imputation for missing multivariate longitudinal data from a Parkinson's disease clinical trial.

    PubMed

    Luo, Sheng; Lawson, Andrew B; He, Bo; Elm, Jordan J; Tilley, Barbara C

    2016-04-01

    In Parkinson's disease (PD) clinical trials, Parkinson's disease is studied using multiple outcomes of various types (e.g. binary, ordinal, continuous) collected repeatedly over time. The overall treatment effects across all outcomes can be evaluated based on a global test statistic. However, missing data occur in outcomes for many reasons, e.g. dropout, death, etc., and need to be imputed in order to conduct an intent-to-treat analysis. We propose a Bayesian method based on item response theory to perform multiple imputation while accounting for multiple sources of correlation. Sensitivity analysis is performed under various scenarios. Our simulation results indicate that the proposed method outperforms standard methods such as last observation carried forward and separate random effects model for each outcome. Our method is motivated by and applied to a Parkinson's disease clinical trial. The proposed method can be broadly applied to longitudinal studies with multiple outcomes subject to missingness. PMID:23242384

  10. Normalization and missing value imputation for label-free LC-MS analysis

    SciTech Connect

    Karpievitch, Yuliya; Dabney, Alan R.; Smith, Richard D.

    2012-11-05

    Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

  11. Imputing historical statistics, soils information, and other land-use data to crop area

    NASA Technical Reports Server (NTRS)

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

  12. Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study

    PubMed Central

    Liu, Yang; De, Anindya

    2016-01-01

    Missing data commonly occur in large epidemiologic studies. Ignoring incompleteness or handling the data inappropriately may bias study results, reduce power and efficiency, and alter important risk/benefit relationships. Standard ways of dealing with missing values, such as complete case analysis (CCA), are generally inappropriate due to the loss of precision and risk of bias. Multiple imputation by fully conditional specification (FCS MI) is a powerful and statistically valid method for creating imputations in large data sets which include both categorical and continuous variables. It specifies the multivariate imputation model on a variable-by-variable basis and offers a principled yet flexible method of addressing missing data, which is particularly useful for large data sets with complex data structures. However, FCS MI is still rarely used in epidemiology, and few practical resources exist to guide researchers in the implementation of this technique. We demonstrate the application of FCS MI in support of a large epidemiologic study evaluating national blood utilization patterns in a sub-Saharan African country. A number of practical tips and guidelines for implementing FCS MI based on this experience are described.

  13. Performance Evaluation of Missing-Value Imputation Clustering Based on a Multivariate Gaussian Mixture Model

    PubMed Central

    Wu, Chuanli; Gao, Yuexia; Hua, Tianqi; Xu, Chenwu

    2016-01-01

    Background: It is challenging to deal with mixture models when missing values occur in clustering datasets. Methods and Results: We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a “pseudo-complete” dataset. Parameters from different clusters and missing values are estimated according to the maximum likelihood implemented with an expectation-maximization algorithm, and multivariate individuals are clustered with Bayesian posterior probability. A simulation showed that our proposed method has a fast convergence speed and it accurately estimates missing values. Our proposed algorithm was further validated with Fisher’s Iris dataset, the Yeast Cell-cycle Gene-expression dataset, and the CIFAR-10 images dataset. The results indicate that our algorithm offers highly accurate clustering, comparable to that using a complete dataset without missing values. Furthermore, our algorithm resulted in a lower misjudgment rate than both clustering algorithms with missing data deleted and with missing-value imputation by mean replacement. Conclusion: We demonstrate that our missing-value imputation clustering algorithm is feasible and superior to both of these other clustering algorithms in certain situations. PMID:27552203

  14. Missing data imputation of solar radiation data under different atmospheric conditions.

    PubMed

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644
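
    Of the comparison methods named in this abstract, Inverse Distance Weighting (IDW) is the simplest to illustrate, together with the RMSE used to score predictions. The sketch below is illustrative only: the planar station coordinates, the distance exponent of 2, and the function names are assumptions, and the abstract does not say how its percentage RMSE was normalized, so the function here returns the plain RMSE.

```python
import math


def idw_estimate(target_xy, neighbours, power=2.0):
    """Estimate a missing reading at `target_xy` from neighbouring stations.

    neighbours: list of ((x, y), value) pairs for stations with valid data.
    Each station is weighted by 1 / distance**power (power=2 is an assumed
    default, not a value taken from the paper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbours:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # a station at the target location decides the value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighbouring station is required")
    return weighted_sum / weight_total


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Two equidistant hypothetical stations: the estimate is their plain average.
estimate = idw_estimate((0.0, 0.0), [((1.0, 0.0), 10.0), ((-1.0, 0.0), 20.0)])
```

    To compare imputation methods in the way the abstract describes, known readings can be hidden, re-estimated by each method, and scored with `rmse` against the hidden values.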

  15. A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers.

    PubMed

    Turrado, Concepción Crespo; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés-José; Melero, Manuel G; de Cos Juez, Francisco Javier

    2016-01-01

    The storage of data is a key process in the study of electrical power networks related to the search for harmonics and the finding of a lack of balance among phases. The presence of missing data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) affects any time series study in a negative way that has to be addressed. When this occurs, missing data imputation algorithms are required. These algorithms are able to substitute the data that are missing for estimated values. This research presents a new algorithm for the missing data imputation method based on Self-Organized Maps Neural Networks and Mahalanobis distances and compares it not only with a well-known technique called Multivariate Imputation by Chained Equations (MICE) but also with an algorithm previously proposed by the authors called Adaptive Assignation Algorithm (AAA). The results obtained demonstrate how the proposed method outperforms both algorithms. PMID:27626419

  16. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    PubMed Central

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644

  17. A multiple imputation approach for MNAR mechanisms compatible with Heckman's model.

    PubMed

    Galimard, Jacques-Emmanuel; Chevret, Sylvie; Protopopescu, Camelia; Resche-Rigon, Matthieu

    2016-07-30

    Standard implementations of multiple imputation (MI) approaches provide unbiased inferences based on an assumption of underlying missing at random (MAR) mechanisms. However, in the presence of missing data generated by missing not at random (MNAR) mechanisms, MI is not satisfactory. Originating in an econometric statistical context, Heckman's model, also called the sample selection method, deals with selected samples using two joined linear equations, termed the selection equation and the outcome equation. It has been successfully applied to MNAR outcomes. Nevertheless, such a method only addresses missing outcomes, and this is a strong limitation in clinical epidemiology settings, where covariates are also often missing. We propose to extend the validity of MI to some MNAR mechanisms through the use of the Heckman's model as imputation model and a two-step estimation process. This approach will provide a solution that can be used in an MI by chained equation framework to impute missing (either outcomes or covariates) data resulting either from a MAR or an MNAR mechanism when the MNAR mechanism is compatible with a Heckman's model. The approach is illustrated on a real dataset from a randomised trial in patients with seasonal influenza. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26893215

  18. Imputation of missing covariate values in epigenome-wide analysis of DNA methylation data

    PubMed Central

    Wu, Chong; Demerath, Ellen W.; Pankow, James S.; Bressler, Jan; Fornage, Myriam; Grove, Megan L.; Chen, Wei; Guan, Weihua

    2016-01-01

    DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed. PMID:26890800

  19. MirSNP, a database of polymorphisms altering miRNA target sites, identifies miRNA-related SNPs in GWAS SNPs and eQTLs

    PubMed Central

    2012-01-01

    Background Numerous single nucleotide polymorphisms (SNPs) associated with complex diseases have been identified by genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies. However, few of these SNPs have explicit biological functions. Recent studies indicated that SNPs within the 3’UTR regions of susceptibility genes could affect complex traits/diseases by affecting the function of miRNAs. These 3’UTR SNPs are functional candidates and therefore of interest to GWAS and eQTL researchers. Description We developed a publicly available online database, MirSNP (http://cmbi.bjmu.edu.cn/mirsnp), which is a collection of human SNPs in predicted miRNA-mRNA binding sites. We identified 414,510 SNPs that might affect miRNA-mRNA binding. Annotations were added to these SNPs to predict whether a SNP within the target site would decrease/break or enhance/create an miRNA-mRNA binding site. By applying the MirSNP database to three brain eQTL data sets, we identified four unreported SNPs (rs3087822, rs13042, rs1058381, and rs1058398), which might affect miRNA binding and thus affect the expression of their host genes in the brain. We also applied the MirSNP database to our GWAS for schizophrenia: seven predicted miRNA-related SNPs (p < 0.0001) were found in the schizophrenia GWAS. Our findings identified the possible functions of these SNP loci, and provide the basis for subsequent functional research. Conclusion MirSNP could identify putative miRNA-related SNPs from GWAS and eQTL studies and provide direction for subsequent functional research. PMID:23173617

  20. Lazy collaborative filtering for data sets with missing values.

    PubMed

    Ren, Yongli; Li, Gang; Zhang, Jun; Zhou, Wanlei

    2013-12-01

    As one of the biggest challenges in research on recommender systems, the data sparsity issue is mainly caused by the fact that users tend to rate a small proportion of items from the huge number of available items. This issue becomes even more problematic for the neighborhood-based collaborative filtering (CF) methods, as there are even lower numbers of ratings available in the neighborhood of the query item. In this paper, we aim to address the data sparsity issue in the context of neighborhood-based CF. For a given query (user, item), a set of key ratings is first identified by taking the historical information of both the user and the item into account. Then, an auto-adaptive imputation (AutAI) method is proposed to impute the missing values in the set of key ratings. We present a theoretical analysis to show that the proposed imputation method effectively improves the performance of the conventional neighborhood-based CF methods. The experimental results show that our new method of CF with AutAI outperforms six existing recommendation methods in terms of accuracy. PMID:23757575

  1. Detection of Regulatory SNPs in Human Genome Using ChIP-seq ENCODE Data

    PubMed Central

    Matveeva, Marina Yu.; Shilov, Alexander G.; Kashina, Elena V.; Mordvinov, Viatcheslav A.; Merkulova, Tatyana I.

    2013-01-01

    A large share of the SNPs derived from genome-wide association studies are non-coding, which heightens the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging since the regulatory part of the human genome is much more poorly annotated than the coding regions. Here we describe an approach that aggregates the whole set of ENCODE ChIP-seq data in order to search for rSNPs, and provide experimental evidence of its efficiency. Its algorithm is based on the assumption that the enrichment of a genomic region with transcription factor binding loci (ChIP-seq peaks) indicates its regulatory function, and that SNPs located in this region are therefore more likely to influence transcription regulation. To ensure that the approach preferentially selects functionally meaningful SNPs, we performed enrichment analysis of several human SNP datasets associated with phenotypic manifestations. All samples were significantly enriched with SNPs falling into the regions of multiple ChIP-seq peaks as compared with randomly selected SNPs. For experimental verification, 40 SNPs falling into overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of the SNPs on the binding of the DNA fragments containing them to nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) was tested by EMSA. A radical change in the binding pattern was observed for 29 SNPs, and 6 more SNPs showed less pronounced changes. Taken together, the results demonstrate an effective way to search for potential rSNPs with the aid of ChIP-seq data provided by the ENCODE project. PMID:24205329

  2. Chemical derivatization of compact disc polycarbonate surfaces for SNPs detection.

    PubMed

    Bañuls, María-José; García-Piñón, Francisco; Puchades, Rosa; Maquieira, Angel

    2008-03-01

    Compact discs have been proposed as an efficient analytical platform, with potential to develop high-throughput affinity assays for genomics, proteomics, clinics, and health monitoring. Chemical derivatization of CD surfaces is one of the keys to developing highly efficient microarraying-based assays on discs. Approaches for mild chemical modification of the polycarbonate (PC) disc surface based on nitration, reduction, and chloromethylation reactions have been developed. Amino- and thiol-derivatized PC surfaces are obtained while leaving the mechanical and optical properties of the discs unchanged. Studies of covalent attachment of oligonucleotide probes (5' Cy5-labeled, 3' NH2-ended) on the modified surfaces have been performed to develop microarraying assays based on hybridization of cDNA strands and single nucleotide polymorphism (SNP) discrimination. The applicability of compact disc audio/video technology as an analytical system is demonstrated, including the use of a commercial CD player to read the results on disc. PMID:18254580

  3. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa

    PubMed Central

    Masconi, Katya L.; Matsha, Tandi E.; Erasmus, Rajiv T.; Kengne, Andre P.

    2015-01-01

    Background Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Methods Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. Results The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation yielded the highest C-statistic only for the Rotterdam Predictive model, and that value was matched by simpler imputation methods. Conclusions Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation. PMID:26406594
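
    As an illustration of the discrimination measure named in this record, the C-statistic of a binary risk model can be computed as the fraction of case/non-case pairs in which the case receives the higher predicted risk, counting ties as one half. The following is a minimal sketch, not the authors' code; the function name `c_statistic` and the 0/1 outcome coding are assumptions made for the example.

```python
def c_statistic(scores, outcomes):
    """Concordance (C-statistic) of predicted risks for a binary outcome.

    scores   -- predicted risks, one per individual
    outcomes -- 1 for a case, 0 for a non-case (assumed coding)
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0   # case ranked above non-case
            elif c == k:
                concordant += 0.5   # tied pair counts as one half
    return concordant / (len(cases) * len(controls))
```

    For example, `c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])` returns 1.0, since every case is ranked above every non-case. The pairwise loop is quadratic in the sample size, which is acceptable for a cohort of about a thousand individuals.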

  4. Filtering apparatus

    DOEpatents

    Haldipur, Gaurang B.; Dilmore, William J.

    1992-01-01

    A vertical vessel having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas.

  5. Filtering apparatus

    DOEpatents

    Haldipur, G.B.; Dilmore, W.J.

    1992-09-01

    A vertical vessel is described having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas. 18 figs.

  6. The operating regimes and basic control principles of SNPS "Topaz". [Cs

    SciTech Connect

    Makarov, A.N.; Volberg, M.S.; Grayznov, G.M.; Zhabotinsky, E.E.; Serbin, V.I.

    1991-01-05

    The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include prelaunch preparation and launch into working orbit, SNPS start-up to obtain the desired electric power, the nominal regime, and SNPS shutdown. The main requirements for the SNPS in the different regimes are given, and the control algorithms that meet these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" lead to the conclusion that, for an SNPS of this type, the most expedient control algorithm in the start-up regime is one that provides the required thermal state of the cesium vapor supply system and excludes any possibility of discharge processes in current-conducting elements. In the nominal regime, the required electric power should be provided by maintaining the reactor current and using a fast-acting voltage regulator. A limit on the outlet coolant temperature should also be provided.

  7. PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

    PubMed Central

    Paschou, Peristera; Ziv, Elad; Burchard, Esteban G; Choudhry, Shweta; Rodriguez-Cintron, William; Mahoney, Michael W; Drineas, Petros

    2007-01-01

    Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using less than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the measure of informativeness, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset. The algorithm that we introduce runs in seconds and can be easily applied on large genome-wide datasets, facilitating the identification of population

  8. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    PubMed

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context. PMID:26906401

  9. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    PubMed Central

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L.; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Al Turki, Saeed; Amuzu, Antoinette; Anderson, Carl A.; Anney, Richard; Antony, Dinu; Artigas, María Soler; Ayub, Muhammad; Bala, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Benn, Marianne; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick F.; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Pablo Casas, Juan; Chambers, John C.; Charlton, Ruth; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebahattin; Clapham, Peter; Clement, Gail; Coates, Guy; Cocca, Massimiliano; Collier, David A.; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day, Ian N. M.; Day-Williams, Aaron; Dedoussis, George; Down, Thomas; Du, Yuanping; van Duijn, Cornelia M.; Dunham, Ian; Edkins, Sarah; Ekong, Rosemary; Ellis, Peter; Evans, David M.; Farooqi, I. Sadaf; Fitzpatrick, David R.; Flicek, Paul; Floyd, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Gasparini, Paolo; Gaunt, Tom R.; Geihs, Matthias; Geschwind, Daniel; Greenwood, Celia; Griffin, Heather; Grozeva, Detelina; Guo, Xiaosen; Guo, Xueqin; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey E.; Holmans, Peter; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Iotchkova, Valentina; Isaacs, Aaron; Jackson, David K.; Jamshidi, Yalda; Johnson, Jon; Joyce, Chris; Karczewski, Konrad J.; Kaye, Jane; Keane, Thomas; Kemp, John P.; Kennedy, Karen; Kent, Alastair; Keogh, Julia; Khawaja, Farrah; Kleber, Marcus E.; van Kogelenberg, Margriet; Kolb-Kokocinski, Anja; Kooner, Jaspal S.; Lachance, Genevieve; Langenberg, Claudia; Langford, Cordelia; Lawson, Daniel; Lee, Irene; van Leeuwen, Elisabeth M.; Lek, Monkol; Li, Rui; Li, Yingrui; Liang, Jieqin; Lin, Hong; Liu, Ryan; Lönnqvist, Jouko; Lopes, Luis R.; Lopes, Margarida; Luan, Jian'an; MacArthur, Daniel G.; Mangino, Massimo; Marenne, Gaëlle; März, Winfried; Maslen, John; Matchan, Angela; Mathieson, Iain; McGuffin, Peter; McIntosh, Andrew M.; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Migone, Nicola; Mitchison, Hannah M.; Moayyeri, Alireza; Morris, James; Morris, Richard; Muddyman, Dawn; Muntoni, Francesco; Nordestgaard, Børge G.; Northstone, Kate; O'Donovan, Michael C.; O'Rahilly, Stephen; Onoufriadis, Alexandros; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Payne, Stewart J.; Perry, John R. B.; Pietilainen, Olli; Plagnol, Vincent; Pollitt, Rebecca C.; Povey, Sue; Quail, Michael A.; Quaye, Lydia; Raymond, Lucy; Rehnström, Karola; Ridout, Cheryl K.; Ring, Susan; Ritchie, Graham R. S.; Roberts, Nicola; Robinson, Rachel L.; Savage, David B.; Scambler, Peter; Schiffels, Stephan; Schmidts, Miriam; Schoenmakers, Nadia; Scott, Richard H.; Scott, Robert A.; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shaw, Adam; Shihab, Hashem A.; Shin, So-Youn; Skuse, David; Small, Kerrin S.; Smee, Carol; Smith, George Davey; Southam, Lorraine; Spasic-Boskovic, Olivera; Spector, Timothy D.; St Clair, David; St Pourcain, Beate; Stalker, Jim; Stevens, Elizabeth; Sun, Jianping; Surdulescu, Gabriela; Suvisaari, Jaana; Syrris, Petros; Tachmazidou, Ioanna; Taylor, Rohan; Tian, Jing; Tobin, Martin D.; Toniolo, Daniela; Traglia, Michela; Tybjaerg-Hansen, Anne; Valdes, Ana M.; Vandersteen, Anthony M.; Varbo, Anette; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T. R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Eleanor; Whincup, Peter; Whyte, Tamieka; Williams, Hywel J.; Williamson, Kathleen A.; Wilson, Crispian; Wilson, Scott G.; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zaza, Gianluigi; Zeggini, Eleftheria; Zhang, Feng; Zhang, Pingbo; Zhang, Weihua; Gambaro, Giovanni; Richards, J. Brent; Durbin, Richard; Timpson, Nicholas J.; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants. PMID:26368830

  10. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

    PubMed

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J Brent; Durbin, Richard; Timpson, Nicholas J; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants. PMID:26368830

  11. Imputation of Continuous Tree Suitability over the Continental United States from Sparse Measurements Using Associative Clustering

    NASA Astrophysics Data System (ADS)

    Hargrove, W. W.; Kumar, J.; Hoffman, F. M.; Potter, K. M.; Mills, R. T.

    2012-12-01

    Up-scaling from sparse measurements to a continuous raster of estimated values is a common problem in Earth System Science. We present a new general-purpose empirical imputation method based on associative clustering, which associates sparse measurements of dependent variables with particular multivariate clustered combinations of the independent variables, and then uses several methods to estimate values for unmeasured clusters, based on directional proximity in multidimensional data space, at both the cluster and map cell levels of resolution. We demonstrate this new imputation tool on tree species range distribution maps, which describe the suitable extent and expected growth performance of a particular tree species over a wide area. Range maps having continuous estimates of tree growth performance are more useful than more classical tree range maps that simply show binary occurrence suitability. The USDA Forest Service Forest Inventory and Analysis (FIA) plots provide information about the occurrence and growth performance of various tree species across the US, but such measurements are limited to FIA plots. Using Associative Clustering, we scale up the discontinuous FIA growth measurements into continuous maps that show the expected growth and suitability for individual tree species covering the Continental United States. A multivariate cluster analysis was applied to global output from a General Circulation Model (GCM) consisting of 17 variables downscaled to 4 km² resolution. Present global growing conditions were divided into 30 thousand relatively homogeneous ecoregions describing climatic and topographic conditions. At every map cell a multi-linear regression was applied in 17-dimensional hyperspace to derive the suitability of a tree species where not measured, using the forest inventory data. The continuous species distribution maps obtained were compared and validated against existing tree range suitability maps. Associative Clustering is intended

  12. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

    PubMed Central

    2012-01-01

    Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were

  13. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification

  14. Trend tests in time series with missing values: A case study with imputation

    NASA Astrophysics Data System (ADS)

    Ramos, M. Rosário; Cordeiro, Clara

    2013-10-01

    Testing for trend is an important problem, especially when one is dealing with environmental time series. The tests considered here are the usual t-test and the Mann-Kendall test, a nonparametric version widely used because it requires fewer assumptions. The aim is to assess the performance of two trend tests in time series with autocorrelation after an imputation method is applied to estimate the missing observations. The performance of the trend tests will be illustrated for some well-known data sets existing in R software.
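
    For readers unfamiliar with the Mann-Kendall test mentioned in this record, the statistic S sums the signs of all pairwise differences x_j - x_i (i < j), and under the null hypothesis of no trend S is approximately normal with a known variance. The following is a minimal sketch of the textbook test with the usual tie correction and continuity correction. It assumes a complete series of independent observations, so it does not reproduce the imputation step or the handling of autocorrelation studied in the paper; the function name `mann_kendall` is an assumption for the example.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes no missing values and independent observations.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S: number of increasing pairs minus number of decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under H0, corrected for groups of tied values.
    counts = {}
    for value in series:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value
```

    A strictly increasing series of ten values gives S = 45, the maximum possible for n = 10, and a small p-value; a constant series gives S = 0 and p = 1. With autocorrelated data the variance above is generally too small, which is one reason the setting examined in the paper needs separate study.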

  15. Collaborative development of SNPs for cotton research, introgression, MAS and breeding

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Extensive use of genome-wide analyses requires that molecular markers be highly abundant, informative and, once developed, extremely cost-effective to use, such as single-nucleotide polymorphisms (SNPs). The efforts toward development of cotton SNPs have been few and small-scale. The novel cotton ...

  16. selectSNP – An R package for selecting SNPs optimal for genetic evaluation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    There has been a huge increase in the number of SNPs in the public repositories. This has made it a challenge to design low and medium density SNP panels, which requires careful selection of available SNPs considering many criteria, such as map position, allelic frequency, possible biological functi...

  17. Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values.

    PubMed

    García-Laencina, Pedro J; Abreu, Pedro Henriques; Abreu, Miguel Henriques; Afonoso, Noémia

    2015-04-01

    Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. Because most standard prediction methods cannot handle incomplete samples, missing data imputation is widely applied to address this problem. Taking into account the characteristics of each breast cancer dataset, a detailed analysis is therefore required to determine the most appropriate imputation and prediction methods in each clinical environment. This research work analyzes a real breast cancer dataset from the Portuguese Institute of Oncology of Porto with a high percentage of unknown categorical information (most clinical data of the patients are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation and 5-year survival prediction from a cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure and, according to the obtained results, the best results are provided by the K-Nearest Neighbors algorithm: an accuracy above 81% and an area under the Receiver Operating Characteristic curve above 0.78, which are very good results in this complex scenario. PMID:25725446
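
    Two of the imputation scenarios named in this record, mode imputation and K-Nearest Neighbors imputation, can be sketched for categorical data as follows. This is an illustrative sketch only, not the authors' implementation: the `MISSING` marker, the function names, and the choice of distance (the mismatch rate over features observed in both records) are assumptions made for the example.

```python
from collections import Counter

MISSING = None  # hypothetical marker for an unknown categorical value


def mode_impute(rows):
    """Fill each missing entry with the most frequent observed value of its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0] if observed else MISSING)
    return [[modes[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=3):
    """Fill each missing entry by majority vote among the k nearest records
    that have an observed value in that column."""
    def distance(a, b):
        # Mismatch rate over the features observed in both records.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not MISSING and y is not MISSING]
        if not shared:
            return float("inf")
        return sum(x != y for x, y in shared) / len(shared)

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            donors = sorted(
                (distance(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not MISSING
            )
            votes = Counter(rows[idx][j] for _, idx in donors[:k])
            if votes:
                filled[i][j] = votes.most_common(1)[0][0]
    return filled
```

    The two methods can disagree: mode imputation always assigns the globally most common category, while the nearest-neighbor vote follows the categories of the records most similar to the incomplete one. Both functions return new lists and leave the input rows unchanged.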

  18. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    NASA Astrophysics Data System (ADS)

    Bogush, Igor P.; Bushinsky, Alexander V.; Galkin, Anatoly Ya.; Serbin, Victor I.; Zhabotinsky, Evgeny E.

    1991-01-01

    Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal state requirements differ markedly between units, so meeting them requires an appropriate arrangement of the units in the SNPS power generating module, the use of specific control algorithms, and special thermal regulation and protection. Computer codes that determine the thermal transient performance of the liquid metal loop and the main units were developed as the calculation basis for the required SNPS "Topaz" unit thermal state. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, tests of mock-ups, power tests of ground SNPS prototypes and flight tests of two SNPS "Topaz".

  19. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    SciTech Connect

    Bogush, I.P.; Bushinsky, A.V.; Galkin, A.Y.; Serbin, V.I.; Zhabotinsky, E.E.

    1991-01-01

    Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal state requirements differ markedly between units, so meeting them requires an appropriate arrangement of the units in the SNPS power generating module, the use of specific control algorithms, and special thermal regulation and protection. Computer codes that determine the thermal transient performance of the liquid metal loop and the main units were developed as the calculation basis for the required SNPS "Topaz" unit thermal state. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, tests of mock-ups, power tests of ground SNPS prototypes and flight tests of two SNPS "Topaz".

  20. Selection of human p75NTR tag SNPs and its biological significance for clinical association studies.

    PubMed

    Wang, Yong-Tang; Lu, Xiu-Min; Shu, Ya-Hai; Xiao, Lan; Chen, Kai-Ting

    2014-01-01

    To select tag single nucleotide polymorphisms (SNPs) within and around human p75 neurotrophin receptor (p75NTR) gene in Chinese Han population, the sequence involving p75NTR gene as well as the upstream and downstream of the gene was identified according to the data from National Center for Biotechnology Information (NCBI) GenBank database, and the SNP genotype data involving 63 SNPs in the regions were obtained from Chinese Han Beijing (CHB) population of HapMap database. Then, Haploview (version 4.2) was used to calculate linkage disequilibrium (LD) statistics for the selected 32 common SNPs with a minor allele frequency (MAF) greater than 0.05. Haplotype blocks were constructed throughout the p75NTR gene according to the upper and the lower 95% confidence bound of D' value, and the tag SNPs were selected based on the r2 and LOD values between SNPs as well as the results of bioinformatics analysis. The results indicated that five haplotype blocks were constructed within and around p75NTR gene and 12 tag SNPs including rs2537710, rs603769, rs614455, rs2537706, rs534561, rs2072445, rs2072446, rs7219709, rs734194, rs741071, rs741073 and rs2671641 were selected to represent the other 51 SNPs in p75NTR gene. Therefore, the 12 selected SNPs may act as tag SNPs for the entire p75NTR gene in Chinese Han population, which will provide an effective way to select tag SNPs in a whole gene, and its biological significance is to further guide the clinical association studies between the candidate gene and disease susceptibility. PMID:25227100
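
    The two LD statistics this abstract relies on, D' and r2, can be sketched for a single pair of biallelic SNPs. This is a simplified illustration, not Haploview's procedure: it assumes phased haplotype counts are already available (genotype data would first need haplotype frequency estimation), and the function name `ld_stats` and its argument names are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) for two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP and
    allele B at the second, and so on for the other three combinations.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n

    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_AB - p_A * p_B

    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2
```

    A pair with counts (40, 10, 0, 50) gives D' = 1 but r2 of about 0.67, which illustrates why block definition (based on D') and tag selection (based on r2) can use different statistics.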

  1. RNA-Seq Uncovers SNPs and Alternative Splicing Events in Asian Lotus (Nelumbo nucifera).

    PubMed

    Yang, Mei; Xu, Liming; Liu, Yanling; Yang, Pingfang

    2015-01-01

    RNA-Seq is an efficient way to comprehensively identify single nucleotide polymorphisms (SNPs) and alternative splicing (AS) events from the expressed genes. In this study, we conducted transcriptome sequencing of four Asian lotus (Nelumbo nucifera) cultivars using the Illumina HiSeq2000 platform to identify SNPs and AS events in lotus. A total of 505 million paired-end RNA-Seq reads were generated from four cultivars, of which 86% were mapped to the lotus reference genome. Using the four sets of data together, a total of 357,689 putative SNPs were identified with an average density of one SNP per 2.2 kb. These SNPs were located in 1,253 scaffolds and 15,016 expressed genes. A/G and C/T were the two major types of SNPs in the Asian lotus transcriptome. In parallel, a total of 177,540 AS events were detected in the four cultivars and were distributed in 64% of the expressed genes of lotus. The predominant type of AS events was alternative 5' first exon, which accounted for 41.2% of all the observed AS events, and exon skipping only accounted for 4.3% of all AS. Gene Ontology analysis was conducted to analyze the function of the genes containing SNPs and AS events. Validation of selected SNPs and AS events revealed that 74% of SNPs and 80% of AS events were reliable, which indicates that RNA-Seq is an efficient approach to uncover gene-associated SNPs and AS events. A large number of SNPs and AS events identified in our study will facilitate further genetic and functional genomics research in lotus. PMID:25928215

  2. Genotyping of 75 SNPs using arrays for individual identification in five population groups.

    PubMed

    Hwa, Hsiao-Lin; Wu, Lawrence Shih Hsin; Lin, Chun-Yen; Huang, Tsun-Ying; Yin, Hsiang-I; Tseng, Li-Hui; Lee, James Chun-I

    2016-01-01

    Single nucleotide polymorphism (SNP) typing offers promise to forensic genetics. Various strategies and panels for analyzing SNP markers for individual identification have been published. However, the best panels with fewer identity SNPs for all major population groups are still under discussion. This study aimed to find more autosomal SNPs with high heterozygosity for individual identification among Asian populations. Ninety-six autosomal SNPs of 502 DNA samples from unrelated individuals of five population groups (208 Taiwanese Han, 83 Filipinos, 62 Thais, 69 Indonesians, and 80 individuals with European, Near Eastern, or South Asian ancestry) were analyzed using arrays in an initial screening, and 75 SNPs (group A, 46 newly selected SNPs; group B, 29 SNPs based on a previous SNP panel) were selected for further statistical analyses. Some SNPs with high heterozygosity from Asian populations were identified. The combined random match probability of the best 40 and 45 SNPs was between 3.16 × 10(-17) and 7.75 × 10(-17) and between 2.33 × 10(-19) and 7.00 × 10(-19), respectively, in all five populations. These loci offer comparable power to short tandem repeats (STRs) for routine forensic profiling. In this study, we demonstrated the population genetic characteristics and forensic parameters of 75 SNPs with high heterozygosity from five population groups. This SNP panel can provide valuable genotypic information and can be helpful in forensic casework for individual identification among these populations. PMID:26297200
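
    The combined random match probability quoted in this abstract can be illustrated with a short sketch. It assumes Hardy-Weinberg proportions at each SNP and independence between SNPs, neither of which the abstract states explicitly; the function names are hypothetical.

```python
import math


def match_probability(p):
    """Probability that two unrelated people share a genotype at one biallelic SNP.

    p is the frequency of one allele; genotype frequencies are assumed to
    follow Hardy-Weinberg proportions (p^2, 2pq, q^2).
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def combined_match_probability(allele_freqs):
    """Multiply per-SNP match probabilities, assuming the SNPs are independent."""
    return math.prod(match_probability(p) for p in allele_freqs)
```

    Under these assumptions the per-SNP match probability is smallest (0.375) at an allele frequency of 0.5, so 40 such SNPs give roughly 9 × 10(-18). The reported range for the best 40 SNPs sits just above that bound, as expected for markers selected for high heterozygosity.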

  3. Identity-by-Descent-Based Phasing and Imputation in Founder Populations Using Graphical Models

    PubMed Central

    Palin, Kimmo; Campbell, Harry; Wright, Alan F; Wilson, James F; Durbin, Richard

    2011-01-01

    Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian network based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with very low error rate. Genet. Epidemiol. 35:853-860, 2011. © 2011 Wiley Periodicals, Inc. PMID:22006673

  4. High-throughput SNPs for all: genotyping-in-thousands.

    PubMed

    Pavey, Scott A

    2015-07-01

    Understanding the genetic structure of species is essential for conservation. It is only with this information that managers, academics, user groups and land-use planners can understand the spatial scale of migration and local adaptation, source-sink dynamics and effective population size. Such information is essential for a multitude of applications including delineating management units, balancing management priorities, discovering cryptic species and implementing captive breeding programmes. Species can range from locally adapted by hundreds of metres (Pavey et al. ) to complete species panmixia (Côté et al. ). Even more remarkable is that this essential information can be obtained without fully sequenced or annotated genomes, but from mere (putatively) nonfunctional variants. First with allozymes, then microsatellites and now SNPs, this neutral genetic variation carries a wealth of information about migration and drift. For many of us, it may be somewhat difficult to remember our understanding of species conservation before the widespread usage of these useful tools. However, most species on earth have yet to give us that 'peek under the curtain'. With the current diversity on earth estimated to be nearly 9 million species (Mora et al. ), we have a long way to go for a comprehensive meta-phylogeographic understanding. A method presented in this issue by Campbell and colleagues (Campbell et al. ) is a tool that will accelerate the pace in this area. Genotyping-in-thousands (GT-seq) leverages recent advancements in sequencing technology to save many hours and dollars over previous methods to generate this important neutral genetic information. PMID:26095005

  5. Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

    PubMed Central

    2011-01-01

    Background: Nowadays, more and more clinical scales consisting of responses given by the patients to some items (Patient Reported Outcomes - PRO) are validated with models based on Item Response Theory, and more specifically, with a Rasch model. In the validation sample, the presence of missing data is frequent. The aim of this paper is to compare sixteen methods for handling the missing data (mainly based on simple imputation) in the context of psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch model are compared. Methods: A simulation study was performed to consider several cases, notably the possibility for the missing values to be informative or not and the rate of missing data. Results: Several imputation methods produce bias on psychometric indexes (generally, the imputation methods artificially improve the psychometric qualities of the scale). In particular, this is the case with the method based on the Personal Mean Score (PMS), which is the most commonly used imputation method in practice. Conclusions: Several imputation methods should be avoided, in particular PMS imputation. From a general point of view, it is important to use an imputation method that considers both the ability of the patient (measured for example by his/her score), and the difficulty of the item (measured for example by its rate of favourable responses). Another recommendation is to always consider the addition of a random process in the imputation method, because such a process reduces the bias. Last, the analysis performed without imputation of the missing data (available case analyses) is an interesting alternative to simple imputation in this context. PMID:21756330
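
    To make concrete what the abstract argues against, here is a minimal sketch of Personal Mean Score imputation under one common reading: each missing item response is replaced by the mean of that same patient's observed responses. The `None` marker, the unrounded mean, and the function name are assumptions; the paper's exact variant is not given in the abstract.

```python
def pms_impute(responses):
    """Personal Mean Score imputation for one patient's item responses.

    Missing items (None) are replaced by the mean of the items this patient
    did answer. Item difficulty is ignored and no random component is added,
    which are the two properties the abstract's recommendations address.
    """
    observed = [r for r in responses if r is not None]
    if not observed:
        # Nothing to average over: leave the record untouched.
        return list(responses)
    mean = sum(observed) / len(observed)
    return [mean if r is None else r for r in responses]
```

    Because the filled-in value depends only on the patient's own score, every imputed response agrees with that patient's other answers, which is consistent with the abstract's finding that such methods artificially improve the apparent psychometric quality of a scale.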

  6. Gender Imputation

    ERIC Educational Resources Information Center

    National Student Clearinghouse, 2013

    2013-01-01

    In late 2007, the National Student Clearinghouse (NSC) expanded its Enrollment Reporting service to include several additional data elements (commonly referred to as the "A2" or "expanded" data elements). One of these expanded data elements is student gender. Although gender is potentially important to a number of research…

  7. Population Genomic Analyses Based on 1 Million SNPs in Commercial Egg Layers

    PubMed Central

    Gholami, Mahmood; Erbe, Malena; Gärke, Christian; Preisinger, Rudolf; Weigend, Annett; Weigend, Steffen; Simianer, Henner

    2014-01-01

    Identifying signatures of selection can provide valuable insight about the genes or genomic regions that are or have been under selective pressure, which can lead to a better understanding of genotype-phenotype relationships. A common strategy for selection signature detection is to compare samples from several populations and search for genomic regions with outstanding genetic differentiation. Wright's fixation index, FST, is a useful index for evaluation of genetic differentiation between populations. The aim of this study was to detect selective signatures between different chicken groups based on SNP-wise FST calculation. A total of 96 individuals of three commercial layer breeds and 14 non-commercial fancy breeds were genotyped with three different 600K SNP-chips. After filtering, a total of 1 million SNPs were available for FST calculation. Averages of FST values were calculated for overlapping windows. Comparisons of these were then conducted between commercial egg layers and non-commercial fancy breeds, as well as between white egg layers and brown egg layers. Comparing non-commercial and commercial breeds resulted in the detection of 630 selective signatures, while 656 selective signatures were detected in the comparison between the commercial egg-layer breeds. Annotation of selection signature regions revealed various genes corresponding to production traits, for which layer breeds were selected. Among them were NCOA1, SREBF2 and RALGAPA1 associated with reproductive traits, broodiness and egg production. Furthermore, several of the detected genes were associated with growth and carcass traits, including POMC, PRKAB2, SPP1, IGF2, CAPN1, TGFb2 and IGFBP2. Our approach demonstrates that including different populations with a specific breeding history can provide a unique opportunity for a better understanding of farm animal selection. PMID:24739889
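
    The two computational steps described in this abstract, SNP-wise FST and averaging over overlapping windows, can be sketched as follows. The abstract does not say which FST estimator or window settings were used, so this sketch assumes the basic heterozygosity-based definition for two equally weighted populations and counts window size and step in SNPs; all names are hypothetical.

```python
def snp_fst(p1, p2):
    """Wright's FST at one biallelic SNP for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    FST = (H_T - H_S) / H_T, with H_T the expected heterozygosity of the pooled
    population and H_S the mean within-population expected heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # allele fixed in both populations: no differentiation signal
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def windowed_means(values, size, step):
    """Mean of `values` in windows of `size` SNPs, advancing by `step` SNPs.

    Windows overlap whenever step < size.
    """
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, step)
    ]
```

    Windows whose mean FST falls in the extreme tail of the genome-wide distribution would then be candidate selection signatures; the cutoff used in the study is not stated in the abstract.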

  8. Evaluating GWAS-Identified SNPs for Age at Natural Menopause among Chinese Women

    PubMed Central

    Shen, Chong; Delahanty, Ryan J.; Gao, Yu-Tang; Lu, Wei; Xiang, Yong-Bing; Zheng, Ying; Cai, Qiuyin; Zheng, Wei; Shu, Xiao-Ou; Long, Jirong

    2013-01-01

    Background: Age at natural menopause (ANM) is a complex trait with high heritability and is associated with several major hormonal-related diseases. Recently, several genome-wide association studies (GWAS), conducted exclusively among women of European ancestry, have discovered dozens of genetic loci influencing ANM. No study has been conducted to evaluate whether these findings can be generalized to Chinese women. Methodology/Principal Findings: We evaluated the index single nucleotide polymorphisms (SNPs) in 19 GWAS-identified genetic susceptibility loci for ANM among 3,533 Chinese women who had natural menopause. We also investigated 3 additional SNPs which were in LD with the index SNP in European-ancestry but not in Asian-ancestry populations. Two genetic risk scores (GRS) were calculated to summarize SNPs across multiple loci: one for all SNPs tested (GRSall), and one for SNPs which showed association in our study (GRSsel). All 22 SNPs showed the same association direction as previously reported. Eight SNPs were nominally statistically significant with P≤0.05: rs4246511 (RHBDL2), rs12461110 (NLRP11), rs2307449 (POLG), rs12611091 (BRSK1), rs1172822 (BRSK1), rs365132 (UIMC1), rs2720044 (ASH2L), and rs7246479 (TMEM150B). In particular, SNPs rs4246511, rs365132, rs1172822, and rs7246479 remained significant even after Bonferroni correction. Significant associations were observed for GRS. Women in the highest quartile began menopause 0.7 years (P = 3.24×10^-9) and 0.9 years (P = 4.61×10^-11) later than those in the lowest quartile for GRSsel and GRSall, respectively. Conclusions: Among the 22 investigated SNPs, eight showed associations with ANM (P<0.05) in our Chinese population. Results from this study extend some recent GWAS findings to the Asian-ancestry population and may guide future efforts to identify genetic determination of menopause. PMID:23536822
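
    A genetic risk score of the kind summarized in this abstract is, in its simplest form, a sum of effect-allele counts across SNPs, optionally weighted by per-SNP effect sizes. The abstract does not say whether GRSall and GRSsel were weighted, so the sketch below supports both as an assumption; the function and parameter names are hypothetical.

```python
def genetic_risk_score(allele_counts, weights=None):
    """Sum effect-allele counts (0, 1 or 2 per SNP) for one individual.

    With weights=None every SNP counts equally (an unweighted score);
    otherwise each count is multiplied by its per-SNP weight, for example a
    previously reported effect size.
    """
    if weights is None:
        weights = [1.0] * len(allele_counts)
    if len(weights) != len(allele_counts):
        raise ValueError("one weight is required per SNP")
    return sum(w * c for w, c in zip(weights, allele_counts))
```

    A score restricted to the associated SNPs (as in GRSsel) would be obtained by passing only those SNPs' counts; individuals can then be ranked and split into quartiles for the comparison the abstract reports.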

  9. Assessing statistical power of SNPs for population structure and conservation studies.

    PubMed

    Morin, Phillip A; Martien, Karen K; Taylor, Barbara L

    2009-01-01

    Single nucleotide polymorphisms (SNPs) have been proposed by some as the new frontier for population studies, and several papers have presented theoretical and empirical evidence reporting the advantages and limitations of SNPs. As a practical matter, however, it remains unclear how many SNP markers will be required or what the optimal characteristics of those markers should be in order to obtain sufficient statistical power to detect different levels of population differentiation. We use a hypothetical case to illustrate the process of designing a population genetics project, and present results from simulations that address several issues for maximizing statistical power to detect differentiation while minimizing the amount of effort in developing SNPs. Results indicate that (i) while ~30 SNPs should be sufficient to detect moderate (F(ST) = 0.01) levels of differentiation, studies aimed at detecting demographic independence (e.g. F(ST) < 0.005) may require 80 or more SNPs and large sample sizes; (ii) different SNP allele frequencies have little effect on power, and thus, selection of SNPs can be relatively unbiased; (iii) increasing the sample size has a strong effect on power, so that the number of loci can be minimized when sample number is known, and increasing sample size is almost always beneficial; and (iv) power is increased by including multiple SNPs within loci and inferring haplotypes, rather than trying to use only unlinked SNPs. This also has the practical benefit of reducing the SNP ascertainment effort, and may influence the decision of whether to seek SNPs in coding or noncoding regions. PMID:21564568

  10. Water Filters

    NASA Technical Reports Server (NTRS)

    1988-01-01

    Seeking to find a more effective method of filtering potable water that was highly contaminated, Mike Pedersen, founder of Western Water International, learned that NASA had conducted extensive research in methods of purifying water on board manned spacecraft. The key is Aquaspace Compound, a proprietary WWI formula that scientifically blends various types of granular activated charcoal with other active and inert ingredients. Aquaspace systems remove some substances, such as chlorine, by atomic adsorption, other types of organic chemicals by mechanical filtration, and still others by catalytic reaction. Aquaspace filters are finding wide acceptance in industrial, commercial, residential and recreational applications in the U.S. and abroad.

  11. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...

  12. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    PubMed Central

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803

  13. 7 CFR 3017.630 - May the Department of Agriculture impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false May the Department of Agriculture impute conduct of one person to another? 3017.630 Section 3017.630 Agriculture Regulations of the Department of Agriculture (Continued) OFFICE OF THE CHIEF FINANCIAL OFFICER, DEPARTMENT OF AGRICULTURE...

  14. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 29 Labor 4 2010-07-01 2010-07-01 false May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1471.630 May the Federal Mediation...

  15. Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data

    PubMed Central

    Hu, Yang; Wu, Xiaoliang; Ma, Rui

    2016-01-01

    Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with numerous diseases by affecting gene expression. However, in noncoding regions, the mechanism of how SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. The SNPs located in enhancer elements may affect gene expression and lead to disease. We presented a method for identifying liver cancer-related enhancer SNPs through integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer. PMID:27429976

  16. Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data.

    PubMed

    Zhang, Tianjiao; Hu, Yang; Wu, Xiaoliang; Ma, Rui; Jiang, Qinghua; Wang, Yadong

    2016-01-01

    Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with numerous diseases by affecting gene expression. However, in noncoding regions, the mechanism of how SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. The SNPs located in enhancer elements may affect gene expression and lead to disease. We presented a method for identifying liver cancer-related enhancer SNPs through integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer. PMID:27429976

  17. 1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data.

    PubMed

    Huang, Jie; Ellinghaus, David; Franke, Andre; Howie, Bryan; Li, Yun

    2012-07-01

    We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two 'missed' disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10(-16). The new lead SNP for the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, 'missed' because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 genomes-based imputation. PMID:22293688
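
    The "imputed dosages" used in the association tests above are expected allele counts derived from the imputation software's genotype probabilities. The sketch below shows that conversion for one individual at one SNP; the argument order and the tolerance on the probability sum are assumptions, not a description of any particular tool's file format.

```python
def dosage(p_ref_ref, p_ref_alt, p_alt_alt, tol=1e-6):
    """Expected number of alternate alleles (0 to 2) from genotype probabilities.

    The three arguments are the posterior probabilities of carrying zero, one
    and two copies of the alternate allele; they are expected to sum to 1.
    """
    total = p_ref_ref + p_ref_alt + p_alt_alt
    if abs(total - 1.0) > tol:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ref_alt + 2.0 * p_alt_alt
```

    Using the dosage as a regression covariate carries some of the imputation uncertainty into the association test, whereas rounding to the most likely genotype discards it.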

  18. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    PubMed Central

    Artigas, María Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria; Viñuela, Ana; Völzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  19. MULTIPLE IMPUTATION FOR SHARING PRECISE GEOGRAPHIES IN PUBLIC USE DATA

    PubMed Central

    Wang, Hao; Reiter, Jerome P.

    2013-01-01

    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects’ identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data. PMID:23990852

  20. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation.

    PubMed

    Soler Artigas, María; Wain, Louise V; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R; Grallert, Harald; Hammond, Chris J; Harris, Sarah E; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W; Navarro, Pau; Nickle, David C; Padmanabhan, Sandosh; Raitakari, Olli T; Ried, Janina S; Ripatti, Samuli; Schulz, Holger; Scott, Robert A; Sin, Don D; Starr, John M; Viñuela, Ana; Völzke, Henry; Wild, Sarah H; Wright, Alan F; Zemunik, Tatijana; Jarvis, Deborah L; Spector, Tim D; Evans, David M; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J; Karrasch, Stefan; Probst-Hensch, Nicole M; Heinrich, Joachim; Stubbe, Beate; Wilson, James F; Wareham, Nicholas J; James, Alan L; Morris, Andrew P; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P; Hall, Ian P; Tobin, Martin D

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  1. Multiple imputation of missing covariate values in multilevel models with random slopes: a cautionary note.

    PubMed

    Grund, Simon; Lüdtke, Oliver; Robitzsch, Alexander

    2016-06-01

    Multiple imputation (MI) has become one of the main procedures used to treat missing data, but the guidelines from the methodological literature are not easily transferred to multilevel research. For models including random slopes, proper MI can be difficult, especially when the covariate values are partially missing. In the present article, we discuss applications of MI in multilevel random-coefficient models, theoretical challenges posed by slope variation, and the current limitations of standard MI software. Our findings from three simulation studies suggest that (a) MI is able to recover most parameters, but is currently not well suited to capture slope variation entirely when covariate values are missing; (b) MI offers reasonable estimates for most parameters, even in smaller samples or when its assumptions are not met; and PMID:25939979

  2. Spatial Implications Associated with Using Euclidean Distance Measurements and Geographic Centroid Imputation in Health Care Research

    PubMed Central

    Jones, Stephen G; Ashby, Avery J; Momin, Soyal R; Naidoo, Allen

    2010-01-01

    Objective: To determine the effect of using Euclidean measurements and zip-code centroid geo-imputation versus more precise spatial analytical techniques in health care research. Data Sources: Commercially insured members from a southeastern managed care organization. Study Design: Distance from admitting inpatient facility to member's home and zip-code centroid (geographic placement) was compared using Euclidean straight-line and shortest-path drive distances (measurement technique). Data Collection: Administrative claims from October 2005 to September 2006. Principal Findings: Measurement technique had a greater impact on distance values compared with geographic placement. Drive distance from the geocoded address was highly correlated (r=0.99) with the Euclidean distance from the zip-code centroid. Conclusions: Actual differences were relatively small. Researchers without capabilities to produce drive distance measurements and/or address geocoding techniques could rely on simple linear regressions to estimate correction factors with a high degree of confidence. PMID:19780852

  3. Impute DC link (IDCL) cell based power converters and control thereof

    DOEpatents

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with a similarly rated back-to-back inverter system.

  4. SNP-Seek database of SNPs derived from 3000 rice genomes.

    PubMed

    Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L

    2015-01-01

    We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of an Oracle database holding close to 60 billion rows of SNP genotypes (20 M SNPs × 3 K rice lines) and a web interface for convenient querying. The database allows quick retrieval of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in the JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

  5. Confidence intervals after multiple imputation: combining profile likelihood information from logistic regressions.

    PubMed

    Heinze, Georg; Ploner, Meinhard; Beyea, Jan

    2013-12-20

    In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CI may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF_c, allowing confidence limits for a parameter β at level 1 - α to be identified as those β* and β** that satisfy CDF_c(β*) = α/2 and CDF_c(β**) = 1 - α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. PMID:23873477
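    The averaging step described in this abstract can be illustrated with a minimal Python sketch. It is not the logistf implementation: the per-imputation CDFs here are plain normal CDFs built from hypothetical (estimate, standard error) pairs, standing in for the penalized-profile-likelihood CDFs the abstract describes. The function names and the search interval are assumptions made for illustration.

```python
from statistics import NormalDist


def combined_cdf(cdfs):
    """Return the pointwise average of the per-imputation CDFs."""
    def cdf_c(beta):
        return sum(cdf(beta) for cdf in cdfs) / len(cdfs)
    return cdf_c


def solve_quantile(cdf, target, lo, hi, tol=1e-10):
    """Bisection for beta with cdf(beta) = target on a bracketing interval."""
    if not (cdf(lo) <= target <= cdf(hi)):
        raise ValueError("target is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def combined_limits(cdfs, alpha=0.05, lo=-50.0, hi=50.0):
    """Limits beta*, beta** with CDF_c(beta*) = alpha/2 and CDF_c(beta**) = 1 - alpha/2."""
    cdf_c = combined_cdf(cdfs)
    return (solve_quantile(cdf_c, alpha / 2, lo, hi),
            solve_quantile(cdf_c, 1 - alpha / 2, lo, hi))


# Hypothetical (estimate, standard error) pairs from three imputed data sets.
imputations = [(0.80, 0.30), (1.00, 0.35), (0.90, 0.32)]
# Assumption: normal CDFs stand in for the profile-likelihood-based CDFs.
cdfs = [NormalDist(mu, sigma).cdf for mu, sigma in imputations]
lower, upper = combined_limits(cdfs)
print(f"95% limits: ({lower:.3f}, {upper:.3f})")
```

    Because the limits are read off the averaged CDF rather than a pooled variance, the resulting interval need not be symmetric around any single point estimate.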

  6. A Multiethnic Replication Study of Plasma Lipoprotein Levels-Associated SNPs Identified in Recent GWAS

    PubMed Central

    Bryant, Emily K.; Dressen, Amy S.; Bunker, Clareann H.; Hokanson, John E.; Hamman, Richard F.; Kamboh, M. Ilyas; Demirci, F. Yesim

    2013-01-01

    Genome-wide association studies (GWAS) have identified a number of loci/SNPs associated with plasma total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride (TG) levels. The purpose of this study was to replicate 40 recent GWAS-identified HDL-C-related new loci in 3 epidemiological samples comprising U.S. non-Hispanic Whites (NHWs), U.S. Hispanics, and African Blacks. In each sample, the association analyses were performed with all 4 major lipid traits regardless of previously reported specific associations with selected SNPs. A total of 22 SNPs showed nominally significant association (p<0.05) with at least one lipid trait in at least one ethnic group, although not always with the same lipid traits reported as genome-wide significant in the original GWAS. The total number of significant loci was 10 for TC, 12 for LDL-C, 10 for HDL-C, and 6 for TG levels. Ten SNPs were significantly associated with more than one lipid trait in at least one ethnic group. Six SNPs were significantly associated with at least one lipid trait in more than one ethnic group, although not always with the same trait across various ethnic groups. For 25 SNPs, the associations were replicated with the same genome-wide significant lipid traits in the same direction in at least one ethnic group; at nominal significance for 13 SNPs and with a trend for association for 12 SNPs. However, the associations were not consistently present in all ethnic groups. This observation was consistent with mixed results obtained in other studies that also examined various ethnic groups. PMID:23717430

  7. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.

    PubMed

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

    2013-04-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
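    The idea of controlling the false discovery rate separately within annotation strata can be sketched as follows. This is a simplified illustration, assuming a plain Benjamini-Hochberg procedure applied per stratum; it does not reproduce the linkage disequilibrium-weighted annotations or the TDR estimation used in the study, and the p-values and stratum labels are invented.

```python
from collections import defaultdict


def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff_rank = rank  # largest rank passing its threshold
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


def stratified_bh(pvalues, strata, q=0.05):
    """Apply Benjamini-Hochberg independently inside each stratum."""
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)
    rejected = [False] * len(pvalues)
    for idxs in groups.values():
        flags = bh_reject([pvalues[i] for i in idxs], q)
        for i, flag in zip(idxs, flags):
            rejected[i] = flag
    return rejected


# Hypothetical p-values: the "coding" stratum is enriched for small values.
pvalues = [0.001, 0.02, 0.04, 0.30, 0.02, 0.50, 0.70, 0.90]
strata = ["coding"] * 4 + ["intergenic"] * 4

pooled = bh_reject(pvalues)
stratified = stratified_bh(pvalues, strata)
print(sum(pooled), "pooled rejections;", sum(stratified), "stratified rejections")
```

    In this toy input the enriched stratum yields an extra rejection that the pooled analysis misses, which is the qualitative effect the abstract reports.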

  8. netview p: a network visualization tool to unravel complex population structure using genome-wide SNPs.

    PubMed

    Steinig, Eike J; Neuditschko, Markus; Khatkar, Mehar S; Raadsma, Herman W; Zenger, Kyall R

    2016-01-01

    Network-based approaches are emerging as valuable tools for the analysis of complex genetic structure in wild and captive populations. netview p combines data quality control with the construction of population networks through mutual k-nearest neighbours thresholds applied to genome-wide SNPs. The program is cross-platform compatible, open-source and efficiently operates on data ranging from hundreds to hundreds of thousands of SNPs. The pipeline was used for the analysis of pedigree data from simulated (n = 750, SNPs = 1279) and captive silver-lipped pearl oysters (n = 415, SNPs = 1107), wild populations of the European hake from the Atlantic and Mediterranean (n = 834, SNPs = 380) and grey wolves from North America (n = 239, SNPs = 78 255). The population networks effectively visualize large- and fine-scale genetic structure within and between populations, including family-level structure and relationships. netview p comprises a network-based addition to other population analysis tools and provides user-friendly access to a complex network analysis pipeline through implementation in python. PMID:26129944
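    A mutual k-nearest-neighbour network of the kind mentioned in this abstract can be sketched in a few lines of Python. This is not the netview p code: the Manhattan distance on 0/1/2 genotype codes, the tie-breaking by index, and the toy genotypes are all assumptions chosen for illustration.

```python
def manhattan(a, b):
    """Sum of absolute allele-count differences between two samples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(genotypes, k):
    """For each sample, the set of indices of its k nearest other samples."""
    neighbours = []
    for i, gi in enumerate(genotypes):
        others = sorted(
            (j for j in range(len(genotypes)) if j != i),
            key=lambda j: (manhattan(gi, genotypes[j]), j),  # ties broken by index
        )
        neighbours.append(set(others[:k]))
    return neighbours


def mutual_knn_edges(genotypes, k):
    """Keep edge (i, j) only if i is among j's neighbours and j among i's."""
    nn = k_nearest(genotypes, k)
    return sorted(
        (i, j)
        for i in range(len(genotypes))
        for j in nn[i]
        if i < j and i in nn[j]
    )


# Hypothetical genotypes coded as minor-allele counts at four SNPs.
genotypes = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 1],
    [1, 1, 0, 0],
]
edges = mutual_knn_edges(genotypes, k=1)
print(edges)
```

    Requiring the neighbour relation to hold in both directions leaves the last sample unconnected: its nearest neighbour is sample 0, but sample 0 is closer to sample 1.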

  9. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution. PMID:26833483

  10. Computational Characterization of Osteoporosis Associated SNPs and Genes Identified by Genome-Wide Association Studies

    PubMed Central

    Wang, Ya; Wu, Guiju; Chen, Jie; Ye, Weiyuan; Yang, Jiancai; Huang, Qingyang

    2016-01-01

    Objectives: Genome-wide association studies (GWASs) have revealed many SNPs and genes associated with osteoporosis. However, influence of these SNPs and genes on the predisposition to osteoporosis is not fully understood. We aimed to identify osteoporosis GWASs-associated SNPs potentially influencing the binding affinity of transcription factors and miRNAs, and reveal enrichment signaling pathway and “hub” genes of osteoporosis GWAS-associated genes. Methods: We conducted multiple computational analyses to explore function and mechanisms of osteoporosis GWAS-associated SNPs and genes, including SNP conservation analysis and functional annotation (influence of SNPs on transcription factors and miRNA binding), gene ontology analysis, pathway analysis and protein-protein interaction analysis. Results: Our results suggested that a number of SNPs potentially influence the binding affinity of transcription factors (NFATC2, MEF2C, SOX9, RUNX2, ESR2, FOXA1 and STAT3) and miRNAs. Osteoporosis GWASs-associated genes showed enrichment of Wnt signaling pathway, basal cell carcinoma and Hedgehog signaling pathway. Highly interconnected “hub” genes revealed by interaction network analysis are RUNX2, SP7, TNFRSF11B, LRP5, DKK1, ESR1 and SOST. Conclusions: Our results provided the targets for further experimental assessment and further insight on osteoporosis pathophysiology. PMID:26930606

  11. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon

    PubMed Central

    Tsai, Hsin Y.; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Bishop, Steve C.; Houston, Ross D.

    2015-01-01

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth. PMID:26703584

  12. Water Filter

    NASA Technical Reports Server (NTRS)

    1982-01-01

    A compact, lightweight electrolytic water sterilizer, available through Ambassador Marketing, generates silver ions in concentrations of 50 to 100 parts per billion in a water flow system. The silver ions serve as an effective bactericide/deodorizer. Tap water passes through a filtering element of silver that has been chemically plated onto activated carbon. The silver inhibits bacterial growth and the activated carbon removes objectionable tastes and odors caused by the addition of chlorine and other chemicals in the municipal water supply. The three models available are a kitchen unit, a "Tourister" unit for portable use while traveling and a refrigerator unit that attaches to the ice cube water line. A filter will treat 5,000 to 10,000 gallons of water.

  13. Eyeglass Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Biomedical Optical Company of America's suntiger lenses eliminate more than 99% of harmful light wavelengths. NASA derived lenses make scenes more vivid in color and also increase the wearer's visual acuity. Distant objects, even on hazy days, appear crisp and clear; mountains seem closer, glare is greatly reduced, clouds stand out. Daytime use protects the retina from bleaching in bright light, thus improving night vision. Filtering helps prevent a variety of eye disorders, in particular cataracts and age related macular degeneration.

  14. Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

    PubMed Central

    Hibbert, James D; Liese, Angela D; Lawson, Andrew; Porter, Dwayne E; Puett, Robin C; Standiford, Debra; Liu, Lenna; Dabelea, Dana

    2009-01-01

    Background: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). Methods: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003). Conclusion: Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies focusing on
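    The contrast between fixed and random allocation can be made concrete with a small Python sketch. It rests on an assumed reading of the abstract: fixed allocation sends every case in a ZIP code to the tract with the largest weight, while random allocation draws a tract with probability proportional to its weight. The tract names, weights, and case count are hypothetical.

```python
import random


def fixed_allocation(tract_weights):
    """Deterministic: always pick the tract with the largest weight."""
    return max(tract_weights, key=tract_weights.get)


def random_allocation(tract_weights, rng):
    """Stochastic: pick a tract with probability proportional to its weight."""
    tracts = list(tract_weights)
    weights = [tract_weights[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]


# Hypothetical ZIP code overlapping three census tracts, weighted by population.
zip_tracts = {"tract_A": 6000, "tract_B": 3000, "tract_C": 1000}
n_cases = 10
rng = random.Random(42)  # seeded so repeated runs give the same draws

fixed_cases = [fixed_allocation(zip_tracts) for _ in range(n_cases)]
random_cases = [random_allocation(zip_tracts, rng) for _ in range(n_cases)]

print("fixed: ", fixed_cases)
print("random:", random_cases)
```

    Under this reading, the fixed rule places all ten cases in one tract, which matches the artificial clustering the abstract attributes to fixed methods, whereas the random rule spreads cases roughly in proportion to population.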

  15. Imputation of the Rare HOXB13 G84E Mutation and Cancer Risk in a Large Population-Based Cohort

    PubMed Central

    Hoffmann, Thomas J.; Sakoda, Lori C.; Shen, Ling; Jorgenson, Eric; Habel, Laurel A.; Liu, Jinghua; Kvale, Mark N.; Asgari, Maryam M.; Banda, Yambazi; Corley, Douglas; Kushi, Lawrence H.; Quesenberry, Charles P.; Schaefer, Catherine; Van Den Eeden, Stephen K.; Risch, Neil; Witte, John S.

    2015-01-01

    An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37−0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4 × 10^-12). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8 × 10^-4) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects

  16. Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling

    PubMed Central

    Hieke, Stefanie; Benner, Axel; Schlenk, Richard F.; Schumacher, Martin; Bullinger, Lars; Binder, Harald

    2016-01-01

    Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms that is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts often is still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-control studies on the one hand, and multivariable regression techniques as developed for gene expression data on the other hand. In particular, we focus on stable selection of a small set of SNPs and corresponding genes for subsequent validation. For univariate analysis, a permutation-based approach is proposed to test at the gene level. We use regularized multivariable regression models for considering all SNPs simultaneously and selecting a small set of potentially important prognostic SNPs. Stability is judged according to resampling inclusion frequencies for both the univariate and the multivariable approach. The overall strategy is illustrated with data from a cohort of acute myeloid leukemia patients and explored in a simulation study. The multivariable approach is seen to automatically focus on a smaller set of SNPs compared to the univariate approach, roughly in line with blocks of correlated SNPs. This more targeted extraction of SNPs results in more stable selection at the SNP as well as at the gene level. Thus, the multivariable regression approach with resampling provides a perspective in the proposed analysis strategy for SNP data in clinical cohorts highlighting what can be added by regularized regression techniques compared to univariate analyses. PMID:27159447

  17. Ceramic filters

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1995-12-31

    Filters were formed from ceramic fibers, organic fibers, and a ceramic bond phase using a papermaking technique. The distribution of particulate ceramic bond phase was determined using a model silicon carbide system. As the ceramic fiber increased in length and diameter the distance between particles decreased. The calculated number of particles per area showed good agreement with the observed value. After firing, the papers were characterized using a biaxial load test. The strength of papers was proportional to the amount of bond phase included in the paper. All samples exhibited strain-tolerant behavior.

  18. Improved imputation quality of low-frequency and rare variants in European samples using the ‘Genome of The Netherlands’

    PubMed Central

    Deelen, Patrick; Menelaou, Androniki; van Leeuwen, Elisabeth M; Kanterakis, Alexandros; van Dijk, Freerk; Medina-Gomez, Carolina; Francioli, Laurent C; Hottenga, Jouke Jan; Karssen, Lennart C; Estrada, Karol; Kreiner-Møller, Eskil; Rivadeneira, Fernando; van Setten, Jessica; Gutierrez-Achury, Javier; Westra, Harm-Jan; Franke, Lude; van Enckevort, David; Dijkstra, Martijn; Byelas, Heorhiy; van Duijn, Cornelia M; Swertz, Morris A; Francioli, Laurent C; van Dijk, Freerk; Menelaou, Androniki; Neerincx, Pieter B T; Pulit, Sara L; Deelen, Patrick; Elbers, Clara C; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F J; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H; van den Berg, Leonard H; Byelas, Heorhiy; den Dunnen, Johan T; Dijkstra, Martijn; Amin, Najaf; van der Velde, K Joeri; Jan Hottenga, Jouke; van Setten, Jessica; van Leeuwen, Elisabeth M; Kanterakis, Alexandros; Kattenberg, Mathijs; Karssen, Lennart C; van Schaik, Barbera D C; Bot, Jan; Nijman, Isaäuc J; van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H; Hehir-Kwa, Jayne Y; Handsaker, Robert E; Sunyaev, Shamil R; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Marschall, Schönhuth; Guryev, Victor; de Bakker, Paul I W; Slagboom, P Eline; Beekman, Marian B; de Craen, Anton J M; Suchiman, H Eka D; Hofman, Albert; van Duijn, Cornelia; Boomsma, Dorret I; Willemsen, Gonneke; Wolffenbuttel, Bruce H; Platteel, Mathieu; Pitts, Steven J; Potluri, Shobha; Cox, David R; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A; committee, Steering; Wijmenga, Cisca; Swertz, Morris A; van Duijn, Cornelia M; Boomsma, Dorret I; Slagboom, P Eline; van Ommen, Gertjan B; de Bakker, Paul I W; de Bakker, Paul I W; Wijmenga, Cisca; Swertz, Morris A

    2014-01-01

    Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputation genotypes with ‘true’ genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05–0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results. PMID:24896149

  19. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

    DOE PAGESBeta

    Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; Brown, Joseph N.; Wang, Jing; McDermott, Jason E.; Smith, Richard D.; Rodland, Karin D.; Metz, Thomas O.; Pounds, Joel G.; et al

    2015-04-09

    In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.

  20. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; Brown, Joseph N.; Wang, Jing; McDermott, Jason E.; Smith, Richard D.; Rodland, Karin D.; Metz, Thomas O.; Pounds, Joel G.; Waters, Katrina M.

    2015-04-09

    In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.

  1. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    SciTech Connect

    Yang, Jing; Li, Yuan-Yuan; Li, Yi-Xue; Ye, Zhi-Qiang

    2012-03-02

    Highlights: (1) Proper dataset partition can improve the prediction of deleterious nsSNPs. (2) Partition according to original residue type at nsSNP is a good criterion. (3) Similar strategy is supposed promising in other machine learning problems. Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  2. Enrichment of SNPs in Functional Categories Reveals Genes Affecting Complex Traits.

    PubMed

    Zhao, Huiying; Fan, Dongsheng; Nyholt, Dale R; Yang, Yuedong

    2016-08-01

    Genome-wide association studies (GWAS) have shown potential to identify the heritability of common complex phenotypes, but traditional approaches have limited ability to detect hidden signals because a single SNP has a weak effect size, accounting for only a small fraction of overall phenotypic variation. To improve the power of GWAS, methods have been developed to identify truly associated genes by jointly testing the effects of all SNPs. However, considering all SNPs within a gene equally might dilute strong signals from SNPs in truly functional categories. Here, we observed a consistent pattern of enrichment of significant SNPs in eight functional categories across six phenotypes, with the highest enrichment in coding and both UTR regions and the lowest enrichment in introns. Based on the pattern of SNP enrichment in functional categories, we developed a new approach for detecting gene associations on traits (DGAT) by selecting the most significant functional category and then using SNPs within it to assess gene associations. The method was found to be robust in type I error rate on simulated data, and to have mostly higher power in detecting associated genes for three different diseases than other methods. Further analysis indicated the ability of DGAT to detect novel genes. DGAT is available at http://sparks-lab.org/server/DGAT. PMID:27113629

  3. Portability of tag SNPs across isolated population groups: an example from India.

    PubMed

    Sarkar Roy, N; Farheen, S; Roy, N; Sengupta, S; Majumder, P P

    2008-01-01

    Isolated population groups are useful in conducting association studies of complex diseases to avoid various pitfalls, including those arising from population stratification. Since DNA resequencing is expensive, it is recommended that genotyping be carried out at tagSNP (tSNP) loci. For this, tSNPs identified in one isolated population need to be used in another. Unless tSNPs are highly portable across populations this strategy may result in loss of information in association studies. We examined the issue of tSNP portability by sampling individuals from 10 isolated ethnic groups from India. We generated DNA resequencing data pertaining to 3 genomic regions and identified tSNPs in each population. We defined an index of tSNP portability and showed that portability is low across isolated Indian ethnic groups. The extent of portability did not significantly correlate with genetic similarity among the populations studied here. We also analyzed our data with sequence data from individuals of African and European descent. Our results indicated that it may be necessary to carry out resequencing in a small number of individuals to discover SNPs and identify tSNPs in the specific isolated population in which a disease association study is to be conducted. PMID:17627800

  4. A computational method for prediction of rSNPs in human genome.

    PubMed

    Li, Rong; Han, Jiuqiang; Liu, Jun; Zheng, Jiguang; Liu, Ruiling

    2016-06-01

    Regulatory single nucleotide polymorphisms (rSNPs) in human genomes are thought to be responsible for phenotypic differences, including susceptibility to diseases and treatment outcomes, even though they do not change any gene product. However, a genome-wide search for rSNPs has not been properly addressed so far. In this work, a computational method for rSNP identification is proposed. As background SNPs far outnumber rSNPs, an ensemble method is applied to handle imbalanced data, which first converts an unbalanced dataset into several balanced ones and then builds a model for every balanced dataset. Two major types of features are extracted: sequence-based features and allele-specific features. A random forest is then applied to build the recognition model for each balanced dataset. Finally, ensemble strategies are adopted to combine the results of the models. We have tested our method on a set of experimentally verified rSNPs, and leave-one-out cross-validation results showed that our method achieves a sensitivity of 73.8%, a specificity of 71.8%, and an area under the ROC curve (AUC) of 0.756. In addition, our method is threshold-free and does not rely on data of regulatory elements, so it should adapt better to different data scenarios. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnpdect/. PMID:27107687
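
    The rebalancing step described in this abstract (turning one unbalanced dataset into several balanced ones, then combining the per-subset models) might look roughly like the sketch below. The function names, the toy identifiers, and the choice of majority voting are assumptions made for illustration; the abstract does not say which ensemble strategy was used, and the random forest training itself is omitted here.

```python
import random


def balanced_subsets(positives, negatives, seed=0):
    """Split the majority class into chunks the size of the minority class
    and pair every chunk with the full minority class."""
    k = len(positives)
    if k == 0:
        raise ValueError("need at least one positive example")
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    subsets = []
    # Leftover majority examples that cannot fill a whole chunk are dropped.
    for start in range(0, len(shuffled) - k + 1, k):
        subsets.append((list(positives), shuffled[start:start + k]))
    return subsets


def majority_vote(predictions):
    """Combine 0/1 predictions from the per-subset models; ties go to 0.
    Majority voting is an assumed combination rule, not the paper's."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Hypothetical toy data: 3 verified rSNPs against 10 background SNPs.
rsnps = ["r1", "r2", "r3"]
background = [f"b{i}" for i in range(10)]
subsets = balanced_subsets(rsnps, background)
print(len(subsets))               # number of balanced training sets
print(majority_vote([1, 1, 0]))   # combined call from three per-subset models
```

    Each balanced subset would then be used to train one classifier, and a new SNP would be scored by every classifier before the votes are combined.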

  5. Association Analysis Identifies Melampsora ×columbiana Poplar Leaf Rust Resistance SNPs

    PubMed Central

    La Mantia, Jonathan; Klápště, Jaroslav; El-Kassaby, Yousry A.; Azam, Shofiul; Guy, Robert D.; Douglas, Carl J.; Mansfield, Shawn D.; Hamelin, Richard

    2013-01-01

    Populus species are currently being domesticated through intensive time- and resource-dependent programs for utilization in phytoremediation, wood and paper products, and conversion to biofuels. Poplar leaf rust disease can greatly reduce wood volume. Genetic resistance is effective in reducing economic losses but major resistance loci have been race-specific and can be readily defeated by the pathogen. Developing durable disease resistance requires the identification of non-race-specific loci. In the presented study, area under the disease progress curve was calculated from natural infection of Melampsora ×columbiana in three consecutive years. Association analysis was performed using 412 P. trichocarpa clones genotyped with 29,355 SNPs covering 3,543 genes. We found 40 SNPs within 26 unique genes significantly associated (permutated P<0.05) with poplar rust severity. Moreover, two SNPs were repeated in all three years suggesting non-race-specificity and three additional SNPs were differentially expressed in other poplar rust interactions. These five SNPs were found in genes that have orthologs in Arabidopsis with functionality in pathogen induced transcriptome reprogramming, Ca2+/calmodulin and salicylic acid signaling, and tolerance to reactive oxygen species. The additive effect of non-R gene functional variants may constitute high levels of durable poplar leaf rust resistance. Therefore, these findings are of significance for speeding the genetic improvement of this long-lived, economically important organism. PMID:24236018

  6. TRES: Identification of Discriminatory and Informative SNPs from Population Genomic Data.

    PubMed

    Kavakiotis, Ioannis; Triantafyllidis, Alexandros; Ntelidou, Despoina; Alexandri, Panoraia; Megens, Hendrik-Jan; Crooijmans, Richard P M A; Groenen, Martien A M; Tsoumakas, Grigorios; Vlahavas, Ioannis

    2015-01-01

    The advent of high-throughput genomic technologies is enabling analyses on thousands or even millions of single-nucleotide polymorphisms (SNPs). At the same time, the selection of a minimum number of SNPs with the maximum information content is becoming increasingly problematic. Available locus ranking programs have been accused of providing upwardly biased results (concerning the predicted accuracy of the chosen set of markers for population assignment), cannot handle high-dimensional datasets, and some of them are computationally intensive. The toolbox for ranking and evaluation of SNPs (TRES) is a collection of algorithms built in a user-friendly and computationally efficient software that can manipulate and analyze datasets even in the order of millions of genotypes in a matter of seconds. It offers a variety of established methods for evaluating and ranking SNPs on user defined groups of populations and produces a set of predefined number of top ranked loci. Moreover, dataset manipulation algorithms enable users to convert datasets in different file formats, split the initial datasets into train and test sets, and finally create datasets containing only selected SNPs occurring from the SNP selection analysis for later on evaluation in dedicated software such as GENECLASS. This application can aid biologists to select loci with maximum power for optimization of cost-effective panels with applications related to e.g. species identification, wildlife management, and forensic problems. TRES is available for all operating systems at http://mlkd.csd.auth.gr/bio/tres. PMID:26137847

  7. Bayesian integration of genetics and epigenetics detects causal regulatory SNPs underlying expression variability

    PubMed Central

    Das, Avinash; Morley, Michael; Moravec, Christine S.; Tang, W. H. W.; Hakonarson, Hakon; Ashley, Euan A.; Brandimarto, Jeffrey; Hu, Ray; Li, Mingyao; Li, Hongzhe; Liu, Yichuan; Qu, Liming; Sanchez, Pablo; Margulies, Kenneth B.; Cappola, Thomas P.; Jensen, Shane; Hannenhalli, Sridhar

    2015-01-01

    The standard expression quantitative trait loci (eQTL) detects polymorphisms associated with gene expression without revealing causality. We introduce a coupled Bayesian regression approach—eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combination of regulatory single-nucleotide polymorphisms (SNPs) that explain the gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance but also predicts gene expression more accurately than other methods. Based on realistic simulated data, we demonstrate that eQTeL accurately detects causal regulatory SNPs, including those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal. PMID:26456756

  8. Association of MHC region SNPs with irritant susceptibility in healthcare workers.

    PubMed

    Yucesoy, Berran; Talzhanov, Yerkebulan; Michael Barmada, M; Johnson, Victor J; Kashon, Michael L; Baron, Elma; Wilson, Nevin W; Frye, Bonnie; Wang, Wei; Fluharty, Kara; Gharib, Rola; Meade, Jean; Germolec, Dori; Luster, Michael I; Nedorost, Susan

    2016-09-01

    Irritant contact dermatitis is the most common work-related skin disease, especially affecting workers in "wet-work" occupations. This study was conducted to investigate the association between single nucleotide polymorphisms (SNPs) within the major histocompatibility complex (MHC) and skin irritant response in a group of healthcare workers. 585 volunteer healthcare workers were genotyped for MHC SNPs and patch tested with three different irritants: sodium lauryl sulfate (SLS), sodium hydroxide (NaOH) and benzalkonium chloride (BKC). Genotyping was performed using Illumina Goldengate MHC panels. A number of SNPs within the MHC Class I (OR2B3, TRIM31, TRIM10, TRIM40 and IER3), Class II (HLA-DPA1, HLA-DPB1) and Class III (C2) genes were associated (p < 0.001) with skin response to tested irritants in different genetic models. Linkage disequilibrium patterns and functional annotations identified two SNPs in the TRIM40 (rs1573298) and HLA-DPB1 (rs9277554) genes, with a potential impact on gene regulation. In addition, SNPs in PSMB9 (rs10046277) and ITPR3 (rs499384) were associated with hand dermatitis. The results are of interest as they demonstrate that genetic variations in inflammation-related genes within the MHC can influence chemical-induced skin irritation and may explain the connection between inflamed skin and propensity to subsequent allergic contact sensitization. PMID:27258892

  9. 34 CFR 85.630 - May the Department of Education impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... (3 CFR, 1986 Comp., p. 189); E.O 12689 (3 CFR, 1989 Comp., p. 235); 20 U.S.C. 1082, 1094, 1221e-3 and... 34 Education 1 2010-07-01 2010-07-01 false May the Department of Education impute conduct of one person to another? 85.630 Section 85.630 Education Office of the Secretary, Department of...

  10. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    PubMed

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)^{-1}X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)^{-1}X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the question: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β̂) = E[β̂] - β, Bias(θ̂) = E[θ̂] - Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial. PMID:21390998
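
    The two linear imputation rules named in this abstract, LOCF and BOCF, are simple enough to sketch directly. The function names and the pain-score series below are hypothetical; the sketch only shows how each rule fills post-dropout values and says nothing about the bias definitions developed in the paper.

```python
def locf(values):
    """Last Observation Carried Forward: fill each missing value (None)
    with the most recent observed value for the same subject."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bocf(values):
    """Baseline Observation Carried Forward: fill each missing value (None)
    with the subject's first (baseline) measurement."""
    if not values:
        return []
    baseline = values[0]
    return [baseline if v is None else v for v in values]


# Hypothetical subject who drops out after the third visit.
pain_scores = [7.0, 5.5, 4.0, None, None]
print(locf(pain_scores))  # [7.0, 5.5, 4.0, 4.0, 4.0]
print(bocf(pain_scores))  # [7.0, 5.5, 4.0, 7.0, 7.0]
```

    In this toy series LOCF assumes the improvement seen at the last visit persists, whereas BOCF assumes the subject returns to baseline, which is why the two rules can bias a treatment-effect estimate in different directions.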

  11. Tailored selection of study individuals to be sequenced in order to improve the accuracy of genotype imputation.

    PubMed

    Peil, Barbara; Kabisch, Maria; Fischer, Christine; Hamann, Ute; Bermejo, Justo Lorenzo

    2015-02-01

    The addition of sequence data from own-study individuals to genotypes from external data repositories, for example, the HapMap, has been shown to improve the accuracy of imputed genotypes. Early approaches for reference panel selection favored individuals who best reflect recombination patterns in the study population. By contrast, a maximization of genetic diversity in the reference panel has been recently proposed. We investigate here a novel strategy to select individuals for sequencing that relies on the characterization of the ancestral kernel of the study population. The simulated study scenarios consisted of several combinations of subpopulations from HapMap. HapMap individuals who did not belong to the study population constituted an external reference panel which was complemented with the sequences of study individuals selected according to different strategies. In addition to a random choice, individuals with the largest statistical depth according to the first genetic principal components were selected. In all simulated scenarios the integration of sequences from own-study individuals increased imputation accuracy. The selection of individuals based on the statistical depth resulted in the highest imputation accuracy for European and Asian study scenarios, whereas random selection performed best for an African-study scenario. Present findings indicate that there is no universal 'best strategy' to select individuals for sequencing. We propose to use the methodology described in the manuscript to assess the advantage of focusing on the ancestral kernel under own study characteristics (study size, genetic diversity, availability and properties of external reference panels, frequency of imputed variants…). PMID:25537753

  12. Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions

    PubMed Central

    Druet, T; Macleod, I M; Hayes, B J

    2014-01-01

    Genomic prediction from whole-genome sequence data is attractive, as the accuracy of genomic prediction is no longer bounded by the extent of linkage disequilibrium between DNA markers and causal mutations affecting the trait, given the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population, and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. Performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP), which selected a subset of individuals for sequencing that maximized the number of unique haplotypes (from single-nucleotide polymorphism panel data) sequenced, gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence by fold coverage given a maximum total sequencing effort. At 600 total fold coverage (600×), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%. PMID:23549338
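
    The trade-off behind the sequencing design in this abstract is simple arithmetic: with a fixed total effort, the number of individuals that can be sequenced is the total fold coverage divided by the coverage per individual. The sketch below enumerates a few candidate designs; only the 600-fold total and the 75-individuals-at-eightfold design come from the abstract, and the other coverage levels and the function name are hypothetical.

```python
def sequencing_designs(total_fold, coverages):
    """Map each per-individual coverage to the number of individuals
    that a fixed total sequencing effort can pay for."""
    return {c: total_fold // c for c in coverages}


# 600 total fold coverage, as in the abstract; other coverages are illustrative.
designs = sequencing_designs(600, [1, 2, 4, 8, 12])
for coverage, n_individuals in designs.items():
    print(f"{coverage}x coverage -> {n_individuals} individuals")
```

    Which row is best depends on how variant detection and imputation accuracy respond to depth versus sample size, which is what the study evaluates by simulation.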

  13. Seq4SNPs: new software for retrieval of multiple, accurately annotated DNA sequences, ready formatted for SNP assay design

    PubMed Central

    Field, Helen I; Scollen, Serena A; Luccarini, Craig; Baynes, Caroline; Morrison, Jonathan; Dunning, Alison M; Easton, Douglas F; Pharoah, Paul DP

    2009-01-01

    Background: In moderate-throughput SNP genotyping there was a gap in the workflow, between choosing a set of SNPs and submitting their sequences to proprietary assay design software, which was not met by existing software. Retrieval and formatting of sequences flanking each SNP, prior to assay design, becomes rate-limiting for more than about ten SNPs, especially if annotated for repetitive regions and adjacent variations. We routinely process up to 50 SNPs at once. Implementation: We created Seq4SNPs, a web-based, walk-away software that can process one to several hundred SNPs given rs numbers as input. It outputs a file of fully annotated sequences formatted for one of three proprietary design softwares: TaqMan's Primer-By-Design FileBuilder, Sequenom's iPLEX or SNPstream's Autoprimer, as well as unannotated fasta sequences. We found genotyping assays to be inhibited by repetitive sequences or the presence of additional variations flanking the SNP under test, and in multiplexes, repetitive sequence flanking one SNP adversely affects multiple assays. Assay design software programs avoid such regions if the input sequences are appropriately annotated, so we used Seq4SNPs to provide suitably annotated input sequences, and improved our genotyping success rate. Adjacent SNPs can also be avoided, by annotating sequences used as input for primer design. Conclusion: The accuracy of annotation by Seq4SNPs is significantly better than manual annotation (P < 1e-5). Using Seq4SNPs to incorporate all annotation for additional SNPs and repetitive elements into sequences, for genotyping assay designer software, minimizes assay failure at the design stage, reducing the cost of genotyping. Seq4SNPs provides a rapid route for replacement of poor test SNP sequences. We routinely use this software for assay sequence preparation. Seq4SNPs is available as a service at and , currently for human SNPs, but easily extended to include any species in dbSNP. PMID:19523221

  14. Evaluation of transethnic fine mapping with population-specific and cosmopolitan imputation reference panels in diverse Asian populations.

    PubMed

    Wang, Xu; Cheng, Ching-Yu; Liao, Jiemin; Sim, Xueling; Liu, Jianjun; Chia, Kee-Seng; Tai, E-Shyong; Little, Peter; Khor, Chiea-Chuen; Aung, Tin; Wong, Tien-Yin; Teo, Yik-Ying

    2016-04-01

    There has been limited success in identifying causal variants underlying association signals observed in genome-wide association studies (GWAS). The use of 1000 Genomes Project (1KGP) allows the imputation to estimate the genetic information at untyped variants. However, long stretches of high linkage disequilibrium within the genome prevent us from differentiating between causal variants and perfect surrogates, thus limiting our ability to identify causal variants. Transethnic strategies have been proposed as a possible solution to mitigate this. However, these studies generally rely on imputing genotypes from multiple ancestries from 1KGP but not against population-specific reference panels. Here, we perform the first transethnic fine-mapping study across three Asian cohorts from diverse ancestries at the loci implicated with eye and blood lipid traits, using population-specific reference panels that have been generated by whole-genome sequencing samples from the same ancestry groups. Our study outlines several challenges faced in a fine-mapping exercise where one simply aims to meta-analyse existing GWAS that have been imputed against reference haplotypes from the 1KGP. PMID:26130488

  15. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    NASA Astrophysics Data System (ADS)

    Riggi, S.; Riggi, D.; Riggi, F.

    2015-04-01

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixture models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layer silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.
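
    Standard mean imputation, the baseline against which this abstract compares mixture-model approaches, can be sketched in a few lines. The function name and the three-layer toy readings are hypothetical; the study itself uses a six-layer detector and the mixture-model methods are not reproduced here.

```python
def mean_impute(rows):
    """Replace each missing entry (None) with the mean of the observed
    values in the same column (here: the same detector layer)."""
    if not rows:
        return []
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # A column with no observed values cannot be imputed and stays None.
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


# Hypothetical energy-loss readings: one row per track, one column per layer.
energy_loss = [
    [1.0, 2.0, None],
    [3.0, None, 4.0],
    [5.0, 6.0, 8.0],
]
print(mean_impute(energy_loss))  # [[1.0, 2.0, 6.0], [3.0, 4.0, 4.0], [5.0, 6.0, 8.0]]
```

    Because every gap in a layer receives the same value, mean imputation ignores both the correlation between layers and the asymmetry of the energy loss distribution, which is the motivation the abstract gives for mixture models.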

  16. Complete genome sequence and SNPs of Raja pulchra (Rajiformes, Rajidae) mitochondria.

    PubMed

    Hwang, Jae Yeon; Jin, Gwi-Deuk; Park, Jongbin; Kim, Heebal; Lee, Chang-Kyu; Kwak, Woori; Nam, Bo-Hye; An, Cheul Min; Park, Jung Youn; Park, Kyu-Hyun; Huh, Chul-Sung; Kim, Eun Bae

    2016-07-01

    Mitochondrial genomes were sequenced from five Raja pulchra individuals, and single-nucleotide polymorphisms (SNPs) were identified by comparing previously announced sequences in this study. A total of 117 SNPs were detected, and they were present in 2 rRNA genes, 9 tRNA genes, 13 protein-coding genes and the non-coding region. One deleted polymorphic site, which was located in the 16S rRNA gene, was observed in two individuals. Six polymorphic sites were non-synonymous SNPs, which were distributed in the ND1, ND2, ATP6 and ND4 genes. Phylogenetic analysis validated current taxa. The genome sequences of R. pulchra mitochondria could be comparable information for understanding species divergence and genomic variation among the populations. PMID:26122344

  17. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants.

    PubMed

    Burton, Paul R; Clayton, David G; Cardon, Lon R; Craddock, Nick; Deloukas, Panos; Duncanson, Audrey; Kwiatkowski, Dominic P; McCarthy, Mark I; Ouwehand, Willem H; Samani, Nilesh J; Todd, John A; Donnelly, Peter; Barrett, Jeffrey C; Davison, Dan; Easton, Doug; Evans, David M; Leung, Hin-Tak; Marchini, Jonathan L; Morris, Andrew P; Spencer, Chris C A; Tobin, Martin D; Attwood, Antony P; Boorman, James P; Cant, Barbara; Everson, Ursula; Hussey, Judith M; Jolley, Jennifer D; Knight, Alexandra S; Koch, Kerstin; Meech, Elizabeth; Nutland, Sarah; Prowse, Christopher V; Stevens, Helen E; Taylor, Niall C; Walters, Graham R; Walker, Neil M; Watkins, Nicholas A; Winzer, Thilo; Jones, Richard W; McArdle, Wendy L; Ring, Susan M; Strachan, David P; Pembrey, Marcus; Breen, Gerome; St Clair, David; Caesar, Sian; Gordon-Smith, Katharine; Jones, Lisa; Fraser, Christine; Green, Elaine K; Grozeva, Detelina; Hamshere, Marian L; Holmans, Peter A; Jones, Ian R; Kirov, George; Moskivina, Valentina; Nikolov, Ivan; O'Donovan, Michael C; Owen, Michael J; Collier, David A; Elkin, Amanda; Farmer, Anne; Williamson, Richard; McGuffin, Peter; Young, Allan H; Ferrier, I Nicol; Ball, Stephen G; Balmforth, Anthony J; Barrett, Jennifer H; Bishop, Timothy D; Iles, Mark M; Maqbool, Azhar; Yuldasheva, Nadira; Hall, Alistair S; Braund, Peter S; Dixon, Richard J; Mangino, Massimo; Stevens, Suzanne; Thompson, John R; Bredin, Francesca; Tremelling, Mark; Parkes, Miles; Drummond, Hazel; Lees, Charles W; Nimmo, Elaine R; Satsangi, Jack; Fisher, Sheila A; Forbes, Alastair; Lewis, Cathryn M; Onnie, Clive M; Prescott, Natalie J; Sanderson, Jeremy; Matthew, Christopher G; Barbour, Jamie; Mohiuddin, M Khalid; Todhunter, Catherine E; Mansfield, John C; Ahmad, Tariq; Cummings, Fraser R; Jewell, Derek P; Webster, John; Brown, Morris J; Lathrop, Mark G; Connell, John; Dominiczak, Anna; Marcano, Carolina A Braga; Burke, Beverley; Dobson, Richard; Gungadoo, Johannie; Lee, Kate L; Munroe, Patricia B; Newhouse, 
Stephen J; Onipinla, Abiodun; Wallace, Chris; Xue, Mingzhan; Caulfield, Mark; Farrall, Martin; Barton, Anne; Bruce, Ian N; Donovan, Hannah; Eyre, Steve; Gilbert, Paul D; Hilder, Samantha L; Hinks, Anne M; John, Sally L; Potter, Catherine; Silman, Alan J; Symmons, Deborah P M; Thomson, Wendy; Worthington, Jane; Dunger, David B; Widmer, Barry; Frayling, Timothy M; Freathy, Rachel M; Lango, Hana; Perry, John R B; Shields, Beverley M; Weedon, Michael N; Hattersley, Andrew T; Hitman, Graham A; Walker, Mark; Elliott, Kate S; Groves, Christopher J; Lindgren, Cecilia M; Rayner, Nigel W; Timpson, Nicolas J; Zeggini, Eleftheria; Newport, Melanie; Sirugo, Giorgio; Lyons, Emily; Vannberg, Fredrik; Hill, Adrian V S; Bradbury, Linda A; Farrar, Claire; Pointon, Jennifer J; Wordsworth, Paul; Brown, Matthew A; Franklyn, Jayne A; Heward, Joanne M; Simmonds, Matthew J; Gough, Stephen C L; Seal, Sheila; Stratton, Michael R; Rahman, Nazneen; Ban, Maria; Goris, An; Sawcer, Stephen J; Compston, Alastair; Conway, David; Jallow, Muminatou; Newport, Melanie; Sirugo, Giorgio; Rockett, Kirk A; Bumpstead, Suzannah J; Chaney, Amy; Downes, Kate; Ghori, Mohammed J R; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Keniry, Andrew; King, Emma; McGinnis, Ralph; Potter, Simon; Ravindrarajah, Rathi; Whittaker, Pamela; Widden, Claire; Withers, David; Cardin, Niall J; Davison, Dan; Ferreira, Teresa; Pereira-Gale, Joanne; Hallgrimsdo'ttir, Ingeleif B; Howie, Bryan N; Su, Zhan; Teo, Yik Ying; Vukcevic, Damjan; Bentley, David; Brown, Matthew A; Compston, Alastair; Farrall, Martin; Hall, Alistair S; Hattersley, Andrew T; Hill, Adrian V S; Parkes, Miles; Pembrey, Marcus; Stratton, Michael R; Mitchell, Sarah L; Newby, Paul R; Brand, Oliver J; Carr-Smith, Jackie; Pearce, Simon H S; McGinnis, R; Keniry, A; Deloukas, P; Reveille, John D; Zhou, Xiaodong; Sims, Anne-Marie; Dowling, Alison; Taylor, Jacqueline; Doan, Tracy; Davis, John C; Savage, Laurie; Ward, Michael M; Learch, Thomas L; Weisman, Michael H; Brown, 
Mathew

    2007-11-01

    We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases. PMID:17952073

  18. S-PRIME/TI-SNPS program activities in FY94 critical components testing

    SciTech Connect

    Brown, C.; Dale Rogers, R.; Determan, W.R.; Van Hagan, T.

    1995-01-20

    A conceptual design for a 40-kWe thermionic space nuclear power system (TI-SNPS) known as the S-PRIME system is being developed by Rockwell and its subcontractors for the U.S. Department of Energy (DOE), United States Air Force (USAF), and Ballistic Missile Defense Organization (BMDO) under the TI-SNPS Program. Phase 1 of this program includes developing a conceptual design of a 5- to 40-kWe range TI-SNPS and validating key technologies that support the design. All key technologies for the S-PRIME design have been identified along with six critical component demonstrations, which will be used to validate the S-PRIME design features. © 1995 American Institute of Physics

  19. S-PRIME/TI-SNPS program activities in FY94 critical components testing

    NASA Astrophysics Data System (ADS)

    Brown, Colette; Dale Rogers, R.; Determan, William R.; Van Hagan, Tom

    1995-01-01

    A conceptual design for a 40-kWe thermionic space nuclear power system (TI-SNPS) known as the S-PRIME system is being developed by Rockwell and its subcontractors for the U.S. Department of Energy (DOE), United States Air Force (USAF), and Ballistic Missile Defense Organization (BMDO) under the TI-SNPS Program. Phase 1 of this program includes developing a conceptual design of a 5- to 40-kWe range TI-SNPS and validating key technologies that support the design. All key technologies for the S-PRIME design have been identified along with six critical component demonstrations, which will be used to validate the S-PRIME design features.

  20. Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs

    NASA Astrophysics Data System (ADS)

    Watson, Corey T.; Disanto, Giulio; Breden, Felix; Giovannoni, Gavin; Ramagopalan, Sreeram V.

    2012-10-01

    Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted-for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.

  1. Genome-wide association analysis of canine atopic dermatitis and identification of disease related SNPs.

    PubMed

    Wood, Shona Hiedi; Ke, Xiayi; Nuttall, Tim; McEwan, Neil; Ollier, William E; Carter, Stuart D

    2009-12-01

    In humans, genome-wide association studies (GWAS) have been shown to be an effective and thorough approach for identifying polymorphisms associated with disease phenotypes. Here, we describe the first study to perform a genome-wide association study in canine atopic dermatitis (cAD) using the Illumina Canine SNP20 array, containing 22,362 single-nucleotide polymorphisms (SNPs). The aim of the study was to identify SNPs associated with cAD using affected and unaffected Golden Retrievers. Further validation studies were performed for potentially associated SNPs using Sequenom genotyping of larger numbers of cases and controls across eight breeds (Boxer, German Shepherd Dog, Labrador, Golden Retriever, Shiba Inu, Shih Tzu, Pit Bull, and West Highland White Terriers). Using meta-analysis, two SNPs were associated with cAD in all breeds tested. RS22114085 was identified as a susceptibility locus (p=0.00014, odds ratio=2) and RS23472497 as a protective locus (p=0.0015, odds ratio=0.6). Both of these SNPs were located in intergenic regions, and their effects have been demonstrated to be independent of each other, highlighting that further fine mapping and resequencing is required of these areas. Further, 12 SNPs were validated by Sequenom genotyping as associated with cAD, but these were not associated with all breeds. This study suggests that GWAS will be a useful approach for identifying genetic risk factors for cAD. Given the clinical heterogeneity within this condition and the likelihood that the relative genetic effect sizes are small, greater sample sizes and further studies will be required. PMID:19838693

  2. Silver sulfide nanoparticles (Ag2S-NPs) are taken up by plants and are phytotoxic.

    PubMed

    Wang, Peng; Menzies, Neal W; Lombi, Enzo; Sekine, Ryo; Blamey, F Pax C; Hernandez-Soriano, Maria C; Cheng, Miaomiao; Kappen, Peter; Peijnenburg, Willie J G M; Tang, Caixian; Kopittke, Peter M

    2015-01-01

    Silver nanoparticles (NPs) are used in more consumer products than any other nanomaterial and their release into the environment is unavoidable. Of primary concern is the wastewater stream in which most silver NPs are transformed to silver sulfide NPs (Ag2S-NPs) before being applied to agricultural soils within biosolids. While Ag2S-NPs are assumed to be biologically inert, nothing is known of their effects on terrestrial plants. The phytotoxicity of Ag and its accumulation was examined in short-term (24 h) and longer-term (2-week) solution culture experiments with cowpea (Vigna unguiculata L. Walp.) and wheat (Triticum aestivum L.) exposed to Ag2S-NPs (0-20 mg Ag L(-1)), metallic Ag-NPs (0-1.6 mg Ag L(-1)), or ionic Ag (AgNO3; 0-0.086 mg Ag L(-1)). Although not inducing any effects during 24-h exposure, Ag2S-NPs reduced growth by up to 52% over a 2-week period. This toxicity did not result from their dissolution and release of toxic Ag(+) in the rooting medium, with soluble Ag concentrations remaining below 0.001 mg Ag L(-1). Rather, Ag accumulated as Ag2S in the root and shoot tissues when plants were exposed to Ag2S-NPs, consistent with their direct uptake. Importantly, this differed from the form of Ag present in tissues of plants exposed to AgNO3. For the first time, our findings have shown that Ag2S-NPs exert toxic effects through their direct accumulation in terrestrial plant tissues. These findings need to be considered to ensure high yield of food crops, and to avoid increasing Ag in the food chain. PMID:25686712

  3. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris)

    PubMed Central

    Villordo-Pineda, Emiliano; González-Chavira, Mario M.; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A.; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5), which was characterized with 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. The phenotypes with the most associations were days to flowering and seed biomass, with 58 and 44 associated SNPs, respectively. Thirty-seven of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs, such as SNP28 and SNP128, are related to starch biosynthesis, a common osmotic protector, and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755
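
    The reported cutoff of p < 0.0003 is consistent with a Bonferroni correction across the 169 markers. The sketch below reproduces that arithmetic; the family-wise alpha of 0.05 is an assumption, since the abstract does not state it, and the function name is illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 (not stated in the abstract);
# 169 is the number of SNPs reported in the abstract.
threshold = bonferroni_threshold(0.05, 169)
print(f"per-SNP threshold: {threshold:.6f}")
```

    Rounded to four decimal places, the result agrees with the 0.0003 cutoff quoted in the abstract.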

  4. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

    PubMed Central

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules. PMID:27199552

  5. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study.

    PubMed

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules. PMID:27199552

  6. Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations

    PubMed Central

    Band, Gavin; Le, Quang Si; Jostins, Luke; Pirinen, Matti; Kivinen, Katja; Jallow, Muminatou; Sisay-Joof, Fatoumatta; Bojang, Kalifa; Pinder, Margaret; Sirugo, Giorgio; Conway, David J.; Nyirongo, Vysaul; Kachala, David; Molyneux, Malcolm; Taylor, Terrie; Ndila, Carolyne; Peshu, Norbert; Marsh, Kevin; Williams, Thomas N.; Alcock, Daniel; Andrews, Robert; Edkins, Sarah; Gray, Emma; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Schuldt, Kathrin; Clark, Taane G.; Small, Kerrin S.; Teo, Yik Ying; Kwiatkowski, Dominic P.; Rockett, Kirk A.; Barrett, Jeffrey C.; Spencer, Chris C. A.

    2013-01-01

    Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations, we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed-effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles. PMID:23717212

  7. A multiple imputation approach to disclosure limitation for high-age individuals in longitudinal studies.

    PubMed

    An, Di; Little, Roderick J A; McNally, James W

    2010-07-30

    Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because of the risk of disclosure, ages of very old respondents can often not be released; in particular, this is a specific stipulation of the Health Insurance Portability and Accountability Act (HIPAA) for the release of health data for individuals. Top-coding of individuals beyond a certain age is a standard way of dealing with this issue, and it may be adequate for cross-sectional data, when a modest number of cases are affected. However, this approach leads to serious loss of information in longitudinal studies when individuals have been followed for many years. We propose and evaluate an alternative to top-coding for this situation based on multiple imputation (MI). This MI method is applied to a survival analysis of simulated data, and data from the Charleston Heart Study (CHS), and is shown to work well in preserving the relationship between hazard and covariates. PMID:20552576

  8. The search for stable prognostic models in multiple imputed data sets

    PubMed Central

    2010-01-01

    Background: In prognostic studies model instability and missing data can be troubling factors. Proposed methods for handling these situations are bootstrapping (B) and multiple imputation (MI). The authors examined the influence of these methods on model composition. Methods: Models were constructed using a cohort of 587 patients consulting between January 2001 and January 2003 with a shoulder problem in general practice in the Netherlands (the Dutch Shoulder Study). Outcome measures were persistent shoulder disability and persistent shoulder pain. Potential predictors included socio-demographic variables, characteristics of the pain problem, physical activity and psychosocial factors. Model composition and performance (calibration and discrimination) were assessed for models using a complete case analysis, MI, bootstrapping or both MI and bootstrapping. Results: Results showed that model composition varied between models as a result of how missing data was handled and that bootstrapping provided additional information on the stability of the selected prognostic model. Conclusion: In prognostic modeling missing data needs to be handled by MI and bootstrap model selection is advised in order to provide information on model stability. PMID:20846460

  9. Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips.

    PubMed

    Boison, S A; Santos, D J A; Utsunomiya, A H T; Carvalheiro, R; Neves, H H R; O'Brien, A M Perez; Garcia, J F; Sölkner, J; da Silva, M V G B

    2015-07-01

    Genotype imputation is widely used as a cost-effective strategy in genomic evaluation of cattle. Key determinants of imputation accuracies, such as linkage disequilibrium patterns, marker densities, and ascertainment bias, differ between Bos indicus and Bos taurus breeds. Consequently, there is a need to investigate effectiveness of genotype imputation in indicine breeds. Thus, the objective of the study was to investigate strategies and factors affecting the accuracy of genotype imputation in Gyr (Bos indicus) dairy cattle. Four imputation scenarios were studied using 471 sires and 1,644 dams genotyped on Illumina BovineHD (HD-777K; San Diego, CA) and BovineSNP50 (50K) chips, respectively. Scenarios were based on which reference high-density single nucleotide polymorphism (SNP) panel (HDP) should be adopted [HD-777K, 50K, and GeneSeek GGP-75Ki (Lincoln, NE)]. Depending on the scenario, validation animals had their genotypes masked for one of the lower-density panels: Illumina (3K, 7K, and 50K) and GeneSeek (SGGP-20Ki and GGP-75Ki). We randomly selected 171 sires as reference and 300 as validation for all the scenarios. Additionally, all sires were used as reference and the 1,644 dams were imputed for validation. Genotypes of 98 individuals with 4 and more offspring were completely masked and imputed. Imputation algorithms FImpute and Beagle v3.3 and v4 were used. Imputation accuracies were measured using the correlation and allelic correct rate. FImpute resulted in highest accuracies, whereas Beagle 3.3 gave the least-accurate imputations. Accuracies evaluated as correlation (allelic correct rate) ranged from 0.910 (0.942) to 0.961 (0.974) using 50K as HDP and with 3K (7K) as low-density panels. With GGP-75Ki as HDP, accuracies were moderate for 3K, 7K, and 50K, but high for SGGP-20Ki. The use of HD-777K as HDP resulted in accuracies of 0.888 (3K), 0.941 (7K), 0.980 (SGGP-20Ki), 0.982 (50K), and 0.993 (GGP-75Ki). Ungenotyped individuals were imputed with an

  10. Polymorphisms involving gain or loss of CpG sites are significantly enriched in trait-associated SNPs

    PubMed Central

    Zhou, Dan; Li, Zhenli; Yu, Dan; Wan, Ledong; Zhu, Yimin; Lai, Maode; Zhang, Dandan

    2015-01-01

    Some single nucleotide polymorphisms (SNPs) influence the existence of CpG sites, the basis of DNA modifications such as methylation and hydroxymethylation. These polymorphisms can lead to gain or loss of CpG sites and were defined as CpG site related SNPs (cgSNPs) in this study. The cgSNPs change the DNA sequence and might affect DNA modifications such as methylation. However, the functional consequence of cgSNPs is poorly understood. We observed that a considerable proportion (23.0%) of common variants in the human genome were cgSNPs. Mutations involving loss of CpG sites were associated with reduced levels of methylation (~20.2%) using The Cancer Genome Atlas (TCGA) data. Using public databases (SCAN and seeQTL) of expression quantitative trait loci (eQTLs), we found that the cgSNPs were significantly enriched in eQTLs via logistic regression and simulation test. Furthermore, using a catalog of published genome-wide association studies (GWAS) recorded by the National Human Genome Research Institute (NHGRI), we observed that cgSNPs were more likely to be trait-associated loci, especially for cancers. Our results indicated that cgSNPs might be meaningful as annotation either in SNP functional prediction or in screening for trait-associated SNPs. PMID:26503467

  11. Identification of immune-related SNPs in the transcriptome of Mytilus chilensis through high-throughput sequencing.

    PubMed

    Núñez-Acuña, Gustavo; Gallardo-Escárate, Cristian

    2013-12-01

    Single nucleotide polymorphisms (SNPs) identified in coding regions represent a useful tool for understanding the immune response against pathogens and stressful environmental conditions. In this study, an SNP database was generated from transcripts involved in the innate immune response of the mussel Mytilus chilensis. The SNPs were identified through hemocyte transcriptome sequencing from 18 individuals, and SNP mining was performed in 225,336 contigs, yielding 20,306 polymorphisms associated with immune-related genes. Classification of identified SNPs was based on different pathways of the immune response for Mytilus sp. A total of 28 SNPs were identified in the Toll-like receptor pathway and included 5 non-synonymous polymorphisms; 19 SNPs were identified in the apoptosis pathway and included 3 non-synonymous polymorphisms; 35 SNPs were identified in the ubiquitin-mediated proteolysis pathway and included 4 non-synonymous variants; and 54 SNPs involved in other molecular functions related to the immune response, such as molecular chaperones, antimicrobial peptides, and genes that interact with marine toxins, were also identified. The molecular markers identified in this work could be useful for novel studies, such as those related to associations between high-resolution molecular markers and functional response to pathogen agents. PMID:24080470

  12. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

    PubMed

    Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D

    2015-05-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100
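
    The abstract describes testing marker effects "across imputes" without naming the pooling rule. A common choice for multiple imputation is Rubin's rules, and the sketch below assumes that rule purely for illustration; it is not taken from the paper, and the function name and example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one marker effect across m imputed data sets (Rubin's rules).

    estimates: effect estimate from each imputed data set
    variances: squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    pooled = mean(estimates)            # average effect across imputes
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance across imputes
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical numbers for three imputes of a single marker.
pooled, total = pool_rubin([0.40, 0.50, 0.60], [0.01, 0.01, 0.01])
print(pooled, total)
```

    The between-imputation term is what carries imputation uncertainty into the test: the more the imputes disagree, the larger the total variance and the weaker the evidence for the marker.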

  13. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass

    PubMed Central

    Ramstein, Guillaume P.; Lipka, Alexander E.; Lu, Fei; Costich, Denise E.; Cherney, Jerome H.; Buckler, Edward S.; Casler, Michael D.

    2015-01-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100

  14. Connecting SNPs in Diabetes: A Spatial Analysis of Meta-GWAS Loci

    PubMed Central

    Schierding, William; O’Sullivan, Justin M.

    2015-01-01

    Meta-analyses of genome-wide association studies (GWAS) have improved our understanding of the genetic foundations of a number of diseases, including diabetes. However, single nucleotide polymorphisms (SNPs) that are identified by GWAS, especially those that fall outside of gene regions, do not always clearly link to the underlying biology. Despite this, these SNPs have often been validated through re-sequencing efforts as not just tag SNPs, but as causative SNPs, and so must play a role in disease development or progression. In this study, we show how the 3D genome (spatial connections) and trans-expression Quantitative Trait Loci connect diabetes loci from different GWAS meta-analyses, informing the backbone of regulatory networks. Our findings include a three-way functional–spatial connection between the TM6SF2, CTRB1–BCAR1, and CELSR2–PSRC1 loci (rs201189528, rs7202844, and rs7202844, respectively) connected through the KCNIP3 and BCAR1/BCAR3 loci, respectively. These spatial hubs serve as an example of how loci in genes with little biological connection to disease come together to contribute to the diabetes phenotype. PMID:26191039

  15. Cross-Amplification and Validation of SNPs Conserved over 44 Million Years between Seals and Dogs

    PubMed Central

    Hoffman, Joseph I.; Thorne, Michael A. S.; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  16. Alteration of Antiviral Signalling by Single Nucleotide Polymorphisms (SNPs) of Mitochondrial Antiviral Signalling Protein (MAVS)

    PubMed Central

    Xing, Fei; Matsumiya, Tomoh; Hayakari, Ryo; Yoshida, Hidemi; Kawaguchi, Shogo; Takahashi, Ippei; Nakaji, Shigeyuki; Imaizumi, Tadaatsu

    2016-01-01

    Genetic variation is associated with diseases. As a type of genetic variation occurring with certain regularity and frequency, the single nucleotide polymorphism (SNP) is attracting more and more attention because of its great value for research and real-life application. Mitochondrial antiviral signalling protein (MAVS) acts as a common adaptor molecule for retinoic acid-inducible gene-I (RIG-I)-like receptors (RLRs), which can recognize foreign RNA, including viral RNA, leading to the induction of type I interferons (IFNs). Therefore, MAVS is thought to be a crucial molecule in antiviral innate immunity. We speculated that genetic variation of MAVS may result in susceptibility to infectious diseases. To assess the risk of viral infection based on MAVS variation, we tested the effects of twelve non-synonymous MAVS coding-region SNPs from the National Center for Biotechnology Information (NCBI) database that result in amino acid substitutions. We found that five of these SNPs exhibited functional alterations. Additionally, four resulted in an inhibitory immune response, and one had the opposite effect. In total, 1,032 human genomic samples obtained from a mass examination were genotyped at these five SNPs. However, no homozygous or heterozygous variation was detected. We hypothesized that these five SNPs are not present in the Japanese population and that such MAVS variations may result in serious immune diseases. PMID:26954674

  17. The effects of single nucleotide polymorphisms (SNPs) of calpastatin (CAST) gene on meat tenderness of yak.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The association of single nucleotide polymorphisms (SNPs) of calpastatin (CAST) gene with shear force of 2.54 cm steaks from M. longissimus dorsi from Gannan yaks (Bos grunniens, n=181) was studied. Yaks were harvested at 2, 3, and 4 yr of age (n=51, 59, and 71, respectively), and samples of each ya...

  18. Identification of new SNPs in native South American populations by resequencing the Y chromosome.

    PubMed

    Geppert, M; Ayub, Q; Xue, Y; Santos, S; Ribeiro-dos-Santos, Â; Baeta, M; Núñez, C; Martínez-Jarreta, B; Tyler-Smith, C; Roewer, L

    2015-03-01

    The Y-chromosomal genetic landscape of South America is relatively homogenous. The majority of native Amerindian people are assigned to haplogroup Q and only a small percentage belongs to haplogroup C. With the aim of further differentiating the major Q lineages and thus obtaining new insights into the population history of South America, two individuals, both belonging to the sub-haplogroup Q-M3, were analyzed with next-generation sequencing. Several new candidate SNPs were evaluated and four were confirmed to be new, haplogroup Q-specific, and variable. One of the new SNPs, named MG2, identifies a new sub-haplogroup downstream of Q-M3; the other three (MG11, MG13, MG15) are upstream of Q-M3 but downstream of M242, and describe branches at the same phylogenetic positions as previously known SNPs in the samples tested. These four SNPs were typed in 100 individuals belonging to haplogroup Q. PMID:25303787

  19. Cross-amplification and validation of SNPs conserved over 44 million years between seals and dogs.

    PubMed

    Hoffman, Joseph I; Thorne, Michael A S; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  20. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing

    PubMed Central

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, which are abundant in the human genome and have a relatively low mutation rate, are attractive for genetic applications such as forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were tested simultaneously using Ampliseq technology on the Ion Torrent PGM platform. A concordance study between NGS and Sanger sequencing was performed with two reference samples, 9947A and 9948. Full concordance was obtained except for the genotype of rs576261 in 9947A. The parameter FMAR (%) was introduced for NGS data analysis for the first time and was used to evaluate allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should range from 50% to 60%, and those for homozygotes or Y-SNPs should be above 90%. The SNPs rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, with either allelic imbalance or lower coverage. Sensitivity testing demonstrated that all correct genotypes were obtained with DNA inputs ranging from 10 ng to 0.5 ng. For mixture testing, a clear linear correlation (R² = 0.9429) between the expected and observed FMAR values of mixtures was observed. PMID:26691610
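    The FMAR ranges quoted in this abstract can be illustrated with a minimal sketch. It assumes that FMAR is the percentage of reads at a locus that support the most frequent allele; the abstract does not spell out the acronym, so this definition, the function names and the "ambiguous" label are all hypothetical and are not taken from the paper.

```python
def fmar_percent(major_reads: int, total_reads: int) -> float:
    """Percentage of reads supporting the most frequent allele (assumed FMAR definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * major_reads / total_reads


def classify_by_fmar(fmar: float) -> str:
    """Apply the ranges quoted in the abstract; values outside them are only flagged."""
    if 50.0 <= fmar <= 60.0:
        return "heterozygote"
    if fmar > 90.0:
        return "homozygote or Y-SNP"
    # Hypothetical label: the abstract gives no rule for values between the two ranges.
    return "ambiguous"


if __name__ == "__main__":
    # Read counts below are invented for illustration only.
    for major, total in [(55, 100), (98, 100), (75, 100)]:
        value = fmar_percent(major, total)
        print(major, total, value, classify_by_fmar(value))
```

    Under this reading, a locus falling between the two ranges would correspond to the allelic imbalance the abstract mentions for poorly performing loci, although the abstract does not state the rule that was actually used.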

  1. Large-scale enrichment and discovery of gene-associated SNPs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...

  2. Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA Segments

    PubMed Central

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  3. SNPs for parentage testing and traceability in globally diverse breeds of sheep

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA-based parentage determination accelerates genetic improvement by increasing pedigree accuracy. However, the utility of any “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities...

  4. Mining SNPs and Indels in Mung Bean (Vigna radiata) by Ecotilling

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ecotilling is a powerful genetic analysis tool. It can provide rapid identification of naturally occurring Single Nucleotide Polymorphisms (SNPs) and small insertion/deletions (indels) in a pool of accessions for a gene of interest. This technique eliminates the time consuming and expensive proced...

  5. Assessing SNPs versus RAPDs for predicting heterogeneity and screening efficiency in wild potato (Solanum) species

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Knowing how genetic diversity is partitioned among and within wild potato species populations is important for efficient sampling for collection, preservation and evaluation. We sought to evaluate the effectiveness of SNPs for assessing germplasm by using the exact set of four model species previous...

  6. Angiogenic, neurotrophic, and inflammatory system SNPs moderate the association between birth weight and ADHD symptom severity.

    PubMed

    Smith, Taylor F; Anastopoulos, Arthur D; Garrett, Melanie E; Arias-Vasquez, Alejandro; Franke, Barbara; Oades, Robert D; Sonuga-Barke, Edmund; Asherson, Philip; Gill, Michael; Buitelaar, Jan K; Sergeant, Joseph A; Kollins, Scott H; Faraone, Stephen V; Ashley-Koch, Allison

    2014-12-01

    Low birth weight is associated with increased risk for Attention-Deficit/Hyperactivity Disorder (ADHD); however, the etiological underpinnings of this relationship remain unclear. This study investigated if genetic variants in angiogenic, dopaminergic, neurotrophic, kynurenine, and cytokine-related biological pathways moderate the relationship between birth weight and ADHD symptom severity. A total of 398 youth from two multi-site, family-based studies of ADHD were included in the analysis. The sample consisted of 360 ADHD probands, 21 affected siblings, and 17 unaffected siblings. A set of 164 SNPs from 31 candidate genes, representing five biological pathways, were included in our analyses. Birth weight and gestational age data were collected from a state birth registry, medical records, and parent report. Generalized Estimating Equations tested for main effects and interactions between individual SNPs and birth weight centile in predicting ADHD symptom severity. SNPs within neurotrophic (NTRK3) and cytokine genes (CNTFR) were associated with ADHD inattentive symptom severity. There was no main effect of birth weight centile on ADHD symptom severity. SNPs within angiogenic (NRP1 & NRP2), neurotrophic (NTRK1 & NTRK3), cytokine (IL16 & S100B), and kynurenine (CCBL1 & CCBL2) genes moderate the association between birth weight centile and ADHD symptom severity. The SNP main effects and SNP × birth weight centile interactions remained significant after adjusting for multiple testing. Genetic variability in angiogenic, neurotrophic, and inflammatory systems may moderate the association between restricted prenatal growth, a proxy for an adverse prenatal environment, and risk to develop ADHD. PMID:25346392

  7. Prioritization of candidate SNPs in colon cancer using bioinformatics tools: an alternative approach for a cancer biologist.

    PubMed

    George Priya Doss, C; Rajasekaran, R; Arjun, P; Sethumadhavan, Rao

    2010-12-01

    The genetics of human phenotypic variation, and especially the genetic basis of complex human diseases, could be understood by knowing the functions of single nucleotide polymorphisms (SNPs). The main goal of this work is to predict deleterious non-synonymous SNPs (nsSNPs), so that the number of SNPs screened for association with disease can be reduced to those most likely to alter gene function. Using computational tools, we analyzed the SNPs that can alter the expression and function of genes involved in colon cancer. To explore possible relationships between genetic mutation and phenotypic variation, different computational tools, including Sorting Intolerant from Tolerant (an evolutionary-based approach), Polymorphism Phenotyping (a structure-based approach), PupaSuite, UTRScan and FASTSNP, were used to prioritize high-risk SNPs in coding regions (exonic nonsynonymous SNPs) and non-coding regions (intronic and exonic 5' and 3'-untranslated region (UTR) SNPs). We developed a semi-quantitative relative ranking strategy (given the non-availability of 3D structures) that can be adapted to a priori SNP selection or post hoc evaluation of variants identified in whole genome scans or within haplotype blocks associated with disease. Lastly, we analyzed haplotype tagging SNPs (htSNPs) in the coding and untranslated regions of all the genes using the force tag SNPs selection in iHAP analysis. The computational architecture proposed in this review is based on integrating relevant biomedical information sources to provide a systematic analysis of complex diseases. We have shown a "real world" application of interesting existing bioinformatics tools for SNP analysis in colon cancer. PMID:21153778

  8. lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse

    PubMed Central

    Gong, Jing; Liu, Wei; Zhang, Jiayou; Miao, Xiaoping; Guo, An-Yuan

    2015-01-01

    Long non-coding RNAs (lncRNAs) play key roles in various cellular contexts and diseases by diverse mechanisms. With the rapid growth of identified lncRNAs and disease-associated single nucleotide polymorphisms (SNPs), there is a great demand to study SNPs in lncRNAs. Aiming to provide a useful resource about lncRNA SNPs, we systematically identified SNPs in lncRNAs and analyzed their potential impacts on lncRNA structure and function. In total, we identified 495 729 and 777 095 SNPs in more than 30 000 lncRNA transcripts in human and mouse, respectively. A large number of SNPs were predicted with the potential to impact on the miRNA–lncRNA interaction. The experimental evidence and conservation of miRNA–lncRNA interaction, as well as miRNA expressions from TCGA were also integrated to prioritize the miRNA–lncRNA interactions and SNPs on the binding sites. Furthermore, by mapping SNPs to GWAS results, we found that 142 human lncRNA SNPs are GWAS tagSNPs and 197 827 lncRNA SNPs are in the GWAS linkage disequilibrium regions. All these data for human and mouse lncRNAs were imported into lncRNASNP database (http://bioinfo.life.hust.edu.cn/lncRNASNP/), which includes two sub-databases lncRNASNP-human and lncRNASNP-mouse. The lncRNASNP database has a user-friendly interface for searching and browsing through the SNP, lncRNA and miRNA sections. PMID:25332392

  9. Identification of putative SNPs in progressive retinal atrophy affected Canis lupus familiaris using exome sequencing.

    PubMed

    Reddy, Bhaskar; Kelawala, Divyesh N; Shah, Tejas; Patel, Anand B; Patil, Deepak B; Parikh, Pinesh V; Patel, Namrata; Parmar, Nidhi; Mohapatra, Amit B; Singh, Krishna M; Menon, Ramesh; Pandya, Dipal; Jakhesara, Subhash J; Koringa, Prakash G; Rao, Mandava V; Joshi, Chaitanya G

    2015-12-01

    Progressive retinal atrophy (PRA) is one of the major causes of retinal photoreceptor cell degeneration in canines. The inheritance pattern of PRA is autosomal recessive and genetically heterogeneous. Here, using targeted sequencing technology, we performed exome sequencing of 10 PRA-affected (Spitz=7, Cocker Spaniel=1, Lhasa Apso=1 and Spitz-Labrador cross breed=1) and 6 normal (Spitz=5, Cocker Spaniel=1) dogs. High-throughput sequencing using a 454-Roche Titanium sequencer generated about 2.16 gigabases of raw data. Initially, we identified 25,619 single nucleotide polymorphisms (SNPs) that passed the stringent SNP calling parameters. Further, we performed an association study on the cohort, and the highly significant (0.001) associations were short-listed and investigated in depth. Out of the 171 significant SNPs, 113 were previously unreported. Interestingly, six among them were non-synonymous coding (NSC) SNPs, which include CPPED1 A>G (p.M307V), PITRM1 T>G (p.S715A), APP G>A (p.T266M), RNF213 A>G (p.V1482A), C>A (p.V1456L), and SLC46A3 G>A (p.R168Q). On the other hand, 35 of the 113 unreported SNPs fell in regulatory regions such as the 3'-UTR and 5'-UTR. In-depth bioinformatics analysis revealed that the majority of the NSC SNPs have a damaging effect and alter protein stability. This study highlights the genetic markers associated with PRA, which will help to develop genetic assay-based screening for effective breeding. PMID:26515695

  10. Association of Sirtuin 1 (SIRT1) Gene SNPs and Transcript Expression Levels With Severe Obesity

    PubMed Central

    Clark, Stephen J.; Falchi, Mario; Olsson, Bob; Jacobson, Peter; Cauchi, Stéphane; Balkau, Beverley; Marre, Michel; Lantieri, Olivier; Andersson, Johanna C.; Jernås, Margareta; Aitman, Timothy J.; Richardson, Sylvia; Sjöström, Lars; Wong, Hang Y.; Carlsson, Lena M. S.; Froguel, Philippe; Walley, Andrew J.

    2013-01-01

    Recent studies have reported associations of sirtuin 1 (SIRT1) single nucleotide polymorphisms (SNPs) with both obesity and BMI. This study was designed to investigate the association between SIRT1 SNPs, SIRT1 gene expression and obesity. Case-control analyses were performed using 1,533 obese subjects (896 adults, BMI >40 kg/m², and 637 children, BMI >97th percentile for age and sex) and 1,237 nonobese controls, all French Caucasians. Two SNPs (in high linkage disequilibrium (LD), r² = 0.96) were significantly associated with adult obesity, rs33957861 (P value = 0.003, odds ratio (OR) = 0.75, confidence interval (CI) = 0.61–0.92) and rs11599176 (P value = 0.006, OR = 0.74, CI = 0.61–0.90). Expression of SIRT1 mRNA was measured in BMI-discordant siblings from 154 Swedish families. Transcript expression was significantly correlated with BMI in the lean siblings (r² = 0.13, P value = 3.36 × 10⁻⁷) and lower SIRT1 expression was associated with obesity (P value = 1.56 × 10⁻³⁵). There was also an association between four SNPs (rs11599176, rs12413112, rs33957861, and rs35689145) and BMI (P values: 4 × 10⁻⁴, 6 × 10⁻⁴, 4 × 10⁻⁴, and 2 × 10⁻³), with the rare allele associated with a lower BMI. However, no SNP was associated with SIRT1 transcript expression level. In summary, both SNPs and SIRT1 gene expression are associated with severe obesity. PMID:21760635
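    The odds ratios and confidence intervals reported in this abstract are the usual summaries of a case-control comparison. The sketch below shows one generic way to compute them from a 2×2 table, using the standard log-odds (Woolf) approximation for the interval. It is not the authors' analysis: the abstract does not say which test or genetic model was used, and the counts in the example are invented for illustration.

```python
import math


def odds_ratio_ci(case_exposed: int, case_unexposed: int,
                  control_exposed: int, control_unexposed: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval uses the normal approximation on the log odds ratio;
    z = 1.96 corresponds to an approximate 95% confidence interval.
    """
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, not data from the study.
    print(odds_ratio_ci(10, 20, 20, 10))
```

    An odds ratio below 1 whose interval excludes 1, as reported for rs33957861 and rs11599176, indicates that the counted allele is less common among cases than among controls.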

  11. MiR-SNPs as Markers of Toxicity and Clinical Outcome in Hodgkin Lymphoma Patients

    PubMed Central

    Navarro, Alfons; Muñoz, Carmen; Gaya, Anna; Díaz-Beyá, Marina; Gel, Bernat; Tejero, Rut; Díaz, Tania; Martinez, Antonio; Monzó, Mariano

    2013-01-01

    Background: In recent years, microRNA (miRNA) pathways have emerged as a crucial system for the regulation of tumorigenesis. miR-SNPs are a novel class of single nucleotide polymorphisms that can affect miRNA pathways. Design and Methods: We analyzed eight miR-SNPs by allelic discrimination in 141 patients with Hodgkin lymphoma and correlated the results with treatment-related toxicity, response, disease-free survival (DFS) and overall survival (OS). Results: The KRT81 (rs3660) GG genotype was associated with an increased risk of neurological toxicity (P = 0.016), while patients with XPO5 (rs11077) AA or CC genotypes had a higher rate of bleomycin-associated pulmonary toxicity (P = 0.048). Both miR-SNPs emerged as independent factors in the multivariate analysis. The XPO5 AA and CC genotypes were also associated with a lower response rate (P = 0.036). XPO5 (P = 0.039) and TRBP (rs784567) (P = 0.022) genotypes emerged as prognostic markers for DFS, and XPO5 was also associated with OS (P = 0.033). In the multivariate analysis, only XPO5 emerged as an independent prognostic factor for DFS (HR: 2.622; 95%CI 1.039–6.620; P = 0.041). Given the influence of XPO5 and TRBP as individual markers, we then investigated the combined effect of these miR-SNPs. Patients with both the XPO5 AA/CC and TRBP TT/TC genotypes had the shortest DFS (P = 0.008) and OS (P = 0.008). Conclusion: miR-SNPs can add useful prognostic information on treatment-related toxicity and clinical outcome in Hodgkin lymphoma and can be used to identify patients likely to be chemoresistant or to relapse. PMID:23705004

  12. Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation.

    PubMed

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R; Lin, Xihong

    2015-06-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  13. Multimodal MRI-based imputation of the Aβ+ in early mild cognitive impairment

    PubMed Central

    Tosun, Duygu; Joshi, Sarang; Weiner, Michael W; for the Alzheimer's Disease Neuroimaging Initiative

    2014-01-01

    Objective: The primary goal of this study was to identify brain atrophy from structural MRI (magnetic resonance imaging) and cerebral blood flow (CBF) patterns from arterial spin labeling perfusion MRI that are the best predictors of the Aβ-burden, measured as composite 18F-AV45-PET (positron emission tomography) uptake, in individuals with early mild cognitive impairment (MCI). Furthermore, another objective was to assess the relative importance of imaging modalities in classification of Aβ+/Aβ− early MCI. Methods: Sixty-seven Alzheimer's Disease Neuroimaging Initiative (ADNI)-GO/2 participants with early MCI were included. Voxel-wise anatomical shape variation measures were computed by estimating the initial diffeomorphic mapping momenta from an unbiased control template. CBF measures normalized to average motor cortex CBF were mapped onto the template space. Using partial least squares regression, we identified the structural and CBF signatures of Aβ after accounting for normal confounding effects of age, gender, and education. Results: 18F-AV45-positive early MCIs could be identified with 83% classification accuracy, 87% positive predictive value, and 84% negative predictive value by multidisciplinary classifiers combining demographics data, ApoE ε4-genotype, and a multimodal MRI-based Aβ score. Interpretation: Multimodal MRI can be used to predict the amyloid status of early-MCI individuals. MRI is a very attractive candidate for the identification of inexpensive and noninvasive surrogate biomarkers of Aβ deposition. Our approach is expected to have value for the identification of individuals likely to be Aβ+ in circumstances where cost or logistical problems prevent Aβ detection using cerebrospinal fluid analysis or Aβ-PET. This can also be used in clinical settings and clinical trials, aiding subject recruitment and evaluation of treatment efficacy. Imputation of the Aβ-positivity status could also complement Aβ-PET by identifying individuals who would

  14. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    PubMed Central

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  15. Multiple imputation in veterinary epidemiological studies: a case study and simulation.

    PubMed

    Dohoo, Ian R; Nielsen, Christel R; Emanuelson, Ulf

    2016-07-01

    The problem of missing data occurs frequently in veterinary epidemiological studies. Most studies use a complete case (CC) analysis which excludes all observations for which any relevant variable has missing values. Alternative approaches (most notably multiple imputation (MI)) which avoid the exclusion of observations with missing values are now widely available but have been used very little in veterinary epidemiology. This paper uses a case study based on research into dairy producers' attitudes toward mastitis control procedures, combined with two simulation studies to evaluate the use of MI and compare results with a CC analysis. MI analysis of the original data produced results which had relatively minor differences from the CC analysis. However, most of the missing data in the original data set were in the dependent variable and a subsequent simulation study based on the observed missing data pattern and 1000 simulations showed that an MI analysis would not be expected to offer any advantages over a CC analysis in this situation. This was true regardless of the missing data mechanism (MCAR - missing completely at random, MAR - missing at random, or NMAR - not missing at random) underlying the missing values. Surprisingly, recent textbooks dealing with MI make little reference to this limitation of MI for dealing with missing values in the dependent variable. An additional simulation study (1000 runs for each of the three missing data mechanisms) compared MI and CC analyses for data in which varying levels (n=7) of missing data were created in predictor variables. This study showed that MI analyses generally produced results that were less biased on average, were more precise (smaller SEs), were more consistent (less variability between simulation runs) and consequently were more likely to produce estimates that were close to the "truth" (results obtained from a data set with no missing values). While the benefit of MI varied with the mechanism used to

  16. eQuIPS: eQTL Analysis Using Informed Partitioning of SNPs - A Fully Bayesian Approach.

    PubMed

    Boggis, E M; Milo, M; Walters, K

    2016-05-01

    We develop a Bayesian multi-SNP Markov chain Monte Carlo approach that allows published functional significance scores to objectively inform single nucleotide polymorphism (SNP) prior effect sizes in expression quantitative trait locus (eQTL) studies. We developed the Normal Gamma prior to allow the inclusion of functional information. We partition SNPs into predefined functional groups and select prior distributions that fit the group-specific observed functional significance scores. We test our method on two simulated datasets and previously analysed human eQTL data containing validated causal SNPs. In our simulations the modified Normal Gamma always performs at least as well, and generally outperforms, the other methods considered. When analysing the human eQTL data, we placed all SNPs into their actual functional group. The ranks of the four validated causal SNPs analysed using the modified Normal Gamma increase dramatically compared to those of the other methods considered. Using our new method, three of the four validated SNPs are ranked in the top 1% of SNPs and the other is in the top 2%. For the standard Normal Gamma, the best of the other methods, the four validated SNPs had ranks in the top 1%, 4%, 20% and 59%. Crucially these substantive improvements in the ranks make it highly likely that most, if not all, of these validated SNPs would have been flagged for follow-up using our new method, whereas at least two of them would certainly not have been using the current approaches. PMID:26989050

  17. Application of Population Sequencing (POPSEQ) for Ordering and Imputing Genotyping-by-Sequencing Markers in Hexaploid Wheat

    PubMed Central

    Edae, Erena A.; Bowden, Robert L.; Poland, Jesse

    2015-01-01

    The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based, high-resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for association mapping and genomics-assisted breeding is limited by a large proportion of missing data per marker. For species with a reference genomic sequence, markers can be ordered on the physical map. However, in the absence of reference marker order, the use and imputation of GBS markers is challenging. Here, we demonstrate how the population sequencing (POPSEQ) approach can be used to provide marker context for GBS in wheat. The utility of a POPSEQ-based genetic map as a reference map to create genetically ordered markers on a chromosome for hexaploid wheat was validated by constructing an independent de novo linkage map of GBS markers from a Synthetic W7984 × Opata M85 recombinant inbred line (SynOpRIL) population. The results indicated that there is strong agreement between the independent de novo linkage map and the POPSEQ mapping approach in mapping and ordering GBS markers for hexaploid wheat. After ordering, a large number of GBS markers were imputed, thus providing a high-quality reference map that can be used for QTL mapping for different traits. The POPSEQ-based reference map and whole-genome sequence assemblies are valuable resources that can be used to order GBS markers and enable the application of highly accurate imputation methods to leverage the application of GBS markers in wheat. PMID:26530417

  18. Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene.

    PubMed

    Hassan, Mohamed M; Omer, Shaza E; Khalf-Allah, Rahma M; Mustafa, Razaz Y; Ali, Isra S; Mohamed, Sofia B

    2016-01-01

    This study examined Homo sapiens variations (SNPs/indels) in the coding and non-coding regions of the BRAF gene. Variant data were obtained from the SNP database as of its November 2015 update. Several bioinformatics tools were used to identify SNPs and indels with functional effects on protein function, structure and expression. For coding polymorphisms, 111 SNPs were predicted to be highly damaging and six others less so. For UTRs, five SNPs and one indel altered microRNA binding sites (3' UTR), whereas no SNP or indel functionally altered transcription factor binding sites (5' UTR). For 5'/3' splice sites, one SNP within a 5' splice site and one indel in a 3' splice site showed potential alteration of splicing. In conclusion, these functional SNPs and indels could lead to gene alteration, which may directly or indirectly contribute to the occurrence of many diseases. PMID:27478437

  19. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    PubMed

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets. PMID:11252602

  20. HEPA filter dissolution process

    SciTech Connect

    Brewer, K.N.; Murphy, J.A.

    1992-12-31

    This invention is comprised of a process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  1. Recirculating electric air filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  2. Hepa filter dissolution process

    DOEpatents

    Brewer, Ken N.; Murphy, James A.

    1994-01-01

    A process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  3. HEPA filter dissolution process

    DOEpatents

    Brewer, K.N.; Murphy, J.A.

    1994-02-22

    A process is described for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal. 4 figures.

  4. Recirculating electric air filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  5. A Comprehensive Survey of Single Nucleotide Polymorphisms (SNPs) across Mycobacterium bovis Strains and M. bovis BCG Vaccine Strains Refines the Genealogy and Defines a Minimal Set of SNPs That Separate Virulent M. bovis Strains and M. bovis BCG Strains

    PubMed Central

    Garcia Pelayo, M. Carmen; Uplekar, Swapna; Keniry, Andrew; Mendoza Lopez, Pablo; Garnier, Thierry; Nunez Garcia, Javier; Boschiroli, Laura; Zhou, Xiangmei; Parkhill, Julian; Smith, Noel; Hewinson, R. Glyn; Cole, Stewart T.; Gordon, Stephen V.

    2009-01-01

    To further unravel the mechanisms responsible for attenuation of the tuberculosis vaccine Mycobacterium bovis BCG, comparative genomics was used to identify single nucleotide polymorphisms (SNPs) that differed between sequenced strains of Mycobacterium bovis and M. bovis BCG. SNPs were assayed in M. bovis isolates from France and the United Kingdom and from different BCG vaccines in order to identify those that arose during the attenuation process which gave rise to BCG. Informative data sets were obtained for 658 SNPs from 21 virulent M. bovis strains and 13 BCG strains; these SNPs showed phylogenetic clustering that was consistent with the geographical origin of the strains and previous schemes for BCG genealogies. The data revealed a closer relationship between BCG Tice and BCG Pasteur than was previously appreciated, while we were able to position BCG Beijing within a grouping of BCG Denmark-derived strains. Only 186 SNPs were identified between virulent M. bovis strains and all BCG strains, with 115 nonsynonymous SNPs affecting important functions such as global regulators, transcriptional factors, and central metabolism, which might impact on virulence. We therefore refine previous genealogies of BCG vaccines and define a minimal set of SNPs between virulent M. bovis strains and the attenuated BCG strain that will underpin future functional analyses. PMID:19289514

  6. SNPhood: investigate, quantify and visualise the epigenomic neighbourhood of SNPs using NGS data

    PubMed Central

    Arnold, Christian; Bhat, Pooja; Zaugg, Judith B.

    2016-01-01

    Motivation: The vast majority of the many thousands of disease-associated single nucleotide polymorphisms (SNPs) lie in the non-coding part of the genome. They are likely to affect regulatory elements, such as enhancers and promoters, rather than the function of a protein. To understand the molecular mechanisms underlying genetic diseases, it is therefore increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin or transcription factor binding. Results: We developed SNPhood, a user-friendly Bioconductor R package to investigate, quantify and visualise the local epigenetic neighbourhood of a set of SNPs in terms of chromatin marks or TF binding sites using data from NGS experiments. Availability and implementation: SNPhood is publicly available and maintained as an R Bioconductor package at http://bioconductor.org/packages/SNPhood/. Contact: judith.zaugg@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153574

  7. Identification of SNPs associated with susceptibility for development of adverse reactions to radiotherapy.

    PubMed

    Rosenstein, Barry S

    2011-02-01

    Although cancer treatment with radiation can produce high cure rates, adverse effects often result from radiotherapy. These toxicities are manifested as damage to normal tissues and organs in the radiation field. In recognition of the substantial variation in the intrinsic response of individuals to radiation, an effort began approximately 10 years ago to discover the genetic markers, primarily SNPs, which are associated with susceptibility for the development of these adverse responses to radiation therapy. The goal of this research is to identify the SNPs that could serve as the basis of an assay to predict which cancer patients are most likely to develop complications resulting from radiotherapy. This would permit personalization and optimization of the treatment plan for each cancer patient. PMID:21332318

  8. SNPs in microRNA Binding Sites as Prognostic and Predictive Cancer Biomarkers

    PubMed Central

    Preskill, Carina; Weidhaas, Joanne B.

    2014-01-01

    Single nucleotide polymorphisms within microRNA (miRNA) binding sites comprise a novel genre of cancer biomarkers. Since miRNA regulation is dependent on sequence complementarity between the mRNA transcript and the miRNA, even single nucleotide aberrations can have significant effects. Over the past few years, many examples of these functional miRNA binding site SNPs have been identified as cancer biomarkers. While most of the research to date focuses on associations with cancer risk, more and more studies are linking these SNPs to cancer prognosis and response to treatment as well. This review summarizes the state of the field and draws importance to this rapidly expanding area of cancer biomarkers. PMID:23614619

  9. SNPs in microRNA binding sites as prognostic and predictive cancer biomarkers.

    PubMed

    Preskill, Carina; Weidhaas, Joanne B

    2013-01-01

    Single-nucleotide polymorphisms within microRNA (miRNA) binding sites comprise a novel genre of cancer biomarkers. Since miRNA regulation is dependent on sequence complementarity between the mRNA transcript and the miRNA, even single-nucleotide aberrations can have significant effects. Over the past few years, many examples of these functional miRNA binding site SNPs have been identified as cancer biomarkers. While most of the research to date focuses on associations with cancer risk, more and more studies are linking these SNPs to cancer prognosis and response to treatment as well. This review summarizes the state of the field and draws importance to this rapidly expanding area of cancer biomarkers. PMID:23614619

  10. Properties of multilayer filters

    NASA Technical Reports Server (NTRS)

    Baumeister, P. W.

    1973-01-01

    New methods of using optical interference coatings to produce bandpass filters for the spectral region 110 nm to 200 nm were investigated. The types of filter are: triple-cavity metal-dielectric filters; all-dielectric reflection filters; and all-dielectric Fabry-Perot type filters. The latter two types use thorium fluoride and either cryolite films or magnesium fluoride films in the stacks. The optical properties of the thorium fluoride were also measured.

  11. Endothelial nitric oxide synthase tagSNPs influence the effects of enalapril in essential hypertension.

    PubMed

    Oliveira-Paula, Gustavo H; Lacchini, Riccardo; Luizon, Marcelo R; Fontana, Vanessa; Silva, Pamela S; Biagi, Celso; Tanus-Santos, Jose E

    2016-05-01

    The antihypertensive effects of angiotensin-converting enzyme inhibitors (ACEi) are associated with up-regulation of endothelial nitric oxide synthase (NOS3) activity. This mechanism may explain how polymorphisms in the NOS3 gene affect the antihypertensive responses to ACEi. While clinically relevant NOS3 polymorphisms were previously shown to affect the antihypertensive responses to enalapril, no study has tested the hypothesis that NOS3 tagSNPs influence the antihypertensive effects of this drug. We examined whether the NOS3 tagSNPs rs3918226, rs3918188, and rs743506, and their haplotypes, affect the antihypertensive responses to enalapril in 101 patients with essential hypertension. Subjects were prospectively treated only with enalapril for 8 weeks. Genotypes were determined by TaqMan® allele discrimination assay and real-time polymerase chain reaction (PCR), and haplotype frequencies were estimated. We compared the effects of NOS3 tagSNPs on changes in blood pressure after enalapril treatment. To confirm our findings, multiple linear regression analysis was performed adjusting for age, gender, ethnicity, and alcohol consumption. We found that hypertensive patients carrying the AA genotype for the tagSNP rs3918188 showed lower decreases in blood pressure in response to enalapril. Moreover, the TCA haplotype was associated with improved decreases in blood pressure in response to enalapril compared with the CAG haplotype. Adjustment for covariates in multiple linear regression analysis did not change these effects. In addition, when patients were stratified according to the dose of enalapril used, we found that carriers of the T allele for the functional tagSNP rs3918226 showed more intense decreases in blood pressure in response to enalapril 20 mg/day. Our findings suggest that NOS3 tagSNPs influence the effects of enalapril in essential hypertension. PMID:27060232

  12. Genome-wide association studies using haplotypes and individual SNPs in Simmental cattle.

    PubMed

    Wu, Yang; Fan, Huizhong; Wang, Yanhui; Zhang, Lupei; Gao, Xue; Chen, Yan; Li, Junya; Ren, HongYan; Gao, Huijiang

    2014-01-01

    Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components. PMID:25330174

  13. Genome-Wide Association Studies Using Haplotypes and Individual SNPs in Simmental Cattle

    PubMed Central

    Wu, Yang; Fan, Huizhong; Wang, Yanhui; Zhang, Lupei; Gao, Xue; Chen, Yan; Li, Junya; Ren, HongYan; Gao, Huijiang

    2014-01-01

    Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components. PMID:25330174

  14. A systematic confirmation study of reported prostate cancer risk-associated SNPs in Chinese men

    PubMed Central

    Liu, Fang; Hsing, Ann W.; Wang, Xiang; Shao, Qiang; Qi, Jun; Ye, Yu; Wang, Zhong; Chen, Hongyan; Gao, Xin; Wang, Guozeng; Chu, Lisa W.; Ding, Qiang; OuYang, Jun; Gao, Xu; Huang, Yichen; Chen, Yanbo; Gao, Yu Tang; Zhang, Zuo-Feng; Rao, Jianyu; Shi, Rong; Wu, Qijun; Wang, Meilin; Zhang, Zhengdong; Zhang, Yuanyuan; Jiang, Haowen; Zheng, Jie; Hu, Yanlin; Guo, Ling; Lin, Xiaoling; Tao, Sha; Jin, Guangfu; Sun, Jielin; Lu, Daru; Zheng, S. Lilly; Sun, Yinghao; Mo, Zengnan; Xu, Jianfeng

    2013-01-01

    More than 30 prostate cancer (PCa) risk-associated loci have been identified in populations of European descent by genome-wide association studies (GWAS). We hypothesized that a subset of these loci may be associated with PCa risk in Chinese men. To test this hypothesis, 33 single nucleotide polymorphisms (SNPs), one each from the 33 independent PCa risk-associated loci reported in populations of European descent, were investigated for their associations with PCa risk in a case-control study of Chinese men (1,108 cases and 1,525 controls). We found that 11 of the 33 SNPs were significantly associated with PCa risk in Chinese men (P < 0.05). The reported risk alleles were associated with increased risk for PCa, with allelic odds ratios ranging from 1.12 to 1.44. The most significant locus was located on 8q24 Region 2 (rs16901979, P = 5.14×10^-9) with a genome-wide significance (P < 10^-8), and three loci reached the Bonferroni correction significance level (P < 1.52×10^-3), including 8q24 Region 1 (rs1447295, P = 7.04×10^-6), 8q24 Region 5 (rs10086908, P = 9.24×10^-4), and 8p21 (rs1512268, P = 9.39×10^-4). Our results suggest that a subset of the PCa risk-associated SNPs discovered by GWAS among men of European descent is also associated with PCa risk in Chinese men. This finding provides evidence of ethnic differences and similarity in genetic susceptibility to PCa. GWAS in Chinese men are needed to identify Chinese-specific PCa risk-associated SNPs. PMID:21756274
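    The Bonferroni level quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch below assumes a family-wise alpha of 0.05 divided across the 33 SNPs tested; the abstract does not state alpha explicitly, but that choice matches the reported 1.52×10^-3. The function name is illustrative only, and the P values are the ones listed in the abstract.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# P values quoted in the abstract (no other data are assumed).
reported_p = {
    "rs16901979": 5.14e-9,
    "rs1447295": 7.04e-6,
    "rs10086908": 9.24e-4,
    "rs1512268": 9.39e-4,
}

# Assumption: alpha = 0.05 spread over the 33 SNPs tested.
threshold = bonferroni_threshold(0.05, 33)

# SNPs whose reported P value falls below the corrected threshold.
passing = sorted(snp for snp, p in reported_p.items() if p < threshold)

print(f"Bonferroni threshold: {threshold:.2e}")
print("SNPs below threshold:", passing)
```

    Under that assumption, all four SNPs named in the abstract fall below the corrected level, which is consistent with its description of one genome-wide significant locus plus three further loci reaching Bonferroni significance.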

  15. Association of obesity risk SNPs in PCSK1 with insulin sensitivity and proinsulin conversion

    PubMed Central

    2010-01-01

    Background: Prohormone convertase 1 is involved in maturation of peptides. Rare mutations in gene PCSK1, encoding this enzyme, cause childhood obesity and abnormal glucose homeostasis with elevated proinsulin concentrations. Common single nucleotide polymorphisms (SNPs) within this gene, rs6232 and rs6235, are associated with obesity. We studied whether these SNPs influence the prediabetic traits insulin resistance, β-cell dysfunction, or glucose intolerance. Methods: We genotyped 1498 German subjects for SNPs rs6232 and rs6235 within PCSK1. The subjects were metabolically characterized by oral glucose tolerance test with glucose, insulin, proinsulin, and C-peptide measurements. A subgroup of 512 subjects underwent a hyperinsulinemic-euglycemic clamp. Results: The minor allele frequencies were 25.8% for SNP rs6235 and 6.0% for rs6232. After adjustment for sex and age, we found no association of SNPs rs6235 and rs6232 with BMI or other weight-related traits (all p ≥ 0.07). Both minor alleles, adjusted for sex, age, BMI and insulin sensitivity, were associated with elevated AUC_proinsulin and AUC_proinsulin/AUC_insulin (rs6235: p_additive model ≤ 0.009, effect sizes 8/8%, rs6232: p_dominant model ≤ 0.01, effect sizes 10/21%). Insulin secretion was not affected by the variants (different secretion parameters, all p ≥ 0.08). The minor allele of SNP rs6232 was additionally associated with 15% higher OGTT-derived and 19% higher clamp-derived insulin sensitivity (p_dom ≤ 0.0047), 4.5% lower HOMA-IR (p_dom = 0.02) and 3.5% lower 120-min glucose (p_dom = 0.0003) independently of BMI and proinsulin conversion. SNP rs6235 was not associated with parameters of glucose metabolism. Conclusions: Like rare mutations in PCSK1, the more common variants tested determine glucose-stimulated proinsulin conversion, but not insulin secretion. In addition, rs6232, encoding the amino acid exchange N221D, influences insulin sensitivity and glucose homeostasis. PMID:20534142

  16. Biological implications of SNPs in signal peptide domains of human proteins.

    PubMed

    Jarjanazi, Hamdi; Savas, Sevtap; Pabalan, Noel; Dennis, James W; Ozcelik, Hilmi

    2008-02-01

    Proteins destined for secretion or membrane compartments possess signal peptides for insertion into the membrane. The signal peptide is therefore critical for localization and function of cell surface receptors and ligands that mediate cell-cell communication. About 4% of all human proteins listed in the UniProt database have signal peptide domains at their N termini. A comprehensive literature survey was performed to retrieve functional and disease-associated genetic variants in the signal peptide domains of human proteins. In 21 human proteins we have identified 26 disease-associated mutations within their signal peptide domains, 14 of which have been experimentally shown to impair the signal peptide function and thus influence protein transportation. We took advantage of SignalP 3.0 predictions to characterize the signal peptide prediction score differences between the mutant and the wild-type alleles of each mutation, as well as 189 previously uncharacterized single nucleotide polymorphisms (SNPs) found to be located in the signal peptide domains of 165 human proteins. Comparisons of signal peptide prediction outcomes of mutations and SNPs have implicated SNPs potentially impacting the signal peptide function, and thus the cellular localization of the human proteins. The majority of the top candidate proteins represented membrane and secreted proteins that are associated with molecular transport, cell signaling and cell-to-cell interaction processes of the cell. This is the first study that systematically characterizes genetic variation occurring in the signal peptides of all human proteins. This study represents a useful strategy for prioritization of SNPs occurring within the signal peptide domains of human proteins. Functional evaluation of candidates identified herein may reveal effects on major cellular processes including immune cell function, cell recognition and adhesion, and signal transduction. PMID:17680692

  17. Enrichment of risk SNPs in regulatory regions implicate diverse tissues in Parkinson's disease etiology.

    PubMed

    Coetzee, Simon G; Pierce, Steven; Brundin, Patrik; Brundin, Lena; Hazelett, Dennis J; Coetzee, Gerhard A

    2016-01-01

    Recent genome-wide association studies (GWAS) of Parkinson's disease (PD) revealed at least 26 risk loci, with associated single nucleotide polymorphisms (SNPs) located in non-coding DNA having unknown functions in risk. In order to explore in which cell types these SNPs (and their correlated surrogates at r² ≥ 0.8) could alter cellular function, we assessed their location overlap with histone modification regions that indicate transcription regulation in 77 diverse cell types. We found statistically significant enrichment of risk SNPs at 12 loci in active enhancers or promoters. We investigated 4 risk loci in depth that were most significantly enriched (-log_e P > 14) and contained 8 putative enhancers in the different cell types. These enriched loci, along with eQTL associations, were unexpectedly present in non-neuronal cell types. These included lymphocytes, mesendoderm, liver- and fat-cells, indicating that cell types outside the brain are involved in the genetic predisposition to PD. Annotating regulatory risk regions within specific cell types may unravel new putative risk mechanisms and molecular pathways that contribute to PD development. PMID:27461410

  18. Genetic association of SNPs in the FTO gene and predisposition to obesity in Malaysian Malays.

    PubMed

    Apalasamy, Y D; Ming, M F; Rampal, S; Bulgiba, A; Mohamed, Z

    2012-12-01

    The common variants in the fat mass- and obesity-associated (FTO) gene have been previously found to be associated with obesity in various adult populations. The objective of the present study was to investigate whether the single nucleotide polymorphisms (SNPs) and linkage disequilibrium (LD) blocks in various regions of the FTO gene are associated with predisposition to obesity in Malaysian Malays. Thirty-one FTO SNPs were genotyped in 587 (158 obese and 429 non-obese) Malaysian Malay subjects. Obesity traits and lipid profiles were measured and single-marker association testing, LD testing, and haplotype association analysis were performed. LD analysis of the FTO SNPs revealed the presence of 57 regions with complete LD (D' = 1.0). In addition, we detected the association of rs17817288 with low-density lipoprotein cholesterol. The FTO gene may therefore be involved in lipid metabolism in Malaysian Malays. Two haplotype blocks were present in this region of the FTO gene, but no particular haplotype was found to be significantly associated with an increased risk of obesity in Malaysian Malays. PMID:22911346
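    For readers unfamiliar with the "complete LD (D' = 1.0)" figure in this abstract, the sketch below shows the standard normalized disequilibrium coefficient for two biallelic loci. It is an illustration of the textbook definition, not the software used in the study; the function name and the example frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalized linkage disequilibrium |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d == 0:
        return 0.0
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return abs(d) / d_max


# Hypothetical frequencies: allele A only ever occurs together with B,
# so one of the four possible haplotypes is absent and |D'| reaches 1.
complete = d_prime(p_ab=0.3, p_a=0.3, p_b=0.5)

# Hypothetical frequencies with the loci statistically independent.
independent = d_prime(p_ab=0.15, p_a=0.3, p_b=0.5)

print(complete, independent)
```

    A value of 1 means at least one of the four possible haplotypes was not observed; it does not by itself imply that the two SNPs are perfectly correlated (that would be r² = 1).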

  19. Genetic association of SNPs in the FTO gene and predisposition to obesity in Malaysian Malays

    PubMed Central

    Apalasamy, Y.D.; Ming, M.F.; Rampal, S.; Bulgiba, A.; Mohamed, Z.

    2012-01-01

    The common variants in the fat mass- and obesity-associated (FTO) gene have been previously found to be associated with obesity in various adult populations. The objective of the present study was to investigate whether the single nucleotide polymorphisms (SNPs) and linkage disequilibrium (LD) blocks in various regions of the FTO gene are associated with predisposition to obesity in Malaysian Malays. Thirty-one FTO SNPs were genotyped in 587 (158 obese and 429 non-obese) Malaysian Malay subjects. Obesity traits and lipid profiles were measured and single-marker association testing, LD testing, and haplotype association analysis were performed. LD analysis of the FTO SNPs revealed the presence of 57 regions with complete LD (D' = 1.0). In addition, we detected the association of rs17817288 with low-density lipoprotein cholesterol. The FTO gene may therefore be involved in lipid metabolism in Malaysian Malays. Two haplotype blocks were present in this region of the FTO gene, but no particular haplotype was found to be significantly associated with an increased risk of obesity in Malaysian Malays. PMID:22911346

  20. Identification of Sex-Linked SNPs and Sex-Determining Regions in the Yellowtail Genome.

    PubMed

    Koyama, Takashi; Ozaki, Akiyuki; Yoshida, Kazunori; Suzuki, Junpei; Fuji, Kanako; Aoki, Jun-ya; Kai, Wataru; Kawabata, Yumi; Tsuzaki, Tatsuo; Araki, Kazuo; Sakamoto, Takashi

    2015-08-01

    Unlike the conservation of sex-determining (SD) modes seen in most mammals and birds, teleost fishes exhibit a wide variety of SD systems and genes. Hence, the study of SD genes and sex chromosome turnover in fish is one of the most interesting topics in evolutionary biology. To increase resolution of the SD gene evolutionary trajectory in fish, identification of the SD gene in more fish species is necessary. In this study, we focused on the yellowtail, a species widely cultivated in Japan. It is a member of family Carangidae in which no heteromorphic sex chromosome has been observed, and no SD gene has been identified to date. By performing linkage analysis and BAC walking, we identified a genomic region and SNPs with complete linkage to yellowtail sex. Comparative genome analysis revealed the yellowtail SD region ancestral chromosome structure as medaka-fugu. Two inversions occurred in the yellowtail lineage after it diverged from the yellowtail-medaka ancestor. An association study using wild yellowtails and the SNPs developed from BAC ends identified two SNPs that can reasonably distinguish the sexes. Therefore, these will be useful genetic markers for yellowtail breeding. Based on a comparative study, it was suggested that a PDZ domain containing the GIPC protein might be involved in yellowtail sex determination. The homomorphic sex chromosomes widely observed in the Carangidae suggest that this family could be a suitable marine fish model to investigate the early stages of sex chromosome evolution, for which our results provide a good starting point. PMID:25975833

  1. Concordant Gene Expression in Leukemia Cells and Normal Leukocytes Is Associated with Germline cis-SNPs

    PubMed Central

    French, Deborah; Yang, Wenjian; Hamilton, Leo H.; Neale, Geoffrey; Fan, Yiping; Downing, James R.; Cox, Nancy J.; Pui, Ching-Hon; Evans, William E.; Relling, Mary V.

    2008-01-01

    The degree to which gene expression covaries between different primary tissues within an individual is not well defined. We hypothesized that expression that is concordant across tissues is more likely influenced by genetic variability than gene expression which is discordant between tissues. We quantified expression of 11,873 genes in paired samples of primary leukemia cells and normal leukocytes from 92 patients with acute lymphoblastic leukemia (ALL). Genetic variation at >500,000 single nucleotide polymorphisms (SNPs) was also assessed. The expression of only 176/11,783 (1.5%) genes was correlated (p<0.008, FDR = 25%) in the two tissue types, but expression of a high proportion (20 of these 176 genes) was significantly related to cis-SNP genotypes (adjusted p<0.05). In an independent set of 134 patients with ALL, 14 of these 20 genes were validated as having expression related to cis-SNPs, as were 9 of 20 genes in a second validation set of HapMap cell lines. Genes whose expression was concordant among tissue types were more likely to be associated with germline cis-SNPs than genes with discordant expression in these tissues; genes affected were involved in housekeeping functions (GSTM2, GAPDH and NCOR1) and purine metabolism. PMID:18478092

  2. [Identification of Bletillae Rhizoma and its adulterants by SNPs in ITS2].

    PubMed

    Zhao, Dan; Zhou, Tao; Jiang, Wei-ke; Xiao, Cheng-hong; Kang, Chuan-zhi

    2015-09-01

    To establish a molecular identification method for Bletillae Rhizoma, genomic DNA was extracted from Bletillae Rhizoma and its adulterants. The rDNA ITS2 region was amplified and sequenced. Multiple alignments of the ITS2 sequences were then used to construct a neighbor-joining phylogenetic tree in MEGA 5.1 and to locate SNP loci. The results showed that the rDNA ITS2 region could distinguish Bletillae Rhizoma from its adulterants, and that SNP loci existed which could identify Bletilla striata and B. ochracea. Furthermore, we designed specific primers against the SNP loci of B. striata and B. ochracea, then screened the primers and optimized the PCR amplification conditions. Finally, the DNA of B. striata and B. ochracea was specifically amplified by BJ59-412F, BJ59-412R and HHBJ-225R. The amplification products were about 350 bp and 520 bp long, respectively, which effectively identified B. striata and B. ochracea, whereas the adulterants of Bletillae Rhizoma yielded no amplification product. In summary, under these amplification conditions the primers can identify B. striata, B. ochracea and their adulterants at the same time. This method is easy, time-saving, and reliable, and can be used for rapid molecular identification of Bletillae Rhizoma. PMID:26983202

  3. Enrichment of risk SNPs in regulatory regions implicate diverse tissues in Parkinson’s disease etiology

    PubMed Central

    Coetzee, Simon G.; Pierce, Steven; Brundin, Patrik; Brundin, Lena; Hazelett, Dennis J.; Coetzee, Gerhard A.

    2016-01-01

    Recent genome-wide association studies (GWAS) of Parkinson’s disease (PD) revealed at least 26 risk loci, with associated single nucleotide polymorphisms (SNPs) located in non-coding DNA having unknown functions in risk. In order to explore in which cell types these SNPs (and their correlated surrogates at r² ≥ 0.8) could alter cellular function, we assessed their location overlap with histone modification regions that indicate transcription regulation in 77 diverse cell types. We found statistically significant enrichment of risk SNPs at 12 loci in active enhancers or promoters. We investigated 4 risk loci in depth that were most significantly enriched (−log_e P > 14) and contained 8 putative enhancers in the different cell types. These enriched loci, along with eQTL associations, were unexpectedly present in non-neuronal cell types. These included lymphocytes, mesendoderm, liver- and fat-cells, indicating that cell types outside the brain are involved in the genetic predisposition to PD. Annotating regulatory risk regions within specific cell types may unravel new putative risk mechanisms and molecular pathways that contribute to PD development. PMID:27461410

  4. Novel SNPs in the Ankyrin 1 gene and their association with beef quality traits.

    PubMed

    Horodyska, J; Sweeney, T; Ryan, M; Hamill, R M

    2015-10-01

    Single nucleotide polymorphisms (SNPs) in the promoter region of bovine Ankyrin 1 (ANK1) have been associated with tenderness and intramuscular fat level in beef. The objectives of this study were to characterise novel DNA variants in the coding region of bovine ANK1 and test for association with beef quality traits. A 3 kb region of ANK1 cDNA was amplified and sequenced in 32 Charolais cattle using five sets of overlapping primers. Eighteen SNPs were identified and a predicted exon was confirmed. An in silico translation indicated that SNP4 and SNP16 were non-conservative. Three SNPs were genotyped in 158 crossbred cattle with associated meat quality data. SNP6 was associated with texture scores while SNP17 was associated with juiciness. Haplotype (cHAP) 1 was associated with lightness, redness, ultimate pH, as well as sarcomere length. Alleles of the ANK1 gene could be potential targets for gene-assisted selection to improve a range of meat quality traits in beef. PMID:26051041

  5. [Association analysis between SNPs of the growth hormone receptor gene and growth traits in arctic fox].

    PubMed

    DU, Zhi-Heng; Liu, Zong-Yue; Bai, Xiu-Juan

    2010-06-01

    Using single-strand conformation polymorphism (PCR-SSCP) and DNA sequencing, single nucleotide polymorphisms (SNPs) of the growth hormone receptor (GHR) gene were detected in an arctic fox population. Correlation analysis between GHR polymorphisms and growth traits was carried out using the appropriate model. Four SNPs, G3A in the 5'UTR, C99T in the first exon, and T59C and G65A in the fifth exon, were identified in the arctic fox GHR gene. The G3A and C99T polymorphisms of GHR were associated with female fox body weight (P<0.05), and the T59C and G65A polymorphisms of GHR were associated with male fox body weight (P<0.05) and the skin length of the female fox (P<0.01). Therefore, marker-assisted selection on body weight and skin length of arctic foxes using these SNPs can be applied to obtain large, high-quality arctic foxes. PMID:20566464

  6. A semi-automated system for analysis and storage of SNPs.

    PubMed

    Lehnert, V; Holzwarth, J; Ott, M; Thompson, A; Demmak, S; Foernzler, D

    2001-04-01

    The discovery of single nucleotide polymorphisms (SNPs) is currently pursued with a tremendous effort. SNPs represent a rich source for molecular markers, since estimations predict six to seven million of these DNA variations in the human genome. A subset of these genetic variants is thought to have a pervasive impact on modern medicine, be it for the elucidation of differential pharmacological response or for the facilitated identification of genes involved in monogenetic and complex human diseases. Here we describe the overall process that leads to the setup of a SNP database. We describe a high-throughput sequencing assay for SNP discovery, automation of the dataflow from the DNA sequencer to the SNP analysis, and the tools to facilitate it. At the end of the process, a web-accessible interface collects the SNP information, which is processed in order to be written into the SNP database and to be available for end users who would like to select appropriate SNPs for their special screening needs. PMID:11295821

  7. Identification of candidate SNPs for drug induced toxicity from differentially expressed genes in associated tissues.

    PubMed

    Hasmats, Johanna; Kupershmidt, Ilya; Rodríguez-Antona, Cristina; Su, Qiaojuan Jane; Khan, Muhammad Suleman; Jara, Carlos; Mielgo, Xabier; Lundeberg, Joakim; Green, Henrik

    2012-09-10

    The growing collection of publicly available high-throughput data provides an invaluable resource for generating preliminary in silico data in support of novel hypotheses. In this study we used a cross-dataset meta-analysis strategy to identify novel candidate genes and genetic variations relevant to paclitaxel/carboplatin-induced myelosuppression and neuropathy. We identified genes affected by drug exposure and present in tissues associated with toxicity. From ten top-ranked genes 42 non-synonymous single nucleotide polymorphisms (SNPs) were identified in silico and genotyped in 94 cancer patients treated with carboplatin/paclitaxel. We observed variations in 11 SNPs, of which seven were present in a sufficient frequency for statistical evaluation. Of these seven SNPs, three were present in ABCA1 and ATM, and showed significant or borderline significant association with either myelosuppression or neuropathy. The strikingly high number of associations between genotype and clinically observed toxicity provides support for our data-driven computations strategy to identify biomarkers for drug toxicity. PMID:22759513

  8. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  9. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    PubMed Central

    Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses. PMID:22013517

  10. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island

    PubMed Central

    Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A.; Shouche, Yogesh S.; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1–40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1–20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25–40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity. PMID:26066038

  11. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area, however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  12. Rank and order: evaluating the performance of SNPs for individual assignment in a non-model organism.

    PubMed

    Storer, Caroline G; Pascal, Carita E; Roberts, Steven B; Templin, William D; Seeb, Lisa W; Seeb, James E

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: F(ST), informativeness (I(n)), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from F(ST), I(n), and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290
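
    The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes biallelic SNPs, known per-population allele frequencies, and the simple Wright form F_ST = (H_T - H_S) / H_T with equal population weights; this is not necessarily the estimator the authors used, and the function names and toy frequencies below are hypothetical.

```python
from typing import Dict, List


def fst_biallelic(freqs: List[float]) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs holds the frequency of one allele in each population;
    populations are weighted equally (a simplifying assumption).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # locus is monomorphic across all populations
    # mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def top_panel(allele_freqs: Dict[str, List[float]], size: int) -> List[str]:
    """Return the `size` SNP ids with the highest F_ST, best first."""
    ranked = sorted(
        allele_freqs,
        key=lambda snp: fst_biallelic(allele_freqs[snp]),
        reverse=True,
    )
    return ranked[:size]


# Hypothetical allele frequencies in two populations (illustration only).
toy = {
    "snp_a": [0.10, 0.90],
    "snp_b": [0.50, 0.50],
    "snp_c": [0.30, 0.60],
}
panel = top_panel(toy, 2)
```

    Swapping the `key` function for another per-locus score would give a panel ranked by a different criterion, which is the kind of comparison the abstract describes.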

  13. ARRANGEMENT FOR REPLACING FILTERS

    DOEpatents

    Blomgren, R.A.; Bohlin, N.J.C.

    1957-08-27

    An improved filtered air exhaust system which may be continually operated during the replacement of the filters without the escape of unfiltered air is described. This is accomplished by hermetically sealing the box-like filter containers in a rectangular tunnel with neoprene-covered sponge rubber sealing rings coated with a silicone-impregnated pneumatic grease. The tunnel through which the filters are pushed is normal to the exhaust air duct. A number of unused filters are in line behind the filters in use, and are moved by a hydraulic ram so that a fresh filter is positioned in the air duct. The used filter is pushed into a waiting receptacle and is suitably disposed of. This device permits a rapid and safe replacement of a radiation-contaminated filter without interruption to the normal flow of exhaust air.

  14. Corrosion resistant filter unit

    SciTech Connect

    Gentry, J.M.

    1992-02-18

    This patent describes a fluid filter assembly adapted for the filtration of corrosive fluid to be injected into a well bore at pressure levels which may exceed 10,000 pounds per square inch. It comprises: a frame assembly for the mounting of a portion of the fluid filter assembly therein, the frame assembly; filter pods, the plurality of filter pods forming at least two banks of filter pods, each bank having at least two filter pods therein, each bank of the filter pods being supported by one or more of the supports of the plurality of supports secured to selected struts of the frame assembly; an inlet manifold to direct the corrosive fluid to the plurality of filter pods, the inlet manifold being interconnected to the banks of filter pods formed by the filter pods whereby flow of the corrosive fluid can be directed to each bank of the filter pods; an outlet manifold to direct the corrosive fluid from the filter pods, the outlet manifold being interconnected to the banks of filter pods formed by the filter pods; a first valve means to control the flow of the corrosive fluid between banks of filter pods formed by the filter pods whereby the flow of the corrosive fluid can be selectively directed to each bank of the filter pods; a second valve means to selectively control the flow of the corrosive fluid between the inlet manifold and the outlet manifold; and union means for interconnecting the filter pods, inlet manifold and outlet manifold, each of the union means including mechanical connection means and internal seal means for isolating the corrosive fluids from the mechanical connection means.

  15. A comprehensive in silico analysis of non-synonymous and regulatory SNPs of human MBL2 gene.

    PubMed

    Kalia, Namarta; Sharma, Aarti; Kaur, Manpreet; Kamboj, Sukhdev Singh; Singh, Jatinder

    2016-01-01

    Mannose binding lectin (MBL) is a liver-derived protein which plays an important role in innate immunity. Mannose binding lectin gene 2 (MBL2) polymorphisms are reported to be associated with various diseases. Although MBL is an exhaustively studied molecule, no attempt has been made to date to comprehensively and systematically analyze the SNPs of the MBL2 gene. The present study was carried out to identify and prioritize the SNPs of the MBL2 gene for further genotyping and functional studies. To predict the possible impact of SNPs on MBL structure and function, SNP data obtained from the dbSNP database were analyzed using various bioinformatics tools. Out of a total of 661 SNPs, only 37 validated SNPs having minor allele frequency ≥0.10 were considered for the present study. These 37 SNPs include one in 3' near gene, nine in 3' UTR, one non-synonymous SNP (nsSNP), thirteen intronic SNPs and thirteen in 5' near gene. From these 37 SNPs, 11 non-coding SNPs were identified to be of functional significance and evolutionarily conserved. Out of these, 4 SNPs from the 3' UTR were found to play a role in miRNA binding, and 7 SNPs from the 5' near and intronic regions were predicted to be involved in transcription factor binding and expression of the MBL2 gene. One nsSNP, Gly54Asp (rs1800450), was found to be deleterious and damaging by both SIFT and Polyphen-2 servers, thus affecting MBL2 protein stability and expression. Protein structural analysis with this amino acid variant was performed using I-TASSER, RAMPAGE, Swiss-PdbViewer, Chimera and I-mutant. Information regarding solvent accessibility, molecular dynamics and energy minimization calculations showed that this variant causes clashes with neighboring amino acid residues that must interfere with the normal triple helix formation of the trimeric subunit and further with the normal assembly of the MBL oligomeric form, hence the decrease in stability. Thus, findings of the present study indicated 12 SNPs of the MBL2 gene to be functionally important. Exploration of
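
    The first prioritization step mentioned in this abstract (keeping only validated SNPs with minor allele frequency ≥0.10) amounts to a simple filter. The sketch below is only an illustration: the record layout, the function name, and the toy values are hypothetical and are not taken from dbSNP or from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snp:
    snp_id: str      # hypothetical identifier, not a real rs number
    validated: bool  # whether the record is marked as validated
    maf: float       # minor allele frequency, between 0 and 0.5


def prioritize(snps: List[Snp], min_maf: float = 0.10) -> List[Snp]:
    """Keep validated SNPs whose MAF is at least `min_maf`."""
    return [s for s in snps if s.validated and s.maf >= min_maf]


# Hypothetical records for illustration only.
candidates = [
    Snp("snp_1", True, 0.25),
    Snp("snp_2", True, 0.04),   # dropped: MAF below the cutoff
    Snp("snp_3", False, 0.30),  # dropped: not validated
    Snp("snp_4", True, 0.10),   # kept: the cutoff is inclusive
]
kept = [s.snp_id for s in prioritize(candidates)]
```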

  16. Rigid porous filter

    DOEpatents

    Chiang, Ta-Kuan; Straub, Douglas L.; Dennis, Richard A.

    2000-01-01

    The present invention involves a porous rigid filter including a plurality of concentric filtration elements having internal flow passages and forming external flow passages there between. The present invention also involves a pressure vessel containing the filter for the removal of particulates from high pressure particulate containing gases, and further involves a method for using the filter to remove such particulates. The present filter has the advantage of requiring fewer filter elements due to the high surface area-to-volume ratio provided by the filter, requires a reduced pressure vessel size, and exhibits enhanced mechanical design properties, improved cleaning properties, configuration options, modularity and ease of fabrication.

  17. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, Harry S.; Thompson, Robert C.; Hubbard, Charles W.; Perkins, Richard W.

    1997-01-01

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant.

  18. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, H.S.; Thompson, R.C.; Hubbard, C.W.; Perkins, R.W.

    1997-03-25

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant. 5 figs.

  19. Cordierite silicon nitride filters

    SciTech Connect

    Sawyer, J.; Buchan, B.; Duiven, R.; Berger, M.; Cleveland, J.; Ferri, J.

    1992-02-01

    The objective of this project was to develop a silicon nitride based crossflow filter. This report summarizes the findings and results of the project. The project was phased, with Phase I consisting of filter material development and crossflow filter design. Phase II involved filter manufacturing, filter testing under simulated conditions and reporting the results. In Phase I, Cordierite Silicon Nitride (CSN) was developed and tested for permeability and strength. Target values for each of these parameters were established early in the program. The values were met by the material development effort in Phase I. The crossflow filter design effort proceeded by developing a macroscopic design based on required surface area and estimated stresses. Then the thermal and pressure stresses were estimated using finite element analysis. In Phase II of this program, the filter manufacturing technique was developed, and the manufactured filters were tested. The technique developed involved press-bonding extruded tiles to form a filter, producing a monolithic filter after sintering. Filters manufactured using this technique were tested at Acurex and at the Westinghouse Science and Technology Center. The filters did not delaminate during testing and operated at high collection efficiency and good cleanability. Further development in areas of sintering and filter design is recommended.

  20. Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The dissection of complex traits of economic importance for the pig industry requires the availability of a significant number of genetic markers, such as SNPs. This study was conducted in order to discover thousands of porcine SNPs using next generation sequencing technologies and use those SNPs, a...

  1. Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...

  2. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...

  3. Marker genotype imputation in a low-marker-density panel with a high-marker-density reference panel: accuracy evaluation in barley breeding lines

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We evaluated a strategy in which the scores of markers untyped in a low-density experimental panel were imputed on the basis of data from a high-density reference panel, in its application to whole-genome genotyping of barley breeding lines. Using a barley core set consisting of 98 lines genotyped w...

  4. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  5. Bag filters for TPP

    SciTech Connect

    L.V. Chekalov; Yu.I. Gromov; V.V. Chekalov

    2007-05-15

    Cleaning of TPP flue gases with bag filters capable of pulsed regeneration is examined. A new filtering element with a three-dimensional filtering material formed from a needle-broached cloth, in which the filtration area is more than twice that of a conventional smooth bag, is proposed. The design of a new FRMI type of modular filter is also proposed. A standard series of FRMI filters with a filtration area ranging from 800 to 16,000 m² is designed for an output of more than 1 million m³/h of cleaned gas. The new bag filter permits dry collection of sulfur oxides from waste gases at TPP operating on high-sulfur coals. The design of the filter makes it possible to replace filter elements without taking the entire unit out of service.

  6. HEPA filter monitoring program

    NASA Astrophysics Data System (ADS)

    Kirchner, K. N.; Johnson, C. M.; Aiken, W. F.; Lucerna, J. J.; Barnett, R. L.; Jensen, R. T.

    1986-07-01

    The testing and replacement of HEPA filters, widely used in the nuclear industry to purify process air, are costly and labor-intensive. Current methods of testing filter performance, such as differential pressure measurement and scanning air monitoring, allow determination of overall filter performance but preclude detection of incipient filter failure such as small holes in the filters. Using current technology, a continual in-situ monitoring system was designed which provides three major improvements over current methods of filter testing and replacement. The improvements include: cost savings by reducing the number of intact filters which are currently being replaced unnecessarily; more accurate and quantitative measurement of filter performance; and reduced personnel exposure to a radioactive environment by automatically performing most testing operations.

  7. Novel Backup Filter Device for Candle Filters

    SciTech Connect

    Bishop, B.; Goldsmith, R.; Dunham, G.; Henderson, A.

    2002-09-18

    The currently preferred means of particulate removal from process or combustion gas generated by advanced coal-based power production processes is filtration with candle filters. However, candle filters have not shown the requisite reliability to be commercially viable for hot gas clean up for either integrated gasifier combined cycle (IGCC) or pressurized fluid bed combustion (PFBC) processes. Even a single candle failure can lead to unacceptable ash breakthrough, which can result in (a) damage to highly sensitive and expensive downstream equipment, (b) unacceptably low system on-stream factor, and (c) unplanned outages. The U.S. Department of Energy (DOE) has recognized the need to have fail-safe devices installed within or downstream from candle filters. In addition to CeraMem, DOE has contracted with Siemens-Westinghouse, the Energy & Environmental Research Center (EERC) at the University of North Dakota, and the Southern Research Institute (SRI) to develop novel fail-safe devices. Siemens-Westinghouse is evaluating honeycomb-based filter devices on the clean side of the candle filter that can operate up to 870 °C. The EERC is developing a highly porous ceramic disk with a sticky yet temperature-stable coating that will trap dust in the event of filter failure. SRI is developing the Full-Flow Mechanical Safeguard Device that provides a positive seal for the candle filter. Operation of the SRI device is triggered by the higher-than-normal gas flow from a broken candle. The CeraMem approach is similar to that of Siemens-Westinghouse and involves the development of honeycomb-based filters that operate on the clean side of a candle filter. The overall objective of this project is to fabricate and test silicon carbide-based honeycomb fail-safe filters for protection of downstream equipment in advanced coal conversion processes. The fail-safe filter, installed directly downstream of a candle filter, should have the capability for stopping essentially all particulate

  8. In Silico Model-Driven Assessment of the Effects of Single Nucleotide Polymorphisms (SNPs) on Human Red Blood Cell Metabolism

    PubMed Central

    Jamshidi, Neema; Wiback, Sharon J.; Palsson, Bernhard Ø.

    2002-01-01

    The completion of the human genome project and the construction of single nucleotide polymorphism (SNP) maps have led to significant efforts to find SNPs that can be linked to pathophysiology. In silico models of complete biochemical reaction networks relate a cell's individual reactions to the function of the entire network. Sequence variations can in turn be related to kinetic properties of individual enzymes, thus allowing an in silico model-driven assessment of the effects of defined SNPs on overall cellular functions. This process is applied to defined SNPs in two key enzymes of human red blood cell metabolism: glucose-6-phosphate dehydrogenase and pyruvate kinase. The results demonstrate the utility of in silico models in providing insight into differences between red cell function in patients with chronic and nonchronic anemia. In silico models of complex cellular processes are thus likely to aid in defining and understanding key SNPs in human pathophysiology. PMID:12421755

  9. Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans

    PubMed Central

    Du, Mengmeng; Auer, Paul L.; Jiao, Shuo; Haessler, Jeffrey; Altshuler, David; Boerwinkle, Eric; Carlson, Christopher S.; Carty, Cara L.; Chen, Yii-Der Ida; Curtis, Keith; Franceschini, Nora; Hsu, Li; Jackson, Rebecca; Lange, Leslie A.; Lettre, Guillaume; Monda, Keri L.; Nickerson, Deborah A.; Reiner, Alex P.; Rich, Stephen S.; Rosse, Stephanie A.; Rotter, Jerome I.; Willer, Cristen J.; Wilson, James G.; North, Kari; Kooperberg, Charles; Heard-Costa, Nancy; Peters, Ulrike

    2014-01-01

    Adult body height is a quantitative trait for which genome-wide association studies (GWAS) have identified numerous loci, primarily in European populations. These loci, comprising common variants, explain <10% of the phenotypic variance in height. We searched for novel associations between height and common (minor allele frequency, MAF ≥5%) or infrequent (0.5% < MAF < 5%) variants across the exome in African Americans. Using a reference panel of 1692 African Americans and 471 Europeans from the National Heart, Lung, and Blood Institute's (NHLBI) Exome Sequencing Project (ESP), we imputed whole-exome sequence data into 13 719 African Americans with existing array-based GWAS data (discovery). Variants achieving a height-association threshold of P < 5E−06 in the imputed dataset were followed up in an independent sample of 1989 African Americans with whole-exome sequence data (replication). We used P < 2.5E−07 (=0.05/196 779 variants) to define statistically significant associations in meta-analyses combining the discovery and replication sets (N = 15 708). We discovered and replicated three independent loci for association: 5p13.3/C5orf22/rs17410035 (MAF = 0.10, β = 0.64 cm, P = 8.3E−08), 13q14.2/SPRYD7/rs114089985 (MAF = 0.03, β = 1.46 cm, P = 4.8E−10) and 17q23.3/GH2/rs2006123 (MAF = 0.30; β = 0.47 cm; P = 4.7E−09). Conditional analyses suggested 5p13.3 (C5orf22/rs17410035) and 13q14.2 (SPRYD7/rs114089985) may harbor novel height alleles independent of previous GWAS-identified variants (r2 with GWAS loci <0.01); whereas 17q23.3/GH2/rs2006123 was correlated with GWAS-identified variants in European and African populations. Notably, 13q14.2/rs114089985 is infrequent in African Americans (MAF = 3%), extremely rare in European Americans (MAF = 0.03%), and monomorphic in Asian populations, suggesting it may be an African-American-specific height allele. Our findings demonstrate that whole-exome imputation of sequence variants can identify low
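
    The significance threshold quoted in this abstract is a Bonferroni correction: the family-wise level 0.05 divided by the 196 779 variants tested. A minimal sketch of that arithmetic follows, checked against the three meta-analysis P values reported above; the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """True when the P value falls below the corrected threshold."""
    return p_value < threshold


# 0.05 / 196 779 variants, as stated in the abstract (about 2.5E-07).
threshold = bonferroni_threshold(0.05, 196_779)

# P values reported in the abstract for the three replicated loci.
reported = {
    "rs17410035": 8.3e-08,
    "rs114089985": 4.8e-10,
    "rs2006123": 4.7e-09,
}
passing = [snp for snp, p in reported.items() if is_significant(p, threshold)]
```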

  10. Accuracy of genomic predictions for feed efficiency traits of beef cattle using 50K and imputed HD genotypes.

    PubMed

    Lu, D; Akanno, E C; Crowley, J J; Schenkel, F; Li, H; De Pauw, M; Moore, S S; Wang, Z; Li, C; Stothard, P; Plastow, G; Miller, S P; Basarab, J A

    2016-04-01

    The accuracy of genomic predictions can be used to assess the utility of dense marker genotypes for genetic improvement of beef efficiency traits. This study was designed to test the impact of genomic distance between training and validation populations, training population size, statistical methods, and density of genetic markers on prediction accuracy for feed efficiency traits in multibreed and crossbred beef cattle. Data on a total of 6,794 beef cattle, collated from various projects and research herds across Canada, were used. Illumina BovineSNP50 (50K) and imputed Axiom Genome-Wide BOS 1 Array (HD) genotypes were available for all animals. The traits studied were DMI, ADG, and residual feed intake (RFI). Four validation groups of 150 animals each, including Angus (AN), Charolais (CH), Angus-Hereford crosses (ANHH), and a Charolais-based composite (TX) were created by considering the genomic distance between pairs of individuals in the validation groups. Each validation group had 7 corresponding training groups of increasing sizes (1,000, 1,999, 2,999, 3,999, 4,999, 5,998, and 6,644), which also represent increasing average genomic distance between pairs of individuals in the training and validation groups. Prediction of genomic estimated breeding values (GEBV) was performed using genomic best linear unbiased prediction (GBLUP) and Bayesian method C (BayesC). The accuracy of genomic predictions was defined as the Pearson's correlation between adjusted phenotype and GEBV, unless otherwise stated. Using 50K genotypes, the highest average accuracy achieved in purebreds (AN, CH) was 0.41 for DMI, 0.34 for ADG, and 0.35 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.21 for ADG, and 0.25 for RFI. Similarly, when imputed HD genotypes were applied in purebreds (AN, CH), the highest average accuracy was 0.14 for DMI, 0.15 for ADG, and 0.14 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.22 for ADG, and 0.24 for RFI. The accuracies of GBLUP predictions were
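    This abstract defines prediction accuracy as the Pearson correlation between adjusted phenotypes and GEBV in a validation group. The sketch below shows that calculation in plain Python. The function name and the sample numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical validation animals: adjusted phenotypes and their predicted GEBV.
adjusted_phenotype = [9.8, 10.4, 11.1, 9.2, 10.9]
gebv = [0.10, 0.25, 0.30, -0.20, 0.05]
accuracy = pearson_r(adjusted_phenotype, gebv)
print(round(accuracy, 3))
```

    In a design like the one described, such a value would be computed once per validation group and training-set size, then averaged within purebred or crossbred groups.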

  11. MST Filterability Tests

    SciTech Connect

    Poirier, M. R.; Burket, P. R.; Duignan, M. R.

    2015-03-12

    The Savannah River Site (SRS) is currently treating radioactive liquid waste with the Actinide Removal Process (ARP) and the Modular Caustic Side Solvent Extraction Unit (MCU). The low filter flux through the ARP has limited the rate at which radioactive liquid waste can be treated. Recent filter flux has averaged approximately 5 gallons per minute (gpm). Salt Batch 6 has had a lower processing rate and required frequent filter cleaning. Savannah River Remediation (SRR) wants to understand the causes of the low filter flux and to increase ARP/MCU throughput. In addition, at the time the testing started, SRR was assessing the impact of replacing the 0.1 micron filter with a 0.5 micron filter. This report describes testing of MST filterability to investigate the impact of filter pore size and MST particle size on filter flux and testing of filter enhancers to attempt to increase filter flux. The authors constructed a laboratory-scale crossflow filter apparatus with two crossflow filters operating in parallel. One filter was a 0.1 micron Mott sintered SS filter and the other was a 0.5 micron Mott sintered SS filter. The authors also constructed a dead-end filtration apparatus to conduct screening tests with potential filter aids and body feeds, referred to as filter enhancers. The original baseline for ARP was 5.6 M sodium salt solution with a free hydroxide concentration of approximately 1.7 M. ARP has been operating with a sodium concentration of approximately 6.4 M and a free hydroxide concentration of approximately 2.5 M. SRNL conducted tests varying the concentration of sodium and free hydroxide to determine whether those changes had a significant effect on filter flux. The feed slurries for the MST filterability tests were composed of simple salts (NaOH, NaNO2, and NaNO3) and MST (0.2 – 4.8 g/L). The feed slurry for the filter enhancer tests contained simulated salt batch 6 supernate, MST, and filter enhancers.

  12. Predicting functional regulatory SNPs in the human antimicrobial peptide genes DEFB1 and CAMP in tuberculosis and HIV/AIDS.

    PubMed

    Flores Saiffe Farías, Adolfo; Jaime Herrera López, Enrique; Moreno Vázquez, Cristopher Jorge; Li, Wentian; Prado Montes de Oca, Ernesto

    2015-12-01

    Single nucleotide polymorphisms (SNPs) in transcription factor binding sites (TFBSs) within gene promoter regions or enhancers can modify the transcription rate of genes related to complex diseases. These SNPs can be called regulatory SNPs (rSNPs). Data compiled from recent projects, such as the 1000 Genomes Project and ENCODE, have revealed essential information used to perform in silico prediction of the molecular and biological repercussions of SNPs within TFBSs. However, most of these studies are very limited, as they only analyze SNPs in coding regions or when applied to promoters, and do not integrate essential biological data like TFBSs, expression profiles, pathway analysis, homotypic redundancy (number of TFBSs for the same TF in a region), chromatin accessibility and others, which could lead to a more accurate prediction. Our aim was to integrate different data in a biologically coherent method to analyze the proximal promoter regions of two antimicrobial peptide genes, DEFB1 and CAMP, that are associated with tuberculosis (TB) and HIV/AIDS. We predicted SNPs within the promoter regions that are more likely to interact with transcription factors (TFs). We also assessed the impact of homotypic redundancy using a novel approach called the homotypic redundancy weight factor (HWF). Our results identified 10 SNPs, which putatively modify the binding affinity of 24 TFs previously identified as related to TB and HIV/AIDS expression profiles (e.g. KLF5, CEBPA and NFKB1 for TB; FOXP2, BRCA1, CEBPB, CREB1, EBF1 and ZNF354C for HIV/AIDS; and RUNX2, HIF1A, JUN/AP-1, NR4A2, EGR1 for both diseases). Validating with the OregAnno database and cell-specific functional/non-functional SNPs from an additional 13 genes, our algorithm achieved 53% sensitivity and 84.6% specificity to detect functional rSNPs using the DNAseI-HUP database. We are proposing our algorithm as a novel in silico method to detect true functional rSNPs in antimicrobial peptide genes. With further

  13. Survey of digital filtering

    NASA Technical Reports Server (NTRS)

    Nagle, H. T., Jr.

    1972-01-01

    A three part survey is made of the state-of-the-art in digital filtering. Part one presents background material including sampled data transformations and the discrete Fourier transform. Part two, digital filter theory, gives an in-depth coverage of filter categories, transfer function synthesis, quantization and other nonlinear errors, filter structures and computer aided design. Part three presents hardware mechanization techniques. Implementations by general purpose, mini-, and special-purpose computers are presented.

  14. Nonlinear optimal semirecursive filtering

    NASA Astrophysics Data System (ADS)

    Daum, Frederick E.

    1996-05-01

    This paper describes a new hybrid approach to filtering, in which part of the filter is recursive but another part is non-recursive. The practical utility of this notion is to reduce computational complexity. In particular, if the non-recursive part of the filter is sufficiently small, then such a filter might be cost-effective to run in real-time with computer technology available now or in the future.

  15. LincSNP: a database of linking disease-associated SNPs to human large intergenic non-coding RNAs

    PubMed Central

    2014-01-01

    Background: Genome-wide association studies (GWAS) have successfully identified a large number of single nucleotide polymorphisms (SNPs) that are associated with a wide range of human diseases. However, many of these disease-associated SNPs are located in non-coding regions and have remained largely unexplained. Recent findings indicate that disease-associated SNPs in human large intergenic non-coding RNA (lincRNA) may lead to susceptibility to diseases through their effects on lincRNA expression. There is, therefore, a need to specifically record these SNPs and annotate them as potential candidates for disease. Description: We have built LincSNP, an integrated database, to identify and annotate disease-associated SNPs in human lincRNAs. The current release of LincSNP contains approximately 140,000 disease-associated SNPs (or linkage disequilibrium SNPs), which can be mapped to around 5,000 human lincRNAs, together with their comprehensive functional annotations. The database also contains annotated, experimentally supported SNP-lincRNA-disease associations and disease-associated lincRNAs. It provides flexible search options for data extraction and searches can be performed by disease/phenotype name, SNP ID, lincRNA name and chromosome region. In addition, we provide users with a link to download all the data from LincSNP and have developed a web interface for the submission of novel identified SNP-lincRNA-disease associations. Conclusions: The LincSNP database aims to integrate disease-associated SNPs and human lincRNAs, which will be an important resource for the investigation of the functions and mechanisms of lincRNAs in human disease. The database is available at http://bioinfo.hrbmu.edu.cn/LincSNP. PMID:24885522

  16. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs.

    PubMed

    Wu, XianMing; Hurst, Laurence D

    2016-02-01

    Where in genes do pathogenic mutations tend to occur and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, known also to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESE) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, five factors established as trends in ESE usage, are considered: relative position in exons, relative position in genes, flanking intron size, splice sites usage, and phase. We find that more than 76% of pathogenic SNPs are within 3-69 bp of exon ends where ESEs generally reside, this being 13% more than expected. Overall from enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20-45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they are found to occur preferentially in the terminal half of genes, in exons flanked by short introns and at the ends of phase (0,0) exons with 3' non-"AGgt" splice site. We suggest the concept of the "fragile" exon, one home to pathogenic SNPs owing to its vulnerability to splice disruption owing to low ESE density. PMID:26545919

  17. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs

    PubMed Central

    Wu, XianMing; Hurst, Laurence D.

    2016-01-01

    Where in genes do pathogenic mutations tend to occur and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, known also to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESE) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, five factors established as trends in ESE usage, are considered: relative position in exons, relative position in genes, flanking intron size, splice sites usage, and phase. We find that more than 76% of pathogenic SNPs are within 3–69 bp of exon ends where ESEs generally reside, this being 13% more than expected. Overall from enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20–45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they are found to occur preferentially in the terminal half of genes, in exons flanked by short introns and at the ends of phase (0,0) exons with 3′ non-“AGgt” splice site. We suggest the concept of the “fragile” exon, one home to pathogenic SNPs owing to its vulnerability to splice disruption owing to low ESE density. PMID:26545919

  18. A real-time PCR genotyping assay to detect FAD2A SNPs in peanuts (Arachis hypogaea L.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The high oleic (C18:1) phenotype in peanuts has been previously demonstrated to result from a homozygous recessive genotype (ol1ol1ol2ol2) in two homeologous fatty acid desaturase genes (FAD2A and FAD2B) with two key SNPs. These mutant SNPs, specifically G448A in FAD2A and 442insA in FAD2B, signifi...

  19. The ribosome filter redux.

    PubMed

    Mauro, Vincent P; Edelman, Gerald M

    2007-09-15

    The ribosome filter hypothesis postulates that ribosomes are not simply translation machines but also function as regulatory elements that differentially affect or filter the translation of particular mRNAs. On the basis of new information, we take the opportunity here to review the ribosome filter hypothesis, suggest specific mechanisms of action, and discuss recent examples from the literature that support it. PMID:17890902

  20. Filter service system

    DOEpatents

    Sellers, Cheryl L.; Nordyke, Daniel S.; Crandell, Richard A.; Tomlins, Gregory; Fei, Dong; Panov, Alexander; Lane, William H.; Habeger, Craig F.

    2008-12-09

    According to an exemplary embodiment of the present disclosure, a system for removing matter from a filtering device includes a gas pressurization assembly. An element of the assembly is removably attachable to a first orifice of the filtering device. The system also includes a vacuum source fluidly connected to a second orifice of the filtering device.

  1. HEPA filter encapsulation

    DOEpatents

    Gates-Anderson, Dianne D.; Kidd, Scott D.; Bowers, John S.; Attebery, Ronald W.

    2003-01-01

    A low viscosity resin is delivered into a spent HEPA filter or other waste. The resin is introduced into the filter or other waste using a vacuum to assist in the mass transfer of the resin through the filter media or other waste.

  2. Nonlinear Attitude Filtering Methods

    NASA Technical Reports Server (NTRS)

    Markley, F. Landis; Crassidis, John L.; Cheng, Yang

    2005-01-01

    This paper provides a survey of modern nonlinear filtering methods for attitude estimation. Early applications relied mostly on the extended Kalman filter for attitude estimation. Since these applications, several new approaches have been developed that have proven to be superior to the extended Kalman filter. Several of these approaches maintain the basic structure of the extended Kalman filter, but employ various modifications in order to provide better convergence or improve other performance characteristics. Examples of such approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters that propagate and update a discrete set of sigma points rather than using linearized equations for the mean and covariance are also reviewed. A two-step approach is discussed with a first-step state that linearizes the measurement model and an iterative second step to recover the desired attitude states. These approaches are all based on the Gaussian assumption that the probability density function is adequately specified by its mean and covariance. Other approaches that do not require this assumption are reviewed, including particle filters and a Bayesian filter based on a non-Gaussian, finite-parameter probability density function on SO(3). Finally, the predictive filter, nonlinear observers and adaptive approaches are shown. The strengths and weaknesses of the various approaches are discussed.

  3. Practical Active Capacitor Filter

    NASA Technical Reports Server (NTRS)

    Shuler, Robert L., Jr. (Inventor)

    2005-01-01

    A method and apparatus is described that filters an electrical signal. The filtering uses a capacitor multiplier circuit where the capacitor multiplier circuit uses at least one amplifier circuit and at least one capacitor. A filtered electrical signal results from a direct connection from an output of the at least one amplifier circuit.

  4. Genetic Association of Recovery from Eating Disorders: The Role of GABA Receptor SNPs

    PubMed Central

    Bloss, Cinnamon S; Berrettini, Wade; Bergen, Andrew W; Magistretti, Pierre; Duvvuri, Vikas; Strober, Michael; Brandt, Harry; Crawford, Steve; Crow, Scott; Fichter, Manfred M; Halmi, Katherine A; Johnson, Craig; Kaplan, Allan S; Keel, Pamela; Klump, Kelly L; Mitchell, James; Treasure, Janet; Woodside, D Blake; Marzola, Enrica; Schork, Nicholas J; Kaye, Walter H

    2011-01-01

    Follow-up studies of eating disorders (EDs) suggest outcomes ranging from recovery to chronic illness or death, but predictors of outcome have not been consistently identified. We tested 5151 single-nucleotide polymorphisms (SNPs) in approximately 350 candidate genes for association with recovery from ED in 1878 women. Initial analyses focused on a strictly defined discovery cohort of women who were over age 25 years, carried a lifetime diagnosis of an ED, and for whom data were available regarding the presence (n=361 ongoing symptoms in the past year, ie, 'ill') or absence (n=115 no symptoms in the past year, ie, 'recovered') of ED symptoms. An intronic SNP (rs17536211) in GABRG1 showed the strongest statistical evidence of association (p=4.63 × 10^−6, false discovery rate (FDR)=0.021, odds ratio (OR)=0.46). We replicated these findings in a more liberally defined cohort of women age 25 years or younger (n=464 ill, n=107 recovered; p=0.0336, OR=0.68; combined sample p=4.57 × 10^−6, FDR=0.0049, OR=0.55). Enrichment analyses revealed that GABA (γ-aminobutyric acid) SNPs were over-represented among SNPs associated at p<0.05 in both the discovery (Z=3.64, p=0.0003) and combined cohorts (Z=2.07, p=0.0388). In follow-up phenomic association analyses with a third independent cohort (n=154 ED cases, n=677 controls), rs17536211 was associated with trait anxiety (p=0.049), suggesting a possible mechanism through which this variant may influence ED outcome. These findings could provide new insights into the development of more effective interventions for the most treatment-resistant patients. PMID:21750581

  5. A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

    PubMed Central

    Bakir-Gungor, Burcu; Sezerman, Osman Ugur

    2011-01-01

    Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease-related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Even though GWAS analysis identifies 14 out of 83 of those drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 out of 83 genes known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2 and RA were verified by either OMIM database or by literature retrieved from the NCBI PubMed module. With the whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network

  6. A Joint Association Test for Multiple SNPs in Genetic Case-Control Studies

    PubMed Central

    Wang, Tao; Jacob, Howard; Ghosh, Soumitra; Wang, Xujing; Zeng, Zhao-Bang

    2009-01-01

    For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive in reducing the data complexity and increasing the statistical power. However, because the status of the underlying disease variant is unknown, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptible locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, by using a retrospective likelihood to adjust for the case-control sampling ascertainment and to handle the Hardy-Weinberg equilibrium constraint appropriately, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. With the latent variable to describe a flexible role of the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing of multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods. PMID:18770519
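    This abstract relies on an EM algorithm to estimate haplotype frequencies from unphased genotypes. The sketch below is not the authors' latent-variable model with a retrospective likelihood. It is only the classical two-SNP EM estimator under Hardy-Weinberg equilibrium, included to illustrate the E-step and M-step idea. The function name, the genotype coding (count of allele 1 at each SNP) and the example counts are all assumptions made for illustration.

```python
def em_haplotype_freqs(genotype_counts, n_iter=200, tol=1e-12):
    """Classical EM for two biallelic SNPs.

    genotype_counts maps (g1, g2) -> number of individuals, where g1 and g2
    are counts (0, 1 or 2) of allele 1 at SNP 1 and SNP 2.
    Returns haplotype frequencies keyed by (allele at SNP 1, allele at SNP 2).
    """
    n_people = sum(genotype_counts.values())
    if n_people == 0:
        raise ValueError("no genotypes supplied")
    total_haplotypes = 2 * n_people

    # Haplotype counts from genotypes whose phase is unambiguous
    # (every genotype except the double heterozygote).
    base = [0.0, 0.0, 0.0, 0.0]  # index = 2 * a + b for haplotype (a, b)
    for (g1, g2), n in genotype_counts.items():
        if (g1, g2) == (1, 1):
            continue
        a = (g1 // 2, (g1 + 1) // 2)  # alleles at SNP 1 on the two chromosomes
        b = (g2 // 2, (g2 + 1) // 2)  # alleles at SNP 2 on the two chromosomes
        for k in range(2):
            base[2 * a[k] + b[k]] += n

    n_double_het = genotype_counts.get((1, 1), 0)
    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11
        # rather than 01/10, given the current frequencies.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = base[:]
        counts[0] += n_double_het * w
        counts[3] += n_double_het * w
        counts[1] += n_double_het * (1 - w)
        counts[2] += n_double_het * (1 - w)
        new_freqs = [c / total_haplotypes for c in counts]
        converged = max(abs(x - y) for x, y in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return {(a, b): freqs[2 * a + b] for a in (0, 1) for b in (0, 1)}


# Hypothetical genotype counts for illustration only.
example = {(0, 0): 40, (1, 1): 30, (2, 2): 10, (0, 1): 10, (1, 0): 10}
estimates = em_haplotype_freqs(example)
for hap, freq in sorted(estimates.items()):
    print(hap, round(freq, 4))
```

    Only the double heterozygote needs the E-step because every other two-SNP genotype determines its haplotype pair uniquely.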

  7. Association between SNPs in genes involved in folate metabolism and preterm birth risk.

    PubMed

    Wang, B J; Liu, M J; Wang, Y; Dai, J R; Tao, J Y; Wang, S N; Zhong, N; Chen, Y

    2015-01-01

    We investigated the association between 12 single nucleotide polymorphisms (SNPs) in 11 genes involved in folate metabolism and preterm birth. A subset of SNPs selected from 11 genes/loci involved in the folic acid metabolism pathway were subjected to SNaPshot analysis in a case-control study. Twelve SNPs (CBS-C699T, DHFR-c594+59del19, GST01-C428T, MTHFD-G1958A, MTHFR-C677T, MTHFR-A1298C, MTR-A2756G, MTRR-A66G, NFE2L2-ins1+C11108T, RFC1-G80A, TCN2-C776G, and TYMS-1494del6) were tested simultaneously in 503 DNA samples, comprising 315 preterm births and 188 controls. None of the 12 SNP genotype distributions related to the folic acid metabolism pathway showed a significant difference between preterm and term babies. The frequency of the compound mutation genotype of MTHFD-G1958A, MTR-A2756G and RFC1-G80A in preterm babies was 7.3%, which was significantly higher than the 2.7% in term babies. Seven babies carried the compound mutation genotype of MTHFD-G1958A, MTR-A2756G, and CBS-C699T, but this was not observed in term babies. The frequency of the combined wild-type genotype of MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A in preterm babies was 3.17%, which was significantly lower than the 7.4% in term babies. The 12 SNPs screened in this study were not independent risk factors of preterm birth. Compound mutation genotypes, including MTHFD-G1958A, MTR-A2756G, and RFC1-G80A and MTHFD-G1958A, MTR-A2756G, and CBS-C699T, may increase the risk of preterm birth. The combined wild-type genotype MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A may decrease the risk of preterm birth. PMID:25730024

  8. Genome of The Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels.

    PubMed

    van Leeuwen, Elisabeth M; Karssen, Lennart C; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J; Huffman, Jennifer E; White, Charles C; Feitosa, Mary F; Bartz, Traci M; Manichaikul, Ani; Joshi, Peter K; Peloso, Gina M; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J; Milaneschi, Yuri; Penninx, Brenda W J H; Francioli, Laurent C; Menelaou, Androniki; Pulit, Sara L; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A; Franco, Oscar H; Mateo Leach, Irene; Beekman, Marian; de Craen, Anton J M; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J; Porteous, David J; Sattar, Naveed; Packard, Chris J; Buckley, Brendan M; Brody, Jennifer A; Bis, Joshua C; Rotter, Jerome I; Mychaleckyj, Josyf C; Campbell, Harry; Duan, Qing; Lange, Leslie A; Wilson, James F; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F; Rich, Stephen S; Psaty, Bruce M; Borecki, Ingrid B; Kearney, Patricia M; Stott, David J; Adrienne Cupples, L; Jukema, J Wouter; van der Harst, Pim; Sijbrands, Eric J; Hottenga, Jouke-Jan; Uitterlinden, Andre G; Swertz, Morris A; van Ommen, Gert-Jan B; de Bakker, Paul I W; Eline Slagboom, P; Boomsma, Dorret I; Wijmenga, Cisca; van Duijn, Cornelia M

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of The Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10(-4)), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  9. Backward multiple imputation estimation of the conditional lifetime expectancy function with application to censored human longevity data.

    PubMed

    Kong, Jing; Klein, Barbara E K; Klein, Ronald; Wahba, Grace

    2015-09-29

    The conditional lifetime expectancy function (LEF) is the expected lifetime of a subject given survival past a certain time point and the values of a set of explanatory variables. This function is attractive to researchers because it summarizes the entire residual life distribution and has an easy interpretation compared with the popularly used hazard function. In this paper, we propose a general framework of backward multiple imputation for estimating the conditional LEF and the variance of the estimator in the right-censoring setting. Simulation studies are conducted to investigate the empirical properties of the proposed estimator and the corresponding variance estimator. We demonstrate the method on the Beaver Dam Eye Study data, where the expected human lifetime is modeled with smoothing-spline ANOVA given the covariates information including sex, lifestyle factors, and disease variables. PMID:26371300

  10. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    PubMed Central

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10^−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  11. Regenerative particulate filter development

    NASA Technical Reports Server (NTRS)

    Descamp, V. A.; Boex, M. W.; Hussey, M. W.; Larson, T. P.

    1972-01-01

    Development, design, and fabrication of a prototype filter regeneration unit for regenerating clean fluid particle filter elements by using a backflush/jet impingement technique are reported. Development tests were also conducted on a vortex particle separator designed for use in a zero-gravity environment. A maintainable filter was designed, fabricated, and tested that allows filter element replacement without any leakage or spillage of system fluid. Also described are spacecraft fluid system design and filter maintenance techniques with respect to inflight maintenance for the space shuttle and space station.

  12. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer.

    PubMed

    Al-Tassan, Nada A; Whiffin, Nicola; Hosking, Fay J; Palles, Claire; Farrington, Susan M; Dobbins, Sara E; Harris, Rebecca; Gorman, Maggie; Tenesa, Albert; Meyer, Brian F; Wakil, Salma M; Kinnersley, Ben; Campbell, Harry; Martin, Lynn; Smith, Christopher G; Idziaszczyk, Shelley; Barclay, Ella; Maughan, Timothy S; Kaplan, Richard; Kerr, Rachel; Kerr, David; Buchanan, Daniel D; Win, Aung Ko; Hopper, John; Jenkins, Mark; Lindor, Noralane M; Newcomb, Polly A; Gallinger, Steve; Conti, David; Schumacher, Fred; Casey, Graham; Dunlop, Malcolm G; Tomlinson, Ian P; Cheadle, Jeremy P; Houlston, Richard S

    2015-01-01

    Genome-wide association studies (GWAS) of colorectal cancer (CRC) have identified 23 susceptibility loci thus far. Analyses of previously conducted GWAS indicate additional risk loci are yet to be discovered. To identify novel CRC susceptibility loci, we conducted a new GWAS and performed a meta-analysis with five published GWAS (totalling 7,577 cases and 9,979 controls of European ancestry), imputing genotypes utilising the 1000 Genomes Project. The combined analysis identified new, significant associations with CRC at 1p36.2 marked by rs72647484 (minor allele frequency [MAF] = 0.09) near CDC42 and WNT4 (P = 1.21 × 10(-8), odds ratio [OR] = 1.21) and at 16q24.1 marked by rs16941835 (MAF = 0.21, P = 5.06 × 10(-8); OR = 1.15) within the long non-coding RNA (lncRNA) RP11-58A18.1 and ~500 kb from the nearest coding gene FOXL1. Additionally we identified a promising association at 10p13 with rs10904849 intronic to CUBN (MAF = 0.32, P = 7.01 × 10(-8); OR = 1.14). These findings provide further insights into the genetic and biological basis of inherited genetic susceptibility to CRC. Additionally, our analysis further demonstrates that imputation can be used to exploit GWAS data to identify novel disease-causing variants. PMID:25990418

  13. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    PubMed

    Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  14. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

    PubMed Central

    van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimäki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

    2015-01-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  15. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    NASA Astrophysics Data System (ADS)

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out and in the second, the results of the ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
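
    The RMSE and underestimation figures in this abstract are percentages. A minimal sketch of how such relative measures are often computed follows, assuming they are expressed relative to the mean of the field-measured values (the abstract does not state the exact definition). The function names and the plot volumes are hypothetical and serve only as illustration.

```python
import math

def relative_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    n = len(observed)
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    mean_obs = sum(observed) / n
    return 100.0 * math.sqrt(mse) / mean_obs

def relative_bias(observed, predicted):
    """Mean error as a percentage of the mean observed value.

    A negative value indicates underestimation.
    """
    n = len(observed)
    mean_err = sum(p - o for o, p in zip(observed, predicted)) / n
    mean_obs = sum(observed) / n
    return 100.0 * mean_err / mean_obs

# Hypothetical plot-level volumes (m3/ha): field reference vs. imputed values.
field = [100.0, 200.0, 300.0]
imputed = [90.0, 210.0, 270.0]
print(round(relative_rmse(field, imputed), 2))
print(round(relative_bias(field, imputed), 2))
```

    With these made-up numbers the imputed volumes are too low on average, so the relative bias is negative, which is the direction of error the abstract reports for ITD-trained models.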

  16. Outcome-adaptive randomization for a delayed outcome with a short-term predictor: imputation-based designs.

    PubMed

    Kim, Mi-Ok; Liu, Chunyan; Hu, Feifang; Lee, J Jack

    2014-10-15

    Delay in the outcome variable is challenging for outcome-adaptive randomization, as it creates a lag between the number of subjects accrued and the information known at the time of the analysis. Motivated by a real-life pediatric ulcerative colitis trial, we consider a case where a short-term predictor is available for the delayed outcome. When a short-term predictor is not considered, studies have shown that the asymptotic properties of many outcome-adaptive randomization designs are little affected unless the lag is unreasonably large relative to the accrual process. These theoretical results assumed independent identical delays, however, whereas delays in the presence of a short-term predictor may only be conditionally homogeneous. We consider delayed outcomes as missing and propose mitigating the delay effect by imputing them. We apply this approach to the doubly adaptive biased coin design (DBCD) for the motivating pediatric ulcerative colitis trial. We provide theoretical results showing that if the delays, although non-homogeneous, are reasonably short relative to the accrual process, as in the iid delay case, the lag is also asymptotically ignorable in the sense that a standard DBCD that utilizes only observed outcomes attains target allocation ratios in the limit. Empirical studies, however, indicate that imputation-based DBCDs performed more reliably in finite samples, with smaller root mean square errors. The empirical studies assumed a common clinical setting where a delayed outcome is positively correlated with a short-term predictor, similarly across treatment arms. We varied the strength of the correlation and considered fast and slow accrual settings. PMID:24889540
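
    To make the DBCD concrete, the sketch below computes the probability of assigning the next subject to arm 1 from the current allocation proportion and an estimated target proportion. The abstract does not say which allocation function the authors used; this example assumes the form commonly attributed to Hu and Zhang, with a tuning parameter gamma. The function and parameter names are hypothetical.

```python
def dbcd_allocation_probability(current_prop, target_prop, gamma=2.0):
    """Probability of assigning the next subject to arm 1.

    current_prop: proportion of subjects assigned to arm 1 so far.
    target_prop:  estimated target allocation proportion for arm 1,
                  strictly between 0 and 1. In an imputation-based design
                  it would be re-estimated from observed plus imputed outcomes.
    gamma:        non-negative tuning parameter; larger values pull the
                  allocation toward the target more strongly.
    """
    if not 0.0 < target_prop < 1.0:
        raise ValueError("target_prop must lie strictly between 0 and 1")
    if current_prop <= 0.0:
        return 1.0  # arm 1 has no subjects yet
    if current_prop >= 1.0:
        return 0.0  # arm 2 has no subjects yet
    arm1 = target_prop * (target_prop / current_prop) ** gamma
    arm2 = (1.0 - target_prop) * ((1.0 - target_prop) / (1.0 - current_prop)) ** gamma
    return arm1 / (arm1 + arm2)

# Arm 1 is over-represented (60% assigned, 50% targeted), so the next
# assignment probability for arm 1 drops below the target.
print(dbcd_allocation_probability(0.6, 0.5))
```

    When the current proportion equals the target, the function returns the target itself, so the design only corrects when the allocation has drifted.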

  17. Ceramic fiber filter technology

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1996-06-01

    Fibrous filters have been used for centuries to protect individuals from dust, disease, smoke, and other gases or particulates. In the 1970s and 1980s ceramic filters were developed for filtration of hot exhaust gases from diesel engines. Tubular, or candle, filters have been made to remove particles from gases in pressurized fluidized-bed combustion and gasification-combined-cycle power plants. Very efficient filtration is necessary in power plants to protect the turbine blades. The limited lifespan of ceramic candle filters has been a major obstacle in their development. The present work is focused on forming fibrous ceramic filters using a papermaking technique. These filters are highly porous and therefore very lightweight. The papermaking process consists of filtering a slurry of ceramic fibers through a steel screen to form paper. Papermaking and the selection of materials will be discussed, as well as preliminary results describing the geometry of papers and relative strengths.

  18. Compact planar microwave blocking filters

    NASA Technical Reports Server (NTRS)

    U-Yen, Kongpop (Inventor); Wollack, Edward J. (Inventor)

    2012-01-01

    A compact planar microwave blocking filter includes a dielectric substrate and a plurality of filter unit elements disposed on the substrate. The filter unit elements are interconnected in a symmetrical series cascade with filter unit elements being organized in the series based on physical size. In the filter, a first filter unit element of the plurality of filter unit elements includes a low impedance open-ended line configured to reduce the shunt capacitance of the filter.

  19. Investigation of MC1R SNPs and Their Relationships with Plumage Colors in Korean Native Chicken.

    PubMed

    Hoque, M R; Jin, S; Heo, K N; Kang, B S; Jo, C; Lee, J H

    2013-05-01

    The melanocortin 1 receptor (MC1R) gene is related to the plumage color variations in chicken. Initially, the MC1R gene from 30 individuals was sequenced and nine polymorphisms were obtained. Of these, three and six single nucleotide polymorphisms (SNPs) were confirmed as synonymous and nonsynonymous mutations, respectively. Among these, three selected SNPs were genotyped using the restriction fragment length polymorphism (RFLP) method in 150 individuals from five chicken breeds, which identified the alleles corresponding to plumage color. The neighbor-joining phylogenetic tree using MC1R gene sequences indicated three well-differentiated plumage pigmentations (eumelanin, pheomelanin and albino). Also, the genotype analyses indicated that the TT, AA and GG genotypes corresponded to the eumelanin, pheomelanin and albino plumage pigmentations at nucleotide positions 69, 376 and 427, respectively. In contrast, high allele frequencies of the T, A and G alleles corresponded to black, red/yellow and white plumage color at nucleotide positions 69, 376 and 427, respectively. Also, amino acid changes at positions Asn23Asn, Val126Ile and Thr143Ala were observed in melanin synthesis with identified possible alleles, respectively. In addition, high haplotype frequencies in TGA, CGG and CAA haplotypes were well discriminated based on the plumage pigmentation in chicken breeds. The results obtained in this study can be used for designing proper breeding and conservation strategies for the Korean native chicken breeds, as well as for developing breed identification markers in chicken. PMID:25049831

  20. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    PubMed

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21× depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165

  1. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.

  2. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

    PubMed

    Lee, S Hong; Ripke, Stephan; Neale, Benjamin M; Faraone, Stephen V; Purcell, Shaun M; Perlis, Roy H; Mowry, Bryan J; Thapar, Anita; Goddard, Michael E; Witte, John S; Absher, Devin; Agartz, Ingrid; Akil, Huda; Amin, Farooq; Andreassen, Ole A; Anjorin, Adebayo; Anney, Richard; Anttila, Verneri; Arking, Dan E; Asherson, Philip; Azevedo, Maria H; Backlund, Lena; Badner, Judith A; Bailey, Anthony J; Banaschewski, Tobias; Barchas, Jack D; Barnes, Michael R; Barrett, Thomas B; Bass, Nicholas; Battaglia, Agatino; Bauer, Michael; Bayés, Mònica; Bellivier, Frank; Bergen, Sarah E; Berrettini, Wade; Betancur, Catalina; Bettecken, Thomas; Biederman, Joseph; Binder, Elisabeth B; Black, Donald W; Blackwood, Douglas H R; Bloss, Cinnamon S; Boehnke, Michael; Boomsma, Dorret I; Breen, Gerome; Breuer, René; Bruggeman, Richard; Cormican, Paul; Buccola, Nancy G; Buitelaar, Jan K; Bunney, William E; Buxbaum, Joseph D; Byerley, William F; Byrne, Enda M; Caesar, Sian; Cahn, Wiepke; Cantor, Rita M; Casas, Miguel; Chakravarti, Aravinda; Chambert, Kimberly; Choudhury, Khalid; Cichon, Sven; Cloninger, C Robert; Collier, David A; Cook, Edwin H; Coon, Hilary; Cormand, Bru; Corvin, Aiden; Coryell, William H; Craig, David W; Craig, Ian W; Crosbie, Jennifer; Cuccaro, Michael L; Curtis, David; Czamara, Darina; Datta, Susmita; Dawson, Geraldine; Day, Richard; De Geus, Eco J; Degenhardt, Franziska; Djurovic, Srdjan; Donohoe, Gary J; Doyle, Alysa E; Duan, Jubao; Dudbridge, Frank; Duketis, Eftichia; Ebstein, Richard P; Edenberg, Howard J; Elia, Josephine; Ennis, Sean; Etain, Bruno; Fanous, Ayman; Farmer, Anne E; Ferrier, I Nicol; Flickinger, Matthew; Fombonne, Eric; Foroud, Tatiana; Frank, Josef; Franke, Barbara; Fraser, Christine; Freedman, Robert; Freimer, Nelson B; Freitag, Christine M; Friedl, Marion; Frisén, Louise; Gallagher, Louise; Gejman, Pablo V; Georgieva, Lyudmila; Gershon, Elliot S; Geschwind, Daniel H; Giegling, Ina; Gill, Michael; Gordon, Scott D; Gordon-Smith, Katherine; Green, 
Elaine K; Greenwood, Tiffany A; Grice, Dorothy E; Gross, Magdalena; Grozeva, Detelina; Guan, Weihua; Gurling, Hugh; De Haan, Lieuwe; Haines, Jonathan L; Hakonarson, Hakon; Hallmayer, Joachim; Hamilton, Steven P; Hamshere, Marian L; Hansen, Thomas F; Hartmann, Annette M; Hautzinger, Martin; Heath, Andrew C; Henders, Anjali K; Herms, Stefan; Hickie, Ian B; Hipolito, Maria; Hoefels, Susanne; Holmans, Peter A; Holsboer, Florian; Hoogendijk, Witte J; Hottenga, Jouke-Jan; Hultman, Christina M; Hus, Vanessa; Ingason, Andrés; Ising, Marcus; Jamain, Stéphane; Jones, Edward G; Jones, Ian; Jones, Lisa; Tzeng, Jung-Ying; Kähler, Anna K; Kahn, René S; Kandaswamy, Radhika; Keller, Matthew C; Kennedy, James L; Kenny, Elaine; Kent, Lindsey; Kim, Yunjung; Kirov, George K; Klauck, Sabine M; Klei, Lambertus; Knowles, James A; Kohli, Martin A; Koller, Daniel L; Konte, Bettina; Korszun, Ania; Krabbendam, Lydia; Krasucki, Robert; Kuntsi, Jonna; Kwan, Phoenix; Landén, Mikael; Långström, Niklas; Lathrop, Mark; Lawrence, Jacob; Lawson, William B; Leboyer, Marion; Ledbetter, David H; Lee, Phil H; Lencz, Todd; Lesch, Klaus-Peter; Levinson, Douglas F; Lewis, Cathryn M; Li, Jun; Lichtenstein, Paul; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Liu, Chunyu; Lohoff, Falk W; Loo, Sandra K; Lord, Catherine; Lowe, Jennifer K; Lucae, Susanne; MacIntyre, Donald J; Madden, Pamela A F; Maestrini, Elena; Magnusson, Patrik K E; Mahon, Pamela B; Maier, Wolfgang; Malhotra, Anil K; Mane, Shrikant M; Martin, Christa L; Martin, Nicholas G; Mattheisen, Manuel; Matthews, Keith; Mattingsdal, Morten; McCarroll, Steven A; McGhee, Kevin A; McGough, James J; McGrath, Patrick J; McGuffin, Peter; McInnis, Melvin G; McIntosh, Andrew; McKinney, Rebecca; McLean, Alan W; McMahon, Francis J; McMahon, William M; McQuillin, Andrew; Medeiros, Helena; Medland, Sarah E; Meier, Sandra; Melle, Ingrid; Meng, Fan; Meyer, Jobst; Middeldorp, Christel M; Middleton, Lefkos; Milanova, Vihra; Miranda, Ana; Monaco, Anthony P; 
Montgomery, Grant W; Moran, Jennifer L; Moreno-De-Luca, Daniel; Morken, Gunnar; Morris, Derek W; Morrow, Eric M; Moskvina, Valentina; Muglia, Pierandrea; Mühleisen, Thomas W; Muir, Walter J; Müller-Myhsok, Bertram; Murtha, Michael; Myers, Richard M; Myin-Germeys, Inez; Neale, Michael C; Nelson, Stan F; Nievergelt, Caroline M; Nikolov, Ivan; Nimgaonkar, Vishwajit; Nolen, Willem A; Nöthen, Markus M; Nurnberger, John I; Nwulia, Evaristus A; Nyholt, Dale R; O'Dushlaine, Colm; Oades, Robert D; Olincy, Ann; Oliveira, Guiomar; Olsen, Line; Ophoff, Roel A; Osby, Urban; Owen, Michael J; Palotie, Aarno; Parr, Jeremy R

    2013-09-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  3. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs

    PubMed Central

    2013-01-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  4. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    PubMed Central

    Bowers, John E.; Pearl, Stephanie A.; Burke, John M.

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21× depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165

  5. Transposon Insertions, Structural Variations, and SNPs Contribute to the Evolution of the Melon Genome.

    PubMed

    Sanseverino, Walter; Hénaff, Elizabeth; Vives, Cristina; Pinosio, Sara; Burgos-Paz, William; Morgante, Michele; Ramos-Onsins, Sebastián E; Garcia-Mas, Jordi; Casacuberta, Josep Maria

    2015-10-01

    The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties representing the species diversity, and including wild accessions and highly bred lines, is relatively high due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution. PMID:26174143

  6. No Observed Association for Mitochondrial SNPs with Preterm Delivery and Related Outcomes

    PubMed Central

    Alleman, Brandon W.; Myking, Solveig; Ryckman, Kelli K.; Myhre, Ronny; Feingold, Eleanor; Feenstra, Bjarke; Geller, Frank; Boyd, Heather A.; Shaffer, John R.; Zhang, Qi; Begum, Ferdouse; Crosslin, David; Doheny, Kim; Pugh, Elizabeth; Pay, Aase Serine Devold; Østensen, Ingrid H.G.; Morken, Nils-Halvdan; Magnus, Per; Marazita, Mary L.; Jacobsson, Bo; Melbye, Mads; Murray, Jeffrey C.

    2013-01-01

    Background: Preterm delivery (PTD) is the leading cause of neonatal morbidity and mortality. Epidemiologic studies indicate that recurrence of PTD is maternally inherited, creating a strong possibility that mitochondrial variants contribute to its etiology. This study examines the association of mitochondrial genotypes with PTD and related outcomes. Methods: This study combined, through meta-analysis, two case-control genome-wide association studies (GWAS): one from the Danish National Birth Cohort (DNBC) Study and one from the Norwegian Mother and Child Cohort Study (MoBa) conducted by the Norwegian Institute of Public Health. The outcomes of PTD (≤36 weeks), very PTD (≤32 weeks) and preterm prelabor rupture of membranes (PPROM) were examined. 135 individual SNP associations were tested using the combined genome from mothers and neonates (case vs. control) in each population and then pooled via meta-analysis. Results: After meta-analysis there were four SNPs for the outcome of PTD with p≤0.10, and two with p≤0.05. For the additional outcomes of very PTD and PPROM there were three and four SNPs, respectively, with p≤0.10. Conclusion: Given the number of tests, no single SNP reached study-wide significance (p=0.0006). Our study does not support the hypothesis that mitochondrial genetics contributes to the maternal transmission of PTD and related outcomes. PMID:22902432

  7. The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs

    PubMed Central

    2004-01-01

    Understanding the nature of evolutionary relationships among persons and populations is important for the efficient application of genome science to biomedical research. We have analysed 8,525 autosomal single nucleotide polymorphisms (SNPs) in 84 individuals from four populations: African-American, European-American, Chinese and Japanese. Individual relationships were reconstructed using the allele sharing distance and the neighbour-joining tree making method. Trees show clear clustering according to population, with the root branching from the African-American clade. The African-American cluster is much less star-like than European-American and East Asian clusters, primarily because of admixture. Furthermore, on the East Asian branch, all ten Chinese individuals cluster together and all ten Japanese individuals cluster together. Using positional information, we demonstrate strong correlations between inter-marker distance and both locus-specific FST (the proportion of total variation due to differentiation) levels and branch lengths. Chromosomal maps of the distribution of locus-specific branch lengths were constructed by combining these data with other published SNP markers (total of 33,704 SNPs). These maps clearly illustrate a non-uniform distribution of human genetic substructure, an instructional and useful paradigm for education and research. PMID:15588487
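
    As an illustration of the allele sharing distance used above to reconstruct individual relationships, the following Python sketch computes pairwise distances from biallelic SNP genotypes. The 0/1/2 genotype coding, the function names, and the toy genotypes are assumptions made for this example and are not taken from the study; the resulting matrix is the kind of input a neighbour-joining routine would consume.

```python
from itertools import combinations


def allele_sharing_distance(g1, g2):
    """Allele sharing distance between two individuals.

    Genotypes are assumed to be coded as the count of one allele (0, 1 or 2)
    at each biallelic SNP; None marks a missing call. Two individuals share
    2 - |a - b| alleles at a locus, so the per-locus distance is |a - b| / 2.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci typed in both individuals")
    return sum(abs(a - b) for a, b in pairs) / (2 * len(pairs))


def distance_matrix(genotypes):
    """Symmetric ASD matrix as a dict keyed by (name, name) pairs."""
    names = sorted(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = allele_sharing_distance(genotypes[a], genotypes[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Toy data (hypothetical): three individuals typed at four SNPs.
toy = {
    "ind1": [0, 1, 2, 0],
    "ind2": [0, 1, 2, 1],
    "ind3": [2, 1, 0, None],
}
asd = distance_matrix(toy)
```

    In this toy data, ind1 and ind2 differ by a single allele over four loci, so their distance is small, whereas ind3 carries opposite homozygous genotypes at two of the three loci it shares with the others and is correspondingly distant.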

  8. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests

    PubMed Central

    2015-01-01

    Background: Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some moderately sized data sets, RF still struggles in GWAS when selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNP group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node in a tree. Results: This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction
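
    The two-stage subspace sampling described in this record can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the fixed cut-offs (0.05 and 0.001), the share of highly informative SNPs per subspace, the function names, and the toy p-values are all hypothetical, and the abstract does not say how ts-RF actually derives its cut-off point.

```python
import random


def split_snps(p_values, informative_cutoff=0.05, strong_cutoff=0.001):
    """Stage 1: partition SNPs by p-value (cut-offs are assumed, not from ts-RF)."""
    strong = [s for s, p in p_values.items() if p <= strong_cutoff]
    weak = [s for s, p in p_values.items()
            if strong_cutoff < p <= informative_cutoff]
    return strong, weak  # SNPs above informative_cutoff are dropped as irrelevant


def sample_subspace(strong, weak, size, strong_fraction=0.5, rng=random):
    """Stage 2: draw a feature subspace that always holds a highly informative SNP."""
    if not strong:
        raise ValueError("no highly informative SNPs to sample from")
    n_strong = max(1, min(len(strong), round(size * strong_fraction)))
    n_weak = min(len(weak), size - n_strong)
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


# Toy p-values (hypothetical SNP labels and values).
p_values = {"snpA": 0.0001, "snpB": 0.0005, "snpC": 0.01,
            "snpD": 0.03, "snpE": 0.2, "snpF": 0.7}
strong, weak = split_snps(p_values)
subspace = sample_subspace(strong, weak, size=3, rng=random.Random(0))
```

    Each tree node would then choose its split among the SNPs in `subspace` only, so irrelevant SNPs never compete for a split.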

  9. Natural Functional SNPs in miR-155 Alter Its Expression Level, Blood Cell Counts, and Immune Responses.

    PubMed

    Li, Congcong; He, Huabin; Liu, An; Liu, Huazhen; Huang, Haibo; Zhao, Changzhi; Jing, Lu; Ni, Juan; Yin, Lilin; Hu, Suqin; Wu, Hui; Li, Xinyun; Zhao, Shuhong

    2016-01-01

    miR-155 has been confirmed to be a key factor in immune responses in humans and other mammals. Therefore, investigation of variations in miR-155 could be useful for understanding the differences in immunity between individuals. In this study, four SNPs in miR-155 were identified in mice (Mus musculus) and humans (Homo sapiens). In mice, the four SNPs were closely linked and formed two miR-155 haplotypes (A and B). Ten distinct types of blood parameters were associated with miR-155 expression under normal conditions. Additionally, 4 and 14 blood parameters were significantly different between these two genotypes under normal and lipopolysaccharide (LPS) stimulation conditions, respectively. Moreover, the expression levels of miR-155, the inflammatory response to LPS stimulation, and the lethal ratio following Salmonella typhimurium infection were significantly increased in mice harboring the AA genotype. Further, two SNPs, one in the loop region and the other near the 3' terminal of pre-miR-155, were confirmed to be responsible for the differential expression of miR-155 in mice. Interestingly, two additional SNPs, one in the loop region and the other in the middle of miR-155*, modulated the function of miR-155 in humans. Predictions of secondary RNA structure using RNAfold showed that these SNPs affected the structure of miR-155 in both mice and humans. Our results provide novel evidence of the natural functional SNPs of miR-155 in both mice and humans, which may affect the expression levels of mature miR-155 by modulating its secondary structure. The SNPs of human miR-155 may be considered as causal mutations for some immune-related diseases in the clinic. The two genotypes of mice could be used as natural models for studying the mechanisms of immune diseases caused by abnormal expression of miR-155 in humans. PMID:27532002

  10. Natural Functional SNPs in miR-155 Alter Its Expression Level, Blood Cell Counts, and Immune Responses

    PubMed Central

    Li, Congcong; He, Huabin; Liu, An; Liu, Huazhen; Huang, Haibo; Zhao, Changzhi; Jing, Lu; Ni, Juan; Yin, Lilin; Hu, Suqin; Wu, Hui; Li, Xinyun; Zhao, Shuhong

    2016-01-01

    miR-155 has been confirmed to be a key factor in immune responses in humans and other mammals. Therefore, investigation of variations in miR-155 could be useful for understanding the differences in immunity between individuals. In this study, four SNPs in miR-155 were identified in mice (Mus musculus) and humans (Homo sapiens). In mice, the four SNPs were closely linked and formed two miR-155 haplotypes (A and B). Ten distinct types of blood parameters were associated with miR-155 expression under normal conditions. Additionally, 4 and 14 blood parameters were significantly different between these two genotypes under normal and lipopolysaccharide (LPS) stimulation conditions, respectively. Moreover, the expression levels of miR-155, the inflammatory response to LPS stimulation, and the lethal ratio following Salmonella typhimurium infection were significantly increased in mice harboring the AA genotype. Further, two SNPs, one in the loop region and the other near the 3′ terminal of pre-miR-155, were confirmed to be responsible for the differential expression of miR-155 in mice. Interestingly, two additional SNPs, one in the loop region and the other in the middle of miR-155*, modulated the function of miR-155 in humans. Predictions of secondary RNA structure using RNAfold showed that these SNPs affected the structure of miR-155 in both mice and humans. Our results provide novel evidence of the natural functional SNPs of miR-155 in both mice and humans, which may affect the expression levels of mature miR-155 by modulating its secondary structure. The SNPs of human miR-155 may be considered as causal mutations for some immune-related diseases in the clinic. The two genotypes of mice could be used as natural models for studying the mechanisms of immune diseases caused by abnormal expression of miR-155 in humans. PMID:27532002

  11. Detection of SNPs in the TBC1D1 gene and their association with carcass traits in chicken.

    PubMed

    Wang, Yan; Xu, Heng-Yong; Gilbert, Elizabeth R; Peng, Xing; Zhao, Xiao-Ling; Liu, Yi-Ping; Zhu, Qing

    2014-09-01

    TBC1D1 plays an important role in numerous fundamental physiological processes including muscle metabolism, regulation of whole body energy homeostasis and lipid metabolism. The objective of the present study was to identify single nucleotide polymorphisms (SNPs) in chicken TBC1D1 using 128 Erlang mountainous chickens and to determine if these SNPs are associated with carcass traits. The approach consisted of sequencing TBC1D1 using a panel of DNA from different individuals, revealing twenty-two SNPs. Among these SNPs, two polymorphisms (g.69307744C>T and g.69307608T>G) of block 1, four polymorphisms (g.69322320C>T, g.69322314G>A, g.69317290A>G and g.69317276T>C) of block 2 and four polymorphisms of block 3 (g.69349746G>A, g.69349736C>G, g.69349727C>T and g.69349694C>T) exhibited a high degree of linkage disequilibrium in all test populations. An association analysis was performed between the twenty-two SNPs and seven performance traits. SNPs g.69307744C>T, g.69340192G>A and g.69355665T>C were demonstrated to have a strong effect on live weight (BW), carcass weight (CW), semi-eviscerated weight (SEW) and eviscerated weight (EW), and the g.69340070C>T polymorphism was related to BW, SEW and BMW in chicken populations. However, for the other SNPs, there were no significant correlations between different genotypes and carcass traits. Meanwhile, haplotype CT-TG of block 1 and combined genotype AG-TT-AC-CT of block 3 were significantly associated with BW, CW, SEW and EW. Overall, our results provide evidence that polymorphisms in TBC1D1 are associated with carcass traits and that TBC1D1 would be a useful candidate gene in selection programs for improving carcass traits. PMID:24979340

  12. Fabric filter blinding mechanisms

    SciTech Connect

    Notestein, J.E.; Shang, J.Y.

    1982-08-01

    This discussion of various bag/cloth filter degradation mechanisms is mostly common sense. However, this information is occasionally lost in the subtleties of real-system operation. Although this paper is written with reference to fluidized-bed combustion (FBC) applications, the insights are generally applicable. For enumeration of particular filter fabric and baghouse experiences in FBC applications, the reader is referred to a report by Davy McKee Corporation (no date). A fabric filter is a composite matrix of fibers oriented to retain the dust particles from dust-laden gas. The cleaned gas passes through the fabric filter; the retained dust particles are deposited on the surface of (and within) the fiber matrix. The retained dust can be later removed through mechanical means. The fabric may be made of any fibrous material, spun into yarn, and then woven, impacted, needled, or bonded into a felt. Deep penetration of aggregated fine particles, lack of dust removal during filter cleaning, and chars or condensed aerosols may contribute to the increase in pressure drop across the filter. This increases the filter operation power consumption and, consequently, reduces the filtration capacity. The phenomenon of building a high pressure drop in spite of filter cleaning provisions is known as blinding. In order to maintain an acceptable gas throughput, blinding problems must be addressed. Recommendations are given: maintain temperature above dew point, use filter aids, by-pass filter during start-up or operational upsets, etc.

  13. Filtering separators having filter cleaning apparatus

    SciTech Connect

    Margraf, A.

    1984-08-28

    This invention relates to filtering separators of the kind having a housing which is subdivided by a partition, provided with parallel rows of holes or slots, into a dust-laden gas space for receiving filter elements positioned in parallel rows and being impinged upon by dust-laden gas from the outside towards the inside, and a clean gas space. In addition, the housing is provided with a chamber for cleansing the filter element surfaces of a row by counterflow action while covering at the same time the partition holes or slots leading to the adjacent rows of filter elements. The chamber is arranged for the supply of compressed air to at least one injector arranged to feed compressed air and secondary air to the row of filter elements to be cleansed. The chamber is also reciprocatingly displaceable along the partition in periodic and intermittent manner. According to the invention, a surface of the chamber facing towards the partition covers at least two of the rows of holes or slots of the partition, and the chamber is closed upon itself with respect to the clean gas space, and is connected to a compressed air reservoir via a distributor pipe and a control valve. At least one of the rows of holes or slots of the partition and the respective row of filter elements in flow communication therewith are in flow communication with the discharge side of at least one injector acted upon with compressed air. At least one other row of the rows of holes or slots of the partition and the respective row of filter elements is in flow communication with the suction side of the injector.

  14. Generic Kalman Filter Software

    NASA Technical Reports Server (NTRS)

    Lisano, Michael E., II; Crues, Edwin Z.

    2005-01-01

    The Generic Kalman Filter (GKF) software provides a standard basis for the development of application-specific Kalman-filter programs. Historically, Kalman filters have been implemented by customized programs that must be written, coded, and debugged anew for each unique application, then tested and tuned with simulated or actual measurement data. Total development times for typical Kalman-filter application programs have ranged from months to weeks. The GKF software can simplify the development process and reduce the development time by eliminating the need to re-create the fundamental implementation of the Kalman filter for each new application. The GKF software is written in the ANSI C programming language. It contains a generic Kalman-filter-development directory that, in turn, contains a code for a generic Kalman filter function; more specifically, it contains a generically designed and generically coded implementation of linear, linearized, and extended Kalman filtering algorithms, including algorithms for state- and covariance-update and -propagation functions. The mathematical theory that underlies the algorithms is well known and has been reported extensively in the open technical literature. Also contained in the directory are a header file that defines generic Kalman-filter data structures and prototype functions and template versions of application-specific subfunction and calling navigation/estimation routine code and headers. Once the user has provided a calling routine and the required application-specific subfunctions, the application-specific Kalman-filter software can be compiled and executed immediately. During execution, the generic Kalman-filter function is called from a higher-level navigation or estimation routine that preprocesses measurement data and post-processes output data. 
The generic Kalman-filter function uses the aforementioned data structures and five implementation-specific subfunctions, which have been developed by the user on
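
    To make the state- and covariance-propagation and -update steps mentioned in this record concrete, here is a minimal scalar linear Kalman filter in Python. It is a sketch only: the GKF software itself is written in ANSI C, and the function name, parameter names, and toy measurements below are assumptions for illustration, not part of the GKF interface.

```python
def kalman_step(x, p, z, f=1.0, q=0.0, h=1.0, r=1.0):
    """One propagate-and-update cycle of a scalar linear Kalman filter.

    x, p : prior state estimate and its variance
    z    : new measurement
    f, q : state-transition coefficient and process-noise variance
    h, r : measurement coefficient and measurement-noise variance
    """
    # Propagation: push the state and its variance through the dynamics.
    x_pred = f * x
    p_pred = f * p * f + q

    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Toy run (hypothetical data): estimate a constant from three noisy readings.
x, p = 0.0, 1.0
for z in [1.0, 1.0, 1.0]:
    x, p = kalman_step(x, p, z)
```

    With a static state and no process noise, the variance shrinks with every measurement and the estimate moves steadily from the prior toward the measured value. An application-specific filter would replace the scalar coefficients with matrices and supply its own dynamics and measurement models.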

  15. Concentric Split Flow Filter

    NASA Technical Reports Server (NTRS)

    Stapleton, Thomas J. (Inventor)

    2015-01-01

    A concentric split flow filter may be configured to remove odor and/or bacteria from pumped air used to collect urine and fecal waste products. For instance, the filter may be designed to effectively fill the volume that was previously considered wasted surrounding the transport tube of a waste management system. The concentric split flow filter may be configured to split the air flow, with substantially half of the air flow to be treated traveling through a first bed of filter media and substantially the other half of the air flow to be treated traveling through the second bed of filter media. This split flow design reduces the air velocity by 50%. In this way, the pressure drop of the filter may be reduced by as much as a factor of 4 as compared to the conventional design.
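
    The step from a 50% velocity reduction to a pressure drop reduced by up to a factor of 4 can be checked with a short calculation. The Python sketch below assumes that pressure drop scales as a power of the air velocity through the bed, with exponent 1 for purely viscous flow and exponent 2 for inertia-dominated flow; this scaling law and the names used are assumptions for illustration and are not stated in the record.

```python
def pressure_drop_ratio(velocity_ratio, exponent):
    """Ratio of new to old pressure drop, assuming drop ~ velocity ** exponent."""
    return velocity_ratio ** exponent


split_velocity_ratio = 0.5  # each bed sees half of the air flow

viscous = pressure_drop_ratio(split_velocity_ratio, 1)   # linear regime (assumed)
inertial = pressure_drop_ratio(split_velocity_ratio, 2)  # quadratic regime (assumed)

viscous_reduction = 1 / viscous    # factor by which the drop falls, linear case
inertial_reduction = 1 / inertial  # factor by which the drop falls, quadratic case
```

    Under these assumptions the reduction lies between a factor of 2 and a factor of 4, which is consistent with the "as much as a factor of 4" wording.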

  16. Optically tunable optical filter

    NASA Astrophysics Data System (ADS)

    James, Robert T. B.; Wah, Christopher; Iizuka, Keigo; Shimotahira, Hiroshi

    1995-12-01

    We experimentally demonstrate an optically tunable optical filter that uses photorefractive barium titanate. With our filter we implement a spectrum analyzer at 632.8 nm with a resolution of 1.2 nm. We simulate a wavelength-division multiplexing system by separating two semiconductor laser diodes, at 1560 nm and 1578 nm, with the same filter. The filter has a bandwidth of 6.9 nm. We also use the same filter to take 2.5-nm-wide slices out of a 20-nm-wide superluminescent diode centered at 840 nm. As a result, we experimentally demonstrate a phenomenal tuning range from 632.8 to 1578 nm with a single filtering device.

  17. Contactor/filter improvements

    DOEpatents

    Stelman, D.

    1988-06-30

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream is described. The filter includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. A cover screen isolates the filter element from contact with the moving granular bed. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses. 6 figs.

  18. Hybrid Filter Membrane

    NASA Technical Reports Server (NTRS)

    Laicer, Castro; Rasimick, Brian; Green, Zachary

    2012-01-01

    Cabin environmental control is an important issue for a successful Moon mission. Due to the unique environment of the Moon, lunar dust control is one of the main problems that significantly diminishes the air quality inside spacecraft cabins. Therefore, this innovation was motivated by NASA's need to minimize the negative health impact that air-suspended lunar dust particles have on astronauts in spacecraft cabins. It is based on fabrication of a hybrid filter comprising nanofiber nonwoven layers coated on porous polymer membranes with uniform cylindrical pores. This design results in a high-efficiency gas particulate filter with low pressure drop and the ability to be easily regenerated to restore filtration performance. A hybrid filter was developed consisting of a porous membrane with uniform, micron-sized, cylindrical pore channels coated with a thin nanofiber layer. Compared to conventional filter media such as a high-efficiency particulate air (HEPA) filter, this filter is designed to provide high particle efficiency, low pressure drop, and the ability to be regenerated. These membranes have well-defined micron-sized pores and can be used independently as air filters with a discrete particle size cut-off, or coated with nanofiber layers for filtration of ultrafine nanoscale particles. The filter has a thin design intended to facilitate filter regeneration by localized air pulsing. The two main features of this invention are a micro-engineered straight-pore membrane and a nanofiber coating, used in combination. The micro-engineered straight-pore membrane can be prepared with extremely high precision. Because the resulting membrane pores are straight and not tortuous like those found in conventional filters, the pressure drop across the filter is significantly reduced. The nanofiber layer is applied as a very thin coating to enhance filtration efficiency for fine nanoscale particles. 
Additionally, the thin nanofiber coating is designed to promote capture of

  19. Filter vapor trap

    DOEpatents

    Guon, Jerold

    1976-04-13

    A sintered filter trap is adapted for insertion in a gas stream of sodium vapor to condense and deposit sodium thereon. The filter is heated and operated above the melting temperature of sodium, resulting in a more efficient means to remove sodium particulates from the effluent inert gas emanating from the surface of a liquid sodium pool. Preferably the filter leaves are precoated with a natrophobic coating such as tetracosane.

  20. A Genome-Wide Investigation of SNPs and CNVs in Schizophrenia

    PubMed Central

    Maia, Jessica; Feng, Sheng; Heinzen, Erin L.; Shianna, Kevin V.; Yoon, Woohyun; Kasperavičiūtė, Dalia; Gennarelli, Massimo; Strittmatter, Warren J.; Bonvicini, Cristian; Rossi, Giuseppe; Jayathilake, Karu; Cola, Philip A.; McEvoy, Joseph P.; Keefe, Richard S. E.; Fisher, Elizabeth M. C.; St. Jean, Pamela L.; Giegling, Ina; Hartmann, Annette M.; Möller, Hans-Jürgen; Ruppert, Andreas; Fraser, Gillian; Crombie, Caroline; Middleton, Lefkos T.; St. Clair, David; Roses, Allen D.; Muglia, Pierandrea; Francks, Clyde; Rujescu, Dan; Meltzer, Herbert Y.; Goldstein, David B.

    2009-01-01

    We report a genome-wide assessment of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) in schizophrenia. We investigated SNPs using 871 patients and 863 controls, following up the top hits in four independent cohorts comprising 1,460 patients and 12,995 controls, all of European origin. We found no genome-wide significant associations, nor could we provide support for any previously reported candidate gene or genome-wide associations. We went on to examine CNVs using a subset of 1,013 cases and 1,084 controls of European ancestry, and a further set of 60 cases and 64 controls of African ancestry. We found that eight cases and zero controls carried deletions greater than 2 Mb, of which two, at 8p22 and 16p13.11-p12.4, are newly reported here. A further evaluation of 1,378 controls identified no deletions greater than 2 Mb, suggesting a high prior probability of disease involvement when such deletions are observed in cases. We also provide further evidence for some smaller, previously reported, schizophrenia-associated CNVs, such as those in NRXN1 and APBA2. We could not provide strong support for the hypothesis that schizophrenia patients have a significantly greater “load” of large (>100 kb), rare CNVs, nor could we find common CNVs that associate with schizophrenia. Finally, we did not provide support for the suggestion that schizophrenia-associated CNVs may preferentially disrupt genes in neurodevelopmental pathways. Collectively, these analyses provide the first integrated study of SNPs and CNVs in schizophrenia and support the emerging view that rare deleterious variants may be more important in schizophrenia predisposition than common polymorphisms. While our analyses do not suggest that implicated CNVs impinge on particular key pathways, we do support the contribution of specific genomic regions in schizophrenia, presumably due to recurrent mutation. On balance, these data suggest that very few schizophrenia patients share identical

  1. High-throughput identification, database storage and analysis of SNPs in EST sequences.

    PubMed

    Useche, F J; Gao, G; Harafey, M; Rafalski, A

    2001-01-01

    Single nucleotide polymorphisms (SNPs) are the most frequent form of DNA variation and disease-causing mutations in many genes. Due to their abundance and slow mutation rate within generations, they are thought to be the next generation of genetic markers that can be used in a myriad of important biological, genetic, pharmacological, and medical applications. There are several strategies, both experimental and in-silico, for SNP discovery and mapping. Experimental SNP discovery consists of a number of laborious steps that make this process complex and expensive. In-silico discovery has been proposed as an alternative discovery method that takes advantage of large data sets with potential SNP information that were generated for other purposes and have not yet been used as a SNP information source. However, in order to successfully apply the in-silico method to large data sets, the following challenges need to be addressed. First, it is necessary to build an integrated SNP pipeline that handles data processing steps smoothly from beginning (collecting sequence information) to end (SNPs in the database). Also, SNP detection tool parameters have to be optimized to satisfy specific goals of the project. Finally, SNP data cannot be fully used until the in-silico method is validated experimentally. In this paper we present a design and implementation of an in-silico SNP detection software pipeline that exploits the existence of large EST (expressed sequence tag) data sets and effectively addresses the above challenges. First, the pipeline allows for smooth data transition between its different components by implementing data interfaces that translate the data formats of the different tools in the different stages. Second, we optimized PolyBayes parameters for SNP detection in maize EST. Finally, we implemented a user interface that along with the database structure created allows the scientist to perform preliminary analysis of the data and to

  2. Mapping the genetic variation of regional brain volumes as explained by all common SNPs from the ADNI study.

    PubMed

    Bryant, Christopher; Giovanello, Kelly S; Ibrahim, Joseph G; Chang, Jing; Shen, Dinggang; Peterson, Bradley S; Zhu, Hongtu

    2013-01-01

    Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs. PMID:24015190

  3. SNPs in the aryl hydrocarbon receptor-interacting protein gene associated with sporadic non-functioning pituitary adenoma

    PubMed Central

    HU, YESHUAI; YANG, JUN; CHANG, YONGKAI; MA, SHUNCHANG; QI, JIANFA

    2016-01-01

    Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene have previously been associated with a predisposition to pituitary adenomas. However, to the best of our knowledge, mutations in AIP that relate specifically to sporadic non-functioning pituitary adenomas (NFPAs) have yet to be reported. Therefore, the present study aimed to identify single nucleotide polymorphisms (SNPs) in the AIP gene that may be associated with NFPAs. Peripheral blood samples and the entire coding sequence of the AIP gene from 56 patients with NFPAs and 56 controls were analyzed in triplicate. Of the 56 patients with NFPAs, 9 patients (16.1%) were identified as harboring five different SNPs, although no germline mutations in the AIP gene were detected in any of the patients. Three different SNPs (7051C>T, 8012G>C and 8020G>C) were identified in exons 4 and 6 in 3 different patients (each in 1 patient). Two different SNPs (7318C>A and 7886A>G) were identified in exons 5 and 6, respectively, in 6 different patients (each in 3 patients). No SNPs or germline mutations in the AIP gene were identified in the controls. The results of the present study suggested that mutations in the AIP gene might not have an important role in the tumorigenesis of NFPAs. However, further studies are required in order to investigate potential molecular and genetic mechanisms that may underlie the involvement of AIP in NFPA. PMID:26998050

  4. Mapping the Genetic Variation of Regional Brain Volumes as Explained by All Common SNPs from the ADNI Study

    PubMed Central

    Bryant, Christopher; Giovanello, Kelly S.; Ibrahim, Joseph G.; Chang, Jing; Shen, Dinggang; Peterson, Bradley S.; Zhu, Hongtu

    2013-01-01

    Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs. PMID:24015190

  5. A Mismatch EndoNuclease Array-Based Methodology (MENA) for Identifying Known SNPs or Novel Point Mutations.

    PubMed

    Comeron, Josep M; Reed, Jordan; Christie, Matthew; Jacobs, Julia S; Dierdorff, Jason; Eberl, Daniel F; Manak, J Robert

    2016-01-01

    Accurate and rapid identification or confirmation of single nucleotide polymorphisms (SNPs), point mutations and other human genomic variation facilitates understanding the genetic basis of disease. We have developed a new methodology (called MENA (Mismatch EndoNuclease Array)) pairing DNA mismatch endonuclease enzymology with tiling microarray hybridization in order to genotype both known point mutations (such as SNPs) as well as identify previously undiscovered point mutations and small indels. We show that our assay can rapidly genotype known SNPs in a human genomic DNA sample with 99% accuracy, in addition to identifying novel point mutations and small indels with a false discovery rate as low as 10%. Our technology provides a platform for a variety of applications, including: (1) genotyping known SNPs as well as confirming newly discovered SNPs from whole genome sequencing analyses; (2) identifying novel point mutations and indels in any genomic region from any organism for which genome sequence information is available; and (3) screening panels of genes associated with particular diseases and disorders in patient samples to identify causative mutations. As a proof of principle for using MENA to discover novel mutations, we report identification of a novel allele of the beethoven (btv) gene in Drosophila, which encodes a ciliary cytoplasmic dynein motor protein important for auditory mechanosensation. PMID:27600073

  6. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721

  7. Effect of DISC1 SNPs on brain structure in healthy controls and patients with a history of psychosis.

    PubMed

    Kähler, Anna K; Rimol, Lars M; Brown, Andrew Anand; Djurovic, Srdjan; Hartberg, Cecilie B; Melle, Ingrid; Dale, Anders M; Andreassen, Ole A; Agartz, Ingrid

    2012-09-01

    Disrupted-in-Schizophrenia-1 (DISC1) has been suggested as a susceptibility locus for a broad spectrum of psychiatric disorders. Risk variants have been associated with brain structural changes, which overlap alterations reported in schizophrenia and bipolar disorder patients. We used genome-wide genotyping data for a Norwegian sample of healthy controls (n = 171) and patients with a history of psychosis (n = 184), to investigate 61 SNPs in the DISC1 region for putative association with structural magnetic resonance imaging (sMRI) measures (hippocampal volume; mean cortical thickness; and total surface area, as well as cortical thickness and area divided into four lobar measures). SNP rs821589 was associated with mean temporal and total brain cortical thickness in controls (P(adjusted) = 0.009 and 0.02, respectively), but not in patients. SNPs rs11122319 and rs1417584 were associated with mean temporal cortical thickness in patients (P(adjusted) = 0.04 and 0.03, respectively), but not in controls, and both SNPs have previously been highly associated with DISC1 gene expression. There were significant genotype × case-control interactions. There was no significant association between SNPs and cortical area or hippocampal volume in controls, or with any of the structural measures in cases, after correction for multiple comparisons. In conclusion, DISC1 SNPs might impact brain structural variation, possibly differently in psychosis patients versus controls, but independent replication will be needed to confirm our findings. PMID:22815203

  8. Pros and cons in the use of SNPs in forensic kinship investigation: a comparative analysis with STRs.

    PubMed

    Amorim, António; Pereira, Luísa

    2005-05-28

    Recent advances in single nucleotide polymorphism (SNP) research have raised the possibility that these markers could replace the forensically established short tandem repeats (STRs). In this work, we compare the applicability of STRs and SNPs for kinship investigation in terms of expected informative content and probability of occurrence of "difficult cases" (when isolated Mendelian incompatibilities between alleged father and child are found). Since SNPs have a much lower mutation rate than STRs, these difficulties were expected to occur less frequently if SNPs were used instead of STRs. The purpose of this paper is to run simulations that estimate how often such difficult cases are expected to occur using both types of markers and how serious their impact can be in routine work. Our results demonstrate that a battery based exclusively on SNPs matching the informative power of current STR kits would be prone, if applied to routine paternity investigation, to the occurrence of cases where the statistical evidence would be inconclusive. We infer that the introduction of a SNP-based strategy as a substitute for the now classical STR approach poses statistical problems that must be carefully evaluated. PMID:15837005

  9. In silico analysis of consequences of non-synonymous SNPs of Slc11a2 gene in Indian bovines.

    PubMed

    Patel, Shreya M; Koringa, Prakash G; Reddy, Bhaskar B; Nathani, Neelam M; Joshi, Chaitanya G

    2015-09-01

    The aim of our study was to analyze the consequences of non-synonymous SNPs in the Slc11a2 gene using bioinformatic tools. There is a current need for efficient bioinformatic tools for in-depth analysis of data generated by next generation sequencing technologies. SNPs are known to play an imperative role in understanding the genetic basis of many genetic diseases. Slc11a2 is one of the major metal transporter families in mammals and plays a critical role in host defenses. In this study, we performed a comprehensive analysis of the impact of all non-synonymous SNPs in this gene using multiple tools such as SIFT, PROVEAN, I-Mutant and PANTHER. Among the total of 124 SNPs obtained from amplicon sequencing of the Slc11a2 gene by Ion Torrent PGM involving 10 individuals each of Gir cattle and Murrah buffalo, we found 22 to be non-synonymous. Comparing the predictions of these 4 methods, 5 nsSNPs (G369R, Y374C, A377V, Q385H and N492S) were identified as deleterious. In addition, when these 5 were tested for polar interactions with other amino acids in the protein, Y374C, Q385H and N492S showed a change in interaction pattern, which was further confirmed by an increase in total energy after energy minimization in the mutant protein compared to the native. PMID:26484229

  10. A Mismatch EndoNuclease Array-Based Methodology (MENA) for Identifying Known SNPs or Novel Point Mutations

    PubMed Central

    Comeron, Josep M.; Reed, Jordan; Christie, Matthew; Jacobs, Julia S.; Dierdorff, Jason; Eberl, Daniel F.; Manak, J. Robert

    2016-01-01

    Accurate and rapid identification or confirmation of single nucleotide polymorphisms (SNPs), point mutations and other human genomic variation facilitates understanding the genetic basis of disease. We have developed a new methodology (called MENA (Mismatch EndoNuclease Array)) pairing DNA mismatch endonuclease enzymology with tiling microarray hybridization in order to genotype both known point mutations (such as SNPs) as well as identify previously undiscovered point mutations and small indels. We show that our assay can rapidly genotype known SNPs in a human genomic DNA sample with 99% accuracy, in addition to identifying novel point mutations and small indels with a false discovery rate as low as 10%. Our technology provides a platform for a variety of applications, including: (1) genotyping known SNPs as well as confirming newly discovered SNPs from whole genome sequencing analyses; (2) identifying novel point mutations and indels in any genomic region from any organism for which genome sequence information is available; and (3) screening panels of genes associated with particular diseases and disorders in patient samples to identify causative mutations. As a proof of principle for using MENA to discover novel mutations, we report identification of a novel allele of the beethoven (btv) gene in Drosophila, which encodes a ciliary cytoplasmic dynein motor protein important for auditory mechanosensation. PMID:27600073

  11. Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene

    PubMed Central

    Omer, Shaza E.; Khalf-allah, Rahma M.; Mustafa, Razaz Y.; Ali, Isra S.; Mohamed, Sofia B.

    2016-01-01

    This study examined Homo sapiens simple polymorphisms (SNPs/indels) in the coding and non-coding regions of the BRAF gene. Variant data were obtained from the database of SNPs as of its last update in November 2015. Many bioinformatics tools were used to identify functional SNPs and indels affecting protein function, structure and expression. For coding polymorphisms, the results showed 111 SNPs predicted as highly damaging and six others as less damaging. For UTRs, five SNPs and one indel altered microRNA binding sites (3′ UTR), whereas no SNP or indel functionally altered transcription factor binding sites (5′ UTR). In addition, for 5′/3′ splice sites, analysis showed that one SNP within the 5′ splice site and one indel in the 3′ splice site showed potential alteration of splicing. In conclusion, these functional SNPs and indels could lead to gene alteration, which may directly or indirectly contribute to the occurrence of many diseases. PMID:27478437

  12. Westinghouse filter update

    SciTech Connect

    Bruck, G.J.; Smeltzer, E.E.; Newby, R.A.; Bachovchin, D.M.

    1993-06-01

    The Department of Energy, Morgantown Energy Technology Center (DOE/METC), and Westinghouse are developing high temperature particulate filters for application in integrated coal gasification combined cycle (IGCC) and pressurized fluidized bed combustion (PFBC) power generation systems. Development of these IGCC and PFBC advanced power cycles using subpilot and pilot scale facilities includes the integrated operation of a high temperature particulate filter. This testing provides the basis for evaluating filter design, performance and operation characteristics in the actual process gas environment. These operating data are essential for the specification of components and materials and successful scaleup of the filter systems for demonstration and commercial application.

  13. Independent task Fourier filters

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    2001-11-01

    Since the early 1960s, a major part of optical computing systems has been Fourier pattern recognition, which takes advantage of high speed filter changes to enable powerful nonlinear discrimination in 'real time.' Because each filter has a task quite independent of the tasks of the other filters, they can be applied and evaluated in parallel or, in a simple approach I describe, in sequence very rapidly. Thus I use the name ITFF (independent task Fourier filter). These filters can also break very complex discrimination tasks into easily handled parts, so the wonderful space invariance properties of Fourier filtering need not be sacrificed to achieve high discrimination and good generalizability even for ultracomplex discrimination problems. The training procedure proceeds sequentially, as the task for a given filter is defined a posteriori by declaring it to be the discrimination of particular members of set A from all members of set B with sufficient margin. That is, we set the threshold to achieve the desired margin and note the A members discriminated by that threshold. Discriminating those A members from all members of B becomes the task of that filter. Those A members are then removed from the set A, so no other filter will be asked to perform that already accomplished task.
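
    The sequential training procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: patterns are plain numeric vectors, and the "filter" is a hypothetical dot-product template built from the first remaining member of set A, standing in for a real Fourier filter design step. The names `make_template_filter`, `train_itff` and `classify_as_a` are assumptions introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple

Pattern = Sequence[float]
Filter = Callable[[Pattern], float]


def make_template_filter(template: Pattern) -> Filter:
    """Hypothetical stand-in for a Fourier filter: a dot product with a template."""
    def score(x: Pattern) -> float:
        return sum(t * v for t, v in zip(template, x))
    return score


def train_itff(set_a: List[Pattern], set_b: List[Pattern],
               margin: float) -> Tuple[List[Tuple[Filter, float]], List[Pattern]]:
    """Build filters one at a time; each filter's task is defined a posteriori.

    Returns the filter bank as (filter, threshold) pairs plus any A members
    that no filter managed to separate from B.
    """
    remaining = list(set_a)
    bank: List[Tuple[Filter, float]] = []
    while remaining:
        # Assumption: design the next filter from the first uncovered A member.
        filt = make_template_filter(remaining[0])
        # Set the threshold so every B member falls below it by the margin.
        threshold = max(filt(b) for b in set_b) + margin
        covered = [a for a in remaining if filt(a) >= threshold]
        if not covered:
            break  # this design cannot separate any more A members; stop
        bank.append((filt, threshold))
        # Remove the A members this filter already handles.
        remaining = [a for a in remaining if filt(a) < threshold]
    return bank, remaining


def classify_as_a(bank: List[Tuple[Filter, float]], x: Pattern) -> bool:
    """A pattern is assigned to A if any single filter exceeds its threshold."""
    return any(filt(x) >= threshold for filt, threshold in bank)
```

    Because each filter is evaluated independently, the `any(...)` in `classify_as_a` could equally be run in parallel or in rapid sequence, which is the property the abstract emphasizes.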

  14. Filter construction and design.

    PubMed

    Jornitz, Maik W

    2006-01-01

    Sterilizing and pre-filters are manufactured in different formats and designs. The criteria for the specific designs are set by the application and the specifications of the filter user. The optimal filter unit or even system requires evaluation, such as flow rate, throughput, unspecific adsorption, steam sterilizability and chemical compatibility. These parameters are commonly tested within a qualification phase, which ensures that an optimal filter design and combination finds its use. If such design investigations are neglected it could be costly in the process scale. PMID:16570863

  15. Nanofiber Filters Eliminate Contaminants

    NASA Technical Reports Server (NTRS)

    2009-01-01

    With support from Phase I and II SBIR funding from Johnson Space Center, Argonide Corporation of Sanford, Florida tested and developed its proprietary nanofiber water filter media. Capable of removing more than 99.99 percent of dangerous particles like bacteria, viruses, and parasites, the media was incorporated into the company's commercial NanoCeram water filter, an inductee into the Space Foundation's Space Technology Hall of Fame. In addition to its drinking water filters, Argonide now produces large-scale nanofiber filters used as part of the reverse osmosis process for industrial water purification.

  16. Linear phase compressive filter

    DOEpatents

    McEwan, T.E.

    1995-06-06

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line. 2 figs.

  17. Linear phase compressive filter

    DOEpatents

    McEwan, Thomas E.

    1995-01-01

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line.

  18. Relationship between three novel SNPs of BRCA1 and canine mammary tumors

    PubMed Central

    SUN, Weidong; YANG, Xu; QIU, Hengbin; ZHANG, Di; WANG, Huanan; HUANG, Jian; LIN, Degui

    2015-01-01

    The BRCA1 gene plays an important role in the development of human breast cancer, and recent research indicated that genetic variations of BRCA1 are also related to canine mammary tumors (CMTs). Here, using rapid amplification of cDNA ends (RACE), we cloned the 5′- and 3′-UTRs of BRCA1. By direct sequencing of the flanking sequences of the 5′- and 3′-UTRs of BRCA1, three previously unreported single-nucleotide polymorphisms (SNPs) were identified, two (−1228T >C, −1173C >T) in the putative promoter regions and one non-synonymous SNP (63449G >A) in exon 23. Compared with 16 normal samples, the sequences from 34 CMTs suggested that SNP (−1173C >T) was associated with the development of CMTs (odds ratio (OR)=2.57, 95% confidence interval (CI): 1.07–6.15). PMID:26156012
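
    For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained from a case-control comparison, the following is a minimal sketch using the standard Wald (Woolf) interval on the log odds ratio. The abstract does not report the underlying 2×2 counts or the exact method used, so the function name, argument layout and any counts passed to it are illustrative assumptions only.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the variant      b: cases without the variant
    c: controls carrying the variant   d: controls without the variant
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

    An interval whose lower bound stays above 1, as in the reported 1.07–6.15, is what supports describing the variant as associated with increased risk.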

  19. Uneven-order decentered Shapiro filters for boundary filtering

    NASA Astrophysics Data System (ADS)

    Falissard, F.

    2015-07-01

    This paper addresses the use of Shapiro filters for boundary filtering. A new class of uneven-order decentered Shapiro filters is proposed and compared to classical Shapiro filters and even-order decentered Shapiro filters. The theoretical analysis shows that the proposed boundary filters are more accurate than the centered Shapiro filters and more robust than the even-order decentered boundary filters usable at the same distance to the boundary. The benefit of the new boundary filters is assessed for computations using the compressible Euler equations.

  20. Tagging SNPs in the ERCC4 gene are associated with gastric cancer risk.

    PubMed

    Chu, Haiyan; Zhao, Qinghong; Wang, Shizhi; Wang, Meilin; Xu, Ming; Gao, Yan; Luo, Dewei; Tan, Yongfei; Gong, Weida; Zhang, Zhengdong; Wu, Dongmei

    2013-05-25

    ERCC4 plays an essential role in the nucleotide excision repair (NER) pathway, which is involved in the removal of a wide variety of DNA lesions. To determine whether the ERCC4 tagging SNPs (tSNPs) are associated with risk of gastric cancer, we conducted a hospital-based case-control study of 350 cases and 468 cancer-free controls. In the logistic regression (LR) analysis, we found a significantly decreased risk of gastric cancer associated with the rs744154 GC/CC genotypes [adjusted odds ratio (OR)=0.56, 95% confidence interval (CI)=0.42-0.75, false discovery rate (FDR) P=0.003] compared with the wild-type GG genotype. Haplotype-based association study revealed that the CGC haplotype containing the rs744154 C allele can decrease the risk of gastric cancer compared with the most common haplotype GGT (adjusted OR=0.61, 95% CI=0.46-0.81). Using the multifactor dimensionality reduction (MDR) analysis, we identified that the SNP rs744154 and smoking status were the best two predictive factors for gastric cancer with a testing accuracy of 55.76% and a perfect cross-validation consistency (CVC) of 10 (P=0.001). Furthermore, the smokers with the rs744154 GC/CC genotypes showed a decreased risk of gastric cancer (adjusted OR=0.55, 95% CI=0.35-0.85) compared with the smokers with the GG genotype using multivariate LR analysis. The above findings consistently suggested that genetic variants in the ERCC4 gene may play a protective role in the etiology of gastric cancer, even in the smokers. PMID:23537993

  1. Allelic Spectra of Risk SNPs Are Different for Environment/Lifestyle Dependent versus Independent Diseases

    PubMed Central

    Amos, Christopher I.

    2015-01-01

    Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented in the group of minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) compared to that in the group of major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning “environment” or “lifestyle” AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for the diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while frequency distributions of the risk alleles for the diseases with weak environmental/lifestyle influences are shifted toward lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when the environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for the environment/lifestyle dependent diseases will follow a neutral model since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that prediction of SNP functionality based on the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle dependent diseases. PMID:26201053
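
    The ELI definition given in this abstract reduces to a simple rate per 1,000 publications. The sketch below shows that arithmetic only; the function name and the publication counts in the usage note are hypothetical, and how the counts are obtained (for example, which literature database is queried) is not specified here.

```python
def environment_lifestyle_index(n_env_or_lifestyle_and_disease: int,
                                n_disease: int) -> float:
    """ELI: publications mentioning ("environment" OR "lifestyle") AND the disease,
    expressed per 1,000 publications that mention the disease."""
    if n_disease <= 0:
        raise ValueError("need at least one disease-mentioning publication")
    if not 0 <= n_env_or_lifestyle_and_disease <= n_disease:
        raise ValueError("joint count must lie between 0 and the disease count")
    return 1000.0 * n_env_or_lifestyle_and_disease / n_disease
```

    With made-up counts of 50 joint mentions among 2,000 disease publications, `environment_lifestyle_index(50, 2000)` gives an ELI of 25.0.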

  2. SNPs and breast cancer risk prediction for African American and Hispanic women.

    PubMed

    Allman, Richard; Dite, Gillian S; Hopper, John L; Gordon, Ora; Starlard-Davenport, Athena; Chlebowski, Rowan; Kooperberg, Charles

    2015-12-01

    For African American or Hispanic women, the extent to which clinical breast cancer risk prediction models are improved by including information on susceptibility single nucleotide polymorphisms (SNPs) is unknown, even though these women comprise increasing proportions of the US population and represent a large proportion of the world's population. We studied 7539 African American and 3363 Hispanic women from the Women's Health Initiative. The age-adjusted 5-year risks from the BCRAT and IBIS risk prediction models were measured and combined with a risk score based on >70 independent susceptibility SNPs. Logistic regression, adjusting for age group, was used to estimate risk associations with log-transformed age-adjusted 5-year risks. Discrimination was measured by the odds ratio (OR) per standard deviation (SD) and the area under the receiver operator curve (AUC). When considered alone, the ORs for African American women were 1.28 for BCRAT, and 1.04 for IBIS. When combined with the SNP risk score (OR 1.23), the corresponding ORs were 1.39 and 1.22. For Hispanic women the corresponding ORs were 1.25 for BCRAT, and 1.15 for IBIS. When combined with the SNP risk score (OR 1.39), the corresponding ORs were 1.48 and 1.42. There was no evidence that any of the combined models were not well calibrated. Including information on known breast cancer susceptibility loci provides approximately 10 and 19% improvement in risk prediction using BCRAT for African Americans and Hispanics, respectively. The corresponding figures for IBIS are approximately 18 and 26%, respectively. PMID:26589314

  3. Comparison of family history and SNPs for predicting risk of complex disease.

    PubMed

    Do, Chuong B; Hinds, David A; Francke, Uta; Eriksson, Nicholas

    2012-01-01

    The clinical utility of family history and genetic tests is generally well understood for simple Mendelian disorders and rare subforms of complex diseases that are directly attributable to highly penetrant genetic variants. However, little is presently known regarding the performance of these methods in situations where disease susceptibility depends on the cumulative contribution of multiple genetic factors of moderate or low penetrance. Using quantitative genetic theory, we develop a model for studying the predictive ability of family history and single nucleotide polymorphism (SNP)-based methods for assessing risk of polygenic disorders. We show that family history is most useful for highly common, heritable conditions (e.g., coronary artery disease), where it explains roughly 20%-30% of disease heritability, on par with the most successful SNP models based on associations discovered to date. In contrast, we find that for diseases of moderate or low frequency (e.g., Crohn disease) family history accounts for less than 4% of disease heritability, substantially lagging behind SNPs in almost all cases. These results indicate that, for a broad range of diseases, already identified SNP associations may be better predictors of risk than their family history-based counterparts, despite the large fraction of missing heritability that remains to be explained. Our model illustrates the difficulty of using either family history or SNPs for standalone disease prediction. On the other hand, we show that, unlike family history, SNP-based tests can reveal extreme likelihood ratios for a relatively large percentage of individuals, thus providing potentially valuable adjunctive evidence in a differential diagnosis. PMID:23071447

  4. The more the merrier? How a few SNPs predict pigmentation phenotypes in the Northern German population.

    PubMed

    Caliebe, Amke; Harder, Melanie; Schuett, Rebecca; Krawczak, Michael; Nebel, Almut; von Wurmb-Schwark, Nicole

    2016-05-01

    Human pigmentation traits are of great interest to many research areas, from ancient DNA analysis to forensic science. We developed a gene-based predictive model for pigmentation phenotypes in a realistic target population for forensic case work from Northern Germany and compared our model with those brought forth by previous studies of genetically more heterogeneous populations. In doing so, we aimed at answering the following research questions: (1) do existing models allow good prediction of high-quality phenotypes in a genetically similar albeit more homogeneous population? (2) Would a model specifically set up for the more homogeneous population perform notably better than existing models? (3) Can the number of markers included in existing models be reduced without compromising their predictive capability in the more homogenous population? We investigated the association between eye, hair and skin colour and 12 candidate single-nucleotide polymorphisms (SNPs) from six genes. Our study comprised two samples of 300 and 100 individuals from Northern Germany. SNP rs12913832 in HERC2 was found to be strongly associated with blue eye colour (odds ratio=40.0, P<1.2 × 10(-4)) and to yield moderate predictive power (AUC: 77%; sensitivity: 90%, specificity: 63%, both at a 0.5 threshold for blue eye colour probability). SNP associations with hair and skin colour were weaker and genotypes less predictive. A comparison with two recently published sets of markers to predict eye and hair colour revealed that the consideration of additional SNPs with weak-to-moderate effect increased the predictive power for eye colour, but not for hair colour. PMID:26286644
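
    The sensitivity and specificity quoted "at a 0.5 threshold for blue eye colour probability" can be illustrated with a short sketch. It assumes a model has already produced a predicted probability of blue eye colour for each individual; the function name, the choice of `>=` at the cut-off, and the example data in the usage note are assumptions made for illustration and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(probabilities: Sequence[float],
                            is_blue: Sequence[bool],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Classify as 'blue' when the predicted probability reaches the threshold,
    then compare against the observed eye colour."""
    if len(probabilities) != len(is_blue):
        raise ValueError("inputs must have the same length")
    tp = fn = tn = fp = 0
    for p, truth in zip(probabilities, is_blue):
        predicted_blue = p >= threshold
        if truth and predicted_blue:
            tp += 1
        elif truth:
            fn += 1
        elif predicted_blue:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one blue and one non-blue individual")
    sensitivity = tp / (tp + fn)  # share of truly blue-eyed individuals detected
    specificity = tn / (tn + fp)  # share of non-blue individuals correctly rejected
    return sensitivity, specificity
```

    For instance, with invented probabilities `[0.9, 0.8, 0.4, 0.6, 0.2, 0.1]` and observed labels `[True, True, True, False, False, False]`, two of three blue-eyed and two of three non-blue individuals are classified correctly, so both values are 2/3.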

  5. [Construction and function identification of luciferase reporter gene vectors containing SNPs in NFKBIA gene 3'UTR].

    PubMed

    Yang, Shuo; Li, Jia-li; Bi, Hui-chang; Zhou, Shou-ning; Liu, Xiao-man; Zeng, Hang; Hu, Bing-fang; Huang, Min

    2016-01-01

    This study aims to investigate the function of two SNPs (rs8904C > T and rs696G >A) in 3' untranslated region (3'UTR) of NFKBIA gene by constructing luciferase reporter gene. A patient's genomic DNA with rs8904 CC and rs696 GA genotype was used as the PCR template. Full-length 3'UTR of NFKBIA gene was amplified by different primers. After sequencing validation, these fragments were inserted to the luciferase reporter vector, pGL3-promoter to construct recombinant plasmids containing four kinds of haplotypes, pGL3-rs8904C/rs696G, pGL3-rs8904C/rs696A, pGL3-rs8904T/rs696G and pGL3-rs8904T/rs696A. Then these plasmids were transfected into LS174T cells and the luciferase activity was detected. Compared with pGL3-vector transfected cells (negative control), the luciferase activity of the four kinds of recombinant plasmids was significantly decreased (P < 0.001). For rs696G > A, the luciferase activity of the recombinant plasmids containing A allele (pGL3-rs8904C/rs696A and pGL3-rs8904T/rs696A) was about 45.1% (P < 0.05) and 56.1% (P < 0.001) lower than those containing G allele (pGL3-rs8904C/rs696G and pGL3-rs8904T/rs696G), respectively. For rs8904C > T, there were no significant differences in the luciferase activity between the recombinant plasmids containing T allele and those with C allele. Together, the luciferase reporter gene vectors containing SNPs in NFKBIA gene 3'UTR were constructed successfully and rs696G > A could decrease the luciferase activity while rs8904C >T didn't have much effect on the luciferase activity. PMID:27405166

  6. A consensus linkage map of the grass carp (Ctenopharyngodon idella) based on microsatellites and SNPs

    PubMed Central

    2010-01-01

    Background: Grass carp (Ctenopharyngodon idella) belongs to the family Cyprinidae, which includes more than 2000 fish species. It is one of the most important freshwater food fish species in world aquaculture. A linkage map is an essential framework for mapping traits of interest and is often the first step towards understanding genome evolution. The aim of this study is to construct a first generation genetic map of grass carp using microsatellites and SNPs to generate a new resource for mapping QTL for economically important traits and to conduct a comparative mapping analysis to gain new insights into the evolution of fish genomes. Results: We constructed a first generation linkage map of grass carp with a mapping panel containing two F1 families including 192 progenies. Sixteen SNPs in genes and 263 microsatellite markers were mapped to twenty-four linkage groups (LGs). The number of LGs corresponded to the haploid chromosome number of grass carp. The sex-specific map was 1149.4 and 888.8 cM long in females and males respectively, whereas the sex-averaged map spanned 1176.1 cM. The average resolution of the map was 4.2 cM/locus. BLAST searches of sequences of mapped markers of grass carp against the whole genome sequence of zebrafish revealed substantial macrosynteny relationship and extensive colinearity of markers between grass carp and zebrafish. Conclusions: The linkage map of grass carp presented here is the first linkage map of a food fish species based on co-dominant markers in the family Cyprinidae. This map provides a valuable resource for mapping phenotypic variations, serves as a reference for comparative genomics and for understanding the evolution of fish genomes, and could be complementary to the grass carp genome sequencing project. PMID:20181260
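
    The reported average resolution follows directly from the figures in this abstract: the sex-averaged map length divided by the number of mapped loci (16 SNPs plus 263 microsatellites). The short check below reproduces that arithmetic; the variable and function names are introduced here for illustration.

```python
def average_map_resolution(map_length_cm: float, n_loci: int) -> float:
    """Average spacing of a linkage map in cM per locus."""
    if n_loci <= 0:
        raise ValueError("number of loci must be positive")
    return map_length_cm / n_loci


# Figures taken from the abstract above.
n_snps = 16
n_microsatellites = 263
sex_averaged_length_cm = 1176.1

n_loci = n_snps + n_microsatellites                  # 279 mapped loci
resolution = average_map_resolution(sex_averaged_length_cm, n_loci)
print(f"{n_loci} loci, {resolution:.1f} cM/locus")  # 279 loci, 4.2 cM/locus
```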

  7. SNPs in transporter and metabolizing genes as predictive markers for oxaliplatin treatment in colorectal cancer patients.

    PubMed

    Kap, Elisabeth J; Seibold, Petra; Scherer, Dominique; Habermann, Nina; Balavarca, Yesilda; Jansen, Lina; Zucknick, Manuela; Becker, Natalia; Hoffmeister, Michael; Ulrich, Alexis; Benner, Axel; Ulrich, Cornelia M; Burwinkel, Barbara; Brenner, Hermann; Chang-Claude, Jenny

    2016-06-15

    Oxaliplatin is frequently used as part of a chemotherapeutic regimen with 5-fluorouracil in the treatment of colorectal cancer (CRC). The cellular availability of oxaliplatin is dependent on metabolic and transporter enzymes. Variants in genes encoding these enzymes may cause variation in response to oxaliplatin and could be potential predictive markers. Therefore, we used a two-step procedure to comprehensively investigate 1,444 single nucleotide polymorphisms (SNPs) from these pathways for their potential as predictive markers for oxaliplatin treatment, using 623 stage II-IV CRC patients (of whom 201 patients received oxaliplatin) from a German prospective patient cohort treated with adjuvant or palliative chemotherapy. First, all genes were screened using the global test that evaluated SNP*oxaliplatin interaction terms per gene. Second, one model was created by backward elimination on all SNP*oxaliplatin interactions of the selected genes. The statistical procedure was evaluated using bootstrap analyses. Nine genes differentially associated with overall survival according to oxaliplatin treatment (unadjusted p values < 0.05) were selected. Model selection resulted in the inclusion of 14 SNPs from eight genes (six transporter genes, ABCA9, ABCB11, ABCC10, ATP1A1, ATP1B2, ATP8B3, and two metabolism genes GSTM5, GRHPR), which significantly improved model fit. Using bootstrap analysis we show an improvement of the prediction error of 3.7% in patients treated with oxaliplatin. Several variants in genes involved in metabolism and transport could thus be potential predictive markers for oxaliplatin treatment in CRC patients. If confirmed, inclusion of these variants in a predictive test could identify patients who are more likely to benefit from treatment with oxaliplatin. PMID:26835885

  8. The identification of trans-associations between prostate cancer GWAS SNPs and RNA expression differences in tumor-adjacent stroma

    PubMed Central

    Chen, Xin; McClelland, Michael; Jia, Zhenyu; Rahmatpanah, Farah B.; Sawyers, Anne; Trent, Jeffrey; Duggan, David; Mercola, Dan

    2015-01-01

    Here we tested the hypothesis that SNPs associated with prostate cancer risk might differentially affect RNA expression in prostate cancer stroma. The 35 most significant SNP loci were selected from Genome Wide Association (GWA) studies of ~40,000 patients. We also selected 4030 transcripts previously associated with prostate cancer diagnosis and prognosis. eQTL analysis was carried out by a modified BAYES method to analyze the associations between the risk variants and expressed transcripts jointly in a single model. We observed 47 significant associations between eight risk variants and the expression patterns of 46 genes. This is the first study to identify associations between multiple SNPs and multiple in trans gene expression differences in cancer stroma. Potentially, a combination of SNPs and associated expression differences in prostate stroma may increase the power of risk assessment for individuals, and for cancer progression. PMID:25638161

  9. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, T.E.; Alvin, M.A.; Bruck, G.J.; Smeltzer, E.E.

    1999-03-02

    A filter holder and gasket assembly are disclosed for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut. 9 figs.

  10. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, Thomas Edwin; Alvin, Mary Anne; Bruck, Gerald Joseph; Smeltzer, Eugene E.

    1999-03-02

    A filter holder and gasket assembly for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut.

  11. Downflow dust filter

    SciTech Connect

    Richard, K.L.

    1986-09-09

    This patent describes an industrial dust filter apparatus comprising: a housing including upper, intermediate and lower sections, the upper section having a top opening inlet for particulate laden gases and the lower section tapering downwardly to a particulate outlet; a plurality of vertically arranged substantially cylindrical filters supported in substantially parallel relationship to each other in the intermediate section of the housing, the filters being closed at their upper ends and having their exterior filter surfaces exposed to particulate laden gases from the inlet; at least one horizontal duct extending across the housing beneath the filters, closed at one end and opening at its other end to a clean gas outlet through a side wall of the intermediate housing section; means communicating the lower open end of the filters through the upper walls of the duct so that the duct functions as a clean gas plenum; a plurality of venturis, vertically supported in the duct, one aligned with each filter; and means in the duct for pulse firing a jet of air upwardly through the venturis into the interior of the filters to remove particulates from the outer surfaces thereof.

  12. Filtering reprecipitated slurry

    SciTech Connect

    Morrissey, M.F.

    1992-12-31

    As part of the Late Washing Demonstration at Savannah River Technology Center, Interim Waste Technology has filtered reprecipitated and non-reprecipitated slurry with the Experimental Laboratory Filter (ELF) at TNX. Reprecipitated slurry generates higher permeate fluxes than non-reprecipitated slurry. Washing reprecipitated slurry may require a defoamer because reprecipitation encourages foaming.

  13. Filtering reprecipitated slurry

    SciTech Connect

    Morrissey, M.F.

    1992-01-01

    As part of the Late Washing Demonstration at Savannah River Technology Center, Interim Waste Technology has filtered reprecipitated and non-reprecipitated slurry with the Experimental Laboratory Filter (ELF) at TNX. Reprecipitated slurry generates higher permeate fluxes than non-reprecipitated slurry. Washing reprecipitated slurry may require a defoamer because reprecipitation encourages foaming.

  14. Active rejector filter

    SciTech Connect

    Kuchinskii, A.G.; Pirogov, S.G.; Savchenko, V.M.; Yakushev, A.K.

    1985-01-01

    This paper describes an active rejector filter for suppressing noise signals in the frequency range 50-100 Hz and for extracting a VLF information signal. The filter has the following characteristics: a high input impedance, a resonant frequency of 75 Hz, a Q of 1.25, and an attenuation factor of 53 dB at the resonant frequency.
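    The quoted figures (resonant frequency, Q, attenuation) can be related to a rejection band with a short calculation. The sketch below assumes an ideal second-order notch response, which the abstract does not state; the function names are hypothetical and only illustrate the standard relations bandwidth = f0/Q and amplitude ratio = 10^(-dB/20).

```python
import math


def notch_band_edges(f0_hz, q):
    """Return (f_low, f_high, bandwidth) of the -3 dB rejection band.

    Assumes an ideal second-order notch (an assumption, not taken from
    the abstract). The edges are geometrically centred on f0_hz and
    their difference equals f0_hz / q.
    """
    bandwidth = f0_hz / q
    half = 1.0 / (2.0 * q)
    root = math.sqrt(1.0 + half * half)
    return f0_hz * (root - half), f0_hz * (root + half), bandwidth


def db_to_amplitude_ratio(attenuation_db):
    """Convert an attenuation in dB to the output/input amplitude ratio."""
    return 10.0 ** (-attenuation_db / 20.0)


if __name__ == "__main__":
    f_low, f_high, bw = notch_band_edges(75.0, 1.25)
    print(f"-3 dB band: {f_low:.1f} Hz to {f_high:.1f} Hz (width {bw:.1f} Hz)")
    print(f"amplitude ratio at resonance: {db_to_amplitude_ratio(53.0):.5f}")
```

    For the quoted values the bandwidth f0/Q works out to 60 Hz, and 53 dB corresponds to an amplitude reduction by a factor of several hundred. The band edges of the real circuit may differ from this idealized model.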

  15. APCR, factor V gene known and novel SNPs and adverse pregnancy outcomes in an Irish cohort of pregnant women

    PubMed Central

    2010-01-01

    Background: Activated Protein C Resistance (APCR), a poor anticoagulant response of APC in haemostasis, is the commonest heritable thrombophilia. Adverse outcomes during pregnancy have been linked to APCR. This study determined the frequency of APCR, factor V gene known and novel SNPs and adverse outcomes in a group of pregnant women. Methods: Blood samples collected from 907 pregnant women were tested using the Coatest® Classic and Modified functional haematological tests to establish the frequency of APCR. PCR-Restriction Enzyme Analysis (PCR-REA), PCR-DNA probe hybridisation analysis and DNA sequencing were used for molecular screening of known mutations in the factor V gene in subjects determined to have APCR based on the Coatest® Classic and/or Modified functional haematological tests. Glycosylase Mediated Polymorphism Detection (GMPD), a SNP screening technique, and DNA sequencing were used to identify SNPs in the factor V gene of 5 APCR subjects. Results: Sixteen percent of the study group had an APCR phenotype. Factor V Leiden (FVL), FV Cambridge, and haplotype (H) R2 alleles were identified in this group. Thirty-three SNPs (9 silent and 24 missense), of which 20 were novel, were identified in the 5 APCR subjects. Adverse pregnancy outcomes were found at a frequency of 35% in the group with APCR based on the Classic Coatest® test only and at 45% in the group with APCR based on the Modified Coatest® test. Forty-eight percent of subjects with FVL had adverse outcomes, while in the group of subjects with no FVL, adverse outcomes occurred at a frequency of 37%. Conclusions: Known mutations and novel SNPs in the factor V gene were identified in the study cohort determined to have APCR in pregnancy. Further studies are required to investigate the contribution of these novel SNPs to the APCR phenotype. Adverse outcomes including early pregnancy loss (EPL), preeclampsia (PET) and intrauterine growth restriction (IUGR) were not significantly more frequent

  16. Sintered composite filter

    DOEpatents

    Bergman, W.

    1986-05-02

    A particulate filter medium formed of a sintered composite of 0.5 micron diameter quartz fibers and 2 micron diameter stainless steel fibers is described. Preferred composition is about 40 vol.% quartz and about 60 vol.% stainless steel fibers. The media is sintered at about 1100 °C to bond the stainless steel fibers into a cage network which holds the quartz fibers. High filter efficiency and low flow resistance are provided by the smaller quartz fibers. High strength is provided by the stainless steel fibers. The resulting media has a high efficiency and low pressure drop similar to the standard HEPA media, with tensile strength at least four times greater, and a maximum operating temperature of about 550 °C. The invention also includes methods to form the composite media and a HEPA filter utilizing the composite media. The filter media can be used to filter particles in both liquids and gases.

  17. Sub-micron filter

    DOEpatents

    Tepper, Frederick; Kaledin, Leonid

    2009-10-13

    Aluminum hydroxide fibers approximately 2 nanometers in diameter and with surface areas ranging from 200 to 650 m²/g have been found to be highly electropositive. When dispersed in water they are able to attach to and retain electronegative particles. When combined into a composite filter with other fibers or particles they can filter bacteria and nano size particulates such as viruses and colloidal particles at high flux through the filter. Such filters can be used for purification and sterilization of water, biological, medical and pharmaceutical fluids, and as a collector/concentrator for detection and assay of microbes and viruses. The alumina fibers are also capable of filtering sub-micron inorganic and metallic particles to produce ultra pure water. The fibers are suitable as a substrate for growth of cells. Macromolecules such as proteins may be separated from each other based on their electronegative charges.

  18. Implicit Kalman filtering

    NASA Technical Reports Server (NTRS)

    Skliar, M.; Ramirez, W. F.

    1997-01-01

    For an implicitly defined discrete system, a new algorithm for Kalman filtering is developed and an efficient numerical implementation scheme is proposed. Unlike the traditional explicit approach, the implicit filter can be readily applied to ill-conditioned systems and allows for generalization to descriptor systems. The implementation of the implicit filter depends on the solution of the congruence matrix equation $A_1 P_x A_1^{T} = P_y$. We develop a general iterative method for the solution of this equation, and prove necessary and sufficient conditions for convergence. It is shown that when the system matrices of an implicit system are sparse, the implicit Kalman filter requires significantly less computer time and storage to implement as compared to the traditional explicit Kalman filter. Simulation results are presented to illustrate and substantiate the theoretical developments.
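    For intuition only, the congruence equation quoted in the abstract has a closed-form solution in the special case where the matrix written here as $A_1$ is square and invertible. That case is an assumption made for illustration; the paper's iterative method, which the abstract presents for ill-conditioned and sparse systems, is not reproduced here.

```latex
% Assumption: A_1 is square and invertible (not stated in the abstract).
A_1 P_x A_1^{T} = P_y
\quad\Longrightarrow\quad
P_x = A_1^{-1} P_y \left(A_1^{-1}\right)^{T}
% Check by substitution:
% A_1 \left[ A_1^{-1} P_y (A_1^{-1})^{T} \right] A_1^{T}
%   = P_y \left( A_1 A_1^{-1} \right)^{T} = P_y .
```

    Symmetry is preserved under this relation: if $P_y$ is symmetric, so is $P_x$.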

  19. Multidimensional synthetic estimation filter

    NASA Technical Reports Server (NTRS)

    Monroe, Stanley E., Jr.; Juday, Richard D.

    1990-01-01

    The synthetic estimation filter (SEF) crafts an affine variation into its response to a changing parameter (e.g. scale or rotation). Sets of such filters are used in an estimation correlator to reduce the number of filters required for a given tracking accuracy. By overspecifying the system (one more SEF than parameters to be tracked), the ratio of correlation responses between filters forms a robust estimator into the spanned domain of the parameters. Previous results dealt with a laboratory correlator which could track a single parameter. This paper explores the SEF and the estimator's extension to more dimensions. A 2D example is given in which a reduction of filters from 25 to 3 is demonstrated to span a 4-degree square portion of pose space.
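    The ratio idea can be illustrated for a single tracked parameter with two filters (one more filter than parameters). The sketch below is a simplified model rather than the authors' implementation: it assumes each filter's correlation response is exactly affine in the parameter, a_i + b_i*p, and that both responses share one unknown multiplicative gain, which the ratio cancels. The function name and coefficients are hypothetical.

```python
def estimate_parameter(c1, c2, a1, b1, a2, b2):
    """Estimate p from two measured correlation responses c1 and c2.

    Assumed model (illustrative only):
        c1 = k * (a1 + b1 * p)
        c2 = k * (a2 + b2 * p)
    where k is an unknown common gain. Cross-multiplying the ratio
    c1 / c2 = (a1 + b1*p) / (a2 + b2*p) removes k and gives
        p = (c2*a1 - c1*a2) / (c1*b2 - c2*b1).
    """
    denom = c1 * b2 - c2 * b1
    if denom == 0:
        raise ValueError("responses do not determine the parameter")
    return (c2 * a1 - c1 * a2) / denom


if __name__ == "__main__":
    # Hypothetical affine designs: one response rises with p, one falls.
    a1, b1 = 1.0, 0.2
    a2, b2 = 1.0, -0.2
    true_p, gain = 1.5, 7.3  # the gain is unknown to the estimator
    c1 = gain * (a1 + b1 * true_p)
    c2 = gain * (a2 + b2 * true_p)
    print(estimate_parameter(c1, c2, a1, b1, a2, b2))
```

    Because only the ratio of responses enters, the estimate does not depend on the common gain, which is one way to read the robustness described in the abstract.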

  20. Information geometric nonlinear filtering

    NASA Astrophysics Data System (ADS)

    Newton, Nigel J.

    2015-06-01

    This paper develops information geometric representations for nonlinear filters in continuous time. The posterior distribution associated with an abstract nonlinear filtering problem is shown to satisfy a stochastic differential equation on a Hilbert information manifold. This supports the Fisher metric as a pseudo-Riemannian metric. Flows of Shannon information are shown to be connected with the quadratic variation of the process of posterior distributions in this metric. Apart from providing a suitable setting in which to study such information-theoretic properties, the Hilbert manifold has an appropriate topology from the point of view of multi-objective filter approximations. A general class of finite-dimensional exponential filters is shown to fit within this framework, and an intrinsic evolution equation, involving Amari's -1-covariant derivative, is developed for such filters. Three example systems, one of infinite dimension, are developed in detail.