Science.gov

Sample records for filtering snps imputed

  1. SWEEP: A Tool for Filtering High-Quality SNPs in Polyploid Crops

    PubMed Central

    Clevenger, Josh P.; Ozias-Akins, Peggy

    2015-01-01

    High-throughput next-generation sequence-based genotyping and single nucleotide polymorphism (SNP) detection opens the door for emerging genomics-based breeding strategies such as genome-wide association analysis and genomic selection. In polyploids, SNP detection is confounded by a highly similar homeologous sequence where a polymorphism between subgenomes must be differentiated from a SNP. We have developed and implemented a novel tool called SWEEP: Sliding Window Extraction of Explicit Polymorphisms. SWEEP uses subgenome polymorphism haplotypes as contrast to identify true SNPs between genotypes. The tool is a single-command script that calls a series of modules based on user-defined options and takes sorted/indexed BAM files or VCF files as input. Filtering options are highly flexible and include filtering based on sequence depth, alternate allele ratio, and SNP quality on top of the SWEEP filtering procedure. Using real and simulated data we show that SWEEP outperforms current SNP filtering methods for polyploids. SWEEP can be used for high-quality SNP discovery in polyploid crops. PMID:26153076
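
    The depth, alternate-allele-ratio, and quality filters named in this abstract can be illustrated with a minimal Python sketch. It is not SWEEP's implementation or its command-line interface: the `Call` record, its field names, and the threshold defaults are hypothetical, and the sliding-window haplotype step is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    """One candidate SNP call (hypothetical record, not a SWEEP data type)."""
    pos: int          # position on the reference
    depth: int        # total reads covering the site
    alt_reads: int    # reads supporting the alternate allele
    qual: float       # Phred-scaled site quality


def passes_filters(call: Call, min_depth: int = 10,
                   min_alt_ratio: float = 0.2,
                   min_qual: float = 30.0) -> bool:
    """Return True if the call meets all three (assumed) thresholds."""
    # Sites with no coverage cannot yield an allele ratio, so reject them.
    if call.depth == 0 or call.depth < min_depth:
        return False
    alt_ratio = call.alt_reads / call.depth
    return alt_ratio >= min_alt_ratio and call.qual >= min_qual


def filter_calls(calls: List[Call], **thresholds) -> List[Call]:
    """Keep only the calls that pass every filter."""
    return [c for c in calls if passes_filters(c, **thresholds)]


calls = [
    Call(pos=101, depth=40, alt_reads=18, qual=60.0),  # passes all filters
    Call(pos=250, depth=6, alt_reads=3, qual=55.0),    # fails depth
    Call(pos=377, depth=50, alt_reads=4, qual=70.0),   # fails alt ratio (0.08)
    Call(pos=512, depth=30, alt_reads=15, qual=12.0),  # fails quality
]
kept = filter_calls(calls)
print([c.pos for c in kept])  # [101]
```

    In practice the thresholds would be tuned to ploidy and sequencing depth; the values above are placeholders only.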

  2. Analyses and Comparison of Accuracy of Different Genotype Imputation Methods

    PubMed Central

    Pei, Yu-Fang; Li, Jian; Zhang, Lei; Papasian, Christopher J.; Deng, Hong-Wen

    2008-01-01

    The power of genetic association analyses is often compromised by missing genotypic data which contributes to lack of significant findings, e.g., in in silico replication studies. One solution is to impute untyped SNPs from typed flanking markers, based on known linkage disequilibrium (LD) relationships. Several imputation methods are available and their usefulness in association studies has been demonstrated, but factors affecting their relative performance in accuracy have not been systematically investigated. Therefore, we investigated and compared the performance of five popular genotype imputation methods, MACH, IMPUTE, fastPHASE, PLINK and Beagle, to assess and compare the effects of factors that affect imputation accuracy rates (ARs). Our results showed that a stronger LD and a lower MAF for an untyped marker produced better ARs for all the five methods. We also observed that a greater number of haplotypes in the reference sample resulted in higher ARs for MACH, IMPUTE, PLINK and Beagle, but had little influence on the ARs for fastPHASE. In general, MACH and IMPUTE produced similar results and these two methods consistently outperformed fastPHASE, PLINK and Beagle. Our study is helpful in guiding application of imputation methods in association analyses when genotype data are missing. PMID:18958166
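
    To make the two quantities discussed in this abstract concrete, the following Python sketch computes a minor allele frequency (MAF) and an imputation accuracy rate (AR). The abstract does not define AR precisely; the sketch assumes it is the proportion of masked genotypes that are imputed correctly, with genotypes coded as alternate-allele counts (0, 1 or 2). The function names are illustrative.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def accuracy_rate(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Proportion of genotypes imputed exactly right (assumed AR definition)."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


# Toy data: genotypes at one masked marker for eight individuals.
true_gt = [0, 1, 2, 0, 0, 1, 0, 0]
imputed_gt = [0, 1, 1, 0, 0, 1, 0, 1]

maf = minor_allele_frequency(true_gt)
ar = accuracy_rate(true_gt, imputed_gt)
print(maf, ar)  # 0.25 0.75
```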

  3. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes in both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ general-purpose machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this study, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine (ELM), Radial Basis Function (RBF), Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute untyped SNPs in parent-offspring trios. The results show that imputation in parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, and TotalBoost performed slightly worse than the other methods. Running times differed between methods: ELM was always the fastest algorithm, whereas RBF required long imputation times as the sample size increased. The tested methods can be an alternative for imputing untyped SNPs when the missing rate is low. However, it is recommended that other machine learning methods also be used for imputation.

  4. Genotype imputation via matrix completion.

    PubMed

    Chi, Eric C; Zhou, Hua; Chen, Gary K; Del Vecchyo, Diego Ortega; Lange, Kenneth

    2013-03-01

    Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546
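
    The idea of imputing genotypes by matrix completion can be sketched in a few lines of Python. This is not the MENDEL-IMPUTE algorithm; it swaps in a much simpler technique, a rank-1 fit by alternating least squares, and runs it on a toy individuals-by-SNPs matrix in which `None` marks a missing genotype. All names are illustrative.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def complete_rank1(m: Matrix, iters: int = 200) -> List[List[float]]:
    """Fill None entries with a rank-1 fit u[i] * v[j] (alternating least squares)."""
    n_rows, n_cols = len(m), len(m[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Update each row factor with the column factors held fixed,
        # using only the observed entries of that row.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if m[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(m[i][j] * v[j] for j in obs) / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if m[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(m[i][j] * u[i] for i in obs) / den
    # Keep observed entries; replace missing ones with the fitted value.
    return [[m[i][j] if m[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]


# Toy genotype matrix (rows: individuals, columns: SNPs), two entries missing.
observed: Matrix = [
    [1, 0, 1, 1],
    [2, 0, None, 2],
    [None, 0, 1, 1],
]
completed = complete_rank1(observed)
imputed = [[round(x) for x in row] for row in completed]
print(imputed)
```

    Real genotype matrices are not rank 1, so a practical method would use a higher rank or a regularized low-rank model; the sketch only shows how observed entries constrain the missing ones.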

  5. Multiple imputation with multivariate imputation by chained equation (MICE) package

    PubMed Central

    2016-01-01

    Multiple imputation (MI) is an advanced technique for handling missing values. It is superior to single imputation in that it takes into account the uncertainty in missing-value imputation. However, MI is underutilized in the medical literature because of lack of familiarity and computational challenges. The article provides a step-by-step approach to performing MI with the R multivariate imputation by chained equation (MICE) package. The procedure first generates m complete datasets by calling the mice() function. Statistical analyses such as univariate analysis and regression modeling can then be performed within each dataset by calling the with() function, which sets the environment for statistical analysis. Lastly, the results obtained from each analysis are combined by using the pool() function. PMID:26889483
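
    The final combining step of this workflow is commonly done with Rubin's rules. The Python sketch below does not reproduce the R package or its API; it shows only the pooling arithmetic for one parameter, assuming the m per-dataset estimates and their variances are already available. The function name `pool_rubin` and the numbers are illustrative.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Toy regression coefficient estimated in m = 5 imputed datasets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var)
```

    The between-imputation term is what single imputation ignores: it grows when the imputed datasets disagree, widening the pooled confidence interval accordingly.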

  7. A reference panel of 64,976 haplotypes for genotype imputation.

    PubMed

    McCarthy, Shane; Das, Sayantan; Kretzschmar, Warren; Delaneau, Olivier; Wood, Andrew R; Teumer, Alexander; Kang, Hyun Min; Fuchsberger, Christian; Danecek, Petr; Sharp, Kevin; Luo, Yang; Sidore, Carlo; Kwong, Alan; Timpson, Nicholas; Koskinen, Seppo; Vrieze, Scott; Scott, Laura J; Zhang, He; Mahajan, Anubha; Veldink, Jan; Peters, Ulrike; Pato, Carlos; van Duijn, Cornelia M; Gillies, Christopher E; Gandin, Ilaria; Mezzavilla, Massimo; Gilly, Arthur; Cocca, Massimiliano; Traglia, Michela; Angius, Andrea; Barrett, Jeffrey C; Boomsma, Dorrett; Branham, Kari; Breen, Gerome; Brummett, Chad M; Busonero, Fabio; Campbell, Harry; Chan, Andrew; Chen, Sai; Chew, Emily; Collins, Francis S; Corbin, Laura J; Smith, George Davey; Dedoussis, George; Dorr, Marcus; Farmaki, Aliki-Eleni; Ferrucci, Luigi; Forer, Lukas; Fraser, Ross M; Gabriel, Stacey; Levy, Shawn; Groop, Leif; Harrison, Tabitha; Hattersley, Andrew; Holmen, Oddgeir L; Hveem, Kristian; Kretzler, Matthias; Lee, James C; McGue, Matt; Meitinger, Thomas; Melzer, David; Min, Josine L; Mohlke, Karen L; Vincent, John B; Nauck, Matthias; Nickerson, Deborah; Palotie, Aarno; Pato, Michele; Pirastu, Nicola; McInnis, Melvin; Richards, J Brent; Sala, Cinzia; Salomaa, Veikko; Schlessinger, David; Schoenherr, Sebastian; Slagboom, P Eline; Small, Kerrin; Spector, Timothy; Stambolian, Dwight; Tuke, Marcus; Tuomilehto, Jaakko; Van den Berg, Leonard H; Van Rheenen, Wouter; Volker, Uwe; Wijmenga, Cisca; Toniolo, Daniela; Zeggini, Eleftheria; Gasparini, Paolo; Sampson, Matthew G; Wilson, James F; Frayling, Timothy; de Bakker, Paul I W; Swertz, Morris A; McCarroll, Steven; Kooperberg, Charles; Dekker, Annelot; Altshuler, David; Willer, Cristen; Iacono, William; Ripatti, Samuli; Soranzo, Nicole; Walter, Klaudia; Swaroop, Anand; Cucca, Francesco; Anderson, Carl A; Myers, Richard M; Boehnke, Michael; McCarthy, Mark I; Durbin, Richard

    2016-10-01

    We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently. PMID:27548312

  8. [Jurisdiction and imputability].

    PubMed

    Tapiador Sanjuán, M J

    2004-12-01

    The validity, efficacy and responsibility of acts depend on the intelligence and will of the acting subject; therefore, when these are reduced or debilitated, the acts may be declared invalid and their author not responsible for them. Some neurological pathologies may cause permanent physical and/or psychic deficiencies that prevent subjects from acting on their own. For these cases, the law establishes the state of incapacity in order to protect the disabled and supplement their reduced ability, guaranteeing their rights and security. The state of incapacity is determined by a legal sentence, which declares the lack of ability to manage. The sentence determines the extent and limits of the incapacity; the level of incapacity will be proportional to the degree of insight. Similarly, a subject suffering from a pathological condition that invalidates his/her will and intelligence will be considered not responsible and not imputable, since there is no capacity for culpability. The Penal Code establishes the criteria that determine imputability or its absence, as well as modifying circumstances.

  9. Practical Consideration of Genotype Imputation: Sample Size, Window Size, Reference Choice, and Untyped Rate

    PubMed Central

    Zhang, Boshao; Zhi, Degui; Zhang, Kui; Gao, Guimin; Limdi, Nita N.; Liu, Nianjun

    2011-01-01

    Imputation offers a promising way to infer the missing and/or untyped genotypes in genetic studies. In practice, however, many factors may affect the quality of imputation. In this study, we evaluated the influence of untyped rate, sizes of the study sample and the reference sample, window size, and reference choice (for admixed population), as the factors affecting the quality of imputation. The results show that in order to obtain good imputation quality, it is necessary to have an untyped rate less than 50%, a reference sample size greater than 50, and a window size of greater than 500 SNPs (roughly 1 MB in base pairs). Compared with the whole-region imputation, piecewise imputation with large-enough window sizes provides improved efficacy. For an admixed study sample, if only an external reference panel is used, it should include samples from the ancestral populations that represent the admixed population under investigation. Internal references are strongly recommended. When internal references are limited, however, augmentation by external references should be used carefully. More specifically, augmentation with samples from the major source populations of the admixture can lower the quality of imputation; augmentation with seemingly genetically unrelated cohorts may improve the quality of imputation. PMID:22308193
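
    The three numeric recommendations in this abstract (untyped rate below 50%, reference sample size above 50, window size above 500 SNPs) can be turned into a simple pre-flight check. The Python sketch below is only an illustration of those thresholds; the function name and message wording are hypothetical and do not come from the study.

```python
from typing import List


def imputation_design_warnings(untyped_rate: float,
                               reference_size: int,
                               window_snps: int) -> List[str]:
    """List which of the abstract's recommended thresholds a design violates."""
    warnings = []
    if untyped_rate >= 0.5:
        warnings.append("untyped rate should be less than 50%")
    if reference_size <= 50:
        warnings.append("reference sample size should be greater than 50")
    if window_snps <= 500:
        warnings.append("window size should be greater than 500 SNPs")
    return warnings


good = imputation_design_warnings(untyped_rate=0.3, reference_size=120, window_snps=800)
bad = imputation_design_warnings(untyped_rate=0.6, reference_size=40, window_snps=500)
print(good)      # []
print(len(bad))  # 3
```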

  10. Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.

    PubMed

    Bolormaa, S; Gore, K; van der Werf, J H J; Hayes, B J; Daetwyler, H D

    2015-10-01

    Genotyping sheep for genome-wide SNPs at lower density and imputing to a higher density would enable cost-effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low-density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50-475 kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single-breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. The mean GBLUP accuracies with imputed

  11. Rapid genotype imputation from sequence without reference panels.

    PubMed

    Davies, Robert W; Flint, Jonathan; Myers, Simon; Mott, Richard

    2016-08-01

    Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, array-based microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r² value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans. PMID:27376236
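
    The accuracy measure quoted in this abstract is a squared correlation between imputed and true genotypes. As a minimal Python sketch, assuming the measure is the squared Pearson correlation between imputed dosages and true genotypes at one SNP (the abstract does not spell out the exact definition), it could be computed as follows. The function name is illustrative.

```python
from typing import Sequence


def r_squared(truth: Sequence[float], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector has no variation (e.g. a monomorphic SNP).
    """
    if len(truth) != len(dosage) or not truth:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)


truth = [0, 1, 2, 1, 0]
noisy_dosage = [0.1, 0.9, 1.8, 1.2, 0.0]
print(r_squared(truth, truth))         # 1.0
print(r_squared(truth, noisy_dosage))  # slightly below 1
```

    A genome-wide figure such as a mean r² would then be the average of this per-SNP value over all imputed SNPs.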

  12. minimac2: faster genotype imputation

    PubMed Central

    Fuchsberger, Christian; Abecasis, Gonçalo R.; Hinds, David A.

    2015-01-01

    Summary: Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help to keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation. Availability and implementation: minimac2, including source code, documentation, and examples is available at http://genome.sph.umich.edu/wiki/Minimac2 Contact: cfuchsb@umich.edu, goncalo@umich.edu PMID:25338720

  13. Effect of reference population size and available ancestor genotypes on imputation of Mexican Holstein genotypes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The effects of reference population size and the availability of information from genotyped ancestors on the accuracy of imputation of single nucleotide polymorphisms (SNPs) were investigated for Mexican Holstein cattle. Three scenarios for reference population size were examined: (1) a local popula...

  14. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

    PubMed Central

    Palmer, Cameron; Pe’er, Itsik

    2016-01-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603

  15. Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites

    PubMed Central

    Samad, Hanif; Coll, Francesc; Preston, Mark D.; Ocholla, Harold; Fairhurst, Rick M.; Clark, Taane G.

    2015-01-01

    Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r2 for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r2, 0.87-0.96), but the performance of IMPUTE was mixed (allelic r2, 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima’s D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and

  16. Imputation-based population genetics analysis of Plasmodium falciparum malaria parasites.

    PubMed

    Samad, Hanif; Coll, Francesc; Preston, Mark D; Ocholla, Harold; Fairhurst, Rick M; Clark, Taane G

    2015-04-01

    Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r2 for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r2, 0.87-0.96), but the performance of IMPUTE was mixed (allelic r2, 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima's D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and

  18. Imputation of ungenotyped parental genotypes in dairy and beef cattle from progeny genotypes.

    PubMed

    Berry, D P; McParland, S; Kearney, J F; Sargolzaei, M; Mullen, M P

    2014-06-01

    The objective of this study was to quantify the accuracy of imputing the genotype of parents using information on the genotype of their progeny and a family-based and population-based imputation algorithm. Two separate data sets were used, one containing both dairy and beef animals (n=3122) with high-density genotypes (735 151 single nucleotide polymorphisms (SNPs)) and the other containing just dairy animals (n=5489) with medium-density genotypes (51 602 SNPs). Imputation accuracy of three different genotype density panels were evaluated representing low (i.e. 6501 SNPs), medium and high density. The full genotypes of sires with genotyped half-sib progeny were masked and subsequently imputed. Genotyped half-sib progeny group sizes were altered from 4 up to 12 and the impact on imputation accuracy was quantified. Up to 157 and 258 sires were used to test the accuracy of imputation in the dairy plus beef data set and the dairy-only data set, respectively. The efficiency and accuracy of imputation was quantified as the proportion of genotypes that could not be imputed, and as both the genotype concordance rate and allele concordance rate. The median proportion of genotypes per animal that could not be imputed in the imputation process decreased as the number of genotyped half-sib progeny increased; values for the medium-density panel ranged from a median of 0.015 with a half-sib progeny group size of 4 to a median of 0.0014 to 0.0015 with a half-sib progeny group size of 8. The accuracy of imputation across different paternal half-sib progeny group sizes was similar in both data sets. Concordance rates increased considerably as the number of genotyped half-sib progeny increased from four (mean animal allele concordance rate of 0.94 in both data sets for the medium-density genotype panel) to five (mean animal allele concordance rate of 0.96 in both data sets for the medium-density genotype panel) after which it was relatively stable up to a half-sib progeny group size
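
    The two concordance measures named in this abstract can be distinguished with a short Python sketch. The abstract does not define them; the sketch assumes a common convention in which genotypes are coded as allele counts (0, 1 or 2), genotype concordance is the proportion of exact genotype matches, and allele concordance is the proportion of the two alleles per genotype that agree. The function names are illustrative.

```python
from typing import Sequence


def genotype_concordance(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Proportion of genotypes that match exactly."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and equal length")
    return sum(t == i for t, i in zip(true_gt, imputed_gt)) / len(true_gt)


def allele_concordance(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Proportion of alleles that agree (two alleles per diploid genotype)."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and equal length")
    # A genotype off by one allele count still has one of its two alleles right.
    agreeing = sum(2 - abs(t - i) for t, i in zip(true_gt, imputed_gt))
    return agreeing / (2 * len(true_gt))


true_gt = [0, 1, 2, 2]
imputed_gt = [0, 2, 2, 0]
print(genotype_concordance(true_gt, imputed_gt))  # 0.5
print(allele_concordance(true_gt, imputed_gt))    # 0.625
```

    Under this convention allele concordance is never lower than genotype concordance, because a heterozygote miscalled as a homozygote still earns partial credit.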

  19. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    PubMed Central

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-01-01

    Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase. PMID:27762341

  20. Genotype Imputation with Millions of Reference Samples.

    PubMed

    Browning, Brian L; Browning, Sharon R

    2016-01-01

    We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle's throughput was more than 100× greater than Impute2's throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs. PMID:26748515
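
    The interpolation step described in this abstract can be illustrated with a small Python sketch. It is not Beagle's code and simplifies heavily: it assumes that state probabilities over a handful of reference haplotypes are already available at the two genotyped markers flanking an ungenotyped variant, interpolates them linearly by position, and sums the probability of the reference haplotypes that carry the alternate allele. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def interpolate_state_probs(left: Sequence[float], right: Sequence[float],
                            pos_left: float, pos_right: float,
                            pos: float) -> List[float]:
    """Linearly interpolate per-haplotype state probabilities at `pos`."""
    if pos_left >= pos_right or not pos_left <= pos <= pos_right:
        raise ValueError("pos must lie between two distinct flanking markers")
    w = (pos - pos_left) / (pos_right - pos_left)
    return [(1 - w) * l + w * r for l, r in zip(left, right)]


def alt_allele_prob(state_probs: Sequence[float],
                    ref_alleles: Sequence[int]) -> float:
    """Probability of the alternate allele: mass on haplotypes carrying it."""
    return sum(p for p, a in zip(state_probs, ref_alleles) if a == 1)


# State probabilities over three reference haplotypes at the flanking markers.
left_probs = [0.7, 0.2, 0.1]    # at genotyped marker, position 0.0
right_probs = [0.1, 0.2, 0.7]   # at genotyped marker, position 1.0
# Alleles (0 = ref, 1 = alt) the reference haplotypes carry at the
# ungenotyped variant located at position 0.25.
ref_alleles = [1, 0, 1]

probs = interpolate_state_probs(left_probs, right_probs, 0.0, 1.0, 0.25)
print(probs)
print(alt_allele_prob(probs, ref_alleles))
```

    Because the hidden Markov model is evaluated only at genotyped markers, the cost of the expensive step does not grow with the number of ungenotyped variants, which is the design point the abstract emphasizes.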

  1. 16 CFR 1115.11 - Imputed knowledge.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 16 Commercial Practices 2 2010-01-01 2010-01-01 false Imputed knowledge. 1115.11 Section 1115.11... PRODUCT HAZARD REPORTS General Interpretation § 1115.11 Imputed knowledge. (a) In evaluating whether or... care to ascertain the truth of complaints or other representations. This includes the knowledge a...

  2. What Improves with Increased Missing Data Imputations?

    ERIC Educational Resources Information Center

    Bodner, Todd E.

    2008-01-01

    When using multiple imputation in the analysis of incomplete data, a prominent guideline suggests that more than 10 imputed data values are seldom needed. This article calls into question the optimism of this guideline and illustrates that important quantities (e.g., p values, confidence interval half-widths, and estimated fractions of missing…

  3. Risk-Stratified Imputation in Survival Analysis

    PubMed Central

    Kennedy, Richard E.; Adragni, Kofi P.; Tiwari, Hemant K.; Voeks, Jenifer H.; Brott, Thomas G.; Howard, George

    2013-01-01

    Background: Censoring that is dependent on covariates associated with survival can arise in randomized trials due to changes in recruitment and eligibility criteria to minimize withdrawals, potentially leading to biased treatment effect estimates. Imputation approaches have been proposed to address censoring in survival analysis; and while these approaches may provide unbiased estimates of treatment effects, imputation of a large number of outcomes may over- or underestimate the associated variance based on the imputation pool selected. Purpose: We propose an improved method, risk-stratified imputation, as an alternative to address withdrawal related to the risk of events in the context of time-to-event analyses. Methods: Our algorithm performs imputation from a pool of replacement subjects with similar values of both treatment and covariate(s) of interest, that is, from a risk-stratified sample. This stratification prior to imputation addresses the requirement of time-to-event analysis that censored observations are representative of all other observations in the risk group with similar exposure variables. We compared our risk-stratified imputation to case deletion and bootstrap imputation in a simulated dataset in which the covariate of interest (study withdrawal) was related to treatment. A motivating example from a recent clinical trial is also presented to demonstrate the utility of our method. Results: In our simulations, risk-stratified imputation gives estimates of treatment effect comparable to bootstrap and auxiliary variable imputation while avoiding inaccuracies of the latter two in estimating the associated variance. Similar results were obtained in analysis of clinical trial data. Limitations: Risk-stratified imputation has little advantage over other imputation methods when covariates of interest are not related to treatment, although its performance is superior when covariates are related to treatment. Risk-stratified imputation is intended for

  4. Extending long-range phasing and haplotype library imputation methods to impute genotypes on sex chromosomes

    PubMed Central

    2013-01-01

    AlphaImpute is a flexible and accurate genotype imputation tool that was originally designed for the imputation of genotypes on autosomal chromosomes. In some species, sex chromosomes comprise a large portion of the genome. For example, chromosome Z represents approximately 8% of the chicken genome and therefore is likely to be important in determining genetic variation in a population. When breeding programs make selection decisions based on genomic information, chromosomes that are not represented on the genotyping platform will not be subject to selection. Therefore imputation algorithms should be able to impute genotypes for all chromosomes. The objective of this research was to extend AlphaImpute so that it could impute genotypes on sex chromosomes. The accuracy of imputation was assessed using different genotyping strategies in a real commercial chicken population. The correlation between true and imputed genotypes was high in all the scenarios and was 0.96 for the most favourable scenario. Overall, the accuracy of imputation of the sex chromosome was slightly lower than that of autosomes for all scenarios considered. PMID:23617460

  5. Fast imputation using medium- or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  6. A Study of Imputation Algorithms. Working Paper Series.

    ERIC Educational Resources Information Center

    Hu, Ming-xiu; Salvucci, Sameena

    Many imputation techniques and imputation software packages have been developed over the years to deal with missing data. Different methods may work well under different circumstances, and it is advisable to conduct a sensitivity analysis when choosing an imputation method for a particular survey. This study reviewed about 30 imputation methods…

  7. Extending Rare-Variant Testing Strategies: Analysis of Noncoding Sequence and Imputed Genotypes

    PubMed Central

    Zawistowski, Matthew; Gopalakrishnan, Shyam; Ding, Jun; Li, Yun; Grimm, Sara; Zöllner, Sebastian

    2010-01-01

    Next Generation Sequencing Technology has revolutionized our ability to study the contribution of rare genetic variation to heritable traits. However, existing single-marker association tests are underpowered for detecting rare risk variants. A more powerful approach involves pooling methods that combine multiple rare variants from the same gene into a single test statistic. Proposed pooling methods can be limited because they generally assume high-quality genotypes derived from deep-coverage sequencing, which may not be available. In this paper, we consider an intuitive and computationally efficient pooling statistic, the cumulative minor-allele test (CMAT). We assess the performance of the CMAT and other pooling methods on datasets simulated with population genetic models to contain realistic levels of neutral variation. We consider study designs ranging from exon-only to whole-gene analyses that contain noncoding variants. For all study designs, the CMAT achieves power comparable to that of previously proposed methods. We then extend the CMAT to probabilistic genotypes and describe application to low-coverage sequencing and imputation data. We show that augmenting sequence data with imputed samples is a practical method for increasing the power of rare-variant studies. We also provide a method of controlling for confounding variables such as population stratification. Finally, we demonstrate that our method makes it possible to use external imputation templates to analyze rare variants imputed into existing GWAS datasets. As proof of principle, we performed a CMAT analysis of more than 8 million SNPs that we imputed into the GAIN psoriasis dataset by using haplotypes from the 1000 Genomes Project. PMID:21070896

  8. Dual imputation model for incomplete longitudinal data.

    PubMed

    Jolani, Shahab; Frank, Laurence E; van Buuren, Stef

    2014-05-01

    Missing values are a practical issue in the analysis of longitudinal data. Multiple imputation (MI) is a well-known likelihood-based method that has optimal properties in terms of efficiency and consistency if the imputation model is correctly specified. Doubly robust (DR) weighting-based methods protect against misspecification bias if one of the models, but not necessarily both, for the data or the mechanism leading to missing data is correct. We propose a new imputation method that captures the simplicity of MI and protection from the DR method. This method integrates MI and DR to protect against misspecification of the imputation model under a missing at random assumption. Our method avoids analytical complications of missing data, particularly in multivariate settings, and is easy to implement in standard statistical packages. Moreover, the proposed method works very well with an intermittent pattern of missingness when other DR methods cannot be used. Simulation experiments show that the proposed approach achieves improved performance when one of the models is correct. The method is applied to data from the fireworks disaster study, a randomized clinical trial comparing therapies in disaster-exposed children. We conclude that the new method increases the robustness of imputations. PMID:23909566

  9. Comparison of imputation methods for missing laboratory data in medicine

    PubMed Central

    Waljee, Akbar K; Mukherjee, Ashin; Singal, Amit G; Zhang, Yiwei; Warren, Jeffrey; Balis, Ulysses; Marrero, Jorge; Zhu, Ji; Higgins, Peter DR

    2013-01-01

    Objectives: Missing laboratory data is a common issue, but the optimal method of imputation of missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for missing completely at random laboratory data and to compare the effect of the imputed values on the accuracy of two clinical predictive models. Design: Retrospective cohort analysis of two large data sets. Setting: A tertiary level care institution in Ann Arbor, Michigan. Participants: The Cirrhosis cohort had 446 patients and the Inflammatory Bowel Disease cohort had 395 patients. Methods: Non-missing laboratory data were randomly removed with varying frequencies from two large data sets, and we then compared the ability of four methods, namely missForest, mean imputation, nearest neighbour imputation and multivariate imputation by chained equations (MICE), to impute the simulated missing data. We characterised the accuracy of the imputation and the effect of the imputation on predictive ability in two large data sets. Results: MissForest had the least imputation error for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction difference when models used imputed laboratory values. In both data sets, MICE had the second least imputation error and prediction difference, followed by the nearest neighbour and mean imputation. Conclusions: MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models. PMID:23906948
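    The evaluation design in this abstract (remove known values at random, impute them, and compare against the truth) can be sketched as follows. This is a minimal illustration that uses mean imputation only, since it is the simplest of the four methods named. The laboratory values, the 30% masking rate, and all function names are made up for the example and are not taken from the study.

```python
import math
import random


def mask_values(values, fraction, rng):
    """Replace a random fraction of values with None; return the data and masked indices."""
    n_mask = int(round(fraction * len(values)))
    masked_idx = set(rng.sample(range(len(values)), n_mask))
    masked = [None if i in masked_idx else v for i, v in enumerate(values)]
    return masked, sorted(masked_idx)


def mean_impute(values):
    """Fill every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse_at(truth, imputed, indices):
    """Root-mean-square error restricted to the masked positions."""
    if not indices:
        return 0.0
    squared = sum((truth[i] - imputed[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


rng = random.Random(0)  # fixed seed so the masking is reproducible
lab_values = [3.1, 3.8, 4.2, 2.9, 3.5, 4.0, 3.3, 3.9, 4.4, 3.6]  # made-up data
masked, masked_idx = mask_values(lab_values, 0.3, rng)
imputed = mean_impute(masked)
print("masked positions:", masked_idx)
print("imputation RMSE:", rmse_at(lab_values, imputed, masked_idx))
```

    Repeating the same loop with a different imputation function and several masking rates gives the kind of side-by-side error comparison the abstract describes.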

  10. Combinations of SNPs Related to Signal Transduction in Bipolar Disorder

    PubMed Central

    Koefoed, Pernille; Andreassen, Ole A.; Bennike, Bente; Dam, Henrik; Djurovic, Srdjan; Hansen, Thomas; Jorgensen, Martin Balslev; Kessing, Lars Vedel; Melle, Ingrid; Møller, Gert Lykke; Mors, Ole; Werge, Thomas; Mellerup, Erling

    2011-01-01

    Any given single nucleotide polymorphism (SNP) in a genome may have little or no functional impact. A biologically significant effect may possibly emerge only when a number of key SNP-related genotypes occur together in a single organism. Thus, in analysis of many SNPs in association studies of complex diseases, it may be useful to look at combinations of genotypes. Genes related to signal transmission, e.g., ion channel genes, may be of interest in this respect in the context of bipolar disorder. In the present study, we analysed 803 SNPs in 55 genes related to aspects of signal transmission and calculated all combinations of three genotypes from the 3×803 SNP genotypes for 1355 controls and 607 patients with bipolar disorder. Four clusters of patient-specific combinations were identified. Permutation tests indicated that some of these combinations might be related to bipolar disorder. The WTCCC bipolar dataset was used for replication; 469 of the 803 SNPs were present in the WTCCC dataset either directly (n = 132) or by imputation (n = 337), covering 51 of our selected genes. We found three clusters of patient-specific 3×SNP combinations in the WTCCC dataset. Different SNPs were involved in the clusters in the two datasets. The present analyses of the combinations of SNP genotypes support a role for both genetic heterogeneity and interactions in the genetic architecture of bipolar disorder. PMID:21897858

  11. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  12. Multiple Imputation of Multilevel Missing Data-Rigor versus Simplicity

    ERIC Educational Resources Information Center

    Drechsler, Jörg

    2015-01-01

    Multiple imputation is widely accepted as the method of choice to address item-nonresponse in surveys. However, research on imputation strategies for the hierarchical structures that are typically found in the data in educational contexts is still limited. While a multilevel imputation model should be preferred from a theoretical point of view if…

  13. Marker imputation in barley association studies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Association mapping requires higher marker density than linkage mapping, potentially leading to more missing marker data and to higher genotyping costs. In human genetics, methods exist to impute missing marker data and whole markers that were typed in a reference panel but not in the experimental d...

  14. Data Driven Estimation of Imputation Error—A Strategy for Imputation with a Reject Option

    PubMed Central

    Bak, Nikolaj; Hansen, Lars K.

    2016-01-01

    Missing data is a common problem in many research fields and is a challenge that always needs careful considerations. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values indiscriminately. We note that the effects of imputation can be strongly dependent on what is missing. To help make decisions about which records should be imputed, we propose to use a machine learning approach to estimate the imputation error for each case with missing data. The method is thought to be a practical approach to help users using imputation after the informed choice to impute the missing data has been made. To do this all patterns of missing values are simulated in all complete cases, enabling calculation of the “true error” in each of these new cases. The error is then estimated for each case with missing values by weighing the “true errors” by similarity. The method can also be used to test the performance of different imputation methods. A universal numerical threshold of acceptable error cannot be set since this will differ according to the data, research question, and analysis method. The effect of threshold can be estimated using the complete cases. The user can set an a priori relevant threshold for what is acceptable or use cross validation with the final analysis to choose the threshold. The choice can be presented along with argumentation for the choice rather than holding to conventions that might not be warranted in the specific dataset. PMID:27723782

  15. Next-generation genotype imputation service and methods.

    PubMed

    Das, Sayantan; Forer, Lukas; Schönherr, Sebastian; Sidore, Carlo; Locke, Adam E; Kwong, Alan; Vrieze, Scott I; Chew, Emily Y; Levy, Shawn; McGue, Matt; Schlessinger, David; Stambolian, Dwight; Loh, Po-Ru; Iacono, William G; Swaroop, Anand; Scott, Laura J; Cucca, Francesco; Kronenberg, Florian; Boehnke, Michael; Abecasis, Gonçalo R; Fuchsberger, Christian

    2016-10-01

    Genotype imputation is a key component of genetic association studies, where it increases power, facilitates meta-analysis, and aids interpretation of signals. Genotype imputation is computationally demanding and, with current tools, typically requires access to a high-performance computing cluster and to a reference panel of sequenced genomes. Here we describe improvements to imputation machinery that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools. We also describe a new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity. PMID:27571263

  16. Multiple imputation using chained equations: Issues and guidance for practice.

    PubMed

    White, Ian R; Royston, Patrick; Wood, Angela M

    2011-02-20

    Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments.

  17. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    PubMed Central

    McClure, Matthew C.; Sonstegard, Tad S.; Wiggans, George R.; Van Eenennaam, Alison L.; Weber, Kristina L.; Penedo, Cecilia T.; Berry, Donagh P.; Flynn, John; Garcia, Jose F.; Carmo, Adriana S.; Regitano, Luciana C. A.; Albuquerque, Milla; Silva, Marcos V. G. B.; Machado, Marco A.; Coffey, Mike; Moore, Kirsty; Boscher, Marie-Yvonne; Genestout, Lucie; Mazza, Raffaele; Taylor, Jeremy F.; Schnabel, Robert D.; Simpson, Barry; Marques, Elisa; McEwan, John C.; Cromie, Andrew; Coutinho, Luiz L.; Kuehn, Larry A.; Keele, John W.; Piper, Emily K.; Cook, Jim; Williams, Robert; Van Tassell, Curtis P.

    2013-01-01

    To help cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification, we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could be accurately imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes by MS genotyping animals whose imputed MS alleles fail parentage verification and then incorporating those animals into the reference dataset. PMID:24065982

  18. Comparing performance of modern genotype imputation methods in different ethnicities

    PubMed Central

    Roshyara, Nab Raj; Horn, Katrin; Kirsten, Holger; Ahnert, Peter; Scholz, Markus

    2016-01-01

    A variety of modern software packages are available for genotype imputation relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good quality genotypes were masked and re-imputed by different imputation frameworks: namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed-reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei’s GST ≤ 0.010). For small to medium datasets, frameworks using genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population based German samples, the disadvantage of pre-phasing decreases for larger sample sizes. PMID:27698363

  19. Comparing performance of modern genotype imputation methods in different ethnicities

    NASA Astrophysics Data System (ADS)

    Roshyara, Nab Raj; Horn, Katrin; Kirsten, Holger; Ahnert, Peter; Scholz, Markus

    2016-10-01

    A variety of modern software packages are available for genotype imputation relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good quality genotypes were masked and re-imputed by different imputation frameworks: namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed-reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei’s GST ≤ 0.010). For small to medium datasets, frameworks using genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population based German samples, the disadvantage of pre-phasing decreases for larger sample sizes.

  20. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

    PubMed

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-11-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates.
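    The k-nearest-neighbor idea behind LD-kNNi can be sketched in a few lines. The version below is a simplified illustration rather than the LinkImpute implementation: it picks the `l` markers in strongest LD (squared correlation) with the target marker regardless of their position, measures sample distance over those markers, and lets the `k` closest samples cast distance-weighted votes. The toy genotype matrix, the weighting scheme, and the function names are assumptions made for the example.

```python
def r_squared(x, y):
    """Squared Pearson correlation over entries observed in both columns."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def ld_knn_impute(genotypes, sample, marker, k=2, l=2):
    """Impute genotypes[sample][marker] (coded 0/1/2, None = missing)."""
    n_markers = len(genotypes[0])
    target_col = [row[marker] for row in genotypes]

    # Step 1: rank the other markers by LD with the target marker.
    # Marker order and physical position are never used.
    others = [j for j in range(n_markers) if j != marker]
    others.sort(
        key=lambda j: r_squared(target_col, [row[j] for row in genotypes]),
        reverse=True,
    )
    ld_markers = others[:l]

    # Step 2: distance from the target sample to every sample that has
    # an observed genotype at the target marker.
    neighbours = []
    for i, row in enumerate(genotypes):
        if i == sample or row[marker] is None:
            continue
        shared = [
            j for j in ld_markers
            if row[j] is not None and genotypes[sample][j] is not None
        ]
        if not shared:
            continue
        dist = sum(abs(row[j] - genotypes[sample][j]) for j in shared) / len(shared)
        neighbours.append((dist, row[marker]))
    neighbours.sort(key=lambda pair: pair[0])

    # Step 3: distance-weighted vote among the k nearest neighbours.
    votes = {}
    for dist, genotype in neighbours[:k]:
        votes[genotype] = votes.get(genotype, 0.0) + 1.0 / (dist + 1e-6)
    if not votes:
        return None
    return max(votes, key=votes.get)


# Toy data: rows are samples, columns are unordered markers.
toy = [
    [0, 0, 2, None],  # sample 0 is missing marker 3
    [0, 0, 2, 0],
    [2, 2, 0, 2],
    [2, 2, 1, 2],
    [0, 1, 2, 0],
]
print(ld_knn_impute(toy, sample=0, marker=3))
```

    Because the LD markers are chosen by correlation alone, the sketch needs no physical or genetic map, which is the property the abstract emphasizes for nonmodel organisms.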

  1. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms

    PubMed Central

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-01-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960

  2. Analysis of Variance of Multiply Imputed Data.

    PubMed

    van Ginkel, Joost R; Kroonenberg, Pieter M

    2014-01-01

    As a procedure for handling missing data, multiple imputation consists of estimating the missing data multiple times to create several complete versions of an incomplete data set. All these data sets are analyzed by the same statistical procedure, and the results are pooled for interpretation. So far, no explicit rules for pooling F-tests of (repeated-measures) analysis of variance have been defined. In this paper we outline the appropriate procedure for the results of analysis of variance for multiply imputed data sets. It involves both reformulation of the ANOVA model as a regression model using effect coding of the predictors and applying already existing combination rules for regression models. The proposed procedure is illustrated using three example data sets. The pooled results of these three examples provide plausible F- and p-values.
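    The pooling step can be made concrete for the simplest case. The sketch below applies the standard combination rules for a single scalar estimate (commonly called Rubin's rules): the pooled estimate is the mean across imputed data sets, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. It is not the authors' procedure for pooling F-tests, and the numbers are made up.

```python
from statistics import mean, variance


def pool_scalar(estimates, variances):
    """Pool one coefficient across m imputed data sets.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical regression coefficient from three imputed data sets.
pooled, total = pool_scalar([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled, total)
```

    The `(1 + 1/m)` factor accounts for using a finite number of imputations; it shrinks toward 1 as more imputed data sets are analyzed.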

  3. An imputation approach for oligonucleotide microarrays.

    PubMed

    Li, Ming; Wen, Yalu; Lu, Qing; Fu, Wenjiang J

    2013-01-01

    Oligonucleotide microarrays are commonly adopted for detecting and quantifying the abundance of molecules in biological samples. Analysis of microarray data starts with recording and interpreting hybridization signals from CEL images. However, many CEL images may be blemished by noise from various sources, observed as "bright spots", "dark clouds", and "shadowy circles", etc. It is crucial that these image defects are correctly identified and properly processed. Existing approaches mainly focus on detecting defect areas and removing affected intensities. In this article, we propose to use a mixed effect model for imputing the affected intensities. The proposed imputation procedure is a single-array-based approach which does not require any biological replicate or between-array normalization. We further examine its performance by using Affymetrix high-density SNP arrays. The results show that this imputation procedure significantly reduces genotyping error rates. We also discuss the necessary adjustments for its potential extension to other oligonucleotide microarrays, such as gene expression profiling. The R source code for the implementation of the approach is freely available upon request.

  4. When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments?

    PubMed

    Ramnarine, Shelina; Zhang, Juan; Chen, Li-Shiun; Culverhouse, Robert; Duan, Weimin; Hancock, Dana B; Hartz, Sarah M; Johnson, Eric O; Olfson, Emily; Schwantes-An, Tae-Hwi; Saccone, Nancy L

    2015-01-01

    Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants. PMID:26458263
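    The contrast between concordance rate and a kappa-style score can be shown with a small sketch. As a simplification, it builds the 3×3 agreement table from best-guess (hard-call) genotypes, whereas the abstract describes IQS as comparing imputed genotype probabilities to true genotypes. The example data are made up.

```python
def concordance_and_iqs(true_genotypes, imputed_genotypes):
    """Return (concordance rate, kappa-style IQS) for genotypes coded 0/1/2."""
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_genotypes):
        raise ValueError("inputs must be non-empty and of equal length")

    # table[i][t]: number of samples imputed as i whose true genotype is t
    table = [[0] * 3 for _ in range(3)]
    for t, i in zip(true_genotypes, imputed_genotypes):
        table[i][t] += 1

    # Observed agreement is the concordance rate.
    p_observed = sum(table[g][g] for g in range(3)) / n

    # Agreement expected by chance from the marginal genotype counts.
    p_chance = sum(
        sum(table[g]) * sum(row[g] for row in table) for g in range(3)
    ) / (n * n)

    if p_chance == 1.0:
        return p_observed, float("nan")  # score is undefined in this case
    return p_observed, (p_observed - p_chance) / (1.0 - p_chance)


# A rare variant: 2 carriers among 100 samples, all imputed as non-carriers.
truth = [0] * 98 + [1] * 2
imputed_calls = [0] * 100
concordance, iqs = concordance_and_iqs(truth, imputed_calls)
print(concordance, iqs)
```

    In this example the concordance rate is high simply because almost every sample has the common genotype, while the chance-adjusted score gives no credit for an imputation that never identifies a carrier. This is the kind of inflation for rare variants that the abstract reports.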

  5. EMINIM: An Adaptive and Memory-Efficient Algorithm for Genotype Imputation

    PubMed Central

    Kang, Hyun Min; Zaitlen, Noah A.

    2010-01-01

    Genome-wide association studies have proven to be a highly successful method for identification of genetic loci for complex phenotypes in both humans and model organisms. These large scale studies rely on the collection of hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome. Standard high-throughput genotyping technologies capture only a fraction of the total genetic variation. Recent efforts have shown that it is possible to “impute” with high accuracy the genotypes of SNPs that are not collected in the study provided that they are present in a reference data set which contains both SNPs collected in the study as well as other SNPs. We here introduce a novel HMM-based technique to solve the imputation problem that addresses several shortcomings of existing methods. First, our method is adaptive which lets it estimate population genetic parameters from the data and be applied to model organisms that have very different evolutionary histories. Compared to previous methods, our method is up to ten times more accurate on model organisms such as mouse. Second, our algorithm scales in memory usage in the number of collected markers as opposed to the number of known SNPs. This issue is very relevant due to the size of the reference data sets currently being generated. We compare our method over mouse and human data sets to existing methods, and show that each has either comparable or better performance and much lower memory usage. The method is available for download at http://genetics.cs.ucla.edu/eminim. PMID:20377463

  6. 12 CFR 367.9 - Imputation of causes.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 12 Banks and Banking 5 2012-01-01 2012-01-01 false Imputation of causes. 367.9 Section 367.9 Banks... SUSPENSION AND EXCLUSION OF CONTRACTOR AND TERMINATION OF CONTRACTS § 367.9 Imputation of causes. (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that...

  7. 12 CFR 367.9 - Imputation of causes.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 12 Banks and Banking 5 2014-01-01 2014-01-01 false Imputation of causes. 367.9 Section 367.9 Banks... SUSPENSION AND EXCLUSION OF CONTRACTOR AND TERMINATION OF CONTRACTS § 367.9 Imputation of causes. (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that...

  8. 12 CFR 367.9 - Imputation of causes.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 4 2011-01-01 2011-01-01 false Imputation of causes. 367.9 Section 367.9 Banks... SUSPENSION AND EXCLUSION OF CONTRACTOR AND TERMINATION OF CONTRACTS § 367.9 Imputation of causes. (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that...

  9. 12 CFR 367.9 - Imputation of causes.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 12 Banks and Banking 5 2013-01-01 2013-01-01 false Imputation of causes. 367.9 Section 367.9 Banks... SUSPENSION AND EXCLUSION OF CONTRACTOR AND TERMINATION OF CONTRACTS § 367.9 Imputation of causes. (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that...

  10. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  11. Short communication: Imputation of markers on the bovine X chromosome.

    PubMed

    Mao, Xiaowei; Johansson, Anna Maria; Sahana, Goutam; Guldbrandtsen, Bernt; De Koning, Dirk-Jan

    2016-09-01

    Imputation is a cost-effective approach to augment marker data for genomic selection and genome-wide association studies. However, most imputation studies have focused on autosomes. Here, we assessed the imputation of markers on the X chromosome in Holstein cattle for nongenotyped animals and animals genotyped with low-density (Illumina BovineLD, Illumina Inc., San Diego, CA) chips, using animals genotyped with medium-density (Illumina BovineSNP50) chips. A total of 26,884 Holstein individuals genotyped with medium-density chips were used in this study. Imputation was carried out using FImpute V2.2. The following parameters were examined: treating the pseudoautosomal region as autosomal or as X specific, different sizes of reference groups, different male/female proportions in the reference group, and cumulated degree of relationship between the reference group and target group. The imputation accuracy of markers on the X chromosome was improved if the pseudoautosomal region was treated as autosomal. Increasing the proportion of females in the reference group improved the imputation accuracy for the X chromosome. Imputation for nongenotyped animals in general had lower accuracy compared with animals genotyped with the low-density single nucleotide polymorphism array. In addition, higher cumulative pedigree relationships between the reference group and the target animal led to higher imputation accuracy. In the future, better marker coverage of the X chromosome should be developed to facilitate genomic studies involving the X chromosome.

  12. How to Improve Postgenomic Knowledge Discovery Using Imputation

    PubMed Central

    2009-01-01

    While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures. PMID:19223972

  13. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

    PubMed

    Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-03-15

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.

  14. Children's ability to impute inferentially based knowledge.

    PubMed

    Rai, Roshan; Mitchell, Peter

    2006-01-01

    Do young children appreciate the importance of access to premises when judging what another person knows? In Experiment 1, 5-year-olds (N=31) were sensitive to another person's access to premises when predicting that person's ability to point to a target after eliminating alternatives in a set of 3 cartoon characters. Experiment 2 replicated the finding when 5- to 6-year-olds (N=102) judged who the other person thought the target was, and whether the other person knew who the target was. Experiment 3 demonstrated that children aged 5-7 years (N=107) more successfully imputed inference by elimination than syllogistical inferential knowledge. Findings suggest that an early understanding of inference by elimination offers a route into understanding that people can sometimes gain knowledge without direct perceptual access.

  15. Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx

    PubMed Central

    Wang, Jiebiao; Gamazon, Eric R.; Pierce, Brandon L.; Stranger, Barbara E.; Im, Hae Kyung; Gibbons, Robert D.; Cox, Nancy J.; Nicolae, Dan L.; Chen, Lin S.

    2016-01-01

    Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies. PMID:27040689

  16. Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx.

    PubMed

    Wang, Jiebiao; Gamazon, Eric R; Pierce, Brandon L; Stranger, Barbara E; Im, Hae Kyung; Gibbons, Robert D; Cox, Nancy J; Nicolae, Dan L; Chen, Lin S

    2016-04-01

    Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies.

  17. Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data.

    PubMed

    Fragoso, Christopher A; Heffelfinger, Christopher; Zhao, Hongyu; Dellaporta, Stephen L

    2016-02-01

    Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.
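
    Why read coverage matters for the emission probabilities can be sketched in Python. This is not LB-Impute's actual model: the binomial read-sampling model, the `error_rate` parameter, and the function name are all assumptions made for illustration.

```python
from math import comb


def emission_probs(ref_reads, alt_reads, error_rate=0.01):
    """P(observed read counts | true genotype) under a simple binomial model.

    Assumptions (illustrative, not taken from LB-Impute): each read shows the
    alternate allele with probability error_rate for a ref/ref site,
    0.5 for a ref/alt site, and 1 - error_rate for an alt/alt site.
    """
    depth = ref_reads + alt_reads
    p_alt_by_genotype = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }
    return {
        genotype: comb(depth, alt_reads) * p_alt**alt_reads * (1.0 - p_alt) ** ref_reads
        for genotype, p_alt in p_alt_by_genotype.items()
    }


# One reference read: a heterozygote is still quite plausible (probability 0.5),
# so a hard "homozygous reference" call at this depth would be overconfident.
one_read = emission_probs(ref_reads=1, alt_reads=0)

# Eight reference reads: the heterozygote emission shrinks to 0.5 ** 8.
eight_reads = emission_probs(ref_reads=8, alt_reads=0)

# No reads at all: every genotype is equally compatible with the (absent) data.
no_reads = emission_probs(ref_reads=0, alt_reads=0)
print(one_read, eight_reads, no_reads)
```

    Feeding depth-dependent probabilities like these into a hidden Markov model, rather than fixed per-genotype emissions, is one way to keep a single stray read at a heterozygous site from being treated as a confident homozygous call.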

  18. Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models

    PubMed Central

    Chen, Hua Yun; Xie, Hui; Qian, Yi

    2010-01-01

    Multiple imputation is a practically useful approach to handling incompletely observed data in statistical analysis. Parameter estimation and inference based on imputed full data have been made easy by Rubin's rule for result combination. However, creating proper imputation that accommodates flexible models for statistical analysis in practice can be very challenging. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed imputation framework is more flexible and robust than the imputation approach based on the normal model. It is a compatible framework in comparison to the approach based on fully conditionally specified models. The proposed algorithms for multiple imputation through the Monte Carlo Markov Chain sampling approach can be straightforwardly carried out. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data. PMID:21210771
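
    The result-combination step mentioned above (Rubin's rule) can be sketched in a few lines of Python. The function name and the example numbers are hypothetical; the formulas are the standard pooling rules, with total variance T = W + (1 + 1/m) B for m imputed data sets.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimates of one parameter, one per imputed data set.
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # W: average within-imputation variance
    between = variance(estimates)   # B: sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total


# Hypothetical results from m = 3 imputed data sets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

    The between-imputation term is what carries the extra uncertainty due to the missing data; analyzing a single imputed data set as if it were complete would report only the within-imputation part.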

  19. Imputing gene expression from optimally reduced probe sets

    PubMed Central

    Donner, Yoni; Feng, Ting; Benoist, Christophe; Koller, Daphne

    2012-01-01

    Measuring complete gene expression profiles for a large number of experiments is costly. We propose an approach in which a small subset of probes is selected based on a preliminary set of full expression profiles. In subsequent experiments, only the subset is measured, and the missing values are imputed. We develop several algorithms to simultaneously select probes and impute missing values, and demonstrate that these probe selection for imputation (PSI) algorithms can successfully reconstruct missing gene expression values in a wide variety of applications, as evaluated using multiple metrics of biological importance. We analyze the performance of PSI methods under varying conditions, provide guidelines for choosing the optimal method based on the experimental setting, and indicate how to estimate imputation accuracy. Finally, we apply our approach to a large-scale study of immune system variation. PMID:23064520

  20. Multiple imputation for time series data with Amelia package

    PubMed Central

    2016-01-01

    Time series data are common in medical research. Many laboratory variables or study endpoints may be measured repeatedly over time. Multiple imputation (MI) that does not consider the time trend of a variable may be unreliable. The article illustrates how to perform MI using the Amelia package in a clinical scenario. The Amelia package is powerful in that it allows MI for time series data. External information on the variable of interest can also be incorporated by using the prior or bound argument. Such information may be based on previously published observations, academic consensus, and personal experience. Diagnostics of the imputation model can be performed by examining the distributions of imputed and observed values, or by using the over-imputation technique. PMID:26904578

  1. missForest: Nonparametric missing value imputation using random forest

    NASA Astrophysics Data System (ADS)

    Stekhoven, Daniel J.

    2015-05-01

    missForest imputes missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation and can be run in parallel to save computation time. missForest has been used to, among other things, impute variable star colors in an All-Sky Automated Survey (ASAS) dataset of variable stars with no NOMAD match.

  2. CsSNP: A Web-Based Tool for the Detecting of Comparative Segments SNPs.

    PubMed

    Wang, Yi; Wang, Shuangshuang; Zhou, Dongjie; Yang, Shuai; Xu, Yongchao; Yang, Chao; Yang, Long

    2016-07-01

    SNPs (single nucleotide polymorphisms) are a popular tool for the study of genetic diversity, evolution, and other areas. Therefore, it is necessary to develop a convenient, useful, robust, rapid, and open-source SNP detection tool for all researchers. Because detecting SNPs requires special software and a series of steps including alignment, detection, analysis, and presentation, the study of SNPs is limited for nonprofessional users. CsSNP (Comparative segments SNP, http://biodb.sdau.edu.cn/cssnp/ ) is a freely available web tool based on the Blat, Blast, and Perl programs to detect comparative segments SNPs and to show detailed information about the SNPs. The results are filtered and presented in a statistics figure and a Gbrowse map. This platform contains the reference genomic sequences and coding sequences of 60 plant species, and also provides new opportunities for users to detect SNPs easily. CsSNP provides a convenient tool for nonprofessional users to find comparative segments SNPs in their own sequences, gives users the information and analysis of the SNPs, and displays these data in a dynamic map. It provides a new method to detect SNPs and may accelerate related studies. PMID:27347883

  4. Association Analysis of BMD-associated SNPs with Knee Osteoarthritis

    PubMed Central

    Yerges-Armstrong, LM; Yau, MS; Liu, Y; Krishnan, S; Renner, JB; Eaton, CB; Kwoh, CK; Nevitt, MC; Duggan, DJ; Mitchell, BD; Jordan, JM; Hochberg, MC; Jackson, RD

    2014-01-01

    Osteoarthritis (OA) risk is widely recognized to be heritable but few loci have been identified. Observational studies have identified higher systemic bone mineral density (BMD) to be associated with an increased risk of radiographic knee osteoarthritis. With this in mind, we sought to evaluate whether well-established genetic loci for variance in BMD are associated with risk for radiographic OA in the Osteoarthritis Initiative (OAI) and the Johnston County Osteoarthritis (JoCo) Project. Cases had at least one knee with definite radiographic OA, defined as the presence of definite osteophytes with or without joint space narrowing (KL grade ≥ 2), and controls had no definite radiographic OA in either knee (KL grade ≤ 1 bilaterally). There were 2014 and 658 Caucasian cases, respectively, in the OAI and JoCo Studies, and 953 and 823 controls. Single nucleotide polymorphisms (SNPs) were identified for association analysis from the literature. Genotyping was carried out on the Illumina 2.5M and 1M arrays in GeCKO and JoCo, respectively, and imputation was done. Association analyses were carried out separately in each cohort with adjustments for age, BMI, and sex, and then parameter estimates were combined across the two cohorts by meta-analysis. We identified 4 SNPs significantly associated with prevalent radiographic knee OA. The strongest signal (p=0.0009, OR=1.22, 95% CI[1.08–1.37]) maps to 12q3 which contains a gene coding for SP7. Additional loci map to 7p14.1 (TXNDC3), 11q13.2 (LRP5) and 11p14.1 (LIN7C). For all four loci the allele associated with higher BMD was associated with higher odds of OA. A BMD risk allele score was not significantly associated with OA risk. This meta-analysis demonstrates that several GWAS-identified BMD SNPs are nominally associated with prevalent radiographic knee OA and further supports the hypothesis that BMD, or its determinants, may be a risk factor contributing to OA development. PMID:24339167
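
    Combining per-cohort parameter estimates can be sketched as an inverse-variance weighted average of log odds ratios. The abstract does not say which meta-analysis model was used, so the fixed-effect model here is an assumption, and the function name and input numbers are hypothetical rather than taken from the study.

```python
from math import exp, log, sqrt


def fixed_effect_meta(log_odds_ratios, std_errors):
    """Inverse-variance fixed-effect meta-analysis of per-cohort log odds ratios.

    Returns (pooled log odds ratio, standard error of the pooled estimate).
    """
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-cohort results for one SNP (not values from the study).
cohort_log_ors = [log(1.25), log(1.15)]
cohort_ses = [0.08, 0.12]

pooled_log_or, pooled_se = fixed_effect_meta(cohort_log_ors, cohort_ses)
pooled_or = exp(pooled_log_or)
ci_low = exp(pooled_log_or - 1.96 * pooled_se)
ci_high = exp(pooled_log_or + 1.96 * pooled_se)
print(pooled_or, ci_low, ci_high)
```

    The cohort with the smaller standard error receives the larger weight, so the pooled odds ratio falls between the two cohort estimates and closer to the more precise one.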

  5. A SPATIOTEMPORAL APPROACH FOR HIGH RESOLUTION TRAFFIC FLOW IMPUTATION

    SciTech Connect

    Han, Lee; Chin, Shih-Miao; Hwang, Ho-Ling

    2016-01-01

    Along with the rapid development of Intelligent Transportation Systems (ITS), traffic data collection technologies have been evolving dramatically. The emergence of innovative data collection technologies such as Remote Traffic Microwave Sensor (RTMS), Bluetooth sensor, GPS-based Floating Car method, and automated license plate recognition (ALPR) (1) has created an explosion of traffic data, which brings transportation engineering into the new era of Big Data. However, despite the advance of technologies, the missing data issue is still inevitable and has posed great challenges for research such as traffic forecasting, real-time incident detection and management, dynamic route guidance, and massive evacuation optimization, because the degree of success of these endeavors depends on the timely availability of relatively complete and reasonably accurate traffic data. A thorough literature review suggests that most, if not all, current imputation models focus largely on the temporal nature of the traffic data; they fail to consider that traffic stream characteristics at a certain location are closely related to those at neighboring locations, and so do not use these correlations for data imputation. To this end, this paper presents a Kriging-based spatiotemporal data imputation approach that is able to fully utilize the spatiotemporal information underlying traffic data. Imputation performance of the proposed approach was tested using simulated scenarios and achieved stable imputation accuracy. Moreover, the proposed Kriging imputation model is more flexible compared to current models.

  6. WIMP: web server tool for missing data imputation.

    PubMed

    Urda, D; Subirats, J L; García-Laencina, P J; Franco, L; Sancho-Gómez, J L; Jerez, J M

    2012-12-01

    The imputation of unknown or missing data is a crucial task in the analysis of biomedical datasets. There are several situations where it is necessary to classify or identify instances given incomplete vectors, and the existence of missing values can greatly degrade the performance of the algorithms used for classification/recognition. The task of learning accurately from incomplete data raises a number of issues, some of which have not been completely solved in machine learning applications. In this sense, effective missing value estimation methods are required. Different methods for missing data imputation exist, but most of the time the selection of the appropriate technique involves testing several methods, comparing them, and choosing the right one. Furthermore, applying these methods is, in most cases, not straightforward, as they involve several technical details; in particular, when dealing with microarray datasets, the application of the methods requires huge computational resources. As far as we know, there is no public software application that can provide the computing capabilities required for carrying out the task of data imputation. This paper presents a new public tool for missing data imputation that is attached to a computer cluster in order to execute computationally demanding tasks. The software WIMP (Web IMPutation) is a publicly available web site where registered users can create, execute, analyze, and store their simulations related to missing data imputation.

  7. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression

    PubMed Central

    Cowper-Sal·lari, Richard; Zhang, Xiaoyang; Wright, Jason B.; Bailey, Swneke D.; Cole, Michael D.; Eeckhoute, Jerome; Moore, Jason H.; Lupien, Mathieu

    2012-01-01

    Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) associated with human traits and diseases. But because the vast majority of these SNPs are located in the noncoding regions of the genome their risk promoting mechanisms are elusive. Employing a new methodology combining cistromics, epigenomics and genotype imputation we annotate the noncoding regions of the genome in breast cancer cells and systematically identify the functional nature of SNPs associated with breast cancer risk. Our results demonstrate that breast cancer risk-associated SNPs are enriched in the cistromes of FOXA1 and ESR1 and the epigenome of H3K4me1 in a cancer and cell-type-specific manner. Furthermore, the majority of these risk-associated SNPs modulate the affinity of chromatin for FOXA1 at distal regulatory elements, which results in allele-specific gene expression, exemplified by the effect of the rs4784227 SNP on the TOX3 gene found within the 16q12.1 risk locus. PMID:23001124

  8. MaCH-Admix: Genotype Imputation for Admixed Populations

    PubMed Central

    Liu, Eric Yi; Li, Mingyao; Wang, Wei; Li, Yun

    2012-01-01

    Imputation in admixed populations is an important problem but challenging due to the complex linkage disequilibrium (LD) pattern. The emergence of large reference panels such as that from the 1,000 Genomes Project enables more accurate imputation in general, and in particular for admixed populations and for uncommon variants. To efficiently benefit from these large reference panels, one key issue to consider in modern genotype imputation framework is the selection of effective reference panels. In this work, we consider a number of methods for effective reference panel construction inside a hidden Markov model and specific to each target individual. These methods fall into two categories: identity-by-state (IBS) based and ancestry-weighted approach. We evaluated the performance on individuals from recently admixed populations. Our target samples include 8,421 African Americans and 3,587 Hispanic Americans from the Women’s Health Initiative, which allow assessment of imputation quality for uncommon variants. Our experiments include both large and small reference panels; large, medium, and small target samples; and in genome regions of varying levels of LD. We also include BEAGLE and IMPUTE2 for comparison. Experiment results with large reference panel suggest that our novel piecewise IBS method yields consistently higher imputation quality than other methods/software. The advantage is particularly noteworthy among uncommon variants where we observe up to 5.1% information gain with the difference being highly significant (Wilcoxon signed rank test P-value < 0.0001). Our work is the first that considers various sensible approaches for imputation in admixed populations and presents a comprehensive comparison. PMID:23074066
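
    The idea of choosing reference haplotypes per target individual by identity-by-state (IBS) can be sketched as follows. This is a whole-region simplification with hypothetical names and toy data; it does not reproduce the piecewise IBS method or any MaCH-Admix code.

```python
def ibs_score(target_genotypes, ref_haplotype):
    """Count typed markers where the haplotype allele is compatible with the target.

    target_genotypes: 0/1/2 copies of the alternate allele at each typed marker.
    ref_haplotype: 0/1 allele carried by the reference haplotype at those markers.
    A heterozygous genotype (1) is compatible with either allele.
    """
    score = 0
    for genotype, allele in zip(target_genotypes, ref_haplotype):
        if genotype == 1 or genotype == 2 * allele:
            score += 1
    return score


def select_reference(target_genotypes, ref_panel, k):
    """Return the names of the k reference haplotypes with the highest IBS score."""
    ranked = sorted(
        ref_panel,
        key=lambda name: ibs_score(target_genotypes, ref_panel[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical target individual and three-haplotype reference panel.
target = [0, 1, 2, 0, 2]
panel = {
    "h1": [0, 0, 1, 0, 1],
    "h2": [1, 1, 0, 1, 0],
    "h3": [0, 1, 1, 1, 1],
}
selected = select_reference(target, panel, k=2)
print(selected)
```

    Restricting the hidden Markov model to the selected haplotypes is what lets a large panel be used efficiently; a piecewise variant would repeat the selection within each genomic segment rather than once per region.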

  9. A second generation human haplotype map of over 3.1 million SNPs

    PubMed Central

    2009-01-01

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122
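
    The "maximum r2" coverage statistic quoted above can be made concrete with a short Python sketch: for one untyped SNP, compute the squared correlation with every typed SNP and keep the largest. Phased 0/1 haplotypes are assumed as input, and the function names and toy data are hypothetical.

```python
def ld_r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs from phased haplotypes.

    hap_a, hap_b: lists of 0/1 alleles, one entry per haplotype, in the same order.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return float("nan")  # undefined when either SNP is monomorphic
    return d * d / denom


def max_r_squared(untyped_hap, typed_haps):
    """Largest r^2 between one untyped SNP and any typed SNP."""
    return max(ld_r_squared(untyped_hap, hap) for hap in typed_haps.values())


# Hypothetical haplotypes: the untyped SNP is perfectly tagged by t1
# and statistically independent of t2.
typed = {"t1": [0, 0, 1, 1], "t2": [0, 1, 0, 1]}
untyped = [0, 0, 1, 1]
best_r2 = max_r_squared(untyped, typed)
print(best_r2)
```

    Averaging this maximum over all untyped common SNPs gives a single coverage number of the kind reported for each population and genotyping product.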

  10. A second generation human haplotype map of over 3.1 million SNPs.

    PubMed

    Frazer, Kelly A; Ballinger, Dennis G; Cox, David R; Hinds, David A; Stuve, Laura L; Gibbs, Richard A; Belmont, John W; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M; Pasternak, Shiran; Wheeler, David A; Willis, Thomas D; Yu, Fuli; Yang, Huanming; Zeng, Changqing; Gao, Yang; Hu, Haoran; Hu, Weitao; Li, Chaohua; Lin, Wei; Liu, Siqi; Pan, Hao; Tang, Xiaoli; Wang, Jian; Wang, Wei; Yu, Jun; Zhang, Bo; Zhang, Qingrun; Zhao, Hongbin; Zhao, Hui; Zhou, Jun; Gabriel, Stacey B; Barry, Rachel; Blumenstiel, Brendan; Camargo, Amy; Defelice, Matthew; Faggart, Maura; Goyette, Mary; Gupta, Supriya; Moore, Jamie; Nguyen, Huy; Onofrio, Robert C; Parkin, Melissa; Roy, Jessica; Stahl, Erich; Winchester, Ellen; Ziaugra, Liuda; Altshuler, David; Shen, Yan; Yao, Zhijian; Huang, Wei; Chu, Xun; He, Yungang; Jin, Li; Liu, Yangfan; Shen, Yayun; Sun, Weiwei; Wang, Haifeng; Wang, Yi; Wang, Ying; Xiong, Xiaoyan; Xu, Liang; Waye, Mary M Y; Tsui, Stephen K W; Xue, Hong; Wong, J Tze-Fei; Galver, Luana M; Fan, Jian-Bing; Gunderson, Kevin; Murray, Sarah S; Oliphant, Arnold R; Chee, Mark S; Montpetit, Alexandre; Chagnon, Fanny; Ferretti, Vincent; Leboeuf, Martin; Olivier, Jean-François; Phillips, Michael S; Roumy, Stéphanie; Sallée, Clémentine; Verner, Andrei; Hudson, Thomas J; Kwok, Pui-Yan; Cai, Dongmei; Koboldt, Daniel C; Miller, Raymond D; Pawlikowska, Ludmila; Taillon-Miller, Patricia; Xiao, Ming; Tsui, Lap-Chee; Mak, William; Song, You Qiang; Tam, Paul K H; Nakamura, Yusuke; Kawaguchi, Takahisa; Kitamoto, Takuya; Morizono, Takashi; Nagashima, Atsushi; Ohnishi, Yozo; Sekine, Akihiro; Tanaka, Toshihiro; Tsunoda, Tatsuhiko; Deloukas, Panos; Bird, Christine P; Delgado, Marcos; Dermitzakis, Emmanouil T; Gwilliam, Rhian; Hunt, Sarah; Morrison, Jonathan; Powell, Don; Stranger, Barbara E; Whittaker, Pamela; Bentley, David R; Daly, Mark J; de Bakker, Paul I W; Barrett, Jeff; Chretien, Yves R; Maller, Julian; McCarroll, Steve; Patterson, Nick; Pe'er, Itsik; Price, Alkes; Purcell, Shaun; Richter, 
Daniel J; Sabeti, Pardis; Saxena, Richa; Schaffner, Stephen F; Sham, Pak C; Varilly, Patrick; Altshuler, David; Stein, Lincoln D; Krishnan, Lalitha; Smith, Albert Vernon; Tello-Ruiz, Marcela K; Thorisson, Gudmundur A; Chakravarti, Aravinda; Chen, Peter E; Cutler, David J; Kashuk, Carl S; Lin, Shin; Abecasis, Gonçalo R; Guan, Weihua; Li, Yun; Munro, Heather M; Qin, Zhaohui Steve; Thomas, Daryl J; McVean, Gilean; Auton, Adam; Bottolo, Leonardo; Cardin, Niall; Eyheramendy, Susana; Freeman, Colin; Marchini, Jonathan; Myers, Simon; Spencer, Chris; Stephens, Matthew; Donnelly, Peter; Cardon, Lon R; Clarke, Geraldine; Evans, David M; Morris, Andrew P; Weir, Bruce S; Tsunoda, Tatsuhiko; Mullikin, James C; Sherry, Stephen T; Feolo, Michael; Skol, Andrew; Zhang, Houcan; Zeng, Changqing; Zhao, Hui; Matsuda, Ichiro; Fukushima, Yoshimitsu; Macer, Darryl R; Suda, Eiko; Rotimi, Charles N; Adebamowo, Clement A; Ajayi, Ike; Aniagwu, Toyin; Marshall, Patricia A; Nkwodimmah, Chibuzor; Royal, Charmaine D M; Leppert, Mark F; Dixon, Missy; Peiffer, Andy; Qiu, Renzong; Kent, Alastair; Kato, Kazuto; Niikawa, Norio; Adewole, Isaac F; Knoppers, Bartha M; Foster, Morris W; Clayton, Ellen Wright; Watkin, Jessica; Gibbs, Richard A; Belmont, John W; Muzny, Donna; Nazareth, Lynne; Sodergren, Erica; Weinstock, George M; Wheeler, David A; Yakub, Imtaz; Gabriel, Stacey B; Onofrio, Robert C; Richter, Daniel J; Ziaugra, Liuda; Birren, Bruce W; Daly, Mark J; Altshuler, David; Wilson, Richard K; Fulton, Lucinda L; Rogers, Jane; Burton, John; Carter, Nigel P; Clee, Christopher M; Griffiths, Mark; Jones, Matthew C; McLay, Kirsten; Plumb, Robert W; Ross, Mark T; Sims, Sarah K; Willey, David L; Chen, Zhu; Han, Hua; Kang, Le; Godbout, Martin; Wallenburg, John C; L'Archevêque, Paul; Bellemare, Guy; Saeki, Koji; Wang, Hongguang; An, Daochang; Fu, Hongbo; Li, Qing; Wang, Zhen; Wang, Renwu; Holden, Arthur L; Brooks, Lisa D; McEwen, Jean E; Guyer, Mark S; Wang, Vivian Ota; Peterson, Jane L; Shi, Michael; 
Spiegel, Jack; Sung, Lawrence M; Zacharia, Lynn F; Collins, Francis S; Kennedy, Karen; Jamieson, Ruth; Stewart, John

    2007-10-18

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122

  11. [Psychoanalytic studies and examination of imputability].

    PubMed

    De Luca, A

    1975-01-01

    The author argues that recent contributions to the study of crime call for an improvement of the traditional principles used to investigate and classify it, with regard to both its psychological dynamics and its juridical implications. The author also shows that psychoanalysis can bring about a decisive evolution in the etiology as well as in the therapy of criminality. After a few preliminary considerations on the ambiguity of the idea of insanity under the ordinary nosographic principles of psychiatry, the author emphasizes the uncertainties and discrepancies that, in the legislative systems of different countries, result from a limited view of psychological phenomena. He then examines the utility of reconsidering, through a psychoanalytic methodology, the whole intra-psychic process involved in the dynamics of crime. In particular, he highlights the clarification that psychoanalysis may bring to the understanding of certain forms of aggressiveness that cannot be properly diagnosed by conventional medical and juridical methods. Finally, the author considers the application of psychoanalysis to the delicate examination of imputability. In this regard, he suggests avoiding any strict classification, even in the evaluation of the most abnormal psychic processes, and he recommends, in conformity with juridical trends that have appeared in some countries, not limiting the investigation of the capacity for understanding and will to the moment when the crime is committed, but extending it to a single evaluation of the whole personality of the criminal.

  12. References for Haplotype Imputation in the Big Data Era

    PubMed Central

    Li, Wenzhi; Xu, Wei; Li, Qiling; Ma, Li; Song, Qing

    2016-01-01

    Imputation is a powerful in silico approach to filling in missing values in big datasets. The process requires a reference panel, which is a collection of big data from which the missing information can be extracted and imputed. Haplotype imputation requires ethnicity-matched references; a mismatched reference panel will significantly reduce the quality of imputation. However, existing big datasets cover only a small number of ethnicities, and ethnicity-matched references are lacking for many ethnic populations in the world, which has hampered haplotype imputation and its downstream applications. To solve this issue, several approaches have been proposed and explored, including the mixed reference panel, the internal reference panel and the genotype-converted reference panel. This review article describes and compares these approaches. Increasing evidence shows that gene activity and function are not dictated by just one or two genetic elements; instead, they are dictated by cis-interactions of multiple elements. Cis-interactions require the interacting elements to be on the same chromosome molecule; haplotype analysis is therefore essential for investigating cis-interactions among multiple genetic variants at different loci, and appears to be especially important for studying common diseases. It will be valuable in a wide spectrum of applications, from academic research to clinical diagnosis, prevention, treatment, and the pharmaceutical industry. PMID:27274952

  13. Combining fractional polynomial model building with multiple imputation.

    PubMed

    Morris, Tim P; White, Ian R; Carpenter, James R; Stanworth, Simon J; Royston, Patrick

    2015-11-10

    Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only.

  14. Systematic assessment of imputation performance using the 1000 Genomes reference panels

    PubMed Central

    Liu, Qian; Cirulli, Elizabeth T.; Han, Yujun; Yao, Song; Liu, Song

    2015-01-01

    Genotype imputation has been widely adopted in the post-genome-wide association study (GWAS) era. Owing to its ability to accurately predict the genotypes of untyped variants, imputation greatly boosts variant density, allowing fine-mapping studies of GWAS loci and large-scale meta-analysis across different genotyping arrays. By leveraging genotype data from 90 whole-genome deeply sequenced individuals as the evaluation benchmark and the 1000 Genomes Project data as reference panels, we systematically examined four important issues related to genotype imputation practice. First, in a study of imputation accuracy, we found that IMPUTE2 and minimac have the best imputation performance among the three popular imputation programs evaluated and that using a multi-population reference panel is beneficial. Second, the optimal imputation quality cutoff for removing poorly imputed variants varies according to the software used. Third, the major contributing factors to consistently poor imputation are low variant heterozygosity, high sequence similarity to other genomic regions, high GC content, segmental duplication and being far from genotyping markers. Lastly, in an evaluation of the imputability of all known GWAS regions, we found that GWAS loci associated with hematological measurements and immune system diseases are harder to impute than those associated with other human traits. Recommendations based on the above findings may provide practical guidance for imputation in future genetic studies. PMID:25246238

  15. Systematic assessment of imputation performance using the 1000 Genomes reference panels.

    PubMed

    Liu, Qian; Cirulli, Elizabeth T; Han, Yujun; Yao, Song; Liu, Song; Zhu, Qianqian

    2015-07-01

    Genotype imputation has been widely adopted in the post-genome-wide association study (GWAS) era. Owing to its ability to accurately predict the genotypes of untyped variants, imputation greatly boosts variant density, allowing fine-mapping studies of GWAS loci and large-scale meta-analysis across different genotyping arrays. By leveraging genotype data from 90 whole-genome deeply sequenced individuals as the evaluation benchmark and the 1000 Genomes Project data as reference panels, we systematically examined four important issues related to genotype imputation practice. First, in a study of imputation accuracy, we found that IMPUTE2 and minimac have the best imputation performance among the three popular imputation programs evaluated and that using a multi-population reference panel is beneficial. Second, the optimal imputation quality cutoff for removing poorly imputed variants varies according to the software used. Third, the major contributing factors to consistently poor imputation are low variant heterozygosity, high sequence similarity to other genomic regions, high GC content, segmental duplication and being far from genotyping markers. Lastly, in an evaluation of the imputability of all known GWAS regions, we found that GWAS loci associated with hematological measurements and immune system diseases are harder to impute than those associated with other human traits. Recommendations based on the above findings may provide practical guidance for imputation in future genetic studies. PMID:25246238

  16. Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications.

    PubMed

    Calus, M P L; Bouwman, A C; Hickey, J M; Veerkamp, R F; Mulder, H A

    2014-11-01

    In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rate, which counts the number of incorrectly imputed alleles, are the commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate does. Imputation accuracy can therefore be compared more readily across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panels, the MAF of the imputed SNP and whether imputed SNPs are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNPs and SNPs on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
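
    The two measures contrasted in this abstract can be illustrated with a minimal Python sketch. The function names and the toy genotype vectors below are illustrative assumptions, not taken from the reviewed studies; genotypes are assumed to be coded as 0/1/2 copies of one allele.

```python
from math import sqrt


def imputation_accuracy(true_g, imputed_g):
    """Pearson correlation between true and imputed genotypes (0/1/2 or dosages)."""
    n = len(true_g)
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        return float("nan")  # correlation is undefined without variation
    return cov / sqrt(var_t * var_i)


def allele_error_rate(true_g, imputed_g):
    """Fraction of incorrectly imputed alleles for integer genotypes 0/1/2."""
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_g, imputed_g))
    return wrong_alleles / (2 * len(true_g))


# Hypothetical low-MAF locus: one of the two heterozygotes is imputed wrongly.
true_genotypes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
imputed_genotypes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = imputation_accuracy(true_genotypes, imputed_genotypes)
error_rate = allele_error_rate(true_genotypes, imputed_genotypes)
```

    In this toy case the error rate looks small because most genotypes at a rare-allele locus are trivially correct, while the correlation is penalized more heavily by the single mistake.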

  17. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    PubMed

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, for whom lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time, in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, and both outperformed the generalized ridge regression heteroscedastic effect model for the traits evaluated.
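
    To make the contrast between mean imputation and k-nearest-neighbor imputation concrete, the following Python sketch applies both to a small genotype matrix. It is not the study's pipeline: the function names, the distance measure (mean squared difference over jointly observed markers) and the toy data are assumptions, and every marker is assumed to be observed in at least one other sample.

```python
def mean_impute(matrix):
    """Replace None in each column (marker) with that column's observed mean."""
    out = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for row in out:
            if row[j] is None:
                row[j] = col_mean
    return out


def _distance(a, b):
    """Mean squared difference over markers observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)


def knn_impute(matrix, k=2):
    """Fill each missing value with the mean of the k most similar samples."""
    out = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples with marker j observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )[:k]
            out[i][j] = sum(v for _, v in donors) / len(donors)
    return out


# Toy data: rows are trees, columns are SNPs, None marks a missing call.
genotypes = [
    [0, 0, None],
    [0, 0, 0],
    [0, 1, 0],
    [2, 2, 2],
    [2, 2, 2],
]
by_mean = mean_impute(genotypes)
by_knn = knn_impute(genotypes, k=2)
```

    Here the mean fills the gap with the population average, whereas the nearest-neighbor fill borrows from the samples that most resemble the incomplete one.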

  18. A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

    ERIC Educational Resources Information Center

    Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

    2012-01-01

    Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…

  19. 32 CFR 776.29 - Imputed disqualification: General rule.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... disqualification: General rule. Covered USG attorneys working in the same military law office are not automatically... working in the same law office. Such representation is permissible so long as conflicts of interests are... representing co-accused at trial by court-martial. Imputed disqualification rules for non-USG attorneys...

  20. 32 CFR 776.29 - Imputed disqualification: General rule.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... disqualification: General rule. Covered USG attorneys working in the same military law office are not automatically... working in the same law office. Such representation is permissible so long as conflicts of interests are... representing co-accused at trial by court-martial. Imputed disqualification rules for non-USG attorneys...

  1. 32 CFR 776.29 - Imputed disqualification: General rule.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... disqualification: General rule. Covered USG attorneys working in the same military law office are not automatically... working in the same law office. Such representation is permissible so long as conflicts of interests are... representing co-accused at trial by court-martial. Imputed disqualification rules for non-USG attorneys...

  2. 32 CFR 776.29 - Imputed disqualification: General rule.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... disqualification: General rule. Covered USG attorneys working in the same military law office are not automatically... working in the same law office. Such representation is permissible so long as conflicts of interests are... representing co-accused at trial by court-martial. Imputed disqualification rules for non-USG attorneys...

  3. 32 CFR 776.29 - Imputed disqualification: General rule.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... disqualification: General rule. Covered USG attorneys working in the same military law office are not automatically... working in the same law office. Such representation is permissible so long as conflicts of interests are... representing co-accused at trial by court-martial. Imputed disqualification rules for non-USG attorneys...

  4. Imputation of Missing Categorical Data by Maximizing Internal Consistency.

    ERIC Educational Resources Information Center

    van Buuren, Stef; van Rijckevorsel, Jan L. A.

    1992-01-01

    A technique is presented to transform incomplete categorical data into complete data by imputing appropriate scores into missing cells. A solution of the optimization problem is suggested, and relevant psychometric theory is discussed. The average correlation should be at least 0.50 before the method becomes practical. (SLD)

  5. Strategies to choose from millions of imputed sequence variants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of sequence variants are known, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Variant selection and imputation strategies were tested using 26 984 simulated reference bulls, of which 1 000 had 30 million sequence variants, 773 had 600 000 markers...

  6. [Imputation methods for missing data in educational diagnostic evaluation].

    PubMed

    Fernández-Alonso, Rubén; Suárez-Álvarez, Javier; Muñiz, José

    2012-02-01

    In the diagnostic evaluation of educational systems, self-reports are commonly used to collect data, both cognitive and orectic. For various reasons, in these self-reports, some of the students' data are frequently missing. The main goal of this research is to compare the performance of different imputation methods for missing data in the context of the evaluation of educational systems. On an empirical database of 5,000 subjects, 72 conditions were simulated: three levels of missing data, three types of loss mechanisms, and eight methods of imputation. The levels of missing data were 5%, 10%, and 20%. The loss mechanisms were set at: Missing completely at random, moderately conditioned, and strongly conditioned. The eight imputation methods used were: listwise deletion, replacement by the mean of the scale, by the item mean, the subject mean, the corrected subject mean, multiple regression, and Expectation-Maximization (EM) algorithm, with and without auxiliary variables. The results indicate that the recovery of the data is more accurate when using an appropriate combination of different methods of recovering lost data. When a case is incomplete, the mean of the subject works very well, whereas for completely lost data, multiple imputation with the EM algorithm is recommended. The use of this combination is especially recommended when data loss is greater and its loss mechanism is more conditioned. Lastly, the results are discussed, and some future lines of research are analyzed.

  7. Fast imputation using medium or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and microarray genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updat...

  8. Accuracy of genotype imputation in Swiss cattle breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to evaluate the accuracy of imputation from Illumina Bovine3k Bead Chip (3k) and Illumina BovineLD (6k) to 54k chip information in Swiss dairy cattle breeds. Genotype data comprised of 54k SNP chip data of Original Braunvieh (OB), Brown Swiss (BS), Swiss Fleckvieh (SF...

  9. Novel and efficient tag SNPs selection algorithms.

    PubMed

    Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2014-01-01

    SNPs are the most abundant form of genetic variation among species, and association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, that represent the rest of the SNPs. Existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection is presented and applied to the HapMap YRI data. The experimental results show that the proposed algorithm achieves better performance than existing tag SNP selection algorithms; in most cases, it is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands of times faster than previously known methods. Tools and web services for haplotype block analysis, integrated with the Hadoop MapReduce framework, have also been developed using the proposed algorithm as the computation kernel. PMID:24212035
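
    The abstract does not describe the proposed algorithm itself. As a generic illustration of what tag SNP selection means, the Python sketch below uses a simple greedy, r²-threshold approach, which is a substitute technique and not the authors' method. The function names, the 0.8 threshold and the toy haplotypes are assumptions.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two SNP columns coded 0/1."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNPs carry no LD information
    return cov * cov / (var_a * var_b)


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick SNPs until every SNP is tagged by a chosen SNP with r^2 >= threshold."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # covers[j]: SNPs that SNP j would tag (always including itself).
    covers = {
        j: {j} | {
            m for m in range(n_snps)
            if r_squared(columns[j], columns[m]) >= threshold
        }
        for j in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP tagging the most untagged SNPs; ties go to the lowest index.
        best = max(range(n_snps), key=lambda j: (len(covers[j] & untagged), -j))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy haplotypes: SNP 0 and SNP 1 are perfectly correlated, SNP 2 is independent.
haplotypes = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
tags = greedy_tag_snps(haplotypes)
```

    The pairwise r² computation is quadratic in the number of SNPs, which hints at why faster selection algorithms are of interest for large haplotype blocks.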

  10. Reference-free detection of isolated SNPs

    PubMed Central

    Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre

    2015-01-01

    Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism. PMID:25404127

  11. Functional annotation of colon cancer risk SNPs

    PubMed Central

    Yao, Lijing; Tak, Yu Gyoung; Berman, Benjamin P.; Farnham, Peggy J.

    2014-01-01

    Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with increased risk for CRC. A molecular understanding of the functional consequences of this genetic variation has been complicated because each GWAS SNP is a surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Here we use genomic and epigenomic information to test the hypothesis that the GWAS SNPs and/or correlated SNPs are in elements that regulate gene expression, and identify 23 promoters and 28 enhancers. Using gene expression data from normal and tumour cells, we identify 66 putative target genes of the risk-associated enhancers (10 of which were also identified by promoter SNPs). Employing CRISPR nucleases, we delete one risk-associated enhancer and identify genes showing altered expression. We suggest that similar studies be performed to characterize all CRC risk-associated enhancers. PMID:25268989

  12. Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory

    PubMed Central

    Li, Haiquan; Lee, Younghee; Chen, James L; Rebman, Ellen; Li, Jianrong

    2012-01-01

    Objective: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning. Methods: Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait–trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits. Results: A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller ‘shortest distance’ in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10⁻¹⁶). Further, ‘cancer traits’ were similar to one another, as were ‘metabolic syndrome traits’ (Fisher's exact test p=0.001 and 3.5×10⁻⁷, respectively). Conclusion: An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches.

  13. Meta-analysis and imputation refines the association of 15q25 with smoking quantity

    PubMed Central

    Liu, Jason Z.; Tozzi, Federica; Waterworth, Dawn M.; Pillai, Sreekumar G.; Muglia, Pierandrea; Middleton, Lefkos; Berrettini, Wade; Knouff, Christopher W.; Yuan, Xin; Waeber, Gérard; Vollenweider, Peter; Preisig, Martin; Wareham, Nicholas J; Zhao, Jing Hua; Loos, Ruth J.F.; Barroso, Inês; Khaw, Kay-Tee; Grundy, Scott; Barter, Philip; Mahley, Robert; Kesaniemi, Antero; McPherson, Ruth; Vincent, John B.; Strauss, John; Kennedy, James L.; Farmer, Anne; McGuffin, Peter; Day, Richard; Matthews, Keith; Bakke, Per; Gulsvik, Amund; Lucae, Susanne; Ising, Marcus; Brueckl, Tanja; Horstmann, Sonja; Wichmann, H.-Erich; Rawal, Rajesh; Dahmen, Norbert; Lamina, Claudia; Polasek, Ozren; Zgaga, Lina; Huffman, Jennifer; Campbell, Susan; Kooner, Jaspal; Chambers, John C; Burnett, Mary Susan; Devaney, Joseph M.; Pichard, Augusto D.; Kent, Kenneth M.; Satler, Lowell; Lindsay, Joseph M.; Waksman, Ron; Epstein, Stephen; Wilson, James F.; Wild, Sarah H.; Campbell, Harry; Vitart, Veronique; Reilly, Muredach P.; Li, Mingyao; Qu, Liming; Wilensky, Robert; Matthai, William; Hakonarson, Hakon H.; Rader, Daniel J.; Franke, Andre; Wittig, Michael; Schäfer, Arne; Uda, Manuela; Terracciano, Antonio; Xiao, Xiangjun; Busonero, Fabio; Scheet, Paul; Schlessinger, David; St Clair, David; Rujescu, Dan; Abecasis, Gonçalo R.; Grabe, Hans Jörgen; Teumer, Alexander; Völzke, Henry; Petersmann, Astrid; John, Ulrich; Rudan, Igor; Hayward, Caroline; Wright, Alan F.; Kolcic, Ivana; Wright, Benjamin J; Thompson, John R; Balmforth, Anthony J.; Hall, Alistair S.; Samani, Nilesh J.; Anderson, Carl A.; Ahmad, Tariq; Mathew, Christopher G.; Parkes, Miles; Satsangi, Jack; Caulfield, Mark; Munroe, Patricia B.; Farrall, Martin; Dominiczak, Anna; Worthington, Jane; Thomson, Wendy; Eyre, Steve; Barton, Anne; Mooser, Vincent; Francks, Clyde; Marchini, Jonathan

    2013-01-01

    Smoking is a leading global cause of disease and mortality. We performed a genome-wide meta-analytic association study of smoking-related behavioral traits in a total sample of 41,150 individuals drawn from 20 disease, population, and control cohorts. Our analysis confirmed an effect on smoking quantity (SQ) at a locus on 15q25 (P=9.45e-19) that includes three genes encoding neuronal nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, CHRNB4). We used data from the 1000 Genomes project to investigate the region using imputation, which allowed analysis of virtually all common variants in the region and offered a five-fold increase in coverage over the HapMap. This increased the spectrum of potentially causal single nucleotide polymorphisms (SNPs), which included a novel SNP that showed the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3. PMID:20418889

  14. Empirical evaluations of analytical issues arising from predicting HLA alleles using multiple SNPs

    PubMed Central

    2011-01-01

    Background: Numerous immune-mediated diseases have been associated with the class I and II HLA genes located within the major histocompatibility complex (MHC) consisting of highly polymorphic alleles encoded by the HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1 loci. Genotyping for HLA alleles is complex and relatively expensive. Recent studies have demonstrated the feasibility of predicting HLA alleles, using MHC SNPs inside and outside of HLA that are typically included in SNP arrays and are commonly available in genome-wide association studies (GWAS). We have recently described a novel method that is complementary to the previous methods, for accurately predicting HLA alleles using unphased flanking SNP genotypes. In this manuscript, we address several practical issues relevant to the application of this methodology. Results: Applying this new methodology to three large independent study cohorts, we have evaluated the performance of the predictive models in ethnically diverse populations. Specifically, we have found that utilizing imputed in addition to genotyped SNPs generally yields comparable if not better performance in prediction accuracies. Our evaluation also supports the idea that predictive models trained on one population are transferable to other populations of the same ethnicity. Further, when the training set includes multi-ethnic populations, the resulting models are reliable and perform well for the same subpopulations across all HLA genes. In contrast, the predictive models built from single ethnic populations have superior performance within the same ethnic population, but are not likely to perform well in other ethnic populations. Conclusions: The empirical explorations reported here provide further evidence in support of the application of this approach for predicting HLA alleles with GWAS-derived SNP data. Utilizing all available samples, we have built "state of the art" predictive models for HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1. The HLA allele

  15. Introduction to multiple imputation for dealing with missing data.

    PubMed

    Lee, Katherine J; Simpson, Julie A

    2014-02-01

    Missing data are common in both observational and experimental studies. Multiple imputation (MI) is a two-stage approach where missing values are imputed a number of times using a statistical model based on the available data and then inference is combined across the completed datasets. This approach is becoming increasingly popular for handling missing data. In this paper, we introduce the method of MI, as well as a discussion surrounding when MI can be a useful method for handling missing data and the drawbacks of this approach. We illustrate MI when exploring the association between current asthma status and forced expiratory volume in 1 s after adjustment for potential confounders using data from a population-based longitudinal cohort study.

  16. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually fewer than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
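
    The final step described above, combining the results from each imputed dataset into overall estimates, is commonly done with Rubin's rules. The sketch below is a minimal illustration, not the authors' code; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputed datasets, one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical regression slopes and squared standard errors from m = 5 imputed datasets.
estimates = [0.70, 0.66, 0.74, 0.69, 0.71]
variances = [0.010, 0.012, 0.011, 0.009, 0.013]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, total variance = {total_var:.5f}")
```

    The between-imputation term is what separates MI from single imputation: it carries the uncertainty about the missing values into the final standard error.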

  17. Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm

    PubMed Central

    Hoffmann, Thomas J.; Zhan, Yiping; Kvale, Mark N.; Hesselson, Stephanie E.; Gollub, Jeremy; Iribarren, Carlos; Lu, Yontao; Mei, Gangwu; Purdy, Matthew M.; Quesenberry, Charles; Rowell, Sarah; Shapero, Michael H.; Smethurst, David; Somkin, Carol P.; Van den Eeden, Stephen K.; Walter, Larry; Webster, Teresa; Whitmer, Rachel A.; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil

    2012-01-01

    Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies. PMID:21903159
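
    The hybrid selection idea can be sketched as follows: run a round of greedy pairwise selection, then drop from the target set any SNP already covered by imputation, and repeat. This is an illustrative sketch only. The LD neighbor sets, the `imputation_covered` callback and all SNP names are hypothetical stand-ins for quantities the actual array design would compute from reference data.

```python
def greedy_pairwise(targets, ld_neighbors, max_picks=None):
    """Greedily pick SNPs that tag the most still-uncovered targets."""
    uncovered = set(targets)
    chosen = []
    while uncovered and (max_picks is None or len(chosen) < max_picks):
        # Candidate tagging the most uncovered targets; ties go to the first name.
        best = max(sorted(ld_neighbors), key=lambda s: len(ld_neighbors[s] & uncovered))
        gain = ld_neighbors[best] & uncovered
        if not gain:
            break  # nothing left that any candidate can tag
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


def hybrid_select(targets, ld_neighbors, imputation_covered, picks_per_round=1):
    """Alternate greedy pairwise rounds with removal of imputation-covered targets."""
    remaining = set(targets)
    array = []
    while remaining:
        picked, _ = greedy_pairwise(remaining, ld_neighbors, picks_per_round)
        if not picked:
            break
        array.extend(picked)
        for snp in picked:
            remaining -= ld_neighbors[snp]      # covered by pairwise tagging
        remaining -= imputation_covered(array)  # covered by imputation (assumed oracle)
    return array, remaining


# Hypothetical LD structure: each SNP maps to the set of SNPs it tags (itself included).
ld_neighbors = {
    "rs1": {"rs1", "rs2", "rs3"},
    "rs2": {"rs1", "rs2"},
    "rs3": {"rs1", "rs3"},
    "rs4": {"rs4", "rs5"},
    "rs5": {"rs4", "rs5"},
    "rs6": {"rs6"},
}
targets = set(ld_neighbors)


def imputation_covered(array):
    # Hypothetical rule: rs6 becomes imputable once rs1 and rs4 are both on the array.
    return {"rs6"} if {"rs1", "rs4"} <= set(array) else set()


pairwise_only, _ = greedy_pairwise(targets, ld_neighbors)
hybrid, left = hybrid_select(targets, ld_neighbors, imputation_covered)
print(pairwise_only, hybrid, left)
```

    In this toy example pairwise tagging alone needs three SNPs, while the hybrid loop needs two because the third target is left to imputation, which is the capacity saving the abstract describes.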

  18. Simple imputation methods versus direct likelihood analysis for missing item scores in multilevel educational data.

    PubMed

    Kadengye, Damazo T; Cools, Wilfried; Ceulemans, Eva; Van den Noortgate, Wim

    2012-06-01

    Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or simply at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.

  19. Genetic diversity analysis of highly incomplete SNP genotype data with imputations: an empirical assessment.

    PubMed

    Fu, Yong-Bi

    2014-05-01

    Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. Although some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data, with up to 90% of observations missing. Here we performed an empirical assessment of accuracy in genetic diversity analysis of highly incomplete single nucleotide polymorphism genotypes with imputations. Three large single-nucleotide polymorphism genotype data sets for corn, wheat, and rice were acquired, and missing data with up to 90% of missing observations were randomly generated and then imputed for missing genotypes with three map-independent imputation methods. Estimating heterozygosity and inbreeding coefficient from original, missing, and imputed data revealed variable patterns of bias from assessed levels of missingness and genotype imputation, but the estimation biases were smaller for missing data without genotype imputation. The estimates of genetic differentiation were rather robust up to 90% of missing observations but became substantially biased when missing genotypes were imputed. The estimates of topology accuracy for four representative samples of interested groups generally were reduced with increased levels of missing genotypes. Probabilistic principal component analysis based imputation performed better in terms of topology accuracy than those analyses of missing data without genotype imputation. These findings are not only significant for understanding the reliability of the genetic diversity analysis with respect to large missing data and genotype imputation but also are instructive for performing a proper genetic diversity analysis of highly incomplete GBS or other genotype data. PMID:24626289
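
    For orientation, the heterozygosity and inbreeding estimates mentioned above can be computed at a single locus directly from the called genotypes, skipping missing ones. The sketch below uses textbook definitions (observed heterozygosity, expected heterozygosity 2p(1-p), and F = 1 - Ho/He); the paper's exact estimators are not given in the abstract, so treat the function and the data as assumptions.

```python
def locus_stats(genotypes):
    """Return (observed het, expected het, F) for one biallelic locus.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # nothing to estimate from
    n = len(called)
    p = sum(called) / (2 * n)                     # alternate allele frequency
    h_obs = sum(1 for g in called if g == 1) / n  # share of heterozygous calls
    h_exp = 2 * p * (1 - p)                       # Hardy-Weinberg expectation
    f = 1 - h_obs / h_exp if h_exp > 0 else None  # undefined at monomorphic loci
    return h_obs, h_exp, f


# Hypothetical calls for ten individuals, two of them missing.
genotypes = [0, 1, 2, None, 1, 0, None, 2, 1, 0]
h_obs, h_exp, f = locus_stats(genotypes)
print(h_obs, h_exp, f)
```

    Because the estimate uses only called genotypes, missingness reduces the sample size at the locus, whereas imputing first would add calls whose errors feed straight into Ho and p.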

  20. Replication and Characterization of Association between ABO SNPs and Red Blood Cell Traits by Meta-Analysis in Europeans.

    PubMed

    McLachlan, Stela; Giambartolomei, Claudia; White, Jon; Charoen, Pimphen; Wong, Andrew; Finan, Chris; Engmann, Jorgen; Shah, Tina; Hersch, Micha; Podmore, Clara; Cavadino, Alana; Jefferis, Barbara J; Dale, Caroline E; Hypponen, Elina; Morris, Richard W; Casas, Juan P; Kumari, Meena; Ben-Shlomo, Yoav; Gaunt, Tom R; Drenos, Fotios; Langenberg, Claudia; Kuh, Diana; Kivimaki, Mika; Rueedi, Rico; Waeber, Gerard; Hingorani, Aroon D; Price, Jacqueline F; Walker, Ann P

    2016-01-01

    Red blood cell (RBC) traits are routinely measured in clinical practice as important markers of health. Deviations from the physiological ranges are usually a sign of disease, although variation between healthy individuals also occurs, at least partly due to genetic factors. Recent large scale genetic studies identified loci associated with one or more of these traits; further characterization of known loci and identification of new loci is necessary to better understand their role in health and disease and to identify potential molecular mechanisms. We performed meta-analysis of Metabochip association results for six RBC traits, namely hemoglobin concentration (Hb), hematocrit (Hct), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV) and red blood cell count (RCC), in 11 093 Europeans from seven studies of the UCL-LSHTM-Edinburgh-Bristol (UCLEB) Consortium. We identified 394 non-overlapping SNPs in five loci at genome-wide significance: 6p22.1-6p21.33 (with HFE among others), 6q23.2 (with HBS1L among others), 6q23.3 (contains no genes), 9q34.3 (only ABO gene) and 22q13.1 (with TMPRSS6 among others), replicating previous findings of association with RBC traits at these loci and extending them by imputation to 1000 Genomes. We further characterized associations between ABO SNPs and three traits: hemoglobin, hematocrit and red blood cell count, replicating them in an independent cohort. Conditional analyses indicated the independent association of each of these traits with ABO SNPs and a role for blood group O in mediating the association. The 15 most significant RBC-associated ABO SNPs were also associated with five cardiometabolic traits, with discordance in the direction of effect between groups of traits, suggesting that ABO may act through more than one mechanism to influence cardiometabolic risk. PMID:27280446

  2. Replication and Characterization of Association between ABO SNPs and Red Blood Cell Traits by Meta-Analysis in Europeans

    PubMed Central

    McLachlan, Stela; Giambartolomei, Claudia; Charoen, Pimphen; Wong, Andrew; Finan, Chris; Engmann, Jorgen; Shah, Tina; Hersch, Micha; Cavadino, Alana; Jefferis, Barbara J.; Dale, Caroline E.; Hypponen, Elina; Morris, Richard W.; Casas, Juan P.; Kumari, Meena; Ben-Shlomo, Yoav; Gaunt, Tom R.; Drenos, Fotios; Langenberg, Claudia; Kuh, Diana; Kivimaki, Mika; Rueedi, Rico; Waeber, Gerard; Hingorani, Aroon D.; Price, Jacqueline F.

    2016-01-01

    Red blood cell (RBC) traits are routinely measured in clinical practice as important markers of health. Deviations from the physiological ranges are usually a sign of disease, although variation between healthy individuals also occurs, at least partly due to genetic factors. Recent large scale genetic studies identified loci associated with one or more of these traits; further characterization of known loci and identification of new loci is necessary to better understand their role in health and disease and to identify potential molecular mechanisms. We performed meta-analysis of Metabochip association results for six RBC traits—hemoglobin concentration (Hb), hematocrit (Hct), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV) and red blood cell count (RCC)—in 11 093 Europeans from seven studies of the UCL-LSHTM-Edinburgh-Bristol (UCLEB) Consortium. We identified 394 non-overlapping SNPs in five loci at genome-wide significance: 6p22.1-6p21.33 (with HFE among others), 6q23.2 (with HBS1L among others), 6q23.3 (contains no genes), 9q34.3 (only ABO gene) and 22q13.1 (with TMPRSS6 among others), replicating previous findings of association with RBC traits at these loci and extending them by imputation to 1000 Genomes. We further characterized associations between ABO SNPs and three traits: hemoglobin, hematocrit and red blood cell count, replicating them in an independent cohort. Conditional analyses indicated the independent association of each of these traits with ABO SNPs and a role for blood group O in mediating the association. The 15 most significant RBC-associated ABO SNPs were also associated with five cardiometabolic traits, with discordance in the direction of effect between groups of traits, suggesting that ABO may act through more than one mechanism to influence cardiometabolic risk. PMID:27280446

  3. Multiple imputation approaches for the analysis of dichotomized responses in longitudinal studies with missing data.

    PubMed

    Lu, Kaifeng; Jiang, Liqiu; Tsiatis, Anastasios A

    2010-12-01

    Often a binary variable is generated by dichotomizing an underlying continuous variable measured at a specific time point according to a prespecified threshold value. In the event that the underlying continuous measurements are from a longitudinal study, one can use the repeated-measures model to impute missing data on responder status as a result of subject dropout and apply the logistic regression model on the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, in Multiple Imputation for Nonresponse in Surveys) that draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances are found to be conservative. The frequentist multiple imputation approach that fixes the parameters for the imputation model at the maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113-124) is shown to be more efficient. We propose to apply the Kenward and Roger (1997, Biometrics 53, 983-997) degrees of freedom to account for the uncertainty associated with variance-covariance parameter estimates for the repeated measures model. PMID:20337628

  4. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  5. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  6. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  7. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  8. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  9. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  10. Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

    PubMed Central

    Deng, Yi; Chang, Changgee; Ido, Moges Seyoum; Long, Qi

    2016-01-01

    Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples. PMID:26868061

  11. Data supporting the high-accuracy haplotype imputation using unphased genotype data as the references.

    PubMed

    Li, Wenzhi; Xu, Wei; He, Shaohua; Ma, Li; Song, Qing

    2016-09-01

    The data presented in this article are related to the research article entitled "High-accuracy haplotype imputation using unphased genotype data as the references", which reports that unphased genotype data can be used as a reference for haplotype imputation [1]. This article reports the generation pipelines of the different implementations, and the results of performance comparisons between the implementations (A, B, and C) and between HiFi and three major imputation software tools. Our data showed that the three implementations are similar in accuracy, with implementation B slightly but consistently more accurate than A and C. HiFi performed better on haplotype imputation accuracy, and the three other software tools performed slightly better on genotype imputation accuracy. These data may provide a strategy for choosing the optimal phasing pipeline and software for different studies. PMID:27595130

  12. Combining multiple imputation and meta-analysis with individual participant data.

    PubMed

    Burgess, Stephen; White, Ian R; Resche-Rigon, Matthieu; Wood, Angela M

    2013-11-20

    Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. PMID:23703895
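
    The recommended order of operations (Rubin's rules within each study first, inverse-variance weighting across studies second) can be sketched as below. This is a minimal fixed-effect illustration under assumed inputs; the study names and numbers are hypothetical and do not come from the Emerging Risk Factors Collaboration.

```python
from statistics import mean, variance


def rubin(estimates, variances):
    """Pool one study's imputed-dataset results with Rubin's rules."""
    m = len(estimates)
    total = mean(variances) + (1 + 1 / m) * variance(estimates)
    return mean(estimates), total


def inverse_variance_meta(study_results):
    """Fixed-effect inverse-variance weighted pooling of (estimate, variance) pairs."""
    weights = [1 / var for _, var in study_results]
    pooled = sum(w * est for w, (est, _) in zip(weights, study_results)) / sum(weights)
    return pooled, 1 / sum(weights)


# Hypothetical per-study results: estimates and squared standard errors
# from three imputed datasets in each of two studies.
studies = {
    "study_A": ([0.50, 0.54, 0.46], [0.020, 0.020, 0.020]),
    "study_B": ([0.30, 0.30, 0.30], [0.010, 0.010, 0.010]),
}

# Step 1: Rubin's rules at the study level.
per_study = [rubin(est, var) for est, var in studies.values()]
# Step 2: meta-analyze the study-level pooled results.
pooled_effect, pooled_var = inverse_variance_meta(per_study)
print(pooled_effect, pooled_var)
```

    Pooling within each study first means a study's weight already reflects its between-imputation variance, so studies with more uncertain imputations are weighted down before the meta-analysis step.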

  13. 41 CFR 105-68.630 - May the General Services Administration impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Administration impute conduct of one person to another? 105-68.630 Section 105-68.630 Public Contracts and... Services Administration impute conduct of one person to another? For purposes of actions taken under this... to another individual, if the individual to whom the improper conduct is imputed either...

  14. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data

    PubMed Central

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the

  15. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    PubMed

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the
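
    The masking procedure the abstract describes (hide a subset of observed genotypes, impute, compare with the hidden truth) can be sketched as follows. The imputer here is a deliberately simple per-marker mode fill used as a stand-in; it is neither Beagle nor glmnet, and the data, function names and masking fraction are hypothetical.

```python
import random
from collections import Counter


def mask_genotypes(matrix, fraction, rng):
    """Hide a random fraction of the observed calls; return the copy and hidden cells."""
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, g in enumerate(row) if g is not None]
    hidden = rng.sample(observed, max(1, round(fraction * len(observed))))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_mode(matrix):
    """Stand-in imputer: fill missing calls with the most common call at that marker."""
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        calls = [row[j] for row in matrix if row[j] is not None]
        fill = Counter(calls).most_common(1)[0][0] if calls else 0
        for row in filled:
            if row[j] is None:
                row[j] = fill
    return filled


def concordance(truth, imputed, hidden):
    """Share of masked calls that the imputer recovered exactly."""
    return sum(truth[i][j] == imputed[i][j] for i, j in hidden) / len(hidden)


# Hypothetical genotype matrix: rows are individuals, columns are markers.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [0, 2, 2, 0],
    [1, 1, None, 0],
    [0, 1, 2, 0],
]
rng = random.Random(0)
masked, hidden = mask_genotypes(genotypes, 0.2, rng)
imputed = impute_mode(masked)
accuracy = concordance(genotypes, imputed, hidden)
print(f"masked {len(hidden)} calls, concordance = {accuracy:.2f}")
```

    Only originally observed cells are masked, so genuinely missing calls never enter the accuracy estimate.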

  16. An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle.

    PubMed

    Sun, Chuanyu; Wu, Xiao-Lin; Weigel, Kent A; Rosa, Guilherme J M; Bauck, Stewart; Woodward, Brent W; Schnabel, Robert D; Taylor, Jeremy F; Gianola, Daniel

    2012-06-01

    Summary: Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection, because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, i.e. Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2) and Fimpute (v2), with each package serving as a basic classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers, and combines results from component methods via weighted majority 'voting' to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved the imputation of 50K genotypes covering 29 chromosomes based on a set of 5K genotypes. Beagle and Fimpute had the greatest accuracy among the six imputation packages, which ranged from 0.8677 to 0.9858. The proposed ensemble method was better than any of these packages, but the sequence of independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as first classifier, followed by one or two methods that utilized pedigree information. A salient feature of the proposed ensemble method is that it can solve imputation inconsistencies among different imputation methods, hence leading to a more reliable system for imputing genotypes relative to independent methods.
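
    The weighted majority 'voting' step can be illustrated with a short sketch. It covers only the final vote: the weights are taken as given, whereas the paper computes them sequentially in an AdaBoost-like fashion. The weights and genotype calls below are hypothetical.

```python
from collections import defaultdict


def weighted_vote(calls, weights):
    """Return the genotype with the largest total weight across packages."""
    tally = defaultdict(float)
    for package, genotype in calls.items():
        tally[genotype] += weights[package]
    # Sorting first makes ties resolve to the smallest genotype code.
    return max(sorted(tally), key=tally.get)


# Hypothetical classifier weights (they need not sum to 1) and calls for one genotype.
weights = {"Beagle": 0.45, "Fimpute": 0.35, "fastPHASE": 0.25}
calls = {"Beagle": 1, "Fimpute": 2, "fastPHASE": 2}
consensus = weighted_vote(calls, weights)
print(consensus)
```

    Here two lower-weight packages that agree outvote the single highest-weight package, which is how an ensemble of this kind resolves inconsistent calls among methods.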

  17. MULTIPLE IMPUTATION FOR SHARING PRECISE GEOGRAPHIES IN PUBLIC USE DATA.

    PubMed

    Wang, Hao; Reiter, Jerome P

    2012-03-01

    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.

  18. A multiple imputation strategy for sequential multiple assignment randomized trials.

    PubMed

    Shortreed, Susan M; Laber, Eric; Scott Stroup, T; Pineau, Joelle; Murphy, Susan A

    2014-10-30

    Sequential multiple assignment randomized trials (SMARTs) are increasingly being used to inform clinical and intervention science. In a SMART, each patient is repeatedly randomized over time. Each randomization occurs at a critical decision point in the treatment course. These critical decision points often correspond to milestones in the disease process or other changes in a patient's health status. Thus, the timing and number of randomizations may vary across patients and depend on evolving patient-specific information. This presents unique challenges when analyzing data from a SMART in the presence of missing data. This paper presents the first comprehensive discussion of missing data issues typical of SMART studies: we describe five specific challenges and propose a flexible imputation strategy to facilitate valid statistical estimation and inference using incomplete data from a SMART. To illustrate these contributions, we consider data from the Clinical Antipsychotic Trial of Intervention and Effectiveness, one of the most well-known SMARTs to date. PMID:24919867

  19. Data imputation through the identification of local anomalies.

    PubMed

    Ozkan, Huseyin; Pelvan, Ozgun Soner; Kozat, Suleyman S

    2015-10-01

    We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions.

  20. Dealing with missing values in large-scale studies: microarray data imputation and beyond.

    PubMed

    Aittokallio, Tero

    2010-03-01

    High-throughput biotechnologies, such as gene expression microarrays or mass-spectrometry-based proteomic assays, suffer from frequent missing values due to various experimental reasons. Since the missing data points can hinder downstream analyses, there exists a wide variety of ways in which to deal with missing values in large-scale data sets. Nowadays, it has become routine to estimate (or impute) the missing values prior to the actual data analysis. After nearly a decade since the publication of the first missing value imputation methods for gene expression microarray data, new imputation approaches are still being developed at an increasing rate. However, what is lagging behind is a systematic and objective evaluation of the strengths and weaknesses of the different approaches when faced with different types of data sets and experimental questions. In this review, the present strategies for missing value imputation and the measures for evaluating their performance are described. The imputation methods are first reviewed in the context of gene expression microarray data, since most of the methods have been developed for estimating gene expression levels; then, we turn to other large-scale data sets that also suffer from the problems posed by missing values, together with pointers to possible imputation approaches in these settings. Along with a description of the basic principles behind the different imputation approaches, the review tries to provide practical guidance for the users of high-throughput technologies on how to choose the imputation tool for their data and questions, and some additional research directions for the developers of imputation methodologies. PMID:19965979

  1. Multiple imputation as a means to assess Mammographic vs. Ultrasound technology in Determine Breast Cancer Recurrence

    NASA Astrophysics Data System (ADS)

    Helenowski, Irene B.; Demirtas, Hakan; Khan, Seema; Eladoumikdachi, Firas; Shidfar, Ali

    2014-03-01

    Tumor size measured by mammography and tumor size measured by ultrasound are both used in predicting recurrence in breast cancer patients. Which technology offers the better determination of diagnosis, however, is an ongoing debate among radiologists, biophysicists, and other clinicians. Further complications in assessing the performance of each technology arise from missing data. One approach to remedy this problem may involve multiple imputation. Here, we therefore examine how imputation affects our assessment of the relationship between recurrence and tumor size determined by either mammography or ultrasound technology. We specifically employ the semi-parametric approach for imputing mixed continuous and binary data as presented in Helenowski and Demirtas (2013).

  2. Potentially functional SNPs (pfSNPs) as novel genomic predictors of 5-FU response in metastatic colorectal cancer patients.

    PubMed

    Wang, Jingbo; Wang, Xu; Zhao, Mingjue; Choo, Su Pin; Ong, Sin Jen; Ong, Simon Y K; Chong, Samuel S; Teo, Yik Ying; Lee, Caroline G L

    2014-01-01

    5-Fluorouracil (5-FU) and its pro-drug Capecitabine have been widely used in treating colorectal cancer. However, not all patients will respond to the drug, hence there is a need to develop reliable early predictive biomarkers for 5-FU response. Here, we report a novel potentially functional Single Nucleotide Polymorphism (pfSNP) approach to identify SNPs that may serve as predictive biomarkers of response to 5-FU in Chinese metastatic colorectal cancer (CRC) patients. 1547 pfSNPs and one variable number tandem repeat (VNTR) in 139 genes in 5-FU drug (both PK and PD pathway) and colorectal cancer disease pathways were examined in 2 groups of CRC patients. Shrinkage of liver metastasis measured by RECIST criteria was used as the clinical end point. Four non-responder-specific pfSNPs were found to account for 37.5% of all non-responders (P<0.0003). Five additional pfSNPs were identified from a multivariate model (AUC under ROC = 0.875) that was applied for all other pfSNPs, excluding the non-responder-specific pfSNPs. These pfSNPs, which can differentiate the other non-responders from responders, mainly reside in tumor suppressor genes or genes implicated in colorectal cancer risk. Hence, a total of 9 novel SNPs with potential functional significance may be able to distinguish non-responders from responders to 5-FU. These pfSNPs may be useful biomarkers for predicting response to 5-FU.

  3. Localization of Allotetraploid Gossypium SNPs Using Physical Mapping Resources

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent efforts in Gossypium SNP development have produced thousands of putative SNPs for G. barbadense, G. mustelinum, and G. tomentosum relative to G. hirsutum. Here we report on current efforts to localize putative SNPs using physical mapping resources. Recent advances in physical mapping resour...

  4. Joint multiple imputation for longitudinal outcomes and clinical events that truncate longitudinal follow-up.

    PubMed

    Hu, Bo; Li, Liang; Greene, Tom

    2016-07-30

    Longitudinal cohort studies often collect both repeated measurements of longitudinal outcomes and times to clinical events whose occurrence precludes further longitudinal measurements. Although joint modeling of the clinical events and the longitudinal data can be used to provide valid statistical inference for target estimands in certain contexts, the application of joint models in medical literature is currently rather restricted because of the complexity of the joint models and the intensive computation involved. We propose a multiple imputation approach to jointly impute missing data of both the longitudinal and clinical event outcomes. With complete imputed datasets, analysts are then able to use simple and transparent statistical methods and standard statistical software to perform various analyses without dealing with the complications of missing data and joint modeling. We show that the proposed multiple imputation approach is flexible and easy to implement in practice. Numerical results are also provided to demonstrate its performance. Copyright © 2015 John Wiley & Sons, Ltd.
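
    Once several completed datasets have been analysed with standard software, the per-dataset results have to be combined. The abstract does not say how this is done; the sketch below assumes Rubin's rules, the usual pooling step in multiple imputation, and uses hypothetical estimates and variances.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool point estimates and their variances from m imputed datasets.

    Returns (pooled_estimate, total_variance), where
    total_variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

    The between-imputation term is what carries the extra uncertainty due to the missing data; analysing a single imputed dataset would ignore it.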

  5. On the value of Mendelian laws of segregation in families: data quality control, imputation and beyond

    PubMed Central

    Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.

    2014-01-01

    When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find next-generation sequence (NGS) data have error, and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups “Quality Control” and “Dropping WGS through families using GWAS framework” focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphisms, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelateds. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods, and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184
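
    As an illustration of how Mendelian transmission can flag genotype errors, the sketch below checks a father-mother-child trio at a single biallelic autosomal SNP. It assumes genotypes are coded as the count of alternate alleles (0, 1 or 2); the function names are hypothetical and this is not the workshop groups' actual software.

```python
def transmittable(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(
        f + m == child
        for f in transmittable(father)
        for m in transmittable(mother)
    )


# Hypothetical trios: (father, mother, child).
trios = [(0, 0, 1), (1, 1, 2), (0, 2, 1), (2, 2, 1)]
for trio in trios:
    print(trio, mendelian_consistent(*trio))
```

    An inconsistent trio points to a genotyping error, a pedigree error, or a de novo mutation; the check alone cannot tell these apart.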

  6. Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.

    PubMed

    Blue, Elizabeth M; Sun, Lei; Tintle, Nathan L; Wijsman, Ellen M

    2014-09-01

    When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.

  7. A global learning with local preservation method for microarray data imputation.

    PubMed

    Chen, Ye; Wang, Aiguo; Ding, Huitong; Que, Xia; Li, Yabo; An, Ning; Jiang, Lili

    2016-10-01

    Microarray data suffer from missing values for various reasons, including insufficient resolution, image noise, and experimental errors. Because missing values can hinder downstream analysis steps that require complete data as input, it is crucial to be able to estimate the missing values. In this study, we propose a Global Learning with Local Preservation method (GL2P) for imputation of missing values in microarray data. GL2P consists of two components: a local similarity measurement module and a global weighted imputation module. The former uses a local structure preservation scheme to exploit as much information as possible from the observable data, and the latter is responsible for estimating the missing values of a target gene by considering all of its neighbors rather than a subset of them. Furthermore, GL2P imputes the missing values in ascending order according to the rate of missing data for each target gene to fully utilize previously estimated values. To validate the proposed method, we conducted extensive experiments on six benchmarked microarray datasets. We compared GL2P with eight state-of-the-art imputation methods in terms of four performance metrics. The experimental results indicate that GL2P outperforms its competitors in terms of imputation accuracy and better preserves the structure of differentially expressed genes. In addition, GL2P is less sensitive to the number of neighbors than other local learning-based imputation methods.

  8. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  10. Family-based Association Analyses of Imputed Genotypes Reveal Genome-Wide Significant Association of Alzheimer’s disease with OSBPL6, PTPRG and PDCL3

    PubMed Central

    Herold, Christine; Hooli, Basavaraj V.; Mullin, Kristina; Liu, Tian; Roehr, Johannes T; Mattheisen, Manuel; Parrado, Antonio R.; Bertram, Lars; Lange, Christoph; Tanzi, Rudolph E.

    2015-01-01

    The genetic basis of Alzheimer's disease (AD) is complex and heterogeneous. Over 200 highly penetrant pathogenic variants in the genes APP, PSEN1 and PSEN2 cause a subset of early-onset familial Alzheimer's disease (EOFAD). On the other hand, susceptibility to late-onset forms of AD (LOAD) is indisputably associated with the ε4 allele of the gene APOE, and more recently with variants in more than two dozen additional genes identified in large-scale genome-wide association studies (GWAS) and meta-analysis reports. Taken together, however, although the heritability of AD is estimated to be as high as 80%, a large proportion of the underlying genetic factors remains to be elucidated. In this study we performed a systematic family-based genome-wide association and meta-analysis on close to 15 million imputed variants from three large collections of AD families (~3,500 subjects from 1,070 families). Using a multivariate phenotype combining affection status and onset age, meta-analysis of the association results revealed three single nucleotide polymorphisms (SNPs) that achieved genome-wide significance for association with AD risk: rs7609954 in the gene PTPRG (P-value = 3.98 × 10^-8), rs1347297 in the gene OSBPL6 (P-value = 4.53 × 10^-8), and rs1513625 near PDCL3 (P-value = 4.28 × 10^-8). In addition, rs72953347 in OSBPL6 (P-value = 6.36 × 10^-7) and two SNPs in the gene CDKAL1 showed marginally significant association with LOAD (rs10456232, P-value = 4.76 × 10^-7; rs62400067, P-value = 3.54 × 10^-7). In summary, family-based GWAS meta-analysis of imputed SNPs revealed novel genomic variants in (or near) PTPRG, OSBPL6, and PDCL3 that influence risk for AD with genome-wide significance. PMID:26830138
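
    The split between genome-wide significant and marginally significant SNPs can be reproduced with a short check. The abstract does not state its threshold; the sketch assumes the conventional cut-off of 5 × 10^-8, and the p-values are those quoted above.

```python
# Conventional genome-wide threshold; an assumption, not stated in the abstract.
GENOME_WIDE_ALPHA = 5e-8

# P-values as reported in the abstract.
p_values = {
    "rs7609954 (PTPRG)": 3.98e-8,
    "rs1347297 (OSBPL6)": 4.53e-8,
    "rs1513625 (PDCL3)": 4.28e-8,
    "rs72953347 (OSBPL6)": 6.36e-7,
    "rs10456232 (CDKAL1)": 4.76e-7,
    "rs62400067 (CDKAL1)": 3.54e-7,
}

significant = sorted(snp for snp, p in p_values.items() if p < GENOME_WIDE_ALPHA)
for snp in significant:
    print(snp)
```

    Under that assumed threshold the three SNPs in or near PTPRG, OSBPL6 and PDCL3 pass, while rs72953347 and the two CDKAL1 SNPs do not, matching the grouping in the abstract.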

  11. In Silico Analysis of FMR1 Gene Missense SNPs.

    PubMed

    Tekcan, Akin

    2016-06-01

    The FMR1 gene, a member of the fragile X-related gene family, is responsible for fragile X syndrome (FXS). Missense single-nucleotide polymorphisms (SNPs) are responsible for many complex diseases. The effect of FMR1 gene missense SNPs is unknown. The aim of this study, using in silico techniques, was to analyze all known missense mutations that can affect the functionality of the FMR1 gene, leading to mental retardation (MR) and FXS. Data on the human FMR1 gene were collected from the Ensembl database (release 81), National Center for Biotechnology Information dbSNP Short Genetic Variations database, 1000 Genomes Browser, and NHLBI Exome Sequencing Project Exome Variant Server. In silico analysis was then performed. One hundred twenty different missense SNPs of the FMR1 gene were determined. Of these, 11.66% of the FMR1 gene missense SNPs were in highly conserved domains, and 83.33% were in domains with high variety. The results of the in silico prediction analysis showed that 31.66% of the FMR1 gene SNPs were disease related and that 50% of SNPs had a pathogenic effect. The results of the structural and functional analysis revealed that although the R138Q mutation did not seem to have a damaging effect on the protein, the G266E and I304N SNPs appeared to disturb the interaction between the domains and affect the function of the protein. This is the first study to analyze all missense SNPs of the FMR1 gene. The results indicate the applicability of a bioinformatics approach to FXS and other FMR1-related diseases. I think that the analysis of FMR1 gene missense SNPs using bioinformatics methods would help diagnosis of FXS and other FMR1-related diseases. PMID:26880065

  13. Shrinkage regression-based methods for microarray missing value imputation

    PubMed Central

    2013-01-01

    Background: Missing values commonly occur in microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods on many test microarray datasets. Results: To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six test microarray datasets than the existing regression-based methods do. Conclusions: Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods. PMID:24565159
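
    The general idea (pick similar genes by Pearson correlation, fit by least squares, shrink the coefficients, then predict) can be sketched as follows. This is a simplified stand-in, not the authors' estimator: it fits one single-predictor regression per neighbour gene, multiplies each slope by a fixed factor, and averages the predictions weighted by absolute correlation. The function names, the `shrink` factor and the toy data are hypothetical, and every gene is assumed to have non-zero variance over the observed samples.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def impute(target, candidates, missing_idx, k=2, shrink=0.9):
    """Estimate target[missing_idx] from the k most correlated candidate genes."""
    observed = [i for i in range(len(target)) if i != missing_idx]
    y = [target[i] for i in observed]
    scored = []
    for gene in candidates:
        x = [gene[i] for i in observed]
        scored.append((abs(pearson(x, y)), gene, x))
    scored.sort(key=lambda item: item[0], reverse=True)

    weighted_sum = weight_total = 0.0
    for r, gene, x in scored[:k]:
        mx, my = mean(x), mean(y)
        # Ordinary least-squares slope, then shrunk toward zero by a fixed factor.
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
        slope *= shrink
        prediction = my + slope * (gene[missing_idx] - mx)
        weighted_sum += r * prediction
        weight_total += r
    return weighted_sum / weight_total


# Toy expression profiles over five samples; the target is missing sample 3.
target = [1.0, 2.0, 3.0, None, 5.0]
gene_a = [2.0, 4.0, 6.0, 8.0, 10.0]
gene_b = [1.0, 1.0, 2.0, 1.0, 1.0]

print(impute(target, [gene_a, gene_b], missing_idx=3, k=1, shrink=1.0))
print(impute(target, [gene_a, gene_b], missing_idx=3, k=1, shrink=0.9))
```

    With `shrink=1.0` the sketch reduces to plain least squares; smaller values pull each prediction toward the target gene's observed mean, which is the basic trade-off any shrinkage estimator makes.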

  14. Potentially functional SNPs (pfSNPs) as novel genomic predictors of 5-FU response in metastatic colorectal cancer patients.

    PubMed

    Wang, Jingbo; Wang, Xu; Zhao, Mingjue; Choo, Su Pin; Ong, Sin Jen; Ong, Simon Y K; Chong, Samuel S; Teo, Yik Ying; Lee, Caroline G L

    2014-01-01

    5-Fluorouracil (5-FU) and its pro-drug Capecitabine have been widely used in treating colorectal cancer. However, not all patients will respond to the drug, hence there is a need to develop reliable early predictive biomarkers for 5-FU response. Here, we report a novel potentially functional Single Nucleotide Polymorphism (pfSNP) approach to identify SNPs that may serve as predictive biomarkers of response to 5-FU in Chinese metastatic colorectal cancer (CRC) patients. 1547 pfSNPs and one variable number tandem repeat (VNTR) in 139 genes in 5-FU drug (both PK and PD pathway) and colorectal cancer disease pathways were examined in 2 groups of CRC patients. Shrinkage of liver metastasis measured by RECIST criteria was used as the clinical end point. Four non-responder-specific pfSNPs were found to account for 37.5% of all non-responders (P<0.0003). Five additional pfSNPs were identified from a multivariate model (AUC under ROC = 0.875) that was applied for all other pfSNPs, excluding the non-responder-specific pfSNPs. These pfSNPs, which can differentiate the other non-responders from responders, mainly reside in tumor suppressor genes or genes implicated in colorectal cancer risk. Hence, a total of 9 novel SNPs with potential functional significance may be able to distinguish non-responders from responders to 5-FU. These pfSNPs may be useful biomarkers for predicting response to 5-FU. PMID:25372392

  15. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed Central

    Meseck, Kristin; Jankowska, Marta M.; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purposes of the present study were to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. GPS data lapses made up over 17% of the dataset. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796

  16. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets

    SciTech Connect

    Torres-García, Wandaliz; Brown, Steven D; Johnson, Roger; Zhang, Weiwen; Runger, George; Meldrum, Deirdre

    2011-01-01

    Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values for proteins experimentally undetected, using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured only for the 45- and 90-min samples. With the goal of elucidating the relationship between temporal gene expression and protein abundance data, and then using it to impute missing proteomic values for the 45-min sample (which does not have cognate transcriptomic data) and the 90-min sample, we initially used nonlinear Smoothing Splines Curve Fitting (SSCF) to identify temporal relationships among transcriptomic data at different time points and then imputed missing gene expression measurements for the sample at 45 min. After the imputation was validated by biological constraints (i.e. operons), we used a data-driven Gradient Boosted Trees (GBT) model to uncover possible non-linear relationships between temporal transcriptomic and proteomic data, and to impute protein abundance for the proteins experimentally undetected in the 45- and 90-min samples, based on relevant predictors such as temporal mRNA gene expression data, cellular roles, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. The imputed protein values were validated using biological constraints such as operon, regulon and pathway information. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
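
    The first step, estimating expression at the unmeasured 45-min point from the 5-, 30-, 60- and 90-min measurements, can be illustrated with a small sketch. The study used smoothing splines; the code below deliberately swaps in plain piecewise-linear interpolation to stay dependency-free, so it shows the shape of the task rather than the authors' method. The expression values are made up.

```python
def interpolate(times, values, t):
    """Piecewise-linear estimate of the value at time t.

    times must be sorted in increasing order and t must lie within their range.
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the measured time range")
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Measured time points (minutes) from the abstract; expression values are hypothetical.
times = [5, 30, 60, 90]
expression = [1.0, 2.0, 4.0, 3.0]

estimate_45 = interpolate(times, expression, 45)
print(estimate_45)
```

    A smoothing spline would additionally damp measurement noise and use all four points at once, whereas this stand-in only looks at the two neighbouring time points.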

  17. Prediction of functional regulatory SNPs in monogenic and complex disease

    PubMed Central

    Zhao, Yiqiang; Clark, Wyatt T.; Mort, Matthew; Cooper, David N.; Radivojac, Predrag; Mooney, Sean D.

    2013-01-01

    Next-Generation Sequencing (NGS) technologies are yielding ever-higher volumes of human genome sequence data. Given this large amount of data, it has become both a possibility and a priority to determine how disease-causing single nucleotide polymorphisms (SNPs) detected within gene regulatory regions (rSNPs) exert their effects on gene expression. Recently, several studies have explored whether disease-causing polymorphisms have attributes that can distinguish them from those that are neutral, attaining moderate success at discriminating between functional and putatively neutral regulatory SNPs. Here, we have extended this work by assessing the utility of both SNP-based features (those associated only with the polymorphism site and the surrounding DNA) and Gene-based features (those derived from the associated gene in whose regulatory region the SNP lies) in the identification of functional regulatory polymorphisms involved in either monogenic or complex disease. Gene-based features were found to be capable of both augmenting and enhancing the utility of SNP-based features in the prediction of known regulatory mutations. Adopting this approach, we achieved an AUC of 0.903 for predicting regulatory SNPs. Finally, our tool predicted 225 new regulatory SNPs with a high degree of confidence, with 105 of the 225 falling into linkage disequilibrium blocks of reported disease-associated GWAS SNPs. PMID:21796725

  18. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy

    PubMed Central

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption that data are missing at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by data that are missing not at random (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy. PMID:26283989

  19. Imputation method for lifetime exposure assessment in air pollution epidemiologic studies

    PubMed Central

    2013-01-01

    Background Environmental epidemiology, when focused on the life course of exposure to a specific pollutant, requires historical exposure estimates that are difficult to obtain for the full time period due to gaps in the historical record, especially in earlier years. We show that these gaps can be filled by applying multiple imputation methods to a formal risk equation that incorporates lifetime exposure. We also address challenges that arise, including choice of imputation method, potential bias in regression coefficients, and uncertainty in age-at-exposure sensitivities. Methods During time periods when parameters needed in the risk equation are missing for an individual, the parameters are filled by an imputation model using group level information or interpolation. A random component is added to match the variance found in the estimates for study subjects not needing imputation. The process is repeated to obtain multiple data sets, whose regressions against health data can be combined statistically to develop confidence limits using Rubin’s rules to account for the uncertainty introduced by the imputations. To test for possible recall bias between cases and controls, which can occur when historical residence location is obtained by interview, and which can lead to misclassification of imputed exposure by disease status, we introduce an “incompleteness index,” equal to the percentage of dose imputed (PDI) for a subject. “Effective doses” can be computed using different functional dependencies of relative risk on age of exposure, allowing intercomparison of different risk models. To illustrate our approach, we quantify lifetime exposure (dose) from traffic air pollution in an established case–control study on Long Island, New York, where considerable in-migration occurred over a period of many decades. Results The major result is the described approach to imputation. The illustrative example revealed potential recall bias, suggesting that regressions
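The pooling step that the abstract above attributes to Rubin's rules can be illustrated with a minimal sketch. It assumes each imputed data set has already produced one regression coefficient and its squared standard error; the function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
import math


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                               # pooled point estimate
    w_bar = sum(variances) / m                               # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = w_bar + (1 + 1 / m) * b                              # total variance
    return q_bar, t, math.sqrt(t)


# Hypothetical coefficients from three imputed data sets.
pooled, total_var, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what widens the confidence limits to reflect the uncertainty introduced by imputing, which is the role the abstract assigns to this step.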

  20. Can we spin straw into gold? An evaluation of immigrant legal status imputation approaches.

    PubMed

    Van Hook, Jennifer; Bachmeier, James D; Coffman, Donna L; Harel, Ofer

    2015-02-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants' legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants' legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases.

  1. Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data

    PubMed Central

    Edwards, Jessie K.; Cole, Stephen R.; Troester, Melissa A.; Richardson, David B.

    2013-01-01

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods. PMID:24627573

  2. PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

    PubMed Central

    Livne, Oren E.; Han, Lide; Alkorta-Aranburu, Gorka; Wentworth-Sheilds, William; Abney, Mark; Ober, Carole; Nicolae, Dan L.

    2015-01-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. PMID:25735005

  3. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs)

    PubMed Central

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K.; Wang, Qin; Dennis, Joe; Alonso, M. Rosario; Andrulis, Irene L.; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W.; Benitez, Javier; Bogdanova, Natalia V.; Bojesen, Stig E.; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M.; Couch, Fergus J.; Cox, Angela; Cross, Simon S.; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F.; Fasching, Peter A.; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G.; Goldberg, Mark S.; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A.; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L.; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L.; Muir, Kenneth; Neuhausen, Susan L.; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J.; Schmidt, Marjanka K.; Schmutzler, Rita K.; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C.; Stram, Daniel O.; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H.; Tessier, Daniel C.; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M.; Vincent, Daniel; Winqvist, Robert; Wu, Anna H.; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D. P.; Hall, Per; Edwards, Stacey L.; Simard, Jacques; French, Juliet D.; Chenevix-Trench, Georgia; Dunning, Alison M.

    2016-01-01

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90–0.94; P = 8.96 × 10^-15)) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10^-09, r^2 = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10^-11, r^2 = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus. PMID:27600471

  4. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs).

    PubMed

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W; Benitez, Javier; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M; Couch, Fergus J; Cox, Angela; Cross, Simon S; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F; Fasching, Peter A; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G; Goldberg, Mark S; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L; Muir, Kenneth; Neuhausen, Susan L; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Stram, Daniel O; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tessier, Daniel C; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M; Vincent, Daniel; Winqvist, Robert; Wu, Anna H; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D P; Hall, Per; Edwards, Stacey L; Simard, Jacques; French, Juliet D; Chenevix-Trench, Georgia; Dunning, Alison M

    2016-01-01

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90-0.94; P = 8.96 × 10^-15)) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10^-09, r^2 = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10^-11, r^2 = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus. PMID:27600471

  5. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    PubMed

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn G A; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John R B

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7,879,351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P < 5 × 10^-8), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P = 7.68 × 10^-12), oestradiol (P = 1.63 × 10^-8) and FAI (P = 1.50 × 10^-8). A genetic variant near the FSHB gene was identified which influenced both FSH (P = 1.74 × 10^-8) and LH (P = 3.94 × 10^-9) levels. A separate locus on chromosome 7 was associated with both DHEAS (P = 1.82 × 10^-14) and progesterone (P = 6.09 × 10^-14). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation.

  6. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers.

    PubMed

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, missing data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  7. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation

    PubMed Central

    Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.

    2015-01-01

    This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ^2 statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level. PMID:26377959
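The equivalence the abstract above mentions between the inbreeding coefficient and the chi-square statistic can be checked with a small sketch for a biallelic marker. This is the plain complete-case calculation, not the exact test or the imputation procedures of the paper; the function name `hwe_inbreeding_and_chi2` and the genotype counts are hypothetical, and missing genotypes are assumed to have been dropped before counting.

```python
def hwe_inbreeding_and_chi2(n_aa, n_ab, n_bb):
    """Return (inbreeding coefficient f, chi-square statistic) for one marker.

    n_aa, n_ab, n_bb: observed genotype counts (missing genotypes excluded).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no observed genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("marker is monomorphic")
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - n_ab / expected[1]      # deficit of heterozygotes
    return f, chi2


# Hypothetical counts; for a biallelic marker chi2 equals n * f**2.
f, chi2 = hwe_inbreeding_and_chi2(30, 40, 30)
```

Because the statistic depends only on the observed counts, discarding missing genotypes changes both n and the heterozygote deficit, which is one way to see why the abstract reports that test results can shift once missingness is handled.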

  8. Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation.

    PubMed

    Jackson, Dan; White, Ian R; Seaman, Shaun; Evans, Hannah; Baisley, Kathy; Carpenter, James

    2014-11-30

    The Cox proportional hazards model is frequently used in medical statistics. The standard methods for fitting this model rely on the assumption of independent censoring. Although this is sometimes plausible, we often wish to explore how robust our inferences are as this untestable assumption is relaxed. We describe how this can be carried out in a way that makes the assumptions accessible to all those involved in a research project. Estimation proceeds via multiple imputation, where censored failure times are imputed under user-specified departures from independent censoring. A novel aspect of our method is the use of bootstrapping to generate proper imputations from the Cox model. We illustrate our approach using data from an HIV-prevention trial and discuss how it can be readily adapted and applied in other settings. PMID:25060703

  9. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    PubMed Central

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, missing data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  10. A model-based approach to selection of tag SNPs

    PubMed Central

    Nicolas, Pierre; Sun, Fengzhu; Li, Lei M

    2006-01-01

    Background Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides a machinery for the prediction of tagged SNPs and thereby to assess the performances of tag sets through their ability to predict larger SNP sets. Results Here, we compute the description code-lengths of SNP data for an array of models and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. Conclusion Our study provides strong evidence that the tag sets selected by our best method, based on Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. Besides, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping

  11. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset consisted of GPS data lapses. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796

  12. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    PubMed

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-05-31

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, to discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset consisted of GPS data lapses. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset.

  13. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which

  14. 31 CFR 19.630 - May the Department of the Treasury impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ...'s knowledge, approval or acquiescence. The organization's acceptance of the benefits derived from...: (a) Conduct imputed from an individual to an organization. We may impute the fraudulent, criminal, or... associated with an organization, to that organization when the improper conduct occurred in connection...

  15. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...'s knowledge, approval or acquiescence. The organization's acceptance of the benefits derived from...: (a) Conduct imputed from an individual to an organization. We may impute the fraudulent, criminal, or... associated with an organization, to that organization when the improper conduct occurred in connection...

  16. 31 CFR 19.630 - May the Department of the Treasury impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... impute conduct of one person to another? 19.630 Section 19.630 Money and Finance: Treasury Office of the... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  17. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... impute conduct of one person to another? 1404.630 Section 1404.630 Food and Drugs OFFICE OF NATIONAL DRUG... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  18. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  19. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  20. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  1. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  2. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  3. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  4. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  5. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  6. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  7. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  8. Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

    PubMed Central

    Lee, Minjung; Dignam, James J.; Han, Junhee

    2014-01-01

    We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107

  9. 22 CFR 208.630 - May the U.S. Agency for International Development impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... Development impute conduct of one person to another? 208.630 Section 208.630 Foreign Relations AGENCY FOR... impute conduct of one person to another? For purposes of actions taken under this rule, we may impute..., or other improper conduct of any organization to an individual, or from one individual to...

  10. 22 CFR 208.630 - May the U.S. Agency for International Development impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... Development impute conduct of one person to another? 208.630 Section 208.630 Foreign Relations AGENCY FOR... impute conduct of one person to another? For purposes of actions taken under this rule, we may impute..., or other improper conduct of any organization to an individual, or from one individual to...

  11. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough

    PubMed Central

    McMahon, George; Ring, Susan M.; Davey-Smith, George; Timpson, Nicholas J.

    2015-01-01

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case–control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  12. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough.

    PubMed

    McMahon, George; Ring, Susan M; Davey-Smith, George; Timpson, Nicholas J

    2015-10-15

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case-control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy.

  13. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    NASA Astrophysics Data System (ADS)

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31,900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix

  14. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  15. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  16. Generating Multiple Imputations for Matrix Sampling Data Analyzed with Item Response Models.

    ERIC Educational Resources Information Center

    Thomas, Neal; Gan, Nianci

    1997-01-01

    Describes and assesses missing data methods currently used to analyze data from matrix sampling designs implemented by the National Assessment of Educational Progress. Several improved methods are developed, and these models are evaluated using an EM algorithm to obtain maximum likelihood estimates followed by multiple imputation of complete data…

  17. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  18. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... to another? 919.630 Section 919.630 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  19. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... to another? 919.630 Section 919.630 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  20. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... to another? 919.630 Section 919.630 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  1. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... to another? 919.630 Section 919.630 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  2. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... to another? 919.630 Section 919.630 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... one person to another? For purposes of actions taken under this rule, we may impute conduct as follows... improper conduct of any organization to an individual, or from one individual to another individual, if...

  3. Evaluation of an Imputed Pitch Velocity Model of the Auditory Kappa Effect

    ERIC Educational Resources Information Center

    Henry, Molly J.; McAuley, J. Devin

    2009-01-01

    Three experiments evaluated an imputed pitch velocity model of the auditory kappa effect. Listeners heard 3-tone sequences and judged the timing of the middle (target) tone relative to the timing of the 1st and 3rd (bounding) tones. Experiment 1 held pitch constant but varied the time (T) interval between bounding tones (T = 728, 1,000, or 1,600…

  4. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    ERIC Educational Resources Information Center

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  5. Disk filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  6. Disk filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  7. Imputation methods for temporal radiographic texture analysis in the detection of periprosthetic osteolysis

    NASA Astrophysics Data System (ADS)

    Wilkie, Joel R.; Giger, Maryellen L.; Pesce, Lorenzo L.; Engh, Charles A., Sr.; Hopper, Robert H., Jr.; Martell, John M.

    2007-03-01

    Periprosthetic osteolysis is a disease triggered by the body's response to tiny wear fragments from total hip replacements (THR), which leads to localized bone loss and disappearance of the trabecular bone texture. We have been investigating methods of temporal radiographic texture analysis (tRTA) to help detect periprosthetic osteolysis. One method involves merging feature measurements at multiple time points using an LDA or BANN. The major drawback of this method is that several cases do not meet the inclusion criteria because of missing data, i.e., missing image data at the necessary time intervals. In this research, we investigated imputation methods to fill in missing data points using feature averaging, linear interpolation, and first and second order polynomial fitting. The database consisted of 101 THR cases with full data available from four follow-up intervals. For 200 iterations, missing data were randomly created to simulate a typical THR database, and the missing points were then filled in using the imputation methods. ROC analysis was used to assess the performance of tRTA in distinguishing between osteolysis and normal cases for the full database and each simulated database. The calculated values from the 200 iterations showed that the imputation methods produced negligible bias, and substantially decreased the variance of the AUC estimator, relative to excluding incomplete cases. The best performing imputation methods were those that heavily weighted the data points closest to the missing data. The results suggest that these imputation methods appear to be acceptable means to include cases with missing data for tRTA.
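
    The feature-averaging and linear-interpolation imputation ideas named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of None to mark a missing follow-up measurement, and the rule of copying the nearest observed value at the ends of a series are all assumptions made for illustration.

```python
def impute_mean(values):
    """Replace missing (None) entries with the average of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    avg = sum(observed) / len(observed)
    return [avg if v is None else v for v in values]


def impute_linear(times, values):
    """Fill missing (None) entries by linear interpolation in time.

    Assumes `times` is sorted in ascending order. A gap at either end of
    the series has only one observed neighbour, so the nearest observed
    value is copied there (an assumption of this sketch).
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    if not observed:
        raise ValueError("no observed values to interpolate from")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(tt, vv) for tt, vv in observed if tt < t]
        after = [(tt, vv) for tt, vv in observed if tt > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            # Straight line through the two nearest observed points.
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            filled.append((before[-1] if before else after[0])[1])
    return filled


# Hypothetical feature values at four follow-up intervals, two of them missing.
print(impute_linear([0, 1, 2, 3], [1.0, None, 3.0, None]))
print(impute_mean([1.0, None, 3.0]))
```

    Linear interpolation weights only the two observed points nearest to the gap, which matches the abstract's remark that the best-performing methods weighted nearby data points heavily.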

  8. Analysis of accelerated failure time data with dependent censoring using auxiliary variables via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Taylor, Jeremy M G; Hu, Chengcheng

    2015-08-30

    We consider the situation of estimating the marginal survival distribution from censored data subject to dependent censoring using auxiliary variables. We had previously developed a nonparametric multiple imputation approach. The method used two working proportional hazards (PH) models, one for the event times and the other for the censoring times, to define a nearest neighbor imputing risk set. This risk set was then used to impute failure times for censored observations. Here, we adapt the method to the situation where the event and censoring times follow accelerated failure time models and propose to use the Buckley-James estimator as the two working models. Besides studying the performances of the proposed method, we also compare the proposed method with two popular methods for handling dependent censoring through the use of auxiliary variables, inverse probability of censoring weighted and parametric multiple imputation methods, to shed light on the use of them. In a simulation study with time-independent auxiliary variables, we show that all approaches can reduce bias due to dependent censoring. The proposed method is robust to misspecification of either one of the two working models and their link function. This indicates that a working proportional hazards model is preferred because it is more cumbersome to fit an accelerated failure time model. In contrast, the inverse probability of censoring weighted method is not robust to misspecification of the link function of the censoring time model. The parametric imputation methods rely on the specification of the event time model. The approaches are applied to a prostate cancer dataset.

  9. A NONPARAMETRIC MULTIPLE IMPUTATION APPROACH FOR DATA WITH MISSING COVARIATE VALUES WITH APPLICATION TO COLORECTAL ADENOMA DATA

    PubMed Central

    Hsu, Chiu-Hsieh; Long, Qi; Li, Yisheng; Jacobs, Elizabeth

    2015-01-01

    A nearest neighbor-based multiple imputation approach is proposed to recover missing covariate information using the predictive covariates while estimating the association between the outcome and the covariates. To conduct the imputation, two working models are fitted to define an imputing set. This approach is expected to be robust to the underlying distribution of the data. We show in simulation and demonstrate on a colorectal data set that the proposed approach can improve efficiency and reduce bias in a situation with missing at random compared to the complete case analysis and the modified inverse probability weighted method. PMID:24697618

  10. Water Filters

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The Aquaspace H2OME Guardian Water Filter, available through Western Water International, Inc., reduces lead in water supplies. The filter is mounted on the faucet and the filter cartridge is placed in the "dead space" between sink and wall. This filter is one of several new filtration devices using the Aquaspace compound filter media, which combines company developed and NASA technology. Aquaspace filters are used in industrial, commercial, residential, and recreational environments as well as by developing nations where water is highly contaminated.

  11. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    PubMed Central

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while

  12. Biological Filters.

    ERIC Educational Resources Information Center

    Klemetson, S. L.

    1978-01-01

    Presents the 1978 literature review of wastewater treatment. The review is concerned with biological filters, and it covers: (1) trickling filters; (2) rotating biological contactors; and (3) miscellaneous reactors. A list of 14 references is also presented. (HM)

  13. From SNPs to Genes: Disease Association at the Gene Level

    PubMed Central

    Lehne, Benjamin; Lewis, Cathryn M.; Schlitt, Thomas

    2011-01-01

    Interpreting Genome-Wide Association Studies (GWAS) at a gene level is an important step towards understanding the molecular processes that lead to disease. In order to incorporate prior biological knowledge such as pathways and protein interactions in the analysis of GWAS data it is necessary to derive one measure of association for each gene. We compare three different methods to obtain gene-wide test statistics from Single Nucleotide Polymorphism (SNP) based association data: choosing the test statistic from the most significant SNP; the mean test statistics of all SNPs; and the mean of the top quartile of all test statistics. We demonstrate that the gene-wide test statistics can be controlled for the number of SNPs within each gene and show that all three methods perform considerably better than expected by chance at identifying genes with confirmed associations. By applying each method to GWAS data for Crohn's Disease and Type 1 Diabetes we identified new potential disease genes. PMID:21738570
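
    The three gene-wide summaries compared in this abstract (most significant SNP, mean of all SNPs, mean of the top quartile) can be sketched as follows. This is an illustrative sketch only: it assumes that a larger test statistic means a more significant SNP (as with a chi-square statistic), the function name is hypothetical, the rounding rule for the top quartile is an assumption, and the correction for the number of SNPs per gene described in the abstract is not included.

```python
from statistics import mean


def gene_wide_statistics(snp_stats):
    """Summarise per-SNP test statistics for one gene in three ways."""
    if not snp_stats:
        raise ValueError("need at least one SNP statistic")
    ordered = sorted(snp_stats, reverse=True)
    # Top quartile: the largest 25% of statistics, but at least one SNP.
    k = max(1, len(ordered) // 4)
    return {
        "max": ordered[0],                       # most significant SNP
        "mean": mean(ordered),                   # mean over all SNPs
        "top_quartile_mean": mean(ordered[:k]),  # mean of the top quartile
    }


# Hypothetical statistics for eight SNPs mapped to one gene.
example = gene_wide_statistics([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(example)
```

    Because the maximum tends to grow with the number of SNPs in a gene, a real analysis would need the per-gene SNP-count adjustment that the abstract mentions before comparing genes.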

  14. SNP-VISTA: An Interactive SNPs Visualization Tool

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

  15. FILTER TREATMENT

    DOEpatents

    Sutton, J.B.; Torrey, J.V.P.

    1958-08-26

    A process is described for reconditioning fused alumina filters which have become clogged by the accretion of bismuth phosphate in the filter pores. The method consists in contacting such filters with fuming sulfuric acid, and maintaining such contact for a substantial period of time.

  16. Water Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    A compact, lightweight electrolytic water filter generates silver ions in concentrations of 50 to 100 parts per billion in the water flow system. Silver ions serve as effective bactericide/deodorizers. Ray Ward requested and received from NASA a technical information package on the Shuttle filter, and used it as basis for his own initial development, a home use filter.

  17. Accounting for Dependence Induced by Weighted KNN Imputation in Paired Samples, Motivated by a Colorectal Cancer Study

    PubMed Central

    Suyundikov, Anvar; Stevens, John R.; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K.; Slattery, Martha L.

    2015-01-01

    Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control. PMID:25849489
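
    To make the source of the dependence concrete, a weighted KNN imputation step can be sketched as below. This is a generic illustration under stated assumptions, not the procedure used in the study: the function name, the Euclidean distance, and the inverse-distance weights are all hypothetical choices.

```python
import math


def weighted_knn_impute(target, donors, missing_idx, k=3):
    """Impute target[missing_idx] from the k nearest complete donor rows.

    target: list of floats with a placeholder (e.g. None) at missing_idx.
    donors: list of complete rows with the same length as target.
    Assumption: neighbors are weighted by inverse Euclidean distance.
    """
    observed = [i for i in range(len(target)) if i != missing_idx]
    scored = []
    for donor in donors:
        dist = math.sqrt(sum((target[i] - donor[i]) ** 2 for i in observed))
        scored.append((dist, donor[missing_idx]))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # If some neighbors match exactly, average them to avoid dividing by zero.
    exact = [value for dist, value in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / dist for dist, _ in nearest]
    total = sum(w * value for w, (_, value) in zip(weights, nearest))
    return total / sum(weights)


if __name__ == "__main__":
    donors = [[1.0, 0.0, 10.0], [0.0, 2.0, 40.0], [9.0, 9.0, 100.0]]
    print(weighted_knn_impute([0.0, 0.0, None], donors, missing_idx=2, k=2))
```

    Two imputed samples that share donor rows receive values built from the same observations, which is one way to see why imputed values are not independent of each other or of the donors.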

  18. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    PubMed Central

    Jattawa, Danai; Elzo, Mauricio A.; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-01-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. PMID:26949946
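
    The accuracy measure described in this abstract (correctly imputed markers divided by all imputed markers) can be sketched as follows. The function name and the 0/1/2 genotype coding in the example are assumptions made for illustration only.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of imputed markers whose genotype matches the true genotype.

    Both arguments are equal-length sequences covering only the markers
    that were masked and then imputed. Hypothetical helper.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if not true_genotypes:
        raise ValueError("no imputed markers to evaluate")
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)


if __name__ == "__main__":
    # Assumed coding: 0, 1, 2 = copies of the alternate allele.
    truth = [0, 1, 2, 1, 0]
    imputed = [0, 1, 1, 1, 0]
    print(imputation_accuracy(truth, imputed))
```

    Applying the same function to the markers of one chromosome at a time would give the within-chromosome accuracies that the abstract compares across software packages.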

  19. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population.

    PubMed

    Jattawa, Danai; Elzo, Mauricio A; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.

  20. Variable selection in the presence of missing data: resampling and imputation.

    PubMed

    Long, Qi; Johnson, Brent A

    2015-07-01

    In the presence of missing data, variable selection methods need to be tailored to missing data mechanisms and statistical approaches used for handling missing data. We focus on the mechanism of missing at random and variable selection methods that can be combined with imputation. We investigate a general resampling approach (BI-SS) that combines bootstrap imputation and stability selection, the latter of which was developed for fully observed data. The proposed approach is general and can be applied to a wide range of settings. Our extensive simulation studies demonstrate that the performance of BI-SS is the best or close to the best and is relatively insensitive to tuning parameter values in terms of variable selection, compared with several existing methods for both low-dimensional and high-dimensional problems. The proposed approach is further illustrated using two applications, one for a low-dimensional problem and the other for a high-dimensional problem.

  1. Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys

    PubMed Central

    Rendall, Michael S.; Ghosh-Dastidar, Bonnie; Weden, Margaret M.; Baker, Elizabeth H.; Nazarov, Zafar

    2013-01-01

    Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally-representative cohort surveys. PMID:24223447

  2. Comparison of SNPs and microsatellites in identifying offtypes of cacao clones from Cameroon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single Nucleotide Polymorphism (SNP) markers are increasingly being used in crop breeding programs, slowly replacing microsatellites and other markers. SNPs provide many benefits over microsatellites, including ease of analysis and unambiguous results across various platforms. We compare SNPs to m...

  3. Normalization and missing value imputation for label-free LC-MS analysis.

    PubMed

    Karpievitch, Yuliya V; Dabney, Alan R; Smith, Richard D

    2012-01-01

    Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

  4. Normalization and missing value imputation for label-free LC-MS analysis

    SciTech Connect

    Karpievitch, Yuliya; Dabney, Alan R.; Smith, Richard D.

    2012-11-05

    Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

  5. Microarray missing data imputation based on a set theoretic framework and biological knowledge.

    PubMed

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2006-01-01

    Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods, a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data are dominated by global structure. In addition, these algorithms do not take into account any biological constraint in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets, taking into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
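
    The core POCS iteration described in this abstract, repeatedly projecting onto each convex set until the point lies in their intersection, can be sketched with two toy sets. The sets used here (a box constraint and a fixed-sum hyperplane) are hypothetical stand-ins chosen because their projections are simple; they are not the gene-correlation or synchronization-loss sets designed in the paper.

```python
def project_onto_box(x, lo, hi):
    """Project x onto the convex set {x : lo <= x_i <= hi for all i}."""
    return [min(max(v, lo), hi) for v in x]


def project_onto_sum(x, total):
    """Project x onto the hyperplane {x : sum(x) == total}."""
    shift = (sum(x) - total) / len(x)
    return [v - shift for v in x]


def pocs(x0, lo, hi, total, iterations=200):
    """Alternate the two projections for a fixed number of iterations.

    When the sets intersect, the iterates approach a point that satisfies
    both constraints. The iteration count is an assumed default.
    """
    x = list(x0)
    for _ in range(iterations):
        x = project_onto_sum(x, total)
        x = project_onto_box(x, lo, hi)
    return x


if __name__ == "__main__":
    print(pocs([5.0, -3.0], lo=0.0, hi=1.0, total=1.0))
```

    Each additional piece of prior knowledge would enter as one more projection inside the loop, which is what makes the framework convenient for combining local, global, and biological constraints.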

  6. Imputing Observed Blood Pressure for Antihypertensive Treatment: Impact on Population and Genetic Analyses

    PubMed Central

    2014-01-01

    BACKGROUND Elevated blood pressure (BP), a heritable risk factor for many age-related disorders, is commonly investigated in population and genetic studies, but antihypertensive use can confound study results. Routine methods to adjust for antihypertensives may not sufficiently account for newer treatment protocols (i.e., combination or multiple drug therapy) found in contemporary cohorts. METHODS We refined an existing method to impute unmedicated BP in individuals on antihypertensives by incorporating new treatment trends. We assessed BP and antihypertensive use in male twins (n = 1,237) from the Vietnam Era Twin Study of Aging: 36% reported antihypertensive use; 52% of those treated were on multiple drugs. RESULTS Estimated heritability was 0.43 (95% confidence interval (CI) = 0.20–0.50) and 0.44 (95% CI = 0.22–0.61) for measured systolic BP (SBP) and diastolic BP (DBP), respectively. We imputed BP for antihypertensives by 3 approaches: (i) addition of a fixed value of 10/5 mm Hg to measured SBP/DBP; (ii) incremented addition of mm Hg to BP based on number of medications; and (iii) a refined approach adding mm Hg based on antihypertensive drug class and ethnicity. The imputations did not significantly affect estimated heritability of BP. However, use of our most refined imputation method and other methods resulted in significantly increased phenotypic correlations between BP and body mass index, a trait known to be correlated with BP. CONCLUSIONS This study highlights the potential usefulness of applying a representative adjustment for medication use, such as by considering drug class, ethnicity, and the combination of drugs when assessing the relationship between BP and risk factors. PMID:24532572
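
    Approach (i) from this abstract, adding a fixed 10/5 mm Hg to measured SBP/DBP for treated individuals, can be sketched as below. The function name and argument layout are assumptions; approaches (ii) and (iii) are not sketched because the abstract does not state the increments they use.

```python
def impute_unmedicated_bp(sbp, dbp, on_treatment, sbp_add=10.0, dbp_add=5.0):
    """Return (SBP, DBP) in mm Hg adjusted for antihypertensive use.

    Treated individuals get a fixed amount added to the measured values;
    untreated individuals are returned unchanged. The defaults follow the
    10/5 mm Hg stated in the abstract; the helper itself is hypothetical.
    """
    if on_treatment:
        return sbp + sbp_add, dbp + dbp_add
    return sbp, dbp


if __name__ == "__main__":
    print(impute_unmedicated_bp(128.0, 82.0, on_treatment=True))
    print(impute_unmedicated_bp(128.0, 82.0, on_treatment=False))
```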

  7. A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers.

    PubMed

    Turrado, Concepción Crespo; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés-José; Melero, Manuel G; de Cos Juez, Francisco Javier

    2016-01-01

    The storage of data is a key process in the study of electrical power networks related to the search for harmonics and the finding of a lack of balance among phases. The presence of missing data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) affects any time series study in a negative way that has to be addressed. When this occurs, missing data imputation algorithms are required. These algorithms are able to substitute the data that are missing for estimated values. This research presents a new algorithm for the missing data imputation method based on Self-Organized Maps Neural Networks and Mahalanobis distances and compares it not only with a well-known technique called Multivariate Imputation by Chained Equations (MICE) but also with an algorithm previously proposed by the authors called Adaptive Assignation Algorithm (AAA). The results obtained demonstrate how the proposed method outperforms both algorithms. PMID:27626419

  8. A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers.

    PubMed

    Turrado, Concepción Crespo; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés-José; Melero, Manuel G; de Cos Juez, Francisco Javier

    2016-01-01

    The storage of data is a key process in the study of electrical power networks related to the search for harmonics and the finding of a lack of balance among phases. The presence of missing data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) affects any time series study in a negative way that has to be addressed. When this occurs, missing data imputation algorithms are required. These algorithms are able to substitute the data that are missing for estimated values. This research presents a new algorithm for the missing data imputation method based on Self-Organized Maps Neural Networks and Mahalanobis distances and compares it not only with a well-known technique called Multivariate Imputation by Chained Equations (MICE) but also with an algorithm previously proposed by the authors called Adaptive Assignation Algorithm (AAA). The results obtained demonstrate how the proposed method outperforms both algorithms.

  9. Imputation of missing covariate values in epigenome-wide analysis of DNA methylation data

    PubMed Central

    Wu, Chong; Demerath, Ellen W.; Pankow, James S.; Bressler, Jan; Fornage, Myriam; Grove, Megan L.; Chen, Wei; Guan, Weihua

    2016-01-01

    ABSTRACT DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed. PMID:26890800

  10. Missing data imputation of solar radiation data under different atmospheric conditions.

    PubMed

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644
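
    Of the comparison methods named in this abstract, Inverse Distance Weighting (IDW) is simple enough to sketch directly: a missing reading at one station is estimated as a weighted average of the other stations' readings, with weights that fall off with distance. The function name, the power of 2, and the example values are assumptions; the study's actual configuration is not given in the abstract.

```python
def idw_estimate(distances, values, power=2.0):
    """Estimate a missing reading from neighboring stations by IDW.

    distances: distance from the target station to each neighbor (> 0).
    values: simultaneous readings at those neighbors.
    Assumption: weights are 1 / distance**power with power = 2.
    """
    if len(distances) != len(values) or not values:
        raise ValueError("need one positive distance per neighbor value")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    # Two neighbors at 1 and 2 distance units with readings 100 and 400.
    print(idw_estimate([1.0, 2.0], [100.0, 400.0]))
```

    Unlike MICE, this estimate uses only geometry and ignores how well each neighbor has historically tracked the target sensor, which may help explain the gap in error reported above.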

  11. A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers

    PubMed Central

    Turrado, Concepción Crespo; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés-José; Melero, Manuel G.; de Cos Juez, Francisco Javier

    2016-01-01

    The storage of data is a key process in the study of electrical power networks related to the search for harmonics and the finding of a lack of balance among phases. The presence of missing data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) affects any time series study in a negative way that has to be addressed. When this occurs, missing data imputation algorithms are required. These algorithms are able to substitute the data that are missing for estimated values. This research presents a new algorithm for the missing data imputation method based on Self-Organized Maps Neural Networks and Mahalanobis distances and compares it not only with a well-known technique called Multivariate Imputation by Chained Equations (MICE) but also with an algorithm previously proposed by the authors called Adaptive Assignation Algorithm (AAA). The results obtained demonstrate how the proposed method outperforms both algorithms. PMID:27626419

  12. Performance Evaluation of Missing-Value Imputation Clustering Based on a Multivariate Gaussian Mixture Model

    PubMed Central

    Wu, Chuanli; Gao, Yuexia; Hua, Tianqi; Xu, Chenwu

    2016-01-01

    Background It is challenging to deal with mixture models when missing values occur in clustering datasets. Methods and Results We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a “pseudo-complete” dataset. Parameters from different clusters and missing values are estimated according to the maximum likelihood implemented with an expectation-maximization algorithm, and multivariate individuals are clustered with Bayesian posterior probability. A simulation showed that our proposed method has a fast convergence speed and it accurately estimates missing values. Our proposed algorithm was further validated with Fisher’s Iris dataset, the Yeast Cell-cycle Gene-expression dataset, and the CIFAR-10 images dataset. The results indicate that our algorithm offers highly accurate clustering, comparable to that using a complete dataset without missing values. Furthermore, our algorithm resulted in a lower misjudgment rate than both clustering algorithms with missing data deleted and with missing-value imputation by mean replacement. Conclusion We demonstrate that our missing-value imputation clustering algorithm is feasible and superior to both of these other clustering algorithms in certain situations. PMID:27552203

  13. SNPs Selection using Gravitational Search Algorithm and Exhaustive Search for Association Mapping

    NASA Astrophysics Data System (ADS)

    Kusuma, W. A.; Hasibuan, L. S.; Istiadi, M. A.

    2016-01-01

    Single Nucleotide Polymorphisms (SNPs) are known to be associated with phenotypic variation. The study linking SNPs to a phenotype of interest is referred to as Association Mapping (AM), which is classified as a combinatorial problem. An Exhaustive Search (ES) approach can select targeted SNPs exactly because it evaluates all possible combinations of SNPs, but it is not efficient in terms of computer resources and computation time. A Heuristic Search (HS) approach is an alternative that improves on ES in those terms, but it still suffers from a high number of false positive SNPs in each combination. Gravitational Search Algorithm (GSA) is a new HS algorithm that yields better performance than other nature-inspired HS algorithms. This paper proposes a new method that combines GSA and ES to identify the most appropriate combination of SNPs linked to the phenotype of interest. Testing was conducted using a dataset without epistasis and a dataset with epistasis. Using the dataset without epistasis and 7 targeted SNPs, the proposed method identified 7 SNPs (6 True Positive (TP) SNPs and 1 False Positive (FP) SNP) with an association value of 0.83. In addition, using the dataset with epistasis and 5 targeted SNPs, the proposed method identified 3 SNPs (2 TP SNPs and 1 FP SNP) with an association value of 0.87. The results showed that the method is robust in reducing redundant SNPs and identifying main markers.

  14. In-Silico Computing of the Most Deleterious nsSNPs in HBA1 Gene

    PubMed Central

    AbdulAzeez, Sayed; Borgio, J. Francis

    2016-01-01

    Background α-Thalassemia (α-thal) is a genetic disorder caused by the substitution of a single amino acid or large deletions in the HBA1 and/or HBA2 genes. Method Modern bioinformatics tools were used as a systematic in-silico approach to predict the deleterious SNPs in the HBA1 gene and their pathogenic impact on the function and structure of the HBA1 protein. Results and Discussion A total of 389 SNPs in HBA1 were retrieved from the dbSNP database, including 201 coding non-synonymous SNPs (nsSNPs), 43 human active SNPs, 16 intronic SNPs, 11 mRNA 3′ UTR SNPs, 9 coding synonymous SNPs, 9 5′ UTR SNPs and other types. A structural homology-based method (PolyPhen) and a sequence homology-based tool (SIFT), together with SNPs&Go, PROVEAN and PANTHER, revealed that 2.4% of the nsSNPs are pathogenic. Conclusions A total of 5 nsSNPs (G60V, K17M, K17T, L92F and W15R) were predicted to be responsible for the structural and functional modifications of the HBA1 protein. It is evident from the comprehensive in-silico analysis that two nsSNPs in HBA1, G60V and W15R, are highly deleterious. These “2 pathogenic nsSNPs” can be considered for wet-lab confirmatory analysis. PMID:26824843

  15. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions

    PubMed Central

    Han, Ying; Hazelett, Dennis J.; Wiklund, Fredrik; Schumacher, Fredrick R.; Stram, Daniel O.; Berndt, Sonja I.; Wang, Zhaoming; Rand, Kristin A.; Hoover, Robert N.; Machiela, Mitchell J.; Yeager, Merideth; Burdette, Laurie; Chung, Charles C.; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C.; Key, Timothy J.; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L.; Kolb, Suzanne; Gapstur, Susan M.; Diver, W. Ryan; Stevens, Victoria L.; Strom, Sara S.; Pettaway, Curtis A.; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A.; Yeboah, Edward D.; Tettey, Yao; Biritwum, Richard B.; Adjei, Andrew A.; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P.; Isaacs, William B.; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L.; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M.; Ingles, Sue A.; Kittles, Rick A.; Murphy, Adam B.; Blot, William J.; Signorello, Lisa B.; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M. Cristina; Wu, Suh-Yuh; Hennis, Anselm J. M.; Rybicki, Benjamin A.; Neslund-Dudas, Christine; Hsing, Ann W.; Chu, Lisa; Goodman, Phyllis J.; Klein, Eric A.; Zheng, S. Lilly; Witte, John S.; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L.; Hunter, David J.; Gronberg, Henrik; Cook, Michael B.; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J.; Easton, Douglas F.; Henderson, Brian E.; Coetzee, Gerhard A.; Conti, David V.; Haiman, Christopher A.

    2015-01-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10^−4 to 5.6 × 10^−3) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10^−6) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation. PMID:26162851

  16. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions.

    PubMed

    Han, Ying; Hazelett, Dennis J; Wiklund, Fredrik; Schumacher, Fredrick R; Stram, Daniel O; Berndt, Sonja I; Wang, Zhaoming; Rand, Kristin A; Hoover, Robert N; Machiela, Mitchell J; Yeager, Merideth; Burdette, Laurie; Chung, Charles C; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C; Key, Timothy J; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L; Kolb, Suzanne; Gapstur, Susan M; Diver, W Ryan; Stevens, Victoria L; Strom, Sara S; Pettaway, Curtis A; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A; Yeboah, Edward D; Tettey, Yao; Biritwum, Richard B; Adjei, Andrew A; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P; Isaacs, William B; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M; Ingles, Sue A; Kittles, Rick A; Murphy, Adam B; Blot, William J; Signorello, Lisa B; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M Cristina; Wu, Suh-Yuh; Hennis, Anselm J M; Rybicki, Benjamin A; Neslund-Dudas, Christine; Hsing, Ann W; Chu, Lisa; Goodman, Phyllis J; Klein, Eric A; Zheng, S Lilly; Witte, John S; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L; Hunter, David J; Gronberg, Henrik; Cook, Michael B; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J; Easton, Douglas F; Henderson, Brian E; Coetzee, Gerhard A; Conti, David V; Haiman, Christopher A

    2015-10-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10(-4)-5.6 × 10(-3)) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10(-6)) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation.

  17. A Latent Model for Prioritization of SNPs for Functional Studies

    PubMed Central

    Fridley, Brooke L.; Iversen, Ed; Tsai, Ya-Yu; Jenkins, Gregory D.; Goode, Ellen L.; Sellers, Thomas A.

    2011-01-01

    One difficult question facing researchers is how to prioritize SNPs detected from genetic association studies for functional studies. Often a list of the top M SNPs is determined based solely on the p-value from an association analysis, where M is determined by financial/time constraints. For many studies of complex diseases, multiple analyses have been completed and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or a regulatory region, into the selection of markers to follow up. In this manuscript, we propose a Bayesian latent variable model (BLVM) for incorporating “features” about a SNP to estimate a latent “quality score”, with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to the application of the BLVM to the ovarian GWAS, we applied the BLVM to simulated data that mimic the setting involving the prioritization of markers across multiple GWAS for related diseases/traits. The SNP ranked top by the BLVM for the ovarian GWAS ranked 2nd and 7th based on p-values from analyses of all invasive and invasive serous cases. The top SNP based on serous case analysis p-value (which ranked 197th for invasive case analysis) was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the application of the BLVM allows for the systematic integration of multiple SNP “features” for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking. PMID:21687685

  18. Molecular Beacon CNT-based Detection of SNPs

    NASA Astrophysics Data System (ADS)

    Egorova, V. P.; Krylova, H. V.; Lipnevich, I. V.; Veligura, A. A.; Shulitsky, B. G.; Y Fedotenkova, L.

    2015-11-01

    A fluorescence quenching effect due to few-walled carbon nanotubes chemically modified by carboxyl groups has been utilized to discriminate single nucleotide polymorphisms (SNPs). It was shown that the complex of these nanotubes with single-stranded primer DNA forms through stacking interactions between the hexagons of the nanotubes and the aromatic rings of nucleotide bases, as well as through hydrogen bonds between acceptor amine groups of nucleotide bases and donor carboxyl groups of the nanotubes. It has been demonstrated that these complexes may be used to make highly effective SNP-detecting DNA biosensors that operate as molecular beacons.

  19. SNPs Array Karyotyping in Non-Hodgkin Lymphoma

    PubMed Central

    Etebari, Maryam; Navari, Mohsen; Piccaluga, Pier Paolo

    2015-01-01

    The traditional methods for detection of chromosomal aberrations, which included cytogenetic or gene candidate solutions, suffered from low sensitivity or the need for previous knowledge of the target regions of the genome. With the advent of single nucleotide polymorphism (SNP) arrays, genome screening at global level in order to find chromosomal aberrations like copy number variants, DNA amplifications, deletions, and also loss of heterozygosity became feasible. In this review, we present an update of the knowledge, gained by SNPs arrays, of the genomic complexity of the most important subtypes of non-Hodgkin lymphomas. PMID:27600240

  1. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti

    PubMed Central

    2014-01-01

    Background Genetic markers are widely used to understand the biology and population dynamics of disease vectors, but often markers are limited in the resolution they provide. In particular, the delineation of population structure, fine scale movement and patterns of relatedness are often obscured unless numerous markers are available. To address this issue in the major arbovirus vector, the yellow fever mosquito (Aedes aegypti), we used double digest Restriction-site Associated DNA (ddRAD) sequencing for the discovery of genome-wide single nucleotide polymorphisms (SNPs). We aimed to characterize the new SNP set and to test the resolution against previously described microsatellite markers in detecting broad and fine-scale genetic patterns in Ae. aegypti. Results We developed bioinformatics tools that support the customization of restriction enzyme-based protocols for SNP discovery. We showed that our approach for RAD library construction achieves unbiased genome representation that reflects true evolutionary processes. In Ae. aegypti samples from three continents we identified more than 18,000 putative SNPs. They were widely distributed across the three Ae. aegypti chromosomes, with 47.9% found in intergenic regions and 17.8% in exons of over 2,300 genes. Patterns of their imputed effects in ORFs and UTRs were consistent with those found in a recent transcriptome study. We demonstrated that individual mosquitoes from Indonesia, Australia, Vietnam and Brazil can be assigned with a very high degree of confidence to their region of origin using a large SNP panel. We also showed that familial relatedness of samples from a 0.4 km² area could be confidently established with a subset of SNPs. Conclusions Using a cost-effective customized RAD sequencing approach supported by our bioinformatics tools, we characterized over 18,000 SNPs in field samples of the dengue fever mosquito Ae. aegypti. The variants were annotated and positioned onto the three Ae. aegypti chromosomes

  2. Filtering apparatus

    DOEpatents

    Haldipur, G.B.; Dilmore, W.J.

    1992-09-01

    A vertical vessel is described having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas. 18 figs.

  3. Filtering apparatus

    DOEpatents

    Haldipur, Gaurang B.; Dilmore, William J.

    1992-01-01

    A vertical vessel having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas.

  4. Transcriptome analysis of the gill of Takifugu rubripes using Illumina sequencing for discovery of SNPs.

    PubMed

    Cui, Jun; Wang, Hongdi; Liu, Shikai; Qiu, Xuemei; Jiang, Zhiqiang; Wang, Xiuli

    2014-06-01

    Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies in many species. High-throughput sequencing of RNA was developed primarily to analyze global gene expression, but it is also an efficient way to discover SNPs in expressed genes. In this study, we sequenced the transcriptome of Takifugu rubripes gill samples on the Illumina HiSeq 2000 platform to identify gene-associated SNPs. A total of 27,085,235 uniquely mapped reads were obtained from 55,061,524 raw reads. A total of 56,972 putative SNPs were discovered, located in 11,327 genes. Of these, 35,839 SNPs were transitions (Ts) and 21,074 were transversions (Tv), and 88.1% of the 56,972 SNPs were assigned to the 22 chromosomes. The average minor allele frequency (MAF) of the SNPs was 0.26. GO and KEGG pathway analyses were conducted to analyze the genes containing SNPs. Validation of selected SNPs revealed that 63.4% of SNPs (34/52) were true SNPs. RNA-Seq is a cost-effective way to discover gene-associated SNPs. In this study, a large number of SNPs were identified and these data will be useful resources for population genetic study, evolution analysis, resource assessment, genetic linkage analysis and genome-wide association studies. The results of our study can also offer some useful information as molecular markers to help select and cultivate T. rubripes.
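
    The transition/transversion split reported in the abstract above can be illustrated with a short sketch. It uses the standard definition (a transition swaps purine for purine or pyrimidine for pyrimidine; any other single-base change is a transversion). The function names are hypothetical and this is not the authors' code; only the two counts at the end come from the abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Label a biallelic single-base change as 'Ts' (transition) or 'Tv' (transversion)."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "Ts" if same_class else "Tv"


def ts_tv_ratio(snps):
    """Compute the Ts/Tv ratio for an iterable of (ref, alt) pairs."""
    counts = {"Ts": 0, "Tv": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    if counts["Tv"] == 0:
        raise ValueError("no transversions; ratio is undefined")
    return counts["Ts"] / counts["Tv"]


# Ratio implied by the counts reported in the abstract (35,839 Ts, 21,074 Tv).
reported_ratio = 35839 / 21074
```

    With the reported counts the ratio comes out at roughly 1.7.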

  5. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

    PubMed

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J Brent; Durbin, Richard; Timpson, Nicholas J; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

  6. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    PubMed

    Shara, Nawar; Yassin, Sayf A; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V; Wang, Wenyu; Lee, Elisa T; Umans, Jason G

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
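
    Two of the simpler single-imputation approaches named in the abstract above, mean of serial measures and adjacent value, can be sketched for one participant's series of exam measurements. This is an assumption-laden illustration: the abstract does not define the methods in detail, so the tie-breaking rule (prefer the earlier exam) and the function names are hypothetical.

```python
def impute_mean_of_serial(values):
    """Replace missing entries (None) with the mean of the participant's observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to base an imputation on
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_adjacent(values):
    """Replace missing entries (None) with the nearest observed value in the series.

    Assumption: when an earlier and a later exam are equally close,
    the earlier exam is used.
    """
    n = len(values)
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        for distance in range(1, n):
            left, right = i - distance, i + distance
            if left >= 0 and values[left] is not None:
                filled[i] = values[left]
                break
            if right < n and values[right] is not None:
                filled[i] = values[right]
                break
    return filled


# Hypothetical measurements at three exams; the second is missing.
series = [1.0, None, 3.0]
by_mean = impute_mean_of_serial(series)
by_adjacent = impute_adjacent(series)
```

    Both functions leave a fully missing series unchanged, since there is no observed value to borrow from.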

  7. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    PubMed

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.

  8. Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion

    PubMed Central

    Žitnik, Marinka; Zupan, Blaž

    2015-01-01

    Epistatic miniarray profile (E-MAP) is a popular large-scale genetic interaction discovery platform. E-MAPs benefit from quantitative output, which makes it possible to detect subtle interactions with greater precision. However, due to the limits of biotechnology, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational techniques for data imputation, in this way completing the interaction profiles and enabling downstream analysis algorithms that could otherwise be sensitive to missing data values. We introduce a new interaction data imputation method called network-guided matrix completion (NG-MC). The core part of NG-MC is low-rank probabilistic matrix completion that incorporates prior knowledge presented as a collection of gene networks. NG-MC assumes that interactions are transitive, such that latent gene interaction profiles inferred by NG-MC depend on the profiles of their direct neighbors in gene networks. As the NG-MC inference algorithm progresses, it propagates latent interaction profiles through each of the networks and updates gene network weights toward improved prediction. In a study with four different E-MAP data assays, considering protein–protein interaction and gene ontology similarity networks, NG-MC significantly surpassed existing alternative techniques. Inclusion of information from gene networks also allowed NG-MC to predict interactions for genes that were not included in original E-MAP assays, a task that could not be considered by current imputation approaches. PMID:25658751

  9. The advantage of imputation of missing income data to evaluate the association between income and self-reported health status (SRH) in a Mexican American cohort study.

    PubMed

    Ryder, Anthony B; Wilkinson, Anna V; McHugh, Michelle K; Saunders, Katherine; Kachroo, Sumesh; D'Amelio, Anthony; Bondy, Melissa; Etzel, Carol J

    2011-12-01

    Missing data often occur in cross-sectional surveys and longitudinal and experimental studies. The purpose of this study was to compare the prediction of self-rated health (SRH), a robust predictor of morbidity and mortality among diverse populations, before and after imputation of the missing variable "yearly household income." We reviewed data from 4,162 participants of Mexican origin recruited from July 1, 2002, through December 31, 2005, and who were enrolled in a population-based cohort study. Missing yearly income data were imputed using three different single imputation methods and one multiple imputation under a Bayesian approach. Of 4,162 participants, 3,121 were randomly assigned to a training set (to derive the yearly income imputation methods and develop the health-outcome prediction models) and 1,041 to a testing set (to compare the areas under the curve (AUC) of the receiver-operating characteristic of the resulting health-outcome prediction models). The discriminatory powers of the SRH prediction models were good (range, 69-72%) and compared to the prediction model obtained after no imputation of missing yearly income, all other imputation methods improved the prediction of SRH (P < 0.05 for all comparisons) with the AUC for the model after multiple imputation being the highest (AUC = 0.731). Furthermore, given that yearly income was imputed using multiple imputation, the odds of SRH as good or better increased by 11% for each $5,000 increment in yearly income. This study showed that although imputation of missing data for a key predictor variable can improve a risk health-outcome prediction model, further work is needed to illuminate the risk factors associated with SRH.

  10. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification

  11. The estimation and use of predictions for the assessment of model performance using large samples with multiply imputed data

    PubMed Central

    Wood, Angela M; Royston, Patrick; White, Ian R

    2015-01-01

    Multiple imputation can be used as a tool in the process of constructing prediction models in medical and epidemiological studies with missing covariate values. Such models can be used to make predictions for model performance assessment, but the task is made more complicated by the multiple imputation structure. We summarize various predictions constructed from covariates, including multiply imputed covariates, and either the set of imputation-specific prediction model coefficients or the pooled prediction model coefficients. We further describe approaches for using the predictions to assess model performance. We distinguish between ideal model performance and pragmatic model performance, where the former refers to the model's performance in an ideal clinical setting where all individuals have fully observed predictors and the latter refers to the model's performance in a real-world clinical setting where some individuals have missing predictors. The approaches are compared through an extensive simulation study based on the UK700 trial. We determine that measures of ideal model performance can be estimated within imputed datasets and subsequently pooled to give an overall measure of model performance. Alternative methods to evaluate pragmatic model performance are required and we propose constructing predictions either from a second set of covariate imputations which make no use of observed outcomes, or from a set of partial prediction models constructed for each potential observed pattern of covariate. Pragmatic model performance is generally lower than ideal model performance. We focus on model performance within the derivation data, but describe how to extend all the methods to a validation dataset. PMID:25630926
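
    The abstract above says that performance measures can be estimated within each imputed dataset and then pooled. One common way to pool per-imputation estimates is Rubin's rules; the abstract does not say which pooling rule the authors applied to each measure, so the sketch below is an assumption, and the function name and example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates:        one point estimate per imputed dataset
    within_variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance). Needs at least two imputations.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need matching lists with at least two imputations")
    pooled = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical performance estimates from three imputed datasets.
pooled_estimate, total_variance = pool_rubin([0.70, 0.72, 0.74], [0.01, 0.01, 0.01])
```

    The total variance adds the between-imputation spread to the average within-imputation variance, which is how the uncertainty due to missing covariates enters the pooled result.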

  12. The operating regimes and basic control principles of SNPS "Topaz". [Cs

    SciTech Connect

    Makarov, A.N.; Volberg, M.S.; Grayznov, G.M.; Zhabotinsky, E.E.; Serbin, V.I.

    1991-01-05

    The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include prelaunch preparation and launch into working orbit, SNPS start-up to obtain the desired electric power, the nominal regime, and SNPS shutdown. The main requirements for the SNPS in the different regimes are given, and the control algorithms that meet these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" support the conclusion that, for an SNPS of this type, the most expedient control algorithm in the start-up regime is one that provides the required thermal state of the cesium vapor supply system and excludes any possibility of discharge processes in current-conducting elements. In the nominal regime, the required electric power should be provided by maintaining the reactor current and using a fast-acting voltage regulator. A limit on the outlet coolant temperature should also be provided for.

  13. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    ERIC Educational Resources Information Center

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  14. The sensitivity to key data imputations of recent estimates of income poverty and inequality in South Africa.

    PubMed

    Ardington, Cally; Lam, David; Leibbrandt, Murray; Welch, Matthew

    2006-01-01

    Existing literature using South African censuses reports an increase in both poverty and inequality over the 1996 to 2001 period. This paper assesses the robustness of these results to a number of weaknesses in the personal income variable. We use a sequential regression multiple imputation approach to impute missing values and to explicitly assess the influence of implausible income values and different rules used to convert income that is measured in bands into point incomes. Overall our results for 1996 and 2001 confirm the major findings from the existing literature while generating more reliable confidence intervals for the key parameters of interest than are available elsewhere.
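
    The abstract above refers to different rules for converting income measured in bands into point incomes. One simple rule of that kind is the band midpoint; the sketch below illustrates it under stated assumptions. The band limits, the multiplier used for the open-ended top band, and the function name are hypothetical and are not taken from the paper.

```python
def band_to_point(lower, upper, open_top_multiplier=2.0):
    """Convert an income band to a single point income.

    Closed bands use the midpoint. An open-ended top band (upper is None)
    has no midpoint, so this sketch assumes lower * open_top_multiplier.
    """
    if upper is None:
        return lower * open_top_multiplier
    if upper < lower:
        raise ValueError("upper limit must not be below lower limit")
    return (lower + upper) / 2


# Hypothetical bands: (lower, upper); None marks the open-ended top band.
bands = [(0, 400), (400, 1000), (1000, None)]
points = [band_to_point(lo, hi) for lo, hi in bands]
```

    Changing the multiplier for the top band is one way to check how sensitive poverty and inequality estimates are to the conversion rule.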

  15. The sensitivity to key data imputations of recent estimates of income poverty and inequality in South Africa

    PubMed Central

    Ardington, Cally; Lam, David; Leibbrandt, Murray; Welch, Matthew

    2008-01-01

    Existing literature using South African censuses reports an increase in both poverty and inequality over the 1996 to 2001 period. This paper assesses the robustness of these results to a number of weaknesses in the personal income variable. We use a sequential regression multiple imputation approach to impute missing values and to explicitly assess the influence of implausible income values and different rules used to convert income that is measured in bands into point incomes. Overall our results for 1996 and 2001 confirm the major findings from the existing literature while generating more reliable confidence intervals for the key parameters of interest than are available elsewhere. PMID:18815626

  16. Multiple imputation to account for measurement error in marginal structural models

    PubMed Central

    Edwards, Jessie K.; Cole, Stephen R.; Westreich, Daniel; Crane, Heidi; Eron, Joseph J.; Mathews, W. Christopher; Moore, Richard; Boswell, Stephen L.; Lesko, Catherine R.; Mugavero, Michael J.

    2015-01-01

    Background Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and non-differential measurement error in a marginal structural model. Methods We illustrate the method by estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. Results In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality (hazard ratio [HR]: 1.2; 95% CI: 0.6, 2.3). The HR for current smoking and therapy (0.4; 95% CI: 0.2, 0.7) was similar to the HR for no smoking and therapy (0.4; 95% CI: 0.2, 0.6). Conclusions Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies. PMID:26214338

  17. Smoking imputation and lung cancer in railroad workers exposed to diesel exhaust

    PubMed Central

    Garshick, Eric; Laden, Francine; Hart, Jaime E; Smith, Thomas J; Rosner, Bernard

    2007-01-01

    Background An association between diesel exhaust exposure and lung cancer mortality in a large retrospective cohort study of US railroad workers has previously been reported. However, specific information regarding cigarette smoking was unavailable. Methods Birth cohort, age, job, and cause of death specific smoking histories from a companion case-control study were used to impute smoking behavior for 39,388 railroad workers who died 1959–1996. Mortality analyses incorporated the effect of smoking on lung cancer risk. Results The smoking adjusted relative risk of lung cancer in railroad workers exposed to diesel exhaust compared to unexposed workers was 1.22 (95% CI=1.12–1.32), and unadjusted for smoking the relative risk was 1.35 (95% CI=1.24–1.46). Conclusions These analyses illustrate the use of imputation in record-based occupational health studies to assess potential confounding due to smoking. In this cohort, small differences in smoking behavior between diesel exposed and unexposed workers did not explain the elevated lung cancer risk. PMID:16767725

  18. Genome-wide association analysis of imputed rare variants: application to seven common complex diseases.

    PubMed

    Mägi, Reedik; Asimit, Jennifer L; Day-Williams, Aaron G; Zeggini, Eleftheria; Morris, Andrew P

    2012-12-01

    Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.

  19. Filter apparatus

    DOEpatents

    Kuban, D.P.; Singletary, B.H.; Evans, J.H.

    A plurality of holding tubes are respectively mounted in apertures in a partition plate fixed in a housing receiving gas contaminated with particulate material. A filter cartridge is removably held in each holding tube, and the cartridges and holding tubes are arranged so that gas passes through apertures therein and across the partition plate while particulate material is collected in the cartridges. Replacement filter cartridges are respectively held in holding canisters mounted on a support plate which can be secured to the aforesaid housing, and screws mounted on said canisters are arranged to push replacement cartridges into the cartridge holding tubes and thereby eject used cartridges therefrom.

  20. Water Filters

    NASA Technical Reports Server (NTRS)

    1988-01-01

    Seeking a more effective method of filtering highly contaminated potable water, Mike Pedersen, founder of Western Water International (WWI), learned that NASA had conducted extensive research into methods of purifying water on board manned spacecraft. The key is Aquaspace Compound, a proprietary WWI formula that scientifically blends various types of granular activated charcoal with other active and inert ingredients. Aquaspace systems remove some substances, such as chlorine, by atomic adsorption, other types of organic chemicals by mechanical filtration, and still others by catalytic reaction. Aquaspace filters are finding wide acceptance in industrial, commercial, residential and recreational applications in the U.S. and abroad.

  1. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...

  2. 2 CFR 180.630 - May a Federal agency impute the conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... one person to another? 180.630 Section 180.630 Grants and Agreements Office of Management and Budget... conduct of one person to another? For purposes of actions taken under this part, a Federal agency may... individual to another individual, if the individual to whom the improper conduct is imputed...

  3. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... conduct of one person to another? 1471.630 Section 1471.630 Labor Regulations Relating to Labor (Continued... Conciliation Service impute conduct of one person to another? For purposes of actions taken under this rule, we..., criminal, or other improper conduct of any organization to an individual, or from one individual to...

  4. 7 CFR 3017.630 - May the Department of Agriculture impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false May the Department of Agriculture impute conduct of one person to another? 3017.630 Section 3017.630 Agriculture Regulations of the Department of Agriculture (Continued) OFFICE OF THE CHIEF FINANCIAL OFFICER, DEPARTMENT OF AGRICULTURE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT)...

  5. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    PubMed Central

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803
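
    The abstract names k-nearest neighbour (kNN) imputation without describing it. The following is a minimal sketch of the general kNN idea, not the authors' implementation. It assumes rows are genes, columns are samples, missing values are marked with None, and similarity is a root-mean-square distance over the columns both rows have observed. The function name knn_impute and the toy matrix are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows' values in that column."""

    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy expression matrix (hypothetical values): gene 2 is missing sample 3.
expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
filled = knn_impute(expression, k=2)
```

    In this toy case the missing entry is filled from the two most similar genes (rows 1 and 3), while the dissimilar fourth row is ignored. Distances are computed on the original matrix, so earlier imputations do not influence later ones.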

  6. Imputation of Test Scores in the National Education Longitudinal Study of 1988 (NELS:88). Working Paper Series.

    ERIC Educational Resources Information Center

    Bokossa, Maxime C.; Huang, Gary G.

    This report describes the imputation procedures used to deal with missing data in the National Education Longitudinal Study of 1988 (NELS:88), the only current National Center for Education Statistics (NCES) dataset that contains scores from cognitive tests given the same set of students at multiple time points. As is inevitable, cognitive test…

  7. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 29 Labor 4 2010-07-01 2010-07-01 false May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1471.630 May the Federal Mediation...

  8. Notch filter

    NASA Technical Reports Server (NTRS)

    Shelton, G. B. (Inventor)

    1977-01-01

    A notch filter for the selective attenuation of a narrow band of frequencies out of a larger band was developed. A helical resonator is connected to an input circuit and an output circuit through discrete and equal capacitors, and a resistor is connected between the input and the output circuits.

  9. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    SciTech Connect

    Bogush, I.P.; Bushinsky, A.V.; Galkin, A.Y.; Serbin, V.I.; Zhabotinsky, E.E.

    1991-01-01

    Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal state requirements differ markedly from unit to unit, so meeting them requires both an appropriate arrangement of the units in the SNPS power generating module and the use of specific control algorithms, special thermal regulation, and protection. Computer codes that determine the thermal transient performance of the liquid metal loop and the main units were developed as the calculation basis for the required thermal state of the SNPS "Topaz" units. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, mock-up tests, power tests of ground SNPS prototypes, and flight tests of two SNPS "Topaz" systems.

  10. Structural and energetic analyses of SNPs in drug targets and implications for drug therapy.

    PubMed

    Sun, Hui-Yong; Ji, Feng-Qin; Fu, Liang-Yu; Wang, Zhong-Yi; Zhang, Hong-Yu

    2013-12-23

    Mutations in drug targets can alter the therapeutic effects of drugs. Therefore, evaluating the effects of single-nucleotide polymorphisms (SNPs) on drug-target binding is of significant interest. This study focuses on the analysis of the structural and energy properties of SNPs in successful drug targets by using the data derived from HapMap and the Therapeutic Target Database. The results show the following: (i) Drug targets undergo strong purifying selection, and the majority (92.4%) of the SNPs are located far from the drug-binding sites (>12 Å). (ii) For SNPs near the drug-binding pocket (≤12 Å), nearly half of the drugs are weakly affected by the SNPs, and only a few drugs are significantly affected by the target mutations. These results have direct implications for population-based drug therapy and for chemical treatment of genetic diseases as well.

  11. Accounting for uncertainty due to 'last observation carried forward' outcome imputation in a meta-analysis model.

    PubMed

    Dimitrakopoulou, Vasiliki; Efthimiou, Orestis; Leucht, Stefan; Salanti, Georgia

    2015-02-28

    Missing outcome data are a problem commonly observed in randomized control trials that occurs as a result of participants leaving the study before its end. Missing such important information can bias the study estimates of the relative treatment effect and consequently affect the meta-analytic results. Therefore, methods on manipulating data sets with missing participants, with regard to incorporating the missing information in the analysis so as to avoid the loss of power and minimize the bias, are of interest. We propose a meta-analytic model that accounts for possible error in the effect sizes estimated in studies with last observation carried forward (LOCF) imputed patients. Assuming a dichotomous outcome, we decompose the probability of a successful unobserved outcome taking into account the sensitivity and specificity of the LOCF imputation process for the missing participants. We fit the proposed model within a Bayesian framework, exploring different prior formulations for sensitivity and specificity. We illustrate our methods by performing a meta-analysis of five studies comparing the efficacy of amisulpride versus conventional drugs (flupenthixol and haloperidol) on patients diagnosed with schizophrenia. Our meta-analytic models yield estimates similar to meta-analysis with LOCF-imputed patients. Allowing for uncertainty in the imputation process, precision is decreased depending on the priors used for sensitivity and specificity. Results on the significance of amisulpride versus conventional drugs differ between the standard LOCF approach and our model depending on prior beliefs on the imputation process. Our method can be regarded as a useful sensitivity analysis that can be used in the presence of concerns about the LOCF process.
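
    For readers unfamiliar with the imputation step that the proposed model corrects for, the following is a minimal sketch of last observation carried forward itself. It does not reproduce the authors' Bayesian meta-analytic model. It assumes each participant's outcomes are stored in visit order with None marking a missed visit; the function name locf is hypothetical.

```python
def locf(series):
    """Replace each missing value (None) with the most recent observed value."""
    filled = []
    last = None  # stays None until the first observed visit
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# A participant observed at visits 1 and 4 who dropped out afterwards.
completed = locf([3, None, None, 5, None])
```

    Values before the first observation remain missing, because there is nothing to carry forward. The abstract's point is that such carried-forward outcomes can be wrong, which is why the authors model the sensitivity and specificity of the process rather than treating the imputed values as observed.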

  12. Assessing assay agreement estimation for multiple left-censored data: a multiple imputation approach.

    PubMed

    Lapidus, Nathanael; Chevret, Sylvie; Resche-Rigon, Matthieu

    2014-12-30

    Agreement between two assays is usually based on the concordance correlation coefficient (CCC), estimated from the means, standard deviations, and correlation coefficient of these assays. However, such data will often suffer from left-censoring because of lower limits of detection of these assays. To handle such data, we propose to extend a multiple imputation approach by chained equations (MICE) developed in a close setting of one left-censored assay. The performance of this two-step approach is compared with that of a previously published maximum likelihood estimation through a simulation study. Results show close estimates of the CCC by both methods, although the coverage is improved by our MICE proposal. An application to cytomegalovirus quantification data is provided.
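
    The abstract notes that the CCC is estimated from the means, standard deviations, and correlation coefficient of the two assays. A minimal sketch of that calculation on complete data follows, using Lin's standard definition CCC = 2ρσxσy / (σx² + σy² + (μx − μy)²) with population (divide-by-n) moments. It does not handle left-censoring or the MICE step the authors propose, and the function name and toy data are hypothetical.

```python
import math


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired measurement lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    rho = cov_xy / math.sqrt(var_x * var_y)  # Pearson correlation
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return 2 * rho * sd_x * sd_y / (var_x + var_y + (mean_x - mean_y) ** 2)


assay_a = [1.0, 2.0, 3.0, 4.0]
assay_b = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but shifted by one unit
ccc_identical = concordance_correlation(assay_a, assay_a)
ccc_shifted = concordance_correlation(assay_a, assay_b)
```

    The shifted example shows why the CCC is used for agreement rather than plain correlation: the two assays correlate perfectly, yet the constant offset lowers the CCC below 1.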

  13. Impute DC link (IDCL) cell based power converters and control thereof

    DOEpatents

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with similarly rated back-to-back inverter system.

  14. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets.

    PubMed

    Carrig, Madeline M; Manrique-Vallier, Daniel; Ranby, Krista W; Reiter, Jerome P; Hoyle, Rick H

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches.

  15. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation.

    PubMed

    Soler Artigas, María; Wain, Louise V; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R; Grallert, Harald; Hammond, Chris J; Harris, Sarah E; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W; Navarro, Pau; Nickle, David C; Padmanabhan, Sandosh; Raitakari, Olli T; Ried, Janina S; Ripatti, Samuli; Schulz, Holger; Scott, Robert A; Sin, Don D; Starr, John M; Viñuela, Ana; Völzke, Henry; Wild, Sarah H; Wright, Alan F; Zemunik, Tatijana; Jarvis, Deborah L; Spector, Tim D; Evans, David M; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J; Karrasch, Stefan; Probst-Hensch, Nicole M; Heinrich, Joachim; Stubbe, Beate; Wilson, James F; Wareham, Nicholas J; James, Alan L; Morris, Andrew P; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P; Hall, Ian P; Tobin, Martin D

    2015-12-04

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered.

  16. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    PubMed Central

    Artigas, María Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria; Viñuela, Ana; Völzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  17. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation.

    PubMed

    Soler Artigas, María; Wain, Louise V; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R; Grallert, Harald; Hammond, Chris J; Harris, Sarah E; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W; Navarro, Pau; Nickle, David C; Padmanabhan, Sandosh; Raitakari, Olli T; Ried, Janina S; Ripatti, Samuli; Schulz, Holger; Scott, Robert A; Sin, Don D; Starr, John M; Viñuela, Ana; Völzke, Henry; Wild, Sarah H; Wright, Alan F; Zemunik, Tatijana; Jarvis, Deborah L; Spector, Tim D; Evans, David M; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J; Karrasch, Stefan; Probst-Hensch, Nicole M; Heinrich, Joachim; Stubbe, Beate; Wilson, James F; Wareham, Nicholas J; James, Alan L; Morris, Andrew P; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P; Hall, Ian P; Tobin, Martin D

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  18. Genotyping of 75 SNPs using arrays for individual identification in five population groups.

    PubMed

    Hwa, Hsiao-Lin; Wu, Lawrence Shih Hsin; Lin, Chun-Yen; Huang, Tsun-Ying; Yin, Hsiang-I; Tseng, Li-Hui; Lee, James Chun-I

    2016-01-01

    Single nucleotide polymorphism (SNP) typing offers promise to forensic genetics. Various strategies and panels for analyzing SNP markers for individual identification have been published. However, the best panels with fewer identity SNPs for all major population groups are still under discussion. This study aimed to find more autosomal SNPs with high heterozygosity for individual identification among Asian populations. Ninety-six autosomal SNPs of 502 DNA samples from unrelated individuals of five population groups (208 Taiwanese Han, 83 Filipinos, 62 Thais, 69 Indonesians, and 80 individuals with European, Near Eastern, or South Asian ancestry) were analyzed using arrays in an initial screening, and 75 SNPs (group A, 46 newly selected SNPs; group B, 29 SNPs based on a previous SNP panel) were selected for further statistical analyses. Some SNPs with high heterozygosity from Asian populations were identified. The combined random match probability of the best 40 and 45 SNPs was between 3.16 × 10^-17 and 7.75 × 10^-17 and between 2.33 × 10^-19 and 7.00 × 10^-19, respectively, in all five populations. These loci offer comparable power to short tandem repeats (STRs) for routine forensic profiling. In this study, we demonstrated the population genetic characteristics and forensic parameters of 75 SNPs with high heterozygosity from five population groups. This SNP panel can provide valuable genotypic information and can be helpful in forensic casework for individual identification among these populations.
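
    The abstract reports combined random match probabilities without showing how such a figure is built up. A minimal sketch of the usual calculation follows, under two assumptions that are not taken from the study: each SNP is biallelic and in Hardy-Weinberg equilibrium, and the SNPs are independent so that per-locus probabilities multiply. The function names and the allele frequency of 0.5 are hypothetical illustrations, not the panel's data.

```python
def snp_match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    p is the frequency of one allele; genotype frequencies follow
    Hardy-Weinberg proportions (p^2, 2pq, q^2).
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def combined_match_probability(allele_freqs):
    """Product of per-SNP match probabilities, assuming independent loci."""
    result = 1.0
    for p in allele_freqs:
        result *= snp_match_probability(p)
    return result


# Idealized panel: 40 SNPs, each with allele frequency 0.5 (maximum heterozygosity).
cmp_40 = combined_match_probability([0.5] * 40)
```

    With an allele frequency of 0.5 each SNP contributes a match probability of 0.375, so 40 such loci give a combined value of roughly 9 × 10^-18. Real panels have less balanced frequencies and therefore somewhat larger values, which is why high-heterozygosity SNPs are preferred.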

  19. Handling Missing Data in Matched Case-Control Studies Using Multiple Imputation

    PubMed Central

    Seaman, Shaun R.; Keogh, Ruth H.

    2016-01-01

    SUMMARY Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin’s Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data. PMID:26237003
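
    The abstract mentions obtaining variance estimates with Rubin's Rules. A minimal sketch of the standard pooling formulas follows: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name pool_rubin and the numbers are hypothetical and are not from the EPIC-Norfolk analysis.

```python
def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's Rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # spread across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios and squared standard errors from three imputed data sets.
log_or, total_var = pool_rubin([0.4, 0.5, 0.6], [0.04, 0.05, 0.06])
```

    The between-imputation term is what distinguishes this from simply averaging: it inflates the variance to reflect uncertainty about the missing values themselves.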

  20. Confidence intervals after multiple imputation: combining profile likelihood information from logistic regressions.

    PubMed

    Heinze, Georg; Ploner, Meinhard; Beyea, Jan

    2013-12-20

    In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CI may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF_c, allowing confidence limits for a parameter β at level 1 - α to be identified as those β* and β** that satisfy CDF_c(β*) = α/2 and CDF_c(β**) = 1 - α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. PMID:23873477
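
    The combining step described above, averaging per-imputation CDFs and reading confidence limits off the combined CDF, can be sketched as follows. This is an illustration only and not the logistf implementation. As a stand-in assumption, each per-imputation CDF is a normal CDF, whereas the paper derives them from penalized likelihood profiles. The function names, the bisection search, and the toy means are hypothetical.

```python
from statistics import NormalDist


def combined_cdf(cdfs, beta):
    """Average the per-imputation CDFs at a given parameter value."""
    return sum(cdf(beta) for cdf in cdfs) / len(cdfs)


def invert(cdfs, target, lo, hi, tol=1e-10):
    """Find beta with combined_cdf(beta) = target by bisection on [lo, hi].

    Relies on the combined CDF being non-decreasing and on the bracket
    containing the solution.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if combined_cdf(cdfs, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clip_interval(cdfs, alpha, lo, hi):
    """Confidence limits where the combined CDF equals alpha/2 and 1 - alpha/2."""
    return invert(cdfs, alpha / 2, lo, hi), invert(cdfs, 1 - alpha / 2, lo, hi)


# Stand-in CDFs from three imputations (hypothetical means, common SD of 0.2).
cdfs = [NormalDist(mu, 0.2).cdf for mu in (0.4, 0.5, 0.6)]
lower, upper = clip_interval(cdfs, alpha=0.05, lo=-5.0, hi=5.0)
```

    Because the limits come from inverting an averaged CDF rather than from a pooled standard error, the resulting interval need not be symmetric, and disagreement between imputations widens it.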

  1. Plasmonic filters.

    SciTech Connect

    Passmore, Brandon Scott; Shaner, Eric Arthur; Barrick, Todd A.

    2009-09-01

    Metal films perforated with subwavelength hole arrays have been shown to demonstrate an effect known as Extraordinary Transmission (EOT). In EOT devices, optical transmission passbands arise that can have up to 90% transmission and a bandwidth that is only a few percent of the designed center wavelength. By placing a tunable dielectric in proximity to the EOT mesh, one can tune the center frequency of the passband. We have demonstrated over 1 micron of passive tuning in structures designed for an 11 micron center wavelength. If a suitable midwave (3-5 micron) tunable dielectric (perhaps BaTiO3) were integrated with an EOT mesh designed for midwave operation, it is possible that a fast, voltage tunable, low temperature filter solution could be demonstrated with a several hundred nanometer passband. Such an element could, for example, replace certain components in a filter wheel solution.

  2. Water Filter

    NASA Technical Reports Server (NTRS)

    1982-01-01

    A compact, lightweight electrolytic water sterilizer, available through Ambassador Marketing, generates silver ions in concentrations of 50 to 100 parts per billion in a water flow system. The silver ions serve as an effective bactericide/deodorizer. Tap water passes through a filtering element of silver that has been chemically plated onto activated carbon. The silver inhibits bacterial growth and the activated carbon removes objectionable tastes and odors caused by the addition of chlorine and other chemicals in the municipal water supply. The three models available are a kitchen unit, a "Tourister" unit for portable use while traveling and a refrigerator unit that attaches to the ice cube water line. A filter will treat 5,000 to 10,000 gallons of water.

  3. Eyeglass Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Biomedical Optical Company of America's suntiger lenses eliminate more than 99% of harmful light wavelengths. The NASA-derived lenses make scenes more vivid in color and also increase the wearer's visual acuity. Distant objects, even on hazy days, appear crisp and clear; mountains seem closer, glare is greatly reduced, clouds stand out. Daytime use protects the retina from bleaching in bright light, thus improving night vision. Filtering helps prevent a variety of eye disorders, in particular cataracts and age-related macular degeneration.

  4. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    SciTech Connect

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.; Loots, Gabriela G.; Houston, Kathryn A.; Dubchak, Inna; Speed, Terence P.; Rubin, Edward M.

    2002-01-01

    Genome-wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits, and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs), and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs, is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse and human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

  5. Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data

    PubMed Central

    Hu, Yang; Wu, Xiaoliang; Ma, Rui

    2016-01-01

    Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with numerous diseases by affecting gene expression. However, in noncoding regions, the mechanism of how SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. The SNPs located in enhancer elements may affect gene expression and lead to disease. We presented a method for identifying liver cancer-related enhancer SNPs through integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer. PMID:27429976

  6. Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data.

    PubMed

    Zhang, Tianjiao; Hu, Yang; Wu, Xiaoliang; Ma, Rui; Jiang, Qinghua; Wang, Yadong

    2016-01-01

    Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with numerous diseases by affecting gene expression. However, in noncoding regions, the mechanism of how SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. The SNPs located in enhancer elements may affect gene expression and lead to disease. We presented a method for identifying liver cancer-related enhancer SNPs through integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer. PMID:27429976

  7. Ceramic filters

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1995-12-31

    Filters were formed from ceramic fibers, organic fibers, and a ceramic bond phase using a papermaking technique. The distribution of particulate ceramic bond phase was determined using a model silicon carbide system. As the ceramic fiber increased in length and diameter the distance between particles decreased. The calculated number of particles per area showed good agreement with the observed value. After firing, the papers were characterized using a biaxial load test. The strength of papers was proportional to the amount of bond phase included in the paper. All samples exhibited strain-tolerant behavior.

  8. Rocket noise filtering system using digital filters

    NASA Technical Reports Server (NTRS)

    Mauritzen, David

    1990-01-01

    A set of digital filters is designed to filter rocket noise to various bandwidths. The filters are designed to have constant group delay and are implemented in software on a general purpose computer. The Parks-McClellan algorithm is used. Preliminary tests are performed to verify the design and implementation. An analog filter which was previously employed is also simulated.

  9. Comparing the efficacy of SNP filtering methods for identifying a single causal SNP in a known association region.

    PubMed

    Spencer, Amy Victoria; Cox, Angela; Walters, Kevin

    2014-01-01

    Genome-wide association studies have successfully identified associations between common diseases and a large number of single nucleotide polymorphisms (SNPs) across the genome. We investigate the effectiveness of several statistics, including p-values, likelihoods, genetic map distance and linkage disequilibrium between SNPs, in filtering SNPs in several disease-associated regions. We use simulated data to compare the efficacy of filters with different sample sizes and for causal SNPs with different minor allele frequencies (MAFs) and effect sizes, focusing on the small effect sizes and MAFs likely to represent the majority of unidentified causal SNPs. In our analyses, of all the methods investigated, filtering on the ranked likelihoods consistently retains the true causal SNP with the highest probability for a given false positive rate. This was the case for all the local linkage disequilibrium patterns investigated. Our results indicate that when using this method to retain only the top 5% of SNPs, even a causal SNP with an odds ratio of 1.1 and MAF of 0.08 can be retained with a probability exceeding 0.9 using an overall sample size of 50,000.
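
    Filtering on ranked likelihoods and retaining the top 5% of SNPs can be sketched as follows. This is an illustration only: the function name, the score dictionary, and the example values are hypothetical, and the abstract does not specify how the likelihoods themselves are computed.

```python
import math


def filter_top_fraction(scores, fraction=0.05):
    """Return the SNP ids whose score ranks in the top `fraction`.

    `scores` maps a SNP id to a likelihood-type statistic, where a
    higher value means stronger support for that SNP being causal.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one SNP when the region is non-empty.
    n_keep = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical log-likelihoods for 40 SNPs in one region.
example_scores = {f"snp{i}": -100.0 - i for i in range(40)}
kept = filter_top_fraction(example_scores, fraction=0.05)
```

    With 40 SNPs and a 5% threshold, two SNPs are retained; the causal SNP survives the filter only if it ranks among them.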

  10. Gap-filling methods to impute eddy covariance flux data by preserving variance.

    NASA Astrophysics Data System (ADS)

    Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.

    2015-12-01

    To represent carbon dynamics, in terms of exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data has been collected using eddy flux towers from various sites across the globe for more than two decades. However, measurements from EC data are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to have estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60-days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center. We used as our base the Michaelis-Menten and Van't Hoff functions. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This provides us an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC) into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by lower root mean square error (RMSE) of the predicted values of NEE). Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables.

  11. On the performance of multiple imputation based on chained equations in tackling missing data of the African α3.7-globin deletion in a malaria association study.

    PubMed

    Sepúlveda, Nuno; Manjurano, Alphaxard; Drakeley, Chris; Clark, Taane G

    2014-07-01

    Multiple imputation based on chained equations (MICE) is an alternative missing genotype method that can use genetic and nongenetic auxiliary data to inform the imputation process. Previously, MICE was successfully tested on strongly linked genetic data. We have now tested it on data of the HBA2 gene which, by the experimental design used in a malaria association study in Tanzania, shows a high missing data percentage and is weakly linked with the remaining genetic markers in the data set. We constructed different imputation models and studied their performance under different missing data conditions. Overall, MICE failed to accurately predict the true genotypes. However, using the best imputation model for the data, we obtained unbiased estimates for the genetic effects, and association signals of the HBA2 gene on malaria positivity. When the whole data set was analyzed with the same imputation model, the association signal increased from 0.80 to 2.70 before and after imputation, respectively. Conversely, postimputation estimates for the genetic effects remained the same in relation to the complete case analysis but showed increased precision. We argue that these postimputation estimates are reasonably unbiased, as a result of a good study design based on matching key socio-environmental factors.

  12. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; Brown, Joseph N.; Wang, Jing; McDermott, Jason E.; Smith, Richard D.; Rodland, Karin D.; Metz, Thomas O.; Pounds, Joel G.; Waters, Katrina M.

    2015-04-09

    In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.

  13. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

    DOE PAGESBeta

    Webb-Robertson, Bobbie-Jo M.; Wiberg, Holli K.; Matzke, Melissa M.; Brown, Joseph N.; Wang, Jing; McDermott, Jason E.; Smith, Richard D.; Rodland, Karin D.; Metz, Thomas O.; Pounds, Joel G.; et al

    2015-04-09

    In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. In summary, on the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.

  14. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    PubMed

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)^{-1}X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)^{-1}X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β̂) = E[β̂] - β, Bias(θ̂) = E[θ̂] - Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial.
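
    The two linear imputation rules named in this abstract, LOCF and BOCF, can be sketched for a single subject's visit series. This is a minimal illustration, assuming missing visits are coded as None; the function names and the pain scores are hypothetical and do not come from the paper.

```python
def locf(values):
    """Last Observation Carried Forward.

    Each missing value (None) is replaced by the most recent observed
    value. A missing value before any observation stays None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bocf(values):
    """Baseline Observation Carried Forward.

    Each missing value (None) is replaced by the first (baseline) value.
    """
    baseline = values[0]
    return [baseline if v is None else v for v in values]


# Hypothetical pain scores for a subject who drops out after visit 3.
pain_scores = [8.0, 6.0, 5.0, None, None]
locf_scores = locf(pain_scores)
bocf_scores = bocf(pain_scores)
```

    The two rules give different completed vectors for the same subject (final value 5.0 under LOCF, 8.0 under BOCF), which is why the choice of rule changes E[Y] and therefore the bias of any estimator built on it.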

  15. Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files.

    PubMed

    Parker, Jennifer D; Schenker, Nathaniel

    2007-09-01

    Multiple imputation (MI) is a technique that can be used for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose an approach to utilise MI to handle the problems of missing gestational ages and implausible birthweight-gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then imputing missing and implausible gestational age values multiple times. Because missing and implausible gestational age values are not missing completely at random, using multiple imputations and, thus, incorporating both the existing relationships among the variables and the uncertainty added from the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
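
    The abstract does not spell out the "simple formulae" used to combine results. The standard choice in the MI literature is Rubin's rules, sketched below on the assumption that these are the formulae meant. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimate from each completed dataset
    variances: squared standard error from each completed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two analyses of equal count")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates from three completed datasets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation term is what carries the extra uncertainty due to missing data; analysing a single completed dataset would report only the within-imputation variance.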

  16. A Multiethnic Replication Study of Plasma Lipoprotein Levels-Associated SNPs Identified in Recent GWAS

    PubMed Central

    Bryant, Emily K.; Dressen, Amy S.; Bunker, Clareann H.; Hokanson, John E.; Hamman, Richard F.; Kamboh, M. Ilyas; Demirci, F. Yesim

    2013-01-01

    Genome-wide association studies (GWAS) have identified a number of loci/SNPs associated with plasma total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride (TG) levels. The purpose of this study was to replicate 40 recent GWAS-identified HDL-C-related new loci in 3 epidemiological samples comprising U.S. non-Hispanic Whites (NHWs), U.S. Hispanics, and African Blacks. In each sample, the association analyses were performed with all 4 major lipid traits regardless of previously reported specific associations with selected SNPs. A total of 22 SNPs showed nominally significant association (p<0.05) with at least one lipid trait in at least one ethnic group, although not always with the same lipid traits reported as genome-wide significant in the original GWAS. The total number of significant loci was 10 for TC, 12 for LDL-C, 10 for HDL-C, and 6 for TG levels. Ten SNPs were significantly associated with more than one lipid trait in at least one ethnic group. Six SNPs were significantly associated with at least one lipid trait in more than one ethnic group, although not always with the same trait across various ethnic groups. For 25 SNPs, the associations were replicated with the same genome-wide significant lipid traits in the same direction in at least one ethnic group; at nominal significance for 13 SNPs and with a trend for association for 12 SNPs. However, the associations were not consistently present in all ethnic groups. This observation was consistent with mixed results obtained in other studies that also examined various ethnic groups. PMID:23717430

  17. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.

    PubMed

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

    2013-04-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.

  18. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.

    PubMed

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

    2013-04-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
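
    The idea of stratifying SNPs before controlling the false discovery rate can be sketched by applying the Benjamini-Hochberg procedure separately within each annotation category. This is a simplified stand-in, not the sFDR methodology of the paper, which additionally uses linkage disequilibrium-weighted annotations. The function names, strata labels, and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where BH rejects at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= alpha * k / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply BH separately within each stratum (e.g. genic category)."""
    rejected = [False] * len(pvalues)
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    for members in groups.values():
        flags = benjamini_hochberg([pvalues[i] for i in members], alpha)
        for i, flag in zip(members, flags):
            rejected[i] = flag
    return rejected


# Hypothetical p-values: three coding SNPs, three intergenic SNPs.
pvals = [0.01, 0.02, 0.03, 0.5, 0.6, 0.04]
labels = ["coding"] * 3 + ["intergenic"] * 3
pooled = benjamini_hochberg(pvals)
stratified = stratified_bh(pvals, labels)
```

    In this toy input the pooled procedure rejects nothing, while the stratified procedure rejects all three SNPs in the enriched stratum, which illustrates how enrichment within a category can raise the rejection rate at a fixed FDR level.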

  19. Application of multiple imputation using the two-fold fully conditional specification algorithm in longitudinal clinical data

    PubMed Central

    Welch, Catherine; Bartlett, Jonathan; Petersen, Irene

    2014-01-01

    Electronic health records of longitudinal clinical data are a valuable resource for health care research. One obstacle to using databases of health records in epidemiological analyses is that general practitioners mainly record data if they are clinically relevant. We can use existing methods to handle missing data, such as multiple imputation (MI), if we treat the unavailability of measurements as a missing-data problem. Most software implementations of MI do not take account of the longitudinal and dynamic structure of the data and are difficult to implement in large databases with millions of individuals and long follow-up. Nevalainen, Kenward, and Virtanen (2009, Statistics in Medicine 28: 3657–3669) proposed the two-fold fully conditional specification algorithm to impute missing data in longitudinal data. It imputes missing values at a given time point, conditional on information at the same time point and immediately adjacent time points. In this article, we describe a new command, twofold, that implements the two-fold fully conditional specification algorithm. It is extended to accommodate MI of longitudinal clinical records in large databases. PMID:25420071

  20. Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs

    PubMed Central

    Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

    2014-01-01

    Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100

  1. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon

    PubMed Central

    Tsai, Hsin Y.; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Bishop, Steve C.; Houston, Ross D.

    2015-01-01

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth. PMID:26703584

  2. Genome-wide association study SNPs in the human genome diversity project populations: does selection affect unlinked SNPs with shared trait associations?

    PubMed

    Casto, Amanda M; Feldman, Marcus W

    2011-01-06

    Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS.

  3. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study.

    PubMed

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.

  4. Recovering incomplete data using Statistical Multiple Imputations (SMI): a case study in environmental chemistry.

    PubMed

    Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D

    2011-10-15

    This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and limit of detection levels prevent the application of statistics. A working example is taken from an environmental leaching study that was set up to determine if there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative treated wood waste and those containing untreated wood. Fourteen lysimeters were set up and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, due to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data was somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, the statistical tool of Statistical Multiple Imputation (SMI) was applied, and the data was re-analysed successfully. It was demonstrated that using SMI did not affect the variance in the data, but facilitated analysis of the complete dataset.

  5. Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations

    PubMed Central

    Band, Gavin; Le, Quang Si; Jostins, Luke; Pirinen, Matti; Kivinen, Katja; Jallow, Muminatou; Sisay-Joof, Fatoumatta; Bojang, Kalifa; Pinder, Margaret; Sirugo, Giorgio; Conway, David J.; Nyirongo, Vysaul; Kachala, David; Molyneux, Malcolm; Taylor, Terrie; Ndila, Carolyne; Peshu, Norbert; Marsh, Kevin; Williams, Thomas N.; Alcock, Daniel; Andrews, Robert; Edkins, Sarah; Gray, Emma; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Schuldt, Kathrin; Clark, Taane G.; Small, Kerrin S.; Teo, Yik Ying; Kwiatkowski, Dominic P.; Rockett, Kirk A.; Barrett, Jeffrey C.; Spencer, Chris C. A.

    2013-01-01

    Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP–based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles. PMID:23717212
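
    The fixed-effects meta-analysis referred to here is usually computed by inverse-variance weighting of the per-study effect estimates. The sketch below assumes that standard form; the function name and the per-study numbers are hypothetical and are not results from the paper.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    standard_errors: per-study standard errors of those estimates
    Returns (pooled effect, pooled standard error).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical log odds ratios for one SNP in two study sites.
meta_beta, meta_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

    The fixed-effects model assumes a single shared effect across studies. That is the assumption the abstract proposes to relax when effects or causal alleles differ between populations.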

  6. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

    PubMed Central

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules. PMID:27199552

  7. Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling

    PubMed Central

    Hieke, Stefanie; Benner, Axel; Schlenk, Richard F.; Schumacher, Martin; Bullinger, Lars; Binder, Harald

    2016-01-01

    Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms that is larger by an order of magnitude than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts often is still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-control studies on the one hand, and multivariable regression techniques as developed for gene expression data on the other hand. In particular, we focus on stable selection of a small set of SNPs and corresponding genes for subsequent validation. For univariate analysis, a permutation-based approach is proposed to test at the gene level. We use regularized multivariable regression models for considering all SNPs simultaneously and selecting a small set of potentially important prognostic SNPs. Stability is judged according to resampling inclusion frequencies for both the univariate and the multivariable approach. The overall strategy is illustrated with data from a cohort of acute myeloid leukemia patients and explored in a simulation study. The multivariable approach is seen to automatically focus on a smaller set of SNPs compared to the univariate approach, roughly in line with blocks of correlated SNPs. This more targeted extraction of SNPs results in more stable selection at the SNP as well as at the gene level. Thus, the multivariable regression approach with resampling provides a perspective in the proposed analysis strategy for SNP data in clinical cohorts highlighting what can be added by regularized regression techniques compared to univariate analyses. PMID:27159447

  8. Heritability of submaximal exercise heart rate response to exercise training is accounted for by nine SNPs.

    PubMed

    Rankinen, Tuomo; Sung, Yun Ju; Sarzynski, Mark A; Rice, Treva K; Rao, D C; Bouchard, Claude

    2012-03-01

    Endurance training-induced changes in hemodynamic traits are heritable. However, few genes associated with heart rate training responses have been identified. The purpose of our study was to perform a genome-wide association study to uncover DNA sequence variants associated with submaximal exercise heart rate training responses in the HERITAGE Family Study. Heart rate was measured during steady-state exercise at 50 W (HR50) on 2 separate days before and after a 20-wk endurance training program in 483 white subjects from 99 families. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. After quality control procedures, 320,000 single-nucleotide polymorphisms (SNPs) were available for the genome-wide association study analyses, which were performed using the MERLIN software package (single-SNP analyses and conditional heritability tests) and standard regression models (multivariate analyses). The strongest associations for HR50 training response adjusted for age, sex, body mass index, and baseline HR50 were detected with SNPs at the YWHAQ locus on chromosome 2p25 (P = 8.1 × 10⁻⁷), the RBPMS locus on chromosome 8p12 (P = 3.8 × 10⁻⁶), and the CREB1 locus on chromosome 2q34 (P = 1.6 × 10⁻⁵). In addition, 37 other SNPs showed P values <9.9 × 10⁻⁵. After removal of redundant SNPs, the 10 most significant SNPs explained 35.9% of the ΔHR50 variance in a multivariate regression model. Conditional heritability tests showed that nine of these SNPs (all intragenic) accounted for 100% of the ΔHR50 heritability. Our results indicate that SNPs in nine genes related to cardiomyocyte and neuronal functions, as well as cardiac memory formation, fully account for the heritability of the submaximal heart rate training response.

  9. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

    PubMed

    Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D

    2015-03-12

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data.

  10. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass

    PubMed Central

    Ramstein, Guillaume P.; Lipka, Alexander E.; Lu, Fei; Costich, Denise E.; Cherney, Jerome H.; Buckler, Edward S.; Casler, Michael D.

    2015-01-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100
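
    Combining marker effects "across imputes" is often done with Rubin's rules, which add the between-imputation variance to the average within-imputation variance. The abstract does not say which pooling rule the authors used, so the sketch below is an assumption; the function name pool_rubin and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one marker effect across m imputed data sets with Rubin's rules.

    estimates: effect estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical marker effect estimated in three imputed data sets.
pooled, total = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.04, 0.04])
print(f"pooled effect={pooled:.2f}, total variance={total:.4f}")
```

    The between-imputation term grows when the imputed data sets disagree, which is how imputation uncertainty widens the standard error of the pooled effect.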

  11. [Analysis of population stratification using random SNPs in genome-wide association studies].

    PubMed

    Cao, Zong-Fu; Ma, Chuan-Xiang; Wang, Lei; Cai, Bin

    2010-09-01

    Because population genetic structure can increase the false-positive rate in genome-wide association studies (GWAS) for complex diseases, the effect of population stratification should be taken into account in GWAS. However, the usefulness of randomly selected SNPs in population stratification analysis has not been determined. In this study, based on genotype data generated on the Genome-Wide Human SNP Array 6.0 from unrelated individuals of HapMap Phase 2, we randomly selected SNPs that were evenly distributed across the whole genome, and acquired Ancestry Informative Markers (AIMs) by the f-value method and the allelic Fisher exact test. F-statistics and STRUCTURE analysis based on the different selected sets of SNPs were used to evaluate how well they distinguished the populations from HapMap Phase 3. We found that randomly selected SNPs evenly distributed across the whole genome could be used to identify population structure. This study further indicated that more than 3,000 randomly selected SNPs evenly distributed across the whole genome could be substituted for AIMs in population stratification analysis when no AIMs were available for specific populations.

  12. Portability of tag SNPs across isolated population groups: an example from India.

    PubMed

    Sarkar Roy, N; Farheen, S; Roy, N; Sengupta, S; Majumder, P P

    2008-01-01

    Isolated population groups are useful in conducting association studies of complex diseases to avoid various pitfalls, including those arising from population stratification. Since DNA resequencing is expensive, it is recommended that genotyping be carried out at tagSNP (tSNP) loci. For this, tSNPs identified in one isolated population need to be used in another. Unless tSNPs are highly portable across populations this strategy may result in loss of information in association studies. We examined the issue of tSNP portability by sampling individuals from 10 isolated ethnic groups from India. We generated DNA resequencing data pertaining to 3 genomic regions and identified tSNPs in each population. We defined an index of tSNP portability and showed that portability is low across isolated Indian ethnic groups. The extent of portability did not significantly correlate with genetic similarity among the populations studied here. We also analyzed our data with sequence data from individuals of African and European descent. Our results indicated that it may be necessary to carry out resequencing in a small number of individuals to discover SNPs and identify tSNPs in the specific isolated population in which a disease association study is to be conducted.

  13. A small number of candidate gene SNPs reveal continental ancestry in African Americans.

    PubMed

    Kodaman, Nuri; Aldrich, Melinda C; Smith, Jeffrey R; Signorello, Lisa B; Bradley, Kevin; Breyer, Joan; Cohen, Sarah S; Long, Jirong; Cai, Qiuyin; Giles, Justin; Bush, William S; Blot, William J; Matthews, Charles E; Williams, Scott M

    2013-01-01

    Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K = 2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99 ± 0.003; 0.98 ± 0.01; 0.93 ± 0.03; and 0.81 ± 0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably.
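
    The abstract uses YRI/CEU Fst as a per-SNP measure of how well a marker separates two reference populations. A minimal sketch of one textbook definition follows (Wright's Fst from expected heterozygosities, weighting both populations equally); the estimator the authors actually used is not stated, and the function names and allele frequencies here are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst for one biallelic SNP from the allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)
    if h_total == 0.0:
        # The SNP is fixed for the same allele in both populations.
        return 0.0
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies: a strongly differentiated SNP and an
# undifferentiated one.
fst_divergent = fst_two_populations(0.9, 0.1)
fst_identical = fst_two_populations(0.3, 0.3)
print(f"divergent={fst_divergent:.2f}, identical={fst_identical:.2f}")
```

    A SNP with a large frequency difference between the two reference populations scores high and carries more ancestry information than one with similar frequencies.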

  14. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    SciTech Connect

    Yang, Jing; Li, Yuan-Yuan; Li, Yi-Xue; Ye, Zhi-Qiang

    2012-03-02

    Highlights: Proper dataset partition can improve the prediction of deleterious nsSNPs. Partitioning according to the original residue type at the nsSNP site is a good criterion. A similar strategy is expected to be promising in other machine learning problems. Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  15. Association analysis identifies Melampsora ×columbiana poplar leaf rust resistance SNPs.

    PubMed

    La Mantia, Jonathan; Klápště, Jaroslav; El-Kassaby, Yousry A; Azam, Shofiul; Guy, Robert D; Douglas, Carl J; Mansfield, Shawn D; Hamelin, Richard

    2013-01-01

    Populus species are currently being domesticated through intensive time- and resource-dependent programs for utilization in phytoremediation, wood and paper products, and conversion to biofuels. Poplar leaf rust disease can greatly reduce wood volume. Genetic resistance is effective in reducing economic losses but major resistance loci have been race-specific and can be readily defeated by the pathogen. Developing durable disease resistance requires the identification of non-race-specific loci. In the presented study, area under the disease progress curve was calculated from natural infection of Melampsora ×columbiana in three consecutive years. Association analysis was performed using 412 P. trichocarpa clones genotyped with 29,355 SNPs covering 3,543 genes. We found 40 SNPs within 26 unique genes significantly associated (permutated P<0.05) with poplar rust severity. Moreover, two SNPs were repeated in all three years suggesting non-race-specificity and three additional SNPs were differentially expressed in other poplar rust interactions. These five SNPs were found in genes that have orthologs in Arabidopsis with functionality in pathogen induced transcriptome reprogramming, Ca²⁺/calmodulin and salicylic acid signaling, and tolerance to reactive oxygen species. The additive effect of non-R gene functional variants may constitute high levels of durable poplar leaf rust resistance. Therefore, these findings are of significance for speeding the genetic improvement of this long-lived, economically important organism.

  16. Association Analysis Identifies Melampsora ×columbiana Poplar Leaf Rust Resistance SNPs

    PubMed Central

    La Mantia, Jonathan; Klápště, Jaroslav; El-Kassaby, Yousry A.; Azam, Shofiul; Guy, Robert D.; Douglas, Carl J.; Mansfield, Shawn D.; Hamelin, Richard

    2013-01-01

    Populus species are currently being domesticated through intensive time- and resource-dependent programs for utilization in phytoremediation, wood and paper products, and conversion to biofuels. Poplar leaf rust disease can greatly reduce wood volume. Genetic resistance is effective in reducing economic losses but major resistance loci have been race-specific and can be readily defeated by the pathogen. Developing durable disease resistance requires the identification of non-race-specific loci. In the presented study, area under the disease progress curve was calculated from natural infection of Melampsora ×columbiana in three consecutive years. Association analysis was performed using 412 P. trichocarpa clones genotyped with 29,355 SNPs covering 3,543 genes. We found 40 SNPs within 26 unique genes significantly associated (permutated P<0.05) with poplar rust severity. Moreover, two SNPs were repeated in all three years suggesting non-race-specificity and three additional SNPs were differentially expressed in other poplar rust interactions. These five SNPs were found in genes that have orthologs in Arabidopsis with functionality in pathogen induced transcriptome reprogramming, Ca2+/calmodulin and salicylic acid signaling, and tolerance to reactive oxygen species. The additive effect of non-R gene functional variants may constitute high levels of durable poplar leaf rust resistance. Therefore, these findings are of significance for speeding the genetic improvement of this long-lived, economically important organism. PMID:24236018

  17. Association of MHC region SNPs with irritant susceptibility in healthcare workers.

    PubMed

    Yucesoy, Berran; Talzhanov, Yerkebulan; Michael Barmada, M; Johnson, Victor J; Kashon, Michael L; Baron, Elma; Wilson, Nevin W; Frye, Bonnie; Wang, Wei; Fluharty, Kara; Gharib, Rola; Meade, Jean; Germolec, Dori; Luster, Michael I; Nedorost, Susan

    2016-09-01

    Irritant contact dermatitis is the most common work-related skin disease, especially affecting workers in "wet-work" occupations. This study was conducted to investigate the association between single nucleotide polymorphisms (SNPs) within the major histocompatibility complex (MHC) and skin irritant response in a group of healthcare workers. 585 volunteer healthcare workers were genotyped for MHC SNPs and patch tested with three different irritants: sodium lauryl sulfate (SLS), sodium hydroxide (NaOH) and benzalkonium chloride (BKC). Genotyping was performed using Illumina Goldengate MHC panels. A number of SNPs within the MHC Class I (OR2B3, TRIM31, TRIM10, TRIM40 and IER3), Class II (HLA-DPA1, HLA-DPB1) and Class III (C2) genes were associated (p < 0.001) with skin response to tested irritants in different genetic models. Linkage disequilibrium patterns and functional annotations identified two SNPs in the TRIM40 (rs1573298) and HLA-DPB1 (rs9277554) genes, with a potential impact on gene regulation. In addition, SNPs in PSMB9 (rs10046277) and ITPR3 (rs499384) were associated with hand dermatitis. The results are of interest as they demonstrate that genetic variations in inflammation-related genes within the MHC can influence chemical-induced skin irritation and may explain the connection between inflamed skin and propensity to subsequent allergic contact sensitization. PMID:27258892

  18. Miniaturized dielectric waveguide filters

    NASA Astrophysics Data System (ADS)

    Sandhu, Muhammad Y.; Hunter, Ian C.

    2016-10-01

    Design techniques for a new class of integrated monolithic high-permittivity ceramic waveguide filters are presented. These filters enable a size reduction of 50% compared to air-filled transverse electromagnetic filters with the same unloaded Q-factor. Designs for Chebyshev and asymmetric generalised Chebyshev filter and a diplexer are presented with experimental results for an 1800 MHz Chebyshev filter and a 1700 MHz generalised Chebyshev filter showing excellent agreement with theory.

  19. Multiple imputation in veterinary epidemiological studies: a case study and simulation.

    PubMed

    Dohoo, Ian R; Nielsen, Christel R; Emanuelson, Ulf

    2016-07-01

    The problem of missing data occurs frequently in veterinary epidemiological studies. Most studies use a complete case (CC) analysis which excludes all observations for which any relevant variable has missing values. Alternative approaches (most notably multiple imputation (MI)) which avoid the exclusion of observations with missing values are now widely available but have been used very little in veterinary epidemiology. This paper uses a case study based on research into dairy producers' attitudes toward mastitis control procedures, combined with two simulation studies to evaluate the use of MI and compare results with a CC analysis. MI analysis of the original data produced results which had relatively minor differences from the CC analysis. However, most of the missing data in the original data set were in the dependent variable and a subsequent simulation study based on the observed missing data pattern and 1000 simulations showed that an MI analysis would not be expected to offer any advantages over a CC analysis in this situation. This was true regardless of the missing data mechanism (MCAR - missing completely at random, MAR - missing at random, or NMAR - not missing at random) underlying the missing values. Surprisingly, recent textbooks dealing with MI make little reference to this limitation of MI for dealing with missing values in the dependent variable. An additional simulation study (1000 runs for each of the three missing data mechanisms) compared MI and CC analyses for data in which varying levels (n=7) of missing data were created in predictor variables. This study showed that MI analyses generally produced results that were less biased on average, were more precise (smaller SEs), were more consistent (less variability between simulation runs) and consequently were more likely to produce estimates that were close to the "truth" (results obtained from a data set with no missing values). While the benefit of MI varied with the mechanism used to

  20. Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation.

    PubMed

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R; Lin, Xihong

    2015-06-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  1. All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

    PubMed Central

    Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
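
    The core idea of stratified FDR control can be illustrated by running the Benjamini-Hochberg procedure separately within each annotation stratum instead of once over all SNPs. The sketch below is a simplified reading of that idea and does not reproduce the paper's LD-weighted annotations or its TDR estimation; the function names, p-values, and stratum labels are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the BH procedure rejects."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Apply BH separately inside each stratum (e.g. annotation category)."""
    rejected = [False] * len(pvalues)
    for label in set(strata):
        members = [i for i, s in enumerate(strata) if s == label]
        flags = benjamini_hochberg([pvalues[i] for i in members], alpha)
        for i, flag in zip(members, flags):
            rejected[i] = flag
    return rejected


# Hypothetical p-values: an enriched "coding" stratum and a null-like
# "intergenic" stratum.
pvalues = [0.001, 0.02, 0.04, 0.2, 0.03, 0.5, 0.6, 0.9]
strata = ["coding"] * 4 + ["intergenic"] * 4
pooled = benjamini_hochberg(pvalues)
stratified = stratified_fdr(pvalues, strata)
print(sum(pooled), "rejections pooled;", sum(stratified), "rejections stratified")
```

    In this toy input the enriched stratum gains a rejection because its small p-values are no longer diluted by the null-like stratum, which mirrors the increased rejection rates the abstract reports at a fixed FDR of 0.05.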

  2. Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs

    NASA Astrophysics Data System (ADS)

    Watson, Corey T.; Disanto, Giulio; Breden, Felix; Giovannoni, Gavin; Ramagopalan, Sreeram V.

    2012-10-01

    Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.

  3. Typing of 49 autosomal SNPs by single base extension and capillary electrophoresis for forensic genetic testing.

    PubMed

    Børsting, Claus; Tomas, Carmen; Morling, Niels

    2012-01-01

    We describe a method for simultaneous amplification of 49 autosomal single nucleotide polymorphisms (SNPs) by multiplex PCR and detection of the SNP alleles by single base extension (SBE) and capillary electrophoresis. All the SNPs may be amplified from only 100 pg of genomic DNA and the amplicon lengths range from 65 to 115 bp. The high sensitivity and the short amplicon sizes make the assay very suitable for typing of degraded DNA samples, and the low mutation rate of SNPs makes the assay very useful for relationship testing. Combined, these advantages make the assay well suited for disaster victim identifications, where the DNA from the victims may be highly degraded and the victims are identified via investigation of their relatives. The assay was validated according to the ISO 17025 standard and used for routine case work in our laboratory. PMID:22139655

  4. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references.

    PubMed

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-12-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping of disease association signals from case-control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5-98.8% obtained with the original HIBAG references to 95.2-99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets.

  5. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris)

    PubMed Central

    Villordo-Pineda, Emiliano; González-Chavira, Mario M.; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A.; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa, and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5) and characterized by 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. Phenotypes most associated were days to flowering and seed biomass with 58 and 44 associated SNPs, respectively. Thirty-seven out of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs such as SNP28 and SNP128 are related to starch biosynthesis, a common osmotic protector; and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755
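
    The reported cutoff is consistent with a standard Bonferroni correction, sketched below. The abstract does not state the family-wise error rate or the number of tests, so the values used here (alpha = 0.05, one test per SNP, 169 SNPs) are assumptions chosen because they reproduce a threshold of about 0.0003.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

# Assumed inputs: alpha = 0.05 and one test per genotyped SNP (169 SNPs).
threshold = bonferroni_threshold(0.05, 169)
print(f"{threshold:.4f}")  # prints 0.0003
```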

  6. Silver sulfide nanoparticles (Ag2S-NPs) are taken up by plants and are phytotoxic.

    PubMed

    Wang, Peng; Menzies, Neal W; Lombi, Enzo; Sekine, Ryo; Blamey, F Pax C; Hernandez-Soriano, Maria C; Cheng, Miaomiao; Kappen, Peter; Peijnenburg, Willie J G M; Tang, Caixian; Kopittke, Peter M

    2015-01-01

    Silver nanoparticles (NPs) are used in more consumer products than any other nanomaterial and their release into the environment is unavoidable. Of primary concern is the wastewater stream in which most silver NPs are transformed to silver sulfide NPs (Ag2S-NPs) before being applied to agricultural soils within biosolids. While Ag2S-NPs are assumed to be biologically inert, nothing is known of their effects on terrestrial plants. The phytotoxicity of Ag and its accumulation was examined in short-term (24 h) and longer-term (2-week) solution culture experiments with cowpea (Vigna unguiculata L. Walp.) and wheat (Triticum aestivum L.) exposed to Ag2S-NPs (0-20 mg Ag L(-1)), metallic Ag-NPs (0-1.6 mg Ag L(-1)), or ionic Ag (AgNO3; 0-0.086 mg Ag L(-1)). Although not inducing any effects during 24-h exposure, Ag2S-NPs reduced growth by up to 52% over a 2-week period. This toxicity did not result from their dissolution and release of toxic Ag(+) in the rooting medium, with soluble Ag concentrations remaining below 0.001 mg Ag L(-1). Rather, Ag accumulated as Ag2S in the root and shoot tissues when plants were exposed to Ag2S-NPs, consistent with their direct uptake. Importantly, this differed from the form of Ag present in tissues of plants exposed to AgNO3. For the first time, our findings have shown that Ag2S-NPs exert toxic effects through their direct accumulation in terrestrial plant tissues. These findings need to be considered to ensure high yield of food crops, and to avoid increasing Ag in the food chain. PMID:25686712

  7. Application of Population Sequencing (POPSEQ) for Ordering and Imputing Genotyping-by-Sequencing Markers in Hexaploid Wheat.

    PubMed

    Edae, Erena A; Bowden, Robert L; Poland, Jesse

    2015-11-03

    The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based, high-resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for association mapping and genomics-assisted breeding is limited by a large proportion of missing data per marker. For species with a reference genomic sequence, markers can be ordered on the physical map. However, in the absence of reference marker order, the use and imputation of GBS markers is challenging. Here, we demonstrate how the population sequencing (POPSEQ) approach can be used to provide marker context for GBS in wheat. The utility of a POPSEQ-based genetic map as a reference map to create genetically ordered markers on a chromosome for hexaploid wheat was validated by constructing an independent de novo linkage map of GBS markers from a Synthetic W7984 × Opata M85 recombinant inbred line (SynOpRIL) population. The results indicated that there is strong agreement between the independent de novo linkage map and the POPSEQ mapping approach in mapping and ordering GBS markers for hexaploid wheat. After ordering, a large number of GBS markers were imputed, thus providing a high-quality reference map that can be used for QTL mapping for different traits. The POPSEQ-based reference map and whole-genome sequence assemblies are valuable resources that can be used to order GBS markers and enable the application of highly accurate imputation methods to leverage the application of GBS markers in wheat.

  8. Association between SNPs within candidate genes and compounds related to boar taint and reproduction

    PubMed Central

    Moe, Maren; Lien, Sigbjørn; Aasmundstad, Torunn; Meuwissen, Theo HE; Hansen, Marianne HS; Bendixen, Christian; Grindflek, Eli

    2009-01-01

    Background Boar taint is an unpleasant odour and flavour of the meat from some uncastrated male pigs primarily caused by elevated levels of androstenone and skatole in adipose tissue. Androstenone is produced in the same biochemical pathway as testosterone and estrogens, which represents a particular challenge when selecting against high levels of androstenone in the breeding programme, without simultaneously decreasing levels of other steroids. Detection of single nucleotide polymorphisms (SNPs) associated with compounds affecting boar taint is important both for gaining a better understanding of the complex regulation of the trait and for the purpose of identifying markers that can be used to improve the gain of breeding. The beneficial SNPs to be used in breeding would have the combinational effects of reducing levels of boar taint without affecting fertility of the animals. The aim of this study was to detect SNPs in boar taint candidate genes and to perform association studies for both single SNPs and haplotypes with levels of boar taint compounds and phenotypes related to reproduction. Results An association study involving 275 SNPs in 121 genes and compounds related to boar taint and reproduction was carried out in Duroc and Norwegian Landrace boars. Phenotypes investigated were levels of androstenone, skatole and indole in adipose tissue, levels of androstenone, testosterone, estrone sulphate and 17β-estradiol in plasma, and length of bulbo urethralis gland. The SNPs were genotyped in more than 2800 individuals and several SNPs were found to be significantly (LRT > 5.4) associated with the different phenotypes. Genes with significant SNPs in either of the traits investigated include cytochrome P450 members CYP2E1, CYP21, CYP2D6 and CYP2C49, steroid 5α-reductase SRD5A2, nuclear receptor NGFIB, catenin CTNND1, BRCA1 associated protein BAP1 and hyaluronoglucosaminidase HYAL2. Haplotype analysis provided additional evidence for an effect of CYP2E1 on levels

  9. Genetic variants in urinary bladder cancer: collective power of the "wimp SNPs".

    PubMed

    Golka, Klaus; Selinski, Silvia; Lehmann, Marie-Louise; Blaszkewicz, Meinolf; Marchan, Rosemarie; Ickstadt, Katja; Schwender, Holger; Bolt, Hermann M; Hengstler, Jan G

    2011-06-01

    In recent years, genome-wide association studies (GWAS) have identified more than 300 validated associations between genetic variants and risk of approximately 70 common diseases. A small number of rare variants with a frequency of usually less than 1% are associated with a strongly enhanced risk, such as genetic variants of TP53, RB1, BRCA1, and BRCA2. Only a very small number of SNPs (with a frequency of more than 1% of the rare allele) have effects of a factor of two or higher. Examples include APOE4 in Alzheimer's disease, LOXL1 in exfoliative glaucoma, and CFH in age-related macular degeneration. However, the majority of all identified SNPs have odds ratios between 1.1 and 1.5. In the case of urinary bladder cancer, all known SNPs that have been validated in sufficiently large populations are associated with odds ratios smaller than 1.5. These SNPs are located next to the following genes: MYC, TP63, PSCA, the TERT-CLPTM1L locus, FGFR3, TACC3, NAT2, CBX6, APOBEC3A, CCNE1, and UGT1A. It is likely that these moderate risk or "wimp SNPs" interact, and because of their high number, collectively have a strong influence on whether an individual will develop cancer or not. It should be considered that variants identified so far explain only approximately 5-10% of the overall inherited risk. Possibly, the remaining variance is due to an even higher number of SNPs with odds ratios smaller than 1.1. Recent studies have provided the following information: (1) The functions of genes identified as relevant for bladder cancer focus on detoxification of carcinogens, control of the cell cycle and apoptosis, as well as maintenance of DNA integrity. (2) Many novel SNPs are far away from the protein coding regions, suggesting that these SNPs are located on distant-acting transcriptional enhancers. (3) The low odds ratio of each individual bladder cancer-associated SNP is too low to justify reasonable preventive measures. However, if the recently identified SNPs interact, they may

  10. HEPA filter dissolution process

    DOEpatents

    Brewer, K.N.; Murphy, J.A.

    1994-02-22

    A process is described for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal. 4 figures.

  11. Hepa filter dissolution process

    DOEpatents

    Brewer, Ken N.; Murphy, James A.

    1994-01-01

    A process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  12. Recirculating electric air filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  13. Recirculating electric air filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  14. Identification of immune-related SNPs in the transcriptome of Mytilus chilensis through high-throughput sequencing.

    PubMed

    Núñez-Acuña, Gustavo; Gallardo-Escárate, Cristian

    2013-12-01

    Single nucleotide polymorphisms (SNPs) identified in coding regions represent a useful tool for understanding the immune response against pathogens and stressful environmental conditions. In this study, a SNP database was generated from transcripts involved in the innate immune response of the mussel Mytilus chilensis. The SNPs were identified through hemocyte transcriptome sequencing from 18 individuals, and SNP mining was performed in 225,336 contigs, yielding 20,306 polymorphisms associated with immune-related genes. Classification of identified SNPs was based on different pathways of the immune response for Mytilus sp. A total of 28 SNPs were identified in the Toll-like receptor pathway and included 5 non-synonymous polymorphisms; 19 SNPs were identified in the apoptosis pathway and included 3 non-synonymous polymorphisms; 35 SNPs were identified in the Ubiquitin-mediated proteolysis pathway and included 4 non-synonymous variants; and 54 SNPs involved in other molecular functions related to the immune response, such as molecular chaperones, antimicrobial peptides, and genes that interact with marine toxins were also identified. The molecular markers identified in this work could be useful for novel studies, such as those related to associations between high-resolution molecular markers and functional response to pathogen agents. PMID:24080470

  15. Exonic versus intronic SNPs: contrasting roles in revealing the population genetic differentiation of a widespread bird species

    PubMed Central

    Zhan, X; Dixon, A; Batbayar, N; Bragin, E; Ayas, Z; Deutschova, L; Chavko, J; Domashevsky, S; Dorosencu, A; Bagyura, J; Gombobaatar, S; Grlica, I D; Levin, A; Milobog, Y; Ming, M; Prommer, M; Purev-Ochir, G; Ragyov, D; Tsurkanu, V; Vetrov, V; Zubkov, N; Bruford, M W

    2015-01-01

    Recent years have seen considerable progress in applying single nucleotide polymorphisms (SNPs) to population genetics studies. However, relatively few have attempted to use them to study the genetic differentiation of wild bird populations and none have examined possible differences of exonic and intronic SNPs in these studies. Here, using 144 SNPs, we examined population genetic differentiation in the saker falcon (Falco cherrug) across Eurasia. The position of each SNP was verified using the recently sequenced saker genome with 108 SNPs positioned within the introns of 10 fragments and 36 SNPs in the exons of six genes, comprising MHC, MC1R and four others. In contrast to intronic SNPs, both Bayesian clustering and principal component analyses using exonic SNPs consistently revealed two genetic clusters, within which the least admixed individuals were found in Europe/central Asia and Qinghai (China), respectively. Pairwise D analysis for exonic SNPs showed that the two populations were significantly differentiated and between the two clusters the frequencies of five SNP markers were inferred to be influenced by selection. Central Eurasian populations clustered as intermediate between the two main groups, consistent with their geographic position. But the westernmost populations of central Europe showed evidence of demographic isolation. Our work highlights the importance of functional exonic SNPs for studying population genetic pattern in a widespread avian species. PMID:25074575

  16. Exonic versus intronic SNPs: contrasting roles in revealing the population genetic differentiation of a widespread bird species.

    PubMed

    Zhan, X; Dixon, A; Batbayar, N; Bragin, E; Ayas, Z; Deutschova, L; Chavko, J; Domashevsky, S; Dorosencu, A; Bagyura, J; Gombobaatar, S; Grlica, I D; Levin, A; Milobog, Y; Ming, M; Prommer, M; Purev-Ochir, G; Ragyov, D; Tsurkanu, V; Vetrov, V; Zubkov, N; Bruford, M W

    2015-01-01

    Recent years have seen considerable progress in applying single nucleotide polymorphisms (SNPs) to population genetics studies. However, relatively few have attempted to use them to study the genetic differentiation of wild bird populations and none have examined possible differences of exonic and intronic SNPs in these studies. Here, using 144 SNPs, we examined population genetic differentiation in the saker falcon (Falco cherrug) across Eurasia. The position of each SNP was verified using the recently sequenced saker genome with 108 SNPs positioned within the introns of 10 fragments and 36 SNPs in the exons of six genes, comprising MHC, MC1R and four others. In contrast to intronic SNPs, both Bayesian clustering and principal component analyses using exonic SNPs consistently revealed two genetic clusters, within which the least admixed individuals were found in Europe/central Asia and Qinghai (China), respectively. Pairwise D analysis for exonic SNPs showed that the two populations were significantly differentiated and between the two clusters the frequencies of five SNP markers were inferred to be influenced by selection. Central Eurasian populations clustered as intermediate between the two main groups, consistent with their geographic position. But the westernmost populations of central Europe showed evidence of demographic isolation. Our work highlights the importance of functional exonic SNPs for studying population genetic pattern in a widespread avian species. PMID:25074575

  17. Genetic association studies between SNPs and suicidal behavior: a meta-analytical field synopsis.

    PubMed

    Schild, Anne H E; Pietschnig, Jakob; Tran, Ulrich S; Voracek, Martin

    2013-10-01

    The large number of published meta-analyses on the associations between single-nucleotide polymorphisms (SNPs) and suicidal behavior mirrors the enormous research interest in this topic. Although meta-analytic evidence is abundant and certain patterns are apparent, those have not been integrated into a general framework as of yet. In a systematic review, genetic association studies between SNPs and suicidal behavior were identified. Previously published meta-analyses for eight SNPs were updated and the results of the different meta-analyses were compared. Meta-analyses for 15 SNPs, which had not been subjected to meta-analysis before, were conducted. The present meta-analytical field synopsis showed five major similarities between new and published analyses: 1) Summary effect sizes were small and rarely statistically significant, 2) heterogeneity between studies was often substantial, 3) there were no time trends, 4) effects were easily swayed and were largely dependent on individual studies, and 5) publication bias does not play a role in this field of research. Meta-analytic data show once more that major contributions of single genes are unlikely. However, association studies and corresponding meta-analyses have been an important and necessary stepping stone in the development of modern and more complex approaches in the genetics of suicidal behavior.

  18. Alteration of Antiviral Signalling by Single Nucleotide Polymorphisms (SNPs) of Mitochondrial Antiviral Signalling Protein (MAVS)

    PubMed Central

    Xing, Fei; Matsumiya, Tomoh; Hayakari, Ryo; Yoshida, Hidemi; Kawaguchi, Shogo; Takahashi, Ippei; Nakaji, Shigeyuki; Imaizumi, Tadaatsu

    2016-01-01

    Genetic variation is associated with diseases. As a type of genetic variation occurring with certain regularity and frequency, the single nucleotide polymorphism (SNP) is attracting more and more attention because of its great value for research and real-life application. Mitochondrial antiviral signalling protein (MAVS) acts as a common adaptor molecule for retinoic acid-inducible gene-I (RIG-I)-like receptors (RLRs), which can recognize foreign RNA, including viral RNA, leading to the induction of type I interferons (IFNs). Therefore, MAVS is thought to be a crucial molecule in antiviral innate immunity. We speculated that genetic variation of MAVS may result in susceptibility to infectious diseases. To assess the risk of viral infection based on MAVS variation, we tested the effects of twelve non-synonymous MAVS coding-region SNPs from the National Center for Biotechnology Information (NCBI) database that result in amino acid substitutions. We found that five of these SNPs exhibited functional alterations. Additionally, four resulted in an inhibitory immune response, and one had the opposite effect. In total, 1,032 human genomic samples obtained from a mass examination were genotyped at these five SNPs. However, no homozygous or heterozygous variation was detected. We hypothesized that these five SNPs are not present in the Japanese population and that such MAVS variations may result in serious immune diseases. PMID:26954674

  19. Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA Segments

    PubMed Central

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  20. Validation of 58 autosomal individual identification SNPs in three Chinese populations

    PubMed Central

    Wei, Yi-Liang; Qin, Cui-Jiao; Liu, Hai-Bo; Jia, Jing; Hu, Lan; Li, Cai-Xia

    2014-01-01

    Aim To genotype and evaluate a panel of single-nucleotide polymorphisms for individual identification (IISNPs) in three Chinese populations: Chinese Han, Uyghur, and Tibetan. Methods Two previously identified panels of IISNPs, 86 unlinked IISNPs and SNPforID 52-plex markers, were pooled and analyzed. Four SNPs were included in both panels. In total, 132 SNPs were typed on Sequenom MassARRAY® platform in 330 individuals from Han Chinese, Uyghur, and Tibetan populations. Population genetic indices and forensic parameters were determined for all studied markers. Results No significant deviation from Hardy-Weinberg equilibrium was observed for any of the SNPs in 3 populations. Expected heterozygosity (He) ranged from 0.144 to 0.500 in Han Chinese, from 0.197 to 0.500 in Uyghur, and from 0.018 to 0.500 in Tibetan population. Wright's Fst values ranged from 0.0001 to 0.1613. Pairwise linkage disequilibrium (LD) calculations for all 132 SNPs showed no significant LD across the populations (r² < 0.147). A subset of 58 unlinked IISNPs (r² < 0.094) with He > 0.450 and Fst values from 0.0002 to 0.0536 gave match probabilities of 10⁻²⁵ and a cumulative probability of exclusion of 0.999992. Conclusion The 58 unlinked IISNPs with high heterozygosity have low allele frequency variation among 3 Chinese populations, which makes them excellent candidates for the development of multiplex assays for individual identification and paternity testing. PMID:24577821
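
    The order of magnitude of the reported match probability can be checked with a short sketch, assuming biallelic SNPs in Hardy-Weinberg equilibrium and independence between loci. The allele frequencies below are an idealized placeholder (0.5 at all 58 loci), not the study's data, and the function names are illustrative.

```python
import math

def expected_heterozygosity(p: float) -> float:
    """He for a biallelic SNP with allele frequency p: 1 - p^2 - q^2 (equal to 2pq)."""
    q = 1.0 - p
    return 1.0 - p * p - q * q

def genotype_match_probability(p: float) -> float:
    """Probability that two unrelated individuals share a genotype at one locus under HWE.

    Sum of squared genotype frequencies: (p^2)^2 + (2pq)^2 + (q^2)^2.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2

def cumulative_match_probability(freqs) -> float:
    """Product of per-locus match probabilities, assuming independent (unlinked) loci."""
    return math.prod(genotype_match_probability(p) for p in freqs)

# Idealized placeholder: 58 loci, each with allele frequency 0.5 (He = 0.5).
freqs = [0.5] * 58
he = expected_heterozygosity(0.5)                # 0.5
per_locus = genotype_match_probability(0.5)      # 0.375
cmp_58 = cumulative_match_probability(freqs)     # 0.375 ** 58, on the order of 1e-25
```

    With maximally informative loci the per-locus match probability is 0.375, and 58 such loci give a cumulative value of roughly 2 × 10⁻²⁵, the same order of magnitude as the figure in the abstract.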

  1. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing

    PubMed Central

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, which are abundant in the human genome and have a low mutation rate, are attractive for genetic applications such as forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneously tested using Ampliseq technology with the Ion Torrent PGM platform. A concordance study between NGS and Sanger sequencing was performed with 2 reference samples, 9947A and 9948. Full concordance was obtained except for the genotype of rs576261 in 9947A. The parameter FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should range from 50% to 60%, and those for homozygotes or Y-SNPs should be above 90%. The SNPs rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, either with allelic imbalance or with lower coverage. Sensitivity testing demonstrated that with DNA amounts from 10 ng to 0.5 ng, all correct genotypes were obtained. For mixture testing, a clear linear correlation (R² = 0.9429) between the expected FMAR and observed FMAR values of mixtures was observed. PMID:26691610
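
    The FMAR ranges quoted above can be turned into a simple genotype check, sketched here. The abstract does not define FMAR, so this sketch assumes it is the percentage of reads at a locus that carry the most frequent allele; the function names, the "ambiguous" label, and the read counts are hypothetical.

```python
def fmar_percent(allele_reads: dict) -> float:
    """Percentage of reads carrying the most frequent allele (assumed definition of FMAR)."""
    total = sum(allele_reads.values())
    if total == 0:
        raise ValueError("no reads at this locus")
    return 100.0 * max(allele_reads.values()) / total

def classify_by_fmar(allele_reads: dict) -> str:
    """Apply the ranges from the abstract: 50-60% heterozygote, above 90% homozygote."""
    fmar = fmar_percent(allele_reads)
    if fmar > 90.0:
        return "homozygote"
    if 50.0 <= fmar <= 60.0:
        return "heterozygote"
    # Values between the two ranges may indicate allelic imbalance or a mixture.
    return "ambiguous"

# Made-up read counts, for illustration only.
balanced = classify_by_fmar({"A": 520, "G": 480})    # FMAR = 52%
single = classify_by_fmar({"C": 990, "T": 10})       # FMAR = 99%
imbalanced = classify_by_fmar({"A": 750, "G": 250})  # FMAR = 75%
```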

  2. Hansa: an automated method for discriminating disease and neutral human nsSNPs.

    PubMed

    Acharya, Vishal; Nagarajaram, Hampapathalu A

    2012-02-01

    Variations are mostly due to nonsynonymous single nucleotide polymorphisms (nsSNPs), some of which are associated with certain diseases. Phenotypic effects of a large number of nsSNPs have not been characterized. Although several methods have been developed to predict the effects of nsSNPs as "disease" or "neutral," there is still a need for development of methods with improved prediction accuracies. We, therefore, developed a support vector machine (SVM) based method named Hansa which uses a novel set of discriminatory features to classify nsSNPs into disease (pathogenic) and benign (neutral) types. Validation studies on a benchmark dataset and further on an independent dataset of well-characterized known disease and neutral mutations show that Hansa outperforms the other known methods. For example, fivefold cross-validation studies using the benchmark HumVar dataset reveal that at the false positive rate (FPR) of 20% Hansa yields a true positive rate (TPR) of 82% that is about 10% higher than the best-known method. Hansa is available in the form of a web server at http://hansa.cdfd.org.in:8080.

  3. SNPs at 3'-UTR of the bovine CDIPT gene associated with Qinchuan cattle meat quality traits.

    PubMed

    Fu, C Z; Wang, H; Mei, C G; Wang, J L; Jiang, B J; Ma, X H; Wang, H B; Cheng, G; Zan, L S

    2013-03-13

    CDIPT is crucial to the fatty acid metabolic pathway, intracellular signal transduction and energy metabolism in eukaryotic cells. We detected three SNPs at 3'-untranslated regions (UTR), named 3'-UTR_108 A > G, 3'-UTR_448 G > A and 3'-UTR_477 C > G, of the CDIPT gene in 618 Qinchuan cattle using PCR-RFLP and DNA sequencing methods. At each of the three SNPs, we found three genotypes named as follows: AA, AB, BB (3'-UTR_108 A > G), CC, CD, DD (3'-UTR_448 G > A) and EE, EF, FF (3'-UTR_477 C > G). Based on association analysis of these SNPs with ultrasound measurement traits, individuals of genotype BB had a significantly larger loin muscle area than genotype AA. Individuals of genotype CC had significantly thicker back fat than individuals of genotype DD. Individuals of genotype EE also had significantly thicker back fat than did individuals of genotype FF. We conclude that these SNPs of the CDIPT gene could be used as molecular markers for selecting and breeding beef cattle with superior body traits, depending on breeding goals.

  4. Cross-amplification and validation of SNPs conserved over 44 million years between seals and dogs.

    PubMed

    Hoffman, Joseph I; Thorne, Michael A S; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  5. Large-scale enrichment and discovery of gene-associated SNPs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...

  6. SNPs for parentage testing and traceability in globally diverse breeds of sheep

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA-based parentage determination accelerates genetic improvement by increasing pedigree accuracy. However, the utility of any “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities...

  7. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island

    PubMed Central

    Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A.; Shouche, Yogesh S.; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1–40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1–20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25–40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity. PMID:26066038

  8. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island.

    PubMed

    Kumbhare, Shreyas V; Dhotre, Dhiraj P; Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A; Shouche, Yogesh S; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1-40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1-20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25-40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity.

  9. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
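
    The Xie and Beni index named in this abstract can be illustrated with a minimal sketch. This is the standard single-dataset definition (fuzzy within-cluster compactness divided by n times the smallest squared distance between cluster centers), not the MI-based variant of the MIV framework, which the abstract does not spell out. The function name `xie_beni`, the toy points, and the crisp memberships are assumptions made for illustration only.

```python
from typing import Sequence


def xie_beni(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    memberships: Sequence[Sequence[float]],
    m: float = 2.0,
) -> float:
    """Standard Xie-Beni validity index; lower values suggest a better partition.

    memberships[i][k] is the degree to which point k belongs to cluster i.
    m is the fuzzifier (2.0 is the usual choice).
    """

    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(points)
    c = len(centers)

    # Numerator: membership-weighted squared distances to the cluster centers.
    compactness = sum(
        (memberships[i][k] ** m) * sqdist(points[k], centers[i])
        for i in range(c)
        for k in range(n)
    )

    # Denominator term: smallest squared distance between any two centers.
    separation = min(
        sqdist(centers[i], centers[j])
        for i in range(c)
        for j in range(i + 1, c)
    )

    return compactness / (n * separation)


# Hypothetical toy data: two well-separated groups with crisp memberships.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
toy_centers = [(0.0, 0.5), (5.0, 0.5)]
toy_memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

print(xie_beni(toy_points, toy_centers, toy_memberships))
```

    To pick the number of clusters, one would compute the index for each candidate number and prefer the smallest value; how the MIV algorithms combine such indices across imputed datasets is not described here and is left out of the sketch.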

  10. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  11. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    PubMed Central

    Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses. PMID:22013517

  12. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  13. Prioritization of candidate SNPs in colon cancer using bioinformatics tools: an alternative approach for a cancer biologist.

    PubMed

    George Priya Doss, C; Rajasekaran, R; Arjun, P; Sethumadhavan, Rao

    2010-12-01

    The genetics of human phenotype variation, and especially the genetic basis of human complex diseases, could be understood by knowing the functions of single nucleotide polymorphisms (SNPs). The main goal of this work is to predict the deleterious non-synonymous SNPs (nsSNPs), so that the number of SNPs screened for association with disease can be reduced to those most likely to alter gene function. In this work, using computational tools, we have analyzed the SNPs that can alter the expression and function of cancerous genes involved in colon cancer. To explore possible relationships between genetic mutation and phenotypic variation, different computational tools such as Sorting Intolerant from Tolerant (evolutionary-based approach), Polymorphism Phenotyping (structure-based approach), PupaSuite, UTRScan and FASTSNP were used for prioritization of high-risk SNPs in coding regions (exonic nonsynonymous SNPs) and non-coding regions (intronic and exonic 5' and 3'-untranslated region (UTR) SNPs). We developed a semi-quantitative relative ranking strategy (non availability of 3D structure) that can be adapted to a priori SNP selection or post hoc evaluation of variants identified in whole genome scans or within haplotype blocks associated with disease. Lastly, we analyzed haplotype tagging SNPs (htSNPs) in the coding and untranslated regions of all the genes by selecting the force tag SNPs selection using iHAP analysis. The computational architecture proposed in this review is based on integrating relevant biomedical information sources to provide a systematic analysis of complex diseases. We have shown a "real world" application of interesting existing bioinformatics tools for SNP analysis in colon cancer. PMID:21153778

  14. lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse

    PubMed Central

    Gong, Jing; Liu, Wei; Zhang, Jiayou; Miao, Xiaoping; Guo, An-Yuan

    2015-01-01

    Long non-coding RNAs (lncRNAs) play key roles in various cellular contexts and diseases by diverse mechanisms. With the rapid growth of identified lncRNAs and disease-associated single nucleotide polymorphisms (SNPs), there is a great demand to study SNPs in lncRNAs. Aiming to provide a useful resource about lncRNA SNPs, we systematically identified SNPs in lncRNAs and analyzed their potential impacts on lncRNA structure and function. In total, we identified 495 729 and 777 095 SNPs in more than 30 000 lncRNA transcripts in human and mouse, respectively. A large number of SNPs were predicted with the potential to impact on the miRNA–lncRNA interaction. The experimental evidence and conservation of miRNA–lncRNA interaction, as well as miRNA expressions from TCGA were also integrated to prioritize the miRNA–lncRNA interactions and SNPs on the binding sites. Furthermore, by mapping SNPs to GWAS results, we found that 142 human lncRNA SNPs are GWAS tagSNPs and 197 827 lncRNA SNPs are in the GWAS linkage disequilibrium regions. All these data for human and mouse lncRNAs were imported into lncRNASNP database (http://bioinfo.life.hust.edu.cn/lncRNASNP/), which includes two sub-databases lncRNASNP-human and lncRNASNP-mouse. The lncRNASNP database has a user-friendly interface for searching and browsing through the SNP, lncRNA and miRNA sections. PMID:25332392

  15. Identification of putative SNPs in progressive retinal atrophy affected Canis lupus familiaris using exome sequencing.

    PubMed

    Reddy, Bhaskar; Kelawala, Divyesh N; Shah, Tejas; Patel, Anand B; Patil, Deepak B; Parikh, Pinesh V; Patel, Namrata; Parmar, Nidhi; Mohapatra, Amit B; Singh, Krishna M; Menon, Ramesh; Pandya, Dipal; Jakhesara, Subhash J; Koringa, Prakash G; Rao, Mandava V; Joshi, Chaitanya G

    2015-12-01

    Progressive retinal atrophy (PRA) is one of the major causes of retinal photoreceptor cell degeneration in canines. The inheritance pattern of PRA is autosomal recessive and genetically heterogeneous. Here, using targeted sequencing technology, we have performed exome sequencing of 10 PRA-affected (Spitz=7, Cocker Spaniel=1, Lhasa Apso=1 and Spitz-Labrador cross breed=1) and 6 normal (Spitz=5, Cocker Spaniel=1) dogs. The high-throughput sequencing using a 454-Roche Titanium sequencer generated about 2.16 gigabases of raw data. Initially, we successfully identified 25,619 single nucleotide polymorphisms (SNPs) that passed the stringent SNP calling parameters. Further, we performed an association study on the cohort, and the highly significant (0.001) associations were short-listed and investigated in depth. Out of the 171 significant SNPs, 113 were previously unreported. Interestingly, six among them were non-synonymous coding (NSC) SNPs, which include CPPED1 A>G (p.M307V), PITRM1 T>G (p.S715A), APP G>A (p.T266M), RNF213 A>G (p.V1482A), C>A (p.V1456L), and SLC46A3 G>A (p.R168Q). On the other hand, 35 out of 113 unreported SNPs fell in regulatory regions such as 3'-UTR, 5'-UTR, etc. In-depth bioinformatics analysis revealed that the majority of NSC SNPs have a damaging effect and alter protein stability. This study highlighted the genetic markers associated with PRA, which will help to develop genetic assay-based screening in effective breeding. PMID:26515695

  16. MiR-SNPs as Markers of Toxicity and Clinical Outcome in Hodgkin Lymphoma Patients

    PubMed Central

    Navarro, Alfons; Muñoz, Carmen; Gaya, Anna; Díaz-Beyá, Marina; Gel, Bernat; Tejero, Rut; Díaz, Tania; Martinez, Antonio; Monzó, Mariano

    2013-01-01

    Background: In recent years, microRNA (miRNA) pathways have emerged as a crucial system for the regulation of tumorigenesis. miR-SNPs are a novel class of single nucleotide polymorphisms that can affect miRNA pathways. Design and Methods: We analyzed eight miR-SNPs by allelic discrimination in 141 patients with Hodgkin lymphoma and correlated the results with treatment-related toxicity, response, disease-free survival (DFS) and overall survival (OS). Results: The KRT81 (rs3660) GG genotype was associated with an increased risk of neurological toxicity (P = 0.016), while patients with XPO5 (rs11077) AA or CC genotypes had a higher rate of bleomycin-associated pulmonary toxicity (P = 0.048). Both miR-SNPs emerged as independent factors in the multivariate analysis. The XPO5 AA and CC genotypes were also associated with a lower response rate (P = 0.036). XPO5 (P = 0.039) and TRBP (rs784567) (P = 0.022) genotypes emerged as prognostic markers for DFS, and XPO5 was also associated with OS (P = 0.033). In the multivariate analysis, only XPO5 emerged as an independent prognostic factor for DFS (HR: 2.622; 95%CI 1.039–6.620; P = 0.041). Given the influence of XPO5 and TRBP as individual markers, we then investigated the combined effect of these miR-SNPs. Patients with both the XPO5 AA/CC and TRBP TT/TC genotypes had the shortest DFS (P = 0.008) and OS (P = 0.008). Conclusion: miR-SNPs can add useful prognostic information on treatment-related toxicity and clinical outcome in Hodgkin lymphoma and can be used to identify patients likely to be chemoresistant or to relapse. PMID:23705004

  17. ARRANGEMENT FOR REPLACING FILTERS

    DOEpatents

    Blomgren, R.A.; Bohlin, N.J.C.

    1957-08-27

    An improved filtered air exhaust system which may be continually operated during the replacement of the filters without the escape of unfiltered air is described. This is accomplished by hermetically sealing the box-like filter containers in a rectangular tunnel with neoprene-covered sponge rubber sealing rings coated with a silicone-impregnated pneumatic grease. The tunnel through which the filters are pushed is normal to the exhaust air duct. A number of unused filters are in line behind the filters in use, and are moved by a hydraulic ram so that a fresh filter is positioned in the air duct. The used filter is pushed into a waiting receptacle and is suitably disposed of. This device permits a rapid and safe replacement of a radiation-contaminated filter without interruption to the normal flow of exhaust air.

  18. Method of securing filter elements

    DOEpatents

    Brown, Erik P.; Haslam, Jeffery L.; Mitchell, Mark A.

    2016-10-04

    A filter securing system including a filter unit body housing; at least one tubular filter element positioned in the filter unit body housing, the tubular filter element having a closed top and an open bottom; a dimple in either the filter unit body housing or the top of the tubular filter element; and a socket in either the filter unit body housing or the top of the tubular filter element that receives the dimple in either the filter unit body housing or the top of the tubular filter element to secure the tubular filter element to the filter unit body housing.

  19. eQuIPS: eQTL Analysis Using Informed Partitioning of SNPs - A Fully Bayesian Approach.

    PubMed

    Boggis, E M; Milo, M; Walters, K

    2016-05-01

    We develop a Bayesian multi-SNP Markov chain Monte Carlo approach that allows published functional significance scores to objectively inform single nucleotide polymorphism (SNP) prior effect sizes in expression quantitative trait locus (eQTL) studies. We developed the Normal Gamma prior to allow the inclusion of functional information. We partition SNPs into predefined functional groups and select prior distributions that fit the group-specific observed functional significance scores. We test our method on two simulated datasets and previously analysed human eQTL data containing validated causal SNPs. In our simulations the modified Normal Gamma always performs at least as well, and generally outperforms, the other methods considered. When analysing the human eQTL data, we placed all SNPs into their actual functional group. The ranks of the four validated causal SNPs analysed using the modified Normal Gamma increase dramatically compared to those of the other methods considered. Using our new method, three of the four validated SNPs are ranked in the top 1% of SNPs and the other is in the top 2%. For the standard Normal Gamma, the best of the other methods, the four validated SNPs had ranks in the top 1%, 4%, 20% and 59%. Crucially these substantive improvements in the ranks make it highly likely that most, if not all, of these validated SNPs would have been flagged for follow-up using our new method, whereas at least two of them would certainly not have been using the current approaches. PMID:26989050

  20. Rigid porous filter

    DOEpatents

    Chiang, Ta-Kuan; Straub, Douglas L.; Dennis, Richard A.

    2000-01-01

    The present invention involves a porous rigid filter including a plurality of concentric filtration elements having internal flow passages and forming external flow passages there between. The present invention also involves a pressure vessel containing the filter for the removal of particulates from high pressure particulate containing gases, and further involves a method for using the filter to remove such particulates. The present filter has the advantage of requiring fewer filter elements due to the high surface area-to-volume ratio provided by the filter, requires a reduced pressure vessel size, and exhibits enhanced mechanical design properties, improved cleaning properties, configuration options, modularity and ease of fabrication.

  1. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, Harry S.; Thompson, Robert C.; Hubbard, Charles W.; Perkins, Richard W.

    1997-01-01

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant.

  2. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, H.S.; Thompson, R.C.; Hubbard, C.W.; Perkins, R.W.

    1997-03-25

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant. 5 figs.

  3. Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene.

    PubMed

    Hassan, Mohamed M; Omer, Shaza E; Khalf-Allah, Rahma M; Mustafa, Razaz Y; Ali, Isra S; Mohamed, Sofia B

    2016-01-01

    This study examined Homo sapiens simple polymorphisms (SNPs/indels) in the coding and non-coding regions of the BRAF gene. Variant data were obtained from the SNP database, up to its last update of November 2015. Several bioinformatics tools were used to identify SNPs and indels with functional effects on protein function, structure and expression. For coding polymorphisms, the results showed 111 SNPs predicted to be highly damaging and six others predicted to be less damaging. For UTRs, five SNPs and one indel altered microRNA binding sites (3' UTR), whereas no SNP or indel functionally altered transcription factor binding sites (5' UTR). For 5'/3' splice sites, the analysis showed that one SNP within a 5' splice site and one indel in a 3' splice site could potentially alter splicing. In conclusion, these functional SNPs and indels could lead to gene alteration, which may contribute directly or indirectly to the occurrence of many diseases. PMID:27478437

  4. HEPA Filter Vulnerability Assessment

    SciTech Connect

    GUSTAVSON, R.D.

    2000-05-11

    This assessment of High Efficiency Particulate Air (HEPA) filter vulnerability was requested by the USDOE Office of River Protection (ORP) to satisfy a DOE-HQ directive to evaluate the effect of filter degradation on the facility authorization basis assumptions. Within the scope of this assessment are ventilation system HEPA filters that are classified as Safety-Class (SC) or Safety-Significant (SS) components that perform an accident mitigation function. The objective of the assessment is to verify whether HEPA filters that perform a safety function during an accident are likely to perform as intended to limit release of hazardous or radioactive materials, considering factors that could degrade the filters. Filter degradation factors considered include aging, wetting of filters, exposure to high temperature, exposure to corrosive or reactive chemicals, and exposure to radiation. Screening and evaluation criteria were developed by a site-wide group of HVAC engineers and HEPA filter experts from published empirical data. For River Protection Project (RPP) filters, the only degradation factor that exceeded the screening threshold was for filter aging. Subsequent evaluation of the effect of filter aging on the filter strength was conducted, and the results were compared with required performance to meet the conditions assumed in the RPP Authorization Basis (AB). It was found that the reduction in filter strength due to aging does not affect the filter performance requirements as specified in the AB. A portion of the HEPA filter vulnerability assessment is being conducted by the ORP and is not part of the scope of this study. The ORP is conducting an assessment of the existing policies and programs relating to maintenance, testing, and change-out of HEPA filters used for SC/SS service. This document presents the results of a HEPA filter vulnerability assessment conducted for the River protection project as requested by the DOE Office of River Protection.

  5. Cordierite silicon nitride filters

    SciTech Connect

    Sawyer, J.; Buchan, B.; Duiven, R.; Berger, M.; Cleveland, J.; Ferri, J.

    1992-02-01

    The objective of this project was to develop a silicon nitride based crossflow filter. This report summarizes the findings and results of the project. The project was phased, with Phase I consisting of filter material development and crossflow filter design. Phase II involved filter manufacturing, filter testing under simulated conditions and reporting the results. In Phase I, Cordierite Silicon Nitride (CSN) was developed and tested for permeability and strength. Target values for each of these parameters were established early in the program. The values were met by the material development effort in Phase I. The crossflow filter design effort proceeded by developing a macroscopic design based on required surface area and estimated stresses. Then the thermal and pressure stresses were estimated using finite element analysis. In Phase II of this program, the filter manufacturing technique was developed, and the manufactured filters were tested. The technique developed involved press-bonding extruded tiles to form a filter, producing a monolithic filter after sintering. Filters manufactured using this technique were tested at Acurex and at the Westinghouse Science and Technology Center. The filters did not delaminate during testing and operated at high collection efficiency and with good cleanability. Further development in areas of sintering and filter design is recommended.

  6. RTEL1 tagging SNPs and haplotypes were associated with glioma development

    PubMed Central

    2013-01-01

    Abstract: Glioma is the most prevalent solid tumor of the primary central nervous system, and certain single-nucleotide polymorphisms (SNPs) may be related to increased glioma risk and have implications in carcinogenesis. The present case–control study was carried out to elucidate how common variants contribute to glioma susceptibility. Ten candidate tagging SNPs (tSNPs) were selected from seven genes whose polymorphisms have been reported in the literature and in reliable databases to tend to be related to gliomas, and which have a minor allele frequency (MAF) > 5% in the HapMap Asian population. The selected tSNPs were genotyped in 629 glioma patients and 645 controls from a Han Chinese population using the multiplexed SNP MassEXTEND assay. Two tSNPs in the RTEL1 gene were significantly associated with glioma risk (rs6010620, P = 0.0016, OR: 1.32, 95% CI: 1.11-1.56; rs2297440, P = 0.001, OR: 1.33, 95% CI: 1.12-1.58) by χ² test. The genotype “GG” of rs6010620 acted as a protective genotype for glioma (OR, 0.46; 95% CI, 0.31-0.7; P = 0.0002), as did the genotype “CC” of rs2297440 (OR, 0.47; 95% CI, 0.31-0.71; P = 0.0003). Furthermore, haplotype “GCT” in the RTEL1 gene was associated with a decreased risk of glioma (OR, 0.7; 95% CI, 0.57-0.86; Fisher’s P = 0.0005; Pearson’s P = 0.0005), and haplotype “ATT” with an increased risk of glioma (OR, 1.32; 95% CI, 1.12-1.57; Fisher’s P = 0.0013; Pearson’s P = 0.0013). Two single variants in the RTEL1 gene, the genotypes “GG” of rs6010620 and “CC” of rs2297440, together with the two haplotypes GCT and ATT, were identified as associated with glioma development. Screening for these RTEL1 tagging SNPs and haplotypes might be used to evaluate the risk of glioma development.
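
    The statistics reported in this abstract (odds ratio, 95% confidence interval, χ² test) can be sketched from a 2x2 case-control table. The abstract does not give the underlying allele or genotype counts, nor the interval method, so the counts below are hypothetical and the Wald (log-scale) interval is an assumption; the function names are likewise illustrative.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(
    a: int, b: int, c: int, d: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts, NOT taken from the study.
or_value, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
chi2_value = pearson_chi2_2x2(20, 10, 10, 20)
print(or_value, ci_low, ci_high, chi2_value)
```

    An odds ratio below 1 with an interval that excludes 1, as reported for the “GG” and “CC” genotypes, is what the abstract describes as a protective genotype.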

  7. HEPA filter monitoring program

    NASA Astrophysics Data System (ADS)

    Kirchner, K. N.; Johnson, C. M.; Aiken, W. F.; Lucerna, J. J.; Barnett, R. L.; Jensen, R. T.

    1986-07-01

    The testing and replacement of HEPA filters, widely used in the nuclear industry to purify process air, are costly and labor-intensive. Current methods of testing filter performance, such as differential pressure measurement and scanning air monitoring, allow determination of overall filter performance but preclude detection of incipient filter failure such as small holes in the filters. Using current technology, a continual in-situ monitoring system was designed which provides three major improvements over current methods of filter testing and replacement. The improvements include: cost savings by reducing the number of intact filters which are currently being replaced unnecessarily; more accurate and quantitative measurement of filter performance; and reduced personnel exposure to a radioactive environment by automatically performing most testing operations.

  8. Bag filters for TPP

    SciTech Connect

    L.V. Chekalov; Yu.I. Gromov; V.V. Chekalov

    2007-05-15

    Cleaning of TPP flue gases with bag filters capable of pulsed regeneration is examined. A new filtering element is proposed, with a three-dimensional filtering material formed from a needle-broached cloth in which the filtration area is more than twice that of a conventional smooth bag. The design of a new FRMI type of modular filter is also proposed. A standard series of FRMI filters with a filtration area ranging from 800 to 16,000 m² is designed for an output of more than 1 million m³/h of cleaned gas. The new bag filter permits dry collection of sulfur oxides from waste gases at TPP operating on high-sulfur coals. The design of the filter makes it possible to replace filter elements without taking the entire unit out of service.

  9. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  10. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...

  11. Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...

  12. Novel Backup Filter Device for Candle Filters

    SciTech Connect

    Bishop, B.; Goldsmith, R.; Dunham, G.; Henderson, A.

    2002-09-18

    The currently preferred means of particulate removal from process or combustion gas generated by advanced coal-based power production processes is filtration with candle filters. However, candle filters have not shown the requisite reliability to be commercially viable for hot gas clean up for either integrated gasifier combined cycle (IGCC) or pressurized fluid bed combustion (PFBC) processes. Even a single candle failure can lead to unacceptable ash breakthrough, which can result in (a) damage to highly sensitive and expensive downstream equipment, (b) unacceptably low system on-stream factor, and (c) unplanned outages. The U.S. Department of Energy (DOE) has recognized the need to have fail-safe devices installed within or downstream from candle filters. In addition to CeraMem, DOE has contracted with Siemens-Westinghouse, the Energy & Environmental Research Center (EERC) at the University of North Dakota, and the Southern Research Institute (SRI) to develop novel fail-safe devices. Siemens-Westinghouse is evaluating honeycomb-based filter devices on the clean-side of the candle filter that can operate up to 870 C. The EERC is developing a highly porous ceramic disk with a sticky yet temperature-stable coating that will trap dust in the event of filter failure. SRI is developing the Full-Flow Mechanical Safeguard Device that provides a positive seal for the candle filter. Operation of the SRI device is triggered by the higher-than-normal gas flow from a broken candle. The CeraMem approach is similar to that of Siemens-Westinghouse and involves the development of honeycomb-based filters that operate on the clean-side of a candle filter. The overall objective of this project is to fabricate and test silicon carbide-based honeycomb failsafe filters for protection of downstream equipment in advanced coal conversion processes. The fail-safe filter, installed directly downstream of a candle filter, should have the capability for stopping essentially all particulate

  13. Accuracy of genomic predictions for feed efficiency traits of beef cattle using 50K and imputed HD genotypes.

    PubMed

    Lu, D; Akanno, E C; Crowley, J J; Schenkel, F; Li, H; De Pauw, M; Moore, S S; Wang, Z; Li, C; Stothard, P; Plastow, G; Miller, S P; Basarab, J A

    2016-04-01

    The accuracy of genomic predictions can be used to assess the utility of dense marker genotypes for genetic improvement of beef efficiency traits. This study was designed to test the impact of genomic distance between training and validation populations, training population size, statistical methods, and density of genetic markers on prediction accuracy for feed efficiency traits in multibreed and crossbred beef cattle. Data from a total of 6,794 beef cattle, collated from various projects and research herds across Canada, were used. Illumina BovineSNP50 (50K) and imputed Axiom Genome-Wide BOS 1 Array (HD) genotypes were available for all animals. The traits studied were dry matter intake (DMI), average daily gain (ADG), and residual feed intake (RFI). Four validation groups of 150 animals each, including Angus (AN), Charolais (CH), Angus-Hereford crosses (ANHH), and a Charolais-based composite (TX) were created by considering the genomic distance between pairs of individuals in the validation groups. Each validation group had 7 corresponding training groups of increasing sizes (1,000, 1,999, 2,999, 3,999, 4,999, 5,998, and 6,644), which also represent increasing average genomic distance between pairs of individuals in the training and validation groups. Prediction of genomic estimated breeding values (GEBV) was performed using genomic best linear unbiased prediction (GBLUP) and Bayesian method C (BayesC). The accuracy of genomic predictions was defined as the Pearson's correlation between adjusted phenotype and GEBV, unless otherwise stated. Using 50K genotypes, the highest average accuracy achieved in purebreds (AN, CH) was 0.41 for DMI, 0.34 for ADG, and 0.35 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.21 for ADG, and 0.25 for RFI. Similarly, when imputed HD genotypes were applied in purebreds (AN, CH), the highest average accuracy was 0.14 for DMI, 0.15 for ADG, and 0.14 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.22 for ADG, and 0.24 for RFI. The accuracies of GBLUP predictions were
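
    The accuracy measure named in this record, the Pearson correlation between adjusted phenotype and GEBV, can be illustrated with a minimal sketch. The function name and all numeric values below are hypothetical and are not taken from the study; the abstract does not describe how the authors computed the statistic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of the centred values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical validation animals: adjusted phenotypes and their GEBV.
adjusted_phenotype = [1.2, 0.8, 1.5, 0.9, 1.1]
gebv = [0.30, 0.10, 0.35, 0.20, 0.15]

accuracy = pearson_r(adjusted_phenotype, gebv)
print(f"prediction accuracy (Pearson r): {accuracy:.2f}")
```

    In this framing, one such correlation would be computed per validation group and trait, then averaged across groups.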

  14. Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans.

    PubMed

    Du, Mengmeng; Auer, Paul L; Jiao, Shuo; Haessler, Jeffrey; Altshuler, David; Boerwinkle, Eric; Carlson, Christopher S; Carty, Cara L; Chen, Yii-Der Ida; Curtis, Keith; Franceschini, Nora; Hsu, Li; Jackson, Rebecca; Lange, Leslie A; Lettre, Guillaume; Monda, Keri L; Nickerson, Deborah A; Reiner, Alex P; Rich, Stephen S; Rosse, Stephanie A; Rotter, Jerome I; Willer, Cristen J; Wilson, James G; North, Kari; Kooperberg, Charles; Heard-Costa, Nancy; Peters, Ulrike

    2014-12-15

    Adult body height is a quantitative trait for which genome-wide association studies (GWAS) have identified numerous loci, primarily in European populations. These loci, comprising common variants, explain <10% of the phenotypic variance in height. We searched for novel associations between height and common (minor allele frequency, MAF ≥5%) or infrequent (0.5% < MAF < 5%) variants across the exome in African Americans. Using a reference panel of 1692 African Americans and 471 Europeans from the National Heart, Lung, and Blood Institute's (NHLBI) Exome Sequencing Project (ESP), we imputed whole-exome sequence data into 13 719 African Americans with existing array-based GWAS data (discovery). Variants achieving a height-association threshold of P < 5E-06 in the imputed dataset were followed up in an independent sample of 1989 African Americans with whole-exome sequence data (replication). We used P < 2.5E-07 (=0.05/196 779 variants) to define statistically significant associations in meta-analyses combining the discovery and replication sets (N = 15 708). We discovered and replicated three independent loci for association: 5p13.3/C5orf22/rs17410035 (MAF = 0.10, β = 0.64 cm, P = 8.3E-08), 13q14.2/SPRYD7/rs114089985 (MAF = 0.03, β = 1.46 cm, P = 4.8E-10) and 17q23.3/GH2/rs2006123 (MAF = 0.30; β = 0.47 cm; P = 4.7E-09). Conditional analyses suggested 5p13.3 (C5orf22/rs17410035) and 13q14.2 (SPRYD7/rs114089985) may harbor novel height alleles independent of previous GWAS-identified variants (r² with GWAS loci <0.01); whereas 17q23.3/GH2/rs2006123 was correlated with GWAS-identified variants in European and African populations. Notably, 13q14.2/rs114089985 is infrequent in African Americans (MAF = 3%), extremely rare in European Americans (MAF = 0.03%), and monomorphic in Asian populations, suggesting it may be an African-American-specific height allele. Our findings demonstrate that whole-exome imputation of sequence variants can identify low-frequency variants

  15. MST Filterability Tests

    SciTech Connect

    Poirier, M. R.; Burket, P. R.; Duignan, M. R.

    2015-03-12

    The Savannah River Site (SRS) is currently treating radioactive liquid waste with the Actinide Removal Process (ARP) and the Modular Caustic Side Solvent Extraction Unit (MCU). The low filter flux through the ARP has limited the rate at which radioactive liquid waste can be treated. Recent filter flux has averaged approximately 5 gallons per minute (gpm). Salt Batch 6 has had a lower processing rate and required frequent filter cleaning. Savannah River Remediation (SRR) has a desire to understand the causes of the low filter flux and to increase ARP/MCU throughput. In addition, at the time the testing started, SRR was assessing the impact of replacing the 0.1 micron filter with a 0.5 micron filter. This report describes testing of MST filterability to investigate the impact of filter pore size and MST particle size on filter flux and testing of filter enhancers to attempt to increase filter flux. The authors constructed a laboratory-scale crossflow filter apparatus with two crossflow filters operating in parallel. One filter was a 0.1 micron Mott sintered SS filter and the other was a 0.5 micron Mott sintered SS filter. The authors also constructed a dead-end filtration apparatus to conduct screening tests with potential filter aids and body feeds, referred to as filter enhancers. The original baseline for ARP was 5.6 M sodium salt solution with a free hydroxide concentration of approximately 1.7 M. ARP has been operating with a sodium concentration of approximately 6.4 M and a free hydroxide concentration of approximately 2.5 M. SRNL conducted tests varying the concentration of sodium and free hydroxide to determine whether those changes had a significant effect on filter flux. The feed slurries for the MST filterability tests were composed of simple salts (NaOH, NaNO2, and NaNO3) and MST (0.2 – 4.8 g/L). The feed slurry for the filter enhancer tests contained simulated salt batch 6 supernate, MST, and filter enhancers.

  16. Survey of digital filtering

    NASA Technical Reports Server (NTRS)

    Nagle, H. T., Jr.

    1972-01-01

    A three part survey is made of the state-of-the-art in digital filtering. Part one presents background material including sampled data transformations and the discrete Fourier transform. Part two, digital filter theory, gives an in-depth coverage of filter categories, transfer function synthesis, quantization and other nonlinear errors, filter structures and computer aided design. Part three presents hardware mechanization techniques. Implementations by general purpose, mini-, and special-purpose computers are presented.

  17. Coding SNPs as intrinsic markers for sample tracking in large-scale transcriptome studies

    PubMed Central

    Xu, Weihong; Gao, Hong; Seok, Junhee; Wilhelmy, Julie; Mindrinos, Michael N.; Davis, Ronald W.; Xiao, Wenzhong

    2014-01-01

    Large-scale transcriptome profiling in clinical studies often involves assaying multiple samples of a patient to monitor disease progression, treatment effect, and host response in multiple tissues. Such profiling is prone to human error, which often results in mislabeled samples. Here, we present a method to detect mislabeled sample outliers using coding single nucleotide polymorphisms (cSNPs) specifically designed on the microarray and demonstrate that the mislabeled samples can be efficiently identified by either simple clustering of allele-specific expression scores or Mahalanobis distance-based outlier detection method. Based on our results, we recommend the incorporation of cSNPs into future transcriptome array designs as intrinsic markers for sample tracking. PMID:22668418
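
    As an illustration of the Mahalanobis distance-based outlier detection mentioned in this record, the sketch below scores hypothetical two-dimensional points against their own centroid and sample covariance, then reports the most distant one. The data, the restriction to two dimensions, and the function name are assumptions made for illustration; the abstract does not specify how the scores were built or thresholded.

```python
def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample centroid."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with n - 1 denominator.
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = v^T S^-1 v, using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical per-sample scores; the last sample is a planted "mislabeled" one.
scores = [
    (1.00, 1.10), (0.90, 1.00), (1.10, 0.90),
    (1.00, 0.95), (0.95, 1.05), (1.05, 1.00),
    (3.00, -1.00),
]

d2 = mahalanobis_sq_2d(scores)
suspect = max(range(len(d2)), key=d2.__getitem__)
print("most outlying sample index:", suspect)
```

    In practice a cutoff on the squared distance (for example, a chi-square quantile) would typically replace the simple arg-max used here.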

  18. Coding SNPs as intrinsic markers for sample tracking in large-scale transcriptome studies.

    PubMed

    Xu, Weihong; Gao, Hong; Seok, Junhee; Wilhelmy, Julie; Mindrinos, Michael N; Davis, Ronald W; Xiao, Wenzhong

    2012-06-01

    Large-scale transcriptome profiling in clinical studies often involves assaying multiple samples of a patient to monitor disease progression, treatment effect, and host response in multiple tissues. Such profiling is prone to human error, which often results in mislabeled samples. Here, we present a method to detect mislabeled sample outliers using coding single nucleotide polymorphisms (cSNPs) specifically designed on the microarray and demonstrate that the mislabeled samples can be efficiently identified by either simple clustering of allele-specific expression scores or Mahalanobis distance-based outlier detection method. Based on our results, we recommend the incorporation of cSNPs into future transcriptome array designs as intrinsic markers for sample tracking.

  19. Collective effects of SNPs on transgenerational inheritance in Caenorhabditis elegans and budding yeast.

    PubMed

    Zhu, Zuobin; Man, Xian; Xia, Mengying; Huang, Yimin; Yuan, Dejian; Huang, Shi

    2015-07-01

    We studied the collective effects of single nucleotide polymorphisms (SNPs) on transgenerational inheritance in Caenorhabditis elegans recombinant inbred advanced intercross lines (RIAILs) and yeast segregants. We divided the RIAILs and segregants into two groups of high and low minor allele content (MAC). RIAILs with higher MAC needed fewer generations of benzaldehyde training to gain a stable olfactory imprint and showed a greater change from normal after benzaldehyde training. Yeast segregants with higher MAC showed a more dramatic shortening of the lag phase length after ethanol exposure. The short lag phase acquired by ethanol training was more dramatically lost after recovery in ethanol-free medium for the high MAC group. We also found a preferential association between MAC and traits linked with a higher number of additive QTLs. These results suggest a role for the collective effects of SNPs in transgenerational inheritance, and may help explain human variations in disease susceptibility.

  20. Myf5 and MyoG gene SNPs associated with Bian chicken growth trait.

    PubMed

    Wei, Y; Zhang, G X; Zhang, T; Wang, J Y; Fan, Q C; Tang, Y; Ding, F X; Zhang, L

    2016-01-01

    The growth trait is important in poultry production. We analyzed the association between single nucleotide polymorphisms (SNPs) in the Myf5 and MyoG genes and Bian chicken growth traits. SNPs in candidate genes of the Bian chickens were detected by the polymerase chain reaction-single strand conformation polymorphism method. Two mutation loci and six genotypes were identified in each candidate gene. In terms of growth traits, least square analysis showed that the FF genotype of MyoG was the advantageous genotype and the IJ genotype of Myf5 was the disadvantageous genotype for the growth trait in Bian chicken. Correlation analysis suggested that the different combination genotypes between Myf5 and MyoG genes had a significant effect on growth traits in Bian chickens. The results suggested that MyoG and Myf5 genes can be used in marker-assisted selection for improving the growth trait in Bian chicken. PMID:27525903

  1. Filter service system

    DOEpatents

    Sellers, Cheryl L.; Nordyke, Daniel S.; Crandell, Richard A.; Tomlins, Gregory; Fei, Dong; Panov, Alexander; Lane, William H.; Habeger, Craig F.

    2008-12-09

    According to an exemplary embodiment of the present disclosure, a system for removing matter from a filtering device includes a gas pressurization assembly. An element of the assembly is removably attachable to a first orifice of the filtering device. The system also includes a vacuum source fluidly connected to a second orifice of the filtering device.

  2. Practical Active Capacitor Filter

    NASA Technical Reports Server (NTRS)

    Shuler, Robert L., Jr. (Inventor)

    2005-01-01

    A method and apparatus is described that filters an electrical signal. The filtering uses a capacitor multiplier circuit where the capacitor multiplier circuit uses at least one amplifier circuit and at least one capacitor. A filtered electrical signal results from a direct connection from an output of the at least one amplifier circuit.

  3. HEPA filter encapsulation

    DOEpatents

    Gates-Anderson, Dianne D.; Kidd, Scott D.; Bowers, John S.; Attebery, Ronald W.

    2003-01-01

    A low viscosity resin is delivered into a spent HEPA filter or other waste. The resin is introduced into the filter or other waste using a vacuum to assist in the mass transfer of the resin through the filter media or other waste.

  4. Nonlinear Attitude Filtering Methods

    NASA Technical Reports Server (NTRS)

    Markley, F. Landis; Crassidis, John L.; Cheng, Yang

    2005-01-01

    This paper provides a survey of modern nonlinear filtering methods for attitude estimation. Early applications relied mostly on the extended Kalman filter for attitude estimation. Since these applications, several new approaches have been developed that have proven to be superior to the extended Kalman filter. Several of these approaches maintain the basic structure of the extended Kalman filter, but employ various modifications in order to provide better convergence or improve other performance characteristics. Examples of such approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters that propagate and update a discrete set of sigma points rather than using linearized equations for the mean and covariance are also reviewed. A two-step approach is discussed with a first-step state that linearizes the measurement model and an iterative second step to recover the desired attitude states. These approaches are all based on the Gaussian assumption that the probability density function is adequately specified by its mean and covariance. Other approaches that do not require this assumption are reviewed, including particle filters and a Bayesian filter based on a non-Gaussian, finite-parameter probability density function on SO(3). Finally, the predictive filter, nonlinear observers and adaptive approaches are shown. The strengths and weaknesses of the various approaches are discussed.

  5. AB048. X-chromosomal SNPs variation in populations of Russia

    PubMed Central

    Stepanov, Vadim; Vagaitseva, Kseniya; Kharkov, Vladimir

    2015-01-01

    X-chromosome markers are an informative tool for studying genetic diversity in human populations and have become useful in DNA identification when certain complex kinship cases need to be unravelled. In this work we present population genetic data on X-chromosome-wide SNPs in North Eurasian populations and report an XSNP multiplex system for forensic genetics. A total of 2,867 X-chromosomal SNPs were genotyped in 12 populations using the Illumina microarray platform. The twelve populations under study (Komi, Mordva, Russians, Kirghiz, Kazakh, Uzbek, Buryat, Yakut, Evenk, Tuva, Khanty, Ket) represent various language families and geographic regions of North Eurasia (Eastern Europe, Central Asia, Siberia and North Asia). North Eurasian populations are highly genetically differentiated with respect to XSNP allele frequencies. The average level of genetic differentiation (Gst) for the 12 populations is 6.03%, ranging from 1.05% to 30.05% per individual SNP. Principal component analysis of allele frequencies demonstrated a geographic pattern of population clustering, as well as a longitudinal gradient in genetic diversity. The 66 XSNPs characterized by high expected heterozygosity and linkage equilibrium in the populations under study were selected for constructing a panel for forensic genetic applications. Average heterozygosity of the selected SNPs varied from 0.4925 to 0.4958. Overall values of power of discrimination for males and females (PDm and PDf) obtained with this XSNP set are several orders of magnitude higher than those for standard forensic STR panels. A protocol for multiplex amplification of the 66 XSNPs in two separate multiplex PCR reactions and MALDI-TOF mass spectrometry genotyping was developed. North Eurasian populations demonstrate a high level of genetic diversity and differentiation for X-chromosome-wide SNPs. Based on the population genetic data obtained, a highly informative multiplex XSNP panel for forensic genetics was developed.
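
    To put the reported heterozygosity range in context, the sketch below evaluates the standard expected heterozygosity of a biallelic marker, He = 1 - p^2 - q^2 with q = 1 - p. Whether the authors used exactly this estimator is an assumption (the abstract does not say), and the allele frequencies below are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return 1.0 - p * p - q * q


# Hypothetical allele frequencies; He peaks at 0.5 when both alleles are equally common.
for freq in (0.50, 0.45, 0.30):
    print(f"p = {freq:.2f}  He = {expected_heterozygosity(freq):.4f}")
```

    Under this formula, values such as 0.4925 to 0.4958 sit just below the biallelic maximum of 0.5, which corresponds to allele frequencies close to one half.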

  6. Genome-Wide Association Studies Using Haplotypes and Individual SNPs in Simmental Cattle

    PubMed Central

    Wu, Yang; Fan, Huizhong; Wang, Yanhui; Zhang, Lupei; Gao, Xue; Chen, Yan; Li, Junya; Ren, HongYan; Gao, Huijiang

    2014-01-01

    Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components. PMID:25330174

  7. Endothelial nitric oxide synthase tagSNPs influence the effects of enalapril in essential hypertension.

    PubMed

    Oliveira-Paula, Gustavo H; Lacchini, Riccardo; Luizon, Marcelo R; Fontana, Vanessa; Silva, Pamela S; Biagi, Celso; Tanus-Santos, Jose E

    2016-05-01

    The antihypertensive effects of angiotensin-converting enzyme inhibitors (ACEi) are associated with up-regulation of endothelial nitric oxide synthase (NOS3) activity. This mechanism may explain how polymorphisms in the NOS3 gene affect the antihypertensive responses to ACEi. While clinically relevant NOS3 polymorphisms were previously shown to affect the antihypertensive responses to enalapril, no study has tested the hypothesis that NOS3 tagSNPs influence the antihypertensive effects of this drug. We examined whether the NOS3 tagSNPs rs3918226, rs3918188, and rs743506, and their haplotypes, affect the antihypertensive responses to enalapril in 101 patients with essential hypertension. Subjects were prospectively treated only with enalapril for 8 weeks. Genotypes were determined by TaqMan® allele discrimination assay and real-time polymerase chain reaction (PCR) and haplotype frequencies were estimated. We compared the effects of NOS3 tagSNPs on changes in blood pressure after enalapril treatment. To confirm our findings, multiple linear regression analysis was performed adjusting for age, gender, ethnicity, and alcohol consumption. We found that hypertensive patients carrying the AA genotype for the tagSNP rs3918188 showed smaller decreases in blood pressure in response to enalapril. Moreover, the TCA haplotype was associated with improved decreases in blood pressure in response to enalapril compared with the CAG haplotype. Adjustment for covariates in multiple linear regression analysis did not change these effects. In addition, when patients were stratified according to the dose of enalapril used, we found that carriers of the T allele for the functional tagSNP rs3918226 showed more intense decreases in blood pressure in response to enalapril 20 mg/day. Our findings suggest that NOS3 tagSNPs influence the effects of enalapril in essential hypertension. PMID:27060232

  8. Endothelial nitric oxide synthase tagSNPs influence the effects of enalapril in essential hypertension.

    PubMed

    Oliveira-Paula, Gustavo H; Lacchini, Riccardo; Luizon, Marcelo R; Fontana, Vanessa; Silva, Pamela S; Biagi, Celso; Tanus-Santos, Jose E

    2016-05-01

    The antihypertensive effects of angiotensin-converting enzyme inhibitors (ACEi) are associated with up-regulation of endothelial nitric oxide synthase (NOS3) activity. This mechanism may explain how polymorphisms in the NOS3 gene affect the antihypertensive responses to ACEi. While clinically relevant NOS3 polymorphisms were previously shown to affect the antihypertensive responses to enalapril, no study has tested the hypothesis that NOS3 tagSNPs influence the antihypertensive effects of this drug. We examined whether the NOS3 tagSNPs rs3918226, rs3918188, and rs743506, and their haplotypes, affect the antihypertensive responses to enalapril in 101 patients with essential hypertension. Subjects were prospectively treated only with enalapril for 8 weeks. Genotypes were determined by TaqMan® allele discrimination assay and real-time polymerase chain reaction (PCR) and haplotype frequencies were estimated. We compared the effects of NOS3 tagSNPs on changes in blood pressure after enalapril treatment. To confirm our findings, multiple linear regression analysis was performed adjusting for age, gender, ethnicity, and alcohol consumption. We found that hypertensive patients carrying the AA genotype for the tagSNP rs3918188 showed smaller decreases in blood pressure in response to enalapril. Moreover, the TCA haplotype was associated with improved decreases in blood pressure in response to enalapril compared with the CAG haplotype. Adjustment for covariates in multiple linear regression analysis did not change these effects. In addition, when patients were stratified according to the dose of enalapril used, we found that carriers of the T allele for the functional tagSNP rs3918226 showed more intense decreases in blood pressure in response to enalapril 20 mg/day. Our findings suggest that NOS3 tagSNPs influence the effects of enalapril in essential hypertension.

  9. Genome-wide association studies using haplotypes and individual SNPs in Simmental cattle.

    PubMed

    Wu, Yang; Fan, Huizhong; Wang, Yanhui; Zhang, Lupei; Gao, Xue; Chen, Yan; Li, Junya; Ren, HongYan; Gao, Huijiang

    2014-01-01

    Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components. PMID:25330174

  10. A systematic confirmation study of reported prostate cancer risk-associated SNPs in Chinese men

    PubMed Central

    Liu, Fang; Hsing, Ann W.; Wang, Xiang; Shao, Qiang; Qi, Jun; Ye, Yu; Wang, Zhong; Chen, Hongyan; Gao, Xin; Wang, Guozeng; Chu, Lisa W.; Ding, Qiang; OuYang, Jun; Gao, Xu; Huang, Yichen; Chen, Yanbo; Gao, Yu Tang; Zhang, Zuo-Feng; Rao, Jianyu; Shi, Rong; Wu, Qijun; Wang, Meilin; Zhang, Zhengdong; Zhang, Yuanyuan; Jiang, Haowen; Zheng, Jie; Hu, Yanlin; Guo, Ling; Lin, Xiaoling; Tao, Sha; Jin, Guangfu; Sun, Jielin; Lu, Daru; Zheng, S. Lilly; Sun, Yinghao; Mo, Zengnan; Xu, Jianfeng

    2013-01-01

    More than 30 prostate cancer (PCa) risk-associated loci have been identified in populations of European descent by genome-wide association studies (GWAS). We hypothesized that a subset of these loci may be associated with PCa risk in Chinese men. To test this hypothesis, 33 single nucleotide polymorphisms (SNPs), one each from the 33 independent PCa risk-associated loci reported in populations of European descent, were investigated for their associations with PCa risk in a case-control study of Chinese men (1,108 cases and 1,525 controls). We found that 11 of the 33 SNPs were significantly associated with PCa risk in Chinese men (P < 0.05). The reported risk alleles were associated with increased risk for PCa, with allelic odds ratios ranging from 1.12 to 1.44. The most significant locus was located on 8q24 Region 2 (rs16901979, P = 5.14×10⁻⁹) with a genome-wide significance (P < 10⁻⁸), and three loci reached the Bonferroni correction significance level (P < 1.52×10⁻³), including 8q24 Region 1 (rs1447295, P = 7.04×10⁻⁶), 8q24 Region 5 (rs10086908, P = 9.24×10⁻⁴), and 8p21 (rs1512268, P = 9.39×10⁻⁴). Our results suggest that a subset of the PCa risk-associated SNPs discovered by GWAS among men of European descent is also associated with PCa risk in Chinese men. This finding provides evidence of ethnic differences and similarity in genetic susceptibility to PCa. GWAS in Chinese men are needed to identify Chinese-specific PCa risk-associated SNPs. PMID:21756274
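
    The Bonferroni level quoted in this record is consistent with dividing a 0.05 significance level by the 33 SNPs tested. The short sketch below reproduces that arithmetic and checks the four P values reported in the abstract against it; the 0.05 family-wise level is an assumption inferred from the numbers, and the variable names are illustrative.

```python
ALPHA = 0.05   # assumed family-wise significance level
N_TESTS = 33   # one SNP per locus, as stated in the abstract

threshold = ALPHA / N_TESTS

# P values as reported in the abstract.
reported_p = {
    "rs16901979": 5.14e-9,
    "rs1447295": 7.04e-6,
    "rs10086908": 9.24e-4,
    "rs1512268": 9.39e-4,
}

passing = [snp for snp, p in reported_p.items() if p < threshold]

print(f"Bonferroni threshold: {threshold:.2e}")
print("SNPs below threshold:", ", ".join(passing))
```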

  11. Prediction of CYP3A4 enzyme activity using haplotype tag SNPs in African Americans.

    PubMed

    Perera, M A; Thirumaran, R K; Cox, N J; Hanauer, S; Das, S; Brimer-Cline, C; Lamba, V; Schuetz, E G; Ratain, M J; Di Rienzo, A

    2009-02-01

    The CYP3A locus encodes hepatic enzymes that metabolize many clinically used drugs. However, there is marked interindividual variability in enzyme expression and clearance of drugs metabolized by these enzymes. We utilized comparative genomics and computational prediction of transcriptional factor binding sites to evaluate regions within CYP3A that were most likely to contribute to this variation. We then used a haplotype tagging single-nucleotide polymorphisms (htSNPs) approach to evaluate the entire locus with the fewest number of maximally informative SNPs. We investigated the association between these htSNPs and in vivo CYP3A enzyme activity using a single-point IV midazolam clearance assay. We found associations between the midazolam phenotype and age, diagnosis of hypertension and one htSNP (141689) located upstream of CYP3A4. 141689 lies near the xenobiotic responsive enhancer module (XREM) regulatory region of CYP3A4. Cell-based studies show increased transcriptional activation with the minor allele at 141689, in agreement with the in vivo association study findings. This study marks the first systematic evaluation of coding and noncoding variation that may contribute to CYP3A phenotypic variability.

  12. The SNPs of Melanocortin 4 Receptor (MC4R) Associated with Body Weight in Beagle Dogs

    PubMed Central

    Zeng, Ruixia; Zhang, YiBo; Du, Peng

    2014-01-01

    Melanocortin 4 receptor (MC4R), which is associated with inherited human obesity, is involved in food intake and body weight of mammals. To study the relationships between MC4R gene polymorphism and body weight in Beagle dogs, we detected and compared the nucleotide sequence of the whole coding region and 3′- and 5′-flanking regions of the dog MC4R gene (1214 bp). In 120 Beagle dogs, two SNPs (A420C, C895T) were identified and their relation with body weight was analyzed with the RFLP-PCR method. The results showed that the SNP A420C, which changes amino acid 101 of the MC4R protein from asparagine to threonine, was significantly associated with the canine body weight trait, while the C895T nonsense mutation was significantly associated with body weight variation in female dogs. This suggests that the two SNPs might affect the function of the MC4R gene as it relates to body weight in Beagle dogs. Therefore, MC4R is a candidate gene for selecting dogs of different sizes, with the MC4R SNPs (A420C, C895T) being potentially valuable as genetic markers. PMID:24521865

  13. Ewing Sarcoma: influence of TP53 Arg72Pro and MDM2 T309G SNPs.

    PubMed

    Thurow, Helena S; Hartwig, Fernando P; Alho, Clarice S; Silva, Deborah S B S; Roesler, Rafael; Abujamra, Ana Lucia; de Farias, Caroline Brunetto; Brunetto, Algemir Lunardi; Horta, Bernardo L; Dellagostin, Odir A; Collares, Tiago; Seixas, Fabiana K

    2013-08-01

    Ewing Sarcoma is an important tumor of bone and soft tissue. The SNPs Arg72Pro of TP53 and T309G of MDM2 have been associated with many cancer types and are differently distributed among populations worldwide. Based on a case-control design, this study aimed to assess the role of these SNPs in 24 Ewing Sarcoma patients, compared to 91 control individuals. DNA samples were extracted from blood and genotyped for both SNPs by PCR-RFLP and confirmed by DNA sequencing. The results showed an association between the G allele of the T309G and Ewing Sarcoma (P=0.02). Compared with the TT carriers, the risk of G allele carriers was 3.35 (95% CI=1.22-9.21) with P=0.02. At the genotypic level, an association of the TT genotype with the control group (P=0.03) was found. Compared with the TT genotype, the risk of TG and GG was 2.97 (95% CI=1.03-8.58) with P=0.04 and 5.00 (95% CI=1.23-20.34) with P=0.02, respectively. No associations regarding the Arg72Pro SNP were found. Considering that the T309G has been associated with several types of cancer, including sarcomas, our results indicate that this SNP may also be important to Ewing Sarcoma predisposition.
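
    Risk estimates with 95% confidence intervals of the kind quoted in this record are commonly obtained from a 2x2 case-control table. The sketch below uses the log-odds (Woolf) approximation; this method is an assumption, since the abstract does not state how the intervals were computed, and the genotype counts are hypothetical rather than the study's data.

```python
import math


def odds_ratio_ci(case_carriers, control_carriers,
                  case_noncarriers, control_noncarriers, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf log method) for a 2x2 table."""
    a, b, c, d = case_carriers, control_carriers, case_noncarriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers vs non-carriers of a risk allele.
or_value, ci_low, ci_high = odds_ratio_ci(
    case_carriers=10, control_carriers=20,
    case_noncarriers=5, control_noncarriers=40,
)
print(f"OR = {or_value:.2f} (95% CI = {ci_low:.2f}-{ci_high:.2f})")
```

    With small samples such as 24 cases, intervals from this approximation are wide, which matches the broad ranges reported above.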

  14. Novel SNPs in the Ankyrin 1 gene and their association with beef quality traits.

    PubMed

    Horodyska, J; Sweeney, T; Ryan, M; Hamill, R M

    2015-10-01

    Single nucleotide polymorphisms (SNPs) in the promoter region of bovine Ankyrin 1 (ANK1) have been associated with tenderness and intramuscular fat level in beef. The objectives of this study were to characterise novel DNA variants in the coding region of bovine ANK1 and test for association with beef quality traits. A 3kb region of ANK1 cDNA was amplified and sequenced in 32 Charolais cattle using five sets of overlapping primers. Eighteen SNPs were identified and a predicted exon was confirmed. An in silico translation indicated that SNP4 and SNP16 were non-conservative. Three SNPs were genotyped in 158 crossbred cattle (n=158) with associated meat quality data. SNP6 was associated with texture scores while SNP17 was associated with juiciness. Haplotype (cHAP) 1 was associated with lightness, redness, ultimate pH, as well as sarcomere length. Alleles of the ANK1 gene could be potential targets for gene-assisted selection to improve a range of meat quality traits in beef. PMID:26051041

  15. Y-chromosomal SNPs in Finno-Ugric-speaking populations analyzed by minisequencing on microarrays.

    PubMed

    Raitio, M; Lindroos, K; Laukkanen, M; Pastinen, T; Sistonen, P; Sajantila, A; Syvänen, A C

    2001-03-01

    An increasing number of single nucleotide polymorphisms (SNPs) on the Y chromosome are being identified. To utilize the full potential of the SNP markers in population genetic studies, new genotyping methods with high throughput are required. We describe a microarray system based on the minisequencing single nucleotide primer extension principle for multiplex genotyping of Y-chromosomal SNP markers. The system was applied for screening a panel of 25 Y-chromosomal SNPs in a unique collection of samples representing five Finno-Ugric populations. The specific minisequencing reaction provides 5-fold to infinite discrimination between the Y-chromosomal genotypes, and the microarray format of the system allows parallel and simultaneous analysis of large numbers of SNPs and samples. In addition to the SNP markers, five Y-chromosomal microsatellite loci were typed. Altogether 10,000 genotypes were generated to assess the genetic diversity in these population samples. Six of the 25 SNP markers (M9, Tat, SRY10831, M17, M12, 92R7) were polymorphic in the analyzed populations, yielding six distinct SNP haplotypes. The microsatellite data were used to study the genetic structure of two major SNP haplotypes in the Finns and the Saami in more detail. We found that the most common haplotypes are shared between the Finns and the Saami, and that the SNP haplotypes show regional differences within the Finns and the Saami, which supports the hypothesis of two separate settlement waves to Finland.

  16. Enrichment of risk SNPs in regulatory regions implicate diverse tissues in Parkinson’s disease etiology

    PubMed Central

    Coetzee, Simon G.; Pierce, Steven; Brundin, Patrik; Brundin, Lena; Hazelett, Dennis J.; Coetzee, Gerhard A.

    2016-01-01

    Recent genome-wide association studies (GWAS) of Parkinson’s disease (PD) revealed at least 26 risk loci, with associated single nucleotide polymorphisms (SNPs) located in non-coding DNA having unknown functions in risk. In order to explore in which cell types these SNPs (and their correlated surrogates at r² ≥ 0.8) could alter cellular function, we assessed their location overlap with histone modification regions that indicate transcription regulation in 77 diverse cell types. We found statistically significant enrichment of risk SNPs at 12 loci in active enhancers or promoters. We investigated 4 risk loci in depth that were most significantly enriched (−log_e P > 14) and contained 8 putative enhancers in the different cell types. These enriched loci, along with eQTL associations, were unexpectedly present in non-neuronal cell types. These included lymphocytes, mesendoderm, liver- and fat-cells, indicating that cell types outside the brain are involved in the genetic predisposition to PD. Annotating regulatory risk regions within specific cell types may unravel new putative risk mechanisms and molecular pathways that contribute to PD development. PMID:27461410
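
    The "correlated surrogates" in this record are SNPs in strong linkage disequilibrium with an index SNP, selected by a squared-correlation threshold of 0.8. The sketch below shows the textbook r² calculation from phased haplotypes for two biallelic SNPs. It assumes alleles coded 0/1 and uses made-up haplotypes; the abstract does not describe how the authors obtained their LD estimates.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples,
    with alleles coded 0 or 1 on each phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n                  # freq of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n                  # freq of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n     # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b                                     # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def is_surrogate(haplotypes, threshold=0.8):
    """True when the SNP pair meets the LD threshold used to define surrogates."""
    return ld_r2(haplotypes) >= threshold


# Hypothetical haplotypes: the two SNPs always carry the same allele
perfect_ld = [(0, 0)] * 6 + [(1, 1)] * 4
```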

  17. Enrichment of risk SNPs in regulatory regions implicate diverse tissues in Parkinson's disease etiology.

    PubMed

    Coetzee, Simon G; Pierce, Steven; Brundin, Patrik; Brundin, Lena; Hazelett, Dennis J; Coetzee, Gerhard A

    2016-01-01

    Recent genome-wide association studies (GWAS) of Parkinson's disease (PD) revealed at least 26 risk loci, with associated single nucleotide polymorphisms (SNPs) located in non-coding DNA having unknown functions in risk. In order to explore in which cell types these SNPs (and their correlated surrogates at r² ≥ 0.8) could alter cellular function, we assessed their location overlap with histone modification regions that indicate transcription regulation in 77 diverse cell types. We found statistically significant enrichment of risk SNPs at 12 loci in active enhancers or promoters. We investigated 4 risk loci in depth that were most significantly enriched (−log_e P > 14) and contained 8 putative enhancers in the different cell types. These enriched loci, along with eQTL associations, were unexpectedly present in non-neuronal cell types. These included lymphocytes, mesendoderm, liver- and fat-cells, indicating that cell types outside the brain are involved in the genetic predisposition to PD. Annotating regulatory risk regions within specific cell types may unravel new putative risk mechanisms and molecular pathways that contribute to PD development. PMID:27461410

  18. A multiple imputation approach to the analysis of interval-censored failure time data with the additive hazards model

    PubMed Central

    Chen, Ling; Sun, Jianguo

    2013-01-01

    This paper discusses regression analysis of interval-censored failure time data, which occur in many fields including demographical, epidemiological, financial, medical, and sociological studies. For the problem, we focus on the situation where the survival time of interest can be described by the additive hazards model and a multiple imputation approach is presented for inference. A major advantage of the approach is its simplicity and it can be easily implemented by using the existing software packages for right-censored failure time data. Extensive simulation studies are conducted which indicate that the approach performs well for practical situations and is comparable to the existing methods. The methodology is applied to a set of interval-censored failure time data arising from an AIDS clinical trial. PMID:25419022

  19. An illustration of using multiple imputation versus listwise deletion analyses: the effect of Hanen's "More than words" on parenting stress.

    PubMed

    Lieberman-Betz, Rebecca G; Yoder, Paul; Stone, Wendy L; Nahmias, Allison S; Carter, Alice S; Celimli-Aksoy, Seniz; Messinger, Daniel S

    2014-09-01

    This investigation illustrates the effects of using different missing data analysis techniques to analyze effects of a parent-implemented treatment on stress in parents of toddlers with autism symptomatology. The analysis approaches yielded similar results when analyzing main effects of the intervention, but different findings for moderation effects. Using listwise deletion, the data supported an iatrogenic effect of Hanen's "More Than Words" on stress in parents with high levels of pretreatment depressive symptoms. Using multiple imputation, a significant moderated treatment effect with uninterpretable regions of significance did not support an iatrogenic effect of treatment on parenting stress. Results highlight the need for caution in interpreting analyses that do not involve validated methods of handling missing data.
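
    Multiple imputation analyses such as the one in this record fit the model once per imputed dataset and then combine the results. A minimal sketch of the standard pooling step (Rubin's rules) follows. The abstract does not detail the authors' pooling procedure, and the per-dataset estimates and variances here are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed datasets
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation term is what listwise deletion and single imputation leave out, which is one reason the two approaches can disagree on significance.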

  20. Regenerative particulate filter development

    NASA Technical Reports Server (NTRS)

    Descamp, V. A.; Boex, M. W.; Hussey, M. W.; Larson, T. P.

    1972-01-01

    Development, design, and fabrication of a prototype filter regeneration unit for regenerating clean fluid particle filter elements by using a backflush/jet impingement technique are reported. Development tests were also conducted on a vortex particle separator designed for use in zero gravity environment. A maintainable filter was designed, fabricated and tested that allows filter element replacement without any leakage or spillage of system fluid. Also described are spacecraft fluid system design and filter maintenance techniques with respect to inflight maintenance for the space shuttle and space station.

  1. Outcome-adaptive randomization for a delayed outcome with a short-term predictor: imputation-based designs.

    PubMed

    Kim, Mi-Ok; Liu, Chunyan; Hu, Feifang; Lee, J Jack

    2014-10-15

    Delay in the outcome variable is challenging for outcome-adaptive randomization, as it creates a lag between the number of subjects accrued and the information known at the time of the analysis. Motivated by a real-life pediatric ulcerative colitis trial, we consider a case where a short-term predictor is available for the delayed outcome. When a short-term predictor is not considered, studies have shown that the asymptotic properties of many outcome-adaptive randomization designs are little affected unless the lag is unreasonably large relative to the accrual process. These theoretical results assumed independent identical delays, however, whereas delays in the presence of a short-term predictor may only be conditionally homogeneous. We consider delayed outcomes as missing and propose mitigating the delay effect by imputing them. We apply this approach to the doubly adaptive biased coin design (DBCD) for the motivating pediatric ulcerative colitis trial. We provide theoretical results showing that if the delays, although non-homogeneous, are reasonably short relative to the accrual process, as in the iid delay case, the lag is also asymptotically ignorable in the sense that a standard DBCD that utilizes only observed outcomes attains target allocation ratios in the limit. Empirical studies, however, indicate that imputation-based DBCDs performed more reliably in finite samples with smaller root mean square errors. The empirical studies assumed a common clinical setting where a delayed outcome is positively correlated with a short-term predictor similarly between treatment arm groups. We varied the strength of the correlation and considered fast and slow accrual settings. PMID:24889540

  2. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer.

    PubMed

    Al-Tassan, Nada A; Whiffin, Nicola; Hosking, Fay J; Palles, Claire; Farrington, Susan M; Dobbins, Sara E; Harris, Rebecca; Gorman, Maggie; Tenesa, Albert; Meyer, Brian F; Wakil, Salma M; Kinnersley, Ben; Campbell, Harry; Martin, Lynn; Smith, Christopher G; Idziaszczyk, Shelley; Barclay, Ella; Maughan, Timothy S; Kaplan, Richard; Kerr, Rachel; Kerr, David; Buchanan, Daniel D; Buchannan, Daniel D; Win, Aung Ko; Hopper, John; Jenkins, Mark; Lindor, Noralane M; Newcomb, Polly A; Gallinger, Steve; Conti, David; Schumacher, Fred; Casey, Graham; Dunlop, Malcolm G; Tomlinson, Ian P; Cheadle, Jeremy P; Houlston, Richard S

    2015-01-01

    Genome-wide association studies (GWAS) of colorectal cancer (CRC) have identified 23 susceptibility loci thus far. Analyses of previously conducted GWAS indicate additional risk loci are yet to be discovered. To identify novel CRC susceptibility loci, we conducted a new GWAS and performed a meta-analysis with five published GWAS (totalling 7,577 cases and 9,979 controls of European ancestry), imputing genotypes utilising the 1000 Genomes Project. The combined analysis identified new, significant associations with CRC at 1p36.2 marked by rs72647484 (minor allele frequency [MAF] = 0.09) near CDC42 and WNT4 (P = 1.21 × 10⁻⁸, odds ratio [OR] = 1.21) and at 16q24.1 marked by rs16941835 (MAF = 0.21, P = 5.06 × 10⁻⁸; OR = 1.15) within the long non-coding RNA (lncRNA) RP11-58A18.1 and ~500 kb from the nearest coding gene FOXL1. Additionally we identified a promising association at 10p13 with rs10904849 intronic to CUBN (MAF = 0.32, P = 7.01 × 10⁻⁸; OR = 1.14). These findings provide further insights into the genetic and biological basis of inherited genetic susceptibility to CRC. Additionally, our analysis further demonstrates that imputation can be used to exploit GWAS data to identify novel disease-causing variants. PMID:25990418
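
    Combining per-study association results for a SNP, as in the meta-analysis described in this record, is often done with an inverse-variance weighted fixed-effect model on the log odds ratio scale. The sketch below illustrates that general technique only. The abstract does not state which meta-analysis model was used, and the input values are hypothetical.

```python
import math


def fixed_effect_meta(log_odds_ratios, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    log_odds_ratios: per-study log(OR) for one SNP
    std_errors: per-study standard errors of log(OR)
    Returns (pooled log(OR), pooled standard error).
    """
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical per-study results for a single SNP
pooled_log_or, pooled_se = fixed_effect_meta([0.1, 0.3], [0.2, 0.2])
pooled_or = math.exp(pooled_log_or)
z_score = pooled_log_or / pooled_se
```

    Studies with smaller standard errors receive more weight, so large cohorts dominate the pooled estimate.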

  3. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    PubMed

    Horikoshi, Momoko; Mӓgi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hӓgg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimӓki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  4. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

    PubMed Central

    van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hӓgg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimӓki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

    2015-01-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  5. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer

    PubMed Central

    Al-Tassan, Nada A.; Whiffin, Nicola; Hosking, Fay J.; Palles, Claire; Farrington, Susan M.; Dobbins, Sara E.; Harris, Rebecca; Gorman, Maggie; Tenesa, Albert; Meyer, Brian F.; Wakil, Salma M.; Kinnersley, Ben; Campbell, Harry; Martin, Lynn; Smith, Christopher G.; Idziaszczyk, Shelley; Barclay, Ella; Maughan, Timothy S.; Kaplan, Richard; Kerr, Rachel; Kerr, David; Buchannan, Daniel D.; Ko Win, Aung; Hopper, John; Jenkins, Mark; Lindor, Noralane M.; Newcomb, Polly A.; Gallinger, Steve; Conti, David; Schumacher, Fred; Casey, Graham; Dunlop, Malcolm G.; Tomlinson, Ian P.; Cheadle, Jeremy P.; Houlston, Richard S.

    2015-01-01

    Genome-wide association studies (GWAS) of colorectal cancer (CRC) have identified 23 susceptibility loci thus far. Analyses of previously conducted GWAS indicate additional risk loci are yet to be discovered. To identify novel CRC susceptibility loci, we conducted a new GWAS and performed a meta-analysis with five published GWAS (totalling 7,577 cases and 9,979 controls of European ancestry), imputing genotypes utilising the 1000 Genomes Project. The combined analysis identified new, significant associations with CRC at 1p36.2 marked by rs72647484 (minor allele frequency [MAF] = 0.09) near CDC42 and WNT4 (P = 1.21 × 10⁻⁸, odds ratio [OR] = 1.21) and at 16q24.1 marked by rs16941835 (MAF = 0.21, P = 5.06 × 10⁻⁸; OR = 1.15) within the long non-coding RNA (lncRNA) RP11-58A18.1 and ~500 kb from the nearest coding gene FOXL1. Additionally we identified a promising association at 10p13 with rs10904849 intronic to CUBN (MAF = 0.32, P = 7.01 × 10⁻⁸; OR = 1.14). These findings provide further insights into the genetic and biological basis of inherited genetic susceptibility to CRC. Additionally, our analysis further demonstrates that imputation can be used to exploit GWAS data to identify novel disease-causing variants. PMID:25990418

  6. A systematic search for SNPs/haplotypes associated with disease phenotypes using a haplotype-based stepwise procedure

    PubMed Central

    Yang, Yin; Li, Shuying Sue; Chien, Jason W; Andriesen, Jessica; Zhao, Lue Ping

    2008-01-01

    Background: Genotyping technologies enable us to genotype multiple Single Nucleotide Polymorphisms (SNPs) within selected genes/regions, providing data for haplotype association analysis. While haplotype-based association analysis is powerful for detecting untyped causal alleles in linkage-disequilibrium (LD) with neighboring SNPs/haplotypes, the inclusion of extraneous SNPs could reduce its power by increasing the number of haplotypes with each additional SNP. Methods: Here, we propose a haplotype-based stepwise procedure (HBSP) to eliminate extraneous SNPs. To evaluate its properties, we applied HBSP to both simulated and real data, generated from a study of genetic associations of the bactericidal/permeability-increasing (BPI) gene with pulmonary function in a cohort of patients following bone marrow transplantation. Results: Under the null hypothesis, use of the HBSP gave results that retained the desired false positive error rates when multiple comparisons were considered. Under various alternative hypotheses, HBSP had adequate power to detect modest genetic associations in case-control studies with 500, 1,000 or 2,000 subjects. In the current application, HBSP led to the identification of two specific SNPs with a positive validation. Conclusion: These results demonstrate that HBSP retains the essence of haplotype-based association analysis while improving analytic power by excluding extraneous SNPs. Minimizing the number of SNPs also enables simpler interpretation and more cost-effective applications. PMID:19102730

  7. Evaluating the transferability of 15 European-derived fasting plasma glucose SNPs in Mexican children and adolescents

    PubMed Central

    Langlois, Christine; Abadi, Arkan; Peralta-Romero, Jesus; Alyass, Akram; Suarez, Fernando; Gomez-Zamudio, Jaime; Burguete-Garcia, Ana I.; Yazdi, Fereshteh T.; Cruz, Miguel; Meyre, David

    2016-01-01

    Genome wide association studies (GWAS) have identified single-nucleotide polymorphisms (SNPs) that are associated with fasting plasma glucose (FPG) in adult European populations. The contribution of these SNPs to FPG in non-Europeans and children is unclear. We studied the association of 15 GWAS SNPs and a genotype score (GS) with FPG and 7 metabolic traits in 1,421 Mexican children and adolescents from Mexico City. Genotyping of the 15 SNPs was performed using TaqMan Open Array. We used multivariate linear regression models adjusted for age, sex, body mass index standard deviation score, and recruitment center. We identified significant associations between 3 SNPs (G6PC2 (rs560887), GCKR (rs1260326), MTNR1B (rs10830963)), the GS and FPG level. The FPG risk alleles of 11 out of the 15 SNPs (73.3%) displayed significant or non-significant beta values for FPG directionally consistent with those reported in adult European GWAS. The risk allele frequencies for 11 of 15 (73.3%) SNPs differed significantly in Mexican children and adolescents compared to European adults from the 1000G Project, but no significant enrichment in FPG risk alleles was observed in the Mexican population. Our data support a partial transferability of European GWAS FPG association signals in children and adolescents from the admixed Mexican population. PMID:27782183
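
    A genotype score of the kind mentioned in this record is commonly built by counting risk alleles across the selected SNPs. The sketch below shows an unweighted score and reproduces the 11-of-15 percentage quoted in the abstract. Whether the authors used an unweighted or effect-size-weighted score is not stated, so the unweighted form is an assumption, and the example genotypes are hypothetical.

```python
def genotype_score(risk_allele_counts):
    """Unweighted genotype score: total risk alleles carried across SNPs.

    risk_allele_counts: one value per SNP, each 0, 1 or 2.
    """
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("each SNP must contribute 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Hypothetical individual genotyped at 15 SNPs
example_counts = [0, 1, 2, 1, 0, 0, 1, 2, 1, 1, 0, 2, 1, 0, 1]
example_score = genotype_score(example_counts)

# Share of SNPs that were directionally consistent, as quoted in the abstract
consistent_share = percent(11, 15)
```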

  8. Rank and order: evaluating the performance of SNPs for individual assignment in a non-model organism.

    PubMed

    Storer, Caroline G; Pascal, Carita E; Roberts, Steven B; Templin, William D; Seeb, Lisa W; Seeb, James E

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: F_ST, informativeness (I_n), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from F_ST, I_n, and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290
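
    Ranking loci by F_ST and keeping the top-ranked ones, as this record describes, can be illustrated with a simple heterozygosity-based estimate. The sketch assumes biallelic SNPs, known allele frequencies, and equally weighted populations. The study's actual estimator is not given in the abstract, and the locus names and frequencies below are hypothetical.

```python
def fst(allele_freqs):
    """F_ST for one biallelic locus as (H_T - H_S) / H_T.

    allele_freqs: frequency of one allele in each population
    (populations are weighted equally, a simplifying assumption).
    """
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                                           # locus is fixed everywhere
    h_sub = sum(2 * p * (1 - p) for p in allele_freqs) / k   # mean within-population value
    return (h_total - h_sub) / h_total


def top_panel(locus_freqs, size):
    """Names of the `size` loci with the highest F_ST."""
    ranked = sorted(locus_freqs, key=lambda name: fst(locus_freqs[name]), reverse=True)
    return ranked[:size]


# Hypothetical allele frequencies in two populations
loci = {"snpA": [0.1, 0.9], "snpB": [0.5, 0.5], "snpC": [0.4, 0.6]}
panel = top_panel(loci, size=2)
```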

  9. Rank and Order: Evaluating the Performance of SNPs for Individual Assignment in a Non-Model Organism

    PubMed Central

    Storer, Caroline G.; Pascal, Carita E.; Roberts, Steven B.; Templin, William D.; Seeb, Lisa W.; Seeb, James E.

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: F_ST, informativeness (I_n), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from F_ST, I_n, and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290

  10. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

    PubMed Central

    2013-01-01

    Background: SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results: The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions: WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
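
    The overall accuracy and correlation coefficient quoted in this record are standard summaries of a binary confusion matrix. The sketch below computes both, assuming that "correlation coefficient" means the Matthews correlation coefficient, which the abstract does not say explicitly. The counts are hypothetical.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Overall accuracy and Matthews correlation coefficient.

    tp, tn, fp, fn: counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical balanced test set of 200 variants
acc, mcc = accuracy_and_mcc(tp=80, tn=82, fp=18, fn=20)
```

    On a balanced dataset like the one described, accuracy and MCC tend to move together; on imbalanced data MCC is usually the more informative of the two.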

  11. Genetic Basis of Common Human Disease: Insight into the Role of Missense SNPs from Genome-Wide Association Studies.

    PubMed

    Pal, Lipika R; Moult, John

    2015-07-01

    Recent genome-wide association studies (GWAS) have led to the reliable identification of single nucleotide polymorphisms (SNPs) at a number of loci associated with increased risk of specific common human diseases. Each such locus implicates multiple possible candidate SNPs for involvement in disease mechanism. A variety of mechanisms may link the presence of an SNP to altered in vivo gene product function and hence contribute to disease risk. Here, we report an analysis of the role of one of these mechanisms, missense SNPs (msSNPs) in proteins in seven complex trait diseases. Linkage disequilibrium information was used to identify possible candidate msSNPs associated with increased disease risk at each of 356 loci for the seven diseases. Two computational methods were used to estimate which of these SNPs has a significant impact on in vivo protein function. 69% of the loci have at least one candidate msSNP and 33% have at least one predicted high-impact msSNP. In some cases, these SNPs are in well-established disease-related proteins, such as MST1 (macrophage stimulating 1) for Crohn's disease. In others, they are in proteins identified by GWAS as likely candidates for disease relevance, but previously without known mechanism, such as ADAMTS13 (ADAM metallopeptidase with thrombospondin type 1 motif, 13) for coronary artery disease. In still other cases, the missense SNPs are in proteins not previously suggested as disease candidates, such as TUBB1 (tubulin, beta 1, class VI) for hypertension. Together, these data support a substantial role for this class of SNPs in susceptibility to common human disease.

  12. Ceramic fiber filter technology

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1996-06-01

    Fibrous filters have been used for centuries to protect individuals from dust, disease, smoke, and other gases or particulates. In the 1970s and 1980s ceramic filters were developed for filtration of hot exhaust gases from diesel engines. Tubular, or candle, filters have been made to remove particles from gases in pressurized fluidized-bed combustion and gasification-combined-cycle power plants. Very efficient filtration is necessary in power plants to protect the turbine blades. The limited lifespan of ceramic candle filters has been a major obstacle in their development. The present work is focused on forming fibrous ceramic filters using a papermaking technique. These filters are highly porous and therefore very lightweight. The papermaking process consists of filtering a slurry of ceramic fibers through a steel screen to form paper. Papermaking and the selection of materials will be discussed, as well as preliminary results describing the geometry of papers and relative strengths.

  13. A set of EST-SNPs for map saturation and cultivar identification in melon

    PubMed Central

    Deleu, Wim; Esteras, Cristina; Roig, Cristina; González-To, Mireia; Fernández-Silva, Iria; Gonzalez-Ibeas, Daniel; Blanca, José; Aranda, Miguel A; Arús, Pere; Nuez, Fernando; Monforte, Antonio J; Picó, Maria Belén; Garcia-Mas, Jordi

    2009-01-01

    Background There are few genomic tools available in melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed mainly using marker types such as simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and amplified fragment length polymorphisms (AFLP) in different mapping populations. There is a growing need for saturating the genetic map with single nucleotide polymorphisms (SNP), more amenable for high throughput analysis, especially if these markers are located in gene coding regions, to provide functional markers. Expressed sequence tags (ESTs) from melon are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs. Results EST-based SNPs were discovered after resequencing ESTs between the parental lines of the PI 161375 (SC) × 'Piel de sapo' (PS) genetic map or using in silico SNP information from EST databases. In total 200 EST-based SNPs were mapped in the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the genetic diversity of the species. SNP analysis correctly reflected the genetic relationships compared with other marker systems, being able to distinguish all the accessions and cultivars. Conclusion This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 × 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon

  14. A comprehensive in silico analysis of non-synonymous and regulatory SNPs of human MBL2 gene.

    PubMed

    Kalia, Namarta; Sharma, Aarti; Kaur, Manpreet; Kamboj, Sukhdev Singh; Singh, Jatinder

    2016-01-01

    Mannose binding lectin (MBL) is a liver-derived protein which plays an important role in innate immunity. Mannose binding lectin gene 2 (MBL2) polymorphisms are reported to be associated with various diseases. In spite of MBL being an exhaustively studied molecule, no attempt has been made to date to comprehensively and systematically analyze the SNPs of the MBL2 gene. The present study was carried out to identify and prioritize the SNPs of the MBL2 gene for further genotyping and functional studies. To predict the possible impact of SNPs on MBL structure and function, SNP data obtained from the dbSNP database were analyzed using various bioinformatics tools. Out of a total of 661 SNPs, only 37 validated SNPs having minor allele frequency ≥0.10 were considered for the present study. These 37 SNPs include one in the 3' near-gene region, nine in the 3' UTR, one non-synonymous SNP (nsSNP), thirteen intronic SNPs and thirteen in the 5' near-gene region. From these 37 SNPs, 11 non-coding SNPs were identified to be of functional significance and evolutionarily conserved. Out of these, 4 SNPs from the 3' UTR were found to play a role in miRNA binding, and 7 SNPs from the 5' near-gene and intronic regions were predicted to be involved in transcription factor binding and expression of the MBL2 gene. One nsSNP, Gly54Asp (rs1800450), was found to be deleterious and damaging by both the SIFT and PolyPhen-2 servers, thus affecting MBL2 protein stability and expression. Protein structural analysis with this amino acid variant was performed using I-TASSER, RAMPAGE, Swiss-PdbViewer, Chimera and I-Mutant. Information regarding solvent accessibility, molecular dynamics and energy minimization calculations showed that this variant causes clashes with neighboring amino acid residues that must interfere with normal triple helix formation of the trimeric subunit and further with the normal assembly of the MBL oligomeric form, hence the decrease in stability. Thus, the findings of the present study indicated 12 SNPs of the MBL2 gene to be functionally important. Exploration of
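    The selection step described in this record (validated SNPs with minor allele frequency ≥0.10) amounts to a simple filter. The sketch below illustrates it under stated assumptions: the record layout (`id`, `validated`, `maf`) and the example entries are hypothetical and do not reproduce dbSNP's actual schema or data.

```python
def filter_snps(snps, min_maf=0.10):
    """Keep validated SNPs whose minor allele frequency is at least min_maf."""
    return [s for s in snps if s["validated"] and s["maf"] >= min_maf]

# Hypothetical SNP records (identifiers and frequencies are made up).
candidates = [
    {"id": "snp_a", "validated": True, "maf": 0.14},
    {"id": "snp_b", "validated": True, "maf": 0.03},   # too rare
    {"id": "snp_c", "validated": False, "maf": 0.25},  # not validated
    {"id": "snp_d", "validated": True, "maf": 0.10},   # exactly at threshold
]

selected = filter_snps(candidates)
print([s["id"] for s in selected])
```

    The threshold is inclusive here because the abstract states "≥0.10"; a stricter study would only need to change `min_maf`.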

  15. Compact planar microwave blocking filters

    NASA Technical Reports Server (NTRS)

    U-Yen, Kongpop (Inventor); Wollack, Edward J. (Inventor)

    2012-01-01

    A compact planar microwave blocking filter includes a dielectric substrate and a plurality of filter unit elements disposed on the substrate. The filter unit elements are interconnected in a symmetrical series cascade with filter unit elements being organized in the series based on physical size. In the filter, a first filter unit element of the plurality of filter unit elements includes a low impedance open-ended line configured to reduce the shunt capacitance of the filter.

  16. Rethinking Stability of Silver Sulfide Nanoparticles (Ag2S-NPs) in the Aquatic Environment: Photoinduced Transformation of Ag2S-NPs in the Presence of Fe(III).

    PubMed

    Li, Lingxiangyu; Wang, Yawei; Liu, Qian; Jiang, Guibin

    2016-01-01

    The stability of engineered nanomaterials in a natural aquatic environment has drawn much attention over the past few years. Silver sulfide nanoparticles (Ag2S-NPs) are generally assumed to be stable in a natural environment as a result of their physicochemical property; however, it may vary depending upon environmental conditions. Here, we investigated whether and how the environmentally relevant factors including light irradiation, solution pH, inorganic salts, dissolved organic matter (DOM), and dissolved oxygen (DO) individually and in combination influenced the stability of Ag2S-NPs in an aquatic environment. We presented for the first time that transformation of Ag2S-NPs can indeed occur in the aqueous system with an environmentally relevant concentration of Fe(3+) under simulated solar irradiation and natural sunlight within a short time (96 h), along with significant changes in morphology and dissolution. The photoinduced transformation of Ag2S-NPs in the presence of Fe(3+) can be dramatically influenced by solution pH, Ca(2+)/Na(+), Cl(-)/SO4(2-), DOM, and DO. Moreover, Ag2S-NP dissolution increased within 28 h, followed rapid decline in the next 68 h, which may be a result of the reconstitution of small Ag2S-NPs. Taken together, this work is of importance to comprehensively evaluate the stability of Ag2S-NPs in an aquatic environment, improving our understanding of their potential risks to human and environmental health.

  17. Generic Kalman Filter Software

    NASA Technical Reports Server (NTRS)

    Lisano, Michael E., II; Crues, Edwin Z.

    2005-01-01

    The Generic Kalman Filter (GKF) software provides a standard basis for the development of application-specific Kalman-filter programs. Historically, Kalman filters have been implemented by customized programs that must be written, coded, and debugged anew for each unique application, then tested and tuned with simulated or actual measurement data. Total development times for typical Kalman-filter application programs have ranged from months to weeks. The GKF software can simplify the development process and reduce the development time by eliminating the need to re-create the fundamental implementation of the Kalman filter for each new application. The GKF software is written in the ANSI C programming language. It contains a generic Kalman-filter-development directory that, in turn, contains a code for a generic Kalman filter function; more specifically, it contains a generically designed and generically coded implementation of linear, linearized, and extended Kalman filtering algorithms, including algorithms for state- and covariance-update and -propagation functions. The mathematical theory that underlies the algorithms is well known and has been reported extensively in the open technical literature. Also contained in the directory are a header file that defines generic Kalman-filter data structures and prototype functions and template versions of application-specific subfunction and calling navigation/estimation routine code and headers. Once the user has provided a calling routine and the required application-specific subfunctions, the application-specific Kalman-filter software can be compiled and executed immediately. During execution, the generic Kalman-filter function is called from a higher-level navigation or estimation routine that preprocesses measurement data and post-processes output data. The generic Kalman-filter function uses the aforementioned data structures and five implementation-specific subfunctions, which have been developed by the user on
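    To make the state and covariance propagation and update steps mentioned above concrete, here is a minimal scalar (one-dimensional) linear Kalman filter. It is only an illustrative sketch: it is not the GKF code (which is written in ANSI C), and the function names and the parameters `f`, `q`, `h`, `r` are assumptions chosen for this example.

```python
def propagate(x, p, f, q):
    """Time update: propagate the state estimate x and its variance p."""
    return f * x, f * p * f + q

def update(x, p, z, h, r):
    """Measurement update using scalar measurement z with noise variance r."""
    k = p * h / (h * p * h + r)   # Kalman gain
    x_new = x + k * (z - h * x)   # correct the state with the innovation
    p_new = (1.0 - k * h) * p     # reduce the variance accordingly
    return x_new, p_new

def run_filter(measurements, x0=0.0, p0=1.0, f=1.0, q=0.0, h=1.0, r=1.0):
    """Calling routine: alternate propagation and update for each measurement."""
    x, p = x0, p0
    for z in measurements:
        x, p = propagate(x, p, f, q)
        x, p = update(x, p, z, h, r)
    return x, p

# Three identical measurements of a constant quantity.
x, p = run_filter([1.0, 1.0, 1.0])
print(x, p)
```

    The split into a small calling routine plus separate propagate and update functions mirrors, loosely, the division of labor the abstract describes between the generic filter function and the user-supplied routines.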

  18. In Silico Model-Driven Assessment of the Effects of Single Nucleotide Polymorphisms (SNPs) on Human Red Blood Cell Metabolism

    PubMed Central

    Jamshidi, Neema; Wiback, Sharon J.; Palsson, Bernhard Ø.

    2002-01-01

    The completion of the human genome project and the construction of single nucleotide polymorphism (SNP) maps have led to significant efforts to find SNPs that can be linked to pathophysiology. In silico models of complete biochemical reaction networks relate a cell's individual reactions to the function of the entire network. Sequence variations can in turn be related to kinetic properties of individual enzymes, thus allowing an in silico model-driven assessment of the effects of defined SNPs on overall cellular functions. This process is applied to defined SNPs in two key enzymes of human red blood cell metabolism: glucose-6-phosphate dehydrogenase and pyruvate kinase. The results demonstrate the utility of in silico models in providing insight into differences between red cell function in patients with chronic and nonchronic anemia. In silico models of complex cellular processes are thus likely to aid in defining and understanding key SNPs in human pathophysiology. PMID:12421755

  19. Contactor/filter improvements

    DOEpatents

    Stelman, D.

    1988-06-30

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream is described. The filter includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. A cover screen isolates the filter element from contact with the moving granular bed. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses. 6 figs.

  20. Concentric Split Flow Filter

    NASA Technical Reports Server (NTRS)

    Stapleton, Thomas J. (Inventor)

    2015-01-01

    A concentric split flow filter may be configured to remove odor and/or bacteria from pumped air used to collect urine and fecal waste products. For instance, the filter may be designed to effectively fill the volume surrounding the transport tube of a waste management system, which was previously considered wasted. The concentric split flow filter may be configured to split the air flow, with substantially half of the air flow to be treated traveling through a first bed of filter media and substantially the other half traveling through a second bed of filter media. This split flow design reduces the air velocity by 50%. In this way, the pressure drop of the filter may be reduced by as much as a factor of 4 compared with the conventional design.
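    The "factor of 4" follows from a simple scaling argument. As a rough sketch, assume (this is not stated in the record) that the pressure drop across a filter bed varies as a power n of the air velocity v, with n between 1 (viscous, Darcy-like flow) and 2 (inertia-dominated flow):

```latex
\Delta p \propto v^{n}, \qquad 1 \le n \le 2
\quad\Longrightarrow\quad
\frac{\Delta p_{\text{split}}}{\Delta p_{\text{conventional}}}
  = \left(\frac{v/2}{v}\right)^{n}
  = \left(\tfrac{1}{2}\right)^{n}
  \in \left[\tfrac{1}{4},\, \tfrac{1}{2}\right]
```

    Under this assumption, halving the velocity cuts the pressure drop by a factor between 2 and 4, which is consistent with the wording "as much as a factor of 4".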

  1. Lack of association between two key SNPs on chromosome 12p13 and ischemic stroke in Chinese Uyghur population.

    PubMed

    Tong, Yeqing; Zhan, Faxian; Han, Jinjun; Zhang, Yanwei; Yin, Xiaoxu; Geng, Yijie; Hou, Shuangyi; Ye, Jianjun; Guan, Xuhua; Han, Shenhong; Wang, Yunxia; Mason, Katherine A; Lu, Zuxun; Liu, Jiafa; Cheng, Jinquan

    2012-12-15

    Recent genome-wide association studies (GWAS) have identified two key SNPs (rs11833579 and rs12425791) on chromosome 12p13 that were significantly associated with stroke in Caucasians. However, the validity of the association has remained controversial. We performed genetic association analyses in a unique population with 60% European ancestry and 40% East Asian ancestry. No significant association between these two SNPs and ischemic stroke was detected in this Chinese Uyghur population.

  2. Powerful Identification of Cis-regulatory SNPs in Human Primary Monocytes Using Allele-Specific Gene Expression

    PubMed Central

    Almlöf, Jonas Carlsson; Lundmark, Per; Lundmark, Anders; Ge, Bing; Maouche, Seraya; Göring, Harald H. H.; Liljedahl, Ulrika; Enström, Camilla; Brocheton, Jessy; Proust, Carole; Godefroy, Tiphaine; Sambrook, Jennifer G.; Jolley, Jennifer; Crisp-Hihn, Abigail; Foad, Nicola; Lloyd-Jones, Heather; Stephens, Jonathan; Gwilliam, Rhian; Rice, Catherine M.; Hengstenberg, Christian; Samani, Nilesh J.; Erdmann, Jeanette; Schunkert, Heribert; Pastinen, Tomi; Deloukas, Panos; Goodall, Alison H.; Ouwehand, Willem H.; Cambien, François; Syvänen, Ann-Christine

    2012-01-01

    A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straight-forward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function while allele-specific gene expression (ASE) analysis has not yet gained a wide-spread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers. PMID:23300628

  3. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs

    PubMed Central

    Wu, XianMing; Hurst, Laurence D.

    2016-01-01

    Where in genes do pathogenic mutations tend to occur and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, known also to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESE) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, five factors established as trends in ESE usage, are considered: relative position in exons, relative position in genes, flanking intron size, splice sites usage, and phase. We find that more than 76% of pathogenic SNPs are within 3–69 bp of exon ends where ESEs generally reside, this being 13% more than expected. Overall from enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20–45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they are found to occur preferentially in the terminal half of genes, in exons flanked by short introns and at the ends of phase (0,0) exons with 3′ non-“AGgt” splice site. We suggest the concept of the “fragile” exon, one home to pathogenic SNPs owing to its vulnerability to splice disruption owing to low ESE density. PMID:26545919

  4. Filter vapor trap

    DOEpatents

    Guon, Jerold

    1976-04-13

    A sintered filter trap is adapted for insertion in a gas stream of sodium vapor to condense and deposit sodium thereon. The filter is heated and operated above the melting temperature of sodium, resulting in a more efficient means to remove sodium particulates from the effluent inert gas emanating from the surface of a liquid sodium pool. Preferably the filter leaves are precoated with a natrophobic coating such as tetracosane.

  5. Hybrid Filter Membrane

    NASA Technical Reports Server (NTRS)

    Laicer, Castro; Rasimick, Brian; Green, Zachary

    2012-01-01

    Cabin environmental control is an important issue for a successful Moon mission. Due to the unique environment of the Moon, lunar dust control is one of the main problems that significantly diminishes the air quality inside spacecraft cabins. Therefore, this innovation was motivated by NASA's need to minimize the negative health impact that air-suspended lunar dust particles have on astronauts in spacecraft cabins. It is based on fabrication of a hybrid filter comprising nanofiber nonwoven layers coated on porous polymer membranes with uniform cylindrical pores. This design results in a high-efficiency gas particulate filter with low pressure drop and the ability to be easily regenerated to restore filtration performance. A hybrid filter was developed consisting of a porous membrane with uniform, micron-sized, cylindrical pore channels coated with a thin nanofiber layer. Compared to conventional filter media such as a high-efficiency particulate air (HEPA) filter, this filter is designed to provide high particle efficiency, low pressure drop, and the ability to be regenerated. These membranes have well-defined micron-sized pores and can be used independently as air filters with discrete particle size cut-off, or coated with nanofiber layers for filtration of ultrafine nanoscale particles. The filter consists of a thin design intended to facilitate filter regeneration by localized air pulsing. The two main features of this invention are the concept of combining a micro-engineered straight-pore membrane with nanofibers. The micro-engineered straight-pore membrane can be prepared with extremely high precision. Because the resulting membrane pores are straight and not tortuous like those found in conventional filters, the pressure drop across the filter is significantly reduced. The nanofiber layer is applied as a very thin coating to enhance filtration efficiency for fine nanoscale particles. Additionally, the thin nanofiber coating is designed to promote capture of

  6. Practical alarm filtering

    SciTech Connect

    Bray, M.; Corsberg, D.

    1994-02-01

    An expert system-based alarm filtering method is described which prioritizes and reduces the number of alarms facing an operator. This patented alarm filtering methodology was originally developed and implemented in a pressurized water reactor, and subsequently in a chemical processing facility. Both applications were in LISP and both were successful. In the chemical processing facility, for instance, alarm filtering reduced the quantity of alarm messages by 90%. 6 figs.

  7. Differences in allele frequencies of autosomal dominant hypercholesterolemia SNPs in the Malaysian population.

    PubMed

    Alex, Livy; Chahil, Jagdish Kaur; Lye, Say Hean; Bagali, Pramod; Ler, Lian Wee

    2012-06-01

    Hypercholesterolemia is caused by different interactions of lifestyle and genetic determinants. At the genetic level, it can be attributed to the interactions of multiple polymorphisms, or as in the example of familial hypercholesterolemia (FH), it can be the result of a single mutation. A large number of genetic markers, mostly single nucleotide polymorphisms (SNPs) or mutations in three genes implicated in autosomal dominant hypercholesterolemia (ADH), viz. APOB (apolipoprotein B), LDLR (low density lipoprotein receptor) and PCSK9 (proprotein convertase subtilisin/kexin type-9), have been identified and characterized. However, such studies have been insufficiently undertaken in Malaysia specifically and in Southeast Asia in general. The main objective of this study was to identify ADH variants, specifically ADH-causing mutations and hypercholesterolemia-associated polymorphisms, in the multiethnic Malaysian population. We aimed to evaluate published SNPs in ADH-causing genes in this population and to report any unusual trends. We examined a large number of selected SNPs from previous studies of APOB, LDLR, PCSK9 and other genes in clinically diagnosed ADH patients (n=141) and healthy control subjects (n=111). Selection of SNPs was initiated by searching within genes reported to be associated with ADH in known databases. The important finding was 137 mono-allelic markers (44.1%) and 173 polymorphic markers (55.8%) in both subject groups. By comparison with publicly available data, 23 of the 137 mono-allelic markers showed significant differences in allele frequency among Malaysians, European Whites, Han Chinese, Yoruba and Gujarati Indians. Our data can serve as a reference for others in related fields of study during the planning of their experiments.

  8. VnD: a structure-centric database of disease-related SNPs and drugs.

    PubMed

    Yang, Jin Ok; Oh, Sangho; Ko, Gunhwan; Park, Seong-Jin; Kim, Woo-Yeon; Lee, Byungwook; Lee, Sanghyuk

    2011-01-01

    Numerous genetic variations have been found to be related to human diseases. A significant portion of those also affect drug response by changing protein structure and function. Therefore, it is crucial to understand the trilateral relationship among genomic variations, diseases and drugs. We present the variations and drugs (VnD) database, a consolidated database containing information on diseases, related genes and genetic variations, protein structures and drugs. VnD was built in three steps. First, we integrated various resources systematically to deduce catalogs of disease-related genes, single nucleotide polymorphisms (SNPs), protein mutations and relevant drugs. VnD contains 137,195 disease-related gene records (13,940 distinct genes) and 16,586 genetic variation records (1790 distinct variations). Next, we carried out structure modeling and docking simulation for wild-type and mutant proteins to examine the structural and functional consequences of non-synonymous SNPs in the drug-related genes. Conformational changes in 590 wild-type and 4437 mutant proteins from drug-related genes were included in our database. Finally, we investigated the structural and biochemical properties relevant to drug binding such as the distribution of SNPs in proximal protein pockets, thermo-chemical stability, interactions with drugs and physico-chemical properties. The VnD database, available at http://vnd.kobic.re.kr:8080/VnD/ or vandd.org, would be a useful platform for researchers studying the underlying mechanism for association among genetic variations, diseases and drugs.

  9. Regulatory SNPs Alter the Gene Expression of Diabetic Retinopathy Associated Secretary Factors

    PubMed Central

    Chen, Chian-Feng; Liou, Shiow-Wen; Wu, Hsin-Han; Lin, Chin-Hui; Huang, Li-Shan; Woung, Lin-Chung; Tsai, Ching-Yao

    2016-01-01

    Objectives: Diabetic retinopathy (DR) is a common microvascular complication in both type I and type II diabetes. Several previous reports indicated that the serum concentrations of some secretory factors were highly associated with DR. Therefore, we hypothesized that the genotypes of regulatory SNPs (rSNPs) in secretory factors may alter the expression of these genes and lead to DR. Methods: First, pyrosequencing was applied to screen for SNPs whose allele frequencies differ between DR and DNR. Then individual genotyping was performed with TaqMan assays in Taiwanese DR and DNR patients. To evaluate the effect of the SNP allele on transcriptional activity, we measured promoter activity using luciferase reporter constructs. Results: We found that the frequencies of the CC, CG, and GG genotypes of the rs2010963 polymorphism were 15.09%, 47.14%, and 37.74% in DR and 12.90%, 19.35%, and 67.74% in DNR, respectively (p = 0.0205). The prevalence of DR was higher (p = 0.00793) in patients with the CC or CG genotype (62.26% and 32.26% for DR and DNR, respectively) compared with patients with the GG genotype. To evaluate the effect of the rs2010963-C allele on transcriptional activity, we measured promoter activity using luciferase reporter constructs. The rs2010963-C reporter showed 1.6- to 2-fold higher luciferase activity than rs2010963-G in 3 cell lines. Conclusion: Our data suggest that rs2010963-C alters the expression level of VEGFA in different tissues. We suggest that a small increase in, but long-term exposure to, VEGFA may eventually lead to DR.

  10. Genetic analysis of candidate SNPs for metabolic syndrome in obstructive sleep apnea (OSA).

    PubMed

    Grilo, Antonio; Ruiz-Granados, Elena S; Moreno-Rey, Concha; Rivera, Jose M; Ruiz, Agustin; Real, Luis M; Sáez, Maria E

    2013-05-25

    Obstructive sleep apnea (OSA) is a common disorder characterized by the reduction or complete cessation in airflow resulting from an obstruction of the upper airway. Several studies have observed an increased risk for cardiovascular morbidity and mortality among OSA patients. Metabolic syndrome (MetS), a cluster of cardiovascular risk factors characterized by the presence of insulin resistance, is often found in patients with OSA, but the complex interplay between these two syndromes is not well understood. In this study, we present the results of a genetic association analysis of 373 candidate SNPs for MetS selected in a previous genome wide association analysis (GWAS). The 384 selected SNPs were genotyped using the Illumina VeraCode Technology in 387 subjects retrospectively assessed at the Internal Medicine Unit of the "Virgen de Valme" University Hospital (Seville, Spain). In order to increase the power of this study and to validate our findings in an independent population, we used data from the Framingham Sleep Study which comprises 368 individuals. Only the rs11211631 polymorphism was associated with OSA in both populations, with an estimated OR=0.57 (0.42-0.79) in the joint analysis (p=7.21×10(-4)). This SNP was selected in the previous GWAS for MetS components using a digenic approach, but was not significant in the monogenic study. We have also identified two SNPs (rs2687855 and rs4299396) with a protective effect from OSA only in the subpopulation with abdominal obesity. As a whole, our study does not support the idea that OSA and MetS share major genetic determinants, although both syndromes share common epidemiological and clinical features. PMID:23524009
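    An odds ratio with a confidence interval, such as the OR=0.57 (0.42-0.79) reported above, can be illustrated with a 2x2 table. The sketch below uses the standard log-odds (Woolf) interval and hypothetical counts; the record does not say how the authors computed their estimate, so this is not a reconstruction of their analysis.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts (not from the study).
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=60, c=50, d=50)
print(f"OR={or_value:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

    An interval that lies entirely below 1, as in the reported result, indicates a protective association at the chosen confidence level; an interval that spans 1 does not.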

  11. Genetic analysis of candidate SNPs for metabolic syndrome in obstructive sleep apnea (OSA)

    PubMed Central

    Grilo, Antonio; Ruiz-Granados, Elena S.; Moreno-Rey, Concha; Rivera, Jose M.; Ruiz, Agustin; Real, Luis M.; Sáez, Maria E.

    2014-01-01

    Obstructive sleep apnea (OSA) is a common disorder characterized by the reduction or complete cessation in airflow resulting from an obstruction of the upper airway. Several studies have observed an increased risk for cardiovascular morbidity and mortality among OSA patients. Metabolic syndrome (MetS), a cluster of cardiovascular risk factors characterized by the presence of insulin resistance, is often found in patients with OSA, but the complex interplay between these two syndromes is not well understood. In this study, we present the results of a genetic association analysis of 373 candidate SNPs for MetS selected in a previous genome wide association analysis (GWAS). The 384 selected SNPs were genotyped using the Illumina VeraCode Technology in 387 subjects retrospectively assessed at the Internal Medicine Unit of the “Virgen de Valme” University Hospital (Seville, Spain). In order to increase the power of this study and to validate our findings in an independent population, we used data from the Framingham Sleep study which comprises 368 individuals. Only the rs11211631 polymorphism was associated with OSA in both populations, with an estimated OR=0.57 (0.42-0.79) in the joint analysis (p=7.21 × 10-4). This SNP was selected in the previous GWAS for MetS components using a digenic approach, but was not significant in the monogenic study. We have also identified two SNPs (rs2687855 and rs4299396) with a protective effect from OSA only in the abdominal obese subpopulation. As a whole, our study does not support that OSA and MetS share major genetic determinants, although both syndromes share common epidemiological and clinical features. PMID:23524009

  12. Regulatory SNPs Alter the Gene Expression of Diabetic Retinopathy Associated Secretary Factors

    PubMed Central

    Chen, Chian-Feng; Liou, Shiow-Wen; Wu, Hsin-Han; Lin, Chin-Hui; Huang, Li-Shan; Woung, Lin-Chung; Tsai, Ching-Yao

    2016-01-01

    Objectives: Diabetic retinopathy (DR) is a common microvascular complication in both type I and type II diabetes. Several previous reports indicated that the serum concentrations of some secretory factors were highly associated with DR. Therefore, we hypothesized that the genotypes of regulatory SNPs (rSNPs) in secretory factors may alter the expression of these genes and lead to DR. Methods: First, pyrosequencing was applied to screen for SNPs whose allele frequencies differ between DR and DNR. Then individual genotyping was performed with TaqMan assays in Taiwanese DR and DNR patients. To evaluate the effect of the SNP allele on transcriptional activity, we measured promoter activity using luciferase reporter constructs. Results: We found that the frequencies of the CC, CG, and GG genotypes of the rs2010963 polymorphism were 15.09%, 47.14%, and 37.74% in DR and 12.90%, 19.35%, and 67.74% in DNR, respectively (p = 0.0205). The prevalence of DR was higher (p = 0.00793) in patients with the CC or CG genotype (62.26% and 32.26% for DR and DNR, respectively) compared with patients with the GG genotype. To evaluate the effect of the rs2010963-C allele on transcriptional activity, we measured promoter activity using luciferase reporter constructs. The rs2010963-C reporter showed 1.6- to 2-fold higher luciferase activity than rs2010963-G in 3 cell lines. Conclusion: Our data suggest that rs2010963-C alters the expression level of VEGFA in different tissues. We suggest that a small increase in, but long-term exposure to, VEGFA may eventually lead to DR. PMID:27648002

  13. Nanofiber Filters Eliminate Contaminants

    NASA Technical Reports Server (NTRS)

    2009-01-01

    With support from Phase I and II SBIR funding from Johnson Space Center, Argonide Corporation of Sanford, Florida tested and developed its proprietary nanofiber water filter media. Capable of removing more than 99.99 percent of dangerous particles like bacteria, viruses, and parasites, the media was incorporated into the company's commercial NanoCeram water filter, an inductee into the Space Foundation's Space Technology Hall of Fame. In addition to its drinking water filters, Argonide now produces large-scale nanofiber filters used as part of the reverse osmosis process for industrial water purification.

  14. Linear phase compressive filter

    DOEpatents

    McEwan, Thomas E.

    1995-01-01

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line.

  15. Linear phase compressive filter

    DOEpatents

    McEwan, T.E.

    1995-06-06

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line. 2 figs.

  16. Filter construction and design.

    PubMed

    Jornitz, Maik W

    2006-01-01

    Sterilizing and pre-filters are manufactured in different formats and designs. The criteria for the specific designs are set by the application and the specifications of the filter user. The optimal filter unit or even system requires evaluation, such as flow rate, throughput, unspecific adsorption, steam sterilizability and chemical compatibility. These parameters are commonly tested within a qualification phase, which ensures that an optimal filter design and combination finds its use. If such design investigations are neglected it could be costly in the process scale. PMID:16570863

  17. Novel SNPs of the mannan-binding lectin 2 gene and their association with production traits in Chinese Holsteins.

    PubMed

    Zhao, Z L; Wang, C F; Li, Q L; Ju, Z H; Huang, J M; Li, J B; Zhong, J F; Zhang, J B

    2012-01-01

    The mannan-binding lectin gene (MBL) participates as an opsonin in the innate immune system of mammals, and single nucleotide polymorphisms (SNPs) in MBL cause various immune dysfunctions. In this study, we detected SNPs in MBL2 at exon 1 using polymerase chain reaction single-strand conformation polymorphism analysis and DNA sequencing techniques in 825 Chinese Holstein cows. Four new SNPs with various allele frequencies were also found. The g.1164 G>A SNP was predicted to substitute arginine with glutamine at the N-terminus of the cysteine-rich domain. In the collagen-like domain, SNPs g.1197 C>A and g.1198 G>A changed proline to glutamine, whereas SNP g.1207 T>C was identified as a synonymous mutation. Correlation analysis showed that the g.1197 C>A marker was significantly correlated to somatic cell score (SCS), and the g.1164 G>A locus had significant effects on SCS, fat content, and protein content (P < 0.05), suggesting possible roles of these SNPs in the host response against mastitis. Nine haplotypes and nine haplotype pairs corresponding to the loci of the 4 novel SNPs were found in Chinese Holsteins. Haplotype pairs MM, MN, and BQ were correlated with the lowest SCS; MN with the highest protein yield; MM with the highest protein rate, and MN with the highest 305-day milk yield. Thus, MM, MN, and BQ are possible candidates for marker-assisted selection in dairy cattle breeding programs. PMID:23096694

  18. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, T.E.; Alvin, M.A.; Bruck, G.J.; Smeltzer, E.E.

    1999-03-02

    A filter holder and gasket assembly are disclosed for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut. 9 figs.

  19. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, Thomas Edwin; Alvin, Mary Anne; Bruck, Gerald Joseph; Smeltzer, Eugene E.

    1999-03-02

    A filter holder and gasket assembly for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut.

  20. Filtering reprecipitated slurry

    SciTech Connect

    Morrissey, M.F.

    1992-01-01

    As part of the Late Washing Demonstration at Savannah River Technology Center, Interim Waste Technology has filtered reprecipitated and non reprecipitated slurry with the Experimental Laboratory Filter (ELF) at TNX. Reprecipitated slurry generates higher permeate fluxes than non reprecipitated slurry. Washing reprecipitated slurry may require a defoamer because reprecipitation encourages foaming.

  1. Filtering reprecipitated slurry

    SciTech Connect

    Morrissey, M.F.

    1992-12-31

    As part of the Late Washing Demonstration at Savannah River Technology Center, Interim Waste Technology has filtered reprecipitated and non reprecipitated slurry with the Experimental Laboratory Filter (ELF) at TNX. Reprecipitated slurry generates higher permeate fluxes than non reprecipitated slurry. Washing reprecipitated slurry may require a defoamer because reprecipitation encourages foaming.

  2. Active rejector filter

    SciTech Connect

    Kuchinskii, A.G.; Pirogov, S.G.; Savchenko, V.M.; Yakushev, A.K.

    1985-01-01

    This paper describes an active rejector filter for suppressing noise signals in the frequency range 50-100 Hz and for extracting a vlf information signal. The filter has the following characteristics: a high input impedance, a resonant frequency of 75 Hz, a Q of 1.25, and an attenuation factor of 53 dB at resonant frequency.
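
    The figures quoted in this record (resonant frequency 75 Hz, Q of 1.25, 53 dB attenuation) can be related to a stopband width with a short calculation. The sketch below assumes an ideal second-order notch response, H(s) = (s^2 + w0^2) / (s^2 + (w0/Q) s + w0^2); the abstract does not describe the actual circuit, so the function names and the transfer function are illustrative assumptions, not the authors' design.

```python
import math


def notch_band_edges(f0, q):
    """Return the (lower, upper) -3 dB frequencies of an ideal second-order notch.

    The edges satisfy f_hi - f_lo = f0 / q and f_lo * f_hi = f0 ** 2.
    """
    half_width = 1.0 / (2.0 * q)
    root = math.sqrt(1.0 + half_width ** 2)
    return f0 * (root - half_width), f0 * (root + half_width)


def notch_gain(f, f0, q):
    """Magnitude of the assumed ideal notch response at frequency f (same units as f0)."""
    detune = f0 ** 2 - f ** 2
    return abs(detune) / math.hypot(detune, f0 * f / q)


def db_to_ratio(attenuation_db):
    """Convert an attenuation in dB to an output/input amplitude ratio."""
    return 10.0 ** (-attenuation_db / 20.0)


if __name__ == "__main__":
    lo, hi = notch_band_edges(75.0, 1.25)
    print(f"-3 dB edges: {lo:.1f} Hz and {hi:.1f} Hz")
    print(f"53 dB attenuation -> amplitude ratio {db_to_ratio(53.0):.5f}")
```

    Under these assumptions the -3 dB edges fall near 51 Hz and 111 Hz, a 60 Hz wide stopband centred geometrically on 75 Hz. An ideal notch has unbounded attenuation at resonance, so the finite 53 dB figure reflects the real circuit rather than this model.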

  3. Transposon Insertions, Structural Variations, and SNPs Contribute to the Evolution of the Melon Genome.

    PubMed

    Sanseverino, Walter; Hénaff, Elizabeth; Vives, Cristina; Pinosio, Sara; Burgos-Paz, William; Morgante, Michele; Ramos-Onsins, Sebastián E; Garcia-Mas, Jordi; Casacuberta, Josep Maria

    2015-10-01

    The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties representing the species diversity and including wild accessions and highly bred lines is relatively high, due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution.

  4. Genetic diversity and demographic history of Cajanus spp. illustrated from genome-wide SNPs.

    PubMed

    Saxena, Rachit K; von Wettberg, Eric; Upadhyaya, Hari D; Sanchez, Vanessa; Songok, Serah; Saxena, Kulbhushan; Kimurto, Paul; Varshney, Rajeev K

    2014-01-01

    Understanding genetic structure of Cajanus spp. is essential for achieving genetic improvement by quantitative trait loci (QTL) mapping or association studies and use of selected markers through genomic assisted breeding and genomic selection. After developing a comprehensive set of 1,616 single nucleotide polymorphisms (SNPs) and converting them into cost-effective KASPar assays for pigeonpea (Cajanus cajan), we studied levels of genetic variability both within and between a diverse set of Cajanus lines including 56 breeding lines, 21 landraces and 107 accessions from 18 wild species. These results revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, 75.8% of successful SNP assays revealed polymorphism, and more than 95% of these assays could be successfully transferred to related wild species. To show regional patterns of variation, we used STRUCTURE and Analysis of Molecular Variance (AMOVA) to partition variance among hierarchical sets of landraces and wild species at either the continental scale or within India. STRUCTURE separated most of the domesticated germplasm from wild ecotypes, and separated Australian and Asian wild species, as has been found previously. Among Indian regions and states within regions, we found 36% of the variation between regions, and 64% within landraces or wilds within states. The highest level of polymorphism in wild relatives and landraces was found in Madhya Pradesh and Andhra Pradesh provinces of India representing the centre of origin and domestication of pigeonpea respectively. PMID:24533111

  5. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.

  6. Analysis of 49 autosomal SNPs in three ethnic groups from Iran: Persians, Lurs and Kurds.

    PubMed

    Sharafi Farzad, M; Tomas, C; Børsting, C; Zeinali, Z; Malekdoost, M; Zeinali, S; Morling, N

    2013-07-01

    A total of 149 individuals from Iran (Persians, Lurs and Kurds) were analyzed for 49 autosomal SNPs using PCR, SBE and capillary electrophoresis. No deviation from Hardy-Weinberg expectations was observed. One SNP pair (rs1015250-rs251934) showed significant linkage disequilibrium in Kurds. However, this was most likely due to chance. High intrapopulation variability and no significant population structure were observed among the three ethnic groups from Iran. Pairwise FST values obtained from the mean numbers of pairwise differences between SNP profiles were calculated for Persians, Lurs, Kurds and eighteen other worldwide populations. For each of the three Iranian ethnic groups, the lowest FST values calculated between an Iranian and a non-Iranian population were observed between Iranians and populations in Iraq and Turkey. The three Iranian ethnic groups grouped together with other West Asian populations in the MDS plot drawn from the FST values. Statistical parameters of forensic interest calculated for the Iranian ethnic groups showed values of the same order of magnitude as those obtained for Asians. The mean match probability calculated for the 49 SNPs ranged from 1.7 × 10^-18 for Kurds to 1.3 × 10^-19 for Persians. Despite the low level of genetic structure observed among Persians, Lurs and Kurds, a single autosomal SNP database should be used with care when extending its forensic application to other Iranian ethnic groups. PMID:23648204
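
    To give a sense of how a combined match probability of this magnitude arises from 49 biallelic markers, the following sketch multiplies per-locus match probabilities across loci. It assumes Hardy-Weinberg genotype proportions and independence between loci; the abstract does not state how its values were computed (observed genotype frequencies may have been used instead), and the allele frequencies in the example are placeholders rather than data from the study.

```python
def locus_match_probability(p):
    """Probability that two unrelated individuals share a genotype at one biallelic SNP.

    p is the frequency of one allele; genotype frequencies are taken from
    Hardy-Weinberg proportions (p^2, 2pq, q^2), which is an assumption here.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def combined_match_probability(allele_freqs):
    """Product of per-locus match probabilities, assuming independent loci."""
    result = 1.0
    for p in allele_freqs:
        result *= locus_match_probability(p)
    return result


if __name__ == "__main__":
    # Placeholder frequencies: 49 hypothetical SNPs, each with allele frequency 0.5.
    freqs = [0.5] * 49
    print(f"per-locus match probability: {locus_match_probability(0.5):.3f}")
    print(f"combined over 49 loci: {combined_match_probability(freqs):.2e}")
```

    A frequency of 0.5 minimises the per-locus value (0.375), so real panels with less balanced alleles give larger combined probabilities than this idealised case.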

  7. Evaluating our ability to predict the structural disruption of RNA by SNPs

    PubMed Central

    2012-01-01

    The structure of RiboNucleic Acid (RNA) has the potential to be altered by a Single Nucleotide Polymorphism (SNP). Disease-associated SNPs mapping to non-coding regions of the genome that are transcribed into RiboNucleic Acid (RNA) can potentially affect cellular regulation (and cause disease) by altering the structure of the transcript. We performed a large-scale meta-analysis of Selective 2'-Hydroxyl Acylation analyzed by Primer Extension (SHAPE) data, which probes the structure of RNA. We found that several single point mutations exist that significantly disrupt RNA secondary structure in the five transcripts we analyzed. Thus, every RNA that is transcribed has the potential to be a “RiboSNitch;” where a SNP causes a large conformational change that alters regulatory function. Predicting the SNPs that will have the largest effect on RNA structure remains a contemporary computational challenge. We therefore benchmarked the most popular RNA structure prediction algorithms for their ability to identify mutations that maximally affect structure. We also evaluated metrics for rank ordering the extent of the structural change. Although no single algorithm/metric combination dramatically outperformed the others, small differences in AUC (Area Under the Curve) values reveal that certain approaches do provide better agreement with experiment. The experimental data we analyzed nonetheless show that multiple single point mutations exist in all RNA transcripts that significantly disrupt structure in agreement with the predictions. PMID:22759654

  8. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs

    PubMed Central

    2013-01-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  9. Genetic association between SNPs in the DGAT1 gene and milk production traits in Murrah buffaloes.

    PubMed

    de Freitas, Ana Cláudia; de Camargo, Gregório Miguel Ferreira; Stafuzza, Nedenia Bonvino; Aspilcueta-Borquis, Rusbel Raul; Venturini, Guilherme Costa; Dias, Marina Mortati; Cardoso, Diercles Francisco; Tonhati, Humberto

    2016-10-01

    This study identified polymorphisms in the DGAT1 gene in Murrah buffaloes and investigated their associations with milk production and quality traits (milk, fat and protein yields and percentages, somatic cell count). Genomic DNA was extracted from hair follicles collected from the tail of 196 females. Three SNPs were identified in the DGAT1 gene by sequencing. Statistical analyses were performed to verify the linkage and the association between polymorphisms and traits. The estimated value of r^2 between two SNPs in exon 17 (g.11,783G > A and g.11,785 T > C) was 0.029. SNP g.11,785 T > C was significantly associated (P < 0.05) with fat and protein percentage. The dominance effect was significant for milk and fat yields and protein percentage (P < 0.05). The additive effect of the SNP g.11,785 T > C was significant for protein production and somatic cell count (P < 0.05). This indicates that marker-assisted selection might be applied with consideration given to balancing production and udder health.
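
    The r^2 value reported for the two exon 17 SNPs is a measure of linkage disequilibrium. As a hedged illustration, the sketch below computes r^2 from phased haplotype counts using the standard definition r^2 = D^2 / (pA(1 - pA) pB(1 - pB)); the abstract does not say how the estimate was obtained, and the counts in the example are invented for demonstration, not taken from the study.

```python
def ld_r2(n11, n12, n21, n22):
    """Squared correlation (r^2) between two biallelic SNPs from haplotype counts.

    n11: haplotypes carrying allele 1 at both SNPs
    n12: allele 1 at the first SNP, allele 2 at the second
    n21: allele 2 at the first SNP, allele 1 at the second
    n22: allele 2 at both SNPs
    """
    total = n11 + n12 + n21 + n22
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_hap = n11 / total            # frequency of the 1-1 haplotype
    p_a = (n11 + n12) / total      # allele 1 frequency at the first SNP
    p_b = (n11 + n21) / total      # allele 1 frequency at the second SNP
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_hap - p_a * p_b          # disequilibrium coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Invented counts: complete association versus no association.
    print(ld_r2(50, 0, 0, 50))
    print(ld_r2(25, 25, 25, 25))
```

    A value close to zero, such as the 0.029 reported here, means the two SNPs are nearly independent, so each carries largely separate information in an association analysis.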

  10. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    PubMed Central

    Bowers, John E.; Pearl, Stephanie A.; Burke, John M.

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165

  11. The genetics of human infertility by functional interrogation of SNPs in mice.

    PubMed

    Singh, Priti; Schimenti, John C

    2015-08-18

    Infertility is a prevalent health issue, affecting ∼15% of couples of childbearing age. Nearly one-half of idiopathic infertility cases are thought to have a genetic basis, but the underlying causes are largely unknown. Traditional methods for studying inheritance, such as genome-wide association studies and linkage analyses, have been confounded by the genetic and phenotypic complexity of reproductive processes. Here we describe an association- and linkage-free approach to identify segregating infertility alleles, in which CRISPR/Cas9 genome editing is used to model putatively deleterious nonsynonymous SNPs (nsSNPs) in the mouse orthologs of fertility genes. Mice bearing "humanized" alleles of four essential meiosis genes, each predicted to be deleterious by most of the commonly used algorithms for analyzing functional SNP consequences, were examined for fertility and reproductive defects. Only a Cdk2 allele mimicking SNP rs3087335, which alters an inhibitory WEE1 protein kinase phosphorylation site, caused infertility and revealed a novel function in regulating spermatogonial stem cell maintenance. Our data indicate that segregating infertility alleles exist in human populations. Furthermore, whereas computational prediction of SNP effects is useful for identifying candidate causal mutations for diverse diseases, this study underscores the need for in vivo functional evaluation of physiological consequences. This approach can revolutionize personalized reproductive genetics by establishing a permanent reference of benign vs. infertile alleles. PMID:26240362

  12. Heritability and Genetic Correlations Explained by Common SNPs for Metabolic Syndrome Traits

    PubMed Central

    Vattikuti, Shashaank; Guo, Juen; Chow, Carson C.

    2012-01-01

    We used a bivariate (multivariate) linear mixed-effects model to estimate the narrow-sense heritability (h2) and heritability explained by the common SNPs (hg2) for several metabolic syndrome (MetS) traits and the genetic correlation between pairs of traits for the Atherosclerosis Risk in Communities (ARIC) genome-wide association study (GWAS) population. MetS traits included body-mass index (BMI), waist-to-hip ratio (WHR), systolic blood pressure (SBP), fasting glucose (GLU), fasting insulin (INS), fasting triglycerides (TG), and fasting high-density lipoprotein (HDL). We found the percentage of h2 accounted for by common SNPs to be 58% of h2 for height, 41% for BMI, 46% for WHR, 30% for GLU, 39% for INS, 34% for TG, 25% for HDL, and 80% for SBP. We confirmed prior reports for height and BMI using the ARIC population and independently in the Framingham Heart Study (FHS) population. We demonstrated that the multivariate model supported large genetic correlations between BMI and WHR and between TG and HDL. We also showed that the genetic correlations between the MetS traits are directly proportional to the phenotypic correlations. PMID:22479213

  13. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    PubMed

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-07-07

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.

  14. Evaluation of random forest regression for prediction of breeding value from genomewide SNPs.

    PubMed

    Sarkar, Rupam Kumar; Rao, A R; Meher, Prabina Kumar; Nepolean, T; Mohapatra, T

    2015-06-01

    Genomic prediction is meant for estimating the breeding value using molecular marker data which has turned out to be a powerful tool for efficient utilization of germplasm resources and rapid improvement of cultivars. Model-based techniques have been widely used for prediction of breeding values of genotypes from genomewide association studies. However, application of the random forest (RF), a model-free ensemble learning method, is not widely used for prediction. In this study, the optimum values of tuning parameters of RF have been identified and applied to predict the breeding value of genotypes based on genomewide single-nucleotide polymorphisms (SNPs), where the number of SNPs (P variables) is much higher than the number of genotypes (n observations) (P » n). Further, a comparison was made with the model-based genomic prediction methods, namely, least absolute shrinkage and selection operator (LASSO), ridge regression (RR) and elastic net (EN) under P » n. It was found that the correlations between the predicted and observed trait response were 0.591, 0.539, 0.431 and 0.587 for RF, LASSO, RR and EN, respectively, which implies superiority of the RF over the model-based techniques in genomic prediction. Hence, we suggest that the RF methodology can be used as an alternative to the model-based techniques for the prediction of breeding value at genome level with higher accuracy.

  15. Genetic association between SNPs in the DGAT1 gene and milk production traits in Murrah buffaloes.

    PubMed

    de Freitas, Ana Cláudia; de Camargo, Gregório Miguel Ferreira; Stafuzza, Nedenia Bonvino; Aspilcueta-Borquis, Rusbel Raul; Venturini, Guilherme Costa; Dias, Marina Mortati; Cardoso, Diercles Francisco; Tonhati, Humberto

    2016-10-01

    This study identified polymorphisms in the DGAT1 gene in Murrah buffaloes and investigated their associations with milk production and quality traits (milk, fat and protein yields and percentages, somatic cell count). Genomic DNA was extracted from hair follicles collected from the tail of 196 females. Three SNPs were identified in the DGAT1 gene by sequencing. Statistical analyses were performed to verify the linkage and the association between polymorphisms and traits. The estimated value of r^2 between two SNPs in exon 17 (g.11,783G > A and g.11,785 T > C) was 0.029. SNP g.11,785 T > C was significantly associated (P < 0.05) with fat and protein percentage. The dominance effect was significant for milk and fat yields and protein percentage (P < 0.05). The additive effect of the SNP g.11,785 T > C was significant for protein production and somatic cell count (P < 0.05). This indicates that marker-assisted selection might be applied with consideration given to balancing production and udder health. PMID:27469895

  16. Evaluation of random forest regression for prediction of breeding value from genomewide SNPs.

    PubMed

    Sarkar, Rupam Kumar; Rao, A R; Meher, Prabina Kumar; Nepolean, T; Mohapatra, T

    2015-06-01

    Genomic prediction is meant for estimating the breeding value using molecular marker data which has turned out to be a powerful tool for efficient utilization of germplasm resources and rapid improvement of cultivars. Model-based techniques have been widely used for prediction of breeding values of genotypes from genomewide association studies. However, application of the random forest (RF), a model-free ensemble learning method, is not widely used for prediction. In this study, the optimum values of tuning parameters of RF have been identified and applied to predict the breeding value of genotypes based on genomewide single-nucleotide polymorphisms (SNPs), where the number of SNPs (P variables) is much higher than the number of genotypes (n observations) (P » n). Further, a comparison was made with the model-based genomic prediction methods, namely, least absolute shrinkage and selection operator (LASSO), ridge regression (RR) and elastic net (EN) under P » n. It was found that the correlations between the predicted and observed trait response were 0.591, 0.539, 0.431 and 0.587 for RF, LASSO, RR and EN, respectively, which implies superiority of the RF over the model-based techniques in genomic prediction. Hence, we suggest that the RF methodology can be used as an alternative to the model-based techniques for the prediction of breeding value at genome level with higher accuracy. PMID:26174666

  17. Transposon Insertions, Structural Variations, and SNPs Contribute to the Evolution of the Melon Genome.

    PubMed

    Sanseverino, Walter; Hénaff, Elizabeth; Vives, Cristina; Pinosio, Sara; Burgos-Paz, William; Morgante, Michele; Ramos-Onsins, Sebastián E; Garcia-Mas, Jordi; Casacuberta, Josep Maria

    2015-10-01

    The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties representing the species diversity and including wild accessions and highly bred lines is relatively high, due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution. PMID:26174143

  18. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    PubMed

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165

  19. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

    PubMed

    Lee, S Hong; Ripke, Stephan; Neale, Benjamin M; Faraone, Stephen V; Purcell, Shaun M; Perlis, Roy H; Mowry, Bryan J; Thapar, Anita; Goddard, Michael E; Witte, John S; Absher, Devin; Agartz, Ingrid; Akil, Huda; Amin, Farooq; Andreassen, Ole A; Anjorin, Adebayo; Anney, Richard; Anttila, Verneri; Arking, Dan E; Asherson, Philip; Azevedo, Maria H; Backlund, Lena; Badner, Judith A; Bailey, Anthony J; Banaschewski, Tobias; Barchas, Jack D; Barnes, Michael R; Barrett, Thomas B; Bass, Nicholas; Battaglia, Agatino; Bauer, Michael; Bayés, Mònica; Bellivier, Frank; Bergen, Sarah E; Berrettini, Wade; Betancur, Catalina; Bettecken, Thomas; Biederman, Joseph; Binder, Elisabeth B; Black, Donald W; Blackwood, Douglas H R; Bloss, Cinnamon S; Boehnke, Michael; Boomsma, Dorret I; Breen, Gerome; Breuer, René; Bruggeman, Richard; Cormican, Paul; Buccola, Nancy G; Buitelaar, Jan K; Bunney, William E; Buxbaum, Joseph D; Byerley, William F; Byrne, Enda M; Caesar, Sian; Cahn, Wiepke; Cantor, Rita M; Casas, Miguel; Chakravarti, Aravinda; Chambert, Kimberly; Choudhury, Khalid; Cichon, Sven; Cloninger, C Robert; Collier, David A; Cook, Edwin H; Coon, Hilary; Cormand, Bru; Corvin, Aiden; Coryell, William H; Craig, David W; Craig, Ian W; Crosbie, Jennifer; Cuccaro, Michael L; Curtis, David; Czamara, Darina; Datta, Susmita; Dawson, Geraldine; Day, Richard; De Geus, Eco J; Degenhardt, Franziska; Djurovic, Srdjan; Donohoe, Gary J; Doyle, Alysa E; Duan, Jubao; Dudbridge, Frank; Duketis, Eftichia; Ebstein, Richard P; Edenberg, Howard J; Elia, Josephine; Ennis, Sean; Etain, Bruno; Fanous, Ayman; Farmer, Anne E; Ferrier, I Nicol; Flickinger, Matthew; Fombonne, Eric; Foroud, Tatiana; Frank, Josef; Franke, Barbara; Fraser, Christine; Freedman, Robert; Freimer, Nelson B; Freitag, Christine M; Friedl, Marion; Frisén, Louise; Gallagher, Louise; Gejman, Pablo V; Georgieva, Lyudmila; Gershon, Elliot S; Geschwind, Daniel H; Giegling, Ina; Gill, Michael; Gordon, Scott D; Gordon-Smith, Katherine; Green, 
Elaine K; Greenwood, Tiffany A; Grice, Dorothy E; Gross, Magdalena; Grozeva, Detelina; Guan, Weihua; Gurling, Hugh; De Haan, Lieuwe; Haines, Jonathan L; Hakonarson, Hakon; Hallmayer, Joachim; Hamilton, Steven P; Hamshere, Marian L; Hansen, Thomas F; Hartmann, Annette M; Hautzinger, Martin; Heath, Andrew C; Henders, Anjali K; Herms, Stefan; Hickie, Ian B; Hipolito, Maria; Hoefels, Susanne; Holmans, Peter A; Holsboer, Florian; Hoogendijk, Witte J; Hottenga, Jouke-Jan; Hultman, Christina M; Hus, Vanessa; Ingason, Andrés; Ising, Marcus; Jamain, Stéphane; Jones, Edward G; Jones, Ian; Jones, Lisa; Tzeng, Jung-Ying; Kähler, Anna K; Kahn, René S; Kandaswamy, Radhika; Keller, Matthew C; Kennedy, James L; Kenny, Elaine; Kent, Lindsey; Kim, Yunjung; Kirov, George K; Klauck, Sabine M; Klei, Lambertus; Knowles, James A; Kohli, Martin A; Koller, Daniel L; Konte, Bettina; Korszun, Ania; Krabbendam, Lydia; Krasucki, Robert; Kuntsi, Jonna; Kwan, Phoenix; Landén, Mikael; Långström, Niklas; Lathrop, Mark; Lawrence, Jacob; Lawson, William B; Leboyer, Marion; Ledbetter, David H; Lee, Phil H; Lencz, Todd; Lesch, Klaus-Peter; Levinson, Douglas F; Lewis, Cathryn M; Li, Jun; Lichtenstein, Paul; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Liu, Chunyu; Lohoff, Falk W; Loo, Sandra K; Lord, Catherine; Lowe, Jennifer K; Lucae, Susanne; MacIntyre, Donald J; Madden, Pamela A F; Maestrini, Elena; Magnusson, Patrik K E; Mahon, Pamela B; Maier, Wolfgang; Malhotra, Anil K; Mane, Shrikant M; Martin, Christa L; Martin, Nicholas G; Mattheisen, Manuel; Matthews, Keith; Mattingsdal, Morten; McCarroll, Steven A; McGhee, Kevin A; McGough, James J; McGrath, Patrick J; McGuffin, Peter; McInnis, Melvin G; McIntosh, Andrew; McKinney, Rebecca; McLean, Alan W; McMahon, Francis J; McMahon, William M; McQuillin, Andrew; Medeiros, Helena; Medland, Sarah E; Meier, Sandra; Melle, Ingrid; Meng, Fan; Meyer, Jobst; Middeldorp, Christel M; Middleton, Lefkos; Milanova, Vihra; Miranda, Ana; Monaco, Anthony P; 
Montgomery, Grant W; Moran, Jennifer L; Moreno-De-Luca, Daniel; Morken, Gunnar; Morris, Derek W; Morrow, Eric M; Moskvina, Valentina; Muglia, Pierandrea; Mühleisen, Thomas W; Muir, Walter J; Müller-Myhsok, Bertram; Murtha, Michael; Myers, Richard M; Myin-Germeys, Inez; Neale, Michael C; Nelson, Stan F; Nievergelt, Caroline M; Nikolov, Ivan; Nimgaonkar, Vishwajit; Nolen, Willem A; Nöthen, Markus M; Nurnberger, John I; Nwulia, Evaristus A; Nyholt, Dale R; O'Dushlaine, Colm; Oades, Robert D; Olincy, Ann; Oliveira, Guiomar; Olsen, Line; Ophoff, Roel A; Osby, Urban; Owen, Michael J; Palotie, Aarno; Parr, Jeremy R; Paterson, Andrew D; Pato, Carlos N; Pato, Michele T; Penninx, Brenda W; Pergadia, Michele L; Pericak-Vance, Margaret A; Pickard, Benjamin S; Pimm, Jonathan; Piven, Joseph; Posthuma, Danielle; Potash, James B; Poustka, Fritz; Propping, Peter; Puri, Vinay; Quested, Digby J; Quinn, Emma M; Ramos-Quiroga, Josep Antoni; Rasmussen, Henrik B; Raychaudhuri, Soumya; Rehnström, Karola; Reif, Andreas; Ribasés, Marta; Rice, John P; Rietschel, Marcella; Roeder, Kathryn; Roeyers, Herbert; Rossin, Lizzy; Rothenberger, Aribert; Rouleau, Guy; Ruderfer, Douglas; Rujescu, Dan; Sanders, Alan R; Sanders, Stephan J; Santangelo, Susan L; Sergeant, Joseph A; Schachar, Russell; Schalling, Martin; Schatzberg, Alan F; Scheftner, William A; Schellenberg, Gerard D; Scherer, Stephen W; Schork, Nicholas J; Schulze, Thomas G; Schumacher, Johannes; Schwarz, Markus; Scolnick, Edward; Scott, Laura J; Shi, Jianxin; Shilling, Paul D; Shyn, Stanley I; Silverman, Jeremy M; Slager, Susan L; Smalley, Susan L; Smit, Johannes H; Smith, Erin N; Sonuga-Barke, Edmund J S; St Clair, David; State, Matthew; Steffens, Michael; Steinhausen, Hans-Christoph; Strauss, John S; Strohmaier, Jana; Stroup, T Scott; Sutcliffe, James S; Szatmari, Peter; Szelinger, Szabocls; Thirumalai, Srinivasa; Thompson, Robert C; Todorov, Alexandre A; Tozzi, Federica; Treutlein, Jens; Uhr, Manfred; van den Oord, Edwin J C G; Van 
Grootheest, Gerard; Van Os, Jim; Vicente, Astrid M; Vieland, Veronica J; Vincent, John B; Visscher, Peter M; Walsh, Christopher A; Wassink, Thomas H; Watson, Stanley J; Weissman, Myrna M; Werge, Thomas; Wienker, Thomas F; Wijsman, Ellen M; Willemsen, Gonneke; Williams, Nigel; Willsey, A Jeremy; Witt, Stephanie H; Xu, Wei; Young, Allan H; Yu, Timothy W; Zammit, Stanley; Zandi, Peter P; Zhang, Peng; Zitman, Frans G; Zöllner, Sebastian; Devlin, Bernie; Kelsoe, John R; Sklar, Pamela; Daly, Mark J; O'Donovan, Michael C; Craddock, Nicholas; Sullivan, Patrick F; Smoller, Jordan W; Kendler, Kenneth S; Wray, Naomi R

    2013-09-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  20. No Observed Association for Mitochondrial SNPs with Preterm Delivery and Related Outcomes

    PubMed Central

    Alleman, Brandon W.; Myking, Solveig; Ryckman, Kelli K.; Myhre, Ronny; Feingold, Eleanor; Feenstra, Bjarke; Geller, Frank; Boyd, Heather A.; Shaffer, John R.; Zhang, Qi; Begum, Ferdouse; Crosslin, David; Doheny, Kim; Pugh, Elizabeth; Pay, Aase Serine Devold; Østensen, Ingrid H.G.; Morken, Nils-Halvdan; Magnus, Per; Marazita, Mary L.; Jacobsson, Bo; Melbye, Mads; Murray, Jeffrey C.

    2013-01-01

    Background: Preterm delivery (PTD) is the leading cause of neonatal morbidity and mortality. Epidemiologic studies indicate that recurrence of PTD is maternally inherited, creating a strong possibility that mitochondrial variants contribute to its etiology. This study examines the association of mitochondrial genotypes with PTD and related outcomes. Methods: This study combined, through meta-analysis, two case-control genome-wide association studies (GWAS): one from the Danish National Birth Cohort (DNBC) Study and one from the Norwegian Mother and Child Cohort Study (MoBa) conducted by the Norwegian Institute of Public Health. The outcomes of PTD (≤36 weeks), very PTD (≤32 weeks) and preterm prelabor rupture of membranes (PPROM) were examined. 135 individual SNP associations were tested using the combined genome from mothers and neonates (case vs. control) in each population and then pooled via meta-analysis. Results: After meta-analysis there were four SNPs for the outcome of PTD with p≤0.10, and two with p≤0.05. For the additional outcomes of very PTD and PPROM there were three and four SNPs, respectively, with p≤0.10. Conclusion: Given the number of tests, no single SNP reached study-wide significance (p=0.0006). Our study does not support the hypothesis that mitochondrial genetics contributes to the maternal transmission of PTD and related outcomes. PMID:22902432

  1. The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs

    PubMed Central

    2004-01-01

    Understanding the nature of evolutionary relationships among persons and populations is important for the efficient application of genome science to biomedical research. We have analysed 8,525 autosomal single nucleotide polymorphisms (SNPs) in 84 individuals from four populations: African-American, European-American, Chinese and Japanese. Individual relationships were reconstructed using the allele sharing distance and the neighbour-joining tree making method. Trees show clear clustering according to population, with the root branching from the African-American clade. The African-American cluster is much less star-like than European-American and East Asian clusters, primarily because of admixture. Furthermore, on the East Asian branch, all ten Chinese individuals cluster together and all ten Japanese individuals cluster together. Using positional information, we demonstrate strong correlations between inter-marker distance and both locus-specific FST (the proportion of total variation due to differentiation) levels and branch lengths. Chromosomal maps of the distribution of locus-specific branch lengths were constructed by combining these data with other published SNP markers (total of 33,704 SNPs). These maps clearly illustrate a non-uniform distribution of human genetic substructure, an instructional and useful paradigm for education and research. PMID:15588487
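
    The allele sharing distance mentioned in this abstract can be illustrated with a minimal sketch. It assumes biallelic SNP genotypes coded as 0, 1 or 2 copies of one allele (None for a missing call) and a per-locus distance equal to the number of alleles not shared, scaled to the range 0 to 1. The function names and the coding are illustrative assumptions, not the authors' implementation; the resulting matrix is the kind of input a neighbour-joining routine would take.

```python
from typing import List, Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def allele_sharing_distance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Mean number of non-shared alleles per locus, scaled to [0, 1]."""
    total = 0
    typed = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip loci missing in either individual
        total += abs(ga - gb)  # 0, 1 or 2 alleles not shared
        typed += 1
    if typed == 0:
        raise ValueError("no loci typed in both individuals")
    return total / (2 * typed)


def distance_matrix(genotypes: Sequence[Sequence[Genotype]]) -> List[List[float]]:
    """Symmetric pairwise distance matrix over all individuals."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = allele_sharing_distance(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```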

  2. Weighted guided image filtering.

    PubMed

    Li, Zhengguo; Zheng, Jinghong; Zhu, Zijian; Yao, Wei; Wu, Shiqian

    2015-01-01

    It is known that local filtering-based edge preserving smoothing techniques suffer from halo artifacts. In this paper, a weighted guided image filter (WGIF) is introduced by incorporating an edge-aware weighting into an existing guided image filter (GIF) to address the problem. The WGIF inherits advantages of both global and local smoothing filters in the sense that: 1) the complexity of the WGIF is O(N) for an image with N pixels, which is the same as that of the GIF, and 2) the WGIF can avoid halo artifacts like the existing global smoothing filters. The WGIF is applied to single image detail enhancement, single image haze removal, and fusion of differently exposed images. Experimental results show that the resulting algorithms produce images with better visual quality, while halo artifacts are reduced or avoided in the final images with a negligible increase in running time. PMID:25415986

  3. Sintered composite filter

    DOEpatents

    Bergman, W.

    1986-05-02

    A particulate filter medium formed of a sintered composite of 0.5 micron diameter quartz fibers and 2 micron diameter stainless steel fibers is described. Preferred composition is about 40 vol.% quartz and about 60 vol.% stainless steel fibers. The media is sintered at about 1100 °C to bond the stainless steel fibers into a cage network which holds the quartz fibers. High filter efficiency and low flow resistance are provided by the smaller quartz fibers. High strength is provided by the stainless steel fibers. The resulting media has a high efficiency and low pressure drop similar to the standard HEPA media, with tensile strength at least four times greater, and a maximum operating temperature of about 550 °C. The invention also includes methods to form the composite media and a HEPA filter utilizing the composite media. The filter media can be used to filter particles in both liquids and gases.

  4. Sub-micron filter

    DOEpatents

    Tepper, Frederick; Kaledin, Leonid

    2009-10-13

    Aluminum hydroxide fibers approximately 2 nanometers in diameter and with surface areas ranging from 200 to 650 m²/g have been found to be highly electropositive. When dispersed in water they are able to attach to and retain electronegative particles. When combined into a composite filter with other fibers or particles they can filter bacteria and nano size particulates such as viruses and colloidal particles at high flux through the filter. Such filters can be used for purification and sterilization of water, biological, medical and pharmaceutical fluids, and as a collector/concentrator for detection and assay of microbes and viruses. The alumina fibers are also capable of filtering sub-micron inorganic and metallic particles to produce ultra pure water. The fibers are suitable as a substrate for growth of cells. Macromolecules such as proteins may be separated from each other based on their electronegative charges.

  5. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests

    PubMed Central

    2015-01-01

    Background: Selection and identification of single-nucleotide polymorphisms (SNPs) are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) to identify genetic variants that have relatively large effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS with selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. Results: This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction
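
    The two-stage sampling idea described in this record can be sketched roughly as follows. This is an illustration only: the two p-value cut-offs, the even split between the sub-groups, and the function names are hypothetical assumptions, since the abstract does not state how ts-RF chooses them.

```python
import random
from typing import Dict, List, Tuple


def partition_snps(
    p_values: Dict[str, float],
    informative_cutoff: float,  # hypothetical: above this, a SNP is treated as irrelevant
    strong_cutoff: float,       # hypothetical: at or below this, a SNP is highly informative
) -> Tuple[List[str], List[str]]:
    """Split SNPs into highly and weakly informative groups; drop the rest."""
    strong = [s for s, p in p_values.items() if p <= strong_cutoff]
    weak = [s for s, p in p_values.items() if strong_cutoff < p <= informative_cutoff]
    return strong, weak


def sample_subspace(
    strong: List[str], weak: List[str], size: int, rng: random.Random
) -> List[str]:
    """Draw a feature subspace that always holds at least one highly informative SNP."""
    if not strong:
        raise ValueError("no highly informative SNPs available")
    # Assumed split: about half of the subspace from the strong group.
    n_strong = max(1, min(len(strong), size // 2))
    n_weak = min(len(weak), max(0, size - n_strong))
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)
```

    Each tree node would then be split using only the SNPs returned by sample_subspace, so irrelevant SNPs never enter the candidate set.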

  6. Natural Functional SNPs in miR-155 Alter Its Expression Level, Blood Cell Counts, and Immune Responses.

    PubMed

    Li, Congcong; He, Huabin; Liu, An; Liu, Huazhen; Huang, Haibo; Zhao, Changzhi; Jing, Lu; Ni, Juan; Yin, Lilin; Hu, Suqin; Wu, Hui; Li, Xinyun; Zhao, Shuhong

    2016-01-01

    miR-155 has been confirmed to be a key factor in immune responses in humans and other mammals. Therefore, investigation of variations in miR-155 could be useful for understanding the differences in immunity between individuals. In this study, four SNPs in miR-155 were identified in mice (Mus musculus) and humans (Homo sapiens). In mice, the four SNPs were closely linked and formed two miR-155 haplotypes (A and B). Ten distinct types of blood parameters were associated with miR-155 expression under normal conditions. Additionally, 4 and 14 blood parameters were significantly different between these two genotypes under normal and lipopolysaccharide (LPS) stimulation conditions, respectively. Moreover, the expression levels of miR-155, the inflammatory response to LPS stimulation, and the lethal ratio following Salmonella typhimurium infection were significantly increased in mice harboring the AA genotype. Further, two SNPs, one in the loop region and the other near the 3' terminal of pre-miR-155, were confirmed to be responsible for the differential expression of miR-155 in mice. Interestingly, two additional SNPs, one in the loop region and the other in the middle of miR-155*, modulated the function of miR-155 in humans. Predictions of secondary RNA structure using RNAfold showed that these SNPs affected the structure of miR-155 in both mice and humans. Our results provide novel evidence of the natural functional SNPs of miR-155 in both mice and humans, which may affect the expression levels of mature miR-155 by modulating its secondary structure. The SNPs of human miR-155 may be considered as causal mutations for some immune-related diseases in the clinic. The two genotypes of mice could be used as natural models for studying the mechanisms of immune diseases caused by abnormal expression of miR-155 in humans. PMID:27532002

  7. Natural Functional SNPs in miR-155 Alter Its Expression Level, Blood Cell Counts, and Immune Responses

    PubMed Central

    Li, Congcong; He, Huabin; Liu, An; Liu, Huazhen; Huang, Haibo; Zhao, Changzhi; Jing, Lu; Ni, Juan; Yin, Lilin; Hu, Suqin; Wu, Hui; Li, Xinyun; Zhao, Shuhong

    2016-01-01

    miR-155 has been confirmed to be a key factor in immune responses in humans and other mammals. Therefore, investigation of variations in miR-155 could be useful for understanding the differences in immunity between individuals. In this study, four SNPs in miR-155 were identified in mice (Mus musculus) and humans (Homo sapiens). In mice, the four SNPs were closely linked and formed two miR-155 haplotypes (A and B). Ten distinct types of blood parameters were associated with miR-155 expression under normal conditions. Additionally, 4 and 14 blood parameters were significantly different between these two genotypes under normal and lipopolysaccharide (LPS) stimulation conditions, respectively. Moreover, the expression levels of miR-155, the inflammatory response to LPS stimulation, and the lethal ratio following Salmonella typhimurium infection were significantly increased in mice harboring the AA genotype. Further, two SNPs, one in the loop region and the other near the 3′ terminal of pre-miR-155, were confirmed to be responsible for the differential expression of miR-155 in mice. Interestingly, two additional SNPs, one in the loop region and the other in the middle of miR-155*, modulated the function of miR-155 in humans. Predictions of secondary RNA structure using RNAfold showed that these SNPs affected the structure of miR-155 in both mice and humans. Our results provide novel evidence of the natural functional SNPs of miR-155 in both mice and humans, which may affect the expression levels of mature miR-155 by modulating its secondary structure. The SNPs of human miR-155 may be considered as causal mutations for some immune-related diseases in the clinic. The two genotypes of mice could be used as natural models for studying the mechanisms of immune diseases caused by abnormal expression of miR-155 in humans. PMID:27532002

  8. Detection of SNPs in the TBC1D1 gene and their association with carcass traits in chicken.

    PubMed

    Wang, Yan; Xu, Heng-Yong; Gilbert, Elizabeth R; Peng, Xing; Zhao, Xiao-Ling; Liu, Yi-Ping; Zhu, Qing

    2014-09-01

    TBC1D1 plays an important role in numerous fundamental physiological processes including muscle metabolism, regulation of whole body energy homeostasis and lipid metabolism. The objective of the present study was to identify single nucleotide polymorphisms (SNPs) in chicken TBC1D1 using 128 Erlang mountainous chickens and to determine if these SNPs are associated with carcass traits. The approach consisted of sequencing TBC1D1 using a panel of DNA from different individuals, revealing twenty-two SNPs. Among these SNPs, two polymorphisms (g.69307744C>T and g.69307608T>G) of block 1, four polymorphisms (g.69322320C>T, g.69322314G>A, g.69317290A>G and g.69317276T>C) of block 2 and four polymorphisms of block 3 (g.69349746G>A, g.69349736C>G, g.69349727C>T and g.69349694C>T) exhibited a high degree of linkage disequilibrium in all test populations. An association analysis was performed between the twenty-two SNPs and seven performance traits. SNPs g.69307744C>T, g.69340192G>A and g.69355665T>C were demonstrated to have a strong effect on liveweight (BW), carcass weight (CW), semi-eviscerated weight (SEW) and eviscerated weight (EW) and g.69340070C>T polymorphism was related to BW, SEW and BMW in chicken populations. However, for the other SNPs, there were no significant correlations between different genotypes and carcass traits. Meanwhile, haplotype CT-TG of block 1 and combined genotype AG-TT-AC-CT of block 3 were significantly associated with BW, CW, SEW and EW. Overall, our results provide evidence that polymorphisms in TBC1D1 are associated with carcass traits and would be a useful candidate gene in selection programs for improving carcass traits. PMID:24979340

  9. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation.

    PubMed

    Carpenter, James R; Roger, James H; Kenward, Michael G

    2013-01-01

    Protocol deviations, for example, due to early withdrawal and noncompliance, are unavoidable in clinical trials. Such deviations often result in missing data. Additional assumptions are then needed for the analysis, and these cannot be definitively verified from the data at hand. Thus, as recognized by recent regulatory guidelines and reports, clarity about these assumptions and their implications is vital for both the primary analysis and framing relevant sensitivity analysis. This article focuses on clinical trials with longitudinal quantitative outcome data. For the target population, we define two estimands, the de jure estimand, "does the treatment work under the best case scenario," and the de facto estimand, "what would be the effect seen in practice." We then carefully define the concept of a deviation from the protocol relevant to the estimand, or for short a deviation. Each patient's postrandomization data can then be divided into predeviation data and postdeviation data. We set out an accessible framework for contextually appropriate assumptions relevant to de facto and de jure estimands, that is, assumptions about the joint distribution of pre- and postdeviation data relevant to the clinical question at hand. We then show how, under these assumptions, multiple imputation provides a practical approach to estimation and inference. We illustrate with data from a longitudinal clinical trial in patients with chronic asthma. PMID:24138436
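
    For readers unfamiliar with how multiple imputation yields a single estimate and variance, the standard combining rules (Rubin's rules) can be sketched as below. This is a generic illustration of pooling results from several imputed data sets, not a reproduction of the authors' analysis; the function name is an assumption.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)       # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```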

  10. Using imputation and mixture model approaches to integrate multi-state capture-recapture models with assignment information.

    PubMed

    Wen, Zhi; Pollock, Kenneth H; Nichols, James D; Waser, Peter M; Cao, Weihua

    2014-06-01

    In this article, we first extend the superpopulation capture-recapture model to multiple states (locations or populations) for two age groups. Wen et al. (2011; 2013) developed a new approach combining capture-recapture data with population assignment information to estimate the relative contributions of in situ births and immigrants to the growth of a single study population. Here, we first generalize the approach of Wen et al. (2011; 2013) to a system composed of multiple study populations (multi-state) with two age groups, where an imputation approach is employed to account for the uncertainty inherent in the population assignment information. Then we develop a different, individual-level mixture model approach to integrate the individual-level population assignment information with the capture-recapture data. Our simulation and real data analyses show that the fusion of population assignment information with capture-recapture data allows us to estimate the origination-specific recruitment of new animals to the system and the dispersal process between populations within the system. Compared to a standard capture-recapture model, our new models improve the estimation of demographic parameters, including survival probability, origination-specific entry probability, and especially the probability of movement between populations, yielding higher accuracy and precision.

  11. Ceramic fiber reinforced filter

    DOEpatents

    Stinton, David P.; McLaughlin, Jerry C.; Lowden, Richard A.

    1991-01-01

    A filter for removing particulate matter from high temperature flowing fluids, and in particular gases, that is reinforced with ceramic fibers. The filter has a ceramic base fiber material in the form of a fabric, felt, paper or the like, with the refractory fibers thereof coated with a thin layer of a protective and bonding refractory applied by chemical vapor deposition techniques. This coating causes each fiber to be physically joined to adjoining fibers so as to prevent movement of the fibers during use and to increase the strength and toughness of the composite filter. Further, the coating can be selected to minimize any reactions between the constituents of the fluids and the fibers. A description is given of the formation of a composite filter using a felt preform of commercial silicon carbide fibers together with the coating of these fibers with pure silicon carbide. Filter efficiency approaching 100% has been demonstrated with these filters. The fiber base material is alternately made from aluminosilicate fibers, zirconia fibers and alumina fibers. Coating with Al₂O₃ is also described. Advanced configurations for the composite filter are suggested.

  12. Solc filter engineering

    NASA Technical Reports Server (NTRS)

    Rosenberg, W. J.; Title, A. M.

    1982-01-01

    A Solc (1965) filter configuration is presented which is both tunable and spectrally variable, since it possesses an adjustable bandwidth, and which although less efficient than a Lyot filter is attractive because of its spectral versatility. The lossless design, using only an entrance and exit polarizer, improves throughput generally and especially in the IR, where polarizers are less convenient than dichroic sheet polarizers. Attention is given to the transmission profiles of Solc filters with different numbers of elements and split elements, as well as their mechanical design features.

  13. Multilevel filtering elliptic preconditioners

    NASA Technical Reports Server (NTRS)

    Kuo, C. C. Jay; Chan, Tony F.; Tong, Charles

    1989-01-01

    A class of preconditioners is presented for elliptic problems built on ideas borrowed from the digital filtering theory and implemented on a multilevel grid structure. They are designed to be both rapidly convergent and highly parallelizable. The digital filtering viewpoint allows the use of filter design techniques for constructing elliptic preconditioners and also provides an alternative framework for understanding several other recently proposed multilevel preconditioners. Numerical results are presented to assess the convergence behavior of the new methods and to compare them with other preconditioners of multilevel type, including the usual multigrid method as preconditioner, the hierarchical basis method and a recent method proposed by Bramble-Pasciak-Xu.

  14. HEPA filter jointer

    SciTech Connect

    Hill, D.; Martinez, H.E.

    1998-02-01

    A HEPA filter jointer system was created to remove nitrate contaminated wood from the wooden frames of HEPA filters that are stored at the Rocky Flats Plant. A commercial jointer was chosen to remove the nitrated wood. The chips from the wood removal process are in the right form for caustic washing. The jointer was automated for safety and ease of operation. The HEPA filters are prepared for jointing by countersinking the nails with a modified air hammer. The equipment, computer program, and tests are described in this report.

  15. Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene

    PubMed Central

    Omer, Shaza E.; Khalf-allah, Rahma M.; Mustafa, Razaz Y.; Ali, Isra S.; Mohamed, Sofia B.

    2016-01-01

    This study examined Homo sapiens simple polymorphisms (SNPs/indels) in the coding and non-coding regions of the BRAF gene. Variant data were obtained from the SNP database as of its last update in November 2015. Several bioinformatics tools were used to identify SNPs and indels with functional effects on protein function, structure and expression. For coding polymorphisms, the results showed 111 SNPs predicted to be highly damaging and six others predicted to be less damaging. For the UTRs, five SNPs and one indel altered microRNA binding sites (3′ UTR), whereas no SNP or indel functionally altered transcription factor binding sites (5′ UTR). For the 5′/3′ splice sites, the analysis showed that one SNP within the 5′ splice site and one indel in the 3′ splice site had the potential to alter splicing. In conclusion, these functional SNPs and indels could lead to gene alteration, which may contribute directly or indirectly to the occurrence of many diseases. PMID:27478437

  16. Genotypes, haplotypes and diplotypes of IGF-II SNPs and their association with growth traits in largemouth bass (Micropterus salmoides).

    PubMed

    Li, Xiaohui; Bai, Junjie; Hu, Yinchang; Ye, Xing; Li, Shengjie; Yu, Lingyun

    2012-04-01

    Insulin-like growth factor II (IGF-II) is involved in the regulation of somatic growth and metabolism in many fishes. IGF-II is an important candidate gene for growth traits in fishes and its polymorphisms were associated with the growth traits. The aim of this study is to screen single nucleotide polymorphisms (SNPs) of the largemouth bass (Micropterus salmoides) IGF-II gene and to analyze potential association between IGF-II gene polymorphisms and growth traits in largemouth bass. Four SNPs (C127T, T1012G, C1836T and C1861T) were detected and verified by DNA sequencing in the largemouth bass IGF-II gene. These SNPs were found to organize into seven haplotypes, which formed 13 diplotypes (haplotype pairs). Association analysis showed that four individual SNPs were not significantly associated with growth traits. Significant associations were, however, noted between diplotypes and growth traits (P < 0.05). The fish with H1H3 (CTCC/CGCC) and H1H5 (CTCC/TTTT) had greater body weight than those with H1H1 (CTCC/CTCC), H1H2 (CTCC/TGTT) and H4H4 (TGCT/TGCT) did. Our data suggest a significant association between genetic variations in the largemouth bass IGF-II gene and growth traits. IGF-II SNPs could be used as potential genetic markers in future breeding programs of largemouth bass. PMID:21894518
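
    As a minimal sketch of how diplotypes follow from haplotypes, the Python snippet below pairs phased haplotype strings into order-independent diplotype labels. The haplotype strings for H1 to H5 are the ones quoted in the abstract; the per-fish haplotype pairs are hypothetical, and `diplotype_label` and `max_diplotypes` are illustrative helpers rather than the authors' code.

```python
from collections import Counter

# Haplotype strings quoted in the abstract
# (alleles at C127T, T1012G, C1836T and C1861T, in that order).
HAPLOTYPES = {"CTCC": "H1", "TGTT": "H2", "CGCC": "H3", "TGCT": "H4", "TTTT": "H5"}


def diplotype_label(hap_a, hap_b):
    """Return an order-independent diplotype label such as 'H1H3'."""
    names = sorted((HAPLOTYPES[hap_a], HAPLOTYPES[hap_b]))
    return "".join(names)


def max_diplotypes(n_haplotypes):
    """Upper bound on distinct diplotypes: unordered pairs, repeats allowed."""
    return n_haplotypes * (n_haplotypes + 1) // 2


# Hypothetical phased haplotype pairs for four fish (illustration only).
fish = [("CTCC", "CGCC"), ("CGCC", "CTCC"), ("CTCC", "CTCC"), ("TGCT", "TGCT")]
counts = Counter(diplotype_label(a, b) for a, b in fish)
```

    Seven haplotypes allow at most 28 distinct diplotypes, so the 13 reported in the abstract are the subset actually observed in the sample.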

  17. SNPs in the aryl hydrocarbon receptor-interacting protein gene associated with sporadic non-functioning pituitary adenoma

    PubMed Central

    HU, YESHUAI; YANG, JUN; CHANG, YONGKAI; MA, SHUNCHANG; QI, JIANFA

    2016-01-01

    Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene have previously been associated with a predisposition to pituitary adenomas. However, to the best of our knowledge, mutations in AIP that relate specifically to sporadic non-functioning pituitary adenomas (NFPAs) have yet to be reported. Therefore, the present study aimed to identify single nucleotide polymorphisms (SNPs) in the AIP gene that may be associated with NFPAs. Peripheral blood samples and the entire coding sequence of the AIP gene from 56 patients with NFPAs and 56 controls were analyzed in triplicate. Of the 56 patients with NFPAs, 9 patients (16.1%) were identified as harboring five different SNPs, although no germline mutations in the AIP gene were detected in any of the patients. Three different SNPs (7051C>T, 8012G>C and 8020G>C) were identified in exons 4 and 6 in 3 different patients (each in 1 patient). Two different SNPs (7318C>A and 7886A>G) were identified in exons 5 and 6, respectively, in 6 different patients (each in 3 patients). No SNPs or germline mutations in the AIP gene were identified in the controls. The results of the present study suggested that mutations in the AIP gene might not have an important role in the tumorigenesis of NFPAs. However, further studies are required in order to investigate potential molecular and genetic mechanisms that may underlie the involvement of AIP in NFPA. PMID:26998050

  19. A Mismatch EndoNuclease Array-Based Methodology (MENA) for Identifying Known SNPs or Novel Point Mutations

    PubMed Central

    Comeron, Josep M.; Reed, Jordan; Christie, Matthew; Jacobs, Julia S.; Dierdorff, Jason; Eberl, Daniel F.; Manak, J. Robert

    2016-01-01

    Accurate and rapid identification or confirmation of single nucleotide polymorphisms (SNPs), point mutations and other human genomic variation facilitates understanding the genetic basis of disease. We have developed a new methodology, MENA (Mismatch EndoNuclease Array), that pairs DNA mismatch endonuclease enzymology with tiling microarray hybridization in order both to genotype known point mutations (such as SNPs) and to identify previously undiscovered point mutations and small indels. We show that our assay can rapidly genotype known SNPs in a human genomic DNA sample with 99% accuracy, in addition to identifying novel point mutations and small indels with a false discovery rate as low as 10%. Our technology provides a platform for a variety of applications, including: (1) genotyping known SNPs as well as confirming newly discovered SNPs from whole genome sequencing analyses; (2) identifying novel point mutations and indels in any genomic region from any organism for which genome sequence information is available; and (3) screening panels of genes associated with particular diseases and disorders in patient samples to identify causative mutations. As a proof of principle for using MENA to discover novel mutations, we report identification of a novel allele of the beethoven (btv) gene in Drosophila, which encodes a ciliary cytoplasmic dynein motor protein important for auditory mechanosensation. PMID:27600073

  20. A Mismatch EndoNuclease Array-Based Methodology (MENA) for Identifying Known SNPs or Novel Point Mutations.

    PubMed

    Comeron, Josep M; Reed, Jordan; Christie, Matthew; Jacobs, Julia S; Dierdorff, Jason; Eberl, Daniel F; Manak, J Robert

    2016-01-01

    Accurate and rapid identification or confirmation of single nucleotide polymorphisms (SNPs), point mutations and other human genomic variation facilitates understanding the genetic basis of disease. We have developed a new methodology, MENA (Mismatch EndoNuclease Array), that pairs DNA mismatch endonuclease enzymology with tiling microarray hybridization in order both to genotype known point mutations (such as SNPs) and to identify previously undiscovered point mutations and small indels. We show that our assay can rapidly genotype known SNPs in a human genomic DNA sample with 99% accuracy, in addition to identifying novel point mutations and small indels with a false discovery rate as low as 10%. Our technology provides a platform for a variety of applications, including: (1) genotyping known SNPs as well as confirming newly discovered SNPs from whole genome sequencing analyses; (2) identifying novel point mutations and indels in any genomic region from any organism for which genome sequence information is available; and (3) screening panels of genes associated with particular diseases and disorders in patient samples to identify causative mutations. As a proof of principle for using MENA to discover novel mutations, we report identification of a novel allele of the beethoven (btv) gene in Drosophila, which encodes a ciliary cytoplasmic dynein motor protein important for auditory mechanosensation. PMID:27600073

  1. Mapping the genetic variation of regional brain volumes as explained by all common SNPs from the ADNI study.

    PubMed

    Bryant, Christopher; Giovanello, Kelly S; Ibrahim, Joseph G; Chang, Jing; Shen, Dinggang; Peterson, Bradley S; Zhu, Hongtu

    2013-01-01

    Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs.

  2. ICSNPathway: identify candidate causal SNPs and pathways from genome-wide association study by one analytical framework.

    PubMed

    Zhang, Kunlin; Chang, Suhua; Cui, Sijia; Guo, Liyuan; Zhang, Liuyan; Wang, Jing

    2011-07-01

    Genome-wide association studies (GWAS) are widely used to identify genes involved in human complex diseases or other traits. One key challenge in GWAS data interpretation is to identify causal SNPs and provide strong evidence of how they affect the trait. Current research focuses on identifying candidate causal variants among the most significant GWAS SNPs, but offers little support on the biological mechanisms represented by pathways. Although pathway-based analysis (PBA) was designed to identify disease-related pathways by analyzing the full list of SNPs from a GWAS, it does not emphasize the interpretation of causal SNPs. To our knowledge, no web server has so far been available to address this challenge of GWAS data interpretation within one analytical framework. ICSNPathway was developed to identify candidate causal SNPs and their corresponding candidate causal pathways from GWAS by integrating linkage disequilibrium (LD) analysis, functional SNP annotation and PBA. ICSNPathway provides a feasible way to bridge the gap between GWAS and the study of disease mechanisms by generating hypotheses of the form SNP → gene → pathway(s). The ICSNPathway server is freely available at http://icsnpathway.psych.ac.cn/. PMID:21622953

  3. Theodore E. Woodward Award: lactase persistence SNPs in African populations regulate promoter activity in intestinal cell culture.

    PubMed

    Sibley, Eric; Ahn, Jong Kun

    2011-01-01

    Lactase-phlorizin hydrolase, lactase, is the intestinal enzyme responsible for the digestion of the milk sugar lactose. The majority of the world's human population experiences a decline in expression of the lactase gene by late childhood (lactase non-persistence). Individuals with lactase persistence, however, continue to express high levels of the lactase gene throughout adulthood. Lactase persistence is a heritable autosomal dominant condition and has been strongly correlated with several single nucleotide polymorphisms (SNPs) located ∼14 kb upstream of the lactase gene in different ethnic populations: -13910*T in Europeans and -13907*G, -13915*G, and -14010*C in several African populations. The coincidence of the four SNPs clustering within 100 bp strongly suggests that this region mediates the lactase non-persistence/persistence phenotype. Having previously characterized the European SNP, we aimed to determine whether the African SNPs similarly mediate a functional role in regulating the lactase promoter. Human intestinal Caco-2 cells were transfected with lactase SNP/promoter-reporter constructs and assayed for promoter activity. The -13907*G and -13915*G SNPs result in a significant enhancement of lactase promoter activity relative to the ancestral lactase non-persistence genotype. Such differential regulation by the SNPs is consistent with a causative role in the mechanism specifying the lactase persistence phenotype.

  4. Identification of Pyrus single nucleotide polymorphisms (SNPs) and evaluation for genetic mapping in European pear and interspecific Pyrus hybrids.

    PubMed

    Montanari, Sara; Saeed, Munazza; Knäbel, Mareike; Kim, YoonKyeong; Troggio, Michela; Malnoy, Mickael; Velasco, Riccardo; Fontana, Paolo; Won, KyungHo; Durel, Charles-Eric; Perchepied, Laure; Schaffer, Robert; Wiedow, Claudia; Bus, Vincent; Brewer, Lester; Gardiner, Susan E; Crowhurst, Ross N; Chagné, David

    2013-01-01

    We have used new generation sequencing (NGS) technologies to identify single nucleotide polymorphism (SNP) markers from three European pear (Pyrus communis L.) cultivars and subsequently developed a subset of 1096 pear SNPs into high throughput markers by combining them with the set of 7692 apple SNPs on the IRSC apple Infinium® II 8K array. We then evaluated this apple and pear Infinium® II 9K SNP array for large-scale genotyping in pear across several species, using both pear and apple SNPs. The segregating populations employed for array validation included a segregating population of European pear ('Old Home'×'Louise Bon Jersey') and four interspecific breeding families derived from Asian (P. pyrifolia Nakai and P. bretschneideri Rehd.) and European pear pedigrees. In total, we mapped 857 polymorphic pear markers to construct the first SNP-based genetic maps for pear, comprising 78% of the total pear SNPs included in the array. In addition, 1031 SNP markers derived from apple (13% of the total apple SNPs included in the array) were polymorphic and were mapped in one or more of the pear populations. These results are the first to demonstrate SNP transferability across the genera Malus and Pyrus. Our construction of high density SNP-based and gene-based genetic maps in pear represents an important step towards the identification of chromosomal regions associated with a range of horticultural characters, such as pest and disease resistance, orchard yield and fruit quality.

  5. Effect of DISC1 SNPs on brain structure in healthy controls and patients with a history of psychosis.

    PubMed

    Kähler, Anna K; Rimol, Lars M; Brown, Andrew Anand; Djurovic, Srdjan; Hartberg, Cecilie B; Melle, Ingrid; Dale, Anders M; Andreassen, Ole A; Agartz, Ingrid

    2012-09-01

    Disrupted-in-Schizophrenia-1 (DISC1) has been suggested as a susceptibility locus for a broad spectrum of psychiatric disorders. Risk variants have been associated with brain structural changes, which overlap alterations reported in schizophrenia and bipolar disorder patients. We used genome-wide genotyping data for a Norwegian sample of healthy controls (n = 171) and patients with a history of psychosis (n = 184), to investigate 61 SNPs in the DISC1 region for putative association with structural magnetic resonance imaging (sMRI) measures (hippocampal volume; mean cortical thickness; and total surface area, as well as cortical thickness and area divided into four lobar measures). SNP rs821589 was associated with mean temporal and total brain cortical thickness in controls (P(adjusted) = 0.009 and 0.02, respectively), but not in patients. SNPs rs11122319 and rs1417584 were associated with mean temporal cortical thickness in patients (P(adjusted) = 0.04 and 0.03, respectively), but not in controls, and both SNPs have previously been highly associated with DISC1 gene expression. There were significant genotype × case-control interactions. There was no significant association between SNPs and cortical area or hippocampal volume in controls, or with any of the structural measures in cases, after correction for multiple comparisons. In conclusion, DISC1 SNPs might impact brain structural variation, possibly differently in psychosis patients versus controls, but independent replication will be needed to confirm our findings. PMID:22815203

  6. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs, which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs), each having 3,072 SNPs, were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate technology, genotyping data for 5,041 SNPs were generated and combined with data for 1,673 markers from previously published studies to generate a high-resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM, with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated by improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721
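
    The reported marker spacing can be checked with a short calculation. The sketch below simply divides the map length by the marker count, both taken from the abstract; the function name is illustrative, and the abstract does not say whether the authors divided by markers or by intervals between markers.

```python
def mean_intermarker_distance(map_length_cm, n_markers):
    """Average spacing as total map length divided by the number of markers."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return map_length_cm / n_markers


# Figures from the abstract: 6,698 markers spanning 1083.93 cM.
spacing = mean_intermarker_distance(1083.93, 6698)
print(f"{spacing:.2f} cM")
```

    Dividing instead by the number of intervals (6,698 markers minus one per each of the eight linkage groups) gives the same value to two decimal places.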

  7. Identification of novel single nucleotide polymorphisms (SNPs) in deer (Odocoileus spp.) using the BovineSNP50 BeadChip.

    PubMed

    Haynes, Gwilym D; Latch, Emily K

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the F(ST)-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1-30.1 million years before present).

  8. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea.

    PubMed

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs, which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs), each having 3,072 SNPs, were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate technology, genotyping data for 5,041 SNPs were generated and combined with data for 1,673 markers from previously published studies to generate a high-resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM, with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated by improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties.

  9. In silico analysis of consequences of non-synonymous SNPs of Slc11a2 gene in Indian bovines.

    PubMed

    Patel, Shreya M; Koringa, Prakash G; Reddy, Bhaskar B; Nathani, Neelam M; Joshi, Chaitanya G

    2015-09-01

    The aim of our study was to analyze the consequences of non-synonymous SNPs in the Slc11a2 gene using bioinformatic tools. There is a current need for efficient bioinformatic tools for in-depth analysis of data generated by next-generation sequencing technologies. SNPs are known to play an important role in understanding the genetic basis of many genetic diseases. Slc11a2 belongs to one of the major metal transporter families in mammals and plays a critical role in host defenses. In this study, we performed a comprehensive analysis of the impact of all non-synonymous SNPs in this gene using multiple tools: SIFT, PROVEAN, I-Mutant and PANTHER. Of the 124 SNPs obtained from amplicon sequencing of the Slc11a2 gene on the Ion Torrent PGM in 10 individuals each of Gir cattle and Murrah buffalo, 22 were non-synonymous. Comparing the predictions of these four methods, five nsSNPs (G369R, Y374C, A377V, Q385H and N492S) were identified as deleterious. In addition, when these five were tested for polar interactions with other amino acids in the protein, Y374C, Q385H and N492S showed a change in interaction pattern, which was further supported by an increase in total energy after energy minimization of the mutant protein compared with the native protein. PMID:26484229

  10. Polymorphic L1 retrotransposons are frequently in strong linkage disequilibrium with neighboring SNPs.

    PubMed

    Higashino, Saneyuki; Ohno, Tomoyuki; Ishiguro, Koichi; Aizawa, Yasunori

    2014-05-10

    L1 retrotransposons have been the major driver of structural variation of the human genome. L1 insertion polymorphism (LIP)-mediated genomic variation can alter the transcriptome and contribute to the divergence of human phenotypes. To assess this possibility, a genome-wide association study (GWAS) including LIPs is required. Toward this ultimate goal, the present study examined linkage disequilibrium between six LIPs and their neighboring single nucleotide polymorphisms (SNPs). Genomic PCR and sequencing of L1-plus and -minus alleles from different donors revealed that all six LIPs were in strong linkage disequilibrium with at least one SNP. In addition, comparison of syntenic regions containing the identified SNP nucleotides was performed among modern humans (L1-plus and -minus alleles), archaic humans and non-human primates, revealing two different evolutionary schemes that might have resulted in the observed strong SNP-LIP linkage disequilibria. This study provides an experimental framework and guidance for a future SNP-LIP integrative GWAS.
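
    For readers unfamiliar with how linkage disequilibrium between a LIP and a SNP is quantified, the sketch below computes the standard D and r² statistics for two biallelic loci from phased haplotypes. The abstract does not state which LD statistic the authors used, so this is an assumption; the 0/1 coding and the example haplotypes are hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci.

    `haplotypes` is a list of (a, b) tuples with alleles coded 0/1,
    e.g. a = 1 for the L1-plus allele and b = 1 for one SNP allele.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical sample: the L1-plus allele always travels with SNP allele 1.
linked = [(1, 1)] * 4 + [(0, 0)] * 6
# Hypothetical sample with all four haplotypes equally frequent.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

d_linked, r2_linked = ld_stats(linked)
d_unlinked, r2_unlinked = ld_stats(unlinked)
```

    An r² near 1 means the SNP genotype is a near-perfect proxy for the insertion, which is what would let SNP-based GWAS data tag a LIP.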

  11. Identification of predominant SNPs as a novel method for genotyping bovine Staphylococcus aureus isolates.

    PubMed

    Hall, Jeffrey W; Ji, Yinduo

    2012-01-01

    Staphylococcus aureus is a formidable pathogen of both humans and animals. Infection often gives rise to an economic loss resulting from the extended cost of treatment and hospitalization for humans, and from the loss of usable agricultural animal products from infected animals and the cost of treatment regimens. We describe here a protocol for the amplification and sequencing of predominant single nucleotide polymorphisms within the promoter region of hla (encoding α-toxin) that confer a hyper-producing α-toxin phenotype on S. aureus isolates associated with chronic and severe bovine mastitis infections. We validated our findings with a second round of analysis, confirming the SNPs as a valid genotypic marker for α-toxin hyper-producing bovine isolates. The identification of highly virulent isolates will allow for aggressive treatment of the infection and limit the disease and economic impact. With readily available reagents and facilities, this protocol can be completed in as little as 72 h once samples are isolated. PMID:22286701

  12. Active-R filter

    DOEpatents

    Soderstrand, Michael A.

    1976-01-01

    An operational amplifier-type active filter in which the only capacitor in the circuit is the compensating capacitance of the operational amplifiers, the various feedback and coupling elements being essentially solely resistive.

  13. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
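
    The overlap-and-save step mentioned above can be sketched as follows. This shows only the textbook overlap-and-save FIR filter with a single DFT-IDFT pair, not the subconvolution architecture of the report; the naive O(N²) DFT and all function names are illustrative choices made to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT), adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def overlap_save(signal, taps, block_len):
    """Filter `signal` with FIR `taps` using DFT blocks of length `block_len`."""
    m = len(taps)
    step = block_len - (m - 1)                 # new input samples consumed per block
    if step <= 0:
        raise ValueError("block_len must exceed len(taps) - 1")
    h_freq = dft(list(taps) + [0.0] * (block_len - m))
    padded = [0.0] * (m - 1) + list(signal)    # prime the overlap region with zeros
    out = []
    for start in range(0, len(signal), step):
        block = padded[start:start + block_len]
        block += [0.0] * (block_len - len(block))      # zero-pad the final block
        y = dft([a * b for a, b in zip(dft(block), h_freq)], inverse=True)
        out.extend(v.real for v in y[m - 1:])          # discard the aliased samples
    return out[:len(signal)]


def direct_fir(signal, taps):
    """Reference time-domain convolution, truncated to len(signal)."""
    return [sum(taps[j] * signal[i - j] for j in range(len(taps)) if i - j >= 0)
            for i in range(len(signal))]


sig = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
taps = [0.5, 0.25, 0.25]
fast = overlap_save(sig, taps, block_len=4)
ref = direct_fir(sig, taps)
```

    In this plain form the block length must exceed the filter length minus one, which ties the DFT size to the filter order; the abstract describes removing that coupling by splitting the filter into frequency-domain subfilters.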

  14. HEPA air filter (image)

    MedlinePlus

    ... pet dander and other irritating allergens from the air. Along with other methods to reduce allergens, such ... controlling the amount of allergens circulating in the air. HEPA filters can be found in most air ...

  15. Improved optical filter

    NASA Technical Reports Server (NTRS)

    Title, A. M.

    1978-01-01

    Filter includes partial polarizer between birefringent elements. Plastic film on partial polarizer compensates for any polarization rotation by partial polarizer. Two quarter-wave plates change incident, linearly polarized light into elliptically polarized light.

  16. Allelic Spectra of Risk SNPs Are Different for Environment/Lifestyle Dependent versus Independent Diseases

    PubMed Central

    Amos, Christopher I.

    2015-01-01

    Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented in the group of minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) than in the group of major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning “environment” or “lifestyle” AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while the frequency distributions of the risk alleles for diseases with weak environmental/lifestyle influences are shifted toward lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when the environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for environment/lifestyle-dependent diseases will follow a neutral model, since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that prediction of SNP functionality based on the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle-dependent diseases. PMID:26201053
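
    The ELI definition above amounts to a simple ratio. The sketch below implements it as stated in the abstract; the function name and the publication counts are hypothetical and are not taken from the study.

```python
def environment_lifestyle_index(n_env_and_disease, n_disease):
    """ELI: publications mentioning ("environment" or "lifestyle") AND the disease,
    expressed per 1,000 publications that mention the disease."""
    if n_disease <= 0:
        raise ValueError("n_disease must be positive")
    return 1000.0 * n_env_and_disease / n_disease


# Hypothetical counts: 150 of 12,000 disease-mentioning papers
# also mention "environment" or "lifestyle".
eli = environment_lifestyle_index(150, 12000)
```

    With these made-up counts the index is 12.5; under the abstract's reasoning, a higher ELI marks a disease with a stronger environmental or lifestyle component.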

  17. SNPs and breast cancer risk prediction for African American and Hispanic women.

    PubMed

    Allman, Richard; Dite, Gillian S; Hopper, John L; Gordon, Ora; Starlard-Davenport, Athena; Chlebowski, Rowan; Kooperberg, Charles

    2015-12-01

    For African American or Hispanic women, the extent to which clinical breast cancer risk prediction models are improved by including information on susceptibility single nucleotide polymorphisms (SNPs) is unknown, even though these women comprise increasing proportions of the US population and represent a large proportion of the world's population. We studied 7539 African American and 3363 Hispanic women from the Women's Health Initiative. The age-adjusted 5-year risks from the BCRAT and IBIS risk prediction models were measured and combined with a risk score based on >70 independent susceptibility SNPs. Logistic regression, adjusting for age group, was used to estimate risk associations with log-transformed age-adjusted 5-year risks. Discrimination was measured by the odds ratio (OR) per standard deviation (SD) and the area under the receiver operating characteristic curve (AUC). When considered alone, the ORs for African American women were 1.28 for BCRAT and 1.04 for IBIS. When combined with the SNP risk score (OR 1.23), the corresponding ORs were 1.39 and 1.22. For Hispanic women the corresponding ORs were 1.25 for BCRAT and 1.15 for IBIS. When combined with the SNP risk score (OR 1.39), the corresponding ORs were 1.48 and 1.42. There was no evidence that any of the combined models were not well calibrated. Including information on known breast cancer susceptibility loci provides approximately 10 and 19% improvement in risk prediction using BCRAT for African Americans and Hispanics, respectively. The corresponding figures for IBIS are approximately 18 and 26%, respectively.

  18. SNPs in transporter and metabolizing genes as predictive markers for oxaliplatin treatment in colorectal cancer patients.

    PubMed

    Kap, Elisabeth J; Seibold, Petra; Scherer, Dominique; Habermann, Nina; Balavarca, Yesilda; Jansen, Lina; Zucknick, Manuela; Becker, Natalia; Hoffmeister, Michael; Ulrich, Alexis; Benner, Axel; Ulrich, Cornelia M; Burwinkel, Barbara; Brenner, Hermann; Chang-Claude, Jenny

    2016-06-15

    Oxaliplatin is frequently used as part of a chemotherapeutic regimen with 5-fluorouracil in the treatment of colorectal cancer (CRC). The cellular availability of oxaliplatin is dependent on metabolic and transporter enzymes. Variants in genes encoding these enzymes may cause variation in response to oxaliplatin and could be potential predictive markers. Therefore, we used a two-step procedure to comprehensively investigate 1,444 single nucleotide polymorphisms (SNPs) from these pathways for their potential as predictive markers for oxaliplatin treatment, using 623 stage II-IV CRC patients (of whom 201 patients received oxaliplatin) from a German prospective patient cohort treated with adjuvant or palliative chemotherapy. First, all genes were screened using the global test that evaluated SNP*oxaliplatin interaction terms per gene. Second, one model was created by backward elimination on all SNP*oxaliplatin interactions of the selected genes. The statistical procedure was evaluated using bootstrap analyses. Nine genes differentially associated with overall survival according to oxaliplatin treatment (unadjusted p values < 0.05) were selected. Model selection resulted in the inclusion of 14 SNPs from eight genes (six transporter genes, ABCA9, ABCB11, ABCC10, ATP1A1, ATP1B2, ATP8B3, and two metabolism genes GSTM5, GRHPR), which significantly improved model fit. Using bootstrap analysis we show an improvement of the prediction error of 3.7% in patients treated with oxaliplatin. Several variants in genes involved in metabolism and transport could thus be potential predictive markers for oxaliplatin treatment in CRC patients. If confirmed, inclusion of these variants in a predictive test could identify patients who are more likely to benefit from treatment with oxaliplatin. PMID:26835885

  19. SNPs and breast cancer risk prediction for African American and Hispanic women.

    PubMed

    Allman, Richard; Dite, Gillian S; Hopper, John L; Gordon, Ora; Starlard-Davenport, Athena; Chlebowski, Rowan; Kooperberg, Charles

    2015-12-01

    For African American or Hispanic women, the extent to which clinical breast cancer risk prediction models are improved by including information on susceptibility single nucleotide polymorphisms (SNPs) is unknown, even though these women comprise increasing proportions of the US population and represent a large proportion of the world's population. We studied 7539 African American and 3363 Hispanic women from the Women's Health Initiative. The age-adjusted 5-year risks from the BCRAT and IBIS risk prediction models were measured and combined with a risk score based on >70 independent susceptibility SNPs. Logistic regression, adjusting for age group, was used to estimate risk associations with log-transformed age-adjusted 5-year risks. Discrimination was measured by the odds ratio (OR) per standard deviation (SD) and the area under the receiver operator curve (AUC). When considered alone, the ORs for African American women were 1.28 for BCRAT, and 1.04 for IBIS. When combined with the SNP risk score (OR 1.23), the corresponding ORs were 1.39 and 1.22. For Hispanic women the corresponding ORs were 1.25 for BCRAT, and 1.15 for IBIS. When combined with the SNP risk score (OR 1.39), the corresponding ORs were 1.48 and 1.42. There was no evidence that any of the combined models were not well calibrated. Including information on known breast cancer susceptibility loci provides approximately 10 and 19% improvement in risk prediction using BCRAT for African Americans and Hispanics, respectively. The corresponding figures for IBIS are approximately 18 and 26%, respectively. PMID:26589314

  1. Common SNPs explain some of the variation in the personality dimensions of neuroticism and extraversion

    PubMed Central

    Vinkhuyzen, A A E; Pedersen, N L; Yang, J; Lee, S H; Magnusson, P K E; Iacono, W G; McGue, M; Madden, P A F; Heath, A C; Luciano, M; Payton, A; Horan, M; Ollier, W; Pendleton, N; Deary, I J; Montgomery, G W; Martin, N G; Visscher, P M; Wray, N R

    2012-01-01

    The personality traits of neuroticism and extraversion are predictive of a number of social and behavioural outcomes and psychiatric disorders. Twin and family studies have reported moderate heritability estimates for both traits. Few associations have been reported between genetic variants and neuroticism/extraversion, but hardly any have been replicated. Moreover, the ones that have been replicated explain only a small proportion of the heritability (<∼2%). Using genome-wide single-nucleotide polymorphism (SNP) data from ∼12 000 unrelated individuals we estimated the proportion of phenotypic variance explained by variants in linkage disequilibrium with common SNPs as 0.06 (s.e.=0.03) for neuroticism and 0.12 (s.e.=0.03) for extraversion. In an additional series of analyses in a family-based sample, we show that while for both traits ∼45% of the phenotypic variance can be explained by pedigree data (that is, expected genetic similarity), one third of this can be explained by SNP data (that is, realized genetic similarity). A part of the so-called 'missing heritability' has now been accounted for, but some of the reported heritability is still unexplained. Possible explanations for the remaining missing heritability are that: (i) rare variants that are not captured by common SNPs on current genotype platforms make a major contribution; and/or (ii) the estimates of narrow sense heritability from twin and family studies are biased upwards, for example, by not properly accounting for nonadditive genetic factors and/or (common) environmental factors. PMID:22832902

  2. [Construction and function identification of luciferase reporter gene vectors containing SNPs in NFKBIA gene 3'UTR].

    PubMed

    Yang, Shuo; Li, Jia-li; Bi, Hui-chang; Zhou, Shou-ning; Liu, Xiao-man; Zeng, Hang; Hu, Bing-fang; Huang, Min

    2016-01-01

    This study investigates the function of two SNPs (rs8904C > T and rs696G > A) in the 3' untranslated region (3'UTR) of the NFKBIA gene by constructing luciferase reporter gene vectors. A patient's genomic DNA with the rs8904 CC and rs696 GA genotypes was used as the PCR template. The full-length 3'UTR of the NFKBIA gene was amplified with different primers. After sequencing validation, the fragments were inserted into the luciferase reporter vector pGL3-promoter to construct recombinant plasmids containing four haplotypes: pGL3-rs8904C/rs696G, pGL3-rs8904C/rs696A, pGL3-rs8904T/rs696G and pGL3-rs8904T/rs696A. These plasmids were then transfected into LS174T cells and the luciferase activity was measured. Compared with pGL3-vector transfected cells (negative control), the luciferase activity of all four recombinant plasmids was significantly decreased (P < 0.001). For rs696G > A, the luciferase activity of the recombinant plasmids containing the A allele (pGL3-rs8904C/rs696A and pGL3-rs8904T/rs696A) was about 45.1% (P < 0.05) and 56.1% (P < 0.001) lower than that of those containing the G allele (pGL3-rs8904C/rs696G and pGL3-rs8904T/rs696G), respectively. For rs8904C > T, there were no significant differences in luciferase activity between the recombinant plasmids containing the T allele and those containing the C allele. Together, the luciferase reporter gene vectors containing SNPs in the NFKBIA gene 3'UTR were constructed successfully; rs696G > A decreased luciferase activity, while rs8904C > T had little effect on it. PMID:27405166

  4. Novel SNPs in the PRDM16 gene and their associations with performance traits in chickens.

    PubMed

    Han, Ruili; Wei, Yang; Kang, Xiangtao; Chen, Hong; Sun, Guirong; Li, Guoxi; Bai, Yichun; Tian, Yadong; Huang, Yanqun

    2012-03-01

    The PR domain containing 16 (PRDM16) is a member of the Prdm family, and is known to regulate cell differentiation. In the present study, DNA pool sequencing methods were employed to screen genetic variations in the chicken PRDM16 gene. The results revealed four novel single nucleotide polymorphisms (SNPs): NC_006108.2: g.92188G>A, XM_417551: c.1161C>T (Ala/Ala, 387aa), c.1233C>T (Ser/Ser, 411aa) and c.1433G>A (Ser/Asn, 478aa). The BglI polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) was used to detect c.1161C>T, while HhaI Forced PCR-RFLP methods were used to detect c.1233C>T and c.1433G>A in 964 chickens. The chickens comprised 38 grandparents, 66 F(1) parents and 860 F(2) birds derived from an F(2) resource population of Gushi chickens crossed with Anka broilers. The associations of the polymorphisms in the chicken PRDM16 gene with performance traits were analyzed in the 860 F(2) chickens. The results indicated that the three SNPs were significantly associated with growth, fatness and meat quality traits in the chickens. In particular, the polymorphisms of the missense SNP (c.1433G>A) had positive effects on chicken body weight and body size at different stages. It also significantly affected fatness traits. Comparison of the different genotypes of c.1433G>A showed that the GG genotype favored chicken growth and fatness traits. PMID:21761141

  5. Anti-Glare Filters

    NASA Technical Reports Server (NTRS)

    1989-01-01

    Glare from CRT screens has been blamed for blurred vision, eyestrain, headaches, etc. Optical Coating Laboratory, Inc. (OCLI) manufactures a coating to reduce glare which was used to coat the windows on the Gemini and Apollo spacecraft. In addition, OCLI offers anti-glare filters (Glare Guard) utilizing the same thin film coating technology. The coating minimizes brightness, provides enhanced contrast and improves readability. The filters are OCLI's first consumer product.

  6. Spatial filter issues

    SciTech Connect

    Murray, J.E.; Estabrook, K.G.; Milam, D.; Sell, W.D.; Van Wonterghem, R.M.; Feil, M.D.; Rubenchick, A.M.

    1996-12-09

    Experiments and calculations indicate that the threshold pressure in spatial filters for distortion of a transmitted pulse scales approximately as $I^{0.2}$ and $(F\#)^{2}$ over the intensity range from $10^{14}$ to $2\times10^{15}$ W/cm$^{2}$. We also demonstrated an interferometric diagnostic that will be used to measure the scaling relationships governing pinhole closure in spatial filters.
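
    As a rough illustration of the scaling quoted in this abstract, the sketch below assumes the threshold pressure is proportional to the product of the intensity raised to the power 0.2 and the square of the F-number. The abstract does not state that product form, and the function name `threshold_pressure_ratio` is hypothetical.

```python
def threshold_pressure_ratio(intensity_ratio, f_number_ratio=1.0):
    """Relative change in threshold pressure under the assumed power law.

    Assumes P_threshold is proportional to I**0.2 * (F#)**2, so only ratios
    of intensity and F-number are needed, not absolute values.
    """
    return intensity_ratio ** 0.2 * f_number_ratio ** 2


# Intensity range quoted in the abstract: 1e14 to 2e15 W/cm^2 (a factor of 20).
across_range = threshold_pressure_ratio(2e15 / 1e14)

# Doubling the F-number at fixed intensity.
doubled_f_number = threshold_pressure_ratio(1.0, 2.0)

print(f"factor across the intensity range: {across_range:.2f}")
print(f"factor for a doubled F-number: {doubled_f_number:.1f}")
```

    Under this reading, the weak exponent means a 20-fold increase in intensity raises the threshold pressure by a factor of only about 1.8, whereas doubling the F-number raises it fourfold.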

  7. Holographic interference filters

    NASA Astrophysics Data System (ADS)

    Diehl, Damon W.

    Holographic mirrors have wavelength-selection properties and thus qualify as a class of interference filters. Two theoretical methods for analyzing such structures are developed. The first method uses Hill's matrix method to yield closed-form solutions in terms of the Floquet-Bloch waves within a periodic structure. A process is developed for implementing this solution method on a computer, using sparse-matrix memory allocation, numerical root-finding algorithms, and inverse-iteration techniques. It is demonstrated that Hill's matrix method is valid for the analysis of finite and multi-periodic problems. The second method of theoretical analysis is a transfer-matrix technique, which is herein termed thin-film decomposition. It is shown that the two methods of solution yield results that differ by, at worst, a fraction of a percent. Using both calculation techniques, a number of example problems are explored. Of key importance is the construction of a set of curves that are useful for the design and characterization of holographic interference filters. In addition to the theoretical development, methods are presented for the fabrication of holographic interference filters using DuPont HRF-800X001 photopolymer. Central to the exposure system is a frequency-stabilized, tunable dye laser. The types of filters fabricated include single-tone reflection filters, two types of multitone reflection filters, and reflection filters for infrared wavelengths. These filters feature index profiles that are not easily attainable through other fabrication methods. As a supplement to the body of the dissertation, the computer algorithms developed to implement Hill's matrix method and thin-film decomposition are also included as an appendix. Further appendices provide more information on Floquet's theorem and Hill's matrix method. A final appendix presents a design for an infrared laser spectrophotometer.

  8. The identification of trans-associations between prostate cancer GWAS SNPs and RNA expression differences in tumor-adjacent stroma

    PubMed Central

    Chen, Xin; McClelland, Michael; Jia, Zhenyu; Rahmatpanah, Farah B.; Sawyers, Anne; Trent, Jeffrey; Duggan, David; Mercola, Dan

    2015-01-01

    Here we tested the hypothesis that SNPs associated with prostate cancer risk might differentially affect RNA expression in prostate cancer stroma. The most significant 35 SNP loci were selected from Genome Wide Association (GWA) studies of ~40,000 patients. We also selected 4030 transcripts previously associated with prostate cancer diagnosis and prognosis. eQTL analysis was carried out by a modified BAYES method to analyze the associations between the risk variants and expressed transcripts jointly in a single model. We observed 47 significant associations between eight risk variants and the expression patterns of 46 genes. This is the first study to identify associations between multiple SNPs and multiple in trans gene expression differences in cancer stroma. Potentially, a combination of SNPs and associated expression differences in prostate stroma may increase the power of risk assessment for individuals, and for cancer progression. PMID:25638161

  9. Contactor/filter improvements

    DOEpatents

    Stelman, David

    1989-01-01

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. The housing further includes a gas inlet means, a gas outlet means, and means for moving a body of granular material through the zone. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. Disposed on the upstream face of the filter element is a cover screen which isolates the filter element from contact with the moving granular bed and collects a portion of the particulates so as to form a dust cake having openings small enough to exclude the granular material, yet large enough to receive the dust particles. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses.

  10. NICMOS Filter Wheel Test

    NASA Astrophysics Data System (ADS)

    Wheeler, Thomas

    2009-07-01

    This is an engineering test {described in SMOV4 Activity Description NICMOS-04} to verify the aliveness, functionality, operability, and electro-mechanical calibration of the NICMOS filter wheel motors and assembly after NCS restart in SMOV4. This test has been designed to obviate concerns over possible deformation or breakage of the filter wheel "soda-straw" shafts due to excess rotational drag torque and/or bending moments which may be imparted due to changes in the dewar metrology from warm-up/cool-down. This test should be executed after the NCS {and filter wheel housing} has reached and approximately equilibrated to its nominal operating temperature. Addition of visits G0 - G9 {9/9/09}: Ten visits copied from proposal 11868 {visits 20, 30, ..., 90, A0, B0}. Each visit moves two filter positions, takes lamp ON/OFF exposures and then moves back to the blank position. Visits G0, G1 and G2 will leave the filter wheels disabled. The remaining visits will leave the filter wheels enabled. The in-between times are sufficient to allow for data download and analysis. If a problem is encountered, the filter wheels will be disabled through a real-time command. The in-between times are all set to 22-50 hours. It is preferable to keep the in-between times as short as possible.

  11. Remotely serviced filter and housing

    DOEpatents

    Ross, Maurice J.; Zaladonis, Larry A.

    1988-09-27

    A filter system for a hot cell comprises a housing adapted for input of air or other gas to be filtered, flow of the air through a filter element, and exit of filtered air. The housing is tapered at the top to make it easy to insert a filter cartridge using an overhead crane. The filter cartridge holds the filter element while the air or other gas is passed through the filter element. Captive bolts in trunnion nuts are readily operated by electromechanical manipulators operating power wrenches to secure and release the filter cartridge. The filter cartridge is adapted to make it easy to change a filter element by using a master-slave manipulator at a shielded window station.

  12. [Microchip electrophoresis coupled with multiplex allele-specific amplification for typing multiple single nucleotide polymorphisms (SNPs) simultaneously].

    PubMed

    Wang, Wei-Peng; Zhou, Guo-Hua

    2009-02-01

    A new method of DNA adapter ligation-mediated allele-specific amplification (ALM-ASA) was developed for typing multiple single nucleotide polymorphisms (SNPs) on the platform of microchip electrophoresis. Using seven SNPs of 794C>T, 1274C>T, 2143T>C, 2766T>del, 3298G>A, 5200G>A, and 5277C>T in the interleukin 1B (IL1B) gene as a target object, a long DNA fragment containing the seven SNPs of interest was pre-amplified to enhance the specificity. The pre-amplified DNA fragment was digested by a restriction endonuclease to form sticky ends; and then the adapter was ligated to either end of the digested fragment. Using the adapter-ligated fragments as templates, a 7-plex allele-specific amplification was performed by 7 allele-specific primers and a universal primer in one tube. The allele-specific products amplified were separated by chip electrophoresis and the types of SNPs were easily discriminated by the product sizes. The seven SNPs in IL1B gene in 48 healthy Chinese were successfully typed by microchip electrophoresis and the results coincided with those by PCR-restriction fragment length polymorphism and sequencing method. The method established was accurate and can be used to type multiple SNPs simultaneously. In combination with microchip electrophoresis for readout, ALM-ASA assay can be used for fast SNP detection with a small amount of sample. Using self-prepared gel matrix and reused chips for analysis, the SNP can be typed at an ultra low cost.

  13. Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

    NASA Astrophysics Data System (ADS)

    Manago, K. F.; Hogue, T. S.; Hering, A. S.

    2014-12-01

    In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. Due to ongoing drought in California, increased reliance on local water supply highlights the need for better understanding of regional groundwater dynamics and estimating sustainable groundwater supply. However, in an urban setting, such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data is irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data is interpolated to create spatial groundwater maps utilizing information from all wells. The groundwater data is evaluated at a monthly time step over the last several decades to analyze the effect of land cover and identify other influencing factors on groundwater levels spatially and temporally. Preliminary results show irrigated parks have the largest influence on groundwater fluctuations, resulting in large seasonal changes, exceeding changes in spreading grounds. It is assumed that these fluctuations are caused by watering practices required to sustain non-native vegetation. Conversely, high intensity urbanized areas resulted in muted groundwater fluctuations and behavior decoupling from climate patterns. These results provide improved understanding of anthropogenic effects on groundwater levels in addition to providing high quality datasets for validation of regional groundwater models.

  14. A global view of 54,001 single nucleotide polymorphisms (SNPs) on the Illumina BovineSNP50 BeadChip and their transferability to water buffalo.

    PubMed

    Michelizzi, Vanessa N; Wu, Xiaolin; Dodson, Michael V; Michal, Jennifer J; Zambrano-Varon, Jorge; McLean, Derek J; Jiang, Zhihua

    2010-01-01

    The Illumina BovineSNP50 BeadChip features 54,001 informative single nucleotide polymorphisms (SNPs) that uniformly span the entire bovine genome. Among them, 52,255 SNPs have locations assigned in the current genome assembly (Btau_4.0), including 19,294 (37%) intragenic SNPs (i.e., located within genes) and 32,961 (63%) intergenic SNPs (i.e., located between genes). While the SNPs represented on the Illumina Bovine50K BeadChip are evenly distributed along each bovine chromosome, there are over 14,000 genes that have no SNPs placed on the current BeadChip. Kernel density estimation, a non-parametric method, was used in the present study to identify SNP-poor and SNP-rich regions on each bovine chromosome. With bandwidth = 0.05 Mb, we observed that most regions have SNP densities within 2 standard deviations of the chromosome SNP density mean. The SNP density on chromosome X was the most dynamic, with more than 30 SNP-rich regions and at least 20 regions with no SNPs. Genotyping ten water buffalo using the Illumina BovineSNP50 BeadChip revealed that 41,870 of the 54,001 SNPs are fully scored on all ten water buffalo, but 6,771 SNPs are partially scored on one to nine animals. Both fully scored SNPs and partially scored or unscored SNPs are clearly clustered, in clusters of various sizes, on each chromosome. However, among 43,687 bovine SNPs that were successfully genotyped on nine and ten water buffalo, only 1,159 were polymorphic in the species. These results indicate that the SNP sites, but not the polymorphisms, are conserved between the two species. Overall, our present study provides a solid foundation to further characterize the SNP evolutionary process, thus improving understanding of within- and between-species biodiversity, phylogenetics and adaptation to environmental changes.
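
    The density scan described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the Gaussian kernel, the evaluation grid, the use of the 2-standard-deviation band as a classification rule, and the names `gaussian_kde` and `classify_regions` are all assumptions. Only the 0.05 Mb bandwidth and the 2-standard-deviation reference come from the abstract.

```python
import math
from statistics import mean, pstdev


def gaussian_kde(positions_mb, grid_mb, bandwidth_mb=0.05):
    """Gaussian kernel density estimate of SNP positions at each grid point."""
    norm = 1.0 / (len(positions_mb) * bandwidth_mb * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth_mb) ** 2)
                   for p in positions_mb)
        for g in grid_mb
    ]


def classify_regions(density, n_sd=2.0):
    """Label each grid point relative to the chromosome-wide mean density."""
    mu, sd = mean(density), pstdev(density)
    labels = []
    for d in density:
        if d > mu + n_sd * sd:
            labels.append("rich")      # unusually many SNPs nearby
        elif d < mu - n_sd * sd:
            labels.append("poor")      # unusually few SNPs nearby
        else:
            labels.append("typical")
    return labels


# Toy chromosome: one SNP every 0.1 Mb plus a dense cluster at 5.0 Mb.
grid = [i / 10 for i in range(101)]
positions = grid + [5.0] * 50
labels = classify_regions(gaussian_kde(positions, grid))
print([g for g, lab in zip(grid, labels) if lab == "rich"])
```

    On this toy input only the grid point at the cluster is flagged as SNP-rich. A real analysis would take positions from the chip annotation and would probably evaluate the density on a finer grid than the bandwidth.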

  15. Edge-Aware BMA Filters.

    PubMed

    Guang Deng

    2016-01-01

    There has been continuous research in edge-aware filters which have found many applications in computer vision and image processing. In this paper, we propose a principled approach for the development of edge-aware filters. The proposed approach is based on two well-established principles: 1) optimal parameter estimation and 2) Bayesian model averaging (BMA). Using this approach, we formulate the problem of filtering a pixel in a local pixel patch as an optimal estimation problem. Since a pixel belongs to multiple local patches, there are multiple estimates of the same pixel. We combine these estimates into a final estimate using BMA. We demonstrate the versatility of this approach by developing a family of BMA filters based on different settings of cost functions and log-likelihood and log-prior functions. We also present a new interpretation of the guided filter and develop a BMA guided filter which includes the guided filter as a special case. We show that BMA filters can produce smoothing results similar to those of the state-of-the-art edge-aware filters. Two BMA filters are computationally as efficient as the guided filter, which is one of the fastest edge-aware filters. We also demonstrate that the BMA guided filter is better than the guided filter in preserving sharp edges. A new feature of the BMA guided filter is that the filtered image is similar to that produced by a clustering process.

  16. Association, characterisation and meta-analysis of SNPs linked to general reading ability in a German dyslexia case-control cohort.

    PubMed

    Müller, Bent; Wilcke, Arndt; Czepezauer, Ivonne; Ahnert, Peter; Boltze, Johannes; Kirsten, Holger

    2016-01-01

    Dyslexia is a severe disorder in the acquisition of reading and writing. Several studies investigated the role of genetics for reading, writing and spelling ability in the general population. However, many of the identified SNPs were not analysed in case-control cohorts. Here, we investigated SNPs previously linked to reading or spelling ability in the general population in a German case-control cohort. Furthermore, we characterised these SNPs for functional relevance with in silico methods and meta-analysed them with previous studies. A total of 16 SNPs within five genes were included. The total number of risk alleles was higher in cases than in controls. Three SNPs were nominally associated with dyslexia: rs7765678 within DCDC2, and rs2038137 and rs6935076 within KIAA0319. The relevance of rs2038137 and rs6935076 was further supported by the meta-analysis. Functional profiling included analysis of tissue-specific expression, annotations for regulatory elements and effects on gene expression levels (eQTLs). Thereby, we found molecular mechanistical implications for 13 of all 16 included SNPs. SNPs associated in our cohort showed stronger gene-specific eQTL effects than non-associated SNPs. In summary, our results validate SNPs previously linked to reading and spelling in the general population in dyslexics and provide insights into their putative molecular pathomechanisms. PMID:27312598

  17. Association, characterisation and meta-analysis of SNPs linked to general reading ability in a German dyslexia case-control cohort

    PubMed Central

    Müller, Bent; Wilcke, Arndt; Czepezauer, Ivonne; Ahnert, Peter; Boltze, Johannes; Kirsten, Holger; Friederici, Angela D.; Emmrich, Frank; Brauer, Jens; Wilcke, Arndt; Neef, Nicole; Boltze, Johannes; Skeide, Michael; Kirsten, Holger; Schaadt, Gesa; Müller, Bent; Kraft, Indra; Czepezauer, Ivonne; Dörr, Liane

    2016-01-01

    Dyslexia is a severe disorder in the acquisition of reading and writing. Several studies investigated the role of genetics for reading, writing and spelling ability in the general population. However, many of the identified SNPs were not analysed in case-control cohorts. Here, we investigated SNPs previously linked to reading or spelling ability in the general population in a German case-control cohort. Furthermore, we characterised these SNPs for functional relevance with in silico methods and meta-analysed them with previous studies. A total of 16 SNPs within five genes were included. The total number of risk alleles was higher in cases than in controls. Three SNPs were nominally associated with dyslexia: rs7765678 within DCDC2, and rs2038137 and rs6935076 within KIAA0319. The relevance of rs2038137 and rs6935076 was further supported by the meta-analysis. Functional profiling included analysis of tissue-specific expression, annotations for regulatory elements and effects on gene expression levels (eQTLs). Thereby, we found molecular mechanistical implications for 13 of all 16 included SNPs. SNPs associated in our cohort showed stronger gene-specific eQTL effects than non-associated SNPs. In summary, our results validate SNPs previously linked to reading and spelling in the general population in dyslexics and provide insights into their putative molecular pathomechanisms. PMID:27312598

  18. Remotely serviced filter and housing

    DOEpatents

    Ross, M.J.; Zaladonis, L.A.

    1987-07-22

    A filter system for a hot cell comprises a housing adapted for input of air or other gas to be filtered, flow of the air through a filter element, and exit of filtered air. The housing is tapered at the top to make it easy to insert a filter cartridge. The filter cartridge holds the filter element while the air or other gas is passed through the filter element. Captive bolts in trunnion nuts are readily operated by electromechanical manipulators operating power wrenches to secure and release the filter cartridge. The filter cartridge is adapted to make it easy to change a filter element by using a master-slave manipulator at a shielded window station. 6 figs.

  19. Anti-clogging filter system

    DOEpatents

    Brown, Erik P.

    2015-05-19

    An anti-clogging filter system for filtering a fluid containing large particles and small particles includes an enclosure with at least one individual elongated tubular filter element in the enclosure. The individual elongated tubular filter element has an internal passage, a closed end, an open end, and a filtering material in or on the individual elongated tubular filter element. The fluid travels through the open end of the elongated tubular filter element, through the internal passage, and through the filtering material. An anti-clogging element is positioned on or adjacent to the individual elongated tubular filter element and provides a fluid curtain that preferentially directs the large particles to one area of the filtering material, allowing the remainder of the filtering material to remain more efficient.

  20. An IIR median hybrid filter

    NASA Technical Reports Server (NTRS)

    Bauer, Peter H.; Sartori, Michael A.; Bryden, Timothy M.

    1992-01-01

    A new class of nonlinear filters, the so-called class of multidirectional infinite impulse response median hybrid filters, is presented and analyzed. The input signal is processed twice using a linear shift-invariant infinite impulse response filtering module: once with normal causality and a second time with inverted causality. The final output of the MIMH filter is the median of the two-directional outputs and the original input signal. Thus, the MIMH filter is a concatenation of linear filtering and nonlinear filtering (a median filtering module). Because of this unique scheme, the MIMH filter possesses many desirable properties which are both proven and analyzed (including impulse removal, step preservation, and noise suppression). A comparison to other existing median type filters is also provided.
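
    The scheme described in this abstract can be illustrated with a minimal sketch: filter the signal once forward (normal causality), once backward (inverted causality), then take the sample-wise median of the two filtered outputs and the original input. The first-order recursion y[n] = a*y[n-1] + (1-a)*x[n], the coefficient `a`, the initial-state choice, and the function names `iir_first_order` and `mimh_filter` are assumptions made for illustration only; the report's actual filter design is not given in the abstract.

```python
from statistics import median


def iir_first_order(x, a):
    """Causal first-order IIR low-pass: y[n] = a*y[n-1] + (1-a)*x[n].

    Assumed filter form for illustration; the state is initialised to the
    first sample so a constant signal passes through without a transient.
    """
    y = []
    prev = x[0]
    for sample in x:
        prev = a * prev + (1 - a) * sample
        y.append(prev)
    return y


def mimh_filter(x, a=0.5):
    """Sample-wise median of the causal output, the anti-causal output,
    and the original input (a sketch of the two-pass median hybrid idea)."""
    if not x:
        return []
    # Pass 1: normal causality (left to right).
    forward = iir_first_order(x, a)
    # Pass 2: inverted causality (filter the reversed signal, then reverse back).
    backward = iir_first_order(x[::-1], a)[::-1]
    # Nonlinear stage: median of the three candidates at each sample.
    return [median(triple) for triple in zip(forward, backward, x)]


if __name__ == "__main__":
    step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    spike = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
    print(mimh_filter(step))
    print(mimh_filter(spike))
```

    With this assumed first-order filter and a = 0.5, the step edge is returned unchanged, while the isolated spike is only attenuated (by the factor 1 - a) rather than removed entirely; the stronger impulse-removal property claimed in the abstract depends on the report's own filter design.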