Science.gov

Sample records for filtering snps imputed

  1. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

    PubMed Central

    Bigdeli, T. Bernard; Williamson, Vernell S.; Vladimirov, Vladimir I.; Riley, Brien P.; Fanous, Ayman H.; Bacanu, Silviu-Alin

    2015-01-01

    Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. Contact: dlee4@vcu.edu Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:26059716

  2. Genotype imputation efficiency in Nelore Cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotype imputation efficiency in Nelore cattle was evaluated in different scenarios of lower density (LD) chips, imputation methods and sets of animals to have their genotypes imputed. Twelve commercial and virtual custom LD chips with densities varying from 7K to 75K SNPs were tested. Customized L...

  3. SWEEP: A Tool for Filtering High-Quality SNPs in Polyploid Crops

    PubMed Central

    Clevenger, Josh P.; Ozias-Akins, Peggy

    2015-01-01

    High-throughput next-generation sequence-based genotyping and single nucleotide polymorphism (SNP) detection opens the door for emerging genomics-based breeding strategies such as genome-wide association analysis and genomic selection. In polyploids, SNP detection is confounded by a highly similar homeologous sequence where a polymorphism between subgenomes must be differentiated from a SNP. We have developed and implemented a novel tool called SWEEP: Sliding Window Extraction of Explicit Polymorphisms. SWEEP uses subgenome polymorphism haplotypes as contrast to identify true SNPs between genotypes. The tool is a single command script that calls a series of modules based on user-defined options and takes sorted/indexed bam files or vcf files as input. Filtering options are highly flexible and include filtering based on sequence depth, alternate allele ratio, and SNP quality on top of the SWEEP filtering procedure. Using real and simulated data we show that SWEEP outperforms current SNP filtering methods for polyploids. SWEEP can be used for high-quality SNP discovery in polyploid crops. PMID:26153076

  4. Imputation reliability on DNA biallelic markers for drug metabolism studies

    PubMed Central

    2012-01-01

    Background: Imputation is a statistical process used to predict genotypes of loci not directly assayed in a sample of individuals. Our goal is to measure the performance of imputation in predicting the genotype of the best known gene polymorphisms involved in drug metabolism using a common SNP array genotyping platform generally exploited in genome wide association studies. Methods: Thirty-nine (39) individuals were genotyped with both Affymetrix Genome Wide Human SNP 6.0 (AFFY) and Affymetrix DMET Plus (DMET) platforms. AFFY and DMET contain nearly 900,000 and 1,931 markers, respectively. We used a 1000 Genomes Pilot + HapMap 3 reference panel. Imputation was performed using the computer program Impute, version 2. SNPs contained in DMET, but not imputed, were analysed by studying markers around their chromosome regions. The efficacy of the imputation was measured by evaluating the number of successfully imputed SNPs (SSNPs). Results: The imputation predicted the genotypes of 654 SNPs not present in the AFFY array, but contained in the DMET array. Approximately 1000 SNPs were not annotated in the reference panel and therefore could not be directly imputed. After testing three different imputed genotype calling thresholds (IGCTs), we observed that imputation performs best at an IGCT value of 50%, with a rate of SSNPs (MAF > 0.05) of 85%. Conclusions: Most of the genes involved in drug metabolism can be imputed with high efficacy using standard genome-wide genotyping platforms and imputing procedures. PMID:23095502
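
    The imputed genotype calling threshold mentioned in this abstract can be illustrated with a minimal sketch. It assumes each imputed SNP comes with three posterior probabilities (for 0, 1 or 2 copies of the coded allele); the function names `call_genotype` and `call_rate` are hypothetical, and comparing with `>=` rather than `>` is an assumption, not something the abstract specifies.

```python
from typing import Optional, Sequence


def call_genotype(probs: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the most probable genotype (0, 1 or 2), or None if it is below the threshold."""
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def call_rate(all_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> float:
    """Fraction of genotypes that pass the calling threshold."""
    calls = [call_genotype(p, threshold) for p in all_probs]
    return sum(c is not None for c in calls) / len(calls)
```

    Raising the threshold leaves more genotypes uncalled but keeps only the more confident ones, which is the trade-off the three tested thresholds explore.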

  5. Artifact due to differential error when cases and controls are imputed from different platforms.

    PubMed

    Sinnott, Jennifer A; Kraft, Peter

    2012-01-01

    Including previously genotyped controls in a genome-wide association study can provide cost-savings, but can also create design biases. When cases and controls are genotyped on different platforms, the imputation needed to provide genome-wide coverage will introduce differential measurement error and may lead to false positives. We compared genotype frequencies of two healthy control groups from the Nurses' Health Study genotyped on different platforms [Affymetrix 6.0 (n = 1,672) and Illumina HumanHap550 (n = 1,038)]. Using standard imputation quality filters, we observed 9,841 single-nucleotide polymorphisms (SNPs) out of 2,347,809 (0.4%) significant at the 5 × 10^-8 level. We explored three methods for controlling for this Type I error inflation. One method was to remove platform effects using principal components; another was to restrict to SNPs of highest quality imputation; and a third was to genotype some controls alongside cases to exclude SNPs that are statistical artifact. The first method could not reduce the Type I error rate; the other two could dramatically reduce the error rate, although both required that a portion of SNPs be excluded from analysis. Ideally, the biases we describe would be eliminated at the design stage, by genotyping sufficient numbers of cases and controls on each platform. Researchers using imputation to combine samples genotyped on different platforms with severely unbalanced case-control ratios should be aware of the potential for inflated Type I error rates and apply appropriate quality filters. Every SNP found with genome-wide significance should be validated on another platform to verify that its significance is not an artifact of study design. PMID:21735171
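
    Restricting the analysis to the best-imputed SNPs usually amounts to a threshold filter on a per-SNP quality score. The following is a rough sketch of such a filter; the record fields `info` and `maf` and the cutoffs 0.8 and 0.01 are hypothetical illustrative choices, not the values used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SnpRecord:
    snp_id: str
    info: float  # imputation quality score in [0, 1] (hypothetical field)
    maf: float   # minor allele frequency


def filter_snps(
    records: Iterable[SnpRecord],
    min_info: float = 0.8,   # illustrative cutoff, an assumption
    min_maf: float = 0.01,   # illustrative cutoff, an assumption
) -> List[SnpRecord]:
    """Keep only SNPs whose quality score and allele frequency pass both cutoffs."""
    return [r for r in records if r.info >= min_info and r.maf >= min_maf]
```

    A stricter `min_info` removes more potential artifacts at the cost of excluding a larger share of SNPs, matching the trade-off the abstract reports.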

  6. Sequence Imputation of HPV16 Genomes for Genetic Association Studies

    PubMed Central

    Smith, Benjamin; Chen, Zigui; Reimers, Laura; van Doorslaer, Koenraad; Schiffman, Mark; DeSalle, Rob; Herrero, Rolando; Yu, Kai; Wacholder, Sholom; Wang, Tao; Burk, Robert D.

    2011-01-01

    Background Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. Methods A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. Results HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conclusions Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. 
The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16 pathogenicity. PMID:21731721

  7. Imputation of genotypes from low- to high-density genotyping platforms and implications for genomic selection.

    PubMed

    Berry, D P; Kearney, J F

    2011-06-01

    The objective of this study was to quantify the accuracy achievable from imputing genotypes from a commercially available low-density marker panel (2730 single nucleotide polymorphisms (SNPs) following edits) to a commercially available higher density marker panel (51 602 SNPs following edits) in Holstein-Friesian cattle using Beagle, a freely available software package. A population of 764 Holstein-Friesian animals born since 2006 were used as the test group to quantify the accuracy of imputation, all of which had genotypes for the high-density panel; only SNPs on the low-density panel were retained with the remaining SNPs to be imputed. The reference population for imputation consisted of 4732 animals born before 2006 also with genotypes on the higher density marker panel. The concordance between the actual and imputed genotypes in the test group of animals did not vary across chromosomes and was on average 95%; the concordance between actual and imputed alleles was, on average, 97% across all SNPs. Genomic predictions were undertaken across a range of production and functional traits for the 764 test group animals using either their real or imputed genotypes. Little or no mean difference in the genomic predictions was evident when comparing direct genomic values (DGVs) using real or imputed genotypes. The average correlation between the DGVs estimated using the real or imputed genotypes for the 15 traits included in the Irish total merit index was 0.97 (range of 0.92 to 0.99), indicating good concordance between proofs from real or imputed genotypes. Results show that a commercially available high-density marker panel can be imputed from a commercially available lower density marker panel, which will also have a lower cost, thereby facilitating a reduction in the cost of genomic selection. 
Increased available numbers of genotyped and phenotyped animals also has implications for increasing the accuracy of genomic prediction in the entire population and thus genetic gain using genomic selection. PMID:22440168
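
    The two concordance figures reported in this abstract (genotype-level and allele-level) can be computed along the following lines. This is a sketch of one common definition, assuming genotypes coded as 0, 1 or 2 copies of an allele; the study itself may have computed them differently, and the function names are hypothetical.

```python
from typing import Sequence


def genotype_concordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of genotypes imputed exactly right."""
    if len(true) != len(imputed) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def allele_concordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of alleles imputed right; a 0-vs-1 error still gets one of two alleles correct."""
    if len(true) != len(imputed) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    matched = sum(2 - abs(t - i) for t, i in zip(true, imputed))
    return matched / (2 * len(true))
```

    Under this definition allele concordance is never lower than genotype concordance, which is consistent with the 97% versus 95% reported.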

  8. Genotype imputation via matrix completion

    PubMed Central

    Chi, Eric C.; Zhou, Hua; Chen, Gary K.; Del Vecchyo, Diego Ortega; Lange, Kenneth

    2013-01-01

    Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546

  9. Accuracy of imputation of single nucleotide polymorphism marker genotypes from low-density panels in Japanese Black cattle.

    PubMed

    Ogawa, Shinichiro; Matsuda, Hirokazu; Taniguchi, Yukio; Watanabe, Toshio; Takasuga, Akiko; Sugimoto, Yoshikazu; Iwaisaki, Hiroaki

    2016-01-01

    Using target and reference fattened steer populations, the performance of genotype imputation using lower-density marker panels in Japanese Black cattle was evaluated. Population imputation was performed using BEAGLE software. Genotype information for approximately 40 000 single nucleotide polymorphism (SNP) markers by Illumina BovineSNP50 BeadChip was available, and imputation accuracy was assessed based on the average concordance rates of the genotypes, varying equally spaced SNP densities, and the number of individuals in the reference population. Two additional statistics were also calculated as indicators of imputation performance. The concordance rates tended to be lower for SNPs with greater minor allele frequencies, or those located near the ends of the chromosomes. Longer autosomes yielded greater imputation accuracies than shorter ones. When SNPs were selected based on linkage disequilibrium information, relative imputation accuracy was slightly improved. When 3000 and 10 000 equally spaced SNPs were used, the imputation accuracies were greater than 90% and approximately 97%, respectively. These results indicate that combining genotyping using a lower-density SNP chip with genotype imputation based on a population of individuals genotyped using a higher-density SNP chip is a cost-effective and valid approach for genomic prediction. PMID:26032028

  10. Multiple imputation with multivariate imputation by chained equation (MICE) package

    PubMed Central

    2016-01-01

    Multiple imputation (MI) is an advanced technique for handling missing values. It is superior to single imputation in that it takes into account the uncertainty in missing-value imputation. However, MI is underutilized in the medical literature due to lack of familiarity and computational challenges. The article provides a step-by-step approach to performing MI using the R multivariate imputation by chained equation (MICE) package. The procedure first imputes m complete datasets by calling the mice() function. Statistical analyses such as univariate analysis and regression modelling can then be performed within each dataset by calling the with() function, which sets the environment for statistical analysis. Lastly, the results obtained from each analysis are combined using the pool() function. PMID:26889483
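
    The final pooling step combines the m per-dataset results using Rubin's rules. The sketch below shows that arithmetic in Python rather than R, purely for illustration; it is not code from the mice package, and the function name `pool_estimates` is hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_estimates(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

    The between-imputation term is what single imputation ignores: when the m estimates disagree, the total variance grows to reflect the uncertainty about the missing values.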

  11. [Jurisdiction and imputability].

    PubMed

    Tapiador Sanjuán, M J

    2004-12-01

    The validity, efficacy and responsibility of acts depend on the intelligence and will of the acting subject; when these are reduced or debilitated, the acts may be declared invalid and their author not responsible for them. Some neurological pathologies may generate permanent physical and/or psychic deficiencies that prevent subjects from acting on their own. For these cases, the law establishes the state of incapacity in order to protect the disabled person and supplement the reduced ability, guaranteeing their rights and security. Incapacity is determined by a court ruling, which states the lack of ability to manage and sets the extent and limits of the incapacity; its level is proportional to the degree of insight. Similarly, a subject suffering a pathological condition that invalidates his/her will and intelligence will be considered not responsible and not imputable, since there is no capacity for culpability. The Penal Code establishes the criteria that determine imputability or its absence, as well as modifying circumstances. PMID:15719288

  12. Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.

    PubMed

    Bolormaa, S; Gore, K; van der Werf, J H J; Hayes, B J; Daetwyler, H D

    2015-10-01

    Genotyping sheep for genome-wide SNPs at lower density and imputing to a higher density would enable cost-effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low-density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50-475kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single-breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. 
The mean GBLUP accuracies with imputed 50k data more than doubled to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. There was no apparent impact on accuracy of GEBVs as a result of using imputed rather than real 50k genotypes, provided imputation accuracy was >90%. PMID:26360638

  13. Design of a bovine low-density SNP array optimized for imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where de...

  14. The utility of low-density genotyping for imputation in the Thoroughbred horse

    PubMed Central

    2014-01-01

    Background Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. Results Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels, suggests that a 2K SNP panel would represent good value for money. Conclusions Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. 
In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy. PMID:24495673

  15. Local Exome Sequences Facilitate Imputation of Less Common Variants and Increase Power of Genome Wide Association Studies

    PubMed Central

    Joshi, Peter K.; Prendergast, James; Fraser, Ross M.; Huffman, Jennifer E.; Vitart, Veronique; Hayward, Caroline; McQuillan, Ruth; Glodzik, Dominik; Polašek, Ozren; Hastie, Nicholas D.; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Haley, Chris S.

    2013-01-01

    The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%. PMID:23874685

  16. Imputation-based population genetics analysis of Plasmodium falciparum malaria parasites.

    PubMed

    Samad, Hanif; Coll, Francesc; Preston, Mark D; Ocholla, Harold; Fairhurst, Rick M; Clark, Taane G

    2015-04-01

    Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r2 for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r2, 0.87-0.96), but the performance of IMPUTE was mixed (allelic r2, 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima's D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and association analyses, and supporting global surveillance for drug resistance markers and candidate vaccine antigens. PMID:25928499
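
    Allelic r2, the accuracy measure used in this abstract, is commonly taken to be the squared Pearson correlation between true and imputed alleles at a SNP. A minimal sketch under that assumption follows, with alleles coded 0/1 (or as imputed dosages between 0 and 1); the function name `allelic_r2` is hypothetical.

```python
from typing import Sequence


def allelic_r2(true_alleles: Sequence[float], imputed_alleles: Sequence[float]) -> float:
    """Squared Pearson correlation between true and imputed alleles at one SNP."""
    n = len(true_alleles)
    if n != len(imputed_alleles) or n < 2:
        raise ValueError("need two equal-length inputs with at least two samples")
    mean_t = sum(true_alleles) / n
    mean_i = sum(imputed_alleles) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_alleles, imputed_alleles))
    var_t = sum((t - mean_t) ** 2 for t in true_alleles)
    var_i = sum((i - mean_i) ** 2 for i in imputed_alleles)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either side is monomorphic
    return cov * cov / (var_t * var_i)
```

    Unlike a plain concordance rate, this measure is not inflated at rare SNPs, where always guessing the major allele would already agree with most true alleles.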

  17. PedBLIMP: extending linear predictors to impute genotypes in pedigrees.

    PubMed

    Chen, Wenan; Schaid, Daniel J

    2014-09-01

    Recently, Wen and Stephens (Wen and Stephens [2010] Ann Appl Stat 4(3):1158-1182) proposed a linear predictor, called BLIMP, that uses conditional multivariate normal moments to impute genotypes with accuracy similar to current state-of-the-art methods. One novelty is that it regularized the estimated covariance matrix based on a model from population genetics. We extended multivariate moments to impute genotypes in pedigrees. Our proposed method, PedBLIMP, utilizes both the linkage-disequilibrium (LD) information estimated from external panel data and the pedigree structure or identity-by-descent (IBD) information. The proposed method was evaluated on a pedigree design where some individuals were genotyped with dense markers and the rest with sparse markers. We found that incorporating the pedigree/IBD information can improve imputation accuracy compared to BLIMP. Because rare variants usually have low LD with other single-nucleotide polymorphisms (SNPs), incorporating pedigree/IBD information largely improved imputation accuracy for rare variants. We also compared PedBLIMP with IMPUTE2 and GIGI. Results show that when sparse markers are in a certain density range, our method can outperform both IMPUTE2 and GIGI. PMID:25044249
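
    The "conditional multivariate normal moments" behind a linear predictor of this kind are the standard formulas below. The notation is an assumption introduced here for illustration: the genotype vector is split into typed markers $x_t$ and untyped markers $x_u$, with mean $\mu$ and covariance $\Sigma$ estimated from panel data. How PedBLIMP folds pedigree/IBD information into $\Sigma$ is not described in the abstract and is not shown.

```latex
\begin{aligned}
\begin{pmatrix} x_u \\ x_t \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} \mu_u \\ \mu_t \end{pmatrix},
    \begin{pmatrix} \Sigma_{uu} & \Sigma_{ut} \\ \Sigma_{tu} & \Sigma_{tt} \end{pmatrix}
  \right), \\
\mathbb{E}[x_u \mid x_t] &= \mu_u + \Sigma_{ut}\,\Sigma_{tt}^{-1}\,(x_t - \mu_t), \\
\operatorname{Var}[x_u \mid x_t] &= \Sigma_{uu} - \Sigma_{ut}\,\Sigma_{tt}^{-1}\,\Sigma_{tu}.
\end{aligned}
```

    The conditional mean is linear in the observed genotypes $x_t$, which is why the method is called a linear predictor.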

  18. Design of a Bovine Low-Density SNP Array Optimized for Imputation

    PubMed Central

    Boichard, Didier; Chung, Hoyoung; Dassonneville, Romain; David, Xavier; Eggen, André; Fritz, Sébastien; Gietzen, Kimberly J.; Hayes, Ben J.; Lawley, Cynthia T.; Sonstegard, Tad S.; Van Tassell, Curtis P.; VanRaden, Paul M.; Viaud-Martinez, Karine A.; Wiggans, George R.

    2012-01-01

    The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where densities were increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. The BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations. The improvement was greatest when neither parent was genotyped. The minor allele frequencies were similar across taurine beef and dairy breeds as was the proportion of SNPs that were polymorphic. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle. PMID:22470530

  19. Japonica array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals

    PubMed Central

    Kawai, Yosuke; Mimori, Takahiro; Kojima, Kaname; Nariai, Naoki; Danjoh, Inaho; Saito, Rumiko; Yasuda, Jun; Yamamoto, Masayuki; Nagasaki, Masao

    2015-01-01

    The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659 253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF)>5%), the genomic coverage of the Japonica array (r2>0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%imputations. PMID:26108142

  20. Genotype Imputation with Millions of Reference Samples.

    PubMed

    Browning, Brian L; Browning, Sharon R

    2016-01-01

    We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle's throughput was more than 100× greater than Impute2's throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs. PMID:26748515
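
    The linear interpolation step can be pictured as follows: a probability computed at the two genotyped markers flanking an ungenotyped variant is interpolated according to the variant's position between them. This is only a schematic sketch of the idea, not Beagle's implementation; the function name, its parameters, and the use of a generic position (rather than any particular distance measure) are assumptions.

```python
def interpolate_state_prob(
    pos: float,
    left_pos: float,
    left_prob: float,
    right_pos: float,
    right_prob: float,
) -> float:
    """Linearly interpolate a probability between two flanking genotyped markers."""
    if not left_pos <= pos <= right_pos:
        raise ValueError("pos must lie between the flanking markers")
    if right_pos == left_pos:
        return left_prob
    weight = (pos - left_pos) / (right_pos - left_pos)
    return (1 - weight) * left_prob + weight * right_prob
```

    Because the expensive probability model is evaluated only at genotyped markers, each ungenotyped variant costs just one cheap interpolation like this.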

  1. 16 CFR 1115.11 - Imputed knowledge.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 16 Commercial Practices 2 2012-01-01 2012-01-01 false Imputed knowledge. 1115.11 Section 1115.11 Commercial Practices CONSUMER PRODUCT SAFETY COMMISSION CONSUMER PRODUCT SAFETY ACT REGULATIONS SUBSTANTIAL PRODUCT HAZARD REPORTS General Interpretation 1115.11 Imputed knowledge. (a) In evaluating whether or when a subject firm should...

  2. Fast imputation using medium- or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  3. Posterior predictive checking of multiple imputation models.

    PubMed

    Nguyen, Cattram D; Lee, Katherine J; Carlin, John B

    2015-07-01

    Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution. PMID:25939490
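
    As a rough illustration of the diagnostic, the sketch below computes a posterior predictive p-value from an observed test quantity and a list of replicated test quantities. It assumes the common definition (the share of replicates at least as large as the observed value), which the abstract does not spell out; the function name and the numbers are hypothetical.

```python
def posterior_predictive_p_value(observed_stat, replicated_stats):
    """Share of replicated test quantities that are >= the observed one.

    Values close to 0 or 1 mean the observed quantity sits in a tail of
    the replicate distribution, which hints at model misfit.
    """
    if not replicated_stats:
        raise ValueError("at least one replicate is required")
    n_at_least = sum(1 for r in replicated_stats if r >= observed_stat)
    return n_at_least / len(replicated_stats)


# Hypothetical test quantities from four replicated data sets.
p_typical = posterior_predictive_p_value(2.0, [1.0, 2.5, 3.0, 1.5])
p_tail = posterior_predictive_p_value(10.0, [1.0, 2.5, 3.0, 1.5])
```

    In practice the replicates would come from simulating data under the fitted imputation model; that step depends on the model and is left out of this sketch.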

  4. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies

    PubMed Central

    2014-01-01

    Background Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly-used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. Results In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. Conclusion GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep-sequencing, particularly for data from the dbGaP and other public databases. GACT software http://www.uvm.edu/genomics/software/gact PMID:25038819

  5. Enlargement of Traffic Information Coverage Area Using Selective Imputation of Floating Car Data

    NASA Astrophysics Data System (ADS)

    Kumagai, Masatoshi; Hiruta, Tomoaki; Fushiki, Takumi; Yokota, Takayoshi

    This paper discusses a real-time imputation method for sparse floating car data (FCD). Floating cars are an effective way to collect traffic information; however, because the number of floating cars is limited, FCD contains a large amount of missing data. To address this problem, we previously proposed a new imputation method based on feature space projection. The method consists of three major processes: (i) determination of a feature space from past FCD history; (ii) feature space projection of current FCD; and (iii) estimation of missing data by inverse projection from the feature space. Since estimation is performed on each feature space axis, which represents a spatially correlated component of FCD, the method imputes accurately and enlarges the information coverage area. However, differences in correlation among road-links sometimes cause a trade-off between accuracy and coverage. Therefore, we developed an additional function to filter out the road-links that have low correlation with the others. The function uses spectral factorization as a filtering index, which is suitable for evaluating correlation in the multidimensional feature space. Combined use of the imputation method and the filtering function decreases the maximum estimation error rate from 0.39 to 0.24, keeping 60% coverage area against sparse FCD of 15% observations.

  6. Dual imputation model for incomplete longitudinal data.

    PubMed

    Jolani, Shahab; Frank, Laurence E; van Buuren, Stef

    2014-05-01

    Missing values are a practical issue in the analysis of longitudinal data. Multiple imputation (MI) is a well-known likelihood-based method that has optimal properties in terms of efficiency and consistency if the imputation model is correctly specified. Doubly robust (DR) weighting-based methods protect against misspecification bias if one of the models, but not necessarily both, for the data or the mechanism leading to missing data is correct. We propose a new imputation method that captures the simplicity of MI and protection from the DR method. This method integrates MI and DR to protect against misspecification of the imputation model under a missing at random assumption. Our method avoids analytical complications of missing data particularly in multivariate settings, and is easy to implement in standard statistical packages. Moreover, the proposed method works very well with an intermittent pattern of missingness when other DR methods cannot be used. Simulation experiments show that the proposed approach achieves improved performance when one of the models is correct. The method is applied to data from the fireworks disaster study, a randomized clinical trial comparing therapies in disaster-exposed children. We conclude that the new method increases the robustness of imputations. PMID:23909566

  7. Comparison of imputation methods for missing laboratory data in medicine

    PubMed Central

    Waljee, Akbar K; Mukherjee, Ashin; Singal, Amit G; Zhang, Yiwei; Warren, Jeffrey; Balis, Ulysses; Marrero, Jorge; Zhu, Ji; Higgins, Peter DR

    2013-01-01

    Objectives Missing laboratory data is a common issue, but the optimal method of imputation of missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for missing completely at random laboratory data and to compare the effect of the imputed values on the accuracy of two clinical predictive models. Design Retrospective cohort analysis of two large data sets. Setting A tertiary level care institution in Ann Arbor, Michigan. Participants The Cirrhosis cohort had 446 patients and the Inflammatory Bowel Disease cohort had 395 patients. Methods Non-missing laboratory data were randomly removed with varying frequencies from two large data sets, and we then compared the ability of four methods (missForest, mean imputation, nearest neighbour imputation and multivariate imputation by chained equations (MICE)) to impute the simulated missing data. We characterised the accuracy of the imputation and the effect of the imputation on predictive ability in two large data sets. Results MissForest had the least imputation error for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction difference when models used imputed laboratory values. In both data sets, MICE had the second least imputation error and prediction difference, followed by the nearest neighbour and mean imputation. Conclusions MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models. PMID:23906948
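
    The evaluation design in this abstract (remove known values, impute them, compare against the truth) can be sketched as follows. Only mean imputation, the simplest of the four methods, is shown; the helper names, the toy values, and the choice of root-mean-square error as the error measure are assumptions made for illustration and are not taken from the study.

```python
import math


def mask_values(values, mask):
    """Replace entries flagged in mask with None to simulate missing data."""
    return [None if m else v for v, m in zip(values, mask)]


def mean_impute(values):
    """Fill each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a fully missing variable")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def rmse_on_masked(truth, imputed, mask):
    """Root-mean-square error computed only over the masked entries."""
    sq_errors = [(t - i) ** 2 for t, i, m in zip(truth, imputed, mask) if m]
    if not sq_errors:
        raise ValueError("mask selects no entries")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical laboratory values with one entry removed on purpose.
truth = [1.0, 2.0, 3.0, 4.0]
mask = [False, True, False, False]
imputed = mean_impute(mask_values(truth, mask))
error = rmse_on_masked(truth, imputed, mask)
```

    Swapping `mean_impute` for another method while keeping the same mask gives a like-for-like comparison of imputation error.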

  8. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  10. Mining SNPs From EST Databases

    PubMed Central

    Picoult-Newberg, Leslie; Ideker, Trey E.; Pohl, Mark G.; Taylor, Scott L.; Donaldson, Miriam A.; Nickerson, Deborah A.; Boyce-Jacino, Michael

    1999-01-01

    There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs. [The SNPs identified in this study can be found in the National Center of Biotechnology (NCBI) SNP database under submitter handles ORCHID (SNPS-981210-A) and debnick (SNPS-981209-A and SNPS-981209-B).] PMID:10022981

  11. JEPEGMIX: gene-level joint analysis of functional SNPs in cosmopolitan cohorts

    PubMed Central

    Lee, Donghyung; Williamson, Vernell S.; Bigdeli, T. Bernard; Riley, Brien P.; Webb, Bradley T.; Fanous, Ayman H.; Kendler, Kenneth S.; Vladimirov, Vladimir I.; Bacanu, Silviu-Alin

    2016-01-01

    Motivation: To increase detection power, gene level analysis methods are used to aggregate weak signals. To greatly increase computational efficiency, most methods use as input summary statistics from genome-wide association studies (GWAS). Subsequently, gene statistics are constructed using linkage disequilibrium (LD) patterns from a relevant reference panel. However, all methods, including our own Joint Effect on Phenotype of eQTL/functional single nucleotide polymorphisms (SNPs) associated with a Gene (JEPEG), assume homogeneous panels, e.g. European. This renders these tools unsuitable for the analysis of large cosmopolitan cohorts. Results: We propose a JEPEG extension, JEPEGMIX, which, similar to one of our software tools, Direct Imputation of summary STatistics of unmeasured SNPs from MIXed ethnicity cohorts, is capable of estimating accurate LD patterns for cosmopolitan cohorts. JEPEGMIX uses these accurate LD estimates to (i) impute the summary statistics at unmeasured functional variants and (ii) test for the joint effect of all measured and imputed functional variants which are associated with a gene. We illustrate the performance of our tool by analyzing the GWAS meta-analysis summary statistics from the multi-ethnic Psychiatric Genomics Consortium Schizophrenia stage 2 cohort. This practical application supports the immune system being one of the main drivers of the process leading to schizophrenia. Availability and implementation: Software, annotation database and examples are available at http://dleelab.github.io/jepegmix/. Contact: donghyung.lee@vcuhealth.org Supplementary information: Supplementary material is available at Bioinformatics online. PMID:26428293

  12. 16 CFR 1115.11 - Imputed knowledge.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Commercial Practices CONSUMER PRODUCT SAFETY COMMISSION CONSUMER PRODUCT SAFETY ACT REGULATIONS SUBSTANTIAL PRODUCT HAZARD REPORTS General Interpretation 1115.11 Imputed knowledge. (a) In evaluating whether or... firm to know what a reasonable person acting in the circumstances in which the firm finds itself...

  13. 16 CFR 1115.11 - Imputed knowledge.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Commercial Practices CONSUMER PRODUCT SAFETY COMMISSION CONSUMER PRODUCT SAFETY ACT REGULATIONS SUBSTANTIAL PRODUCT HAZARD REPORTS General Interpretation 1115.11 Imputed knowledge. (a) In evaluating whether or... firm to know what a reasonable person acting in the circumstances in which the firm finds itself...

  14. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    PubMed Central

    McClure, Matthew C.; Sonstegard, Tad S.; Wiggans, George R.; Van Eenennaam, Alison L.; Weber, Kristina L.; Penedo, Cecilia T.; Berry, Donagh P.; Flynn, John; Garcia, Jose F.; Carmo, Adriana S.; Regitano, Luciana C. A.; Albuquerque, Milla; Silva, Marcos V. G. B.; Machado, Marco A.; Coffey, Mike; Moore, Kirsty; Boscher, Marie-Yvonne; Genestout, Lucie; Mazza, Raffaele; Taylor, Jeremy F.; Schnabel, Robert D.; Simpson, Barry; Marques, Elisa; McEwan, John C.; Cromie, Andrew; Coutinho, Luiz L.; Kuehn, Larry A.; Keele, John W.; Piper, Emily K.; Cook, Jim; Williams, Robert; Van Tassell, Curtis P.

    2013-01-01

    To assist cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could accurately be imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes, by MS genotyping animals whose imputed MS alleles fail parentage verification, and then incorporating those animals into the reference dataset. PMID:24065982

  15. Multiple imputation for an incomplete covariate that is a ratio.

    PubMed

    Morris, Tim P; White, Ian R; Royston, Patrick; Seaman, Shaun R; Wood, Angela M

    2014-01-15

    We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using Winbugs were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. PMID:23922236

  16. A general efficient and flexible approach for genome-wide association analyses of imputed genotypes in family-based designs.

    PubMed

    Cobat, Aurélie; Abel, Laurent; Alcaïs, Alexandre; Schurr, Erwin

    2014-09-01

    Genotype imputation is a critical technique for following up genome-wide association studies. Efficient methods are available for dealing with the probabilistic nature of imputed single nucleotide polymorphisms (SNPs) in population-based designs, but not for family-based studies. We have developed a new analytical approach (FBATdosage), using imputed allele dosage in the general framework of family-based association tests to bridge this gap. Simulation studies showed that FBATdosage yielded highly consistent type I error rates, whatever the level of genotype uncertainty, and a much higher power than the best-guess genotype approach. FBATdosage allows fast linkage and association testing of several million imputed variants with binary or quantitative phenotypes in nuclear families of arbitrary size with arbitrary missing data for the parents. The application of this approach to a family-based association study of leprosy susceptibility successfully refined the association signal at two candidate loci, C1orf141-IL23R on chromosome 1 and RAB32-C6orf103 on chromosome 6. PMID:25044438
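
    The contrast this abstract draws between allele dosage and best-guess genotypes can be made concrete with a short sketch. It uses the standard definition of dosage as the expected count of the alternate allele given the three imputed genotype probabilities; the function names and the example probabilities are hypothetical, and nothing here reproduces FBATdosage itself.

```python
def allele_dosage(p_hom_ref, p_het, p_hom_alt):
    """Expected number of alternate alleles (a value between 0 and 2)."""
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_het + 2.0 * p_hom_alt


def best_guess_genotype(p_hom_ref, p_het, p_hom_alt):
    """Most probable genotype, coded as the alternate-allele count 0, 1 or 2."""
    probs = (p_hom_ref, p_het, p_hom_alt)
    return max(range(3), key=lambda g: probs[g])


# A confidently imputed genotype and an uncertain one (made-up probabilities).
confident = (0.1, 0.6, 0.3)
uncertain = (0.45, 0.40, 0.15)

dosage_confident = allele_dosage(*confident)
dosage_uncertain = allele_dosage(*uncertain)
guess_uncertain = best_guess_genotype(*uncertain)
```

    For the uncertain call, the best guess collapses to genotype 0, whereas the dosage keeps the information that an alternate allele is fairly likely.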

  17. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms

    PubMed Central

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-01-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960
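
    A simplified k-nearest-neighbour genotype imputation can be sketched as below. This is not LD-kNNi: the sketch measures distance over all markers observed in both samples, whereas the abstract indicates that the published method exploits informative markers distributed throughout the genome. Genotypes are coded 0, 1, 2 with None for missing, and all names, the weighting scheme, and the data are hypothetical.

```python
from collections import defaultdict


def knn_impute(genotypes, sample, marker, k=3):
    """Impute genotypes[sample][marker] from the k most similar samples.

    genotypes: list of per-sample lists with values 0, 1, 2 or None.
    Returns the genotype with the largest distance-weighted vote,
    or None when no usable neighbour exists.
    """
    target = genotypes[sample]
    scored = []
    for i, other in enumerate(genotypes):
        if i == sample or other[marker] is None:
            continue
        # Compare only markers observed in both samples, excluding the target marker.
        shared = [(a, b) for j, (a, b) in enumerate(zip(target, other))
                  if j != marker and a is not None and b is not None]
        if not shared:
            continue
        dist = sum(abs(a - b) for a, b in shared) / len(shared)
        scored.append((dist, i))

    scored.sort()
    votes = defaultdict(float)
    for dist, i in scored[:k]:
        # Closer neighbours contribute more to the vote.
        votes[genotypes[i][marker]] += 1.0 / (1.0 + dist)
    if not votes:
        return None
    # Iterating in sorted order makes ties resolve to the smallest genotype code.
    return max(sorted(votes), key=votes.get)


# Toy data: sample 0 is missing marker 2 and resembles samples 1 and 2.
data = [
    [0, 0, None, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [2, 2, 2, 2],
    [2, 2, 2, 1],
]
imputed_genotype = knn_impute(data, sample=0, marker=2, k=2)
```

    Note that the sketch needs no marker order or map, which is the property the abstract emphasises for nonmodel organisms.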

  18. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

    PubMed

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-11-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960

  19. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  20. Genotype Imputation with Thousands of Genomes

    PubMed Central

    Howie, Bryan; Marchini, Jonathan; Stephens, Matthew

    2011-01-01

    Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package. PMID:22384356
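
    The idea of choosing a custom reference panel by local sequence similarity can be sketched as a nearest-haplotype search. The sketch assumes haplotypes are given as equal-length strings of 0/1 alleles at the typed sites of one genomic window and ranks reference haplotypes by Hamming distance; the function names and data are hypothetical, and the actual IMPUTE2 procedure is more involved.

```python
def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def select_reference_haplotypes(study_hap, reference_haps, k):
    """Indices of the k reference haplotypes closest to study_hap.

    Ties are broken by the original panel order so the result is stable.
    """
    ranked = sorted(
        range(len(reference_haps)),
        key=lambda i: (hamming(study_hap, reference_haps[i]), i),
    )
    return ranked[:k]


# Toy window with four typed sites and three reference haplotypes.
panel = ["0000", "0011", "1111"]
chosen = select_reference_haplotypes("0001", panel, k=2)
```

    Repeating the selection per study haplotype and per region is what removes the need for a single, globally chosen reference subset.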

  1. Towards accurate imputation of quantitative genetic interactions

    PubMed Central

    2009-01-01

    Recent technological breakthroughs have enabled high-throughput quantitative measurements of hundreds of thousands of genetic interactions among hundreds of genes in Saccharomyces cerevisiae. However, these assays often fail to measure the genetic interactions among up to 40% of the studied gene pairs. Here we present a novel method, which combines genetic interaction data together with diverse genomic data, to quantitatively impute these missing interactions. We also present data on almost 190,000 novel interactions. PMID:20003301

  2. When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments?

    PubMed Central

    Ramnarine, Shelina; Zhang, Juan; Chen, Li-Shiun; Culverhouse, Robert; Duan, Weimin; Hancock, Dana B.; Hartz, Sarah M.; Johnson, Eric O.; Olfson, Emily; Schwantes-An, Tae-Hwi; Saccone, Nancy L.

    2015-01-01

    Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants. PMID:26458263
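
    Why concordance can flatter imputation of rare variants, and how a chance-corrected statistic behaves differently, can be shown with a small sketch. It computes Cohen's kappa on hard genotype calls, which is a simplification: the abstract describes IQS as using imputed genotype probabilities. The function names and the toy genotypes are hypothetical.

```python
def concordance(true_genotypes, imputed_genotypes):
    """Fraction of samples whose imputed genotype equals the true genotype."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, i in zip(true_genotypes, imputed_genotypes) if t == i)
    return matches / len(true_genotypes)


def cohens_kappa(true_genotypes, imputed_genotypes):
    """Agreement corrected for chance: (Po - Pe) / (1 - Pe)."""
    n = len(true_genotypes)
    p_observed = concordance(true_genotypes, imputed_genotypes)
    # Chance agreement: product of the marginal genotype frequencies, summed.
    categories = sorted(set(true_genotypes) | set(imputed_genotypes))
    p_chance = sum(
        (true_genotypes.count(c) / n) * (imputed_genotypes.count(c) / n)
        for c in categories
    )
    if p_chance == 1.0:
        return float("nan")  # kappa is undefined when only one category occurs
    return (p_observed - p_chance) / (1.0 - p_chance)


# Rare variant: 2 carriers among 100 samples, imputation calls everyone 0.
true_calls = [0] * 98 + [1] * 2
imputed_calls = [0] * 100

rare_concordance = concordance(true_calls, imputed_calls)
rare_kappa = cohens_kappa(true_calls, imputed_calls)
```

    In this toy case concordance is high even though no carrier was recovered, while kappa reports no agreement beyond chance.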

  3. When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments?

    PubMed

    Ramnarine, Shelina; Zhang, Juan; Chen, Li-Shiun; Culverhouse, Robert; Duan, Weimin; Hancock, Dana B; Hartz, Sarah M; Johnson, Eric O; Olfson, Emily; Schwantes-An, Tae-Hwi; Saccone, Nancy L

    2015-01-01

    Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants. PMID:26458263

  4. Comparison of SNPs and microsatellites for assessing the genetic structure of chicken populations.

    PubMed

    Gärke, C; Ytournel, F; Bed'hom, B; Gut, I; Lathrop, M; Weigend, S; Simianer, H

    2012-08-01

    Many studies in human genetics compare informativeness of single-nucleotide polymorphisms (SNPs) and microsatellites (simple sequence repeats; SSR) in genome scans, but it is difficult to transfer the results directly to livestock because of different population structures. The aim of this study was to determine the number of SNPs needed to obtain the same differentiation power as with a given standard set of microsatellites. Eight chicken breeds were genotyped for 29 SSRs and 9216 SNPs. After filtering, only 2931 SNPs remained. The differentiation power was evaluated using two methods: partitioning of the Euclidean distance matrix based on a principal component analysis (PCA) and a Bayesian model-based clustering approach. Generally, with PCA-based partitioning, 70 SNPs provide a comparable resolution to 29 SSRs. In model-based clustering, the similarity coefficient showed significantly higher values between repeated runs for SNPs compared to SSRs. For the membership coefficients, reflecting the proportion to which a fraction segment of the genome belongs to the ith cluster, the highest values were obtained for 29 SSRs and 100 SNPs respectively. With a low number of loci (29 SSRs or ≤100 SNPs), neither marker type could detect the admixture in the Gödöllő Nhx population. Using more than 250 SNPs allowed a more detailed insight into the genetic architecture. Thus, the admixed population could be detected. It is concluded that breed differentiation studies will substantially gain power even with moderate numbers of SNPs. PMID:22497629

  5. Mining SNPs from EST databases.

    PubMed

    Picoult-Newberg, L; Ideker, T E; Pohl, M G; Taylor, S L; Donaldson, M A; Nickerson, D A; Boyce-Jacino, M

    1999-02-01

    There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs. PMID:10022981

  6. On combining reference data to improve imputation accuracy.

    PubMed

    Chen, Jun; Zhang, Ji-Gang; Li, Jian; Pei, Yu-Fang; Deng, Hong-Wen

    2013-01-01

    Genotype imputation is an important tool in human genetics studies, which uses reference sets with known genotypes and prior knowledge on linkage disequilibrium and recombination rates to infer un-typed alleles for human genetic variations at a low cost. The reference sets used by current imputation approaches are based on HapMap data, and/or based on recently available next-generation sequencing (NGS) data such as data generated by the 1000 Genomes Project. However, with different coverage and call rates for different NGS data sets, how to integrate NGS data sets of different accuracy as well as previously available reference data as references in imputation is not an easy task and has not been systematically investigated. In this study, we performed a comprehensive assessment of three strategies on using NGS data and previously available reference data in genotype imputation for both simulated data and empirical data, in order to obtain guidelines for optimal reference set construction. Briefly, we considered three strategies: strategy 1 uses one NGS data set as a reference; strategy 2 imputes samples by using multiple individual data sets of different accuracy as independent references and then combines the imputed samples with samples based on the high accuracy reference selected when overlapping occurs; and strategy 3 combines multiple available data sets as a single reference after imputing each other. We used three software packages (MACH, IMPUTE2 and BEAGLE) for assessing the performances of these three strategies. Our results show that strategy 2 and strategy 3 have higher imputation accuracy than strategy 1. Particularly, strategy 2 is the best strategy across all the conditions that we have investigated, producing the best accuracy of imputation for rare variants. Our study is helpful in guiding application of imputation methods in next generation association analyses. PMID:23383238

  7. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted

  8. SNPs and forensic DNA typing.

    PubMed

    Allah, Rakha; Yang, Li; Li, Sheng-bin

    2007-10-01

    There is an increasing interest in single nucleotide polymorphism (SNP) typing in the forensic field. SNPs are very useful for defining Y chromosome or mtDNA haplotypes and DNA phenotyping. We focus on comparative advantages of SNP typing over length variations and expected number of loci required to gain probabilities equal to STR loci in use. This review also offers to the reader a state of the art of SNP genotyping technologies with the advantages and disadvantages of the different techniques and platforms for different forensic requirements. PMID:18175580

  9. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

    PubMed

    Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-03-15

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914

  10. A hybrid imputation approach for microarray missing value estimation

    PubMed Central

    2015-01-01

    Background: Missing data is an inevitable phenomenon in gene expression microarray experiments due to instrument failure or human error. It has a negative impact on performance of downstream analysis. Technically, most existing approaches suffer from this prevalent problem. Imputation is one of the frequently used methods for processing missing data. Many developments have been achieved in the research on estimating missing values. The challenging task is how to improve imputation accuracy for data with a large missing rate. Methods: In this paper, inspired by the idea of collaborative training, we propose a novel hybrid imputation method, called Recursive Mutual Imputation (RMI). Specifically, RMI exploits global correlation information and local structure in the data, captured by two popular methods, Bayesian Principal Component Analysis (BPCA) and Local Least Squares (LLS), respectively. The mutual strategy is implemented by sharing the estimated data sequences at each recursive process. Meanwhile, we consider the imputation sequence based on the number of missing entries in the target gene. Furthermore, a weight-based integrated method is utilized in the final assembling step. Results: We evaluate RMI against three state-of-the-art algorithms (BPCA, LLS, Iterated Local Least Squares imputation (ItrLLS)) on four publicly available microarray datasets. Experimental results clearly demonstrate that RMI significantly outperforms comparative methods in terms of Normalized Root Mean Square Error (NRMSE), especially for datasets with large missing rates and fewer complete genes. Conclusions: It is noted that our proposed hybrid imputation approach incorporates both global and local information of microarray genes, which achieves lower NRMSE values than either single approach alone. Besides, this study highlights the need for considering the imputing sequence of missing entries for imputation methods. PMID:26330180
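
    The NRMSE metric named in this abstract can be sketched as follows. The sketch assumes the definition commonly used in the microarray imputation literature, the root of the mean squared error over the artificially masked entries divided by the variance of their true values; the abstract itself does not state the formula, and the function name and toy values are hypothetical.

```python
import math


def nrmse(true_vals, imputed_vals):
    """Normalized RMSE over masked entries (assumed definition:
    sqrt(mean squared error / variance of the true values))."""
    n = len(true_vals)
    if n == 0 or n != len(imputed_vals):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    if var_true == 0:
        raise ValueError("true values have zero variance; NRMSE is undefined")
    return math.sqrt(mse / var_true)


# Toy masked entries: filling every gap with the overall mean gives NRMSE = 1,
# so values below 1 indicate an improvement over mean filling.
true_vals = [1.0, 2.0, 3.0, 4.0]
print(nrmse(true_vals, [2.5, 2.5, 2.5, 2.5]))
print(nrmse(true_vals, [1.1, 1.9, 3.2, 3.8]))
```

    Under this definition, lower values mean the imputed entries track the held-out true values more closely.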

  11. Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data.

    PubMed

    Fragoso, Christopher A; Heffelfinger, Christopher; Zhao, Hongyu; Dellaporta, Stephen L

    2016-02-01

    Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute. PMID:26715670

  12. Association analysis of BMD-associated SNPs with knee osteoarthritis.

    PubMed

    Yerges-Armstrong, Laura M; Yau, Michelle S; Liu, Youfang; Krishnan, Subha; Renner, Jordan B; Eaton, Charles B; Kwoh, C Kent; Nevitt, Michael C; Duggan, David J; Mitchell, Braxton D; Jordan, Joanne M; Hochberg, Marc C; Jackson, Rebecca D

    2014-06-01

    Osteoarthritis (OA) risk is widely recognized to be heritable but few loci have been identified. Observational studies have identified higher systemic bone mineral density (BMD) to be associated with an increased risk of radiographic knee osteoarthritis. With this in mind, we sought to evaluate whether well-established genetic loci for variance in BMD are associated with risk for radiographic OA in the Osteoarthritis Initiative (OAI) and the Johnston County Osteoarthritis (JoCo) Project. Cases had at least one knee with definite radiographic OA, defined as the presence of definite osteophytes with or without joint space narrowing (Kellgren-Lawrence [KL] grade ≥ 2) and controls were absent for definite radiographic OA in both knees (KL grade ≤ 1 bilaterally). There were 2014 and 658 Caucasian cases, respectively, in the OAI and JoCo Studies, and 953 and 823 controls. Single nucleotide polymorphisms (SNPs) were identified for association analysis from the literature. Genotyping was carried out on Illumina 2.5M and 1M arrays in Genetic Components of Knee OA (GeCKO) and JoCo, respectively and imputation was done. Association analyses were carried out separately in each cohort with adjustments for age, body mass index (BMI), and sex, and then parameter estimates were combined across the two cohorts by meta-analysis. We identified four SNPs significantly associated with prevalent radiographic knee OA. The strongest signal (p = 0.0009; OR = 1.22; 95% CI, 1.08-1.37) maps to 12q3, which contains a gene coding for SP7. Additional loci map to 7p14.1 (TXNDC3), 11q13.2 (LRP5), and 11p14.1 (LIN7C). For all four loci the allele associated with higher BMD was associated with higher odds of OA. A BMD risk allele score was not significantly associated with OA risk. This meta-analysis demonstrates that several genomewide association studies (GWAS)-identified BMD SNPs are nominally associated with prevalent radiographic knee OA and further supports the hypothesis that BMD, or its determinants, may be a risk factor contributing to OA development. © 2014 American Society for Bone and Mineral Research. PMID:24339167

  13. Association Analysis of BMD-associated SNPs with Knee Osteoarthritis

    PubMed Central

    Yerges-Armstrong, LM; Yau, MS; Liu, Y; Krishnan, S; Renner, JB; Eaton, CB; Kwoh, CK; Nevitt, MC; Duggan, DJ; Mitchell, BD; Jordan, JM; Hochberg, MC; Jackson, RD

    2014-01-01

    Osteoarthritis (OA) risk is widely recognized to be heritable but few loci have been identified. Observational studies have identified higher systemic bone mineral density (BMD) to be associated with an increased risk of radiographic knee osteoarthritis. With this in mind, we sought to evaluate whether well-established genetic loci for variance in BMD are associated with risk for radiographic OA in the Osteoarthritis Initiative (OAI) and the Johnston County Osteoarthritis (JoCo) Project. Cases had at least one knee with definite radiographic OA defined as the presence of definite osteophytes with or without joint space narrowing (KL grade ≥ 2) and controls were absent for definite radiographic OA in both knees (KL grade ≤ 1 bilaterally). There were 2014 and 658 Caucasian cases, respectively, in the OAI and JoCo Studies, and 953 and 823 controls. Single nucleotide polymorphisms (SNPs) were identified for association analysis from the literature. Genotyping was carried out on the Illumina 2.5M and 1M arrays in GeCKO and JoCo, respectively and imputation was done. Association analyses were carried out separately in each cohort with adjustments for age, BMI, and sex and then parameter estimates were combined across the two cohorts by meta-analysis. We identified 4 SNPs significantly associated with prevalent radiographic knee OA. The strongest signal (p=0.0009, OR=1.22, 95% CI [1.08-1.37]) maps to 12q3 which contains a gene coding for SP7. Additional loci map to 7p14.1 (TXNDC3), 11q13.2 (LRP5) and 11p14.1 (LIN7C). For all four loci the allele associated with higher BMD was associated with higher odds of OA. A BMD risk allele score was not significantly associated with OA risk. This meta-analysis demonstrates that several GWAS-identified BMD SNPs are nominally associated with prevalent radiographic knee OA and further supports the hypothesis that BMD, or its determinants, may be a risk factor contributing to OA development. PMID:24339167

  14. Multiple imputation for time series data with Amelia package

    PubMed Central

    2016-01-01

    Time series data are common in medical research. Many laboratory variables or study endpoints could be measured repeatedly over time. Multiple imputation (MI) that does not consider the time trend of a variable may be unreliable. The article illustrates how to perform MI using the Amelia package in a clinical scenario. The Amelia package is powerful in that it allows for MI of time series data. External information on the variable of interest can also be incorporated by using the prior or bound arguments. Such information may be based on previously published observations, academic consensus, and personal experience. Diagnostics of the imputation model can be performed by examining the distributions of imputed and observed values, or by using the over-imputation technique. PMID:26904578

  15. Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs.

    PubMed

    Pistis, Giorgio; Porcu, Eleonora; Vrieze, Scott I; Sidore, Carlo; Steri, Maristella; Danjou, Fabrice; Busonero, Fabio; Mulas, Antonella; Zoledziewska, Magdalena; Maschio, Andrea; Brennan, Christine; Lai, Sandra; Miller, Michael B; Marcelli, Marco; Urru, Maria Francesca; Pitzalis, Maristella; Lyons, Robert H; Kang, Hyun M; Jones, Chris M; Angius, Andrea; Iacono, William G; Schlessinger, David; McGue, Matt; Cucca, Francesco; Abecasis, Gonçalo R; Sanna, Serena

    2015-07-01

    The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies. PMID:25293720

  16. A second generation human haplotype map of over 3.1 million SNPs

    PubMed Central

    2009-01-01

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122

  17. Combining fractional polynomial model building with multiple imputation.

    PubMed

    Morris, Tim P; White, Ian R; Carpenter, James R; Stanworth, Simon J; Royston, Patrick

    2015-11-10

    Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. PMID:26095614

  18. Doubly robust and multiple-imputation-based generalized estimating equations.

    PubMed

    Birhanu, Teshome; Molenberghs, Geert; Sotto, Cristina; Kenward, Michael G

    2011-03-01

    Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), provide a popular method to analyze correlated non-Gaussian data. When data are incomplete, the GEE method suffers from its frequentist nature and inferences under this method are valid only under the strong assumption that the missing data are missing completely at random. When response data are missing at random, two modifications of GEE can be considered, based on inverse-probability weighting or on multiple imputation. The weighted GEE (WGEE) method involves weighting observations by the inverse of their probability of being observed. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model, multiple times. The so-called doubly robust (DR) methods involve both a model for the weights and a predictive model for the missing observations given the observed ones. To yield consistent estimates, WGEE needs correct specification of the dropout model while imputation-based methodology needs a correctly specified imputation model. DR methods need correct specification of either the weight or the predictive model, but not necessarily both. Focusing on incomplete binary repeated measures, we study the relative performance of the singly robust and doubly robust versions of GEE in a variety of correctly and incorrectly specified models using simulation studies. Data from a clinical trial in onychomycosis further illustrate the method. PMID:21390997
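
    The weighting idea behind WGEE described in this abstract, weighting each observed response by the inverse of its probability of being observed, can be shown in its simplest form. The sketch below is only a weighted mean, not a GEE fit, and it assumes the observation probabilities are already known or estimated from a dropout model; the function and argument names are hypothetical.

```python
def ipw_mean(values, observed, prob_observed):
    """Inverse-probability-weighted mean of the observed responses.

    values: responses (entries for unobserved subjects are ignored).
    observed: booleans, True where the response was actually observed.
    prob_observed: probability of being observed for each subject,
        assumed to come from a correctly specified dropout model.
    """
    num = 0.0
    den = 0.0
    for v, o, p in zip(values, observed, prob_observed):
        if o:
            w = 1.0 / p  # an observed subject stands in for 1/p similar subjects
            num += w * v
            den += w
    if den == 0.0:
        raise ValueError("no observed responses")
    return num / den


# Toy data: the third subject dropped out; the second had only a 50% chance
# of being observed, so it counts twice in the weighted mean.
values = [1.0, 2.0, 0.0, 4.0]
observed = [True, True, False, True]
prob_observed = [1.0, 0.5, 0.5, 1.0]
print(ipw_mean(values, observed, prob_observed))
```

    If the probabilities passed in are wrong, the weighted estimate is biased, which mirrors the abstract's point that WGEE needs a correctly specified dropout model.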

  19. Handling Missing Values in Longitudinal Panel Data With Multiple Imputation

    PubMed Central

    Young, Rebekah; Johnson, David R.

    2015-01-01

    This article offers an applied review of key issues and methods for the analysis of longitudinal panel data in the presence of missing values. The authors consider the unique challenges associated with attrition (survey dropout), incomplete repeated measures, and unknown observations of time. Using simulated data based on 4 waves of the Marital Instability Over the Life Course Study (n = 2,034), they applied a fixed effect regression model and an event-history analysis with time-varying covariates. They then compared results for analyses with nonimputed missing data and with imputed data both in long and in wide structures. Imputation produced improved estimates in the event-history analysis but only modest improvements in the estimates and standard errors of the fixed effects analysis. Factors responsible for differences in the value of imputation are examined, and recommendations for handling missing values in panel data are presented. PMID:26113748

  20. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

  1. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    PubMed

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540

  2. A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

    ERIC Educational Resources Information Center

    Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

    2012-01-01

    Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale

  4. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at

  5. Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory

    PubMed Central

    Li, Haiquan; Lee, Younghee; Chen, James L; Rebman, Ellen; Li, Jianrong

    2012-01-01

    Objective: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning. Methods: Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait-trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits. Results: A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller shortest distance in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10^-16). Further, cancer traits were similar to one another, as were metabolic syndrome traits (Fisher's exact test p=0.001 and 3.5×10^-7, respectively). Conclusion: An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches. PMID:22278381

  6. Novel and efficient tag SNPs selection algorithms.

    PubMed

    Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2014-01-01

    SNPs are the most abundant forms of genetic variations amongst species; the association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, representing the rest of the SNPs. In fact, the existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection was presented, which was applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm can achieve better performance than the existing tag SNP selection algorithms; in most cases, this proposed algorithm is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands of times faster than the previously known methods. Tools and web services for haplotype block analysis integrated by the Hadoop MapReduce framework are also developed using the proposed algorithm as computation kernels. PMID:24212035
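
    The idea of choosing a small subset of tag SNPs that represents the rest can be illustrated with a minimal Python sketch. This is not the algorithm of the record above, whose details the abstract does not give; it is a commonly used greedy heuristic based on pairwise r2 between SNPs, swapped in for illustration. The function names, the toy haplotypes, and the 0.8 threshold are all assumptions.

```python
def r_squared(a, b):
    """Pairwise LD (r^2) between two biallelic SNPs given 0/1 alleles per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = pab - pa * pb
    return d * d / denom


def greedy_tag_snps(snps, threshold=0.8):
    """Repeatedly pick the SNP that tags the most still-untagged SNPs."""
    names = list(snps)
    # Each SNP always covers itself, so the loop below is guaranteed to finish.
    covers = {
        s: {t for t in names if s == t or r_squared(snps[s], snps[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical toy data: four haplotypes, three SNPs ("a" and "b" are in perfect LD).
toy = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
tags = greedy_tag_snps(toy)
```

    Precomputing every pairwise r2 is quadratic in the number of SNPs, which is one reason the record above describes existing tag SNP selection methods as time-consuming.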

  7. Reference-free detection of isolated SNPs.

    PubMed

    Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre

    2015-01-01

    Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism. PMID:25404127

  8. Reference-free detection of isolated SNPs

    PubMed Central

    Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre

    2015-01-01

    Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism. PMID:25404127

  9. Functional annotation of colon cancer risk SNPs

    PubMed Central

    Yao, Lijing; Tak, Yu Gyoung; Berman, Benjamin P.; Farnham, Peggy J.

    2014-01-01

    Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with increased risk for CRC. A molecular understanding of the functional consequences of this genetic variation has been complicated because each GWAS SNP is a surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Here we use genomic and epigenomic information to test the hypothesis that the GWAS SNPs and/or correlated SNPs are in elements that regulate gene expression, and identify 23 promoters and 28 enhancers. Using gene expression data from normal and tumour cells, we identify 66 putative target genes of the risk-associated enhancers (10 of which were also identified by promoter SNPs). Employing CRISPR nucleases, we delete one risk-associated enhancer and identify genes showing altered expression. We suggest that similar studies be performed to characterize all CRC risk-associated enhancers. PMID:25268989

  10. Family-based approaches: design, imputation, analysis, and beyond.

    PubMed

    Wijsman, Ellen M

    2016-01-01

    Participants in the family-based analysis group at Genetic Analysis Workshop 19 addressed diverse topics, all of which used the family data. Topics addressed included questions of study design and data quality control (QC), genotype imputation to augment available sequence data, and linkage and/or association analyses. Results show that pedigree-based tests that are sensitive to genotype error may be useful for QC. Imputation quality improved with inclusion of small amounts of pedigree information used to phase the data in evaluation of 5 commonly used approaches for imputation in samples of (typically) unrelated subjects. It improved still further when pedigree-based imputation using larger pedigrees was also added. An important distinction was made between methods that do versus do not make use of Mendelian transmission in pedigrees, because this serves as a key difference between underlying models and assumptions. Methods that model relatedness generally had higher power in association testing than did analyses that carry out testing in the presence of a transmission model, but this may reflect details of implementation and/or ability of more general methods to jointly include data from larger pedigrees. In either case, for single nucleotide polymorphism-set approaches, weights that incorporate information on functional effects may be more useful than those that are based only on allele frequencies. The overall results demonstrate that family data continue to provide important information in the search for trait loci. PMID:26866700

  11. Imputation of Cow Genotypes and Adjustment of PTAs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two new techniques were introduced in April 2010 to incorporate all available information in the evaluations. The use of imputed genotypes has added over 1600 cows to the genomic database, and adjusting cow evaluations has increased accuracy. All other countries that are producing genomic evaluation...

  12. Fast imputation using medium or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and microarray genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updat...

  13. 32 CFR 776.29 - Imputed disqualification: General rule.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 5 2012-07-01 2012-07-01 false Imputed disqualification: General rule. 776.29 Section 776.29 National Defense Department of Defense (Continued) DEPARTMENT OF THE NAVY MISCELLANEOUS RULES PROFESSIONAL CONDUCT OF ATTORNEYS PRACTICING UNDER THE COGNIZANCE AND SUPERVISION OF THE JUDGE ADVOCATE GENERAL Rules of Professional...

  14. Accuracy of genotype imputation in Swiss cattle breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to evaluate the accuracy of imputation from Illumina Bovine3k Bead Chip (3k) and Illumina BovineLD (6k) to 54k chip information in Swiss dairy cattle breeds. Genotype data comprised of 54k SNP chip data of Original Braunvieh (OB), Brown Swiss (BS), Swiss Fleckvieh (SF...

  15. Investigation of Multiple Imputation in Low-Quality Questionnaire Data

    ERIC Educational Resources Information Center

    Van Ginkel, Joost R.

    2010-01-01

    The performance of multiple imputation in questionnaire data has been studied in various simulation studies. However, in practice, questionnaire data are usually more complex than simulated data. For example, items may be counterindicative or may have unacceptably low factor loadings on every subscale, or completely missing subscales may…

  16. Meta-analysis and imputation refines the association of 15q25 with smoking quantity

    PubMed Central

    Liu, Jason Z.; Tozzi, Federica; Waterworth, Dawn M.; Pillai, Sreekumar G.; Muglia, Pierandrea; Middleton, Lefkos; Berrettini, Wade; Knouff, Christopher W.; Yuan, Xin; Waeber, Gérard; Vollenweider, Peter; Preisig, Martin; Wareham, Nicholas J; Zhao, Jing Hua; Loos, Ruth J.F.; Barroso, Inês; Khaw, Kay-Tee; Grundy, Scott; Barter, Philip; Mahley, Robert; Kesaniemi, Antero; McPherson, Ruth; Vincent, John B.; Strauss, John; Kennedy, James L.; Farmer, Anne; McGuffin, Peter; Day, Richard; Matthews, Keith; Bakke, Per; Gulsvik, Amund; Lucae, Susanne; Ising, Marcus; Brueckl, Tanja; Horstmann, Sonja; Wichmann, H.-Erich; Rawal, Rajesh; Dahmen, Norbert; Lamina, Claudia; Polasek, Ozren; Zgaga, Lina; Huffman, Jennifer; Campbell, Susan; Kooner, Jaspal; Chambers, John C; Burnett, Mary Susan; Devaney, Joseph M.; Pichard, Augusto D.; Kent, Kenneth M.; Satler, Lowell; Lindsay, Joseph M.; Waksman, Ron; Epstein, Stephen; Wilson, James F.; Wild, Sarah H.; Campbell, Harry; Vitart, Veronique; Reilly, Muredach P.; Li, Mingyao; Qu, Liming; Wilensky, Robert; Matthai, William; Hakonarson, Hakon H.; Rader, Daniel J.; Franke, Andre; Wittig, Michael; Schäfer, Arne; Uda, Manuela; Terracciano, Antonio; Xiao, Xiangjun; Busonero, Fabio; Scheet, Paul; Schlessinger, David; St Clair, David; Rujescu, Dan; Abecasis, Gonçalo R.; Grabe, Hans Jörgen; Teumer, Alexander; Völzke, Henry; Petersmann, Astrid; John, Ulrich; Rudan, Igor; Hayward, Caroline; Wright, Alan F.; Kolcic, Ivana; Wright, Benjamin J; Thompson, John R; Balmforth, Anthony J.; Hall, Alistair S.; Samani, Nilesh J.; Anderson, Carl A.; Ahmad, Tariq; Mathew, Christopher G.; Parkes, Miles; Satsangi, Jack; Caulfield, Mark; Munroe, Patricia B.; Farrall, Martin; Dominiczak, Anna; Worthington, Jane; Thomson, Wendy; Eyre, Steve; Barton, Anne; Mooser, Vincent; Francks, Clyde; Marchini, Jonathan

    2013-01-01

    Smoking is a leading global cause of disease and mortality. We performed a genomewide meta-analytic association study of smoking-related behavioral traits in a total sample of 41,150 individuals drawn from 20 disease, population, and control cohorts. Our analysis confirmed an effect on smoking quantity (SQ) at a locus on 15q25 (P=9.45e-19) that includes three genes encoding neuronal nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, CHRNB4). We used data from the 1000 Genomes project to investigate the region using imputation, which allowed analysis of virtually all common variants in the region and offered a five-fold increase in coverage over the HapMap. This increased the spectrum of potentially causal single nucleotide polymorphisms (SNPs), which included a novel SNP that showed the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3. PMID:20418889

  17. The Use of SNPs in Pharmacogenomics Studies

    PubMed Central

    Alwi, Zilfalil Bin

    2005-01-01

    Pharmacogenomics is the study of how genetic makeup determines the response to a therapeutic intervention. It has the potential to revolutionize the practice of medicine by individualisation of treatment through the use of novel diagnostic tools. This new science should reduce the trial-and-error approach to the choice of treatment and thereby limit the exposure of patients to drugs that are not effective or are toxic for them. Single Nucleotide Polymorphisms (SNPs) hold the key in defining the risk of an individual's susceptibility to various illnesses and response to drugs. There is an ongoing process of identifying the common, biologically relevant SNPs, in particular those that are associated with the risk of disease. The identification and characterization of large numbers of these SNPs are necessary before we can begin to use them extensively as genetic tools. As SNP allele frequencies vary considerably across human ethnic groups and populations, the SNP consortium has opted to use an ethnically diverse panel to maximize the chances of SNP discovery. Currently most studies are biased deliberately towards coding regions and the data generated from them therefore are unlikely to reflect the overall distribution of SNPs throughout the genome. The SNP consortium protocol was designed to identify SNPs without any bias towards these coding regions. Most pharmacogenomic studies were carried out in heterogeneous clinical trial populations, using case-control or cohort association study designs employing either candidate gene or linkage disequilibrium (LD) mapping approaches. Concerns about the required patient sample sizes, the extent of LD, the number of SNPs needed in a map, the cost of genotyping SNPs, and the interpretation of results are some of the challenges that surround this field. While LD mapping is appealing in that it is an unbiased approach and allows a comprehensive genome-wide survey, the challenges and limitations are significant. An alternative such as the candidate gene approach does offer several advantages over LD mapping. Ultimately, as all human genes are discovered, the need for random SNP markers diminishes and gene-based SNP approaches will predominate. The challenges will then be to demonstrate convincing links between genetic variation and drug responses and to translate that information into useful pharmacogenomic tests. PMID:22605952

  18. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
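
    The final step described above, in which results from each imputed dataset are combined into overall estimates, can be sketched in Python. The abstract does not name a combining rule; this sketch assumes Rubin's rules, the standard choice for multiple imputation. The function name and the example numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 datasets")
    pooled = mean(estimates)        # overall point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical regression coefficients and squared standard errors from 3 imputed datasets.
estimate, std_error = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation term is what separates this from simply averaging the standard errors: it widens the pooled standard error to reflect uncertainty about the missing values themselves.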

  19. A spatial haplotype copying model with applications to genotype imputation.

    PubMed

    Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

    2015-05-01

    Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data. PMID:25526526

  20. Visualization of SNPs with t-SNE

    PubMed Central

    Platzer, Alexander

    2013-01-01

    Background: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. Principal Findings: We compare PCA, an aging method for this purpose, with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; in all of these t-SNE performs better. Significance: To transform data PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity. PMID:23457633

  1. Development and characterisation of an expressed sequence tags (EST)-derived single nucleotide polymorphisms (SNPs) resource in rainbow trout

    PubMed Central

    2012-01-01

    Background: There is considerable interest in developing high-throughput genotyping with single nucleotide polymorphisms (SNPs) for the identification of genes affecting important ecological or economical traits. SNPs are evenly distributed throughout the genome and are likely to be functionally relevant. In rainbow trout, in silico screening of EST databases represents an attractive approach for de novo SNP identification. Nevertheless, EST sequencing errors and assembly of EST paralogous sequences can lead to the identification of false positive SNPs which renders the reliability of EST-derived SNPs relatively low. Further validation of EST-derived SNPs is therefore required. The objective of this work was to assess the quality of and to validate a large number of rainbow trout EST-derived SNPs. Results: A panel of 1,152 EST-derived SNPs was selected from the INRA Sigenae SNP database and was genotyped in standard and double haploid individuals from several populations using the Illumina GoldenGate BeadXpress assay. High-quality genotyping data were obtained for 958 SNPs representing a genotyping success rate of 83.2%, out of which, 350 SNPs (36.5%) were polymorphic in at least one population and were designated as true SNPs. They also proved to be a potential tool to investigate genetic diversity of the species, as the set of SNPs successfully sorted individuals into three main groups using STRUCTURE software. Functional annotations revealed 28 non-synonymous SNPs, out of which four substitutions were predicted to affect protein functions. A subset of 223 true SNPs were polymorphic in the two INRA mapping reference families and were integrated into the INRA microsatellite-based linkage map. Conclusions: Our results represent the first study of EST-derived SNPs validation in rainbow trout, a species whose genome sequence is not yet available. We designed several specific filters in order to improve the genotyping yield. Nevertheless, our selection criteria should be further improved in order to reduce the observed high rate of false positive SNPs which results from the occurrence of whole genome duplications. PMID:22694767

  2. Imputation and quality control steps for combining multiple genome-wide datasets

    PubMed Central

    Verma, Shefali S.; de Andrade, Mariza; Tromp, Gerard; Kuivaniemi, Helena; Pugh, Elizabeth; Namjou-Khales, Bahram; Mukherjee, Shubhabrata; Jarvik, Gail P.; Kottyan, Leah C.; Burt, Amber; Bradford, Yuki; Armstrong, Gretta D.; Derr, Kimberly; Crawford, Dana C.; Haines, Jonathan L.; Li, Rongling; Crosslin, David; Ritchie, Marylyn D.

    2014-01-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR. PMID:25566314
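
    The allelic R2 metric mentioned above can be illustrated with a minimal Python sketch. It assumes the true genotypes are known, for example because they were masked before imputation. That is a simplification, since imputation software typically has to estimate this quantity without access to the truth. The function name, the toy dosages, and the 0.3 cutoff are illustrative assumptions and are not taken from the eMERGE pipeline.

```python
def allelic_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes (0, 1, 2)."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector; treated as uninformative
    return (sxy * sxy) / (sxx * syy)


# Hypothetical usage: keep only SNPs whose R2 reaches an illustrative cutoff.
snps = {
    "snp_a": ([0.1, 0.9, 1.8, 1.1], [0, 1, 2, 1]),
    "snp_b": ([1.0, 1.0, 1.0, 1.0], [0, 1, 2, 1]),
}
kept = [name for name, (dosage, truth) in snps.items()
        if allelic_r2(dosage, truth) >= 0.3]
```

    Plotting such values against minor allele frequency would give the kind of R2-versus-frequency relationship that the abstract lists as its third metric.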

  3. Methods of tagSNP selection and other variables affecting imputation accuracy in swine

    PubMed Central

    2013-01-01

    Background: Genotype imputation is a cost efficient alternative to use of high density genotypes for implementing genomic selection. The objective of this study was to investigate variables affecting imputation accuracy from low density tagSNP (average distance between tagSNP from 100kb to 1Mb) sets in swine, selected using LD information, physical location, or accuracy for genotype imputation. We compared results of imputation accuracy based on several sets of low density tagSNP of varying densities and selected using three different methods. In addition, we assessed the effect of varying size and composition of the reference panel of haplotypes used for imputation. Results: TagSNP density of at least 1 tagSNP per 340kb (~7000 tagSNP) selected using pairwise LD information was necessary to achieve average imputation accuracy higher than 0.95. A commercial low density (9K) tagSNP set for swine was developed concurrent to this study and an average accuracy of imputation of 0.951 based on these tagSNP was estimated. Construction of a haplotype reference panel was most efficient when these haplotypes were obtained from randomly sampled individuals. Increasing the size of the original reference haplotype panel (128 haplotypes sampled from 32 sire/dam/offspring trios phased in a previous study) led to an overall increase in imputation accuracy (IA = 0.97 with 512 haplotypes), but was especially useful in increasing imputation accuracy of SNP with MAF below 0.1 and for SNP located in the chromosomal extremes (within 5% of chromosome end). Conclusion: The new commercially available 9K tagSNP set can be used to obtain imputed genotypes with high accuracy, even when imputation is based on a comparably small panel of reference haplotypes (128 haplotypes). Average imputation accuracy can be further increased by adding haplotypes to the reference panel. In addition, our results show that randomly sampling individuals to genotype for the construction of a reference haplotype panel is more cost efficient than specifically sampling older animals or trios with no observed loss in imputation accuracy. We expect that the use of imputed genotypes in swine breeding will yield highly accurate predictions of GEBV, based on the observed accuracy and reported results in dairy cattle, where genomic evaluation of some individuals is based on genotypes imputed with the same accuracy as our Yorkshire population. PMID:23433396

  4. HIBAG--HLA genotype imputation with attribute bagging.

    PubMed

    Zheng, X; Shen, J; Cox, C; Wakefield, J C; Ehm, M G; Nelson, M R; Weir, B S

    2014-04-01

    Genotyping of classical human leukocyte antigen (HLA) alleles is an essential tool in the analysis of diseases and adverse drug reactions with associations mapping to the major histocompatibility complex (MHC). However, deriving high-resolution HLA types subsequent to whole-genome single-nucleotide polymorphism (SNP) typing or sequencing is often cost prohibitive for large samples. An alternative approach takes advantage of the extended haplotype structure within the MHC to predict HLA alleles using dense SNP genotypes, such as those available from genome-wide SNP panels. Current methods for HLA imputation are difficult to apply or may require the user to have access to large training data sets with SNP and HLA types. We propose HIBAG, HLA Imputation using attribute BAGging, that makes predictions by averaging HLA-type posterior probabilities over an ensemble of classifiers built on bootstrap samples. We assess the performance of HIBAG using our study data (n=2668 subjects of European ancestry) as a training set and HLA data from the British 1958 birth cohort study (n≈1000 subjects) as independent validation samples. Prediction accuracies for HLA-A, B, C, DRB1 and DQB1 range from 92.2% to 98.1% using a set of SNP markers common to the Illumina 1M Duo, OmniQuad, OmniExpress, 660K and 550K platforms. HIBAG performed well compared with the other two leading methods, HLA*IMP and BEAGLE. This method is implemented in a freely available HIBAG R package that includes pre-fit classifiers for European, Asian, Hispanic and African ancestries, providing a readily available imputation approach without the need to have access to large training data sets. PMID:23712092
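
    The prediction step described above, averaging posterior probabilities over an ensemble of classifiers, can be sketched in a few lines of Python. This shows only the averaging idea. It does not reproduce the HIBAG R package, its classifiers, or its bootstrap and SNP-subset sampling. The function names and the allele labels and probabilities in the usage lines are hypothetical.

```python
def average_posteriors(posteriors):
    """Average per-classifier posterior probabilities for each candidate HLA type."""
    if not posteriors:
        raise ValueError("need output from at least one classifier")
    labels = set().union(*posteriors)  # every label seen by any classifier
    return {
        label: sum(p.get(label, 0.0) for p in posteriors) / len(posteriors)
        for label in labels
    }


def predict(posteriors):
    """Return the label with the highest ensemble-averaged posterior."""
    averaged = average_posteriors(posteriors)
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three classifiers for one individual.
ensemble = [
    {"A*01": 0.6, "A*02": 0.4},
    {"A*01": 0.2, "A*02": 0.8},
    {"A*01": 0.5, "A*02": 0.5},
]
best_call = predict(ensemble)
```

    Averaging probabilities rather than taking a majority vote lets a confident classifier outweigh several uncertain ones, and the averaged posterior can double as a per-call confidence score.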

  5. Genetic Diversity Analysis of Highly Incomplete SNP Genotype Data with Imputations: An Empirical Assessment

    PubMed Central

    Fu, Yong-Bi

    2014-01-01

    Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. Although some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data, with up to 90% of observations missing. Here we performed an empirical assessment of accuracy in genetic diversity analysis of highly incomplete single nucleotide polymorphism genotypes with imputations. Three large single-nucleotide polymorphism genotype data sets for corn, wheat, and rice were acquired, and missing data with up to 90% of missing observations were randomly generated and then imputed for missing genotypes with three map-independent imputation methods. Estimating heterozygosity and inbreeding coefficient from original, missing, and imputed data revealed variable patterns of bias from assessed levels of missingness and genotype imputation, but the estimation biases were smaller for missing data without genotype imputation. The estimates of genetic differentiation were rather robust up to 90% of missing observations but became substantially biased when missing genotypes were imputed. The estimates of topology accuracy for four representative samples of interested groups generally were reduced with increased levels of missing genotypes. Probabilistic principal component analysis based imputation performed better in terms of topology accuracy than those analyses of missing data without genotype imputation. These findings are not only significant for understanding the reliability of the genetic diversity analysis with respect to large missing data and genotype imputation but also are instructive for performing a proper genetic diversity analysis of highly incomplete GBS or other genotype data. PMID:24626289

  6. rSNPBase: a database for curated regulatory SNPs

    PubMed Central

    Guo, Liyuan; Du, Yang; Chang, Suhua; Zhang, Kunlin; Wang, Jing

    2014-01-01

    In recent years, human regulatory SNPs (rSNPs) have been widely studied. Here, we present database rSNPBase, freely available at http://rsnp.psych.ac.cn/, to provide curated rSNPs that analyses the regulatory features of all SNPs in the human genome with reference to experimentally supported regulatory elements. In contrast with previous SNP functional annotation databases, rSNPBase is characterized by several unique features. (i) To improve reliability, all SNPs in rSNPBase are annotated with reference to experimentally supported regulatory elements. (ii) rSNPBase focuses on rSNPs involved in a wide range of regulation types, including proximal and distal transcriptional regulation and post-transcriptional regulation, and identifies their potentially regulated genes. (iii) Linkage disequilibrium (LD) correlations between SNPs were analysed so that the regulatory feature is annotated to SNP-set rather than a single SNP. (iv) rSNPBase provides the spatio-temporal labels and experimental eQTL labels for SNPs. In summary, rSNPBase provides more reliable, comprehensive and user-friendly regulatory annotations on rSNPs and will assist researchers in selecting candidate SNPs for further genetic studies and in exploring causal SNPs for in-depth molecular mechanisms of complex phenotypes. PMID:24285297

  7. Multiple ant colony algorithm method for selecting tag SNPs.

    PubMed

    Liao, Bo; Li, Xiong; Zhu, Wen; Li, Renfa; Wang, Shulin

    2012-10-01

    The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Finding a set of tag SNPs for haplotyping in a large number of samples is an important step in reducing the cost of association studies. Therefore, it is essential to select tag SNPs with more efficient algorithms. In this paper, we model the problem of selecting tag SNPs as MINIMUM TEST SET and use a multiple ant colony algorithm (MACA) to search for a smaller set of tag SNPs for haplotyping. Experimental results on various datasets show that the running time of our method is less than that of GTagger and MLR. MACA can also find the most representative SNPs for haplotyping, so it is more stable and yields fewer tag SNPs than other evolutionary methods (such as GTagger and NSGA-II). Our software is available upon request to the corresponding author. PMID:22480582
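
    To make the MINIMUM TEST SET formulation concrete, here is a minimal sketch that selects tag SNPs so that every pair of distinct haplotypes differs at one or more chosen sites. It uses a plain greedy heuristic, not the multiple ant colony algorithm of the record, and it assumes haplotypes are given as equal-length strings of '0'/'1'. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def greedy_tag_snps(haplotypes: Sequence[str]) -> List[int]:
    """Greedy heuristic for MINIMUM TEST SET on haplotypes.

    Repeatedly picks the SNP index that separates the most haplotype pairs
    that are still indistinguishable, until every pair of distinct
    haplotypes is separated by some chosen SNP.
    """
    n_snps = len(haplotypes[0])
    # Pairs of distinct haplotypes that the chosen tags do not yet separate.
    unresolved = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }
    tags: List[int] = []
    while unresolved:
        best = max(
            range(n_snps),
            key=lambda s: sum(haplotypes[i][s] != haplotypes[j][s] for i, j in unresolved),
        )
        tags.append(best)
        unresolved = {
            (i, j) for i, j in unresolved if haplotypes[i][best] == haplotypes[j][best]
        }
    return sorted(tags)


if __name__ == "__main__":
    haps = ["000", "011", "101", "110"]
    print(greedy_tag_snps(haps))
```

    A greedy pass gives no optimality guarantee, which is one reason metaheuristics such as ant colony search are applied to this problem.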

  8. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  9. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  10. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  11. Estimation of missing rainfall data using spatial interpolation and imputation methods

    NASA Astrophysics Data System (ADS)

    Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Azman, Muhammad Az-zuhri

    2015-02-01

    This study aims to estimate missing rainfall data, dividing the analysis into three missingness percentages, namely 5%, 10% and 20%, to represent various cases of missing data. In practice, spatial interpolation methods are usually the first choice for estimating missing data. These methods include the normal ratio (NR), arithmetic average (AA), coefficient of correlation (CC) and inverse distance (ID) weighting methods. The methods consider the distance between the target and the neighbouring stations as well as the correlations between them. An alternative approach to missing data is imputation, the process of replacing missing data with substituted values. A once-common method is single imputation, which allows parameter estimation. However, single imputation ignores the variability of the estimates, which leads to the underestimation of standard errors and confidence intervals. To overcome this underestimation problem, multiple imputation is used, where each missing value is estimated with a distribution of imputations that reflects the uncertainty about the missing data. In this study, spatial interpolation methods and the multiple imputation method are compared for estimating missing rainfall data. The performance of the estimation methods is assessed using the similarity index (S-index), mean absolute error (MAE) and coefficient of correlation (R).
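
    Three of the interpolation methods named in this record, together with the MAE criterion, can be sketched as follows. The formulas are the commonly used textbook forms (arithmetic mean of neighbours, inverse-distance weights 1/d^power, and the normal ratio weighting each neighbour by the ratio of long-term normals); the record does not state the exact variants used, so the default power of 2 and all function names are assumptions.

```python
from typing import Sequence


def arithmetic_average(neighbours: Sequence[float]) -> float:
    """AA: plain mean of the rainfall recorded at neighbouring stations."""
    return sum(neighbours) / len(neighbours)


def inverse_distance(neighbours: Sequence[float], distances: Sequence[float],
                     power: float = 2.0) -> float:
    """ID: neighbours weighted by 1 / distance**power (power=2 is an assumption)."""
    weights = [1.0 / d ** power for d in distances]
    return sum(w * p for w, p in zip(weights, neighbours)) / sum(weights)


def normal_ratio(neighbours: Sequence[float], neighbour_normals: Sequence[float],
                 target_normal: float) -> float:
    """NR: each neighbour scaled by target normal / neighbour normal, then averaged."""
    scaled = [target_normal / n_i * p for p, n_i in zip(neighbours, neighbour_normals)]
    return sum(scaled) / len(scaled)


def mean_absolute_error(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """MAE between estimated and observed values."""
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)


if __name__ == "__main__":
    rain = [10.0, 20.0]                       # rainfall at two neighbouring stations
    print(arithmetic_average(rain))
    print(inverse_distance(rain, [1.0, 2.0]))
    print(normal_ratio(rain, [100.0, 200.0], 150.0))
```

    To score a method, values at stations with known rainfall are masked, re-estimated, and compared with the withheld observations using mean_absolute_error.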

  12. Methods of Imputation used in the USDA National Nutrient Database for Standard Reference

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Objective: To present the predominate methods of imputing used to estimate nutrient values for foods in the USDA National Nutrient Database for Standard Reference (SR20). Materials and Methods: The USDA Nutrient Data Laboratory developed standard methods for imputing nutrient values for foods wh...

  13. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  14. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... in accordance with 1830.7002-3(b)(1) or 1830.7002-3(c)(1), the cost of money rate shall be the time... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under...

  15. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... in accordance with 1830.7002-3(b)(1) or 1830.7002-3(c)(1), the cost of money rate shall be the time... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under...

  16. Imputation of Missing Genotypes From Sparse to High Density Using Long-Range Phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals in a population share long chromosome segments which trace to a common ancestor. We describe a long-range phasing algorithm that makes use of this property to phase whole chromosomes and simultaneously impute a large number of missing markers. We test our method by imputing marke...

  17. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  18. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model

    PubMed Central

    Seaman, Shaun R; White, Ian R; Carpenter, James R

    2015-01-01

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available. PMID:24525487

  19. Imputation of KIR Types from SNP Variation Data.

    PubMed

    Vukcevic, Damjan; Traherne, James A; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-10-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR*IMP, a method for imputation of KIR copy number. We show that KIR*IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  20. Comparison of classification methods for detecting associations between SNPs and chick mortality

    PubMed Central

    2009-01-01

    Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole genome SNPs associated with chick mortality. This was done by categorizing mortality rates and using a filter-wrapper feature selection procedure in each of the classification methods evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. Evaluation of SNPs selected by each classification method was done by predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories generated the SNP subset with greatest predictive ability. Further, an alternative categorization scheme, which used only two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies. PMID:19284707

  1. Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

    PubMed Central

    Deng, Yi; Chang, Changgee; Ido, Moges Seyoum; Long, Qi

    2016-01-01

    Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples. PMID:26868061
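
    The chained-equations idea can be illustrated with a deliberately small sketch for two numeric variables. It assumes ordinary simple linear regression with a normal residual draw in place of the regularized regression studied in the record, and it does not draw the regression parameters from their posterior, so it is a simplification of proper MICE. All names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[float], Optional[float]]


def _fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y on x; returns (intercept, slope, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / max(len(xs) - 2, 1)) ** 0.5


def chained_imputation(rows: Sequence[Row], n_iter: int = 10, seed: int = 0) -> List[Tuple[float, float]]:
    """Return one completed copy of a two-column data set with missing values."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Step 1: start from column means of the observed values.
    for j in (0, 1):
        mean_j = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for row in data:
            if row[j] is None:
                row[j] = mean_j
    # Step 2: cycle through the columns, regressing each on the other.
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j
            train = [(row[k], row[j]) for row, m in zip(data, missing) if not m[j]]
            a, b, sd = _fit_line([t[0] for t in train], [t[1] for t in train])
            for row, m in zip(data, missing):
                if m[j]:
                    # Redraw only the originally missing entries.
                    row[j] = a + b * row[k] + rng.gauss(0.0, sd)
    return [(row[0], row[1]) for row in data]


if __name__ == "__main__":
    sample = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.1), (5.0, 9.9)]
    print(chained_imputation(sample, seed=1))
```

    Calling the function with several different seeds yields several completed data sets, which is the "multiple" part of multiple imputation.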

  2. The operating regimes and basic control principles of SNPS "Topaz"

    NASA Astrophysics Data System (ADS)

    Makarov, Anatoly N.; Volberg, Mark S.; Grayznov, Georgy M.; Zhabotinsky, Evgeny E.; Serbin, Victor I.

    1991-01-01

    The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include: prelaunch preparation and launch into working orbit, SNPS start-up to obtain the desired electric power, nominal regime, and SNPS shutdown. The main requirements for the SNPS in the different regimes are given, and the control algorithms meeting these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" allow the conclusion that, for an SNPS of this type, a control algorithm that provides the required thermal state of the cesium vapor supply system and excludes any possibility of discharge processes in current-conducting elements is the most expedient for the start-up regime. In the nominal regime, the required electric power should be provided by maintaining the reactor current and using a fast-acting voltage regulator. A limit on the outlet coolant temperature should also be provided.

  3. The operating regimes and basic control principles of SNPS 'Topaz'

    NASA Astrophysics Data System (ADS)

    Makarov, Anatolii N.; Vol'Berg, Mark S.; Griaznov, Georgii M.; Zhabotinskii, Evgenii E.; Serbin, Viktor I.

    The basic operating regimes of space nuclear power system (SNPS) 'Topaz' are considered. These regimes include: prelaunch preparation and launch into working orbit, SNPS start-up to obtain desired electric power, nominal regime, and SNPS shutdown. The main requirements for SNPS at different regimes are given, and the control algorithms providing these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. For SNPS of this type control algorithm providing the required thermal state of the cesium vapor supply system and excluding any possibility of discharge processes in current conducting elements is the most expedient at the start-up regime.

  4. Missing value imputation improves clustering and interpretation of gene expression microarray data

    PubMed Central

    Tuikkala, Johannes; Elo, Laura L; Nevalainen, Olli S; Aittokallio, Tero

    2008-01-01

    Background Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used. Results We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. Conclusion The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can – up to a certain degree – be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA). PMID:18423022

  5. Traffic speed data imputation method based on tensor completion.

    PubMed

    Ran, Bin; Tan, Huachun; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches. PMID:25866501

  6. Traffic Speed Data Imputation Method Based on Tensor Completion

    PubMed Central

    Ran, Bin; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches. PMID:25866501

  7. Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

    PubMed Central

    2013-01-01

    Background The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphisms (SNP) panels i.e. 3K or 7K to a high density panel with 50K SNP. No pedigree information was used. Methods Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. Results In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90 and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Conclusions Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available. PMID:23738947
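
    Percentage accuracies such as those quoted in this record are often computed as genotype concordance, the share of masked genotypes that the imputation recovers exactly. The sketch below assumes that definition and a 0/1/2 genotype coding; the record does not spell out its accuracy measure, so both are assumptions and the function name is hypothetical.

```python
from typing import Optional, Sequence


def genotype_concordance(true_calls: Sequence[Optional[int]],
                         imputed_calls: Sequence[int]) -> float:
    """Fraction of positions where the imputed genotype equals the true one.

    Positions whose true genotype is unknown (None) are skipped.
    """
    pairs = [(t, i) for t, i in zip(true_calls, imputed_calls) if t is not None]
    if not pairs:
        raise ValueError("no positions with a known true genotype")
    return sum(t == i for t, i in pairs) / len(pairs)


if __name__ == "__main__":
    print(genotype_concordance([0, 1, 2, 1, None], [0, 1, 1, 1, 2]))
```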

  8. Differential Network Analysis with Multiply Imputed Lipidomic Data

    PubMed Central

    Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

    2015-01-01

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis caused cardiovascular disease (CVD). Lipidomics analyses are characterized by large yet not a huge number of mutually correlated variables measured and their associations to outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct permutation test on association scores resulted from partial least square regression with multiple imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, particularly paying attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state of the art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients, the patients that had a fatal CVD event and the ones who remained stable during two year follow-up. PMID:25822937
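
    The description of left-censored values as lying between zero and the lower limit of quantification (LLOQ) suggests a very simple baseline: draw each censored value uniformly from that interval, several times, to obtain multiple completed data sets. The sketch below is that naive baseline under a uniform assumption, not the imputation technique used for the LURIC data. The function name and the None coding for censored values are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def impute_left_censored(values: Sequence[Optional[float]], lloq: float,
                         n_imputations: int = 5, seed: int = 0) -> List[List[float]]:
    """Return n_imputations completed copies of `values`.

    Each None (a concentration below the LLOQ) is replaced by an independent
    uniform draw between 0 and lloq; observed values are left unchanged.
    """
    rng = random.Random(seed)
    return [
        [v if v is not None else rng.uniform(0.0, lloq) for v in values]
        for _ in range(n_imputations)
    ]


if __name__ == "__main__":
    for completed in impute_left_censored([0.8, None, 1.4, None], lloq=0.5, n_imputations=3):
        print(completed)
```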

  9. Differential network analysis with multiply imputed lipidomic data.

    PubMed

    Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

    2015-01-01

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis caused cardiovascular disease (CVD). Lipidomics analyses are characterized by large yet not a huge number of mutually correlated variables measured and their associations to outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct permutation test on association scores resulted from partial least square regression with multiple imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, particularly paying attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state of the art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients, the patients that had a fatal CVD event and the ones who remained stable during two year follow-up. PMID:25822937

  10. Data imputation through the identification of local anomalies.

    PubMed

    Ozkan, Huseyin; Pelvan, Ozgun Soner; Kozat, Suleyman S

    2015-10-01

    We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. PMID:25608311

  11. Potentially Functional SNPs (pfSNPs) as Novel Genomic Predictors of 5-FU Response in Metastatic Colorectal Cancer Patients

    PubMed Central

    Zhao, Mingjue; Choo, Su Pin; Ong, Sin Jen; Ong, Simon Y. K.; Chong, Samuel S.; Teo, Yik Ying; Lee, Caroline G. L.

    2014-01-01

    5-Fluorouracil (5-FU) and its pro-drug Capecitabine have been widely used in treating colorectal cancer. However, not all patients will respond to the drug, hence there is a need to develop reliable early predictive biomarkers for 5-FU response. Here, we report a novel potentially functional Single Nucleotide Polymorphism (pfSNP) approach to identify SNPs that may serve as predictive biomarkers of response to 5-FU in Chinese metastatic colorectal cancer (CRC) patients. 1547 pfSNPs and one variable number tandem repeat (VNTR) in 139 genes in 5-FU drug (both PK and PD pathway) and colorectal cancer disease pathways were examined in 2 groups of CRC patients. Shrinkage of liver metastasis measured by RECIST criteria was used as the clinical end point. Four non-responder-specific pfSNPs were found to account for 37.5% of all non-responders (P<0.0003). Five additional pfSNPs were identified from a multivariate model (AUC under ROC = 0.875) that was applied for all other pfSNPs, excluding the non-responder-specific pfSNPs. These pfSNPs, which can differentiate the other non-responders from responders, mainly reside in tumor suppressor genes or genes implicated in colorectal cancer risk. Hence, a total of 9 novel SNPs with potential functional significance may be able to distinguish non-responders from responders to 5-FU. These pfSNPs may be useful biomarkers for predicting response to 5-FU. PMID:25372392

  12. Multiple Imputation by Chained Equations: What is it and how does it work?

    PubMed Central

    Azur, Melissa J.; Stuart, Elizabeth A.; Frangakis, Constantine; Leaf, Philip J.

    2011-01-01

    Multivariate imputation by chained equations (MICE) has emerged as a principled method of dealing with missing data. Despite properties that make MICE particularly useful for large imputation procedures and advances in software development that now make it accessible to many researchers, many psychiatric researchers have not been trained in these methods and few practical resources exist to guide researchers in the implementation of this technique. This paper provides an introduction to the MICE method with a focus on practical aspects and challenges in using this method. A brief review of software programs available to implement MICE and then analyze multiply imputed data is also provided. PMID:21499542
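
    Once each multiply imputed data set has been analysed separately, the per-data-set results are usually combined with Rubin's rules: the pooled estimate is the mean of the estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below implements only this pooling step; the function name is hypothetical and the inputs are made-up numbers for illustration.

```python
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m point estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need two or more estimates with matching variances")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # sample variance across imputations
    return pooled, within + (1 + 1 / m) * between


if __name__ == "__main__":
    # Illustrative numbers only: one coefficient estimated on three imputed data sets.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

    The between-imputation term is what single imputation omits, which is why single imputation tends to understate standard errors.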

  13. Multiple imputation as a means to assess Mammographic vs. Ultrasound technology in Determine Breast Cancer Recurrence

    NASA Astrophysics Data System (ADS)

    Helenowski, Irene B.; Demirtas, Hakan; Khan, Seema; Eladoumikdachi, Firas; Shidfar, Ali

    2014-03-01

    Tumor size based on mammographic and ultrasound data are two methods used in predicting recurrence in breast cancer patients. However, which technology offers better determination of diagnosis is an ongoing debate among radiologists, biophysicists, and other clinicians. Further complications in assessing the performance of each technology arise from missing data. One approach to remedy this problem may involve multiple imputation. Here, we therefore examine how imputation affects our assessment of the relationship between recurrence and tumor size determined either by mammography or ultrasound technology. We specifically employ the semi-parametric approach for imputing mixed continuous and binary data as presented in Helenowski and Demirtas (2013).

  14. Model, properties and imputation method of missing SNP genotype data utilizing mutual information

    NASA Astrophysics Data System (ADS)

    Wang, Ying; Wan, Weiming; Wang, Rui-Sheng; Feng, Enmin

    2009-07-01

    Mutual information can be used as a measure for the association of a genetic marker or a combination of markers with the phenotype. In this paper, we study the imputation of missing genotype data. We first utilize joint mutual information to compute the dependence between SNP sites, then construct a mathematical model in order to find the two SNP sites having maximal dependence with missing SNP sites, and further study the properties of this model. Finally, an extension method to haplotype-based imputation is proposed to impute the missing values in genotype data. To verify our method, extensive experiments have been performed, and numerical results show that our method is superior to haplotype-based imputation methods. At the same time, numerical results also prove joint mutual information can better measure the dependence between SNP sites. According to experimental results, we also conclude that the dependence between the adjacent SNP sites is not necessarily strongest.
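
    As a minimal illustration of measuring dependence between SNP sites, the sketch below computes the pairwise mutual information (in bits) between two sites from their empirical genotype frequencies, skipping samples where either call is missing. The record works with joint mutual information over combinations of sites; the pairwise form, the None coding for missing calls, and the function name are simplifying assumptions.

```python
from collections import Counter
from math import log2
from typing import Optional, Sequence


def mutual_information(site_a: Sequence[Optional[int]],
                       site_b: Sequence[Optional[int]]) -> float:
    """Empirical mutual information I(A; B) in bits between two SNP sites.

    Only samples with both genotypes observed contribute to the estimate.
    """
    pairs = [(a, b) for a, b in zip(site_a, site_b) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))   # identical sites
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))   # independent sites
```

    Ranking candidate sites by this score, instead of by physical adjacency, matches the record's observation that the nearest site is not necessarily the most dependent one.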

  15. Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation

    PubMed Central

    Bernhardt, Paul W.; Wang, Huixia Judy; Zhang, Daowen

    2013-01-01

    Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally-efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently-conducted GenIMS study. PMID:24204085

  16. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets.

    PubMed

    Torres-García, Wandaliz; Brown, Steven D; Johnson, Roger H; Zhang, Weiwen; Runger, George C; Meldrum, Deirdre R

    2011-04-01

    Despite significant improvements in recent years, proteomic datasets currently available still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured for 45 and 90 min. With the ultimate objective to impute protein values for experimentally undetected samples at 45 and 90 min, we applied a serial set of algorithms to capture relationships between temporal gene and protein expression. This work follows four main steps: (1) a quality control step for gene expression reliability, (2) mRNA imputation, (3) protein prediction, and (4) validation. Initially, an S control chart approach is performed on gene expression replicates to remove unwanted variability. Then, we focused on the missing measurements of gene expression through a nonlinear Smoothing Splines Curve Fitting. This method identifies temporal relationships among transcriptomic data at different time points and enables imputation of mRNA abundance at 45 min. After mRNA imputation was validated by biological constraints (i.e. operons), we used a data-driven GBT model to impute protein abundance for the proteins experimentally undetected in the 45 and 90 min samples, based on relevant predictors such as temporal mRNA gene expression data and cellular functional roles. 
The imputed protein values were validated using biological constraints such as operon and pathway information through a permutation test to investigate whether dispersion measures are indeed smaller for known biological groups than for any set of random genes. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate. PMID:21212895
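
    The validation idea above (dispersion within a known biological group compared against random gene sets) can be sketched as a simple resampling test. This is a minimal sketch under stated assumptions: the population standard deviation stands in for the paper's unspecified "dispersion measures", and the function name, arguments, and one-sided p-value convention are hypothetical.

```python
import random
from statistics import pstdev


def dispersion_permutation_p(values, group, n_perm=1000, seed=0):
    """One-sided p-value: how often does a random gene set of the same size
    show dispersion as small as (or smaller than) the known group?

    values: dict mapping gene id -> imputed value
    group:  list of gene ids forming a known biological group (e.g. an operon)
    """
    rng = random.Random(seed)
    genes = list(values)
    observed = pstdev([values[g] for g in group])
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(group))
        if pstdev([values[g] for g in sample]) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

    A small p-value would suggest that members of the group received more similar imputed values than random genes do, which is the pattern the authors looked for.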

  17. Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.

    PubMed

    Blue, Elizabeth M; Sun, Lei; Tintle, Nathan L; Wijsman, Ellen M

    2014-09-01

    When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184
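
    To make the role of Mendelian transmission concrete, here is a minimal sketch of a trio consistency check and a rule-based imputation step for a single biallelic autosomal site. The 0/1/2 alternate-allele coding and the function names are assumptions made for illustration; the workshop contributions used their own, more elaborate methods.

```python
# Alleles (0 = reference, 1 = alternate) that a parent can transmit,
# keyed by the parent's alternate-allele count.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def possible_child_genotypes(father, mother):
    """All child genotypes compatible with Mendelian segregation."""
    return {f + m for f in TRANSMITTABLE[father] for m in TRANSMITTABLE[mother]}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from the parental genotypes."""
    return child in possible_child_genotypes(father, mother)


def rule_based_impute(father, mother):
    """Return the child's genotype when the parents determine it uniquely,
    otherwise None (a probability-based method would be needed)."""
    possible = possible_child_genotypes(father, mother)
    return next(iter(possible)) if len(possible) == 1 else None
```

    The `None` branch shows why a purely rule-based approach covers fewer loci: whenever a parent is heterozygous, the rules alone cannot settle the child's genotype.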

  18. On the value of Mendelian laws of segregation in families: data quality control, imputation and beyond

    PubMed Central

    Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.

    2014-01-01

    When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find next-generation sequence (NGS) data have error, and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups Quality Control and Dropping WGS through families using GWAS framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphisms, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelateds. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods, and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184

  19. Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials

    PubMed Central

    Andridge, Rebecca R.

    2011-01-01

    In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309
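
    For readers unfamiliar with the MI variance estimator discussed above, the sketch below applies the standard combining rules (Rubin's rules) to estimates from m imputed data sets. It assumes, without confirmation from the abstract, that this standard estimator is the one whose bias the paper analyzes; the function name and argument layout are illustrative.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine point estimates and their variances across m imputed data sets.

    Returns (pooled estimate, total variance), where
    total = W + (1 + 1/m) * B,
    W is the mean within-imputation variance and
    B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(within_variances)
    between = variance(estimates)  # sample variance, denominator m - 1
    return pooled, within + (1 + 1 / m) * between
```

    The between-imputation term is where the choice of imputation model enters: an imputation model that spreads imputed values too widely inflates it, and the total variance with it.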

  20. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  1. Large-scale epigenome imputation improves data quality and disease variant enrichment

    PubMed Central

    Ernst, Jason; Kellis, Manolis

    2015-01-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals, and surpass experimental datasets in consistency, recovery of gene annotations, and enrichment for disease-associated variants. We use the imputed data to detect low quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments, and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  2. Identifying causal regulatory SNPs in ChIP-seq enhancers

    PubMed Central

    Huang, Di; Ovcharenko, Ivan

    2015-01-01

    Thousands of non-coding SNPs have been linked to human diseases in the past. The identification of causal alleles within this pool of disease-associated non-coding SNPs is largely impossible due to the inability to accurately quantify the impact of non-coding variation. To overcome this challenge, we developed a computational model that uses ChIP-seq intensity variation in response to non-coding allelic change as a proxy to the quantification of the biological role of non-coding SNPs. We applied this model to HepG2 enhancers and detected 4796 enhancer SNPs capable of disrupting enhancer activity upon allelic change. These SNPs are significantly over-represented in the binding sites of HNF4 and FOXA families of liver transcription factors and liver eQTLs. In addition, these SNPs are strongly associated with liver GWAS traits, including type I diabetes, and are linked to the abnormal levels of HDL and LDL cholesterol. Our model is directly applicable to any enhancer set for mapping causal regulatory SNPs. PMID:25520196

  3. Imputation of missing data using machine learning techniques

    SciTech Connect

    Lakshminarayan, Kamakshi; Harp, S.A.; Goldman, R.; Samad, T.

    1996-12-31

    A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine learning techniques. The first is an unsupervised clustering strategy which uses a Bayesian approach to cluster the data into classes. The classes so obtained are then used to predict multiple choices for the attribute of interest. The second technique involves modeling missing variables by supervised induction of a decision tree-based classifier. This predicts the most likely value for the attribute of interest. Empirical tests using extracts from industrial databases maintained by Honeywell customers have been done in order to compare the two techniques. These tests show both approaches are useful and have advantages and disadvantages. We argue that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and discuss potential applications for the procedures we are developing.

  4. Identification, validation and high-throughput genotyping of transcribed gene SNPs in cassava.

    PubMed

    Ferguson, Morag E; Hearne, Sarah J; Close, Timothy J; Wanamaker, Steve; Moskal, William A; Town, Christopher D; de Young, Joe; Marri, Pradeep Reddy; Rabbi, Ismail Yusuf; de Villiers, Etienne P

    2012-03-01

    The availability of genomic resources can facilitate progress in plant breeding through the application of advanced molecular technologies for crop improvement. This is particularly important in the case of less researched crops such as cassava, a staple and food security crop for more than 800 million people. Here, expressed sequence tags (ESTs) were generated from five drought stressed and well-watered cassava varieties. Two cDNA libraries were developed: one from root tissue (CASR), the other from leaf, stem and stem meristem tissue (CASL). Sequencing generated 706 contigs and 3,430 singletons. These sequences were combined with those from two other EST sequencing initiatives and filtered based on the sequence quality. Quality sequences were aligned using CAP3 and embedded in a Windows browser called HarvEST:Cassava which is made available. HarvEST:Cassava consists of a Unigene set of 22,903 quality sequences. A total of 2,954 putative SNPs were identified. Of these 1,536 SNPs from 1,170 contigs and 53 cassava genotypes were selected for SNP validation using Illumina's GoldenGate assay. As a result 1,190 SNPs were validated technically and biologically. The location of validated SNPs on scaffolds of the cassava genome sequence (v.4.1) is provided. A diversity assessment of 53 cassava varieties reveals some sub-structure based on the geographical origin, greater diversity in the Americas as opposed to Africa, and similar levels of diversity in West Africa and southern, eastern and central Africa. The resources presented allow for improved genetic dissection of economically important traits and the application of modern genomics-based approaches to cassava breeding and conservation. PMID:22069119

  5. Missing value imputation for microarray data: a comprehensive comparison study and a web tool

    PubMed Central

    2013-01-01

    Background Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. The studies about the performance comparison of different algorithms are still incomprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. Results In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of a dataset but not on the species where the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. Conclusions In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. 
Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses. PMID:24565220

  6. Assets of imputation to ultra-high density for productive and functional traits.

    PubMed

    Jiménez-Montero, J A; Gianola, D; Weigel, K; Alenda, R; González-Recio, O

    2013-09-01

    The aim of this study was to evaluate different-density genotyping panels for genotype imputation and genomic prediction. Genotypes from customized Golden Gate Bovine3K BeadChip [LD3K; low-density (LD) 3,000-marker (3K); Illumina Inc., San Diego, CA] and BovineLD BeadChip [LD6K; 6,000-marker (6K); Illumina Inc.] panels were imputed to the BovineSNP50v2 BeadChip [50K; 50,000-marker; Illumina Inc.]. In addition, LD3K, LD6K, and 50K genotypes were imputed to a BovineHD BeadChip [HD; high-density 800,000-marker (800K) panel], and predictive ability was subsequently evaluated and compared. Comparisons of prediction accuracy were carried out using Random boosting and genomic BLUP. Four traits under selection in the Spanish Holstein population were used: milk yield, fat percentage (FP), somatic cell count, and days open (DO). Training sets at 50K density for imputation and prediction included 1,632 genotypes. Testing sets for imputation from LD to 50K contained 834 genotypes and testing sets for genomic evaluation included 383 bulls. The reference population genotyped at HD included 192 bulls. Imputation using BEAGLE software (http://faculty.washington.edu/browning/beagle/beagle.html) was effective for reconstruction of dense 50K and HD genotypes, even when a small reference population was used, with 98.3% of SNPs correctly imputed. Random boosting outperformed genomic BLUP in terms of prediction reliability, mean squared error, and selection effectiveness of top animals in the case of FP. For other traits, however, no clear differences existed between methods. No differences were found between imputed LD and 50K genotypes, whereas evaluation of genotypes imputed to HD was on average across data set, method, and trait, 4% more accurate than 50K prediction, and showed smaller (2%) mean squared error of predictions. 
Similar bias in regression coefficients was found across data sets but regressions were 0.32 units closer to unity for DO when genotypes were imputed to HD density. Imputation to HD genotypes might produce higher stability in the genomic proofs of young candidates. Regarding selection effectiveness of top animals, more (2%) top bulls were classified correctly with imputed LD6K genotypes than with LD3K. When the original 50K genotypes were used, correct classification of top bulls increased by 1%, and when those genotypes were imputed to HD, 3% more top bulls were detected. Selection effectiveness could be slightly enhanced for certain traits such as FP, somatic cell count, or DO when genotypes are imputed to HD. Genetic evaluation units may consider a trait-dependent strategy in terms of method and genotype density for use in the genome-enhanced evaluations. PMID:23810591
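
    A figure such as the share of SNPs correctly imputed is typically obtained by comparing imputed genotypes with genotypes that were actually observed on the denser panel. The sketch below shows one simple way to compute such a concordance rate; the function name, the use of `None` for uncalled genotypes, and the decision to skip those positions are assumptions, not the study's documented procedure.

```python
def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of positions where the imputed genotype equals the true one.

    Positions where either genotype is None (not called) are skipped.
    """
    compared = [
        (t, g)
        for t, g in zip(true_genotypes, imputed_genotypes)
        if t is not None and g is not None
    ]
    if not compared:
        raise ValueError("no genotypes available for comparison")
    return sum(t == g for t, g in compared) / len(compared)
```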

  7. Worldwide population differentiation at disease-associated SNPs

    PubMed Central

    Myles, Sean; Davison, Dan; Barrett, Jeffrey; Stoneking, Mark; Timpson, Nic

    2008-01-01

    Background Recent genome-wide association (GWA) studies have provided compelling evidence of association between genetic variants and common complex diseases. These studies have made use of cases and controls almost exclusively from populations of European ancestry and little is known about the frequency of risk alleles in other populations. The present study addresses the transferability of disease associations across human populations by examining levels of population differentiation at disease-associated single nucleotide polymorphisms (SNPs). Methods We genotyped ~1000 individuals from 53 populations worldwide at 25 SNPs which show robust association with 6 complex human diseases (Crohn's disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, coronary artery disease and obesity). Allele frequency differences between populations for these SNPs were measured using Fst. The Fst values for the disease-associated SNPs were compared to Fst values from 2750 random SNPs typed in the same set of individuals. Results On average, disease SNPs are not significantly more differentiated between populations than random SNPs in the genome. Risk allele frequencies, however, do show substantial variation across human populations and may contribute to differences in disease prevalence between populations. We demonstrate that, in some cases, risk allele frequency differences are unusually high compared to random SNPs and may be due to the action of local (i.e. geographically-restricted) positive natural selection. Moreover, some risk alleles were absent or fixed in a population, which implies that risk alleles identified in one population do not necessarily account for disease prevalence in all human populations. 
Conclusion Although differences in risk allele frequencies between human populations are not unusually large and are thus likely not due to positive local selection, there is substantial variation in risk allele frequencies between populations which may account for differences in disease prevalence between human populations. PMID:18533027
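
    As a worked illustration of the differentiation measure used above, the sketch below computes Fst for one biallelic SNP from per-population allele frequencies using the basic heterozygosity definition, Fst = (H_T - H_S) / H_T, with equal population weights. The study may well have used a different estimator (for example one that corrects for sample size), so this is only a simplified stand-in, and the function name is hypothetical.

```python
def fst_from_frequencies(freqs):
    """Fst for a biallelic SNP given the allele frequency in each population.

    H_T: expected heterozygosity from the mean allele frequency.
    H_S: mean expected heterozygosity within populations.
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # allele absent or fixed everywhere: no differentiation
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total
```

    Identical frequencies in all populations give 0, while an allele that is fixed in one population and absent in another gives 1, the extreme case the abstract alludes to when it notes that some risk alleles were absent or fixed.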

  8. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets

    SciTech Connect

    Torres-García, Wandaliz; Brown, Steven D; Johnson, Roger; Zhang, Weiwen; Runger, George; Meldrum, Deirdre

    2011-01-01

    Despite significant improvements in recent years, proteomic datasets currently available still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values for proteins experimentally undetected, using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured only for 45- and 90-min samples. With the goal of elucidating the relationship between temporal gene expression and protein abundance data, and then using it to impute missing proteomic values for samples of 45 min (which does not have cognate transcriptomic data) and 90 min, we initially used nonlinear Smoothing Splines Curve Fitting (SSCF) to identify temporal relationships among transcriptomic data at different time points and then imputed missing gene expression measurements for the sample at 45 min. After the imputation was validated by biological constraints (i.e. operons), we used a data-driven Gradient Boosted Trees (GBT) model to uncover possible non-linear relationships between temporal transcriptomic and proteomic data, and to impute protein abundance for the proteins experimentally undetected in the 45- and 90-min samples, based on relevant predictors such as temporal mRNA gene expression data, cellular roles, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. The imputed protein values were validated using biological constraints such as operon, regulon and pathway information. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.

  9. Imputation of Truncated p-Values For Meta-Analysis Methods and Its Genomic Application

    PubMed Central

    Tang, Shaowu; Ding, Ying; Sibille, Etienne; Mogil, Jeffrey; Lariviere, William R.; Tseng, George C.

    2014-01-01

    Microarray analysis to monitor expression activities in thousands of genes simultaneously has become routine in biomedical research during the past decade. A tremendous amount of expression profiles are generated and stored in the public domain and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular to obtain increased statistical power and validated findings. Methods that aggregate transformed p-value evidence have been widely used in genomic settings, among which Fisher's and Stouffer's methods are the most popular ones. In practice, raw data and p-values of DE evidence are often not available in genomic studies that are to be combined. Instead, only the detected DE gene lists under a certain p-value threshold (e.g., DE genes with p-value < 0.001) are reported in journal publications. The truncated p-value information makes the aforementioned meta-analysis methods inapplicable and researchers are forced to apply a less efficient vote counting method or naively drop the studies with incomplete information. The purpose of this paper is to develop effective meta-analysis methods for such situations with partially censored p-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods of which Fisher's and Stouffer's methods are special examples. The null distribution of each method was analytically derived and subsequent inference and genomic analysis frameworks were established. Simulations were performed to investigate the type I error, power and the control of false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were applied to several genomic applications in colorectal cancer, pain and liquid association analysis of major depressive disorder (MDD). The results showed that imputation methods outperformed existing naive approaches. 
Mean imputation and multiple imputation methods performed the best and are recommended for future applications. PMID:25541588
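
    One plausible reading of mean imputation for Fisher's method is sketched below: when a study reports only whether a gene's p-value fell below the threshold alpha, the unknown term -2 ln(p) is replaced by its conditional expectation under a uniform null. That interpretation, the `"below"`/`"above"` markers, and the function names are assumptions made for illustration; the paper's exact definitions may differ. With imputed terms the statistic no longer follows the usual chi-square null distribution, which is why the authors derive the null distribution separately.

```python
from math import log


def mean_imputed_fisher_term(alpha, below):
    """Expected value of -2 ln(p) for p uniform on (0, alpha) if `below`,
    or uniform on [alpha, 1) otherwise."""
    if below:
        return 2.0 * (1.0 - log(alpha))
    return 2.0 * (1.0 - alpha + alpha * log(alpha)) / (1.0 - alpha)


def fisher_statistic(evidence, alpha):
    """Fisher's sum of -2 ln(p) over studies, with mean imputation.

    evidence: one entry per study, either an exact p-value (float),
              "below" (only p < alpha is known) or "above" (only p >= alpha).
    """
    total = 0.0
    for item in evidence:
        if item == "below":
            total += mean_imputed_fisher_term(alpha, True)
        elif item == "above":
            total += mean_imputed_fisher_term(alpha, False)
        else:
            total += -2.0 * log(item)
    return total
```

    As a sanity check on the two conditional expectations, weighting them by alpha and 1 - alpha recovers 2, the unconditional mean of -2 ln(p) under the null.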

  10. Imputation of SF-12 Health Scores for Respondents with Partially Missing Data

    PubMed Central

    Liu, Honghu; Hays, Ron D; Adams, John L; Chen, Wen-Pin; Tisnado, Diana; Mangione, Carol M; Damberg, Cheryl L; Kahn, Katherine L

    2005-01-01

    Objective To create an efficient imputation algorithm for imputing the SF-12 physical component summary (PCS) and mental component summary (MCS) scores when patients have one to eleven SF-12 items missing. Study Setting Primary data collection was performed between 1996 and 1998. Study Design Multi-pattern regression was conducted to impute the scores using only available SF-12 items (simple model), and then supplemented by demographics, smoking status and comorbidity (enhanced model) to increase the accuracy. A cut point of missing SF-12 items was determined for using the simple or the enhanced model. The algorithm was validated through simulation. Data Collection Thirty-thousand-three-hundred and eight patients from 63 physician groups were surveyed for a quality of care study in 1996, which collected the SF-12 and other information. The patients were classified as chronic patients if they reported that they had diabetes, heart disease, asthma/chronic obstructive pulmonary disease, or low back pain. A follow-up survey was conducted in 1998. Principal Findings Thirty-one percent of the patients missed at least one SF-12 item. Means of variance of prediction and standard errors of the mean imputed scores increased with the number of missing SF-12 items. Correlations between the observed and the imputed scores derived from the enhanced models were consistently higher than those derived from the simple model and the increments were significant for patients with ≥6 missing SF-12 items (p<.03). Conclusion Missing SF-12 items are prevalent and lead to reduced analytical power. Regression-based multi-pattern imputation using the available SF-12 items is efficient and can produce good estimates of the scores. The enhancement from the additional patient information can significantly improve the accuracy of the imputed scores for patients with ≥6 items missing, leading to estimated scores that are as accurate as that of patients with <6 missing items. PMID:15960697

  11. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy

    PubMed Central

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption of missing data being at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random data (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations to the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy. PMID:26283989

  12. Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis.

    PubMed

    Siddique, Juned; Reiter, Jerome P; Brincks, Ahnalee; Gibbons, Robert D; Crespi, Catherine M; Brown, C Hendricks

    2015-11-20

    There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials and use multiple imputation to fill in missing measurements. We apply our method to five longitudinal adolescent depression trials where four studies used one depression measure and the fifth study used a different depression measure. None of the five studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigate whether external information is appropriately incorporated into the imputed values. PMID:26095855

  13. Multiple imputation and analysis for high-dimensional incomplete proteomics data.

    PubMed

    Yin, Xiaoyan; Levy, Daniel; Willinger, Christine; Adourian, Aram; Larson, Martin G

    2016-04-15

    Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:26565662

  14. Bias and Precision of the "Multiple Imputation, Then Deletion" Method for Dealing With Missing Outcome Data.

    PubMed

    Sullivan, Thomas R; Salter, Amy B; Ryan, Philip; Lee, Katherine J

    2015-09-15

    Multiple imputation (MI) is increasingly being used to handle missing data in epidemiologic research. When data on both the exposure and the outcome are missing, an alternative to standard MI is the "multiple imputation, then deletion" (MID) method, which involves deleting imputed outcomes prior to analysis. While MID has been shown to provide efficiency gains over standard MI when analysis and imputation models are the same, the performance of MID in the presence of auxiliary variables for the incomplete outcome is not well understood. Using simulated data, we evaluated the performance of standard MI and MID in regression settings where data were missing on both the outcome and the exposure and where an auxiliary variable associated with the incomplete outcome was included in the imputation model. When the auxiliary variable was unrelated to missingness in the outcome, both standard MI and MID produced negligible bias when estimating regression parameters, with standard MI being more efficient in most settings. However, when the auxiliary variable was also associated with missingness in the outcome, alarmingly MID produced markedly biased parameter estimates. On the basis of these results, we recommend that researchers use standard MI rather than MID in the presence of auxiliary variables associated with an incomplete outcome. PMID:26337075

  15. Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data

    PubMed Central

    Edwards, Jessie K.; Cole, Stephen R.; Troester, Melissa A.; Richardson, David B.

    2013-01-01

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods. PMID:24627573
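
    The abstract above reports a single odds ratio from multiple imputation but does not say how the per-imputation estimates were combined. A minimal sketch, assuming the standard Rubin's rules were applied on the log-odds-ratio scale, is given below; the function name pool_rubin and the example numbers are hypothetical and are not taken from the study.

```python
import math


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates with Rubin's rules.

    estimates: point estimates (e.g. log odds ratios), one per imputed data set
    variances: their squared standard errors, in the same order
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = sum(estimates) / m                      # mean of the estimates
    within = sum(variances) / m                      # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # Rubin's total variance
    return pooled, total


# Hypothetical log odds ratios and variances from three imputed data sets.
log_or, var = pool_rubin([-0.55, -0.48, -0.51], [0.22, 0.21, 0.23])
pooled_or = math.exp(log_or)  # back-transform to the odds-ratio scale
```

    Pooling on the log scale and back-transforming keeps the sampling distribution closer to normal than pooling odds ratios directly. An interval would normally use a t reference distribution with Rubin's degrees of freedom, which this sketch omits.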

  16. Can we spin straw into gold? An evaluation of immigrant legal status imputation approaches.

    PubMed

    Van Hook, Jennifer; Bachmeier, James D; Coffman, Donna L; Harel, Ofer

    2015-02-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants' legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants' legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  17. Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches

    PubMed Central

    Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer

    2014-01-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  18. PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

    PubMed Central

    Livne, Oren E.; Han, Lide; Alkorta-Aranburu, Gorka; Wentworth-Sheilds, William; Abney, Mark; Ober, Carole; Nicolae, Dan L.

    2015-01-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. PMID:25735005

  19. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    PubMed

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn Ga; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John Rb

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7 879 351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P<5 × 10(-8)), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P=7.68 × 10(-12)), oestradiol (P=1.63 × 10(-8)) and FAI (P=1.50 × 10(-8)). A genetic variant near the FSHB gene was identified which influenced both FSH (P=1.74 × 10(-8)) and LH (P=3.94 × 10(-9)) levels. A separate locus on chromosome 7 was associated with both DHEAS (P=1.82 × 10(-14)) and progesterone (P=6.09 × 10(-14)). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation. PMID:26014426

  20. Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

    PubMed

    Jostins, Luke; Morley, Katherine I; Barrett, Jeffrey C

    2011-06-01

    Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will allow it. PMID:21364697

  1. Identification of SNPs associated with variola virus virulence

    PubMed Central

    2013-01-01

    Background Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. Conclusions We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity. PMID:23410064

  2. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation

    PubMed Central

    Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.

    2015-01-01

    This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level. PMID:26377959
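
    The equivalence the abstract mentions between the inbreeding coefficient and the χ² statistic can be illustrated for a biallelic marker with complete genotype counts, where the Pearson χ² statistic equals n times the squared inbreeding coefficient estimate. The sketch below is only an illustration of that identity; the function names are hypothetical, and it does not implement the authors' exact tests or their handling of missing genotypes.

```python
def inbreeding_coefficient(n_aa, n_ab, n_bb):
    """Estimate f = 1 - observed / expected heterozygotes for one biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)        # frequency of allele A
    expected_het = 2 * p * (1 - p) * n     # heterozygote count under equilibrium
    if expected_het == 0:
        raise ValueError("marker is monomorphic; f is undefined")
    return 1 - n_ab / expected_het


def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic against Hardy-Weinberg expected counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts of AA, AB and BB genotypes (made-up numbers for illustration).
f_hat = inbreeding_coefficient(30, 40, 30)
chisq = hwe_chisq(30, 40, 30)
n_times_f_squared = 100 * f_hat ** 2       # matches chisq up to rounding
```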

  3. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers.

    PubMed

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, and current in each phase and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be accomplished in order to substitute the data that is missing for estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  4. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    PubMed Central

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, and current in each phase and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be accomplished in order to substitute the data that is missing for estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  5. Imputation-based genomic coverage assessments of current human genotyping arrays.

    PubMed

    Nelson, Sarah C; Doheny, Kimberly F; Pugh, Elizabeth W; Romm, Jane M; Ling, Hua; Laurie, Cecelia A; Browning, Sharon R; Weir, Bruce S; Laurie, Cathy C

    2013-10-01

    Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. "Genomic coverage" is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low density arrays in all ancestries and 50-80% in high density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compare these array-based power estimates with a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies. PMID:23979933
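
    The coverage measure described above, based on the correlation between imputed and observed allelic dosages, can be sketched as the fraction of variants whose squared correlation reaches a cutoff. The cutoff of 0.8, the function names, and the dictionary layout below are assumptions made for illustration; the abstract does not state the threshold or the implementation used.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length dosage lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant dosage vector carries no information
    return sxy * sxy / (sxx * syy)


def genomic_coverage(imputed, observed, r2_threshold=0.8):
    """Fraction of variants with dosage r^2 at or above the threshold.

    imputed, observed: dicts mapping a variant ID to per-sample dosages (0 to 2).
    r2_threshold: assumed cutoff for calling a variant covered.
    """
    covered = sum(
        1 for variant in observed
        if squared_correlation(imputed[variant], observed[variant]) >= r2_threshold
    )
    return covered / len(observed)


# Two made-up variants: one imputed well, one imputed as a constant.
observed = {"var1": [0, 1, 2, 1], "var2": [0, 1, 2, 1]}
imputed = {"var1": [0.1, 1.0, 1.9, 1.0], "var2": [1.0, 1.0, 1.0, 1.0]}
coverage = genomic_coverage(imputed, observed)
```

    In practice such a calculation would be stratified by ancestry group and minor allele frequency bin, as the abstract reports results separately for each.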

  6. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which may help to identify deleterious alleles that are the basis of inbreeding depression in the species. PMID:23324311

  7. A suggested approach for imputation of missing dietary data for young children in daycare

    PubMed Central

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children. PMID:26689313

  8. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  9. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  10. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  11. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  12. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions 1508.630 May the African Development Foundation impute conduct of one...

  13. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  14. Next generation tools for the annotation of human SNPs

    PubMed Central

    2009-01-01

    Computational biology has the opportunity to play an important role in the identification of functional single nucleotide polymorphisms (SNPs) discovered in large-scale genotyping studies, ultimately yielding new drug targets and biomarkers. The medical genetics and molecular biology communities are increasingly turning to computational biology methods to prioritize interesting SNPs found in linkage and association studies. Many such methods are now available through web interfaces, but the interested user is confronted with an array of predictive results that are often in disagreement with each other. Many tools today produce results that are difficult to understand without bioinformatics expertise, are biased towards non-synonymous SNPs, and do not necessarily reflect up-to-date versions of their source bioinformatics resources, such as public SNP repositories. Here, I assess the utility of the current generation of webservers; and suggest improvements for the next generation of webservers to better deliver value to medical geneticists and molecular biologists. PMID:19181721

  15. Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

    PubMed Central

    Lee, Minjung; Dignam, James J.; Han, Junhee

    2014-01-01

    We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107

  16. Analysis of mitochondrial transcription factor A SNPs in alcoholic cirrhosis

    PubMed Central

    TANG, CHUN; LIU, HONGMING; TANG, YONGLIANG; GUO, YONG; LIANG, XIANCHUN; GUO, LIPING; PI, RUXIAN; YANG, JUNTAO

    2014-01-01

    Genetic susceptibility to alcoholic cirrhosis (AC) exists. We previously demonstrated hepatic mitochondrial DNA (mtDNA) damage in patients with AC compared with chronic alcoholics without cirrhosis. Mitochondrial transcription factor A (mtTFA) is central to mtDNA expression regulation and repair; however, it is unclear whether there are specific mtTFA single nucleotide polymorphisms (SNPs) in patients with AC and whether they affect mtDNA repair. In the present study, we screened mtTFA SNPs in patients with AC and analyzed their impact on the copy number of mtDNA in AC. A total of 50 patients with AC, 50 alcoholics without AC and 50 normal subjects were enrolled in the study. SNPs of full-length mtTFA were analyzed using the polymerase chain reaction (PCR) combined with gene sequencing. The hepatic mtTFA mRNA and mtDNA copy numbers were measured using quantitative PCR (qPCR), and mtTFA protein was measured using western blot analysis. A total of 18 mtTFA SNPs specific to patients with AC with frequencies >10% were identified. Two were located in the coding region and 16 were identified in non-coding regions. Conversely, there were five SNPs that were only present in patients with AC and normal subjects and had a frequency >10%. In the AC group, the hepatic mtTFA mRNA and protein levels were significantly lower than those in the other two groups. Moreover, the hepatic mtDNA copy number was significantly lower in the AC group than in the controls and alcoholics without AC. Based on these data, we conclude that AC-specific mtTFA SNPs may be responsible for the observed reductions in mtTFA mRNA, protein levels and mtDNA copy number and they may also increase the susceptibility to AC. PMID:24348767

  17. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough.

    PubMed

    McMahon, George; Ring, Susan M; Davey-Smith, George; Timpson, Nicholas J

    2015-10-15

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case-control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  18. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough

    PubMed Central

    McMahon, George; Ring, Susan M.; Davey-Smith, George; Timpson, Nicholas J.

    2015-01-01

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case–control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  19. Genotyping of SNPs in a polyploid genome by pyrosequencing.

    PubMed

    Rickert, Andreas M; Premstaller, Andreas; Gebhardt, Christiane; Oefner, Peter J

    2002-03-01

    Single-nucleotide polymorphisms (SNPs) are the most frequent DNA sequence variations, and they have become increasingly popular markers for association studies. Allelic discrimination of the mostly binary SNPs has been reported for diploid species, mainly the human, but not for polyploid genomes such as the agriculturally important crops. In the present study, we analyzed the applicability of pyrosequencing to genotyping SNPs in tetraploid potatoes. Out of 94 polymorphic loci tested, 76 (81%) proved to be amenable to allelic discrimination by pyrosequencing. An additional locus could be genotyped by the addition of an ssDNA binding protein to the pyrosequencing reaction. Of the remaining 17 loci, two failed because of the presence of paralogs in the genome, while in the other cases, self-annealing of the primer or template at the low reaction temperature (28 degrees C) employed in pyrosequencing rendered allelic discrimination impossible. The quantitative precision of pyrosequencing was found to be similar to that of conventional dideoxy sequencing and single-nucleotide primer extension. Except for some sequence-specific limitations, pyrosequencing appears to be an appropriate method for genotyping SNPs in polyploid species because it is possible to distinguish not only between homo- and heterozygosity but also between the different heterozygous states. PMID:11911662

  20. Outlooks of thermionic SNPS application for material production in space

    NASA Astrophysics Data System (ADS)

    Gryaznov, George M.; Zhabotinsky, Evgeny E.; Andreev, Pavel V.; Galkin, Anatoly Ya.; Nikonov, Anatoly M.; Serbin, Victor I.; Usov, Veniamin A.

    1993-01-01

    One of the important future space technologies is the in-space production of semiconductor substances, composite materials, medical preparations and so on. The electric power required for this will be 20-50 kW over the next ten years, later increasing to one hundred kilowatts or more. For production spacecraft of this kind, thermionic space nuclear power systems (SNPS) are promising because they have a number of important advantages over solar power systems. Possible application concepts for thermionic SNPS with an electric power of 25 to 30 kW on production spacecraft are considered, including the following versions: automatic loading and unloading of raw materials and products; loading and unloading performed by cosmonauts, with the SNPS forming part of the spacecraft and being demated from it while the process is carried out; and remote power transmission to supply the production spacecraft. It is shown that in all the versions considered, thermionic SNPS can supply power to production spacecraft while satisfying the necessary operating requirements.

  1. Association analysis of candidate SNPs on reproductive traits in swine

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Being able to identify young females with superior reproduction traits would have a large financial impact on commercial swine producers. Previous studies have discovered SNPs associated with economically important traits such as litter size, growth rate, fat deposition, and feed intake. The objecti...

  2. Quality assessment parameters for EST-derived SNPs from catfish

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two factors were found to be most significant for validation of EST-derived SNPs: the contig size and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate although the validation rate was reasonably high when the contig sizes were equal to or larger than...

  3. Effects of reduced panel, reference origin, and genetic relationship on imputation of genotypes in Hereford cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to investigate alternative methods for designing and utilizing reduced single nucleotide polymorphism (SNP) panels for imputing SNP genotypes. Two purebred Hereford populations, an experimental population known as Line 1 Hereford (L1, N=240) and registered Hereford wi...

  4. 37 CFR 11.110 - Imputation of conflicts of interest; General rule.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... TRADEMARK OFFICE USPTO Rules of Professional Conduct Client-Practitioner Relationship 11.110 Imputation of... knowingly represent a client when any one of them practicing alone would be prohibited from doing so by... practitioner and does not present a significant risk of materially limiting the representation of the client...

  5. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  6. Generating Multiple Imputations for Matrix Sampling Data Analyzed with Item Response Models.

    ERIC Educational Resources Information Center

    Thomas, Neal; Gan, Nianci

    1997-01-01

    Describes and assesses missing data methods currently used to analyze data from matrix sampling designs implemented by the National Assessment of Educational Progress. Several improved methods are developed, and these models are evaluated using an EM algorithm to obtain maximum likelihood estimates followed by multiple imputation of complete data…

  7. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    ERIC Educational Resources Information Center

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  8. Evaluation of an Imputed Pitch Velocity Model of the Auditory Kappa Effect

    ERIC Educational Resources Information Center

    Henry, Molly J.; McAuley, J. Devin

    2009-01-01

    Three experiments evaluated an imputed pitch velocity model of the auditory kappa effect. Listeners heard 3-tone sequences and judged the timing of the middle (target) tone relative to the timing of the 1st and 3rd (bounding) tones. Experiment 1 held pitch constant but varied the time (T) interval between bounding tones (T = 728, 1,000, or 1,600…

  9. Imputation of missing genotypes from sparse to high density using long-range phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals share potentially long chromosome segments that trace to a common ancestor. A phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations was developed to phase large sections of a chromosome. In addition to phasing, ChromoPhase imputes missing genotyp...

  10. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    PubMed

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. PMID:26477633

  11. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  12. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  13. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    ERIC Educational Resources Information Center

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  14. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  15. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  16. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  17. Probability genotype imputation method and integrated weighted lasso for QTL identification

    PubMed Central

    2013-01-01

    Background: Many QTL studies have two common features: (1) often there is missing marker information, (2) among many markers involved in the biological process only a few are causal. In statistics, the second issue falls under the headings “sparsity” and “causal inference”. The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. Outcomes of the proposed imputation method are probabilities which serve as weights in the second step, namely weighted lasso. The sparse phenotype inference is employed to select a set of predictive markers for the trait of interest. Results: Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment, containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior these markers show good potential for being real, especially for the germination trait Gmax. Conclusions: Our imputation method shows higher accuracy in terms of sensitivity and specificity compared to alternative imputation methods. Also, the proposed weighted lasso outperforms commonly practiced multiple regression as well as the traditional lasso and adaptive lasso with three weighting schemes. This means that under realistic missing data settings this methodology can be used for QTL identification. PMID:24378210

  18. Uncovering nativity disparities in cancer patterns: A multiple imputation strategy to handle missing nativity data in the SEER data file

    PubMed Central

    Montealegre, Jane R.; Zhou, Renke; Amirian, E. Susan; Scheurer, Michael E.

    2014-01-01

    Background: While birthplace data are routinely collected in the participating Surveillance, Epidemiology, and End Results (SEER) registries, such data are missing in a non-random manner for a large proportion of cases. This hinders analysis of nativity-related cancer disparities. We evaluate multiple imputation of nativity status among Hispanic patients diagnosed with cervix, prostate, and colorectal cancer and demonstrate the effect of multiple imputation on apparent nativity disparities in survival. Methods: We used multiple imputation by logistic regression to generate nativity values (U.S.- versus foreign-born) using a priori-defined variables. The accuracy of the method was evaluated among a subset of cases. We used Kaplan-Meier curves to illustrate the effect of imputation by comparing survival among U.S.- and foreign-born Hispanics, with and without imputation of nativity. Results: Birthplace was missing for 31%, 49%, and 39% of cervical, prostate, and colorectal cancer cases, respectively. The sensitivity of the imputation strategy for detecting foreign-born status was ≥ 90% and the specificity ≥ 86%. The agreement between the true and imputed values was ≥ 0.80 and the misclassification error was ≤ 10%. Kaplan-Meier survival curves indicated different associations between nativity and survival when nativity was imputed versus when cases with missing birthplace were omitted from the analysis. Conclusions: Multiple imputation using variables available in the SEER data file can be used to accurately detect foreign-born status. This simple strategy may aid researchers to disaggregate analyses by nativity and uncover important nativity disparities in regard to cancer diagnosis, treatment, and survival. PMID:24436157
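
    The accuracy measures named in the abstract above (sensitivity, specificity, misclassification error) can be illustrated with a minimal Python sketch. It assumes a validation subset in which the true nativity is known; the function name, label strings, and sample data are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def imputation_performance(
    true_labels: Sequence[str],
    imputed_labels: Sequence[str],
    positive: str = "foreign-born",  # hypothetical label for the "positive" class
) -> Dict[str, float]:
    """Compare imputed labels with known labels for a validation subset."""
    if len(true_labels) != len(imputed_labels):
        raise ValueError("label sequences must have the same length")
    tp = fn = tn = fp = 0
    for truth, imputed in zip(true_labels, imputed_labels):
        if truth == positive:
            if imputed == positive:
                tp += 1  # positive case correctly imputed
            else:
                fn += 1  # positive case imputed as negative
        else:
            if imputed == positive:
                fp += 1  # negative case imputed as positive
            else:
                tn += 1  # negative case correctly imputed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case of each true class")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / len(true_labels),
    }


# Made-up validation data: 5 foreign-born and 5 U.S.-born cases.
true_nativity = ["foreign-born"] * 5 + ["US-born"] * 5
imputed_nativity = (
    ["foreign-born"] * 4 + ["US-born"]      # one foreign-born case missed
    + ["US-born"] * 4 + ["foreign-born"]    # one U.S.-born case misassigned
)
print(imputation_performance(true_nativity, imputed_nativity))
```

    With multiple imputation, a summary like this would typically be computed for each imputed data set and then combined; the sketch scores a single set of imputed labels only.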

  19. PolyDoms: a whole genome database for the identification of non-synonymous coding SNPs with the potential to impact disease

    PubMed Central

    Jegga, Anil G.; Gowrisankar, Sivakumar; Chen, Jing; Aronow, Bruce J.

    2007-01-01

    As knowledge of human genetic polymorphisms grows, so does the opportunity and challenge of identifying those polymorphisms that may impact the health or disease risk of an individual person. A critical need is to organize large-scale polymorphism analyses and to prioritize candidate non-synonymous coding SNPs (nsSNPs) that should be tested in experimental and epidemiological studies to establish their context-specific impacts on protein function. In addition, with emerging high-resolution clinical genetics testing, new polymorphisms must be analyzed in the context of all available protein feature knowledge including other known mutations and polymorphisms. To approach this, we developed PolyDoms as a database to integrate the results of multiple algorithmic procedures and functional criteria applied to the entire Entrez dbSNP dataset. In addition to predicting structural and functional impacts of all nsSNPs, filtering functions enable group-based identification of potentially harmful nsSNPs among multiple genes associated with specific diseases, anatomies, mammalian phenotypes, gene ontologies, pathways or protein domains. PolyDoms, thus, provides a means to derive a list of candidate SNPs to be evaluated in experimental or epidemiological studies for impact on protein functions and disease risk associations. PolyDoms will continue to be curated to improve its usefulness. PMID:17142238

  20. SNP-VISTA: An Interactive SNPs Visualization Tool

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

  1. In-Silico Computing of the Most Deleterious nsSNPs in HBA1 Gene

    PubMed Central

    AbdulAzeez, Sayed; Borgio, J. Francis

    2016-01-01

    Background: α-Thalassemia (α-thal) is a genetic disorder caused by the substitution of a single amino acid or by large deletions in the HBA1 and/or HBA2 genes. Method: A systematic in-silico approach using modern bioinformatics tools was applied to predict the deleterious SNPs in the HBA1 gene and their pathogenic impact on the function and structure of the HBA1 protein. Results and Discussion: A total of 389 SNPs in HBA1 were retrieved from the dbSNP database, which includes: 201 non-coding synonymous (nsSNPs), 43 human active SNPs, 16 intronic SNPs, 11 mRNA 3′ UTR SNPs, 9 coding synonymous SNPs, 9 5′ UTR SNPs and other types. A structural homology-based method (PolyPhen) and sequence homology-based tools (SIFT, SNPs&Go, PROVEAN and PANTHER) revealed that 2.4% of the nsSNPs are pathogenic. Conclusions: A total of 5 nsSNPs (G60V, K17M, K17T, L92F and W15R) were predicted to be responsible for the structural and functional modifications of the HBA1 protein. It is evident from the comprehensive in-silico analysis that two nsSNPs in HBA1, G60V and W15R, are highly deleterious. These 2 pathogenic nsSNPs can be considered for wet-lab confirmatory analysis. PMID:26824843

  2. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage.

    PubMed

    Wilson, Barry Tyler; Woodall, Christopher W; Griffith, Douglas M

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). 
    Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while allowing timely incorporation of empirical data (e.g., annual forest inventory). PMID:23305341

  3. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    PubMed Central

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees).
    Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while allowing timely incorporation of empirical data (e.g., annual forest inventory). PMID:23305341

  4. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population.

    PubMed

    Jattawa, Danai; Elzo, Mauricio A; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. 
    PMID:26949946
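
    The accuracy measure described in the abstract above, the ratio of correctly imputed SNP markers to all imputed SNP markers, can be sketched in a few lines of Python. The function name, the 0/1/2 genotype coding, and the sample data are assumptions made for illustration; they are not taken from the study or from the Beagle, FImpute, or Findhap software.

```python
from typing import Sequence


def imputation_accuracy(
    true_genotypes: Sequence[int],
    imputed_genotypes: Sequence[int],
) -> float:
    """Return the fraction of imputed genotypes that match the true genotypes."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if len(true_genotypes) == 0:
        raise ValueError("no imputed markers to score")
    # Count markers where the imputed call equals the masked true call.
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)


# Hypothetical genotypes at 8 masked markers, coded as allele counts 0, 1, 2.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 2, 0, 2, 1, 0]
accuracy = imputation_accuracy(true_calls, imputed_calls)
print(f"{accuracy:.2%}")  # 6 of 8 calls match
```

    Applying the same function to the markers of one chromosome at a time would give per-chromosome accuracies of the kind the abstract compares.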

  5. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    PubMed Central

    Jattawa, Danai; Elzo, Mauricio A.; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-01-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. 
PMID:26949946
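
    The accuracy measure described in this record (correctly imputed SNP markers divided by all imputed SNP markers) can be sketched as below. This is a minimal illustration only: the function name, the 0/1/2 genotype coding, and the example values are assumptions, not taken from the study or from Beagle, FImpute, or Findhap.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Return the fraction of imputed genotype calls that match the true calls."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        raise ValueError("no imputed markers to score")
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)


# Hypothetical calls for eight masked markers, coded as 0, 1, or 2 allele copies.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 0, 0, 2, 1, 2]
accuracy = imputation_accuracy(true_calls, imputed_calls)
print(f"{accuracy:.2%}")  # 6 of 8 calls agree
```

    Applying the same function per chromosome would give the within-chromosome accuracies mentioned in the abstract.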

  6. Comparison of SNPs and microsatellites in identifying offtypes of cacao clones from Cameroon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single Nucleotide Polymorphism (SNP) markers are increasingly being used in crop breeding programs, slowly replacing microsatellites and other markers. SNPs provide many benefits over microsatellites, including ease of analysis and unambiguous results across various platforms. We compare SNPs to m...

  7. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, so that the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation. PMID:23519635
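
    To make the dosage approach in this record concrete, the sketch below computes the mean of a conditional genotype distribution. The function name and the probability values are hypothetical, and the abstract does not describe the authors' implementation. The mixture approach would instead carry the full probability vector into the likelihood.

```python
def genotype_dosage(probabilities):
    """Expected allele count given (P(0 copies), P(1 copy), P(2 copies))."""
    p0, p1, p2 = probabilities
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("genotype probabilities must sum to 1")
    # Mean of the conditional genotype distribution: 0*p0 + 1*p1 + 2*p2.
    return 0 * p0 + 1 * p1 + 2 * p2


# Hypothetical imputed distribution for an ungenotyped sibling.
sibling_probs = (0.25, 0.5, 0.25)
dosage = genotype_dosage(sibling_probs)
print(dosage)
```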

  8. Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'.

    PubMed

    Deelen, Patrick; Menelaou, Androniki; van Leeuwen, Elisabeth M; Kanterakis, Alexandros; van Dijk, Freerk; Medina-Gomez, Carolina; Francioli, Laurent C; Hottenga, Jouke Jan; Karssen, Lennart C; Estrada, Karol; Kreiner-Møller, Eskil; Rivadeneira, Fernando; van Setten, Jessica; Gutierrez-Achury, Javier; Westra, Harm-Jan; Franke, Lude; van Enckevort, David; Dijkstra, Martijn; Byelas, Heorhiy; van Duijn, Cornelia M; de Bakker, Paul I W; Wijmenga, Cisca; Swertz, Morris A

    2014-11-01

    Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputation genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r(2), increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r(2) improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r(2) increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results. PMID:24896149
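
    The r(2) values quoted in this record are squared Pearson correlations between imputed and directly typed genotypes. A minimal sketch of that statistic follows. The function name and the dosage values are made up for illustration, and the study's own pipeline is not described in the abstract.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_x * var_y)


# Hypothetical data: typed genotypes (0/1/2) and imputed dosages for one variant.
typed = [0, 1, 2, 0, 1, 2]
imputed = [0.1, 0.9, 1.8, 0.2, 1.1, 1.9]
r2 = squared_pearson(typed, imputed)
print(round(r2, 3))
```

    For rare variants most typed genotypes are 0, so a few miscalled carriers can lower r(2) sharply, which is one reason it is commonly reported by frequency bin.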

  9. Sensitivity to imputation models and assumptions in receiver operating characteristic analysis with incomplete data

    PubMed Central

    Karakaya, Jale; Karabulut, Erdem; Yucel, Recai M.

    2015-01-01

    Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also become increasingly popular in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. In particular, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms. PMID:26379316

  10. Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys

    PubMed Central

    Rendall, Michael S.; Ghosh-Dastidar, Bonnie; Weden, Margaret M.; Baker, Elizabeth H.; Nazarov, Zafar

    2013-01-01

    Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally-representative cohort surveys. PMID:24223447

  11. Inference from Multiple Imputation for Missing Data Using Mixtures of Normals

    PubMed Central

    Steele, Russell J.; Wang, Naisyin; Raftery, Adrian E.

    2010-01-01

    We consider two difficulties with standard multiple imputation methods for missing data based on Rubin's t method for confidence intervals: their often excessive width, and their instability. These problems are present most often when the number of copies is small, as is often the case when a data collection organization is making multiple completed datasets available for analysis. We suggest using mixtures of normals as an alternative to Rubin's t. We also examine the performance of improper imputation methods as an alternative to generating copies from the true posterior distribution for the missing observations. We report the results of simulation studies and analyses of data on health-related quality of life in which the methods suggested here gave narrower confidence intervals and more stable inferences, especially with small numbers of copies or non-normal posterior distributions of parameter estimates. A free R software package called MImix that implements our methods is available from CRAN. PMID:20454634
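
    For context, the standard Rubin combining rules that this record takes as its baseline can be sketched as follows. This illustrates only the conventional pooling step, not the mixture-of-normals alternative or the MImix package. The function name and the example numbers are assumptions.

```python
def rubin_pool(estimates, variances):
    """Pool m completed-data estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance, degrees of freedom for Rubin's t).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = sum(estimates) / m                                # pooled point estimate
    u_bar = sum(variances) / m                                # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    total = u_bar + (1 + 1 / m) * b
    if b == 0:
        df = float("inf")                                     # no between-imputation spread
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Hypothetical results from m = 3 completed datasets.
pooled, total_var, df = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, round(total_var, 4), round(df, 4))
```

    With few imputed copies the degrees of freedom stay small, so the t-based interval widens, which is the behaviour the abstract identifies as a problem.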

  12. Disk filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  13. Disk filter

    DOEpatents

    Bergman, Werner (Pleasanton, CA)

    1986-01-01

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  14. Normalization and missing value imputation for label-free LC-MS analysis

    SciTech Connect

    Karpievitch, Yuliya; Dabney, Alan R.; Smith, Richard D.

    2012-11-05

    Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

  15. SNPs Selection using Gravitational Search Algorithm and Exhaustive Search for Association Mapping

    NASA Astrophysics Data System (ADS)

    Kusuma, W. A.; Hasibuan, L. S.; Istiadi, M. A.

    2016-01-01

    Single Nucleotide Polymorphisms (SNPs) are known to be associated with phenotypic variation. The study of linking SNPs to a phenotype of interest is referred to as Association Mapping (AM), which is classified as a combinatorial problem. An Exhaustive Search (ES) approach can select targeted SNPs exactly because it evaluates all possible combinations of SNPs, but it is not efficient in terms of computer resources and computation time. A Heuristic Search (HS) approach is an alternative that improves on ES in those terms, but it still suffers from a high number of false positive SNPs in each combination. The Gravitational Search Algorithm (GSA) is a new HS algorithm that yields better performance than other nature-inspired HS algorithms. This paper proposes a new method that combines GSA and ES to identify the most appropriate combination of SNPs linked to a phenotype of interest. Testing was conducted using a dataset without epistasis and a dataset with epistasis. Using the dataset without epistasis and 7 targeted SNPs, the proposed method identified 7 SNPs (6 True Positive (TP) SNPs and 1 False Positive (FP) SNP) with an association value of 0.83. In addition, using the dataset with epistasis and 5 targeted SNPs, the proposed method identified 3 SNPs (2 TP SNPs and 1 FP SNP) with an association value of 0.87. The results showed that the method is robust in reducing redundant SNPs and identifying main markers.

  16. Evaluating model-based imputation methods for missing covariates in regression models with interactions.

    PubMed

    Kim, Soeun; Sugar, Catherine A; Belin, Thomas R

    2015-05-20

    Imputation strategies are widely used in settings that involve inference with incomplete data. However, implementation of a particular approach always rests on assumptions, and subtle distinctions between methods can have an impact on subsequent analyses. In this research article, we are concerned with regression models in which the true underlying relationship includes interaction terms. We focus in particular on a linear model with one fully observed continuous predictor, a second partially observed continuous predictor, and their interaction. We derive the conditional distribution of the missing covariate and interaction term given the observed covariate and the outcome variable, and examine the performance of a multiple imputation procedure based on this distribution. We also investigate several alternative procedures that can be implemented by adapting multivariate normal multiple imputation software in ways that might be expected to perform well despite incompatibilities between model assumptions and true underlying relationships among the variables. The methods are compared in terms of bias, coverage, and CI width. As expected, the procedure based on the correct conditional distribution performs well across all scenarios. Just as importantly for general practitioners, several of the approaches based on multivariate normality perform comparably with the correct conditional distribution in a number of circumstances, although interestingly, procedures that seek to preserve the multiplicative relationship between the interaction term and the main-effects are found to be substantially less reliable. For illustration, the various procedures are applied to an analysis of post-traumatic stress disorder symptoms in a study of childhood trauma. PMID:25630757

  17. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    PubMed Central

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rolle, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644
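
    Inverse Distance Weighting, one of the comparison methods named in this record, estimates a missing sensor value as a distance-weighted average of neighbouring stations. The sketch below shows the generic technique. The function name, the power parameter default, and the station values are hypothetical and are not drawn from the MeteoGalicia data or the paper's configuration.

```python
def idw_estimate(neighbors, power=2.0):
    """Inverse-distance-weighted estimate from (distance, value) pairs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for distance, value in neighbors:
        if distance <= 0:
            raise ValueError("distances must be positive")
        weight = 1.0 / distance ** power   # closer stations get larger weights
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0:
        raise ValueError("at least one neighbor is required")
    return weighted_sum / weight_total


# Hypothetical neighbours: (distance in km, irradiance in W/m^2).
stations = [(10.0, 500.0), (20.0, 400.0)]
estimate = idw_estimate(stations)
print(round(estimate, 1))
```

    Unlike MICE, this baseline uses only geometry and ignores how strongly each neighbouring sensor actually correlates with the one being imputed.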

  18. Imputation of Microsatellite Alleles from Dense SNP Genotypes for Parental Verification

    PubMed Central

    McClure, Matthew; Sonstegard, Tad; Wiggans, George; Van Tassell, Curtis P

    2012-01-01

    Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing four dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS- to SNP-based parental verification. PMID:22912645

  19. Data-driven methods for imputing national-level incidence in global burden of disease studies

    PubMed Central

    McDonald, Scott A; Speybroeck, Niko; Hens, Niel; Praet, Nicolas; Torgerson, Paul R; Havelaar, Arie H; Wu, Felicia; Tremblay, Marlène; Amene, Ermias W; Döpfer, Dörte

    2015-01-01

    Objective: To develop transparent and reproducible methods for imputing missing data on disease incidence at national level for the year 2005. Methods: We compared several models for imputing missing country-level incidence rates for two foodborne diseases: congenital toxoplasmosis and aflatoxin-related hepatocellular carcinoma. Missing values were assumed to be missing at random. Predictor variables were selected using least absolute shrinkage and selection operator regression. We compared the predictive performance of naive extrapolation approaches and Bayesian random and mixed-effects regression models. Leave-one-out cross-validation was used to evaluate model accuracy. Findings: The predictive accuracy of the Bayesian mixed-effects models was significantly better than that of the naive extrapolation method for one of the two disease models. However, Bayesian mixed-effects models produced wider prediction intervals for both data sets. Conclusion: Several approaches are available for imputing missing data at national level. Strengths of a hierarchical regression approach for this type of task are the ability to derive estimates from other similar countries, transparency, computational efficiency and ease of interpretation. The inclusion of informative covariates may improve model performance, but results should be appraised carefully. PMID:26229187

  20. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION

    PubMed Central

    Allen, Genevera I.; Tibshirani, Robert

    2015-01-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

  1. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions.

    PubMed

    Han, Ying; Hazelett, Dennis J; Wiklund, Fredrik; Schumacher, Fredrick R; Stram, Daniel O; Berndt, Sonja I; Wang, Zhaoming; Rand, Kristin A; Hoover, Robert N; Machiela, Mitchell J; Yeager, Merideth; Burdette, Laurie; Chung, Charles C; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C; Key, Timothy J; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L; Kolb, Suzanne; Gapstur, Susan M; Diver, W Ryan; Stevens, Victoria L; Strom, Sara S; Pettaway, Curtis A; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A; Yeboah, Edward D; Tettey, Yao; Biritwum, Richard B; Adjei, Andrew A; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P; Isaacs, William B; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M; Ingles, Sue A; Kittles, Rick A; Murphy, Adam B; Blot, William J; Signorello, Lisa B; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M Cristina; Wu, Suh-Yuh; Hennis, Anselm J M; Rybicki, Benjamin A; Neslund-Dudas, Christine; Hsing, Ann W; Chu, Lisa; Goodman, Phyllis J; Klein, Eric A; Zheng, S Lilly; Witte, John S; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L; Hunter, David J; Gronberg, Henrik; Cook, Michael B; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J; Easton, Douglas F; Henderson, Brian E; Coetzee, Gerhard A; Conti, David V; Haiman, Christopher A

    2015-10-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10(-4) to 5.6 × 10(-3)) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10(-6)) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation. PMID:26162851

  2. Double Sampling with Multiple Imputation to Answer Large Sample Meta-Research Questions: Introduction and Illustration by Evaluating Adherence to Two Simple CONSORT Guidelines

    PubMed Central

    Capers, Patrice L.; Brown, Andrew W.; Dawson, John A.; Allison, David B.

    2015-01-01

    Background: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. Creation of high throughput methods (e.g., search heuristics, crowdsourcing) has improved feasibility of large meta-research questions, but possibly at the cost of accuracy. Objective: To evaluate the use of double sampling combined with multiple imputation (DS + MI) to address meta-research questions, using as an example adherence of PubMed entries to two simple consolidated standards of reporting trials guidelines for titles and abstracts. Methods: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT, human, abstract available, and English language (n = 322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower rigor, higher throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher rigor, lower throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication. Results: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title = 1.00, abstract = 0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS + MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by year: subsample RHITLO 1.050-1.174 vs. DS + MI 1.082-1.151). As evidence of improved accuracy, DS + MI coefficient estimates were closer to RHITLO than the large sample RLOTHI. Conclusion: Our results support our hypothesis that DS + MI would result in improved precision and accuracy. This method is flexible and may provide a practical way to examine large corpora of literature. PMID:25988135
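
    The agreement between the two rating methods in this record is summarized with phi coefficients. As a hedged illustration of that statistic, the sketch below computes phi from a 2x2 table of counts. The function name, argument names, and counts are hypothetical and do not reproduce the study's data.

```python
import math


def phi_coefficient(both_yes, only_first, only_second, both_no):
    """Phi coefficient for a 2x2 table of agreement counts between two raters."""
    row_yes = both_yes + only_first
    row_no = only_second + both_no
    col_yes = both_yes + only_second
    col_no = only_first + both_no
    denominator = math.sqrt(row_yes * row_no * col_yes * col_no)
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (both_yes * both_no - only_first * only_second) / denominator


# Hypothetical counts: human rating versus search heuristic on 100 abstracts.
phi = phi_coefficient(both_yes=40, only_first=2, only_second=2, both_no=56)
print(round(phi, 3))
```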

  3. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti

    PubMed Central

    2014-01-01

    Background: Genetic markers are widely used to understand the biology and population dynamics of disease vectors, but often markers are limited in the resolution they provide. In particular, the delineation of population structure, fine scale movement and patterns of relatedness are often obscured unless numerous markers are available. To address this issue in the major arbovirus vector, the yellow fever mosquito (Aedes aegypti), we used double digest Restriction-site Associated DNA (ddRAD) sequencing for the discovery of genome-wide single nucleotide polymorphisms (SNPs). We aimed to characterize the new SNP set and to test the resolution against previously described microsatellite markers in detecting broad and fine-scale genetic patterns in Ae. aegypti. Results: We developed bioinformatics tools that support the customization of restriction enzyme-based protocols for SNP discovery. We showed that our approach for RAD library construction achieves unbiased genome representation that reflects true evolutionary processes. In Ae. aegypti samples from three continents we identified more than 18,000 putative SNPs. They were widely distributed across the three Ae. aegypti chromosomes, with 47.9% found in intergenic regions and 17.8% in exons of over 2,300 genes. Patterns of their imputed effects in ORFs and UTRs were consistent with those found in a recent transcriptome study. We demonstrated that individual mosquitoes from Indonesia, Australia, Vietnam and Brazil can be assigned with a very high degree of confidence to their region of origin using a large SNP panel. We also showed that familial relatedness of samples from a 0.4 km² area could be confidently established with a subset of SNPs. Conclusions: Using a cost-effective customized RAD sequencing approach supported by our bioinformatics tools, we characterized over 18,000 SNPs in field samples of the dengue fever mosquito Ae. aegypti. The variants were annotated and positioned onto the three Ae. aegypti chromosomes. The new SNP set provided much greater resolution in detecting population structure and estimating fine-scale relatedness than a set of polymorphic microsatellites. RAD-based markers demonstrate great potential to advance our understanding of mosquito population processes, critical for implementing new control measures against this major disease vector. PMID:24726019

  4. Tracing Cattle Breeds with Principal Components Analysis Ancestry Informative SNPs

    PubMed Central

    Lewis, Jamey; Abas, Zafiris; Dadousis, Christos; Lykidis, Dimitrios; Paschou, Peristera; Drineas, Petros

    2011-01-01

    The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 Single Nucleotide Polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genomewide datasets, we present a decision-tree which can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250-500 carefully selected SNPs suffice in order to achieve close to 100% prediction accuracy of individual ancestry, when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that is becoming increasingly available, have the potential to become a valuable tool and have considerable impact in worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products. PMID:21490966

  5. Genetic profile of SNP(s) and ovulation induction.

    PubMed

    Loutradis, D; Theofanakis, Ch; Anagnostou, E; Mavrogianni, D; Partsinevelos, G A

    2012-03-01

    Obtaining an adequate number of good quality oocytes while minimizing adverse drug reactions (ADRs) and cycle cancellation rates is considered the gold standard in controlled ovarian hyperstimulation (COH) for fertility treatment. Patients who undergo IVF/ICSI cycles tend to present with different responses to exogenous gonadotrophin administration. Research has shown that the secret probably lies in the various single nucleotide polymorphisms (SNPs) in their receptor genes. The decryption of the human genome provided specialists with additional information in assessing and even predicting ovarian response to COH. In this context, the study of Pharmacogenomics, Pharmacogenetics and SNPs unravels as a promising field in optimizing fertility treatment. Several SNPs in FSH and estrogen receptor genes have been detected so far, but only three of them, one in FSH receptor and two in estrogen receptor genes, have been associated with ovarian response to COH. It seems that the Asn/Ser variant of the FSH receptor functions more efficiently, while the Ser/Ser and Asn/Asn variants tend to resist FSH stimulation. With regard to estrogen receptor 1 (ESR1), the PvuII and the XbaI polymorphisms seem to be associated with differences in the response to ovarian stimulation, while the RsaI polymorphism in estrogen receptor 2 (ESR2) is currently under investigation. There exists evidence supporting the hypothesis that a set of genes, all related to the FSH hormone mechanism of action, may participate along with other factors in the control of ovarian response to FSH; thus a cautious interpretation of polymorphism detection results is considered mandatory. However, identifying potential genetic markers that could predict ovarian response, and implementing them in routine screening tests for every woman entering an IVF/ICSI cycle, could tailor fertility treatment to each patient's needs, thus maximizing the success rate and eliminating potential side-effects of fertility drugs. PMID:21657995

  6. Genotyping of Brucella species using clade specific SNPs

    PubMed Central

    2012-01-01

    Background Brucellosis is a worldwide disease of mammals caused by Alphaproteobacteria in the genus Brucella. The genus is genetically monomorphic, requiring extensive genotyping to differentiate isolates. We utilized two different genotyping strategies to characterize isolates. First, we developed a microarray-based assay based on 1000 single nucleotide polymorphisms (SNPs) that were identified from whole genome comparisons of two B. abortus isolates, one B. melitensis, and one B. suis. We then genotyped a diverse collection of 85 Brucella strains at these SNP loci and generated a phylogenetic tree of relationships. Second, we developed a selective primer-extension assay system using capillary electrophoresis that targeted 17 high value SNPs across 8 major branches of the phylogeny and determined their genotypes in a large collection (n = 340) of diverse isolates. Results Our 1000 SNP microarray readily distinguished B. abortus, B. melitensis, and B. suis, differentiating B. melitensis and B. suis into two clades each. Brucella abortus was divided into four major clades. Our capillary-based SNP genotyping confirmed all major branches from the microarray assay and assigned all samples to defined lineages. Isolates from these lineages and closely related isolates, among the most commonly encountered lineages worldwide, can now be quickly and easily identified and genetically characterized. Conclusions We have identified clade-specific SNPs in Brucella that can be used for rapid assignment into major groups below the species level in the three main Brucella species. Our assays represent SNP genotyping approaches that can reliably determine the evolutionary relationships of bacterial isolates without the need for whole genome sequencing of all isolates. PMID:22712667

  7. A Latent Model for Prioritization of SNPs for Functional Studies

    PubMed Central

    Fridley, Brooke L.; Iversen, Ed; Tsai, Ya-Yu; Jenkins, Gregory D.; Goode, Ellen L.; Sellers, Thomas A.

    2011-01-01

    One difficult question facing researchers is how to prioritize SNPs detected from genetic association studies for functional studies. Often a list of the top M SNPs is determined based solely on the p-value from an association analysis, where M is determined by financial/time constraints. For many studies of complex diseases, multiple analyses have been completed and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or a regulatory region, into the selection of markers to follow up. In this manuscript, we propose a Bayesian latent variable model (BLVM) for incorporating features about a SNP to estimate a latent quality score, with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to the application of the BLVM to the ovarian GWAS, we applied the BLVM to simulated data which mimics the setting involving the prioritization of markers across multiple GWAS for related diseases/traits. The SNP ranked top by the BLVM for the ovarian GWAS ranked 2nd and 7th based on p-values from analyses of all invasive and invasive serous cases. The top SNP based on serous case analysis p-value (which ranked 197th for invasive case analysis) was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the application of the BLVM allows for the systematic integration of multiple SNP features for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking. PMID:21687685

  8. Water Filters

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The Aquaspace H2OME Guardian Water Filter, available through Western Water International, Inc., reduces lead in water supplies. The filter is mounted on the faucet and the filter cartridge is placed in the "dead space" between sink and wall. This filter is one of several new filtration devices using the Aquaspace compound filter media, which combines company developed and NASA technology. Aquaspace filters are used in industrial, commercial, residential, and recreational environments as well as by developing nations where water is highly contaminated.

  9. Molecular Beacon CNT-based Detection of SNPs

    NASA Astrophysics Data System (ADS)

    Egorova, V. P.; Krylova, H. V.; Lipnevich, I. V.; Veligura, A. A.; Shulitsky, B. G.; Y Fedotenkova, L.

    2015-11-01

    A fluorescence quenching effect due to few-walled carbon nanotubes chemically modified by carboxyl groups has been utilized to discriminate Single Nucleotide Polymorphisms (SNPs). It was shown that the complex obtained from these nanotubes and single-stranded primer DNA is formed due to stacking interactions between the hexagons of the nanotubes and aromatic rings of nucleotide bases, as well as due to the establishment of hydrogen bonds between acceptor amine groups of nucleotide bases and donor carboxyl groups of the nanotubes. It has been demonstrated that these complexes may be used to make highly effective DNA biosensors for detecting SNPs which operate as molecular beacons.

  10. Purposeful Variable Selection and Stratification to Impute Missing FAST Data in Trauma Research

    PubMed Central

    Fuchs, Paul A.; del Junco, Deborah J.; Fox, Erin E.; Holcomb, John B.; Rahbar, Mohammad H.; Wade, Charles A.; Alarcon, Louis H.; Brasel, Karen J.; Bulger, Eileen M.; Cohen, Mitchell J.; Myers, John G.; Muskat, Peter; Phelan, Herb A.; Schreiber, Martin A.; Cotton, Bryan A.

    2013-01-01

    Background The Focused Assessment with Sonography for Trauma (FAST) exam is an important variable in many retrospective trauma studies. The purpose of this study was to devise an imputation method to overcome missing data for the FAST exam. Due to variability in patients' injuries and trauma care, these data are unlikely to be missing completely at random (MCAR), raising concern for validity when analyses exclude patients with missing values. Methods Imputation was conducted under a less restrictive, more plausible missing at random (MAR) assumption. Patients with missing FAST exams had available data on alternate, clinically relevant elements that were strongly associated with FAST results in complete cases, especially when considered jointly. Subjects with missing data (32.7%) were divided into eight mutually exclusive groups based on selected variables that both described the injury and were associated with missing FAST values. Additional variables were selected within each group to classify missing FAST values as positive or negative, and correct FAST exam classification based on these variables was determined for patients with non-missing FAST values. Results Severe head/neck injury (odds ratio, OR=2.04), severe extremity injury (OR=4.03), severe abdominal injury (OR=1.94), no injury (OR=1.94), other abdominal injury (OR=0.47), other head/neck injury (OR=0.57) and other extremity injury (OR=0.45) groups had significant ORs for missing data; the other group odds ratio was not significant (OR=0.84). All 407 missing FAST values were imputed, with 109 classified as positive. Correct classification of non-missing FAST results using the alternate variables was 87.2%. Conclusions Purposeful imputation for missing FAST exams based on interactions among selected variables assessed by simple stratification may be a useful adjunct to sensitivity analysis in the evaluation of imputation strategies under different missing data mechanisms. This approach has the potential for widespread application in clinical and translational research and validation is warranted. Level of Evidence Level II Prognostic or Epidemiological PMID:23778515

  11. Biological Filters.

    ERIC Educational Resources Information Center

    Klemetson, S. L.

    1978-01-01

    Presents the 1978 literature review of wastewater treatment. The review is concerned with biological filters, and it covers: (1) trickling filters; (2) rotating biological contactors; and (3) miscellaneous reactors. A list of 14 references is also presented. (HM)

  12. Metallic Filters

    NASA Technical Reports Server (NTRS)

    1985-01-01

    Filtration technology originated in a mid-1960s NASA study. The results were distributed to the filter industry, and HR Textron responded, using the study as a point of departure for the development of 421 Filter Media. The HR system is composed of ultrafine steel fibers metallurgically bonded and compressed so that the pore structure is locked in place. The filters are used to filter polyesters, plastics, to remove hydrocarbon streams, etc. Several major companies use the product in chemical applications, pollution control, etc.

  13. Transcriptome analysis of the gill of Takifugu rubripes using Illumina sequencing for discovery of SNPs.

    PubMed

    Cui, Jun; Wang, Hongdi; Liu, Shikai; Qiu, Xuemei; Jiang, Zhiqiang; Wang, Xiuli

    2014-06-01

    Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies in many species. High-throughput sequencing of RNA was developed primarily to analyze global gene expression, but it is also an efficient way to discover SNPs in expressed genes. In this study, we sequenced the transcriptome of Takifugu rubripes gill samples on the Illumina HiSeq 2000 platform to identify gene-associated SNPs. A total of 27,085,235 uniquely mapped reads from 55,061,524 raw reads were generated. A total of 56,972 putative SNPs were discovered, which were located in 11,327 genes. 35,839 SNPs were transitions (Ts), 21,074 SNPs were transversions (Tv) and 88.1% of 56,972 SNPs were assigned to the 22 chromosomes. The average minor allele frequency (MAF) of the SNPs was 0.26. GO and KEGG pathway analyses were conducted to analyze the genes containing SNPs. Validation of selected SNPs revealed that 63.4% of SNPs (34/52) were true SNPs. RNA-Seq is a cost-effective way to discover gene-associated SNPs. In this study, a large number of SNPs were identified and these data will be useful resources for population genetic study, evolution analysis, resource assessment, genetic linkage analysis and genome-wide association studies. The results of our study can also offer some useful information as molecular markers to help select and cultivate T. rubripes. PMID:24747987
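
    The transition and transversion counts reported in this record can be turned into a Ts/Tv ratio, a common sanity check on SNP call sets. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and only the two counts are taken from the abstract.

```python
# Minimal sketch (hypothetical helper names, not the authors' pipeline).
# A transition swaps purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
BASES = frozenset("ACGT")
PURINES = frozenset("AG")


def classify_snp(ref: str, alt: str) -> str:
    """Return 'Ts' or 'Tv' for a single-base substitution ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "Ts" if same_class else "Tv"


def ts_tv_ratio(n_transitions: int, n_transversions: int) -> float:
    """Ratio of transition count to transversion count."""
    if n_transversions == 0:
        raise ZeroDivisionError("no transversions observed")
    return n_transitions / n_transversions


# Counts quoted in the abstract above.
ratio = ts_tv_ratio(35_839, 21_074)
print(f"Ts/Tv = {ratio:.2f}")
```

    With the quoted counts the ratio is about 1.70.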

  14. Lazy collaborative filtering for data sets with missing values.

    PubMed

    Ren, Yongli; Li, Gang; Zhang, Jun; Zhou, Wanlei

    2013-12-01

    As one of the biggest challenges in research on recommender systems, the data sparsity issue is mainly caused by the fact that users tend to rate a small proportion of items from the huge number of available items. This issue becomes even more problematic for the neighborhood-based collaborative filtering (CF) methods, as there are even lower numbers of ratings available in the neighborhood of the query item. In this paper, we aim to address the data sparsity issue in the context of neighborhood-based CF. For a given query (user, item), a set of key ratings is first identified by taking the historical information of both the user and the item into account. Then, an auto-adaptive imputation (AutAI) method is proposed to impute the missing values in the set of key ratings. We present a theoretical analysis to show that the proposed imputation method effectively improves the performance of the conventional neighborhood-based CF methods. The experimental results show that our new method of CF with AutAI outperforms six existing recommendation methods in terms of accuracy. PMID:23757575

  15. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

    PubMed

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J Brent; Durbin, Richard; Timpson, Nicholas J; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants. PMID:26368830

  16. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    PubMed Central

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L.; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Al Turki, Saeed; Amuzu, Antoinette; Anderson, Carl A.; Anney, Richard; Antony, Dinu; Artigas, María Soler; Ayub, Muhammad; Bala, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Benn, Marianne; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick F.; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Pablo Casas, Juan; Chambers, John C.; Charlton, Ruth; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebahattin; Clapham, Peter; Clement, Gail; Coates, Guy; Cocca, Massimiliano; Collier, David A.; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day, Ian N. M.; Day-Williams, Aaron; Dedoussis, George; Down, Thomas; Du, Yuanping; van Duijn, Cornelia M.; Dunham, Ian; Edkins, Sarah; Ekong, Rosemary; Ellis, Peter; Evans, David M.; Farooqi, I. Sadaf; Fitzpatrick, David R.; Flicek, Paul; Floyd, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Gasparini, Paolo; Gaunt, Tom R.; Geihs, Matthias; Geschwind, Daniel; Greenwood, Celia; Griffin, Heather; Grozeva, Detelina; Guo, Xiaosen; Guo, Xueqin; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey E.; Holmans, Peter; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Iotchkova, Valentina; Isaacs, Aaron; Jackson, David K.; Jamshidi, Yalda; Johnson, Jon; Joyce, Chris; Karczewski, Konrad J.; Kaye, Jane; Keane, Thomas; Kemp, John P.; Kennedy, Karen; Kent, Alastair; Keogh, Julia; Khawaja, Farrah; Kleber, Marcus E.; van Kogelenberg, Margriet; Kolb-Kokocinski, Anja; Kooner, Jaspal S.; Lachance, Genevieve; Langenberg, Claudia; Langford, Cordelia; Lawson, Daniel; Lee, Irene; van Leeuwen, Elisabeth M.; Lek, Monkol; Li, Rui; Li, Yingrui; Liang, Jieqin; Lin, Hong; Liu, Ryan; Lönnqvist, Jouko; Lopes, Luis R.; Lopes, Margarida; Luan, Jian'an; MacArthur, Daniel G.; Mangino, Massimo; Marenne, Gaëlle; März, Winfried; Maslen, John; Matchan, Angela; Mathieson, Iain; McGuffin, Peter; McIntosh, Andrew M.; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Migone, Nicola; Mitchison, Hannah M.; Moayyeri, Alireza; Morris, James; Morris, Richard; Muddyman, Dawn; Muntoni, Francesco; Nordestgaard, Børge G.; Northstone, Kate; O'Donovan, Michael C.; O'Rahilly, Stephen; Onoufriadis, Alexandros; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Payne, Stewart J.; Perry, John R. B.; Pietilainen, Olli; Plagnol, Vincent; Pollitt, Rebecca C.; Povey, Sue; Quail, Michael A.; Quaye, Lydia; Raymond, Lucy; Rehnström, Karola; Ridout, Cheryl K.; Ring, Susan; Ritchie, Graham R. S.; Roberts, Nicola; Robinson, Rachel L.; Savage, David B.; Scambler, Peter; Schiffels, Stephan; Schmidts, Miriam; Schoenmakers, Nadia; Scott, Richard H.; Scott, Robert A.; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shaw, Adam; Shihab, Hashem A.; Shin, So-Youn; Skuse, David; Small, Kerrin S.; Smee, Carol; Smith, George Davey; Southam, Lorraine; Spasic-Boskovic, Olivera; Spector, Timothy D.; St Clair, David; St Pourcain, Beate; Stalker, Jim; Stevens, Elizabeth; Sun, Jianping; Surdulescu, Gabriela; Suvisaari, Jaana; Syrris, Petros; Tachmazidou, Ioanna; Taylor, Rohan; Tian, Jing; Tobin, Martin D.; Toniolo, Daniela; Traglia, Michela; Tybjaerg-Hansen, Anne; Valdes, Ana M.; Vandersteen, Anthony M.; Varbo, Anette; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T. R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Eleanor; Whincup, Peter; Whyte, Tamieka; Williams, Hywel J.; Williamson, Kathleen A.; Wilson, Crispian; Wilson, Scott G.; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zaza, Gianluigi; Zeggini, Eleftheria; Zhang, Feng; Zhang, Pingbo; Zhang, Weihua; Gambaro, Giovanni; Richards, J. Brent; Durbin, Richard; Timpson, Nicholas J.; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants. PMID:26368830

  17. Genotyping of 75 SNPs using arrays for individual identification in five population groups.

    PubMed

    Hwa, Hsiao-Lin; Wu, Lawrence Shih Hsin; Lin, Chun-Yen; Huang, Tsun-Ying; Yin, Hsiang-I; Tseng, Li-Hui; Lee, James Chun-I

    2016-01-01

    Single nucleotide polymorphism (SNP) typing offers promise to forensic genetics. Various strategies and panels for analyzing SNP markers for individual identification have been published. However, the best panels with fewer identity SNPs for all major population groups are still under discussion. This study aimed to find more autosomal SNPs with high heterozygosity for individual identification among Asian populations. Ninety-six autosomal SNPs of 502 DNA samples from unrelated individuals of five population groups (208 Taiwanese Han, 83 Filipinos, 62 Thais, 69 Indonesians, and 80 individuals with European, Near Eastern, or South Asian ancestry) were analyzed using arrays in an initial screening, and 75 SNPs (group A, 46 newly selected SNPs; group B, 29 SNPs based on a previous SNP panel) were selected for further statistical analyses. Some SNPs with high heterozygosity from Asian populations were identified. The combined random match probability of the best 40 and 45 SNPs was between 3.16 × 10^-17 and 7.75 × 10^-17 and between 2.33 × 10^-19 and 7.00 × 10^-19, respectively, in all five populations. These loci offer comparable power to short tandem repeats (STRs) for routine forensic profiling. In this study, we demonstrated the population genetic characteristics and forensic parameters of 75 SNPs with high heterozygosity from five population groups. This SNP panel can provide valuable genotypic information and can be helpful in forensic casework for individual identification among these populations. PMID:26297200
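
    To illustrate how a combined random match probability of this magnitude can arise, here is a minimal sketch under assumptions not stated in the abstract: biallelic SNPs, Hardy-Weinberg proportions, and independent loci. The allele frequencies are hypothetical (all set to 0.5, the most heterozygous case) and are not the study's data; the function names are illustrative only.

```python
import math
from typing import Iterable


def locus_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: genotype frequencies p^2, 2pq, q^2.
    The match probability is the sum of the squared genotype frequencies.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def combined_match_probability(allele_freqs: Iterable[float]) -> float:
    """Product of per-locus match probabilities (assumes independent loci)."""
    return math.prod(locus_match_probability(p) for p in allele_freqs)


# Hypothetical panel: 40 SNPs, each with allele frequency 0.5.
hypothetical_freqs = [0.5] * 40
cmp40 = combined_match_probability(hypothetical_freqs)
print(f"combined match probability for 40 loci: {cmp40:.2e}")
```

    Under these idealized assumptions each locus contributes a factor of 0.375, so 40 loci give a value near 10^-17, which is why a few dozen highly heterozygous SNPs can approach the discriminating power discussed above.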

  18. Detection of Regulatory SNPs in Human Genome Using ChIP-seq ENCODE Data

    PubMed Central

    Matveeva, Marina Yu.; Shilov, Alexander G.; Kashina, Elena V.; Mordvinov, Viatcheslav A.; Merkulova, Tatyana I.

    2013-01-01

    A vast number of the SNPs derived from genome-wide association studies are non-coding, which increases the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging since the regulatory part of the human genome is annotated much more poorly than the coding regions. Here we describe an approach aggregating the whole set of ENCODE ChIP-seq data in order to search for rSNPs, and provide experimental evidence of its efficiency. Its algorithm is based on the assumption that the enrichment of a genomic region with transcription factor binding loci (ChIP-seq peaks) indicates its regulatory function, and thereby SNPs located in this region are more likely to influence transcription regulation. To ensure that the approach preferentially selects functionally meaningful SNPs, we performed enrichment analysis of several human SNP datasets associated with phenotypic manifestations. It was shown that all samples are significantly enriched with SNPs falling into the regions of multiple ChIP-seq peaks as compared with randomly selected SNPs. For experimental verification, 40 SNPs falling into overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of SNPs on the binding of the DNA fragments containing them to the nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) was tested by EMSA. A radical change in the binding pattern was observed for 29 SNPs; 6 more SNPs demonstrated less pronounced changes. Taken together, the results demonstrate an effective way to search for potential rSNPs with the aid of ChIP-seq data provided by the ENCODE project. PMID:24205329

  19. Imputation of sequence variants for identification of genetic risks for Parkinson's disease: a meta-analysis of genome-wide association studies

    PubMed Central

    2013-01-01

    Summary Background Genome-wide association studies (GWAS) for Parkinson's disease have linked two loci (MAPT and SNCA) to risk of Parkinson's disease. We aimed to identify novel risk loci for Parkinson's disease. Methods We did a meta-analysis of datasets from five Parkinson's disease GWAS from the USA and Europe to identify loci associated with Parkinson's disease (discovery phase). We then did replication analyses of significantly associated loci in an independent sample series. Estimates of population-attributable risk were calculated from estimates from the discovery and replication phases combined, and risk-profile estimates for loci identified in the discovery phase were calculated. Findings The discovery phase consisted of 5333 case and 12,019 control samples, with genotyped and imputed data at 7,689,524 SNPs. The replication phase consisted of 7053 case and 9007 control samples. We identified 11 loci that surpassed the threshold for genome-wide significance (p < 5 × 10^-8). Six were previously identified loci (MAPT, SNCA, HLA-DRB5, BST1, GAK and LRRK2) and five were newly identified loci (ACMSD, STK39, MCCC1/LAMP3, SYT11, and CCDC62/HIP1R). The combined population-attributable risk was 60.3% (95% CI 43.7-69.3). In the risk-profile analysis, the odds ratio in the highest quintile of disease risk was 2.51 (95% CI 2.23-2.83) compared with 1.00 in the lowest quintile of disease risk. Interpretation These data provide an insight into the genetics of Parkinson's disease and the molecular cause of the disease and could provide future targets for therapies. Funding Wellcome Trust, National Institute on Aging, and US Department of Defense. PMID:21292315

  20. Chemical derivatization of compact disc polycarbonate surfaces for SNPs detection.

    PubMed

    Bañuls, María-José; García-Piñón, Francisco; Puchades, Rosa; Maquieira, Angel

    2008-03-01

    Compact discs have been proposed as an efficient analytical platform, with potential to develop high-throughput affinity assays for genomics, proteomics, clinics, and health monitoring. Chemical derivatization of CD surfaces is one of the keys to developing highly efficient microarraying-based assays on discs. Approaches for mild chemical modification of the polycarbonate (PC) disc surface based on nitration, reduction, and chloromethylation reactions have been developed. Amino- and thiol-derivatized PC surfaces are obtained while leaving the mechanical and optical properties of the discs unchanged. Studies of covalent attachment of oligonucleotide probes (5' Cy5-labeled, 3' NH2-ended) on the modified surfaces have been performed to develop microarraying assays based on hybridization of cDNA strands and single nucleotide polymorphism (SNP) discrimination. The applicability of compact disc audio/video technology as an analytical system is demonstrated, including the use of a commercial CD player to read the results on the disc. PMID:18254580

  1. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. 
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.

  2. The operating regimes and basic control principles of SNPS "Topaz". [Cs

    SciTech Connect

    Makarov, A.N.; Volberg, M.S.; Grayznov, G.M.; Zhabotinsky, E.E.; Serbin, V.I.

    1991-01-05

    The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include prelaunch preparation and launch into working orbit, SNPS start-up to obtain the desired electric power, the nominal regime, and SNPS shutdown. The main requirements for the SNPS in the different regimes are given, and the control algorithms that meet these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" lead to the conclusion that, for an SNPS of this type, the most expedient start-up control algorithm is one that provides the required thermal state of the cesium vapor supply system and excludes any possibility of discharge processes in current-conducting elements. In the nominal regime, the required electric power should be provided by maintaining reactor current and using a fast-acting voltage regulator. A limit on the outlet coolant temperature should also be foreseen.

  3. dbNSFP: A Lightweight Database of Human Nonsynonymous SNPs and Their Functional Predictions

    PubMed Central

    Liu, Xiaoming; Jian, Xueqiu; Boerwinkle, Eric

    2011-01-01

    With the advance of sequencing technologies, whole exome sequencing has increasingly been used to identify mutations that cause human diseases, especially rare Mendelian diseases. Among the analysis steps, functional prediction (of being deleterious) plays an important role in filtering or prioritizing nonsynonymous SNPs (NSs) for further analysis. Unfortunately, different prediction algorithms use different information and each has its own strengths and weaknesses. It has been suggested that investigators should use predictions from multiple algorithms instead of relying on a single one. However, querying predictions from different databases/Web-servers for different algorithms is both tedious and time consuming, especially when dealing with a huge number of NSs identified by exome sequencing. To facilitate the process, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome (a total of 75,931,005). It is the first integrated database of functional predictions from multiple algorithms for the comprehensive collection of human NSs. dbNSFP is freely available for download at http://sites.google.com/site/jpopgen/dbNSFP. Hum Mutat 32:894-899, 2011. © 2011 Wiley-Liss, Inc. PMID:21520341

  4. PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

    PubMed Central

    Paschou, Peristera; Ziv, Elad; Burchard, Esteban G; Choudhry, Shweta; Rodriguez-Cintron, William; Mahoney, Michael W; Drineas, Petros

    2007-01-01

    Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using less than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the measure of informativeness, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset. 
The algorithm that we introduce runs in seconds and can be easily applied on large genome-wide datasets, facilitating the identification of population substructure, stratification assessment in multi-stage whole-genome association studies, and the study of demographic history in human populations. PMID:17892327

  5. Biodiversity of 20 chicken breeds assessed by SNPs located in gene regions.

    PubMed

    Twito, T; Weigend, S; Blum, S; Granevitze, Z; Feldman, M W; Perl-Treves, R; Lavi, U; Hillel, J

    2007-01-01

    Twenty-five single nucleotide polymorphisms (SNPs) were analyzed in 20 distinct chicken breeds. The SNPs, each located in a different gene and mostly on different chromosomes, were chosen to examine the use of SNPs in or close to genes (g-SNPs), for biodiversity studies. Phylogenetic trees were constructed from these data. When bootstrap values were used as a criterion for the tree repeatability, doubling the number of SNPs from 12 to 25 improved tree repeatability more than doubling the number of individuals per population, from five to ten. Clustering results of these 20 populations, based on the software STRUCTURE, are in agreement with those previously obtained from the analysis of microsatellites. When the number of clusters was similar to the number of populations, affiliation of birds to their original populations was correct (>95%) only when at least the 22 most polymorphic SNP loci (out of 25) were included. When ten populations were clustered into five groups based on STRUCTURE, we used membership coefficient (Q) of the major cluster at each population as an indicator for clustering success level. This value was used to compare between three marker types; microsatellites, SNPs in or close to genes (g-SNPs) and SNPs in random fragments (r-SNPs). In this comparison, the same individuals were used (five to ten birds per population) and the same number of loci (14) used for each of the marker types. The average membership coefficients (Q) of the major cluster for microsatellites, g-SNPs and r-SNPs were 0.85, 0.7, and 0.64, respectively. Analysis based on microsatellites resulted in significantly higher clustering success due to their multi-allelic nature. Nevertheless, SNPs have obvious advantages, and are an efficient and cost-effective genetic tool, providing broader genome coverage and reliable estimates of genetic relatedness. PMID:17675874

  6. Filtering apparatus

    DOEpatents

    Haldipur, G.B.; Dilmore, W.J.

    1992-09-01

    A vertical vessel is described having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas. 18 figs.

  7. Filtering apparatus

    DOEpatents

    Haldipur, Gaurang B. (Monroeville, PA); Dilmore, William J. (Murrysville, PA)

    1992-01-01

    A vertical vessel having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas.

  8. Cytokine SNPs: Comparison of Allele Frequencies by Race & Implications for Future Studies

    PubMed Central

    Van Dyke, Alison L.; Cote, Michele L.; Wenzlaff, Angie S.; Land, Susan; Schwartz, Ann G.

    2009-01-01

    The role of inflammation is being considered in chronic diseases. Previous studies have examined SNPs in a few key inflammatory genes and have included small numbers of African American participants. Variation in the frequencies of inflammatory pathway SNPs may help to explain racial disparities in disease risk. Through a population-based study of 103 African American and 380 Caucasian unrelated, healthy women, we examined the relationships between race and allele frequencies of 70 cytokine and cytokine receptor SNPs. The associations between genotypic and haplotype frequencies and race were also analyzed. Allelic frequencies for 52 out of the 70 SNPs meeting criteria for analysis differed significantly by race. Of the 32 pro-inflammatory and 20 anti-inflammatory SNPs for which the allele frequencies varied significantly by race, variant allele frequency differences between Caucasians and African Americans ranged between 6%–37% and 7%–53% for pro-inflammatory SNPs and anti-inflammatory SNPs, respectively. Our findings suggest that while allele frequencies do vary by race, racial groups are not simplistically represented by a pro-inflammatory or anti-inflammatory genetic profile. Given the racial variability in allele frequencies in inflammatory gene SNPs, studies examining the association between these SNPs and disease should at least incorporate self-reported race in their analyses. PMID:19356949
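
    The allele-frequency comparison described above can be illustrated with a minimal sketch. It assumes diploid genotype counts per group; the function name and all counts are hypothetical and are not data from the study.

```python
def variant_allele_frequency(n_ref_hom: int, n_het: int, n_var_hom: int) -> float:
    """Frequency of the variant allele from diploid genotype counts."""
    n_people = n_ref_hom + n_het + n_var_hom
    if n_people == 0:
        raise ValueError("no genotyped individuals")
    # Each heterozygote carries one variant allele, each variant homozygote two.
    return (n_het + 2 * n_var_hom) / (2 * n_people)


# Hypothetical genotype counts for one SNP in two groups (illustration only).
freq_group_a = variant_allele_frequency(n_ref_hom=60, n_het=35, n_var_hom=8)
freq_group_b = variant_allele_frequency(n_ref_hom=300, n_het=70, n_var_hom=10)
difference = abs(freq_group_a - freq_group_b)
print(f"group A: {freq_group_a:.3f}, group B: {freq_group_b:.3f}, "
      f"difference: {difference:.3f}")
```

    Whether such a difference is statistically significant would need a separate test on the allele counts, which this sketch does not attempt.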

  9. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    PubMed Central

    Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

    2008-01-01

    Background Gene expression data frequently contain missing values, however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remains largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrixes and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. 
Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity, while neighbour-based methods (KNN, OLS, LSA, LLS) performed better in data with higher complexity. We also found that the EBS and STS schemes serve as complementary and effective tools for selecting the optimal imputation algorithm. PMID:18186917

  10. A multiple imputation approach for clustered interval-censored survival data.

    PubMed

    Lam, K F; Xu, Ying; Cheung, Tak-Lun

    2010-03-15

    Multivariate interval-censored failure time data arise commonly in many studies of epidemiology and biomedicine. Analysis of this type of data is more challenging than that of right-censored data. We propose a simple multiple imputation strategy to recover the order of occurrences based on the interval-censored event times using a conditional predictive distribution function derived from a parametric gamma random effects model. By imputing the interval-censored failure times, the estimation of the regression and dependence parameters in the context of a gamma frailty proportional hazards model using the well-developed EM algorithm is made possible. A robust estimator for the covariance matrix is suggested to adjust for the possible misspecification of the parametric baseline hazard function. The finite sample properties of the proposed method are investigated via simulation. The performance of the proposed method is highly satisfactory, whereas the computation burden is minimal. The proposed method is also applied to the diabetic retinopathy study (DRS) data for illustration purposes and the estimates are compared with those based on other existing methods for bivariate grouped survival data. PMID:20069624

  11. Multiple imputation of missing covariates in NONMEM and evaluation of the method's sensitivity to η-shrinkage.

    PubMed

    Johansson, Åsa M; Karlsson, Mats O

    2013-10-01

    Multiple imputation (MI) is an approach widely used in statistical analysis of incomplete data. However, its application to missing data problems in nonlinear mixed-effects modelling is limited. The objective was to implement a four-step MI method for handling missing covariate data in NONMEM and to evaluate the method's sensitivity to η-shrinkage. Four steps were needed: (1) estimation of empirical Bayes estimates (EBEs) using a base model without the partly missing covariate, (2) a regression model for the covariate values given the EBEs from subjects with covariate information, (3) imputation of covariates using the regression model and (4) estimation of the population model. Steps (3) and (4) were repeated several times. The procedure was automated in PsN and is now available as the mimp functionality ( http://psn.sourceforge.net/ ). The method's sensitivity to shrinkage in EBEs was evaluated in a simulation study where the covariate was missing according to a missing at random type of missing data mechanism. The η-shrinkage was increased in steps from 4.5 to 54%. Two hundred datasets were simulated and analysed for each scenario. When shrinkage was low the MI method gave unbiased and precise estimates of all population parameters. With increased shrinkage the estimates became less precise but remained unbiased. PMID:23868748
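
    Steps (2) and (3) of the procedure above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the PsN mimp implementation: a single EBE per subject, a simple linear regression of the covariate on that EBE, and normally distributed residuals. The function names are hypothetical. Steps (1) and (4) require the modelling software and are represented only by the inputs and outputs.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in residuals) / max(len(x) - 2, 1)) ** 0.5
    return a, b, sd


def impute_covariate(ebes, covariate, n_imputations=5, seed=0):
    """Regress the covariate on the EBEs, then draw imputed values.

    `ebes` stands in for the output of step (1). Each returned list is one
    completed covariate column, to be passed to step (4) by the caller.
    """
    rng = random.Random(seed)
    observed = [(e, c) for e, c in zip(ebes, covariate) if c is not None]
    a, b, sd = fit_line([e for e, _ in observed], [c for _, c in observed])
    completed = []
    for _ in range(n_imputations):
        # Observed values are kept; missing ones get prediction plus noise.
        completed.append([
            c if c is not None else a + b * e + rng.gauss(0.0, sd)
            for e, c in zip(ebes, covariate)
        ])
    return completed


# Hypothetical EBEs and a covariate with two missing values (None).
ebes = [0.1, -0.2, 0.3, 0.0, 0.5]
weights = [70.0, 65.0, None, 68.0, None]
datasets = impute_covariate(ebes, weights, n_imputations=5, seed=1)
```

    The sketch draws only residual noise and ignores uncertainty in the regression coefficients, so it is a simplification of a full multiple imputation scheme.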

  12. Subspace Learning and Imputation for Streaming Big Data Matrices and Tensors

    NASA Astrophysics Data System (ADS)

    Mardani, Morteza; Mateos, Gonzalo; Giannakis, Georgios B.

    2015-05-01

    Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with 'Big Data' analytics. However, increasingly noisy, heterogeneous, and incomplete datasets as well as the need for real-time processing of streaming data pose major challenges to this end. In this context, the present paper permeates benefits from rank minimization to scalable imputation of missing data, via tracking low-dimensional subspaces and unraveling latent (possibly multi-way) structure from incomplete streaming data. For low-rank matrix data, a subspace estimator is proposed based on an exponentially-weighted least-squares criterion regularized with the nuclear norm. After recasting the non-separable nuclear norm into a form amenable to online optimization, real-time algorithms with complementary strengths are developed and their convergence is established under simplifying technical assumptions. In a stationary setting, the asymptotic estimates obtained offer the well-documented performance guarantees of the batch nuclear-norm regularized estimator. Under the same unifying framework, a novel online (adaptive) algorithm is developed to obtain multi-way decompositions of low-rank tensors with missing entries, and perform imputation as a byproduct. Simulated tests with both synthetic as well as real Internet and cardiac magnetic resonance imagery (MRI) data confirm the efficacy of the proposed algorithms, and their superior performance relative to state-of-the-art alternatives.

  13. Gender Imputation

    ERIC Educational Resources Information Center

    National Student Clearinghouse, 2013

    2013-01-01

    In late 2007, the National Student Clearinghouse (NSC) expanded its Enrollment Reporting service to include several additional data elements (commonly referred to as the "A2" or "expanded" data elements). One of these expanded data elements is student gender. Although gender is potentially important to a number of research

  14. High quality SNPs/Indels mining and characterization in ginger from ESTs data base.

    PubMed

    Gaur, Mahendra; Das, Aradhana; Subudhi, Enketeswara

    2015-01-01

    Ginger (Zingiber officinale Rosc.) is an important herb of the family Zingiberaceae. It is accepted as a universal cure for a multitude of diseases in Indian systems of medicine and its rhizomes are equally popular as a spice ingredient throughout Asia. SNPs, the definitive genetic markers, representing the finest resolution of a DNA sequence, are abundantly found in populations having a lower rate of mutation and are used for genomic analysis. The public ESTs sequences mostly lack quality files, making high quality SNPs detection more difficult since it is exclusively based on sequence comparisons. In the present study, current dbESTs of NCBI was mined and 38115 ginger ESTs sequences were obtained and assembled into contigs using CAP3 program. In this analysis, recent software tool QualitySNP was used to detect 11523 potential SNPs sites, 8810 high quality SNPs and 1008 indels polymorphisms with a frequency of 1.61 SNPs / 10 kbp. Of ESTs libraries generated from three ginger tissues together, rhizomes had a frequency of 0.32 SNPs and 0.03 indels per 10 kbp whereas the leaves had a frequency of 2.51 SNPs and 0.23 indels per 10 kbp and root is showing relative frequency of 0.76/10 kbp SNPs and 0.02/10 kbp indels. The present analysis provides additional information about the tissue wise presence of haplotypes (222), distribution of high quality exonic (2355) and intronic (6455) SNPs and information about singletons (7538) in addition to contigs transitions and transversions ratio (0.57). Among all tissue detected SNPs, transversions number is higher in comparison to the number of transitions. Quality SNPs detected in this work can be used as markers for further ginger genetic experiments. PMID:25848168
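
    The per-10-kbp frequencies and the transition/transversion ratio reported above follow from simple counting. The sketch below shows one way to compute them; the function names and input values are hypothetical and this is not the QualitySNP procedure.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    snps = list(snps)
    n_ts = sum(1 for ref, alt in snps if substitution_type(ref, alt) == "transition")
    n_tv = len(snps) - n_ts
    if n_tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return n_ts / n_tv


def per_10_kbp(n_variants: int, sequence_length_bp: int) -> float:
    """Variant frequency expressed per 10,000 bp of sequence."""
    return n_variants / sequence_length_bp * 10_000


# Hypothetical input: four substitutions and a 1 Mbp assembly with 161 SNPs.
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example_snps), per_10_kbp(161, 1_000_000))
```

    A ratio below 1, such as the 0.57 reported in the abstract, means transversions outnumber transitions.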

  15. High quality SNPs/Indels mining and characterization in ginger from ESTs data base

    PubMed Central

    Gaur, Mahendra; Das, Aradhana; Subudhi, Enketeswara

    2015-01-01

    Ginger (Zingiber officinale Rosc.) is an important herb of the family Zingiberaceae. It is accepted as a universal cure for a multitude of diseases in Indian systems of medicine and its rhizomes are equally popular as a spice ingredient throughout Asia. SNPs, the definitive genetic markers, representing the finest resolution of a DNA sequence, are abundantly found in populations having a lower rate of mutation and are used for genomic analysis. The public ESTs sequences mostly lack quality files, making high quality SNPs detection more difficult since it is exclusively based on sequence comparisons. In the present study, current dbESTs of NCBI was mined and 38115 ginger ESTs sequences were obtained and assembled into contigs using CAP3 program. In this analysis, recent software tool QualitySNP was used to detect 11523 potential SNPs sites, 8810 high quality SNPs and 1008 indels polymorphisms with a frequency of 1.61 SNPs / 10 kbp. Of ESTs libraries generated from three ginger tissues together, rhizomes had a frequency of 0.32 SNPs and 0.03 indels per 10 kbp whereas the leaves had a frequency of 2.51 SNPs and 0.23 indels per 10 kbp and root is showing relative frequency of 0.76/10 kbp SNPs and 0.02/10 kbp indels. The present analysis provides additional information about the tissue wise presence of haplotypes (222), distribution of high quality exonic (2355) and intronic (6455) SNPs and information about singletons (7538) in addition to contigs transitions and transversions ratio (0.57). Among all tissue detected SNPs, transversions number is higher in comparison to the number of transitions. Quality SNPs detected in this work can be used as markers for further ginger genetic experiments. PMID:25848168

  16. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

    PubMed Central

    2012-01-01

    Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions Before availability of pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. These and other information have been curated in a database that we have dedicated for pepper project. PMID:23110314
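
    The distance filter mentioned above (SNPs at least 50 bp from any intron-exon junction and from flanking SNPs) can be sketched as follows. This is not the authors' Perl code. It assumes integer base-pair positions on a single contig, and the function name is hypothetical.

```python
def filter_snps(snp_positions, junction_positions, min_distance=50):
    """Keep SNPs at least `min_distance` bp from every junction and other SNP."""
    snps = sorted(snp_positions)
    junctions = list(junction_positions)
    kept = []
    for i, pos in enumerate(snps):
        near_junction = any(abs(pos - j) < min_distance for j in junctions)
        # Because the list is sorted, only the two adjacent SNPs can be closest.
        near_prev = i > 0 and pos - snps[i - 1] < min_distance
        near_next = i + 1 < len(snps) and snps[i + 1] - pos < min_distance
        if not (near_junction or near_prev or near_next):
            kept.append(pos)
    return kept


# Hypothetical positions: 100 and 120 crowd each other, 300 sits near a junction.
print(filter_snps([100, 120, 300, 500], junction_positions=[310, 1000]))
```

    The sketch compares each SNP against all candidate SNPs, so two SNPs that are too close together are both removed. A pipeline could instead keep one of the pair; the abstract does not say which rule was used.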

  17. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    PubMed

    Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF <5%) and rare variants (<1%)) can enhance previously identified associations and identify novel loci, we selected 93 quantitative circulating factors where data was available from the InCHIANTI population study. These phenotypes included cytokines, binding proteins, hormones, vitamins and ions. We selected these phenotypes because many have known strong genetic associations and are potentially important to help understand disease processes. We performed a genome-wide scan for these 93 phenotypes in InCHIANTI. We identified 21 signals and 33 signals that reached P<5×10^-8 based on HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P<5×10^-11 respectively. Imputation of 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P<5×10^-8 in both analyses (17 of which represent well replicated signals in the NHGRI catalogue), six were captured by the same index SNP, five were nominally more strongly associated in 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10^-12). 
Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations. PMID:23696881
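
    The frequency classes and significance thresholds used in this abstract translate directly into a small classifier. In the sketch below the function names are hypothetical, and labelling everything at or above 5% as "common" is an assumption, since the abstract only gives the rare and low-frequency bounds explicitly.

```python
GENOME_WIDE_P = 5e-8   # conventional threshold cited in the abstract
STRICT_P = 5e-11       # stricter threshold cited in the abstract


def maf_class(maf: float) -> str:
    """Bin a minor allele frequency into rare, low-frequency, or common."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"  # assumed label for MAF >= 5%


def significance(p_value: float) -> str:
    """Report which of the two thresholds a P value passes, if any."""
    if p_value < STRICT_P:
        return "passes strict threshold"
    if p_value < GENOME_WIDE_P:
        return "genome-wide significant"
    return "not significant"


print(maf_class(0.007), "|", significance(2.5e-12))
```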

  18. PATHOTYPING OF SALMONELLA ENTERICA BY ANALYSIS OF SNPS IN CYAA AND FLANKING 23S RIBOSOMAL SEQUENCES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The egg-contaminating phenotype of Salmonella enterica serotype Enteritidis was linked to single-nucleotide polymorphisms (SNPs) occurring in cyaA, which encodes adenylate cyclase that produces cAMP and pyrophosphate from ATP. Ribotyping indicated that SNPs in cyaA were linked to polymorphisms occur...

  19. 7 CFR 3017.630 - May the Department of Agriculture impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false May the Department of Agriculture impute conduct of one person to another? 3017.630 Section 3017.630 Agriculture Regulations of the Department of Agriculture (Continued) OFFICE OF THE CHIEF FINANCIAL OFFICER, DEPARTMENT OF AGRICULTURE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT)...

  20. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 29 Labor 4 2010-07-01 2010-07-01 false May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions 1471.630 May the Federal Mediation...

  1. Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...

  2. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... occurred in connection with a partnership, joint venture, joint application, association or similar... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the African Development Foundation impute conduct of one person to another? 1508.630 Section 1508.630 Foreign Relations AFRICAN...

  3. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.

    PubMed

    Yang, Jian; Bakshi, Andrew; Zhu, Zhihong; Hemani, Gibran; Vinkhuyzen, Anna A E; Lee, Sang Hong; Robinson, Matthew R; Perry, John R B; Nolte, Ilja M; van Vliet-Ostaptchouk, Jana V; Snieder, Harold; Esko, Tonu; Milani, Lili; Mägi, Reedik; Metspalu, Andres; Hamsten, Anders; Magnusson, Patrik K E; Pedersen, Nancy L; Ingelsson, Erik; Soranzo, Nicole; Keller, Matthew C; Wray, Naomi R; Goddard, Michael E; Visscher, Peter M

    2015-10-01

    We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ~97% and ~68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ~17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices. PMID:26323059

  4. Cautions on the Use of Multiple Imputation When Selecting Between Latent Categorical versus Continuous Models for Psychological Constructs.

    PubMed

    Sterba, Sonya K

    2014-12-01

    Clinical psychology researchers studying adolescents and young adults long have been interested in characterizing the latent categorical (classes/profiles) versus continuous (factors) nature of psychological syndromes. To inform this debate, researchers sometimes compare the fit of finite mixture versus factor analysis models to symptom data. This study explains and evaluates how missing data handling methods can impact results of this important model fit comparison. Via simulation, we assess three missing data-handling methods previously recommended to researchers fitting these models: multiple imputation using a saturated multivariate normal imputation model, multiple imputation using a hypothesized model, or full information maximum likelihood using the EM algorithm (FIML-EM). Results show that, under certain conditions, the method used to handle missing data can interfere with clinical psychologists' ability to accurately discriminate latent classes from continua. For instance, certain imputation methods increase the chance of selecting latent continua when latent classes truly exist. FIML-EM performed best overall. Recommendations for practice are discussed. PMID:25491166

  5. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    PubMed Central

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803
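
    Of the three methods compared above, k-nearest neighbour imputation is the simplest to illustrate. The sketch below is a minimal unweighted variant and not the implementation used in the study. It assumes genes are rows, missing values are marked as None, and distance is the root mean squared difference over columns observed in both rows. The function name is hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows that have a value
    in the same column. Returns a new matrix; the input is not modified."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # rows with no shared columns cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with an observed value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d < math.inf]
            if neighbours:
                completed[i][j] = sum(neighbours) / len(neighbours)
    return completed


# Hypothetical expression matrix: the first gene lacks its third measurement.
expression = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 2.8],
    [5.0, 6.0, 9.0],
]
imputed = knn_impute(expression, k=2)
```

    Here the two rows closest to the first gene supply the missing value as their average, while the distant fourth row is ignored. An entry stays None when no comparable donor row exists.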

  6. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...

  7. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    SciTech Connect

    Bogush, I.P.; Bushinsky, A.V.; Galkin, A.Y.; Serbin, V.I.; Zhabotinsky, E.E.

    1991-01-01

    Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal state requirements differ markedly between units, and meeting them requires both an appropriate arrangement of the units in the SNPS power generating module and the use of specific control algorithms, special thermal regulation, and protection. Computer codes that determine the thermal transient performance of the liquid metal loop and the main units were developed as the calculation basis for the required thermal state of SNPS "Topaz" units. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, mock-up tests, power tests of ground SNPS prototypes, and flight tests of two SNPS "Topaz".

  8. RNA-Seq Uncovers SNPs and Alternative Splicing Events in Asian Lotus (Nelumbo nucifera).

    PubMed

    Yang, Mei; Xu, Liming; Liu, Yanling; Yang, Pingfang

    2015-01-01

    RNA-Seq is an efficient way to comprehensively identify single nucleotide polymorphisms (SNPs) and alternative splicing (AS) events from the expressed genes. In this study, we conducted transcriptome sequencing of four Asian lotus (Nelumbo nucifera) cultivars using Illumina HiSeq2000 platform to identify SNPs and AS events in lotus. A total of 505 million pair-end RNA-Seq reads were generated from four cultivars, of which 86% were mapped to the lotus reference genome. Using the four sets of data together, a total of 357,689 putative SNPs were identified with an average density of one SNP per 2.2 kb. These SNPs were located in 1,253 scaffolds and 15,016 expressed genes. A/G and C/T were the two major types of SNPs in the Asian lotus transcriptome. In parallel, a total of 177,540 AS events were detected in the four cultivars and were distributed in 64% of the expressed genes of lotus. The predominant type of AS events was alternative 5' first exon, which accounted for 41.2% of all the observed AS events, and exon skipping only accounted for 4.3% of all AS. Gene Ontology analysis was conducted to analyze the function of the genes containing SNPs and AS events. Validation of selected SNPs and AS events revealed that 74% of SNPs and 80% of AS events were reliable, which indicates that RNA-Seq is an efficient approach to uncover gene-associated SNPs and AS events. A large number of SNPs and AS events identified in our study will facilitate further genetic and functional genomics research in lotus. PMID:25928215

  9. RNA-Seq Uncovers SNPs and Alternative Splicing Events in Asian Lotus (Nelumbo nucifera)

    PubMed Central

    Yang, Mei; Xu, Liming; Liu, Yanling; Yang, Pingfang

    2015-01-01

    RNA-Seq is an efficient way to comprehensively identify single nucleotide polymorphisms (SNPs) and alternative splicing (AS) events from the expressed genes. In this study, we conducted transcriptome sequencing of four Asian lotus (Nelumbo nucifera) cultivars using Illumina HiSeq2000 platform to identify SNPs and AS events in lotus. A total of 505 million pair-end RNA-Seq reads were generated from four cultivars, of which 86% were mapped to the lotus reference genome. Using the four sets of data together, a total of 357,689 putative SNPs were identified with an average density of one SNP per 2.2 kb. These SNPs were located in 1,253 scaffolds and 15,016 expressed genes. A/G and C/T were the two major types of SNPs in the Asian lotus transcriptome. In parallel, a total of 177,540 AS events were detected in the four cultivars and were distributed in 64% of the expressed genes of lotus. The predominant type of AS events was alternative 5’ first exon, which accounted for 41.2% of all the observed AS events, and exon skipping only accounted for 4.3% of all AS. Gene Ontology analysis was conducted to analyze the function of the genes containing SNPs and AS events. Validation of selected SNPs and AS events revealed that 74% of SNPs and 80% of AS events were reliable, which indicates that RNA-Seq is an efficient approach to uncover gene-associated SNPs and AS events. A large number of SNPs and AS events identified in our study will facilitate further genetic and functional genomics research in lotus. PMID:25928215

  10. Accounting for uncertainty due to 'last observation carried forward' outcome imputation in a meta-analysis model.

    PubMed

    Dimitrakopoulou, Vasiliki; Efthimiou, Orestis; Leucht, Stefan; Salanti, Georgia

    2015-02-28

    Missing outcome data are a problem commonly observed in randomized control trials that occurs as a result of participants leaving the study before its end. Missing such important information can bias the study estimates of the relative treatment effect and consequently affect the meta-analytic results. Therefore, methods on manipulating data sets with missing participants, with regard to incorporating the missing information in the analysis so as to avoid the loss of power and minimize the bias, are of interest. We propose a meta-analytic model that accounts for possible error in the effect sizes estimated in studies with last observation carried forward (LOCF) imputed patients. Assuming a dichotomous outcome, we decompose the probability of a successful unobserved outcome taking into account the sensitivity and specificity of the LOCF imputation process for the missing participants. We fit the proposed model within a Bayesian framework, exploring different prior formulations for sensitivity and specificity. We illustrate our methods by performing a meta-analysis of five studies comparing the efficacy of amisulpride versus conventional drugs (flupenthixol and haloperidol) on patients diagnosed with schizophrenia. Our meta-analytic models yield estimates similar to meta-analysis with LOCF-imputed patients. Allowing for uncertainty in the imputation process, precision is decreased depending on the priors used for sensitivity and specificity. Results on the significance of amisulpride versus conventional drugs differ between the standard LOCF approach and our model depending on prior beliefs on the imputation process. Our method can be regarded as a useful sensitivity analysis that can be used in the presence of concerns about the LOCF process. PMID:25492741
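
    The role of sensitivity and specificity in the abstract above can be illustrated with the standard misclassification identity for a dichotomous outcome. This is a sketch under stated assumptions, not the authors' Bayesian model: the function names are hypothetical, sensitivity and specificity are treated as known constants rather than given priors, and the probability of an LOCF-imputed success is assumed to equal Se·π + (1 − Sp)·(1 − π), where π is the true success probability.

```python
def observed_success_prob(true_prob, sensitivity, specificity):
    """Probability that LOCF records a success, given the true success probability."""
    return sensitivity * true_prob + (1.0 - specificity) * (1.0 - true_prob)

def corrected_success_prob(observed_prob, sensitivity, specificity):
    """Invert the identity above to recover the true success probability."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_prob - (1.0 - specificity)) / denom
    # clip to [0, 1] because sampling noise can push the estimate outside the range
    return min(1.0, max(0.0, estimate))

p_locf = observed_success_prob(0.4, sensitivity=0.9, specificity=0.8)
p_true = corrected_success_prob(p_locf, sensitivity=0.9, specificity=0.8)
print(p_locf, p_true)
```

    With perfect sensitivity and specificity the correction leaves the LOCF proportion unchanged, which corresponds to the standard LOCF analysis.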

  11. High-throughput SNPs for all: genotyping-in-thousands.

    PubMed

    Pavey, Scott A

    2015-07-01

    Understanding the genetic structure of species is essential for conservation. It is only with this information that managers, academics, user groups and land-use planners can understand the spatial scale of migration and local adaptation, source-sink dynamics and effective population size. Such information is essential for a multitude of applications including delineating management units, balancing management priorities, discovering cryptic species and implementing captive breeding programmes. Species can range from locally adapted by hundreds of metres (Pavey et al. ) to complete species panmixia (Côté et al. ). Even more remarkable is that this essential information can be obtained without fully sequenced or annotated genomes, but from mere (putatively) nonfunctional variants. First with allozymes, then microsatellites and now SNPs, this neutral genetic variation carries a wealth of information about migration and drift. For many of us, it may be somewhat difficult to remember our understanding of species conservation before the widespread usage of these useful tools. However most species on earth have yet to give us that 'peek under the curtain'. With the current diversity on earth estimated to be nearly 9 million species (Mora et al. ), we have a long way to go for a comprehensive meta-phylogeographic understanding. A method presented in this issue by Campbell and colleagues (Campbell et al. ) is a tool that will accelerate the pace in this area. Genotyping-in-thousands (GT-seq) leverages recent advancements in sequencing technology to save many hours and dollars over previous methods to generate this important neutral genetic information. PMID:26095005

  12. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    PubMed Central

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  13. MULTIPLE IMPUTATION FOR SHARING PRECISE GEOGRAPHIES IN PUBLIC USE DATA

    PubMed Central

    Wang, Hao; Reiter, Jerome P.

    2013-01-01

    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data. PMID:23990852

  14. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    PubMed Central

    Artigas, María Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria; Viñuela, Ana; Völzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  15. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation.

    PubMed

    Soler Artigas, María; Wain, Louise V; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R; Grallert, Harald; Hammond, Chris J; Harris, Sarah E; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W; Navarro, Pau; Nickle, David C; Padmanabhan, Sandosh; Raitakari, Olli T; Ried, Janina S; Ripatti, Samuli; Schulz, Holger; Scott, Robert A; Sin, Don D; Starr, John M; Viñuela, Ana; Völzke, Henry; Wild, Sarah H; Wright, Alan F; Zemunik, Tatijana; Jarvis, Deborah L; Spector, Tim D; Evans, David M; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J; Karrasch, Stefan; Probst-Hensch, Nicole M; Heinrich, Joachim; Stubbe, Beate; Wilson, James F; Wareham, Nicholas J; James, Alan L; Morris, Andrew P; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P; Hall, Ian P; Tobin, Martin D

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P < 5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  16. Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data

    PubMed Central

    Welch, Catherine A; Petersen, Irene; Bartlett, Jonathan W; White, Ian R; Marston, Louise; Morris, Richard W; Nazareth, Irwin; Walters, Kate; Carpenter, James

    2014-01-01

    Most implementations of multiple imputation (MI) of missing data are designed for simple rectangular data structures ignoring temporal ordering of data. Therefore, when applying MI to longitudinal data with intermittent patterns of missing data, some alternative strategies must be considered. One approach is to divide data into time blocks and implement MI independently at each block. An alternative approach is to include all time blocks in the same MI model. With increasing numbers of time blocks, this approach is likely to break down because of co-linearity and over-fitting. The new two-fold fully conditional specification (FCS) MI algorithm addresses these issues, by only conditioning on measurements, which are local in time. We describe and report the results of a novel simulation study to critically evaluate the two-fold FCS algorithm and its suitability for imputation of longitudinal electronic health records. After generating a full data set, approximately 70% of selected continuous and categorical variables were made missing completely at random in each of ten time blocks. Subsequently, we applied a simple time-to-event model. We compared efficiency of estimated coefficients from a complete records analysis, MI of data in the baseline time block and the two-fold FCS algorithm. The results show that the two-fold FCS algorithm maximises the use of data available, with the gain relative to baseline MI depending on the strength of correlations within and between variables. Using this approach also increases plausibility of the missing at random assumption by using repeated measures over time of variables whose baseline values may be missing. PMID:24782349

  17. Analysis of partially observed clustered data using generalized estimating equations and multiple imputation

    PubMed Central

    Aloisio, Kathryn M.; Swanson, Sonja A.; Micali, Nadia; Field, Alison; Horton, Nicholas J.

    2015-01-01

    Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to the additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options such as weighted generalized estimating equations (WEEs) are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Previously, estimation with partially observed clustered data was computationally challenging; however, recent developments in Stata have facilitated the use of these methods in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents reported by parents and adolescents as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991–92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness. PMID:25642154

  18. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    SciTech Connect

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.; Loots, Gabriela G.; Houston, Kathryn A.; Dubchak, Inna; Speed, Terence P.; Rubin, Edward M.

    2002-01-01

    Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

  19. Screening and Evaluation of Deleterious SNPs in APOE Gene of Alzheimer's Disease

    PubMed Central

    Masoodi, Tariq Ahmad; Al Shammari, Sulaiman A.; Al-Muammar, May N.; Alhamdan, Adel A.

    2012-01-01

    Introduction. Apolipoprotein E (APOE) is an important risk factor for Alzheimer's disease (AD) and is present in 30–50% of patients who develop late-onset AD. Several single-nucleotide polymorphisms (SNPs) are present in the APOE gene, which act as biomarkers for exploring the genetic basis of this disease. The objective of this study is to identify deleterious nsSNPs associated with the APOE gene. Methods. The SNPs were retrieved from dbSNP. Using I-Mutant, protein stability change was calculated. The potentially functional nonsynonymous (ns) SNPs and their effect on the protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the APOE protein was evaluated by using Swiss PDB viewer and NOMAD-Ref server. Results. Six nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value of >−1.0. Four nsSNPs showed a highly deleterious tolerance index score of 0.00. Nine nsSNPs were found to be probably damaging with a position-specific independent counts (PSIC) score of ≥2.0. Seven nsSNPs were found to be highly polymorphic with a risk score of 3-4. The total energies and root-mean-square deviation (RMSD) values were higher for three mutant-type structures compared to the native modeled structure. Conclusion. We concluded that three nsSNPs, namely rs11542041, rs11542040, and rs11542034, are potentially functional polymorphisms. PMID:22530123

  20. Screening and Evaluation of Deleterious SNPs in APOE Gene of Alzheimer's Disease.

    PubMed

    Masoodi, Tariq Ahmad; Al Shammari, Sulaiman A; Al-Muammar, May N; Alhamdan, Adel A

    2012-01-01

    Introduction. Apolipoprotein E (APOE) is an important risk factor for Alzheimer's disease (AD) and is present in 30-50% of patients who develop late-onset AD. Several single-nucleotide polymorphisms (SNPs) are present in the APOE gene, which act as biomarkers for exploring the genetic basis of this disease. The objective of this study is to identify deleterious nsSNPs associated with the APOE gene. Methods. The SNPs were retrieved from dbSNP. Using I-Mutant, protein stability change was calculated. The potentially functional nonsynonymous (ns) SNPs and their effect on the protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the APOE protein was evaluated by using Swiss PDB viewer and NOMAD-Ref server. Results. Six nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value of >-1.0. Four nsSNPs showed a highly deleterious tolerance index score of 0.00. Nine nsSNPs were found to be probably damaging with a position-specific independent counts (PSIC) score of ≥2.0. Seven nsSNPs were found to be highly polymorphic with a risk score of 3-4. The total energies and root-mean-square deviation (RMSD) values were higher for three mutant-type structures compared to the native modeled structure. Conclusion. We concluded that three nsSNPs, namely rs11542041, rs11542040, and rs11542034, are potentially functional polymorphisms. PMID:22530123

  1. Imputation of the Rare HOXB13 G84E Mutation and Cancer Risk in a Large Population-Based Cohort

    PubMed Central

    Hoffmann, Thomas J.; Sakoda, Lori C.; Shen, Ling; Jorgenson, Eric; Habel, Laurel A.; Liu, Jinghua; Kvale, Mark N.; Asgari, Maryam M.; Banda, Yambazi; Corley, Douglas; Kushi, Lawrence H.; Quesenberry, Charles P.; Schaefer, Catherine; Van Den Eeden, Stephen K.; Risch, Neil; Witte, John S.

    2015-01-01

    An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r^2 = 0.57, 95% CI = 0.37−0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10^-12). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10^-4) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects. PMID:25629170

  2. Water Filters

    NASA Technical Reports Server (NTRS)

    1988-01-01

    Seeking a more effective method of filtering highly contaminated potable water, Mike Pedersen, founder of Western Water International (WWI), learned that NASA had conducted extensive research in methods of purifying water on board manned spacecraft. The key is Aquaspace Compound, a proprietary WWI formula that scientifically blends various types of granular activated charcoal with other active and inert ingredients. Aquaspace systems remove some substances, such as chlorine, by atomic adsorption; other types of organic chemicals by mechanical filtration; and still others by catalytic reaction. Aquaspace filters are finding wide acceptance in industrial, commercial, residential and recreational applications in the U.S. and abroad.

  3. Sigma Filter

    NASA Technical Reports Server (NTRS)

    Balgovind, R. C.

    1985-01-01

    In the GLA Fourth-Order model, the topography needs to be smoothed in order to remove the Gibbs phenomenon, which occurs whenever a Fourier series is truncated. The sigma factors were introduced to reduce the Gibbs phenomenon. It is found that the smoothed Fourier series is nothing but the original Fourier series with its coefficients multiplied by the corresponding sigma factors. This operator can be applied many times to obtain a high-order sigma-filtered field and is easily applied using the FFT. It is found that this filter is beneficial in deriving the topography.
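
    The coefficient-scaling idea in the abstract above can be sketched with NumPy's FFT. This is an illustration only, not the GLA model code: the function name `sigma_filter`, the one-dimensional periodic field, and the use of Lanczos sigma factors sinc(k / (n_keep + 1)) are assumptions, since the abstract does not state the exact form of the factors.

```python
import numpy as np

def sigma_filter(field, n_keep, order=1):
    """Truncate a periodic 1-D field to n_keep harmonics and apply sigma factors.

    order=0 gives the plain truncated Fourier series; higher integer orders
    apply the sigma factors repeatedly.
    """
    coeffs = np.fft.rfft(field)
    k = np.arange(coeffs.size)
    keep = k <= n_keep
    # Lanczos sigma factors (assumed form); np.sinc(x) is sin(pi x) / (pi x)
    sigma = np.sinc(k / (n_keep + 1))
    factors = np.where(keep, sigma ** order, 0.0)
    return np.fft.irfft(coeffs * factors, n=len(field))

# A step-like "topography" shows the Gibbs overshoot clearly.
step = (np.arange(256) < 128).astype(float)
truncated = sigma_filter(step, n_keep=16, order=0)
smoothed = sigma_filter(step, n_keep=16, order=1)
print(truncated.max(), smoothed.max())
```

    Because the factor for the zeroth harmonic is 1, the filter leaves the mean of the field unchanged while damping the overshoot near the step.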

  4. Phosphorus Filter

    USGS Multimedia Gallery

    Tom Kehler, fishery biologist at the U.S. Fish and Wildlife Service's Northeast Fishery Center in Lamar, Pennsylvania, checks the flow rate of water leaving a phosphorus filter column. The USGS has pioneered a new use for acid mine drainage residuals that are currently a disposal challenge, usi...

  5. SNP-Seek database of SNPs derived from 3000 rice genomes.

    PubMed

    Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L

    2015-01-01

    We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of Oracle database having a total number of rows with SNP genotypes close to 60 billion (20 M SNPs × 3 K rice lines) and web interface for convenient querying. The database allows quick retrieving of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

  6. A genomewide comparison of population structure at STRPs and nearby SNPs in humans.

    PubMed

    Payseur, Bret A; Jing, Peicheng

    2009-06-01

    Patterns of population structure provide insights into evolutionary processes and help identify groups of individuals for genotype-phenotype association studies. With increasing availability of polymorphic molecular markers across genomes, the examination of population structure using large numbers of unlinked loci has become a common practice in evolutionary biology and human genetics. The two classes of molecular variation most widely used for this purpose, short tandem repeat polymorphisms (STRPs) and single-nucleotide polymorphisms (SNPs), differ in mutational properties expected to affect population structure. To measure the relative ability of these loci to describe population structure, we compared diversity at neighboring STRPs and SNPs from 720 genomic regions in the four populations that comprise the Human HapMap. Comparing loci from the same genomic regions allowed us to focus on the contribution of mutational differences (rather than variation in genealogical history) to disparities in population structure between STRPs and SNPs. Relative to average values for SNPs from the same regions, STRPs had lower F(st), but higher G(st)' and I(n) values. STRP-SNP correlations in population structure across genomic regions were statistically significant but weak in magnitude. Separate analyses by repeat type showed that these correlations were driven primarily by tetranucleotide and trinucleotide STRPs; measures of population structure at dinucleotides and SNPs were not significantly correlated. Pairwise comparisons among populations revealed effects of divergence time on differences in population structure between STRPs and SNPs. Collectively, these results confirm that individual STRPs can provide more information about population structure than individual SNPs, but suggest that the difference in structure at STRPs and SNPs depends on local genealogical history. Our study motivates theoretical comparisons of population structure at loci with different mutational properties. PMID:19289600

  7. SNPRanker: a tool for identification and scoring of SNPs associated to target genes.

    PubMed

    Calabria, Andrea; Mosca, Ettore; Viti, Federica; Merelli, Ivan; Milanesi, Luciano

    2010-01-01

    The identification of genes and SNPs involved in human diseases remains a challenge. Many public resources, databases and applications collect biological data and perform annotations, increasing the global biological knowledge. The need for SNP prioritization is emerging with the development of new high-throughput genotyping technologies, which allow the development of customized disease-oriented chips. Therefore, given a list of genes related to a specific biological process or disease as input, a crucial issue is finding the most relevant SNPs to analyse. The selection of these SNPs may rely on the relevant a-priori knowledge of biomolecular features characterising all the annotated SNPs and genes of the provided list. The bioinformatics approach described here allows the retrieval of a ranked list of significant SNPs from a set of input genes, such as candidate genes associated with a specific disease. The system enriches the gene set by including other genes, associated with the original ones by ontological similarity evaluation. The proposed method relies on the integration of data from public resources in a vertical perspective (from genomics to systems biology data), the evaluation of features from biomolecular knowledge, the computation of partial scores for SNPs and finally their ranking, relying on their global score. Our approach has been implemented in a web-based tool called SNPRanker, which is accessible at the URL http://www.itb.cnr.it/snpranker. An interesting application of the presented system is the prioritisation of SNPs related to genes involved in specific pathologies, in order to produce custom arrays. PMID:20375450

  8. Identification of common carp (Cyprinus carpio) microRNAs and microRNA-related SNPs

    PubMed Central

    2012-01-01

    Background: MicroRNAs (miRNAs) exist pervasively across viruses, plants and animals and play important roles in the post-transcriptional regulation of genes. In the common carp, miRNA targets have not been investigated. In model species, single-nucleotide polymorphisms (SNPs) have been reported to impair or enhance miRNA regulation as well as to alter miRNA biogenesis. SNPs are often associated with diseases or traits. To date, no studies into the effects of SNPs on miRNA biogenesis and regulation in the common carp have been reported. Results: Using homology-based prediction combined with small RNA sequencing, we have identified 113 common carp mature miRNAs, including 92 conserved miRNAs and 21 common carp specific miRNAs. The conserved miRNAs had significantly higher expression levels than the specific miRNAs. The miRNAs were clustered into three phylogenetic groups. In total, 394 potential miRNA binding sites in 206 target mRNAs were predicted for 83 miRNAs. We identified 13 SNPs in the miRNA precursors. Among them, nine SNPs had the potential to either increase or decrease the energy of the predicted secondary structures of the precursors. Further, two SNPs in the 3′ untranslated regions of target genes were predicted to either disturb or create miRNA-target interactions. Conclusions: The common carp miRNAs and their target genes reported here will help further our understanding of the role of miRNAs in gene regulation. The analysis of the miRNA-related SNPs and their effects provided insights into the effects of SNPs on miRNA biogenesis and function. The resource data generated in this study will help advance the study of miRNA function and phenotype-associated miRNA identification. PMID:22908890

  9. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.

    PubMed

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

    2013-04-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
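
    As a rough illustration of the stratification idea in this abstract, the Python sketch below applies a false discovery rate procedure separately within each annotation stratum and compares it with a pooled analysis. It is not the authors' sFDR implementation: it substitutes the plain Benjamini-Hochberg step-up rule for their estimator, omits the linkage disequilibrium-weighted annotations, and the function names, strata labels and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Find the largest rank whose p-value lies under the BH line alpha * rank / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Run Benjamini-Hochberg separately inside each stratum (assumed scheme)."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = []
    for members in groups.values():
        local = benjamini_hochberg([pvalues[i] for i in members], alpha)
        rejected.extend(members[j] for j in local)  # map back to global indices
    return sorted(rejected)


# Hypothetical toy data: three "coding" SNPs and five "intergenic" SNPs.
pvalues = [0.001, 0.02, 0.03, 0.4, 0.5, 0.6, 0.7, 0.8]
strata = ["coding"] * 3 + ["intergenic"] * 5

pooled = benjamini_hochberg(pvalues)        # one test family of eight SNPs
stratified = stratified_fdr(pvalues, strata)  # two test families
print(pooled, stratified)
```

    In this toy data the enriched stratum is tested against a less stringent threshold once it is separated from the unenriched one, so the stratified analysis rejects more hypotheses at the same nominal level than the pooled analysis.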

  10. All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

    PubMed Central

    Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621

  11. Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs.

    PubMed

    Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

    2014-01-01

    Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40-60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor-SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100

  12. Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs

    PubMed Central

    Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

    2014-01-01

    Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100

  13. Computational Characterization of Osteoporosis Associated SNPs and Genes Identified by Genome-Wide Association Studies

    PubMed Central

    Wang, Ya; Wu, Guiju; Chen, Jie; Ye, Weiyuan; Yang, Jiancai; Huang, Qingyang

    2016-01-01

    Objectives: Genome-wide association studies (GWASs) have revealed many SNPs and genes associated with osteoporosis. However, influence of these SNPs and genes on the predisposition to osteoporosis is not fully understood. We aimed to identify osteoporosis GWASs-associated SNPs potentially influencing the binding affinity of transcription factors and miRNAs, and reveal enrichment signaling pathway and “hub” genes of osteoporosis GWAS-associated genes. Methods: We conducted multiple computational analyses to explore function and mechanisms of osteoporosis GWAS-associated SNPs and genes, including SNP conservation analysis and functional annotation (influence of SNPs on transcription factors and miRNA binding), gene ontology analysis, pathway analysis and protein-protein interaction analysis. Results: Our results suggested that a number of SNPs potentially influence the binding affinity of transcription factors (NFATC2, MEF2C, SOX9, RUNX2, ESR2, FOXA1 and STAT3) and miRNAs. Osteoporosis GWASs-associated genes showed enrichment of Wnt signaling pathway, basal cell carcinoma and Hedgehog signaling pathway. Highly interconnected “hub” genes revealed by interaction network analysis are RUNX2, SP7, TNFRSF11B, LRP5, DKK1, ESR1 and SOST. Conclusions: Our results provided the targets for further experimental assessment and further insight on osteoporosis pathophysiology. PMID:26930606

  14. Evaluating information content of SNPs for sample-tagging in re-sequencing projects.

    PubMed

    Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F

    2015-01-01

    Sample-tagging is designed for identification of accidental sample mix-up, which is a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world, and only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In the simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness. PMID:25975447
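
    The Hamming-distance figure quoted in this abstract can be made concrete with a small Python sketch. It is not the authors' program: it assumes 30 independent biallelic SNPs in Hardy-Weinberg equilibrium with a minor allele frequency of 0.5, genotypes coded as 0, 1 or 2 copies of one allele, and a simulated panel of only 200 individuals. All function names are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of SNP positions at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_genotypes(n_individuals, n_snps, maf=0.5, seed=0):
    """Simulate genotypes (0/1/2 allele copies) under the stated assumptions."""
    rng = random.Random(seed)
    return [
        tuple((rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps))
        for _ in range(n_individuals)
    ]


def expected_distance(n_snps, maf=0.5):
    """Expected Hamming distance between two unrelated simulated individuals."""
    p, q = maf, 1.0 - maf
    # Probability that two independent genotypes at one SNP are identical.
    same = (q * q) ** 2 + (2 * p * q) ** 2 + (p * p) ** 2
    return n_snps * (1.0 - same)


panel = simulate_genotypes(200, 30)
distances = [hamming(a, b) for a, b in combinations(panel, 2)]
mean_distance = sum(distances) / len(distances)
print(round(mean_distance, 2), expected_distance(30))
```

    Under these assumptions two genotypes at a single SNP match with probability 0.0625 + 0.25 + 0.0625 = 0.375, so the expected distance over 30 SNPs is 30 × 0.625 = 18.75. SNPs with less balanced allele frequencies would give a smaller expected distance.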

  15. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon.

    PubMed

    Tsai, Hsin Y; Hamilton, Alastair; Guy, Derrick R; Tinch, Alan E; Bishop, Steve C; Houston, Ross D

    2015-01-01

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth. PMID:26703584

  16. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon

    PubMed Central

    Tsai, Hsin Y.; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Bishop, Steve C.; Houston, Ross D.

    2015-01-01

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth. PMID:26703584

  17. Mining for SNPs and SSRs using SNPServer, dbSNP and SSR taxonomy tree.

    PubMed

    Batley, Jacqueline; Edwards, David

    2009-01-01

    Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequence data sets permits mining for these molecular markers, which may then be used for applications such as genetic trait mapping, diversity analysis and marker assisted selection in agriculture. Here we describe web-based automated methods for the discovery of SSRs using SSR taxonomy tree, the discovery of SNPs from sequence data using SNPServer and the identification of validated SNPs from within the dbSNP database. SSR taxonomy tree identifies pre-determined SSR amplification primers for virtually all species represented within the GenBank database. SNPServer uses a redundancy based approach to identify SNPs within DNA sequences. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms. The NCBI dbSNP database is a catalogue of molecular variation, hosting validated SNPs for several species within a public-domain archive. PMID:19378151

  18. Evaluating information content of SNPs for sample-tagging in re-sequencing projects

    PubMed Central

    Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F

    2015-01-01

    Sample-tagging is designed for identification of accidental sample mix-up, which is a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world, and only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In the simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness. PMID:25975447

  19. netview p: a network visualization tool to unravel complex population structure using genome-wide SNPs.

    PubMed

    Steinig, Eike J; Neuditschko, Markus; Khatkar, Mehar S; Raadsma, Herman W; Zenger, Kyall R

    2016-01-01

    Network-based approaches are emerging as valuable tools for the analysis of complex genetic structure in wild and captive populations. netview p combines data quality control with the construction of population networks through mutual k-nearest neighbours thresholds applied to genome-wide SNPs. The program is cross-platform compatible, open-source and efficiently operates on data ranging from hundreds to hundreds of thousands of SNPs. The pipeline was used for the analysis of pedigree data from simulated (n = 750, SNPs = 1279) and captive silver-lipped pearl oysters (n = 415, SNPs = 1107), wild populations of the European hake from the Atlantic and Mediterranean (n = 834, SNPs = 380) and grey wolves from North America (n = 239, SNPs = 78 255). The population networks effectively visualize large- and fine-scale genetic structure within and between populations, including family-level structure and relationships. netview p comprises a network-based addition to other population analysis tools and provides user-friendly access to a complex network analysis pipeline through implementation in python. PMID:26129944
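
    The mutual k-nearest neighbours construction mentioned in this abstract can be sketched in a few lines of Python. This is not the netview p code: the distance measure (a simple count of mismatching genotypes), the tie-breaking rule, the toy genotypes and the function names are all assumptions made for illustration.

```python
def genotype_distance(a, b):
    """Count of SNPs at which two genotype vectors differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def mutual_knn_edges(samples, k):
    """Connect i and j only if each is among the other's k nearest neighbours."""
    n = len(samples)
    neighbours = []
    for i in range(n):
        # Rank all other samples by distance; ties are broken by sample index.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (genotype_distance(samples[i], samples[j]), j),
        )
        neighbours.append(set(ranked[:k]))
    return sorted(
        (i, j)
        for i in range(n)
        for j in neighbours[i]
        if i < j and i in neighbours[j]
    )


# Hypothetical genotypes: samples 0-2 resemble each other, as do samples 3-4.
samples = [
    (0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 1),
    (0, 0, 0, 0, 1, 1),
    (2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 1),
]
edges = mutual_knn_edges(samples, k=2)
print(edges)
```

    Requiring the neighbour relation to hold in both directions is what keeps the two groups apart here: sample 3 lists sample 0 among its two nearest neighbours, but sample 0 does not list sample 3, so no edge is drawn between them.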

  20. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics.

    PubMed

    Webb-Robertson, Bobbie-Jo M; Wiberg, Holli K; Matzke, Melissa M; Brown, Joseph N; Wang, Jing; McDermott, Jason E; Smith, Richard D; Rodland, Karin D; Metz, Thomas O; Pounds, Joel G; Waters, Katrina M

    2015-05-01

    In this review, we apply selected imputation strategies to label-free liquid chromatography-mass spectrometry (LC-MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC-MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives. PMID:25855118

  1. Tailored selection of study individuals to be sequenced in order to improve the accuracy of genotype imputation.

    PubMed

    Peil, Barbara; Kabisch, Maria; Fischer, Christine; Hamann, Ute; Bermejo, Justo Lorenzo

    2015-02-01

    The addition of sequence data from own-study individuals to genotypes from external data repositories, for example, the HapMap, has been shown to improve the accuracy of imputed genotypes. Early approaches for reference panel selection favored individuals who best reflect recombination patterns in the study population. By contrast, a maximization of genetic diversity in the reference panel has been recently proposed. We investigate here a novel strategy to select individuals for sequencing that relies on the characterization of the ancestral kernel of the study population. The simulated study scenarios consisted of several combinations of subpopulations from HapMap. HapMap individuals who did not belong to the study population constituted an external reference panel which was complemented with the sequences of study individuals selected according to different strategies. In addition to a random choice, individuals with the largest statistical depth according to the first genetic principal components were selected. In all simulated scenarios the integration of sequences from own-study individuals increased imputation accuracy. The selection of individuals based on the statistical depth resulted in the highest imputation accuracy for European and Asian study scenarios, whereas random selection performed best for an African-study scenario. Present findings indicate that there is no universal 'best strategy' to select individuals for sequencing. We propose to use the methodology described in the manuscript to assess the advantage of focusing on the ancestral kernel under own study characteristics (study size, genetic diversity, availability and properties of external reference panels, frequency of imputed variants). PMID:25537753

  2. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    NASA Astrophysics Data System (ADS)

    Riggi, S.; Riggi, D.; Riggi, F.

    2015-04-01

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixtures' models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layers' Silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.

  3. Using multiple imputation to efficiently correct cerebral MRI whole brain lesion and atrophy data in patients with multiple sclerosis.

    PubMed

    Chua, Alicia S; Egorova, Svetlana; Anderson, Mark C; Polgar-Turcsanyi, Mariann; Chitnis, Tanuja; Weiner, Howard L; Guttmann, Charles R G; Bakshi, Rohit; Healy, Brian C

    2015-10-01

    Automated segmentation of brain MRI scans into tissue classes is commonly used for the assessment of multiple sclerosis (MS). However, manual correction of the resulting brain tissue label maps by an expert reader remains necessary in many cases. Since automated segmentation data awaiting manual correction are "missing", we proposed to use multiple imputation (MI) to fill-in the missing manually-corrected MRI data for measures of normalized whole brain volume (brain parenchymal fraction-BPF) and T2 hyperintense lesion volume (T2LV). Automated and manually corrected MRI measures from 1300 patients enrolled in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB) were identified. Simulation studies were conducted to assess the performance of MI with missing data both missing completely at random and missing at random. An imputation model including the concurrent automated data as well as clinical and demographic variables explained a high proportion of the variance in the manually corrected BPF (R(2)=0.97) and T2LV (R(2)=0.89), demonstrating the potential to accurately impute the missing data. Further, our results demonstrate that MI allows for the accurate estimation of group differences with little to no bias and with similar precision compared to an analysis with no missing data. We believe that our findings provide important insights for efficient correction of automated MRI measures to obviate the need to perform manual correction on all cases. PMID:26093330

  4. Application of multiple imputation using the two-fold fully conditional specification algorithm in longitudinal clinical data.

    PubMed

    Welch, Catherine; Bartlett, Jonathan; Petersen, Irene

    2014-04-01

    Electronic health records of longitudinal clinical data are a valuable resource for health care research. One obstacle of using databases of health records in epidemiological analyses is that general practitioners mainly record data if they are clinically relevant. We can use existing methods to handle missing data, such as multiple imputation (mi), if we treat the unavailability of measurements as a missing-data problem. Most software implementations of MI do not take account of the longitudinal and dynamic structure of the data and are difficult to implement in large databases with millions of individuals and long follow-up. Nevalainen, Kenward, and Virtanen (2009, Statistics in Medicine 28: 3657-3669) proposed the two-fold fully conditional specification algorithm to impute missing data in longitudinal data. It imputes missing values at a given time point, conditional on information at the same time point and immediately adjacent time points. In this article, we describe a new command, twofold, that implements the two-fold fully conditional specification algorithm. It is extended to accommodate MI of longitudinal clinical records in large databases. PMID:25420071

  5. Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect

    PubMed Central

    Wan, Y.; Datta, S.; Conklin, D.J.; Kong, M.

    2015-01-01

    The statistical methods for variable selection and prediction could be challenging when missing covariates exist. Although multiple imputation (MI) is a universally accepted technique for solving the missing data problem, how to combine the MI results for variable selection is not quite clear, because different imputations may result in different selections. The widely applied variable selection methods include the sparse partial least-squares (SPLS) method and the penalized least-squares method, e.g. the elastic net (ENet) method. In this paper, we propose an MI-based weighted elastic net (MI-WENet) method that is based on stacked MI data and a weighting scheme for each observation in the stacked data set. In the MI-WENet method, MI accounts for sampling and imputation uncertainty for missing values, and the weight accounts for the observed information. Extensive numerical simulations are carried out to compare the proposed MI-WENet method with the other competing alternatives, such as the SPLS and ENet. In addition, we applied the MI-WENet method to examine the predictor variables for the endothelial function that can be characterized by median effective dose (ED50) and maximum effect (Emax) in an ex-vivo phenylephrine-induced extension and acetylcholine-induced relaxation experiment. PMID:26412909

  6. Evaluation of transethnic fine mapping with population-specific and cosmopolitan imputation reference panels in diverse Asian populations.

    PubMed

    Wang, Xu; Cheng, Ching-Yu; Liao, Jiemin; Sim, Xueling; Liu, Jianjun; Chia, Kee-Seng; Tai, E-Shyong; Little, Peter; Khor, Chiea-Chuen; Aung, Tin; Wong, Tien-Yin; Teo, Yik-Ying

    2016-04-01

    There has been limited success in identifying causal variants underlying association signals observed in genome-wide association studies (GWAS). The use of the 1000 Genomes Project (1KGP) allows imputation to estimate the genetic information at untyped variants. However, long stretches of high linkage disequilibrium within the genome prevent us from differentiating between causal variants and perfect surrogates, thus limiting our ability to identify causal variants. Transethnic strategies have been proposed as a possible solution to mitigate this. However, these studies generally rely on imputing genotypes from multiple ancestries from 1KGP but not against population-specific reference panels. Here, we perform the first transethnic fine-mapping study across three Asian cohorts from diverse ancestries at the loci implicated with eye and blood lipid traits, using population-specific reference panels that have been generated by whole-genome sequencing samples from the same ancestry groups. Our study outlines several challenges faced in a fine-mapping exercise where one simply aims to meta-analyse existing GWAS that have been imputed against reference haplotypes from the 1KGP. PMID:26130488

  7. A Bayesian Multiple Imputation Method for Handling Longitudinal Pesticide Data with Values below the Limit of Detection

    PubMed Central

    Chen, Haiying; Quandt, Sara A.; Grzywacz, Joseph G.; Arcury, Thomas A.

    2013-01-01

    Environmental and biomedical research often produces data below the limit of detection (LOD), or left-censored data. Imputing explicit values for values < LOD in a multivariate setting, such as with longitudinal data, is difficult using a likelihood-based approach. A Bayesian multiple imputation (MI) method is introduced to handle left-censored multivariate data. A Gibbs sampler, which uses an iterative process, is employed to simulate the target multivariate distribution within a Bayesian framework. Following convergence, multiple plausible data sets are generated for analysis by standard statistical methods outside of a Bayesian framework. With explicit imputed values available variables can be analyzed as outcomes or predictors. We illustrate a practical application using longitudinal data from the Community Participatory Approach to Measuring Farmworker Pesticide Exposure (PACE3) study to evaluate the association between urinary acephate concentrations (indicating pesticide exposure) and self-reported potential pesticide poisoning symptoms. Additionally, a simulation study is used to evaluate the sampling property of the estimators for distributional parameters as well as regression coefficients estimated with the generalized estimating equation (GEE) approach. Results demonstrated that the Bayesian MI estimates performed well in most settings, and we recommend the use of this valid and feasible approach to analyze multivariate data with values < LOD. PMID:23504271
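
    To make the idea of imputing explicit values below the LOD concrete, the Python sketch below draws each censored value from a normal distribution truncated at the LOD and repeats this to produce several completed data sets. It is a deliberately reduced stand-in for the method in this abstract: it is univariate, it treats the mean and standard deviation as known rather than updating them inside a Gibbs sampler, and it assumes the measurements are on a scale (for example, log concentration) where normality is plausible. The function names and numbers are hypothetical.

```python
import random
from statistics import NormalDist


def draw_below_lod(mu, sigma, lod, rng):
    """Draw one value from Normal(mu, sigma) restricted to the region below lod."""
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(lod)  # probability mass lying below the LOD
    # Inverse-CDF sampling: a uniform draw on (0, upper) maps to a value below lod.
    u = rng.uniform(0.0, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return min(dist.inv_cdf(u), lod)


def impute_censored(values, lod, mu, sigma, n_imputations=5, seed=0):
    """Return n_imputations completed copies of values; None marks a value < LOD."""
    rng = random.Random(seed)
    completed = []
    for _ in range(n_imputations):
        completed.append([
            draw_below_lod(mu, sigma, lod, rng) if v is None else v
            for v in values
        ])
    return completed


# Hypothetical log-scale measurements; None means "below the limit of detection".
log_conc = [1.2, None, 0.9, None, 1.7]
datasets = impute_censored(log_conc, lod=0.5, mu=1.0, sigma=0.5, n_imputations=3)
for d in datasets:
    print([round(v, 3) for v in d])
```

    Each completed data set can then be analysed with a standard method and the results combined across imputations, which is the step the abstract describes as analysis outside of a Bayesian framework.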

  8. Plasmonic filters.

    SciTech Connect

    Passmore, Brandon Scott; Shaner, Eric Arthur; Barrick, Todd A.

    2009-09-01

    Metal films perforated with subwavelength hole arrays have been shown to demonstrate an effect known as Extraordinary Transmission (EOT). In EOT devices, optical transmission passbands arise that can have up to 90% transmission and a bandwidth that is only a few percent of the designed center wavelength. By placing a tunable dielectric in proximity to the EOT mesh, one can tune the center frequency of the passband. We have demonstrated over 1 micron of passive tuning in structures designed for an 11 micron center wavelength. If a suitable midwave (3-5 micron) tunable dielectric (perhaps BaTiO3) were integrated with an EOT mesh designed for midwave operation, it is possible that a fast, voltage tunable, low temperature filter solution could be demonstrated with a several hundred nanometer passband. Such an element could, for example, replace certain components in a filter wheel solution.

  9. Water Filter

    NASA Astrophysics Data System (ADS)

    1982-01-01

    A compact, lightweight electrolytic water sterilizer available through Ambassador Marketing, generates silver ions in concentrations of 50 to 100 parts per billion in water flow system. The silver ions serve as an effective bactericide/deodorizer. Tap water passes through filtering element of silver that has been chemically plated onto activated carbon. The silver inhibits bacterial growth and the activated carbon removes objectionable tastes and odors caused by addition of chlorine and other chemicals in municipal water supply. The three models available are a kitchen unit, a "Tourister" unit for portable use while traveling and a refrigerator unit that attaches to the ice cube water line. A filter will treat 5,000 to 10,000 gallons of water.

  10. A Reduced Number of mtSNPs Saturates Mitochondrial DNA Haplotype Diversity of Worldwide Population Groups

    PubMed Central

    Salas, Antonio; Amigo, Jorge

    2010-01-01

    Background The high levels of variation characterising the mitochondrial DNA (mtDNA) molecule are due ultimately to its high average mutation rate; moreover, mtDNA variation is deeply structured in different populations and ethnic groups. There is growing interest in selecting a reduced number of mtDNA single nucleotide polymorphisms (mtSNPs) that account for the maximum level of discrimination power in a given population. Applications of the selected mtSNP panel range from anthropologic and medical studies to forensic genetic casework. Methodology/Principal Findings This study proposes a new simulation-based method that explores the ability of different mtSNP panels to yield the maximum levels of discrimination power. The method explores subsets of mtSNPs of different sizes randomly chosen from a preselected panel of mtSNPs based on frequency. More than 2,000 complete genomes representing three main continental human population groups (Africa, Europe, and Asia) and two admixed populations (African-Americans and Hispanics) were collected from GenBank and the literature, and were used as training sets. Haplotype diversity was measured for each combination of mtSNP and compared with existing mtSNP panels available in the literature. The data indicates that only a reduced number of mtSNPs ranging from six to 22 are needed to account for 95% of the maximum haplotype diversity of a given population sample. However, only a small proportion of the best mtSNPs are shared between populations, indicating that there is not a perfect set of universal mtSNPs suitable for all population contexts. The discrimination power provided by these mtSNPs is much higher than the power of the mtSNP panels proposed in the literature to date. Some mtSNP combinations also yield high diversity values in admixed populations. 
    Conclusions/Significance The proposed computational approach for exploring combinations of mtSNPs that optimise the discrimination power of a given set of mtSNPs is more efficient than previous empirical approaches. In contrast to previous findings, the results seem to indicate that only a few mtSNPs are needed to reach high levels of discrimination power in a population, independently of its ancestral background. PMID:20454657
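
    The subset-sampling idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes Nei's unbiased estimator for haplotype diversity (the abstract does not state which formula was used), and the function names, the toy genomes, and the `trials` and `seed` parameters are hypothetical.

```python
import random
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's estimator: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

def diversity_for_subset(genomes, positions):
    """Diversity of the haplotypes obtained by keeping only the given SNP positions."""
    return haplotype_diversity([tuple(g[p] for p in positions) for g in genomes])

def best_random_subset(genomes, candidates, size, trials, seed=0):
    """Draw random subsets of candidate positions and keep the most diverse one."""
    rng = random.Random(seed)
    best = (-1.0, ())
    for _ in range(trials):
        subset = tuple(sorted(rng.sample(candidates, size)))
        best = max(best, (diversity_for_subset(genomes, subset), subset))
    return best

# Toy data (hypothetical): four sequences typed at four variable positions.
genomes = ["AAGT", "AAGC", "ACGT", "GCGT"]
score, subset = best_random_subset(genomes, [0, 1, 2, 3], size=4, trials=5)
```

    Comparing the best score found at each subset size against the diversity of the full panel gives the kind of saturation curve the abstract describes.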

  11. Eyeglass Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Biomedical Optical Company of America's suntiger lenses eliminate more than 99% of harmful light wavelengths. NASA derived lenses make scenes more vivid in color and also increase the wearer's visual acuity. Distant objects, even on hazy days, appear crisp and clear; mountains seem closer, glare is greatly reduced, clouds stand out. Daytime use protects the retina from bleaching in bright light, thus improving night vision. Filtering helps prevent a variety of eye disorders, in particular cataracts and age related macular degeneration.

  12. Impact of Supported Housing on Clinical Outcomes Analysis of a Randomized Trial Using Multiple Imputation Technique

    PubMed Central

    Cheng, An-Lin; Lin, Haiqun; Kasprow, Wesley; Rosenheck, Robert A.

    2011-01-01

    In 1992, the US Department of Housing and Urban Development (HUD) and the US Department of Veterans Affairs (VA) established the HUD-VA Supported Housing (HUD-VASH) Program to provide integrated clinical and housing services to homeless veterans with psychiatric and/or substance abuse disorders at 19 sites. At four sites, 460 subjects were randomly assigned to one of the three groups: (1) HUD-VASH, with both Section 8 vouchers and intensive case management; (2) case management only; and (3) standard VA care. A previous publication found HUD-VASH resulted in superior housing outcomes but yielded no benefits on clinical outcomes. Since many participants missed prescheduled visits during the follow-up period and follow-up rates were quite different across the groups, we reanalyzed these data using multiple imputation statistical methods to account for the missing observations. Significant benefits were found for HUD-VASH in drug and alcohol abuse outcomes that had not previously been identified. PMID:17220745

  13. A comparison of two methods of estimating propensity scores after multiple imputation.

    PubMed

    Mitra, Robin; Reiter, Jerome P

    2016-02-01

    In many observational studies, analysts estimate treatment effects using propensity scores, e.g. by matching or sub-classifying on the scores. When some values of the covariates are missing, analysts can use multiple imputation to fill in the missing data, estimate propensity scores based on the m completed datasets, and use the propensity scores to estimate treatment effects. We compare two approaches to implement this process. In the first, the analyst estimates the treatment effect using propensity score matching within each completed data set, and averages the m treatment effect estimates. In the second approach, the analyst averages the m propensity scores for each record across the completed datasets, and performs propensity score matching with these averaged scores to estimate the treatment effect. We compare properties of both methods via simulation studies using artificial and real data. The simulations suggest that the second method has greater potential to produce substantial bias reductions than the first, particularly when the missing values are predictive of treatment assignment. PMID:22687877
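
    The two approaches compared in this abstract can be sketched side by side. The sketch below is illustrative only and rests on several assumptions: the propensity scores for each of the m completed datasets are taken as already estimated, matching is 1:1 nearest-neighbour with replacement, and the estimand is the average treatment effect on the treated. None of these details are given in the abstract, and all names and numbers are hypothetical.

```python
from statistics import mean

def att_by_matching(scores, treated, outcome):
    """Effect on the treated via 1:1 nearest-neighbour matching with replacement."""
    controls = [i for i, t in enumerate(treated) if not t]
    diffs = []
    for i, t in enumerate(treated):
        if t:
            # Closest control on the propensity score.
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            diffs.append(outcome[i] - outcome[j])
    return mean(diffs)

def within_estimate(score_sets, treated, outcome):
    """First approach: match inside each completed dataset, then average the m effects."""
    return mean(att_by_matching(s, treated, outcome) for s in score_sets)

def across_estimate(score_sets, treated, outcome):
    """Second approach: average the m scores per record, then match once."""
    averaged = [mean(col) for col in zip(*score_sets)]
    return att_by_matching(averaged, treated, outcome)

# Toy inputs (hypothetical): 2 treated and 3 control records, m = 2 imputations.
treated = [1, 1, 0, 0, 0]
outcome = [10, 12, 7, 8, 11]
score_sets = [
    [0.80, 0.60, 0.75, 0.55, 0.20],
    [0.60, 0.80, 0.30, 0.85, 0.55],
]
within = within_estimate(score_sets, treated, outcome)
across = across_estimate(score_sets, treated, outcome)
```

    On these toy inputs the two functions return different estimates, which is the behaviour the simulation studies in the abstract set out to compare.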

  14. An imputation of air pollution social cost of energy: A case study of Taiwan

    SciTech Connect

    Chi-Yuan Liang

    1995-12-31

    Based on the Air Pollution Control Act, the Environmental Protection Administration, Taiwan is scheduled to implement an anti-air-pollution fee on energy products in the coming July. The revenue of the anti-air-pollution fee will be used solely for air pollution control. The rationale of this fee is to endogenize the social cost of air pollution attributed to energy consumption and hence to curb the consumption of energy through the price mechanism for a cleaner environment. Thus, imputing the social cost of air pollution caused by each type of energy consumption is urgent for policy making. The objective of this paper is to propose a methodology to estimate the social cost of air pollution for types of energy in Taiwan. It is useful for policy making by the government in Taiwan and in other countries as well. We employ data from epidemiology and CVM studies as well as energy consumption and pollution statistics to evaluate the social cost of air pollution for types of energy. This paper contains the following sections: (1) Introduction; (2) Methodology and Estimation Procedure; (3) Empirical Results; (4) Conclusions and Implications.

  15. SNPs for parentage testing and traceability in globally diverse breeds of sheep.

    PubMed

    Heaton, Michael P; Leymaster, Kreg A; Kalbfleisch, Theodore S; Kijas, James W; Clarke, Shannon M; McEwan, John; Maddox, Jillian F; Basnayake, Veronica; Petrik, Dustin T; Simpson, Barry; Smith, Timothy P L; Chitko-McKown, Carol G

    2014-01-01

    DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular "parentage SNP" varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF ≥ 0.3) in 48 ± 5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent's genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1 × 10^-39 and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world's sheep breeds. PMID:24740156
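
    To see why a panel of about a hundred informative SNPs can drive the probability of a coincidental match (PI) so low, the following sketch may help. It assumes the textbook expression for a biallelic locus under Hardy-Weinberg equilibrium, PI = p^4 + (2pq)^2 + q^4, and independence between SNPs. Whether the authors computed PI this way is not stated in the abstract, and the function names are hypothetical.

```python
import math

def pi_single_snp(maf):
    """Chance that two unrelated animals share a genotype at one biallelic SNP."""
    p, q = maf, 1.0 - maf
    # Both homozygous for one allele, both heterozygous, or both homozygous for the other.
    return p ** 4 + (2 * p * q) ** 2 + q ** 4

def pi_combined(mafs):
    """Panel-wide PI, assuming the SNPs are independent (unlinked)."""
    return math.prod(pi_single_snp(m) for m in mafs)

# Idealized panel (assumption): 109 SNPs, each with MAF exactly 0.3.
panel_pi = pi_combined([0.3] * 109)
```

    Under these idealized assumptions the combined value is on the order of 10^-41. Real panels have varying allele frequencies across SNPs and breeds, so this is only a rough plausibility check rather than a reproduction of the reported median.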

  16. SNPs for Parentage Testing and Traceability in Globally Diverse Breeds of Sheep

    PubMed Central

    Heaton, Michael P.; Leymaster, Kreg A.; Kalbfleisch, Theodore S.; Kijas, James W.; Clarke, Shannon M.; McEwan, John; Maddox, Jillian F.; Basnayake, Veronica; Petrik, Dustin T.; Simpson, Barry; Smith, Timothy P. L.; Chitko-McKown, Carol G.

    2014-01-01

    DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular parentage SNP varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF ≥ 0.3) in 48 ± 5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent's genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1 × 10^-39 and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world's sheep breeds. PMID:24740156

  17. FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease

    PubMed Central

    Chen, Rong; Morgan, Alex A; Dudley, Joel; Deshpande, Tarangini; Li, Li; Kodama, Keiichi; Chiang, Annie P; Butte, Atul J

    2008-01-01

    Background Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs. Results We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and proposed candidate genes to sequence for their association with 597 syndromes with unknown molecular basis. Conclusions Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs. PMID:19061490

  18. Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips.

    PubMed

    Boison, S A; Santos, D J A; Utsunomiya, A H T; Carvalheiro, R; Neves, H H R; O'Brien, A M Perez; Garcia, J F; Sölkner, J; da Silva, M V G B

    2015-07-01

    Genotype imputation is widely used as a cost-effective strategy in genomic evaluation of cattle. Key determinants of imputation accuracies, such as linkage disequilibrium patterns, marker densities, and ascertainment bias, differ between Bos indicus and Bos taurus breeds. Consequently, there is a need to investigate effectiveness of genotype imputation in indicine breeds. Thus, the objective of the study was to investigate strategies and factors affecting the accuracy of genotype imputation in Gyr (Bos indicus) dairy cattle. Four imputation scenarios were studied using 471 sires and 1,644 dams genotyped on Illumina BovineHD (HD-777K; San Diego, CA) and BovineSNP50 (50K) chips, respectively. Scenarios were based on which reference high-density single nucleotide polymorphism (SNP) panel (HDP) should be adopted [HD-777K, 50K, and GeneSeek GGP-75Ki (Lincoln, NE)]. Depending on the scenario, validation animals had their genotypes masked for one of the lower-density panels: Illumina (3K, 7K, and 50K) and GeneSeek (SGGP-20Ki and GGP-75Ki). We randomly selected 171 sires as reference and 300 as validation for all the scenarios. Additionally, all sires were used as reference and the 1,644 dams were imputed for validation. Genotypes of 98 individuals with 4 and more offspring were completely masked and imputed. Imputation algorithms FImpute and Beagle v3.3 and v4 were used. Imputation accuracies were measured using the correlation and allelic correct rate. FImpute resulted in highest accuracies, whereas Beagle 3.3 gave the least-accurate imputations. Accuracies evaluated as correlation (allelic correct rate) ranged from 0.910 (0.942) to 0.961 (0.974) using 50K as HDP and with 3K (7K) as low-density panels. With GGP-75Ki as HDP, accuracies were moderate for 3K, 7K, and 50K, but high for SGGP-20Ki. The use of HD-777K as HDP resulted in accuracies of 0.888 (3K), 0.941 (7K), 0.980 (SGGP-20Ki), 0.982 (50K), and 0.993 (GGP-75Ki). 
Ungenotyped individuals were imputed with an average accuracy of 0.970. The average top 5 kinship coefficients between reference and imputed individuals was a strong predictor of imputation accuracy. FImpute was faster and used less memory than Beagle v4. Beagle v4 outperformed Beagle v3.3 in accuracy and speed of computation. A genotyping strategy that uses the HD-777K SNP chip as a reference panel and SGGP-20Ki as the lower-density SNP panel should be adopted as accuracy was high and similar to that of the 50K. However, the effect of using imputed HD-777K genotypes from the SGGP-20Ki on genomic evaluation is yet to be studied. PMID:25958293
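
    The two accuracy measures named in this abstract can be computed as in the sketch below. It assumes genotypes are coded as allele counts (0, 1, 2) and that the allelic correct rate is the share of alleles imputed correctly, i.e. one minus the summed absolute genotype differences divided by twice the number of genotypes. That definition is a common convention and an assumption here, since the abstract does not define it. Function names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between true and imputed allele counts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def allelic_correct_rate(true, imputed):
    """Fraction of alleles imputed correctly (two alleles per genotype)."""
    wrong = sum(abs(t - i) for t, i in zip(true, imputed))
    return 1 - wrong / (2 * len(true))

# Toy masked genotypes for one validation animal (hypothetical values).
true_genotypes = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_genotypes = [0, 1, 2, 1, 1, 0, 2, 2]
r = pearson(true_genotypes, imputed_genotypes)
acr = allelic_correct_rate(true_genotypes, imputed_genotypes)
```

    The allelic correct rate is typically higher than the correlation for the same data, which is consistent with the paired values quoted in the abstract, because a heterozygote imputed as a homozygote still has one of its two alleles right.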

  19. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass

    PubMed Central

    Ramstein, Guillaume P.; Lipka, Alexander E.; Lu, Fei; Costich, Denise E.; Cherney, Jerome H.; Buckler, Edward S.; Casler, Michael D.

    2015-01-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium dystachion genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100

  20. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

    PubMed

    Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D

    2015-05-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium dystachion genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100

  1. A small number of candidate gene SNPs reveal continental ancestry in African Americans

    PubMed Central

    KODAMAN, NURI; ALDRICH, MELINDA C.; SMITH, JEFFREY R.; SIGNORELLO, LISA B.; BRADLEY, KEVIN; BREYER, JOAN; COHEN, SARAH S.; LONG, JIRONG; CAI, QIUYIN; GILES, JUSTIN; BUSH, WILLIAM S.; BLOT, WILLIAM J.; MATTHEWS, CHARLES E.; WILLIAMS, SCOTT M.

    2013-01-01

    SUMMARY Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K=2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99±0.003; 0.98± 0.01; 0.93±0.03; and 0.81± 0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably. PMID:23278390

  2. Studies on interaction of colloidal silver nanoparticles (SNPs) with five different bacterial species.

    PubMed

    Khan, S Sudheer; Mukherjee, Amitava; Chandrasekaran, N

    2011-10-01

    Silver nanoparticles (SNPs) are being increasingly used in many consumer products like textile fabrics, cosmetics, washing machines, food and drug products owing to their excellent antimicrobial properties. Here we have studied the adsorption and toxicity of SNPs on bacterial species such as Pseudomonas aeruginosa, Micrococcus luteus, Bacillus subtilis, Bacillus barbaricus and Klebsiella pneumoniae. The influence of zeta potential on the adsorption of SNPs on bacterial cell surface was investigated at acidic, neutral and alkaline pH and with varying salt (NaCl) concentrations (0.05, 0.1, 0.5, 1 and 1.5 M). The survival rate of bacterial species decreased with increase in adsorption of SNPs. Maximum adsorption and toxicity were observed at pH 5 and an NaCl concentration of <0.5 M. Very little adsorption was observed at pH 9 and NaCl concentration >0.5 M, thereby resulting in less toxicity. The zeta potential study suggests that the adsorption of SNPs on the cell surface was related to electrostatic force of attraction. The equilibrium and kinetics of the adsorption process were also studied. The adsorption equilibrium isotherms fitted well to the Langmuir model. The kinetics of adsorption fitted best to pseudo-first-order. These findings form a basis for interpreting the interaction of nanoparticles with environmental bacterial species. PMID:21640562

  3. Mining the 3′UTR of Autism-implicated Genes for SNPs Perturbing MicroRNA Regulation

    PubMed Central

    Vaishnavi, Varadharajan; Manikandan, Mayakannan; Munirajan, Arasambattu Kannan

    2014-01-01

    Autism spectrum disorder (ASD) refers to a group of childhood neurodevelopmental disorders with polygenic etiology. The expression of many genes implicated in ASD is tightly regulated by various factors including microRNAs (miRNAs), a class of noncoding RNAs ~22 nucleotides in length that function to suppress translation by pairing with miRNA recognition elements (MREs) present in the 3′ untranslated region (3′UTR) of target mRNAs. This emphasizes the role played by miRNAs in regulating neurogenesis, brain development and differentiation and hence any perturbations in this regulatory mechanism might affect these processes as well. Recently, single nucleotide polymorphisms (SNPs) present within 3′UTRs of mRNAs have been shown to modulate existing MREs or even create new MREs. Therefore, we hypothesized that SNPs perturbing miRNA-mediated gene regulation might lead to aberrant expression of autism-implicated genes, thus resulting in disease predisposition or pathogenesis in at least a subpopulation of ASD individuals. We developed a systematic computational pipeline that integrates data from well-established databases. By following a stringent selection criterion, we identified 9 MRE-modulating SNPs and another 12 MRE-creating SNPs in the 3′UTR of autism-implicated genes. These high-confidence candidate SNPs may play roles in ASD and hence would be valuable for further functional validation. PMID:24747189

  4. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    SciTech Connect

    Yang, Jing; Li, Yuan-Yuan; Shanghai Center for Bioinformation Technology, Shanghai 200235 ; Li, Yi-Xue; Shanghai Center for Bioinformation Technology, Shanghai 200235 ; Ye, Zhi-Qiang; Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031

    2012-03-02

    Highlights: Proper dataset partition can improve the prediction of deleterious nsSNPs. Partition according to original residue type at nsSNP is a good criterion. Similar strategy is supposed promising in other machine learning problems. Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  5. Bioinformatics prioritization of SNPs perturbing microRNA regulation of hematological malignancy-implicated genes.

    PubMed

    Ghaedi, Hamid; Bastami, Milad; Zare-Abdollahi, Davood; Alipoor, Behnam; Movafagh, Abolfazl; Mirfakhraie, Reza; Omrani, Mir Davood; Masotti, Andrea

    2015-12-01

    The contribution of microRNAs (miRNAs) to cancer has been extensively investigated and it has become clear that strict regulation of the miRNA-mRNA regulatory network is crucial for safeguarding cell health. Apart from the direct impact of miRNA dysregulation in cancer pathogenesis, genetic variations in miRNAs are likely to disrupt miRNA-target interaction. Indeed, much evidence suggests that SNPs within the miRNA regulome are associated with the development of different hematological malignancies. However, a full catalog of SNPs within miRNA target sites of genes relevant to hematopoiesis and hematological malignancies is still lacking. Accordingly, we aimed to systematically identify and characterize such SNPs and provide a prioritized list of the most potentially disruptive SNPs. Although in the present study we did not address the functional significance of these potentially disruptive variants, we believe that our compiled results will be valuable for researchers interested in determining the role of target-SNPs in the development of hematological malignancies. PMID:26520014

  6. Bayesian integration of genetics and epigenetics detects causal regulatory SNPs underlying expression variability

    PubMed Central

    Das, Avinash; Morley, Michael; Moravec, Christine S.; Tang, W. H. W.; Hakonarson, Hakon; Ashley, Euan A.; Brandimarto, Jeffrey; Hu, Ray; Li, Mingyao; Li, Hongzhe; Liu, Yichuan; Qu, Liming; Sanchez, Pablo; Margulies, Kenneth B.; Cappola, Thomas P.; Jensen, Shane; Hannenhalli, Sridhar

    2015-01-01

    The standard expression quantitative trait loci (eQTL) detects polymorphisms associated with gene expression without revealing causality. We introduce a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combination of regulatory single-nucleotide polymorphisms (SNPs) that explain the gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance but also predicts gene expression more accurately than other methods. Based on realistic simulated data, we demonstrate that eQTeL accurately detects causal regulatory SNPs, including those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal. PMID:26456756

  7. Rocket noise filtering system using digital filters

    NASA Technical Reports Server (NTRS)

    Mauritzen, David

    1990-01-01

    A set of digital filters is designed to filter rocket noise to various bandwidths. The filters are designed to have constant group delay and are implemented in software on a general purpose computer. The Parks-McClellan algorithm is used. Preliminary tests are performed to verify the design and implementation. An analog filter which was previously employed is also simulated.
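
    As a small illustration of the constant-group-delay property mentioned above: a finite impulse response (FIR) filter with symmetric coefficients has linear phase, so every frequency is delayed by the same (N - 1)/2 samples. The sketch below does not implement the Parks-McClellan algorithm and does not reproduce the report's filters; the five coefficients are arbitrary values assumed for illustration, and the function names are hypothetical.

```python
import cmath

# Hypothetical symmetric FIR coefficients (not taken from the report).
TAPS = [0.1, 0.25, 0.3, 0.25, 0.1]


def frequency_response(taps, omega):
    """Evaluate H(e^{j*omega}) = sum over n of h[n] * exp(-j*omega*n)."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))


def phase_residual(taps, omega):
    """Imaginary part left after removing a delay of (N - 1)/2 samples.

    For symmetric taps this is zero up to rounding error, which means the
    phase is linear in omega and the group delay is constant.
    """
    delay = (len(taps) - 1) / 2
    shifted = frequency_response(taps, omega) * cmath.exp(1j * omega * delay)
    return shifted.imag


for omega in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(f"omega={omega:.1f}  residual={phase_residual(TAPS, omega):.2e}")
```

    Breaking the symmetry of the coefficients makes the residual non-zero, which is one way to check that a designed filter kept its linear-phase structure.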

  8. Prediction of eye and skin color in diverse populations using seven SNPs.

    PubMed

    Spichenok, Olga; Budimlija, Zoran M; Mitchell, Adele A; Jenny, Andreas; Kovacevic, Lejla; Marjanovic, Damir; Caragine, Theresa; Prinz, Mechthild; Wurmbach, Elisa

    2011-11-01

    An essential component in identifying human remains is the documentation of the decedent's visible characteristics, such as eye, hair and skin color. However, if a decedent is decomposed or only skeletal remains are found, this critical, visibly identifying information is lost. It would be beneficial to use genetic information to reveal these visible characteristics. In this study, seven single nucleotide polymorphisms (SNPs), located in and nearby genes known for their important role in pigmentation, were validated on 554 samples, donated from non-related individuals of various populations. Six SNPs were used in predicting the eye color of an individual, and all seven were used to describe the skin coloration. The outcome revealed that these markers can be applied to all populations with very low error rates. However, the call-rate to determine the skin coloration varied between populations, demonstrating its complexity. Overall, these results prove the importance of these seven SNPs for potential forensic tests. PMID:21050833

  9. Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs

    NASA Astrophysics Data System (ADS)

    Watson, Corey T.; Disanto, Giulio; Breden, Felix; Giovannoni, Gavin; Ramagopalan, Sreeram V.

    2012-10-01

    Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.

  10. Application of thermionic SNPS with thermal reactor for spacecraft orbital transfer mission

    NASA Astrophysics Data System (ADS)

    Andreev, Pavel V.; Griaznov, Georgii M.; Zhabotinskii, Evgenii E.; Nikonov, Anatolii M.; Serbin, Viktor I.

    The region of expedient use of SNPS with an in-core thermal thermionic reactor (ITR) is limited by an electric power level of about 100 kWe under an SNPS lifetime from 3 to 5 years. At the same time the reactor power may be forced from two to three times during the period of about half a year. The mathematical model of SNPS mass dependence on the degree of forcing is given. The results of calculation of payload masses and transfer times for transfer from low orbit to geostationary orbit for two thermal reactors having an emission area 1.6 sq m and 2.5 sq m are given for different types of electrojets.

  11. Application of thermionic SNPS with thermal reactor for spacecraft orbital transfer mission

    NASA Astrophysics Data System (ADS)

    Andreev, Pavel V.; Gryaznov, Georgy M.; Zhabotinsky, Evgeny E.; Nikonov, Anatoly M.; Serbin, Victor I.

    1991-01-01

    The region of expedient use of SNPS with an in-core thermal thermionic reactor (ITR) is limited by an electric power level of about 100 kWe under an SNPS lifetime from 3 to 5 years. At the same time the reactor power may be forced from two to three times during the period of about half a year. The mathematical model of SNPS mass dependence on the degree of forcing is given. The results of calculation of payload masses and transfer times for transfer from low orbit to geostationary orbit for two thermal reactors having emission areas of 1.6 m² and 2.5 m² are given for different types of electrojets.

  12. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    PubMed Central

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  13. Multimodal MRI-based imputation of the Aβ+ in early mild cognitive impairment

    PubMed Central

    Tosun, Duygu; Joshi, Sarang; Weiner, Michael W; for the Alzheimer's Disease Neuroimaging Initiative

    2014-01-01

    Objective The primary goal of this study was to identify brain atrophy from structural MRI (magnetic resonance imaging) and cerebral blood flow (CBF) patterns from arterial spin labeling perfusion MRI that are best predictors of the Aβ-burden, measured as composite 18F-AV45-PET (positron emission tomography) uptake, in individuals with early mild cognitive impairment (MCI). Furthermore, another objective was to assess the relative importance of imaging modalities in classification of Aβ+/Aβ− early MCI. Methods Sixty-seven Alzheimer's Disease Neuroimaging Initiative (ADNI)-GO/2 participants with early MCI were included. Voxel-wise anatomical shape variation measures were computed by estimating the initial diffeomorphic mapping momenta from an unbiased control template. CBF measures normalized to average motor cortex CBF were mapped onto the template space. Using partial least squares regression, we identified the structural and CBF signatures of Aβ after accounting for normal confounding effects of age, gender, and education. Results 18F-AV45-positive early MCIs could be identified with 83% classification accuracy, 87% positive predictive value, and 84% negative predictive value by multidisciplinary classifiers combining demographics data, ApoE ε4-genotype, and a multimodal MRI-based Aβ score. Interpretation Multimodal MRI can be used to predict the amyloid status of early-MCI individuals. MRI is a very attractive candidate for the identification of inexpensive and noninvasive surrogate biomarkers of Aβ deposition. Our approach is expected to have value for the identification of individuals likely to be Aβ+ in circumstances where cost or logistical problems prevent Aβ detection using cerebrospinal fluid analysis or Aβ-PET. This can also be used in clinical settings and clinical trials, aiding subject recruitment and evaluation of treatment efficacy. Imputation of the Aβ-positivity status could also complement Aβ-PET by identifying individuals who would benefit the most from this assessment. PMID:24729983

  14. Detection and mapping of mtDNA SNPs in Atlantic salmon using high throughput DNA sequencing

    PubMed Central

    2011-01-01

    Background Approximately half of the mitochondrial genome inherent within 546 individual Atlantic salmon (Salmo salar) derived from across the species' North Atlantic range, was selectively amplified with a novel combination of standard PCR and pyro-sequencing in a single run using 454 Titanium FLX technology (Roche, 454 Life Sciences). A unique combination of barcoded primers and a partitioned sequencing plate was employed to designate each sequence read to its original sample. The sequence reads were aligned according to the S. salar mitochondrial reference sequence (NC_001960.1), with the objective of identifying single nucleotide polymorphisms (SNPs). They were validated if they met with the following three stringent criteria: (i) sequence reads were produced from both DNA strands; (ii) SNPs were confirmed in a minimum of 90% of replicate sequence reads; and (iii) SNPs occurred in more than one individual. Results Pyrosequencing generated a total of 179,826,884 bp of data, and 10,765 of the total 10,920 S. salar sequences (98.6%) were assigned back to their original samples. The approach taken resulted in a total of 216 SNPs and 2 indels, which were validated and mapped onto the S. salar mitochondrial genome, including 107 SNPs and one indel not previously reported. An average of 27.3 sequence reads with a standard deviation of 11.7 supported each SNP per individual. Conclusion The study generated a mitochondrial SNP panel from a large sample group across a broad geographical area, reducing the potential for ascertainment bias, which has hampered previous studies. The SNPs identified here validate those identified in previous studies, and also contribute additional potentially informative loci for the future study of phylogeography and evolution in the Atlantic salmon. The overall success experienced with this novel application of HT sequencing of targeted regions suggests that the same approach could be successfully applied for SNP mining in other species. PMID:21473771
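
    The three validation criteria listed in the abstract above (reads from both strands, at least 90% of replicate reads confirming the SNP, and occurrence in more than one individual) can be sketched as a simple filter. This is only an illustration under assumed inputs: the `SnpCall` record, its field names, and the example read counts are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical per-individual summary of reads at one candidate site."""
    individual: str
    position: int
    forward_alt_reads: int   # variant-supporting reads on the forward strand
    reverse_alt_reads: int   # variant-supporting reads on the reverse strand
    total_reads: int         # all replicate reads covering the position


def passes_read_criteria(call, min_fraction=0.90):
    """Criteria (i) and (ii): both strands seen, and >= 90% of reads agree."""
    both_strands = call.forward_alt_reads > 0 and call.reverse_alt_reads > 0
    supporting = call.forward_alt_reads + call.reverse_alt_reads
    confirmed = call.total_reads > 0 and supporting / call.total_reads >= min_fraction
    return both_strands and confirmed


def validated_positions(calls, min_individuals=2):
    """Criterion (iii): keep sites that pass in more than one individual."""
    carriers = {}
    for call in calls:
        if passes_read_criteria(call):
            carriers.setdefault(call.position, set()).add(call.individual)
    return sorted(pos for pos, inds in carriers.items() if len(inds) >= min_individuals)


# Made-up example data.
calls = [
    SnpCall("fish_A", 1200, 14, 12, 27),  # passes (26/27 reads agree)
    SnpCall("fish_B", 1200, 10, 9, 20),   # passes (19/20 reads agree)
    SnpCall("fish_A", 3400, 25, 0, 25),   # fails: one strand only
    SnpCall("fish_B", 3400, 11, 10, 22),  # passes, but is the only carrier
    SnpCall("fish_C", 5100, 8, 7, 20),    # fails: 15/20 is below 90%
]
print(validated_positions(calls))  # [1200]
```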

  15. SNPs in stress-responsive rice genes: validation, genotyping, functional relevance and population structure

    PubMed Central

    2012-01-01

    Background Single nucleotide polymorphism (SNP) validation and large-scale genotyping are required to maximize the use of DNA sequence variation and determine the functional relevance of candidate genes for complex stress tolerance traits through genetic association in rice. We used the bead array platform-based Illumina GoldenGate assay to validate and genotype SNPs in a select set of stress-responsive genes to understand their functional relevance and study the population structure in rice. Results Of the 384 putative SNPs assayed, we successfully validated and genotyped 362 (94.3%). Of these 325 (84.6%) showed polymorphism among the 91 rice genotypes examined. Physical distribution, degree of allele sharing, admixtures and introgression, and amino acid replacement of SNPs in 263 abiotic and 62 biotic stress-responsive genes provided clues for identification and targeted mapping of trait-associated genomic regions. We assessed the functional and adaptive significance of validated SNPs in a set of contrasting drought tolerant upland and sensitive lowland rice genotypes by correlating their allelic variation with amino acid sequence alterations in catalytic domains and three-dimensional secondary protein structure encoded by stress-responsive genes. We found a strong genetic association among SNPs in the nine stress-responsive genes with upland and lowland ecological adaptation. Higher nucleotide diversity was observed in indica accessions compared with other rice sub-populations based on different population genetic parameters. The inferred ancestry of 16% among rice genotypes was derived from admixed populations with the maximum between upland aus and wild Oryza species. Conclusions SNPs validated in biotic and abiotic stress-responsive rice genes can be used in association analyses to identify candidate genes and develop functional markers for stress tolerance in rice. PMID:22921105

  16. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris).

    PubMed

    Villordo-Pineda, Emiliano; González-Chavira, Mario M; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa, and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5) and characterized by 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. Phenotypes most associated were days to flowering and seed biomass with 58 and 44 associated SNPs, respectively. Thirty-seven out of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs such as SNP28 and SNP128 are related to starch biosynthesis, a common osmotic protector; and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755
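
    The reported cutoff of p < 0.0003 is consistent with a Bonferroni correction of a 0.05 significance level across 169 SNPs. The abstract does not state the significance level or the exact number of tests, so both values in the sketch below are assumptions used only to show the arithmetic.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed values: alpha = 0.05 and one test per SNP (169 SNPs).
threshold = bonferroni_threshold(0.05, 169)
print(f"{threshold:.6f}")  # 0.000296
```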

  17. Silver sulfide nanoparticles (Ag2S-NPs) are taken up by plants and are phytotoxic.

    PubMed

    Wang, Peng; Menzies, Neal W; Lombi, Enzo; Sekine, Ryo; Blamey, F Pax C; Hernandez-Soriano, Maria C; Cheng, Miaomiao; Kappen, Peter; Peijnenburg, Willie J G M; Tang, Caixian; Kopittke, Peter M

    2015-01-01

    Silver nanoparticles (NPs) are used in more consumer products than any other nanomaterial and their release into the environment is unavoidable. Of primary concern is the wastewater stream in which most silver NPs are transformed to silver sulfide NPs (Ag2S-NPs) before being applied to agricultural soils within biosolids. While Ag2S-NPs are assumed to be biologically inert, nothing is known of their effects on terrestrial plants. The phytotoxicity of Ag and its accumulation was examined in short-term (24 h) and longer-term (2-week) solution culture experiments with cowpea (Vigna unguiculata L. Walp.) and wheat (Triticum aestivum L.) exposed to Ag2S-NPs (0-20 mg Ag L⁻¹), metallic Ag-NPs (0-1.6 mg Ag L⁻¹), or ionic Ag (AgNO3; 0-0.086 mg Ag L⁻¹). Although not inducing any effects during 24-h exposure, Ag2S-NPs reduced growth by up to 52% over a 2-week period. This toxicity did not result from their dissolution and release of toxic Ag⁺ in the rooting medium, with soluble Ag concentrations remaining below 0.001 mg Ag L⁻¹. Rather, Ag accumulated as Ag2S in the root and shoot tissues when plants were exposed to Ag2S-NPs, consistent with their direct uptake. Importantly, this differed from the form of Ag present in tissues of plants exposed to AgNO3. For the first time, our findings have shown that Ag2S-NPs exert toxic effects through their direct accumulation in terrestrial plant tissues. These findings need to be considered to ensure high yield of food crops, and to avoid increasing Ag in the food chain. PMID:25686712

  18. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris)

    PubMed Central

    Villordo-Pineda, Emiliano; González-Chavira, Mario M.; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A.; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa, and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5) and characterized by 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. Phenotypes most associated were days to flowering and seed biomass with 58 and 44 associated SNPs, respectively. Thirty-seven out of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs such as SNP28 and SNP128 are related to starch biosynthesis, a common osmotic protector; and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755

  19. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references.

    PubMed

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-12-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping of disease association signals from case-control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5-98.8% obtained with the original HIBAG references to 95.2-99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets. PMID:25707395
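
    The effect of a call threshold such as the 0.4 and 0.5 values mentioned above can be illustrated with a generic filter on posterior probabilities: predictions below the threshold are left uncalled, which trades call rate against accuracy. This sketch is written in plain Python and does not use the HIBAG R package or its API; the record layout, the inclusive comparison, and all example values are assumptions made up for illustration.

```python
def apply_call_threshold(predictions, call_threshold):
    """Keep predictions whose posterior probability reaches the threshold.

    Returns the retained predictions and the call rate (fraction retained).
    Whether the comparison should be inclusive is an assumption here.
    """
    called = [p for p in predictions if p["prob"] >= call_threshold]
    call_rate = len(called) / len(predictions) if predictions else 0.0
    return called, call_rate


def accuracy(called):
    """Fraction of retained predictions that match the typed allele."""
    if not called:
        return None
    return sum(1 for p in called if p["allele"] == p["truth"]) / len(called)


# Made-up predictions for a single locus (not data from the study).
predictions = [
    {"sample": "S1", "allele": "15:01", "prob": 0.93, "truth": "15:01"},
    {"sample": "S2", "allele": "04:05", "prob": 0.45, "truth": "04:05"},
    {"sample": "S3", "allele": "09:01", "prob": 0.32, "truth": "08:03"},
    {"sample": "S4", "allele": "15:01", "prob": 0.71, "truth": "15:01"},
]

for ct in (0.4, 0.5):
    called, rate = apply_call_threshold(predictions, ct)
    print(f"CT={ct}: call rate={rate:.2f}, accuracy={accuracy(called)}")
```

    In this toy data the lower threshold retains one more correct call, which is the kind of trade-off a recommended threshold is meant to balance.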

  20. A genome-wide survey for SNPs altering microRNA seed sites identifies functional candidates in GWAS

    PubMed Central

    2011-01-01

    Background Gene variants within regulatory regions are thought to be major contributors of the variation of complex traits/diseases. Genome-wide association studies (GWAS) have identified scores of genetic variants that appear to contribute to human disease risk. However, most of these variants do not appear to be functional. Thus, the significance of the association may be explained by still unknown mechanisms or by linkage disequilibrium (LD) with functional polymorphisms. In the present study, focused on functional variants related to the binding of microRNAs (miR), we utilized SNP data, including newly released 1000 Genomes Project data, to perform a genome-wide scan of SNPs that abrogate or create miR recognition element (MRE) seed sites (MRESS). Results We identified 2723 SNPs disrupting, and 22295 SNPs creating MRESSs. We estimated the percent of SNPs falling within both validated (5%) and predicted conserved MRESSs (3%). We determined 87 of these MRESS SNPs were listed in GWAS, or in strong LD with a GWAS SNP, and may represent the functional variants of identified GWAS SNPs. Furthermore, 39 of these have evidence of co-expression of target mRNA and the predicted miR. We also gathered previously published eQTL data supporting a functional role for four of these SNPs shown to associate with disease phenotypes. Comparison of FST statistics (a measure of population subdivision) for predicted MRESS SNPs against non MRESS SNPs revealed a significantly higher (P = 0.0004) degree of subdivision among MRESS SNPs, suggesting a role for these SNPs in environmentally driven selection. Conclusions We have demonstrated the potential of publicly available resources to identify high priority candidate SNPs for functional studies and for disease risk prediction. PMID:21995669
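
    To make the idea of a SNP that abrogates or creates a seed site concrete, the sketch below checks whether a single-base change in a transcript removes or introduces a perfect match to the reverse complement of a miRNA's seed (nucleotides 2-8). This is a simplified stand-in and not the study's scanning pipeline: the 7-mer seed definition, the function names, and the short transcript fragments are assumptions chosen for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_match_site(mirna, start=2, end=8):
    """Target-site sequence pairing with miRNA nucleotides start..end (1-based)."""
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def classify_snp(transcript, pos, alt, site):
    """Report whether changing transcript[pos] (0-based) to alt affects a site.

    Only windows that overlap the SNP position are compared.
    """
    mutated = transcript[:pos] + alt + transcript[pos + 1:]
    k = len(site)
    first = max(0, pos - k + 1)
    last = min(len(transcript) - k, pos)
    windows = range(first, last + 1)
    ref_hit = any(transcript[i:i + k] == site for i in windows)
    alt_hit = any(mutated[i:i + k] == site for i in windows)
    if ref_hit and not alt_hit:
        return "disrupts"
    if alt_hit and not ref_hit:
        return "creates"
    return "no change"


# hsa-let-7a-5p mature sequence; the transcript fragments are invented.
site = seed_match_site("UGAGGUAGUAGGUUGUAUAGUU")
print(site)                                          # CUACCUC
print(classify_snp("AAGCUACCUCAGG", 6, "G", site))   # disrupts
print(classify_snp("AAGCUACGUCAGG", 7, "C", site))   # creates
```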

  1. Application of Population Sequencing (POPSEQ) for Ordering and Imputing Genotyping-by-Sequencing Markers in Hexaploid Wheat

    PubMed Central

    Edae, Erena A.; Bowden, Robert L.; Poland, Jesse

    2015-01-01

    The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based, high-resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for association mapping and genomics-assisted breeding is limited by a large proportion of missing data per marker. For species with a reference genomic sequence, markers can be ordered on the physical map. However, in the absence of reference marker order, the use and imputation of GBS markers is challenging. Here, we demonstrate how the population sequencing (POPSEQ) approach can be used to provide marker context for GBS in wheat. The utility of a POPSEQ-based genetic map as a reference map to create genetically ordered markers on a chromosome for hexaploid wheat was validated by constructing an independent de novo linkage map of GBS markers from a Synthetic W7984 × Opata M85 recombinant inbred line (SynOpRIL) population. The results indicated that there is strong agreement between the independent de novo linkage map and the POPSEQ mapping approach in mapping and ordering GBS markers for hexaploid wheat. After ordering, a large number of GBS markers were imputed, thus providing a high-quality reference map that can be used for QTL mapping for different traits. The POPSEQ-based reference map and whole-genome sequence assemblies are valuable resources that can be used to order GBS markers and enable the application of highly accurate imputation methods to leverage the application of GBS markers in wheat. PMID:26530417

  2. Polymorphisms involving gain or loss of CpG sites are significantly enriched in trait-associated SNPs

    PubMed Central

    Zhou, Dan; Li, Zhenli; Yu, Dan; Wan, Ledong; Zhu, Yimin; Lai, Maode; Zhang, Dandan

    2015-01-01

    Some single nucleotide polymorphisms (SNPs) influence the existence of CpG sites, the basis of DNA modification such as methylation and hydroxymethylation. These polymorphisms can lead to gain or loss of CpG sites and were defined as CpG site related SNPs (cgSNPs) in this study. The cgSNPs change DNA sequence and might potentially affect DNA modification such as methylation. However, the functional consequence of cgSNPs is poorly understood. We observed that a considerable proportion (23.0%) of common variants were cgSNPs in the human genome. Mutations involving loss of CpG sites were associated with reduced levels of methylation (~20.2%) using The Cancer Genome Atlas (TCGA) data. Using public databases (SCAN and seeQTL) of expression quantitative trait loci (eQTLs), we found that the cgSNPs were significantly enriched in eQTLs via logistic regression and simulation test. Furthermore, we observed that cgSNPs were more likely to be trait-associated loci, especially for cancers, using a catalog of published genome-wide association studies (GWAS) recorded by the National Human Genome Research Institute (NHGRI). Our results indicated that cgSNP might be meaningful as annotation either in SNP functional prediction or in screening for trait-associated SNPs. PMID:26503467
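
    The definition of a cgSNP given above (a SNP whose alleles gain or lose a CpG site) can be checked from the two bases flanking the variant. The sketch below is one possible way to do this and is not the authors' code; the function names and return labels are hypothetical. Because the reverse complement of CG is again CG, inspecting the forward strand is sufficient.

```python
def cpg_positions(seq):
    """Return the start indices of every CG dinucleotide in seq."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def classify_cgsnp(left_base, ref, alt, right_base):
    """Compare CpG sites in the 3-base context of the ref and alt alleles."""
    ref_sites = cpg_positions(left_base + ref + right_base)
    alt_sites = cpg_positions(left_base + alt + right_base)
    gained = bool(alt_sites - ref_sites)
    lost = bool(ref_sites - alt_sites)
    if gained and lost:
        return "gain and loss"
    if gained:
        return "gain"
    if lost:
        return "loss"
    return "not a cgSNP"


print(classify_cgsnp("A", "C", "T", "G"))  # loss: ACG -> ATG
print(classify_cgsnp("C", "A", "G", "T"))  # gain: CAT -> CGT
print(classify_cgsnp("A", "A", "T", "A"))  # not a cgSNP
```

    The "gain and loss" label covers the corner case where a C/G change between a C and a G (CCG versus CGG) shifts the CpG by one base rather than removing it.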

  3. Tool for rapid annotation of microbial SNPs (TRAMS): a simple program for rapid annotation of genomic variation in prokaryotes.

    PubMed

    Reumerman, Richard A; Tucker, Nicholas P; Herron, Paul R; Hoskisson, Paul A; Sangal, Vartul

    2013-09-01

    Next generation sequencing (NGS) has been widely used to study genomic variation in a variety of prokaryotes. Single nucleotide polymorphisms (SNPs) resulting from genomic comparisons need to be annotated for their functional impact on the coding sequences. We have developed a program, TRAMS, for functional annotation of genomic SNPs which is available to download as a single file executable for WINDOWS users with limited computational experience and as a Python script for Mac OS and Linux users. TRAMS needs a tab delimited text file containing SNP locations, reference nucleotide and SNPs in variant strains along with a reference genome sequence in GenBank or EMBL format. SNPs are annotated as synonymous, nonsynonymous or nonsense. Nonsynonymous SNPs in start and stop codons are separated as non-start and non-stop SNPs, respectively. SNPs in multiple overlapping features are annotated separately for each feature and multiple nucleotide polymorphisms within a codon are combined before annotation. We have also developed a workflow for Galaxy, a highly used tool for analysing NGS data, to map short reads to a reference genome and extract and annotate the SNPs. TRAMS is a simple program for rapid and accurate annotation of SNPs that will be very useful for microbiologists in analysing genomic diversity in microbial populations. PMID:23828175
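
    The annotation categories described above (synonymous, nonsynonymous, nonsense, non-start and non-stop) can be illustrated with a short codon-level classifier. This is a minimal sketch and not the TRAMS source code: it assumes a forward-strand coding sequence, uses the standard amino acid assignments, treats the first codon as the start codon, and handles one SNP at a time (it does not combine multiple changes within a codon as TRAMS is described as doing). All names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in TCAG order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def annotate_snp(cds, pos, alt):
    """Classify a single-base change at 0-based position pos of a CDS."""
    codon_start = (pos // 3) * 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos - codon_start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if codon_start == 0:
        return "non-start"      # amino acid change in the start codon
    if ref_aa == "*":
        return "non-stop"       # stop codon replaced by an amino acid
    if alt_aa == "*":
        return "nonsense"       # premature stop codon introduced
    return "nonsynonymous"


cds = "ATGGCTTGGTAA"  # invented CDS: Met-Ala-Trp-stop
print(annotate_snp(cds, 5, "C"))  # synonymous    (GCT -> GCC)
print(annotate_snp(cds, 3, "A"))  # nonsynonymous (GCT -> ACT)
print(annotate_snp(cds, 8, "A"))  # nonsense      (TGG -> TGA)
print(annotate_snp(cds, 9, "C"))  # non-stop      (TAA -> CAA)
print(annotate_snp(cds, 2, "A"))  # non-start     (ATG -> ATA)
```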

  4. Exonic versus intronic SNPs: contrasting roles in revealing the population genetic differentiation of a widespread bird species.

    PubMed

    Zhan, X; Dixon, A; Batbayar, N; Bragin, E; Ayas, Z; Deutschova, L; Chavko, J; Domashevsky, S; Dorosencu, A; Bagyura, J; Gombobaatar, S; Grlica, I D; Levin, A; Milobog, Y; Ming, M; Prommer, M; Purev-Ochir, G; Ragyov, D; Tsurkanu, V; Vetrov, V; Zubkov, N; Bruford, M W

    2015-01-01

    Recent years have seen considerable progress in applying single nucleotide polymorphisms (SNPs) to population genetics studies. However, relatively few have attempted to use them to study the genetic differentiation of wild bird populations and none have examined possible differences of exonic and intronic SNPs in these studies. Here, using 144 SNPs, we examined population genetic differentiation in the saker falcon (Falco cherrug) across Eurasia. The position of each SNP was verified using the recently sequenced saker genome with 108 SNPs positioned within the introns of 10 fragments and 36 SNPs in the exons of six genes, comprising MHC, MC1R and four others. In contrast to intronic SNPs, both Bayesian clustering and principal component analyses using exonic SNPs consistently revealed two genetic clusters, within which the least admixed individuals were found in Europe/central Asia and Qinghai (China), respectively. Pairwise D analysis for exonic SNPs showed that the two populations were significantly differentiated and between the two clusters the frequencies of five SNP markers were inferred to be influenced by selection. Central Eurasian populations clustered as intermediate between the two main groups, consistent with their geographic position. But the westernmost populations of central Europe showed evidence of demographic isolation. Our work highlights the importance of functional exonic SNPs for studying population genetic patterns in a widespread avian species. PMID:25074575

  5. Effective filtering strategies to improve data quality from population-based whole exome sequencing studies

    PubMed Central

    2014-01-01

    Background Genotypes generated in next generation sequencing studies contain errors which can significantly impact the power to detect signals in common and rare variant association tests. These genotyping errors are not explicitly filtered by the standard GATK Variant Quality Score Recalibration (VQSR) tool and thus remain a source of errors in whole exome sequencing (WES) projects that follow GATK's recommended best practices. Therefore, additional data filtering methods are required to effectively remove these errors before performing association analyses with complex phenotypes. Here we empirically derive thresholds for genotype and variant filters that, when used in conjunction with the VQSR tool, achieve higher data quality than when using VQSR alone. Results The detailed filtering strategies improve the concordance of sequenced genotypes with array genotypes from 99.33% to 99.77%; improve the percent of discordant genotypes removed from 10.5% to 69.5%; and improve the Ti/Tv ratio from 2.63 to 2.75. We also demonstrate that managing batch effects by separating samples based on different target capture and sequencing chemistry protocols results in a final data set containing 40.9% more high-quality variants. In addition, imputation is an important component of WES studies and is used to estimate common variant genotypes to generate additional markers for association analyses. As such, we demonstrate filtering methods for imputed data that improve genotype concordance from 79.3% to 99.8% while removing 99.5% of discordant genotypes. Conclusions The described filtering methods are advantageous for large population-based WES studies designed to identify common and rare variation associated with complex diseases. Compared to data processed through standard practices, these strategies result in substantially higher quality data for common and rare association analyses. PMID:24884706
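
    Two of the quality metrics quoted above, the transition/transversion (Ti/Tv) ratio and genotype concordance with array data, are straightforward to compute. The sketch below shows one way to do so on toy inputs; it is not the study's filtering code, and the data layout (tuples of reference and alternate alleles, dictionaries keyed by sample and site) is an assumption made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Transitions stay within purines (A<->G) or within pyrimidines (C<->T)."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(variants):
    """Ratio of transitions to transversions among (ref, alt) SNV pairs."""
    ti = sum(1 for ref, alt in variants if is_transition(ref, alt))
    tv = len(variants) - ti
    return ti / tv if tv else float("inf")


def concordance(sequenced, array):
    """Fraction of genotypes present in both call sets that agree."""
    shared = [key for key in sequenced if key in array]
    if not shared:
        return None
    return sum(1 for key in shared if sequenced[key] == array[key]) / len(shared)


# Invented example data.
variants = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
sequenced = {("s1", "v1"): "0/1", ("s1", "v2"): "1/1",
             ("s2", "v1"): "0/0", ("s2", "v2"): "0/1"}
array = {("s1", "v1"): "0/1", ("s1", "v2"): "1/1",
         ("s2", "v1"): "0/0", ("s2", "v2"): "0/0"}

print(ti_tv_ratio(variants))          # 2.0
print(concordance(sequenced, array))  # 0.75
```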

  6. The effect of simple imputation on inferences about population means when data are missing in biomedical research due to detection limits

    PubMed Central

    WANG, Hongyue; CHEN, Guanqing; LU, Xiang; ZHANG, Hui; FENG, Changyong

    2015-01-01

    Summary The sample geometric mean has been widely used in biomedical and psychosocial research to estimate and compare population geometric means. However, due to the detection limit of measurement instruments, the actual value of the measurement is not always observable. A common practice to deal with this problem is to replace missing values by small positive constants and make inferences based on the imputed data. However, no work has been carried out to study the effect of this naïve imputation method on inference. In this report, we show that this simple imputation method may dramatically change the reported outcomes of a study and, thus, make the results uninterpretable, even if the detection limit is very small. PMID:26977131
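
    The sensitivity described in this summary can be seen in a small numerical sketch. The measurements, the detection limit and the substitution constants below are hypothetical, and the helper names are assumptions; the point is only that the choice of constant moves the geometric mean.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def impute_below_limit(values, limit, constant):
    """Replace readings below the detection limit (None) with a constant."""
    if not 0 < constant <= limit:
        raise ValueError("constant must lie in (0, limit]")
    return [constant if v is None else v for v in values]


# Hypothetical measurements; None marks a reading below the detection limit.
observed = [None, 0.8, 1.2, None, 2.5, 0.9]
limit = 0.1

for constant in (limit, limit / 2, limit / 100):
    gm = geometric_mean(impute_below_limit(observed, limit, constant))
    print(f"substitute {constant:g}: geometric mean = {gm:.4f}")
```

    Because the geometric mean averages logarithms, shrinking the substituted constant toward zero drives the estimate toward zero as well, however small the detection limit is.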

  7. Hansa: an automated method for discriminating disease and neutral human nsSNPs.

    PubMed

    Acharya, Vishal; Nagarajaram, Hampapathalu A

    2012-02-01

    Variations are mostly due to nonsynonymous single nucleotide polymorphisms (nsSNPs), some of which are associated with certain diseases. Phenotypic effects of a large number of nsSNPs have not been characterized. Although several methods have been developed to predict the effects of nsSNPs as "disease" or "neutral," there is still a need for development of methods with improved prediction accuracies. We, therefore, developed a support vector machine (SVM) based method named Hansa which uses a novel set of discriminatory features to classify nsSNPs into disease (pathogenic) and benign (neutral) types. Validation studies on a benchmark dataset and further on an independent dataset of well-characterized known disease and neutral mutations show that Hansa outperforms the other known methods. For example, fivefold cross-validation studies using the benchmark HumVar dataset reveal that at the false positive rate (FPR) of 20% Hansa yields a true positive rate (TPR) of 82% that is about 10% higher than the best-known method. Hansa is available in the form of a web server at http://hansa.cdfd.org.in:8080. PMID:22045683

  8. Pre-selection of most significant SNPS for the estimation of genomic breeding values

    PubMed Central

    Macciotta, Nicol PP; Gaspa, Giustino; Steri, Roberto; Pieramati, Camillo; Carnier, Paolo; Dimauro, Corrado

    2009-01-01

    The availability of a large number of SNP markers throughout the genome of different livestock species offers the opportunity to estimate genomic breeding values (GEBVs). However, the estimation of many effects in a data set of limited size represents a severe statistical problem. A pre-selection of SNPs based on single regression may provide a reasonable compromise between accuracy of results, number of independent variables to be considered and computing requirements. A total of 595 and 618 SNPs were pre-selected using a simple linear regression for each SNP, based on phenotypes or polygenic EBVs, respectively, with an average distance of 910 cM between them. Chromosome four had the largest frequency of selected SNPs. Average correlations between GEBVs and TBVs were about 0.82 and 0.73 for the TRAINING generations when phenotypes or polygenic EBVs were considered as dependent variable, whereas they tend to decrease to 0.66 and 0.54 for the PREDICTION generations. The pre-selection of SNPs using the phenotypes as dependent variable together with a BLUP estimation of marker genotype effects using a variance contribution of each marker equal to σ²a/nsnps resulted in a remarkable accuracy of GEBV estimation (0.77) in the PREDICTION generations. PMID:19278540
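
    The pre-selection step can be sketched as one simple regression per SNP followed by a ranking. The abstract does not state the selection criterion, so ranking by single-marker R² and keeping a fixed number of SNPs is an assumption here, as are the function names and the toy data.

```python
def r_squared(x, y):
    """Coefficient of determination for the simple regression y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype: no signal
    return (sxy * sxy) / (sxx * syy)


def preselect_snps(genotypes, phenotype, keep):
    """Rank SNPs by single-marker R^2 and return the names of the top `keep`."""
    scores = {snp: r_squared(codes, phenotype)
              for snp, codes in genotypes.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Hypothetical toy data: genotypes coded as allele counts (0, 1, 2).
genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [1, 1, 0, 2, 1, 0],
    "snp_c": [2, 2, 2, 2, 2, 2],   # monomorphic, carries no information
}
phenotype = [1.0, 2.1, 2.9, 2.0, 1.2, 3.1]

print(preselect_snps(genotypes, phenotype, keep=2))
```

    The retained SNPs would then enter the joint model (BLUP in the abstract), which is what reduces the number of effects to estimate relative to the number of records.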

  9. Cross-Amplification and Validation of SNPs Conserved over 44 Million Years between Seals and Dogs

    PubMed Central

    Hoffman, Joseph I.; Thorne, Michael A. S.; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  10. The effects of single nucleotide polymorphisms (SNPs) of calpastatin (CAST) gene on meat tenderness of yak.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The association of single nucleotide polymorphisms (SNPs) of calpastatin (CAST) gene with shear force of 2.54 cm steaks from M. longissimus dorsi from Gannan yaks (Bos grunniens, n=181) was studied. Yaks were harvested at 2, 3, and 4 yr of age (n=51, 59, and 71, respectively), and samples of each ya...

  11. Identification of pummelo cultivars by using a panel of 25 selected SNPs and 12 DNA segments.

    PubMed

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  12. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing

    PubMed Central

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, abundant in the human genome and with a lower mutation rate, are attractive for genetic applications such as forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneously tested using Ampliseq technology with the Ion Torrent PGM platform. A concordance study between NGS and Sanger sequencing was performed with 2 reference samples, 9947A and 9948. Full concordance was obtained except for the genotype of rs576261 in 9947A. The parameter FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should range from 50% to 60%, and for homozygotes or Y-SNPs should be above 90%. The SNPs rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, either with allelic imbalance or with lower coverage. Sensitivity testing demonstrated that with DNA input ranging from 10 ng to 0.5 ng, all correct genotypes were obtained. For mixture testing, a clear linear correlation (R² = 0.9429) between the expected FMAR and observed FMAR values of mixtures was observed. PMID:26691610
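
    The FMAR ranges quoted in this abstract lend themselves to a short sketch. The abstract does not define FMAR; the code assumes it is the percentage of reads carrying the most frequent allele at a locus. The function names, the label for values outside the quoted ranges and the read counts are hypothetical.

```python
def fmar_percent(allele_reads):
    """Share of reads (in %) carried by the most frequent allele at one SNP.

    Assumption: FMAR is the frequency of major-allele reads.
    """
    total = sum(allele_reads.values())
    if total == 0:
        raise ValueError("no reads at this locus")
    return 100.0 * max(allele_reads.values()) / total


def classify_by_fmar(fmar):
    """Apply the ranges quoted in the abstract; anything else is flagged."""
    if 50.0 <= fmar <= 60.0:
        return "heterozygote"
    if fmar > 90.0:
        return "homozygote or Y-SNP"
    return "ambiguous (possible imbalance or mixture)"


# Hypothetical read counts per allele.
loci = {
    "locus_1": {"A": 480, "G": 520},
    "locus_2": {"C": 990, "T": 10},
    "locus_3": {"A": 750, "G": 250},
}
for name, reads in loci.items():
    value = fmar_percent(reads)
    print(name, f"{value:.1f}%", classify_by_fmar(value))
```

    Values falling between the two ranges are the cases the abstract associates with allelic imbalance or with mixtures, so the sketch flags them rather than forcing a genotype.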

  13. Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA Segments

    PubMed Central

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  14. BARCSOYSNP23: A SELECTED PANEL OF SNPS FOR SOYBEAN CULTIVAR IDENTIFICATION

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This report describes a set of 23 informative SNPs (BARCSoySNP23) distributed on 19 of the 20 soybean linkage groups that can be used for soybean cultivar identification. Selection of the set was made based upon the linkage map position of each SNP as well as the information provided by each SNP fo...

  15. Assessing SNPs versus RAPDs for predicting heterogeneity and screening efficiency in wild potato (Solanum)species

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Knowing how genetic diversity is partitioned among and within wild potato species populations is important for efficient sampling for collection, preservation and evaluation. We sought to evaluate the effectiveness of SNPs for assessing germplasm by using the exact set of four model species previous...

  16. Validation of 58 autosomal individual identification SNPs in three Chinese populations

    PubMed Central

    Wei, Yi-Liang; Qin, Cui-Jiao; Liu, Hai-Bo; Jia, Jing; Hu, Lan; Li, Cai-Xia

    2014-01-01

    Aim To genotype and evaluate a panel of single-nucleotide polymorphisms for individual identification (IISNPs) in three Chinese populations: Chinese Han, Uyghur, and Tibetan. Methods Two previously identified panels of IISNPs, 86 unlinked IISNPs and SNPforID 52-plex markers, were pooled and analyzed. Four SNPs were included in both panels. In total, 132 SNPs were typed on Sequenom MassARRAY platform in 330 individuals from Han Chinese, Uyghur, and Tibetan populations. Population genetic indices and forensic parameters were determined for all studied markers. Results No significant deviation from Hardy-Weinberg equilibrium was observed for any of the SNPs in 3 populations. Expected heterozygosity (He) ranged from 0.144 to 0.500 in Han Chinese, from 0.197 to 0.500 in Uyghur, and from 0.018 to 0.500 in Tibetan population. Wright's Fst values ranged from 0.0001 to 0.1613. Pairwise linkage disequilibrium (LD) calculations for all 132 SNPs showed no significant LD across the populations (r²<0.147). A subset of 58 unlinked IISNPs (r²<0.094) with He>0.450 and Fst values from 0.0002 to 0.0536 gave match probabilities of 10^-25 and a cumulative probability of exclusion of 0.999992. Conclusion The 58 unlinked IISNPs with high heterozygosity have low allele frequency variation among 3 Chinese populations, which makes them excellent candidates for the development of multiplex assays for individual identification and paternity testing. PMID:24577821
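
    The link between high heterozygosity and a very small match probability can be worked through with standard Hardy-Weinberg formulas. The sketch assumes an idealised panel of 58 unlinked biallelic SNPs, each with allele frequency 0.5; the study's actual allele frequencies are not given in the abstract, and the function names are illustrative.

```python
import math


def expected_heterozygosity(p):
    """Expected heterozygosity 2pq of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Under Hardy-Weinberg equilibrium the genotype frequencies are
    p^2, 2pq and q^2; the match probability is the sum of their squares.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2


def combined_match_probability(freqs):
    """Product of per-SNP match probabilities, assuming unlinked SNPs."""
    return math.prod(match_probability(p) for p in freqs)


# Idealised panel (an assumption): 58 SNPs, each with allele frequency 0.5.
panel = [0.5] * 58
print(expected_heterozygosity(0.5))   # 0.5, the maximum for two alleles
print(f"{combined_match_probability(panel):.2e}")
```

    Each such SNP contributes a factor of 0.375, and 58 independent factors multiply to a value on the order of 10^-25, the same order of magnitude as the match probability reported in the abstract. Multiplying across loci is valid only for unlinked markers, which is why the low pairwise LD matters.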

  17. Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs that Associate with Disease

    PubMed Central

    Guy, Richard T.; Santago, Peter; Langefeld, Carl D.

    2013-01-01

    Complex genetic disorders are a result of a combination of genetic and non-genetic factors, all potentially interacting. Machine learning methods hold the potential to identify multi-locus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of SNPs of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of ADTrees. The algorithm is order nk², where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of these data using simulated data as well as from the Lupus Large Association Study 1 (7822 SNPs in 3548 individuals). Our results suggest that BADTrees holds promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease. PMID:22851473

  18. SNPs for parentage testing and traceability in globally diverse breeds of sheep

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA-based parentage determination accelerates genetic improvement by increasing pedigree accuracy. However, the utility of any parentage SNP varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities...

  19. Alteration of Antiviral Signalling by Single Nucleotide Polymorphisms (SNPs) of Mitochondrial Antiviral Signalling Protein (MAVS)

    PubMed Central

    Xing, Fei; Matsumiya, Tomoh; Hayakari, Ryo; Yoshida, Hidemi; Kawaguchi, Shogo; Takahashi, Ippei; Nakaji, Shigeyuki; Imaizumi, Tadaatsu

    2016-01-01

    Genetic variation is associated with diseases. As a type of genetic variation occurring with certain regularity and frequency, the single nucleotide polymorphism (SNP) is attracting more and more attention because of its great value for research and real-life application. Mitochondrial antiviral signalling protein (MAVS) acts as a common adaptor molecule for retinoic acid-inducible gene-I (RIG-I)-like receptors (RLRs), which can recognize foreign RNA, including viral RNA, leading to the induction of type I interferons (IFNs). Therefore, MAVS is thought to be a crucial molecule in antiviral innate immunity. We speculated that genetic variation of MAVS may result in susceptibility to infectious diseases. To assess the risk of viral infection based on MAVS variation, we tested the effects of twelve non-synonymous MAVS coding-region SNPs from the National Center for Biotechnology Information (NCBI) database that result in amino acid substitutions. We found that five of these SNPs exhibited functional alterations. Additionally, four resulted in an inhibitory immune response, and one had the opposite effect. In total, 1,032 human genomic samples obtained from a mass examination were genotyped at these five SNPs. However, no homozygous or heterozygous variation was detected. We hypothesized that these five SNPs are not present in the Japanese population and that such MAVS variations may result in serious immune diseases. PMID:26954674

  20. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing.

    PubMed

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, abundant in the human genome and with a lower mutation rate, are attractive for genetic applications such as forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneously tested using Ampliseq technology with the Ion Torrent PGM platform. A concordance study between NGS and Sanger sequencing was performed with 2 reference samples, 9947A and 9948. Full concordance was obtained except for the genotype of rs576261 in 9947A. The parameter FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should range from 50% to 60%, and for homozygotes or Y-SNPs should be above 90%. The SNPs rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, either with allelic imbalance or with lower coverage. Sensitivity testing demonstrated that with DNA input ranging from 10 ng to 0.5 ng, all correct genotypes were obtained. For mixture testing, a clear linear correlation (R² = 0.9429) between the expected FMAR and observed FMAR values of mixtures was observed. PMID:26691610

  1. Large-scale enrichment and discovery of gene-associated SNPs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...

  2. SNPs at 3'-UTR of the bovine CDIPT gene associated with Qinchuan cattle meat quality traits.

    PubMed

    Fu, C Z; Wang, H; Mei, C G; Wang, J L; Jiang, B J; Ma, X H; Wang, H B; Cheng, G; Zan, L S

    2013-01-01

    The CDIPT is crucial to the fatty acid metabolic pathway, intracellular signal transduction and energy metabolism in eukaryotic cells. We detected three SNPs at 3'-untranslated regions (UTR), named 3'-UTR_108 A > G, 3'-UTR_448 G > A and 3'-UTR_477 C > G, of the CDIPT gene in 618 Qinchuan cattle using PCR-RFLP and DNA sequencing methods. At each of the three SNPs, we found three genotypes named as follows: AA, AB, BB (3'-UTR_108 A > G), CC, CD, DD (3'-UTR_448 G > A) and EE, EF, FF (3'-UTR_477 C > G.). Based on association analysis of these SNPs with ultrasound measurement traits, individuals of genotype BB had a significantly larger loin muscle area than genotype AA. Individuals of genotype CC had significantly thicker back fat than individuals of genotype DD. Individuals of genotype EE also had significantly thicker back fat than did individuals of genotype FF. We conclude that these SNPs of the CDIPT gene could be used as molecular markers for selecting and breeding beef cattle with superior body traits, depending on breeding goals. PMID:23546961

  3. Mining SNPs and Indels in Mung Bean (Vigna radiata) by Ecotilling

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ecotilling is a powerful genetic analysis tool. It can provide rapid identification of naturally occurring Single Nucleotide Polymorphisms (SNPs) and small insertion/deletions (indels) in a pool of accessions for a gene of interest. This technique eliminates the time consuming and expensive proced...

  4. Heritability Estimated Using 50K SNPs Indicates Missing Heritability Problem in Holstein Breeding

    PubMed Central

    Shin, Donghyun; Park, Kyoung-Do; Ka, Sojoeng

    2015-01-01

    Previous studies in Holstein have shown 35% to 51.8% heritability in milk production traits, such as milk yield, fat, and protein, using pedigree data. Other studies in complex human traits could be captured by common single-nucleotide polymorphisms (SNPs), and their genetic variations, attributed to chromosomes, are in proportion to their length. Using genome-wide estimation and partitioning approaches, we analyzed three quantitative Holstein traits relevant to milk production in Korean Holstein data harvested from 462 individuals genotyped for 54,609 SNPs. For all three traits (milk yield, fat, and protein), we estimated a nominally significant (p = 0.1) proportion of variance explained by all SNPs on the Illumina BovineSNP50 Beadchip (h2G). These common SNPs explained approximately most of the narrow-sense heritability. Longer genomic regions tended to provide more phenotypic variation information, with a correlation of 0.46~0.53 between the estimate of variance explained by individual chromosomes and their physical length. These results suggested that polygenicity was ubiquitous for Holstein milk production traits. These results will expand our knowledge on recent animal breeding, such as genomic selection in Holstein. PMID:26865846

  5. Filter apparatus

    SciTech Connect

    Zahedi, K.; Alexander, J. C.; Zieve, P. B.

    1985-03-19

    Electrified filter bed apparatus includes inner and outer cylindrical bed-retaining structures for confining a granular bed therebetween. The inner cylindrical structure may comprise a cage of superposed frusto-conical louvers and the outer structure may comprise a similar cage or a perforated cylindrical, liquid-drainage sheet. A cylindrical bed electrode for electrically charging the bed granules is suspended between the retaining structures. The tubular bed surrounds an internal gas passage from which polluted gas flows through the bed from the inside out. Gas enters the internal passage from above through an ionizer section of the apparatus. The ionizer section may include a disc-type ionizer assembly in an ionizer tube. The tube may form an extension of the inner louver cage. A corona discharge may be formed between the disc and the ionizer tube by providing electric current to the discs, whereby the corona discharge electrically charges particulate material within the gas stream. The discs may carry radially protruding needles defining circumferential corona discharge points. A blowdown system may be provided for cleaning the ionizer discs and the tube wall in the region of the discs. The apparatus may include means for avoiding blowout of bed granules from between the outer louvers, and a system for washing pollutant-coated bed granules.

  6. MiR-SNPs as Markers of Toxicity and Clinical Outcome in Hodgkin Lymphoma Patients

    PubMed Central

    Navarro, Alfons; Muñoz, Carmen; Gaya, Anna; Díaz-Beyá, Marina; Gel, Bernat; Tejero, Rut; Díaz, Tania; Martinez, Antonio; Monzó, Mariano

    2013-01-01

    Background In recent years, microRNA (miRNA) pathways have emerged as a crucial system for the regulation of tumorigenesis. miR-SNPs are a novel class of single nucleotide polymorphisms that can affect miRNA pathways. Design and Methods We analyzed eight miR-SNPs by allelic discrimination in 141 patients with Hodgkin lymphoma and correlated the results with treatment-related toxicity, response, disease-free survival (DFS) and overall survival (OS). Results The KRT81 (rs3660) GG genotype was associated with an increased risk of neurological toxicity (P = 0.016), while patients with XPO5 (rs11077) AA or CC genotypes had a higher rate of bleomycin-associated pulmonary toxicity (P = 0.048). Both miR-SNPs emerged as independent factors in the multivariate analysis. The XPO5 AA and CC genotypes were also associated with a lower response rate (P = 0.036). XPO5 (P = 0.039) and TRBP (rs784567) (P = 0.022) genotypes emerged as prognostic markers for DFS, and XPO5 was also associated with OS (P = 0.033). In the multivariate analysis, only XPO5 emerged as an independent prognostic factor for DFS (HR: 2.622; 95% CI 1.039-6.620; P = 0.041). Given the influence of XPO5 and TRBP as individual markers, we then investigated the combined effect of these miR-SNPs. Patients with both the XPO5 AA/CC and TRBP TT/TC genotypes had the shortest DFS (P = 0.008) and OS (P = 0.008). Conclusion miR-SNPs can add useful prognostic information on treatment-related toxicity and clinical outcome in Hodgkin lymphoma and can be used to identify patients likely to be chemoresistant or to relapse. PMID:23705004

  7. Identification of putative SNPs in progressive retinal atrophy affected Canis lupus familiaris using exome sequencing.

    PubMed

    Reddy, Bhaskar; Kelawala, Divyesh N; Shah, Tejas; Patel, Anand B; Patil, Deepak B; Parikh, Pinesh V; Patel, Namrata; Parmar, Nidhi; Mohapatra, Amit B; Singh, Krishna M; Menon, Ramesh; Pandya, Dipal; Jakhesara, Subhash J; Koringa, Prakash G; Rao, Mandava V; Joshi, Chaitanya G

    2015-12-01

    Progressive retinal atrophy (PRA) is one of the major causes of retinal photoreceptor cell degeneration in canines. The inheritance pattern of PRA is autosomal recessive and genetically heterogeneous. Here, using targeted sequencing technology, we have performed exome sequencing of 10 PRA-affected (Spitz=7, Cocker Spaniel=1, Lhasa Apso=1 and Spitz-Labrador cross breed=1) and 6 normal (Spitz=5, Cocker Spaniel=1) dogs. The high-throughput sequencing using a 454-Roche Titanium sequencer generated about 2.16 gigabases of raw data. Initially, we successfully identified 25,619 single nucleotide polymorphisms (SNPs) that passed the stringent SNP calling parameters. Further, we performed an association study on the cohort, and the highly significant (0.001) associations were short-listed and investigated in depth. Out of the 171 significant SNPs, 113 were previously unreported. Interestingly, six among them were non-synonymous coding (NSC) SNPs, which include CPPED1 A>G (p.M307V), PITRM1 T>G (p.S715A), APP G>A (p.T266M), RNF213 A>G (p.V1482A), C>A (p.V1456L), and SLC46A3 G>A (p.R168Q). On the other hand, 35 out of 113 unreported SNPs fell in regulatory regions such as 3'-UTR, 5'-UTR, etc. In-depth bioinformatics analysis revealed that the majority of NSC SNPs have a damaging effect and alter protein stability. This study highlighted the genetic markers associated with PRA, which will help to develop genetic assay-based screening in effective breeding. PMID:26515695

  8. Identification of Type 2 Diabetes-associated combination of SNPs using Support Vector Machine

    PubMed Central

    2010-01-01

    Background Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. Results We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Conclusions Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population. PMID:20416077

  9. PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs

    PubMed Central

    Cheng, Zhongshan; Chu, Hin; Fan, Yanhui; Li, Cun; Song, You-Qiang; Zhou, Jie; Yuen, Kwok-Yung

    2015-01-01

    Expression quantitative trait loci (eQTLs) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). With the availability of deep sequencing data of 423 lymphoblastoid cell lines (LCLs) from six global populations and the microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs and generated a LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable; while indel cis-eQTLs in the population-specific subsection make more contribution to gene expression variations than those in the population-shared subsection. We found cis-eQTLs, especially the population-shared cis-eQTLs are significantly enriched toward transcription start site. Moreover, the National Human Genome Research Institute cataloged GWAS SNPs are enriched for LCL cis-eQTLs. Specifically, 32.8% GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting the fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, a pipeline Post-GWAS Explorer for Functional Indels and SNPs (PExFInS) has been developed, integrating LD analysis, functional annotation from public databases, cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets. PMID:26612672

  10. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    PubMed Central

    Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses. PMID:22013517

  11. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island.

    PubMed

    Kumbhare, Shreyas V; Dhotre, Dhiraj P; Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A; Shouche, Yogesh S; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1-40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1-20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25-40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity. PMID:26066038

  12. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island

    PubMed Central

    Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A.; Shouche, Yogesh S.; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, imputed metagenomic approach was carried out to understand the functional aspects of microbial community especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1–40 meters) i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1–20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25–40 meters). By using imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also help to understand the influence of nutrition availability on bacterial diversity. PMID:26066038

  13. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions.

    PubMed

    Yates, Christopher M; Sternberg, Michael J E

    2013-11-01

    Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective. PMID:23867278

  14. Towards an integrated approach to study SNPs and expression of candidate genes associated with milk protein biosynthesis.

    PubMed

    Kamiński, S; Malewski, T; Ahman, A; Wójcik, E; Ruść, A; Oleński, K; Jakubczak, A; Sazanov, A A

    2008-04-01

    MilkProtChip is an oligonucleotide microarray allowing bovine genotyping based on single nucleotide polymorphisms (SNPs) in genes influencing milk protein biosynthesis. A total of 71 SNPs in 42 genes were selected as associated with milk protein biosynthesis. Genotyping of about 300 animals of Polish Black-and-White cattle showed that SNPs in acyl-CoA: 1,2-diacylglycerol O-transferase (DGAT1), lactoferrin (LTF), casein kappa (CSN3) and growth hormone receptor (GHR) genes were associated with several milk performance traits. Analysis of correlations between SNPs and milk production traits showed that SNPs in single genes rarely affect the investigated traits. Only 4 of 42 investigated single SNPs had an impact on milk production traits, while 22 combinations of paired SNPs in these genes had an impact. Positive-effect SNP combinations in two genes can result from an additive effect of these SNPs on the same traits or from gene interaction. The MilkBovExp chip representing 90 genes encoding transcription factors expressed in the bovine mammary gland and/or involved in mammary gland signaling pathways was designed for further investigation of the impact of gene expression and/or its encoded products on milk traits performance. PMID:18666558

  15. Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations.

    PubMed

    Pryce, J E; Johnston, J; Hayes, B J; Sahana, G; Weigel, K A; McParland, S; Spurlock, D; Krattenmacher, N; Spelman, R J; Wall, E; Calus, M P L

    2014-03-01

    Combining data from research herds may be advantageous, especially for difficult or expensive-to-measure traits (such as dry matter intake). Cows in research herds are often genotyped using low-density single nucleotide polymorphism (SNP) panels. However, the precision of quantitative trait loci detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from Australasia (Australia and New Zealand), and 3 from North America (Canada and the United States). Heifers from the Australian and New Zealand research herds were already genotyped at high density (approximately 700,000 SNP). The remaining genotypes were imputed from around 50,000 SNP to 700,000 using 2 reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were quantified. The European genotypes (n=4,097) were imputed as 1 data set, using a reference population of 3,150 that included genotypes from 835 Australian and 1,053 New Zealand females, with the remainder being males. Imputation was undertaken using population-wide linkage disequilibrium with no family information exploited. The UK animals were also included in the North American data set (n=1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele frequencies of the 2 imputed data sets was high (>0.98) and even stronger (>0.99) for the UK animals that were part of each imputation data set. For the UK genotypes, 2.2% were imputed differently in the 2 high-density reference data sets used. Only 0.025% of these were homozygous switches. The number of discordant SNP was lower for animals that had sires that were genotyped. Discordant imputed SNP genotypes were most common when a large difference existed in allele frequency between the 2 imputed genotype data sets. For SNP that had ≥20% discordant genotypes, the difference between imputed data sets of allele frequencies of the UK (imputed) genotypes was 0.07, whereas the difference in allele frequencies of the (reference) high-density genotypes was 0.30. In fact, regions existed across the genome where the frequency of discordant SNP was higher. For example, on chromosome 10 (centered on 520,948 bp), 52 SNP (out of a total of 103 SNP) had ≥20% discordant SNP. Four hundred and eight SNP had more than 20% discordant genotypes and were removed from the final set of imputed genotypes. We concluded that both discordance of imputed SNP genotypes and differences in allele frequencies, after imputation using different reference data sets, may be used to identify and remove poorly imputed SNP. PMID:24472132
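
    The filtering rule reported in this abstract (remove SNP with more than 20% discordant genotypes between two imputation runs) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the 0/1/2 genotype coding, the run names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: SNP id -> one genotype per animal, coded 0/1/2.
Genotypes = Dict[str, List[int]]


def discordance_rate(geno_a: List[int], geno_b: List[int]) -> float:
    """Fraction of animals whose imputed genotype differs between two runs."""
    if len(geno_a) != len(geno_b) or not geno_a:
        raise ValueError("genotype vectors must be non-empty and equally long")
    mismatches = sum(1 for a, b in zip(geno_a, geno_b) if a != b)
    return mismatches / len(geno_a)


def split_by_discordance(
    run_a: Genotypes, run_b: Genotypes, threshold: float = 0.20
) -> Tuple[List[str], List[str]]:
    """Return (kept, removed) SNP ids; a SNP is removed when its
    discordance rate is strictly greater than the threshold."""
    kept, removed = [], []
    for snp in sorted(set(run_a) & set(run_b)):
        rate = discordance_rate(run_a[snp], run_b[snp])
        (removed if rate > threshold else kept).append(snp)
    return kept, removed


# Toy data: the same five animals imputed with two reference populations.
run_ref1 = {"snp1": [0, 1, 2, 1, 0], "snp2": [0, 1, 2, 1, 0], "snp3": [2, 2, 1, 0, 0]}
run_ref2 = {"snp1": [0, 1, 2, 1, 0], "snp2": [1, 1, 2, 0, 0], "snp3": [2, 2, 1, 0, 1]}

kept, removed = split_by_discordance(run_ref1, run_ref2)
print("kept:", kept)
print("removed:", removed)
```

    In the toy data, snp2 disagrees in two of five animals and is dropped, while snp3 disagrees in exactly one of five and is retained because the cut-off here is "more than 20%".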

  16. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean

  17. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...

  18. Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...

  19. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  20. Transcriptome-facilitated development of SNPs for the Sonoran Desert rock fig, Ficus petiolaris (Moraceae)

    PubMed Central

    Davis, Nicholas G.; Houston, Derek D.; Nason, John D.

    2015-01-01

    Premise of the study: Single-nucleotide polymorphism (SNP) primers were developed for a native North American desert fig, Ficus petiolaris (Moraceae), to provide markers for population genetic studies designed to quantify patterns of gene flow across a complex landscape. Methods and Results: Transcriptome sequencing and bioinformatic protocols were implemented to discover SNPs in single-copy protein-coding genes. Multiplexes of 30 nuclear and 24 organellar (chloroplast and mitochondrial) SNPs were selected for primer development and genotyping on the Sequenom MassARRAY System. Of these 54 loci, 49 reliably amplified across a panel of 96 F. petiolaris individuals. Conclusions: This study has provided SNP primers that can be applied in future studies investigating population genetics of F. petiolaris and its coevolution with associated pollinating and nonpollinating fig wasps. PMID:26191464

  1. Coding SNPs as intrinsic markers for sample tracking in large-scale transcriptome studies

    PubMed Central

    Xu, Weihong; Gao, Hong; Seok, Junhee; Wilhelmy, Julie; Mindrinos, Michael N.; Davis, Ronald W.; Xiao, Wenzhong

    2014-01-01

    Large-scale transcriptome profiling in clinical studies often involves assaying multiple samples of a patient to monitor disease progression, treatment effect, and host response in multiple tissues. Such profiling is prone to human error, which often results in mislabeled samples. Here, we present a method to detect mislabeled sample outliers using coding single nucleotide polymorphisms (cSNPs) specifically designed on the microarray and demonstrate that the mislabeled samples can be efficiently identified by either simple clustering of allele-specific expression scores or Mahalanobis distance-based outlier detection method. Based on our results, we recommend the incorporation of cSNPs into future transcriptome array designs as intrinsic markers for sample tracking. PMID:22668418

  2. Collective effects of SNPs on transgenerational inheritance in Caenorhabditis elegans and budding yeast.

    PubMed

    Zhu, Zuobin; Man, Xian; Xia, Mengying; Huang, Yimin; Yuan, Dejian; Huang, Shi

    2015-07-01

    We studied the collective effects of single nucleotide polymorphisms (SNPs) on transgenerational inheritance in Caenorhabditis elegans recombinant inbred advanced intercross lines (RIAILs) and yeast segregants. We divided the RIAILs and segregants into two groups of high and low minor allele content (MAC). RIAILs with higher MAC needed fewer generations of benzaldehyde training to gain a stable olfactory imprint and showed a greater change from normal after benzaldehyde training. Yeast segregants with higher MAC showed a more dramatic shortening of the lag phase length after ethanol exposure. The short lag phase as acquired by ethanol training was more dramatically lost after recovery in ethanol-free medium for the high MAC group. We also found a preferential association between MAC and traits linked with a higher number of additive QTLs. These results suggest a role for the collective effects of SNPs in transgenerational inheritance, and may help explain human variations in disease susceptibility. PMID:25882787

  3. Recirculating electric air filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  4. HEPA filter dissolution process

    DOEpatents

    Brewer, K.N.; Murphy, J.A.

    1994-02-22

    A process is described for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal. 4 figures.

  5. Hepa filter dissolution process

    DOEpatents

    Brewer, Ken N. (Arco, ID); Murphy, James A. (Idaho Falls, ID)

    1994-01-01

    A process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  6. HEPA filter dissolution process

    SciTech Connect

    Brewer, K.N.; Murphy, J.A.

    1992-12-31

    This invention is comprised of a process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  7. Recirculating electric air filter

    DOEpatents

    Bergman, Werner (Pleasanton, CA)

    1986-01-01

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  8. Impulsiveness mediates the association between GABRA2 SNPs and lifetime alcohol problems

    PubMed Central

    Villafuerte, Sandra; Strumba, Viktorya; Stoltenberg, Scott F.; Zucker, Robert A.; Burmeister, Margit

    2013-01-01

    Genetic variants in GABRA2 have previously been shown to be associated with alcohol measures, EEG β waves, and impulsiveness-related traits. Impulsiveness is a behavioral risk factor for alcohol and other substance abuse. Here, we tested association between 11 variants in GABRA2 with NEO-impulsiveness and problem drinking. Our sample of 295 unrelated adult subjects was from a community of families with at least one male with DSM-IV Alcohol use diagnosis, and from a socioeconomically comparable control group. Ten GABRA2 SNPs were associated with the NEO-impulsiveness (p < 0.03). The alleles associated with higher impulsiveness correspond to the minor alleles identified in previous alcohol dependence studies. All ten SNPs are in LD with each other and represent one effect on impulsiveness. Four SNPs and the corresponding haplotype from intron 3 to intron 4 were also associated with Lifetime Alcohol Problems Score (LAPS, p < 0.03) (not corrected for multiple testing). Impulsiveness partially mediates (22.6% average) this relation between GABRA2 and LAPS. Our results suggest that GABRA2 variation in the region between introns 3 and 4 is associated with impulsiveness and this effect partially influences the development of alcohol problems, but a direct effect of GABRA2 on problem drinking remains. A potential functional SNP rs279827, located next to a splice site, is located in the most significant region for both impulsiveness and LAPS. The high degree of LD among nine of these SNPs and the conditional analyses we have performed suggest that all variants represent one signal. PMID:23566244

  9. Prediction of CYP3A4 enzyme activity using haplotype tag SNPs in African Americans

    PubMed Central

    Perera, MA; Thirumaran, RK; Cox, NJ; Hanauer, S; Das, S; Brimer-Cline, C; Lamba, V; Schuetz, EG; Ratain, MJ; Di Rienzo, A

    2009-01-01

    The CYP3A locus encodes hepatic enzymes that metabolize many clinically used drugs. However, there is marked interindividual variability in enzyme expression and clearance of drugs metabolized by these enzymes. We utilized comparative genomics and computational prediction of transcriptional factor binding sites to evaluate regions within CYP3A that were most likely to contribute to this variation. We then used a haplotype tagging single-nucleotide polymorphisms (htSNPs) approach to evaluate the entire locus with the fewest number of maximally informative SNPs. We investigated the association between these htSNPs and in vivo CYP3A enzyme activity using a single-point IV midazolam clearance assay. We found associations between the midazolam phenotype and age, diagnosis of hypertension and one htSNP (141689) located upstream of CYP3A4. 141689 lies near the xenobiotic responsive enhancer module (XREM) regulatory region of CYP3A4. Cell-based studies show increased transcriptional activation with the minor allele at 141689, in agreement with the in vivo association study findings. This study marks the first systematic evaluation of coding and noncoding variation that may contribute to CYP3A phenotypic variability. PMID:18825162

  10. Enrichment of Minor Alleles of Common SNPs and Improved Risk Prediction for Parkinson's Disease

    PubMed Central

    Zhu, Zuobin; Yuan, Dejian; Luo, Denghui; Lu, Xitong; Huang, Shi

    2015-01-01

    Parkinson disease (PD) is the second most common neurodegenerative disorder in the aged population and thought to involve many genetic loci. While a number of individual single nucleotide polymorphisms (SNPs) have been linked with PD, many remain to be found and no known markers or combinations of them have a useful predictive value for sporadic PD cases. The collective effects of genome wide minor alleles of common SNPs, or the minor allele content (MAC) in an individual, have recently been shown to be linked with quantitative variations of numerous complex traits in model organisms with higher MAC more likely linked with lower fitness. Here we found that PD cases had higher MAC than matched controls. A set of 37564 SNPs with MA (MAF < 0.4) more common in cases (P < 0.05) was found to have the best predictive accuracy. A weighted risk score calculated by using this set can predict 2% of PD cases (100% specificity), which is comparable to using familial PD genes to identify familial PD cases. These results suggest a novel genetic component in PD and provide a useful genetic method to identify a small fraction of PD cases. PMID:26207627
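
    The minor allele content (MAC) comparison described in this abstract can be illustrated with a short sketch. The abstract does not give a formula, so the definition used here (minor alleles carried divided by twice the number of scored SNPs) is an assumption, as are the function names and the toy genotypes; the weighted risk score from the study is not reproduced.

```python
from typing import List


def minor_allele_content(minor_allele_counts: List[int]) -> float:
    """MAC of one individual under the assumed definition:
    (number of minor alleles carried) / (2 * number of scored SNPs).
    Each entry is the count of minor alleles (0, 1 or 2) at one SNP."""
    if not minor_allele_counts:
        raise ValueError("at least one SNP is required")
    if any(c not in (0, 1, 2) for c in minor_allele_counts):
        raise ValueError("genotypes must be coded as 0, 1 or 2 minor alleles")
    return sum(minor_allele_counts) / (2 * len(minor_allele_counts))


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Toy genotypes (hypothetical): rows are individuals, columns are SNPs.
cases = [[1, 2, 1, 0], [2, 1, 1, 1]]
controls = [[0, 1, 0, 0], [1, 0, 1, 0]]

case_mac = mean([minor_allele_content(g) for g in cases])
control_mac = mean([minor_allele_content(g) for g in controls])
print(f"mean MAC, cases: {case_mac:.4f}; controls: {control_mac:.4f}")
```

    A real analysis would first restrict the SNP set (the abstract mentions minor alleles with MAF < 0.4 that are more common in cases) and would test the case-control difference statistically rather than only comparing means.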

  11. Genetic association of SNPs in the FTO gene and predisposition to obesity in Malaysian Malays

    PubMed Central

    Apalasamy, Y.D.; Ming, M.F.; Rampal, S.; Bulgiba, A.; Mohamed, Z.

    2012-01-01

    The common variants in the fat mass- and obesity-associated (FTO) gene have been previously found to be associated with obesity in various adult populations. The objective of the present study was to investigate whether the single nucleotide polymorphisms (SNPs) and linkage disequilibrium (LD) blocks in various regions of the FTO gene are associated with predisposition to obesity in Malaysian Malays. Thirty-one FTO SNPs were genotyped in 587 (158 obese and 429 non-obese) Malaysian Malay subjects. Obesity traits and lipid profiles were measured and single-marker association testing, LD testing, and haplotype association analysis were performed. LD analysis of the FTO SNPs revealed the presence of 57 regions with complete LD (D' = 1.0). In addition, we detected the association of rs17817288 with low-density lipoprotein cholesterol. The FTO gene may therefore be involved in lipid metabolism in Malaysian Malays. Two haplotype blocks were present in this region of the FTO gene, but no particular haplotype was found to be significantly associated with an increased risk of obesity in Malaysian Malays. PMID:22911346
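
    For readers unfamiliar with the D' statistic quoted in this abstract (complete LD corresponds to D' = 1.0), the standard two-locus calculation can be sketched as follows. The sketch assumes phased haplotype counts for two biallelic SNPs are already available; the function name and the toy counts are hypothetical, and the study's own software is not identified in the abstract.

```python
def d_prime(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """|D'| for two biallelic loci from phased haplotype counts.

    D = p(AB) - p(A) * p(B); D' rescales D by its maximum possible
    magnitude given the allele frequencies.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    if d == 0:
        return 0.0
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return abs(d) / d_max


# One haplotype (aB) is never observed, so the pair is in complete LD.
print(d_prime(40, 10, 0, 50))
# All four haplotypes equally frequent: no LD.
print(d_prime(25, 25, 25, 25))
```

    D' reaches 1.0 whenever at least one of the four possible haplotypes is absent, which is why many SNP pairs in a single gene can show complete LD even when their allele frequencies differ.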

  12. Identification of Deleterious SNPs and Their Effects on Structural Level in CHRNA3 Gene.

    PubMed

    Chandramohan, Vivek; Nagaraju, Navya; Rathod, Shrikant; Kaphle, Anubhav; Muddapur, Uday

    2015-08-01

    The aim of our study is to identify probable deleterious genetic variations that can alter the expression and the function of the CHRNA3 gene using in silico methods. Of the 2305 SNPs identified in the CHRNA3 gene, 115 were found to be non-synonymous and 12 and 15 nsSNPs were found to be in the 5' and 3' UTRs, respectively. Further, out of the 115 nsSNPs investigated, eight were predicted to be deleterious by both SIFT and PredictSNP servers. The major mutations predicted to affect the structure of the protein are phenylalanine to valine (Y43V) and lysine to asparagine (K216N) as shown by the trajectory run in molecular dynamics studies. The random transition of the protein structures over the simulation period caused by these mutations hints at how the native state is distorted, which could lead to the loss of structural stability and functionality of the nicotinic acetylcholine receptor subunit α-3 protein. Based on this work, we propose that the nsSNPs with SNP IDs rs75495285 and rs76821682 will have comparatively more deleterious effects than the other predicted mutations in destabilizing the protein structure. PMID:26002565

  13. Y-chromosomal SNPs in Finno-Ugric-speaking populations analyzed by minisequencing on microarrays.

    PubMed

    Raitio, M; Lindroos, K; Laukkanen, M; Pastinen, T; Sistonen, P; Sajantila, A; Syvänen, A C

    2001-03-01

    An increasing number of single nucleotide polymorphisms (SNPs) on the Y chromosome are being identified. To utilize the full potential of the SNP markers in population genetic studies, new genotyping methods with high throughput are required. We describe a microarray system based on the minisequencing single nucleotide primer extension principle for multiplex genotyping of Y-chromosomal SNP markers. The system was applied for screening a panel of 25 Y-chromosomal SNPs in a unique collection of samples representing five Finno-Ugric populations. The specific minisequencing reaction provides 5-fold to infinite discrimination between the Y-chromosomal genotypes, and the microarray format of the system allows parallel and simultaneous analysis of large numbers of SNPs and samples. In addition to the SNP markers, five Y-chromosomal microsatellite loci were typed. Altogether 10,000 genotypes were generated to assess the genetic diversity in these population samples. Six of the 25 SNP markers (M9, Tat, SRY10831, M17, M12, 92R7) were polymorphic in the analyzed populations, yielding six distinct SNP haplotypes. The microsatellite data were used to study the genetic structure of two major SNP haplotypes in the Finns and the Saami in more detail. We found that the most common haplotypes are shared between the Finns and the Saami, and that the SNP haplotypes show regional differences within the Finns and the Saami, which supports the hypothesis of two separate settlement waves to Finland. PMID:11230171

  14. Y-Chromosomal SNPs in Finno-Ugric-Speaking Populations Analyzed by Minisequencing on Microarrays

    PubMed Central

    Raitio, Mirja; Lindroos, Katarina; Laukkanen, Minna; Pastinen, Tomi; Sistonen, Pertti; Sajantila, Antti; Syvänen, Ann-Christine

    2001-01-01

    An increasing number of single nucleotide polymorphisms (SNPs) on the Y chromosome are being identified. To utilize the full potential of the SNP markers in population genetic studies, new genotyping methods with high throughput are required. We describe a microarray system based on the minisequencing single nucleotide primer extension principle for multiplex genotyping of Y-chromosomal SNP markers. The system was applied for screening a panel of 25 Y-chromosomal SNPs in a unique collection of samples representing five Finno-Ugric populations. The specific minisequencing reaction provides 5-fold to infinite discrimination between the Y-chromosomal genotypes, and the microarray format of the system allows parallel and simultaneous analysis of large numbers of SNPs and samples. In addition to the SNP markers, five Y-chromosomal microsatellite loci were typed. Altogether 10,000 genotypes were generated to assess the genetic diversity in these population samples. Six of the 25 SNP markers (M9, Tat, SRY10831, M17, M12, 92R7) were polymorphic in the analyzed populations, yielding six distinct SNP haplotypes. The microsatellite data were used to study the genetic structure of two major SNP haplotypes in the Finns and the Saami in more detail. We found that the most common haplotypes are shared between the Finns and the Saami, and that the SNP haplotypes show regional differences within the Finns and the Saami, which supports the hypothesis of two separate settlement waves to Finland. PMID:11230171

  15. Identification of Sex-Linked SNPs and Sex-Determining Regions in the Yellowtail Genome.

    PubMed

    Koyama, Takashi; Ozaki, Akiyuki; Yoshida, Kazunori; Suzuki, Junpei; Fuji, Kanako; Aoki, Jun-ya; Kai, Wataru; Kawabata, Yumi; Tsuzaki, Tatsuo; Araki, Kazuo; Sakamoto, Takashi

    2015-08-01

    Unlike the conservation of sex-determining (SD) modes seen in most mammals and birds, teleost fishes exhibit a wide variety of SD systems and genes. Hence, the study of SD genes and sex chromosome turnover in fish is one of the most interesting topics in evolutionary biology. To increase resolution of the SD gene evolutionary trajectory in fish, identification of the SD gene in more fish species is necessary. In this study, we focused on the yellowtail, a species widely cultivated in Japan. It is a member of family Carangidae in which no heteromorphic sex chromosome has been observed, and no SD gene has been identified to date. By performing linkage analysis and BAC walking, we identified a genomic region and SNPs with complete linkage to yellowtail sex. Comparative genome analysis revealed the yellowtail SD region ancestral chromosome structure as medaka-fugu. Two inversions occurred in the yellowtail lineage after it diverged from the yellowtail-medaka ancestor. An association study using wild yellowtails and the SNPs developed from BAC ends identified two SNPs that can reasonably distinguish the sexes. Therefore, these will be useful genetic markers for yellowtail breeding. Based on a comparative study, it was suggested that a PDZ domain containing the GIPC protein might be involved in yellowtail sex determination. The homomorphic sex chromosomes widely observed in the Carangidae suggest that this family could be a suitable marine fish model to investigate the early stages of sex chromosome evolution, for which our results provide a good starting point. PMID:25975833

  16. Impact of Single Nucleotide Polymorphisms (SNPs) on Immunosuppressive Therapy in Lung Transplantation

    PubMed Central

    Ruiz, Jesus; Herrero, María José; Bosó, Virginia; Megías, Juan Eduardo; Hervás, David; Poveda, Jose Luis; Escrivá, Juan; Pastor, Amparo; Solé, Amparo; Aliño, Salvador Francisco

    2015-01-01

    Lung transplant patients present important variability in immunosuppressant blood concentrations during the first months after transplantation. Pharmacogenetics could explain part of this interindividual variability. We evaluated SNPs in genes that have previously shown correlations in other kinds of solid organ transplantation, namely ABCB1 and CYP3A5 genes with tacrolimus (Tac) and ABCC2, UGT1A9 and SLCO1B1 genes with mycophenolic acid (MPA), during the first six months after lung transplantation (51 patients). The genotype was correlated to the trough blood drug concentrations corrected for dose and body weight (C0/Dc). The ABCB1 variant in rs1045642 was associated with significantly higher Tac concentration, at six months post-transplantation (CT vs. CC). In the MPA analysis, CT patients in ABCC2 rs3740066 presented significantly lower blood concentrations than CC or TT, three months after transplantation. Other tendencies, confirming previously expected results, were found associated with the rest of studied SNPs. An interesting trend was recorded for the incidence of acute rejection according to NOD2/CARD15 rs2066844 (CT: 27.9%; CC: 12.5%). Relevant SNPs related to Tac and MPA in other solid organ transplants also seem to be related to the efficacy and safety of treatment in the complex setting of lung transplantation. PMID:26307985

  17. [Association analysis between SNPs of the growth hormone receptor gene and growth traits in arctic fox].

    PubMed

    DU, Zhi-Heng; Liu, Zong-Yue; Bai, Xiu-Juan

    2010-06-01

    Using PCR-based single-strand conformation polymorphism (PCR-SSCP) analysis and DNA sequencing, single nucleotide polymorphisms (SNPs) of the growth hormone receptor (GHR) gene were detected in an arctic fox population. Correlation analysis between GHR polymorphisms and growth traits was carried out using the appropriate model. Four SNPs, G3A in the 5'UTR, C99T in the first exon, and T59C and G65A in the fifth exon, were identified in the arctic fox GHR gene. The G3A and C99T polymorphisms of GHR were associated with female fox body weight (P<0.05), and the T59C and G65A polymorphisms of GHR were associated with male fox body weight (P<0.05) and the skin length of the female fox (P<0.01). Therefore, marker-assisted selection on body weight and skin length using these SNPs can be applied to breed large, high-quality arctic foxes. PMID:20566464

  18. Impact of Single Nucleotide Polymorphisms (SNPs) on Immunosuppressive Therapy in Lung Transplantation.

    PubMed

    Ruiz, Jesus; Herrero, María José; Bosó, Virginia; Megías, Juan Eduardo; Hervás, David; Poveda, Jose Luis; Escrivá, Juan; Pastor, Amparo; Solé, Amparo; Aliño, Salvador Francisco

    2015-01-01

    Lung transplant patients present important variability in immunosuppressant blood concentrations during the first months after transplantation. Pharmacogenetics could explain part of this interindividual variability. We evaluated SNPs in genes that have previously shown correlations in other kinds of solid organ transplantation, namely ABCB1 and CYP3A5 genes with tacrolimus (Tac) and ABCC2, UGT1A9 and SLCO1B1 genes with mycophenolic acid (MPA), during the first six months after lung transplantation (51 patients). The genotype was correlated to the trough blood drug concentrations corrected for dose and body weight (C0/Dc). The ABCB1 variant in rs1045642 was associated with significantly higher Tac concentration, at six months post-transplantation (CT vs. CC). In the MPA analysis, CT patients in ABCC2 rs3740066 presented significantly lower blood concentrations than CC or TT, three months after transplantation. Other tendencies, confirming previously expected results, were found associated with the rest of studied SNPs. An interesting trend was recorded for the incidence of acute rejection according to NOD2/CARD15 rs2066844 (CT: 27.9%; CC: 12.5%). Relevant SNPs related to Tac and MPA in other solid organ transplants also seem to be related to the efficacy and safety of treatment in the complex setting of lung transplantation. PMID:26307985

  19. Functional classification of 15 million SNPs detected from diverse chicken populations

    PubMed Central

    Gheyas, Almas A.; Boschiero, Clarissa; Eory, Lel; Ralph, Hannah; Kuo, Richard; Woolliams, John A.; Burt, David W.

    2015-01-01

    Next-generation sequencing has prompted a surge of discovery of millions of genetic variants from vertebrate genomes. Besides applications in genetic association and linkage studies, a fraction of these variants will have functional consequences. This study describes detection and characterization of 15 million SNPs from chicken genome with the goal to predict variants with potential functional implications (pfVars) from both coding and non-coding regions. The study reports: 183K amino acid-altering SNPs of which 48% predicted as evolutionary intolerant, 13K splicing variants, 51K likely to alter RNA secondary structures, 500K within most conserved elements and 3K from non-coding RNAs. Regions of local fixation within commercial broiler and layer lines were investigated as potential selective sweeps using genome-wide SNP data. Relationships with phenotypes, if any, of the pfVars were explored by overlaying the sweep regions with known QTLs. Based on this, the candidate genes and/or causal mutations for a number of important traits are discussed. Although the fixed variants within sweep regions were enriched with non-coding SNPs, some non-synonymous-intolerant mutations reached fixation, suggesting their possible adaptive advantage. The results presented in this study are expected to have important implications for future genomic research to identify candidate causal mutations and in poultry breeding. PMID:25926514

  20. Rank and Order: Evaluating the Performance of SNPs for Individual Assignment in a Non-Model Organism

    PubMed Central

    Storer, Caroline G.; Pascal, Carita E.; Roberts, Steven B.; Templin, William D.; Seeb, Lisa W.; Seeb, James E.

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: FST, informativeness (In), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from FST, In, and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290
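
The ranking-then-panel step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes biallelic loci, equal population weights, and a simple Nei-style estimator F_ST = (H_T - H_S) / H_T, and the locus names, allele frequencies, and function names are all hypothetical.

```python
from typing import Dict, List


def locus_fst(freqs: List[float]) -> float:
    """Nei-style F_ST for one biallelic locus.

    `freqs` holds the frequency of one allele in each population.
    Assumption: populations are weighted equally.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # monomorphic locus carries no differentiation signal
    # mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def top_ranked_panel(allele_freqs: Dict[str, List[float]], panel_size: int) -> List[str]:
    """Rank loci by F_ST (highest first) and keep the top `panel_size`."""
    ranked = sorted(
        allele_freqs,
        key=lambda locus: locus_fst(allele_freqs[locus]),
        reverse=True,
    )
    return ranked[:panel_size]


# Hypothetical allele frequencies in two populations
example_freqs = {
    "snp_a": [0.1, 0.9],
    "snp_b": [0.5, 0.5],
    "snp_c": [0.4, 0.6],
}
panel = top_ranked_panel(example_freqs, panel_size=2)
```

The study built 48- and 96-SNP panels in this top-ranked fashion for each ranking method and then compared assignment accuracy; the assignment test itself is outside the scope of this sketch.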

  1. Rank and order: evaluating the performance of SNPs for individual assignment in a non-model organism.

    PubMed

    Storer, Caroline G; Pascal, Carita E; Roberts, Steven B; Templin, William D; Seeb, Lisa W; Seeb, James E

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: F(ST), informativeness (I(n)), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from F(ST), I(n), and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290

  2. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references

    PubMed Central

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-01-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping of disease association signals from case–control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5–98.8% obtained with the original HIBAG references to 95.2–99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets. PMID:25707395
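
How a call threshold acts on imputed HLA genotypes can be sketched in a few lines of Python. This is an illustration only, not HIBAG code: it assumes each imputed genotype carries a posterior probability and that calls falling below the threshold are treated as uncalled. The record type, field names, sample values, and the choice of `>=` are all assumptions.

```python
from typing import List, NamedTuple, Tuple


class HlaCall(NamedTuple):
    """Hypothetical record for one imputed HLA genotype."""
    sample_id: str
    genotype: str
    posterior: float  # posterior probability of the best-guess genotype


def apply_call_threshold(
    calls: List[HlaCall], call_threshold: float = 0.4
) -> Tuple[List[HlaCall], float]:
    """Keep calls whose posterior meets the threshold; also return the call rate."""
    kept = [c for c in calls if c.posterior >= call_threshold]
    call_rate = len(kept) / len(calls) if calls else 0.0
    return kept, call_rate


# Hypothetical imputation output for three samples
calls = [
    HlaCall("s1", "15:01/04:05", 0.95),
    HlaCall("s2", "15:01/15:01", 0.45),
    HlaCall("s3", "09:01/04:05", 0.30),
]
kept_ct04, rate_ct04 = apply_call_threshold(calls, call_threshold=0.4)
kept_ct05, rate_ct05 = apply_call_threshold(calls, call_threshold=0.5)
```

With these made-up posteriors, the lower threshold of 0.4 retains one more call than 0.5 does, which is the trade-off between call rate and stringency that a CT recommendation has to balance.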

  3. Co-regulated Transcripts Associated to Cooperating eSNPs Define Bi-fan Motifs in Human Gene Networks

    PubMed Central

    Kreimer, Anat; Pe'er, Itsik

    2014-01-01

    Associations between the level of single transcripts and single corresponding genetic variants, expression single nucleotide polymorphisms (eSNPs), have been extensively studied and reported. However, most expression traits are complex, involving the cooperative action of multiple SNPs at different loci affecting multiple genes. Finding these cooperating eSNPs by exhaustive search has proven to be statistically challenging. In this paper we utilized availability of sequencing data with transcriptional profiles in the same cohorts to identify two kinds of usual suspects: eSNPs that alter coding sequences or eSNPs within the span of transcription factors (TFs). We utilize a computational framework for considering triplets, each comprised of a SNP and two associated genes. We examine pairs of triplets with such cooperating source eSNPs that are both associated with the same pair of target genes. We characterize such quartets through their genomic, topological and functional properties. We establish that this regulatory structure of cooperating quartets is frequent in real data, but is rarely observed in permutations. eSNP sources are mostly located on different chromosomes and away from their targets. In the majority of quartets, SNPs affect the expression of the two gene targets independently of one another, suggesting a mutually independent rather than a directionally dependent effect. Furthermore, the directions in which the minor allele count of the SNP affects gene expression within quartets are consistent, so that the two source eSNPs either both have the same effect on the target genes or both affect one gene in the opposite direction to the other. Same-effect eSNPs are observed more often than expected by chance. Cooperating quartets reported here in a human system might correspond to bi-fans, a known network motif of four nodes previously described in model organisms. Overall, our analysis offers insights regarding the fine motif structure of human regulatory networks. PMID:25210734

  4. Backward multiple imputation estimation of the conditional lifetime expectancy function with application to censored human longevity data

    PubMed Central

    Kong, Jing; Klein, Barbara E. K.; Klein, Ronald; Wahba, Grace

    2015-01-01

    The conditional lifetime expectancy function (LEF) is the expected lifetime of a subject given survival past a certain time point and the values of a set of explanatory variables. This function is attractive to researchers because it summarizes the entire residual life distribution and has an easy interpretation compared with the popularly used hazard function. In this paper, we propose a general framework of backward multiple imputation for estimating the conditional LEF and the variance of the estimator in the right-censoring setting. Simulation studies are conducted to investigate the empirical properties of the proposed estimator and the corresponding variance estimator. We demonstrate the method on the Beaver Dam Eye Study data, where the expected human lifetime is modeled with smoothing-spline ANOVA given the covariates information including sex, lifestyle factors, and disease variables. PMID:26371300
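
The verbal definition above can be written out as a formula. The notation here is an assumption, not the paper's: T is the lifetime, x the covariate vector, and S(u | x) = P(T > u | X = x) the conditional survival function.

```latex
\mathrm{LEF}(t \mid x)
  \;=\; \mathbb{E}\left[\, T \mid T > t,\; X = x \,\right]
  \;=\; t + \frac{\int_{t}^{\infty} S(u \mid x)\, du}{S(t \mid x)}
```

The second term is the mean residual life at time t, so the LEF is determined by the whole tail of the survival curve beyond t. The hazard function, by contrast, describes only the instantaneous risk at t.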

  5. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    PubMed Central

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  6. A set of EST-SNPs for map saturation and cultivar identification in melon

    PubMed Central

    Deleu, Wim; Esteras, Cristina; Roig, Cristina; González-To, Mireia; Fernández-Silva, Iria; Gonzalez-Ibeas, Daniel; Blanca, José; Aranda, Miguel A; Arús, Pere; Nuez, Fernando; Monforte, Antonio J; Picó, Maria Belén; Garcia-Mas, Jordi

    2009-01-01

    Background: There are few genomic tools available in melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed mainly using marker types such as simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and amplified fragment length polymorphisms (AFLP) in different mapping populations. There is a growing need for saturating the genetic map with single nucleotide polymorphisms (SNP), more amenable for high throughput analysis, especially if these markers are located in gene coding regions, to provide functional markers. Expressed sequence tags (ESTs) from melon are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs. Results: EST-based SNPs were discovered after resequencing ESTs between the parental lines of the PI 161375 (SC) × 'Piel de sapo' (PS) genetic map or using in silico SNP information from EST databases. In total 200 EST-based SNPs were mapped in the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the genetic diversity of the species. SNP analysis correctly reflected the genetic relationships compared with other marker systems, being able to distinguish all the accessions and cultivars. Conclusion: This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 × 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon accessions as efficiently as with SSR markers, and these markers may also be useful for cultivar identification in Occidental melon varieties. PMID:19604363

  7. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

    PubMed Central

    van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimäki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

    2015-01-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  8. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer

    PubMed Central

    Al-Tassan, Nada A.; Whiffin, Nicola; Hosking, Fay J.; Palles, Claire; Farrington, Susan M.; Dobbins, Sara E.; Harris, Rebecca; Gorman, Maggie; Tenesa, Albert; Meyer, Brian F.; Wakil, Salma M.; Kinnersley, Ben; Campbell, Harry; Martin, Lynn; Smith, Christopher G.; Idziaszczyk, Shelley; Barclay, Ella; Maughan, Timothy S.; Kaplan, Richard; Kerr, Rachel; Kerr, David; Buchannan, Daniel D.; Ko Win, Aung; Hopper, John; Jenkins, Mark; Lindor, Noralane M.; Newcomb, Polly A.; Gallinger, Steve; Conti, David; Schumacher, Fred; Casey, Graham; Dunlop, Malcolm G.; Tomlinson, Ian P.; Cheadle, Jeremy P.; Houlston, Richard S.

    2015-01-01

    Genome-wide association studies (GWAS) of colorectal cancer (CRC) have identified 23 susceptibility loci thus far. Analyses of previously conducted GWAS indicate additional risk loci are yet to be discovered. To identify novel CRC susceptibility loci, we conducted a new GWAS and performed a meta-analysis with five published GWAS (totalling 7,577 cases and 9,979 controls of European ancestry), imputing genotypes utilising the 1000 Genomes Project. The combined analysis identified new, significant associations with CRC at 1p36.2 marked by rs72647484 (minor allele frequency [MAF] = 0.09) near CDC42 and WNT4 (P = 1.21 × 10−8, odds ratio [OR] = 1.21) and at 16q24.1 marked by rs16941835 (MAF = 0.21, P = 5.06 × 10−8; OR = 1.15) within the long non-coding RNA (lncRNA) RP11-58A18.1 and ~500 kb from the nearest coding gene FOXL1. Additionally we identified a promising association at 10p13 with rs10904849 intronic to CUBN (MAF = 0.32, P = 7.01 × 10−8; OR = 1.14). These findings provide further insights into the genetic and biological basis of inherited genetic susceptibility to CRC. Additionally, our analysis further demonstrates that imputation can be used to exploit GWAS data to identify novel disease-causing variants. PMID:25990418

  9. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    PubMed

    Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  10. ARRANGEMENT FOR REPLACING FILTERS

    DOEpatents

    Blomgren, R.A.; Bohlin, N.J.C.

    1957-08-27

    An improved filtered air exhaust system which may be continually operated during the replacement of the filters without the escape of unfiltered air is described. This is accomplished by hermetically sealing the box-like filter containers in a rectangular tunnel with neoprene-covered sponge rubber sealing rings coated with a silicone-impregnated pneumatic grease. The tunnel through which the filters are pushed is normal to the exhaust air duct. A number of unused filters are in line behind the filters in use, and are moved by a hydraulic ram so that a fresh filter is positioned in the air duct. The used filter is pushed into a waiting receptacle and is suitably disposed of. This device permits a rapid and safe replacement of a radiation-contaminated filter without interruption to the normal flow of exhaust air.

  11. Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The dissection of complex traits of economic importance for the pig industry requires the availability of a significant number of genetic markers, such as SNPs. This study was conducted in order to discover thousands of porcine SNPs using next generation sequencing technologies and use those SNPs, a...

  12. Rigid porous filter

    DOEpatents

    Chiang, Ta-Kuan (Morgantown, WV); Straub, Douglas L. (Morgantown, WV); Dennis, Richard A. (Morgantown, WV)

    2000-01-01

    The present invention involves a porous rigid filter including a plurality of concentric filtration elements having internal flow passages and forming external flow passages there between. The present invention also involves a pressure vessel containing the filter for the removal of particulates from high pressure particulate containing gases, and further involves a method for using the filter to remove such particulates. The present filter has the advantage of requiring fewer filter elements due to the high surface area-to-volume ratio provided by the filter, requires a reduced pressure vessel size, and exhibits enhanced mechanical design properties, improved cleaning properties, configuration options, modularity and ease of fabrication.

  13. Rethinking Stability of Silver Sulfide Nanoparticles (Ag2S-NPs) in the Aquatic Environment: Photoinduced Transformation of Ag2S-NPs in the Presence of Fe(III).

    PubMed

    Li, Lingxiangyu; Wang, Yawei; Liu, Qian; Jiang, Guibin

    2016-01-01

    The stability of engineered nanomaterials in a natural aquatic environment has drawn much attention over the past few years. Silver sulfide nanoparticles (Ag2S-NPs) are generally assumed to be stable in a natural environment as a result of their physicochemical properties; however, their stability may vary depending upon environmental conditions. Here, we investigated whether and how environmentally relevant factors including light irradiation, solution pH, inorganic salts, dissolved organic matter (DOM), and dissolved oxygen (DO), individually and in combination, influenced the stability of Ag2S-NPs in an aquatic environment. We show for the first time that transformation of Ag2S-NPs can indeed occur in the aqueous system with an environmentally relevant concentration of Fe(3+) under simulated solar irradiation and natural sunlight within a short time (96 h), along with significant changes in morphology and dissolution. The photoinduced transformation of Ag2S-NPs in the presence of Fe(3+) can be dramatically influenced by solution pH, Ca(2+)/Na(+), Cl(-)/SO4(2-), DOM, and DO. Moreover, Ag2S-NP dissolution increased within 28 h, followed by a rapid decline in the next 68 h, which may be a result of the reconstitution of small Ag2S-NPs. Taken together, this work is of importance to comprehensively evaluate the stability of Ag2S-NPs in an aquatic environment, improving our understanding of their potential risks to human and environmental health. PMID:26606372

  14. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, Harry S.; Thompson, Robert C.; Hubbard, Charles W.; Perkins, Richard W.

    1997-01-01

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant.

  15. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, H.S.; Thompson, R.C.; Hubbard, C.W.; Perkins, R.W.

    1997-03-25

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant. 5 figs.

  16. Extended active optical lattice filters: filter synthesis.

    PubMed

    Dabkowski, Mieczyslaw; El Nagdi, Amr; Hunt, Louis R; Liu, Ke; Macfarlane, Duncan L; Ramakrishna, Viswanath

    2010-04-01

    In this paper, we study the synthesis of asymptotically stable filters from a unit cell of a two-dimensional tunable lattice filter architecture consisting of four four-port couplers and four waveguides containing semiconductor optical amplifiers. Upper bounds on the number of gains that will produce a filter with a priori prescribed poles, for a specific system, are obtained. We also provide sufficient conditions on the reflection-type coefficients, characterizing each four-port coupler, which ensure that real-valued gains, taking values in [0,1], exist so that the filter is asymptotically stable. Finally, we motivate the notion of a transmission zero of a filter and discuss the possibility of simultaneously placing both poles and transmission zeros for the unit cell. PMID:20360832

  17. Cordierite silicon nitride filters

    SciTech Connect

    Sawyer, J.; Buchan, B.; Duiven, R.; Berger, M.; Cleveland, J.; Ferri, J.

    1992-02-01

    The objective of this project was to develop a silicon nitride based crossflow filter. This report summarizes the findings and results of the project. The project was phased, with Phase I consisting of filter material development and crossflow filter design. Phase II involved filter manufacturing, filter testing under simulated conditions and reporting the results. In Phase I, Cordierite Silicon Nitride (CSN) was developed and tested for permeability and strength. Target values for each of these parameters were established early in the program. The values were met by the material development effort in Phase I. The crossflow filter design effort proceeded by developing a macroscopic design based on required surface area and estimated stresses. Then the thermal and pressure stresses were estimated using finite element analysis. In Phase II of this program, the filter manufacturing technique was developed, and the manufactured filters were tested. The technique developed involved press-bonding extruded tiles to form a filter, producing a monolithic filter after sintering. Filters manufactured using this technique were tested at Acurex and at the Westinghouse Science and Technology Center. The filters did not delaminate during testing and operated at high collection efficiency with good cleanability. Further development in areas of sintering and filter design is recommended.

  18. Predicting functional regulatory SNPs in the human antimicrobial peptide genes DEFB1 and CAMP in tuberculosis and HIV/AIDS.

    PubMed

    Flores Saiffe Farías, Adolfo; Jaime Herrera López, Enrique; Moreno Vázquez, Cristopher Jorge; Li, Wentian; Prado Montes de Oca, Ernesto

    2015-12-01

    Single nucleotide polymorphisms (SNPs) in transcription factor binding sites (TFBSs) within gene promoter regions or enhancers can modify the transcription rate of genes related to complex diseases. These SNPs can be called regulatory SNPs (rSNPs). Data compiled from recent projects, such as the 1000 Genomes Project and ENCODE, have revealed essential information used to perform in silico prediction of the molecular and biological repercussions of SNPs within TFBSs. However, most of these studies are very limited, as they only analyze SNPs in coding regions or, when applied to promoters, do not integrate essential biological data like TFBSs, expression profiles, pathway analysis, homotypic redundancy (number of TFBSs for the same TF in a region), chromatin accessibility and others, which could lead to a more accurate prediction. Our aim was to integrate different data in a biologically coherent method to analyze the proximal promoter regions of two antimicrobial peptide genes, DEFB1 and CAMP, that are associated with tuberculosis (TB) and HIV/AIDS. We predicted SNPs within the promoter regions that are more likely to interact with transcription factors (TFs). We also assessed the impact of homotypic redundancy using a novel approach called the homotypic redundancy weight factor (HWF). Our results identified 10 SNPs, which putatively modify the binding affinity of 24 TFs previously identified as related to TB and HIV/AIDS expression profiles (e.g. KLF5, CEBPA and NFKB1 for TB; FOXP2, BRCA1, CEBPB, CREB1, EBF1 and ZNF354C for HIV/AIDS; and RUNX2, HIF1A, JUN/AP-1, NR4A2, EGR1 for both diseases). Validating with the OregAnno database and cell-specific functional/non functional SNPs from additional 13 genes, our algorithm achieved 53% sensitivity and 84.6% specificity to detect functional rSNPs using the DNAseI-HUP database. We are proposing our algorithm as a novel in silico method to detect true functional rSNPs in antimicrobial peptide genes. With further improvement, this novel method could be applied to other promoters in order to design probes and to discover new drug targets for complex diseases. PMID:26447748
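
    The sensitivity and specificity quoted in this abstract follow the usual confusion-matrix definitions. The sketch below is only an illustration: the counts are hypothetical (the abstract does not report them) and were picked so the ratios land near the quoted percentages.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly functional rSNPs that the predictor flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-functional SNPs that the predictor correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation counts, NOT taken from the study.
tp, fn = 8, 7
tn, fp = 11, 2

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

    A predictor tuned this way misses roughly half of the functional variants but rarely raises false alarms, which is the trade-off the reported figures describe.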

  19. A real-time PCR genotyping assay to detect FAD2A SNPs in peanuts (Arachis hypogaea L.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The high oleic (C18:1) phenotype in peanuts has been previously demonstrated to result from a homozygous recessive genotype (ol1ol1ol2ol2) in two homeologous fatty acid desaturase genes (FAD2A and FAD2B) with two key SNPs. These mutant SNPs, specifically G448A in FAD2A and 442insA in FAD2B, signifi...

  20. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs.

    PubMed

    Wu, XianMing; Hurst, Laurence D

    2016-02-01

    Where in genes do pathogenic mutations tend to occur and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, known also to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESE) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, five factors established as trends in ESE usage, are considered: relative position in exons, relative position in genes, flanking intron size, splice sites usage, and phase. We find that more than 76% of pathogenic SNPs are within 3-69 bp of exon ends where ESEs generally reside, this being 13% more than expected. Overall from enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20-45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they are found to occur preferentially in the terminal half of genes, in exons flanked by short introns and at the ends of phase (0,0) exons with 3' non-"AGgt" splice site. We suggest the concept of the "fragile" exon, one home to pathogenic SNPs owing to its vulnerability to splice disruption owing to low ESE density. PMID:26545919

  1. Bag filters for TPP

    SciTech Connect

    L.V. Chekalov; Yu.I. Gromov; V.V. Chekalov

    2007-05-15

    Cleaning of TPP flue gases with bag filters capable of pulsed regeneration is examined. A new filtering element is proposed, with a three-dimensional filtering material formed from a needle-broached cloth in which the filtration area is more than twice that of a conventional smooth bag. The design of a new FRMI type of modular filter is also proposed. A standard series of FRMI filters with a filtration area ranging from 800 to 16,000 m² is designed for an output of more than 1 million m³/h of cleaned gas. The new bag filter permits dry collection of sulfur oxides from waste gases at TPP operating on high-sulfur coals. The design of the filter makes it possible to replace filter elements without taking the entire unit out of service.

  2. Novel Backup Filter Device for Candle Filters

    SciTech Connect

    Bishop, B.; Goldsmith, R.; Dunham, G.; Henderson, A.

    2002-09-18

    The currently preferred means of particulate removal from process or combustion gas generated by advanced coal-based power production processes is filtration with candle filters. However, candle filters have not shown the requisite reliability to be commercially viable for hot gas clean up for either integrated gasifier combined cycle (IGCC) or pressurized fluid bed combustion (PFBC) processes. Even a single candle failure can lead to unacceptable ash breakthrough, which can result in (a) damage to highly sensitive and expensive downstream equipment, (b) unacceptably low system on-stream factor, and (c) unplanned outages. The U.S. Department of Energy (DOE) has recognized the need to have fail-safe devices installed within or downstream from candle filters. In addition to CeraMem, DOE has contracted with Siemens-Westinghouse, the Energy & Environmental Research Center (EERC) at the University of North Dakota, and the Southern Research Institute (SRI) to develop novel fail-safe devices. Siemens-Westinghouse is evaluating honeycomb-based filter devices on the clean-side of the candle filter that can operate up to 870 °C. The EERC is developing a highly porous ceramic disk with a sticky yet temperature-stable coating that will trap dust in the event of filter failure. SRI is developing the Full-Flow Mechanical Safeguard Device that provides a positive seal for the candle filter. Operation of the SRI device is triggered by the higher-than-normal gas flow from a broken candle. The CeraMem approach is similar to that of Siemens-Westinghouse and involves the development of honeycomb-based filters that operate on the clean-side of a candle filter. The overall objective of this project is to fabricate and test silicon carbide-based honeycomb failsafe filters for protection of downstream equipment in advanced coal conversion processes. The fail-safe filter, installed directly downstream of a candle filter, should have the capability for stopping essentially all particulate bypassing a broken or leaking candle while having a low enough pressure drop to allow the candle to be backpulse-regenerated. Forward-flow pressure drop should increase by no more than 20% because of incorporation of the fail-safe filter.

  3. SNP mining in Crassostrea gigas EST data: transferability to four other Crassostrea species, phylogenetic inferences and outlier SNPs under selection.

    PubMed

    Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

    2014-01-01

    Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies. PMID:25238392
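
    The transferability percentages in this abstract are simple ratios over the 377 primer pairs tested. A short sketch that recomputes them from the counts given in the abstract (the variable and function names are illustrative only):

```python
TOTAL_PRIMER_PAIRS = 377

# Primer pairs that amplified in each species, as reported in the abstract.
amplified = {
    "C. sikamea": 311,
    "C. angulata": 353,
    "C. hongkongensis": 254,
    "C. ariakensis": 253,
}


def transfer_rate(n_amplified: int, total: int = TOTAL_PRIMER_PAIRS) -> float:
    """Percentage of tested primer pairs that amplified in a species."""
    return 100.0 * n_amplified / total


rates = {species: round(transfer_rate(n), 1) for species, n in amplified.items()}
for species, rate in rates.items():
    print(f"{species}: {rate}%")
```

    The recomputed values agree with the reported 82.5%, 93.6%, 67.4% and 67.1%.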

  4. SNP Mining in Crassostrea gigas EST Data: Transferability to Four Other Crassostrea Species, Phylogenetic Inferences and Outlier SNPs under Selection

    PubMed Central

    Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

    2014-01-01

    Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies. PMID:25238392

  5. Genetic Association of Recovery from Eating Disorders: The Role of GABA Receptor SNPs

    PubMed Central

    Bloss, Cinnamon S; Berrettini, Wade; Bergen, Andrew W; Magistretti, Pierre; Duvvuri, Vikas; Strober, Michael; Brandt, Harry; Crawford, Steve; Crow, Scott; Fichter, Manfred M; Halmi, Katherine A; Johnson, Craig; Kaplan, Allan S; Keel, Pamela; Klump, Kelly L; Mitchell, James; Treasure, Janet; Woodside, D Blake; Marzola, Enrica; Schork, Nicholas J; Kaye, Walter H

    2011-01-01

    Follow-up studies of eating disorders (EDs) suggest outcomes ranging from recovery to chronic illness or death, but predictors of outcome have not been consistently identified. We tested 5151 single-nucleotide polymorphisms (SNPs) in approximately 350 candidate genes for association with recovery from ED in 1878 women. Initial analyses focused on a strictly defined discovery cohort of women who were over age 25 years, carried a lifetime diagnosis of an ED, and for whom data were available regarding the presence (n=361 ongoing symptoms in the past year, i.e., 'ill') or absence (n=115 no symptoms in the past year, i.e., 'recovered') of ED symptoms. An intronic SNP (rs17536211) in GABRG1 showed the strongest statistical evidence of association (p=4.63 × 10⁻⁶, false discovery rate (FDR)=0.021, odds ratio (OR)=0.46). We replicated these findings in a more liberally defined cohort of women age 25 years or younger (n=464 ill, n=107 recovered; p=0.0336, OR=0.68; combined sample p=4.57 × 10⁻⁶, FDR=0.0049, OR=0.55). Enrichment analyses revealed that GABA (γ-aminobutyric acid) SNPs were over-represented among SNPs associated at p<0.05 in both the discovery (Z=3.64, p=0.0003) and combined cohorts (Z=2.07, p=0.0388). In follow-up phenomic association analyses with a third independent cohort (n=154 ED cases, n=677 controls), rs17536211 was associated with trait anxiety (p=0.049), suggesting a possible mechanism through which this variant may influence ED outcome. These findings could provide new insights into the development of more effective interventions for the most treatment-resistant patients. PMID:21750581

  6. Association study of FOXO3A SNPs and aging phenotypes in Danish oldest-old individuals.

    PubMed

    Soerensen, Mette; Nygaard, Marianne; Dato, Serena; Stevnsner, Tinna; Bohr, Vilhelm A; Christensen, Kaare; Christiansen, Lene

    2015-02-01

    FOXO3A variation has repeatedly been reported to associate with human longevity, yet only few studies have investigated whether FOXO3A variation also associates with aging-related traits. Here, we investigate the association of 15 FOXO3A tagging single nucleotide polymorphisms (SNPs) in 1088 oldest-old Danes (age 92-93) with 4 phenotypes known to predict their survival: cognitive function, hand grip strength, activity of daily living (ADL), and self-rated health. Based on previous studies in humans and foxo animal models, we also explore self-reported diabetes, cancer, cardiovascular disease, osteoporosis, and bone (femur/spine/hip/wrist) fracture. Gene-based testing revealed significant associations of FOXO3A variation with ADL (P = 0.044) and bone fracture (P = 0.006). The single-SNP statistics behind the gene-based analysis indicated increased ADL (decreased disability) and reduced bone fracture risk for carriers of the minor alleles of 8 and 10 SNPs, respectively. These positive directions of effects are in agreement with the positive effects on longevity previously reported for these SNPs. However, when correcting for the test of 9 phenotypes by Bonferroni correction, bone fracture showed borderline significance (P = 0.054), while ADL did not (P = 0.396). Although the single-SNP associations did not formally replicate in another study population of oldest-old Danes (n = 1279, age 94-100), the estimates were of similar direction of effect as observed in the Discovery sample. A pooled analysis of both study populations displayed similar or decreased sized P-values for most associations, hereby supporting the initial findings. Nevertheless, confirmation in additional study populations is needed. PMID:25470651
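
    The corrected P-values in this abstract are consistent with a plain Bonferroni adjustment, in which each gene-based P-value is multiplied by the number of phenotypes tested (9) and capped at 1. A minimal sketch using the values from the abstract (the function name is illustrative):

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


N_PHENOTYPES = 9  # number of phenotypes tested, per the abstract

adl_adjusted = bonferroni(0.044, N_PHENOTYPES)
fracture_adjusted = bonferroni(0.006, N_PHENOTYPES)

print(f"ADL: {adl_adjusted:.3f}")
print(f"Bone fracture: {fracture_adjusted:.3f}")
```

    This reproduces the reported 0.396 for ADL and 0.054 for bone fracture, which is why only the latter is described as borderline.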

  7. Associations and interactions between SNPs in the alcohol metabolizing genes and alcoholism phenotypes in European Americans

    PubMed Central

    Sherva, Richard; Rice, John P.; Neuman, Rosalind J.; Rochberg, Nanette; Saccone, Nancy L.; Bierut, Laura J.

    2010-01-01

    Background: Alcohol abuse and dependence are major causes of morbidity and mortality worldwide, and have a strong familial component. Several linkage and association studies have identified chromosomal regions and/or genes that affect alcohol consumption, notably in genes involved in the two-stage pathway of alcohol metabolism. Methods: Here, we use multiple regression models to test for associations and interactions between two alcohol related phenotypes and SNPs in 17 genes involved in alcohol metabolism in the U.S. Caucasian subset of the Collaborative Genetic Study of Nicotine Dependence (COGEND) participants. Results: Several SNPs across six genes showed evidence for association with either maximum number of drinks consumed in a 24-hour period or DSM-IV symptom count. The strongest evidence for association was between rs1229984, a non-synonymous coding SNP in ADH1B, and DSM-IV symptom count (P = 0.0003). This SNP was also associated with maximum drinks (P = 0.0004). Each minor allele at this SNP predicts 45% fewer DSM-IV symptoms and 18% fewer max drinks. Another SNP in a splice site in ALDH1A1 (rs8187974) showed evidence for association with both phenotypes as well. Minor alleles at this SNP predict greater alcohol consumption. In addition, pairwise interactions were observed between SNPs in several genes (P = 0.00002). Conclusions: We replicated the large effect of rs1229984 on alcohol behavior, and although not common (MAF = 4%), this polymorphism may be highly relevant from a public health perspective in European Americans. Another SNP, rs8187974, may also affect alcohol behavior but requires replication. Also, interactions between polymorphisms in genes involved in alcohol metabolism are likely determinants of the parameters that ultimately affect alcohol consumption. PMID:19298322

  8. Genomic and geographic distribution of private SNPs and pathways in human populations

    PubMed Central

    Baye, Tesfaye M; Wilke, Russell A; Olivier, Michael

    2010-01-01

    Aims: Geography-based genetic differentials operating on entire biochemical pathways may reflect different adaptive evolutionary processes that separated populations may have undergone. They may also influence treatment outcome for a variety of drugs, an emerging and important area of study. This research article leverages the International HapMap Consortium data to identify pathway components that differ in genotype frequency for four populations: individuals of Northern European descent from the USA (CEU), individuals from West Africa (YRI), Japan (JPT) and China (CHB). Materials & methods: By identifying loci with fixed or large frequency differences (δ = 1) between paired population samples (CEU vs YRI, CEU vs CHB, CEU vs JPT, YRI vs CHB, YRI vs JPT and CHB vs JPT), and reconstructing the physiological functions of genes at these loci, we report a list of pathways affected by natural selection during human evolution. Results: Of the 3.7 million HapMap SNPs, 463 loci (which mapped to 38 genes) were fixed (δ = 1) in at least one population pair. These private loci included four nonsynonymous coding SNPs: rs4536103 (NEUROG3), rs1385699 (EDA2R), rs11946338 (ARHGAP24) and rs4422842 (CACNA1B). A total of four additional genes demonstrated evidence of recent positive selection: three genes in European subjects (IER5L, NPNT and SESTD1) and a single gene in Asian subjects (EXOC6B). Discussion: Gene ontology and pathway analyses suggest that cellular differentiation, apoptosis and activation of the NF-κB transcription factor vary between populations in genomic regions of fixed (private) SNPs identified in this study. Variability in these pathways may provide important clues into the mechanisms of human adaptation to different environments. An improved understanding of their variability may also help to explain race-specific differences in the treatment outcomes observed for a variety of modern drugs. PMID:20352079
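
    The pairwise comparison described in this abstract can be illustrated with the absolute allele frequency difference between two populations, where a value of 1 means the locus is fixed for different alleles. The sketch below assumes that definition and uses hypothetical frequencies for a single locus; it is not the authors' pipeline.

```python
from itertools import combinations

# Hypothetical frequencies of one allele at a single SNP, NOT HapMap data.
allele_freq = {"CEU": 1.0, "YRI": 0.0, "CHB": 0.85, "JPT": 0.90}


def freq_difference(p1: float, p2: float) -> float:
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


# All six population pairs, as enumerated in the abstract.
pairwise = {
    (a, b): freq_difference(allele_freq[a], allele_freq[b])
    for a, b in combinations(allele_freq, 2)
}

# Pairs in which the locus is fixed for different alleles (difference of 1).
fixed_pairs = [pair for pair, diff in pairwise.items() if diff == 1.0]
print(fixed_pairs)
```

    With these made-up frequencies only the CEU/YRI pair qualifies, so the locus would count as private in at least one population pair.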

  9. SNPs detected in the yak MC4R gene and their association with growth traits.

    PubMed

    Cai, X; Mipam, T D; Zhao, F F; Sun, L

    2015-07-01

    MC4R (melanocortin 4 receptor) is expressed in the appetite-regulating areas of the brain and takes part in leptin signaling pathways. Sequencing of the coding region of the MC4R gene for 354 yaks identified the following five single nucleotide polymorphisms (SNPs): SNP1 (273C>T), SNP2 (321G>T), SNP3 (864C>A), SNP4 (1069G>C) and SNP5 (1206G>C). SNP1, SNP2 and SNP3 were synonymous mutations, whereas SNP4 and SNP5 were missense mutations resulting in amino acid substitutions (V286L and R331S). Pairwise linkage disequilibrium (LD) analysis indicated that two pairs of SNPs, SNP2 and SNP5 (r² = 0.81027) and SNP4 and SNP5 (r² = 0.53816), exhibited higher degrees of LD. CC genotype of SNP4, CGACG and CTCCC haplotypes for all SNPs were associated with increased BW of animals that were 18 months old and with the average daily gain. The secondary structure and transmembrane region prediction of the yak MC4R protein suggested that SNP4 was correlated with influential changes in the seventh transmembrane domain of the MC4R protein and with the functional deterioration or even incapacitation of MC4R, which may contribute to the increased feed intake, BW and average daily gain of the yaks with CC genotypes. The data from this study suggested that 1069G>C SNP of the MC4R gene could be used in marker-assisted selection of growth traits in the Maiwa yak breed. PMID:25757688
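
    The pairwise LD values in this abstract are squared correlations between alleles at two loci. A minimal sketch of the standard two-locus calculation, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB, follows. The haplotype and allele frequencies are hypothetical, not the yak data, and the abstract does not state which software the authors used.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


# Hypothetical frequencies chosen for illustration only.
print(round(ld_r_squared(p_ab=0.40, p_a=0.50, p_b=0.50), 2))
```

    Values near 1 indicate that the two SNPs carry nearly the same information, which is why strongly linked pairs are often analyzed as haplotypes.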

  10. A joint association test for multiple SNPs in genetic case-control studies.

    PubMed

    Wang, Tao; Jacob, Howard; Ghosh, Soumitra; Wang, Xujing; Zeng, Zhao-Bang

    2009-02-01

    For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive in reducing the data complexity and increasing the statistical power. However, due to the unknown status of the underlying disease variant, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptible locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, by using a retrospective likelihood to adjust for the case-control sampling ascertainment and appropriately handle the Hardy-Weinberg equilibrium constraint, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. With the latent variable to describe a flexible role of the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing of multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods. PMID:18770519

  11. Differences in allele frequencies of autosomal dominant hypercholesterolemia SNPs in the Malaysian population.

    PubMed

    Alex, Livy; Chahil, Jagdish Kaur; Lye, Say Hean; Bagali, Pramod; Ler, Lian Wee

    2012-06-01

    Hypercholesterolemia is caused by different interactions of lifestyle and genetic determinants. At the genetic level, it can be attributed to the interactions of multiple polymorphisms, or as in the example of familial hypercholesterolemia (FH), it can be the result of a single mutation. A large number of genetic markers, mostly single nucleotide polymorphisms (SNP) or mutations in three genes, implicated in autosomal dominant hypercholesterolemia (ADH), viz APOB (apolipoprotein B), LDLR (low density lipoprotein receptor) and PCSK9 (proprotein convertase subtilisin/kexin type-9), have been identified and characterized. However, such studies have been insufficiently undertaken specifically in Malaysia and Southeast Asia in general. The main objective of this study was to identify ADH variants, specifically ADH-causing mutations and hypercholesterolemia-associated polymorphisms in multiethnic Malaysian population. We aimed to evaluate published SNPs in ADH causing genes, in this population and to report any unusual trends. We examined a large number of selected SNPs from previous studies of APOB, LDLR, PCSK9 and other genes, in clinically diagnosed ADH patients (n=141) and healthy control subjects (n=111). Selection of SNPs was initiated by searching within genes reported to be associated with ADH from known databases. The important finding was 137 mono-allelic markers (44.1%) and 173 polymorphic markers (55.8%) in both subject groups. By comparing to publicly available data, out of the 137 mono-allelic markers, 23 markers showed significant differences in allele frequency among Malaysians, European Whites, Han Chinese, Yoruba and Gujarati Indians. Our data can serve as reference for others in related fields of study during the planning of their experiments. PMID:22534770

  12. A robust linkage map of the porcine autosomes based on gene-associated SNPs

    PubMed Central

    Vingborg, Rikke KK; Gregersen, Vivi R; Zhan, Bujie; Panitz, Frank; Høj, Anette; Sørensen, Kirsten K; Madsen, Lone B; Larsen, Knud; Hornshøj, Henrik; Wang, Xuefei; Bendixen, Christian

    2009-01-01

    Background: Genetic linkage maps are necessary for mapping of mendelian traits and quantitative trait loci (QTLs). To identify the actual genes, which control these traits, a map based on gene-associated single nucleotide polymorphism (SNP) markers is highly valuable. In this study, the SNPs were genotyped in a large family material comprising more than 5,000 piglets derived from 12 Duroc boars crossed with 236 Danish Landrace/Danish Large White sows. The SNPs were identified in sequence alignments of 4,600 different amplicons obtained from the 12 boars and containing coding regions of genes derived from expressed sequence tags (ESTs) and genomic shotgun sequences. Results: Linkage maps of all 18 porcine autosomes were constructed based on 456 gene-associated and six porcine EST-based SNPs. The total length of the averaged-sex whole porcine autosome was estimated to 1,711.8 cM resulting in an average SNP spacing of 3.94 cM. The female and male maps were estimated to 2,336.1 and 1,441.5 cM, respectively. The gene order was validated through comparisons to the cytogenetic and/or physical location of 203 genes, linkage to evenly spaced microsatellite markers as well as previously reported conserved synteny. A total of 330 previously unmapped genes and ESTs were mapped to the porcine autosome while ten genes were mapped to unexpected locations. Conclusion: The linkage map presented here shows high accuracy in gene order. The pedigree family network as well as the large amount of meiotic events provide good reliability and make this map suitable for QTL and association studies. In addition, the linkage to the RH-map of microsatellites makes it suitable for comparison to other QTL studies. PMID:19327136

  13. A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

    PubMed Central

    Bakir-Gungor, Burcu; Sezerman, Osman Ugur

    2011-01-01

    Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease-related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Whereas GWAS analysis identifies 14 of the 83 drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 of the 83 genes known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1 and TAP2, and RA were verified by either the OMIM database or by literature retrieved from the NCBI PubMed module. With whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network-oriented analysis and prior knowledge from functional properties of a SNP. PMID:22046267

  14. HLA-A SNPs and amino acid variants are associated with nasopharyngeal carcinoma in Malaysian Chinese.

    PubMed

    Chin, Yoon-Ming; Mushiroda, Taisei; Takahashi, Atsushi; Kubo, Michiaki; Krishnan, Gopala; Yap, Lee-Fah; Teo, Soo-Hwang; Lim, Paul Vey-Hong; Yap, Yoke-Yeow; Pua, Kin-Choo; Kamatani, Naoyuki; Nakamura, Yusuke; Sam, Choon-Kook; Khoo, Alan Soo-Beng; Ng, Ching-Ching

    2015-02-01

    Nasopharyngeal carcinoma (NPC) arises from the mucosal epithelium of the nasopharynx and is constantly associated with Epstein-Barr virus type 1 (EBV-1) infection. We carried out a genome-wide association study (GWAS) of 575,247 autosomal SNPs in 184 NPC patients and 236 healthy controls of Malaysian Chinese ethnicity. Potential association signals were replicated in a separate cohort of 260 NPC patients and 245 healthy controls. We confirmed the association of HLA-A to NPC with the strongest signal detected in rs3869062 (p = 1.73 × 10^-9). HLA-A fine mapping revealed associations in the amino acid variants as well as its corresponding SNPs in the antigen peptide binding groove (p(HLA-A-aa-site-99) = 3.79 × 10^-8, p(rs1136697) = 3.79 × 10^-8) and T-cell receptor binding site (p(HLA-A-aa-site-145) = 1.41 × 10^-4, p(rs1059520) = 1.41 × 10^-4) of the HLA-A. We also detected strong association signals in the 5'-UTR region with predicted active promoter states (p(rs41545520) = 7.91 × 10^-8). SNP rs41545520 is a potential binding site for repressor ATF3, with increased binding affinity for rs41545520-G correlated with reduced HLA-A expression. Multivariate logistic regression diminished the effects of HLA-A amino acid variants and SNPs, indicating a correlation with the effects of HLA-A*11:01, and to a lesser extent HLA-A*02:07. We report the strong genetic influence of HLA-A on NPC susceptibility in the Malaysian Chinese. PMID:24947555

  15. MST Filterability Tests

    SciTech Connect

    Poirier, M. R.; Burket, P. R.; Duignan, M. R.

    2015-03-12

    The Savannah River Site (SRS) is currently treating radioactive liquid waste with the Actinide Removal Process (ARP) and the Modular Caustic Side Solvent Extraction Unit (MCU). The low filter flux through the ARP has limited the rate at which radioactive liquid waste can be treated. Recent filter flux has averaged approximately 5 gallons per minute (gpm). Salt Batch 6 has had a lower processing rate and required frequent filter cleaning. Savannah River Remediation (SRR) wants to understand the causes of the low filter flux and to increase ARP/MCU throughput. In addition, at the time the testing started, SRR was assessing the impact of replacing the 0.1 micron filter with a 0.5 micron filter. This report describes testing of MST filterability to investigate the impact of filter pore size and MST particle size on filter flux, and testing of filter enhancers to attempt to increase filter flux. The authors constructed a laboratory-scale crossflow filter apparatus with two crossflow filters operating in parallel. One filter was a 0.1 micron Mott sintered SS filter and the other was a 0.5 micron Mott sintered SS filter. The authors also constructed a dead-end filtration apparatus to conduct screening tests with potential filter aids and body feeds, referred to as filter enhancers. The original baseline for ARP was 5.6 M sodium salt solution with a free hydroxide concentration of approximately 1.7 M. ARP has been operating with a sodium concentration of approximately 6.4 M and a free hydroxide concentration of approximately 2.5 M. SRNL conducted tests varying the concentration of sodium and free hydroxide to determine whether those changes had a significant effect on filter flux. The feed slurries for the MST filterability tests were composed of simple salts (NaOH, NaNO2, and NaNO3) and MST (0.2 – 4.8 g/L). The feed slurry for the filter enhancer tests contained simulated salt batch 6 supernate, MST, and filter enhancers.

  16. An active filter primer

    NASA Astrophysics Data System (ADS)

    Delagrange, A. D.

    1983-02-01

    In the past few years active filters have become very popular. This report explains why, and explains what active filters can (and can't) do. It gives the basics of active filter design, both theory and practice. It can be used as a handbook to build working active filters of the most common types. This report is an update of the original issued in 1979.

  17. Survey of digital filtering

    NASA Technical Reports Server (NTRS)

    Nagle, H. T., Jr.

    1972-01-01

    A three part survey is made of the state-of-the-art in digital filtering. Part one presents background material including sampled data transformations and the discrete Fourier transform. Part two, digital filter theory, gives an in-depth coverage of filter categories, transfer function synthesis, quantization and other nonlinear errors, filter structures and computer aided design. Part three presents hardware mechanization techniques. Implementations by general purpose, mini-, and special-purpose computers are presented.

  18. Practical Active Capacitor Filter

    NASA Technical Reports Server (NTRS)

    Shuler, Robert L., Jr. (Inventor)

    2005-01-01

    A method and apparatus are described that filter an electrical signal. The filtering uses a capacitor multiplier circuit, which in turn uses at least one amplifier circuit and at least one capacitor. A filtered electrical signal results from a direct connection from an output of the at least one amplifier circuit.

  19. HEPA filter encapsulation

    DOEpatents

    Gates-Anderson, Dianne D. (Union City, CA); Kidd, Scott D. (Brentwood, CA); Bowers, John S. (Manteca, CA); Attebery, Ronald W. (San Lorenzo, CA)

    2003-01-01

    A low viscosity resin is delivered into a spent HEPA filter or other waste. The resin is introduced into the filter or other waste using a vacuum to assist in the mass transfer of the resin through the filter media or other waste.

  20. Filter service system

    DOEpatents

    Sellers, Cheryl L. (Peoria, IL); Nordyke, Daniel S. (Arlington Heights, IL); Crandell, Richard A. (Morton, IL); Tomlins, Gregory (Peoria, IL); Fei, Dong (Peoria, IL); Panov, Alexander (Dunlap, IL); Lane, William H. (Chillicothe, IL); Habeger, Craig F. (Chillicothe, IL)

    2008-12-09

    According to an exemplary embodiment of the present disclosure, a system for removing matter from a filtering device includes a gas pressurization assembly. An element of the assembly is removably attachable to a first orifice of the filtering device. The system also includes a vacuum source fluidly connected to a second orifice of the filtering device.

  1. Nonlinear Attitude Filtering Methods

    NASA Technical Reports Server (NTRS)

    Markley, F. Landis; Crassidis, John L.; Cheng, Yang

    2005-01-01

    This paper provides a survey of modern nonlinear filtering methods for attitude estimation. Early applications relied mostly on the extended Kalman filter for attitude estimation. Since these applications, several new approaches have been developed that have proven to be superior to the extended Kalman filter. Several of these approaches maintain the basic structure of the extended Kalman filter, but employ various modifications in order to provide better convergence or improve other performance characteristics. Examples of such approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters that propagate and update a discrete set of sigma points rather than using linearized equations for the mean and covariance are also reviewed. A two-step approach is discussed with a first-step state that linearizes the measurement model and an iterative second step to recover the desired attitude states. These approaches are all based on the Gaussian assumption that the probability density function is adequately specified by its mean and covariance. Other approaches that do not require this assumption are reviewed, including particle filters and a Bayesian filter based on a non-Gaussian, finite-parameter probability density function on SO(3). Finally, the predictive filter, nonlinear observers and adaptive approaches are shown. The strengths and weaknesses of the various approaches are discussed.

  2. Parallel DC notch filter

    NASA Astrophysics Data System (ADS)

    Kwok, Kam-Cheung; Chan, Ming-Kam

    1991-12-01

    In the process of image acquisition, the object of interest may not be evenly illuminated, so an image with shading irregularities is produced. This type of image is very difficult to analyze, and consequently a lot of research work concentrates on this problem. One method of removing the illumination problem is to filter the image. The dc notch filter is one of the spatial domain filters used for reducing the effect of uneven light illumination on the image. Although the dc notch filter is a spatial domain filter, it is still rather time consuming to apply, especially when it is implemented on a microcomputer. To overcome the speed problem, a parallel dc notch filter is proposed. Based on the separability of the dc notch filter algorithm, image parallelism (a parallel image processing model) is used. To improve the performance of the microcomputer, an INMOS IMS B008 Module Mother Board with four IMS T800-17 transputers is installed in the microcomputer, and the dc notch filter is implemented on the resulting transputer network. The parallel dc notch filter greatly reduces the computation time of the filter in comparison with the sequential one. Furthermore, the speed-up is used to analyze the performance of the parallel algorithm. As a result, parallel implementation of the dc notch filter on a transputer network gives real-time performance for this filter.
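    The separability mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the filter formula, so the code below assumes a common spatial-domain form in which the local mean over a square window is subtracted from each pixel; the function names and the window `radius` parameter are illustrative and not taken from the paper. The local mean is computed as a row pass followed by a column pass, and because each row (or column) is processed independently, the work could in principle be divided among several processors.

```python
def box_mean_1d(values, radius):
    """Mean of each element's neighbourhood, with the window clipped at the edges."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def dc_notch(image, radius):
    """Subtract the local (box) mean from every pixel.

    Assumption: this is one plausible form of a spatial-domain dc notch
    filter, not necessarily the exact algorithm used in the paper.
    """
    # Pass 1: average along each row (rows are independent of each other).
    row_pass = [box_mean_1d(row, radius) for row in image]
    # Pass 2: average along each column of the row-pass result.
    col_pass = [box_mean_1d(list(col), radius) for col in zip(*row_pass)]
    local_mean = [list(row) for row in zip(*col_pass)]
    return [
        [pixel - mean for pixel, mean in zip(img_row, mean_row)]
        for img_row, mean_row in zip(image, local_mean)
    ]


if __name__ == "__main__":
    # A horizontal brightness ramp stands in for uneven illumination.
    ramp = [[0, 1, 2], [0, 1, 2]]
    print(dc_notch(ramp, radius=1))
```

    A uniformly lit (constant) image maps to all zeros under this sketch, which is the behaviour expected of a filter that removes the dc component.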

  3. Regenerative particulate filter development

    NASA Technical Reports Server (NTRS)

    Descamp, V. A.; Boex, M. W.; Hussey, M. W.; Larson, T. P.

    1972-01-01

    Development, design, and fabrication of a prototype filter regeneration unit for regenerating clean fluid particle filter elements by using a backflush/jet impingement technique are reported. Development tests were also conducted on a vortex particle separator designed for use in zero gravity environment. A maintainable filter was designed, fabricated and tested that allows filter element replacement without any leakage or spillage of system fluid. Also described are spacecraft fluid system design and filter maintenance techniques with respect to inflight maintenance for the space shuttle and space station.

  4. Genotyping three SNPs affecting warfarin drug response by isothermal real-time HDA assays

    PubMed Central

    Li, Ying; Jortani, Saeed A.; Ramey-Hartung, Bronwyn; Hudson, Elizabeth; Lemieux, Bertrand; Kong, Huimin

    2010-01-01

    Background: The response to the anticoagulant drug warfarin is greatly affected by genetic polymorphisms in the VKORC1 and CYP2C9 genes. Genotyping these polymorphisms has been shown to be important in reducing the time of the trial-and-error process for finding the maintenance dose of warfarin, thus reducing the risk of adverse effects of the drug. Method: We developed a real-time isothermal DNA amplification system for genotyping three single nucleotide polymorphisms (SNPs) that influence warfarin response. For each SNP, real-time isothermal Helicase Dependent Amplification (HDA) reactions were performed to amplify a DNA fragment containing the SNP. Amplicons were detected by fluorescently labeled allele-specific probes during real-time HDA amplification. Results: Fifty clinical samples were analyzed by the HDA-based method, generating a total of 150 results. Of these, 148 were consistent between the HDA-based assays and a reference method. The two samples with unresolved HDA-based test results were repeated and found to be consistent with the reference method. Conclusion: The HDA-based assays demonstrated a clinically acceptable performance for genotyping the VKORC1 -1639G>A SNP and two SNPs (430C>T and 1075A>C) for the CYP2C9 enzyme (CYP2C9*2 and CYP2C9*3), all of which are relevant in warfarin pharmacogenetics. PMID:20854800

  5. A simple method using Pyrosequencing™ to identify de novo SNPs in pooled DNA samples

    PubMed Central

    Lin, Yeong-Shin; Liu, Fu-Guo Robert; Wang, Tzi-Yuan; Pan, Cheng-Tsung; Chang, Wei-Ting; Li, Wen-Hsiung

    2011-01-01

    A practical way to reduce the cost of surveying single-nucleotide polymorphisms (SNPs) in a large number of individuals is to measure the allele frequencies in pooled DNA samples. Pyrosequencing™ has been frequently used for this application because signals generated by this approach are proportional to the amount of DNA template. The Pyrosequencing™ pyrogram is determined by the dispensing order of dNTPs, which is usually designed based on the known SNPs to avoid asynchronous extensions of heterozygous sequences. Therefore, utilizing the pyrogram signals to identify de novo SNPs in DNA pools has never been undertaken. In this study, we developed an algorithm to address this issue. With the sequence and pyrogram of the wild-type allele known in advance, we could use the pyrogram obtained from the pooled DNA sample to predict the sequence of the unknown mutant allele (de novo SNP) and estimate its allele frequency. Both computational simulation and experimental Pyrosequencing™ test results suggested that our method performs well. The web interface of our method is available at http://life.nctu.edu.tw/~yslin/PSM/. PMID:21131285
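    The idea of reading a mutant allele and its frequency out of a pooled pyrogram can be sketched as follows. This is not the authors' algorithm, which the abstract does not describe; it is a brute-force illustration that assumes ideal, noise-free peaks, a pooled signal that is a linear mixture of the wild-type and mutant pyrograms, and a single-base substitution. All function names and the example sequences are hypothetical.

```python
def pyrogram(seq, dispensation):
    """Ideal peak heights: number of bases incorporated at each dispensed nucleotide."""
    pos, peaks = 0, []
    for nt in dispensation:
        n = 0
        while pos < len(seq) and seq[pos] == nt:
            n += 1
            pos += 1
        peaks.append(n)
    return peaks


def estimate_frequency(wt, mut, pooled):
    """Least-squares f in pooled = (1 - f) * wt + f * mut; None if wt == mut."""
    diff = [m - w for m, w in zip(mut, wt)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return None
    return sum(d * (p - w) for d, p, w in zip(diff, pooled, wt)) / denom


def best_single_substitution(wt_seq, dispensation, pooled):
    """Try every single-base substitution and keep the best-fitting one."""
    wt = pyrogram(wt_seq, dispensation)
    best = None
    for i, ref in enumerate(wt_seq):
        for alt in "ACGT":
            if alt == ref:
                continue
            candidate = wt_seq[:i] + alt + wt_seq[i + 1:]
            mut = pyrogram(candidate, dispensation)
            f = estimate_frequency(wt, mut, pooled)
            if f is None:
                continue  # substitution is invisible to this dispensing order
            f = min(1.0, max(0.0, f))  # a frequency must lie in [0, 1]
            resid = sum(
                (p - ((1 - f) * w + f * m)) ** 2
                for p, w, m in zip(pooled, wt, mut)
            )
            if best is None or resid < best[0]:
                best = (resid, candidate, f)
    return best[1], best[2]


if __name__ == "__main__":
    wt_seq, order = "ACGTTA", "ACGTCTA"
    wt, mut = pyrogram(wt_seq, order), pyrogram("ACGCTA", order)
    # Simulated pool in which 25% of templates carry the mutant allele.
    pooled = [0.75 * w + 0.25 * m for w, m in zip(wt, mut)]
    print(best_single_substitution(wt_seq, order, pooled))
```

    With real data the peaks are noisy and the dispensing order matters, so a practical method would need a noise model and a more careful search than this enumeration.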

  6. Linkage analysis of SNPs in IGFBP-6 and its relation with the body sizes of pig.

    PubMed

    Fang, X B; Liu, S C; Wu, Q Y; Li, S M; Cheng, Y Y; Fu, H Y; Lu, C; Su, D; Yu, H; Hao, L L

    2015-01-01

    Insulin-like growth factor binding protein-6 (IGFBP-6) is a member of the IGFBP family, which is known to be a key factor in regulating the effect of insulin-like growth factor-2 (IGF-2) on animal growth and development. Gene sequences of the 3'-untranslated region (UTR) and exon 4 of IGFBP-6 may influence the expression and proteolysis of IGFBP-6. In this study, 551 bp of IGFBP-6 (including 257 bp of intron 3, exon 4, and 170 bp of the 3' UTR) were sequenced and compared in the Bama and Tibetan mini-pigs, the Landrace and Large White pigs, and the Northeast wild boars. Six single nucleotide polymorphisms (SNPs) were detected in IGFBP-6, in which T593C, T636C, and T745C were in intron 3, A67G was in exon 4, and G37A was in the 3' UTR. T636C, T745C, and A67G were linked and formed four haplotypes, with CCT being the dominant haplotype in the mini-pigs; however, the haplotype block was not formed in the Landrace pigs and Large White pigs or the Northeast wild boars. Based on the above results, we concluded that the SNPs and haplotype of IGFBP-6 may be related to the mini-size formation of the pig. PMID:26681221

  7. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

    PubMed

    Lee, S Hong; Ripke, Stephan; Neale, Benjamin M; Faraone, Stephen V; Purcell, Shaun M; Perlis, Roy H; Mowry, Bryan J; Thapar, Anita; Goddard, Michael E; Witte, John S; Absher, Devin; Agartz, Ingrid; Akil, Huda; Amin, Farooq; Andreassen, Ole A; Anjorin, Adebayo; Anney, Richard; Anttila, Verneri; Arking, Dan E; Asherson, Philip; Azevedo, Maria H; Backlund, Lena; Badner, Judith A; Bailey, Anthony J; Banaschewski, Tobias; Barchas, Jack D; Barnes, Michael R; Barrett, Thomas B; Bass, Nicholas; Battaglia, Agatino; Bauer, Michael; Bayés, Mònica; Bellivier, Frank; Bergen, Sarah E; Berrettini, Wade; Betancur, Catalina; Bettecken, Thomas; Biederman, Joseph; Binder, Elisabeth B; Black, Donald W; Blackwood, Douglas H R; Bloss, Cinnamon S; Boehnke, Michael; Boomsma, Dorret I; Breen, Gerome; Breuer, René; Bruggeman, Richard; Cormican, Paul; Buccola, Nancy G; Buitelaar, Jan K; Bunney, William E; Buxbaum, Joseph D; Byerley, William F; Byrne, Enda M; Caesar, Sian; Cahn, Wiepke; Cantor, Rita M; Casas, Miguel; Chakravarti, Aravinda; Chambert, Kimberly; Choudhury, Khalid; Cichon, Sven; Cloninger, C Robert; Collier, David A; Cook, Edwin H; Coon, Hilary; Cormand, Bru; Corvin, Aiden; Coryell, William H; Craig, David W; Craig, Ian W; Crosbie, Jennifer; Cuccaro, Michael L; Curtis, David; Czamara, Darina; Datta, Susmita; Dawson, Geraldine; Day, Richard; De Geus, Eco J; Degenhardt, Franziska; Djurovic, Srdjan; Donohoe, Gary J; Doyle, Alysa E; Duan, Jubao; Dudbridge, Frank; Duketis, Eftichia; Ebstein, Richard P; Edenberg, Howard J; Elia, Josephine; Ennis, Sean; Etain, Bruno; Fanous, Ayman; Farmer, Anne E; Ferrier, I Nicol; Flickinger, Matthew; Fombonne, Eric; Foroud, Tatiana; Frank, Josef; Franke, Barbara; Fraser, Christine; Freedman, Robert; Freimer, Nelson B; Freitag, Christine M; Friedl, Marion; Frisén, Louise; Gallagher, Louise; Gejman, Pablo V; Georgieva, Lyudmila; Gershon, Elliot S; Geschwind, Daniel H; Giegling, Ina; Gill, Michael; Gordon, Scott D; Gordon-Smith, Katherine; Green, Elaine K; Greenwood, Tiffany A; Grice, Dorothy E; Gross, Magdalena; Grozeva, Detelina; Guan, Weihua; Gurling, Hugh; De Haan, Lieuwe; Haines, Jonathan L; Hakonarson, Hakon; Hallmayer, Joachim; Hamilton, Steven P; Hamshere, Marian L; Hansen, Thomas F; Hartmann, Annette M; Hautzinger, Martin; Heath, Andrew C; Henders, Anjali K; Herms, Stefan; Hickie, Ian B; Hipolito, Maria; Hoefels, Susanne; Holmans, Peter A; Holsboer, Florian; Hoogendijk, Witte J; Hottenga, Jouke-Jan; Hultman, Christina M; Hus, Vanessa; Ingason, Andrés; Ising, Marcus; Jamain, Stéphane; Jones, Edward G; Jones, Ian; Jones, Lisa; Tzeng, Jung-Ying; Kähler, Anna K; Kahn, René S; Kandaswamy, Radhika; Keller, Matthew C; Kennedy, James L; Kenny, Elaine; Kent, Lindsey; Kim, Yunjung; Kirov, George K; Klauck, Sabine M; Klei, Lambertus; Knowles, James A; Kohli, Martin A; Koller, Daniel L; Konte, Bettina; Korszun, Ania; Krabbendam, Lydia; Krasucki, Robert; Kuntsi, Jonna; Kwan, Phoenix; Landén, Mikael; Långström, Niklas; Lathrop, Mark; Lawrence, Jacob; Lawson, William B; Leboyer, Marion; Ledbetter, David H; Lee, Phil H; Lencz, Todd; Lesch, Klaus-Peter; Levinson, Douglas F; Lewis, Cathryn M; Li, Jun; Lichtenstein, Paul; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Liu, Chunyu; Lohoff, Falk W; Loo, Sandra K; Lord, Catherine; Lowe, Jennifer K; Lucae, Susanne; MacIntyre, Donald J; Madden, Pamela A F; Maestrini, Elena; Magnusson, Patrik K E; Mahon, Pamela B; Maier, Wolfgang; Malhotra, Anil K; Mane, Shrikant M; Martin, Christa L; Martin, Nicholas G; Mattheisen, Manuel; Matthews, Keith; Mattingsdal, Morten; McCarroll, Steven A; McGhee, Kevin A; McGough, James J; McGrath, Patrick J; McGuffin, Peter; McInnis, Melvin G; McIntosh, Andrew; McKinney, Rebecca; McLean, Alan W; McMahon, Francis J; McMahon, William M; McQuillin, Andrew; Medeiros, Helena; Medland, Sarah E; Meier, Sandra; Melle, Ingrid; Meng, Fan; Meyer, Jobst; Middeldorp, Christel M; Middleton, Lefkos; Milanova, Vihra; Miranda, Ana; Monaco, Anthony P; Montgomery, Grant W; Moran, Jennifer L; Moreno-De-Luca, Daniel; Morken, Gunnar; Morris, Derek W; Morrow, Eric M; Moskvina, Valentina; Muglia, Pierandrea; Mühleisen, Thomas W; Muir, Walter J; Müller-Myhsok, Bertram; Murtha, Michael; Myers, Richard M; Myin-Germeys, Inez; Neale, Michael C; Nelson, Stan F; Nievergelt, Caroline M; Nikolov, Ivan; Nimgaonkar, Vishwajit; Nolen, Willem A; Nöthen, Markus M; Nurnberger, John I; Nwulia, Evaristus A; Nyholt, Dale R; O'Dushlaine, Colm; Oades, Robert D; Olincy, Ann; Oliveira, Guiomar; Olsen, Line; Ophoff, Roel A; Osby, Urban; Owen, Michael J; Palotie, Aarno; Parr, Jeremy R

    2013-09-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  8. SNPs Previously Associated with Dupuytren's Disease Replicated in a North American Cohort

    PubMed Central

    Anderson, Eric R.; Ye, Zhan; Caldwell, Michael D.; Burmester, James K.

    2014-01-01

    Objective: Dupuytren's disease is a progressive fibrosis of the hand that often results in debilitating flexion contractures. Its etiology is not completely understood but likely involves both genetic and environmental factors. A recent study performed in Europe identified DNA variants that associate with Dupuytren's disease. Given the likelihood for genetic variation among populations, we planned to validate the genetic variants identified by this study in a North American population. Methods: In the Marshfield Clinic's Personalized Medicine Research Project, 296 cases with Dupuytren's disease were identified and matched 3-to-1 to controls without Dupuytren's disease. Clinical data were abstracted from the electronic medical record. The top 12 single nucleotide polymorphisms (SNPs) from the European study were selected and tested in a multiplex assay using the MassArray Analyzer 4 (Sequenom, Inc., San Diego, CA). Differences in allele frequency were determined, and variants with a P value of <0.004 were considered significant. Results: We replicated 5 of the 12 SNPs previously reported to be associated with Dupuytren's disease. Conclusion: Our findings support a role for the Wnt signaling pathway in the development of Dupuytren's disease, and suggest that further study of this pathway may result in early diagnosis and non-surgical treatments for Dupuytren's disease. PMID:24573701

  9. The genetics of human infertility by functional interrogation of SNPs in mice.

    PubMed

    Singh, Priti; Schimenti, John C

    2015-08-18

    Infertility is a prevalent health issue, affecting ~15% of couples of childbearing age. Nearly one-half of idiopathic infertility cases are thought to have a genetic basis, but the underlying causes are largely unknown. Traditional methods for studying inheritance, such as genome-wide association studies and linkage analyses, have been confounded by the genetic and phenotypic complexity of reproductive processes. Here we describe an association- and linkage-free approach to identify segregating infertility alleles, in which CRISPR/Cas9 genome editing is used to model putatively deleterious nonsynonymous SNPs (nsSNPs) in the mouse orthologs of fertility genes. Mice bearing "humanized" alleles of four essential meiosis genes, each predicted to be deleterious by most of the commonly used algorithms for analyzing functional SNP consequences, were examined for fertility and reproductive defects. Only a Cdk2 allele mimicking SNP rs3087335, which alters an inhibitory WEE1 protein kinase phosphorylation site, caused infertility and revealed a novel function in regulating spermatogonial stem cell maintenance. Our data indicate that segregating infertility alleles exist in human populations. Furthermore, whereas computational prediction of SNP effects is useful for identifying candidate causal mutations for diverse diseases, this study underscores the need for in vivo functional evaluation of physiological consequences. This approach can revolutionize personalized reproductive genetics by establishing a permanent reference of benign vs. infertile alleles. PMID:26240362

  10. Genetic Diversity and Demographic History of Cajanus spp. Illustrated from Genome-Wide SNPs

    PubMed Central

    Saxena, Rachit K.; von Wettberg, Eric; Upadhyaya, Hari D.; Sanchez, Vanessa; Songok, Serah; Saxena, Kulbhushan; Kimurto, Paul; Varshney, Rajeev K.

    2014-01-01

    Understanding the genetic structure of Cajanus spp. is essential for achieving genetic improvement by quantitative trait loci (QTL) mapping or association studies and the use of selected markers through genomics-assisted breeding and genomic selection. After developing a comprehensive set of 1,616 single nucleotide polymorphisms (SNPs) and converting them into cost-effective KASPar assays for pigeonpea (Cajanus cajan), we studied levels of genetic variability both within and between a diverse set of Cajanus lines including 56 breeding lines, 21 landraces and 107 accessions from 18 wild species. These results revealed a high frequency of polymorphic SNPs and a relatively high level of cross-species transferability. Indeed, 75.8% of successful SNP assays revealed polymorphism, and more than 95% of these assays could be successfully transferred to related wild species. To show regional patterns of variation, we used STRUCTURE and Analysis of Molecular Variance (AMOVA) to partition variance among hierarchical sets of landraces and wild species at either the continental scale or within India. STRUCTURE separated most of the domesticated germplasm from wild ecotypes, and separated Australian and Asian wild species, as has been found previously. Among Indian regions and states within regions, we found 36% of the variation between regions, and 64% within landraces or wilds within states. The highest level of polymorphism in wild relatives and landraces was found in the Madhya Pradesh and Andhra Pradesh provinces of India, representing the centre of origin and domestication of pigeonpea, respectively. PMID:24533111

  11. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.
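    As a rough illustration of what "species-diagnostic" means here, the sketch below flags a locus as diagnostic when each species is fixed for a different allele in the sampled individuals. The data layout, locus names and function name are assumptions made for the example and do not come from the study, whose pipeline also involves read alignment and quality filtering that are not shown.

```python
def is_diagnostic(alleles_a, alleles_b):
    """True if each species is fixed for a single allele and the two alleles differ."""
    set_a, set_b = set(alleles_a), set(alleles_b)
    return len(set_a) == 1 and len(set_b) == 1 and set_a != set_b


if __name__ == "__main__":
    # Hypothetical allele calls per locus: (alleles seen in RBT, alleles seen in WCT).
    loci = {
        "locus_1": (["A", "A", "A", "A"], ["G", "G", "G", "G"]),  # fixed difference
        "locus_2": (["A", "A", "G", "A"], ["G", "G", "G", "G"]),  # polymorphic in RBT
        "locus_3": (["C", "C", "C", "C"], ["C", "C", "C", "C"]),  # same allele in both
    }
    diagnostic = [name for name, (rbt, wct) in loci.items() if is_diagnostic(rbt, wct)]
    print(diagnostic)
```

    Sampling more non-hybridized individuals can only remove loci from the diagnostic set under this definition, which is consistent with the 325 previously reported loci that dropped out in the wider survey.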

  12. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs

    PubMed Central

    2013-01-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  13. Two novel single nucleotide polymorphisms (SNPs) of the FMO3 gene in Japanese.

    PubMed

    Fujieda, Masaki; Yamazaki, Hiroshi; Togashi, Masahiro; Saito, Tetsuya; Kamataki, Tetsuya

    2003-01-01

    We sequenced all exons and exon-intron junctions of the flavin-containing monooxygenase 3 (FMO3) gene from 27 Japanese individuals who are trimethylaminuria volunteers judged by self-reported analysis. We found two novel single nucleotide polymorphisms (SNPs) (21246 T>A and 21265 C>T) causing amino acid substitutions (Asp(198)Glu and Arg(205)Cys in exon 5), respectively. The Asp(198)Glu allele also presented together with known SNPs (20852 C>T in exon 4, 20960_20962 CTT deletion, 21115 G>A in intron 4, and 21243_21244 TG deletion in exon 5) in the same allele of the FMO3 gene to form a novel haplotype. These sequences are as follows: 1) SNP, 030609Fujieda019; GENE NAME, FMO3; ACCESSION NUMBER, AL021026; LENGTH, 25 base; 5'-TTCGGGCTG(TG/-)AT/AATTGCCACAGAA-3'. 2) SNP, 030609Fujieda020; GENE NAME, FMO3; ACCESSION NUMBER, AL021026; LENGTH, 25 base; 5'-ACAGAACTCAGCC/TGCACAGCAGAAC-3'. PMID:15618753

  14. Development of a multiplex PCR system of 59 mitochondrial SNPs and genetic analysis in Chinese population.

    PubMed

    Nie, Yanchai; Zhang, Chen; Jiao, Haitao; Zhao, Ziqin; Zhou, Huaigu

    2014-07-01

    The analysis of SNPs located on the mitochondrial DNA can provide information on maternal genetics. In the present study, a set of 59 SNPs were detected simultaneously using three multiplex allele-specific PCR and subsequent CE. Allele-specific primers were designed with different sizes to allow for specifically amplified paired alleles in the same reaction. An allelic ladder based on reference alleles was also created to maintain high-quality analysis standard. Samples from 400 unrelated individuals (200 of Han population and 200 of Uyghur population, China) were successfully analyzed and assigned into 106 relevant haplotypes, resulting in a discrimination power of 98.5%. The haplotype diversity was 0.978 for Han and 0.972 for Uyghur, respectively. Pairwise comparison of haplotype frequency distributions showed significant difference across ethnicities. These results suggest that the 59-SNP PCR system is a reliable, rapid, and economical method for large-scale screening of mitochondrial DNA variation, adding a new aspect for forensic individual identification. PMID:24659556
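
    The abstract reports a discrimination power and per-population haplotype diversities but does not say how they were computed. The sketch below assumes the definitions commonly used for haplotype data: discrimination power DP = 1 - sum(p_i^2) and Nei's haplotype diversity h = n/(n-1) * (1 - sum(p_i^2)), where p_i is the frequency of haplotype i among n samples. The function name and the toy sample are hypothetical and are not data from the study.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (discrimination_power, haplotype_diversity) for a list of labels.

    Assumed definitions (not stated in the abstract):
      DP = 1 - sum(p_i^2)
      h  = n / (n - 1) * (1 - sum(p_i^2))   (Nei's unbiased estimator)
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    dp = 1.0 - sum_sq
    hd = dp * n / (n - 1)
    return dp, hd


# Hypothetical toy sample: five individuals carrying four distinct haplotypes.
sample = ["H1", "H1", "H2", "H3", "H4"]
dp, hd = haplotype_stats(sample)
print(f"DP = {dp:.3f}, h = {hd:.3f}")
```

    With many samples and many rare haplotypes, both quantities approach 1, which is the regime the reported values (98.5%, 0.978 and 0.972) fall in.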

  15. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests

    PubMed Central

    2015-01-01

    Background: Single-nucleotide polymorphisms (SNPs) selection and identification are the most important tasks in Genome-wide association data analysis. The problem is difficult because genome-wide association data is very high dimensional and a large portion of SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in Genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy in some data sets of moderate size, RF still struggles in GWAS to select informative SNPs and build accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. Results: This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of Genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forests. The top 25 SNPs in the Parkinson data set were identified by the proposed model, including four interesting genes associated with neurological disorders. Conclusion: The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods. PMID:25708662
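
    The two-stage subspace sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' ts-RF implementation: the fixed thresholds `cutoff` and `strong_cutoff`, the parameter `n_strong`, the function names and the toy p-values are all assumptions made for illustration, since the abstract does not say how the cut-off point is found or how many SNPs are drawn from each sub-group.

```python
import random


def partition_snps(p_values, cutoff=0.05, strong_cutoff=0.001):
    """Stage 1: drop irrelevant SNPs (p > cutoff); split the rest in two.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    strong = [i for i, p in enumerate(p_values) if p <= strong_cutoff]
    weak = [i for i, p in enumerate(p_values) if strong_cutoff < p <= cutoff]
    return strong, weak


def sample_subspace(strong, weak, mtry, n_strong, rng):
    """Stage 2: draw a feature subspace of up to mtry SNP indices.

    n_strong indices come from the highly informative group, so every
    subspace contains highly informative SNPs; the rest come from the
    weakly informative group. Irrelevant SNPs are never sampled.
    """
    n_strong = min(n_strong, len(strong), mtry)
    n_weak = min(mtry - n_strong, len(weak))
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


# Hypothetical per-SNP association p-values (index = SNP id).
p_values = [0.0002, 0.3, 0.01, 0.0005, 0.04, 0.8, 0.02, 0.6]
strong, weak = partition_snps(p_values)
rng = random.Random(0)  # seeded only to make the sketch reproducible
subspace = sample_subspace(strong, weak, mtry=4, n_strong=1, rng=rng)
print("highly informative:", strong)
print("weakly informative:", weak)
print("sampled subspace:", subspace)
```

    In a full random forest, a subspace like this would be redrawn at every node split in place of the uniform feature sampling used by Breiman's RF.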

  16. The compensated Kalman filter.

    NASA Technical Reports Server (NTRS)

    Athans, M.

    1972-01-01

    This paper introduces the compensated Kalman filter, a suboptimal state estimator which can be used to eliminate steady-state bias errors when it is used in conjunction with the mismatched steady-state (asymptotic) time-invariant Kalman-Bucy filter. The uncompensated mismatched steady state Kalman-Bucy filter exhibits bias errors whenever the nominal plant parameters used in the filter design are different from the actual plant parameters. The approach used relies on the utilization of the residual (innovations) process of the mismatched filter to estimate, via a Kalman-Bucy filter, the state estimation errors and subsequent improvements of the state estimate. The compensated Kalman filter augments the mismatched steady state Kalman-Bucy filter by the introduction of additional dynamics and feedforward integral compensation channels.

  17. Detection of SNPs in the TBC1D1 gene and their association with carcass traits in chicken.

    PubMed

    Wang, Yan; Xu, Heng-Yong; Gilbert, Elizabeth R; Peng, Xing; Zhao, Xiao-Ling; Liu, Yi-Ping; Zhu, Qing

    2014-09-01

    TBC1D1 plays an important role in numerous fundamental physiological processes including muscle metabolism, regulation of whole body energy homeostasis and lipid metabolism. The objective of the present study was to identify single nucleotide polymorphisms (SNPs) in chicken TBC1D1 using 128 Erlang mountainous chickens and to determine if these SNPs are associated with carcass traits. The approach consisted of sequencing TBC1D1 using a panel of DNA from different individuals, revealing twenty-two SNPs. Among these SNPs, two polymorphisms (g.69307744C>T and g.69307608T>G) of block 1, four polymorphisms (g.69322320C>T, g.69322314G>A, g.69317290A>G and g.69317276T>C) of block 2 and four polymorphisms of block 3 (g.69349746G>A, g.69349736C>G, g.69349727C>T and g.69349694C>T) exhibited a high degree of linkage disequilibrium in all test populations. An association analysis was performed between the twenty-two SNPs and seven performance traits. SNPs g.69307744C>T, g.69340192G>A and g.69355665T>C were demonstrated to have a strong effect on liveweight (BW), carcass weight (CW), semi-eviscerated weight (SEW) and eviscerated weight (EW) and g.69340070C>T polymorphism was related to BW, SEW and BMW in chicken populations. However, for the other SNPs, there were no significant correlations between different genotypes and carcass traits. Meanwhile, haplotype CT-TG of block 1 and combined genotype AG-TT-AC-CT of block 3 were significantly associated with BW, CW, SEW and EW. Overall, our results provide evidence that polymorphisms in TBC1D1 are associated with carcass traits and would be a useful candidate gene in selection programs for improving carcass traits. PMID:24979340

  18. A Comprehensive In Silico Analysis of the Functional and Structural Impact of Nonsynonymous SNPs in the ABCA1 Transporter Gene

    PubMed Central

    Marín-Martín, Francisco R.; Soler-Rivas, Cristina; Martín-Hernández, Roberto; Rodriguez-Casado, Arantxa

    2014-01-01

    Disease phenotypes and defects in function can be traced to nonsynonymous single nucleotide polymorphisms (nsSNPs), which are important indicators of action sites and effective potential therapeutic approaches. Identification of deleterious nsSNPs is crucial to characterize the genetic basis of diseases, assess individual susceptibility to disease, determine molecular and therapeutic targets, and predict clinical phenotypes. In this study, using the PolyPhen2 and MutPred in silico algorithms, we analyzed the genetic variations that can alter the expression and function of the ABCA1 gene that causes the allelic disorders familial hypoalphalipoproteinemia and Tangier disease. Predictions were validated with published results from in vitro, in vivo, and human studies. Out of a total of 233 nsSNPs, 80 (34.33%) were found deleterious by both methods. Among these 80 deleterious nsSNPs, 29 (12.44%) rare variants were highly deleterious, with a probability >0.8. We observed that most variants with a verified functional effect in experimental studies are correctly predicted as damaging variants by the MutPred and PolyPhen2 tools. Still, the controversial results of experimental approaches correspond to nsSNPs predicted as neutral by both methods, or to nsSNPs for which contradictory predictions are obtained. A total of seventeen nsSNPs were predicted as deleterious by PolyPhen2 but neutral by MutPred. Conversely, forty-two nsSNPs were predicted as deleterious by MutPred but neutral by PolyPhen2. PMID:25215231

  19. Compact planar microwave blocking filters

    NASA Technical Reports Server (NTRS)

    U-Yen, Kongpop (Inventor); Wollack, Edward J. (Inventor)

    2012-01-01

    A compact planar microwave blocking filter includes a dielectric substrate and a plurality of filter unit elements disposed on the substrate. The filter unit elements are interconnected in a symmetrical series cascade with filter unit elements being organized in the series based on physical size. In the filter, a first filter unit element of the plurality of filter unit elements includes a low impedance open-ended line configured to reduce the shunt capacitance of the filter.

  20. Electromechanical Frequency Filters

    NASA Astrophysics Data System (ADS)

    Wersing, W.; Lubitz, K.

    Frequency filters select signals with a frequency inside a definite frequency range or band from signals outside this band, traditionally afforded by a combination of L-C-resonators. The fundamental principle of all modern frequency filters is the constructive interference of travelling waves. If a filter is built from coupled resonators, this interference occurs as a result of the successive wave reflection at the resonators' ends. In this case, the center frequency f_c of a filter, e.g., one built from symmetrical λ/2-resonators of length l, is given by f_c = f_r = v_{ph}/λ = v_{ph}/(2l), where v_{ph} is the phase velocity of the wave. This clearly shows the big advantage of acoustic waves for filter applications in comparison to electro-magnetic waves. Because v_{ph} of acoustic waves in solids is about 10^4-10^5 times smaller than that of electro-magnetic waves, much smaller filters can be realised. Today, piezoelectric materials and processing technologies exist such that electromechanical resonators and filters can be produced in the frequency range from 1 kHz up to 10 GHz. Further requirements for frequency filters such as low losses (high resonator Q) and low temperature coefficients of frequency constants can also be fulfilled with these filters. Important examples are quartz-crystal resonators and filters (1 kHz-200 MHz) as discussed in Chap. 2, electromechanical channel filters (50 kHz and 130 kHz) for long-haul communication systems as discussed in this section, surface acoustic wave (SAW) filters (20 MHz-5 GHz), as discussed in Chap. 14, and thin film bulk acoustic resonators (FBAR) and filters (500 MHz-10 GHz), as discussed in Chap. 15.
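
    The size advantage of acoustic resonators follows directly from the half-wave relation f_c = v_ph/(2l), rearranged to l = v_ph/(2 f_c). The short calculation below is only a worked illustration: the 100 MHz centre frequency and the acoustic phase velocity of 3000 m/s are assumed round numbers, not values taken from the text.

```python
def half_wave_length(v_ph, f_c):
    """Length l of a half-wavelength resonator, from f_c = v_ph / (2 * l)."""
    if v_ph <= 0 or f_c <= 0:
        raise ValueError("v_ph and f_c must be positive")
    return v_ph / (2.0 * f_c)


F_C = 100e6         # centre frequency in Hz (assumed example value)
V_EM = 3.0e8        # electromagnetic wave in free space, m/s
V_ACOUSTIC = 3.0e3  # acoustic wave in a solid, m/s (assumed typical value)

l_em = half_wave_length(V_EM, F_C)
l_ac = half_wave_length(V_ACOUSTIC, F_C)

print(f"electromagnetic resonator: {l_em:.3g} m")
print(f"acoustic resonator:        {l_ac * 1e6:.3g} micrometres")
print(f"size ratio:                {l_em / l_ac:.3g}")
```

    Under these assumptions the electromagnetic resonator is 1.5 m long and the acoustic one 15 micrometres, a ratio of 10^5, which matches the velocity ratio quoted in the abstract.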

  1. Filtering separators having filter cleaning apparatus

    SciTech Connect

    Margraf, A.

    1984-08-28

    This invention relates to filtering separators of the kind having a housing which is subdivided by a partition, provided with parallel rows of holes or slots, into a dust-laden gas space for receiving filter elements positioned in parallel rows and being impinged upon by dust-laden gas from the outside towards the inside, and a clean gas space. In addition, the housing is provided with a chamber for cleansing the filter element surfaces of a row by counterflow action while covering at the same time the partition holes or slots leading to the adjacent rows of filter elements. The chamber is arranged for the supply of compressed air to at least one injector arranged to feed compressed air and secondary air to the row of filter elements to be cleansed. The chamber is also reciprocatingly displaceable along the partition in periodic and intermittent manner. According to the invention, a surface of the chamber facing towards the partition covers at least two of the rows of holes or slots of the partition, and the chamber is closed upon itself with respect to the clean gas space, and is connected to a compressed air reservoir via a distributor pipe and a control valve. At least one of the rows of holes or slots of the partition and the respective row of filter elements in flow communication therewith are in flow communication with the discharge side of at least one injector acted upon with compressed air. At least one other row of the rows of holes or slots of the partition and the respective row of filter elements is in flow communication with the suction side of the injector.

  2. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369

  3. Using imputation and mixture model approaches to integrate multi-state capture-recapture models with assignment information.

    PubMed

    Wen, Zhi; Pollock, Kenneth H; Nichols, James D; Waser, Peter M; Cao, Weihua

    2014-06-01

    In this article, we first extend the superpopulation capture-recapture model to multiple states (locations or populations) for two age groups. Wen et al. (2011; 2013) developed a new approach combining capture-recapture data with population assignment information to estimate the relative contributions of in situ births and immigrants to the growth of a single study population. Here, we first generalize the Wen et al. (2011; 2013) approach to a system composed of multiple study populations (multi-state) with two age groups, where an imputation approach is employed to account for the uncertainty inherent in the population assignment information. Then we develop a different, individual-level mixture model approach to integrate the individual-level population assignment information with the capture-recapture data. Our simulation and real data analyses show that the fusion of population assignment information with capture-recapture data allows us to estimate the origination-specific recruitment of new animals to the system and the dispersal process between populations within the system. Compared to a standard capture-recapture model, our new models improve the estimation of demographic parameters, including survival probability, origination-specific entry probability, and especially the probability of movement between populations, yielding higher accuracy and precision. PMID:24571715

  4. Identification of Pyrus Single Nucleotide Polymorphisms (SNPs) and Evaluation for Genetic Mapping in European Pear and Interspecific Pyrus Hybrids

    PubMed Central

    Troggio, Michela; Malnoy, Mickael; Velasco, Riccardo; Fontana, Paolo; Won, KyungHo; Durel, Charles-Eric; Perchepied, Laure; Schaffer, Robert; Wiedow, Claudia; Bus, Vincent; Brewer, Lester; Gardiner, Susan E.; Crowhurst, Ross N.; Chagné, David

    2013-01-01

    We have used new generation sequencing (NGS) technologies to identify single nucleotide polymorphism (SNP) markers from three European pear (Pyrus communis L.) cultivars and subsequently developed a subset of 1096 pear SNPs into high throughput markers by combining them with the set of 7692 apple SNPs on the IRSC apple Infinium II 8K array. We then evaluated this apple and pear Infinium II 9K SNP array for large-scale genotyping in pear across several species, using both pear and apple SNPs. The segregating populations employed for array validation included a segregating population of European pear ('Old Home' × 'Louise Bon Jersey') and four interspecific breeding families derived from Asian (P. pyrifolia Nakai and P. bretschneideri Rehd.) and European pear pedigrees. In total, we mapped 857 polymorphic pear markers to construct the first SNP-based genetic maps for pear, comprising 78% of the total pear SNPs included in the array. In addition, 1031 SNP markers derived from apple (13% of the total apple SNPs included in the array) were polymorphic and were mapped in one or more of the pear populations. These results are the first to demonstrate SNP transferability across the genera Malus and Pyrus. Our construction of high density SNP-based and gene-based genetic maps in pear represents an important step towards the identification of chromosomal regions associated with a range of horticultural characters, such as pest and disease resistance, orchard yield and fruit quality. PMID:24155917

  5. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea.

    PubMed

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the most dense linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721

  6. Identification of Novel Single Nucleotide Polymorphisms (SNPs) in Deer (Odocoileus spp.) Using the BovineSNP50 BeadChip

    PubMed Central

    Haynes, Gwilym D.; Latch, Emily K.

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the FST-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1−30.1 million years before present). PMID:22590559

  7. Detection of associations with rare and common SNPs for quantitative traits: a nonparametric Bayes-based approach.

    PubMed

    Ding, Lili; Baye, Tesfaye M; He, Hua; Zhang, Xue; Kurowski, Brad G; Martin, Lisa J

    2011-01-01

    We propose a nonparametric Bayes-based clustering algorithm to detect associations with rare and common single-nucleotide polymorphisms (SNPs) for quantitative traits. Unlike current methods, our approach identifies associations with rare genetic variants at the variant level, not the gene level. In this method, we use a Dirichlet process prior for the distribution of SNP-specific regression coefficients, conduct hierarchical clustering with a distance measure derived from posterior pairwise probabilities of two SNPs having the same regression coefficient, and explore data-driven approaches to select the number of clusters. SNPs falling inside the largest cluster have relatively low or close to zero estimates of regression coefficients and are considered not associated with the trait. SNPs falling outside the largest cluster have relatively high estimates of regression coefficients and are considered potential risk variants. Using the data from the Genetic Analysis Workshop 17, we successfully detected associations with both rare and common SNPs for a quantitative trait. We conclude that our method provides a novel and broadly applicable strategy for obtaining association results with a reasonably low proportion of false discovery and that it can be routinely used in resequencing studies. PMID:22373351

  8. SNPs in the aryl hydrocarbon receptor-interacting protein gene associated with sporadic non-functioning pituitary adenoma

    PubMed Central

    HU, YESHUAI; YANG, JUN; CHANG, YONGKAI; MA, SHUNCHANG; QI, JIANFA

    2016-01-01

    Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene have previously been associated with a predisposition to pituitary adenomas. However, to the best of our knowledge, mutations in AIP that relate specifically to sporadic non-functioning pituitary adenomas (NFPAs) have yet to be reported. Therefore, the present study aimed to identify single nucleotide polymorphisms (SNPs) in the AIP gene that may be associated with NFPAs. Peripheral blood samples and the entire coding sequence of the AIP gene from 56 patients with NFPAs and 56 controls were analyzed in triplicate. Of the 56 patients with NFPAs, 9 patients (16.1%) were identified as harboring five different SNPs, although no germline mutations in the AIP gene were detected in any of the patients. Three different SNPs (7051C>T, 8012G>C and 8020G>C) were identified in exons 4 and 6 in 3 different patients (each in 1 patient). Two different SNPs (7318C>A and 7886A>G) were identified in exons 5 and 6, respectively, in 6 different patients (each in 3 patients). No SNPs or germline mutations in the AIP gene were identified in the controls. The results of the present study suggested that mutations in the AIP gene might not have an important role in the tumorigenesis of NFPAs. However, further studies are required in order to investigate potential molecular and genetic mechanisms that may underlie the involvement of AIP in NFPA. PMID:26998050

  9. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the most dense linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721

  10. Mapping the genetic variation of regional brain volumes as explained by all common SNPs from the ADNI study.

    PubMed

    Bryant, Christopher; Giovanello, Kelly S; Ibrahim, Joseph G; Chang, Jing; Shen, Dinggang; Peterson, Bradley S; Zhu, Hongtu

    2013-01-01

    Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs. PMID:24015190

  11. In silico analysis of consequences of non-synonymous SNPs of Slc11a2 gene in Indian bovines.

    PubMed

    Patel, Shreya M; Koringa, Prakash G; Reddy, Bhaskar B; Nathani, Neelam M; Joshi, Chaitanya G

    2015-09-01

    The aim of our study was to analyze the consequences of non-synonymous SNPs in the Slc11a2 gene using bioinformatic tools. There is a current need for efficient bioinformatic tools for in-depth analysis of data generated by next generation sequencing technologies. SNPs are known to play an imperative role in understanding the genetic basis of many genetic diseases. Slc11a2 is one of the major metal transporter families in mammals and plays a critical role in host defenses. In this study, we performed a comprehensive analysis of the impact of all non-synonymous SNPs in this gene using multiple tools like SIFT, PROVEAN, I-Mutant and PANTHER. Among the total 124 SNPs obtained from amplicon sequencing of the Slc11a2 gene by Ion Torrent PGM involving 10 individuals each of Gir cattle and Murrah buffalo, we found 22 to be non-synonymous. Comparing the predictions of these 4 methods, 5 nsSNPs (G369R, Y374C, A377V, Q385H and N492S) were identified as deleterious. In addition, when tested for polar interactions with other amino acids in the protein, three of these five (Y374C, Q385H and N492S) showed a change in interaction pattern, which was further confirmed by an increase in total energy after energy minimization of the mutant protein compared to the native. PMID:26484229

  12. Discovery of URAT1 SNPs and association between serum uric acid levels and URAT1

    PubMed Central

    Cho, Sung Kweon; Kim, Soriul; Chung, Jae-Yong; Jee, Sun Ha

    2015-01-01

    Objectives Human urate transporter 1 (URAT1) is a member of the organic anion transporter family (SLC22A12) that primarily regulates the renal tubular reabsorption of uric acid. This casecontrol study was designed to analyse whether hURAT1 might also be a candidate gene for hyperuricaemia or hypouricaemia. Setting We recruited 68 healthy volunteers and divided them into two groups: a normal uric acid group and a hyperuricaemia group. We analysed the sequence of the URAT1 gene and found five significant single nucleotide polymorphisms (SNPs). We then selected 900 male subjects from the 262?200 enrolled in the Korean Cancer Prevention Study-II (KCPS-II) cohort for further genetic analysis. Participants DNA samples from 36 individuals with normal uric acid (<4.5?mg/dL) and 32 individuals with hyperuricaemia (>8.5?mg/dL) were sequenced. Five significant SNPs (rs7929627, rs75786299, rs3825017, rs11602903 and rs121907892) were identified. We then chose 900 subjects from the KCPS-II cohort consisting of 450 subjects with normal uric acid (UA <4.1?mg/dL) and 450 subjects with hyperuricaemia (UA >8.7?mg/dL). The groups were matched by age, body mass index, metabolic syndrome and use of anti-hypertensive medication. Primary outcome measures We compared the OR of the incidence of hyperuricaemia by URAT1 genotype. Results The strongest association with hyperuricaemia was observed for rs75786299 (IVS3+11A/G) with an OR of 32.05. rs7929627 (IVS7-103A/G) and rs3825017 (N82N) showed an association with hyperuricaemia with ORs of 2.56 and 2.29, respectively. rs11602903 (788A/T) and rs121907892 (W258X) were negatively correlated with hyperuricaemia with ORs of 0.350 and 0.447, respectively. Individuals carrying the GATAG haplotype (n=32)a relatively common variant consisting of rs7929627, rs75786299 and rs3825017showed the highest risk for hyperuricaemia with an OR of 92.23 (p=9.5510?3). 
Conclusions These results indicate that five newly described SNPs in the hURAT1 gene are significantly associated with uric acid level (4-2008-0318 and 4-2011-0277). PMID:26603249
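
    The odds ratios reported in this record come from comparing genotype counts between cases and controls. As a minimal sketch of that kind of calculation, the Python below computes an OR and a Wald 95% confidence interval from a 2x2 table. The function name and the counts are hypothetical and are not taken from the study, and the authors' matched analysis may well have used a different estimator.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (OR, lower, upper) for a 2x2 case-control table."""
    # Cross-product ratio of the 2x2 table.
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    # Standard error of ln(OR), Woolf's method.
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / control_carriers + 1 / control_noncarriers)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_ci(60, 390, 30, 420)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

    An interval that excludes 1 corresponds to an association that is significant at roughly the 5% level under this approximation.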

  13. Generic Kalman Filter Software

    NASA Technical Reports Server (NTRS)

    Lisano, Michael E., II; Crues, Edwin Z.

    2005-01-01

    The Generic Kalman Filter (GKF) software provides a standard basis for the development of application-specific Kalman-filter programs. Historically, Kalman filters have been implemented by customized programs that must be written, coded, and debugged anew for each unique application, then tested and tuned with simulated or actual measurement data. Total development times for typical Kalman-filter application programs have ranged from months to weeks. The GKF software can simplify the development process and reduce the development time by eliminating the need to re-create the fundamental implementation of the Kalman filter for each new application. The GKF software is written in the ANSI C programming language. It contains a generic Kalman-filter-development directory that, in turn, contains a code for a generic Kalman filter function; more specifically, it contains a generically designed and generically coded implementation of linear, linearized, and extended Kalman filtering algorithms, including algorithms for state- and covariance-update and -propagation functions. The mathematical theory that underlies the algorithms is well known and has been reported extensively in the open technical literature. Also contained in the directory are a header file that defines generic Kalman-filter data structures and prototype functions and template versions of application-specific subfunction and calling navigation/estimation routine code and headers. Once the user has provided a calling routine and the required application-specific subfunctions, the application-specific Kalman-filter software can be compiled and executed immediately. During execution, the generic Kalman-filter function is called from a higher-level navigation or estimation routine that preprocesses measurement data and post-processes output data. 
The generic Kalman-filter function uses the aforementioned data structures and five implementation-specific subfunctions, which have been developed by the user on the basis of the aforementioned templates. The GKF software can be used to develop many different types of unfactorized Kalman filters. A developer can choose to implement either a linearized or an extended Kalman filter algorithm, without having to modify the GKF software. Control dynamics can be taken into account or neglected in the filter-dynamics model. Filter programs developed by use of the GKF software can be made to propagate equations of motion for linear or nonlinear dynamical systems that are deterministic or stochastic. In addition, filter programs can be made to operate in user-selectable "covariance analysis" and "propagation-only" modes that are useful in design and development stages.
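
    To illustrate the state- and covariance-propagation and -update steps that this record describes, here is a minimal sketch of a textbook scalar linear Kalman filter. It is written in Python rather than ANSI C, it is not the GKF code or its API, and every name in it (kf_propagate, kf_update, run_filter, the propagation_only flag) is a hypothetical choice made for this example.

```python
def kf_propagate(x, p, a, q, b=0.0, u=0.0):
    """Propagate the state estimate x and its variance p by one step.

    Model: x_next = a * x + b * u + process noise with variance q.
    Leaving b or u at 0.0 neglects control dynamics.
    """
    x_pred = a * x + b * u
    p_pred = a * p * a + q
    return x_pred, p_pred


def kf_update(x_pred, p_pred, z, h, r):
    """Update the prediction with measurement z = h * x + noise (variance r)."""
    innovation = z - h * x_pred
    innovation_var = h * p_pred * h + r
    gain = p_pred * h / innovation_var
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def run_filter(measurements, x0, p0, a, q, h, r, propagation_only=False):
    """Run the filter; with propagation_only=True the update step is skipped."""
    x, p = x0, p0
    history = []
    for z in measurements:
        x, p = kf_propagate(x, p, a, q)
        if not propagation_only:
            x, p = kf_update(x, p, z, h, r)
        history.append((x, p))
    return history


# One noisy reading of a constant quantity, starting from x0 = 0, p0 = 1.
print(run_filter([1.0], x0=0.0, p0=1.0, a=1.0, q=0.0, h=1.0, r=1.0))
```

    With equal prior and measurement variances the gain is 0.5, so the single measurement moves the estimate halfway from the prior toward the reading and halves the variance.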

  14. Concentric Split Flow Filter

    NASA Technical Reports Server (NTRS)

    Stapleton, Thomas J. (Inventor)

    2015-01-01

    A concentric split flow filter may be configured to remove odor and/or bacteria from pumped air used to collect urine and fecal waste products. For instance, the filter may be designed to effectively fill the volume that was previously considered wasted surrounding the transport tube of a waste management system. The concentric split flow filter may be configured to split the air flow, with substantially half of the air flow to be treated traveling through a first bed of filter media and substantially the other half of the air flow to be treated traveling through the second bed of filter media. This split flow design reduces the air velocity by 50%. In this way, the pressure drop of the filter may be reduced by as much as a factor of 4 as compared to the conventional design.
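
    The "factor of 4" figure is what one would expect if pressure drop scaled with the square of air velocity. The sketch below works through that arithmetic; the power-law scaling and the function name are assumptions made for illustration and are not stated in the record.

```python
def pressure_drop_ratio(velocity_ratio, exponent=2.0):
    """Ratio of new to old pressure drop, assuming dP is proportional to v**exponent.

    exponent=2.0 is an assumed inertia-dominated regime; exponent=1.0 would
    correspond to a viscous (Darcy-like) regime.
    """
    return velocity_ratio ** exponent


halved = 0.5  # split flow: each bed sees half the air velocity
print(1 / pressure_drop_ratio(halved))        # reduction factor, quadratic case
print(1 / pressure_drop_ratio(halved, 1.0))   # reduction factor, linear case
```

    Under these assumptions the reduction ranges from a factor of 2 (linear regime) to a factor of 4 (quadratic regime), which is consistent with the abstract's wording "as much as a factor of 4".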

  15. Optically tunable optical filter

    NASA Astrophysics Data System (ADS)

    James, Robert T. B.; Wah, Christopher; Iizuka, Keigo; Shimotahira, Hiroshi

    1995-12-01

    We experimentally demonstrate an optically tunable optical filter that uses photorefractive barium titanate. With our filter we implement a spectrum analyzer at 632.8 nm with a resolution of 1.2 nm. We simulate a wavelength-division multiplexing system by separating two semiconductor laser diodes, at 1560 nm and 1578 nm, with the same filter. The filter has a bandwidth of 6.9 nm. We also use the same filter to take 2.5-nm-wide slices out of a 20-nm-wide superluminescent diode centered at 840 nm. As a result, we experimentally demonstrate a phenomenal tuning range from 632.8 to 1578 nm with a single filtering device.

  16. Contactor/filter improvements

    DOEpatents

    Stelman, D.

    1988-06-30

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream is described. The filter includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. A cover screen isolates the filter element from contact with the moving granular bed. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses. 6 figs.

  17. Filter vapor trap

    DOEpatents

    Guon, Jerold

    1976-04-13

    A sintered filter trap is adapted for insertion in a gas stream of sodium vapor to condense and deposit sodium thereon. The filter is heated and operated above the melting temperature of sodium, resulting in a more efficient means to remove sodium particulates from the effluent inert gas emanating from the surface of a liquid sodium pool. Preferably the filter leaves are precoated with a natrophobic coating such as tetracosane.

  18. Practical alarm filtering

    SciTech Connect

    Bray, M.; Corsberg, D.

    1994-02-01

    An expert system-based alarm filtering method is described which prioritizes and reduces the number of alarms facing an operator. This patented alarm filtering methodology was originally developed and implemented in a pressurized water reactor, and subsequently in a chemical processing facility. Both applications were in LISP and both were successful. In the chemical processing facility, for instance, alarm filtering reduced the quantity of alarm messages by 90%. 6 figs.

  19. Hybrid Filter Membrane

    NASA Technical Reports Server (NTRS)

    Laicer, Castro; Rasimick, Brian; Green, Zachary

    2012-01-01

    Cabin environmental control is an important issue for a successful Moon mission. Due to the unique environment of the Moon, lunar dust control is one of the main problems that significantly diminishes the air quality inside spacecraft cabins. Therefore, this innovation was motivated by NASA's need to minimize the negative health impact that air-suspended lunar dust particles have on astronauts in spacecraft cabins. It is based on fabrication of a hybrid filter comprising nanofiber nonwoven layers coated on porous polymer membranes with uniform cylindrical pores. This design results in a high-efficiency gas particulate filter with low pressure drop and the ability to be easily regenerated to restore filtration performance. A hybrid filter was developed consisting of a porous membrane with uniform, micron-sized, cylindrical pore channels coated with a thin nanofiber layer. Compared to conventional filter media such as a high-efficiency particulate air (HEPA) filter, this filter is designed to provide high particle efficiency, low pressure drop, and the ability to be regenerated. These membranes have well-defined micron-sized pores and can be used independently as air filters with discrete particle size cut-off, or coated with nanofiber layers for filtration of ultrafine nanoscale particles. The filter consists of a thin design intended to facilitate filter regeneration by localized air pulsing. The main feature of this invention is the concept of combining a micro-engineered straight-pore membrane with nanofibers. The micro-engineered straight-pore membrane can be prepared with extremely high precision. Because the resulting membrane pores are straight and not tortuous like those found in conventional filters, the pressure drop across the filter is significantly reduced. The nanofiber layer is applied as a very thin coating to enhance filtration efficiency for fine nanoscale particles.
Additionally, the thin nanofiber coating is designed to promote capture of dust particles on the filter surface and to facilitate dust removal with pulse or back airflow.

  20. Searching for candidate genes in acute lung injury: SNPs, Chips and PBEF.

    PubMed

    Garcia, Joe G N

    2005-01-01

    Acute lung injury (ALI) is a devastating illness, occurring in the setting of sepsis, with genetic variations contributing to ALI susceptibility and severity. We utilized the "candidate gene approach" with extensive expression profiling in animal and human ALI models to identify novel candidate genes. We noted significant expression of pre-B-cell colony enhancing factor (PBEF), a gene not previously associated with lung pathophysiology. This finding was validated by molecular, biochemical and immunohistochemical approaches with increased levels of PBEF also detected in human BAL and serum. DNA sequencing identified two single nucleotide polymorphisms (SNPs) in the PBEF promoter (T-1001G, C-1543T), which were genotyped in a Caucasian cohort of sepsis-associated ALI patients. Carriers of the GC haplotype exhibited a 5.7-fold relative ALI risk compared to controls associated with increased PBEF promoter activity. These studies demonstrate the successful application of genomic technologies in the identification of novel candidate genes in complex lung disease. PMID:16555615

  1. Nanofiber Filters Eliminate Contaminants

    NASA Technical Reports Server (NTRS)

    2009-01-01

    With support from Phase I and II SBIR funding from Johnson Space Center, Argonide Corporation of Sanford, Florida tested and developed its proprietary nanofiber water filter media. Capable of removing more than 99.99 percent of dangerous particles like bacteria, viruses, and parasites, the media was incorporated into the company's commercial NanoCeram water filter, an inductee into the Space Foundation's Space Technology Hall of Fame. In addition to its drinking water filters, Argonide now produces large-scale nanofiber filters used as part of the reverse osmosis process for industrial water purification.

  2. Independent task Fourier filters

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    2001-11-01

    Since the early 1960s, a major part of optical computing systems has been Fourier pattern recognition, which takes advantage of high speed filter changes to enable powerful nonlinear discrimination in `real time.' Because each filter has a task quite independent of the tasks of the other filters, they can be applied and evaluated in parallel or, in a simple approach I describe, in sequence very rapidly. Thus I use the name ITFF (independent task Fourier filter). These filters can also break very complex discrimination tasks into easily handled parts, so the wonderful space invariance properties of Fourier filtering need not be sacrificed to achieve high discrimination and good generalizability even for ultracomplex discrimination problems. The training procedure proceeds sequentially, as the task for a given filter is defined a posteriori by declaring it to be the discrimination of particular members of set A from all members of set B with sufficient margin. That is, we set the threshold to achieve the desired margin and note the A members discriminated by that threshold. Discriminating those A members from all members of B becomes the task of that filter. Those A members are then removed from the set A, so no other filter will be asked to perform that already accomplished task.
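
    The sequential task assignment described in this record can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the author's procedure: each filter is stood in for by a plain scoring function (a hypothetical substitute for a Fourier correlation peak), and "sufficient margin" is interpreted as a threshold set at the highest set-B score plus a fixed margin.

```python
def assign_filter_tasks(filters, set_a, set_b, margin):
    """Assign to each filter the A members it separates from all of B.

    filters: list of (name, score_function) pairs, tried in order.
    Returns (tasks, remaining), where tasks holds (name, threshold, handled)
    tuples and remaining lists the A members no filter could handle.
    """
    remaining = list(set_a)
    tasks = []
    for name, score in filters:
        if not remaining:
            break
        # Threshold chosen so every B member falls below it by at least `margin`.
        threshold = max(score(b) for b in set_b) + margin
        handled = [a for a in remaining if score(a) >= threshold]
        if handled:
            tasks.append((name, threshold, handled))
            # Remove handled members so later filters are not asked to repeat the task.
            remaining = [a for a in remaining if a not in handled]
    return tasks, remaining


def high_score(x):
    """Hypothetical filter that responds strongly to large values."""
    return x


def low_score(x):
    """Hypothetical filter that responds strongly to small values."""
    return -x


set_a = [1.0, 2.0, 8.0, 9.0]   # toy "targets" on both sides of set B
set_b = [4.5, 5.0, 5.5]        # toy "non-targets"
tasks, leftover = assign_filter_tasks(
    [("high", high_score), ("low", low_score)], set_a, set_b, margin=1.0)
print(tasks, leftover)
```

    In this toy case neither filter alone separates all of A from B, but the two tasks together cover every member of A, which is the point of splitting a complex discrimination into independent parts.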

  3. Birefringent filter design

    NASA Technical Reports Server (NTRS)

    Bair, Clayton H. (Inventor)

    1991-01-01

    A birefringent filter is provided for tuning the wavelength of a broad band emission laser. The filter comprises thin plates of a birefringent material having thicknesses which are non-unity, integral multiples of the difference between the thicknesses of the two thinnest plates. The resulting wavelength selectivity is substantially equivalent to the wavelength selectivity of a conventional filter which has a thinnest plate having a thickness equal to this thickness difference. The present invention obtains an acceptable tuning of the wavelength while avoiding a decrease in optical quality associated with conventional filters wherein the respective plate thicknesses are integral multiples of the thinnest plate.

  4. Linear phase compressive filter

    DOEpatents

    McEwan, T.E.

    1995-06-06

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line. 2 figs.

  5. Linear phase compressive filter

    DOEpatents

    McEwan, Thomas E. (Livermore, CA)

    1995-01-01

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line.

  6. MICA SNPs and the NKG2D system in virus-induced HCC.

    PubMed

    Goto, Kaku; Kato, Naoya

    2015-03-01

    Hepatocellular carcinoma (HCC) is one of the most frequent causes of cancer-related death globally. Above well-known risk factors for HCC development ranging from various toxins to diseases such as diabetes mellitus, chronic infection with hepatitis B virus and hepatitis C virus (HCV) poses the most serious threat, constituting the cause in more than 80% of cases. In addition to the viral genes intensively investigated, the pathophysiological importance of host genetic factors has also been greatly and increasingly appreciated. Genome-wide association studies (GWAS) comprehensively search the host genome at the single-nucleotide level, and have successfully identified the genomic region associated with a whole variety of diseases. With respect to HCC, there have been reports from several groups on single nucleotide polymorphisms (SNPs) associated with hepatocarcinogenesis, among which was our GWAS discovering MHC class I polypeptide-related sequence A (MICA) as a susceptibility gene for HCV-induced HCC. MICA is a natural killer (NK) group 2D (NKG2D) ligand, whose interaction with NKG2D triggers NK cell-mediated cytotoxicity toward the target cells, and is a key molecule in tumor immune surveillance as its expression is induced on stressed cells such as transformed tumor cells for the detection by NK cells. In this review, the latest understanding of the MICA-NKG2D system in viral HCC, particularly focused on its antitumor properties and the involvement of MICA SNPs, is summarized, followed by a discussion of targets for state-of-the-art cancer immunotherapy with personalized medicine in view. PMID:25270965

  7. Disrupted-in-Schizophrenia-1 SNPs and Susceptibility to Schizophrenia: Evidence from Malaysia

    PubMed Central

    Kartini, Abdullah; Norsidah, Kuzaifah; Ramli, Musa; Tariq, Abdul Razak; Wan Rohani, Wan Taib

    2015-01-01

    Objective Even though the role of the DISC1 gene as a risk factor for schizophrenia is still unclear, there is substantial evidence from functional and cell biology studies that supports the connection of the gene with schizophrenia. The studies associating the DISC1 gene with schizophrenia in Asian populations are limited to East-Asian populations. Our study examined several DISC1 markers of schizophrenia that were identified in the Caucasian and East-Asian populations in Malaysia and assessed the role of rs2509382, which is located at 11q14.3, the mutual translocation region of the famous DISC1 translocation [t (1; 11) (p42.1; q14.3)]. Methods We genotyped eleven single-nucleotide polymorphisms (SNPs) within or related to DISC1 (rs821597, rs821616, rs4658971, rs1538979, rs843979, rs2812385, rs1407599, rs4658890, and rs2509382) using PCR-RFLP methods. Results In all, there were 575 participants (225 schizophrenic patients and 350 healthy controls) of either Malay or Chinese ethnicity. The case-control analyses found two SNPs that were associated with schizophrenia [rs4658971 (p=0.030; OR=1.43 (1.35-1.99)) and rs1538979 (p=0.036; OR=1.35 (1.02-1.80))] and rs2509382 susceptibility among male schizophrenics [p=0.0082; OR=2.16 (1.22-3.81)]. This is similar to the meta-analysis findings for the Caucasian populations. Conclusion The study supports the notion that the DISC1 gene is a marker of schizophrenia susceptibility and that rs2509382 in the mutual DISC1 translocation region is a susceptibility marker for schizophrenia among males in Malaysia. However, the findings of the study are limited due to possible genetic stratification and the small sample size. PMID:25670952

  8. Common SNPs explain some of the variation in the personality dimensions of neuroticism and extraversion

    PubMed Central

    Vinkhuyzen, A A E; Pedersen, N L; Yang, J; Lee, S H; Magnusson, P K E; Iacono, W G; McGue, M; Madden, P A F; Heath, A C; Luciano, M; Payton, A; Horan, M; Ollier, W; Pendleton, N; Deary, I J; Montgomery, G W; Martin, N G; Visscher, P M; Wray, N R

    2012-01-01

    The personality traits of neuroticism and extraversion are predictive of a number of social and behavioural outcomes and psychiatric disorders. Twin and family studies have reported moderate heritability estimates for both traits. Few associations have been reported between genetic variants and neuroticism/extraversion, but hardly any have been replicated. Moreover, the ones that have been replicated explain only a small proportion of the heritability (<∼2%). Using genome-wide single-nucleotide polymorphism (SNP) data from ∼12 000 unrelated individuals we estimated the proportion of phenotypic variance explained by variants in linkage disequilibrium with common SNPs as 0.06 (s.e.=0.03) for neuroticism and 0.12 (s.e.=0.03) for extraversion. In an additional series of analyses in a family-based sample, we show that while for both traits ∼45% of the phenotypic variance can be explained by pedigree data (that is, expected genetic similarity) one third of this can be explained by SNP data (that is, realized genetic similarity). A part of the so-called ‘missing heritability’ has now been accounted for, but some of the reported heritability is still unexplained. Possible explanations for the remaining missing heritability are that: (i) rare variants that are not captured by common SNPs on current genotype platforms make a major contribution; and/or (ii) the estimates of narrow sense heritability from twin and family studies are biased upwards, for example, by not properly accounting for nonadditive genetic factors and/or (common) environmental factors. PMID:22832902

  9. Allelic Spectra of Risk SNPs Are Different for Environment/Lifestyle Dependent versus Independent Diseases

    PubMed Central

    Amos, Christopher I.

    2015-01-01

    Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented in the group of minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) compared to that in the group of major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning “environment” or “lifestyle” AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for the diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while frequency distributions of the risk alleles for the diseases with weak environmental/lifestyle influences are shifted to the lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for the environment/lifestyle dependent diseases will follow a neutral model since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that prediction of SNP functionality based on the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle dependent diseases. PMID:26201053
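
    The ELI definition in this record is a simple rate per 1,000 publications. A minimal sketch of that formula follows; the function name and the publication counts are hypothetical and serve only to show the arithmetic.

```python
def environment_lifestyle_index(env_and_disease_pubs, disease_pubs):
    """ELI: publications mentioning ("environment" or "lifestyle") AND the disease,
    per 1,000 publications mentioning the disease."""
    if disease_pubs <= 0:
        raise ValueError("disease_pubs must be positive")
    return 1000.0 * env_and_disease_pubs / disease_pubs


# Hypothetical counts for two made-up diseases.
print(environment_lifestyle_index(250, 5000))   # stronger environmental signal
print(environment_lifestyle_index(12, 6000))    # weaker environmental signal
```

    A higher index marks a disease whose literature more often discusses environment or lifestyle, which the abstract uses as a proxy for environmental dependence.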

  10. The evolutionary history of Afrocanarian blue tits inferred from genomewide SNPs.

    PubMed

    Gohli, Jostein; Leder, Erica H; Garcia-Del-Rey, Eduardo; Johannessen, Lars Erik; Johnsen, Arild; Laskemoen, Terje; Popp, Magnus; Lifjeld, Jan T

    2015-01-01

    A common challenge in phylogenetic reconstruction is to find enough suitable genomic markers to reliably trace splitting events with short internodes. Here, we present phylogenetic analyses based on genomewide single-nucleotide polymorphisms (SNPs) of an enigmatic avian radiation, the subspecies complex of Afrocanarian blue tits (Cyanistes teneriffae). The two sister species, the Eurasian blue tit (Cyanistes caeruleus) and the azure tit (Cyanistes cyanus), constituted the out-group. We generated a large data set of SNPs for analysis of population structure and phylogeny. We also adapted our protocol to utilize degraded DNA from old museum skins from Libya. We found strong population structuring that largely confirmed subspecies monophyly and constructed a coalescent-based phylogeny with full support at all major nodes. The results are consistent with a recent hypothesis that La Palma and Libya are relic populations of an ancient Afrocanarian blue tit, although a small data set for Libya could not resolve its position relative to La Palma. The birds on the eastern islands of Fuerteventura and Lanzarote are similar to those in Morocco. Together they constitute the sister group to the clade containing the other Canary Islands (except La Palma), in which El Hierro is sister to the three central islands. Hence, extant Canary Islands populations seem to originate from multiple independent colonization events. We also found population divergences in a key reproductive trait, viz. sperm length, which may constitute reproductive barriers between certain populations. We recommend a taxonomic revision of this polytypic species, where several subspecies should qualify for species rank. PMID:25407440

  11. Assignment of Y-chromosomal SNPs found in Japanese population to Y-chromosomal haplogroup tree.

    PubMed

    Naitoh, Sae; Kasahara-Nonaka, Iku; Minaguchi, Kiyoshi; Nambiar, Phrabhakaran

    2013-04-01

    The relationship between Y-chromosome single-nucleotide polymorphisms (SNPs) registered in the Japanese SNP (JSNP) database (http://snp.ims.u-tokyo.ac.jp) and Y-binary haplogroup lineages was investigated to identify new Y-chromosomal binary haplogroup markers and further refine Y-chromosomal haplogroup classification in the Japanese population. We used SNPs for which it was possible to construct primers to make Y-specific PCR product sizes small enough to obtain amplification products even from degraded DNA, as this would allow their use not only in genetic but also in archeological and forensic studies. The genotypes of 35 JSNP markers were determined, of which 14 were assigned to appropriate positions on the Y-chromosomal haplogroup tree, together with 5 additional new non-JSNP markers. These markers defined 14 new branches (C3/64562+13, C3/2613-27, D2a1b/006841*, D2a1b/119166-11A, D2a/022456*, D2a/119166-11A, D2a/119167rec/119167-40rec*, D2a/75888-GC, O3a3c/075888-9T/10T*, O3a3c/075888-9T/9T, O3a3/8425+6, O3a3/119166-13A*, O3a3/008002 and O3a4/037852) and 21 new internal markers on the 2008 Y-chromosome haplogroup tree. These results will provide useful information for Y-chromosomal polymorphic studies of East Asian populations, particularly those in and around Japan, in the fields of anthropology, genetics and forensics. PMID:23389242

  12. A consensus linkage map of the grass carp (Ctenopharyngodon idella) based on microsatellites and SNPs

    PubMed Central

    2010-01-01

    Background Grass carp (Ctenopharyngodon idella) belongs to the family Cyprinidae which includes more than 2000 fish species. It is one of the most important freshwater food fish species in world aquaculture. A linkage map is an essential framework for mapping traits of interest and is often the first step towards understanding genome evolution. The aim of this study is to construct a first generation genetic map of grass carp using microsatellites and SNPs to generate a new resource for mapping QTL for economically important traits and to conduct a comparative mapping analysis to gain new insights into the evolution of fish genomes. Results We constructed a first generation linkage map of grass carp with a mapping panel containing two F1 families including 192 progeny. Sixteen SNPs in genes and 263 microsatellite markers were mapped to twenty-four linkage groups (LGs). The number of LGs corresponded to the haploid chromosome number of grass carp. The sex-specific maps were 1149.4 and 888.8 cM long in females and males, respectively, whereas the sex-averaged map spanned 1176.1 cM. The average resolution of the map was 4.2 cM/locus. BLAST searches of sequences of mapped markers of grass carp against the whole genome sequence of zebrafish revealed substantial macrosynteny relationship and extensive colinearity of markers between grass carp and zebrafish. Conclusions The linkage map of grass carp presented here is the first linkage map of a food fish species based on co-dominant markers in the family Cyprinidae. This map provides a valuable resource for mapping phenotypic variations and serves as a reference to approach comparative genomics and understand the evolution of fish genomes and could be complementary to the grass carp genome sequencing project. PMID:20181260
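
    The reported average resolution can be reproduced from the other figures in this record. The short Python check below divides the sex-averaged map length by the total number of mapped markers; treating "cM/locus" as this simple quotient is an assumption, although it matches the reported 4.2.

```python
snp_markers = 16
microsatellite_markers = 263
sex_averaged_length_cm = 1176.1
female_length_cm = 1149.4
male_length_cm = 888.8

total_markers = snp_markers + microsatellite_markers
average_resolution = sex_averaged_length_cm / total_markers  # cM per locus
female_to_male_ratio = female_length_cm / male_length_cm

print(total_markers)
print(round(average_resolution, 1))
print(round(female_to_male_ratio, 2))
```

    The female map is about 1.29 times the length of the male map in these figures.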

  13. Replication study of 34 common SNPs associated with prostate cancer in the Romanian population.

    PubMed

    Jinga, Viorel; Csiki, Irma Eva; Manolescu, Andrei; Iordache, Paul; Mates, Ioan Nicolae; Radavoi, Daniel; Rascu, Stefan; Badescu, Daniel; Badea, Paula; Mates, Dana

    2016-04-01

    Prostate cancer is the third-most common form of cancer in men in Romania. The Romanian unscreened population represents a good sample to study common genetic risk variants. However, a comprehensive analysis has not been conducted yet. Here, we report our replication efforts in a Romanian population of 979 cases and 1027 controls, for potential association of 34 literature-reported single nucleotide polymorphisms (SNPs) with prostate cancer. We also examined whether any SNP was differentially associated with tumour grade or stage at diagnosis, with disease aggressiveness, and with the levels of PSA (prostate specific antigen). In the allelic analysis, we replicated the previously reported risk for 19 loci on 4q24, 6q25.3, 7p15.2, 8q24.21, 10q11.23, 10q26.13, 11p15.5, 11q13.2, 11q13.3. Statistically significant associations were replicated for six other SNPs only with a particular disease phenotype: low-grade tumour and low PSA levels (rs1512268), high PSA levels (rs401681 and rs11649743), less aggressive cancers (rs1465618, rs721048, rs17021918). The strongest association of our tested SNPs with PSA in controls was for rs2735839, with a 29% increase for each copy of the major allele G, consistent with previous results. Our results suggest that rs4962416, previously associated only with prostate cancer, is also associated with PSA levels, with a 12% increase for each copy of the minor allele C. The study enabled the replication of the effect for the majority of previously reported genetic variants in a set of clinically relevant prostate cancers. This is the first replication study on these loci, known to associate with prostate cancer, in a Romanian population. PMID:26773531

  14. SNPs and breast cancer risk prediction for African American and Hispanic women.

    PubMed

    Allman, Richard; Dite, Gillian S; Hopper, John L; Gordon, Ora; Starlard-Davenport, Athena; Chlebowski, Rowan; Kooperberg, Charles

    2015-12-01

    For African American or Hispanic women, the extent to which clinical breast cancer risk prediction models are improved by including information on susceptibility single nucleotide polymorphisms (SNPs) is unknown, even though these women comprise increasing proportions of the US population and represent a large proportion of the world's population. We studied 7539 African American and 3363 Hispanic women from the Women's Health Initiative. The age-adjusted 5-year risks from the BCRAT and IBIS risk prediction models were measured and combined with a risk score based on >70 independent susceptibility SNPs. Logistic regression, adjusting for age group, was used to estimate risk associations with log-transformed age-adjusted 5-year risks. Discrimination was measured by the odds ratio (OR) per standard deviation (SD) and the area under the receiver operator curve (AUC). When considered alone, the ORs for African American women were 1.28 for BCRAT, and 1.04 for IBIS. When combined with the SNP risk score (OR 1.23), the corresponding ORs were 1.39 and 1.22. For Hispanic women the corresponding ORs were 1.25 for BCRAT, and 1.15 for IBIS. When combined with the SNP risk score (OR 1.39), the corresponding ORs were 1.48 and 1.42. There was no evidence that any of the combined models were not well calibrated. Including information on known breast cancer susceptibility loci provides approximately 10 and 19% improvement in risk prediction using BCRAT for African Americans and Hispanics, respectively. The corresponding figures for IBIS are approximately 18 and 26%, respectively. PMID:26589314
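
    The two discrimination measures named in this record, OR per SD and AUC, are linked under a simple model. The sketch below assumes the risk score is normally distributed with equal variance in cases and controls (a binormal model), in which case AUC = Φ(ln(OR per SD)/√2). This is an illustrative assumption, not the study's method, and the AUC values it yields are not results from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def auc_from_or_per_sd(or_per_sd):
    """AUC implied by an OR per SD under an equal-variance binormal model."""
    separation = math.log(or_per_sd)  # case-control mean difference in SD units
    return normal_cdf(separation / math.sqrt(2.0))


# ORs per SD quoted in the abstract for African American women using BCRAT.
for label, or_per_sd in [("BCRAT alone", 1.28), ("BCRAT + SNP score", 1.39)]:
    print(label, round(auc_from_or_per_sd(or_per_sd), 3))
```

    An OR per SD of 1 maps to an AUC of 0.5 (no discrimination), and ORs in the 1.2 to 1.5 range map to AUCs only modestly above 0.5, which is why small changes in OR per SD are reported as meaningful improvements.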

  15. The identification of trans-associations between prostate cancer GWAS SNPs and RNA expression differences in tumor-adjacent stroma

    PubMed Central

    Chen, Xin; McClelland, Michael; Jia, Zhenyu; Rahmatpanah, Farah B.; Sawyers, Anne; Trent, Jeffrey; Duggan, David; Mercola, Dan

    2015-01-01

    Here we tested the hypothesis that SNPs associated with prostate cancer risk might differentially affect RNA expression in prostate cancer stroma. The most significant 35 SNP loci were selected from Genome Wide Association (GWA) studies of ~40,000 patients. We also selected 4030 transcripts previously associated with prostate cancer diagnosis and prognosis. eQTL analysis was carried out by a modified BAYES method to analyze the associations between the risk variants and expressed transcripts jointly in a single model. We observed 47 significant associations between eight risk variants and the expression patterns of 46 genes. This is the first study to identify associations between multiple SNPs and multiple in trans gene expression differences in cancer stroma. Potentially, a combination of SNPs and associated expression differences in prostate stroma may increase the power of risk assessment for individuals, and for cancer progression. PMID:25638161

  16. SNPs in genes implicated in radiation response are associated with radiotoxicity and evoke roles as predictive and prognostic biomarkers

    PubMed Central

    2013-01-01

    Background: Biomarkers are needed to individualize cancer radiation treatment. Therefore, we have investigated the association between various risk factors, including single nucleotide polymorphisms (SNPs) in candidate genes, and late complications to radiotherapy in our nasopharyngeal cancer patients. Methods: A cohort of 155 patients was included. Normal tissue fibrosis was scored using the RTOG/EORTC grading system. A total of 45 SNPs in 11 candidate genes (ATM, XRCC1, XRCC3, XRCC4, XRCC5, PRKDC, LIG4, TP53, HDM2, CDKN1A, TGFB1) were genotyped by direct genomic DNA sequencing. Patients with severe fibrosis (cases, G3-4, n = 48) were compared to controls (G0-2, n = 107). Results: Univariate analysis showed significant association (P [...]) for SNPs (ATM G/A rs1801516, HDM2 promoter T/G rs2279744 and T/A rs1196333, XRCC1 G/A rs25487, XRCC5 T/C rs1051677 and TGFB1 C/T rs1800469). In addition, Kaplan-Meier analyses have also highlighted significant association between genotypes and length of patients' follow-up after radiotherapy. Multivariate logistic regression has further sustained these results, suggesting predictive and prognostic roles of SNPs. Conclusions: Univariate and multivariate analysis suggest that radiation toxicity in radiotherapy patients is associated with certain SNPs, in genes including the HDM2 promoter, studied for the first time. These results support the use of SNPs as genetic predictive markers for clinical radiosensitivity and evoke a prognostic role for length of patients' follow-up after radiotherapy. PMID:23697595

  17. Uneven-order decentered Shapiro filters for boundary filtering

    NASA Astrophysics Data System (ADS)

    Falissard, F.

    2015-07-01

    This paper addresses the use of Shapiro filters for boundary filtering. A new class of uneven-order decentered Shapiro filters is proposed and compared to classical Shapiro filters and even-order decentered Shapiro filters. The theoretical analysis shows that the proposed boundary filters are more accurate than the centered Shapiro filters and more robust than the even-order decentered boundary filters usable at the same distance to the boundary. The benefit of the new boundary filters is assessed for computations using the compressible Euler equations.

  18. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, T.E.; Alvin, M.A.; Bruck, G.J.; Smeltzer, E.E.

    1999-03-02

    A filter holder and gasket assembly are disclosed for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut. 9 figs.

  19. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, Thomas Edwin (Murrysville, PA); Alvin, Mary Anne (Pittsburgh, PA); Bruck, Gerald Joseph (Murrysville, PA); Smeltzer, Eugene E. (Export, PA)

    1999-03-02

    A filter holder and gasket assembly for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut.

  20. A Global View of 54,001 Single Nucleotide Polymorphisms (SNPs) on the Illumina BovineSNP50 BeadChip and Their Transferability to Water Buffalo

    PubMed Central

    Michelizzi, Vanessa N.; Wu, Xiaolin; Dodson, Michael V.; Michal, Jennifer J.; Zambrano-Varon, Jorge; McLean, Derek J.; Jiang, Zhihua

    2011-01-01

    The Illumina BovineSNP50 BeadChip features 54,001 informative single nucleotide polymorphisms (SNPs) that uniformly span the entire bovine genome. Among them, 52,255 SNPs have locations assigned in the current genome assembly (Btau_4.0), including 19,294 (37%) intragenic SNPs (i.e., located within genes) and 32,961 (63%) intergenic SNPs (i.e., located between genes). While the SNPs represented on the Illumina Bovine50K BeadChip are evenly distributed along each bovine chromosome, there are over 14,000 genes that have no SNPs placed on the current BeadChip. Kernel density estimation, a non-parametric method, was used in the present study to identify SNP-poor and SNP-rich regions on each bovine chromosome. With bandwidth = 0.05 Mb, we observed that most regions have SNP densities within 2 standard deviations of the chromosome SNP density mean. The SNP density on chromosome X was the most dynamic, with more than 30 SNP-rich regions and at least 20 regions with no SNPs. Genotyping ten water buffalo using the Illumina BovineSNP50 BeadChip revealed that 41,870 of the 54,001 SNPs are fully scored on all ten water buffalo, but 6,771 SNPs are partially scored on one to nine animals. Both fully scored and partially/no scored SNPs are clearly clustered with various sizes on each chromosome. However, among 43,687 bovine SNPs that were successfully genotyped on nine or ten water buffalo, only 1,159 were polymorphic in the species. These results indicate that the SNP sites, but not the polymorphisms, are conserved between the two species. Overall, our present study provides a solid foundation to further characterize the SNP evolutionary process, thus improving understanding of within- and between-species biodiversity, phylogenetics and adaptation to environmental changes. PMID:21209788
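
    The kernel density step in this abstract can be illustrated with a minimal sketch. It estimates SNP density along one chromosome and labels grid positions whose density lies more than 2 standard deviations from the mean. Only the bandwidth (0.05 Mb) and the 2 SD criterion come from the abstract; the Gaussian kernel, the grid spacing, the function names and the toy positions are assumptions made for illustration and are not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def kde_density(positions_mb, grid_mb, bandwidth_mb=0.05):
    """Gaussian kernel density estimate of SNP positions (in Mb) on a grid."""
    norm = 1.0 / (len(positions_mb) * bandwidth_mb * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth_mb) ** 2) for p in positions_mb)
        for g in grid_mb
    ]


def classify_regions(density, n_sd=2.0):
    """Label each grid point relative to mean +/- n_sd standard deviations."""
    mu, sd = mean(density), pstdev(density)
    labels = []
    for d in density:
        if d > mu + n_sd * sd:
            labels.append("rich")
        elif d < mu - n_sd * sd:
            labels.append("poor")
        else:
            labels.append("typical")
    return labels


# Toy chromosome (hypothetical data): evenly spaced SNPs every 0.1 Mb,
# plus a dense cluster of 40 SNPs starting at 2.5 Mb.
positions = [0.1 * i for i in range(1, 50)] + [2.50 + 0.002 * i for i in range(40)]
grid = [0.05 * i for i in range(101)]  # grid from 0 to 5 Mb in 0.05 Mb steps

density = kde_density(positions, grid)
labels = classify_regions(density)
```

    In this toy example the dense cluster near 2.5 Mb is labeled "rich", while the evenly spaced background is labeled "typical". A real analysis would use the assembly coordinates of the chip SNPs and might define regions rather than single grid points.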

  1. Genetic polymorphism and prostate cancer aggressiveness: A case-only study of 1536 GWAS and candidate SNPs in African Americans and European Americans

    PubMed Central

    Bensen, Jeannette T.; Xu, Zongli; Smith, Gary J.; Mohler, James L.; Fontham, Elizabeth T.H.; Taylor, Jack A.

    2012-01-01

    BACKGROUND Genome-wide association studies have established a number of replicated single nucleotide polymorphisms (SNPs) for susceptibility to prostate cancer (CaP), but it is unclear whether these susceptibility SNPs are also associated with disease aggressiveness. This study evaluates whether such replication SNPs or other candidate SNPs are associated with CaP aggressiveness in African-American (AA) and European-American (EA) men. METHODS A 1,536 SNP panel which included 34 genome-wide association study (GWAS) replication SNPs, 38 flanking SNPs, a set of ancestry informative markers, and SNPs in candidate genes and other areas was genotyped in 1,060 AA and 1,087 EA men with incident CaP from the North Carolina-Louisiana Prostate Cancer Project (PCaP). Tests for association were conducted using ordinal logistic regression with a log-additive genotype model and a 3-category CaP aggressiveness variable. RESULTS 4 GWAS replication SNPs (rs2660753, rs13254738, rs10090154, rs2735839) and 7 flanking SNPs were associated with CaP aggressiveness (P<0.05) in 3 genomic regions: one at 3p12 (EA), 7 at 8q24 (5 AA, 2 EA), and 3 at 19q13 at the kallikrein-related peptidase 3 (KLK3) locus (2 AA, 1 AA and EA). The KLK3 SNPs also were associated with serum prostate-specific antigen (PSA) levels in AA (p < 0.001) but not in EA. A number of the other SNPs showed some evidence of association but none met study-wide significance levels after adjusting for multiple comparisons. CONCLUSIONS Some replicated GWAS susceptibility SNPs may play a role in CaP aggressiveness. However, like susceptibility, these associations are not consistent between racial groups. PMID:22549899

  2. Durability of ceramic filters

    SciTech Connect

    Alvin, M.A.; Tressler, R.E.; Lippert, T.E.; Diaz, E.S.; Smeltzer, E.E.

    1994-10-01

    The objectives of this program are to identify the potential long-term thermal/chemical effects that advanced coal-based power generating systems have on the stability of porous ceramic filter materials, as well as to assess the influence of these effects on filter operating performance and life.

  3. Tracking harmonic notch filter

    NASA Astrophysics Data System (ADS)

    Emo, Frederick L.

    1990-07-01

    Disclosed in this patent is an electronic filter for automatically tracking and removing harmonically related interfering electrical signals such as power line interference harmonics without attenuating other signals of interest even though the signals are frequency stable and/or near the interference signal frequencies. The filter comprises a very narrow band electronic commutated capacitor-bank comb-notch filter driven by a counter/decoder circuit which is in turn driven by a phase locked loop. The filter also comprises two narrow band analog filters tuned to the two lowest harmonics of the interfering signal and drives the comb-notch at unit multiples of the fundamental of the interference frequency. This action is continuous such that center frequencies of the notches are automatically adjusted to compensate for small variations in the interference frequency.

  4. Sub-micron filter

    DOEpatents

    Tepper, Frederick; Kaledin, Leonid

    2009-10-13

    Aluminum hydroxide fibers approximately 2 nanometers in diameter and with surface areas ranging from 200 to 650 m²/g have been found to be highly electropositive. When dispersed in water they are able to attach to and retain electronegative particles. When combined into a composite filter with other fibers or particles they can filter bacteria and nano size particulates such as viruses and colloidal particles at high flux through the filter. Such filters can be used for purification and sterilization of water, biological, medical and pharmaceutical fluids, and as a collector/concentrator for detection and assay of microbes and viruses. The alumina fibers are also capable of filtering sub-micron inorganic and metallic particles to produce ultra pure water. The fibers are suitable as a substrate for growth of cells. Macromolecules such as proteins may be separated from each other based on their electronegative charges.

  5. Implicit Kalman filtering

    NASA Technical Reports Server (NTRS)

    Skliar, M.; Ramirez, W. F.

    1997-01-01

    For an implicitly defined discrete system, a new algorithm for Kalman filtering is developed and an efficient numerical implementation scheme is proposed. Unlike the traditional explicit approach, the implicit filter can be readily applied to ill-conditioned systems and allows for generalization to descriptor systems. The implementation of the implicit filter depends on the solution of the congruence matrix equation $A_1 P_x A_1^{T} = P_y$. We develop a general iterative method for the solution of this equation, and prove necessary and sufficient conditions for convergence. It is shown that when the system matrices of an implicit system are sparse, the implicit Kalman filter requires significantly less computer time and storage to implement as compared to the traditional explicit Kalman filter. Simulation results are presented to illustrate and substantiate the theoretical developments.

  6. Sintered composite filter

    DOEpatents

    Bergman, W.

    1986-05-02

    A particulate filter medium formed of a sintered composite of 0.5 micron diameter quartz fibers and 2 micron diameter stainless steel fibers is described. Preferred composition is about 40 vol.% quartz and about 60 vol.% stainless steel fibers. The media is sintered at about 1100 °C to bond the stainless steel fibers into a cage network which holds the quartz fibers. High filter efficiency and low flow resistance are provided by the smaller quartz fibers. High strength is provided by the stainless steel fibers. The resulting media has a high efficiency and low pressure drop similar to the standard HEPA media, with tensile strength at least four times greater, and a maximum operating temperature of about 550 °C. The invention also includes methods to form the composite media and a HEPA filter utilizing the composite media. The filter media can be used to filter particles in both liquids and gases.

  7. Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

    NASA Astrophysics Data System (ADS)

    Manago, K. F.; Hogue, T. S.; Hering, A. S.

    2014-12-01

    In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. Due to ongoing drought in California, increased reliance on local water supply highlights the need for better understanding of regional groundwater dynamics and estimating sustainable groundwater supply. However, in an urban setting, such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data is irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data is interpolated to create spatial groundwater maps utilizing information from all wells. The groundwater data is evaluated at a monthly time step over the last several decades to analyze the effect of land cover and identify other influencing factors on groundwater levels spatially and temporally. Preliminary results show irrigated parks have the largest influence on groundwater fluctuations, resulting in large seasonal changes, exceeding changes in spreading grounds. It is assumed that these fluctuations are caused by watering practices required to sustain non-native vegetation. Conversely, high intensity urbanized areas resulted in muted groundwater fluctuations and behavior decoupling from climate patterns. Results provide improved understanding of anthropogenic effects on groundwater levels in addition to providing high quality datasets for validation of regional groundwater models.
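
    The idea of filling gaps several times and pooling the results can be sketched as follows. The abstract does not describe the imputation model, so this example substitutes a deliberately simple one: linear interpolation across interior gaps plus Gaussian noise, repeated for several ensembles. The function names, the noise level and the toy well record are all hypothetical.

```python
import random
from statistics import mean, pstdev


def interpolate(series):
    """Linearly interpolate None gaps lying between two observed values."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def multiple_impute(series, n_ensembles=20, noise_sd=0.5, seed=0):
    """Return n_ensembles completed copies; observed values are never altered."""
    rng = random.Random(seed)
    base = interpolate(series)
    ensembles = []
    for _ in range(n_ensembles):
        ensembles.append([
            b if v is not None else b + rng.gauss(0.0, noise_sd)
            for v, b in zip(series, base)
        ])
    return ensembles


def pool(ensembles):
    """Per-time-step mean and spread across the ensembles."""
    columns = list(zip(*ensembles))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


# Toy monthly groundwater levels (hypothetical values); None marks a missing
# sample. The series must start and end with observed values in this sketch.
levels = [12.0, None, None, 13.5, 13.1, None, 12.4]
ensembles = multiple_impute(levels)
pooled_mean, pooled_sd = pool(ensembles)
```

    The pooled spread is zero where a value was observed and positive where it was imputed, which is the sense in which multiple ensembles give an error estimate for the filled data.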

  8. BIREFRINGENT FILTER MODEL

    NASA Technical Reports Server (NTRS)

    Cross, P. L.

    1994-01-01

    Birefringent filters are often used as line-narrowing components in solid state lasers. The Birefringent Filter Model program generates a stand-alone model of a birefringent filter for use in designing and analyzing a birefringent filter. It was originally developed to aid in the design of solid state lasers to be used on aircraft or spacecraft to perform remote sensing of the atmosphere. The model is general enough to allow the user to address problems such as temperature stability requirements, manufacturing tolerances, and alignment tolerances. The input parameters for the program are divided into 7 groups: 1) general parameters which refer to all elements of the filter; 2) wavelength related parameters; 3) filter, coating and orientation parameters; 4) input ray parameters; 5) output device specifications; 6) component related parameters; and 7) transmission profile parameters. The program can analyze a birefringent filter with up to 12 different components, and can calculate the transmission and summary parameters for multiple passes as well as a single pass through the filter. The Jones matrix, which is calculated from the input parameters of Groups 1 through 4, is used to calculate the transmission. Output files containing the calculated transmission or the calculated Jones' matrix as a function of wavelength can be created. These output files can then be used as inputs for user written programs. For example, to plot the transmission or to calculate the eigen-transmittances and the corresponding eigen-polarizations for the Jones' matrix, write the appropriate data to a file. The Birefringent Filter Model is written in Microsoft FORTRAN 2.0. The program format is interactive. It was developed on an IBM PC XT equipped with an 8087 math coprocessor, and has a central memory requirement of approximately 154K. 
    Since Microsoft FORTRAN 2.0 does not support complex arithmetic, matrix routines for addition, subtraction, and multiplication of complex, double precision variables are included. The Birefringent Filter Model was written in 1987.
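
    To illustrate how a Jones matrix yields a transmission curve, here is a minimal sketch for the simplest textbook case: one birefringent plate with its axis at 45 degrees between two parallel ideal polarizers. This is not the algorithm of the program described above; the function names, the birefringence value and the plate thickness are assumptions chosen for illustration. Python has built-in complex numbers, so no hand-written complex matrix routines are needed here.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def retarder(delta, theta):
    """Jones matrix of a linear retarder: retardance delta, axis at angle theta."""
    core = [[cmath.exp(-0.5j * delta), 0], [0, cmath.exp(0.5j * delta)]]
    return matmul(rotation(theta), matmul(core, rotation(-theta)))


POLARIZER_X = [[1, 0], [0, 0]]  # ideal linear polarizer along x


def transmission(wavelength_nm, delta_n, thickness_nm, theta=math.pi / 4):
    """Intensity transmission for x-polarized input through the stage."""
    delta = 2.0 * math.pi * delta_n * thickness_nm / wavelength_nm
    jones = matmul(POLARIZER_X, matmul(retarder(delta, theta), POLARIZER_X))
    ex, ey = jones[0][0], jones[1][0]  # output field for input (1, 0)
    return abs(ex) ** 2 + abs(ey) ** 2


# Illustrative plate: birefringence 0.009 and thickness 1 mm (assumed values).
curve = [(wl, transmission(wl, 0.009, 1.0e6)) for wl in range(850, 951, 10)]
```

    For this configuration the result reduces to cos²(π·Δn·d/λ), so the transmission peaks wherever Δn·d/λ is an integer. Stacking several such stages with different thicknesses narrows the passband, which is the line-narrowing role mentioned in the abstract.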

  9. ReMo-SNPs: a new software tool for identification of polymorphisms in regions and motifs genome-wide.

    PubMed

    Graae, Lisette; Paddock, Silvia; Belin, Andrea Carmine

    2015-01-01

    Studies of complex genetic diseases have revealed many risk factors of small effect, but the combined amount of heritability explained is still low. Genome-wide association studies are often underpowered to identify true effects because of the very large number of parallel tests. There is, therefore, a great need to generate data sets that are enriched for those markers that have an increased a priori chance of being functional, such as markers in genomic regions involved in gene regulation. ReMo-SNPs is a computational program developed to aid researchers in the process of selecting functional SNPs for association analyses in user-specified regions and/or motifs genome-wide. The useful feature of automatic selection of genotyped markers in the user-provided material makes the output data ready to be used in a following association study. In this article we describe the program and its functions. We also validate the program by including an example study on three different transcription factors and results from an association study on two psychiatric phenotypes. The flexibility of the ReMo-SNPs program enables the user to study any region or sequence of interest, without limitation to transcription factor binding regions and motifs. The program is freely available at: http://www.neuro.ki.se/ReMo-SNPs/. PMID:25882789
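
    The selection step described in this abstract (keep SNPs that fall in user-specified regions and are also genotyped in the user's material) can be sketched in a few lines. This is not the ReMo-SNPs code or its input format; the data structures, the function name and the placeholder SNP identifiers below are hypothetical.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int


class Region(NamedTuple):
    chrom: str
    start: int
    end: int  # inclusive


def select_snps(snps, regions, genotyped_ids):
    """Keep SNPs that lie inside any region and were genotyped by the user."""
    selected = []
    for snp in snps:
        in_region = any(
            r.chrom == snp.chrom and r.start <= snp.pos <= r.end for r in regions
        )
        if in_region and snp.snp_id in genotyped_ids:
            selected.append(snp)
    return selected


# Hypothetical inputs: placeholder identifiers, not real rs numbers.
snps = [
    Snp("snpA", "chr1", 1_050),
    Snp("snpB", "chr1", 5_000),
    Snp("snpC", "chr2", 1_050),
    Snp("snpD", "chr1", 1_200),
]
regions = [Region("chr1", 1_000, 1_500)]  # e.g. a regulatory region of interest
genotyped = {"snpA", "snpB", "snpC"}      # markers present in the user's data

selected = select_snps(snps, regions, genotyped)
```

    Here only snpA is kept: snpB and snpC fall outside the region, and snpD lies inside it but was not genotyped. For genome-wide inputs, an interval tree or sorted-interval search would scale better than this linear scan.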

  10. Association between IL-10a SNPs and resistance to cyprinid herpesvirus-3 infection in common carp (Cyprinus carpio)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Analysis of gene polymorphisms and disease association is essential for assessing putative candidate genes affecting susceptibility or resistance to disease. In this paper, we report the results of an association analysis between SNPs in common carp innate immune response genes and resistance to Cy...

  11. Association of three SNPs in TOX3 and breast cancer risk: Evidence from 97275 cases and 128686 controls

    PubMed Central

    Zhang, Li; Long, Xinghua

    2015-01-01

    The associations of SNPs in TOX3 gene with breast cancer risk were investigated by some Genome-wide association studies and epidemiological studies, but the study results were contradictory. To derive a more precise e