Science.gov

Sample records for filtering snps imputed

  1. Impact of pre-imputation SNP-filtering on genotype imputation results

    PubMed Central

    2014-01-01

    Background Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE. Results We considered three scenarios: imputation of partially missing genotypes with usage of an external reference panel, without usage of an external reference panel, as well as imputation of completely un-typed SNPs using an external reference panel. We first created various datasets applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios by using established and newly proposed measures of imputation quality. While the established measures assess certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental regarding imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. Conclusion Even a moderate filtering has a detrimental effect on the imputation quality. Therefore little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. 
Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at the cost of increased computing time. PMID:25112433
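
    The masking-and-comparison design described in this record can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 0/1/2 genotype coding, the -1 missing code, and the per-SNP mode imputer (a stand-in for MaCH or IMPUTE2) are all assumptions made for the example. The concordance function illustrates the idea of measuring agreement with the true genotypes, not the paper's exact measures.

```python
import numpy as np

def mask_genotypes(genotypes, fraction, rng):
    """Return a copy with roughly `fraction` of entries set to -1 (missing), plus the mask."""
    mask = rng.random(genotypes.shape) < fraction
    masked = genotypes.copy()
    masked[mask] = -1
    return masked, mask

def impute_by_mode(masked):
    """Placeholder imputer: fill each SNP (column) with its most frequent observed genotype."""
    imputed = masked.copy()
    for j in range(masked.shape[1]):
        col = masked[:, j]
        observed = col[col >= 0]
        # Fall back to genotype 0 if a SNP has no observed calls at all.
        fill = np.bincount(observed, minlength=3).argmax() if observed.size else 0
        imputed[col < 0, j] = fill
    return imputed

def concordance(true, imputed, mask):
    """Share of masked genotypes that were recovered exactly."""
    if not mask.any():
        return float("nan")
    return float(np.mean(true[mask] == imputed[mask]))

rng = np.random.default_rng(0)
true = rng.binomial(2, 0.3, size=(200, 50))      # simulated genotypes, 200 samples x 50 SNPs
masked, mask = mask_genotypes(true, 0.10, rng)   # hide about 10% of the calls
imputed = impute_by_mode(masked)
print(concordance(true, imputed, mask))
```

    Repeating the same run after applying different pre-imputation SNP filters to the input would give the kind of between-scenario comparison the abstract reports.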

  2. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

    PubMed Central

    Bigdeli, T. Bernard; Williamson, Vernell S.; Vladimirov, Vladimir I.; Riley, Brien P.; Fanous, Ayman H.; Bacanu, Silviu-Alin

    2015-01-01

    Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. Contact: dlee4@vcu.edu Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:26059716
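
    The idea of imputing statistics directly, rather than genotypes, can be illustrated with a Gaussian conditional expectation. The sketch below is an assumption-laden illustration and not the DISTMIX software: the helper names, the weighted-average LD matrix, and the ridge term are choices made for this example only.

```python
import numpy as np

def mixed_ld(ld_by_population, proportions):
    """Weighted average of per-population LD (correlation) matrices."""
    w = np.asarray(proportions, dtype=float)
    w = w / w.sum()
    return sum(wi * ld for wi, ld in zip(w, ld_by_population))

def impute_z(z_measured, ld, measured_idx, unmeasured_idx, ridge=0.1):
    """Conditional expectation of unmeasured z-scores given measured z-scores."""
    s_mm = ld[np.ix_(measured_idx, measured_idx)] + ridge * np.eye(len(measured_idx))
    s_um = ld[np.ix_(unmeasured_idx, measured_idx)]
    # weights = s_um @ inv(s_mm), computed with a linear solve instead of an explicit inverse.
    weights = np.linalg.solve(s_mm, s_um.T).T
    z_hat = weights @ np.asarray(z_measured, dtype=float)
    # Rough quality score: share of the unmeasured SNP's variance explained by measured SNPs.
    info = np.einsum("ij,ij->i", weights, s_um)
    return z_hat, info

# Hypothetical LD matrices for three SNPs in two reference populations.
ld_a = np.array([[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]])
ld_b = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]])
ld = mixed_ld([ld_a, ld_b], proportions=[0.7, 0.3])

# SNPs 0 and 2 are measured; SNP 1 is imputed from their z-scores.
z_hat, info = impute_z([3.0, 2.0], ld, measured_idx=[0, 2], unmeasured_idx=[1])
print(z_hat, info)
```

    Only summary statistics and a reference LD matrix enter the calculation, which is why no subject-level genotypes are needed.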

  3. Candidate Gene Analysis Using Imputed Genotypes: Cell Cycle SNPs and Ovarian Cancer Risk

    PubMed Central

    Goode, Ellen L.; Fridley, Brooke L.; Vierkant, Robert A.; Cunningham, Julie M.; Phelan, Catherine M.; Anderson, Stephanie; Rider, David N.; White, Kristin L.; Pankratz, V. Shane; Song, Honglin; Hogdall, Estrid; Kjaer, Susanne K.; Whittemore, Alice S.; DiCioccio, Richard; Ramus, Susan J.; Gayther, Simon A.; Schildkraut, Joellen M.; Pharaoh, Paul P.D.; Sellers, Thomas A.

    2009-01-01

    Polymorphisms in genes critical to cell cycle control are outstanding candidates for association with ovarian cancer risk; numerous genes have been interrogated by multiple research groups using differing tagging SNP sets. In order to maximize information gleaned from existing genotype data, we conducted a combined analysis of five independent studies of invasive epithelial ovarian cancer. Up to 2,120 cases and 3,382 controls were genotyped in the course of two collaborations at a variety of SNPs in 11 cell cycle genes (CDKN2C, CDKN1A, CCND3, CCND1, CCND2, CDKN1B, CDK2, CDK4, RB1, CDKN2D, CCNE1) and one gene region (CDKN2A-CDKN2B). Because of the semi-overlapping nature of the 123 assayed tagging SNPs, we performed multiple imputation based on fastPHASE using data from White non-Hispanic study participants and participants in the international HapMap Consortium and NIEHS SNPs Program. Logistic regression assuming a log-additive model was performed on combined and imputed data. We observed strengthened signals in imputation-based analyses at several SNPs, particularly CDKN2A-CDKN2B rs3731239, CCND1 rs602652, rs3212879, rs649392, and rs3212891, CDK2 rs2069391, rs2069414, and rs17528736, and CCNE1 rs3218036. These results lend evidence to a role of cell cycle genes in ovarian cancer etiology, suggest a reduced set of SNPs to target in additional cases and controls, and exemplify the utility of imputation in candidate gene studies. PMID:19258477

  4. Genotype imputation efficiency in Nelore Cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotype imputation efficiency in Nelore cattle was evaluated in different scenarios of lower density (LD) chips, imputation methods and sets of animals to have their genotypes imputed. Twelve commercial and virtual custom LD chips with densities varying from 7K to 75K SNPs were tested. Customized L...

  5. SWEEP: A Tool for Filtering High-Quality SNPs in Polyploid Crops

    PubMed Central

    Clevenger, Josh P.; Ozias-Akins, Peggy

    2015-01-01

    High-throughput next-generation sequence-based genotyping and single nucleotide polymorphism (SNP) detection opens the door for emerging genomics-based breeding strategies such as genome-wide association analysis and genomic selection. In polyploids, SNP detection is confounded by a highly similar homeologous sequence where a polymorphism between subgenomes must be differentiated from a SNP. We have developed and implemented a novel tool called SWEEP: Sliding Window Extraction of Explicit Polymorphisms. SWEEP uses subgenome polymorphism haplotypes as contrast to identify true SNPs between genotypes. The tool is a single command script that calls a series of modules based on user-defined options and takes sorted/indexed bam files or vcf files as input. Filtering options are highly flexible and include filtering based on sequence depth, alternate allele ratio, and SNP quality on top of the SWEEP filtering procedure. Using real and simulated data we show that SWEEP outperforms current SNP filtering methods for polyploids. SWEEP can be used for high-quality SNP discovery in polyploid crops. PMID:26153076

  6. Analyses and Comparison of Imputation-Based Association Methods

    PubMed Central

    Pei, Yu-Fang; Zhang, Lei; Li, Jian; Deng, Hong-Wen

    2010-01-01

    Genotype imputation methods have become increasingly popular for recovering untyped genotype data. An important application of imputed genotypes is testing genetic association with diseases. Imputation-based association tests can provide additional insight beyond what is provided by testing on typed tagging SNPs only. A variety of effective imputation-based association tests have been proposed. However, their performance is affected by a variety of genetic factors, which have not been well studied. In this study, using both simulated and real data sets, we investigated the effects of LD, MAF of the untyped causal SNP and imputation accuracy rate on the performance of seven popular imputation-based association methods: MACH2qtl/dat, SNPTEST, ProbABEL, Beagle, Plink, BIMBAM and SNPMStat. We also aimed to provide a comprehensive comparison among methods. Results show that: (1) imputation-based association tests can boost signals and improve power under medium and high LD levels, with the power improvement increasing with strengthening LD level; (2) the power increases with higher MAF of untyped causal SNPs under medium to high LD level; (3) under low LD level, a high imputation accuracy rate cannot guarantee an improvement of power; (4) among methods, MACH2qtl/dat, ProbABEL and SNPTEST perform similarly and consistently outperform the other methods. Our results are helpful in guiding the choice of imputation-based association test in practical applications. PMID:20520814

  7. Missing data imputation: focusing on single imputation

    PubMed Central

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information will be omitted from the analysis. Therefore, many imputation methods have been developed to fill the gap. The present article focuses on single imputation. Imputation with the mean, median or mode is simple but, like complete case analysis, can bias the mean and deviation. Furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations. PMID:26855945
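
    The contrast between mean imputation and regression imputation can be shown in a few lines. The article itself works in R; the sketch below uses Python with NumPy, and the function names and toy data are assumptions made for illustration.

```python
import numpy as np

def mean_impute(values):
    """Replace every missing value (NaN) with the mean of the observed values."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    filled[np.isnan(values)] = np.nanmean(values)
    return filled

def regression_impute(y, x):
    """Fill missing y from a least-squares line fitted to the observed (x, y) pairs."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(y)
    slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
    filled = y.copy()
    filled[~observed] = intercept + slope * x[~observed]
    return filled

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # fully observed covariate
y = np.array([2.0, 4.0, np.nan, 8.0, np.nan])  # outcome with two missing values

print(mean_impute(y))           # both gaps receive the same observed mean
print(regression_impute(y, x))  # gaps follow the fitted relationship with x
```

    Mean imputation assigns the same value to every gap regardless of x, which is how it ignores relationships with other variables; the regression fill keeps the dependence on x.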

  8. Sequence Imputation of HPV16 Genomes for Genetic Association Studies

    PubMed Central

    Smith, Benjamin; Chen, Zigui; Reimers, Laura; van Doorslaer, Koenraad; Schiffman, Mark; DeSalle, Rob; Herrero, Rolando; Yu, Kai; Wacholder, Sholom; Wang, Tao; Burk, Robert D.

    2011-01-01

    Background Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. Methods A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. Results HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conclusions Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. 
The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16 pathogenicity. PMID:21731721

  9. Imputation Without Doing Imputation: A New Method for the Detection of Non-Genotyped Causal Variants

    PubMed Central

    Howey, Richard; Cordell, Heather J

    2014-01-01

    Genome-wide association studies allow detection of non-genotyped disease-causing variants through testing of nearby genotyped SNPs. This approach may fail when there are no genotyped SNPs in strong LD with the causal variant. Several genotyped SNPs in weak LD with the causal variant may, however, considered together, provide equivalent information. This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. Here we present a new method and accompanying software designed for this scenario. Our approach proceeds by selecting, for each genotyped “anchor” SNP, a nearby genotyped “partner” SNP, chosen via a specific algorithm we have developed. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. In simulations, our method captures much of the signal captured by imputation, while taking a fraction of the time and disc space, and generating a smaller number of false-positives. We apply our method to a case/control study of severe malaria genotyped using the Affymetrix 500K array. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels. Our method also increases the signal of association from to . Our method thus, in some cases, eliminates the need for more complex methods such as sequencing and imputation, and provides a useful additional test that may be used to identify genetic regions of interest. PMID:24535679

  10. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes in both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, whereas TotalBoost performed slightly worse than the other methods. Running times differed between methods. The Extreme Learning Machine (ELM) was always the fastest algorithm. As the sample size increases, the Radial Basis Function (RBF) method requires a long imputation time. The tested methods can be an alternative for imputation of un-typed SNPs when the missing rate is low. However, it is recommended that other machine learning methods be used for imputation. PMID:27049046

  11. Genotype imputation via matrix completion

    PubMed Central

    Chi, Eric C.; Zhou, Hua; Chen, Gary K.; Del Vecchyo, Diego Ortega; Lange, Kenneth

    2013-01-01

    Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546
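
    Matrix completion treats the sample-by-SNP genotype table as an approximately low-rank matrix with holes. The sketch below uses a generic iterative soft-thresholded SVD; it is an illustration of the general technique under assumed parameter names and defaults, not the MENDEL-IMPUTE algorithm or its MATLAB code.

```python
import numpy as np

def soft_impute(x, shrinkage=1.0, n_iter=100, tol=1e-5):
    """Fill NaN entries of x by repeated low-rank (soft-thresholded SVD) approximation."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # Start from column means; assumes every column has at least one observed value.
    filled = np.where(missing, np.nanmean(x, axis=0), x)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - shrinkage, 0.0)          # shrink singular values toward zero
        low_rank = (u * s) @ vt
        updated = np.where(missing, low_rank, x)    # observed entries are never changed
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled

rng = np.random.default_rng(1)
# Toy rank-1 genotype matrix: 30 samples whose 20 SNPs are perfectly correlated.
true = np.repeat(rng.integers(0, 3, size=(30, 1)), 20, axis=1)
x = true.astype(float)
missing = rng.random(x.shape) < 0.10
x[missing] = np.nan

completed = soft_impute(x)
calls = np.clip(np.rint(completed), 0, 2).astype(int)   # round dosages to 0/1/2 calls
print(np.mean(calls[missing] == true[missing]))
```

    Each iteration costs one SVD of the full matrix, so a practical implementation would work on windows of SNPs rather than on a whole chromosome at once.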

  12. Genotype imputation via matrix completion.

    PubMed

    Chi, Eric C; Zhou, Hua; Chen, Gary K; Del Vecchyo, Diego Ortega; Lange, Kenneth

    2013-03-01

    Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546

  13. Comparison of imputation variance estimators.

    PubMed

    Hughes, Ra; Sterne, Jac; Tilling, K

    2014-04-22

    Appropriate imputation inference requires both an unbiased imputation estimator and an unbiased variance estimator. The commonly used variance estimator, proposed by Rubin, can be biased when the imputation and analysis models are misspecified and/or incompatible. Robins and Wang proposed an alternative approach, which allows for such misspecification and incompatibility, but it is considerably more complex. It is unknown whether in practice Robins and Wang's multiple imputation procedure is an improvement over Rubin's multiple imputation. We conducted a critical review of these two multiple imputation approaches, a re-sampling method called full mechanism bootstrapping and our modified Rubin's multiple imputation procedure via simulations and an application to data. We explored four common scenarios of misspecification and incompatibility. In general, for a moderate sample size (n = 1000), Robins and Wang's multiple imputation produced the narrowest confidence intervals, with acceptable coverage. For a small sample size (n = 100) Rubin's multiple imputation, overall, outperformed the other methods. Full mechanism bootstrapping was inefficient relative to the other methods and required modelling of the missing data mechanism under the missing at random assumption. Our proposed modification showed an improvement over Rubin's multiple imputation in the presence of misspecification. Overall, Rubin's multiple imputation variance estimator can fail in the presence of incompatibility and/or misspecification. For unavoidable incompatibility and/or misspecification, Robins and Wang's multiple imputation could provide more robust inferences. PMID:24682265
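
    For reference, the standard form of Rubin's variance estimator combines the average within-imputation variance with the between-imputation variance of the point estimates. The sketch below implements that textbook formula; the function name and the three example estimates are made up for illustration, and neither the Robins and Wang procedure nor the authors' modification is shown.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    pooled = q.mean()                  # pooled point estimate
    within = u.mean()                  # average within-imputation variance
    between = q.var(ddof=1)            # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed datasets.
pooled, total = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled, total)   # approximately 1.0 and 0.0933
```

    The between-imputation term is what carries the uncertainty due to missing data; the abstract's point is that this estimator can be biased when the imputation and analysis models are misspecified or incompatible.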

  14. SNP panels/Imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Participants from thirteen countries discussed services that Interbull can perform or recommendations that Interbull can make to promote harmonization and assist member countries in improving their genomic evaluations in regard to SNP panels and imputation. The panel recommended: A mechanism to shar...

  15. Sequential imputation for missing values.

    PubMed

    Verboven, Sabine; Branden, Karlien Vanden; Goos, Peter

    2007-10-01

    As missing values are often encountered in gene expression data, many imputation methods have been developed to substitute these unknown values with estimated values. Despite the presence of many imputation methods, these available techniques have some disadvantages. Some imputation techniques constrain the imputation of missing values to a limited set of genes, whereas other imputation methods optimise a more global criterion whereby the computation time of the method becomes infeasible. Others might be fast but inaccurate. Therefore in this paper a new, fast and accurate estimation procedure, called SEQimpute, is proposed. By introducing the idea of minimisation of a statistical distance rather than a Euclidean distance the method is intrinsically different from the thus far existing imputation methods. Moreover, this newly proposed method can be easily embedded in a multiple imputation technique which is better suited to highlight the uncertainties about the missing value estimates. A comparative study is performed to assess the estimation of the missing values by different imputation approaches. The proposed imputation method is shown to outperform some of the existing imputation methods in terms of accuracy and computation speed. PMID:17920334

  16. Multiple imputation with multivariate imputation by chained equation (MICE) package

    PubMed Central

    2016-01-01

    Multiple imputation (MI) is an advanced technique for handling missing values. It is superior to single imputation in that it takes into account the uncertainty in missing value imputation. However, MI is underutilized in the medical literature due to lack of familiarity and computational challenges. The article provides a step-by-step approach to performing MI with the R multivariate imputation by chained equation (MICE) package. The procedure first imputes m complete datasets by calling the mice() function. Statistical analyses such as univariate analysis and regression modelling can then be performed within each dataset by calling the with() function, which sets the environment for statistical analysis. Lastly, the results obtained from each analysis are combined by using the pool() function. PMID:26889483

  17. [Jurisdiction and imputability].

    PubMed

    Tapiador Sanjuán, M J

    2004-12-01

    The validity, efficacy and responsibility of acts depend on the intelligence and will of the acting subject; therefore, when these are reduced or debilitated, the acts may be declared invalid and their author not responsible for them. Some neurological pathologies may generate permanent physical and/or psychic deficiencies, which prevent subjects from acting on their own. For these cases, the law establishes the state of incapacity in order to protect the disabled and compensate for the reduced ability, guaranteeing their rights and security. The state of incapacity is determined by a legal sentence, which states the lack of ability to manage. That sentence determines the extent and limits of the incapacity; the level of incapacity is proportional to the degree of insight. Similarly, a subject suffering from a pathological condition that invalidates his/her will and intelligence will be considered not responsible and not imputable, since there is no capacity for culpability. The Penal Code establishes the criteria that determine the possibility of imputability or its absence, as well as modifying circumstances. PMID:15719288

  18. Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.

    PubMed

    Bolormaa, S; Gore, K; van der Werf, J H J; Hayes, B J; Daetwyler, H D

    2015-10-01

    Genotyping sheep for genome-wide SNPs at lower density and imputing to a higher density would enable cost-effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low-density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50-475 kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single-breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. 
The mean GBLUP accuracies with imputed 50k data more than doubled to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. There was no apparent impact on accuracy of GEBVs as a result of using imputed rather than real 50k genotypes, provided imputation accuracy was >90%. PMID:26360638

  19. Design of a bovine low-density SNP array optimized for imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where de...

  20. minimac2: faster genotype imputation

    PubMed Central

    Fuchsberger, Christian; Abecasis, Gonçalo R.; Hinds, David A.

    2015-01-01

    Summary: Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help to keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation. Availability and implementation: minimac2, including source code, documentation, and examples is available at http://genome.sph.umich.edu/wiki/Minimac2 Contact: cfuchsb@umich.edu, goncalo@umich.edu PMID:25338720

  1. Impact of Genotype Imputation on the Performance of GBLUP and Bayesian Methods for Genomic Prediction

    PubMed Central

    Chen, Liuhong; Li, Changxi; Sargolzaei, Mehdi; Schenkel, Flavio

    2014-01-01

    The aim of this study was to evaluate the impact of genotype imputation on the performance of the GBLUP and Bayesian methods for genomic prediction. A total of 10,309 Holstein bulls were genotyped on the BovineSNP50 BeadChip (50 k). Five low density single nucleotide polymorphism (SNP) panels, containing 6,177, 2,480, 1,536, 768 and 384 SNPs, were simulated from the 50 k panel. A fraction of 0%, 33% and 66% of the animals were randomly selected from the training sets to have low density genotypes which were then imputed into 50 k genotypes. A GBLUP and a Bayesian method were used to predict direct genomic values (DGV) for validation animals using imputed or their actual 50 k genotypes. Traits studied included milk yield, fat percentage, protein percentage and somatic cell score (SCS). Results showed that performance of both GBLUP and Bayesian methods was influenced by imputation errors. For traits affected by a few large QTL, the Bayesian method resulted in greater reductions of accuracy due to imputation errors than GBLUP. Including SNPs with largest effects in the low density panel substantially improved the accuracy of genomic prediction for the Bayesian method. Including genotypes imputed from the 6 k panel achieved almost the same accuracy of genomic prediction as that of using the 50 k panel even when 66% of the training population was genotyped on the 6 k panel. These results justified the application of the 6 k panel for genomic prediction. Imputations from lower density panels were more prone to errors and resulted in lower accuracy of genomic prediction. But for animals that have close relationship to the reference set, genotype imputation may still achieve a relatively high accuracy. PMID:25025158

  2. The utility of low-density genotyping for imputation in the Thoroughbred horse

    PubMed Central

    2014-01-01

    Background Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. Results Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels, suggests that a 2K SNP panel would represent good value for money. Conclusions Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. 
In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy. PMID:24495673

  3. Imputation-based population genetics analysis of Plasmodium falciparum malaria parasites.

    PubMed

    Samad, Hanif; Coll, Francesc; Preston, Mark D; Ocholla, Harold; Fairhurst, Rick M; Clark, Taane G

    2015-04-01

    Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r2 for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r2, 0.87-0.96), but the performance of IMPUTE was mixed (allelic r2, 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima's D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and association analyses, and supporting global surveillance for drug resistance markers and candidate vaccine antigens. PMID:25928499
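
    The abstract above reports accuracy as allelic r2 but does not define it. A minimal sketch, assuming the usual reading (squared Pearson correlation between true and imputed allele dosages at one SNP) and a hypothetical function name:

```python
from math import fsum


def allelic_r2(true_dosage, imputed_dosage):
    """Squared Pearson correlation between true and imputed allele dosages.

    Assumption: dosages are numeric allele counts or probabilities for one SNP,
    one entry per sample (0/1 for a haploid genome such as P. falciparum).
    """
    n = len(true_dosage)
    if n != len(imputed_dosage) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = fsum(true_dosage) / n
    mean_i = fsum(imputed_dosage) / n
    cov = fsum((t - mean_t) * (i - mean_i)
               for t, i in zip(true_dosage, imputed_dosage))
    var_t = fsum((t - mean_t) ** 2 for t in true_dosage)
    var_i = fsum((i - mean_i) ** 2 for i in imputed_dosage)
    if var_t == 0 or var_i == 0:
        # r2 is undefined when either vector is constant (monomorphic site).
        return float("nan")
    return cov * cov / (var_t * var_i)
```

    In a masking experiment, one would hide a subset of known genotypes, impute them, and pass the known and imputed values for each SNP to this function.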

  4. PedBLIMP: extending linear predictors to impute genotypes in pedigrees.

    PubMed

    Chen, Wenan; Schaid, Daniel J

    2014-09-01

    Recently, Wen and Stephens (Wen and Stephens [2010] Ann Appl Stat 4(3):1158-1182) proposed a linear predictor, called BLIMP, that uses conditional multivariate normal moments to impute genotypes with accuracy similar to current state-of-the-art methods. One novelty is that it regularized the estimated covariance matrix based on a model from population genetics. We extended multivariate moments to impute genotypes in pedigrees. Our proposed method, PedBLIMP, utilizes both the linkage-disequilibrium (LD) information estimated from external panel data and the pedigree structure or identity-by-descent (IBD) information. The proposed method was evaluated on a pedigree design where some individuals were genotyped with dense markers and the rest with sparse markers. We found that incorporating the pedigree/IBD information can improve imputation accuracy compared to BLIMP. Because rare variants usually have low LD with other single-nucleotide polymorphisms (SNPs), incorporating pedigree/IBD information largely improved imputation accuracy for rare variants. We also compared PedBLIMP with IMPUTE2 and GIGI. Results show that when sparse markers are in a certain density range, our method can outperform both IMPUTE2 and GIGI. PMID:25044249
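
    The "conditional multivariate normal moments" mentioned above refer to the standard identity E[x_u | x_t] = mu_u + Sigma_ut Sigma_tt^{-1} (x_t - mu_t). The following is a minimal sketch of that identity for a single untyped SNP, not the BLIMP or PedBLIMP implementation: the function names are hypothetical, the means and covariances are assumed to be already estimated (and regularized) from a reference panel, and no pedigree/IBD information is used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def conditional_mean(mu_u, mu_t, sigma_ut, sigma_tt, x_t):
    """Predicted dosage of one untyped SNP given the typed genotypes x_t.

    mu_u     : mean dosage of the untyped SNP (scalar)
    mu_t     : mean dosages of the typed SNPs (list)
    sigma_ut : covariances between the untyped SNP and each typed SNP (list)
    sigma_tt : covariance matrix of the typed SNPs (list of lists)
    """
    centered = [x - m for x, m in zip(x_t, mu_t)]
    weights = solve(sigma_tt, centered)  # Sigma_tt^{-1} (x_t - mu_t)
    return mu_u + sum(s * w for s, w in zip(sigma_ut, weights))
```

    Solving the linear system rather than inverting the covariance matrix explicitly is a common numerical choice. With many typed markers, a library solver would be the practical option.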

  5. Japonica array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals

    PubMed Central

    Kawai, Yosuke; Mimori, Takahiro; Kojima, Kaname; Nariai, Naoki; Danjoh, Inaho; Saito, Rumiko; Yasuda, Jun; Yamamoto, Masayuki; Nagasaki, Masao

    2015-01-01

    The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF)>5%), the genomic coverage of the Japonica array (r2>0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%imputations. PMID:26108142

  6. Genotype Imputation with Millions of Reference Samples.

    PubMed

    Browning, Brian L; Browning, Sharon R

    2016-01-01

    We present a genotype imputation method that scales to millions of reference samples. The imputation method, based on the Li and Stephens model and implemented in Beagle v.4.1, is parallelized and memory efficient, making it well suited to multi-core computer processors. It achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants. We compare Beagle v.4.1 with Impute2 and Minimac3 by using 1000 Genomes Project data, UK10K Project data, and simulated data. All three methods have similar accuracy but different memory requirements and different computation times. When imputing 10 Mb of sequence data from 50,000 reference samples, Beagle's throughput was more than 100× greater than Impute2's throughput on our computer servers. When imputing 10 Mb of sequence data from 200,000 reference samples in VCF format, Minimac3 consumed 26× more memory per computational thread and 15× more CPU time than Beagle. We demonstrate that Beagle v.4.1 scales to much larger reference panels by performing imputation from a simulated reference panel having 5 million samples and a mean marker density of one marker per four base pairs. PMID:26748515
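
    The abstract above says that ungenotyped variants are imputed by linear interpolation between genotyped markers but gives no further detail. A toy sketch of that idea, under stated assumptions: the function names are hypothetical, the quantities being interpolated are taken to be per-reference-haplotype probabilities at the two flanking genotyped markers, and positions are on whatever scale (physical or genetic) the user supplies. This is not Beagle's code.

```python
def interpolate_state_probs(pos, left_pos, left_probs, right_pos, right_probs):
    """Linearly interpolate haplotype probabilities at an ungenotyped position.

    left_probs / right_probs: one probability per reference haplotype, taken at
    the genotyped markers flanking `pos` (assumed inputs from an upstream model).
    """
    if left_pos == right_pos or not (left_pos <= pos <= right_pos):
        raise ValueError("pos must lie between two distinct flanking markers")
    w = (pos - left_pos) / (right_pos - left_pos)  # 0 at left marker, 1 at right
    return [(1 - w) * l + w * r for l, r in zip(left_probs, right_probs)]


def allele_dosage(state_probs, ref_alleles):
    """Expected allele (0/1) at the variant, weighting each reference haplotype."""
    return sum(p * a for p, a in zip(state_probs, ref_alleles))


# Usage: two reference haplotypes, variant halfway between the flanking markers.
probs = interpolate_state_probs(15, 10, [1.0, 0.0], 20, [0.0, 1.0])
dosage = allele_dosage(probs, [1, 0])
```

    Because the weights (1 - w) and w sum to one, interpolated probabilities still sum to one whenever both flanking vectors do.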

  7. Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.

    PubMed

    Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun

    2016-05-01

    DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717

  8. Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression

    PubMed Central

    Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N.; Guan, Weihua; Kang, Jian; Li, Yun

    2016-01-01

    DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). PMID:27061717

  9. What Improves with Increased Missing Data Imputations?

    ERIC Educational Resources Information Center

    Bodner, Todd E.

    2008-01-01

    When using multiple imputation in the analysis of incomplete data, a prominent guideline suggests that more than 10 imputed data values are seldom needed. This article calls into question the optimism of this guideline and illustrates that important quantities (e.g., p values, confidence interval half-widths, and estimated fractions of missing…

  10. HLA imputation in an admixed population: An assessment of the 1000 Genomes data as a training set.

    PubMed

    Nunes, Kelly; Zheng, Xiuwen; Torres, Margareth; Moraes, Maria Elisa; Piovezan, Bruno Z; Pontes, Gerlandia N; Kimura, Lilian; Carnavalli, Juliana E P; Mingroni Netto, Regina C; Meyer, Diogo

    2016-03-01

    Methods to impute HLA alleles based on dense single nucleotide polymorphism (SNP) data provide a valuable resource to association studies and evolutionary investigation of the MHC region. The availability of appropriate training sets is critical to the accuracy of HLA imputation, and the inclusion of samples with various ancestries is an important pre-requisite in studies of admixed populations. We assess the accuracy of HLA imputation using 1000 Genomes Project data as a training set, applying it to a highly admixed Brazilian population, the Quilombos from the state of São Paulo. To assess accuracy, we compared imputed and experimentally determined genotypes for 146 samples at 4 HLA classical loci. We found imputation accuracies of 82.9%, 81.8%, 94.8% and 86.6% for HLA-A, -B, -C and -DRB1 respectively (two-field resolution). Accuracies were improved when we included a subset of Quilombo individuals in the training set. We conclude that the 1000 Genomes data is a valuable resource for construction of training sets due to the diversity of ancestries and the potential for a large overlap of SNPs with the target population. We also show that tailoring training sets to features of the target population substantially enhances imputation accuracy. PMID:26582005

  11. Imputation of missing data in time series for air pollutants

    NASA Astrophysics Data System (ADS)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are major concerns in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess validity and performance of proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

  12. SNP imputation bias reduces effect size determination

    PubMed Central

    Khankhanian, Pouya; Din, Lennox; Caillier, Stacy J.; Gourraud, Pierre-Antoine; Baranzini, Sergio E.

    2015-01-01

    Imputation is a commonly used technique that exploits linkage disequilibrium to infer missing genotypes in genetic datasets, using a well-characterized reference population. While there is agreement that the reference population has to match the ethnicity of the query dataset, it is common practice to use the same reference to impute genotypes for a wide variety of phenotypes. We hypothesized that using a reference composed of samples with a different phenotype than the query dataset would introduce imputation bias. To test this hypothesis we used GWAS datasets from Amyotrophic Lateral Sclerosis (ALS), Parkinson Disease (PD), and Crohn's Disease (CD). First, we masked and then performed imputation of 100 disease-associated markers and 100 non-associated markers from each study. Two references for imputation were used in parallel: one consisting of healthy controls and another consisting of patients with the same disease. We assessed the discordance (imprecision) and bias (inaccuracy) of imputation by comparing predicted genotypes to those assayed by SNP-chip. We also assessed the bias on the observed effect size when the predicted genotypes were used in a GWAS study. When healthy controls were used as reference for imputation, a significant bias was observed, particularly in the disease-associated markers. Using cases as reference significantly attenuated this bias. For nearly all markers, the direction of the bias favored the non-risk allele. In GWAS studies of the three diseases (with healthy reference controls from the 1000 genomes as reference), the mean OR for disease-associated markers obtained by imputation was lower than that obtained using original assayed genotypes. We found that the bias is inherent to imputation as using different methods did not alter the results. In conclusion, imputation is a powerful method to predict genotypes and estimate genetic risk for GWAS. 
However, a careful choice of reference population is needed to minimize biases inherent to this approach. PMID:25709616

  13. Fast accurate missing SNP genotype local imputation

    PubMed Central

    2012-01-01

    Background Single nucleotide polymorphism (SNP) genotyping assays normally give rise to certain percents of no-calls; the problem becomes severe when the target organisms, such as cattle, do not have a high resolution genomic sequence. Missing SNP genotypes, when related to target traits, would confound downstream data analyses such as genome-wide association studies (GWAS). Existing methods for recovering the missing values are successful to some extent – either accurate but not fast enough or fast but not accurate enough. Results To a target missing genotype, we take only the SNP loci within a genetic distance vicinity and only the samples within a similarity vicinity into our local imputation process. For missing genotype imputation, the comparative performance evaluations through extensive simulation studies using real human and cattle genotype datasets demonstrated that our nearest neighbor based local imputation method was one of the most efficient methods, and outperformed existing methods except the time-consuming fastPHASE; for missing haplotype allele imputation, the comparative performance evaluations using real mouse haplotype datasets demonstrated that our method was not only one of the most efficient methods, but also one of the most accurate methods. Conclusions Given that fastPHASE requires a long imputation time on medium to high density datasets, and that our nearest neighbor based local imputation method only performed slightly worse, yet better than all other methods, one might want to adopt our method as an alternative missing SNP genotype or missing haplotype allele imputation method. PMID:22863359

  14. CUTOFF: A spatio-temporal imputation method

    NASA Astrophysics Data System (ADS)

    Feng, Lingbing; Nowak, Gen; O'Neill, T. J.; Welsh, A. H.

    2014-11-01

    Missing values occur frequently in many different statistical applications and need to be dealt with carefully, especially when the data are collected spatio-temporally. We propose a method called CUTOFF imputation that utilizes the spatio-temporal nature of the data to accurately and efficiently impute missing values. The main feature of this method is that the estimate of a missing value is produced by incorporating similar observed temporal information from the value's nearest spatial neighbors. Extensions to this method are also developed to expand the method's ability to accommodate other data generating processes. We develop a cross-validation procedure that optimally chooses parameters for CUTOFF, which can be used by other imputation methods as well. We analyze some rainfall data from 78 gauging stations in the Murray-Darling Basin in Australia using the CUTOFF imputation method and compare its performance to four well-studied competing imputation methods, namely, k-nearest neighbors, singular value decomposition, multiple imputation and random forest. Empirical results show that our method captures the temporal patterns well and is effective at imputing large gaps in the data. Compared to the competing methods, CUTOFF is more accurate and much faster. We analyze further examples to demonstrate CUTOFF's applications to two different data sets and provide extra evidence of its validity and usefulness. We implement a simulation study based on the Murray-Darling Basin data to evaluate the method; the results show that our method performs well in both accuracy and computational efficiency.

  15. Fast imputation using medium- or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  16. A Study of Imputation Algorithms. Working Paper Series.

    ERIC Educational Resources Information Center

    Hu, Ming-xiu; Salvucci, Sameena

    Many imputation techniques and imputation software packages have been developed over the years to deal with missing data. Different methods may work well under different circumstances, and it is advisable to conduct a sensitivity analysis when choosing an imputation method for a particular survey. This study reviewed about 30 imputation methods…

  17. Enlargement of Traffic Information Coverage Area Using Selective Imputation of Floating Car Data

    NASA Astrophysics Data System (ADS)

    Kumagai, Masatoshi; Hiruta, Tomoaki; Fushiki, Takumi; Yokota, Takayoshi

    This paper discusses a real-time imputation method for sparse floating car data (FCD). Floating cars are an effective way to collect traffic information; however, because the number of floating cars is limited, a large amount of FCD is missing. In an effort to address this problem, we previously proposed a new imputation method based on feature space projection. The method consists of three major processes: (i) determination of a feature space from past FCD history; (ii) feature space projection of current FCD; and (iii) estimation of missing data performed by inverse projection from the feature space. Since estimation is achieved on each feature space axis that represents the spatially correlated component of FCD, it performs an accurate imputation and enlarges the information coverage area. However, differences in correlation among multiple road-links sometimes cause a trade-off between accuracy and coverage. Therefore, we developed an additional function to filter out the road-links that have low correlation with the others. The function uses spectral factorization as the filtering index, which is suitable for evaluating correlation in the multidimensional feature space. Combined use of the imputation method and the filtering function decreases the maximum estimation error rate from 0.39 to 0.24, while keeping 60% coverage area against sparse FCD of 15% observations.

  18. Improving accuracy of rare variant imputation with a two-step imputation approach.

    PubMed

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G; Rivadeneira, Fernando; Estrada, Karol

    2015-03-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning less frequent and rare variants not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, as the 1000 Genomes panel, are emerging to facilitate imputation of low frequent single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach improving the quality of the 1000 Genomes imputation by genotyping only a subset of samples to create a local reference population on a dense array with many low-frequency markers. In this approach, the study sample, genotyped with a first generation array, is imputed first to the local reference sample genotyped on a dense array and hereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r(2) using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly, the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (P<1e-15) and rare homozygote calls (P<1e-15) in this low frequency range. The two-step approach in our setting improves imputation quality compared with traditional direct imputation noteworthy in the low-frequency spectrum and is a cost-effective strategy in large epidemiological studies. PMID:24939589

  19. Improving accuracy of rare variant imputation with a two-step imputation approach

    PubMed Central

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G; Rivadeneira, Fernando; Estrada, Karol

    2015-01-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning less frequent and rare variants not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, as the 1000 Genomes panel, are emerging to facilitate imputation of low frequent single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach improving the quality of the 1000 Genomes imputation by genotyping only a subset of samples to create a local reference population on a dense array with many low-frequency markers. In this approach, the study sample, genotyped with a first generation array, is imputed first to the local reference sample genotyped on a dense array and hereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r2 using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly, the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (P<1e-15) and rare homozygote calls (P<1e-15) in this low frequency range. The two-step approach in our setting improves imputation quality compared with traditional direct imputation noteworthy in the low-frequency spectrum and is a cost-effective strategy in large epidemiological studies. PMID:24939589

  20. Dual imputation model for incomplete longitudinal data.

    PubMed

    Jolani, Shahab; Frank, Laurence E; van Buuren, Stef

    2014-05-01

    Missing values are a practical issue in the analysis of longitudinal data. Multiple imputation (MI) is a well-known likelihood-based method that has optimal properties in terms of efficiency and consistency if the imputation model is correctly specified. Doubly robust (DR) weighting-based methods protect against misspecification bias if one of the models, but not necessarily both, for the data or the mechanism leading to missing data is correct. We propose a new imputation method that captures the simplicity of MI and protection from the DR method. This method integrates MI and DR to protect against misspecification of the imputation model under a missing at random assumption. Our method avoids analytical complications of missing data particularly in multivariate settings, and is easy to implement in standard statistical packages. Moreover, the proposed method works very well with an intermittent pattern of missingness when other DR methods cannot be used. Simulation experiments show that the proposed approach achieves improved performance when one of the models is correct. The method is applied to data from the fireworks disaster study, a randomized clinical trial comparing therapies in disaster-exposed children. We conclude that the new method increases the robustness of imputations. PMID:23909566

  1. Automatic Treatment Planning with Convex Imputing

    NASA Astrophysics Data System (ADS)

    Sayre, G. A.; Ruan, D.

    2014-03-01

    Current inverse optimization-based treatment planning for radiotherapy requires a set of complex DVH objectives to be simultaneously minimized. This process, known as multi-objective optimization, is challenging due to non-convexity in individual objectives and insufficient knowledge of the tradeoffs among the objective set. As such, clinical practice involves numerous iterations of human intervention that are costly and often inconsistent. In this work, we propose to address treatment planning with convex imputing, a new data-mining technique that explores the existence of a latent convex objective whose optimizer reflects the DVH and dose-shaping properties of previously optimized cases. Using ten clinical prostate cases as the basis for comparison, we imputed a simple least-squares problem from the optimized solutions of the prostate cases, and show that the imputed plans are more consistent than their clinical counterparts in achieving planning goals.

  2. Multiple Imputation of Multilevel Missing Data-Rigor versus Simplicity

    ERIC Educational Resources Information Center

    Drechsler, Jörg

    2015-01-01

    Multiple imputation is widely accepted as the method of choice to address item-nonresponse in surveys. However, research on imputation strategies for the hierarchical structures that are typically found in the data in educational contexts is still limited. While a multilevel imputation model should be preferred from a theoretical point of view if…

  3. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    ERIC Educational Resources Information Center

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  4. JEPEGMIX: gene-level joint analysis of functional SNPs in cosmopolitan cohorts

    PubMed Central

    Lee, Donghyung; Williamson, Vernell S.; Bigdeli, T. Bernard; Riley, Brien P.; Webb, Bradley T.; Fanous, Ayman H.; Kendler, Kenneth S.; Vladimirov, Vladimir I.; Bacanu, Silviu-Alin

    2016-01-01

    Motivation: To increase detection power, gene level analysis methods are used to aggregate weak signals. To greatly increase computational efficiency, most methods use as input summary statistics from genome-wide association studies (GWAS). Subsequently, gene statistics are constructed using linkage disequilibrium (LD) patterns from a relevant reference panel. However, all methods, including our own Joint Effect on Phenotype of eQTL/functional single nucleotide polymorphisms (SNPs) associated with a Gene (JEPEG), assume homogeneous panels, e.g. European. This renders these tools unsuitable for the analysis of large cosmopolitan cohorts. Results: We propose a JEPEG extension, JEPEGMIX, which, similar to one of our software tools, Direct Imputation of summary STatistics of unmeasured SNPs from MIXed ethnicity cohorts, is capable of estimating accurate LD patterns for cosmopolitan cohorts. JEPEGMIX uses these accurate LD estimates to (i) impute the summary statistics at unmeasured functional variants and (ii) test for the joint effect of all measured and imputed functional variants which are associated with a gene. We illustrate the performance of our tool by analyzing the GWAS meta-analysis summary statistics from the multi-ethnic Psychiatric Genomics Consortium Schizophrenia stage 2 cohort. This practical application supports the immune system being one of the main drivers of the process leading to schizophrenia. Availability and implementation: Software, annotation database and examples are available at http://dleelab.github.io/jepegmix/. Contact: donghyung.lee@vcuhealth.org Supplementary information: Supplementary material is available at Bioinformatics online. PMID:26428293

  5. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    PubMed Central

    McClure, Matthew C.; Sonstegard, Tad S.; Wiggans, George R.; Van Eenennaam, Alison L.; Weber, Kristina L.; Penedo, Cecilia T.; Berry, Donagh P.; Flynn, John; Garcia, Jose F.; Carmo, Adriana S.; Regitano, Luciana C. A.; Albuquerque, Milla; Silva, Marcos V. G. B.; Machado, Marco A.; Coffey, Mike; Moore, Kirsty; Boscher, Marie-Yvonne; Genestout, Lucie; Mazza, Raffaele; Taylor, Jeremy F.; Schnabel, Robert D.; Simpson, Barry; Marques, Elisa; McEwan, John C.; Cromie, Andrew; Coutinho, Luiz L.; Kuehn, Larry A.; Keele, John W.; Piper, Emily K.; Cook, Jim; Williams, Robert; Van Tassell, Curtis P.

    2013-01-01

    To help cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification, we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could be accurately imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes by MS genotyping animals whose imputed MS alleles fail parentage verification, and then incorporating those animals into the reference dataset. PMID:24065982

  6. Missing value imputation strategies for metabolomics data.

    PubMed

    Armitage, Emily Grace; Godzien, Joanna; Alonso-Herranz, Vanesa; López-Gonzálvez, Ángeles; Barbas, Coral

    2015-12-01

    Missing values can arise for different reasons, and depending on their origin they should be considered and dealt with in different ways. In this research, four methods of imputation have been compared with respect to their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, the effects of different strategies for controlling familywise error rate or false discovery and how they work with the different strategies for missing value imputation have been evaluated. Missing values were found to affect normality and variance of data and k-means nearest neighbour imputation was the best method tested for restoring this. Bonferroni correction was the best method for maximizing true positives and minimizing false positives and it was observed that as low as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a "gray area" and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy, which was k-means nearest neighbour, and the best approximation of positioning real zeros. PMID:26376450

  7. Marker imputation in barley association studies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Association mapping requires higher marker density than linkage mapping, potentially leading to more missing marker data and to higher genotyping costs. In human genetics, methods exist to impute missing marker data and whole markers that were typed in a reference panel but not in the experimental d...

  8. Multiple imputation for an incomplete covariate that is a ratio.

    PubMed

    Morris, Tim P; White, Ian R; Royston, Patrick; Seaman, Shaun R; Wood, Angela M

    2014-01-15

    We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using Winbugs were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. PMID:23922236
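
The rule of thumb reported in this abstract (passive imputation without transformation becomes risky once the coefficient of variation of the ratio's denominator exceeds about 0.1) can be checked before choosing an imputation model. The following is a minimal sketch using only the Python standard library; the helper names and the example values are hypothetical and are not taken from the paper.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    return statistics.stdev(observed) / statistics.mean(observed)


def suggest_ratio_imputation(denominator, threshold=0.1):
    """Hypothetical helper applying the abstract's rule of thumb.

    The threshold of 0.1 is the approximate value quoted in the abstract.
    """
    cv = coefficient_of_variation(denominator)
    if cv > threshold:
        return "active, or passive after log-transformation"
    return "passive without transformation may be acceptable"


# Made-up denominators (e.g. height squared for body mass index);
# None marks a missing value that would later be imputed.
heights_m2 = [2.89, 3.10, None, 2.72, 3.31, 2.95]
suggestion = suggest_ratio_imputation(heights_m2)
```

The check uses only the observed denominators, so it can be run before any imputation model is fitted.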

  9. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms

    PubMed Central

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-01-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960
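
To illustrate the general idea of k-nearest-neighbour genotype imputation on unordered markers, here is a simplified sketch. It is not the LD-kNNi algorithm itself: the LD-based choice of which markers enter the distance is omitted, and all other markers are used instead. The 0/1/2 genotype coding, the function name `knn_impute` and the inverse-distance vote are assumptions made for this example.

```python
from collections import defaultdict


def knn_impute(genotypes, k=3):
    """Fill None entries in a samples x markers matrix of 0/1/2 genotype codes.

    The distance between two samples is the mean absolute genotype
    difference over markers observed in both. Each missing genotype is set
    to the inverse-distance-weighted vote of the k nearest samples that
    have the marker observed. No marker order or map is used.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sum(abs(x - y) for x, y in shared) / len(shared)

    imputed = [row[:] for row in genotypes]  # leave the input untouched
    for i, row in enumerate(genotypes):
        for m, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other samples with marker m observed.
            neighbours = sorted(
                (distance(row, other), other[m])
                for j, other in enumerate(genotypes)
                if j != i and other[m] is not None
            )[:k]
            votes = defaultdict(float)
            for d, g in neighbours:
                votes[g] += 1.0 / (d + 1e-6)  # small constant avoids division by zero
            if votes:
                imputed[i][m] = max(votes, key=votes.get)
    return imputed


# Toy data: sample 1 is identical to sample 0 at every observed marker.
data = [
    [0, 0, 2, 2],
    [0, 0, 2, None],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
filled = knn_impute(data, k=3)
```

Because distances are computed from genotype similarity alone, the sketch works on unphased data without any physical or genetic map, which is the property the abstract emphasises.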

  10. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

    PubMed

    Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean

    2015-11-01

    Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960

  11. Clustering with Missing Values: No Imputation Required

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  12. Genotype Imputation with Thousands of Genomes

    PubMed Central

    Howie, Bryan; Marchini, Jonathan; Stephens, Matthew

    2011-01-01

    Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package. PMID:22384356

  13. On combining reference data to improve imputation accuracy.

    PubMed

    Chen, Jun; Zhang, Ji-Gang; Li, Jian; Pei, Yu-Fang; Deng, Hong-Wen

    2013-01-01

    Genotype imputation is an important tool in human genetics studies, which uses reference sets with known genotypes and prior knowledge on linkage disequilibrium and recombination rates to infer un-typed alleles for human genetic variations at a low cost. The reference sets used by current imputation approaches are based on HapMap data, and/or based on recently available next-generation sequencing (NGS) data such as data generated by the 1000 Genomes Project. However, with different coverage and call rates for different NGS data sets, how to integrate NGS data sets of different accuracy as well as previously available reference data as references in imputation is not an easy task and has not been systematically investigated. In this study, we performed a comprehensive assessment of three strategies for using NGS data and previously available reference data in genotype imputation for both simulated data and empirical data, in order to obtain guidelines for optimal reference set construction. Briefly, we considered three strategies: strategy 1 uses one NGS data set as a reference; strategy 2 imputes samples by using multiple individual data sets of different accuracy as independent references and then combines the imputed samples with samples based on the high accuracy reference selected when overlapping occurs; and strategy 3 combines multiple available data sets as a single reference after imputing each other. We used three software packages (MACH, IMPUTE2 and BEAGLE) for assessing the performances of these three strategies. Our results show that strategy 2 and strategy 3 have higher imputation accuracy than strategy 1. Particularly, strategy 2 is the best strategy across all the conditions that we have investigated, producing the best accuracy of imputation for rare variants. Our study is helpful in guiding application of imputation methods in next generation association analyses. PMID:23383238

  14. 12 CFR 367.9 - Imputation of causes.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Imputation of causes. 367.9 Section 367.9 Banks... SUSPENSION AND EXCLUSION OF CONTRACTOR AND TERMINATION OF CONTRACTS § 367.9 Imputation of causes. (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that...

  15. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  16. 47 CFR 1.1416 - Imputation of rates; modification costs.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 47 Telecommunication 1 2013-10-01 2013-10-01 false Imputation of rates; modification costs. 1.1416 Section 1.1416 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Grants by Random Selection Pole Attachment Complaint Procedures § 1.1416 Imputation of rates;...

  17. 47 CFR 1.1416 - Imputation of rates; modification costs.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 47 Telecommunication 1 2012-10-01 2012-10-01 false Imputation of rates; modification costs. 1.1416 Section 1.1416 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Grants by Random Selection Pole Attachment Complaint Procedures § 1.1416 Imputation of rates;...

  18. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  19. Should "Multiple Imputations" Be Treated as "Multiple Indicators"?

    ERIC Educational Resources Information Center

    Mislevy, Robert J.

    1993-01-01

    Multiple imputations for latent variables are constructed so that analyses treating them as true variables have the correct expectations for population characteristics. Analyzing multiple imputations in accordance with their construction yields correct estimates of population characteristics, whereas analyzing them as multiple indicators generally…

  20. 12 CFR 367.9 - Imputation of causes.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 4 2011-01-01 2011-01-01 false Imputation of causes. 367.9 Section 367.9 Banks... SUSPENSION AND EXCLUSION OF CONTRACTOR AND TERMINATION OF CONTRACTS § 367.9 Imputation of causes. (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that...

  1. Geometric median for missing rainfall data imputation

    NASA Astrophysics Data System (ADS)

    Burhanuddin, Siti Nur Zahrah Amin; Deni, Sayang Mohd; Ramli, Norazan Mohamed

    2015-02-01

    Missing data is a common problem faced by researchers in environmental studies. Environmental data, particularly rainfall data, are highly prone to missing values for several reasons, such as instrument malfunction, incorrect measurements, and relocation of stations. Rainfall data are also affected by the presence of outliers due to the temporal and spatial variability of rainfall measurements. These problems may harm the quality of rainfall data and subsequently produce inaccurate analysis results. Thus, this study aims to propose an imputation method that is robust to the presence of outliers for treating missing rainfall data. Geometric median was applied to estimate the missing values based on the available rainfall data from neighbouring stations. The method was compared with several conventional methods, such as normal ratio and inverse distance weighting methods, in order to evaluate its performance. Thirteen rainfall stations in Peninsular Malaysia were selected for the application of the imputation methods. The results indicated that the proposed method provided the most accurate estimates compared to both conventional methods, based on the lowest mean absolute error. The normal ratio was found to be the worst method in estimating the missing rainfall values.
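
The two conventional baselines named in this abstract, and the mean absolute error used to rank the methods, have standard textbook forms that can be sketched as follows. The function names, the default distance power of 2 and the example numbers are assumptions for illustration; the paper's geometric-median estimator itself is not reproduced here.

```python
def normal_ratio(target_normal, neighbour_normals, neighbour_values):
    """Normal ratio method: scale each neighbour's rainfall by the ratio of
    the target station's long-term normal to the neighbour's, then average."""
    n = len(neighbour_values)
    return sum(
        target_normal / normal * value
        for normal, value in zip(neighbour_normals, neighbour_values)
    ) / n


def inverse_distance_weighting(distances, neighbour_values, power=2):
    """Weighted mean of neighbour rainfall with weights 1 / distance**power."""
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, neighbour_values)) / sum(weights)


def mean_absolute_error(estimates, observed):
    """Average absolute difference between estimated and observed values."""
    return sum(abs(e - o) for e, o in zip(estimates, observed)) / len(observed)


# Made-up example: two neighbouring stations reporting 10 mm and 20 mm.
nr_estimate = normal_ratio(1000.0, [1000.0, 2000.0], [10.0, 20.0])
idw_estimate = inverse_distance_weighting([1.0, 2.0], [10.0, 20.0])
```

Both baselines are weighted means of neighbouring stations, so a single extreme neighbour value shifts the estimate directly; this is the outlier sensitivity that motivates a robust alternative in the abstract.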

  2. Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study

    PubMed Central

    Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-01-01

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914

  3. Imputing and Predicting Quantitative Genetic Interactions in Epistatic MAPs

    PubMed Central

    Ryan, Colm; Cagney, Gerard; Krogan, Nevan; Cunningham, Pádraig; Greene, Derek

    2012-01-01

    Mapping epistatic (or genetic) interactions has emerged as an important network biology approach for establishing functional relationships among genes and proteins. Epistasis networks are complementary to physical protein interaction networks, providing valuable insight into both the function of individual genes and the overall wiring of the cell. A high-throughput method termed “epistatic mini array profiles” (E-MAPs) was recently developed in yeast to quantify alleviating or aggravating interactions between gene pairs. The typical output of an E-MAP experiment is a large symmetric matrix of interaction scores. One problem with this data is the large amount of missing values – interactions that cannot be measured during the high-throughput process or whose measurements were discarded due to quality filtering steps. These missing values can reduce the effectiveness of some data analysis techniques and prevent the use of others. Here, we discuss one solution to this problem, imputation using nearest neighbors, and give practical examples of the use of a freely available implementation of this method. PMID:21877290

  4. Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx.

    PubMed

    Wang, Jiebiao; Gamazon, Eric R; Pierce, Brandon L; Stranger, Barbara E; Im, Hae Kyung; Gibbons, Robert D; Cox, Nancy J; Nicolae, Dan L; Chen, Lin S

    2016-04-01

    Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies. PMID:27040689

  5. Use and abuse of census editing and imputation.

    PubMed

    Banister, J

    1980-02-01

    With the advent of electronic processing of census data, it has become common practice in some countries to change answers on questionnaires that seem inconsistent with other answers ("editing"), and to fill in blank spaces on questionnaires with plausible answers ("imputation"). Increasing incidence of these practices has caused uneasiness among both users and producers of census data. Elaborate editing and imputation can introduce serious errors into published data, and can destroy evidence that collected data is of limited quality and must be used with caution. In support of editing and imputation, it is argued that the quality of data is improved, that convenience of analysis is enhanced, and that data may be more credible. The author discusses each of these arguments in turn. She concludes that some types of editing (notably field editing and imputation and redundant imputation) enhance or help maintain data quality, others (semi-informed or blind imputation) can debase quality and must be used with great caution. User convenience justifies some use of imputation, such as replacement of unknown data that can have negligible effects on census results, but is not good enough reason for filling in all unknowns -- it is reasonable to expect variance in quality of data and to use caution when using some data. The credibility of census data can be damaged by excessive editing and imputation, and users of data should be educated about its limitations. The author believes that it has become necessary for census organizations to establish guidelines for editing and imputation, which should be published. A series of principles around which a wider discussion of the subject could be organized is offered. PMID:12309770

  6. A hybrid imputation approach for microarray missing value estimation

    PubMed Central

    2015-01-01

    Background Missing data is an inevitable phenomenon in gene expression microarray experiments due to instrument failure or human error. It has a negative impact on performance of downstream analysis. Technically, most existing approaches suffer from this prevalent problem. Imputation is one of the frequently used methods for processing missing data. Many developments have been achieved in the research on estimating missing values. The challenging task is how to improve imputation accuracy for data with a large missing rate. Methods In this paper, inspired by the idea of collaborative training, we propose a novel hybrid imputation method, called Recursive Mutual Imputation (RMI). Specifically, RMI exploits global correlation information and local structure in the data, captured by two popular methods, Bayesian Principal Component Analysis (BPCA) and Local Least Squares (LLS), respectively. Mutual strategy is implemented by sharing the estimated data sequences at each recursive process. Meanwhile, we consider the imputation sequence based on the number of missing entries in the target gene. Furthermore, a weight based integrated method is utilized in the final assembling step. Results We evaluate RMI against three state-of-the-art algorithms (BPCA, LLS, Iterated Local Least Squares imputation (ItrLLS)) on four publicly available microarray datasets. Experimental results clearly demonstrate that RMI significantly outperforms comparative methods in terms of Normalized Root Mean Square Error (NRMSE), especially for datasets with large missing rates and fewer complete genes. Conclusions It is noted that our proposed hybrid imputation approach incorporates both global and local information of microarray genes, which achieves lower NRMSE values than any single approach alone. Besides, this study highlights the need for considering the imputing sequence of missing entries for imputation methods. PMID:26330180

  7. An imputation-based genome-wide association study on traits related to male reproduction in a White Duroc × Erhualian F2 population.

    PubMed

    Zhao, Xueyan; Zhao, Kewei; Ren, Jun; Zhang, Feng; Jiang, Chao; Hong, Yuan; Jiang, Kai; Yang, Qiang; Wang, Chengbin; Ding, Nengshui; Huang, Lusheng; Zhang, Zhiyan; Xing, Yuyun

    2016-05-01

    Boar reproductive traits are economically important for the pig industry. Here we conducted a genome-wide association study (GWAS) for 13 reproductive traits measured on 205 F2 boars at day 300 using 60 K single nucleotide polymorphism (SNP) data imputed from a reference panel of 1200 pigs in a White Duroc × Erhualian F2 intercross population. We identified 10 significant loci for seven traits on eight pig chromosomes (SSC). Two loci surpassed the genome-wide significance level, including one for epididymal weight around 60.25 Mb on SSC7 and one for semen temperature around 43.69 Mb on SSC4. Four of the 10 significant loci that we identified were consistent with previously reported quantitative trait loci for boar reproduction traits. We highlighted several interesting candidate genes at these loci, including APN, TEP1, PARP2, SPINK1 and PDE1C. To evaluate the imputation accuracy, we further genotyped nine GWAS top SNPs using PCR restriction fragment length polymorphism or Sanger sequencing. We found an average genotype concordance of 91.44%, an allelic concordance of 95.36% and an r² correlation of 0.85 between imputed and real genotype data. This indicates that our GWAS mapping results based on imputed SNP data are reliable, providing insights into the genetic basis of boar reproductive traits. PMID:26425933
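
The three accuracy measures quoted in this abstract can be computed from paired imputed and directly genotyped calls. The sketch below assumes genotypes coded as 0/1/2 copies of one allele and uses common definitions of each measure; the paper's exact definitions may differ, and the function names and example values are hypothetical.

```python
def genotype_concordance(imputed, observed):
    """Fraction of genotypes that are identical."""
    return sum(i == o for i, o in zip(imputed, observed)) / len(observed)


def allelic_concordance(imputed, observed):
    """Fraction of alleles that agree; each genotype carries two alleles and
    |imputed - observed| of them differ under 0/1/2 coding."""
    matching = sum(2 - abs(i - o) for i, o in zip(imputed, observed))
    return matching / (2 * len(observed))


def r_squared(imputed, observed):
    """Squared Pearson correlation between imputed and observed genotypes."""
    n = len(observed)
    mean_i = sum(imputed) / n
    mean_o = sum(observed) / n
    cov = sum((i - mean_i) * (o - mean_o) for i, o in zip(imputed, observed))
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_i * var_o)


# Made-up calls for four animals at one SNP.
imputed_calls = [0, 1, 2, 2]
observed_calls = [0, 1, 2, 1]
```

Allelic concordance is never lower than genotype concordance, because a heterozygote called as a homozygote still has one of its two alleles right; this matches the ordering of the two percentages reported above.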

  8. Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data.

    PubMed

    Fragoso, Christopher A; Heffelfinger, Christopher; Zhao, Hongyu; Dellaporta, Stephen L

    2016-02-01

    Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute. PMID:26715670

  9. Association Analysis of BMD-associated SNPs with Knee Osteoarthritis

    PubMed Central

    Yerges-Armstrong, LM; Yau, MS; Liu, Y; Krishnan, S; Renner, JB; Eaton, CB; Kwoh, CK; Nevitt, MC; Duggan, DJ; Mitchell, BD; Jordan, JM; Hochberg, MC; Jackson, RD

    2014-01-01

    Osteoarthritis (OA) risk is widely recognized to be heritable but few loci have been identified. Observational studies have identified higher systemic bone mineral density (BMD) to be associated with an increased risk of radiographic knee osteoarthritis. With this in mind, we sought to evaluate whether well-established genetic loci for variance in BMD are associated with risk for radiographic OA in the Osteoarthritis Initiative (OAI) and the Johnston County Osteoarthritis (JoCo) Project. Cases had at least one knee with definite radiographic OA defined as the presence of definite osteophytes with or without joint space narrowing (KL grade ≥ 2) and controls were absent for definite radiographic OA in both knees (KL grade ≤ 1 bilaterally). There were 2014 and 658 Caucasian cases, respectively, in the OAI and JoCo Studies, and 953 and 823 controls. Single nucleotide polymorphisms (SNPs) were identified for association analysis from the literature. Genotyping was carried out on the Illumina 2.5M and 1M arrays in GeCKO and JoCo, respectively and imputation was done. Association analyses were carried out separately in each cohort with adjustments for age, BMI, and sex and then parameter estimates were combined across the two cohorts by meta-analysis. We identified 4 SNPs significantly associated with prevalent radiographic knee OA. The strongest signal (p=0.0009, OR=1.22, 95% CI [1.08–1.37]) maps to 12q3 which contains a gene coding for SP7. Additional loci map to 7p14.1 (TXNDC3), 11q13.2 (LRP5) and 11p14.1 (LIN7C). For all four loci the allele associated with higher BMD was associated with higher odds of OA. A BMD risk allele score was not significantly associated with OA risk. This meta-analysis demonstrates that several GWAS-identified BMD SNPs are nominally associated with prevalent radiographic knee OA and further supports the hypothesis that BMD, or its determinants, may be a risk factor contributing to OA development. PMID:24339167

  10. Multiple imputation for time series data with Amelia package

    PubMed Central

    2016-01-01

    Time series data are common in medical research. Many laboratory variables or study endpoints could be measured repeatedly over time. Multiple imputation (MI) that ignores the time trend of a variable may be unreliable. The article illustrates how to perform MI using the Amelia package in a clinical scenario. The Amelia package is powerful in that it allows for MI for time series data. External information on the variable of interest can also be incorporated by using the prior or bound arguments. Such information may be based on previously published observations, academic consensus, and personal experience. Diagnostics of the imputation model can be performed by examining the distributions of imputed and observed values, or by using the over-imputation technique. PMID:26904578

  11. Multiple imputation for time series data with Amelia package.

    PubMed

    Zhang, Zhongheng

    2016-02-01

    Time series data are common in medical research. Many laboratory variables or study endpoints could be measured repeatedly over time. Multiple imputation (MI) that ignores the time trend of a variable may be unreliable. The article illustrates how to perform MI using the Amelia package in a clinical scenario. The Amelia package is powerful in that it allows for MI for time series data. External information on the variable of interest can also be incorporated by using the prior or bound arguments. Such information may be based on previously published observations, academic consensus, and personal experience. Diagnostics of the imputation model can be performed by examining the distributions of imputed and observed values, or by using the over-imputation technique. PMID:26904578

  12. A second generation human haplotype map of over 3.1 million SNPs

    PubMed Central

    2009-01-01

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122
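
    The "average maximum r2" statistic quoted in this abstract can be made concrete with a small sketch: for every untyped SNP, take the largest squared correlation with any typed SNP, then average over the untyped SNPs. The haplotype matrix, the function names, and the choice of typed SNPs below are all hypothetical, and the code assumes every SNP is polymorphic (a monomorphic column has no defined correlation).

```python
import numpy as np

def r2_matrix(haplotypes):
    """Pairwise squared correlation between SNP columns of a 0/1 haplotype matrix."""
    return np.corrcoef(haplotypes, rowvar=False) ** 2

def average_max_r2(haplotypes, typed):
    """Mean, over untyped SNPs, of the best r2 with any typed SNP."""
    typed = np.asarray(typed)
    untyped = np.setdiff1d(np.arange(haplotypes.shape[1]), typed)
    r2 = r2_matrix(haplotypes)
    return float(r2[np.ix_(untyped, typed)].max(axis=1).mean())

# Toy data: rows are haplotypes, columns are SNPs.
# SNP 1 copies SNP 0, SNP 2 is its complement, SNP 3 is uncorrelated with SNP 0.
H = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
])

print(average_max_r2(H, typed=[0]))     # SNPs 1 and 2 are tagged, SNP 3 is not
print(average_max_r2(H, typed=[0, 3]))  # every untyped SNP is perfectly tagged
```

    In this toy example, typing SNP 0 alone captures two of the three remaining SNPs perfectly and the third not at all, so the average maximum r2 is 2/3; adding SNP 3 to the typed set raises it to 1.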

  13. QuickSNP: an automated web server for selection of tagSNPs

    PubMed Central

    Grover, Deepak; Woodfield, Alonzo S.; Verma, Ranjana; Zandi, Peter P.; Levinson, Douglas F.; Potash, James B.

    2007-01-01

    Although large-scale genetic association studies involving hundreds to thousands of SNPs have become feasible, the associated cost is substantial. Even with the increased efficiency introduced by the use of tagSNPs, researchers are often seeking ways to maximize resource utilization given a set of SNP-based gene-mapping goals. We have developed a web server named QuickSNP in order to provide cost-effective selection of SNPs, and to fill in some of the gaps in existing SNP selection tools. One useful feature of QuickSNP is the option to select only gene-centric SNPs from a chromosomal region in an automated fashion. Other useful features include automated selection of coding non-synonymous SNPs, SNP filtering based on inter-SNP distances and information regarding the availability of genotyping assays for SNPs and whether they are present on whole genome chips. The program produces user-friendly summary tables and results, and a link to a UCSC Genome Browser track illustrating the position of the selected tagSNPs in relation to genes and other genomic features. We hope the unique combination of features of this server will be useful for researchers aiming to select markers for their genotyping studies. The server is freely available and can be accessed at the URL http://bioinformoodics.jhmi.edu/quickSNP.pl. PMID:17517769

  14. Doubly robust and multiple-imputation-based generalized estimating equations.

    PubMed

    Birhanu, Teshome; Molenberghs, Geert; Sotto, Cristina; Kenward, Michael G

    2011-03-01

    Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), provide a popular method to analyze correlated non-Gaussian data. When data are incomplete, the GEE method suffers from its frequentist nature and inferences under this method are valid only under the strong assumption that the missing data are missing completely at random. When response data are missing at random, two modifications of GEE can be considered, based on inverse-probability weighting or on multiple imputation. The weighted GEE (WGEE) method involves weighting observations by the inverse of their probability of being observed. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model, multiple times. The so-called doubly robust (DR) methods involve both a model for the weights and a predictive model for the missing observations given the observed ones. To yield consistent estimates, WGEE needs correct specification of the dropout model while imputation-based methodology needs a correctly specified imputation model. DR methods need correct specification of either the weight or the predictive model, but not necessarily both. Focusing on incomplete binary repeated measures, we study the relative performance of the singly robust and doubly robust versions of GEE in a variety of correctly and incorrectly specified models using simulation studies. Data from a clinical trial in onychomycosis further illustrate the method. PMID:21390997
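
    The double-robustness property described in this abstract can be illustrated with a deliberately simplified sketch. It estimates the mean of a single binary response that is missing at random given one binary covariate, rather than fitting GEE to repeated measures, so it is not the authors' method. All variable names and simulation settings are assumptions. Each "model" is either correct (stratified by the covariate) or misspecified (ignores the covariate).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 2, size=n)                   # binary covariate, always observed
y = rng.binomial(1, np.where(x == 1, 0.7, 0.3))  # binary response, true mean 0.5
r = rng.binomial(1, np.where(x == 1, 0.9, 0.4))  # 1 = response observed (MAR given x)

def stratum_mean(values, keep):
    """Mean of `values` within each level of x among units in `keep`, mapped back to all units."""
    means = np.array([values[keep & (x == level)].mean() for level in (0, 1)])
    return means[x]

everyone = np.ones(n, dtype=bool)
pi_ok = stratum_mean(r, everyone)      # correct weight model: P(observed | x)
pi_bad = np.full(n, r.mean())          # misspecified weight model: ignores x
m_ok = stratum_mean(y, r == 1)         # correct predictive model: E[y | x]
m_bad = np.full(n, y[r == 1].mean())   # misspecified predictive model: ignores x

def dr_mean(pi, m):
    """Doubly robust estimate of E[y]: prediction plus inverse-probability-weighted residual."""
    return float(np.mean(m + r * (y - m) / pi))

complete_case = float(y[r == 1].mean())
print("complete cases       :", round(complete_case, 3))
print("DR, weights correct  :", round(dr_mean(pi_ok, m_bad), 3))
print("DR, outcome correct  :", round(dr_mean(pi_bad, m_ok), 3))
print("DR, both misspecified:", round(dr_mean(pi_bad, m_bad), 3))
```

    With either model correct, the estimate stays close to the true mean of 0.5; with both misspecified it inherits the complete-case bias, which matches the "either, but not necessarily both" statement in the abstract.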

  15. Combining fractional polynomial model building with multiple imputation.

    PubMed

    Morris, Tim P; White, Ian R; Carpenter, James R; Stanworth, Simon J; Royston, Patrick

    2015-11-10

    Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. PMID:26095614

  16. Missing value imputation: with application to handwriting data

    NASA Astrophysics Data System (ADS)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

  17. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    PubMed

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540
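
    To show why a neighbour-based method can outperform mean imputation on unordered markers, here is a minimal sketch that compares the two on a toy genotype matrix with 60% of calls masked. It is not the implementation used in the study. The data are simulated as two hypothetical groups of identical individuals, the function names are illustrative, and the code assumes every SNP has at least one observed call.

```python
import numpy as np

def mean_impute(G):
    """Replace missing calls (NaN) with the per-SNP mean of the observed calls."""
    col_means = np.nanmean(G, axis=0)
    return np.where(np.isnan(G), col_means, G)

def knn_impute(G, k=3):
    """Fill each missing call with the mean call of the k most similar samples
    (mean squared difference over jointly observed SNPs) that have that SNP called."""
    obs = ~np.isnan(G)
    O = obs.astype(float)
    Z = np.where(obs, G, 0.0)
    shared = O @ O.T  # number of SNPs observed in both samples
    sq = (Z ** 2) @ O.T + O @ (Z ** 2).T - 2.0 * Z @ Z.T
    dist = np.where(shared > 0, sq / np.maximum(shared, 1.0), np.inf)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own donor
    col_means = np.nanmean(G, axis=0)
    filled = G.copy()
    for i, j in zip(*np.where(~obs)):
        donors = np.where(obs[:, j] & np.isfinite(dist[i]))[0]
        if donors.size == 0:
            filled[i, j] = col_means[j]  # fall back to the SNP mean
        else:
            nearest = donors[np.argsort(dist[i, donors])[:k]]
            filled[i, j] = G[nearest, j].mean()
    return filled

# Toy data: 40 samples in two groups, 200 SNPs coded 0/1/2, 60% of calls masked.
rng = np.random.default_rng(1)
prototypes = rng.integers(0, 3, size=(2, 200)).astype(float)
truth = prototypes[np.repeat([0, 1], 20)]
mask = rng.random(truth.shape) < 0.6
G = np.where(mask, np.nan, truth)

def rmse(imputed):
    """Root mean squared error on the masked calls only."""
    return float(np.sqrt(np.mean((imputed[mask] - truth[mask]) ** 2)))

rmse_mean = rmse(mean_impute(G))
rmse_knn = rmse(knn_impute(G))
print(f"mean imputation RMSE: {rmse_mean:.3f}")
print(f"kNN imputation RMSE:  {rmse_knn:.3f}")
```

    The mean pulls every missing call toward the population average, whereas the neighbour-based fill borrows from samples that look alike on the markers they share, so it recovers group-specific genotypes.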

  18. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…

  19. A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

    ERIC Educational Resources Information Center

    Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

    2012-01-01

    Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…

  1. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…

  2. SPSS Syntax for Missing Value Imputation in Test and Questionnaire Data

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries

    2005-01-01

    A well-known problem in the analysis of test and questionnaire data is that some item scores may be missing. Advanced methods for the imputation of missing data are available, such as multiple imputation under the multivariate normal model and imputation under the saturated logistic model (Schafer, 1997). Accompanying software was made available…

  3. Reference-free detection of isolated SNPs

    PubMed Central

    Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre

    2015-01-01

    Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism. PMID:25404127

  4. Functional annotation of colon cancer risk SNPs

    PubMed Central

    Yao, Lijing; Tak, Yu Gyoung; Berman, Benjamin P.; Farnham, Peggy J.

    2014-01-01

    Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with increased risk for CRC. A molecular understanding of the functional consequences of this genetic variation has been complicated because each GWAS SNP is a surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Here we use genomic and epigenomic information to test the hypothesis that the GWAS SNPs and/or correlated SNPs are in elements that regulate gene expression, and identify 23 promoters and 28 enhancers. Using gene expression data from normal and tumour cells, we identify 66 putative target genes of the risk-associated enhancers (10 of which were also identified by promoter SNPs). Employing CRISPR nucleases, we delete one risk-associated enhancer and identify genes showing altered expression. We suggest that similar studies be performed to characterize all CRC risk-associated enhancers. PMID:25268989

  5. Reference-free detection of isolated SNPs.

    PubMed

    Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre

    2015-01-01

    Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism. PMID:25404127

  6. Novel and efficient tag SNPs selection algorithms.

    PubMed

    Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2014-01-01

    SNPs are the most abundant forms of genetic variations amongst species; the association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, representing the rest of the SNPs. In fact, the existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection was presented, which was applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm can achieve better performance than the existing tag SNP selection algorithms; in most cases, this proposed algorithm is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands times faster than the previously known methods. Tools and web services for haplotype block analysis integrated by hadoop MapReduce framework are also developed using the proposed algorithm as computation kernels. PMID:24212035

  7. Fast imputation using medium or low-coverage sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and microarray genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updat...

  8. Imputation of Cow Genotypes and Adjustment of PTAs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two new techniques were introduced in April 2010 to incorporate all available information in the evaluations. The use of imputed genotypes has added over 1600 cows to the genomic database, and adjusting cow evaluations has increased accuracy. All other countries that are producing genomic evaluation...

  9. Impact of adding foreign genomic information on Mexican Holstein imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The impact of adding US and Canada genomic information to the imputation of Mexican Holstein genotypes was measured by comparing 3 scenarios: 1) 2,018 Mexican genotyped animals; 2) animals from scenario 1 plus 886 related North American animals; and 3) animals from scenario 1 and all North American ...

  10. Family-based approaches: design, imputation, analysis, and beyond.

    PubMed

    Wijsman, Ellen M

    2016-01-01

    Participants in the family-based analysis group at Genetic Analysis Workshop 19 addressed diverse topics, all of which used the family data. Topics addressed included questions of study design and data quality control (QC), genotype imputation to augment available sequence data, and linkage and/or association analyses. Results show that pedigree-based tests that are sensitive to genotype error may be useful for QC. Imputation quality improved with inclusion of small amounts of pedigree information used to phase the data in evaluation of 5 commonly used approaches for imputation in samples of (typically) unrelated subjects. It improved still further when pedigree-based imputation using larger pedigrees was also added. An important distinction was made between methods that do versus do not make use of Mendelian transmission in pedigrees, because this serves as a key difference between underlying models and assumptions. Methods that model relatedness generally had higher power in association testing than did analyses that carry out testing in the presence of a transmission model, but this may reflect details of implementation and/or ability of more general methods to jointly include data from larger pedigrees. In either case, for single nucleotide polymorphism-set approaches, weights that incorporate information on functional effects may be more useful than those that are based only on allele frequencies. The overall results demonstrate that family data continue to provide important information in the search for trait loci. PMID:26866700

  11. Investigation of Multiple Imputation in Low-Quality Questionnaire Data

    ERIC Educational Resources Information Center

    Van Ginkel, Joost R.

    2010-01-01

    The performance of multiple imputation in questionnaire data has been studied in various simulation studies. However, in practice, questionnaire data are usually more complex than simulated data. For example, items may be counterindicative or may have unacceptably low factor loadings on every subscale, or completely missing subscales may…

  12. Guidebook for Imputation of Missing Data. Technical Report No. 17.

    ERIC Educational Resources Information Center

    Wise, Lauress L.; McLaughlin, Donald H.

    This guidebook is designed for data analysts who are working with computer data files that contain records with incomplete data. It indicates choices the analyst must make and the criteria for making those choices in regard to the following questions: (1) What resources are available for performing the imputation? (2) How big is the data file? (3)…

  13. Multiple Imputation Strategies for Multiple Group Structural Equation Models

    ERIC Educational Resources Information Center

    Enders, Craig K.; Gottschall, Amanda C.

    2011-01-01

    Although structural equation modeling software packages use maximum likelihood estimation by default, there are situations where one might prefer to use multiple imputation to handle missing data rather than maximum likelihood estimation (e.g., when incorporating auxiliary variables). The selection of variables is one of the nuances associated…

  14. Investigation of Multiple Imputation in Low-Quality Questionnaire Data

    ERIC Educational Resources Information Center

    Van Ginkel, Joost R.

    2010-01-01

    The performance of multiple imputation in questionnaire data has been studied in various simulation studies. However, in practice, questionnaire data are usually more complex than simulated data. For example, items may be counterindicative or may have unacceptably low factor loadings on every subscale, or completely missing subscales may…

  15. The Use of SNPs in Pharmacogenomics Studies

    PubMed Central

    Alwi, Zilfalil Bin

    2005-01-01

    Pharmacogenomics is the study of how genetic makeup determines the response to a therapeutic intervention. It has the potential to revolutionize the practice of medicine by individualisation of treatment through the use of novel diagnostic tools. This new science should reduce the trial-and-error approach to the choice of treatment and thereby limit the exposure of patients to drugs that are not effective or are toxic for them. Single Nucleotide Polymorphisms (SNPs) hold the key in defining the risk of an individual’s susceptibility to various illnesses and response to drugs. There is an ongoing process of identifying the common, biologically relevant SNPs, in particular those that are associated with the risk of disease. The identification and characterization of large numbers of these SNPs are necessary before we can begin to use them extensively as genetic tools. As SNP allele frequencies vary considerably across human ethnic groups and populations, the SNP consortium has opted to use an ethnically diverse panel to maximize the chances of SNP discovery. Currently most studies are biased deliberately towards coding regions and the data generated from them therefore are unlikely to reflect the overall distribution of SNPs throughout the genome. The SNP consortium protocol was designed to identify SNPs without any bias towards these coding regions. Most pharmacogenomic studies were carried out in heterogeneous clinical trial populations, using case-control or cohort association study designs employing either candidate gene or linkage disequilibrium (LD) mapping approaches. Concerns about the required patient sample sizes, the extent of LD, the number of SNPs needed in a map, the cost of genotyping SNPs, and the interpretation of results are some of the challenges that surround this field. While LD mapping is appealing in that it is an unbiased approach and allows a comprehensive genome-wide survey, the challenges and limitations are significant. An alternative such as the candidate gene approach does offer several advantages over LD mapping. Ultimately, as all human genes are discovered, the need for random SNP markers diminishes and gene-based SNP approaches will predominate. The challenges will then be to demonstrate convincing links between genetic variation and drug responses and to translate that information into useful pharmacogenomic tests. PMID:22605952

  16. Missing Data and Multiple Imputation: An Unbiased Approach

    NASA Technical Reports Server (NTRS)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
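
    The step in which results from the imputed datasets "are combined to get overall estimates" is commonly done with Rubin's rules. The abstract does not name the pooling rule, so that choice is an assumption here, and the coefficient estimates and variances below are hypothetical placeholders rather than results from the study.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its total variance, which adds the
    average within-imputation variance to an inflated between-imputation variance.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical regression coefficients and squared standard errors
# from three imputed datasets (illustrative values only).
coef = [0.9, 1.1, 1.0]
coef_var = [0.04, 0.04, 0.04]

pooled_coef, total_var = pool_rubin(coef, coef_var)
print(f"pooled estimate = {pooled_coef:.3f}, SE = {math.sqrt(total_var):.3f}")
```

    The between-imputation term is what separates multiple imputation from single imputation: it carries the uncertainty about the missing values into the final standard error.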

  17. Development and characterisation of an expressed sequence tags (EST)-derived single nucleotide polymorphisms (SNPs) resource in rainbow trout

    PubMed Central

    2012-01-01

    Background There is considerable interest in developing high-throughput genotyping with single nucleotide polymorphisms (SNPs) for the identification of genes affecting important ecological or economical traits. SNPs are evenly distributed throughout the genome and are likely to be functionally relevant. In rainbow trout, in silico screening of EST databases represents an attractive approach for de novo SNP identification. Nevertheless, EST sequencing errors and assembly of EST paralogous sequences can lead to the identification of false positive SNPs which renders the reliability of EST-derived SNPs relatively low. Further validation of EST-derived SNPs is therefore required. The objective of this work was to assess the quality of and to validate a large number of rainbow trout EST-derived SNPs. Results A panel of 1,152 EST-derived SNPs was selected from the INRA Sigenae SNP database and was genotyped in standard and double haploid individuals from several populations using the Illumina GoldenGate BeadXpress assay. High-quality genotyping data were obtained for 958 SNPs representing a genotyping success rate of 83.2 %, out of which, 350 SNPs (36.5 %) were polymorphic in at least one population and were designated as true SNPs. They also proved to be a potential tool to investigate genetic diversity of the species, as the set of SNPs successfully sorted individuals into three main groups using STRUCTURE software. Functional annotations revealed 28 non-synonymous SNPs, out of which four substitutions were predicted to affect protein functions. A subset of 223 true SNPs were polymorphic in the two INRA mapping reference families and were integrated into the INRA microsatellite-based linkage map. Conclusions Our results represent the first study of EST-derived SNPs validation in rainbow trout, a species whose genome sequence is not yet available. We designed several specific filters in order to improve the genotyping yield. Nevertheless, our selection criteria should be further improved in order to reduce the observed high rate of false positive SNPs which results from the occurrence of whole genome duplications. PMID:22694767

  18. A spatial haplotype copying model with applications to genotype imputation.

    PubMed

    Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

    2015-05-01

    Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data. PMID:25526526

  19. Performance of selected imputation techniques for missing variances in meta-analysis

    NASA Astrophysics Data System (ADS)

    Idris, N. R. N.; Abdullah, M. H.; Tolos, S. M.

    2013-04-01

    A common method of handling the problem of missing variances in meta-analysis of continuous response is through imputation. However, the performance of imputation techniques may be influenced by the type of model utilised. In this article, we examine through a simulation study the effects of the techniques of imputation of the missing SDs and type of models used on the overall meta-analysis estimates. The results suggest that imputation should be adopted to estimate the overall effect size, irrespective of the model used. However, the accuracy of the estimates of the corresponding standard error (SE) is influenced by the imputation techniques. For estimates based on the fixed effects model, mean imputation provides better estimates than multiple imputation, while those based on the random effects model respond more robustly to the type of imputation technique. The results showed that although imputation is good in reducing the bias in point estimates, it is more likely to produce coverage probability which is higher than the nominal value.

  20. Imputation and quality control steps for combining multiple genome-wide datasets

    PubMed Central

    Verma, Shefali S.; de Andrade, Mariza; Tromp, Gerard; Kuivaniemi, Helena; Pugh, Elizabeth; Namjou-Khales, Bahram; Mukherjee, Shubhabrata; Jarvik, Gail P.; Kottyan, Leah C.; Burt, Amber; Bradford, Yuki; Armstrong, Gretta D.; Derr, Kimberly; Crawford, Dana C.; Haines, Jonathan L.; Li, Rongling; Crosslin, David; Ritchie, Marylyn D.

    2014-01-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR. PMID:25566314
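
    As a rough illustration of the allelic R² metric named above, the sketch below computes the squared Pearson correlation between true allele counts (0, 1 or 2) and imputed dosages at a single SNP. This assumes a setting in which the true genotypes are known (for example, deliberately masked genotypes); the abstract describes an estimated correlation, so this is not the eMERGE pipeline's own computation, and the function name `allelic_r2` is hypothetical.

```python
def allelic_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages.

    Assumption: true genotypes are available (e.g. masked before imputation).
    Returns NaN when either vector has zero variance (monomorphic SNP).
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_d)


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0, 2]                 # hypothetical true allele counts
    dosage = [0.1, 0.9, 1.8, 1.2, 0.0, 1.9]    # hypothetical imputed dosages
    print(round(allelic_r2(truth, dosage), 3))
```

    Computing this per SNP and plotting it against minor allele frequency would give the kind of relationship the abstract mentions.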

  1. Imputation and quality control steps for combining multiple genome-wide datasets.

    PubMed

    Verma, Shefali S; de Andrade, Mariza; Tromp, Gerard; Kuivaniemi, Helena; Pugh, Elizabeth; Namjou-Khales, Bahram; Mukherjee, Shubhabrata; Jarvik, Gail P; Kottyan, Leah C; Burt, Amber; Bradford, Yuki; Armstrong, Gretta D; Derr, Kimberly; Crawford, Dana C; Haines, Jonathan L; Li, Rongling; Crosslin, David; Ritchie, Marylyn D

    2014-01-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR. PMID:25566314

  2. rSNPBase: a database for curated regulatory SNPs

    PubMed Central

    Guo, Liyuan; Du, Yang; Chang, Suhua; Zhang, Kunlin; Wang, Jing

    2014-01-01

    In recent years, human regulatory SNPs (rSNPs) have been widely studied. Here, we present the database rSNPBase, freely available at http://rsnp.psych.ac.cn/, which provides curated rSNPs and analyses the regulatory features of all SNPs in the human genome with reference to experimentally supported regulatory elements. In contrast with previous SNP functional annotation databases, rSNPBase is characterized by several unique features. (i) To improve reliability, all SNPs in rSNPBase are annotated with reference to experimentally supported regulatory elements. (ii) rSNPBase focuses on rSNPs involved in a wide range of regulation types, including proximal and distal transcriptional regulation and post-transcriptional regulation, and identifies their potentially regulated genes. (iii) Linkage disequilibrium (LD) correlations between SNPs were analysed so that the regulatory feature is annotated to a SNP set rather than a single SNP. (iv) rSNPBase provides the spatio-temporal labels and experimental eQTL labels for SNPs. In summary, rSNPBase provides more reliable, comprehensive and user-friendly regulatory annotations on rSNPs and will assist researchers in selecting candidate SNPs for further genetic studies and in exploring causal SNPs for in-depth molecular mechanisms of complex phenotypes. PMID:24285297

  3. Multiple ant colony algorithm method for selecting tag SNPs.

    PubMed

    Liao, Bo; Li, Xiong; Zhu, Wen; Li, Renfa; Wang, Shulin

    2012-10-01

    The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Finding a set of tag SNPs for haplotyping in a large number of samples is an important step in reducing the cost of association studies. Therefore, it is essential to select tag SNPs with more efficient algorithms. In this paper, we model the problem of selecting tag SNPs as MINIMUM TEST SET and use a multiple ant colony algorithm (MACA) to search for a smaller set of tag SNPs for haplotyping. Experimental results on various datasets show that the running time of our method is less than that of GTagger and MLR. MACA can also find the most representative SNPs for haplotyping, so that MACA is more stable and the number of tag SNPs is smaller than with other evolutionary methods (such as GTagger and NSGA-II). Our software is available upon request to the corresponding author. PMID:22480582
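
    To make the MINIMUM TEST SET formulation concrete, the sketch below treats each haplotype as a tuple of 0/1 alleles and asks for a set of SNP positions at which every pair of distinct haplotypes differs at least once. The search shown is a plain greedy heuristic substituted for illustration; it is not the ant colony algorithm (MACA) of the paper, and the function names and toy haplotypes are hypothetical.

```python
from itertools import combinations


def distinguishes_all(haplotypes, snp_indices):
    """True if the chosen SNPs alone tell all distinct haplotypes apart."""
    projected = {tuple(h[i] for i in snp_indices) for h in haplotypes}
    return len(projected) == len(set(haplotypes))


def greedy_tag_snps(haplotypes):
    """Greedy heuristic (NOT MACA): repeatedly pick the SNP that separates
    the most haplotype pairs that are still indistinguishable."""
    unique = sorted(set(haplotypes))
    n_snps = len(unique[0])
    unresolved = set(combinations(range(len(unique)), 2))
    chosen = []
    while unresolved:
        # max() returns the first SNP index among ties.
        best = max(
            range(n_snps),
            key=lambda j: sum(1 for a, b in unresolved
                              if unique[a][j] != unique[b][j]),
        )
        separated = {(a, b) for a, b in unresolved
                     if unique[a][best] != unique[b][best]}
        if not separated:  # defensive; distinct haplotypes always differ somewhere
            break
        chosen.append(best)
        unresolved -= separated
    return sorted(chosen)


if __name__ == "__main__":
    haps = [(0, 0, 1, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 1, 1)]
    tags = greedy_tag_snps(haps)
    print(tags, distinguishes_all(haps, tags))
```

    A greedy search gives no guarantee of a minimum-size set, which is one reason heuristic searches such as the one described in the abstract are of interest.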

  4. Comparison of classification methods for detecting associations between SNPs and chick mortality

    PubMed Central

    2009-01-01

    Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole genome SNPs associated with chick mortality. This was done by categorizing mortality rates and using a filter-wrapper feature selection procedure in each of the classification methods evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. Evaluation of SNPs selected by each classification method was done by predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories generated the SNP subset with greatest predictive ability. Further, an alternative categorization scheme, which used only two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies. PMID:19284707

  5. Comparison of classification methods for detecting associations between SNPs and chick mortality.

    PubMed

    Long, Nanye; Gianola, Daniel; Rosa, Guilherme J M; Weigel, Kent A; Avendaño, Santiago

    2009-01-01

    Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole genome SNPs associated with chick mortality. This was done by categorizing mortality rates and using a filter-wrapper feature selection procedure in each of the classification methods evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. Evaluation of SNPs selected by each classification method was done by predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories generated the SNP subset with greatest predictive ability. Further, an alternative categorization scheme, which used only two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies. PMID:19284707

  6. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  7. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model

    PubMed Central

    Seaman, Shaun R; White, Ian R; Carpenter, James R

    2015-01-01

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available. PMID:24525487

  8. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  9. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  10. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  11. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... of money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  12. A Simplified Framework for Using Multiple Imputation in Social Work Research

    ERIC Educational Resources Information Center

    Rose, Roderick A.; Fraser, Mark W.

    2008-01-01

    Missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings. Increasingly, social science researchers are turning to multiple imputation to handle missing data. Multiple imputation, in which missing values are replaced by values repeatedly drawn from…

  13. A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

    ERIC Educational Resources Information Center

    Wolkowitz, Amanda A.; Skorupski, William P.

    2013-01-01

    When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…

  14. Correcting for Selective Nonresponse in the National Longitudinal Survey of Youth Using Multiple Imputation.

    ERIC Educational Resources Information Center

    Davey, Adam; Shanahan, Michael J.; Schafer, Joseph L.

    2001-01-01

    Principal components analysis revealed four patterns of nonresponse on children's psychosocial adjustment, lifetime poverty experiences, and family history. Results from examining latent growth curve models using listwise deletion and multiple imputation indicated that multiple imputation corrected for selective nonresponse, providing less-biased…

  15. Estimation of missing rainfall data using spatial interpolation and imputation methods

    NASA Astrophysics Data System (ADS)

    Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Azman, Muhammad Az-zuhri

    2015-02-01

    This study aims to estimate missing rainfall data, with the analysis divided into three percentages of missingness, namely 5%, 10% and 20%, in order to represent various cases of missing data. In practice, spatial interpolation methods are the first choice for estimating missing data. These methods include the normal ratio (NR), arithmetic average (AA), coefficient of correlation (CC) and inverse distance (ID) weighting methods. The methods consider the distance between the target and the neighbouring stations as well as the correlations between them. An alternative approach to missing data is imputation, the process of replacing missing data with substituted values. A once-common method is single imputation, which allows parameter estimation. However, single imputation ignores the estimation of variability, which leads to the underestimation of standard errors and confidence intervals. To overcome this underestimation problem, multiple imputation is used, where each missing value is estimated with a distribution of imputations that reflects the uncertainty about the missing data. In this study, a comparison of spatial interpolation methods and the multiple imputation method for estimating missing rainfall data is presented. The performance of the estimation methods is assessed using the similarity index (S-index), mean absolute error (MAE) and coefficient of correlation (R).
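
    Three of the spatial interpolation estimators named above can be sketched in a few lines. The formulas used here are the common textbook forms (arithmetic average of neighbouring stations, inverse-distance weights with an assumed exponent of 2, and the normal ratio scaling each neighbour by the ratio of long-term normals); the abstract does not give the exact variants used in the study, so the exponent, function names and station values are assumptions.

```python
def arithmetic_average(neighbour_rain):
    """AA: plain mean of rainfall observed at neighbouring stations."""
    return sum(neighbour_rain) / len(neighbour_rain)


def inverse_distance(neighbour_rain, distances, power=2.0):
    """ID: weight each neighbour by 1 / distance**power (power=2 is assumed)."""
    weights = [1.0 / d ** power for d in distances]
    weighted_sum = sum(w * r for w, r in zip(weights, neighbour_rain))
    return weighted_sum / sum(weights)


def normal_ratio(neighbour_rain, neighbour_normals, target_normal):
    """NR: scale each neighbour by target_normal / neighbour_normal, then average."""
    scaled = [target_normal / normal * rain
              for normal, rain in zip(neighbour_normals, neighbour_rain)]
    return sum(scaled) / len(scaled)


if __name__ == "__main__":
    rain = [10.0, 20.0]        # hypothetical rainfall (mm) at two neighbours
    dist = [1.0, 2.0]          # hypothetical distances (km) to the target
    normals = [100.0, 200.0]   # hypothetical long-term normals (mm)
    print(arithmetic_average(rain))
    print(inverse_distance(rain, dist))
    print(normal_ratio(rain, normals, target_normal=100.0))
```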

  16. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the...

  17. Methods of Imputation used in the USDA National Nutrient Database for Standard Reference

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Objective: To present the predominate methods of imputing used to estimate nutrient values for foods in the USDA National Nutrient Database for Standard Reference (SR20). Materials and Methods: The USDA Nutrient Data Laboratory developed standard methods for imputing nutrient values for foods wh...

  18. Imputation of Missing Genotypes From Sparse to High Density Using Long-Range Phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals in a population share long chromosome segments which trace to a common ancestor. We describe a long-range phasing algorithm that makes use of this property to phase whole chromosomes and simultaneously impute a large number of missing markers. We test our method by imputing marke...

  19. When data goes missing: methods for missing score imputation in biometric fusion

    NASA Astrophysics Data System (ADS)

    Ding, Yaohui; Ross, Arun

    2010-04-01

    While fusion can be accomplished at multiple levels in a multibiometric system, score level fusion is commonly used as it offers a good trade-off between fusion complexity and data availability. However, missing scores affect the implementation of several biometric fusion rules. While there are several techniques for handling missing data, the imputation scheme - which replaces missing values with predicted values - is preferred since this scheme can be followed by a standard fusion scheme designed for complete data. This paper compares the performance of three imputation methods: Imputation via Maximum Likelihood Estimation (MLE), Multiple Imputation (MI) and Random Draw Imputation through Gaussian Mixture Model estimation (RD GMM). A novel method called Hot-deck GMM is also introduced and exhibits markedly better performance than the other methods because of its ability to preserve the local structure of the score distribution. Experiments on the MSU dataset indicate the robustness of the schemes in handling missing scores at various missing data rates.

  20. Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data.

    PubMed

    Deng, Yi; Chang, Changgee; Ido, Moges Seyoum; Long, Qi

    2016-01-01

    Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples. PMID:26868061

  1. Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

    PubMed Central

    Deng, Yi; Chang, Changgee; Ido, Moges Seyoum; Long, Qi

    2016-01-01

    Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples. PMID:26868061

  2. Imputation of missing values in the case of a multiple item instrument measuring alcohol consumption.

    PubMed

    Gmel, G

    2001-08-15

    Missing values in survey instruments are a common problem for survey researchers. It is aggravated in the case of instruments used to measure alcohol consumption: they usually consist of item batteries from which summary measures, such as grams of pure alcohol per day, are constructed, and a missing value (for example, quantity or frequency) in regard to a single item for only one of several beverages results in a missing summary measure across all of the beverages, though the values for the remaining items are known. The present paper examines different approaches to imputation of missing values, feasible with standard statistical software packages. Hot-deck imputation is shown to have certain advantages, but even single-value imputation (for example, median imputation) results in values that are comparable to those of the other three imputation methods. PMID:11468769
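
    The contrast between single-value imputation and hot-deck imputation mentioned above can be illustrated as follows. This is a minimal sketch, assuming missing items are coded as `None` and using a simple random hot deck (each missing value copies a randomly chosen observed donor); the paper's actual hot-deck procedure is not specified in the abstract, and the function names and data are hypothetical.

```python
import random
from statistics import median


def median_impute(values):
    """Single-value imputation: every missing item gets the observed median."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def random_hot_deck_impute(values, rng):
    """Simple random hot deck (an assumed variant): each missing item is
    replaced by the value of a randomly drawn observed donor."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


if __name__ == "__main__":
    # Hypothetical grams of pure alcohol per day from one beverage item.
    grams = [1.0, None, 3.0, 10.0, None]
    print(median_impute(grams))
    print(random_hot_deck_impute(grams, random.Random(0)))
```

    Median imputation fills every gap with the same value and so shrinks the spread of the item, whereas a hot deck reuses observed values and tends to preserve the shape of the distribution.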

  3. Imputation of KIR Types from SNP Variation Data

    PubMed Central

    Vukcevic, Damjan; Traherne, James A.; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H.; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-01-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR*IMP, a method for imputation of KIR copy number. We show that KIR*IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  4. Doubly robust multiple imputation using kernel-based techniques.

    PubMed

    Hsu, Chiu-Hsieh; He, Yulei; Li, Yisheng; Long, Qi; Friese, Randall

    2016-05-01

    We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real-data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data. PMID:26647734
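
    The donor-sampling step described above can be sketched as follows: each case carries two predictive scores (one from the outcome model, one from the missingness model), observed cases are weighted by a kernel of their distance to the incomplete case in score space, and an imputed value is drawn from the observed outcomes with probability proportional to those weights. The Gaussian kernel, the fixed bandwidth, and all names and numbers below are assumptions for illustration; the abstract does not specify them, and fitting the two working models is omitted.

```python
import math
import random


def kernel_weights(target_scores, donor_scores, bandwidth=1.0):
    """Gaussian kernel weights (assumed kernel) on the distance between the
    incomplete case's score pair and each observed case's score pair."""
    weights = []
    for scores in donor_scores:
        dist_sq = sum((a - b) ** 2 for a, b in zip(target_scores, scores))
        weights.append(math.exp(-dist_sq / (2.0 * bandwidth ** 2)))
    return weights


def impute_once(target_scores, donor_scores, donor_outcomes, rng, bandwidth=1.0):
    """Draw one imputed value from observed outcomes, with probability
    proportional to the kernel weights."""
    weights = kernel_weights(target_scores, donor_scores, bandwidth)
    return rng.choices(donor_outcomes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical (outcome score, missingness score) pairs for observed cases.
    donors = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.25)]
    outcomes = [12.0, 30.0, 14.0]
    target = (0.12, 0.22)
    # Repeating the draw yields the multiple imputations for this case.
    print([impute_once(target, donors, outcomes, rng, bandwidth=0.2)
           for _ in range(5)])
```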

  5. A two-step semiparametric method to accommodate sampling weights in multiple imputation.

    PubMed

    Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E

    2016-03-01

    Multiple imputation (MI) is a well-established method to handle item-nonresponse in sample surveys. Survey data obtained from complex sampling designs often involve features that include unequal probability of selection. MI requires imputation to be congenial, that is, for the imputations to come from a Bayesian predictive distribution and for the observed and complete data estimator to equal the posterior mean given the observed or complete data, and similarly for the observed and complete variance estimator to equal the posterior variance given the observed or complete data; more colloquially, the analyst and imputer make similar modeling assumptions. Yet multiply imputed data sets from complex sample designs with unequal sampling weights are typically imputed under simple random sampling assumptions and then analyzed using methods that account for the sampling weights. This is a setting in which the analyst assumes more than the imputer, which can lead to biased estimates and anti-conservative inference. Less commonly used alternatives such as including case weights as predictors in the imputation model typically require interaction terms for more complex estimators such as regression coefficients, and can be vulnerable to model misspecification and difficult to implement. We develop a simple two-step MI framework that accounts for sampling weights using a weighted finite population Bayesian bootstrap method to validly impute the whole population (including item nonresponse) from the observed data. In the second step, having generated posterior predictive distributions of the entire population, we use standard IID imputation to handle the item nonresponse. Simulation results show that the proposed method has good frequentist properties and is robust to model misspecification compared to alternative approaches. We apply the proposed method to accommodate missing data in the Behavioral Risk Factor Surveillance System when estimating means and parameters of regression models. PMID:26393409

  6. Missing data imputation in two phase III trials treating HIV1 infection.

    PubMed

    Huson, L W; Chung, J; Salgo, M

    2007-01-01

    In most longitudinal clinical trials, some patients drop out before the end of the planned follow-up, and, in order to allow an all-patient intent-to-treat analysis to be performed, it is common practice to use some method of imputation to estimate values for missing data. However, different imputation methods may provide different results, and it is essential to investigate the sensitivity of the analysis using different imputation rules. In our analysis of two trials of the new HIV1 fusion inhibitor enfuvirtide, we compared some standard methods of imputing and analyzing HIV1-RNA data with two novel alternatives, to check the robustness of the primary endpoint results. The standard methods were: (1) last-observation-carried-forward, (2) baseline carried forward, and (3) multiple imputation. These were compared with a nearest-neighbour hot-deck method, specifically proposed for imputation of missing HIV1-RNA data, and with a heuristic approach: censored regression analysis of the last-observation-carried-forward. To supplement this analysis of real clinical trial data, we investigated the performance of the same imputation methods on simulated datasets designed to cover a broader range of missing data patterns. PMID:17219761
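
    Two of the standard rules listed above, last-observation-carried-forward and baseline carried forward, are simple enough to sketch directly. The example assumes one patient's visits are stored in time order with `None` marking visits missed after dropout; the function names and the HIV1-RNA values are hypothetical.

```python
def last_observation_carried_forward(series):
    """LOCF: each missing visit takes the most recent observed value.
    Missing values before the first observation stay None."""
    filled, last_seen = [], None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def baseline_carried_forward(series):
    """Baseline carried forward: each missing visit takes the first
    (baseline) value; assumes the baseline itself was observed."""
    baseline = series[0]
    if baseline is None:
        raise ValueError("baseline value must be observed")
    return [baseline if value is None else value for value in series]


if __name__ == "__main__":
    # Hypothetical log10 HIV1-RNA values; the patient drops out after visit 2.
    visits = [5.0, 4.2, None, None]
    print(last_observation_carried_forward(visits))
    print(baseline_carried_forward(visits))
```

    For a patient who improves and then drops out, LOCF retains the improvement while baseline carried forward discards it, which is why comparing several rules serves as a sensitivity check.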

  7. FAPI: Fast and accurate P-value Imputation for genome-wide association study.

    PubMed

    Kwan, Johnny Sh; Li, Miao-Xin; Deng, Jia-En; Sham, Pak C

    2016-05-01

    Imputing individual-level genotypes (or genotype imputation) is now a standard procedure in genome-wide association studies (GWAS) to examine disease associations at untyped common genetic variants. Meta-analysis of publicly available GWAS summary statistics can allow more disease-associated loci to be discovered, but these data are usually provided for various variant sets. Thus imputing these summary statistics of different variant sets into a common reference panel for meta-analyses is impossible using traditional genotype imputation methods. Here we develop a fast and accurate P-value imputation (FAPI) method that utilizes summary statistics of common variants only. Its computational cost is linear in the number of untyped variants, and its accuracy is similar to that of IMPUTE2 with prephasing, one of the leading methods in genotype imputation. In addition, based on the FAPI idea, we develop a metric to detect abnormal association at a variant and show that it has significantly greater power than LD-PAC, a method that quantifies the evidence of spurious associations based on likelihood ratio. Our method is implemented in a user-friendly software tool, which is available at http://statgenpro.psychiatry.hku.hk/fapi. PMID:26306642

  8. Missing Value Imputation Method by Using Bayesian Network with Weighted Learning

    NASA Astrophysics Data System (ADS)

    Miyakoshi, Yoshihiro; Kato, Shohei

    Advances in computer networks have made huge databases easy to accumulate, and it has accordingly become difficult for users to extract knowledge from them. In this paper, we focus on data mining, especially classification. Real-world data mining runs into the missing value problem, caused for example by noisy speech or facial occlusions. When a test sample has missing values, a classification system cannot classify it. Various imputation methods have been developed in previous studies. These methods address the missing value problem using many explanatory variables, even when some of those variables are ineffective for imputation. Because using many variables is said to degrade learning efficiency, we believe that imputation methods should take the relations among explanatory variables into account. It is also effective to consider the relations between the test sample and each training sample, not only those among explanatory variables. We therefore propose an imputation method that uses a Bayesian network with weighted learning. Experiments confirmed that the proposed method imputed missing values with close approximations, and that a classification system successfully classified test samples whose missing values had been imputed by the proposed method, in comparison with some conventional methods.

  9. Missing value imputation improves clustering and interpretation of gene expression microarray data

    PubMed Central

    Tuikkala, Johannes; Elo, Laura L; Nevalainen, Olli S; Aittokallio, Tero

    2008-01-01

    Background Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used. Results We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. Conclusion The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can – up to a certain degree – be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA). PMID:18423022

  10. Traffic speed data imputation method based on tensor completion.

    PubMed

    Ran, Bin; Tan, Huachun; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches. PMID:25866501

  11. Traffic Speed Data Imputation Method Based on Tensor Completion

    PubMed Central

    Ran, Bin; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches. PMID:25866501

  12. Differential Network Analysis with Multiply Imputed Lipidomic Data

    PubMed Central

    Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

    2015-01-01

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, though not huge, number of mutually correlated measured variables, and their associations to outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: the patients who had a fatal CVD event and the ones who remained stable during two-year follow-up. PMID:25822937

  13. Data imputation through the identification of local anomalies.

    PubMed

    Ozkan, Huseyin; Pelvan, Ozgun Soner; Kozat, Suleyman S

    2015-10-01

    We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. PMID:25608311

  14. Localization of Allotetraploid Gossypium SNPs Using Physical Mapping Resources

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent efforts in Gossypium SNP development have produced thousands of putative SNPs for G. barbadense, G. mustelinum, and G. tomentosum relative to G. hirsutum. Here we report on current efforts to localize putative SNPs using physical mapping resources. Recent advances in physical mapping resour...

  15. High-accuracy haplotype imputation using unphased genotype data as the references.

    PubMed

    Li, Wenzhi; Xu, Wei; Fu, Guoxing; Ma, Li; Richards, Jendai; Rao, Weinian; Bythwood, Tameka; Guo, Shiwen; Song, Qing

    2015-11-10

    Enormously growing genomic datasets present a new challenge for missing data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. However, to date, haplotype references are not available for the majority of populations in the world. We explored using existing unphased genotype datasets as references; if this succeeds, it will cover almost all of the populations in the world. The results showed that our HiFi software successfully yields 99.43% accuracy with unphased genotype references. Our method provides a cost-effective solution to break through the bottleneck of limited reference availability for haplotype imputation in the big data era. PMID:26232609

  16. Multiple imputation as a means to assess Mammographic vs. Ultrasound technology in Determine Breast Cancer Recurrence

    NASA Astrophysics Data System (ADS)

    Helenowski, Irene B.; Demirtas, Hakan; Khan, Seema; Eladoumikdachi, Firas; Shidfar, Ali

    2014-03-01

    Tumor size based on mammographic and ultrasound data are two methods used in predicting recurrence in breast cancer patients. Which technology offers better determination of diagnosis is, however, an ongoing debate among radiologists, biophysicists, and other clinicians. Further complications in assessing the performance of each technology arise from missing data. One approach to remedy this problem may involve multiple imputation. Here, we therefore examine how imputation affects our assessment of the relationship between recurrence and tumor size determined either by mammography or ultrasound technology. We specifically employ the semi-parametric approach for imputing mixed continuous and binary data as presented in Helenowski and Demirtas (2013).

  17. Comparison of methods for imputing limited-range variables: a simulation study

    PubMed Central

    2014-01-01

    Background Multiple imputation (MI) was developed as a method to enable valid inferences to be obtained in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible for individual observations. One variable type for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, with a focus on those that restrict the range of the imputed values. Methods Using data from a study of adolescent health, we consider three variables based on responses to the General Health Questionnaire (GHQ), a tool for detecting minor psychiatric illness. These variables, based on different scoring methods for the GHQ, resulted in three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing completely at random or missing at random; repeating this process to create 1000 datasets with incomplete data for each scenario. For each dataset, we imputed values on the raw scale and following a zero-skewness log transformation using: univariate regression with no rounding; post-imputation rounding; truncated normal regression; and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, comparing the results with complete data statistics. Results Imputation with no rounding performed well when applied to data on the raw scale. Post-imputation rounding and imputation using truncated normal regression produced higher marginal means than the complete data estimate when data had a moderate or severe skew, and this was associated with under-coverage of the complete data estimate. Predictive mean matching also produced under-coverage of the complete data estimate. For the estimate of association, all methods produced similar estimates to the complete data. Conclusions For data with a limited range, multiple imputation using techniques that restrict the range of imputed values can result in biased estimates for the marginal mean when data are highly skewed. PMID:24766825
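
    Two of the range-restricting strategies compared in this record, post-imputation rounding and predictive mean matching, can be illustrated with a small sketch. It is a simplified single-imputation version under stated assumptions, not the study's procedure: one predictor, an ordinary least-squares line, no posterior draw of the regression parameters (which a full multiple-imputation implementation would include), a hypothetical score range of 0 to 12, and made-up data. The names `fit_line`, `clamp_round` and `pmm_impute` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def clamp_round(value: float, low: int, high: int) -> int:
    """Post-imputation rounding: round to an integer and force it into [low, high]."""
    return min(max(round(value), low), high)


def pmm_impute(x: Sequence[float], y: Sequence[Optional[float]],
               k: int = 3, rng: Optional[random.Random] = None) -> List[float]:
    """Predictive mean matching: borrow an observed value from one of the k donors
    whose predicted value is closest to the prediction for the missing case."""
    rng = rng or random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([o[0] for o in observed], [o[1] for o in observed])
    donors = [(intercept + slope * xi, yi) for xi, yi in observed]
    filled = []
    for xi, yi in zip(x, y):
        if yi is not None:
            filled.append(yi)
            continue
        prediction = intercept + slope * xi
        nearest = sorted(donors, key=lambda d: abs(d[0] - prediction))[:k]
        filled.append(rng.choice(nearest)[1])  # always an actually observed value
    return filled


# Hypothetical skewed scores on a 0-12 scale with two missing entries.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 1.0, None, 3.0, None, 12.0]
filled = pmm_impute(x, y)
print(filled)
print(clamp_round(13.7, 0, 12), clamp_round(-0.4, 0, 12))
```

    Because predictive mean matching only ever returns observed values, its imputations cannot leave the observable range, whereas rounding or clamping a regression prediction moves out-of-range draws onto the boundary.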

  18. Mini-haplotypes as lineage informative SNPs and ancestry inference SNPs

    PubMed Central

    Pakstis, Andrew J; Fang, Rixun; Furtado, Manohar R; Kidd, Judith R; Kidd, Kenneth K

    2012-01-01

    We propose that haplotyped loci with high heterozygosity can be useful in human identification, especially within families, if recombination is very low among the sites. Three or more SNPs extending over small molecular intervals (<10 KB) can be identified in the human genome to define miniature haplotypes with moderate levels of linkage disequilibrium. Properly selected, these mini-haplotypes (or minihaps) consist of multiple haplotype lineages (alleles) that have evolved from the ancestral human haplotype but show no evidence of recurring recombination, allowing each distinct haplotype to be equated with an allele, all copies of which are essentially identical by descent. Historic recombinants, representing rare events that have drifted to common frequencies over many generations, can be identified in some cases; they do not equate to frequently recurring recombination. We have identified examples in our data collected on various projects and present eight such mini-haplotypes comprised of informative SNPs. We also discuss the ideal characteristics and advantages of minihaps for human familial identification and ancestry inference, and compare them to other types of forensic markers in use and/or that have been proposed. We expect that it is possible to carry out a systematic search and identify a useful panel of mini-haplotypes, with even better properties than the examples presented here. PMID:22535184

  19. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  20. Large-scale epigenome imputation improves data quality and disease variant enrichment

    PubMed Central

    Ernst, Jason; Kellis, Manolis

    2015-01-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals, and surpass experimental datasets in consistency, recovery of gene annotations, and enrichment for disease-associated variants. We use the imputed data to detect low quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments, and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

  1. SNPs selection using support vector regression and genetic algorithms in GWAS

    PubMed Central

    2014-01-01

    Introduction This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in simulated database 1, with additive effects only, and in a real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of this total, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, which initially had 50,752 SNPs, we reduced the set to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions The method suggested in this paper demonstrates effectiveness in explaining the real phenotype (PTA for milk): with the application of the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel, many redundant markers were eliminated, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332

  2. Potentially Functional SNPs (pfSNPs) as Novel Genomic Predictors of 5-FU Response in Metastatic Colorectal Cancer Patients

    PubMed Central

    Zhao, Mingjue; Choo, Su Pin; Ong, Sin Jen; Ong, Simon Y. K.; Chong, Samuel S.; Teo, Yik Ying; Lee, Caroline G. L.

    2014-01-01

    5-Fluorouracil (5-FU) and its pro-drug Capecitabine have been widely used in treating colorectal cancer. However, not all patients will respond to the drug, hence there is a need to develop reliable early predictive biomarkers for 5-FU response. Here, we report a novel potentially functional Single Nucleotide Polymorphism (pfSNP) approach to identify SNPs that may serve as predictive biomarkers of response to 5-FU in Chinese metastatic colorectal cancer (CRC) patients. 1547 pfSNPs and one variable number tandem repeat (VNTR) in 139 genes in 5-FU drug (both PK and PD pathway) and colorectal cancer disease pathways were examined in 2 groups of CRC patients. Shrinkage of liver metastasis measured by RECIST criteria was used as the clinical end point. Four non-responder-specific pfSNPs were found to account for 37.5% of all non-responders (P<0.0003). Five additional pfSNPs were identified from a multivariate model (AUC under ROC = 0.875) that was applied for all other pfSNPs, excluding the non-responder-specific pfSNPs. These pfSNPs, which can differentiate the other non-responders from responders, mainly reside in tumor suppressor genes or genes implicated in colorectal cancer risk. Hence, a total of 9 novel SNPs with potential functional significance may be able to distinguish non-responders from responders to 5-FU. These pfSNPs may be useful biomarkers for predicting response to 5-FU. PMID:25372392

  3. Shrinkage regression-based methods for microarray missing value imputation

    PubMed Central

    2013-01-01

    Background Missing values commonly occur in the microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation results in reducing the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. Results To further improve the performances of the regression-based methods, we propose shrinkage regression-based methods. Our methods take the advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. Besides, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Conclusions Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods. PMID:24565159

  4. Imputation of missing data using machine learning techniques

    SciTech Connect

    Lakshminarayan, Kamakshi; Harp, S.A.; Goldman, R.; Samad, T.

    1996-12-31

    A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine learning techniques. The first is an unsupervised clustering strategy which uses a Bayesian approach to cluster the data into classes. The classes so obtained are then used to predict multiple choices for the attribute of interest. The second technique involves modeling missing variables by supervised induction of a decision tree-based classifier. This predicts the most likely value for the attribute of interest. Empirical tests using extracts from industrial databases maintained by Honeywell customers have been done in order to compare the two techniques. These tests show both approaches are useful and have advantages and disadvantages. We argue that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and discuss potential applications for the procedures we are developing.

  5. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets

    SciTech Connect

    Torres-García, Wandaliz; Brown, Steven D; Johnson, Roger; Zhang, Weiwen; Runger, George; Meldrum, Deirdre

    2011-01-01

    Despite significant improvements in recent years, currently available proteomic datasets still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values for proteins experimentally undetected, using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5-, 30-, 60-, and 90-min, while protein abundance was measured only for 45- and 90-min samples. With the goal of elucidating the relationship between temporal gene expression and protein abundance data, and then using it to impute missing proteomic values for samples of 45-min (which does not have cognate transcriptomic data) and 90-min, we initially used nonlinear Smoothing Splines Curve Fitting (SSCF) to identify temporal relationships among transcriptomic data at different time points and then imputed missing gene expression measurements for the sample at 45-min. After the imputation was validated by biological constraints (i.e. operons), we used a data-driven Gradient Boosted Trees (GBT) model to uncover possible non-linear relationships between temporal transcriptomic and proteomic data, and to impute protein abundance for the proteins experimentally undetected in the 45- and 90-min samples, based on relevant predictors such as temporal mRNA gene expression data, cellular roles, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. The imputed protein values were validated using biological constraints such as operon, regulon and pathway information. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.

  6. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    NASA Astrophysics Data System (ADS)

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria including accuracy, robustness, precision, and efficiency for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas multilayer perceptron type neural network and multiple imputation strategy adopted by Monte Carlo Markov Chain based on expectation-maximization (EM-MCMC) are computationally intensive ones. In addition, we propose a modification on the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis which takes spatio-temporal dependencies into account for evaluating imputation performances. Depending on the detailed graphical and quantitative analysis, it can be said that although computational methods, particularly EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation particularly with computational methods since it gives more precise results in meteorological time series.
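
    The simple arithmetic average and normal ratio (NR) estimators mentioned in this record can be written down directly. The sketch below assumes the usual textbook form of the normal ratio, in which each neighbouring station's value is rescaled by the ratio of the target station's long-term mean ("normal") to the neighbour's normal and the rescaled values are then averaged. The optional `weights` argument stands in for a correlation-based weighting; the exact weights used in the study are not given in the abstract, so the ones here are hypothetical, as are the function name and the station figures.

```python
from typing import Optional, Sequence


def normal_ratio(target_normal: float,
                 neighbour_normals: Sequence[float],
                 neighbour_values: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Estimate a missing value at the target station from neighbouring stations.

    Each neighbour value is rescaled by target_normal / neighbour_normal and the
    rescaled values are combined as a (weighted) average.
    """
    if weights is None:
        weights = [1.0] * len(neighbour_values)  # plain normal ratio
    rescaled = [
        w * (target_normal / normal) * value
        for w, normal, value in zip(weights, neighbour_normals, neighbour_values)
    ]
    return sum(rescaled) / sum(weights)


# Hypothetical monthly precipitation (mm): long-term means and this month's readings.
target_mean = 50.0
neighbour_means = [40.0, 100.0]
neighbour_readings = [20.0, 80.0]

simple_average = sum(neighbour_readings) / len(neighbour_readings)
nr_estimate = normal_ratio(target_mean, neighbour_means, neighbour_readings)
# Hypothetical weights that favour the first, more strongly correlated neighbour.
weighted_estimate = normal_ratio(target_mean, neighbour_means, neighbour_readings,
                                 weights=[3.0, 1.0])
print(simple_average, nr_estimate, weighted_estimate)
```

    With these made-up numbers the plain average ignores that the second neighbour is a much wetter station, while the normal ratio scales its reading down before averaging.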

  7. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy

    PubMed Central

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption of missing data being at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random data (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations to the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy. PMID:26283989

  8. Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data

    PubMed Central

    Edwards, Jessie K.; Cole, Stephen R.; Troester, Melissa A.; Richardson, David B.

    2013-01-01

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods. PMID:24627573

  9. PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

    PubMed Central

    Livne, Oren E.; Han, Lide; Alkorta-Aranburu, Gorka; Wentworth-Sheilds, William; Abney, Mark; Ober, Carole; Nicolae, Dan L.

    2015-01-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. PMID:25735005

  10. Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches

    PubMed Central

    Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer

    2014-01-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  11. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    PubMed Central

    2009-01-01

    Background Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies. PMID:19638200
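
    The combining step described in this abstract can be sketched in Python as follows. This is a minimal illustration of Rubin's rules for a single scalar estimate, not the guidelines proposed in the paper. The function name pool_rubin and the input numbers are hypothetical, and any transformation towards normality (for example a logit for a probability) is assumed to have been applied before pooling.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the matching within-imputation variances (squared SEs)
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log hazard ratios from m = 3 imputed datasets.
pooled, total_var = pool_rubin([0.10, 0.20, 0.30], [0.04, 0.05, 0.06])
```

    The total variance adds the between-imputation component, inflated by (1 + 1/m), to the average within-imputation variance, which is how the missing data uncertainty enters the combined standard error.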

  12. Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis.

    PubMed

    Siddique, Juned; Reiter, Jerome P; Brincks, Ahnalee; Gibbons, Robert D; Crespi, Catherine M; Brown, C Hendricks

    2015-11-20

    There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials and use multiple imputation to fill in missing measurements. We apply our method to five longitudinal adolescent depression trials where four studies used one depression measure and the fifth study used a different depression measure. None of the five studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigate whether external information is appropriately incorporated into the imputed values. PMID:26095855

  13. Multiple imputation and analysis for high-dimensional incomplete proteomics data.

    PubMed

    Yin, Xiaoyan; Levy, Daniel; Willinger, Christine; Adourian, Aram; Larson, Martin G

    2016-04-15

    Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:26565662
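
    The random binning step can be sketched in Python as below. This only illustrates shuffling variables into bins before per-bin imputation; the imputation and conditional logistic regression steps are not shown. The function name shuffle_into_bins, the protein labels and the bin size of 50 are hypothetical, since the abstract does not state the bin size that was selected.

```python
import random


def shuffle_into_bins(names, bin_size, seed=None):
    """Randomly shuffle variable names and split them into consecutive bins."""
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    rng = random.Random(seed)  # local generator so the shuffle is reproducible
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


# Hypothetical labels for the 544 proteins with incomplete expression data.
proteins = [f"protein_{i}" for i in range(544)]
bins = shuffle_into_bins(proteins, bin_size=50, seed=1)
```

    Repeating the call with different seeds gives the repeated random shuffles mentioned in the abstract; each bin keeps the number of variables small relative to the number of case-control pairs.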

  14. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    PubMed

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn Ga; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John Rb

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7 879 351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P<5 × 10⁻⁸), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P=7.68 × 10⁻¹²), oestradiol (P=1.63 × 10⁻⁸) and FAI (P=1.50 × 10⁻⁸). A genetic variant near the FSHB gene was identified which influenced both FSH (P=1.74 × 10⁻⁸) and LH (P=3.94 × 10⁻⁹) levels. A separate locus on chromosome 7 was associated with both DHEAS (P=1.82 × 10⁻¹⁴) and progesterone (P=6.09 × 10⁻¹⁴). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation. PMID:26014426

  15. Identification of SNPs associated with variola virus virulence

    PubMed Central

    2013-01-01

    Background Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. Conclusions We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity. PMID:23410064

  16. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers.

    PubMed

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, and current in each phase and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be accomplished in order to substitute the data that is missing for estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  17. Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response.

    PubMed

    Yucel, Recai M

    2008-07-13

    Methods specifically targeting missing values in a wide spectrum of statistical analyses are now part of serious statistical thinking due to many advances in computational statistics and increased awareness among sophisticated consumers of statistics. Despite many advances in both theory and applied methods for missing data, missing-data methods in multilevel applications lack equal development. In this paper, I consider a popular inferential tool via multiple imputation in multilevel applications with missing values. I specifically consider missing values occurring arbitrarily at any level of observational units. I use Bayesian arguments for drawing multiple imputations from the underlying (posterior) predictive distribution of missing data. Multivariate extensions of well-known mixed-effects models form the basis for simulating the posterior predictive distribution, hence creating the multiple imputations. The discussion of these topics is demonstrated in an application assessing correlates to unmet need for mental health care among children with special health care needs. PMID:18407897

  18. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    PubMed Central

    Crespo Turrado, Concepción; Sánchez Lasheras, Fernando; Calvo-Rollé, José Luis; Piñón-Pazos, Andrés José; de Cos Juez, Francisco Javier

    2015-01-01

    Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, and current in each phase and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be accomplished in order to substitute the data that is missing for estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm. PMID:26690437

  19. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation

    PubMed Central

    Graffelman, Jan; Nelson, S.; Gogarten, S. M.; Weir, B. S.

    2015-01-01

    This paper addresses the issue of exact-test based statistical inference for Hardy−Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy−Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy−Weinberg disequilibrium. Depending on the imputation method used, 6−13% of the test results changed qualitatively at the 5% level. PMID:26377959
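
    The equivalence between the inbreeding coefficient and the chi-square statistic mentioned in this abstract can be illustrated with a short Python sketch. It assumes a biallelic marker and uses only the observed genotype counts, which is the complete-case practice the abstract cautions about; it is not the exact test or the imputation procedure of the paper. The function name and the counts are hypothetical.

```python
def hwe_inbreeding_and_chi2(n_aa, n_ab, n_bb):
    """Return (inbreeding coefficient f, chi-square statistic) for a biallelic marker.

    Uses f = 1 - observed heterozygotes / expected heterozygotes and the
    identity chi2 = n * f**2 (Pearson statistic without continuity correction).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no observed genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic")
    f = 1 - n_ab / (2 * n * p * q)
    chi2 = n * f ** 2
    return f, chi2


# Hypothetical counts: 30 AA, 40 AB, 30 BB (a deficit of heterozygotes).
f_hat, chi2 = hwe_inbreeding_and_chi2(30, 40, 30)
```

    With these counts the expected number of heterozygotes is 50, so f is 0.2 and the chi-square statistic is 4, the same value obtained by summing (observed − expected)² / expected over the three genotype classes.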

  20. Next generation tools for the annotation of human SNPs

    PubMed Central

    2009-01-01

    Computational biology has the opportunity to play an important role in the identification of functional single nucleotide polymorphisms (SNPs) discovered in large-scale genotyping studies, ultimately yielding new drug targets and biomarkers. The medical genetics and molecular biology communities are increasingly turning to computational biology methods to prioritize interesting SNPs found in linkage and association studies. Many such methods are now available through web interfaces, but the interested user is confronted with an array of predictive results that are often in disagreement with each other. Many tools today produce results that are difficult to understand without bioinformatics expertise, are biased towards non-synonymous SNPs, and do not necessarily reflect up-to-date versions of their source bioinformatics resources, such as public SNP repositories. Here, I assess the utility of the current generation of webservers; and suggest improvements for the next generation of webservers to better deliver value to medical geneticists and molecular biologists. PMID:19181721

  1. Analysis of mitochondrial transcription factor A SNPs in alcoholic cirrhosis

    PubMed Central

    TANG, CHUN; LIU, HONGMING; TANG, YONGLIANG; GUO, YONG; LIANG, XIANCHUN; GUO, LIPING; PI, RUXIAN; YANG, JUNTAO

    2014-01-01

    Genetic susceptibility to alcoholic cirrhosis (AC) exists. We previously demonstrated hepatic mitochondrial DNA (mtDNA) damage in patients with AC compared with chronic alcoholics without cirrhosis. Mitochondrial transcription factor A (mtTFA) is central to mtDNA expression regulation and repair; however, it is unclear whether there are specific mtTFA single nucleotide polymorphisms (SNPs) in patients with AC and whether they affect mtDNA repair. In the present study, we screened mtTFA SNPs in patients with AC and analyzed their impact on the copy number of mtDNA in AC. A total of 50 patients with AC, 50 alcoholics without AC and 50 normal subjects were enrolled in the study. SNPs of full-length mtTFA were analyzed using the polymerase chain reaction (PCR) combined with gene sequencing. The hepatic mtTFA mRNA and mtDNA copy numbers were measured using quantitative PCR (qPCR), and mtTFA protein was measured using western blot analysis. A total of 18 mtTFA SNPs specific to patients with AC with frequencies >10% were identified. Two were located in the coding region and 16 were identified in non-coding regions. Conversely, there were five SNPs that were only present in patients with AC and normal subjects and had a frequency >10%. In the AC group, the hepatic mtTFA mRNA and protein levels were significantly lower than those in the other two groups. Moreover, the hepatic mtDNA copy number was significantly lower in the AC group than in the controls and alcoholics without AC. Based on these data, we conclude that AC-specific mtTFA SNPs may be responsible for the observed reductions in mtTFA mRNA, protein levels and mtDNA copy number and they may also increase the susceptibility to AC. PMID:24348767

  2. Short communication: Imputation performances of 3 low-density marker panels in beef and dairy cattle.

    PubMed

    Dassonneville, R; Fritz, S; Ducrocq, V; Boichard, D

    2012-07-01

    Low-density chips are appealing alternative tools contributing to the reduction of genotyping costs. Imputation enables researchers to predict missing genotypes to recreate the denser coverage of the standard 50K (∼50,000) genotype. Two alternative in silico chips were defined in this study that included markers selected to optimize minor allele frequency and spacing. The objective of this study was to compare the imputation accuracy of these custom low-density chips with a commercially available 3K chip. Data consisted of genotypes of 4,037 Holstein bulls, 1,219 Montbéliarde bulls, and 991 Blonde d'Aquitaine bulls. Criteria to select markers to include in low-density marker panels are described. To mimic a low-density genotype, all markers except the markers present on the low-density panel were masked in the validation population. Imputation was performed using the Beagle software. Combining the directed acyclic graph obtained with Beagle with the PHASEBOOK algorithm provides fast and accurate imputation that is suitable for routine genomic evaluations based on imputed genotypes. Overall, 95 to 99% of alleles were correctly imputed depending on the breed and the low-density chip used. The alternative low-density chips gave better results than the commercially available 3K chip. A low-density chip with 6,000 markers is a valuable genotyping tool suitable for both dairy and beef breeds. Such a tool could be used for preselection of young animals or large-scale screening of the female population. PMID:22720970
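
    A percentage of correctly imputed alleles, as reported in this abstract, can be computed along the following lines. This Python sketch assumes genotypes coded as 0, 1 or 2 copies of one allele and counts 2 − |true − imputed| correct alleles per masked genotype; the abstract does not spell out the exact formula used, so this definition, the function name and the example data are assumptions.

```python
def allele_concordance(true_genotypes, imputed_genotypes, masked):
    """Fraction of alleles correctly imputed at masked markers.

    true_genotypes, imputed_genotypes: sequences of 0/1/2 allele counts
    masked: sequence of booleans, True where the genotype was hidden
    """
    correct = 0
    total = 0
    for truth, guess, was_masked in zip(true_genotypes, imputed_genotypes, masked):
        if not was_masked:
            continue  # only markers hidden in the validation set are scored
        correct += 2 - abs(truth - guess)
        total += 2
    if total == 0:
        raise ValueError("no masked genotypes to evaluate")
    return correct / total


# Hypothetical animal with four masked markers, one imputed with a single wrong allele.
rate = allele_concordance([0, 1, 2, 1], [0, 2, 2, 1], [True, True, True, True])
```

    Here seven of the eight masked alleles match, giving a concordance of 0.875.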

  3. A suggested approach for imputation of missing dietary data for young children in daycare

    PubMed Central

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children. PMID:26689313

  4. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which may help to identify deleterious alleles that are the basis of inbreeding depression in the species. PMID:23324311

  5. Methods for imputation of missing values in air quality data sets

    NASA Astrophysics Data System (ADS)

    Junninen, Heikki; Niska, Harri; Tuppurainen, Kari; Ruuskanen, Juhani; Kolehmainen, Mikko

    Methods for data imputation applicable to air quality data sets were evaluated using simulated missing data patterns. The methods were univariate (linear, spline and nearest neighbour interpolation), multivariate (regression-based imputation (REGEM), nearest neighbour (NN), self-organizing map (SOM), multi-layer perceptron (MLP)), and hybrids of the two. Additionally, a multiple imputation procedure was considered in order to compare single and multiple imputation schemes. Four statistical criteria were adopted: the index of agreement, the squared correlation coefficient (R²), the root mean square error and the mean absolute error, with bootstrapped standard errors. The results showed that the performance of interpolation with respect to gap length could be estimated separately for each air quality variable by calculating a gradient and an exponent α (Hurst exponent). This can be further utilised in a hybrid approach in which imputation is performed either by interpolation or by a multivariate method, depending on the gap length and the variable under study. Among the multivariate methods, SOM and MLP performed slightly better than REGEM and NN. The advantage of SOM over the others was that it was less dependent on the actual location of the missing values. If priority is given to computational speed, however, NN can be recommended. In general, the results showed that a slight improvement in the performance of multivariate methods can be achieved by hybridisation, and a more substantial one by multiple imputation, where the final estimate is composed of the outputs of several multivariate fill-in methods.
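
    The four criteria named in this abstract can be computed as in the Python sketch below. It assumes the index of agreement takes Willmott's usual form and that the squared correlation coefficient is the squared Pearson correlation between observed and imputed values; the bootstrapped standard errors are omitted. The function name and the example values are hypothetical.

```python
import math


def imputation_scores(observed, imputed):
    """Index of agreement, squared correlation, RMSE and MAE for imputed values."""
    n = len(observed)
    if n == 0 or n != len(imputed):
        raise ValueError("inputs must be non-empty and of equal length")
    o_bar = sum(observed) / n
    p_bar = sum(imputed) / n
    pairs = list(zip(observed, imputed))

    sq_err = sum((p - o) ** 2 for o, p in pairs)
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(p - o) for o, p in pairs) / n

    # Index of agreement: 1 minus squared error over "potential error".
    # Undefined (division by zero) when all values equal the observed mean.
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in pairs)
    d = 1 - sq_err / potential

    # Squared Pearson correlation; undefined when either series is constant.
    cov = sum((o - o_bar) * (p - p_bar) for o, p in pairs)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in imputed)
    r2 = cov ** 2 / (var_o * var_p)

    return {"d": d, "r2": r2, "rmse": rmse, "mae": mae}


# Hypothetical concentrations: every imputed value is biased upward by 1.
scores = imputation_scores([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

    In this example the constant bias leaves R² at 1 while the index of agreement drops to 0.84 and both RMSE and MAE equal 1, which shows why correlation alone is not enough to judge imputed values.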

  6. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  7. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  8. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  9. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 2 2013-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  10. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 2 2011-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  11. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  12. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 2 2014-04-01 2014-04-01 false May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  13. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 2 2012-04-01 2009-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  14. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 9 2010-04-01 2010-04-01 false May the Office of National Drug Control Policy impute conduct of one person to another? 1404.630 Section 1404.630 Food and Drugs OFFICE OF NATIONAL DRUG... Suspension and Debarment Actions § 1404.630 May the Office of National Drug Control Policy impute conduct...

  15. 22 CFR 1508.630 - May the African Development Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the African Development Foundation impute... FOUNDATION GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1508.630 May the African Development Foundation impute conduct of one...

  16. 22 CFR 1006.630 - May the Inter-American Foundation impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 2 2010-04-01 2010-04-01 true May the Inter-American Foundation impute conduct of one person to another? 1006.630 Section 1006.630 Foreign Relations INTER-AMERICAN FOUNDATION... Actions § 1006.630 May the Inter-American Foundation impute conduct of one person to another? For...

  17. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough

    PubMed Central

    McMahon, George; Ring, Susan M.; Davey-Smith, George; Timpson, Nicholas J.

    2015-01-01

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case–control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  18. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough.

    PubMed

    McMahon, George; Ring, Susan M; Davey-Smith, George; Timpson, Nicholas J

    2015-10-15

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case-control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  19. Quality assessment parameters for EST-derived SNPs from catfish

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two factors were found to be most significant for validation of EST-derived SNPs: the contig size and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate although the validation rate was reasonably high when the contig sizes were equal to or larger than...

  20. Association analysis of candidate SNPs on reproductive traits in swine

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Being able to identify young females with superior reproduction traits would have a large financial impact on commercial swine producers. Previous studies have discovered SNPs associated with economically important traits such as litter size, growth rate, fat deposition, and feed intake. The objecti...

  1. Genotyping of SNPs in a polyploid genome by pyrosequencing.

    PubMed

    Rickert, Andreas M; Premstaller, Andreas; Gebhardt, Christiane; Oefner, Peter J

    2002-03-01

    Single-nucleotide polymorphisms (SNPs) are the most frequent DNA sequence variations, and they have become increasingly popular markers for association studies. Allelic discrimination of the mostly binary SNPs has been reported for diploid species, mainly the human, but not for polyploid genomes such as the agriculturally important crops. In the present study, we analyzed the applicability of pyrosequencing to genotyping SNPs in tetraploid potatoes. Out of 94 polymorphic loci tested, 76 (81%) proved to be amenable to allelic discrimination by pyrosequencing. An additional locus could be genotyped by the addition of an ssDNA binding protein to the pyrosequencing reaction. Of the remaining 17 loci, two failed because of the presence of paralogs in the genome, while in the other cases, self-annealing of the primer or template at the low reaction temperature (28 °C) employed in pyrosequencing rendered allelic discrimination impossible. The quantitative precision of pyrosequencing was found to be similar to that of conventional dideoxy sequencing and single-nucleotide primer extension. Except for some sequence-specific limitations, pyrosequencing appears to be an appropriate method for genotyping SNPs in polyploid species because it is possible to distinguish not only between homo- and heterozygosity but also between the different heterozygous states. PMID:11911662

  2. Effects of reduced panel, reference origin, and genetic relationship on imputation of genotypes in Hereford cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to investigate alternative methods for designing and utilizing reduced single nucleotide polymorphism (SNP) panels for imputing SNP genotypes. Two purebred Hereford populations, an experimental population known as Line 1 Hereford (L1, N=240) and registered Hereford wi...

  3. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    PubMed

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. PMID:26477633

  4. Imputation of missing genotypes from sparse to high density using long-range phasing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Related individuals share potentially long chromosome segments that trace to a common ancestor. A phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations was developed to phase large sections of a chromosome. In addition to phasing, ChromoPhase imputes missing genotyp...

  5. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in…

  6. Missing value imputation in DNA microarrays based on conjugate gradient method.

    PubMed

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE across data sets with various missing rates show that CGimpute outperforms the other methods. PMID:22154717
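
    The neighbor-selection step and the NRMSE score named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' CGimpute code: the function names are hypothetical, the conjugate gradient step is omitted, and the NRMSE normalization (root mean squared error divided by the standard deviation of the true values) is one common convention that the abstract does not confirm.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_k_neighbors(target, candidates, k):
    """Indices of the k candidate rows with the largest |Pearson r| to `target`.

    `target` uses None for missing entries; the correlation is computed only
    over the columns where `target` is observed. Candidate rows are assumed
    to be complete (an assumption of this sketch).
    """
    observed = [j for j, v in enumerate(target) if v is not None]
    t = [target[j] for j in observed]
    scored = [
        (abs(pearson(t, [row[j] for j in observed])), i)
        for i, row in enumerate(candidates)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # strongest correlation first
    return [i for _, i in scored[:k]]


def nrmse(imputed, truth):
    """Root mean squared error normalized by the standard deviation of `truth`."""
    n = len(truth)
    mse = sum((a - b) ** 2 for a, b in zip(imputed, truth)) / n
    mean_t = sum(truth) / n
    var_t = sum((b - mean_t) ** 2 for b in truth) / n
    return math.sqrt(mse / var_t)
```

    Under this convention, imputing every masked entry with the mean of the true values gives an NRMSE of 1, so scores below 1 indicate an improvement over mean imputation.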

  7. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  8. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    ERIC Educational Resources Information Center

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  9. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    ERIC Educational Resources Information Center

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  10. Multiple Imputation to Correct for Nonresponse Bias: Application in Non-Communicable Disease Risk Factors Survey

    PubMed Central

    Miri, Hamid Heidarian; Hassanzadeh, Jafar; Rajaeefard, Abdolreza; Mirmohammadkhani, Majid; Angali, Kambiz Ahmadi

    2016-01-01

    Background: This study used multiple imputation (MI) to correct for potential nonresponse bias in measurements of the variable fasting blood glucose (FBS) in the non-communicable disease risk factors survey conducted in Iran in 2007. Methods: Five multiple imputation methods (bootstrap expectation maximization, multivariate normal regression, univariate linear regression, MI by chained equations, and predictive mean matching) were applied to impute FBS. To make FBS consistent with the normality assumption, natural logarithm (Ln) and Box-Cox (BC) transformations were applied prior to imputation. The measurements from which we intended to remove nonresponse bias were the mean of FBS and the percentage of those with high FBS. Results: The mean of FBS did not change considerably after applying the MI methods. Regarding the prevalence of high blood sugar, all methods on the original scale tended to increase the estimates, except for predictive mean matching, which, along with all methods on BC- or Ln-transformed data, did not change the results. Conclusions: FBS-related measurements did not change after applying the different MI methods. Nonresponse bias does not appear to have been an important challenge for these measurements. However, use of MI methods resulted in more efficient estimates. Further studies are encouraged on the accuracy of MI methods in these settings. PMID:26234966

  11. Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective.

    ERIC Educational Resources Information Center

    Schafer, Joseph L.; Olsen, Maren K.

    1998-01-01

    The key ideas of multiple imputation for multivariate missing data problems are reviewed. Software programs available for this analysis are described, and their use is illustrated with data from the Adolescent Alcohol Prevention Trial (W. Hansen and J. Graham, 1991). (SLD)

  12. Single imputation method of missing values in environmental pollution data sets

    NASA Astrophysics Data System (ADS)

    Plaia, A.; Bondì, A. L.

    Missing data represent a general problem in many scientific fields, above all in environmental research. Several methods have been proposed in the literature for handling missing data, and the choice of an appropriate method depends, among other things, on the missing-data pattern and on the missing-data mechanism. One approach to the problem is to impute the missing values to yield a complete data set. The goal of this paper is to propose a new single imputation method and to compare its performance to other single and multiple imputation methods known in the literature. Considering a data set of concentrations measured every 2 h by eight monitoring stations distributed over the metropolitan area of Palermo, Sicily, during 2003, simulated incomplete data have been generated, and the performance of the imputation methods has been compared on the correlation coefficient (ρ), the index of agreement (d), the root mean square deviation (RMSD) and the mean absolute deviation (MAD). All the performance indicators agree in rating the proposed method as the best among those compared, independently of the gap length and of the number of stations with missing data.

  13. Generating Multiple Imputations for Matrix Sampling Data Analyzed with Item Response Models.

    ERIC Educational Resources Information Center

    Thomas, Neal; Gan, Nianci

    1997-01-01

    Describes and assesses missing data methods currently used to analyze data from matrix sampling designs implemented by the National Assessment of Educational Progress. Several improved methods are developed, and these models are evaluated using an EM algorithm to obtain maximum likelihood estimates followed by multiple imputation of complete data…

  14. The Effects of Methods of Imputation for Missing Values on the Validity and Reliability of Scales

    ERIC Educational Resources Information Center

    Cokluk, Omay; Kayri, Murat

    2011-01-01

    The main aim of this study is the comparative examination of the factor structures, corrected item-total correlations, and Cronbach-alpha internal consistency coefficients obtained by different methods used in imputation for missing values in conditions of not having missing values, and having missing values of different rates in terms of testing…

  15. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  16. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  17. Probability genotype imputation method and integrated weighted lasso for QTL identification

    PubMed Central

    2013-01-01

    Background Many QTL studies have two common features: (1) often there is missing marker information, (2) among many markers involved in the biological process only a few are causal. In statistics, the second issue falls under the headings “sparsity” and “causal inference”. The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. Outcomes of the proposed imputation method are probabilities which serve as weights in the second step, namely the weighted lasso. The sparse phenotype inference is employed to select a set of predictive markers for the trait of interest. Results Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment, containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior these markers show good potential for being real, especially for the germination trait Gmax. Conclusions Our imputation method shows higher accuracy in terms of sensitivity and specificity compared to alternative imputation methods. Also, the proposed weighted lasso outperforms commonly practiced multiple regression as well as the traditional lasso and adaptive lasso with three weighting schemes. This means that under realistic missing data settings this methodology can be used for QTL identification. PMID:24378210

  18. Imputation methods for temporal radiographic texture analysis in the detection of periprosthetic osteolysis

    NASA Astrophysics Data System (ADS)

    Wilkie, Joel R.; Giger, Maryellen L.; Pesce, Lorenzo L.; Engh, Charles A., Sr.; Hopper, Robert H., Jr.; Martell, John M.

    2007-03-01

    Periprosthetic osteolysis is a disease triggered by the body's response to tiny wear fragments from total hip replacements (THR), which leads to localized bone loss and disappearance of the trabecular bone texture. We have been investigating methods of temporal radiographic texture analysis (tRTA) to help detect periprosthetic osteolysis. One method involves merging feature measurements at multiple time points using an LDA or BANN. The major drawback of this method is that several cases do not meet the inclusion criteria because of missing data, i.e., missing image data at the necessary time intervals. In this research, we investigated imputation methods to fill in missing data points using feature averaging, linear interpolation, and first and second order polynomial fitting. The database consisted of 101 THR cases with full data available from four follow-up intervals. For 200 iterations, missing data were randomly created to simulate a typical THR database, and the missing points were then filled in using the imputation methods. ROC analysis was used to assess the performance of tRTA in distinguishing between osteolysis and normal cases for the full database and each simulated database. The calculated values from the 200 iterations showed that the imputation methods produced negligible bias, and substantially decreased the variance of the AUC estimator, relative to excluding incomplete cases. The best performing imputation methods were those that heavily weighted the data points closest to the missing data. The results suggest that these imputation methods appear to be acceptable means to include cases with missing data for tRTA.
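
    Linear interpolation, one of the imputation methods listed in this abstract, can be illustrated with a short sketch. It is not the authors' implementation: the function name is hypothetical, time points are assumed to be sorted in ascending order, and carrying the nearest observed value to gaps at either end of the series is an assumption added here to keep the example complete.

```python
def interpolate_missing(times, values):
    """Fill None entries of `values` by linear interpolation over `times`.

    Gaps between two observed points are filled on the straight line joining
    them; gaps before the first or after the last observed point take the
    nearest observed value (an assumption of this sketch).
    """
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")

    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(tk, vk) for tk, vk in known if tk < t]
        after = [(tk, vk) for tk, vk in known if tk > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        elif before:
            filled.append(before[-1][1])  # trailing gap: carry last value forward
        else:
            filled.append(after[0][1])  # leading gap: carry first value backward
    return filled


# Four follow-up intervals with the second and fourth measurements missing.
print(interpolate_missing([0, 1, 2, 3], [1.0, None, 3.0, None]))
```

    Because the filled value depends only on the two observed points adjacent to the gap, this method weights the data closest to the missing point most heavily.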

  19. Uncovering nativity disparities in cancer patterns: A multiple imputation strategy to handle missing nativity data in the SEER data file

    PubMed Central

    Montealegre, Jane R.; Zhou, Renke; Amirian, E. Susan; Scheurer, Michael E.

    2014-01-01

    Background While birthplace data are routinely collected in the participating Surveillance, Epidemiology, and End Results (SEER) registries, such data are missing in a non-random manner for a large proportion of cases. This hinders analysis of nativity-related cancer disparities. We evaluate multiple imputation of nativity status among Hispanic patients diagnosed with cervix, prostate, and colorectal cancer and demonstrate the effect of multiple imputation on apparent nativity disparities in survival. Methods We used multiple imputation by logistic regression to generate nativity values (U.S.- versus foreign-born) using a priori-defined variables. The accuracy of the method was evaluated among a subset of cases. We used Kaplan-Meier curves to illustrate the effect of imputation by comparing survival among U.S.- and foreign-born Hispanics, with and without imputation of nativity. Results Birthplace was missing for 31%, 49%, and 39% of cervical, prostate, and colorectal cancer cases, respectively. The sensitivity of the imputation strategy for detecting foreign-born status was ≥ 90% and the specificity ≥ 86%. The agreement between the true and imputed values was ≥ 0.80 and the misclassification error was ≤ 10%. Kaplan-Meier survival curves indicated different associations between nativity and survival when nativity was imputed versus when cases with missing birthplace were omitted from the analysis. Conclusions Multiple imputation using variables available in the SEER data file can be used to accurately detect foreign-born status. This simple strategy may aid researchers to disaggregate analyses by nativity and uncover important nativity disparities in regard to cancer diagnosis, treatment, and survival. PMID:24436157
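
    The accuracy measures reported in this abstract can be computed from paired true and imputed labels, as in the sketch below. The function name and data are hypothetical, and treating the reported "agreement" as Cohen's kappa is an assumption; the abstract does not name the statistic.

```python
def binary_agreement(truth, imputed):
    """Sensitivity, specificity, misclassification error and Cohen's kappa.

    Both inputs are equal-length sequences of 0/1 labels, where 1 marks the
    class of interest (for example, foreign-born).
    """
    pairs = list(zip(truth, imputed))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal class frequencies.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "error": (fp + fn) / n,
        "kappa": (observed - expected) / (1 - expected),
    }


# Made-up labels for illustration only; these are not SEER data.
result = binary_agreement([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
print(result)
```

    The sketch assumes both classes occur in the true labels and that chance agreement is below 1; otherwise the divisions are undefined.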

  20. Constructing bootstrap confidence intervals for principal component loadings in the presence of missing data: a multiple-imputation approach.

    PubMed

    van Ginkel, Joost R; Kiers, Henk A L

    2011-11-01

    Earlier research has shown that bootstrap confidence intervals from principal component loadings give a good coverage of the population loadings. However, this only applies to complete data. When data are incomplete, missing data have to be handled before analysing the data. Multiple imputation may be used for this purpose. The question is how bootstrap confidence intervals for principal component loadings should be corrected for multiply imputed data. In this paper, several solutions are proposed. Simulations show that the proposed corrections for multiply imputed data give a good coverage of the population loadings in various situations. PMID:21973098

  1. Disk filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  2. Disk filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

  3. SNP-VISTA: An Interactive SNPs Visualization Tool

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

  4. In-Silico Computing of the Most Deleterious nsSNPs in HBA1 Gene

    PubMed Central

    AbdulAzeez, Sayed; Borgio, J. Francis

    2016-01-01

    Background α-Thalassemia (α-thal) is a genetic disorder caused by the substitution of a single amino acid or by large deletions in the HBA1 and/or HBA2 genes. Method A systematic in-silico approach using modern bioinformatics tools was applied to predict the deleterious SNPs in the HBA1 gene and their pathogenic impact on the function and structure of the HBA1 protein. Results and Discussion A total of 389 SNPs in HBA1 were retrieved from the dbSNP database, including 201 non-coding synonymous (nsSNPs), 43 human active SNPs, 16 intronic SNPs, 11 mRNA 3′ UTR SNPs, 9 coding synonymous SNPs, 9 5′ UTR SNPs and other types. The structural homology-based method (PolyPhen), the sequence homology-based tool (SIFT), SNPs&Go, PROVEAN and PANTHER revealed that 2.4% of the nsSNPs are pathogenic. Conclusions A total of 5 nsSNPs (G60V, K17M, K17T, L92F and W15R) were predicted to be responsible for structural and functional modifications of the HBA1 protein. The in-silico analysis indicates that two of these nsSNPs, G60V and W15R, are highly deleterious. These 2 pathogenic nsSNPs can be considered for wet-lab confirmatory analysis. PMID:26824843

  5. Mapping Insertions, Deletions and SNPs on Venter's Chromosomes

    PubMed Central

    Costantini, Maria; Bernardi, Giorgio

    2009-01-01

    Background The very recent availability of fully sequenced individual human genomes is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements. Results We mapped the insertions, deletions and SNPs (single nucleotide polymorphisms) that are present in Craig Venter's genome, more precisely on chromosomes 17 to 22, and compared them with the human reference genome hg17. Our results show that insertions and deletions are almost absent in L1 and generally scarce in L2 isochore families (GC-poor L1+L2 isochores represent slightly over half of the human genome), whereas they increase in GC-rich isochores, largely paralleling the densities of genes, retroviral integrations and Alu sequences. The distributions of insertions/deletions are in striking contrast with those of SNPs which exhibit almost the same density across all isochore families with, however, a trend for lower concentrations in gene-rich regions. Conclusions Our study strongly suggests that the distribution of insertions/deletions is due to the structure of chromatin which is mostly open in gene-rich, GC-rich isochores, and largely closed in gene-poor, GC-poor isochores. The different distributions of insertions/deletions and SNPs are clearly related to the two different responsible mechanisms, namely recombination and point mutations. PMID:19543403

  6. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    PubMed Central

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). 
    Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while allowing timely incorporation of empirical data (e.g., annual forest inventory). PMID:23305341

  7. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage.

    PubMed

    Wilson, Barry Tyler; Woodall, Christopher W; Griffith, Douglas M

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). 
    Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while allowing timely incorporation of empirical data (e.g., annual forest inventory). PMID:23305341

  8. Comparison of SNPs and microsatellites in identifying offtypes of cacao clones from Cameroon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single Nucleotide Polymorphism (SNP) markers are increasingly being used in crop breeding programs, slowly replacing microsatellites and other markers. SNPs provide many benefits over microsatellites, including ease of analysis and unambiguous results across various platforms. We compare SNPs to m...

  9. Water Filters

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The Aquaspace H2OME Guardian Water Filter, available through Western Water International, Inc., reduces lead in water supplies. The filter is mounted on the faucet and the filter cartridge is placed in the "dead space" between sink and wall. This filter is one of several new filtration devices using the Aquaspace compound filter media, which combines company developed and NASA technology. Aquaspace filters are used in industrial, commercial, residential, and recreational environments as well as by developing nations where water is highly contaminated.

  10. Accounting for Dependence Induced by Weighted KNN Imputation in Paired Samples, Motivated by a Colorectal Cancer Study

    PubMed Central

    Suyundikov, Anvar; Stevens, John R.; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K.; Slattery, Martha L.

    2015-01-01

    Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control. PMID:25849489

  11. Accounting for dependence induced by weighted KNN imputation in paired samples, motivated by a colorectal cancer study.

    PubMed

    Suyundikov, Anvar; Stevens, John R; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K; Slattery, Martha L

    2015-01-01

    Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control. PMID:25849489

  12. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    PubMed Central

    Jattawa, Danai; Elzo, Mauricio A.; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-01-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. 
    PMID:26949946
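
    The accuracy measure described in this record (the ratio of correctly imputed SNP markers to all imputed SNP markers) can be illustrated with a minimal sketch. The function name `imputation_accuracy` and the 0/1/2 genotype coding are assumptions made for illustration; they are not taken from the study or from Beagle, FImpute or Findhap.

```python
def imputation_accuracy(true_calls, imputed_calls):
    """Return the fraction of imputed genotype calls that equal the true call.

    Both arguments are equal-length sequences covering only the markers
    that were masked and then imputed. Genotypes are coded here as 0, 1
    or 2 copies of an allele (an assumed coding, for illustration only).
    """
    if len(true_calls) != len(imputed_calls):
        raise ValueError("true and imputed calls must have the same length")
    if not true_calls:
        raise ValueError("no imputed markers to score")
    correct = sum(1 for t, i in zip(true_calls, imputed_calls) if t == i)
    return correct / len(true_calls)


# Toy example: 4 of 6 imputed markers match the true genotypes.
true_calls = [1, 2, 1, 2, 0, 1]
imputed_calls = [1, 1, 1, 0, 0, 1]
print(f"{imputation_accuracy(true_calls, imputed_calls):.2%}")
```

    Computing the same ratio separately for each chromosome would give the within-chromosome accuracies that the abstract compares across software packages.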

  13. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population.

    PubMed

    Jattawa, Danai; Elzo, Mauricio A; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information. 
    PMID:26949946

  14. SNPs Selection using Gravitational Search Algorithm and Exhaustive Search for Association Mapping

    NASA Astrophysics Data System (ADS)

    Kusuma, W. A.; Hasibuan, L. S.; Istiadi, M. A.

    2016-01-01

    Single Nucleotide Polymorphisms (SNPs) are known to be associated with phenotypic variation. The study of linking SNPs to a phenotype of interest is referred to as Association Mapping (AM), which is classified as a combinatorial problem. An Exhaustive Search (ES) approach can select the targeted SNPs exactly, since it evaluates all possible combinations of SNPs, but it is not efficient in terms of computer resources and computation time. A Heuristic Search (HS) approach is an alternative that improves on ES in those terms, but it still suffers from a high number of false positive SNPs in each combination. Gravitational Search Algorithm (GSA) is a new HS algorithm that yields better performance than other nature-inspired HS algorithms. This paper proposes a new method that combines GSA and ES to identify the most appropriate combination of SNPs linked to the phenotype of interest. Testing was conducted using a dataset without epistasis and a dataset with epistasis. Using the dataset without epistasis and 7 targeted SNPs, the proposed method identified 7 SNPs (6 True Positive (TP) SNPs and 1 False Positive (FP) SNP) with an association value of 0.83. Using the dataset with epistasis and 5 targeted SNPs, the proposed method identified 3 SNPs (2 TP SNPs and 1 FP SNP) with an association value of 0.87. The results showed that the method is robust in reducing redundant SNPs and identifying main markers.
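
    The exhaustive-search step mentioned in this record, which scores every combination of SNPs, can be sketched as follows. This is only an illustration of why the approach is exact but expensive. The names `exhaustive_search` and `toy_score`, the SNP labels and the scoring rule are hypothetical, and the sketch does not reproduce the authors' association measure or their GSA stage.

```python
from itertools import combinations


def exhaustive_search(snp_ids, score, max_size):
    """Score every SNP subset of size 1..max_size and return the best one.

    `score` is any callable that maps a tuple of SNP ids to a number
    (higher is better). The number of subsets grows combinatorially with
    the number of SNPs, which is why this is only practical for small
    candidate sets.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for subset in combinations(snp_ids, size):
            value = score(subset)
            if value > best_score:
                best_subset, best_score = subset, value
    return best_subset, best_score


# Hypothetical scoring rule: reward SNPs in a made-up "causal" set and
# penalize every extra SNP, so redundant markers lower the score.
CAUSAL = {"snp2", "snp4"}


def toy_score(subset):
    chosen = set(subset)
    return len(chosen & CAUSAL) - 0.5 * len(chosen - CAUSAL)


best, value = exhaustive_search(
    ["snp1", "snp2", "snp3", "snp4", "snp5"], toy_score, max_size=3
)
print(best, value)
```

    A heuristic stage would typically be used first to shrink the candidate list, so that the exhaustive pass only has to enumerate a small number of SNPs.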

  15. Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of The Netherlands'.

    PubMed

    Deelen, Patrick; Menelaou, Androniki; van Leeuwen, Elisabeth M; Kanterakis, Alexandros; van Dijk, Freerk; Medina-Gomez, Carolina; Francioli, Laurent C; Hottenga, Jouke Jan; Karssen, Lennart C; Estrada, Karol; Kreiner-Møller, Eskil; Rivadeneira, Fernando; van Setten, Jessica; Gutierrez-Achury, Javier; Westra, Harm-Jan; Franke, Lude; van Enckevort, David; Dijkstra, Martijn; Byelas, Heorhiy; van Duijn, Cornelia M; de Bakker, Paul I W; Wijmenga, Cisca; Swertz, Morris A

    2014-11-01

    Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputation genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results. PMID:24896149
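
    The quality measure quoted in this record, the squared Pearson correlation between imputed and 'true' genotypes, can be computed per variant as in the sketch below. The function name `pearson_r2` and the example values are assumptions for illustration, and using imputed allele dosages between 0 and 2 is one common convention rather than a detail stated in the abstract.

```python
def pearson_r2(imputed, truth):
    """Squared Pearson correlation between two equal-length numeric sequences.

    For one variant, `imputed` would hold the imputed genotypes (or
    dosages) across samples and `truth` the genotypes typed on an array.
    """
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        # A monomorphic variant has no variance, so r² is undefined.
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Made-up dosages for one variant in six samples.
imputed_dosage = [0.1, 0.9, 1.8, 0.2, 1.1, 0.0]
true_genotype = [0, 1, 2, 0, 1, 0]
print(round(pearson_r2(imputed_dosage, true_genotype), 3))
```

    Averaging this value over all variants in a frequency bin would give a mean r² of the kind reported for each reference panel.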

  16. Biological Filters.

    ERIC Educational Resources Information Center

    Klemetson, S. L.

    1978-01-01

    Presents the 1978 literature review of wastewater treatment. The review is concerned with biological filters, and it covers: (1) trickling filters; (2) rotating biological contactors; and (3) miscellaneous reactors. A list of 14 references is also presented. (HM)

  17. Bayesian multiple imputation for missing multivariate longitudinal data from a Parkinson's disease clinical trial.

    PubMed

    Luo, Sheng; Lawson, Andrew B; He, Bo; Elm, Jordan J; Tilley, Barbara C

    2016-04-01

    In Parkinson's disease (PD) clinical trials, Parkinson's disease is studied using multiple outcomes of various types (e.g. binary, ordinal, continuous) collected repeatedly over time. The overall treatment effects across all outcomes can be evaluated based on a global test statistic. However, missing data occur in outcomes for many reasons, e.g. dropout, death, etc., and need to be imputed in order to conduct an intent-to-treat analysis. We propose a Bayesian method based on item response theory to perform multiple imputation while accounting for multiple sources of correlation. Sensitivity analysis is performed under various scenarios. Our simulation results indicate that the proposed method outperforms standard methods such as last observation carried forward and separate random effects model for each outcome. Our method is motivated by and applied to a Parkinson's disease clinical trial. The proposed method can be broadly applied to longitudinal studies with multiple outcomes subject to missingness. PMID:23242384

  18. Comparison of multiple imputation and complete-case in a simulated longitudinal data with missing covariate

    NASA Astrophysics Data System (ADS)

    Yoke, Chin Wan; Khalid, Zarina Mohd

    2014-07-01

    In a continual process of collecting data, missing recorded data are always a main problem in real applications. They arise from the carelessness of a recorder or from unawareness of the importance of data documentation. In this study, a random-effects analysis of data simulated from a proposed algorithm is presented with a missing covariate. It is an improved simulation method which involves a first-order autoregressive (AR(1)) process in measuring the correlation between measurements of a subject across two time sequences. Complete-case analysis and the multiple imputation method are both implemented for the estimation procedure and compared. This study shows that the multiple imputation method yields estimates which fit well to data that are not only missing completely at random (MCAR) but also missing at random (MAR). However, the complete-case analysis yields estimators which fit well only to data that are MCAR.

  19. Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions.

    PubMed

    Han, Ying; Hazelett, Dennis J; Wiklund, Fredrik; Schumacher, Fredrick R; Stram, Daniel O; Berndt, Sonja I; Wang, Zhaoming; Rand, Kristin A; Hoover, Robert N; Machiela, Mitchell J; Yeager, Merideth; Burdette, Laurie; Chung, Charles C; Hutchinson, Amy; Yu, Kai; Xu, Jianfeng; Travis, Ruth C; Key, Timothy J; Siddiq, Afshan; Canzian, Federico; Takahashi, Atsushi; Kubo, Michiaki; Stanford, Janet L; Kolb, Suzanne; Gapstur, Susan M; Diver, W Ryan; Stevens, Victoria L; Strom, Sara S; Pettaway, Curtis A; Al Olama, Ali Amin; Kote-Jarai, Zsofia; Eeles, Rosalind A; Yeboah, Edward D; Tettey, Yao; Biritwum, Richard B; Adjei, Andrew A; Tay, Evelyn; Truelove, Ann; Niwa, Shelley; Chokkalingam, Anand P; Isaacs, William B; Chen, Constance; Lindstrom, Sara; Le Marchand, Loic; Giovannucci, Edward L; Pomerantz, Mark; Long, Henry; Li, Fugen; Ma, Jing; Stampfer, Meir; John, Esther M; Ingles, Sue A; Kittles, Rick A; Murphy, Adam B; Blot, William J; Signorello, Lisa B; Zheng, Wei; Albanes, Demetrius; Virtamo, Jarmo; Weinstein, Stephanie; Nemesure, Barbara; Carpten, John; Leske, M Cristina; Wu, Suh-Yuh; Hennis, Anselm J M; Rybicki, Benjamin A; Neslund-Dudas, Christine; Hsing, Ann W; Chu, Lisa; Goodman, Phyllis J; Klein, Eric A; Zheng, S Lilly; Witte, John S; Casey, Graham; Riboli, Elio; Li, Qiyuan; Freedman, Matthew L; Hunter, David J; Gronberg, Henrik; Cook, Michael B; Nakagawa, Hidewaki; Kraft, Peter; Chanock, Stephen J; Easton, Douglas F; Henderson, Brian E; Coetzee, Gerhard A; Conti, David V; Haiman, Christopher A

    2015-10-01

    Interpretation of biological mechanisms underlying genetic risk associations for prostate cancer is complicated by the relatively large number of risk variants (n = 100) and the thousands of surrogate SNPs in linkage disequilibrium. Here, we combined three distinct approaches: multiethnic fine-mapping, putative functional annotation (based upon epigenetic data and genome-encoded features), and expression quantitative trait loci (eQTL) analyses, in an attempt to reduce this complexity. We examined 67 risk regions using genotyping and imputation-based fine-mapping in populations of European (cases/controls: 8600/6946), African (cases/controls: 5327/5136), Japanese (cases/controls: 2563/4391) and Latino (cases/controls: 1034/1046) ancestry. Markers at 55 regions passed a region-specific significance threshold (P-value cutoff range: 3.9 × 10^-4 to 5.6 × 10^-3) and in 30 regions we identified markers that were more significantly associated with risk than the previously reported variants in the multiethnic sample. Novel secondary signals (P < 5.0 × 10^-6) were also detected in two regions (rs13062436/3q21 and rs17181170/3p12). Among 666 variants in the 55 regions with P-values within one order of magnitude of the most-associated marker, 193 variants (29%) in 48 regions overlapped with epigenetic or other putative functional marks. In 11 of the 55 regions, cis-eQTLs were detected with nearby genes. For 12 of the 55 regions (22%), the most significant region-specific, prostate-cancer associated variant represented the strongest candidate functional variant based on our annotations; the number of regions increased to 20 (36%) and 27 (49%) when examining the 2 and 3 most significantly associated variants in each region, respectively. These results have prioritized subsets of candidate variants for downstream functional evaluation. PMID:26162851

  20. Metallic Filters

    NASA Technical Reports Server (NTRS)

    1985-01-01

    Filtration technology originated in a mid-1960s NASA study. The results were distributed to the filter industry, and HR Textron responded, using the study as a point of departure for the development of 421 Filter Media. The HR system is composed of ultrafine steel fibers metallurgically bonded and compressed so that the pore structure is locked in place. The filters are used to filter polyesters, plastics, to remove hydrocarbon streams, etc. Several major companies use the product in chemical applications, pollution control, etc.

  1. Smoothing filters

    NASA Technical Reports Server (NTRS)

    Lear, W. H.

    1980-01-01

    The improvement of accuracy in using the smoothing filter instead of the Kalman filter is discussed. Factors of improvement for velocity errors of up to four are shown for position measurements. Smoothing equations are presented, and it is shown that smoothing equations for the smoothing filter appear to be stable.

  2. Water Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    A compact, lightweight electrolytic water filter generates silver ions in concentrations of 50 to 100 parts per billion in the water flow system. Silver ions serve as effective bactericide/deodorizers. Ray Ward requested and received from NASA a technical information package on the Shuttle filter, and used it as basis for his own initial development, a home use filter.

  3. FILTER TREATMENT

    DOEpatents

    Sutton, J.B.; Torrey, J.V.P.

    1958-08-26

    A process is described for reconditioning fused alumina filters which have become clogged by the accretion of bismuth phosphate in the filter pores. The method consists in contacting such filters with fuming sulfuric acid, and maintaining such contact for a substantial period of time.

  4. Normalization and missing value imputation for label-free LC-MS analysis

    SciTech Connect

    Karpievitch, Yuliya; Dabney, Alan R.; Smith, Richard D.

    2012-11-05

    Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

  5. The Use of Multiple Imputation for Data Subject to Limits of Detection

    PubMed Central

    Harel, Ofer; Perkins, Neil; Schisterman, Enrique F.

    2016-01-01

    Missing data due to limit of detection and limit of quantification is a common obstacle in epidemiological and biomedical research. We are interested in methodologies that provide unbiased and efficient estimates of these missing data while using popular statistical software. We describe a multiple imputation (MI) procedure for cross-sectional and longitudinal data which examines the sources of variation of hormone levels throughout the menstrual cycle conditional on specific biomarkers. We describe the rationale, procedure, advantages and disadvantages of the multiple imputation procedure. We also provide a comparison to commonly used missing data procedures (complete case analysis and single imputation). We illustrate our approach using the BioCycle data where we are interested in the effects of Vitamin E and Beta-carotene on Progesterone levels. We also evaluate the longitudinal impact of changes in Vitamin E on Progesterone levels over time. Finally, we demonstrate the advantages of using MI over complete case analysis or naive single replacement in both cross-sectional and longitudinal analysis where measurements below the limit of quantification (LOQ) are unreported. We also illustrate that, if available, inclusion of data below the limit of detection (LOD) that are potentially deemed unreliable improves simple estimation substantially.

  6. Imputation of missing covariate values in epigenome-wide analysis of DNA methylation data

    PubMed Central

    Wu, Chong; Demerath, Ellen W.; Pankow, James S.; Bressler, Jan; Fornage, Myriam; Grove, Megan L.; Chen, Wei; Guan, Weihua

    2016-01-01

    DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed. PMID:26890800

  7. Missing data imputation of solar radiation data under different atmospheric conditions.

    PubMed

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37%, while that for the MLR was 28.19% and that for the IDW was 31.68%. PMID:25356644
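
    As a point of reference for the comparison in this record, the sketch below shows one of the baseline methods named in the abstract, Inverse Distance Weighting, together with a root-mean-square error. The function names, the distance exponent of 2 and all numbers are assumptions for illustration. The abstract reports RMSE as a percentage without stating the normalization, so the sketch returns RMSE in the units of the data.

```python
import math


def idw_estimate(neighbor_values, distances, power=2.0):
    """Estimate a missing reading as a distance-weighted mean of neighbors.

    Each neighboring station gets weight 1 / distance**power, so closer
    stations count more. `power=2.0` is an assumed, commonly used default.
    """
    if len(neighbor_values) != len(distances) or not distances:
        raise ValueError("need one positive distance per neighbor value")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    weights = [1.0 / d ** power for d in distances]
    weighted_sum = sum(w * v for w, v in zip(weights, neighbor_values))
    return weighted_sum / sum(weights)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up irradiance readings (W/m²) at two neighboring stations,
# located 1 km and 2 km from the station with the missing value.
estimate = idw_estimate([100.0, 200.0], [1.0, 2.0])
print(estimate)
print(rmse([estimate, 310.0], [125.0, 300.0]))
```

    Scoring each imputation method with the same error function on deliberately masked readings is what makes a comparison like the one in the abstract possible.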

  8. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    PubMed Central

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37%, while that for the MLR was 28.19% and that for the IDW was 31.68%. PMID:25356644

  9. Imputation of missing covariate values in epigenome-wide analysis of DNA methylation data.

    PubMed

    Wu, Chong; Demerath, Ellen W; Pankow, James S; Bressler, Jan; Fornage, Myriam; Grove, Megan L; Chen, Wei; Guan, Weihua

    2016-02-01

    DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed. PMID:26890800

  10. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti

    PubMed Central

    2014-01-01

    Background Genetic markers are widely used to understand the biology and population dynamics of disease vectors, but often markers are limited in the resolution they provide. In particular, the delineation of population structure, fine scale movement and patterns of relatedness are often obscured unless numerous markers are available. To address this issue in the major arbovirus vector, the yellow fever mosquito (Aedes aegypti), we used double digest Restriction-site Associated DNA (ddRAD) sequencing for the discovery of genome-wide single nucleotide polymorphisms (SNPs). We aimed to characterize the new SNP set and to test the resolution against previously described microsatellite markers in detecting broad and fine-scale genetic patterns in Ae. aegypti. Results We developed bioinformatics tools that support the customization of restriction enzyme-based protocols for SNP discovery. We showed that our approach for RAD library construction achieves unbiased genome representation that reflects true evolutionary processes. In Ae. aegypti samples from three continents we identified more than 18,000 putative SNPs. They were widely distributed across the three Ae. aegypti chromosomes, with 47.9% found in intergenic regions and 17.8% in exons of over 2,300 genes. The pattern of their imputed effects in ORFs and UTRs was consistent with that found in a recent transcriptome study. We demonstrated that individual mosquitoes from Indonesia, Australia, Vietnam and Brazil can be assigned with a very high degree of confidence to their region of origin using a large SNP panel. We also showed that familial relatedness of samples from a 0.4 km² area could be confidently established with a subset of SNPs. Conclusions Using a cost-effective customized RAD sequencing approach supported by our bioinformatics tools, we characterized over 18,000 SNPs in field samples of the dengue fever mosquito Ae. aegypti. The variants were annotated and positioned onto the three Ae. aegypti chromosomes. The new SNP set provided much greater resolution in detecting population structure and estimating fine-scale relatedness than a set of polymorphic microsatellites. RAD-based markers demonstrate great potential to advance our understanding of mosquito population processes, critical for implementing new control measures against this major disease vector. PMID:24726019

  11. Genetic profile of SNP(s) and ovulation induction.

    PubMed

    Loutradis, D; Theofanakis, Ch; Anagnostou, E; Mavrogianni, D; Partsinevelos, G A

    2012-03-01

    Obtaining an adequate number of good quality oocytes while minimizing adverse drug reactions (ADRs) and cycle cancellation rates is considered the gold standard in controlled ovarian hyperstimulation (COH) for fertility treatment. Patients who undergo IVF/ICSI cycles tend to present with different responses to exogenous gonadotrophin administration. Research has shown that the secret probably lies in the various single nucleotide polymorphisms (SNPs) in their receptor genes. The decryption of the human genome provided specialists with additional information in assessing and even predicting ovarian response to COH. In this context, the study of Pharmacogenomics, Pharmacogenetics and SNPs emerges as a promising field in optimizing fertility treatment. Several SNPs in FSH and estrogen receptor genes have been detected so far, but only three of them, one in FSH receptor and two in estrogen receptor genes, have been associated with ovarian response to COH. It seems that the Asn/Ser variant of the FSH receptor functions more efficiently, while the Ser/Ser and Asn/Asn variants have a tendency to resist FSH stimulation. With regards to estrogen receptor 1 (ESR1), the PvuII and the XbaI polymorphisms seem to be associated with differences in the response to ovarian stimulation, while the RsaI polymorphism in estrogen receptor 2 (ESR2) is currently under investigation. There exists evidence supporting the hypothesis that a set of genes, all related to the FSH hormone mechanism of action, may participate along with other factors in the control of ovarian response to FSH, thus a cautious interpretation of polymorphism detection results is considered mandatory. 
However, identifying potential genetic markers that could predict ovarian response and implementing them in routine screening tests for every woman entering an IVF/ICSI cycle would make it possible to tailor fertility treatment to each patient's needs, thus maximizing the success rate and eliminating potential side-effects of fertility drugs. PMID:21657995

  12. Genotyping of Brucella species using clade specific SNPs

    PubMed Central

    2012-01-01

    Background Brucellosis is a worldwide disease of mammals caused by Alphaproteobacteria in the genus Brucella. The genus is genetically monomorphic, requiring extensive genotyping to differentiate isolates. We utilized two different genotyping strategies to characterize isolates. First, we developed a microarray-based assay based on 1000 single nucleotide polymorphisms (SNPs) that were identified from whole genome comparisons of two B. abortus isolates, one B. melitensis, and one B. suis. We then genotyped a diverse collection of 85 Brucella strains at these SNP loci and generated a phylogenetic tree of relationships. Second, we developed a selective primer-extension assay system using capillary electrophoresis that targeted 17 high value SNPs across 8 major branches of the phylogeny and determined their genotypes in a large collection (n = 340) of diverse isolates. Results Our 1000 SNP microarray readily distinguished B. abortus, B. melitensis, and B. suis, differentiating B. melitensis and B. suis into two clades each. Brucella abortus was divided into four major clades. Our capillary-based SNP genotyping confirmed all major branches from the microarray assay and assigned all samples to defined lineages. Isolates from these lineages and closely related isolates, among the most commonly encountered lineages worldwide, can now be quickly and easily identified and genetically characterized. Conclusions We have identified clade-specific SNPs in Brucella that can be used for rapid assignment into major groups below the species level in the three main Brucella species. Our assays represent SNP genotyping approaches that can reliably determine the evolutionary relationships of bacterial isolates without the need for whole genome sequencing of all isolates. PMID:22712667

  13. Molecular Beacon CNT-based Detection of SNPs

    NASA Astrophysics Data System (ADS)

    Egorova, V. P.; Krylova, H. V.; Lipnevich, I. V.; Veligura, A. A.; Shulitsky, B. G.; Y Fedotenkova, L.

    2015-11-01

    A fluorescence quenching effect due to few-walled carbon nanotubes chemically modified by carboxyl groups has been utilized to discriminate Single Nucleotide Polymorphisms (SNPs). It was shown that the complex obtained from these nanotubes and single-stranded primer DNA is formed due to stacking interactions between the hexagons of the nanotubes and aromatic rings of nucleotide bases as well as due to the establishment of hydrogen bonds between acceptor amine groups of nucleotide bases and donor carboxyl groups of the nanotubes. It has been demonstrated that these complexes may be used to make highly effective DNA biosensors detecting SNPs which operate as molecular beacons.

  14. Cd14 SNPs Regulate the Innate Immune Response

    PubMed Central

    Liu, Hong-Hsing; Hu, Yajing; Zheng, Ming; Suhoski, Megan M.; Engleman, Edgar G.; Dill, David; Hudnall, Matt; Wang, Jianmei; Spolski, Rosanne; Leonard, Warren J.; Peltz, Gary

    2012-01-01

    CD14 is a monocytic differentiation antigen that regulates innate immune responses to pathogens. Here, we show that murine Cd14 SNPs regulate the length of Cd14 mRNA and CD14 protein translation efficiency, and consequently the basal level of soluble CD14 (sCD14) and type I IFN production by murine macrophages. This has substantial downstream consequences for the innate immune response; the level of expression of at least 40 IFN-responsive murine genes was altered by this mechanism. We also observed that there was substantial variation in the length of human CD14 mRNAs and in their translation efficiency. sCD14 increased cytokine production by human dendritic cells (DCs), and sCD14-primed DCs augmented human CD4 T cell proliferation. These findings may provide a mechanism for exploring the complex relationship between CD14 SNPs, serum sCD14 levels, and susceptibility to human infectious and allergic diseases. PMID:22445606

  15. Filtering apparatus

    DOEpatents

    Haldipur, Gaurang B.; Dilmore, William J.

    1992-01-01

    A vertical vessel having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas.

  16. Filtering apparatus

    DOEpatents

    Haldipur, G.B.; Dilmore, W.J.

    1992-09-01

    A vertical vessel is described having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas. 18 figs.

  17. Detection of Regulatory SNPs in Human Genome Using ChIP-seq ENCODE Data

    PubMed Central

    Matveeva, Marina Yu.; Shilov, Alexander G.; Kashina, Elena V.; Mordvinov, Viatcheslav A.; Merkulova, Tatyana I.

    2013-01-01

    A vast amount of SNPs derived from genome-wide association studies are represented by non-coding ones, therefore exacerbating the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging since the regulatory part of the human genome is annotated much more poorly than coding regions. Here we describe an approach aggregating the whole set of ENCODE ChIP-seq data in order to search for rSNPs, and provide experimental evidence of its efficiency. Its algorithm is based on the assumption that the enrichment of a genomic region with transcription factor binding loci (ChIP-seq peaks) indicates its regulatory function, and thereby SNPs located in this region are more likely to influence transcription regulation. To ensure that the approach preferably selects functionally meaningful SNPs, we performed enrichment analysis of several human SNP datasets associated with phenotypic manifestations. It was shown that all samples are significantly enriched with SNPs falling into the regions of multiple ChIP-seq peaks as compared with the randomly selected SNPs. For experimental verification, 40 SNPs falling into overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of SNPs on the binding of the DNA fragments containing them to the nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) has been tested by EMSA. A radical change in the binding pattern has been observed for 29 SNPs, and 6 more SNPs demonstrated less pronounced changes. Taken together, the results demonstrate an effective way to search for potential rSNPs with the aid of ChIP-seq data provided by the ENCODE project. PMID:24205329
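
    The selection step described above (keep SNPs that fall inside regions covered by several ChIP-seq peaks) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tuple layouts, the half-open interval convention, and the function names are assumptions, and only the threshold of 7 overlapping binding loci comes from the abstract.

```python
def peaks_covering(position, peaks):
    """Count peaks, given as half-open (start, end) intervals, that contain a position."""
    return sum(1 for start, end in peaks if start <= position < end)

def candidate_rsnps(snps, peaks_by_chrom, min_peaks=7):
    """Return IDs of SNPs covered by at least min_peaks peaks.

    snps: iterable of (snp_id, chromosome, position) tuples (hypothetical layout).
    peaks_by_chrom: dict mapping chromosome -> list of (start, end) peaks
    pooled over all transcription factors.
    """
    selected = []
    for snp_id, chrom, pos in snps:
        if peaks_covering(pos, peaks_by_chrom.get(chrom, [])) >= min_peaks:
            selected.append(snp_id)
    return selected
```

    The linear scan over peaks keeps the example short; for genome-scale peak sets a sorted index or interval tree would be the more practical choice.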

  18. Chemical derivatization of compact disc polycarbonate surfaces for SNPs detection.

    PubMed

    Bañuls, María-José; García-Piñón, Francisco; Puchades, Rosa; Maquieira, Angel

    2008-03-01

    Compact discs have been proposed as an efficient analytical platform, with potential to develop high-throughput affinity assays for genomics, proteomics, clinics, and health monitoring. Chemical derivatization of CD surfaces is one of the keys to developing highly efficient microarraying-based assays on discs. Approaches for mild chemical modification of polycarbonate (PC) disc surface based on nitration, reduction, and chloromethylation reactions have been developed. Derivatized surfaces as amino and thiol are obtained for PC, maintaining unchanged the mechanical and optical properties of the discs. Studies of covalent attachment of oligonucleotide probes (5' Cy5-labeled, 3' NH 2-ended) on the modified surfaces have been performed to develop microarraying assays based on hybridization of cDNA strands and single nucleotide polymorphism discrimination (SNPs). A demonstration of the applicability to the compact disc audio/video technology for its use as analytical system is performed, including the employment of a commercial CD player to read the results on disc. PMID:18254580

  19. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods

    PubMed Central

    Shara, Nawar; Yassin, Sayf A.; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V.; Wang, Wenyu; Lee, Elisa T.; Umans, Jason G.

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989–1991), 2 (1993–1995), and 3 (1998–1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results. PMID:26414328
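
    Three of the simpler methods listed above can be sketched on a per-participant series of exam values, with `None` marking a missing exam. The abstract does not define the methods in detail, so the readings used here are assumptions: "mean of serial measures" is taken as the mean of the participant's observed exams, and "adjacent value" as the nearest observed exam (the earlier one on ties).

```python
def listwise_deletion(records):
    """Keep only participants with every exam observed."""
    return [r for r in records if None not in r]

def mean_of_serial_measures(record):
    """Replace each missing exam by the mean of the participant's observed exams."""
    observed = [v for v in record if v is not None]
    if not observed:
        return list(record)  # nothing to borrow from
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in record]

def adjacent_value(record):
    """Replace each missing exam by the nearest observed exam (earlier exam on ties)."""
    filled = list(record)
    for i, v in enumerate(record):
        if v is not None:
            continue
        for offset in range(1, len(record)):
            for j in (i - offset, i + offset):
                if 0 <= j < len(record) and record[j] is not None:
                    filled[i] = record[j]
                    break
            if filled[i] is not None:
                break
    return filled
```

    Multiple imputation and pattern-mixture models need a statistical model of the missingness and are deliberately left out of this sketch.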

  20. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    PubMed

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context. PMID:26906401

  1. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

    PubMed

    Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J Brent; Durbin, Richard; Timpson, Nicholas J; Marchini, Jonathan; Soranzo, Nicole

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants. PMID:26368830

  2. The operating regimes and basic control principles of SNPS "Topaz". [Cs

    SciTech Connect

    Makarov, A.N.; Volberg, M.S.; Grayznov, G.M.; Zhabotinsky, E.E.; Serbin, V.I.

    1991-01-05

    The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include: prelaunch preparation and launch into working orbit, SNPS start-up to obtain desired electric power, nominal regime, SNPS shutdown. The main requirements for SNPS at different regimes are given, and the control algorithms providing these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" lead to the conclusion that, for SNPS of this type, a control algorithm providing the required thermal state of the cesium vapor supply system and excluding any possibility of discharge processes in current-conducting elements is the most expedient at the start-up regime. At the nominal regime, the required electric power should be provided by maintenance of reactor current and use of a fast-acting voltage regulator. Limitation of the outlet coolant temperature should also be foreseen.

  3. Tools, resources and databases for SNPs and indels in sequences: a review.

    PubMed

    Seal, Abhik; Gupta, Arun; Mahalaxmi, M; Aykkal, Riju; Singh, Tiratha Raj; Arunachalam, Vadivel

    2014-01-01

    A Single Nucleotide Polymorphism (SNP) is a mutation in which a single base in the DNA differs from the usual base at that position. SNPs are the marker of choice in genetic analysis and are also useful in locating genes associated with diseases. SNPs are important and frequently occurring point mutations in genomes and have many practical implications. In the post-genomic era, in silico methods make it easy to study the SNPs occurring in known genomes or sequences of a species of interest. There are many on-line and stand-alone tools to analyse SNPs. We intend to guide the reader with software details such as algorithmic background, file requirements, operating system specificity and species specificity, if any, for the tools for SNP detection in plants and animals. We also list many databases and resources available today to describe SNPs in a wide range of organisms. PMID:24794070

  4. Database filters

    SciTech Connect

    Pramanik, S.

    1982-01-01

    Several hardware database-searchers for a large number of patterns or keys are presented. These searchers can be implemented by a random access memory and are suitable for VLSI implementation. Application of these searchers as database filters is described; a filter detects all the matched records in the database, as well as a few others. The percentage of unmatched records can be reduced to any arbitrary minimum value by using several filters together, or passing the output records repeatedly through the same filters. The performance of the filters using the iterative approach depends very much on the regrouping algorithms of the patterns/keys. Several such algorithms are presented and their performances compared. A single pass is required if they are pipelined. Hardware organisation for different pipelined approaches are also studied. Experiments are performed for all the different hardware organisations mentioned above on an employee-name database. 25 references.
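
    The filtering idea above (every matching record passes, plus a few non-matching ones, and chaining filters shrinks the excess) can be imitated in software. The sketch below is only one plausible reading: it models the random access memory as a table of bits addressed by a salted hash of the key. The hash-based addressing, the class and function names, and the use of SHA-256 are assumptions for illustration and are not taken from the report.

```python
import hashlib

class RamFilter:
    """A bit table addressed by a salted hash of the key (software stand-in for a RAM)."""

    def __init__(self, patterns, size, salt):
        self.size = size
        self.salt = salt
        self.bits = [False] * size
        for pattern in patterns:
            self.bits[self._slot(pattern)] = True

    def _slot(self, key):
        digest = hashlib.sha256((self.salt + key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def may_match(self, key):
        """True for every stored pattern; occasionally True for other keys as well."""
        return self.bits[self._slot(key)]

def cascade(records, filters):
    """Pass records through several filters; only records accepted by all survive."""
    return [r for r in records if all(f.may_match(r) for f in filters)]
```

    Because each filter uses a different salt, a non-matching record has to collide in every table to slip through, which is why adding filters lowers the share of unmatched records in the output.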

  5. Imputation of Continuous Tree Suitability over the Continental United States from Sparse Measurements Using Associative Clustering

    NASA Astrophysics Data System (ADS)

    Hargrove, W. W.; Kumar, J.; Hoffman, F. M.; Potter, K. M.; Mills, R. T.

    2012-12-01

    Up-scaling from sparse measurements to a continuous raster of estimated values is a common problem in Earth System Science. We present a new general-purpose empirical imputation method based on associative clustering, which associates sparse measurements of dependent variables with particular multivariate clustered combinations of the independent variables, and then uses several methods to estimate values for unmeasured clusters, based on directional proximity in multidimensional data space, at both the cluster and map cell levels of resolution. We demonstrate this new imputation tool on tree species range distribution maps, which describe the suitable extent and expected growth performance of a particular tree species over a wide area. Range maps having continuous estimates of tree growth performance are more useful than more classical tree range maps that simply show binary occurrence suitability. The USDA Forest Service Forest Inventory Assessment (FIA) plots provide information about the occurrence and growth performance for various tree species across the US, but such measurements are limited to FIA plots. Using Associative Clustering, we scale up the discontinuous FIA Inventory growth measurements into continuous maps that show the expected growth and suitability for individual tree species covering the Continental United States. A multivariate cluster analysis was applied to global output from a General Circulation Model (GCM) consisting of 17 variables downscaled to 4 km² resolution. Present global growing conditions were divided into 30 thousand relatively homogeneous ecoregions describing climatic and topographic conditions. At every map cell a multi-linear regression was applied in 17-dimensional hyperspace to derive the suitability of a tree species where not measured using the forest inventory data. The continuous species distribution maps obtained were compared and validated against existing tree range suitability maps. 
Associative Clustering is intended to be a general-purpose imputation tool, is model-free, and can be used to derive tree growth for future conditions that have no present-day analog.

  6. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. 
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.

  7. PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

    PubMed Central

    Paschou, Peristera; Ziv, Elad; Burchard, Esteban G; Choudhry, Shweta; Rodriguez-Cintron, William; Mahoney, Michael W; Drineas, Petros

    2007-01-01

    Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using less than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the measure of informativeness, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset. 
The algorithm that we introduce runs in seconds and can be easily applied on large genome-wide datasets, facilitating the identification of population substructure, stratification assessment in multi-stage whole-genome association studies, and the study of demographic history in human populations. PMID:17892327

  8. Prediction and experimental characterization of nsSNPs altering human PDZ-binding motifs.

    PubMed

    Gfeller, David; Ernst, Andreas; Jarvik, Nick; Sidhu, Sachdev S; Bader, Gary D

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are a major contributor to genetic and phenotypic variation within populations. Non-synonymous SNPs (nsSNPs) modify the sequence of proteins and can affect their folding or binding properties. Experimental analysis of all nsSNPs is currently unfeasible and therefore computational predictions of the molecular effect of nsSNPs are helpful to guide experimental investigations. While some nsSNPs can be accurately characterized, for instance if they fall into strongly conserved or well annotated regions, the molecular consequences of many others are more challenging to predict. In particular, nsSNPs affecting less structured, and often less conserved regions, are difficult to characterize. Binding sites that mediate protein-protein or other protein interactions are an important class of functional sites on proteins and can be used to help interpret nsSNPs. Binding sites targeted by the PDZ modular peptide recognition domain have recently been characterized. Here we use this data to show that it is possible to computationally identify nsSNPs in PDZ binding motifs that modify or prevent binding to the proteins containing the motifs. We confirm these predictions by experimentally validating a selected subset with ELISA. Our work also highlights the importance of better characterizing linear motifs in proteins as many of these can be affected by genetic variations. PMID:24722214

  9. Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study

    PubMed Central

    2010-01-01

    Background The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. Methods Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied; a) complete case analysis (CC) b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM) and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset and estimates for the regression coefficients and model performance measures obtained. Results CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness. Conclusions Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness. PMID:21194416
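
    The resampling design in the Methods above (draw cases with replacement from a complete dataset, then delete covariate values under a missing at random mechanism) can be sketched as below. The dictionary row layout, the covariate names used in the usage note, and the two-rate threshold rule are assumptions for the example; the study's actual MAR model is not given in the abstract.

```python
import random

def bootstrap_sample(data, n, rng):
    """Draw n cases with replacement from the complete dataset."""
    return [rng.choice(data) for _ in range(n)]

def impose_mar(rows, target, driver, rate_low, rate_high, threshold, rng):
    """Blank out rows[i][target] with a probability that depends only on the
    fully observed covariate `driver`, which is what makes the mechanism MAR."""
    masked = []
    for row in rows:
        row = dict(row)  # copy so the complete data stay intact
        p = rate_high if row[driver] > threshold else rate_low
        if rng.random() < p:
            row[target] = None
        masked.append(row)
    return masked
```

    A replication would then call `bootstrap_sample(data, 1000, rng)` followed by `impose_mar(...)` on each of the three covariates, and repeat this 500 times before applying each missing data method.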

  10. Short communication: Validation of 4 candidate causative trait variants in 2 cattle breeds using targeted sequence imputation.

    PubMed

    Pausch, Hubert; Wurmser, Christine; Reinhardt, Friedrich; Emmerling, Reiner; Fries, Ruedi

    2015-06-01

    Most association studies for pinpointing trait-associated variants are performed within breed. The availability of sequence data from key ancestors of several cattle breeds now enables immediate assessment of the frequency of trait-associated variants in populations different from the mapping population and their imputation into large validation populations. The objective of this study was to validate the effects of 4 putatively causative variants on milk production traits, male fertility, and stature in German Fleckvieh and Holstein-Friesian animals using targeted sequence imputation. We used whole-genome sequence data of 456 animals to impute 4 missense mutations in DGAT1, GHR, PRLR, and PROP1 into 10,363 Fleckvieh and 8,812 Holstein animals. The accuracy of the imputed genotypes exceeded 95% for all variants. Association testing with imputed variants revealed consistent antagonistic effects of the DGAT1 p.A232K and GHR p.F279Y variants on milk yield and protein and fat contents, respectively, in both breeds. The allele frequency of both polymorphisms has changed considerably in the past 20 yr, indicating that they were targets of recent selection for milk production traits. The PRLR p.S18N variant was associated with yield traits in Fleckvieh but not in Holstein, suggesting that it may be in linkage disequilibrium with a mutation affecting yield traits rather than being causal. The reported effects of the PROP1 p.H173R variant on milk production, male fertility, and stature could not be confirmed. Our results demonstrate that population-wide imputation of candidate causal variants from sequence data is feasible, enabling their rapid validation in large independent populations. PMID:25892690

  11. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    PubMed Central

    Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

    2008-01-01

    Background Gene expression data frequently contain missing values; however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity, while neighbour-based methods (KNN, OLS, LSA, LLS) performed better in data with higher complexity. We also found that the EBS and STS schemes serve as complementary and effective tools for selecting the optimal imputation algorithm. PMID:18186917
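
    To make the idea of a neighbour-based method concrete, the following is a minimal sketch of k-nearest-neighbour imputation on a small expression matrix. It is not the implementation evaluated in the study: the function name knn_impute, the root-mean-square distance over shared columns, and the unweighted averaging of neighbours are all assumptions chosen for illustration.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries using the k most similar rows (genes).

    Hypothetical helper for illustration. Similarity is the root-mean-square
    difference over columns observed in both rows; the missing value is
    replaced by the plain mean of the neighbours' values in that column.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(matrix):
                # A neighbour must be a different row that observes column j.
                if other_i == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the entry as None if no neighbour exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy matrix: rows are genes, columns are arrays; None marks a missing value.
expression = [
    [1.0, 2.0, None, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [5.0, 6.0, 7.0, 8.0],
]
imputed = knn_impute(expression, k=2)
```

    In this toy matrix the two closest genes supply the missing value, while the distant fourth gene is ignored. Global methods such as SVD or BPCA would instead use a low-dimensional fit to the whole matrix.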

  12. Subspace Learning and Imputation for Streaming Big Data Matrices and Tensors

    NASA Astrophysics Data System (ADS)

    Mardani, Morteza; Mateos, Gonzalo; Giannakis, Georgios B.

    2015-05-01

    Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with 'Big Data' analytics. However, increasingly noisy, heterogeneous, and incomplete datasets as well as the need for real-time processing of streaming data pose major challenges to this end. In this context, the present paper permeates benefits from rank minimization to scalable imputation of missing data, via tracking low-dimensional subspaces and unraveling latent (possibly multi-way) structure from incomplete streaming data. For low-rank matrix data, a subspace estimator is proposed based on an exponentially-weighted least-squares criterion regularized with the nuclear norm. After recasting the non-separable nuclear norm into a form amenable to online optimization, real-time algorithms with complementary strengths are developed and their convergence is established under simplifying technical assumptions. In a stationary setting, the asymptotic estimates obtained offer the well-documented performance guarantees of the batch nuclear-norm regularized estimator. Under the same unifying framework, a novel online (adaptive) algorithm is developed to obtain multi-way decompositions of low-rank tensors with missing entries, and perform imputation as a byproduct. Simulated tests with both synthetic as well as real Internet and cardiac magnetic resonance imagery (MRI) data confirm the efficacy of the proposed algorithms, and their superior performance relative to state-of-the-art alternatives.
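
    The recasting of the nuclear norm mentioned in this abstract can be illustrated with a well-known identity. This is a sketch under the assumption that the abstract refers to the standard bilinear (factorized) characterization; the symbols X, L, Q, the dimensions P and T, and the rank bound \rho are introduced here for illustration and do not appear in the abstract.

```latex
\|X\|_* \;=\; \min_{\{L,\,Q\}\,:\,X = L Q^{\top}} \; \tfrac{1}{2}\left( \|L\|_F^2 + \|Q\|_F^2 \right),
\qquad L \in \mathbb{R}^{P \times \rho},\; Q \in \mathbb{R}^{T \times \rho},\; \rho \ge \operatorname{rank}(X).
```

    Since \|Q\|_F^2 is a sum over the rows of Q, and each row of Q corresponds to one column of X, a criterion of this form separates across columns (for example, time samples). That separability is what makes it possible to update one datum at a time.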

  13. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

    PubMed Central

    2012-01-01

    Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions Before the pepper genome sequence became available, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project. PMID:23110314
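
    The distance-based SNP filter described in this abstract (at least 50 bp from any intron-exon junction and from flanking SNPs) can be sketched as follows. This is an illustration only, not the authors' in-house Perl script: the function name filter_snps, the single-contig coordinate lists, and the choice to drop both members of a close SNP pair are assumptions.

```python
def filter_snps(snp_positions, junction_positions, min_distance=50):
    """Keep SNPs at least min_distance bp from junctions and other SNPs.

    Hypothetical helper: positions are integer coordinates on one contig.
    """
    snps = sorted(snp_positions)
    kept = []
    for idx, pos in enumerate(snps):
        near_junction = any(
            abs(pos - junction) < min_distance
            for junction in junction_positions)
        # Only the immediate neighbours need checking in a sorted list.
        near_left = idx > 0 and pos - snps[idx - 1] < min_distance
        near_right = (idx + 1 < len(snps)
                      and snps[idx + 1] - pos < min_distance)
        if not (near_junction or near_left or near_right):
            kept.append(pos)
    return kept


# Toy contig: SNPs at 100 and 120 flank each other; 480 is near junction 500.
kept_snps = filter_snps([100, 120, 300, 480, 700], [500])
```

    On this toy contig only the SNPs at positions 300 and 700 pass the filter.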

  14. Collaborative development of SNPs for cotton research, introgression, MAS and breeding

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Extensive use of genome-wide analyses requires that molecular markers be highly abundant, informative and, once developed, extremely cost-effective to use, such as single-nucleotide polymorphisms (SNPs). The efforts toward development of cotton SNPs have been few and small-scale. The novel cotton ...

  15. Scoring the collective effects of SNPs: association of minor alleles with complex traits in model organisms.

    PubMed

    Yuan, DeJian; Zhu, ZuoBin; Tan, XiaoHua; Liang, Jie; Zeng, Chen; Zhang, JieGen; Chen, Jun; Ma, Long; Dogan, Ayca; Brockmann, Gudrun; Goldmann, Oliver; Medina, Eva; Rice, Amanda D; Moyer, Richard W; Man, Xian; Yi, Ke; Li, YanKe; Lu, Qing; Huang, YiMin; Huang, Shi

    2014-09-01

    It has long been assumed that most parts of a genome and most genetic variations or SNPs are non-functional with regard to reproductive fitness. However, the collective effects of SNPs have yet to be examined by experimental science. We here developed a novel approach to examine the relationship between traits and the total amount of SNPs in panels of genetic reference populations. We identified the minor alleles (MAs) in each panel and the MA content (MAC) that each inbred strain carried for a set of SNPs with genotypes determined in these panels. MAC was nearly linearly linked to quantitative variations in numerous traits in model organisms, including life span, tumor susceptibility, learning and memory, sensitivity to alcohol and anti-psychotic drugs, and two correlated traits, poor reproductive fitness and strong immunity. These results suggest that the collective effects of SNPs are functional and do affect reproductive fitness. PMID:25104319
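
    The minor allele content (MAC) described in this abstract can be sketched in a few lines. This is a simplified illustration rather than the authors' procedure: it assumes fully inbred (homozygous) strains coded as one allele letter per SNP, skips SNPs that are monomorphic, tied, or multi-allelic in the panel, and uses hypothetical names (minor_alleles, minor_allele_content, panel).

```python
from collections import Counter


def minor_alleles(panel):
    """Return the minor allele per SNP, or None if it is not well defined."""
    minors = []
    for column in zip(*panel.values()):
        counts = Counter(column).most_common()
        # Require exactly two alleles with unequal counts.
        if len(counts) == 2 and counts[0][1] != counts[1][1]:
            minors.append(counts[1][0])
        else:
            minors.append(None)
    return minors


def minor_allele_content(panel):
    """Fraction of informative SNPs at which each strain carries the MA."""
    minors = minor_alleles(panel)
    informative = [i for i, allele in enumerate(minors) if allele is not None]
    if not informative:
        raise ValueError("no SNP with a well-defined minor allele")
    return {
        strain: sum(alleles[i] == minors[i] for i in informative)
        / len(informative)
        for strain, alleles in panel.items()
    }


# Toy panel: five inbred strains typed at four SNPs (the third is monomorphic).
panel = {
    "S1": "AAGT",
    "S2": "ACGT",
    "S3": "AAGC",
    "S4": "GAGT",
    "S5": "AAGT",
}
mac = minor_allele_content(panel)
```

    Each strain's MAC could then be compared against a quantitative trait measured in the same strains.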

  16. Discovery and evaluation of single nucleotide polymorphisms (SNPs) for Haliotis midae: a targeted EST approach.

    PubMed

    Bester, A E; Roodt-Wilding, R; Whitaker, H A

    2008-06-01

    In this study, we describe the first set of SNP markers for the South African abalone, Haliotis midae. A cDNA library was constructed from which ESTs were selected for the screening of SNPs. The observed frequency of SNPs in this species was estimated at one every 185 bp. When characterized in wild-caught abalone, the minor allele frequencies and F(ST) estimates for every SNP indicated that these markers may potentially be useful for population analysis, parentage assignment and linkage mapping in Haliotis midae. No linkage disequilibrium was observed between SNPs originating from different EST sequences. These SNPs, together with additional SNPs currently being developed, will provide a useful complementary set of markers to the currently available genetic markers in abalone. PMID:18454808

  17. Common SNPs explain a large proportion of heritability for human height

    PubMed Central

    Yang, Jian; Benyamin, Beben; McEvoy, Brian P; Gordon, Scott; Henders, Anjali K; Nyholt, Dale R; Madden, Pamela A; Heath, Andrew C; Martin, Nicholas G; Montgomery, Grant W; Goddard, Michael E; Visscher, Peter M

    2011-01-01

    Single nucleotide polymorphisms (SNPs) discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method by simulations based upon the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium (LD) between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency (MAF) than the SNPs explored to date. PMID:20562875

  18. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    SciTech Connect

    Bogush, I.P.; Bushinsky, A.V.; Galkin, A.Y.; Serbin, V.I.; Zhabotinsky, E.E.

    1991-01-01

    Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal-state requirements differ markedly between units, and meeting them requires both an appropriate arrangement of the units in the SNPS power generating module and the use of definite control algorithms and special thermal regulation and protection. Computer codes that define the thermal transient performance of the liquid metal loop and main units were developed as the calculation basis for the required SNPS "Topaz" unit thermal state. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, tests of mock-ups, power tests of ground SNPS prototypes and flight tests of two SNPS "Topaz".

  19. Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation

    NASA Astrophysics Data System (ADS)

    Bogush, Igor P.; Bushinsky, Alexander V.; Galkin, Anatoly Ya.; Serbin, Victor I.; Zhabotinsky, Evgeny E.

    1991-01-01

    Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal-state requirements differ markedly between units, and meeting them requires both an appropriate arrangement of the units in the SNPS power generating module and the use of definite control algorithms and special thermal regulation and protection. Computer codes that define the thermal transient performance of the liquid metal loop and main units were developed as the calculation basis for the required SNPS "Topaz" unit thermal state. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, tests of mock-ups, power tests of ground SNPS prototypes and flight tests of two SNPS "Topaz".

  20. RNA-Seq Uncovers SNPs and Alternative Splicing Events in Asian Lotus (Nelumbo nucifera)

    PubMed Central

    Yang, Mei; Xu, Liming; Liu, Yanling; Yang, Pingfang

    2015-01-01

    RNA-Seq is an efficient way to comprehensively identify single nucleotide polymorphisms (SNPs) and alternative splicing (AS) events from the expressed genes. In this study, we conducted transcriptome sequencing of four Asian lotus (Nelumbo nucifera) cultivars using Illumina HiSeq2000 platform to identify SNPs and AS events in lotus. A total of 505 million pair-end RNA-Seq reads were generated from four cultivars, of which 86% were mapped to the lotus reference genome. Using the four sets of data together, a total of 357,689 putative SNPs were identified with an average density of one SNP per 2.2 kb. These SNPs were located in 1,253 scaffolds and 15,016 expressed genes. A/G and C/T were the two major types of SNPs in the Asian lotus transcriptome. In parallel, a total of 177,540 AS events were detected in the four cultivars and were distributed in 64% of the expressed genes of lotus. The predominant type of AS events was alternative 5’ first exon, which accounted for 41.2% of all the observed AS events, and exon skipping only accounted for 4.3% of all AS. Gene Ontology analysis was conducted to analyze the function of the genes containing SNPs and AS events. Validation of selected SNPs and AS events revealed that 74% of SNPs and 80% of AS events were reliable, which indicates that RNA-Seq is an efficient approach to uncover gene-associated SNPs and AS events. A large number of SNPs and AS events identified in our study will facilitate further genetic and functional genomics research in lotus. PMID:25928215

  1. RNA-Seq Uncovers SNPs and Alternative Splicing Events in Asian Lotus (Nelumbo nucifera).

    PubMed

    Yang, Mei; Xu, Liming; Liu, Yanling; Yang, Pingfang

    2015-01-01

    RNA-Seq is an efficient way to comprehensively identify single nucleotide polymorphisms (SNPs) and alternative splicing (AS) events from the expressed genes. In this study, we conducted transcriptome sequencing of four Asian lotus (Nelumbo nucifera) cultivars using Illumina HiSeq2000 platform to identify SNPs and AS events in lotus. A total of 505 million pair-end RNA-Seq reads were generated from four cultivars, of which 86% were mapped to the lotus reference genome. Using the four sets of data together, a total of 357,689 putative SNPs were identified with an average density of one SNP per 2.2 kb. These SNPs were located in 1,253 scaffolds and 15,016 expressed genes. A/G and C/T were the two major types of SNPs in the Asian lotus transcriptome. In parallel, a total of 177,540 AS events were detected in the four cultivars and were distributed in 64% of the expressed genes of lotus. The predominant type of AS events was alternative 5' first exon, which accounted for 41.2% of all the observed AS events, and exon skipping only accounted for 4.3% of all AS. Gene Ontology analysis was conducted to analyze the function of the genes containing SNPs and AS events. Validation of selected SNPs and AS events revealed that 74% of SNPs and 80% of AS events were reliable, which indicates that RNA-Seq is an efficient approach to uncover gene-associated SNPs and AS events. A large number of SNPs and AS events identified in our study will facilitate further genetic and functional genomics research in lotus. PMID:25928215

  2. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.

    PubMed

    Yang, Jian; Bakshi, Andrew; Zhu, Zhihong; Hemani, Gibran; Vinkhuyzen, Anna A E; Lee, Sang Hong; Robinson, Matthew R; Perry, John R B; Nolte, Ilja M; van Vliet-Ostaptchouk, Jana V; Snieder, Harold; Esko, Tonu; Milani, Lili; Mägi, Reedik; Metspalu, Andres; Hamsten, Anders; Magnusson, Patrik K E; Pedersen, Nancy L; Ingelsson, Erik; Soranzo, Nicole; Keller, Matthew C; Wray, Naomi R; Goddard, Michael E; Visscher, Peter M

    2015-10-01

    We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices. PMID:26323059

  3. 7 CFR 3017.630 - May the Department of Agriculture impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false May the Department of Agriculture impute conduct of one person to another? 3017.630 Section 3017.630 Agriculture Regulations of the Department of Agriculture (Continued) OFFICE OF THE CHIEF FINANCIAL OFFICER, DEPARTMENT OF AGRICULTURE...

  4. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    PubMed Central

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803

  5. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index

    PubMed Central

    Yang, Jian; Bakshi, Andrew; Zhu, Zhihong; Hemani, Gibran; Vinkhuyzen, Anna A.E.; Lee, Sang Hong; Robinson, Matthew R.; Perry, John R.B.; Nolte, Ilja M.; van Vliet-Ostaptchouk, Jana V.; Snieder, Harold; Esko, Tonu; Milani, Lili; Mägi, Reedik; Metspalu, Andres; Hamsten, Anders; Magnusson, Patrik K.E.; Pedersen, Nancy L.; Ingelsson, Erik; Soranzo, Nicole; Keller, Matthew C.; Wray, Naomi R.; Goddard, Michael E.; Visscher, Peter M.

    2015-01-01

    We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing (WGS) data. We demonstrate using simulations based on WGS data that ~97% and ~68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ~17M imputed variants explain 56% (s.e. = 2.3%) of variance for height and 27% (s.e. = 2.5%) for body mass index (BMI), and find evidence that height- and BMI-associated variants have been under natural selection. Considering imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60–70% for height and 30–40% for BMI. Therefore, missing heritability is small for both traits. For further gene discovery of complex traits, a design with SNP arrays followed by imputation is more cost-effective than WGS at current prices. PMID:26323059

  6. Cautions on the Use of Multiple Imputation When Selecting Between Latent Categorical versus Continuous Models for Psychological Constructs.

    PubMed

    Sterba, Sonya K

    2016-01-01

    Clinical psychology researchers studying adolescents and young adults long have been interested in characterizing the latent categorical (classes/profiles) versus continuous (factors) nature of psychological syndromes. To inform this debate, researchers sometimes compare the fit of finite mixture versus factor analysis models to symptom data. This study explains and evaluates how missing data handling methods can impact results of this important model fit comparison. Via simulation, we assess three missing data-handling methods previously recommended to researchers fitting these models: multiple imputation using a saturated multivariate normal imputation model, multiple imputation using a hypothesized model, or full information maximum likelihood using the EM algorithm (FIML-EM). Results show that, under certain conditions, the method used to handle missing data can interfere with clinical psychologists' ability to accurately discriminate latent classes from continua. For instance, certain imputation methods increase the chance of selecting latent continua when latent classes truly exist. FIML-EM performed best overall. Recommendations for practice are discussed. PMID:25491166

  7. Imputation of Test Scores in the National Education Longitudinal Study of 1988 (NELS:88). Working Paper Series.

    ERIC Educational Resources Information Center

    Bokossa, Maxime C.; Huang, Gary G.

    This report describes the imputation procedures used to deal with missing data in the National Education Longitudinal Study of 1988 (NELS:88), the only current National Center for Education Statistics (NCES) dataset that contains scores from cognitive tests given the same set of students at multiple time points. As is inevitable, cognitive test…

  8. 34 CFR 85.630 - May the Department of Education impute conduct of one person to another?

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 34 Education 1 2011-07-01 2011-07-01 false May the Department of Education impute conduct of one person to another? 85.630 Section 85.630 Education Office of the Secretary, Department of Education GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 85.630 May the Department...

  9. Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...

  10. 2 CFR 180.630 - May a Federal agency impute the conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 2 Grants and Agreements 1 2010-01-01 2010-01-01 false May a Federal agency impute the conduct of one person to another? 180.630 Section 180.630 Grants and Agreements OFFICE OF MANAGEMENT AND BUDGET GOVERNMENTWIDE GUIDANCE FOR GRANTS AND AGREEMENTS Reserved OMB GUIDELINES TO AGENCIES ON GOVERNMENTWIDE...

  11. High-throughput SNPs for all: genotyping-in-thousands.

    PubMed

    Pavey, Scott A

    2015-07-01

    Understanding the genetic structure of species is essential for conservation. It is only with this information that managers, academics, user groups and land-use planners can understand the spatial scale of migration and local adaptation, source-sink dynamics and effective population size. Such information is essential for a multitude of applications including delineating management units, balancing management priorities, discovering cryptic species and implementing captive breeding programmes. Species can range from locally adapted by hundreds of metres (Pavey et al.) to complete species panmixia (Côté et al.). Even more remarkable is that this essential information can be obtained without fully sequenced or annotated genomes, but from mere (putatively) nonfunctional variants. First with allozymes, then microsatellites and now SNPs, this neutral genetic variation carries a wealth of information about migration and drift. For many of us, it may be somewhat difficult to remember our understanding of species conservation before the widespread usage of these useful tools. However, most species on earth have yet to give us that 'peek under the curtain'. With the current diversity on earth estimated to be nearly 9 million species (Mora et al.), we have a long way to go for a comprehensive meta-phylogeographic understanding. A method presented in this issue by Campbell and colleagues (Campbell et al.) is a tool that will accelerate the pace in this area. Genotyping-in-thousands (GT-seq) leverages recent advancements in sequencing technology to save many hours and dollars over previous methods to generate this important neutral genetic information. PMID:26095005

  12. Sigma Filter

    NASA Technical Reports Server (NTRS)

    Balgovind, R. C.

    1985-01-01

    The GLA Fourth-Order model is needed to smooth the topography. This is to remove the Gibbs phenomenon. The Gibbs phenomenon occurs whenever we truncate a Fourier Series. The Sigma factors were introduced to reduce the Gibbs phenomenon. It is found that the smooth Fourier series is nothing but the original Fourier series with its coefficients multiplied by corresponding sigma factors. This operator can be applied many times to obtain high order sigma filtered field and is easily applicable using FFT. It is found that this filter is beneficial in deriving the topography.
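
    The effect of sigma factors on a truncated Fourier series can be sketched with a textbook example. The code below assumes the standard Lanczos form sigma_k = sin(pi k / N) / (pi k / N), which the abstract does not spell out, and applies it to the partial Fourier sum of a unit square wave rather than to model topography. The names sigma_factor, square_wave_partial_sum and max_on_grid are hypothetical.

```python
import math


def sigma_factor(k, n_trunc):
    """Lanczos sigma factor for harmonic k with truncation number n_trunc."""
    if k == 0:
        return 1.0
    x = math.pi * k / n_trunc
    return math.sin(x) / x


def square_wave_partial_sum(x, n_trunc, power=0):
    """Partial Fourier sum of a unit square wave using harmonics k < n_trunc.

    power = 0 gives the raw truncated series; power = p multiplies each
    coefficient by sigma_k ** p, i.e. applies the filter p times.
    """
    total = 0.0
    for k in range(1, n_trunc, 2):  # a square wave has odd harmonics only
        coeff = 4.0 / (math.pi * k)
        total += (sigma_factor(k, n_trunc) ** power) * coeff * math.sin(k * x)
    return total


def max_on_grid(n_trunc, power, samples=2000):
    """Largest value of the partial sum on a grid over (0, pi)."""
    xs = [math.pi * (i + 0.5) / samples for i in range(samples)]
    return max(square_wave_partial_sum(x, n_trunc, power) for x in xs)


raw_peak = max_on_grid(32, power=0)       # shows the Gibbs overshoot
filtered_peak = max_on_grid(32, power=1)  # overshoot is strongly reduced
```

    The true square wave has a maximum of 1, so the amount by which each peak exceeds 1 measures the overshoot near the jump. Raising the sigma factors to a higher power corresponds to applying the operator several times, as the abstract describes. For gridded data the same multiplication would be applied to FFT coefficients.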

  13. Water Filters

    NASA Technical Reports Server (NTRS)

    1988-01-01

    Seeking to find a more effective method of filtering potable water that was highly contaminated, Mike Pedersen, founder of Western Water International, learned that NASA had conducted extensive research in methods of purifying water on board manned spacecraft. The key is Aquaspace Compound, a proprietary WWI formula that scientifically blends various types of granular activated charcoal with other active and inert ingredients. Aquaspace systems remove some substances, such as chlorine, by atomic adsorption; certain organic chemicals by mechanical filtration; and still others by catalytic reaction. Aquaspace filters are finding wide acceptance in industrial, commercial, residential and recreational applications in the U.S. and abroad.

  14. 1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data.

    PubMed

    Huang, Jie; Ellinghaus, David; Franke, Andre; Howie, Bryan; Li, Yun

    2012-07-01

    We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two 'missed' disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10^-16. The new lead SNP for the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, 'missed' because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 genomes-based imputation. PMID:22293688
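
    An imputed dosage, as used for association testing in this abstract, is commonly defined as the expected count of one allele given the imputed genotype probabilities. The sketch below assumes three posterior probabilities per sample and SNP; that input layout and the function name expected_dosage are assumptions made for illustration, not details taken from the study.

```python
def expected_dosage(p_hom_ref, p_het, p_hom_alt, tolerance=1e-6):
    """Expected number of alternate alleles (0 to 2) for one sample and SNP.

    Hypothetical helper: the three arguments are the imputed probabilities
    of carrying 0, 1 and 2 copies of the alternate allele.
    """
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > tolerance:
        raise ValueError("genotype probabilities must sum to 1")
    return p_het + 2.0 * p_hom_alt


# A confidently imputed heterozygote and a more uncertain call.
confident = expected_dosage(0.0, 1.0, 0.0)
uncertain = expected_dosage(0.1, 0.3, 0.6)
```

    Unlike a hard genotype call, the dosage keeps imputation uncertainty in the regression covariate: the second sample contributes 1.5 rather than being rounded to 2.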

  15. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    PubMed Central

    2011-01-01

    Background Use of missing genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, China populations in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/. PMID:21609440

  16. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    SciTech Connect

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.; Loots, Gabriela G.; Houston, Kathryn A.; Dubchak, Inna; Speed, Terence P.; Rubin, Edward M.

    2002-01-01

    Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

  17. Advanced statistics: missing data in clinical research--part 2: multiple imputation.

    PubMed

    Newgard, Craig D; Haukoos, Jason S

    2007-07-01

    In part 1 of this series, the authors describe the importance of incomplete data in clinical research, and provide a conceptual framework for handling incomplete data by describing typical mechanisms and patterns of censoring, and detailing a variety of relatively simple methods and their limitations. In part 2, the authors will explore multiple imputation (MI), a more sophisticated and valid method for handling incomplete data in clinical research. This article will provide a detailed conceptual framework for MI, comparative examples of MI versus naive methods for handling incomplete data (and how different methods may impact subsequent study results), plus a practical user's guide to implementing MI, including sample statistical software MI code and a deidentified precoded database for use with the sample code. PMID:17595237

  18. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    PubMed Central

    Artigas, María Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria; Viñuela, Ana; Völzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P<5 × 10(-8)) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  19. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation.

    PubMed

    Soler Artigas, María; Wain, Louise V; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R; Grallert, Harald; Hammond, Chris J; Harris, Sarah E; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W; Navarro, Pau; Nickle, David C; Padmanabhan, Sandosh; Raitakari, Olli T; Ried, Janina S; Ripatti, Samuli; Schulz, Holger; Scott, Robert A; Sin, Don D; Starr, John M; Viñuela, Ana; Völzke, Henry; Wild, Sarah H; Wright, Alan F; Zemunik, Tatijana; Jarvis, Deborah L; Spector, Tim D; Evans, David M; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J; Karrasch, Stefan; Probst-Hensch, Nicole M; Heinrich, Joachim; Stubbe, Beate; Wilson, James F; Wareham, Nicholas J; James, Alan L; Morris, Andrew P; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P; Hall, Ian P; Tobin, Martin D

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P<5 × 10(-8)) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  20. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    PubMed Central

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  1. Impute DC link (IDCL) cell based power converters and control thereof

    DOEpatents

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with a similarly rated back-to-back inverter system.

  2. Notch filter

    NASA Technical Reports Server (NTRS)

    Shelton, G. B. (Inventor)

    1977-01-01

    A notch filter for the selective attenuation of a narrow band of frequencies out of a larger band was developed. A helical resonator is connected to an input circuit and an output circuit through discrete and equal capacitors, and a resistor is connected between the input and the output circuits.

  3. Phosphorus Filter

    Tom Kehler, fishery biologist at the U.S. Fish and Wildlife Service's Northeast Fishery Center in Lamar, Pennsylvania, checks the flow rate of water leaving a phosphorus filter column. The USGS has pioneered a new use for acid mine drainage residuals that are currently a disposal challenge, usi...

  4. Analysis of partially observed clustered data using generalized estimating equations and multiple imputation

    PubMed Central

    Aloisio, Kathryn M.; Swanson, Sonja A.; Micali, Nadia; Field, Alison; Horton, Nicholas J.

    2015-01-01

    Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject’s symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options such as weighted generalized estimating equations (WEEs) are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Previously, estimation of partially observed clustered data was computationally challenging; however, recent developments in Stata have made it feasible in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents reported by parents and adolescents as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991–92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness. PMID:25642154

  5. Handling Missing Data in Matched Case-Control Studies Using Multiple Imputation

    PubMed Central

    Seaman, Shaun R.; Keogh, Ruth H.

    2016-01-01

    SUMMARY Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin’s Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data. PMID:26237003
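
    The abstract above notes that variance estimates for each MI method can be obtained with Rubin's Rules. As a minimal sketch of that pooling step only (not the authors' code), the Python below combines per-imputation estimates and their squared standard errors. The function name `pool_rubin` and the example numbers are hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and squared standard errors (Rubin's Rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)


# Hypothetical log odds ratios from three imputed data sets (illustration only).
log_odds_ratios = [0.40, 0.50, 0.60]
squared_std_errors = [0.04, 0.04, 0.04]
pooled_estimate, pooled_se = pool_rubin(log_odds_ratios, squared_std_errors)
print(pooled_estimate, pooled_se)
```

    The total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance, so the pooled standard error reflects uncertainty due to the missing data.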

  6. Screening and Evaluation of Deleterious SNPs in APOE Gene of Alzheimer's Disease

    PubMed Central

    Masoodi, Tariq Ahmad; Al Shammari, Sulaiman A.; Al-Muammar, May N.; Alhamdan, Adel A.

    2012-01-01

    Introduction. Apolipoprotein E (APOE) is an important risk factor for Alzheimer's disease (AD) and is present in 30–50% of patients who develop late-onset AD. Several single-nucleotide polymorphisms (SNPs) are present in the APOE gene which act as the biomarkers for exploring the genetic basis of this disease. The objective of this study is to identify deleterious nsSNPs associated with the APOE gene. Methods. The SNPs were retrieved from dbSNP. Using I-Mutant, protein stability change was calculated. The potentially functional nonsynonymous (ns) SNPs and their effect on protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the APOE protein was evaluated by using Swiss PDB viewer and NOMAD-Ref server. Results. Six nsSNPs were found to be least stable by I-Mutant 2.0 with DDG value of >−1.0. Four nsSNPs showed a highly deleterious tolerance index score of 0.00. Nine nsSNPs were found to be probably damaging with position-specific independent counts (PSICs) score of ≥2.0. Seven nsSNPs were found to be highly polymorphic with a risk score of 3-4. The total energies and root-mean-square deviation (RMSD) values were higher for three mutant-type structures compared to the native modeled structure. Conclusion. We concluded that three nsSNPs, namely, rs11542041, rs11542040, and rs11542034, are potentially functional polymorphisms. PMID:22530123

  7. Screening and Evaluation of Deleterious SNPs in APOE Gene of Alzheimer's Disease.

    PubMed

    Masoodi, Tariq Ahmad; Al Shammari, Sulaiman A; Al-Muammar, May N; Alhamdan, Adel A

    2012-01-01

    Introduction. Apolipoprotein E (APOE) is an important risk factor for Alzheimer's disease (AD) and is present in 30-50% of patients who develop late-onset AD. Several single-nucleotide polymorphisms (SNPs) are present in the APOE gene which act as the biomarkers for exploring the genetic basis of this disease. The objective of this study is to identify deleterious nsSNPs associated with the APOE gene. Methods. The SNPs were retrieved from dbSNP. Using I-Mutant, protein stability change was calculated. The potentially functional nonsynonymous (ns) SNPs and their effect on protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the APOE protein was evaluated by using Swiss PDB viewer and NOMAD-Ref server. Results. Six nsSNPs were found to be least stable by I-Mutant 2.0 with DDG value of >-1.0. Four nsSNPs showed a highly deleterious tolerance index score of 0.00. Nine nsSNPs were found to be probably damaging with position-specific independent counts (PSICs) score of ≥2.0. Seven nsSNPs were found to be highly polymorphic with a risk score of 3-4. The total energies and root-mean-square deviation (RMSD) values were higher for three mutant-type structures compared to the native modeled structure. Conclusion. We concluded that three nsSNPs, namely, rs11542041, rs11542040, and rs11542034, are potentially functional polymorphisms. PMID:22530123

  8. Plasmonic filters.

    SciTech Connect

    Passmore, Brandon Scott; Shaner, Eric Arthur; Barrick, Todd A.

    2009-09-01

    Metal films perforated with subwavelength hole arrays have been shown to demonstrate an effect known as Extraordinary Transmission (EOT). In EOT devices, optical transmission passbands arise that can have up to 90% transmission and a bandwidth that is only a few percent of the designed center wavelength. By placing a tunable dielectric in proximity to the EOT mesh, one can tune the center frequency of the passband. We have demonstrated over 1 micron of passive tuning in structures designed for an 11 micron center wavelength. If a suitable midwave (3-5 micron) tunable dielectric (perhaps BaTiO3) were integrated with an EOT mesh designed for midwave operation, it is possible that a fast, voltage tunable, low temperature filter solution could be demonstrated with a several hundred nanometer passband. Such an element could, for example, replace certain components in a filter wheel solution.

  9. Water Filter

    NASA Astrophysics Data System (ADS)

    1982-01-01

    A compact, lightweight electrolytic water sterilizer, available through Ambassador Marketing, generates silver ions in concentrations of 50 to 100 parts per billion in a water flow system. The silver ions serve as an effective bactericide/deodorizer. Tap water passes through a filtering element of silver that has been chemically plated onto activated carbon. The silver inhibits bacterial growth and the activated carbon removes objectionable tastes and odors caused by the addition of chlorine and other chemicals in the municipal water supply. The three models available are a kitchen unit, a "Tourister" unit for portable use while traveling and a refrigerator unit that attaches to the ice cube water line. A filter will treat 5,000 to 10,000 gallons of water.

  10. Eyeglass Filters

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Biomedical Optical Company of America's Suntiger lenses eliminate more than 99% of harmful light wavelengths. The NASA-derived lenses make scenes more vivid in color and also increase the wearer's visual acuity. Distant objects, even on hazy days, appear crisp and clear; mountains seem closer, glare is greatly reduced, clouds stand out. Daytime use protects the retina from bleaching in bright light, thus improving night vision. Filtering helps prevent a variety of eye disorders, in particular cataracts and age-related macular degeneration.

  11. Imputation of the Rare HOXB13 G84E Mutation and Cancer Risk in a Large Population-Based Cohort

    PubMed Central

    Hoffmann, Thomas J.; Sakoda, Lori C.; Shen, Ling; Jorgenson, Eric; Habel, Laurel A.; Liu, Jinghua; Kvale, Mark N.; Asgari, Maryam M.; Banda, Yambazi; Corley, Douglas; Kushi, Lawrence H.; Quesenberry, Charles P.; Schaefer, Catherine; Van Den Eeden, Stephen K.; Risch, Neil; Witte, John S.

    2015-01-01

    An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r(2) = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10(-12)). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10(-4)) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects. PMID:25629170

  12. Evaluation of developed low-density genotype panels for imputation to higher density in independent dairy and beef cattle populations.

    PubMed

    Judge, M M; Kearney, J F; McClure, M C; Sleator, R D; Berry, D P

    2016-03-01

    The objective of this study was to develop, using alternative algorithms, low-density SNP genotyping panels (384 to 12,000 SNP), which can be accurately imputed to higher-density panels across independent cattle populations. Single nucleotide polymorphisms were selected based on genomic characteristics (i.e., linkage disequilibrium [LD], minor allele frequency [MAF], and genomic distance) in a population of 1,267 Holstein-Friesian animals genotyped on the Illumina Bovine50 Beadchip (54,001 SNP). Single nucleotide polymorphism selection methods included 1) random; 2) equidistant location; 3) combination of SNP MAF and LD structure while maintaining relatively equal genomic distance between adjacent SNP; 4) a combination of high MAF, genomic distance between selected and candidate SNP, and correlation between genotypes of selected and candidate SNP; and 5) a machine learning algorithm. The panels were validated separately in 1) a population of 750 Holstein-Friesian animals with masked genotypes to reflect the lower-density SNP densities under investigation (1,249 animals with complete genotypes included in reference population) and 2) a population of 359 Limousin and Charolais cattle with high (777,962 SNP)-density genotypes (1,918 animals with complete genotypes included in the reference population). Irrespective of SNP selection method, imputation accuracy in both populations improved at a diminishing rate as the number of SNP included in the lower-density genotype panel increased. Additionally, the variability in mean imputation accuracy per individual decreased as the panel density increased. The SNP selection method had a major impact on the mean allele concordance rate, although its impact diminished as the panel density increased. Imputation accuracy for SNP selected using a combination of high SNP MAF, LD structure, and relatively equal genomic distance between SNP outperformed all other selection methods in densities < 12,000 SNP. 
    Using this method of SNP selection, the correlation between the imputed and actual genotypes for the 3,000 SNP panel was 0.90 and 0.96 when applied to the beef and dairy populations, respectively; the respective correlations for the 6,000 SNP panel were 0.95 and 0.98. It is necessary to include between 3,000 and 6,000 SNP in a low-density panel to achieve adequate imputation accuracy to either medium density (approximately 50,000 SNP in the dairy population) or high density (approximately 700,000 SNP in the beef population) across diverse and independent populations. PMID:27065257
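
    The abstract above reports imputation accuracy as a mean allele concordance rate and as the correlation between imputed and actual genotypes. The sketch below shows one common way to compute both for a single animal, assuming genotypes are coded as allele counts (0, 1 or 2). That coding, the function names and the example genotypes are assumptions made for illustration and are not taken from the study.

```python
from math import sqrt


def allele_concordance(true_g, imputed_g):
    """Fraction of alleles imputed correctly, with genotypes coded 0/1/2."""
    if not true_g or len(true_g) != len(imputed_g):
        raise ValueError("genotype lists must be non-empty and of equal length")
    # Each genotype carries two alleles; |difference| alleles are wrong.
    matched = sum(2 - abs(t - i) for t, i in zip(true_g, imputed_g))
    return matched / (2 * len(true_g))


def genotype_correlation(true_g, imputed_g):
    """Pearson correlation between actual and imputed genotype codes."""
    n = len(true_g)
    if n == 0 or n != len(imputed_g):
        raise ValueError("genotype lists must be non-empty and of equal length")
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        raise ValueError("correlation is undefined for constant genotypes")
    return cov / sqrt(var_t * var_i)


# Hypothetical masked SNP genotypes for one animal (illustration only).
actual = [0, 1, 2, 1, 0, 2]
imputed = [0, 1, 2, 2, 0, 1]
print(allele_concordance(actual, imputed))
print(genotype_correlation(actual, imputed))
```

    Concordance counts matching alleles directly, whereas correlation is less sensitive to allele frequency, which may be one reason studies report both.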

  13. SNP-Seek database of SNPs derived from 3000 rice genomes.

    PubMed

    Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L

    2015-01-01

    We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of an Oracle database holding close to 60 billion rows of SNP genotypes (20 M SNPs × 3 K rice lines) and a web interface for convenient querying. The database allows quick retrieval of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in the JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

  14. Chromosome 9p21 SNPs Associated with Multiple Disease Phenotypes Correlate with ANRIL Expression

    PubMed Central

    Cunnington, Michael S.; Santibanez Koref, Mauro; Mayosi, Bongani M.; Burn, John; Keavney, Bernard

    2010-01-01

    Single nucleotide polymorphisms (SNPs) on chromosome 9p21 are associated with coronary artery disease, diabetes, and multiple cancers. Risk SNPs are mainly non-coding, suggesting that they influence expression and may act in cis. We examined the association between 56 SNPs in this region and peripheral blood expression of the three nearest genes CDKN2A, CDKN2B, and ANRIL using total and allelic expression in two populations of healthy volunteers: 177 British Caucasians and 310 mixed-ancestry South Africans. Total expression of the three genes was correlated (P<0.05), suggesting that they are co-regulated. SNP associations mapped by allelic and total expression were similar (r = 0.97, P = 4.8×10(-99)), but the power to detect effects was greater for allelic expression. The proportion of expression variance attributable to cis-acting effects was 8% for CDKN2A, 5% for CDKN2B, and 20% for ANRIL. SNP associations were similar in the two populations (r = 0.94, P = 10(-72)). Multiple SNPs were independently associated with expression of each gene (P<0.05 after correction for multiple testing), suggesting that several sites may modulate disease susceptibility. Individual SNPs correlated with changes in expression up to 1.4-fold for CDKN2A, 1.3-fold for CDKN2B, and 2-fold for ANRIL. Risk SNPs for coronary disease, stroke, diabetes, melanoma, and glioma were all associated with allelic expression of ANRIL (all P<0.05 after correction for multiple testing), while association with the other two genes was only detectable for some risk SNPs. SNPs had an inverse effect on ANRIL and CDKN2B expression, supporting a role of antisense transcription in CDKN2B regulation. Our study suggests that modulation of ANRIL expression mediates susceptibility to several important human diseases. PMID:20386740

  15. SNPRanker: a tool for identification and scoring of SNPs associated to target genes.

    PubMed

    Calabria, Andrea; Mosca, Ettore; Viti, Federica; Merelli, Ivan; Milanesi, Luciano

    2010-01-01

    The identification of genes and SNPs involved in human diseases remains a challenge. Many public resources, databases and applications collect biological data and perform annotations, increasing the global biological knowledge. The need for SNP prioritization is emerging with the development of new high-throughput genotyping technologies, which allow the development of customized disease-oriented chips. Therefore, given a list of genes related to a specific biological process or disease as input, a crucial issue is finding the most relevant SNPs to analyse. The selection of these SNPs may rely on the relevant a-priori knowledge of biomolecular features characterising all the annotated SNPs and genes of the provided list. The bioinformatics approach described here retrieves a ranked list of significant SNPs from a set of input genes, such as candidate genes associated with a specific disease. The system enriches the gene set by including other genes, associated with the original ones by ontological similarity evaluation. The proposed method relies on the integration of data from public resources in a vertical perspective (from genomics to systems biology data), the evaluation of features from biomolecular knowledge, the computation of partial scores for SNPs and finally their ranking, relying on their global score. Our approach has been implemented in a web-based tool called SNPRanker, which is accessible at the URL http://www.itb.cnr.it/snpranker . An interesting application of the presented system is the prioritisation of SNPs related to genes involved in specific pathologies, in order to produce custom arrays. PMID:20375450

  16. Computational Characterization of Osteoporosis Associated SNPs and Genes Identified by Genome-Wide Association Studies

    PubMed Central

    Wang, Ya; Wu, Guiju; Chen, Jie; Ye, Weiyuan; Yang, Jiancai; Huang, Qingyang

    2016-01-01

    Objectives Genome-wide association studies (GWASs) have revealed many SNPs and genes associated with osteoporosis. However, influence of these SNPs and genes on the predisposition to osteoporosis is not fully understood. We aimed to identify osteoporosis GWASs-associated SNPs potentially influencing the binding affinity of transcription factors and miRNAs, and reveal enrichment signaling pathway and “hub” genes of osteoporosis GWAS-associated genes. Methods We conducted multiple computational analyses to explore function and mechanisms of osteoporosis GWAS-associated SNPs and genes, including SNP conservation analysis and functional annotation (influence of SNPs on transcription factors and miRNA binding), gene ontology analysis, pathway analysis and protein-protein interaction analysis. Results Our results suggested that a number of SNPs potentially influence the binding affinity of transcription factors (NFATC2, MEF2C, SOX9, RUNX2, ESR2, FOXA1 and STAT3) and miRNAs. Osteoporosis GWASs-associated genes showed enrichment of Wnt signaling pathway, basal cell carcinoma and Hedgehog signaling pathway. Highly interconnected “hub” genes revealed by interaction network analysis are RUNX2, SP7, TNFRSF11B, LRP5, DKK1, ESR1 and SOST. Conclusions Our results provided the targets for further experimental assessment and further insight on osteoporosis pathophysiology. PMID:26930606

  17. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution. PMID:26833483

  18. Evaluating information content of SNPs for sample-tagging in re-sequencing projects

    PubMed Central

    Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F

    2015-01-01

    Sample-tagging is designed for identification of accidental sample mix-up, which is a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world, and only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In the simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proved robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness. PMID:25975447
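
    The two summary statistics quoted in this abstract can be illustrated with a small sketch. It assumes genotypes coded as 0/1/2 and reads "duality frequency" as the fraction of sample pairs with identical tag profiles; that reading, the function names and the toy data are assumptions, not taken from the paper's program.

```python
from itertools import combinations

def hamming(a, b):
    """Number of SNP positions at which two genotype profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP panel")
    return sum(x != y for x, y in zip(a, b))

def panel_summary(profiles):
    """Return (mean pairwise Hamming distance, fraction of identical pairs)."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    distances = [hamming(a, b) for a, b in pairs]
    mean_distance = sum(distances) / len(distances)
    identical = sum(d == 0 for d in distances) / len(distances)
    return mean_distance, identical

# Toy data (hypothetical): genotypes coded 0/1/2 at a 5-SNP panel.
samples = [
    (0, 1, 2, 0, 1),
    (0, 1, 2, 0, 1),  # same profile as the first sample: an unresolvable pair
    (2, 1, 0, 0, 1),
    (1, 0, 2, 2, 0),
]
mean_d, dup_rate = panel_summary(samples)
```

    A larger mean distance and a lower fraction of identical pairs both indicate a panel that separates samples more reliably.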

  19. netview p: a network visualization tool to unravel complex population structure using genome-wide SNPs.

    PubMed

    Steinig, Eike J; Neuditschko, Markus; Khatkar, Mehar S; Raadsma, Herman W; Zenger, Kyall R

    2016-01-01

    Network-based approaches are emerging as valuable tools for the analysis of complex genetic structure in wild and captive populations. netview p combines data quality control with the construction of population networks through mutual k-nearest neighbours thresholds applied to genome-wide SNPs. The program is cross-platform compatible, open-source and efficiently operates on data ranging from hundreds to hundreds of thousands of SNPs. The pipeline was used for the analysis of pedigree data from simulated (n = 750, SNPs = 1279) and captive silver-lipped pearl oysters (n = 415, SNPs = 1107), wild populations of the European hake from the Atlantic and Mediterranean (n = 834, SNPs = 380) and grey wolves from North America (n = 239, SNPs = 78 255). The population networks effectively visualize large- and fine-scale genetic structure within and between populations, including family-level structure and relationships. netview p comprises a network-based addition to other population analysis tools and provides user-friendly access to a complex network analysis pipeline through implementation in python. PMID:26129944

  20. Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs

    PubMed Central

    Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

    2014-01-01

    Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100

  1. A novel method to select informative SNPs and their application in genetic association studies.

    PubMed

    Liao, Bo; Li, Xiong; Zhu, Wen; Cao, Zhi

    2012-01-01

    Association studies between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes have recently received great attention. However, these studies are limited by the cost of genotyping all SNPs. Therefore, it is essential to find a small subset of tag SNPs representing the rest of the SNPs. The presence of linkage disequilibrium between tag SNPs and the disease variant (genotyped or not) may allow fine-mapping studies. In this paper, we combine a nearest-means classifier (NMC) and an ant colony algorithm to select tags. Results show that our method (ACO/NMC) achieves prediction accuracy similar to that of BPSO/SVM and outperforms BPSO/STAMPA on small data sets. For large data sets, although the prediction accuracy of our method is lower than that of BPSO/SVM, ACO/NMC can reach a high accuracy (>99 percent) in a relatively short time. As the number of tags increases, the running time of NMC grows nearly linearly. To assess the ability of tags to locate a disease locus, we simulate a case-control study and use two-locus haplotype analysis to quantitatively assess the power. The results showed that a tag set of 20 percent of all SNPs selected by NMC has about 10 percent higher power than random tags, on average. PMID:22585142
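
    A nearest-means classifier of the kind named in this abstract can be sketched as follows. The sketch assumes the classifier predicts the allele at an untyped SNP from tag-SNP genotypes; the function names and toy data are hypothetical, and the ant colony search over tag sets is not shown.

```python
def fit_class_means(X, y):
    """Compute the mean tag-SNP vector of the training rows in each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for idx, value in enumerate(row):
            acc[idx] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(means, row):
    """Assign the row to the class whose mean is closest (squared Euclidean distance)."""
    def sq_dist(mean):
        return sum((a - b) ** 2 for a, b in zip(row, mean))
    return min(sorted(means), key=lambda label: sq_dist(means[label]))

# Toy data (hypothetical): three tag SNPs per haplotype, target allele 0 or 1.
X = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
y = [0, 0, 1, 1]
means = fit_class_means(X, y)
```

    Each prediction touches every tag once per class, so the cost per prediction grows linearly with the number of tags, which is in line with the near-linear running time reported above.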

  2. Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs.

    PubMed

    Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

    2014-01-01

    Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40-60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor-SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100

  3. Verification of SNPs Associated with Growth Traits in Two Populations of Farmed Atlantic Salmon

    PubMed Central

    Tsai, Hsin Y.; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Bishop, Steve C.; Houston, Ross D.

    2015-01-01

    Understanding the relationship between genetic variants and traits of economic importance in aquaculture species is pertinent to selective breeding programmes. High-throughput sequencing technologies have enabled the discovery of large numbers of SNPs in Atlantic salmon, and high density SNP arrays now exist. A previous genome-wide association study (GWAS) using a high density SNP array (132K SNPs) has revealed the polygenic nature of early growth traits in salmon, but has also identified candidate SNPs showing suggestive associations with these traits. The aim of this study was to test the association of the candidate growth-associated SNPs in a separate population of farmed Atlantic salmon to verify their effects. Identifying SNP-trait associations in two populations provides evidence that the associations are true and robust. Using a large cohort (N = 1152), we successfully genotyped eight candidate SNPs from the previous GWAS, two of which were significantly associated with several growth and fillet traits measured at harvest. The genes proximal to these SNPs were identified by alignment to the salmon reference genome and are discussed in the context of their potential role in underpinning genetic variation in salmon growth. PMID:26703584

  4. Multiple imputation for estimation of an occurrence rate in cohorts with attrition and discrete follow-up time points: a simulation study

    PubMed Central

    2010-01-01

    Background In longitudinal cohort studies, subjects may be lost to follow-up at any time during the study. This leads to attrition and thus to a risk of inaccurate and biased estimations. The purpose of this paper is to show how multiple imputation can take advantage of all the information collected during follow-up in order to estimate the cumulative probability P(E) of an event E, when the first occurrence of this event is observed at t successive time points of a longitudinal study with attrition. Methods We compared the performance of multiple imputation with that of Kaplan-Meier estimation in several simulated attrition scenarios. Results In missing-completely-at-random scenarios, the multiple imputation and Kaplan-Meier methods performed well in terms of bias (less than 1%) and coverage rate (range = [94.4%; 95.8%]). In missing-at-random scenarios, the Kaplan-Meier method was associated with a bias ranging from -5.1% to 7.0% and with a very poor coverage rate (as low as 0.2%). Multiple imputation performed much better in this situation (bias <2%, coverage rate >83.4%). Conclusions Multiple imputation shows promise for estimation of an occurrence rate in cohorts with attrition. This study is a first step towards defining appropriate use of multiple imputation in longitudinal studies. PMID:20815883
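
    For reference, the Kaplan-Meier side of this comparison can be sketched as below, with the cumulative probability P(E) obtained as one minus the survival estimate. This is a minimal illustration, not the simulation code of the study; the function name and the toy follow-up data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up time of each subject
    events : 1 if the event was observed at that time, 0 if the subject was
             censored there (for example, lost to follow-up)
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if d > 0:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Toy cohort (hypothetical): six subjects, two lost to follow-up.
curve = kaplan_meier([1, 2, 2, 3, 4, 4], [1, 0, 1, 1, 0, 1])
cumulative_probability = 1.0 - curve[-1][1]
```

    The estimator treats censoring as uninformative, which is one way to see why it can be biased in the missing-at-random scenarios described above, where dropout depends on observed data.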

  5. Ceramic filters

    SciTech Connect

    Holmes, B.L.; Janney, M.A.

    1995-12-31

    Filters were formed from ceramic fibers, organic fibers, and a ceramic bond phase using a papermaking technique. The distribution of particulate ceramic bond phase was determined using a model silicon carbide system. As the ceramic fiber increased in length and diameter the distance between particles decreased. The calculated number of particles per area showed good agreement with the observed value. After firing, the papers were characterized using a biaxial load test. The strength of papers was proportional to the amount of bond phase included in the paper. All samples exhibited strain-tolerant behavior.

  6. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    PubMed

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)^(-1)X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)^(-1)X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β) = E[β] - β, Bias(θ) = E[θ] - Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial. PMID:21390998
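
    The two linear imputation rules named in this abstract can be sketched for a single subject's visit series. The function names and the toy pain scores are hypothetical, and the sketch assumes the baseline visit is always observed.

```python
def locf(values):
    """Last Observation Carried Forward: replace each None with the latest observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def bocf(values):
    """Baseline Observation Carried Forward: replace each None with the first visit's value."""
    baseline = values[0]  # assumed observed
    return [baseline if v is None else v for v in values]

# Toy series (hypothetical): pain scores at five visits, dropout after visit 3.
scores = [7.0, 5.5, 4.0, None, None]
locf_scores = locf(scores)
bocf_scores = bocf(scores)
```

    Both rules fill each missing value with a copy of an earlier observation from the same subject, so every imputed value is a linear function of that subject's observed data, which is the sense of "linear imputation" used above.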

  7. Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions

    PubMed Central

    Druet, T; Macleod, I M; Hayes, B J

    2014-01-01

    Genomic prediction from whole-genome sequence data is attractive, as the accuracy of genomic prediction is no longer bounded by extent of linkage disequilibrium between DNA markers and causal mutations affecting the trait, given the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population, and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. Performance of these strategies (number of variants detected and accuracy of imputation) were evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP), which selected a subset of individuals for sequencing that maximized the number of unique haplotypes (from single-nucleotide polymorphism panel data) sequenced gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence by fold coverage given a maximum total sequencing effort. At 600 total fold coverage (x 600), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%. PMID:23549338
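
    The trade-off between the number of individuals and the per-individual coverage under a fixed sequencing budget is simple arithmetic, sketched below. Only the 600-fold budget and the reported optimum of 75 individuals at eightfold coverage come from the abstract; the function name and the list of candidate coverages are hypothetical.

```python
def affordable_designs(total_fold, coverages):
    """For each per-individual fold coverage, how many individuals fit in the budget."""
    if total_fold <= 0 or any(c <= 0 for c in coverages):
        raise ValueError("budget and coverages must be positive")
    return [(total_fold // c, c) for c in coverages]

# Candidate coverages are illustrative; each tuple is (individuals, fold coverage).
candidates = affordable_designs(600, [1, 2, 4, 8, 12, 24])
```

    Choosing among such designs requires an accuracy criterion, such as the variant detection and imputation accuracy evaluated in the study; the sketch only enumerates what the budget allows.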

  8. 34 CFR 85.630 - May the Department of Education impute conduct of one person to another?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... (3 CFR, 1986 Comp., p. 189); E.O 12689 (3 CFR, 1989 Comp., p. 235); 20 U.S.C. 1082, 1094, 1221e-3 and... 34 Education 1 2010-07-01 2010-07-01 false May the Department of Education impute conduct of one person to another? 85.630 Section 85.630 Education Office of the Secretary, Department of...

  9. Tailored selection of study individuals to be sequenced in order to improve the accuracy of genotype imputation.

    PubMed

    Peil, Barbara; Kabisch, Maria; Fischer, Christine; Hamann, Ute; Bermejo, Justo Lorenzo

    2015-02-01

    The addition of sequence data from own-study individuals to genotypes from external data repositories, for example, the HapMap, has been shown to improve the accuracy of imputed genotypes. Early approaches for reference panel selection favored individuals who best reflect recombination patterns in the study population. By contrast, a maximization of genetic diversity in the reference panel has been recently proposed. We investigate here a novel strategy to select individuals for sequencing that relies on the characterization of the ancestral kernel of the study population. The simulated study scenarios consisted of several combinations of subpopulations from HapMap. HapMap individuals who did not belong to the study population constituted an external reference panel which was complemented with the sequences of study individuals selected according to different strategies. In addition to a random choice, individuals with the largest statistical depth according to the first genetic principal components were selected. In all simulated scenarios the integration of sequences from own-study individuals increased imputation accuracy. The selection of individuals based on the statistical depth resulted in the highest imputation accuracy for European and Asian study scenarios, whereas random selection performed best for an African-study scenario. Present findings indicate that there is no universal 'best strategy' to select individuals for sequencing. We propose to use the methodology described in the manuscript to assess the advantage of focusing on the ancestral kernel under own study characteristics (study size, genetic diversity, availability and properties of external reference panels, frequency of imputed variants). PMID:25537753

  10. Using multiple imputation to efficiently correct cerebral MRI whole brain lesion and atrophy data in patients with multiple sclerosis.

    PubMed

    Chua, Alicia S; Egorova, Svetlana; Anderson, Mark C; Polgar-Turcsanyi, Mariann; Chitnis, Tanuja; Weiner, Howard L; Guttmann, Charles R G; Bakshi, Rohit; Healy, Brian C

    2015-10-01

    Automated segmentation of brain MRI scans into tissue classes is commonly used for the assessment of multiple sclerosis (MS). However, manual correction of the resulting brain tissue label maps by an expert reader remains necessary in many cases. Since automated segmentation data awaiting manual correction are "missing", we proposed to use multiple imputation (MI) to fill-in the missing manually-corrected MRI data for measures of normalized whole brain volume (brain parenchymal fraction-BPF) and T2 hyperintense lesion volume (T2LV). Automated and manually corrected MRI measures from 1300 patients enrolled in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB) were identified. Simulation studies were conducted to assess the performance of MI with missing data both missing completely at random and missing at random. An imputation model including the concurrent automated data as well as clinical and demographic variables explained a high proportion of the variance in the manually corrected BPF (R^2=0.97) and T2LV (R^2=0.89), demonstrating the potential to accurately impute the missing data. Further, our results demonstrate that MI allows for the accurate estimation of group differences with little to no bias and with similar precision compared to an analysis with no missing data. We believe that our findings provide important insights for efficient correction of automated MRI measures to obviate the need to perform manual correction on all cases. PMID:26093330

  11. Evaluation of transethnic fine mapping with population-specific and cosmopolitan imputation reference panels in diverse Asian populations.

    PubMed

    Wang, Xu; Cheng, Ching-Yu; Liao, Jiemin; Sim, Xueling; Liu, Jianjun; Chia, Kee-Seng; Tai, E-Shyong; Little, Peter; Khor, Chiea-Chuen; Aung, Tin; Wong, Tien-Yin; Teo, Yik-Ying

    2016-04-01

    There has been limited success in identifying causal variants underlying association signals observed in genome-wide association studies (GWAS). The use of 1000 Genomes Project (1KGP) allows the imputation to estimate the genetic information at untyped variants. However, long stretches of high linkage disequilibrium within the genome prevent us from differentiating between causal variants and perfect surrogates, thus limiting our ability to identify causal variants. Transethnic strategies have been proposed as a possible solution to mitigate this. However, these studies generally rely on imputing genotypes from multiple ancestries from 1KGP but not against population-specific reference panels. Here, we perform the first transethnic fine-mapping study across three Asian cohorts from diverse ancestries at the loci implicated with eye and blood lipid traits, using population-specific reference panels that have been generated by whole-genome sequencing samples from the same ancestry groups. Our study outlines several challenges faced in a fine-mapping exercise where one simply aims to meta-analyse existing GWAS that have been imputed against reference haplotypes from the 1KGP. PMID:26130488

  12. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    NASA Astrophysics Data System (ADS)

    Riggi, S.; Riggi, D.; Riggi, F.

    2015-04-01

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixtures' models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layers' Silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.

  13. Imputational Modeling of Spatial Context and Social Environmental Predictors of Walking in an Underserved Community: The PATH Trial

    PubMed Central

    Ellerbe, Caitlyn; Lawson, Andrew B.; Alia, Kassandra A.; Meyers, Duncan C.; Coulon, Sandra M.; Lawman, Hannah G.

    2013-01-01

    Background This study examined imputational modeling effects of spatial proximity and social factors of walking in African American adults. Purpose Models were compared that examined relationships between household proximity to a walking trail and social factors in determining walking status. Methods Participants (N=133; 66% female; mean age=55 yrs) were recruited to a police-supported walking and social marketing intervention. Bayesian modeling was used to identify predictors of walking at 12 months. Results Sensitivity analysis using different imputation approaches, and spatial contextual effects, were compared. All the imputation methods showed social life and income were significant predictors of walking, however, the complete data approach was the best model indicating Age (1.04, 95% OR: 1.00, 1.08), Social Life (0.83, 95% OR: 0.69, 0.98) and Income > $10,000 (0.10, 95% OR: 0.01, 0.97) were all predictors of walking. Conclusions The complete data approach was the best model of predictors of walking in African Americans. PMID:23481250

  14. Rocket noise filtering system using digital filters

    NASA Technical Reports Server (NTRS)

    Mauritzen, David

    1990-01-01

    A set of digital filters is designed to filter rocket noise to various bandwidths. The filters are designed to have constant group delay and are implemented in software on a general purpose computer. The Parks-McClellan algorithm is used. Preliminary tests are performed to verify the design and implementation. An analog filter which was previously employed is also simulated.
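
    The constant group delay mentioned above is a property of FIR filters with symmetric coefficients, which is the kind of filter the Parks-McClellan algorithm produces. The sketch below does not implement Parks-McClellan; it only checks the delay property numerically on an arbitrary symmetric tap set chosen for illustration, and all names are assumptions.

```python
import cmath

def frequency_response(taps, omega):
    """Frequency response H(e^{j*omega}) of an FIR filter with the given taps."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))

def group_delay(taps, omega, step=1e-6):
    """Numerical group delay, -d(phase)/d(omega), in samples.

    Valid where the phase does not wrap and the response is not zero,
    which holds for the tap set and frequencies used below.
    """
    lo = cmath.phase(frequency_response(taps, omega - step))
    hi = cmath.phase(frequency_response(taps, omega + step))
    return -(hi - lo) / (2 * step)

# Arbitrary symmetric 5-tap low-pass filter (illustrative, not from the report).
taps = [1.0, 2.0, 3.0, 2.0, 1.0]
delays = [group_delay(taps, w) for w in (0.1, 0.3, 0.5)]
```

    For N symmetric taps the delay is (N - 1) / 2 samples at every frequency, here 2 samples, so all frequency components of the signal are shifted by the same amount.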

  15. Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling

    PubMed Central

    Hieke, Stefanie; Benner, Axel; Schlenk, Richard F.; Schumacher, Martin; Bullinger, Lars; Binder, Harald

    2016-01-01

    Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms that is by a magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts often is still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-controls studies on the one hand, and multivariable regression techniques as developed for gene expression data on the other hand. In particular, we focus on stable selection of a small set of SNPs and corresponding genes for subsequent validation. For univariate analysis, a permutation-based approach is proposed to test at the gene level. We use regularized multivariable regression models for considering all SNPs simultaneously and selecting a small set of potentially important prognostic SNPs. Stability is judged according to resampling inclusion frequencies for both the univariate and the multivariable approach. The overall strategy is illustrated with data from a cohort of acute myeloid leukemia patients and explored in a simulation study. The multivariable approach is seen to automatically focus on a smaller set of SNPs compared to the univariate approach, roughly in line with blocks of correlated SNPs. This more targeted extraction of SNPs results in more stable selection at the SNP as well as at the gene level. Thus, the multivariable regression approach with resampling provides a perspective in the proposed analysis strategy for SNP data in clinical cohorts highlighting what can be added by regularized regression techniques compared to univariate analyses. PMID:27159447

  16. SNPs for parentage testing and traceability in globally diverse breeds of sheep.

    PubMed

    Heaton, Michael P; Leymaster, Kreg A; Kalbfleisch, Theodore S; Kijas, James W; Clarke, Shannon M; McEwan, John; Maddox, Jillian F; Basnayake, Veronica; Petrik, Dustin T; Simpson, Barry; Smith, Timothy P L; Chitko-McKown, Carol G

    2014-01-01

    DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular "parentage SNP" varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF≥0.3) in 48±5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent's genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1×10^(−39) and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world's sheep breeds. PMID:24740156

  17. SNPs for Parentage Testing and Traceability in Globally Diverse Breeds of Sheep

    PubMed Central

    Heaton, Michael P.; Leymaster, Kreg A.; Kalbfleisch, Theodore S.; Kijas, James W.; Clarke, Shannon M.; McEwan, John; Maddox, Jillian F.; Basnayake, Veronica; Petrik, Dustin T.; Simpson, Barry; Smith, Timothy P. L.; Chitko-McKown, Carol G.

    2014-01-01

    DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF≥0.3) in 48±5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent’s genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1×10(−39) and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world’s sheep breeds. PMID:24740156
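
    As a rough illustration of how a combined probability of a coincidental match (PI) could be derived from allele frequencies, the sketch below assumes Hardy-Weinberg proportions and independent biallelic SNPs. The function names and the example MAF of 0.4 are hypothetical and are not taken from the study, which may have used a different estimator.

```python
import math

def locus_pi(maf):
    """Probability that two random, unrelated animals share a genotype at one
    biallelic SNP, assuming Hardy-Weinberg proportions (an assumption here)."""
    p, q = maf, 1.0 - maf
    # Genotype frequencies are p^2, 2pq and q^2; PI is the sum of their squares.
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2

def combined_pi(mafs):
    """Combined PI over SNPs treated as independent: product of per-locus PI."""
    return math.prod(locus_pi(m) for m in mafs)

# Hypothetical panel of 109 SNPs, each with MAF 0.4 (illustrative values only).
example_pi = combined_pi([0.4] * 109)
print(f"combined PI for the hypothetical panel: {example_pi:.3g}")
```

    Because the per-locus values multiply, adding informative SNPs shrinks the combined PI very quickly, which is why a panel of about a hundred markers can reach extremely small match probabilities.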

  18. The search for stable prognostic models in multiple imputed data sets

    PubMed Central

    2010-01-01

    Background In prognostic studies model instability and missing data can be troubling factors. Proposed methods for handling these situations are bootstrapping (B) and Multiple imputation (MI). The authors examined the influence of these methods on model composition. Methods Models were constructed using a cohort of 587 patients consulting between January 2001 and January 2003 with a shoulder problem in general practice in the Netherlands (the Dutch Shoulder Study). Outcome measures were persistent shoulder disability and persistent shoulder pain. Potential predictors included socio-demographic variables, characteristics of the pain problem, physical activity and psychosocial factors. Model composition and performance (calibration and discrimination) were assessed for models using a complete case analysis, MI, bootstrapping or both MI and bootstrapping. Results Results showed that model composition varied between models as a result of how missing data was handled and that bootstrapping provided additional information on the stability of the selected prognostic model. Conclusion In prognostic modeling missing data needs to be handled by MI and bootstrap model selection is advised in order to provide information on model stability. PMID:20846460

  19. An imputation of air pollution social cost of energy: A case study of Taiwan

    SciTech Connect

    Chi-Yuan Liang

    1995-12-31

    Based on the Air Pollution Control Act, the Environmental Protection Administration of Taiwan is scheduled to implement an anti-air-pollution fee on energy products in the coming July. The revenue from the fee will be used solely for air pollution control. The rationale for this fee is to endogenize the social cost of air pollution attributed to energy consumption and hence to curb the consumption of energy through the price mechanism for a cleaner environment. Imputing the social cost of air pollution caused by each type of energy consumption is therefore urgent for policy making. The objective of this paper is to propose a methodology for estimating the social cost of air pollution by type of energy in Taiwan. It is useful for government policy making in Taiwan and in other countries as well. We employ data from epidemiological and CVM studies, together with energy consumption and pollution statistics, to evaluate the social cost of air pollution by type of energy. This paper contains the following sections: (1) Introduction; (2) Methodology and Estimation Procedure; (3) Empirical Results; (4) Conclusions and Implications.

  20. A comparison of two methods of estimating propensity scores after multiple imputation.

    PubMed

    Mitra, Robin; Reiter, Jerome P

    2016-02-01

    In many observational studies, analysts estimate treatment effects using propensity scores, e.g. by matching or sub-classifying on the scores. When some values of the covariates are missing, analysts can use multiple imputation to fill in the missing data, estimate propensity scores based on the m completed datasets, and use the propensity scores to estimate treatment effects. We compare two approaches to implement this process. In the first, the analyst estimates the treatment effect using propensity score matching within each completed data set, and averages the m treatment effect estimates. In the second approach, the analyst averages the m propensity scores for each record across the completed datasets, and performs propensity score matching with these averaged scores to estimate the treatment effect. We compare properties of both methods via simulation studies using artificial and real data. The simulations suggest that the second method has greater potential to produce substantial bias reductions than the first, particularly when the missing values are predictive of treatment assignment. PMID:22687877
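
    The two approaches compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the propensity scores for each of the m completed datasets have already been estimated, and it uses a simple 1-nearest-neighbour match with replacement to estimate the effect on the treated. All names and the toy data are hypothetical.

```python
from statistics import mean

def att_by_matching(scores, treated, outcome):
    """Effect on the treated via 1-nearest-neighbour matching (with
    replacement) of each treated record to a control on the score."""
    controls = [i for i, t in enumerate(treated) if not t]
    diffs = []
    for i, t in enumerate(treated):
        if t:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            diffs.append(outcome[i] - outcome[j])
    return mean(diffs)

def within_approach(score_sets, treated, outcome):
    """First approach: match within each completed dataset, then average
    the m treatment effect estimates."""
    return mean(att_by_matching(s, treated, outcome) for s in score_sets)

def across_approach(score_sets, treated, outcome):
    """Second approach: average the m scores per record, then match once."""
    averaged = [mean(per_record) for per_record in zip(*score_sets)]
    return att_by_matching(averaged, treated, outcome)

# Toy data: 2 treated and 3 control records, m = 2 completed datasets.
treated = [1, 1, 0, 0, 0]
outcome = [5.0, 7.0, 3.0, 4.0, 6.0]
score_sets = [
    [0.80, 0.60, 0.75, 0.30, 0.55],  # scores from completed dataset 1
    [0.60, 0.80, 0.30, 0.75, 0.55],  # scores from completed dataset 2
]
within_est = within_approach(score_sets, treated, outcome)
across_est = across_approach(score_sets, treated, outcome)
print(within_est, across_est)
```

    The toy data are chosen so that the two approaches select different matches and therefore give different estimates; they say nothing about which approach is less biased in practice.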

  1. A multiple imputation approach to disclosure limitation for high-age individuals in longitudinal studies.

    PubMed

    An, Di; Little, Roderick J A; McNally, James W

    2010-07-30

    Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because of the risk of disclosure, ages of very old respondents can often not be released; in particular, this is a specific stipulation of the Health Insurance Portability and Accountability Act (HIPAA) for the release of health data for individuals. Top-coding of individuals beyond a certain age is a standard way of dealing with this issue, and it may be adequate for cross-sectional data, when a modest number of cases are affected. However, this approach leads to serious loss of information in longitudinal studies when individuals have been followed for many years. We propose and evaluate an alternative to top-coding for this situation based on multiple imputation (MI). This MI method is applied to a survival analysis of simulated data, and data from the Charleston Heart Study (CHS), and is shown to work well in preserving the relationship between hazard and covariates. PMID:20552576

  2. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

    PubMed Central

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules. PMID:27199552

  3. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study.

    PubMed

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules. PMID:27199552

  4. A small number of candidate gene SNPs reveal continental ancestry in African Americans

    PubMed Central

    Kodaman, Nuri; Aldrich, Melinda C.; Smith, Jeffrey R.; Signorello, Lisa B.; Bradley, Kevin; Breyer, Joan; Cohen, Sarah S.; Long, Jirong; Cai, Qiuyin; Giles, Justin; Bush, William S.; Blot, William J.; Matthews, Charles E.; Williams, Scott M.

    2013-01-01

    SUMMARY Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K=2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99±0.003, 0.98±0.01, 0.93±0.03, and 0.81±0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably. PMID:23278390

  5. Profiling single nucleotide polymorphisms (SNPs) across intracellular folate metabolic pathway in healthy Indians

    PubMed Central

    Ghodke, Yogita; Chopra, Arvind; Shintre, Pooja; Puranik, Amrutesh; Joshi, Kalpana; Patwardhan, Bhushan

    2011-01-01

    Background & objectives: Many pharmacologically relevant polymorphisms show variability among different populations. Though limited, data from Caucasian subjects have reported several single nucleotide polymorphisms (SNPs) in the folate biosynthetic pathway. These SNPs may be subject to racial and ethnic differences. We carried out a study to determine the allelic frequencies of these SNPs in an Indian ethnic population. Methods: Whole blood samples were withdrawn from 144 unrelated healthy subjects from west India. DNA was extracted and genotyping was performed using PCR-RFLP and Real-time Taqman allelic discrimination for 12 polymorphisms in 9 genes of folate-methotrexate (MTX) metabolism. Results: Allele frequencies were obtained for MTHFR 677T (10%) and 1298 C (30%), TS 3UTR 0bp (46%), MDR1 3435T and 1236T (62%), RFC1 80A (57%), GGH 401T (61%), MS 2756G (34%), ATIC 347G (52%) and SHMT1 1420T (80%) in healthy subjects (frequencies of underlined SNPs were different from published study data of European and African populations). Interpretation & conclusions: The current study describes the distribution of folate biosynthetic pathway SNPs in healthy Indians and validates the previous finding of differences due to race and ethnicity. Our results pave the way to study the pharmacogenomics of MTX in the Indian population. PMID:21441680

  6. Interaction of silver nanoparticles (SNPs) with bacterial extracellular proteins (ECPs) and its adsorption isotherms and kinetics.

    PubMed

    Khan, S Sudheer; Srivatsan, P; Vaishnavi, N; Mukherjee, Amitava; Chandrasekaran, N

    2011-08-15

    Indiscriminate and increased use of silver nanoparticles (SNPs) in consumer products leads to their release into the environment. The fate and transport of SNPs in the environment remain unknown. We have studied the interaction of SNPs with extracellular protein (ECP) produced by two environmental bacterial species and the adsorption behavior in aqueous solutions. The effect of pH and salt concentrations on the adsorption was also investigated. The adsorption process was found to be dependent on surface charge (zeta potential). The capping of SNPs by ECP was confirmed by Fourier transform infrared spectroscopy and X-ray diffraction. The adsorption of ECP on SNPs was analyzed by the Langmuir and Freundlich models; the equilibrium adsorption data fitted the Freundlich model well. The adsorption kinetics were modeled using the pseudo-first-order and pseudo-second-order kinetic equations. The results indicated that the pseudo-second-order kinetic equation better describes the adsorption kinetics. The capping was stable at environmental pH and salt concentration. Destabilization of the nanoparticles was observed at alkaline pH. The study suggests that the stabilization of nanoparticles in the environment might lead to the accumulation and transport of nanomaterials in the environment, and ultimately destabilize the functioning of the ecosystem. PMID:21684082

  7. Association Analysis Identifies Melampsora ×columbiana Poplar Leaf Rust Resistance SNPs

    PubMed Central

    La Mantia, Jonathan; Klápště, Jaroslav; El-Kassaby, Yousry A.; Azam, Shofiul; Guy, Robert D.; Douglas, Carl J.; Mansfield, Shawn D.; Hamelin, Richard

    2013-01-01

    Populus species are currently being domesticated through intensive time- and resource-dependent programs for utilization in phytoremediation, wood and paper products, and conversion to biofuels. Poplar leaf rust disease can greatly reduce wood volume. Genetic resistance is effective in reducing economic losses but major resistance loci have been race-specific and can be readily defeated by the pathogen. Developing durable disease resistance requires the identification of non-race-specific loci. In the presented study, area under the disease progress curve was calculated from natural infection of Melampsora ×columbiana in three consecutive years. Association analysis was performed using 412 P. trichocarpa clones genotyped with 29,355 SNPs covering 3,543 genes. We found 40 SNPs within 26 unique genes significantly associated (permutated P<0.05) with poplar rust severity. Moreover, two SNPs were repeated in all three years suggesting non-race-specificity and three additional SNPs were differentially expressed in other poplar rust interactions. These five SNPs were found in genes that have orthologs in Arabidopsis with functionality in pathogen induced transcriptome reprogramming, Ca2+/calmodulin and salicylic acid signaling, and tolerance to reactive oxygen species. The additive effect of non-R gene functional variants may constitute high levels of durable poplar leaf rust resistance. Therefore, these findings are of significance for speeding the genetic improvement of this long-lived, economically important organism. PMID:24236018

  8. Studies on interaction of colloidal silver nanoparticles (SNPs) with five different bacterial species.

    PubMed

    Khan, S Sudheer; Mukherjee, Amitava; Chandrasekaran, N

    2011-10-01

    Silver nanoparticles (SNPs) are being increasingly used in many consumer products like textile fabrics, cosmetics, washing machines, food and drug products owing to their excellent antimicrobial properties. Here we have studied the adsorption and toxicity of SNPs on bacterial species such as Pseudomonas aeruginosa, Micrococcus luteus, Bacillus subtilis, Bacillus barbaricus and Klebsiella pneumoniae. The influence of zeta potential on the adsorption of SNPs on the bacterial cell surface was investigated at acidic, neutral and alkaline pH and with varying salt (NaCl) concentrations (0.05, 0.1, 0.5, 1 and 1.5 M). The survival rate of bacterial species decreased with increasing adsorption of SNPs. Maximum adsorption and toxicity were observed at pH 5 and NaCl concentration of <0.5 M. Very little adsorption was observed at pH 9 and NaCl concentration >0.5 M, thereby resulting in less toxicity. The zeta potential study suggests that the adsorption of SNPs on the cell surface was related to electrostatic force of attraction. The equilibrium and kinetics of the adsorption process were also studied. The adsorption equilibrium isotherms fitted well to the Langmuir model. The kinetics of adsorption fitted best to the pseudo-first-order model. These findings form a basis for interpreting the interaction of nanoparticles with environmental bacterial species. PMID:21640562

  9. TRES: Identification of Discriminatory and Informative SNPs from Population Genomic Data.

    PubMed

    Kavakiotis, Ioannis; Triantafyllidis, Alexandros; Ntelidou, Despoina; Alexandri, Panoraia; Megens, Hendrik-Jan; Crooijmans, Richard P M A; Groenen, Martien A M; Tsoumakas, Grigorios; Vlahavas, Ioannis

    2015-01-01

    The advent of high-throughput genomic technologies is enabling analyses on thousands or even millions of single-nucleotide polymorphisms (SNPs). At the same time, the selection of a minimum number of SNPs with the maximum information content is becoming increasingly problematic. Available locus ranking programs have been accused of providing upwardly biased results (concerning the predicted accuracy of the chosen set of markers for population assignment), cannot handle high-dimensional datasets, and some of them are computationally intensive. The toolbox for ranking and evaluation of SNPs (TRES) is a collection of algorithms built into user-friendly and computationally efficient software that can manipulate and analyze datasets even on the order of millions of genotypes in a matter of seconds. It offers a variety of established methods for evaluating and ranking SNPs on user-defined groups of populations and produces a set of a predefined number of top-ranked loci. Moreover, dataset manipulation algorithms enable users to convert datasets between different file formats, split the initial datasets into train and test sets, and finally create datasets containing only the SNPs selected by the SNP selection analysis for later evaluation in dedicated software such as GENECLASS. This application can help biologists select loci with maximum power for optimization of cost-effective panels with applications related to, e.g., species identification, wildlife management, and forensic problems. TRES is available for all operating systems at http://mlkd.csd.auth.gr/bio/tres. PMID:26137847

  10. Bayesian integration of genetics and epigenetics detects causal regulatory SNPs underlying expression variability

    PubMed Central

    Das, Avinash; Morley, Michael; Moravec, Christine S.; Tang, W. H. W.; Hakonarson, Hakon; Ashley, Euan A.; Brandimarto, Jeffrey; Hu, Ray; Li, Mingyao; Li, Hongzhe; Liu, Yichuan; Qu, Liming; Sanchez, Pablo; Margulies, Kenneth B.; Cappola, Thomas P.; Jensen, Shane; Hannenhalli, Sridhar

    2015-01-01

    Standard expression quantitative trait loci (eQTL) analysis detects polymorphisms associated with gene expression without revealing causality. We introduce a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory single-nucleotide polymorphisms (SNPs) that explain the gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance but also predicts gene expression more accurately than other methods. Based on realistic simulated data, we demonstrate that eQTeL accurately detects causal regulatory SNPs, including those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal. PMID:26456756

  11. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    SciTech Connect

    Yang, Jing; Li, Yuan-Yuan; Shanghai Center for Bioinformation Technology, Shanghai 200235; Li, Yi-Xue; Shanghai Center for Bioinformation Technology, Shanghai 200235; Ye, Zhi-Qiang; Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031

    2012-03-02

    Highlights: ► Proper dataset partition can improve the prediction of deleterious nsSNPs. ► Partition according to original residue type at nsSNP is a good criterion. ► Similar strategy is supposed promising in other machine learning problems. Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  12. Portability of tag SNPs across isolated population groups: an example from India.

    PubMed

    Sarkar Roy, N; Farheen, S; Roy, N; Sengupta, S; Majumder, P P

    2008-01-01

    Isolated population groups are useful in conducting association studies of complex diseases to avoid various pitfalls, including those arising from population stratification. Since DNA resequencing is expensive, it is recommended that genotyping be carried out at tagSNP (tSNP) loci. For this, tSNPs identified in one isolated population need to be used in another. Unless tSNPs are highly portable across populations this strategy may result in loss of information in association studies. We examined the issue of tSNP portability by sampling individuals from 10 isolated ethnic groups from India. We generated DNA resequencing data pertaining to 3 genomic regions and identified tSNPs in each population. We defined an index of tSNP portability and showed that portability is low across isolated Indian ethnic groups. The extent of portability did not significantly correlate with genetic similarity among the populations studied here. We also analyzed our data with sequence data from individuals of African and European descent. Our results indicated that it may be necessary to carry out resequencing in a small number of individuals to discover SNPs and identify tSNPs in the specific isolated population in which a disease association study is to be conducted. PMID:17627800

  13. Bioinformatics prioritization of SNPs perturbing microRNA regulation of hematological malignancy-implicated genes.

    PubMed

    Ghaedi, Hamid; Bastami, Milad; Zare-Abdollahi, Davood; Alipoor, Behnam; Movafagh, Abolfazl; Mirfakhraie, Reza; Omrani, Mir Davood; Masotti, Andrea

    2015-12-01

    The contribution of microRNAs (miRNAs) to cancer has been extensively investigated, and it has become clear that strict regulation of the miRNA-mRNA regulatory network is crucial for safeguarding cell health. Apart from the direct impact of miRNA dysregulation in cancer pathogenesis, genetic variations in miRNAs are likely to disrupt miRNA-target interaction. Indeed, considerable evidence suggests that SNPs within the miRNA regulome are associated with the development of different hematological malignancies. However, a full catalog of SNPs within miRNA target sites of genes relevant to hematopoiesis and hematological malignancies is still lacking. Accordingly, we aimed to systematically identify and characterize such SNPs and provide a prioritized list of the most potentially disruptive SNPs. Although in the present study we did not address the functional significance of these potentially disruptive variants, we believe that our compiled results will be valuable for researchers interested in determining the role of target-SNPs in the development of hematological malignancies. PMID:26520014

  14. Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips.

    PubMed

    Boison, S A; Santos, D J A; Utsunomiya, A H T; Carvalheiro, R; Neves, H H R; O'Brien, A M Perez; Garcia, J F; Sölkner, J; da Silva, M V G B

    2015-07-01

    Genotype imputation is widely used as a cost-effective strategy in genomic evaluation of cattle. Key determinants of imputation accuracies, such as linkage disequilibrium patterns, marker densities, and ascertainment bias, differ between Bos indicus and Bos taurus breeds. Consequently, there is a need to investigate effectiveness of genotype imputation in indicine breeds. Thus, the objective of the study was to investigate strategies and factors affecting the accuracy of genotype imputation in Gyr (Bos indicus) dairy cattle. Four imputation scenarios were studied using 471 sires and 1,644 dams genotyped on Illumina BovineHD (HD-777K; San Diego, CA) and BovineSNP50 (50K) chips, respectively. Scenarios were based on which reference high-density single nucleotide polymorphism (SNP) panel (HDP) should be adopted [HD-777K, 50K, and GeneSeek GGP-75Ki (Lincoln, NE)]. Depending on the scenario, validation animals had their genotypes masked for one of the lower-density panels: Illumina (3K, 7K, and 50K) and GeneSeek (SGGP-20Ki and GGP-75Ki). We randomly selected 171 sires as reference and 300 as validation for all the scenarios. Additionally, all sires were used as reference and the 1,644 dams were imputed for validation. Genotypes of 98 individuals with 4 and more offspring were completely masked and imputed. Imputation algorithms FImpute and Beagle v3.3 and v4 were used. Imputation accuracies were measured using the correlation and allelic correct rate. FImpute resulted in highest accuracies, whereas Beagle 3.3 gave the least-accurate imputations. Accuracies evaluated as correlation (allelic correct rate) ranged from 0.910 (0.942) to 0.961 (0.974) using 50K as HDP and with 3K (7K) as low-density panels. With GGP-75Ki as HDP, accuracies were moderate for 3K, 7K, and 50K, but high for SGGP-20Ki. The use of HD-777K as HDP resulted in accuracies of 0.888 (3K), 0.941 (7K), 0.980 (SGGP-20Ki), 0.982 (50K), and 0.993 (GGP-75Ki). Ungenotyped individuals were imputed with an average accuracy of 0.970. The average top 5 kinship coefficients between reference and imputed individuals was a strong predictor of imputation accuracy. FImpute was faster and used less memory than Beagle v4. Beagle v4 outperformed Beagle v3.3 in accuracy and speed of computation. A genotyping strategy that uses the HD-777K SNP chip as a reference panel and SGGP-20Ki as the lower-density SNP panel should be adopted as accuracy was high and similar to that of the 50K. However, the effect of using imputed HD-777K genotypes from the SGGP-20Ki on genomic evaluation is yet to be studied. PMID:25958293
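
    The two accuracy measures named in this abstract, the correlation and the allelic correct rate, can be illustrated with a short sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele and that the allelic correct rate is the fraction of alleles imputed correctly, so each genotype contributes 2 minus the absolute coding difference. That definition, the function names and the toy genotypes are assumptions for illustration and may differ from the study's exact procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation between true and imputed genotype codes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def allelic_correct_rate(true_gt, imputed_gt):
    """Fraction of alleles imputed correctly, with genotypes coded 0/1/2.
    Each genotype carries 2 alleles; |true - imputed| of them are wrong."""
    correct = sum(2 - abs(t - i) for t, i in zip(true_gt, imputed_gt))
    return correct / (2 * len(true_gt))

# Toy masked genotypes for one animal (hypothetical values).
true_gt = [0, 1, 2, 1, 0, 2]
imputed_gt = [0, 1, 2, 2, 0, 1]
r = pearson(true_gt, imputed_gt)
acr = allelic_correct_rate(true_gt, imputed_gt)
print(r, acr)
```

    In this toy case the allelic correct rate is higher than the correlation, which mirrors the pattern in the reported figures, where the value in parentheses exceeds the correlation.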

  15. Tunable birefringent filters

    NASA Technical Reports Server (NTRS)

    Title, A. M.; Rosenberg, W. J.

    1981-01-01

    This article reviews the types and capabilities of birefringent filters. The general operating principles of Lyot (perfect polarizers), partial polarizing, and Solc (no internal polarizers) filters are introduced. Appropriate techniques for tuning each filter type are presented. Field of view of birefringent filters is discussed and is compared to Fabry-Perot and interference filters. The transmission and throughput advantages of birefringent filters are shown. Finally, the current state of the art in practical filters is reviewed.

  16. All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

    PubMed Central

    Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
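
    One simple reading of stratified FDR control is to apply a standard FDR procedure separately within each annotation stratum. The sketch below does this with the Benjamini-Hochberg step-up rule. It is an illustration under that assumption, not the authors' pipeline, which additionally uses linkage disequilibrium-weighted annotations. Function names, strata labels and p-values are hypothetical.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the set of indices rejected by the Benjamini-Hochberg
    step-up procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value passes its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])

def stratified_fdr(pvals, strata, alpha=0.05):
    """Apply Benjamini-Hochberg separately within each stratum and
    return the union of rejected indices."""
    rejected = set()
    for label in set(strata):
        idx = [i for i, s in enumerate(strata) if s == label]
        hits = benjamini_hochberg([pvals[i] for i in idx], alpha)
        rejected |= {idx[h] for h in hits}
    return rejected

# Toy example: an enriched "coding" stratum and a mostly null "intergenic" one.
pvals = [0.004, 0.03, 0.20, 0.012, 0.5, 0.6, 0.7, 0.8]
strata = ["coding"] * 3 + ["intergenic"] * 5
pooled = benjamini_hochberg(pvals)
stratified = stratified_fdr(pvals, strata)
print(sorted(pooled), sorted(stratified))
```

    With these toy values the pooled and stratified analyses reject different SNPs at the same nominal level, because the threshold a p-value must meet depends on how many tests share its stratum and how enriched that stratum is.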

  17. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass

    PubMed Central

    Ramstein, Guillaume P.; Lipka, Alexander E.; Lu, Fei; Costich, Denise E.; Cherney, Jerome H.; Buckler, Edward S.; Casler, Michael D.

    2015-01-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100
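
    The abstract does not say how marker effects were combined across imputes. A common choice for multiple imputation is Rubin's rules, sketched below as an assumption rather than as the authors' procedure. The effect estimates and variances are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total


# Hypothetical marker effect estimated on three imputed datasets.
estimate, total_variance = pool_rubin([0.5, 0.6, 0.7], [0.01, 0.01, 0.01])
print(estimate, total_variance)
```

    The between-imputation term is what carries imputation uncertainty into the final standard error: the more the imputes disagree, the larger the pooled variance.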

  18. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

    PubMed

    Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D

    2015-05-01

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. PMID:25770100

  19. Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs

    NASA Astrophysics Data System (ADS)

    Watson, Corey T.; Disanto, Giulio; Breden, Felix; Giovannoni, Gavin; Ramagopalan, Sreeram V.

    2012-10-01

    Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted-for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.

  20. Application of thermionic SNPS with thermal reactor for spacecraft orbital transfer mission

    SciTech Connect

    Andreev, P.V.; Gryaznov, G.M.; Zhabotinsky, E.E.; Nikonov, A.M.; Serbin, V.I.

    1991-01-05

    The range in which an SNPS with an in-core thermal thermionic reactor (ITR) can be used expediently is limited to an electric power level of about 100 kWe for an SNPS lifetime of 3 to 5 years. At the same time, the reactor power may be forced by a factor of two to three for a period of about half a year. A mathematical model of the dependence of SNPS mass on the degree of forcing is given. Calculated payload masses and transfer times for transfer from low orbit to geostationary orbit are given for two thermal reactors with emission areas of 1.6 m² and 2.5 m² and for different types of electrojets.

  1. Complete genome sequence and SNPs of Raja pulchra (Rajiformes, Rajidae) mitochondria.

    PubMed

    Hwang, Jae Yeon; Jin, Gwi-Deuk; Park, Jongbin; Kim, Heebal; Lee, Chang-Kyu; Kwak, Woori; Nam, Bo-Hye; An, Cheul Min; Park, Jung Youn; Park, Kyu-Hyun; Huh, Chul-Sung; Kim, Eun Bae

    2016-07-01

    Mitochondrial genomes were sequenced from five Raja pulchra individuals, and single-nucleotide polymorphisms (SNPs) were identified by comparison with previously published sequences. A total of 117 SNPs were detected; they were present in 2 rRNA genes, 9 tRNA genes, 13 protein-coding genes and the non-coding region. One deleted polymorphic site, located in the 16S rRNA gene, was observed in two individuals. Six polymorphic sites were non-synonymous SNPs, distributed in the ND1, ND2, ATP6 and ND4 genes. Phylogenetic analysis validated current taxa. The genome sequences of R. pulchra mitochondria could provide comparative information for understanding species divergence and genomic variation among populations. PMID:26122344

  2. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris).

    PubMed

    Villordo-Pineda, Emiliano; González-Chavira, Mario M; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa, and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5) and characterized by 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. Phenotypes most associated were days to flowering and seed biomass with 58 and 44 associated SNPs, respectively. Thirty-seven out of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs such as SNP28 and SNP128 are related to starch biosynthesis, a common osmotic protector; and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755
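
    The reported cut-off of p < 0.0003 is consistent with a Bonferroni correction over the 169 SNPs. The abstract gives only the resulting threshold, so the family-wise level of 0.05 and the use of one test per SNP in the sketch below are assumptions.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: alpha = 0.05 and 169 tests (one per SNP); neither value is
# stated explicitly in the abstract.
threshold = bonferroni_threshold(0.05, 169)
print(f"{threshold:.6f}")  # prints 0.000296
```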

  3. Identification of novel drought-tolerant-associated SNPs in common bean (Phaseolus vulgaris)

    PubMed Central

    Villordo-Pineda, Emiliano; González-Chavira, Mario M.; Giraldo-Carbajo, Patricia; Acosta-Gallegos, Jorge A.; Caballero-Pérez, Juan

    2015-01-01

    Common bean (Phaseolus vulgaris L.) is a legume in high demand for human nutrition and a very important agricultural product. Production of common bean is constrained by environmental stresses such as drought. Although conventional plant selection has been used to increase production yield and stress tolerance, drought tolerance selection based on phenotype is complicated by associated physiological, anatomical, cellular, biochemical, and molecular changes. These changes are modulated by differential gene expression. A common method to identify genes associated with phenotypes of interest is the characterization of Single Nucleotide Polymorphisms (SNPs) to link them to specific functions. In this work, we selected two drought-tolerant parental lines from Mesoamerica, Pinto Villa, and Pinto Saltillo. The parental lines were used to generate a population of 282 families (F3:5) and characterized by 169 SNPs. We associated the segregation of the molecular markers in our population with phenotypes including flowering time, physiological maturity, reproductive period, plant, seed and total biomass, reuse index, seed yield, weight of 100 seeds, and harvest index in three cultivation cycles. We observed 83 SNPs with significant association (p < 0.0003 after Bonferroni correction) with our quantified phenotypes. Phenotypes most associated were days to flowering and seed biomass with 58 and 44 associated SNPs, respectively. Thirty-seven out of the 83 SNPs were annotated to a gene with a potential function related to drought tolerance or relevant molecular/biochemical functions. Some SNPs such as SNP28 and SNP128 are related to starch biosynthesis, a common osmotic protector; and SNP18 is related to proline biosynthesis, another well-known osmotic protector. PMID:26257755

  4. Filter apparatus

    SciTech Connect

    Zahedi, K.; Alexander, J. C.; Zieve, P. B.

    1985-03-19

    Electrified filter bed apparatus includes inner and outer cylindrical bed-retaining structures for confining a granular bed therebetween. The inner cylindrical structure may comprise a cage of superposed frusto-conical louvers and the outer structure may comprise a similar cage or a perforated cylindrical, liquid-drainage sheet. A cylindrical bed electrode for electrically charging the bed granules is suspended between the retaining structures. The tubular bed surrounds an internal gas passage from which polluted gas flows through the bed from the inside out. Gas enters the internal passage from above through an ionizer section of the apparatus. The ionizer section may include a disc-type ionizer assembly in an ionizer tube. The tube may form an extension of the inner louver cage. A corona discharge may be formed between the disc and the ionizer tube by providing electric current to the discs, whereby the corona discharge electrically charges particulate material within the gas stream. The discs may carry radially protruding needles defining circumferential corona discharge points. A blowdown system may be provided for cleaning the ionizer discs and the tube wall in the region of the discs. The apparatus may include means for avoiding blowout of bed granules from between the outer louvers, and a system for washing pollutant-coated bed granules.

  5. The conceptual design and main characteristics of long lifetime thermionic SNPS with thermal reactor

    NASA Astrophysics Data System (ADS)

    Andreev, Pavel V.; Griaznov, Georgii M.; Zhabotinskii, Evgenii E.; Zaritskii, Gennadii A.; Nikonov, Anatolii M.; Serbin, Victor I.

    A mass optimization study has been conducted for several thermionic Space Nuclear Power System (SNPS) conceptual designs and permissible levels of ionizing radiation. The optimal lengths from the core center to user interface module planes have thereby been defined. Specific mass dependencies of an optimized SNPS on electrical power levels in the 15-60 kW(e) range, in the framework of permissible radiation doses in the 10⁵-10⁶ range, are presented. Such SNPSs may furnish power supplies for an entire class of prospective spacecraft.

  6. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    PubMed Central

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  7. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references.

    PubMed

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-12-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping of disease association signals from case-control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5-98.8% obtained with the original HIBAG references to 95.2-99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets. PMID:25707395

  8. Exonic versus intronic SNPs: contrasting roles in revealing the population genetic differentiation of a widespread bird species

    PubMed Central

    Zhan, X; Dixon, A; Batbayar, N; Bragin, E; Ayas, Z; Deutschova, L; Chavko, J; Domashevsky, S; Dorosencu, A; Bagyura, J; Gombobaatar, S; Grlica, I D; Levin, A; Milobog, Y; Ming, M; Prommer, M; Purev-Ochir, G; Ragyov, D; Tsurkanu, V; Vetrov, V; Zubkov, N; Bruford, M W

    2015-01-01

    Recent years have seen considerable progress in applying single nucleotide polymorphisms (SNPs) to population genetics studies. However, relatively few have attempted to use them to study the genetic differentiation of wild bird populations and none have examined possible differences of exonic and intronic SNPs in these studies. Here, using 144 SNPs, we examined population genetic differentiation in the saker falcon (Falco cherrug) across Eurasia. The position of each SNP was verified using the recently sequenced saker genome with 108 SNPs positioned within the introns of 10 fragments and 36 SNPs in the exons of six genes, comprising MHC, MC1R and four others. In contrast to intronic SNPs, both Bayesian clustering and principal component analyses using exonic SNPs consistently revealed two genetic clusters, within which the least admixed individuals were found in Europe/central Asia and Qinghai (China), respectively. Pairwise D analysis for exonic SNPs showed that the two populations were significantly differentiated and between the two clusters the frequencies of five SNP markers were inferred to be influenced by selection. Central Eurasian populations clustered as intermediate between the two main groups, consistent with their geographic position. But the westernmost populations of central Europe showed evidence of demographic isolation. Our work highlights the importance of functional exonic SNPs for studying population genetic pattern in a widespread avian species. PMID:25074575

  9. Identification of immune-related SNPs in the transcriptome of Mytilus chilensis through high-throughput sequencing.

    PubMed

    Núñez-Acuña, Gustavo; Gallardo-Escárate, Cristian

    2013-12-01

    Single nucleotide polymorphisms (SNPs) identified in coding regions represent a useful tool for understanding the immune response against pathogens and stressful environmental conditions. In this study, an SNP database was generated from transcripts involved in the innate immune response of the mussel Mytilus chilensis. The SNPs were identified through hemocyte transcriptome sequencing from 18 individuals, and SNP mining was performed in 225,336 contigs, yielding 20,306 polymorphisms associated with immune-related genes. Classification of identified SNPs was based on different pathways of the immune response for Mytilus sp. A total of 28 SNPs were identified in the Toll-like receptor pathway and included 5 non-synonymous polymorphisms; 19 SNPs were identified in the apoptosis pathway and included 3 non-synonymous polymorphisms; 35 SNPs were identified in the Ubiquitin-mediated proteolysis pathway and included 4 non-synonymous variants; and 54 SNPs involved in other molecular functions related to the immune response, such as molecular chaperones, antimicrobial peptides, and genes that interact with marine toxins, were also identified. The molecular markers identified in this work could be useful for novel studies, such as those related to associations between high-resolution molecular markers and functional response to pathogen agents. PMID:24080470

  10. Polymorphisms involving gain or loss of CpG sites are significantly enriched in trait-associated SNPs

    PubMed Central

    Zhou, Dan; Li, Zhenli; Yu, Dan; Wan, Ledong; Zhu, Yimin; Lai, Maode; Zhang, Dandan

    2015-01-01

    Some single nucleotide polymorphisms (SNPs) influence the existence of CpG sites, the basis of DNA modification such as methylation and hydroxymethylation. These polymorphisms can lead to gain or loss of CpG sites and were defined as CpG site related SNPs (cgSNPs) in this study. The cgSNPs change DNA sequence and might potentially affect DNA modification such as methylation. However, the functional consequence of cgSNPs is poorly understood. We observed that a considerable proportion (23.0%) of common variants were cgSNPs in the human genome. Mutations involving loss of CpG sites were associated with reduced levels of methylation (~20.2%) using The Cancer Genome Atlas (TCGA) data. Using public databases (SCAN and seeQTL) of expression quantitative trait loci (eQTLs), we found that the cgSNPs were significantly enriched in eQTLs via logistic regression and simulation test. Furthermore, we observed that cgSNPs were more likely to be trait-associated loci, especially for cancers, using a catalog of published genome-wide association studies (GWAS) recorded by the National Human Genome Research Institute (NHGRI). Our results indicated that cgSNP might be meaningful as annotation either in SNP functional prediction or in screening for trait-associated SNPs. PMID:26503467
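
    The cgSNP definition above (a polymorphism that gains or loses a CpG site) can be sketched as a check on the SNP and its two flanking bases. This is a simplified illustration, not the study's annotation pipeline: it assumes a biallelic SNP, single-base flanks, and that a SNP counts only when exactly one allele forms a CpG. Because CG is its own reverse complement, checking one strand is sufficient. The function names are hypothetical.

```python
def has_cpg_at_site(left: str, allele: str, right: str) -> bool:
    """True if the allele forms a CG dinucleotide with either flanking base."""
    left, allele, right = left.upper(), allele.upper(), right.upper()
    return (allele == "C" and right == "G") or (left == "C" and allele == "G")


def is_cgsnp(left: str, ref: str, alt: str, right: str) -> bool:
    """True if exactly one of the two alleles yields a CpG at the SNP position."""
    return has_cpg_at_site(left, ref, right) != has_cpg_at_site(left, alt, right)


# A[C/T]G: the C allele forms a CpG, the T allele does not (CpG loss).
print(is_cgsnp("A", "C", "T", "G"))
# A[A/T]G: neither allele forms a CpG.
print(is_cgsnp("A", "A", "T", "G"))
```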

  11. Application of Population Sequencing (POPSEQ) for Ordering and Imputing Genotyping-by-Sequencing Markers in Hexaploid Wheat

    PubMed Central

    Edae, Erena A.; Bowden, Robert L.; Poland, Jesse

    2015-01-01

    The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based, high-resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for association mapping and genomics-assisted breeding is limited by a large proportion of missing data per marker. For species with a reference genomic sequence, markers can be ordered on the physical map. However, in the absence of reference marker order, the use and imputation of GBS markers is challenging. Here, we demonstrate how the population sequencing (POPSEQ) approach can be used to provide marker context for GBS in wheat. The utility of a POPSEQ-based genetic map as a reference map to create genetically ordered markers on a chromosome for hexaploid wheat was validated by constructing an independent de novo linkage map of GBS markers from a Synthetic W7984 × Opata M85 recombinant inbred line (SynOpRIL) population. The results indicated that there is strong agreement between the independent de novo linkage map and the POPSEQ mapping approach in mapping and ordering GBS markers for hexaploid wheat. After ordering, a large number of GBS markers were imputed, thus providing a high-quality reference map that can be used for QTL mapping for different traits. The POPSEQ-based reference map and whole-genome sequence assemblies are valuable resources that can be used to order GBS markers and enable the application of highly accurate imputation methods to leverage GBS markers in wheat. PMID:26530417

  12. Alteration of Antiviral Signalling by Single Nucleotide Polymorphisms (SNPs) of Mitochondrial Antiviral Signalling Protein (MAVS)

    PubMed Central

    Xing, Fei; Matsumiya, Tomoh; Hayakari, Ryo; Yoshida, Hidemi; Kawaguchi, Shogo; Takahashi, Ippei; Nakaji, Shigeyuki; Imaizumi, Tadaatsu

    2016-01-01

    Genetic variation is associated with diseases. As a type of genetic variation occurring with certain regularity and frequency, the single nucleotide polymorphism (SNP) is attracting more and more attention because of its great value for research and real-life application. Mitochondrial antiviral signalling protein (MAVS) acts as a common adaptor molecule for retinoic acid-inducible gene-I (RIG-I)-like receptors (RLRs), which can recognize foreign RNA, including viral RNA, leading to the induction of type I interferons (IFNs). Therefore, MAVS is thought to be a crucial molecule in antiviral innate immunity. We speculated that genetic variation of MAVS may result in susceptibility to infectious diseases. To assess the risk of viral infection based on MAVS variation, we tested the effects of twelve non-synonymous MAVS coding-region SNPs from the National Center for Biotechnology Information (NCBI) database that result in amino acid substitutions. We found that five of these SNPs exhibited functional alterations. Additionally, four resulted in an inhibitory immune response, and one had the opposite effect. In total, 1,032 human genomic samples obtained from a mass examination were genotyped at these five SNPs. However, no homozygous or heterozygous variation was detected. We hypothesized that these five SNPs are not present in the Japanese population and that such MAVS variations may result in serious immune diseases. PMID:26954674

  13. Large-scale enrichment and discovery of gene-associated SNPs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...

  14. Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs that Associate with Disease

    PubMed Central

    Guy, Richard T.; Santago, Peter; Langefeld, Carl D.

    2013-01-01

    Complex genetic disorders are a result of a combination of genetic and non-genetic factors, all potentially interacting. Machine learning methods hold the potential to identify multi-locus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of SNPs of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of ADTrees. The algorithm is order nk², where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of this method using simulated data as well as data from the Lupus Large Association Study 1 (7822 SNPs in 3548 individuals). Our results suggest that BADTrees holds promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease. PMID:22851473

  15. Mining SNPs and Indels in Mung Bean (Vigna radiata) by Ecotilling

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ecotilling is a powerful genetic analysis tool. It can provide rapid identification of naturally occurring Single Nucleotide Polymorphisms (SNPs) and small insertion/deletions (indels) in a pool of accessions for a gene of interest. This technique eliminates the time consuming and expensive proced...

  16. Validation of 58 autosomal individual identification SNPs in three Chinese populations

    PubMed Central

    Wei, Yi-Liang; Qin, Cui-Jiao; Liu, Hai-Bo; Jia, Jing; Hu, Lan; Li, Cai-Xia

    2014-01-01

    Aim To genotype and evaluate a panel of single-nucleotide polymorphisms for individual identification (IISNPs) in three Chinese populations: Chinese Han, Uyghur, and Tibetan. Methods Two previously identified panels of IISNPs, 86 unlinked IISNPs and SNPforID 52-plex markers, were pooled and analyzed. Four SNPs were included in both panels. In total, 132 SNPs were typed on Sequenom MassARRAY platform in 330 individuals from Han Chinese, Uyghur, and Tibetan populations. Population genetic indices and forensic parameters were determined for all studied markers. Results No significant deviation from Hardy-Weinberg equilibrium was observed for any of the SNPs in 3 populations. Expected heterozygosity (He) ranged from 0.144 to 0.500 in Han Chinese, from 0.197 to 0.500 in Uyghur, and from 0.018 to 0.500 in Tibetan population. Wright's Fst values ranged from 0.0001 to 0.1613. Pairwise linkage disequilibrium (LD) calculations for all 132 SNPs showed no significant LD across the populations (r² < 0.147). A subset of 58 unlinked IISNPs (r² < 0.094) with He > 0.450 and Fst values from 0.0002 to 0.0536 gave match probabilities of 10⁻²⁵ and a cumulative probability of exclusion of 0.999992. Conclusion The 58 unlinked IISNPs with high heterozygosity have low allele frequency variation among 3 Chinese populations, which makes them excellent candidates for the development of multiplex assays for individual identification and paternity testing. PMID:24577821
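
    The relationship between heterozygosity and match probability in this abstract can be checked with a small worked sketch. It assumes biallelic SNPs in Hardy-Weinberg equilibrium and independence between loci (the abstract reports no significant HWE deviation and describes the markers as unlinked), and it uses an idealized allele frequency of 0.5 at all 58 loci rather than the study's actual frequencies.

```python
import math


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2 * p * (1 - p)


def match_probability(p: float) -> float:
    """Chance that two random individuals share a genotype at one SNP (HWE)."""
    q = 1 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def combined_match_probability(freqs) -> float:
    """Product of per-locus match probabilities, assuming independent loci."""
    return math.prod(match_probability(p) for p in freqs)


# Idealized panel: 58 SNPs, each with allele frequency 0.5 (He = 0.5).
panel = [0.5] * 58
print(expected_heterozygosity(0.5), match_probability(0.5))
print(f"{combined_match_probability(panel):.2e}")
```

    With a per-locus match probability of 0.375, the product over 58 loci falls between 10⁻²⁵ and 10⁻²⁴, the same order of magnitude as the match probability reported for the 58-SNP subset.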

  17. Assessing SNPs versus RAPDs for predicting heterogeneity and screening efficiency in wild potato (Solanum)species

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Knowing how genetic diversity is partitioned among and within wild potato species populations is important for efficient sampling for collection, preservation and evaluation. We sought to evaluate the effectiveness of SNPs for assessing germplasm by using the exact set of four model species previous...

  18. SNPs for parentage testing and traceability in globally diverse breeds of sheep

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA-based parentage determination accelerates genetic improvement by increasing pedigree accuracy. However, the utility of any “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities...

  19. Identification of new SNPs in native South American populations by resequencing the Y chromosome.

    PubMed

    Geppert, M; Ayub, Q; Xue, Y; Santos, S; Ribeiro-dos-Santos, Â; Baeta, M; Núñez, C; Martínez-Jarreta, B; Tyler-Smith, C; Roewer, L

    2015-03-01

    The Y-chromosomal genetic landscape of South America is relatively homogenous. The majority of native Amerindian people are assigned to haplogroup Q and only a small percentage belongs to haplogroup C. With the aim of further differentiating the major Q lineages and thus obtaining new insights into the population history of South America, two individuals, both belonging to the sub-haplogroup Q-M3, were analyzed with next-generation sequencing. Several new candidate SNPs were evaluated and four were confirmed to be new, haplogroup Q-specific, and variable. One of the new SNPs, named MG2, identifies a new sub-haplogroup downstream of Q-M3; the other three (MG11, MG13, MG15) are upstream of Q-M3 but downstream of M242, and describe branches at the same phylogenetic positions as previously known SNPs in the samples tested. These four SNPs were typed in 100 individuals belonging to haplogroup Q. PMID:25303787

  20. Hansa: an automated method for discriminating disease and neutral human nsSNPs.

    PubMed

    Acharya, Vishal; Nagarajaram, Hampapathalu A

    2012-02-01

    Variations are mostly due to nonsynonymous single nucleotide polymorphisms (nsSNPs), some of which are associated with certain diseases. Phenotypic effects of a large number of nsSNPs have not been characterized. Although several methods have been developed to predict the effects of nsSNPs as "disease" or "neutral," there is still a need for development of methods with improved prediction accuracies. We, therefore, developed a support vector machine (SVM) based method named Hansa which uses a novel set of discriminatory features to classify nsSNPs into disease (pathogenic) and benign (neutral) types. Validation studies on a benchmark dataset and further on an independent dataset of well-characterized known disease and neutral mutations show that Hansa outperforms the other known methods. For example, fivefold cross-validation studies using the benchmark HumVar dataset reveal that at the false positive rate (FPR) of 20% Hansa yields a true positive rate (TPR) of 82% that is about 10% higher than the best-known method. Hansa is available in the form of a web server at http://hansa.cdfd.org.in:8080. PMID:22045683

  1. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing

    PubMed Central

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, which are abundant in the human genome and have a lower mutation rate, are attractive for genetic applications such as forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneously tested using Ampliseq technology with the Ion Torrent PGM platform. A concordance study between NGS and Sanger sequencing was performed with two reference samples, 9947A and 9948. Full concordance was obtained except for the genotype of rs576261 in 9947A. The parameter FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values should range from 50% to 60% for accurate heterozygotes and should be above 90% for homozygotes or Y-SNPs. The SNPs rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, with either allelic imbalance or lower coverage. Sensitivity testing demonstrated that all correct genotypes were obtained with DNA input ranging from 10 ng to 0.5 ng. For mixture testing, a clear linear correlation (R² = 0.9429) between the expected and observed FMAR values of mixtures was observed. PMID:26691610
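
    The FMAR ranges quoted in this abstract can be read as a simple genotype check. The sketch below is illustrative only: it assumes FMAR is the percentage of reads at a locus that carry the major allele (the abstract does not define the term), and the function name `classify_by_fmar` and the "ambiguous" label for values outside the two quoted ranges are hypothetical.

```python
def classify_by_fmar(major_allele_reads: int, total_reads: int) -> str:
    """Classify a locus from its FMAR (%), using the ranges quoted in the abstract.

    Assumption: FMAR = 100 * (reads carrying the major allele) / (all reads).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fmar = 100.0 * major_allele_reads / total_reads
    if 50.0 <= fmar <= 60.0:
        return "heterozygote"      # abstract: 50% to 60% for accurate heterozygotes
    if fmar > 90.0:
        return "homozygote"        # abstract: above 90% for homozygotes or Y-SNPs
    return "ambiguous"             # hypothetical label: e.g. allelic imbalance or a mixture


if __name__ == "__main__":
    for major, total in [(55, 100), (95, 100), (75, 100)]:
        print(major, total, classify_by_fmar(major, total))
```

    A locus that falls between the two ranges is only flagged here, not called; how such loci were actually handled is not stated in the abstract.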

  2. Heritability Estimated Using 50K SNPs Indicates Missing Heritability Problem in Holstein Breeding

    PubMed Central

    Shin, Donghyun; Park, Kyoung-Do; Ka, Sojoeng

    2015-01-01

    Previous studies in Holstein have shown 35% to 51.8% heritability in milk production traits, such as milk yield, fat, and protein, using pedigree data. Other studies have shown that variation in complex human traits can be captured by common single-nucleotide polymorphisms (SNPs), and that the genetic variation attributed to each chromosome is proportional to its length. Using genome-wide estimation and partitioning approaches, we analyzed three quantitative Holstein traits relevant to milk production in Korean Holstein data collected from 462 individuals genotyped for 54,609 SNPs. For all three traits (milk yield, fat, and protein), we estimated a nominally significant (p = 0.1) proportion of variance explained by all SNPs on the Illumina BovineSNP50 Beadchip (h2G). These common SNPs explained most of the narrow-sense heritability. Longer genomic regions tended to provide more phenotypic variation information, with a correlation of 0.46~0.53 between the estimate of variance explained by individual chromosomes and their physical length. These results suggested that polygenicity was ubiquitous for Holstein milk production traits. These results will expand our knowledge on recent animal breeding, such as genomic selection in Holstein. PMID:26865846

  3. Connecting SNPs in Diabetes: A Spatial Analysis of Meta-GWAS Loci

    PubMed Central

    Schierding, William; O’Sullivan, Justin M.

    2015-01-01

    Meta-analyses of genome-wide association studies (GWAS) have improved our understanding of the genetic foundations of a number of diseases, including diabetes. However, single nucleotide polymorphisms (SNPs) that are identified by GWAS, especially those that fall outside of gene regions, do not always clearly link to the underlying biology. Despite this, these SNPs have often been validated through re-sequencing efforts as not just tag SNPs, but as causative SNPs, and so must play a role in disease development or progression. In this study, we show how the 3D genome (spatial connections) and trans-expression Quantitative Trait Loci connect diabetes loci from different GWAS meta-analyses, informing the backbone of regulatory networks. Our findings include a three-way functional–spatial connection between the TM6SF2, CTRB1–BCAR1, and CELSR2–PSRC1 loci (rs201189528, rs7202844, and rs7202844, respectively) connected through the KCNIP3 and BCAR1/BCAR3 loci, respectively. These spatial hubs serve as an example of how loci in genes with little biological connection to disease come together to contribute to the diabetes phenotype. PMID:26191039

  4. Catalog of 320 single nucleotide polymorphisms (SNPs) in 20 quinone oxidoreductase and sulfotransferase genes.

    PubMed

    Iida, A; Sekine, A; Saito, S; Kitamura, Y; Kitamoto, T; Osawa, S; Mishima, C; Nakamura, Y

    2001-01-01

    Single nucleotide polymorphisms (SNPs) in genes encoding drug-metabolizing enzymes, transporters, receptors, and other drug targets have been widely implicated as contributors to differences among individuals as regards the efficacy and toxicity of many medications, as well as the susceptibility to complex diseases. By combining the polymerase chain reaction (PCR) technique with direct sequencing, we screened genomic DNAs from 48 Japanese volunteers for SNPs in genes encoding three quinone oxidoreductases (NQO1, NQO2, and PIG3) and 17 sulfotransferases (SULT1A1, SULT1A2, SULT1A3, SULT1C1, SULT1C2, SULT2A1, SULT2B1, ST1B2, TPST1, TPST2, SULTX3, STE, CST, HNK-1 ST, CHST2, CHST4, and CHST5). In all, we identified 320 SNPs from these 20 loci: 22 within coding elements, 21 in 5' flanking regions, 10 in 5' untranslated regions, 223 in introns, 19 in 3' untranslated regions, and 25 in 3' flanking regions. The ratio of transitions to transversions was approximately 2.3 to 1. Of the 22 coding SNPs, 6 were nonsynonymous substitutions that resulted in amino-acid substitutions. The high-density SNP maps we constructed from this data for each of the quinone oxidoreductases and sulfotransferases examined here should provide useful information for investigations designed to detect association(s) between genetic variations and common diseases or responsiveness to drug therapy. PMID:11322664
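
    The transition-to-transversion ratio reported in this abstract is a standard summary of a SNP catalog. As a minimal sketch of how such a ratio is computed (the function names and the toy allele pairs are illustrative, not the study's data): a transition swaps purine for purine (A/G) or pyrimidine for pyrimidine (C/T), and every other substitution is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True if ref->alt stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions for an iterable of (ref, alt) pairs."""
    transitions = 0
    transversions = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


if __name__ == "__main__":
    toy_snps = [("A", "G"), ("C", "T"), ("A", "C")]  # made-up example
    print(ti_tv_ratio(toy_snps))
```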

  5. Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA Segments

    PubMed Central

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  6. Cross-Amplification and Validation of SNPs Conserved over 44 Million Years between Seals and Dogs

    PubMed Central

    Hoffman, Joseph I.; Thorne, Michael A. S.; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  7. Cross-amplification and validation of SNPs conserved over 44 million years between seals and dogs.

    PubMed

    Hoffman, Joseph I; Thorne, Michael A S; McEwing, Rob; Forcada, Jaume; Ogden, Rob

    2013-01-01

    High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

  8. Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing.

    PubMed

    Zhang, Suhua; Bian, Yingnan; Zhang, Zheren; Zheng, Hancheng; Wang, Zheng; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Hou, Yiping; Li, Chengtao

    2015-01-01

    SNPs, abundant in human genome with lower mutation rate, are attractive to genetic application like forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneous tested using Ampliseq technology with Ion Torrent PGM platform. Concordance study was performed with 2 reference samples of 9947A and 9948 between NGS and Sanger sequencing. Full concordance were obtained except genotype of rs576261 with 9947A. Parameter of FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should be range from 50% to 60%, for homozygotes or Y-SNP should be above 90%. SNPs of rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, either with allelic imbalance or with lower coverage. Sensitivity testing demonstrated that with DNA range from 10 ng-0.5 ng, all correct genotypes were obtained. For mixture testing, a clear linear correlation (R(2) = 0.9429) between the excepted FMAR and observed FMAR values of mixtures was observed. PMID:26691610

  9. Angiogenic, neurotrophic, and inflammatory system SNPs moderate the association between birth weight and ADHD symptom severity.

    PubMed

    Smith, Taylor F; Anastopoulos, Arthur D; Garrett, Melanie E; Arias-Vasquez, Alejandro; Franke, Barbara; Oades, Robert D; Sonuga-Barke, Edmund; Asherson, Philip; Gill, Michael; Buitelaar, Jan K; Sergeant, Joseph A; Kollins, Scott H; Faraone, Stephen V; Ashley-Koch, Allison

    2014-12-01

    Low birth weight is associated with increased risk for Attention-Deficit/Hyperactivity Disorder (ADHD); however, the etiological underpinnings of this relationship remain unclear. This study investigated if genetic variants in angiogenic, dopaminergic, neurotrophic, kynurenine, and cytokine-related biological pathways moderate the relationship between birth weight and ADHD symptom severity. A total of 398 youth from two multi-site, family-based studies of ADHD were included in the analysis. The sample consisted of 360 ADHD probands, 21 affected siblings, and 17 unaffected siblings. A set of 164 SNPs from 31 candidate genes, representing five biological pathways, were included in our analyses. Birth weight and gestational age data were collected from a state birth registry, medical records, and parent report. Generalized Estimating Equations tested for main effects and interactions between individual SNPs and birth weight centile in predicting ADHD symptom severity. SNPs within neurotrophic (NTRK3) and cytokine genes (CNTFR) were associated with ADHD inattentive symptom severity. There was no main effect of birth weight centile on ADHD symptom severity. SNPs within angiogenic (NRP1 & NRP2), neurotrophic (NTRK1 & NTRK3), cytokine (IL16 & S100B), and kynurenine (CCBL1 & CCBL2) genes moderate the association between birth weight centile and ADHD symptom severity. The SNP main effects and SNP × birth weight centile interactions remained significant after adjusting for multiple testing. Genetic variability in angiogenic, neurotrophic, and inflammatory systems may moderate the association between restricted prenatal growth, a proxy for an adverse prenatal environment, and risk to develop ADHD. PMID:25346392

  10. The effects of single nucleotide polymorphisms (SNPs) of calpastatin (CAST) gene on meat tenderness of yak.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The association of single nucleotide polymorphisms (SNPs) of calpastatin (CAST) gene with shear force of 2.54 cm steaks from M. longissimus dorsi from Gannan yaks (Bos grunniens, n=181) was studied. Yaks were harvested at 2, 3, and 4 yr of age (n=51, 59, and 71, respectively), and samples of each ya...

  11. Identification of pummelo cultivars by using a panel of 25 selected SNPs and 12 DNA segments.

    PubMed

    Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

    2014-01-01

    Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

  12. The effect of simple imputation on inferences about population means when data are missing in biomedical research due to detection limits

    PubMed Central

    WANG, Hongyue; CHEN, Guanqing; LU, Xiang; ZHANG, Hui; FENG, Changyong

    2015-01-01

    Summary The sample geometric mean has been widely used in biomedical and psychosocial research to estimate and compare population geometric means. However, due to the detection limit of measurement instruments, the actual value of the measurement is not always observable. A common practice to deal with this problem is to replace missing values by small positive constants and make inferences based on the imputed data. However, no work has been carried out to study the effect of this naïve imputation method on inference. In this report, we show that this simple imputation method may dramatically change the reported outcomes of a study and, thus, make the results uninterpretable, even if the detection limit is very small. PMID:26977131
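
    The sensitivity described in this summary is easy to see numerically. The sketch below is not the authors' analysis; it uses made-up measurements and two arbitrary substitution constants to show that the sample geometric mean can shift substantially depending on which small positive value replaces the readings below the detection limit.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x))); all values must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def impute_below_limit(values, constant):
    """Replace missing readings (None = below the detection limit) by a constant."""
    return [constant if v is None else v for v in values]


if __name__ == "__main__":
    # Made-up data: three observed values, two readings below a detection limit of 0.1.
    readings = [1.0, 2.0, 4.0, None, None]
    for constant in (0.05, 0.001):
        gm = geometric_mean(impute_below_limit(readings, constant))
        print(f"substitute {constant}: geometric mean = {gm:.4f}")
```

    Both constants lie below the same detection limit, yet they give clearly different geometric means, because the logarithm magnifies differences between small values.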

  13. Identification of putative SNPs in progressive retinal atrophy affected Canis lupus familiaris using exome sequencing.

    PubMed

    Reddy, Bhaskar; Kelawala, Divyesh N; Shah, Tejas; Patel, Anand B; Patil, Deepak B; Parikh, Pinesh V; Patel, Namrata; Parmar, Nidhi; Mohapatra, Amit B; Singh, Krishna M; Menon, Ramesh; Pandya, Dipal; Jakhesara, Subhash J; Koringa, Prakash G; Rao, Mandava V; Joshi, Chaitanya G

    2015-12-01

    Progressive retinal atrophy (PRA) is one of the major causes of retinal photoreceptor cell degeneration in canines. The inheritance pattern of PRA is autosomal recessive and genetically heterogeneous. Here, using targeted sequencing technology, we have performed exome sequencing of 10 PRA-affected (Spitz=7, Cocker Spaniel=1, Lhasa Apso=1 and Spitz-Labrador cross breed=1) and 6 normal (Spitz=5, Cocker Spaniel=1) dogs. The high-throughput sequencing using a 454-Roche Titanium sequencer generated about 2.16 gigabases of raw data. Initially, we successfully identified 25,619 single nucleotide polymorphisms (SNPs) that passed the stringent SNP calling parameters. Further, we performed an association study on the cohort, and the highly significant (0.001) associations were short-listed and investigated in depth. Out of the 171 significant SNPs, 113 were previously unreported. Interestingly, six among them were non-synonymous coding (NSC) SNPs, which include CPPED1 A>G (p.M307V), PITRM1 T>G (p.S715A), APP G>A (p.T266M), RNF213 A>G (p.V1482A), C>A (p.V1456L), and SLC46A3 G>A (p.R168Q). On the other hand, 35 out of 113 unreported SNPs fell in regulatory regions such as 3'-UTR, 5'-UTR, etc. In-depth bioinformatics analysis revealed that the majority of NSC SNPs have a damaging effect and alter protein stability. This study highlighted the genetic markers associated with PRA, which will help to develop genetic assay-based screening in effective breeding. PMID:26515695

  14. PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs.

    PubMed

    Cheng, Zhongshan; Chu, Hin; Fan, Yanhui; Li, Cun; Song, You-Qiang; Zhou, Jie; Yuen, Kwok-Yung

    2015-01-01

    Expression quantitative trait loci (eQTLs) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). With the availability of deep sequencing data of 423 lymphoblastoid cell lines (LCLs) from six global populations and the microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs and generated a LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable; while indel cis-eQTLs in the population-specific subsection make more contribution to gene expression variations than those in the population-shared subsection. We found cis-eQTLs, especially the population-shared cis-eQTLs are significantly enriched toward transcription start site. Moreover, the National Human Genome Research Institute cataloged GWAS SNPs are enriched for LCL cis-eQTLs. Specifically, 32.8% GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting the fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, a pipeline Post-GWAS Explorer for Functional Indels and SNPs (PExFInS) has been developed, integrating LD analysis, functional annotation from public databases, cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets. PMID:26612672

  15. PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs

    PubMed Central

    Cheng, Zhongshan; Chu, Hin; Fan, Yanhui; Li, Cun; Song, You-Qiang; Zhou, Jie; Yuen, Kwok-Yung

    2015-01-01

    Expression quantitative trait loci (eQTLs) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). With the availability of deep sequencing data of 423 lymphoblastoid cell lines (LCLs) from six global populations and the microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs and generated a LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable; while indel cis-eQTLs in the population-specific subsection make more contribution to gene expression variations than those in the population-shared subsection. We found cis-eQTLs, especially the population-shared cis-eQTLs are significantly enriched toward transcription start site. Moreover, the National Human Genome Research Institute cataloged GWAS SNPs are enriched for LCL cis-eQTLs. Specifically, 32.8% GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting the fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, a pipeline Post-GWAS Explorer for Functional Indels and SNPs (PExFInS) has been developed, integrating LD analysis, functional annotation from public databases, cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets. PMID:26612672

  16. MiR-SNPs as Markers of Toxicity and Clinical Outcome in Hodgkin Lymphoma Patients

    PubMed Central

    Navarro, Alfons; Muñoz, Carmen; Gaya, Anna; Díaz-Beyá, Marina; Gel, Bernat; Tejero, Rut; Díaz, Tania; Martinez, Antonio; Monzó, Mariano

    2013-01-01

    Background In recent years, microRNA (miRNA) pathways have emerged as a crucial system for the regulation of tumorigenesis. miR-SNPs are a novel class of single nucleotide polymorphisms that can affect miRNA pathways. Design and Methods We analyzed eight miR-SNPs by allelic discrimination in 141 patients with Hodgkin lymphoma and correlated the results with treatment-related toxicity, response, disease-free survival (DFS) and overall survival (OS). Results The KRT81 (rs3660) GG genotype was associated with an increased risk of neurological toxicity (P = 0.016), while patients with XPO5 (rs11077) AA or CC genotypes had a higher rate of bleomycin-associated pulmonary toxicity (P = 0.048). Both miR-SNPs emerged as independent factors in the multivariate analysis. The XPO5 AA and CC genotypes were also associated with a lower response rate (P = 0.036). XPO5 (P = 0.039) and TRBP (rs784567) (P = 0.022) genotypes emerged as prognostic markers for DFS, and XPO5 was also associated with OS (P = 0.033). In the multivariate analysis, only XPO5 emerged as an independent prognostic factor for DFS (HR: 2.622; 95% CI 1.039-6.620; P = 0.041). Given the influence of XPO5 and TRBP as individual markers, we then investigated the combined effect of these miR-SNPs. Patients with both the XPO5 AA/CC and TRBP TT/TC genotypes had the shortest DFS (P = 0.008) and OS (P = 0.008). Conclusion miR-SNPs can add useful prognostic information on treatment-related toxicity and clinical outcome in Hodgkin lymphoma and can be used to identify patients likely to be chemoresistant or to relapse. PMID:23705004

  17. Selection vector filter framework

    NASA Astrophysics Data System (ADS)

    Lukac, Rastislav; Plataniotis, Konstantinos N.; Smolka, Bogdan; Venetsanopoulos, Anastasios N.

    2003-10-01

    We provide a unified framework of nonlinear vector techniques outputting the lowest ranked vector. The proposed framework constitutes a generalized filter class for multichannel signal processing. The new class of nonlinear selection filters is based on robust order-statistic theory and the minimization of a weighted distance function to the other input samples. The proposed method can be designed to perform a variety of filtering operations, including previously developed filtering techniques such as the vector median, basic vector directional filter, directional distance filter, weighted vector median filters and weighted directional filters. A wide range of filtering operations is guaranteed by the filter structure, which has two independent weight vectors for the angular and distance domains of the vector space. In order to adapt the filter parameters to varying signal and noise statistics, we also provide generalized optimization algorithms that take advantage of the weighted median filters and the relationship between the standard median filter and the vector median filter. Thus, we can deal with both statistical and deterministic aspects of the filter design process. It will be shown that the proposed method holds the required properties, such as the capability of modelling the underlying system in the application at hand, robustness with respect to errors in the model of the underlying system, the availability of a training procedure and, finally, simplicity of filter representation, analysis, design and implementation. Simulation studies also indicate that the new filters are computationally attractive and have excellent performance in environments corrupted by bit errors and impulsive noise.
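
    The selection principle in this abstract (output the input sample with the smallest weighted aggregate distance to the other samples) can be sketched for its simplest special case, a weighted vector median. This is a minimal illustration and not the paper's framework: it uses Euclidean distance and a single weight vector only, omits the angular-domain weights, and the function name and sample window are hypothetical.

```python
import math


def weighted_vector_median(window, weights=None):
    """Return the sample in `window` with the lowest weighted sum of distances.

    window  : list of equal-length numeric tuples (e.g. RGB pixels)
    weights : one non-negative weight per sample; all ones gives the plain
              vector median filter output
    """
    if not window:
        raise ValueError("window must not be empty")
    if weights is None:
        weights = [1.0] * len(window)
    if len(weights) != len(window):
        raise ValueError("need exactly one weight per sample")

    best_sample, best_cost = None, math.inf
    for candidate in window:
        # Aggregate weighted Euclidean distance from the candidate to every sample.
        cost = sum(w * math.dist(candidate, other)
                   for w, other in zip(weights, window))
        if cost < best_cost:
            best_sample, best_cost = candidate, cost
    return best_sample


if __name__ == "__main__":
    # Made-up colour window with one impulsive outlier (255, 0, 0).
    window = [(10, 10, 10), (12, 11, 10), (11, 10, 12), (255, 0, 0), (10, 12, 11)]
    print(weighted_vector_median(window))
```

    Because the output is always one of the input samples, the outlier can only be selected if it is the most central sample, which is what makes this kind of filter robust to impulsive noise.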

  18. eQuIPS: eQTL Analysis Using Informed Partitioning of SNPs - A Fully Bayesian Approach.

    PubMed

    Boggis, E M; Milo, M; Walters, K

    2016-05-01

    We develop a Bayesian multi-SNP Markov chain Monte Carlo approach that allows published functional significance scores to objectively inform single nucleotide polymorphism (SNP) prior effect sizes in expression quantitative trait locus (eQTL) studies. We developed the Normal Gamma prior to allow the inclusion of functional information. We partition SNPs into predefined functional groups and select prior distributions that fit the group-specific observed functional significance scores. We test our method on two simulated datasets and previously analysed human eQTL data containing validated causal SNPs. In our simulations the modified Normal Gamma always performs at least as well, and generally outperforms, the other methods considered. When analysing the human eQTL data, we placed all SNPs into their actual functional group. The ranks of the four validated causal SNPs analysed using the modified Normal Gamma increase dramatically compared to those of the other methods considered. Using our new method, three of the four validated SNPs are ranked in the top 1% of SNPs and the other is in the top 2%. For the standard Normal Gamma, the best of the other methods, the four validated SNPs had ranks in the top 1%, 4%, 20% and 59%. Crucially these substantive improvements in the ranks make it highly likely that most, if not all, of these validated SNPs would have been flagged for follow-up using our new method, whereas at least two of them would certainly not have been using the current approaches. PMID:26989050

  19. HEPA filter dissolution process

    SciTech Connect

    Brewer, K.N.; Murphy, J.A.

    1992-12-31

    This invention is comprised of a process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  20. Recirculating electric air filter

    DOEpatents

    Bergman, Werner

    1986-01-01

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  1. Recirculating electric air filter

    DOEpatents

    Bergman, W.

    1985-01-09

    An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

  2. HEPA filter dissolution process

    DOEpatents

    Brewer, K.N.; Murphy, J.A.

    1994-02-22

    A process is described for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal. 4 figures.

  3. Hepa filter dissolution process

    DOEpatents

    Brewer, Ken N.; Murphy, James A.

    1994-01-01

    A process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

  4. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063
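
    The Xie and Beni index named in the abstract above can be sketched as follows. This is a minimal illustration of the standard index (fuzzy within-cluster compactness divided by the number of points times the smallest squared distance between cluster centers), not the MIV algorithm itself; the function name `xie_beni`, the toy data, and the fuzzifier default `m=2.0` are assumptions made for the example.

```python
def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni validity index for a fuzzy partition (lower is better).

    points      -- list of coordinate tuples
    centers     -- list of cluster-center tuples
    memberships -- memberships[i][j] is the degree to which point j
                   belongs to cluster i
    m           -- fuzzifier exponent (assumed 2.0 here)
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Membership-weighted squared distances of points to their centers.
    compactness = sum(
        memberships[i][j] ** m * sqdist(points[j], centers[i])
        for i in range(len(centers))
        for j in range(len(points))
    )
    # Smallest squared distance between any two distinct centers.
    separation = min(
        sqdist(centers[i], centers[k])
        for i in range(len(centers))
        for k in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Toy data: two tight, well-separated groups with crisp memberships.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers = [(0, 0.5), (10, 0.5)]
memberships = [[1, 1, 0, 0], [0, 0, 1, 1]]
index = xie_beni(points, centers, memberships)
```

    In a cluster-number search, the index would be computed for each candidate number of clusters and the candidate with the smallest value preferred.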

  5. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    PubMed Central

    Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses. PMID:22013517

  6. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island

    PubMed Central

    Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A.; Shouche, Yogesh S.; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, an imputed metagenomic approach was used to understand the functional aspects of the microbial community, especially microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using the culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of the continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplankton as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1–40 meters), i.e. between upper continental shelf samples (UCS) with lesser depths (1–20 meters) and lower continental shelf samples (LCS) with greater depths (25–40 meters). Using the imputed metagenomic approach, this study also discusses several adaptive mechanisms that enable microbes to survive in nutritionally deprived conditions, and that help explain the influence of nutrient availability on bacterial diversity. PMID:26066038

  7. A Large-Scale Analysis of the Relationship of Synonymous SNPs Changing MicroRNA Regulation with Functionality and Disease

    PubMed Central

    Wang, Yuchen; Qiu, Chengxiang; Cui, Qinghua

    2015-01-01

    Historically, because they do not change the amino acid composition of protein sequences, synonymous mutations have commonly been assumed to be neutral during evolution and therefore to have no effect on phenotype or disease. Here, based on observations from large-scale analysis of genomic data, we predicted the putative synonymous SNPs that could result in functional consequences and disease risk through changing the microRNA-mediated gene regulation. We found that nearly half of the synonymous SNPs could affect protein expression by changing microRNA regulation in the human genome, and these SNPs are significantly more likely to be associated with human diseases and traits. The synonymous SNPs changing microRNA-mediated gene regulation are more likely to be under recent positive selection, to affect gene expression, and to be implicated in human disease. We conclude that changes in miRNA-mediated regulation could be a potential mechanism for the contributions of synonymous SNPs to protein functions and disease risks. PMID:26437399

  8. Properties of multilayer filters

    NASA Technical Reports Server (NTRS)

    Baumeister, P. W.

    1973-01-01

    New methods were investigated of using optical interference coatings to produce bandpass filters for the spectral region 110 nm to 200 nm. The types of filter are: triple cavity metal dielectric filters; all dielectric reflection filters; and all dielectric Fabry Perot type filters. The latter two types use thorium fluoride and either cryolite films or magnesium fluoride films in the stacks. The optical properties of the thorium fluoride were also measured.

  9. Tap water filters.

    PubMed

    2003-02-01

    Moen PureTouch filters remove impurities from tap water without removing fluoride. These carbon block filters consist of finely powdered activated carbon that is combined with a plastic binder material and heated to form a hollow cylinder. The blocks are further wrapped with material to improve performance and reduce clogging. The filters are available with different filtering capabilities (Table 1). The filters mount in the faucet spout or under the sink. PMID:12636128

  10. Mitochondrial genome DNA analysis of the domestic dog: identifying informative SNPs outside of the control region.

    PubMed

    Webb, Kristen M; Allard, Marc W

    2009-03-01

    While the mitochondrial control region has proven successful for human forensic evaluations by indicating ethnic origin, domestic dogs (Canis lupus familiaris) of seemingly unrelated breeds often form large groups based on identical control region sequences. In an attempt to break up these large haplotype groups, we have analyzed the remaining c. 15,484 base pairs of the canine mitochondrial genome for 79 dogs and used phylogenetic and population genetic methods to search for additional variability in the form of single nucleotide polymorphisms (SNPs). We have identified 356 SNPs and 65 haplotypes in the remainder of the mitochondrial genome excluding the control region. The exclusion capacity was found to be 0.018. The mitochondrial control region was also evaluated for the same 79 dogs. The signals from the different fragments do not conflict, but instead support one another and provide a larger fragment of DNA that can be analyzed as forensic evidence. PMID:19261050

  11. SNPs in microRNA binding sites as prognostic and predictive cancer biomarkers.

    PubMed

    Preskill, Carina; Weidhaas, Joanne B

    2013-01-01

    Single-nucleotide polymorphisms within microRNA (miRNA) binding sites comprise a novel genre of cancer biomarkers. Since miRNA regulation is dependent on sequence complementarity between the mRNA transcript and the miRNA, even single-nucleotide aberrations can have significant effects. Over the past few years, many examples of these functional miRNA binding site SNPs have been identified as cancer biomarkers. While most of the research to date focuses on associations with cancer risk, more and more studies are linking these SNPs to cancer prognosis and response to treatment as well. This review summarizes the state of the field and draws importance to this rapidly expanding area of cancer biomarkers. PMID:23614619

  12. Collective effects of SNPs on transgenerational inheritance in Caenorhabditis elegans and budding yeast.

    PubMed

    Zhu, Zuobin; Man, Xian; Xia, Mengying; Huang, Yimin; Yuan, Dejian; Huang, Shi

    2015-07-01

    We studied the collective effects of single nucleotide polymorphisms (SNPs) on transgenerational inheritance in Caenorhabditis elegans recombinant inbred advanced intercross lines (RIAILs) and yeast segregants. We divided the RIAILs and segregants into two groups of high and low minor allele content (MAC). RIAILs with higher MAC needed fewer generations of benzaldehyde training to gain a stable olfactory imprint and showed a greater change from normal after benzaldehyde training. Yeast segregants with higher MAC showed a more dramatic shortening of the lag phase length after ethanol exposure. The short lag phase acquired by ethanol training was more dramatically lost after recovery in ethanol-free medium for the high MAC group. We also found a preferential association between MAC and traits linked with a higher number of additive QTLs. These results suggest a role for the collective effects of SNPs in transgenerational inheritance, and may help explain human variations in disease susceptibility. PMID:25882787

  13. ARRANGEMENT FOR REPLACING FILTERS

    DOEpatents

    Blomgren, R.A.; Bohlin, N.J.C.

    1957-08-27

    An improved filtered air exhaust system which may be continually operated during the replacement of the filters without the escape of unfiltered air is described. This is accomplished by hermetically sealing the box like filter containers in a rectangular tunnel with neoprene covered sponge rubber sealing rings coated with a silicone impregnated pneumatic grease. The tunnel through which the filters are pushed is normal to the exhaust air duct. A number of unused filters are in line behind the filters in use, and are moved by a hydraulic ram so that a fresh filter is positioned in the air duct. The used filter is pushed into a waiting receptacle and is suitably disposed. This device permits a rapid and safe replacement of a radiation contaminated filter without interruption to the normal flow of exhaust air.

  14. Endothelial nitric oxide synthase tagSNPs influence the effects of enalapril in essential hypertension.

    PubMed

    Oliveira-Paula, Gustavo H; Lacchini, Riccardo; Luizon, Marcelo R; Fontana, Vanessa; Silva, Pamela S; Biagi, Celso; Tanus-Santos, Jose E

    2016-05-01

    The antihypertensive effects of angiotensin-converting enzyme inhibitors (ACEi) are associated with up-regulation of endothelial nitric oxide synthase (NOS3) activity. This mechanism may explain how polymorphisms in the NOS3 gene affect the antihypertensive responses to ACEi. While clinically relevant NOS3 polymorphisms were previously shown to affect the antihypertensive responses to enalapril, no study has tested the hypothesis that NOS3 tagSNPs influence the antihypertensive effects of this drug. We examined whether the NOS3 tagSNPs rs3918226, rs3918188, and rs743506, and their haplotypes, affect the antihypertensive responses to enalapril in 101 patients with essential hypertension. Subjects were prospectively treated only with enalapril for 8 weeks. Genotypes were determined by TaqMan® allele discrimination assay and real-time polymerase chain reaction (PCR), and haplotype frequencies were estimated. We compared the effects of NOS3 tagSNPs on changes in blood pressure after enalapril treatment. To confirm our findings, multiple linear regression analysis was performed adjusting for age, gender, ethnicity, and alcohol consumption. We found that hypertensive patients carrying the AA genotype for the tagSNP rs3918188 showed lower decreases in blood pressure in response to enalapril. Moreover, the TCA haplotype was associated with improved decreases in blood pressure in response to enalapril compared with the CAG haplotype. Adjustment for covariates in multiple linear regression analysis did not change these effects. In addition, when patients were stratified according to the dose of enalapril used, we found that the carriers of the T allele for the functional tagSNP rs3918226 showed more intense decreases in blood pressure in response to enalapril 20 mg/day. Our findings suggest that NOS3 tagSNPs influence the effects of enalapril in essential hypertension. PMID:27060232

  15. Impulsiveness mediates the association between GABRA2 SNPs and lifetime alcohol problems

    PubMed Central

    Villafuerte, Sandra; Strumba, Viktorya; Stoltenberg, Scott F.; Zucker, Robert A.; Burmeister, Margit

    2013-01-01

    Genetic variants in GABRA2 have previously been shown to be associated with alcohol measures, EEG β waves, and impulsiveness-related traits. Impulsiveness is a behavioral risk factor for alcohol and other substance abuse. Here, we tested association between 11 variants in GABRA2 with NEO-impulsiveness and problem drinking. Our sample of 295 unrelated adult subjects was from a community of families with at least one male with DSM-IV alcohol use diagnosis, and from a socioeconomically comparable control group. Ten GABRA2 SNPs were associated with NEO-impulsiveness (p < 0.03). The alleles associated with higher impulsiveness correspond to the minor alleles identified in previous alcohol dependence studies. All ten SNPs are in LD with each other and represent one effect on impulsiveness. Four SNPs and the corresponding haplotype from intron 3 to intron 4 were also associated with Lifetime Alcohol Problems Score (LAPS, p < 0.03) (not corrected for multiple testing). Impulsiveness partially mediates (22.6% average) this relation between GABRA2 and LAPS. Our results suggest that GABRA2 variation in the region between introns 3 and 4 is associated with impulsiveness and this effect partially influences the development of alcohol problems, but a direct effect of GABRA2 on problem drinking remains. A potential functional SNP rs279827, located next to a splice site, is located in the most significant region for both impulsiveness and LAPS. The high degree of LD among nine of these SNPs and the conditional analyses we have performed suggest that all variants represent one signal. PMID:23566244

  16. Identification of Deleterious SNPs and Their Effects on Structural Level in CHRNA3 Gene.

    PubMed

    Chandramohan, Vivek; Nagaraju, Navya; Rathod, Shrikant; Kaphle, Anubhav; Muddapur, Uday

    2015-08-01

    The aim of our study is to identify probable deleterious genetic variations that can alter the expression and the function of the CHRNA3 gene using in silico methods. Of the 2305 SNPs identified in the CHRNA3 gene, 115 were found to be non-synonymous and 12 and 15 nsSNPs were found to be in the 5' and 3' UTRs, respectively. Further, out of the 115 nsSNPs investigated, eight were predicted to be deleterious by both SIFT and PredictSNP servers. The major mutations predicted to affect the structure of the protein are phenylalanine to valine (Y43V) and lysine to asparagine (K216N) as shown by the trajectory run in molecular dynamics studies. The random transition of the protein structures over the simulation period caused by these mutations hints at how the native state is distorted which could lead to the loss of structural stability and functionality of the nicotinic acetylcholine receptors subunit α-3 protein. Based on this work, we propose that the nsSNP with SNP id of rs75495285 and rs76821682 will have comparatively more deleterious effects than the other predicted mutations in destabilizing the protein structure. PMID:26002565

  17. Concordant Gene Expression in Leukemia Cells and Normal Leukocytes Is Associated with Germline cis-SNPs

    PubMed Central

    French, Deborah; Yang, Wenjian; Hamilton, Leo H.; Neale, Geoffrey; Fan, Yiping; Downing, James R.; Cox, Nancy J.; Pui, Ching-Hon; Evans, William E.; Relling, Mary V.

    2008-01-01

    The degree to which gene expression covaries between different primary tissues within an individual is not well defined. We hypothesized that expression that is concordant across tissues is more likely influenced by genetic variability than gene expression which is discordant between tissues. We quantified expression of 11,873 genes in paired samples of primary leukemia cells and normal leukocytes from 92 patients with acute lymphoblastic leukemia (ALL). Genetic variation at >500,000 single nucleotide polymorphisms (SNPs) was also assessed. The expression of only 176/11,783 (1.5%) genes was correlated (p<0.008, FDR = 25%) in the two tissue types, but expression of a high proportion (20 of these 176 genes) was significantly related to cis-SNP genotypes (adjusted p<0.05). In an independent set of 134 patients with ALL, 14 of these 20 genes were validated as having expression related to cis-SNPs, as were 9 of 20 genes in a second validation set of HapMap cell lines. Genes whose expression was concordant among tissue types were more likely to be associated with germline cis-SNPs than genes with discordant expression in these tissues; genes affected were involved in housekeeping functions (GSTM2, GAPDH and NCOR1) and purine metabolism. PMID:18478092

  18. Functional classification of 15 million SNPs detected from diverse chicken populations

    PubMed Central

    Gheyas, Almas A.; Boschiero, Clarissa; Eory, Lel; Ralph, Hannah; Kuo, Richard; Woolliams, John A.; Burt, David W.

    2015-01-01

    Next-generation sequencing has prompted a surge of discovery of millions of genetic variants from vertebrate genomes. Besides applications in genetic association and linkage studies, a fraction of these variants will have functional consequences. This study describes detection and characterization of 15 million SNPs from chicken genome with the goal to predict variants with potential functional implications (pfVars) from both coding and non-coding regions. The study reports: 183K amino acid-altering SNPs of which 48% predicted as evolutionary intolerant, 13K splicing variants, 51K likely to alter RNA secondary structures, 500K within most conserved elements and 3K from non-coding RNAs. Regions of local fixation within commercial broiler and layer lines were investigated as potential selective sweeps using genome-wide SNP data. Relationships with phenotypes, if any, of the pfVars were explored by overlaying the sweep regions with known QTLs. Based on this, the candidate genes and/or causal mutations for a number of important traits are discussed. Although the fixed variants within sweep regions were enriched with non-coding SNPs, some non-synonymous-intolerant mutations reached fixation, suggesting their possible adaptive advantage. The results presented in this study are expected to have important implications for future genomic research to identify candidate causal mutations and in poultry breeding. PMID:25926514

  19. Impact of Single Nucleotide Polymorphisms (SNPs) on Immunosuppressive Therapy in Lung Transplantation.

    PubMed

    Ruiz, Jesus; Herrero, María José; Bosó, Virginia; Megías, Juan Eduardo; Hervás, David; Poveda, Jose Luis; Escrivá, Juan; Pastor, Amparo; Solé, Amparo; Aliño, Salvador Francisco

    2015-01-01

    Lung transplant patients present important variability in immunosuppressant blood concentrations during the first months after transplantation. Pharmacogenetics could explain part of this interindividual variability. We evaluated SNPs in genes that have previously shown correlations in other kinds of solid organ transplantation, namely ABCB1 and CYP3A5 genes with tacrolimus (Tac) and ABCC2, UGT1A9 and SLCO1B1 genes with mycophenolic acid (MPA), during the first six months after lung transplantation (51 patients). The genotype was correlated to the trough blood drug concentrations corrected for dose and body weight (C0/Dc). The ABCB1 variant in rs1045642 was associated with significantly higher Tac concentration, at six months post-transplantation (CT vs. CC). In the MPA analysis, CT patients in ABCC2 rs3740066 presented significantly lower blood concentrations than CC or TT, three months after transplantation. Other tendencies, confirming previously expected results, were found associated with the rest of studied SNPs. An interesting trend was recorded for the incidence of acute rejection according to NOD2/CARD15 rs2066844 (CT: 27.9%; CC: 12.5%). Relevant SNPs related to Tac and MPA in other solid organ transplants also seem to be related to the efficacy and safety of treatment in the complex setting of lung transplantation. PMID:26307985

  20. [Association analysis between SNPs of the growth hormone receptor gene and growth traits in arctic fox].

    PubMed

    Du, Zhi-Heng; Liu, Zong-Yue; Bai, Xiu-Juan

    2010-06-01

    Using single-strand conformation polymorphism (PCR-SSCP) and DNA sequencing, single nucleotide polymorphisms (SNPs) of the growth hormone receptor (GHR) gene were detected in an arctic fox population. Correlation analysis between GHR polymorphisms and growth traits was carried out using the appropriate model. Four SNPs, G3A in the 5'UTR, C99T in the first exon, and T59C and G65A in the fifth exon, were identified in the arctic fox GHR gene. The G3A and C99T polymorphisms of GHR were associated with female fox body weight (P<0.05), and the T59C and G65A polymorphisms of GHR were associated with male fox body weight (P<0.05) and the skin length of the female fox (P<0.01). Therefore, marker-assisted selection on body weight and skin length of arctic foxes using these SNPs can be applied to obtain large, high-quality arctic foxes. PMID:20566464

  1. Identification of Sex-Linked SNPs and Sex-Determining Regions in the Yellowtail Genome.

    PubMed

    Koyama, Takashi; Ozaki, Akiyuki; Yoshida, Kazunori; Suzuki, Junpei; Fuji, Kanako; Aoki, Jun-ya; Kai, Wataru; Kawabata, Yumi; Tsuzaki, Tatsuo; Araki, Kazuo; Sakamoto, Takashi

    2015-08-01

    Unlike the conservation of sex-determining (SD) modes seen in most mammals and birds, teleost fishes exhibit a wide variety of SD systems and genes. Hence, the study of SD genes and sex chromosome turnover in fish is one of the most interesting topics in evolutionary biology. To increase resolution of the SD gene evolutionary trajectory in fish, identification of the SD gene in more fish species is necessary. In this study, we focused on the yellowtail, a species widely cultivated in Japan. It is a member of family Carangidae in which no heteromorphic sex chromosome has been observed, and no SD gene has been identified to date. By performing linkage analysis and BAC walking, we identified a genomic region and SNPs with complete linkage to yellowtail sex. Comparative genome analysis revealed the yellowtail SD region ancestral chromosome structure as medaka-fugu. Two inversions occurred in the yellowtail lineage after it diverged from the yellowtail-medaka ancestor. An association study using wild yellowtails and the SNPs developed from BAC ends identified two SNPs that can reasonably distinguish the sexes. Therefore, these will be useful genetic markers for yellowtail breeding. Based on a comparative study, it was suggested that a PDZ domain containing the GIPC protein might be involved in yellowtail sex determination. The homomorphic sex chromosomes widely observed in the Carangidae suggest that this family could be a suitable marine fish model to investigate the early stages of sex chromosome evolution, for which our results provide a good starting point. PMID:25975833

  2. Do SNPs of DRD4 gene predict adult persistence of ADHD in a Chinese sample?

    PubMed

    Li, Yueling; Baker-Ericzen, Mary; Ji, Ning; Chang, Weili; Guan, Lili; Qian, Qiujin; Zhang, Yujuan; Faraone, Stephen V; Wang, Yufeng

    2013-01-30

    The dopamine D4 receptor (DRD4) gene has been frequently studied in relation to attention deficit hyperactivity disorder (ADHD) but little is known about the contribution of single nucleotide polymorphisms (SNPs) of the DRD4 gene to the development and persistence of ADHD. In the present study, we examined the association between two SNPs in DRD4 (rs1800955, rs916455) and adult ADHD persistence in a Chinese sample. Subjects (n=193) were diagnosed with ADHD in childhood and reassessed in young adulthood at an affiliated clinic of Peking University Sixth Hospital. Kaplan-Meier survival analyses and Cox proportional hazard models were used to test the association between ADHD remission and alleles of the two SNPs. DRD4 rs916455 C allele carriers were more likely to have persistent ADHD symptoms in adulthood. No significant association was found between rs1800955 allele and the course of ADHD. These newly detected associations between DRD4 polymorphisms and ADHD prognosis in adulthood may help to predict the persistence of childhood ADHD into adulthood. PMID:23031802
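
    The Kaplan-Meier analysis mentioned in the abstract above rests on the product-limit estimator, which can be sketched in a few lines. This is a generic illustration with made-up follow-up times, not the study's data or code; here "survival" is read as the probability that ADHD still persists (no remission yet), and the function name and event coding are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time.

    times  -- follow-up time for each subject
    events -- 1 if the event (here: remission) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects leave the risk set
    return curve


# Made-up follow-up times (years) and event indicators.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

    Curves estimated separately for carriers and non-carriers of an allele could then be compared, for example with a log-rank test or a Cox proportional hazards model as in the abstract.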

  3. Impact of Single Nucleotide Polymorphisms (SNPs) on Immunosuppressive Therapy in Lung Transplantation

    PubMed Central

    Ruiz, Jesus; Herrero, María José; Bosó, Virginia; Megías, Juan Eduardo; Hervás, David; Poveda, Jose Luis; Escrivá, Juan; Pastor, Amparo; Solé, Amparo; Aliño, Salvador Francisco

    2015-01-01

    Lung transplant patients present important variability in immunosuppressant blood concentrations during the first months after transplantation. Pharmacogenetics could explain part of this interindividual variability. We evaluated SNPs in genes that have previously shown correlations in other kinds of solid organ transplantation, namely ABCB1 and CYP3A5 genes with tacrolimus (Tac) and ABCC2, UGT1A9 and SLCO1B1 genes with mycophenolic acid (MPA), during the first six months after lung transplantation (51 patients). The genotype was correlated to the trough blood drug concentrations corrected for dose and body weight (C0/Dc). The ABCB1 variant in rs1045642 was associated with significantly higher Tac concentration, at six months post-transplantation (CT vs. CC). In the MPA analysis, CT patients in ABCC2 rs3740066 presented significantly lower blood concentrations than CC or TT, three months after transplantation. Other tendencies, confirming previously expected results, were found associated with the rest of studied SNPs. An interesting trend was recorded for the incidence of acute rejection according to NOD2/CARD15 rs2066844 (CT: 27.9%; CC: 12.5%). Relevant SNPs related to Tac and MPA in other solid organ transplants also seem to be related to the efficacy and safety of treatment in the complex setting of lung transplantation. PMID:26307985

  4. Prediction of CYP3A4 enzyme activity using haplotype tag SNPs in African Americans

    PubMed Central

    Perera, MA; Thirumaran, RK; Cox, NJ; Hanauer, S; Das, S; Brimer-Cline, C; Lamba, V; Schuetz, EG; Ratain, MJ; Di Rienzo, A

    2009-01-01

    The CYP3A locus encodes hepatic enzymes that metabolize many clinically used drugs. However, there is marked interindividual variability in enzyme expression and clearance of drugs metabolized by these enzymes. We utilized comparative genomics and computational prediction of transcriptional factor binding sites to evaluate regions within CYP3A that were most likely to contribute to this variation. We then used a haplotype tagging single-nucleotide polymorphisms (htSNPs) approach to evaluate the entire locus with the fewest number of maximally informative SNPs. We investigated the association between these htSNPs and in vivo CYP3A enzyme activity using a single-point IV midazolam clearance assay. We found associations between the midazolam phenotype and age, diagnosis of hypertension and one htSNP (141689) located upstream of CYP3A4. 141689 lies near the xenobiotic responsive enhancer module (XREM) regulatory region of CYP3A4. Cell-based studies show increased transcriptional activation with the minor allele at 141689, in agreement with the in vivo association study findings. This study marks the first systematic evaluation of coding and noncoding variation that may contribute to CYP3A phenotypic variability. PMID:18825162

  5. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean

  7. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...

  8. Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...

  9. Rank and Order: Evaluating the Performance of SNPs for Individual Assignment in a Non-Model Organism

    PubMed Central

    Storer, Caroline G.; Pascal, Carita E.; Roberts, Steven B.; Templin, William D.; Seeb, Lisa W.; Seeb, James E.

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: FST, informativeness (In), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from FST, In, and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290
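
    As a minimal sketch of the panel-building step this record describes, the snippet below ranks loci by a per-locus score and keeps the top-ranked ones. The locus names and score values are invented for illustration, and `top_ranked_panel` is a hypothetical helper; it is not part of BELS, WHICHLOCI, or any other program named in the abstract.

```python
def top_ranked_panel(scores, panel_size):
    """Return the names of the panel_size highest-scoring loci.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda locus: (-scores[locus], locus))
    return ranked[:panel_size]


# Hypothetical per-locus FST values (illustrative only, not from the study).
fst = {"snp_a": 0.12, "snp_b": 0.31, "snp_c": 0.05, "snp_d": 0.27}

panel = top_ranked_panel(fst, 2)
print(panel)
```

    The same helper could be applied to any of the ranking statistics (for example In or LC) by passing a different score dictionary; the study itself built 48- and 96-SNP panels.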

  10. Rank and order: evaluating the performance of SNPs for individual assignment in a non-model organism.

    PubMed

    Storer, Caroline G; Pascal, Carita E; Roberts, Steven B; Templin, William D; Seeb, Lisa W; Seeb, James E

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: F(ST), informativeness (I(n)), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from F(ST), I(n), and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290

  11. Rigid porous filter

    DOEpatents

    Chiang, Ta-Kuan (Morgantown, WV); Straub, Douglas L. (Morgantown, WV); Dennis, Richard A. (Morgantown, WV)

    2000-01-01

    The present invention involves a porous rigid filter including a plurality of concentric filtration elements having internal flow passages and forming external flow passages there between. The present invention also involves a pressure vessel containing the filter for the removal of particulates from high pressure particulate containing gases, and further involves a method for using the filter to remove such particulates. The present filter has the advantage of requiring fewer filter elements due to the high surface area-to-volume ratio provided by the filter, requires a reduced pressure vessel size, and exhibits enhanced mechanical design properties, improved cleaning properties, configuration options, modularity and ease of fabrication.

  12. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, H.S.; Thompson, R.C.; Hubbard, C.W.; Perkins, R.W.

    1997-03-25

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant. 5 figs.

  13. Filter type gas sampler with filter consolidation

    DOEpatents

    Miley, Harry S.; Thompson, Robert C.; Hubbard, Charles W.; Perkins, Richard W.

    1997-01-01

    Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant.

  14. Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans

    PubMed Central

    Du, Mengmeng; Auer, Paul L.; Jiao, Shuo; Haessler, Jeffrey; Altshuler, David; Boerwinkle, Eric; Carlson, Christopher S.; Carty, Cara L.; Chen, Yii-Der Ida; Curtis, Keith; Franceschini, Nora; Hsu, Li; Jackson, Rebecca; Lange, Leslie A.; Lettre, Guillaume; Monda, Keri L.; Nickerson, Deborah A.; Reiner, Alex P.; Rich, Stephen S.; Rosse, Stephanie A.; Rotter, Jerome I.; Willer, Cristen J.; Wilson, James G.; North, Kari; Kooperberg, Charles; Heard-Costa, Nancy; Peters, Ulrike

    2014-01-01

    Adult body height is a quantitative trait for which genome-wide association studies (GWAS) have identified numerous loci, primarily in European populations. These loci, comprising common variants, explain <10% of the phenotypic variance in height. We searched for novel associations between height and common (minor allele frequency, MAF ≥5%) or infrequent (0.5% < MAF < 5%) variants across the exome in African Americans. Using a reference panel of 1692 African Americans and 471 Europeans from the National Heart, Lung, and Blood Institute's (NHLBI) Exome Sequencing Project (ESP), we imputed whole-exome sequence data into 13 719 African Americans with existing array-based GWAS data (discovery). Variants achieving a height-association threshold of P < 5E−06 in the imputed dataset were followed up in an independent sample of 1989 African Americans with whole-exome sequence data (replication). We used P < 2.5E−07 (=0.05/196 779 variants) to define statistically significant associations in meta-analyses combining the discovery and replication sets (N = 15 708). We discovered and replicated three independent loci for association: 5p13.3/C5orf22/rs17410035 (MAF = 0.10, β = 0.64 cm, P = 8.3E−08), 13q14.2/SPRYD7/rs114089985 (MAF = 0.03, β = 1.46 cm, P = 4.8E−10) and 17q23.3/GH2/rs2006123 (MAF = 0.30; β = 0.47 cm; P = 4.7E−09). Conditional analyses suggested 5p13.3 (C5orf22/rs17410035) and 13q14.2 (SPRYD7/rs114089985) may harbor novel height alleles independent of previous GWAS-identified variants (r2 with GWAS loci <0.01); whereas 17q23.3/GH2/rs2006123 was correlated with GWAS-identified variants in European and African populations. Notably, 13q14.2/rs114089985 is infrequent in African Americans (MAF = 3%), extremely rare in European Americans (MAF = 0.03%), and monomorphic in Asian populations, suggesting it may be an African-American-specific height allele. 
    Our findings demonstrate that whole-exome imputation of sequence variants can identify low-frequency variants and discover novel variants in non-European populations. PMID:25027330
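
    The significance threshold quoted in this abstract (0.05 divided by 196 779 variants) is a Bonferroni-style correction. The sketch below reproduces that arithmetic and checks the three reported P values against it; the function name is a hypothetical helper, and only the numbers are taken from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


# Values quoted in the abstract: alpha = 0.05, 196 779 variants tested.
threshold = bonferroni_threshold(0.05, 196_779)
print(f"{threshold:.2e}")

# P values reported for the three replicated loci.
reported_p = {
    "rs17410035": 8.3e-08,
    "rs114089985": 4.8e-10,
    "rs2006123": 4.7e-09,
}
significant = {snp: p < threshold for snp, p in reported_p.items()}
print(significant)
```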

  15. Accuracy of genomic predictions for feed efficiency traits of beef cattle using 50K and imputed HD genotypes.

    PubMed

    Lu, D; Akanno, E C; Crowley, J J; Schenkel, F; Li, H; De Pauw, M; Moore, S S; Wang, Z; Li, C; Stothard, P; Plastow, G; Miller, S P; Basarab, J A

    2016-04-01

    The accuracy of genomic predictions can be used to assess the utility of dense marker genotypes for genetic improvement of beef efficiency traits. This study was designed to test the impact of genomic distance between training and validation populations, training population size, statistical methods, and density of genetic markers on prediction accuracy for feed efficiency traits in multibreed and crossbred beef cattle. A total of 6,794 beef cattle data collated from various projects and research herds across Canada were used. Illumina BovineSNP50 (50K) and imputed Axiom Genome-Wide BOS 1 Array (HD) genotypes were available for all animals. The traits studied were DMI, ADG, and residual feed intake (RFI). Four validation groups of 150 animals each, including Angus (AN), Charolais (CH), Angus-Hereford crosses (ANHH), and a Charolais-based composite (TX) were created by considering the genomic distance between pairs of individuals in the validation groups. Each validation group had 7 corresponding training groups of increasing sizes (n = 1,000, 1,999, 2,999, 3,999, 4,999, 5,998, and 6,644), which also represent increasing average genomic distance between pairs of individuals in the training and validation groups. Prediction of genomic estimated breeding values (GEBV) was performed using genomic best linear unbiased prediction (GBLUP) and Bayesian method C (BayesC). The accuracy of genomic predictions was defined as the Pearson's correlation between adjusted phenotype and GEBV (r), unless otherwise stated. Using 50K genotypes, the highest average r achieved in purebreds (AN, CH) was 0.41 for DMI, 0.34 for ADG, and 0.35 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.21 for ADG, and 0.25 for RFI. Similarly, when imputed HD genotypes were applied in purebreds (AN, CH), the highest average r was 0.14 for DMI, 0.15 for ADG, and 0.14 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.22 for ADG, and 0.24 for RFI. The r of GBLUP predictions were greatly reduced with increasing genomic average distance compared to those from BayesC predictions. The results indicate that 50K genotypes, used with BayesC, are more effective for predicting GEBV in purebred cattle. Imputed HD genotypes found utility when dealing with composites and crossbreds. Formulation of a fairly large training set for genomic predictions in beef cattle should consider the genomic distance between the training and target populations. PMID:27135994
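
    This abstract defines prediction accuracy as the Pearson correlation between adjusted phenotype and GEBV. A minimal sketch of that calculation follows; the input values are invented for illustration, and `pearson_r` is a hypothetical helper written in plain Python rather than anything used by the authors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical adjusted phenotypes and GEBVs for five animals.
adjusted_phenotype = [1.2, 0.4, -0.3, 0.9, -1.1]
gebv = [0.8, 0.1, 0.2, 0.5, -0.6]

accuracy = pearson_r(adjusted_phenotype, gebv)
print(round(accuracy, 3))
```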

  16. Co-regulated Transcripts Associated to Cooperating eSNPs Define Bi-fan Motifs in Human Gene Networks

    PubMed Central

    Kreimer, Anat; Pe'er, Itsik

    2014-01-01

    Associations between the level of single transcripts and single corresponding genetic variants, expression single nucleotide polymorphisms (eSNPs), have been extensively studied and reported. However, most expression traits are complex, involving the cooperative action of multiple SNPs at different loci affecting multiple genes. Finding these cooperating eSNPs by exhaustive search has proven to be statistically challenging. In this paper we utilized availability of sequencing data with transcriptional profiles in the same cohorts to identify two kinds of usual suspects: eSNPs that alter coding sequences or eSNPs within the span of transcription factors (TFs). We utilize a computational framework for considering triplets, each comprised of a SNP and two associated genes. We examine pairs of triplets with such cooperating source eSNPs that are both associated with the same pair of target genes. We characterize such quartets through their genomic, topological and functional properties. We establish that this regulatory structure of cooperating quartets is frequent in real data, but is rarely observed in permutations. eSNP sources are mostly located on different chromosomes and away from their targets. In the majority of quartets, SNPs affect the expression of the two gene targets independently of one another, suggesting a mutually independent rather than a directionally dependent effect. Furthermore, the directions in which the minor allele count of the SNP affects gene expression within quartets are consistent, so that the two source eSNPs either both have the same effect on the target genes or both affect one gene in the opposite direction to the other. Same-effect eSNPs are observed more often than expected by chance. Cooperating quartets reported here in a human system might correspond to bi-fans, a known network motif of four nodes previously described in model organisms. 
    Overall, our analysis offers insights regarding the fine motif structure of human regulatory networks. PMID:25210734

  17. Extended active optical lattice filters: filter synthesis.

    PubMed

    Dabkowski, Mieczyslaw; El Nagdi, Amr; Hunt, Louis R; Liu, Ke; Macfarlane, Duncan L; Ramakrishna, Viswanath

    2010-04-01

    In this paper, we study the synthesis of asymptotically stable filters from a unit cell of a two-dimensional tunable lattice filter architecture consisting of four four-port couplers and four waveguides containing semiconductor optical amplifiers. Upper bounds on the number of gains that will produce a filter with a priori prescribed poles, for a specific system, are obtained. We also provide sufficient conditions on the reflection-type coefficients, characterizing each four-port coupler, which ensure that real-valued gains, taking values in [0,1], exist so that the filter is asymptotically stable. Finally, we motivate the notion of a transmission zero of a filter and discuss the possibility of simultaneously placing both poles and transmission zeros for the unit cell. PMID:20360832

  18. HEPA Filter Vulnerability Assessment

    SciTech Connect

    GUSTAVSON, R.D.

    2000-05-11

    This assessment of High Efficiency Particulate Air (HEPA) filter vulnerability was requested by the USDOE Office of River Protection (ORP) to satisfy a DOE-HQ directive to evaluate the effect of filter degradation on the facility authorization basis assumptions. Within the scope of this assessment are ventilation system HEPA filters that are classified as Safety-Class (SC) or Safety-Significant (SS) components that perform an accident mitigation function. The objective of the assessment is to verify whether HEPA filters that perform a safety function during an accident are likely to perform as intended to limit release of hazardous or radioactive materials, considering factors that could degrade the filters. Filter degradation factors considered include aging, wetting of filters, exposure to high temperature, exposure to corrosive or reactive chemicals, and exposure to radiation. Screening and evaluation criteria were developed by a site-wide group of HVAC engineers and HEPA filter experts from published empirical data. For River Protection Project (RPP) filters, the only degradation factor that exceeded the screening threshold was filter aging. Subsequent evaluation of the effect of filter aging on the filter strength was conducted, and the results were compared with required performance to meet the conditions assumed in the RPP Authorization Basis (AB). It was found that the reduction in filter strength due to aging does not affect the filter performance requirements as specified in the AB. A portion of the HEPA filter vulnerability assessment is being conducted by the ORP and is not part of the scope of this study. The ORP is conducting an assessment of the existing policies and programs relating to maintenance, testing, and change-out of HEPA filters used for SC/SS service. This document presents the results of a HEPA filter vulnerability assessment conducted for the River Protection Project as requested by the DOE Office of River Protection.

  19. Cordierite silicon nitride filters

    SciTech Connect

    Sawyer, J.; Buchan, B.; Duiven, R.; Berger, M.; Cleveland, J.; Ferri, J.

    1992-02-01

    The objective of this project was to develop a silicon nitride based crossflow filter. This report summarizes the findings and results of the project. The project was phased with Phase I consisting of filter material development and crossflow filter design. Phase II involved filter manufacturing, filter testing under simulated conditions and reporting the results. In Phase I, Cordierite Silicon Nitride (CSN) was developed and tested for permeability and strength. Target values for each of these parameters were established early in the program. The values were met by the material development effort in Phase I. The crossflow filter design effort proceeded by developing a macroscopic design based on required surface area and estimated stresses. Then the thermal and pressure stresses were estimated using finite element analysis. In Phase II of this program, the filter manufacturing technique was developed, and the manufactured filters were tested. The technique developed involved press-bonding extruded tiles to form a filter, producing a monolithic filter after sintering. Filters manufactured using this technique were tested at Acurex and at the Westinghouse Science and Technology Center. The filters did not delaminate during testing and operated with high collection efficiency and good cleanability. Further development in areas of sintering and filter design is recommended.

  20. A set of EST-SNPs for map saturation and cultivar identification in melon

    PubMed Central

    Deleu, Wim; Esteras, Cristina; Roig, Cristina; González-To, Mireia; Fernández-Silva, Iria; Gonzalez-Ibeas, Daniel; Blanca, José; Aranda, Miguel A; Arús, Pere; Nuez, Fernando; Monforte, Antonio J; Picó, Maria Belén; Garcia-Mas, Jordi

    2009-01-01

    Background There are few genomic tools available in melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed mainly using marker types such as simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and amplified fragment length polymorphisms (AFLP) in different mapping populations. There is a growing need for saturating the genetic map with single nucleotide polymorphisms (SNP), more amenable for high throughput analysis, especially if these markers are located in gene coding regions, to provide functional markers. Expressed sequence tags (ESTs) from melon are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs. Results EST-based SNPs were discovered after resequencing ESTs between the parental lines of the PI 161375 (SC) × 'Piel de sapo' (PS) genetic map or using in silico SNP information from EST databases. In total 200 EST-based SNPs were mapped in the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the genetic diversity of the species. SNP analysis correctly reflected the genetic relationships compared with other marker systems, being able to distinguish all the accessions and cultivars. Conclusion This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 × 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. 
    Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon accessions as efficiently as with SSR markers, and these markers may also be useful for cultivar identification in Occidental melon varieties. PMID:19604363

  1. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references

    PubMed Central

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-01-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping of disease association signals from case–control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5–98.8% obtained with the original HIBAG references to 95.2–99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets. PMID:25707395
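
    To illustrate what a call threshold does in practice, the sketch below keeps only imputed HLA calls whose posterior probability reaches the threshold and compares the two CT values mentioned in the abstract. The records, the `prob` field, and the inclusive comparison are assumptions made for illustration; this is plain Python and does not reproduce the HIBAG R interface.

```python
def apply_call_threshold(calls, threshold):
    """Keep calls whose posterior probability is at least `threshold`.

    Whether the boundary is inclusive is an assumption of this sketch.
    """
    return [call for call in calls if call["prob"] >= threshold]


# Hypothetical imputed calls (sample IDs and probabilities are invented).
calls = [
    {"sample": "s1", "allele": "HLA-DRB1*15:01", "prob": 0.93},
    {"sample": "s2", "allele": "HLA-DQB1*06:02", "prob": 0.45},
    {"sample": "s3", "allele": "HLA-DRB1*15:01", "prob": 0.38},
]

kept_ct_04 = apply_call_threshold(calls, 0.4)
kept_ct_05 = apply_call_threshold(calls, 0.5)
print(len(kept_ct_04), len(kept_ct_05))
```

    Lowering the threshold retains more calls (higher call rate) at the cost of admitting less certain ones, which is the trade-off behind choosing 0.4 rather than 0.5.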

  2. Novel Backup Filter Device for Candle Filters

    SciTech Connect

    Bishop, B.; Goldsmith, R.; Dunham, G.; Henderson, A.

    2002-09-18

    The currently preferred means of particulate removal from process or combustion gas generated by advanced coal-based power production processes is filtration with candle filters. However, candle filters have not shown the requisite reliability to be commercially viable for hot gas clean up for either integrated gasifier combined cycle (IGCC) or pressurized fluid bed combustion (PFBC) processes. Even a single candle failure can lead to unacceptable ash breakthrough, which can result in (a) damage to highly sensitive and expensive downstream equipment, (b) unacceptably low system on-stream factor, and (c) unplanned outages. The U.S. Department of Energy (DOE) has recognized the need to have fail-safe devices installed within or downstream from candle filters. In addition to CeraMem, DOE has contracted with Siemens-Westinghouse, the Energy & Environmental Research Center (EERC) at the University of North Dakota, and the Southern Research Institute (SRI) to develop novel fail-safe devices. Siemens-Westinghouse is evaluating honeycomb-based filter devices on the clean-side of the candle filter that can operate up to 870 C. The EERC is developing a highly porous ceramic disk with a sticky yet temperature-stable coating that will trap dust in the event of filter failure. SRI is developing the Full-Flow Mechanical Safeguard Device that provides a positive seal for the candle filter. Operation of the SRI device is triggered by the higher-than-normal gas flow from a broken candle. The CeraMem approach is similar to that of Siemens-Westinghouse and involves the development of honeycomb-based filters that operate on the clean-side of a candle filter. The overall objective of this project is to fabricate and test silicon carbide-based honeycomb failsafe filters for protection of downstream equipment in advanced coal conversion processes. 
    The fail-safe filter, installed directly downstream of a candle filter, should have the capability for stopping essentially all particulate bypassing a broken or leaking candle while having a low enough pressure drop to allow the candle to be backpulse-regenerated. Forward-flow pressure drop should increase by no more than 20% because of incorporation of the fail-safe filter.

  3. HEPA filter monitoring program

    NASA Astrophysics Data System (ADS)

    Kirchner, K. N.; Johnson, C. M.; Aiken, W. F.; Lucerna, J. J.; Barnett, R. L.; Jensen, R. T.

    1986-07-01

    The testing and replacement of HEPA filters, widely used in the nuclear industry to purify process air, are costly and labor-intensive. Current methods of testing filter performance, such as differential pressure measurement and scanning air monitoring, allow determination of overall filter performance but preclude detection of incipient filter failure such as small holes in the filters. Using current technology, a continual in-situ monitoring system was designed which provides three major improvements over current methods of filter testing and replacement. The improvements include: cost savings by reducing the number of intact filters which are currently being replaced unnecessarily; more accurate and quantitative measurement of filter performance; and reduced personnel exposure to a radioactive environment by automatically performing most testing operations.

  4. Backward multiple imputation estimation of the conditional lifetime expectancy function with application to censored human longevity data.

    PubMed

    Kong, Jing; Klein, Barbara E K; Klein, Ronald; Wahba, Grace

    2015-09-29

    The conditional lifetime expectancy function (LEF) is the expected lifetime of a subject given survival past a certain time point and the values of a set of explanatory variables. This function is attractive to researchers because it summarizes the entire residual life distribution and has an easy interpretation compared with the popularly used hazard function. In this paper, we propose a general framework of backward multiple imputation for estimating the conditional LEF and the variance of the estimator in the right-censoring setting. Simulation studies are conducted to investigate the empirical properties of the proposed estimator and the corresponding variance estimator. We demonstrate the method on the Beaver Dam Eye Study data, where the expected human lifetime is modeled with smoothing-spline ANOVA given the covariates information including sex, lifestyle factors, and disease variables. PMID:26371300

  5. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    PubMed Central

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. 
    Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10^−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  6. Genome of The Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels.

    PubMed

    van Leeuwen, Elisabeth M; Karssen, Lennart C; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J; Huffman, Jennifer E; White, Charles C; Feitosa, Mary F; Bartz, Traci M; Manichaikul, Ani; Joshi, Peter K; Peloso, Gina M; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J; Milaneschi, Yuri; Penninx, Brenda W J H; Francioli, Laurent C; Menelaou, Androniki; Pulit, Sara L; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A; Franco, Oscar H; Mateo Leach, Irene; Beekman, Marian; de Craen, Anton J M; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J; Porteous, David J; Sattar, Naveed; Packard, Chris J; Buckley, Brendan M; Brody, Jennifer A; Bis, Joshua C; Rotter, Jerome I; Mychaleckyj, Josyf C; Campbell, Harry; Duan, Qing; Lange, Leslie A; Wilson, James F; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F; Rich, Stephen S; Psaty, Bruce M; Borecki, Ingrid B; Kearney, Patricia M; Stott, David J; Adrienne Cupples, L; Jukema, J Wouter; van der Harst, Pim; Sijbrands, Eric J; Hottenga, Jouke-Jan; Uitterlinden, Andre G; Swertz, Morris A; van Ommen, Gert-Jan B; de Bakker, Paul I W; Eline Slagboom, P; Boomsma, Dorret I; Wijmenga, Cisca; van Duijn, Cornelia M

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of The Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10(-4)), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  7. Backward multiple imputation estimation of the conditional lifetime expectancy function with application to censored human longevity data

    PubMed Central

    Kong, Jing; Klein, Barbara E. K.; Klein, Ronald; Wahba, Grace

    2015-01-01

    The conditional lifetime expectancy function (LEF) is the expected lifetime of a subject given survival past a certain time point and the values of a set of explanatory variables. This function is attractive to researchers because it summarizes the entire residual life distribution and has an easy interpretation compared with the popularly used hazard function. In this paper, we propose a general framework of backward multiple imputation for estimating the conditional LEF and the variance of the estimator in the right-censoring setting. Simulation studies are conducted to investigate the empirical properties of the proposed estimator and the corresponding variance estimator. We demonstrate the method on the Beaver Dam Eye Study data, where the expected human lifetime is modeled with smoothing-spline ANOVA given the covariates information including sex, lifestyle factors, and disease variables. PMID:26371300

  8. MST Filterability Tests

    SciTech Connect

    Poirier, M. R.; Burket, P. R.; Duignan, M. R.

    2015-03-12

    The Savannah River Site (SRS) is currently treating radioactive liquid waste with the Actinide Removal Process (ARP) and the Modular Caustic Side Solvent Extraction Unit (MCU). The low filter flux through the ARP has limited the rate at which radioactive liquid waste can be treated. Recent filter flux has averaged approximately 5 gallons per minute (gpm). Salt Batch 6 has had a lower processing rate and required frequent filter cleaning. Savannah River Remediation (SRR) has a desire to understand the causes of the low filter flux and to increase ARP/MCU throughput. In addition, at the time the testing started, SRR was assessing the impact of replacing the 0.1 micron filter with a 0.5 micron filter. This report describes testing of MST filterability to investigate the impact of filter pore size and MST particle size on filter flux and testing of filter enhancers to attempt to increase filter flux. The authors constructed a laboratory-scale crossflow filter apparatus with two crossflow filters operating in parallel. One filter was a 0.1 micron Mott sintered SS filter and the other was a 0.5 micron Mott sintered SS filter. The authors also constructed a dead-end filtration apparatus to conduct screening tests with potential filter aids and body feeds, referred to as filter enhancers. The original baseline for ARP was 5.6 M sodium salt solution with a free hydroxide concentration of approximately 1.7 M. ARP has been operating with a sodium concentration of approximately 6.4 M and a free hydroxide concentration of approximately 2.5 M. SRNL conducted tests varying the concentration of sodium and free hydroxide to determine whether those changes had a significant effect on filter flux. The feed slurries for the MST filterability tests were composed of simple salts (NaOH, NaNO2, and NaNO3) and MST (0.2 – 4.8 g/L). The feed slurry for the filter enhancer tests contained simulated salt batch 6 supernate, MST, and filter enhancers.

  9. Survey of digital filtering

    NASA Technical Reports Server (NTRS)

    Nagle, H. T., Jr.

    1972-01-01

    A three part survey is made of the state-of-the-art in digital filtering. Part one presents background material including sampled data transformations and the discrete Fourier transform. Part two, digital filter theory, gives an in-depth coverage of filter categories, transfer function synthesis, quantization and other nonlinear errors, filter structures and computer aided design. Part three presents hardware mechanization techniques. Implementations by general purpose, mini-, and special-purpose computers are presented.

  10. An active filter primer

    NASA Astrophysics Data System (ADS)

    Delagrange, A. D.

    1983-02-01

    In the past few years active filters have become very popular. This report explains why, and explains what active filters can (and can't) do. It gives the basics of active filter design, both theory and practice. It can be used as a handbook to build working active filters of the most common types. This report is an update of the original issued in 1979.

  11. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

    PubMed Central

    van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimäki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

    2015-01-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  12. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer

    PubMed Central

    Al-Tassan, Nada A.; Whiffin, Nicola; Hosking, Fay J.; Palles, Claire; Farrington, Susan M.; Dobbins, Sara E.; Harris, Rebecca; Gorman, Maggie; Tenesa, Albert; Meyer, Brian F.; Wakil, Salma M.; Kinnersley, Ben; Campbell, Harry; Martin, Lynn; Smith, Christopher G.; Idziaszczyk, Shelley; Barclay, Ella; Maughan, Timothy S.; Kaplan, Richard; Kerr, Rachel; Kerr, David; Buchannan, Daniel D.; Ko Win, Aung; Hopper, John; Jenkins, Mark; Lindor, Noralane M.; Newcomb, Polly A.; Gallinger, Steve; Conti, David; Schumacher, Fred; Casey, Graham; Dunlop, Malcolm G.; Tomlinson, Ian P.; Cheadle, Jeremy P.; Houlston, Richard S.

    2015-01-01

    Genome-wide association studies (GWAS) of colorectal cancer (CRC) have identified 23 susceptibility loci thus far. Analyses of previously conducted GWAS indicate additional risk loci are yet to be discovered. To identify novel CRC susceptibility loci, we conducted a new GWAS and performed a meta-analysis with five published GWAS (totalling 7,577 cases and 9,979 controls of European ancestry), imputing genotypes utilising the 1000 Genomes Project. The combined analysis identified new, significant associations with CRC at 1p36.2 marked by rs72647484 (minor allele frequency [MAF] = 0.09) near CDC42 and WNT4 (P = 1.21 × 10(-8), odds ratio [OR] = 1.21) and at 16q24.1 marked by rs16941835 (MAF = 0.21, P = 5.06 × 10(-8); OR = 1.15) within the long non-coding RNA (lncRNA) RP11-58A18.1 and ~500 kb from the nearest coding gene FOXL1. Additionally we identified a promising association at 10p13 with rs10904849 intronic to CUBN (MAF = 0.32, P = 7.01 × 10(-8); OR = 1.14). These findings provide further insights into the genetic and biological basis of inherited genetic susceptibility to CRC. Additionally, our analysis further demonstrates that imputation can be used to exploit GWAS data to identify novel disease-causing variants. PMID:25990418

  13. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    NASA Astrophysics Data System (ADS)

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out and in the second, the results of the ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
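
    The relative RMSE and bias figures quoted in this abstract can be illustrated with a minimal sketch. It assumes that both quantities are expressed as a percentage of the observed mean, which is a common convention but is not stated in the abstract. The plot volumes and the function names relative_rmse and relative_bias are hypothetical and are not taken from the study.

```python
import math

def relative_rmse(observed, predicted):
    """RMSE as a percentage of the observed mean (assumed normalization)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_obs

def relative_bias(observed, predicted):
    """Mean error as a percentage of the observed mean; negative means underestimation."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    mean_err = sum(p - o for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * mean_err / mean_obs

# Hypothetical plot-level mean volumes (m3/ha), for illustration only.
observed_volume = [100.0, 200.0]
predicted_volume = [90.0, 180.0]

rmse_pct = relative_rmse(observed_volume, predicted_volume)
bias_pct = relative_bias(observed_volume, predicted_volume)
print(f"relative RMSE: {rmse_pct:.1f}%  relative bias: {bias_pct:.1f}%")
```

    A negative relative bias in this sketch corresponds to the kind of underestimation the abstract reports for the ITD-trained ABA.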

  14. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    PubMed

    Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169

  15. Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The dissection of complex traits of economic importance for the pig industry requires the availability of a significant number of genetic markers, such as SNPs. This study was conducted in order to discover thousands of porcine SNPs using next generation sequencing technologies and use those SNPs, a...

  16. Filter service system

    DOEpatents

    Sellers, Cheryl L. (Peoria, IL); Nordyke, Daniel S. (Arlington Heights, IL); Crandell, Richard A. (Morton, IL); Tomlins, Gregory (Peoria, IL); Fei, Dong (Peoria, IL); Panov, Alexander (Dunlap, IL); Lane, William H. (Chillicothe, IL); Habeger, Craig F. (Chillicothe, IL)

    2008-12-09

    According to an exemplary embodiment of the present disclosure, a system for removing matter from a filtering device includes a gas pressurization assembly. An element of the assembly is removably attachable to a first orifice of the filtering device. The system also includes a vacuum source fluidly connected to a second orifice of the filtering device.

  17. Practical Active Capacitor Filter

    NASA Technical Reports Server (NTRS)

    Shuler, Robert L., Jr. (Inventor)

    2005-01-01

    A method and apparatus is described that filters an electrical signal. The filtering uses a capacitor multiplier circuit where the capacitor multiplier circuit uses at least one amplifier circuit and at least one capacitor. A filtered electrical signal results from a direct connection from an output of the at least one amplifier circuit.

  18. Nonlinear Attitude Filtering Methods

    NASA Technical Reports Server (NTRS)

    Markley, F. Landis; Crassidis, John L.; Cheng, Yang

    2005-01-01

    This paper provides a survey of modern nonlinear filtering methods for attitude estimation. Early applications relied mostly on the extended Kalman filter for attitude estimation. Since these applications, several new approaches have been developed that have proven to be superior to the extended Kalman filter. Several of these approaches maintain the basic structure of the extended Kalman filter, but employ various modifications in order to provide better convergence or improve other performance characteristics. Examples of such approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters that propagate and update a discrete set of sigma points rather than using linearized equations for the mean and covariance are also reviewed. A two-step approach is discussed with a first-step state that linearizes the measurement model and an iterative second step to recover the desired attitude states. These approaches are all based on the Gaussian assumption that the probability density function is adequately specified by its mean and covariance. Other approaches that do not require this assumption are reviewed, including particle filters and a Bayesian filter based on a non-Gaussian, finite-parameter probability density function on SO(3). Finally, the predictive filter, nonlinear observers and adaptive approaches are shown. The strengths and weaknesses of the various approaches are discussed.

  19. HEPA filter encapsulation

    DOEpatents

    Gates-Anderson, Dianne D.; Kidd, Scott D.; Bowers, John S.; Attebery, Ronald W.

    2003-01-01

    A low viscosity resin is delivered into a spent HEPA filter or other waste. The resin is introduced into the filter or other waste using a vacuum to assist in the mass transfer of the resin through the filter media or other waste.

  20. Rethinking Stability of Silver Sulfide Nanoparticles (Ag2S-NPs) in the Aquatic Environment: Photoinduced Transformation of Ag2S-NPs in the Presence of Fe(III).

    PubMed

    Li, Lingxiangyu; Wang, Yawei; Liu, Qian; Jiang, Guibin

    2016-01-01

    The stability of engineered nanomaterials in a natural aquatic environment has drawn much attention over the past few years. Silver sulfide nanoparticles (Ag2S-NPs) are generally assumed to be stable in a natural environment as a result of their physicochemical property; however, it may vary depending upon environmental conditions. Here, we investigated whether and how the environmentally relevant factors including light irradiation, solution pH, inorganic salts, dissolved organic matter (DOM), and dissolved oxygen (DO) individually and in combination influenced the stability of Ag2S-NPs in an aquatic environment. We presented for the first time that transformation of Ag2S-NPs can indeed occur in the aqueous system with an environmentally relevant concentration of Fe(3+) under simulated solar irradiation and natural sunlight within a short time (96 h), along with significant changes in morphology and dissolution. The photoinduced transformation of Ag2S-NPs in the presence of Fe(3+) can be dramatically influenced by solution pH, Ca(2+)/Na(+), Cl(-)/SO4(2-), DOM, and DO. Moreover, Ag2S-NP dissolution increased within 28 h, followed by a rapid decline in the next 68 h, which may be a result of the reconstitution of small Ag2S-NPs. Taken together, this work is of importance to comprehensively evaluate the stability of Ag2S-NPs in an aquatic environment, improving our understanding of their potential risks to human and environmental health. PMID:26606372

  1. Predicting functional regulatory SNPs in the human antimicrobial peptide genes DEFB1 and CAMP in tuberculosis and HIV/AIDS.

    PubMed

    Flores Saiffe Farías, Adolfo; Jaime Herrera López, Enrique; Moreno Vázquez, Cristopher Jorge; Li, Wentian; Prado Montes de Oca, Ernesto

    2015-12-01

    Single nucleotide polymorphisms (SNPs) in transcription factor binding sites (TFBSs) within gene promoter region or enhancers can modify the transcription rate of genes related to complex diseases. These SNPs can be called regulatory SNPs (rSNPs). Data compiled from recent projects, such as the 1000 Genomes Project and ENCODE, has revealed essential information used to perform in silico prediction of the molecular and biological repercussions of SNPs within TFBS. However, most of these studies are very limited, as they only analyze SNPs in coding regions or when applied to promoters, and do not integrate essential biological data like TFBSs, expression profiles, pathway analysis, homotypic redundancy (number of TFBSs for the same TF in a region), chromatin accessibility and others, which could lead to a more accurate prediction. Our aim was to integrate different data in a biologically coherent method to analyze the proximal promoter regions of two antimicrobial peptide genes, DEFB1 and CAMP, that are associated with tuberculosis (TB) and HIV/AIDS. We predicted SNPs within the promoter regions that are more likely to interact with transcription factors (TFs). We also assessed the impact of homotypic redundancy using a novel approach called the homotypic redundancy weight factor (HWF). Our results identified 10 SNPs, which putatively modify the binding affinity of 24 TFs previously identified as related to TB and HIV/AIDS expression profiles (e.g. KLF5, CEBPA and NFKB1 for TB; FOXP2, BRCA1, CEBPB, CREB1, EBF1 and ZNF354C for HIV/AIDS; and RUNX2, HIF1A, JUN/AP-1, NR4A2, EGR1 for both diseases). Validating with the OregAnno database and cell-specific functional/non functional SNPs from additional 13 genes, our algorithm achieved 53% sensitivity and 84.6% specificity to detect functional rSNPs using the DNAseI-HUP database. We are proposing our algorithm as a novel in silico method to detect true functional rSNPs in antimicrobial peptide genes. With further improvement, this novel method could be applied to other promoters in order to design probes and to discover new drug targets for complex diseases. PMID:26447748
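
    Sensitivity and specificity, the two validation measures quoted in this abstract, follow from the standard confusion-matrix definitions. The sketch below shows how they are computed. The counts are hypothetical and chosen only for illustration; the abstract does not report the underlying numbers of functional and non-functional SNPs, and the function names are not from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly functional SNPs that the predictor flags as functional."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases")
    return true_pos / total

def specificity(true_neg, false_pos):
    """Fraction of truly non-functional SNPs that the predictor leaves unflagged."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases")
    return true_neg / total

# Hypothetical validation counts, NOT taken from the study.
tp, fn = 6, 4   # functional SNPs: correctly flagged / missed
tn, fp = 9, 1   # non-functional SNPs: correctly ignored / wrongly flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

    A predictor with high specificity but moderate sensitivity, as reported in the abstract, rarely flags non-functional SNPs but misses a substantial share of the functional ones.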

  2. LincSNP: a database of linking disease-associated SNPs to human large intergenic non-coding RNAs

    PubMed Central

    2014-01-01

    Background Genome-wide association studies (GWAS) have successfully identified a large number of single nucleotide polymorphisms (SNPs) that are associated with a wide range of human diseases. However, many of these disease-associated SNPs are located in non-coding regions and have remained largely unexplained. Recent findings indicate that disease-associated SNPs in human large intergenic non-coding RNA (lincRNA) may lead to susceptibility to diseases through their effects on lincRNA expression. There is, therefore, a need to specifically record these SNPs and annotate them as potential candidates for disease. Description We have built LincSNP, an integrated database, to identify and annotate disease-associated SNPs in human lincRNAs. The current release of LincSNP contains approximately 140,000 disease-associated SNPs (or linkage disequilibrium SNPs), which can be mapped to around 5,000 human lincRNAs, together with their comprehensive functional annotations. The database also contains annotated, experimentally supported SNP-lincRNA-disease associations and disease-associated lincRNAs. It provides flexible search options for data extraction and searches can be performed by disease/phenotype name, SNP ID, lincRNA name and chromosome region. In addition, we provide users with a link to download all the data from LincSNP and have developed a web interface for the submission of novel identified SNP-lincRNA-disease associations. Conclusions The LincSNP database aims to integrate disease-associated SNPs and human lincRNAs, which will be an important resource for the investigation of the functions and mechanisms of lincRNAs in human disease. The database is available at http://bioinfo.hrbmu.edu.cn/LincSNP. PMID:24885522

  3. A real-time PCR genotyping assay to detect FAD2A SNPs in peanuts (Arachis hypogaea L.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The high oleic (C18:1) phenotype in peanuts has been previously demonstrated to result from a homozygous recessive genotype (ol1ol1ol2ol2) in two homeologous fatty acid desaturase genes (FAD2A and FAD2B) with two key SNPs. These mutant SNPs, specifically G448A in FAD2A and 442insA in FAD2B, signifi...

  4. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs

    PubMed Central

    Wu, XianMing; Hurst, Laurence D.

    2016-01-01

    Where in genes do pathogenic mutations tend to occur and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations tend to occur predominantly at exon ends, known also to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESE) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, five factors established as trends in ESE usage, are considered: relative position in exons, relative position in genes, flanking intron size, splice sites usage, and phase. We find that more than 76% of pathogenic SNPs are within 3–69 bp of exon ends where ESEs generally reside, this being 13% more than expected. Overall from enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20–45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they are found to occur preferentially in the terminal half of genes, in exons flanked by short introns and at the ends of phase (0,0) exons with 3′ non-“AGgt” splice site. We suggest the concept of the “fragile” exon, one home to pathogenic SNPs owing to its vulnerability to splice disruption owing to low ESE density. PMID:26545919

  5. Regenerative particulate filter development

    NASA Technical Reports Server (NTRS)

    Descamp, V. A.; Boex, M. W.; Hussey, M. W.; Larson, T. P.

    1972-01-01

    Development, design, and fabrication of a prototype filter regeneration unit for regenerating clean fluid particle filter elements by using a backflush/jet impingement technique are reported. Development tests were also conducted on a vortex particle separator designed for use in zero gravity environment. A maintainable filter was designed, fabricated and tested that allows filter element replacement without any leakage or spillage of system fluid. Also described are spacecraft fluid system design and filter maintenance techniques with respect to inflight maintenance for the space shuttle and space station.

  6. SNP mining in Crassostrea gigas EST data: transferability to four other Crassostrea species, phylogenetic inferences and outlier SNPs under selection.

    PubMed

    Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

    2014-01-01

    Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies. PMID:25238392
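
    The transferability percentages quoted in this abstract can be rechecked directly from the reported counts. The short sketch below recomputes them; the counts come from the abstract, while the variable and function names are illustrative.

```python
# Primer pairs that amplified in each species, out of 377 tested (from the abstract).
TOTAL_PRIMER_PAIRS = 377
amplified = {
    "C. sikamea": 311,
    "C. angulata": 353,
    "C. hongkongensis": 254,
    "C. ariakensis": 253,
}

def transfer_rate(count, total=TOTAL_PRIMER_PAIRS):
    """Return the amplification rate as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

rates = {species: transfer_rate(n) for species, n in amplified.items()}
for species, pct in rates.items():
    print(f"{species}: {pct}%")
```

    The recomputed values agree with the percentages given in the abstract (82.5%, 93.6%, 67.4% and 67.1%).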

  7. SNP Mining in Crassostrea gigas EST Data: Transferability to Four Other Crassostrea Species, Phylogenetic Inferences and Outlier SNPs under Selection

    PubMed Central

    Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

    2014-01-01

    Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies. PMID:25238392

  8. HLA-A SNPs and amino acid variants are associated with nasopharyngeal carcinoma in Malaysian Chinese.

    PubMed

    Chin, Yoon-Ming; Mushiroda, Taisei; Takahashi, Atsushi; Kubo, Michiaki; Krishnan, Gopala; Yap, Lee-Fah; Teo, Soo-Hwang; Lim, Paul Vey-Hong; Yap, Yoke-Yeow; Pua, Kin-Choo; Kamatani, Naoyuki; Nakamura, Yusuke; Sam, Choon-Kook; Khoo, Alan Soo-Beng; Ng, Ching-Ching

    2015-02-01

    Nasopharyngeal carcinoma (NPC) arises from the mucosal epithelium of the nasopharynx and is constantly associated with Epstein-Barr virus type 1 (EBV-1) infection. We carried out a genome-wide association study (GWAS) of 575,247 autosomal SNPs in 184 NPC patients and 236 healthy controls of Malaysian Chinese ethnicity. Potential association signals were replicated in a separate cohort of 260 NPC patients and 245 healthy controls. We confirmed the association of HLA-A to NPC with the strongest signal detected in rs3869062 (p = 1.73 × 10⁻⁹). HLA-A fine mapping revealed associations in the amino acid variants as well as its corresponding SNPs in the antigen peptide binding groove (p(HLA-A-aa-site-99) = 3.79 × 10⁻⁸, p(rs1136697) = 3.79 × 10⁻⁸) and T-cell receptor binding site (p(HLA-A-aa-site-145) = 1.41 × 10⁻⁴, p(rs1059520) = 1.41 × 10⁻⁴) of the HLA-A. We also detected strong association signals in the 5'-UTR region with predicted active promoter states (p(rs41545520) = 7.91 × 10⁻⁸). SNP rs41545520 is a potential binding site for repressor ATF3, with increased binding affinity for rs41545520-G correlated with reduced HLA-A expression. Multivariate logistic regression diminished the effects of HLA-A amino acid variants and SNPs, indicating a correlation with the effects of HLA-A*11:01, and to a lesser extent HLA-A*02:07. We report the strong genetic influence of HLA-A on NPC susceptibility in the Malaysian Chinese. PMID:24947555

  9. SNPs detected in the yak MC4R gene and their association with growth traits.

    PubMed

    Cai, X; Mipam, T D; Zhao, F F; Sun, L

    2015-07-01

    MC4R (melanocortin 4 receptor) is expressed in the appetite-regulating areas of the brain and takes part in leptin signaling pathways. Sequencing of the coding region of the MC4R gene for 354 yaks identified the following five single nucleotide polymorphisms (SNPs): SNP1 (273C>T), SNP2 (321G>T), SNP3 (864C>A), SNP4 (1069G>C) and SNP5 (1206G>C). SNP1, SNP2 and SNP3 were synonymous mutations, whereas SNP4 and SNP5 were missense mutations resulting in amino acid substitutions (V286L and R331S). Pairwise linkage disequilibrium (LD) analysis indicated that two pairs of SNPs, SNP2 and SNP5 (r² = 0.81027) and SNP4 and SNP5 (r² = 0.53816), exhibited higher degrees of LD. CC genotype of SNP4, CGACG and CTCCC haplotypes for all SNPs were associated with increased BW of animals that were 18 months old and with the average daily gain. The secondary structure and transmembrane region prediction of the yak MC4R protein suggested that SNP4 was correlated with influential changes in the seventh transmembrane domain of the MC4R protein and with the functional deterioration or even incapacitation of MC4R, which may contribute to the increased feed intake, BW and average daily gain of the yaks with CC genotypes. The data from this study suggested that 1069G>C SNP of the MC4R gene could be used in marker-assisted selection of growth traits in the Maiwa yak breed. PMID:25757688
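
    The pairwise LD values reported in this abstract are r² statistics. As a minimal sketch of how such a value is commonly computed for two biallelic loci, the function below takes haplotype and allele frequencies as inputs. The function name and the example frequencies are illustrative assumptions, not data or code from the yak study.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical frequencies: alleles A and B always travel together.
    print(ld_r2(p_ab=0.5, p_a=0.5, p_b=0.5))
    # Hypothetical frequencies: the two loci are independent.
    print(ld_r2(p_ab=0.25, p_a=0.5, p_b=0.5))
```

    An r² near 1 means one SNP almost fully predicts the other, while a value near 0 means the two loci are close to independent.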

  10. A joint association test for multiple SNPs in genetic case-control studies.

    PubMed

    Wang, Tao; Jacob, Howard; Ghosh, Soumitra; Wang, Xujing; Zeng, Zhao-Bang

    2009-02-01

    For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive in reducing the data complexity and increasing the statistical power. However, due to the unknown status of the underlying disease variant, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptible locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, by using a retrospective likelihood to adjust for the case-control sampling ascertainment and appropriately handle the Hardy-Weinberg equilibrium constraint, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. With the latent variable to describe a flexible role of the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing of multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods. PMID:18770519

  11. Association study of FOXO3A SNPs and aging phenotypes in Danish oldest-old individuals.

    PubMed

    Soerensen, Mette; Nygaard, Marianne; Dato, Serena; Stevnsner, Tinna; Bohr, Vilhelm A; Christensen, Kaare; Christiansen, Lene

    2015-02-01

    FOXO3A variation has repeatedly been reported to associate with human longevity, yet only a few studies have investigated whether FOXO3A variation also associates with aging-related traits. Here, we investigate the association of 15 FOXO3A tagging single nucleotide polymorphisms (SNPs) in 1088 oldest-old Danes (age 92-93) with 4 phenotypes known to predict their survival: cognitive function, hand grip strength, activity of daily living (ADL), and self-rated health. Based on previous studies in humans and foxo animal models, we also explore self-reported diabetes, cancer, cardiovascular disease, osteoporosis, and bone (femur/spine/hip/wrist) fracture. Gene-based testing revealed significant associations of FOXO3A variation with ADL (P = 0.044) and bone fracture (P = 0.006). The single-SNP statistics behind the gene-based analysis indicated increased ADL (decreased disability) and reduced bone fracture risk for carriers of the minor alleles of 8 and 10 SNPs, respectively. These positive directions of effects are in agreement with the positive effects on longevity previously reported for these SNPs. However, when correcting for the test of 9 phenotypes by Bonferroni correction, bone fracture showed borderline significance (P = 0.054), while ADL did not (P = 0.396). Although the single-SNP associations did not formally replicate in another study population of oldest-old Danes (n = 1279, age 94-100), the estimates were of similar direction of effect as observed in the Discovery sample. A pooled analysis of both study populations displayed similar or smaller P-values for most associations, thereby supporting the initial findings. Nevertheless, confirmation in additional study populations is needed. PMID:25470651
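
    The corrected P-values in this abstract are consistent with a plain Bonferroni adjustment, in which each raw P-value is multiplied by the number of tests (9 phenotypes) and capped at 1. The short sketch below reproduces that arithmetic. The function name is an illustrative assumption, and the authors' actual software is not stated in the abstract.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value: raw P times number of tests, capped at 1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Raw gene-based P-values quoted in the abstract, corrected for 9 phenotypes.
    for trait, p in [("ADL", 0.044), ("bone fracture", 0.006)]:
        print(f"{trait}: {bonferroni(p, 9):.3f}")
    # prints "ADL: 0.396" and "bone fracture: 0.054"
```

    Both results match the adjusted values given in the abstract, which is why bone fracture falls just above the 0.05 threshold after correction.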

  12. Differences in allele frequencies of autosomal dominant hypercholesterolemia SNPs in the Malaysian population.

    PubMed

    Alex, Livy; Chahil, Jagdish Kaur; Lye, Say Hean; Bagali, Pramod; Ler, Lian Wee

    2012-06-01

    Hypercholesterolemia is caused by different interactions of lifestyle and genetic determinants. At the genetic level, it can be attributed to the interactions of multiple polymorphisms, or as in the example of familial hypercholesterolemia (FH), it can be the result of a single mutation. A large number of genetic markers, mostly single nucleotide polymorphisms (SNP) or mutations in three genes, implicated in autosomal dominant hypercholesterolemia (ADH), viz APOB (apolipoprotein B), LDLR (low density lipoprotein receptor) and PCSK9 (proprotein convertase subtilisin/kexin type-9), have been identified and characterized. However, such studies have been insufficiently undertaken specifically in Malaysia and Southeast Asia in general. The main objective of this study was to identify ADH variants, specifically ADH-causing mutations and hypercholesterolemia-associated polymorphisms in the multiethnic Malaysian population. We aimed to evaluate published SNPs in ADH-causing genes in this population and to report any unusual trends. We examined a large number of selected SNPs from previous studies of APOB, LDLR, PCSK9 and other genes, in clinically diagnosed ADH patients (n=141) and healthy control subjects (n=111). Selection of SNPs was initiated by searching within genes reported to be associated with ADH from known databases. The important finding was 137 mono-allelic markers (44.1%) and 173 polymorphic markers (55.8%) in both subject groups. By comparing to publicly available data, out of the 137 mono-allelic markers, 23 markers showed significant differences in allele frequency among Malaysians, European Whites, Han Chinese, Yoruba and Gujarati Indians. Our data can serve as reference for others in related fields of study during the planning of their experiments. PMID:22534770

  13. Tag SNPs detect association of the CYP1B1 gene with primary open angle glaucoma

    PubMed Central

    Hewitt, Alex W.; Mackey, David A.; Mitchell, Paul; Craig, Jamie E.

    2010-01-01

    Purpose The cytochrome p450 family 1 subfamily B (CYP1B1) gene is a well known cause of autosomal recessive primary congenital glaucoma. It has also been postulated as a modifier of disease severity in primary open angle glaucoma (POAG), particularly in juvenile onset families. However, the role of common variation in the gene in relation to POAG has not been thoroughly explored. Methods Seven tag single nucleotide polymorphisms (SNPs), including two coding variants (L432V and N543S), were genotyped in 860 POAG cases and 898 examined normal controls. Each SNP and haplotype was assessed for association with disease. In addition, a subset of 396 severe cases and 452 elderly controls were analyzed separately. Results There was no association of any individual SNP in the full data set. Two SNPs (rs162562 and rs10916) were nominally associated under a dominant model in the severe cases (p<0.05). A common haplotype (AGCAGCC) was also found to be nominally associated in both the full data set (p=0.048, OR [95%CI]=0.83 [0.69–0.90]) and more significantly in the severe cases (p=0.004, OR [95%CI]=0.68 [0.52–0.89]) which survives correction for multiple testing. Conclusions Although no major effect of common variation at the CYP1B1 locus on POAG was found, there could be an effect of SNPs tagged by rs162562 and represented on the AGCAGCC haplotype. PMID:21139974

  14. Association between SNPs in genes involved in folate metabolism and preterm birth risk.

    PubMed

    Wang, B J; Liu, M J; Wang, Y; Dai, J R; Tao, J Y; Wang, S N; Zhong, N; Chen, Y

    2015-01-01

    We investigated the association between 12 single nucleotide polymorphisms (SNPs) in 11 genes involved in folate metabolism and preterm birth. A subset of SNPs selected from 11 genes/loci involved in the folic acid metabolism pathway were subjected to SNaPshot analysis in a case-control study. Twelve SNPs (CBS-C699T, DHFR-c594+59del19, GST01-C428T, MTHFD-G1958A, MTHFR-C677T, MTHFR-A1298C, MTR-A2756G, MTRR-A66G, NFE2L2-ins1+C11108T, RFC1-G80A, TCN2-C776G, and TYMS-1494del6) in 503 DNA samples were simultaneously tested, and included 315 preterm births and 188 controls. None of the 12 SNP genotype distributions related to the folic acid metabolism pathway showed a significant difference between preterm and term babies. The frequency of the compound mutation genotype of MTHFD-G1958A, MTR-A2756G and RFC1-G80A in preterm babies was 7.3%, which was significantly higher than the 2.7% in term babies. Seven babies carried the compound mutation genotype of MTHFD-G1958A, MTR-A2756G, and CBS-C699T, but this was not observed in term babies. The frequency of the combined wild-type genotype of MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A in preterm babies was 3.17%, which was significantly lower than the 7.4% in term babies. The 12 SNPs screened in this study were not independent risk factors of preterm birth. Compound mutation genotypes, including MTHFD-G1958A, MTR-A2756G, and RFC1-G80A and MTHFD-G1958A, MTR-A2756G, and CBS-C699T, may increase the risk of preterm birth. The combined wild-type genotype MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A may decrease the risk of preterm birth. PMID:25730024

  15. Compact planar microwave blocking filters

    NASA Technical Reports Server (NTRS)

    U-Yen, Kongpop (Inventor); Wollack, Edward J. (Inventor)

    2012-01-01

    A compact planar microwave blocking filter includes a dielectric substrate and a plurality of filter unit elements disposed on the substrate. The filter unit elements are interconnected in a symmetrical series cascade with filter unit elements being organized in the series based on physical size. In the filter, a first filter unit element of the plurality of filter unit elements includes a low impedance open-ended line configured to reduce the shunt capacitance of the filter.

  16. Filtering separators having filter cleaning apparatus

    SciTech Connect

    Margraf, A.

    1984-08-28

    This invention relates to filtering separators of the kind having a housing which is subdivided by a partition, provided with parallel rows of holes or slots, into a dust-laden gas space for receiving filter elements positioned in parallel rows and being impinged upon by dust-laden gas from the outside towards the inside, and a clean gas space. In addition, the housing is provided with a chamber for cleansing the filter element surfaces of a row by counterflow action while covering at the same time the partition holes or slots leading to the adjacent rows of filter elements. The chamber is arranged for the supply of compressed air to at least one injector arranged to feed compressed air and secondary air to the row of filter elements to be cleansed. The chamber is also reciprocatingly displaceable along the partition in periodic and intermittent manner. According to the invention, a surface of the chamber facing towards the partition covers at least two of the rows of holes or slots of the partition, and the chamber is closed upon itself with respect to the clean gas space, and is connected to a compressed air reservoir via a distributor pipe and a control valve. At least one of the rows of holes or slots of the partition and the respective row of filter elements in flow communication therewith are in flow communication with the discharge side of at least one injector acted upon with compressed air. At least one other row of the rows of holes or slots of the partition and the respective row of filter elements is in flow communication with the suction side of the injector.

  17. Development of a multiplex PCR system of 59 mitochondrial SNPs and genetic analysis in Chinese population.

    PubMed

    Nie, Yanchai; Zhang, Chen; Jiao, Haitao; Zhao, Ziqin; Zhou, Huaigu

    2014-07-01

    The analysis of SNPs located on the mitochondrial DNA can provide information on maternal genetics. In the present study, a set of 59 SNPs were detected simultaneously using three multiplex allele-specific PCR and subsequent CE. Allele-specific primers were designed with different sizes to allow for specifically amplified paired alleles in the same reaction. An allelic ladder based on reference alleles was also created to maintain high-quality analysis standard. Samples from 400 unrelated individuals (200 of Han population and 200 of Uyghur population, China) were successfully analyzed and assigned into 106 relevant haplotypes, resulting in a discrimination power of 98.5%. The haplotype diversity was 0.978 for Han and 0.972 for Uyghur, respectively. Pairwise comparison of haplotype frequency distributions showed significant difference across ethnicities. These results suggest that the 59-SNP PCR system is a reliable, rapid, and economical method for large-scale screening of mitochondrial DNA variation, adding a new aspect for forensic individual identification. PMID:24659556
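
    The abstract reports haplotype diversity per population but does not say which estimator was used. As a sketch under that caveat, the function below implements one widely used choice, Nei's unbiased estimator H = n/(n-1) × (1 - Σ pᵢ²), where pᵢ is the sample frequency of haplotype i and n is the sample size. The function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy sample (hypothetical labels, not data from the study).
    sample = ["H1", "H1", "H2", "H3", "H3", "H3"]
    print(round(haplotype_diversity(sample), 3))
```

    The estimator is 0 when every individual carries the same haplotype and approaches 1 when nearly every sampled haplotype is distinct.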

  18. The genetics of human infertility by functional interrogation of SNPs in mice.

    PubMed

    Singh, Priti; Schimenti, John C

    2015-08-18

    Infertility is a prevalent health issue, affecting ∼15% of couples of childbearing age. Nearly one-half of idiopathic infertility cases are thought to have a genetic basis, but the underlying causes are largely unknown. Traditional methods for studying inheritance, such as genome-wide association studies and linkage analyses, have been confounded by the genetic and phenotypic complexity of reproductive processes. Here we describe an association- and linkage-free approach to identify segregating infertility alleles, in which CRISPR/Cas9 genome editing is used to model putatively deleterious nonsynonymous SNPs (nsSNPs) in the mouse orthologs of fertility genes. Mice bearing "humanized" alleles of four essential meiosis genes, each predicted to be deleterious by most of the commonly used algorithms for analyzing functional SNP consequences, were examined for fertility and reproductive defects. Only a Cdk2 allele mimicking SNP rs3087335, which alters an inhibitory WEE1 protein kinase phosphorylation site, caused infertility and revealed a novel function in regulating spermatogonial stem cell maintenance. Our data indicate that segregating infertility alleles exist in human populations. Furthermore, whereas computational prediction of SNP effects is useful for identifying candidate causal mutations for diverse diseases, this study underscores the need for in vivo functional evaluation of physiological consequences. This approach can revolutionize personalized reproductive genetics by establishing a permanent reference of benign vs. infertile alleles. PMID:26240362

  19. The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs.

    PubMed

    Shriver, Mark D; Kennedy, Giulia C; Parra, Esteban J; Lawson, Heather A; Sonpar, Vibhor; Huang, Jing; Akey, Joshua M; Jones, Keith W

    2004-05-01

    Understanding the nature of evolutionary relationships among persons and populations is important for the efficient application of genome science to biomedical research. We have analysed 8,525 autosomal single nucleotide polymorphisms (SNPs) in 84 individuals from four populations: African-American, European-American, Chinese and Japanese. Individual relationships were reconstructed using the allele sharing distance and the neighbour-joining tree making method. Trees show clear clustering according to population, with the root branching from the African-American clade. The African-American cluster is much less star-like than European-American and East Asian clusters, primarily because of admixture. Furthermore, on the East Asian branch, all ten Chinese individuals cluster together and all ten Japanese individuals cluster together. Using positional information, we demonstrate strong correlations between inter-marker distance and both locus-specific FST (the proportion of total variation due to differentiation) levels and branch lengths. Chromosomal maps of the distribution of locus-specific branch lengths were constructed by combining these data with other published SNP markers (total of 33,704 SNPs). These maps clearly illustrate a non-uniform distribution of human genetic substructure, an instructional and useful paradigm for education and research. PMID:15588487
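
    The individual-level trees in this abstract are built from allele sharing distances. The abstract does not give the exact formula, so the sketch below assumes one common convention: genotypes are coded as counts (0, 1 or 2) of a chosen allele, the per-locus distance is the absolute difference in counts, and the result is averaged over loci and scaled to the range 0 to 1. The function name, the use of None for missing genotypes and the example genotypes are illustrative assumptions.

```python
from typing import Optional, Sequence


def allele_sharing_distance(
    g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]
) -> float:
    """Allele sharing distance between two individuals.

    Genotypes are counts (0, 1, 2) of one allele per biallelic locus;
    None marks a missing call, and such loci are skipped.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same loci")
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci with genotypes present in both individuals")
    # |a - b| is 0, 1 or 2 alleles not shared; divide by 2 to scale to [0, 1].
    return sum(abs(a - b) for a, b in pairs) / (2 * len(pairs))


if __name__ == "__main__":
    # Hypothetical genotypes at four loci; the last locus is missing in one person.
    print(allele_sharing_distance([0, 1, 2, None], [1, 1, 0, 2]))
```

    A matrix of such pairwise distances is the kind of input a neighbour-joining routine would take to build a tree of individuals.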

  20. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs

    PubMed Central

    2013-01-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  1. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.

  2. A simple method using Pyrosequencing™ to identify de novo SNPs in pooled DNA samples

    PubMed Central

    Lin, Yeong-Shin; Liu, Fu-Guo Robert; Wang, Tzi-Yuan; Pan, Cheng-Tsung; Chang, Wei-Ting; Li, Wen-Hsiung

    2011-01-01

    A practical way to reduce the cost of surveying single-nucleotide polymorphisms (SNPs) in a large number of individuals is to measure the allele frequencies in pooled DNA samples. Pyrosequencing™ has been frequently used for this application because signals generated by this approach are proportional to the amount of DNA template. The Pyrosequencing™ pyrogram is determined by the dispensing order of dNTPs, which is usually designed based on the known SNPs to avoid asynchronistic extensions of heterozygous sequences. Therefore, utilizing the pyrogram signals to identify de novo SNPs in DNA pools has never been undertaken. In this study, we developed an algorithm to address this issue. With the sequence and pyrogram of the wild-type allele known in advance, we could use the pyrogram obtained from the pooled DNA sample to predict the sequence of the unknown mutant allele (de novo SNP) and estimate its allele frequency. Both computational simulation and experimental Pyrosequencing™ test results suggested that our method performs well. The web interface of our method is available at http://life.nctu.edu.tw/∼yslin/PSM/. PMID:21131285

  3. The genetics of human infertility by functional interrogation of SNPs in mice

    PubMed Central

    Singh, Priti; Schimenti, John C.

    2015-01-01

    Infertility is a prevalent health issue, affecting ∼15% of couples of childbearing age. Nearly one-half of idiopathic infertility cases are thought to have a genetic basis, but the underlying causes are largely unknown. Traditional methods for studying inheritance, such as genome-wide association studies and linkage analyses, have been confounded by the genetic and phenotypic complexity of reproductive processes. Here we describe an association- and linkage-free approach to identify segregating infertility alleles, in which CRISPR/Cas9 genome editing is used to model putatively deleterious nonsynonymous SNPs (nsSNPs) in the mouse orthologs of fertility genes. Mice bearing “humanized” alleles of four essential meiosis genes, each predicted to be deleterious by most of the commonly used algorithms for analyzing functional SNP consequences, were examined for fertility and reproductive defects. Only a Cdk2 allele mimicking SNP rs3087335, which alters an inhibitory WEE1 protein kinase phosphorylation site, caused infertility and revealed a novel function in regulating spermatogonial stem cell maintenance. Our data indicate that segregating infertility alleles exist in human populations. Furthermore, whereas computational prediction of SNP effects is useful for identifying candidate causal mutations for diverse diseases, this study underscores the need for in vivo functional evaluation of physiological consequences. This approach can revolutionize personalized reproductive genetics by establishing a permanent reference of benign vs. infertile alleles. PMID:26240362

  4. Genetic Diversity and Demographic History of Cajanus spp. Illustrated from Genome-Wide SNPs

    PubMed Central

    Saxena, Rachit K.; von Wettberg, Eric; Upadhyaya, Hari D.; Sanchez, Vanessa; Songok, Serah; Saxena, Kulbhushan; Kimurto, Paul; Varshney, Rajeev K.

    2014-01-01

    Understanding genetic structure of Cajanus spp. is essential for achieving genetic improvement by quantitative trait loci (QTL) mapping or association studies and use of selected markers through genomic assisted breeding and genomic selection. After developing a comprehensive set of 1,616 single nucleotide polymorphisms (SNPs) and their conversion into cost effective KASPar assays for pigeonpea (Cajanus cajan), we studied levels of genetic variability both within and between a diverse set of Cajanus lines including 56 breeding lines, 21 landraces and 107 accessions from 18 wild species. These results revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, 75.8% of successful SNP assays revealed polymorphism, and more than 95% of these assays could be successfully transferred to related wild species. To show regional patterns of variation, we used STRUCTURE and Analysis of Molecular Variance (AMOVA) to partition variance among hierarchical sets of landraces and wild species at either the continental scale or within India. STRUCTURE separated most of the domesticated germplasm from wild ecotypes, and separated Australian and Asian wild species as has been found previously. Among Indian regions and states within regions, we found 36% of the variation between regions, and 64% within landraces or wilds within states. The highest level of polymorphism in wild relatives and landraces was found in Madhya Pradesh and Andhra Pradesh provinces of India representing the centre of origin and domestication of pigeonpea respectively. PMID:24533111

  5. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

    PubMed

    Lee, S Hong; Ripke, Stephan; Neale, Benjamin M; Faraone, Stephen V; Purcell, Shaun M; Perlis, Roy H; Mowry, Bryan J; Thapar, Anita; Goddard, Michael E; Witte, John S; Absher, Devin; Agartz, Ingrid; Akil, Huda; Amin, Farooq; Andreassen, Ole A; Anjorin, Adebayo; Anney, Richard; Anttila, Verneri; Arking, Dan E; Asherson, Philip; Azevedo, Maria H; Backlund, Lena; Badner, Judith A; Bailey, Anthony J; Banaschewski, Tobias; Barchas, Jack D; Barnes, Michael R; Barrett, Thomas B; Bass, Nicholas; Battaglia, Agatino; Bauer, Michael; Bayés, Mònica; Bellivier, Frank; Bergen, Sarah E; Berrettini, Wade; Betancur, Catalina; Bettecken, Thomas; Biederman, Joseph; Binder, Elisabeth B; Black, Donald W; Blackwood, Douglas H R; Bloss, Cinnamon S; Boehnke, Michael; Boomsma, Dorret I; Breen, Gerome; Breuer, René; Bruggeman, Richard; Cormican, Paul; Buccola, Nancy G; Buitelaar, Jan K; Bunney, William E; Buxbaum, Joseph D; Byerley, William F; Byrne, Enda M; Caesar, Sian; Cahn, Wiepke; Cantor, Rita M; Casas, Miguel; Chakravarti, Aravinda; Chambert, Kimberly; Choudhury, Khalid; Cichon, Sven; Cloninger, C Robert; Collier, David A; Cook, Edwin H; Coon, Hilary; Cormand, Bru; Corvin, Aiden; Coryell, William H; Craig, David W; Craig, Ian W; Crosbie, Jennifer; Cuccaro, Michael L; Curtis, David; Czamara, Darina; Datta, Susmita; Dawson, Geraldine; Day, Richard; De Geus, Eco J; Degenhardt, Franziska; Djurovic, Srdjan; Donohoe, Gary J; Doyle, Alysa E; Duan, Jubao; Dudbridge, Frank; Duketis, Eftichia; Ebstein, Richard P; Edenberg, Howard J; Elia, Josephine; Ennis, Sean; Etain, Bruno; Fanous, Ayman; Farmer, Anne E; Ferrier, I Nicol; Flickinger, Matthew; Fombonne, Eric; Foroud, Tatiana; Frank, Josef; Franke, Barbara; Fraser, Christine; Freedman, Robert; Freimer, Nelson B; Freitag, Christine M; Friedl, Marion; Frisén, Louise; Gallagher, Louise; Gejman, Pablo V; Georgieva, Lyudmila; Gershon, Elliot S; Geschwind, Daniel H; Giegling, Ina; Gill, Michael; Gordon, Scott D; Gordon-Smith, Katherine; Green, 
Elaine K; Greenwood, Tiffany A; Grice, Dorothy E; Gross, Magdalena; Grozeva, Detelina; Guan, Weihua; Gurling, Hugh; De Haan, Lieuwe; Haines, Jonathan L; Hakonarson, Hakon; Hallmayer, Joachim; Hamilton, Steven P; Hamshere, Marian L; Hansen, Thomas F; Hartmann, Annette M; Hautzinger, Martin; Heath, Andrew C; Henders, Anjali K; Herms, Stefan; Hickie, Ian B; Hipolito, Maria; Hoefels, Susanne; Holmans, Peter A; Holsboer, Florian; Hoogendijk, Witte J; Hottenga, Jouke-Jan; Hultman, Christina M; Hus, Vanessa; Ingason, Andrés; Ising, Marcus; Jamain, Stéphane; Jones, Edward G; Jones, Ian; Jones, Lisa; Tzeng, Jung-Ying; Kähler, Anna K; Kahn, René S; Kandaswamy, Radhika; Keller, Matthew C; Kennedy, James L; Kenny, Elaine; Kent, Lindsey; Kim, Yunjung; Kirov, George K; Klauck, Sabine M; Klei, Lambertus; Knowles, James A; Kohli, Martin A; Koller, Daniel L; Konte, Bettina; Korszun, Ania; Krabbendam, Lydia; Krasucki, Robert; Kuntsi, Jonna; Kwan, Phoenix; Landén, Mikael; Långström, Niklas; Lathrop, Mark; Lawrence, Jacob; Lawson, William B; Leboyer, Marion; Ledbetter, David H; Lee, Phil H; Lencz, Todd; Lesch, Klaus-Peter; Levinson, Douglas F; Lewis, Cathryn M; Li, Jun; Lichtenstein, Paul; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Liu, Chunyu; Lohoff, Falk W; Loo, Sandra K; Lord, Catherine; Lowe, Jennifer K; Lucae, Susanne; MacIntyre, Donald J; Madden, Pamela A F; Maestrini, Elena; Magnusson, Patrik K E; Mahon, Pamela B; Maier, Wolfgang; Malhotra, Anil K; Mane, Shrikant M; Martin, Christa L; Martin, Nicholas G; Mattheisen, Manuel; Matthews, Keith; Mattingsdal, Morten; McCarroll, Steven A; McGhee, Kevin A; McGough, James J; McGrath, Patrick J; McGuffin, Peter; McInnis, Melvin G; McIntosh, Andrew; McKinney, Rebecca; McLean, Alan W; McMahon, Francis J; McMahon, William M; McQuillin, Andrew; Medeiros, Helena; Medland, Sarah E; Meier, Sandra; Melle, Ingrid; Meng, Fan; Meyer, Jobst; Middeldorp, Christel M; Middleton, Lefkos; Milanova, Vihra; Miranda, Ana; Monaco, Anthony P; 
Montgomery, Grant W; Moran, Jennifer L; Moreno-De-Luca, Daniel; Morken, Gunnar; Morris, Derek W; Morrow, Eric M; Moskvina, Valentina; Muglia, Pierandrea; Mühleisen, Thomas W; Muir, Walter J; Müller-Myhsok, Bertram; Murtha, Michael; Myers, Richard M; Myin-Germeys, Inez; Neale, Michael C; Nelson, Stan F; Nievergelt, Caroline M; Nikolov, Ivan; Nimgaonkar, Vishwajit; Nolen, Willem A; Nöthen, Markus M; Nurnberger, John I; Nwulia, Evaristus A; Nyholt, Dale R; O'Dushlaine, Colm; Oades, Robert D; Olincy, Ann; Oliveira, Guiomar; Olsen, Line; Ophoff, Roel A; Osby, Urban; Owen, Michael J; Palotie, Aarno; Parr, Jeremy R; Paterson, Andrew D; Pato, Carlos N; Pato, Michele T; Penninx, Brenda W; Pergadia, Michele L; Pericak-Vance, Margaret A; Pickard, Benjamin S; Pimm, Jonathan; Piven, Joseph; Posthuma, Danielle; Potash, James B; Poustka, Fritz; Propping, Peter; Puri, Vinay; Quested, Digby J; Quinn, Emma M; Ramos-Quiroga, Josep Antoni; Rasmussen, Henrik B; Raychaudhuri, Soumya; Rehnström, Karola; Reif, Andreas; Ribasés, Marta; Rice, John P; Rietschel, Marcella; Roeder, Kathryn; Roeyers, Herbert; Rossin, Lizzy; Rothenberger, Aribert; Rouleau, Guy; Ruderfer, Douglas; Rujescu, Dan; Sanders, Alan R; Sanders, Stephan J; Santangelo, Susan L; Sergeant, Joseph A; Schachar, Russell; Schalling, Martin; Schatzberg, Alan F; Scheftner, William A; Schellenberg, Gerard D; Scherer, Stephen W; Schork, Nicholas J; Schulze, Thomas G; Schumacher, Johannes; Schwarz, Markus; Scolnick, Edward; Scott, Laura J; Shi, Jianxin; Shilling, Paul D; Shyn, Stanley I; Silverman, Jeremy M; Slager, Susan L; Smalley, Susan L; Smit, Johannes H; Smith, Erin N; Sonuga-Barke, Edmund J S; St Clair, David; State, Matthew; Steffens, Michael; Steinhausen, Hans-Christoph; Strauss, John S; Strohmaier, Jana; Stroup, T Scott; Sutcliffe, James S; Szatmari, Peter; Szelinger, Szabocls; Thirumalai, Srinivasa; Thompson, Robert C; Todorov, Alexandre A; Tozzi, Federica; Treutlein, Jens; Uhr, Manfred; van den Oord, Edwin J C G; Van 
Grootheest, Gerard; Van Os, Jim; Vicente, Astrid M; Vieland, Veronica J; Vincent, John B; Visscher, Peter M; Walsh, Christopher A; Wassink, Thomas H; Watson, Stanley J; Weissman, Myrna M; Werge, Thomas; Wienker, Thomas F; Wijsman, Ellen M; Willemsen, Gonneke; Williams, Nigel; Willsey, A Jeremy; Witt, Stephanie H; Xu, Wei; Young, Allan H; Yu, Timothy W; Zammit, Stanley; Zandi, Peter P; Zhang, Peng; Zitman, Frans G; Zöllner, Sebastian; Devlin, Bernie; Kelsoe, John R; Sklar, Pamela; Daly, Mark J; O'Donovan, Michael C; Craddock, Nicholas; Sullivan, Patrick F; Smoller, Jordan W; Kendler, Kenneth S; Wray, Naomi R

    2013-09-01

    Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

  6. Transposon Insertions, Structural Variations, and SNPs Contribute to the Evolution of the Melon Genome.

    PubMed

    Sanseverino, Walter; Hénaff, Elizabeth; Vives, Cristina; Pinosio, Sara; Burgos-Paz, William; Morgante, Michele; Ramos-Onsins, Sebastián E; Garcia-Mas, Jordi; Casacuberta, Josep Maria

    2015-10-01

    The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties representing the species diversity and including wild accessions and highly bred lines is relatively high, due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution. PMID:26174143

  7. Genotyping three SNPs affecting warfarin drug response by isothermal real-time HDA assays

    PubMed Central

    Li, Ying; Jortani, Saeed A.; Ramey-Hartung, Bronwyn; Hudson, Elizabeth; Lemieux, Bertrand; Kong, Huimin

    2010-01-01

    Background The response to the anticoagulant drug warfarin is greatly affected by genetic polymorphisms in the VKORC1 and CYP2C9 genes. Genotyping these polymorphisms has been shown to be important in reducing the time of the trial-and-error process for finding the maintenance dose of warfarin, thus reducing the risk of adverse effects of the drug. Method We developed a real-time isothermal DNA amplification system for genotyping three single nucleotide polymorphisms (SNPs) that influence warfarin response. For each SNP, real-time isothermal Helicase Dependent Amplification (HDA) reactions were performed to amplify a DNA fragment containing the SNP. Amplicons were detected by fluorescently labeled allele specific probes during real-time HDA amplification. Results Fifty clinical samples were analyzed by the HDA-based method, generating a total of 150 results. Of these, 148 were consistent between the HDA-based assays and a reference method. The two samples with unresolved HDA-based test results were repeated and found to be consistent with the reference method. Conclusion The HDA-based assays demonstrated a clinically acceptable performance for genotyping the VKORC1 -1639G>A SNP and two SNPs (430C>T and 1075A>C) for the CYP2C9 enzyme (CYP2C9*2 and CYP2C9*3), all of which are relevant in warfarin pharmacogenetics. PMID:20854800

  8. Generic Kalman Filter Software

    NASA Technical Reports Server (NTRS)

    Lisano, Michael E., II; Crues, Edwin Z.

    2005-01-01

    The Generic Kalman Filter (GKF) software provides a standard basis for the development of application-specific Kalman-filter programs. Historically, Kalman filters have been implemented by customized programs that must be written, coded, and debugged anew for each unique application, then tested and tuned with simulated or actual measurement data. Total development times for typical Kalman-filter application programs have ranged from months to weeks. The GKF software can simplify the development process and reduce the development time by eliminating the need to re-create the fundamental implementation of the Kalman filter for each new application. The GKF software is written in the ANSI C programming language. It contains a generic Kalman-filter-development directory that, in turn, contains a code for a generic Kalman filter function; more specifically, it contains a generically designed and generically coded implementation of linear, linearized, and extended Kalman filtering algorithms, including algorithms for state- and covariance-update and -propagation functions. The mathematical theory that underlies the algorithms is well known and has been reported extensively in the open technical literature. Also contained in the directory are a header file that defines generic Kalman-filter data structures and prototype functions and template versions of application-specific subfunction and calling navigation/estimation routine code and headers. Once the user has provided a calling routine and the required application-specific subfunctions, the application-specific Kalman-filter software can be compiled and executed immediately. During execution, the generic Kalman-filter function is called from a higher-level navigation or estimation routine that preprocesses measurement data and post-processes output data. 
The generic Kalman-filter function uses the aforementioned data structures and five implementation-specific subfunctions, which have been developed by the user on the basis of the aforementioned templates. The GKF software can be used to develop many different types of unfactorized Kalman filters. A developer can choose to implement either a linearized or an extended Kalman filter algorithm, without having to modify the GKF software. Control dynamics can be taken into account or neglected in the filter-dynamics model. Filter programs developed by use of the GKF software can be made to propagate equations of motion for linear or nonlinear dynamical systems that are deterministic or stochastic. In addition, filter programs can be made to operate in user-selectable "covariance analysis" and "propagation-only" modes that are useful in design and development stages.
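
    The split between a generic filter function and user-supplied subfunctions can be illustrated with a minimal sketch. It is written in Python rather than the ANSI C of the GKF software, it handles only a scalar state, and every name in it (ScalarFilterModel, kalman_step, run_filter, the five callbacks) is hypothetical; the record does not name the actual GKF subfunctions or data structures. The update flag stands in for a "propagation-only" mode.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ScalarFilterModel:
    """Application-specific pieces (hypothetical names, not the GKF API)."""
    propagate: Callable[[float, float], float]      # state propagation over dt
    transition: Callable[[float, float], float]     # derivative of propagate w.r.t. state
    process_noise: Callable[[float], float]         # process-noise variance Q over dt
    predict_measurement: Callable[[float], float]   # measurement model h(x)
    measurement_slope: Callable[[float], float]     # derivative of h w.r.t. state
    measurement_noise: float                        # measurement-noise variance R


def kalman_step(model: ScalarFilterModel, x: float, p: float, z: float,
                dt: float, update: bool = True) -> Tuple[float, float]:
    """One generic propagate/update cycle for a scalar state and variance."""
    # Propagation of state and covariance (linearized about the prior state).
    f = model.transition(x, dt)
    x_pred = model.propagate(x, dt)
    p_pred = f * p * f + model.process_noise(dt)
    if not update:
        return x_pred, p_pred  # propagation-only: ignore the measurement
    # Measurement update of state and covariance.
    h = model.measurement_slope(x_pred)
    s = h * p_pred * h + model.measurement_noise   # innovation variance
    k = p_pred * h / s                             # Kalman gain
    x_new = x_pred + k * (z - model.predict_measurement(x_pred))
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


def run_filter(model: ScalarFilterModel, x0: float, p0: float,
               measurements: Iterable[float], dt: float,
               update: bool = True) -> List[Tuple[float, float]]:
    """Calling routine: feeds measurements to the generic step in order."""
    x, p = x0, p0
    history = []
    for z in measurements:
        x, p = kalman_step(model, x, p, z, dt, update)
        history.append((x, p))
    return history


# Application-specific part: estimating a constant from noisy readings.
constant_model = ScalarFilterModel(
    propagate=lambda x, dt: x,
    transition=lambda x, dt: 1.0,
    process_noise=lambda dt: 0.0,
    predict_measurement=lambda x: x,
    measurement_slope=lambda x: 1.0,
    measurement_noise=1.0,
)
estimates = run_filter(constant_model, x0=0.0, p0=1.0,
                       measurements=[1.0, 1.0, 1.0], dt=1.0)
propagated = run_filter(constant_model, x0=0.0, p0=1.0,
                        measurements=[1.0, 1.0, 1.0], dt=1.0, update=False)
```

    Swapping in a different model object changes the application without touching kalman_step, which is the design point the abstract describes.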

  9. Concentric Split Flow Filter

    NASA Technical Reports Server (NTRS)

    Stapleton, Thomas J. (Inventor)

    2015-01-01

    A concentric split flow filter may be configured to remove odor and/or bacteria from pumped air used to collect urine and fecal waste products. For instance, the filter may be designed to effectively fill the previously wasted volume surrounding the transport tube of a waste management system. The concentric split flow filter may be configured to split the air flow, with substantially half of the air flow to be treated traveling through a first bed of filter media and substantially the other half of the air flow to be treated traveling through the second bed of filter media. This split flow design reduces the air velocity by 50%. In this way, the pressure drop of the filter may be reduced by as much as a factor of 4 as compared to the conventional design.
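
    The step from a 50% velocity reduction to a factor of 4 can be checked with a short calculation. The sketch below assumes the pressure drop follows a power law in air velocity; the record does not state a flow model, so the exponents are assumptions. An exponent of 2 (inertia-dominated flow) reproduces the quoted upper bound, while an exponent of 1 (viscous flow) would give only a factor of 2.

```python
def pressure_drop_ratio(velocity_fraction: float, exponent: float) -> float:
    """Original pressure drop divided by the new one, assuming dP ~ v**exponent."""
    if velocity_fraction <= 0.0:
        raise ValueError("velocity_fraction must be positive")
    return 1.0 / (velocity_fraction ** exponent)


# Halving the velocity under two assumed scaling laws.
viscous_ratio = pressure_drop_ratio(0.5, 1.0)    # dP proportional to v
inertial_ratio = pressure_drop_ratio(0.5, 2.0)   # dP proportional to v squared
```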

  10. Optically tunable optical filter

    NASA Astrophysics Data System (ADS)

    James, Robert T. B.; Wah, Christopher; Iizuka, Keigo; Shimotahira, Hiroshi

    1995-12-01

    We experimentally demonstrate an optically tunable optical filter that uses photorefractive barium titanate. With our filter we implement a spectrum analyzer at 632.8 nm with a resolution of 1.2 nm. We simulate a wavelength-division multiplexing system by separating two semiconductor laser diodes, at 1560 nm and 1578 nm, with the same filter. The filter has a bandwidth of 6.9 nm. We also use the same filter to take 2.5-nm-wide slices out of a 20-nm-wide superluminescent diode centered at 840 nm. As a result, we experimentally demonstrate a phenomenal tuning range from 632.8 to 1578 nm with a single filtering device.

  11. Contactor/filter improvements

    DOEpatents

    Stelman, D.

    1988-06-30

    A contactor/filter arrangement for removing particulate contaminants from a gaseous stream is described. The filter includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. A cover screen isolates the filter element from contact with the moving granular bed. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses. 6 figs.

  12. A Comprehensive In Silico Analysis of the Functional and Structural Impact of Nonsynonymous SNPs in the ABCA1 Transporter Gene

    PubMed Central

    Marín-Martín, Francisco R.; Soler-Rivas, Cristina; Martín-Hernández, Roberto; Rodriguez-Casado, Arantxa

    2014-01-01

    Disease phenotypes and defects in function can be traced to nonsynonymous single nucleotide polymorphisms (nsSNPs), which are important indicators of action sites and effective potential therapeutic approaches. Identification of deleterious nsSNPs is crucial to characterize the genetic basis of diseases, assess individual susceptibility to disease, determine molecular and therapeutic targets, and predict clinical phenotypes. In this study, using the PolyPhen2 and MutPred in silico algorithms, we analyzed the genetic variations that can alter the expression and function of the ABCA1 gene that causes the allelic disorders familial hypoalphalipoproteinemia and Tangier disease. Predictions were validated with published results from in vitro, in vivo, and human studies. Out of a total of 233 nsSNPs, 80 (34.33%) were found deleterious by both methods. Among these 80 deleterious nsSNPs, 29 (12.44%) rare variants were highly deleterious, with a probability >0.8. We observed that most variants with a verified functional effect in experimental studies are correctly predicted as damaging by the MutPred and PolyPhen2 tools. Still, the controversial results of experimental approaches correspond to nsSNPs that are predicted as neutral by both methods or for which contradictory predictions are obtained. A total of seventeen nsSNPs were predicted as deleterious by PolyPhen2 but neutral by MutPred. Conversely, forty-two nsSNPs were predicted as deleterious by MutPred but neutral by PolyPhen2. PMID:25215231

  13. Detection of SNPs in the TBC1D1 gene and their association with carcass traits in chicken.

    PubMed

    Wang, Yan; Xu, Heng-Yong; Gilbert, Elizabeth R; Peng, Xing; Zhao, Xiao-Ling; Liu, Yi-Ping; Zhu, Qing

    2014-09-01

    TBC1D1 plays an important role in numerous fundamental physiological processes including muscle metabolism, regulation of whole body energy homeostasis and lipid metabolism. The objective of the present study was to identify single nucleotide polymorphisms (SNPs) in chicken TBC1D1 using 128 Erlang mountainous chickens and to determine if these SNPs are associated with carcass traits. The approach consisted of sequencing TBC1D1 using a panel of DNA from different individuals, revealing twenty-two SNPs. Among these SNPs, two polymorphisms (g.69307744C>T and g.69307608T>G) of block 1, four polymorphisms (g.69322320C>T, g.69322314G>A, g.69317290A>G and g.69317276T>C) of block 2 and four polymorphisms of block 3 (g.69349746G>A, g.69349736C>G, g.69349727C>T and g.69349694C>T) exhibited a high degree of linkage disequilibrium in all test populations. An association analysis was performed between the twenty-two SNPs and seven performance traits. SNPs g.69307744C>T, g.69340192G>A and g.69355665T>C were demonstrated to have a strong effect on liveweight (BW), carcass weight (CW), semi-eviscerated weight (SEW) and eviscerated weight (EW) and g.69340070C>T polymorphism was related to BW, SEW and BMW in chicken populations. However, for the other SNPs, there were no significant correlations between different genotypes and carcass traits. Meanwhile, haplotype CT-TG of block 1 and combined genotype AG-TT-AC-CT of block 3 were significantly associated with BW, CW, SEW and EW. Overall, our results provide evidence that polymorphisms in TBC1D1 are associated with carcass traits and would be a useful candidate gene in selection programs for improving carcass traits. PMID:24979340

  14. Hybrid Filter Membrane

    NASA Technical Reports Server (NTRS)

    Laicer, Castro; Rasimick, Brian; Green, Zachary

    2012-01-01

    Cabin environmental control is an important issue for a successful Moon mission. Due to the unique environment of the Moon, lunar dust control is one of the main problems that significantly diminishes the air quality inside spacecraft cabins. Therefore, this innovation was motivated by NASA's need to minimize the negative health impact that air-suspended lunar dust particles have on astronauts in spacecraft cabins. It is based on fabrication of a hybrid filter comprising nanofiber nonwoven layers coated on porous polymer membranes with uniform cylindrical pores. This design results in a high-efficiency gas particulate filter with low pressure drop and the ability to be easily regenerated to restore filtration performance. A hybrid filter was developed consisting of a porous membrane with uniform, micron-sized, cylindrical pore channels coated with a thin nanofiber layer. Compared to conventional filter media such as a high-efficiency particulate air (HEPA) filter, this filter is designed to provide high particle efficiency, low pressure drop, and the ability to be regenerated. These membranes have well-defined micron-sized pores and can be used independently as air filters with discrete particle size cut-off, or coated with nanofiber layers for filtration of ultrafine nanoscale particles. The filter consists of a thin design intended to facilitate filter regeneration by localized air pulsing. The two main features of this invention are the concept of combining a micro-engineered straight-pore membrane with nanofibers. The micro-engineered straight pore membrane can be prepared with extremely high precision. Because the resulting membrane pores are straight and not tortuous like those found in conventional filters, the pressure drop across the filter is significantly reduced. The nanofiber layer is applied as a very thin coating to enhance filtration efficiency for fine nanoscale particles. 
Additionally, the thin nanofiber coating is designed to promote capture of dust particles on the filter surface and to facilitate dust removal with pulse or back airflow.

  15. Filter vapor trap

    DOEpatents

    Guon, Jerold

    1976-04-13

    A sintered filter trap is adapted for insertion in a gas stream of sodium vapor to condense and deposit sodium thereon. The filter is heated and operated above the melting temperature of sodium, resulting in a more efficient means to remove sodium particulates from the effluent inert gas emanating from the surface of a liquid sodium pool. Preferably the filter leaves are precoated with a natrophobic coating such as tetracosane.

  16. Nanofiber Filters Eliminate Contaminants

    NASA Technical Reports Server (NTRS)

    2009-01-01

    With support from Phase I and II SBIR funding from Johnson Space Center, Argonide Corporation of Sanford, Florida tested and developed its proprietary nanofiber water filter media. Capable of removing more than 99.99 percent of dangerous particles like bacteria, viruses, and parasites, the media was incorporated into the company's commercial NanoCeram water filter, an inductee into the Space Foundation's Space Technology Hall of Fame. In addition to its drinking water filters, Argonide now produces large-scale nanofiber filters used as part of the reverse osmosis process for industrial water purification.

  17. Linear phase compressive filter

    DOEpatents

    McEwan, T.E.

    1995-06-06

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line. 2 figs.

  18. Linear phase compressive filter

    DOEpatents

    McEwan, Thomas E.

    1995-01-01

    A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode, to ground which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line.

  19. Birefringent filter design

    NASA Technical Reports Server (NTRS)

    Bair, Clayton H. (Inventor)

    1991-01-01

    A birefringent filter is provided for tuning the wavelength of a broad band emission laser. The filter comprises thin plates of a birefringent material having thicknesses which are non-unity, integral multiples of the difference between the thicknesses of the two thinnest plates. The resulting wavelength selectivity is substantially equivalent to the wavelength selectivity of a conventional filter which has a thinnest plate having a thickness equal to this thickness difference. The present invention obtains an acceptable tuning of the wavelength while avoiding a decrease in optical quality associated with conventional filters wherein the respective plate thicknesses are integral multiples of the thinnest plate.

  20. Independent task Fourier filters

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John

    2001-11-01

    Since the early 1960s, a major part of optical computing systems has been Fourier pattern recognition, which takes advantage of high speed filter changes to enable powerful nonlinear discrimination in 'real time.' Because each filter has a task quite independent of the tasks of the other filters, they can be applied and evaluated in parallel or, in a simple approach I describe, in sequence very rapidly. Thus I use the name ITFF (independent task Fourier filter). These filters can also break very complex discrimination tasks into easily handled parts, so the wonderful space invariance properties of Fourier filtering need not be sacrificed to achieve high discrimination and good generalizability even for ultracomplex discrimination problems. The training procedure proceeds sequentially, as the task for a given filter is defined a posteriori by declaring it to be the discrimination of particular members of set A from all members of set B with sufficient margin. That is, we set the threshold to achieve the desired margin and note the A members discriminated by that threshold. Discriminating those A members from all members of B becomes the task of that filter. Those A members are then removed from the set A, so no other filter will be asked to perform that already accomplished task.
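
    The sequential training procedure described above can be sketched as a loop. This is only an illustration under stated assumptions: each "filter" here is a plain dot-product template built from the difference of class means, which stands in for an optical Fourier filter, and the threshold rule (highest B score plus a margin) is one hypothetical way to "achieve the desired margin". The names train_itff and classify are not from the record.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Filter = Tuple[List[float], float]  # (template weights, threshold)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_itff(set_a: Sequence[Vector], set_b: Sequence[Vector],
               margin: float) -> Tuple[List[Filter], List[Vector]]:
    """Build filters one at a time; each handles some A members against all of B."""
    remaining = list(set_a)
    filters: List[Filter] = []
    mean_b = mean_vector(set_b)
    while remaining:
        # Assumed filter design: difference between the class means.
        mean_a = mean_vector(remaining)
        weights = [a - b for a, b in zip(mean_a, mean_b)]
        # Threshold sits above every B response by the requested margin.
        threshold = max(dot(weights, b) for b in set_b) + margin
        handled = [a for a in remaining if dot(weights, a) >= threshold]
        if not handled:
            break  # this simple design cannot separate what is left
        filters.append((weights, threshold))
        # Remove the A members whose discrimination is now this filter's task.
        remaining = [a for a in remaining if dot(weights, a) < threshold]
    return filters, remaining


def classify(filters: Sequence[Filter], x: Vector) -> bool:
    """An input is assigned to set A if any single filter fires."""
    return any(dot(weights, x) >= threshold for weights, threshold in filters)


set_a = [(3.0, 0.0), (3.0, 1.0), (-3.0, 0.0)]
set_b = [(0.0, 0.0), (0.0, 1.0)]
filters, leftover = train_itff(set_a, set_b, margin=1.0)
```

    In this toy data set no single template separates all of A from B, so the loop produces two filters, each responsible for a different subset of A.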

  1. Cis-regulatory variations: A study of SNPs around genes showing cis-linkage in segregating mouse populations

    PubMed Central

    GuhaThakurta, Debraj; Xie, Tao; Anand, Manish; Edwards, Stephen W; Li, Guoya; Wang, Susanna S; Schadt, Eric E

    2006-01-01

    Background Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to diseases. Identification and annotation of the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of the cause of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes for which the mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more of the cis-regulatory elements. This led us to characterize the SNPs in promoter (5 Kb upstream) and non-coding gene regions (introns and 5 Kb downstream) (cis-SNPs) and the effects they may have on putative transcription factor binding sites. Results We demonstrate that the cis-eQTL genes (CEGs) have a significantly higher frequency of cis-SNPs compared to non-CEGs (when both sets are taken from the non-IBD regions, i.e. regions not identical by descent). Most CEGs having cis-SNPs do not contain these SNPs in the phylogenetically conserved regions. In those CEGs that contain cis-SNPs in the phylogenetically conserved regions, enrichment of cis-SNPs occurs both within and outside of the conserved sequences. A higher fraction of CEGs are also seen to harbor cis-SNPs that affect predicted transcription factor binding sites, a likely consequence of the higher cis-SNP density in these genes. Conclusion The present study provides the first genome-wide investigation of the putative cis-regulatory variations in a large set of genes whose levels of expression give rise to cis-linkage in segregating mammalian populations. 
Our results provide insights into the challenges that exist in identifying polymorphisms regulating gene expression using bioinformatic sequence analysis approaches. The data provided herein should benefit future investigations in this area. PMID:16978413

  2. In silico analysis of consequences of non-synonymous SNPs of Slc11a2 gene in Indian bovines.

    PubMed

    Patel, Shreya M; Koringa, Prakash G; Reddy, Bhaskar B; Nathani, Neelam M; Joshi, Chaitanya G

    2015-09-01

    The aim of our study was to analyze the consequences of non-synonymous SNPs in the Slc11a2 gene using bioinformatic tools. There is a current need for efficient bioinformatic tools for in-depth analysis of data generated by next generation sequencing technologies. SNPs are known to play an imperative role in understanding the genetic basis of many genetic diseases. Slc11a2 is one of the major metal transporter families in mammals and plays a critical role in host defenses. In this study, we performed a comprehensive analysis of the impact of all non-synonymous SNPs in this gene using multiple tools: SIFT, PROVEAN, I-Mutant and PANTHER. Among the total of 124 SNPs obtained from amplicon sequencing of the Slc11a2 gene by Ion Torrent PGM, involving 10 individuals each of Gir cattle and Murrah buffalo, we found 22 to be non-synonymous. Comparing the predictions of these 4 methods, 5 nsSNPs (G369R, Y374C, A377V, Q385H and N492S) were identified as deleterious. In addition, when tested for polar interactions with other amino acids in the protein, 3 of these 5 (Y374C, Q385H and N492S) showed a change in interaction pattern, which was further confirmed by an increase in total energy after energy minimization of the mutant protein compared to the native. PMID:26484229

  3. SNPs in the aryl hydrocarbon receptor-interacting protein gene associated with sporadic non-functioning pituitary adenoma

    PubMed Central

    HU, YESHUAI; YANG, JUN; CHANG, YONGKAI; MA, SHUNCHANG; QI, JIANFA

    2016-01-01

    Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene have previously been associated with a predisposition to pituitary adenomas. However, to the best of our knowledge, mutations in AIP that relate specifically to sporadic non-functioning pituitary adenomas (NFPAs) have yet to be reported. Therefore, the present study aimed to identify single nucleotide polymorphisms (SNPs) in the AIP gene that may be associated with NFPAs. Peripheral blood samples and the entire coding sequence of the AIP gene from 56 patients with NFPAs and 56 controls were analyzed in triplicate. Of the 56 patients with NFPAs, 9 patients (16.1%) were identified as harboring five different SNPs, although no germline mutations in the AIP gene were detected in any of the patients. Three different SNPs (7051C>T, 8012G>C and 8020G>C) were identified in exons 4 and 6 in 3 different patients (each in 1 patient). Two different SNPs (7318C>A and 7886A>G) were identified in exons 5 and 6, respectively, in 6 different patients (each in 3 patients). No SNPs or germline mutations in the AIP gene were identified in the controls. The results of the present study suggested that mutations in the AIP gene might not have an important role in the tumorigenesis of NFPAs. However, further studies are required in order to investigate potential molecular and genetic mechanisms that may underlie the involvement of AIP in NFPA. PMID:26998050

  4. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data of 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies, to generate a high resolution linkage map. The map comprised 6698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the most dense linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721

  5. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea.

    PubMed

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data of 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies, to generate a high resolution linkage map. The map comprised 6698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the most dense linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721

  6. Mapping the genetic variation of regional brain volumes as explained by all common SNPs from the ADNI study.

    PubMed

    Bryant, Christopher; Giovanello, Kelly S; Ibrahim, Joseph G; Chang, Jing; Shen, Dinggang; Peterson, Bradley S; Zhu, Hongtu

    2013-01-01

    Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs. PMID:24015190

  7. Identification of Novel Single Nucleotide Polymorphisms (SNPs) in Deer (Odocoileus spp.) Using the BovineSNP50 BeadChip

    PubMed Central

    Haynes, Gwilym D.; Latch, Emily K.

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the FST-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1−30.1 million years before present). PMID:22590559

  8. Discovery of URAT1 SNPs and association between serum uric acid levels and URAT1

    PubMed Central

    Cho, Sung Kweon; Kim, Soriul; Chung, Jae-Yong; Jee, Sun Ha

    2015-01-01

    Objectives Human urate transporter 1 (URAT1) is a member of the organic anion transporter family (SLC22A12) that primarily regulates the renal tubular reabsorption of uric acid. This case-control study was designed to analyse whether hURAT1 might also be a candidate gene for hyperuricaemia or hypouricaemia. Setting We recruited 68 healthy volunteers and divided them into two groups: a normal uric acid group and a hyperuricaemia group. We analysed the sequence of the URAT1 gene and found five significant single nucleotide polymorphisms (SNPs). We then selected 900 male subjects from the 262,200 enrolled in the Korean Cancer Prevention Study-II (KCPS-II) cohort for further genetic analysis. Participants DNA samples from 36 individuals with normal uric acid (<4.5 mg/dL) and 32 individuals with hyperuricaemia (>8.5 mg/dL) were sequenced. Five significant SNPs (rs7929627, rs75786299, rs3825017, rs11602903 and rs121907892) were identified. We then chose 900 subjects from the KCPS-II cohort consisting of 450 subjects with normal uric acid (UA <4.1 mg/dL) and 450 subjects with hyperuricaemia (UA >8.7 mg/dL). The groups were matched by age, body mass index, metabolic syndrome and use of anti-hypertensive medication. Primary outcome measures We compared the OR of the incidence of hyperuricaemia by URAT1 genotype. Results The strongest association with hyperuricaemia was observed for rs75786299 (IVS3+11A/G) with an OR of 32.05. rs7929627 (IVS7-103A/G) and rs3825017 (N82N) showed an association with hyperuricaemia with ORs of 2.56 and 2.29, respectively. rs11602903 (788A/T) and rs121907892 (W258X) were negatively correlated with hyperuricaemia with ORs of 0.350 and 0.447, respectively. Individuals carrying the GATAG haplotype (n=32), a relatively common variant consisting of rs7929627, rs75786299 and rs3825017, showed the highest risk for hyperuricaemia with an OR of 92.23 (p=9.55×10⁻³). Conclusions These results indicate that five newly described SNPs in the hURAT1 gene are significantly associated with uric acid level (4-2008-0318 and 4-2011-0277). PMID:26603249
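
    For readers unfamiliar with the odds ratios reported above, the following minimal sketch shows how an unadjusted OR and a Woolf-type 95% confidence interval are computed from a 2x2 table. The counts are hypothetical and are not taken from the study, which used a matched design and may have estimated its ORs differently; the function name `odds_ratio` is likewise an assumption.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (OR, lower, upper): unadjusted odds ratio with a 95% CI
    computed on the log scale (Woolf method)."""
    or_value = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper

# Hypothetical counts, for illustration only (NOT data from the study):
# 60 carriers and 40 non-carriers among cases, 30 and 70 among controls.
or_value, lower, upper = odds_ratio(60, 40, 30, 70)
print(f"OR = {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

    An OR above 1 (such as the reported 32.05) indicates higher odds of hyperuricaemia among carriers, while an OR below 1 (such as 0.350) indicates lower odds.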

  9. A Genome-Wide Investigation of SNPs and CNVs in Schizophrenia

    PubMed Central

    Maia, Jessica; Feng, Sheng; Heinzen, Erin L.; Shianna, Kevin V.; Yoon, Woohyun; Kasperavičiūtė, Dalia; Gennarelli, Massimo; Strittmatter, Warren J.; Bonvicini, Cristian; Rossi, Giuseppe; Jayathilake, Karu; Cola, Philip A.; McEvoy, Joseph P.; Keefe, Richard S. E.; Fisher, Elizabeth M. C.; St. Jean, Pamela L.; Giegling, Ina; Hartmann, Annette M.; Möller, Hans-Jürgen; Ruppert, Andreas; Fraser, Gillian; Crombie, Caroline; Middleton, Lefkos T.; St. Clair, David; Roses, Allen D.; Muglia, Pierandrea; Francks, Clyde; Rujescu, Dan; Meltzer, Herbert Y.; Goldstein, David B.

    2009-01-01

    We report a genome-wide assessment of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) in schizophrenia. We investigated SNPs using 871 patients and 863 controls, following up the top hits in four independent cohorts comprising 1,460 patients and 12,995 controls, all of European origin. We found no genome-wide significant associations, nor could we provide support for any previously reported candidate gene or genome-wide associations. We went on to examine CNVs using a subset of 1,013 cases and 1,084 controls of European ancestry, and a further set of 60 cases and 64 controls of African ancestry. We found that eight cases and zero controls carried deletions greater than 2 Mb, of which two, at 8p22 and 16p13.11-p12.4, are newly reported here. A further evaluation of 1,378 controls identified no deletions greater than 2 Mb, suggesting a high prior probability of disease involvement when such deletions are observed in cases. We also provide further evidence for some smaller, previously reported, schizophrenia-associated CNVs, such as those in NRXN1 and APBA2. We could not provide strong support for the hypothesis that schizophrenia patients have a significantly greater “load” of large (>100 kb), rare CNVs, nor could we find common CNVs that associate with schizophrenia. Finally, we did not provide support for the suggestion that schizophrenia-associated CNVs may preferentially disrupt genes in neurodevelopmental pathways. Collectively, these analyses provide the first integrated study of SNPs and CNVs in schizophrenia and support the emerging view that rare deleterious variants may be more important in schizophrenia predisposition than common polymorphisms. While our analyses do not suggest that implicated CNVs impinge on particular key pathways, we do support the contribution of specific genomic regions in schizophrenia, presumably due to recurrent mutation. On balance, these data suggest that very few schizophrenia patients share identical genomic causation, potentially complicating efforts to personalize treatment regimens. PMID:19197363

  10. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, T.E.; Alvin, M.A.; Bruck, G.J.; Smeltzer, E.E.

    1999-03-02

    A filter holder and gasket assembly are disclosed for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut. 9 figs.

  11. Filter holder and gasket assembly for candle or tube filters

    DOEpatents

    Lippert, Thomas Edwin; Alvin, Mary Anne; Bruck, Gerald Joseph; Smeltzer, Eugene E.

    1999-03-02

    A filter holder and gasket assembly for holding a candle filter element within a hot gas cleanup system pressure vessel. The filter holder and gasket assembly includes a filter housing, an annular spacer ring securely attached within the filter housing, a gasket sock, a top gasket, a middle gasket and a cast nut.

  12. Durability of ceramic filters

    SciTech Connect

    Alvin, M.A.; Tressler, R.E.; Lippert, T.E.; Diaz, E.S.; Smeltzer, E.E.

    1994-10-01

    The objectives of this program are to identify the potential long-term thermal/chemical effects that advanced coal-based power generating systems have on the stability of porous ceramic filter materials, as well as to assess the influence of these effects on filter operating performance and life.

  13. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean matching. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369

  14. Using imputation and mixture model approaches to integrate multi-state capture-recapture models with assignment information.

    PubMed

    Wen, Zhi; Pollock, Kenneth H; Nichols, James D; Waser, Peter M; Cao, Weihua

    2014-06-01

    In this article, we first extend the superpopulation capture-recapture model to multiple states (locations or populations) for two age groups. Wen et al. (2011; 2013) developed a new approach combining capture-recapture data with population assignment information to estimate the relative contributions of in situ births and immigrants to the growth of a single study population. Here, we first generalize the approach of Wen et al. (2011; 2013) to a system composed of multiple study populations (multi-state) with two age groups, where an imputation approach is employed to account for the uncertainty inherent in the population assignment information. Then we develop a different, individual-level mixture model approach to integrate the individual-level population assignment information with the capture-recapture data. Our simulation and real data analyses show that the fusion of population assignment information with capture-recapture data allows us to estimate the origination-specific recruitment of new animals to the system and the dispersal process between populations within the system. Compared to a standard capture-recapture model, our new models improve the estimation of demographic parameters, including survival probability, origination-specific entry probability, and especially the probability of movement between populations, yielding higher accuracy and precision. PMID:24571715

  15. Tracking harmonic notch filter

    NASA Astrophysics Data System (ADS)

    Emo, Frederick L.

    1990-07-01

    Disclosed in this patent is an electronic filter for automatically tracking and removing harmonically related interfering electrical signals such as power line interference harmonics without attenuating other signals of interest even though the signals are frequency stable and/or near the interference signal frequencies. The filter comprises a very narrow band electronic commutated capacitor-bank comb-notch filter driven by a counter/decoder circuit which is in turn driven by a phase locked loop. The filter also comprises two narrow band analog filters tuned to the two lowest harmonics of the interfering signal and drives the comb-notch at unit multiples of the fundamental of the interference frequency. This action is continuous such that center frequencies of the notches are automatically adjusted to compensate for small variations in the interference frequency.

  16. Sub-micron filter

    DOEpatents

    Tepper, Frederick; Kaledin, Leonid

    2009-10-13

    Aluminum hydroxide fibers approximately 2 nanometers in diameter and with surface areas ranging from 200 to 650 m²/g have been found to be highly electropositive. When dispersed in water they are able to attach to and retain electronegative particles. When combined into a composite filter with other fibers or particles they can filter bacteria and nano size particulates such as viruses and colloidal particles at high flux through the filter. Such filters can be used for purification and sterilization of water, biological, medical and pharmaceutical fluids, and as a collector/concentrator for detection and assay of microbes and viruses. The alumina fibers are also capable of filtering sub-micron inorganic and metallic particles to produce ultra pure water. The fibers are suitable as a substrate for growth of cells. Macromolecules such as proteins may be separated from each other based on their electronegative charges.

  17. Sintered composite filter

    DOEpatents

    Bergman, W.

    1986-05-02

    A particulate filter medium formed of a sintered composite of 0.5 micron diameter quartz fibers and 2 micron diameter stainless steel fibers is described. Preferred composition is about 40 vol.% quartz and about 60 vol.% stainless steel fibers. The media is sintered at about 1100 °C to bond the stainless steel fibers into a cage network which holds the quartz fibers. High filter efficiency and low flow resistance are provided by the smaller quartz fibers. High strength is provided by the stainless steel fibers. The resulting media has a high efficiency and low pressure drop similar to the standard HEPA media, with tensile strength at least four times greater, and a maximum operating temperature of about 550 °C. The invention also includes methods to form the composite media and a HEPA filter utilizing the composite media. The filter media can be used to filter particles in both liquids and gases.

  18. Implicit Kalman filtering

    NASA Technical Reports Server (NTRS)

    Skliar, M.; Ramirez, W. F.

    1997-01-01

    For an implicitly defined discrete system, a new algorithm for Kalman filtering is developed and an efficient numerical implementation scheme is proposed. Unlike the traditional explicit approach, the implicit filter can be readily applied to ill-conditioned systems and allows for generalization to descriptor systems. The implementation of the implicit filter depends on the solution of the congruence matrix equation $A_1 P_x A_1^{T} = P_y$. We develop a general iterative method for the solution of this equation, and prove necessary and sufficient conditions for convergence. It is shown that when the system matrices of an implicit system are sparse, the implicit Kalman filter requires significantly less computer time and storage to implement as compared to the traditional explicit Kalman filter. Simulation results are presented to illustrate and substantiate the theoretical developments.
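
    To make the congruence equation in this abstract concrete, the sketch below solves it directly for a small, dense, invertible 2x2 case, where the solution is the inverse of A1 times Py times the transpose of that inverse. This is not the iterative method the authors develop (the abstract gives no details of it); it only illustrates what a solution of the equation looks like. All function names and the example matrices are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular; the direct solve does not apply")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def solve_congruence(a1, p_y):
    """Solve A1 * Px * A1^T = Py for Px, assuming A1 is invertible."""
    a_inv = inverse_2x2(a1)
    return matmul(matmul(a_inv, p_y), transpose(a_inv))

# Illustrative matrices (assumed, not from the paper).
a1 = [[2.0, 1.0], [0.0, 1.0]]
p_y = [[4.0, 1.0], [1.0, 2.0]]   # symmetric, like a covariance matrix

p_x = solve_congruence(a1, p_y)

# Check: substituting Px back should reproduce Py.
reconstructed = matmul(matmul(a1, p_x), transpose(a1))
print(p_x)
print(reconstructed)
```

    A direct inverse like this is only practical for small, well-conditioned matrices. The abstract's emphasis on sparse and ill-conditioned systems is presumably why an iterative scheme is proposed instead.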

  19. Multidimensional synthetic estimation filter

    NASA Technical Reports Server (NTRS)

    Monroe, Stanley E., Jr.; Juday, Richard D.

    1990-01-01

    The synthetic estimation filter (SEF) crafts an affine variation into its response to a changing parameter (e.g. scale or rotation). Sets of such filters are used in an estimation correlator to reduce the number of filters required for a given tracking accuracy. By overspecifying the system (one more SEF than parameters to be tracked), the ratio of correlation responses between filters forms a robust estimator into the spanned domain of the parameters. Previous results dealt with a laboratory correlator which could track a single parameter. This paper explores the SEF and the estimator's extension to more dimensions. A 2D example is given in which a reduction of filters from 25 to 3 is demonstrated to span a 4-degree square portion of pose space.

  20. BIREFRINGENT FILTER MODEL

    NASA Technical Reports Server (NTRS)

    Cross, P. L.

    1994-01-01

    Birefringent filters are often used as line-narrowing components in solid state lasers. The Birefringent Filter Model program generates a stand-alone model of a birefringent filter for use in designing and analyzing a birefringent filter. It was originally developed to aid in the design of solid state lasers to be used on aircraft or spacecraft to perform remote sensing of the atmosphere. The model is general enough to allow the user to address problems such as temperature stability requirements, manufacturing tolerances, and alignment tolerances. The input parameters for the program are divided into 7 groups: 1) general parameters which refer to all elements of the filter; 2) wavelength related parameters; 3) filter, coating and orientation parameters; 4) input ray parameters; 5) output device specifications; 6) component related parameters; and 7) transmission profile parameters. The program can analyze a birefringent filter with up to 12 different components, and can calculate the transmission and summary parameters for multiple passes as well as a single pass through the filter. The Jones matrix, which is calculated from the input parameters of Groups 1 through 4, is used to calculate the transmission. Output files containing the calculated transmission or the calculated Jones' matrix as a function of wavelength can be created. These output files can then be used as inputs for user written programs. For example, to plot the transmission or to calculate the eigen-transmittances and the corresponding eigen-polarizations for the Jones' matrix, write the appropriate data to a file. The Birefringent Filter Model is written in Microsoft FORTRAN 2.0. The program format is interactive. It was developed on an IBM PC XT equipped with an 8087 math coprocessor, and has a central memory requirement of approximately 154K. Since Microsoft FORTRAN 2.0 does not support complex arithmetic, matrix routines for addition, subtraction, and multiplication of complex, double precision variables are included. The Birefringent Filter Model was written in 1987.
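
    As a rough illustration of how a Jones matrix yields a transmission curve, the sketch below models the simplest textbook case: a single birefringent plate with its fast axis at 45 degrees between two parallel polarizers. It is not a reconstruction of the FORTRAN program described above, which handles up to 12 components, coatings and multiple passes. The function names, the birefringence value and the plate thickness are assumptions chosen for illustration.

```python
import cmath
import math

def retarder_jones(retardance, theta):
    """Jones matrix of a linear retarder with phase retardance (radians)
    and fast axis at angle theta (radians) from the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    fast = cmath.exp(-1j * retardance / 2)
    slow = cmath.exp(1j * retardance / 2)
    return [[c * c * fast + s * s * slow, c * s * (fast - slow)],
            [c * s * (fast - slow), s * s * fast + c * c * slow]]

def transmission(wavelength_nm, thickness_nm, delta_n, theta=math.pi / 4):
    """Intensity transmission for x-polarized input, one plate, and an
    x-oriented analyzer (parallel polarizers)."""
    retardance = 2 * math.pi * delta_n * thickness_nm / wavelength_nm
    m = retarder_jones(retardance, theta)
    # Input field is (1, 0); the analyzer keeps only the x component,
    # so the output amplitude is the top-left matrix element.
    return abs(m[0][0]) ** 2

# Assumed plate: birefringence 0.01, thickness 0.1 mm (1e5 nm).
delta_n, thickness_nm = 0.01, 1.0e5
t_full_wave = transmission(1000.0, thickness_nm, delta_n)   # retardance 2*pi
t_half_wave = transmission(2000.0, thickness_nm, delta_n)   # retardance pi
print(f"T at 1000 nm: {t_full_wave:.3f}")
print(f"T at 2000 nm: {t_half_wave:.3f}")
```

    For this configuration the result reduces to the cosine squared of half the retardance, so the plate passes wavelengths where it acts as a full-wave plate and blocks those where it acts as a half-wave plate. Python's built-in complex type does the arithmetic here, whereas the original program had to supply its own complex matrix routines.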