These are representative sample records from Science.gov related to your search topic.
For comprehensive and current results, perform a real-time search at Science.gov.
1

Impact of pre-imputation SNP-filtering on genotype imputation results  

PubMed Central

Background Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE. Results We considered three scenarios: imputation of partially missing genotypes with usage of an external reference panel, without usage of an external reference panel, as well as imputation of completely un-typed SNPs using an external reference panel. We first created various datasets applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios by using established and newly proposed measures of imputation quality. While the established measures assess certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental regarding imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. Conclusion Even a moderate filtering has a detrimental effect on the imputation quality. Therefore little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. 
Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at the cost of increased computing time. PMID:25112433

2014-01-01

2

Quick, “Imputation-free” meta-analysis with proxy-SNPs  

PubMed Central

Background Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which eventually are time-consuming. Results Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from 1,000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. 
Conclusions YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy-SNPs for missing markers to avoid unnecessary power loss. MA with YAMAS can be readily conducted as YAMAS provides a generic parser for heterogeneous tabulated file formats within the GWAS field and avoids cumbersome setups. In this way, it supplements the meta-analysis process. PMID:22971100

2012-01-01
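The record above says that meta-analysis combines association effect estimates across studies. A minimal sketch of one standard way to do that, inverse-variance fixed-effect pooling, is given below. It is not taken from YAMAS; the function name `fixed_effect_meta` and the input values are hypothetical and serve only as illustration.

```python
import math

def fixed_effect_meta(estimates):
    """Pool (beta, standard_error) pairs by inverse-variance weighting."""
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical effect estimates for one SNP (or a proxy-SNP) in two studies.
studies = [(0.10, 0.05), (0.14, 0.05)]
pooled_beta, pooled_se, pooled_z, pooled_p = fixed_effect_meta(studies)
print(pooled_beta, pooled_se, pooled_z, pooled_p)
```

With equal standard errors the pooled estimate is the plain average of the two effects, and the pooled standard error shrinks by a factor of the square root of two.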

3

Interaction Association Analysis of Imputed SNPs in Case-Control and Follow-Up Studies.  

PubMed

A new method is described to assess the interactions of imputed SNPs (single nucleotide polymorphisms) in case-control and follow-up studies, properly incorporating SNP imputation uncertainty in the likelihood model. Using simulation studies and analysis of real data obtained from the Framingham study cohort, we compare the performance of this new method to DOSAGE and NAIVE (also known as Best-Guess) methods, developed and commonly used in the context of single-SNP analysis and extended to SNP-by-SNP interaction. The results show that only our new method is unbiased under all examined scenarios regarding allele frequencies, imputation uncertainty degree, and interaction effect size. In addition, our method achieves at least as much power as the other two, and exceeds their statistical power in certain follow-up analysis situations. This method is fast enough to perform Genome Wide Interaction Studies (GWIS) with hundreds of thousands of interactions. An exhaustive simulation study lets us provide recommendations for selecting the most appropriate method depending on MAF, interaction effect size, and uncertainty degree. In general, DOSAGE and our proposed method are recommended in most situations, with our method being more powerful and accurate as uncertainty and effect size increase. PMID:25613387

Subirana, Isaac; González, Juan R

2015-03-01
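The record above contrasts the DOSAGE and NAIVE (Best-Guess) ways of summarizing an imputed genotype. The sketch below shows the usual meaning of those two summaries, assuming each imputed SNP comes with posterior probabilities for 0, 1, and 2 copies of the coded allele. The function names and the example probabilities are hypothetical, and the likelihood-based method proposed in the record is not reproduced here.

```python
def dosage(probs):
    """Expected allele count from posterior probabilities (P(0), P(1), P(2))."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2

def best_guess(probs):
    """Most probable genotype, coded as 0, 1, or 2 copies of the allele."""
    return max(range(3), key=lambda g: probs[g])

# Hypothetical posterior probabilities for one individual at one SNP.
posterior = (0.10, 0.55, 0.35)
print(dosage(posterior))      # expected allele count
print(best_guess(posterior))  # hard genotype call
```

The dosage keeps part of the imputation uncertainty as a fractional allele count, whereas the best-guess call discards it.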

4

Using Family-Based Imputation in Genome-Wide Association Studies with Large Complex Pedigrees: The Framingham Heart Study  

PubMed Central

Imputation has been widely used in genome-wide association studies (GWAS) to infer genotypes of un-genotyped variants based on the linkage disequilibrium in external reference panels such as the HapMap and 1000 Genomes. However, imputation has only rarely been performed based on family relationships to infer genotypes of un-genotyped individuals. Using 8998 Framingham Heart Study (FHS) participants genotyped with Affymetrix 550K SNPs, we imputed genotypes of the same set of SNPs for an additional 3121 participants, most of whom were never genotyped due to lack of a DNA sample. Prior to imputation, 122 pedigrees were too large to be handled by the imputation software Merlin. Therefore, we developed a novel pedigree splitting algorithm that can maximize the number of genotyped relatives for imputing each un-genotyped individual, while keeping new sub-pedigrees under a pre-specified size. In GWAS of four phenotypes available in FHS (Alzheimer disease, circulating levels of fibrinogen, high-density lipoprotein cholesterol, and uric acid), we compared results using genotyped individuals only with results using both genotyped and imputed individuals. We studied the impact of applying different imputation quality filtering thresholds on the association results and did not find a universal threshold that always resulted in a more significant p-value for previously identified loci. However, most of these loci had a lower p-value when we only included imputed genotypes with ≥60% SNP- and ≥50% person-specific imputation certainty. In summary, we developed a novel algorithm for splitting large pedigrees for imputation and found a plausible imputation quality filtering threshold based on FHS. Further examination may be required to generalize this threshold to other studies. PMID:23284720

Chen, Wei-Min; Larson, Martin G.; Fox, Caroline S.; Vasan, Ramachandran S.; Seshadri, Sudha; O’Donnell, Christopher J.; Yang, Qiong

2012-01-01

5

Genotype Imputation  

PubMed Central

Genotype imputation is now an essential tool in the analysis of genomewide association scans. The technique allows geneticists to accurately evaluate the evidence for association at genetic markers that are not directly genotyped. Genotype imputation increases power of genomewide association scans and is particularly useful for combining the association scan results across studies that rely on different genotyping platforms. Here, we review the history and theoretical underpinnings of the technique. To illustrate performance of the approach, we summarize results from several actual gene mapping studies. Finally, we preview the role of genotype imputation in an era when whole genome resequencing is becoming increasingly common. PMID:19715440

Li, Yun; Willer, Cristen; Sanna, Serena; Abecasis, Gonçalo

2010-01-01

6

Two adjustment strategies for imputation across genotyping arrays.  

PubMed

Genotype imputation is a powerful approach in genome-wide association studies (GWAS) because it can provide higher resolution for associated regions and facilitate meta-analysis. However, bias can exist if different genotyping arrays are used and are unbalanced for case versus control subjects. The intersection imputation strategy [imputation based on single nucleotide polymorphisms (SNPs) available on all arrays] is a valid strategy that eliminates the bias caused by unbalanced genotyping, but achieved at the expense of reduced statistical power. In order to improve power in this situation, we introduce two new strategies: the replacement strategy based on the imputation quality score (IQS) ≥ 0.9 and the correction strategy. The IQS is a score that we have previously introduced based on Cohen's kappa of rater agreement. The replacement strategy with IQS ≥ 0.9 is a hybrid approach that utilizes measured genotypes for SNPs available on one or more of all arrays whenever the SNP has a high imputation quality (defined by IQS ≥ 0.9). The correction strategy combines measured genotypes as well as imputed and corrected genotype dosages for SNPs available on one or more of all arrays. The correction strategy yields a valid statistical test, while the replacement strategy with IQS ≥ 0.9 eliminates most spurious associations. Both strategies maintain statistical power. PMID:25033910

Xie, Yiran; Hancock, Dana B; Johnson, Eric O; Rice, John P

2014-01-01
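The record above describes the IQS as being based on Cohen's kappa of rater agreement. As a rough illustration of that underlying statistic, the sketch below computes plain Cohen's kappa between true and imputed hard genotype calls. This is an assumption-laden simplification: the published IQS may be defined on imputation probabilities rather than hard calls, and the function name and genotype vectors here are hypothetical.

```python
from collections import Counter

def cohens_kappa(truth, imputed):
    """Chance-corrected agreement between two genotype call vectors."""
    n = len(truth)
    if n == 0 or n != len(imputed):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == i for t, i in zip(truth, imputed)) / n
    true_counts = Counter(truth)
    imputed_counts = Counter(imputed)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both vectors are constant
    return (observed - expected) / (1.0 - expected)

# Hypothetical genotypes (0, 1, or 2 copies of the coded allele) at one SNP.
true_calls = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
imputed_calls = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]
kappa = cohens_kappa(true_calls, imputed_calls)
print(kappa)
```

Unlike raw concordance, kappa discounts the agreement that would arise by chance, which matters for SNPs where one genotype dominates.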

7

Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle  

PubMed Central

Background The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle. Methods Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated. Results Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs. Conclusions Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. 
Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability. PMID:25022768

2014-01-01
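The record above measures accuracy as the correlation between observed and imputed genotypes, assessed by five-fold cross-validation. A minimal sketch of those two ingredients follows, assuming a Pearson correlation per SNP and a simple interleaved fold assignment. The helper names, the fold scheme, and the genotype values are hypothetical; only the figures of 114 animals and five folds come from the record.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a monomorphic SNP
    return sxy / math.sqrt(sxx * syy)

def k_fold_indices(n_items, k):
    """Assign item indices to k validation folds of nearly equal size."""
    return [list(range(start, n_items, k)) for start in range(k)]

# Hypothetical observed genotypes and imputed dosages for one SNP.
observed = [0, 1, 2, 1, 0, 2]
imputed = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]
accuracy = pearson(observed, imputed)

folds = k_fold_indices(114, 5)  # 114 animals, five folds
print(accuracy, [len(fold) for fold in folds])
```

In each round one fold would be masked down to the lower-density panel and imputed from the remaining animals, and the per-SNP correlations would then be averaged.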

8

Genotype imputation via matrix completion  

PubMed Central

Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency. PMID:23233546

Chi, Eric C.; Zhou, Hua; Chen, Gary K.; Del Vecchyo, Diego Ortega; Lange, Kenneth

2013-01-01

9

Recursively Imputed Survival Trees  

PubMed Central

We propose recursively imputed survival tree (RIST) regression for right-censored data. This new nonparametric regression procedure uses a novel recursive imputation approach combined with extremely randomized trees that allows significantly better use of censored data than previous tree based methods, yielding improved model fit and reduced prediction error. The proposed method can also be viewed as a type of Monte Carlo EM algorithm which generates extra diversity in the tree-based fitting process. Simulation studies and data analyses demonstrate the superior performance of RIST compared to previous methods. PMID:23125470

Zhu, Ruoqing; Kosorok, Michael R.

2011-01-01

10

Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes  

PubMed Central

Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variants imputation, we imputed 153 individuals, each of whom had data from 3 different genotype arrays (317k, 610k and 1 million SNPs), to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF ≤ 0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute. PMID:25621886

Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J. Brent; Wang, Li

2015-01-01
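The record above reports the proportion of well-imputed variants (info > 0.4) within MAF bins. The sketch below shows one way such a summary could be tabulated, assuming each variant is represented as a `(maf, info)` pair. The function name, the bin edges, and the variant values are hypothetical; only the info threshold of 0.4 comes from the record.

```python
def well_imputed_by_maf_bin(variants, bin_edges, info_threshold=0.4):
    """Share of variants with info above the threshold in each (low, high] MAF bin."""
    summary = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        infos = [info for maf, info in variants if low < maf <= high]
        share = sum(i > info_threshold for i in infos) / len(infos) if infos else None
        summary.append(((low, high), len(infos), share))
    return summary

# Hypothetical (MAF, info score) pairs; not data from the study.
variants = [
    (0.002, 0.10), (0.003, 0.20),
    (0.010, 0.30), (0.020, 0.70), (0.040, 0.90), (0.040, 0.35),
]
bins = well_imputed_by_maf_bin(variants, bin_edges=[0.0, 0.003, 0.05])
for (low, high), count, share in bins:
    print(f"MAF in ({low}, {high}]: n={count}, well imputed={share}")
```

An empty bin is reported as `None` rather than zero so that it is not mistaken for a bin in which every variant failed the threshold.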

11

Practical Consideration of Genotype Imputation: Sample Size, Window Size, Reference Choice, and Untyped Rate  

PubMed Central

Imputation offers a promising way to infer the missing and/or untyped genotypes in genetic studies. In practice, however, many factors may affect the quality of imputation. In this study, we evaluated the influence of untyped rate, sizes of the study sample and the reference sample, window size, and reference choice (for admixed population), as the factors affecting the quality of imputation. The results show that in order to obtain good imputation quality, it is necessary to have an untyped rate less than 50%, a reference sample size greater than 50, and a window size of greater than 500 SNPs (roughly 1 MB in base pairs). Compared with the whole-region imputation, piecewise imputation with large-enough window sizes provides improved efficacy. For an admixed study sample, if only an external reference panel is used, it should include samples from the ancestral populations that represent the admixed population under investigation. Internal references are strongly recommended. When internal references are limited, however, augmentation by external references should be used carefully. More specifically, augmentation with samples from the major source populations of the admixture can lower the quality of imputation; augmentation with seemingly genetically unrelated cohorts may improve the quality of imputation. PMID:22308193

Zhang, Boshao; Zhi, Degui; Zhang, Kui; Gao, Guimin; Limdi, Nita N.; Liu, Nianjun

2011-01-01

12

Error rate for imputation from the Illumina BovineSNP50 chip to the Illumina BovineHD chip  

PubMed Central

Background Imputation of genotypes from low-density to higher density chips is a cost-effective method to obtain high-density genotypes for many animals, based on genotypes of only a relatively small subset of animals (reference population) on the high-density chip. Several factors influence the accuracy of imputation and our objective was to investigate the effects of the size of the reference population used for imputation and of the imputation method used and its parameters. Imputation of genotypes was carried out from 50 000 (moderate-density) to 777 000 (high-density) SNPs (single nucleotide polymorphisms). Methods The effect of reference population size was studied in two datasets: one with 548 and one with 1289 Holstein animals, genotyped with the Illumina BovineHD chip (777 k SNPs). A third dataset included the 548 animals genotyped with the 777 k SNP chip and 2200 animals genotyped with the Illumina BovineSNP50 chip. In each dataset, 60 animals were chosen as validation animals, for which all high-density genotypes were masked, except for the Illumina BovineSNP50 markers. Imputation was studied in a subset of six chromosomes, using the imputation software programs Beagle and DAGPHASE. Results Imputation with DAGPHASE and Beagle resulted in 1.91% and 0.87% allelic imputation error rates in the dataset with 548 high-density genotypes, when scale and shift parameters were 2.0 and 0.1, and 1.0 and 0.0, respectively. When Beagle was used alone, the imputation error rate was 0.67%. If the information obtained by Beagle was subsequently used in DAGPHASE, imputation error rates were slightly higher (0.71%). When 2200 moderate-density genotypes were added and Beagle was used alone, imputation error rates were slightly lower (0.64%). The least imputation errors were obtained with Beagle in the reference set with 1289 high-density genotypes (0.41%). 
Conclusions For imputation of genotypes from the 50 k to the 777 k SNP chip, Beagle gave the lowest allelic imputation error rates. Imputation error rates decreased with increasing size of the reference population. For applications for which computing time is limiting, DAGPHASE using information from Beagle can be considered as an alternative, since it reduces computation time and increases imputation error rates only slightly. PMID:24495554

2014-01-01
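The record above compares methods by their allelic imputation error rate. The abstract does not spell out the formula, so the sketch below assumes a common definition: the number of alleles that differ between the true and imputed genotype, divided by the total number of alleles. The function name and the genotype vectors are hypothetical.

```python
def allelic_error_rate(true_genotypes, imputed_genotypes):
    """Share of wrongly imputed alleles; genotypes are coded 0, 1, or 2."""
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("inputs must be non-empty and of equal length")
    # A 0 imputed as 2 counts as two wrong alleles, a 1 imputed as 2 as one.
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes))
    return wrong_alleles / (2 * len(true_genotypes))

# Hypothetical masked high-density genotypes for one validation animal.
masked_truth = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_calls = [0, 1, 2, 1, 1, 0, 1, 0]
error_rate = allelic_error_rate(masked_truth, imputed_calls)
print(error_rate)
```

Counting alleles rather than genotypes penalizes a homozygote imputed as the opposite homozygote twice as heavily as a heterozygote error.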

13

Blind Deconvolution via Sequential Imputations  

Microsoft Academic Search

The sequential imputation procedure is applied to adaptively and sequentially reconstruct discrete input signals that are blurred by an unknown linear moving average channel and contaminated by additive Gaussian noises, a problem known as blind deconvolution in digital communication. A rejuvenation procedure for improving the efficiency of sequential imputation is introduced and theoretically justified. The proposed method does not require

Jun S. Liu; Rong Chen

1995-01-01

14

Linkage analysis with sequential imputation  

Microsoft Academic Search

Multilocus calculations, using all available information on all pedigree members, are important for linkage analysis. Exact calculation methods in linkage analysis are limited in either the number of loci or the number of pedigree members they can handle. In this article, we propose a Monte Carlo method for linkage analysis based on sequential imputation. Unlike exact methods, sequential imputation can

Zachary Skrivanek; Shili Lin; Mark Irwin

2003-01-01

15

Design of a bovine low-density SNP array optimized for imputation  

Technology Transfer Automated Retrieval System (TEKTRAN)

The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where de...

16

Impact of Genotype Imputation on the Performance of GBLUP and Bayesian Methods for Genomic Prediction  

PubMed Central

The aim of this study was to evaluate the impact of genotype imputation on the performance of the GBLUP and Bayesian methods for genomic prediction. A total of 10,309 Holstein bulls were genotyped on the BovineSNP50 BeadChip (50 k). Five low density single nucleotide polymorphism (SNP) panels, containing 6,177, 2,480, 1,536, 768 and 384 SNPs, were simulated from the 50 k panel. A fraction of 0%, 33% and 66% of the animals were randomly selected from the training sets to have low density genotypes which were then imputed into 50 k genotypes. A GBLUP and a Bayesian method were used to predict direct genomic values (DGV) for validation animals using imputed or their actual 50 k genotypes. Traits studied included milk yield, fat percentage, protein percentage and somatic cell score (SCS). Results showed that performance of both GBLUP and Bayesian methods was influenced by imputation errors. For traits affected by a few large QTL, the Bayesian method resulted in greater reductions of accuracy due to imputation errors than GBLUP. Including SNPs with largest effects in the low density panel substantially improved the accuracy of genomic prediction for the Bayesian method. Including genotypes imputed from the 6 k panel achieved almost the same accuracy of genomic prediction as that of using the 50 k panel even when 66% of the training population was genotyped on the 6 k panel. These results justified the application of the 6 k panel for genomic prediction. Imputations from lower density panels were more prone to errors and resulted in lower accuracy of genomic prediction. But for animals that have close relationship to the reference set, genotype imputation may still achieve a relatively high accuracy. PMID:25025158

Chen, Liuhong; Li, Changxi; Sargolzaei, Mehdi; Schenkel, Flavio

2014-01-01

17

The Use of Family Relationships and Linkage Disequilibrium to Impute Phase and Missing Genotypes in Up to Whole-Genome Sequence Density Genotypic Data  

PubMed Central

A novel method, called linkage disequilibrium multilocus iterative peeling (LDMIP), for the imputation of phase and missing genotypes is developed. LDMIP performs an iterative peeling step for every locus, which accounts for the family data, and uses a forward–backward algorithm to accumulate information across loci. Marker similarity between haplotype pairs is used to impute possible missing genotypes and phases, which relies on the linkage disequilibrium between closely linked markers. After this imputation step, the combined iterative peeling/forward–backward algorithm is applied again, until convergence. The calculations per iteration scale linearly with number of markers and number of individuals in the pedigree, which makes LDMIP well suited to large numbers of markers and/or large numbers of individuals. Per iteration calculations scale quadratically with the number of alleles, which implies biallelic markers are preferred. In a situation with up to 15% randomly missing genotypes, the error rate of the imputed genotypes was <1% and ~99% of the missing genotypes were imputed. In another example, LDMIP was used to impute whole-genome sequence data consisting of 17,321 SNPs on a chromosome. Imputation of the sequence was based on the information of 20 (re)sequenced founder individuals and genotyping their descendants for a panel of 3000 SNPs. The error rate of the imputed SNP genotypes was 10%. However, if the parents of these 20 founders are also sequenced, >99% of missing genotypes are imputed correctly. PMID:20479147

Meuwissen, Theo; Goddard, Mike

2010-01-01

18

Imputation of TPMT defective alleles for the identification of patients with high-risk phenotypes  

PubMed Central

Background: The activity of thiopurine methyltransferase (TPMT) is subject to genetic variation. Loss-of-function alleles are associated with various degrees of myelosuppression after treatment with thiopurine drugs, thus genotype-based dosing recommendations currently exist. The aim of this study was to evaluate the potential utility of leveraging genomic data from large biorepositories in the identification of individuals with TPMT defective alleles. Material and methods: TPMT variants were imputed using the 1000 Genomes Project reference panel in 87,979 samples from the biobank at The Children's Hospital of Philadelphia. Population ancestry was determined by principal component analysis using HapMap3 samples as reference. Frequencies of the TPMT imputed alleles, genotypes and the associated phenotype were determined across the different populations. A sample of 630 subjects with genotype data from Sanger sequencing (N = 59) and direct genotyping (N = 583) (12 samples overlapping in the two groups) was used to check the concordance between the imputed and observed genotypes, as well as the sensitivity, specificity and positive and negative predictive values of the imputation. Results: Two SNPs (rs1800460 and rs1142345) that represent three TPMT alleles (*3A, *3B, and *3C) were imputed with adequate quality. Frequency for the associated enzyme activity varied across populations and 89.36–94.58% were predicted to have normal TPMT activity, 5.3–10.31% intermediate and 0.12–0.34% poor activities. Overall, 98.88% of individuals (623/630) were correctly imputed into carrying no risk alleles (553/553), heterozygous (45/46) and homozygous (25/31). Sensitivity, specificity and predictive values of imputation were over 90% in all cases except for the sensitivity of imputing homozygous subjects that was 80.64%. 
Conclusion: Imputation of TPMT alleles from existing genomic data can be used as a first step in the screening of individuals at risk of developing serious adverse events secondary to thiopurine drugs. PMID:24860591

Almoguera, Berta; Vazquez, Lyam; Connolly, John J.; Bradfield, Jonathan; Sleiman, Patrick; Keating, Brendan; Hakonarson, Hakon

2014-01-01
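The concordance and sensitivity figures in the record above follow directly from the per-class counts it reports. The short calculation below reworks them as a check, using only those counts (553/553, 45/46, and 25/31). The variable names are hypothetical, and truncating to two decimals is an assumption made to match the percentages as printed.

```python
import math

# (correctly imputed, total) per genotype class, as reported in the record.
counts = {
    "no risk allele": (553, 553),
    "heterozygous": (45, 46),
    "homozygous": (25, 31),
}

total_correct = sum(correct for correct, _ in counts.values())
total_subjects = sum(total for _, total in counts.values())
concordance = total_correct / total_subjects

# Sensitivity for homozygous carriers: correctly imputed homozygotes / all homozygotes.
homozygous_sensitivity = counts["homozygous"][0] / counts["homozygous"][1]

def truncated_percent(rate):
    """Percentage truncated (not rounded) to two decimal places."""
    return math.floor(rate * 10000) / 100

print(total_correct, total_subjects, truncated_percent(concordance))
print(truncated_percent(homozygous_sensitivity))
```

The counts sum to 623 of 630 subjects, and the two ratios truncate to the 98.88% and 80.64% quoted in the record.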

19

Imputing Amino Acid Polymorphisms in Human Leukocyte Antigens  

PubMed Central

DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the major histocompatibility complex (MHC) region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes. PMID:23762245

Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick J.; Rich, Stephen S.; Raychaudhuri, Soumya; de Bakker, Paul I.W.

2013-01-01

20

Imputing amino acid polymorphisms in human leukocyte antigens.  

PubMed

DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the major histocompatibility complex (MHC) region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes. PMID:23762245

Jia, Xiaoming; Han, Buhm; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick J; Rich, Stephen S; Raychaudhuri, Soumya; de Bakker, Paul I W

2013-01-01

21

PedBLIMP: Extending Linear Predictors to Impute Genotypes in Pedigrees  

PubMed Central

Recently, Wen and Stephens [Wen and Stephens 2010] proposed a linear predictor, called BLIMP, that uses conditional multivariate normal moments to impute genotypes with accuracy similar to current state-of-the-art methods. One novelty is that it regularized the estimated covariance matrix based on a model from population genetics. We extended multivariate moments to impute genotypes in pedigrees. Our proposed method, PedBLIMP, utilizes both the linkage disequilibrium (LD) information estimated from external panel data and the pedigree structure or identity by descent (IBD) information. The proposed method was evaluated on a pedigree design where some individuals were genotyped with dense markers and the rest with sparse markers. We found that incorporating the pedigree/IBD information can improve imputation accuracy compared to BLIMP. Because rare variants usually have low LD with other single nucleotide polymorphisms (SNPs), incorporating pedigree/IBD information largely improved imputation accuracy for rare variants. We also compared PedBLIMP with IMPUTE2 and GIGI. Results show that when sparse markers are in a certain density range, our method can outperform both IMPUTE2 and GIGI. PMID:25044249

Chen, Wenan; Schaid, Daniel J.

2014-01-01
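
The linear predictor mentioned in the PedBLIMP record rests on the conditional mean of a multivariate normal distribution. The sketch below illustrates only that textbook formula for one untyped marker given a few typed markers. It is not the BLIMP or PedBLIMP implementation: the covariance regularization and the pedigree/IBD extension are omitted, and the function names and toy numbers are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def conditional_mean(mu_u, mu_t, sigma_ut, sigma_tt, x_t):
    """E[x_u | x_t] = mu_u + Sigma_ut Sigma_tt^{-1} (x_t - mu_t).

    mu_u     : mean genotype (0..2 scale) at the untyped marker
    mu_t     : mean genotypes at the typed markers
    sigma_ut : covariances between the untyped marker and each typed marker
    sigma_tt : covariance matrix of the typed markers
    x_t      : observed genotypes at the typed markers
    """
    centered = [x - m for x, m in zip(x_t, mu_t)]
    weights = solve(sigma_tt, centered)  # Sigma_tt^{-1} (x_t - mu_t)
    return mu_u + sum(s * w for s, w in zip(sigma_ut, weights))


# Toy example (hypothetical panel moments): two typed markers, one untyped.
print(conditional_mean(1.0, [1.0, 1.0], [0.5, 0.0],
                       [[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0]))
```

In practice the means and covariances would be estimated from a reference panel; here they are supplied by hand so the arithmetic is easy to follow.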

22

Design of a Bovine Low-Density SNP Array Optimized for Imputation  

PubMed Central

The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where densities were increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. The BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations. The improvement was greatest when neither parent was genotyped. The minor allele frequencies were similar across taurine beef and dairy breeds as was the proportion of SNPs that were polymorphic. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle. PMID:22470530

Boichard, Didier; Chung, Hoyoung; Dassonneville, Romain; David, Xavier; Eggen, André; Fritz, Sébastien; Gietzen, Kimberly J.; Hayes, Ben J.; Lawley, Cynthia T.; Sonstegard, Tad S.; Van Tassell, Curtis P.; VanRaden, Paul M.; Viaud-Martinez, Karine A.; Wiggans, George R.

2012-01-01

23

A New Statistic to Evaluate Imputation Reliability  

Microsoft Academic Search

Background As the amount of data from genome wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation cannot effectively address these

Peng Lin; Sarah M. Hartz; Zhehao Zhang; Scott F. Saccone; Jia Wang; Jay A. Tischfield; Howard J. Edenberg; John R. Kramer; Alison M. Goate; Laura J. Bierut; John P. Rice

2010-01-01

24

Within- and across-breed imputation of high-density genotypes in dairy and beef cattle from medium- and low-density genotypes.  

PubMed

The objective of this study was to evaluate, using three different genotype density panels, the accuracy of imputation from lower- to higher-density genotypes in dairy and beef cattle. High-density genotypes consisting of 777,962 single-nucleotide polymorphisms (SNP) were available on 3122 animals comprising 269, 196, 710, 234, 719, 730 and 264 Angus, Belgian Blue, Charolais, Hereford, Holstein-Friesian, Limousin and Simmental bulls, respectively. Three different genotype densities were generated: low density (LD; 6501 autosomal SNPs), medium density (50K; 47,770 autosomal SNPs) and high density (HD; 735,151 autosomal SNPs). Imputation from lower- to higher-density genotype platforms was undertaken within and across breeds exploiting population-wide linkage disequilibrium. The mean allele concordance rate per breed from LD to HD when undertaken using a single breed or multiple breed reference population varied from 0.956 to 0.974 and from 0.947 to 0.967, respectively. The mean allele concordance rate per breed from 50K to HD when undertaken using a single breed or multiple breed reference population varied from 0.987 to 0.994 and from 0.987 to 0.993, respectively. The accuracy of imputation was generally greater when the reference population consisted solely of the breed to be imputed than when the reference population comprised multiple breeds, although the impact was less when imputing from 50K to HD compared to imputing from LD. PMID:24906026

Berry, D P; McClure, M C; Mullen, M P

2014-06-01
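
The allele concordance rate reported in the cattle record above can be illustrated with a short sketch. The abstract does not spell out the exact definition, so the version below is an assumption: genotypes are coded as 0, 1 or 2 copies of one allele, and each genotype contributes 2 minus the absolute difference between true and imputed codes as its count of matching alleles. The function name is hypothetical.

```python
def allele_concordance(true_genotypes, imputed_genotypes):
    """Fraction of alleles that agree between true and imputed genotypes.

    Genotypes are coded 0/1/2 (copies of a chosen allele). A genotype pair
    that differs by one copy has one matching allele; a pair that differs
    by two copies has none. This definition is an assumption, not taken
    from the paper.
    """
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must be non-empty and of equal length")
    matched = sum(2 - abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes))
    return matched / (2 * len(true_genotypes))


# Four animals at one SNP: two exact matches, one off by one allele, one off by two.
print(allele_concordance([0, 1, 2, 2], [0, 1, 1, 0]))  # 0.625
```

A per-breed mean, as quoted in the abstract, would average this quantity over the animals and SNPs of each breed.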

25

The distributional impact of imputed rent  

Microsoft Academic Search

Imputed rents reflect the economic benefits of owner-occupied and social housing. Known to be one of the most significant components of household disposable income, imputed rents have been available in the EU-SILC since 2007. This paper examines the quality of the data on imputed rents and their distributional impact in the period of 2007–2010. We find the overall distributional impact

Hannele Sauli

2013-01-01

26

Survival estimation and testing via multiple imputation  

Microsoft Academic Search

Multiple imputation is a technique for handling data sets with missing values. The method fills in each missing value several times, creating many augmented data sets. Each augmented data set is analyzed separately and the results combined to give a final result consisting of an estimate and a measure of uncertainty. In this paper we consider nonparametric multiple-imputation methods to

Jeremy M. G. Taylor; Susan Murray; Chiu-Hsieh Hsu

2002-01-01
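
The combining step described in the record above (analyze each augmented data set, then merge the results into one estimate and one measure of uncertainty) is commonly done with Rubin's rules. The sketch below shows those standard rules. Whether the truncated abstract uses exactly this pooling is an assumption, and the function name and inputs are illustrative.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.

    estimates : point estimate from each imputed data set
    variances : squared standard error from each imputed data set
    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Three hypothetical imputed data sets.
estimate, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_var)
```

The between-imputation term is what carries the extra uncertainty due to the missing values; dropping it would understate the standard error.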

27

Imputation of missing data in time series for air pollutants  

NASA Astrophysics Data System (ADS)

Missing data are major concerns in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

Junger, W. L.; Ponce de Leon, A.

2015-02-01

28

CUTOFF: A spatio-temporal imputation method  

NASA Astrophysics Data System (ADS)

Missing values occur frequently in many different statistical applications and need to be dealt with carefully, especially when the data are collected spatio-temporally. We propose a method called CUTOFF imputation that utilizes the spatio-temporal nature of the data to accurately and efficiently impute missing values. The main feature of this method is that the estimate of a missing value is produced by incorporating similar observed temporal information from the value's nearest spatial neighbors. Extensions to this method are also developed to expand the method's ability to accommodate other data generating processes. We develop a cross-validation procedure that optimally chooses parameters for CUTOFF, which can be used by other imputation methods as well. We analyze some rainfall data from 78 gauging stations in the Murray-Darling Basin in Australia using the CUTOFF imputation method and compare its performance to four well-studied competing imputation methods, namely, k-nearest neighbors, singular value decomposition, multiple imputation and random forest. Empirical results show that our method captures the temporal patterns well and is effective at imputing large gaps in the data. Compared to the competing methods, CUTOFF is more accurate and much faster. We analyze further examples to demonstrate CUTOFF's applications to two different data sets and provide extra evidence of its validity and usefulness. We implement a simulation study based on the Murray-Darling Basin data to evaluate the method; the results show that our method performs well in both accuracy and computational efficiency.

Feng, Lingbing; Nowak, Gen; O'Neill, T. J.; Welsh, A. H.

2014-11-01

29

GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies  

PubMed Central

Background Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. Results In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. 
Conclusion GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep-sequencing, particularly for data from the dbGaP and other public databases. GACT software http://www.uvm.edu/genomics/software/gact PMID:25038819

2014-01-01
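
To make the allele-definition problem in the GACT record concrete, the sketch below shows a strand flip and a check for strand-ambiguous SNPs. It assumes that "ambiguous SNPs" refers to A/T and C/G variants, whose allele pair looks the same on both strands. This is not GACT's algorithm, and the function names are hypothetical.

```python
# Watson-Crick complements used to move an allele call to the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles):
    """Return the allele pair as it would be reported on the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs: flipping the strand yields the same two alleles,
    so the strand cannot be inferred from the allele letters alone."""
    a1, a2 = alleles
    return COMPLEMENT[a1] == a2


print(flip_strand(("A", "G")))          # ('T', 'C')
print(is_strand_ambiguous(("A", "T")))  # True
print(is_strand_ambiguous(("A", "G")))  # False
```

For an unambiguous SNP such as A/G, two studies reporting A/G and T/C can be reconciled by a flip; for A/T or C/G, extra information (for example allele frequencies) would be needed.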

30

Assessing methods for assigning SNPs to genes in gene-based tests of association using common variants.  

PubMed

Gene-based tests of association are frequently applied to common SNPs (MAF>5%) as an alternative to single-marker tests. In this analysis we conduct a variety of simulation studies applied to five popular gene-based tests investigating general trends related to their performance in realistic situations. In particular, we focus on the impact of non-causal SNPs and a variety of LD structures on the behavior of these tests. Ultimately, we find that non-causal SNPs can significantly impact the power of all gene-based tests. On average, we find that the "noise" from 6-12 non-causal SNPs will cancel out the "signal" of one causal SNP across five popular gene-based tests. Furthermore, we find complex and differing behavior of the methods in the presence of LD within and between non-causal and causal SNPs. Ultimately, better approaches for a priori prioritization of potentially causal SNPs (e.g., predicting functionality of non-synonymous SNPs), application of these methods to sequenced or fully imputed datasets, and limited use of window-based methods for assigning inter-genic SNPs to genes will improve power. However, significant power loss from non-causal SNPs may remain unless alternative statistical approaches robust to the inclusion of non-causal SNPs are developed. PMID:23741293

Petersen, Ashley; Alvarez, Carolina; DeClaire, Scott; Tintle, Nathan L

2013-01-01

31

Multi-Population Classical HLA Type Imputation  

PubMed Central

Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy on diverse European panels of 97% (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework. PMID:23459081

Moutsianas, Loukas; Shen, Judong; Cox, Charles; Nelson, Matthew R.; McVean, Gil

2013-01-01

32

Improving accuracy of rare variant imputation with a two-step imputation approach.  

PubMed

Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning that less frequent and rare variants were not comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage, together with new reference panels such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach improving the quality of the 1000 Genomes imputation by genotyping only a subset of samples to create a local reference population on a dense array with many low-frequency markers. In this approach, the study sample, genotyped with a first generation array, is imputed first to the local reference sample genotyped on a dense array and thereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r² using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly, the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (P<1e-15) and rare homozygote calls (P<1e-15) in this low frequency range. The two-step approach in our setting improves imputation quality compared with traditional direct imputation, notably in the low-frequency spectrum, and is a cost-effective strategy in large epidemiological studies. PMID:24939589

Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G; Rivadeneira, Fernando; Estrada, Karol

2015-03-01

33

Combinations of SNPs Related to Signal Transduction in Bipolar Disorder  

PubMed Central

Any given single nucleotide polymorphism (SNP) in a genome may have little or no functional impact. A biologically significant effect may possibly emerge only when a number of key SNP-related genotypes occur together in a single organism. Thus, in analysis of many SNPs in association studies of complex diseases, it may be useful to look at combinations of genotypes. Genes related to signal transmission, e.g., ion channel genes, may be of interest in this respect in the context of bipolar disorder. In the present study, we analysed 803 SNPs in 55 genes related to aspects of signal transmission and calculated all combinations of three genotypes from the 3×803 SNP genotypes for 1355 controls and 607 patients with bipolar disorder. Four clusters of patient-specific combinations were identified. Permutation tests indicated that some of these combinations might be related to bipolar disorder. The WTCCC bipolar dataset was used for replication; 469 of the 803 SNPs were present in the WTCCC dataset either directly (n = 132) or by imputation (n = 337), covering 51 of our selected genes. We found three clusters of patient-specific 3×SNP combinations in the WTCCC dataset. Different SNPs were involved in the clusters in the two datasets. The present analyses of the combinations of SNP genotypes support a role for both genetic heterogeneity and interactions in the genetic architecture of bipolar disorder. PMID:21897858

Koefoed, Pernille; Andreassen, Ole A.; Bennike, Bente; Dam, Henrik; Djurovic, Srdjan; Hansen, Thomas; Jorgensen, Martin Balslev; Kessing, Lars Vedel; Melle, Ingrid; Møller, Gert Lykke; Mors, Ole; Werge, Thomas; Mellerup, Erling

2011-01-01

34

Dual imputation model for incomplete longitudinal data.  

PubMed

Missing values are a practical issue in the analysis of longitudinal data. Multiple imputation (MI) is a well-known likelihood-based method that has optimal properties in terms of efficiency and consistency if the imputation model is correctly specified. Doubly robust (DR) weighting-based methods protect against misspecification bias if one of the models, but not necessarily both, for the data or the mechanism leading to missing data is correct. We propose a new imputation method that captures the simplicity of MI and protection from the DR method. This method integrates MI and DR to protect against misspecification of the imputation model under a missing at random assumption. Our method avoids analytical complications of missing data particularly in multivariate settings, and is easy to implement in standard statistical packages. Moreover, the proposed method works very well with an intermittent pattern of missingness when other DR methods cannot be used. Simulation experiments show that the proposed approach achieves improved performance when one of the models is correct. The method is applied to data from the fireworks disaster study, a randomized clinical trial comparing therapies in disaster-exposed children. We conclude that the new method increases the robustness of imputations. PMID:23909566

Jolani, Shahab; Frank, Laurence E; van Buuren, Stef

2014-05-01

35

Automatic Treatment Planning with Convex Imputing  

NASA Astrophysics Data System (ADS)

Current inverse optimization-based treatment planning for radiotherapy requires a set of complex DVH objectives to be simultaneously minimized. This process, known as multi-objective optimization, is challenging due to non-convexity in individual objectives and insufficient knowledge of the tradeoffs among the objective set. As such, clinical practice involves numerous iterations of human intervention that is costly and often inconsistent. In this work, we propose to address treatment planning with convex imputing, a new data-mining technique that explores the existence of a latent convex objective whose optimizer reflects the DVH and dose-shaping properties of previously optimized cases. Using ten clinical prostate cases as the basis for comparison, we imputed a simple least-squares problem from the optimized solutions of the prostate cases, and show that the imputed plans are more consistent than their clinical counterparts in achieving planning goals.

Sayre, G. A.; Ruan, D.

2014-03-01

36

Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling  

ERIC Educational Resources Information Center

Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

Lee, Taehun; Cai, Li

2012-01-01

37

A general efficient and flexible approach for genome-wide association analyses of imputed genotypes in family-based designs.  

PubMed

Genotype imputation is a critical technique for following up genome-wide association studies. Efficient methods are available for dealing with the probabilistic nature of imputed single nucleotide polymorphisms (SNPs) in population-based designs, but not for family-based studies. We have developed a new analytical approach (FBATdosage), using imputed allele dosage in the general framework of family-based association tests to bridge this gap. Simulation studies showed that FBATdosage yielded highly consistent type I error rates, whatever the level of genotype uncertainty, and a much higher power than the best-guess genotype approach. FBATdosage allows fast linkage and association testing of several million imputed variants with binary or quantitative phenotypes in nuclear families of arbitrary size with arbitrary missing data for the parents. The application of this approach to a family-based association study of leprosy susceptibility successfully refined the association signal at two candidate loci, C1orf141-IL23R on chromosome 1 and RAB32-C6orf103 on chromosome 6. PMID:25044438

Cobat, Aurélie; Abel, Laurent; Alcaïs, Alexandre; Schurr, Erwin

2014-09-01
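
The contrast between imputed allele dosage and best-guess genotypes in the FBATdosage record can be shown in a few lines. The sketch below uses the usual definition of dosage as the expected allele count under the imputed genotype probabilities. It does not reproduce the FBATdosage test itself, and the function names and example probabilities are illustrative.

```python
def allele_dosage(probabilities):
    """Expected number of copies of the counted allele.

    probabilities : (P(0 copies), P(1 copy), P(2 copies)) from imputation.
    """
    p0, p1, p2 = probabilities
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


def best_guess_genotype(probabilities):
    """Most probable genotype (0, 1 or 2); discards the remaining uncertainty."""
    return max(range(3), key=lambda g: probabilities[g])


# An uncertain call: the best guess is 0 copies, yet the expected count is close to 1.
uncertain = (0.40, 0.35, 0.25)
print(best_guess_genotype(uncertain), allele_dosage(uncertain))
```

For a confidently imputed genotype the two summaries nearly coincide; they diverge as uncertainty grows, which is the situation in which the abstract reports a power advantage for the dosage approach.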

38

Multiple imputation for an incomplete covariate that is a ratio.  

PubMed

We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using Winbugs were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. PMID:23922236

Morris, Tim P; White, Ian R; Royston, Patrick; Seaman, Shaun R; Wood, Angela M

2014-01-15

39

Imputation-Based Analysis of Association Studies  

E-print Network

-SNP tests, this approach results in increased power to detect association, even in cases in which the causal that the whole genome, or a set of candidate regions). Because of correlation (linkage disequilibrium, LD) among SNPs. Thus, intuitively, testing typed SNPs for association with a phenotype will also have some power

Ober, Carole

40

Imputation methods for doubly censored HIV data  

Microsoft Academic Search

In medical research, it is common to have doubly censored survival data: origin time and event time are both subject to censoring. In this paper, we review simple and probability-based methods that are used to impute interval censored origin time and compare the performance of these methods through extensive simulations in the one-sample problem, two-sample problem and Cox regression model

Wei Zhang; Ying Zhang; Kathryn Chaloner; Jack T. Stapleton

2009-01-01

41

48 CFR 1830.7002-4 - Determining imputed cost of money.  

Code of Federal Regulations, 2013 CFR

...2013-10-01 false Determining imputed cost of money. 1830.7002-4 Section 1830.7002-4...1830.7002-4 Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction,...

2013-10-01

42

48 CFR 1830.7002-4 - Determining imputed cost of money.  

Code of Federal Regulations, 2010 CFR

...2010-10-01 true Determining imputed cost of money. 1830.7002-4 Section 1830.7002-4...1830.7002-4 Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction,...

2010-10-01

43

48 CFR 1830.7002-4 - Determining imputed cost of money.  

Code of Federal Regulations, 2011 CFR

...2011-10-01 false Determining imputed cost of money. 1830.7002-4 Section 1830.7002-4...1830.7002-4 Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction,...

2011-10-01

44

48 CFR 1830.7002-4 - Determining imputed cost of money.  

Code of Federal Regulations, 2012 CFR

...2012-10-01 false Determining imputed cost of money. 1830.7002-4 Section 1830.7002-4...1830.7002-4 Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction,...

2012-10-01

45

48 CFR 1830.7002-4 - Determining imputed cost of money.  

Code of Federal Regulations, 2014 CFR

...2014-10-01 false Determining imputed cost of money. 1830.7002-4 Section 1830.7002-4...1830.7002-4 Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction,...

2014-10-01

46

Spatially consistent nearest neighbor imputation of forest stand data  

Microsoft Academic Search

This study suggests a method for improving spatial consistency in the estimation of forest stand data. Traditional nearest neighbor imputation can preserve between-variable consistency within a unit, but not between geographically nearby units. The lack of spatial consistency may cause problems when data are used for purposes of forestry planning or scenario analysis. In spatially consistent nearest neighbor imputation, adjacent

Andreas Barth; Jörgen Wallerman; Göran Ståhl

2009-01-01

47

A Comparison of Imputation Methods for Bayesian Factor Analysis Models  

ERIC Educational Resources Information Center

Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amenable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

Merkle, Edgar C.

2011-01-01

48

Multiple Imputation For Interval Censored Data With Auxiliary Variables  

Microsoft Academic Search

We propose a nonparametric multiple imputation scheme, NPMLE imputation, for the analysis of interval censored survival data. Features of the method are that it converts interval-censored data problems to complete data or right censored data problems to which many standard approaches can be used, and the measures of uncertainty are easily obtained. In addition to the event time of primary

Chiu-Hsieh Hsu; Jeremy Taylor; Susan Murray

2004-01-01

49

Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.  

PubMed

Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914

Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

2014-03-15

50

Use and abuse of census editing and imputation.  

PubMed

With the advent of electronic processing of census data, it has become common practice in some countries to change answers on questionnaires that seem inconsistent with other answers ("editing"), and to fill in blank spaces on questionnaires with plausible answers ("imputation"). Increasing incidence of these practices has caused uneasiness among both users and producers of census data. Elaborate editing and imputation can introduce serious errors into published data, and can destroy evidence that collected data is of limited quality and must be used with caution. In support of editing and imputation, it is argued that the quality of data is improved, that convenience of analysis is enhanced, and that data may be more credible. The author discusses each of these arguments in turn. She concludes that some types of editing (notably field editing and imputation and redundant imputation) enhance or help maintain data quality, while others (semi-informed or blind imputation) can debase quality and must be used with great caution. User convenience justifies some use of imputation, such as replacement of unknown data that can have negligible effects on census results, but is not a good enough reason for filling in all unknowns -- it is reasonable to expect variance in quality of data and to use caution when using some data. The credibility of census data can be damaged by excessive editing and imputation, and users of data should be educated about its limitations. The author believes that it has become necessary for census organizations to establish guidelines for editing and imputation, which should be published. A series of principles around which a wider discussion of the subject could be organized is offered. PMID:12309770

Banister, J

1980-02-01

51

Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs.  

PubMed

The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200,000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies. European Journal of Human Genetics advance online publication, 8 October 2014; doi:10.1038/ejhg.2014.216. PMID:25293720

Pistis, Giorgio; Porcu, Eleonora; Vrieze, Scott I; Sidore, Carlo; Steri, Maristella; Danjou, Fabrice; Busonero, Fabio; Mulas, Antonella; Zoledziewska, Magdalena; Maschio, Andrea; Brennan, Christine; Lai, Sandra; Miller, Michael B; Marcelli, Marco; Urru, Maria Francesca; Pitzalis, Maristella; Lyons, Robert H; Kang, Hyun M; Jones, Chris M; Angius, Andrea; Iacono, William G; Schlessinger, David; McGue, Matt; Cucca, Francesco; Abecasis, Gonçalo R; Sanna, Serena

2014-10-01

52

A second generation human haplotype map of over 3.1 million SNPs  

PubMed Central

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. PMID:17943122

2009-01-01
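The "average maximum r2" figures in the HapMap record above rest on the pairwise r2 statistic between two SNPs. The following is a minimal sketch of that calculation, assuming phased haplotypes coded as 0/1 alleles; the function names `ld_r2` and `max_r2_to_typed` are illustrative and do not come from the paper.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                   # allele-1 frequency at SNP A
    p_b = sum(hap_b) / n                                   # allele-1 frequency at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


def max_r2_to_typed(untyped, typed_snps):
    """Best r^2 between one untyped SNP and any SNP on a (hypothetical) array."""
    return max(ld_r2(untyped, typed) for typed in typed_snps)
```

Averaging `max_r2_to_typed` over all untyped common SNPs would give a coverage figure of the kind quoted in the abstract; how the study handled unphased data or frequency thresholds is not stated in this record.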

53

Linking SNPs to CAG repeat length in Huntington's disease patients

E-print Network

Linking SNPs to CAG repeat length in Huntington's disease patients Wanzhao Liu1, Lori A Kennington1) is a promising therapy for human trinucleotide repeat diseases such as Huntington's disease. Linking SNP repeat length and nucleotide identity of heterozygous SNPs using Huntington's disease patient peripheral

Cai, Long

54

MaCH-Admix: Genotype Imputation for Admixed Populations  

PubMed Central

Imputation in admixed populations is an important but challenging problem because of the complex linkage disequilibrium (LD) pattern. The emergence of large reference panels such as that from the 1,000 Genomes Project enables more accurate imputation in general, and in particular for admixed populations and for uncommon variants. To efficiently benefit from these large reference panels, one key issue to consider in a modern genotype imputation framework is the selection of effective reference panels. In this work, we consider a number of methods for effective reference panel construction inside a hidden Markov model and specific to each target individual. These methods fall into two categories: identity-by-state (IBS) based and ancestry-weighted approaches. We evaluated the performance on individuals from recently admixed populations. Our target samples include 8,421 African Americans and 3,587 Hispanic Americans from the Women’s Health Initiative, which allow assessment of imputation quality for uncommon variants. Our experiments include both large and small reference panels; large, medium, and small target samples; and genome regions of varying levels of LD. We also include BEAGLE and IMPUTE2 for comparison. Experimental results with a large reference panel suggest that our novel piecewise IBS method yields consistently higher imputation quality than other methods/software. The advantage is particularly noteworthy among uncommon variants where we observe up to 5.1% information gain with the difference being highly significant (Wilcoxon signed rank test P-value < 0.0001). Our work is the first that considers various sensible approaches for imputation in admixed populations and presents a comprehensive comparison. PMID:23074066

Liu, Eric Yi; Li, Mingyao; Wang, Wei; Li, Yun

2012-01-01
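The MaCH-Admix record above describes choosing, for each target individual, the reference haplotypes that look most similar by identity-by-state (IBS). The sketch below illustrates the general idea only. The scoring rule, the window-based "piecewise" variant, and the names `ibs_score`, `select_reference` and `piecewise_select` are assumptions made for illustration, not the algorithm implemented in MaCH-Admix.

```python
def ibs_score(target_genotypes, ref_haplotype):
    """Count sites where a reference haplotype is compatible with the target.

    target_genotypes: alternate-allele counts (0, 1, 2) or None when missing.
    ref_haplotype: 0/1 alleles at the same sites.
    A heterozygous genotype is compatible with either allele; a homozygous
    genotype is compatible only with the matching allele.
    """
    score = 0
    for g, h in zip(target_genotypes, ref_haplotype):
        if g is None:
            continue
        if g == 1 or g == 2 * h:
            score += 1
    return score


def select_reference(target_genotypes, ref_panel, k):
    """Indices of the k reference haplotypes with the highest IBS score."""
    order = sorted(
        range(len(ref_panel)),
        key=lambda i: ibs_score(target_genotypes, ref_panel[i]),
        reverse=True,
    )
    return order[:k]


def piecewise_select(target_genotypes, ref_panel, k, window):
    """Hypothetical piecewise variant: repeat the selection in each window of sites."""
    picks = []
    for start in range(0, len(target_genotypes), window):
        stop = start + window
        sub_panel = [hap[start:stop] for hap in ref_panel]
        picks.append(select_reference(target_genotypes[start:stop], sub_panel, k))
    return picks
```

Selecting per window lets different reference haplotypes be chosen in different parts of the region, which is one plausible reading of "piecewise" for recently admixed genomes.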

55

Evidence after imputation for a role of MICA variants in nonprogression and elite control of HIV type 1 infection.  

PubMed

Past genome-wide association studies (GWAS) involving individuals with AIDS have mainly identified associations in the HLA region. Using the latest software, we imputed 7 million single-nucleotide polymorphisms (SNPs)/indels of the 1000 Genomes Project from the GWAS-determined genotypes of individuals in the Genomics of Resistance to Immunodeficiency Virus AIDS nonprogression cohort and compared them with those of control cohorts. The strongest signals were in MICA, the gene encoding major histocompatibility class I polypeptide-related sequence A (P = 3.31 × 10^-12), with a particular exonic deletion (P = 1.59 × 10^-8) in full linkage disequilibrium with the reference HCP5 rs2395029 SNP. Haplotype analysis also revealed an additive effect between HLA-C, HLA-B, and MICA variants. These data suggest a role for MICA in progression and elite control of human immunodeficiency virus type 1 infection. PMID:24939907

Le Clerc, Sigrid; Delaneau, Olivier; Coulonges, Cédric; Spadoni, Jean-Louis; Labib, Taoufik; Laville, Vincent; Ulveling, Damien; Noirel, Josselin; Montes, Matthieu; Schächter, François; Caillat-Zucman, Sophie; Zagury, Jean-François

2014-12-15

56

Missing value imputation: with application to handwriting data  

NASA Astrophysics Data System (ADS)

Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

Xu, Zhen; Srihari, Sargur N.

2015-01-01
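Two of the six algorithms named in the handwriting record above, mean imputation and random imputation, are simple enough to sketch directly. The version below assumes a single feature column with `None` marking missing entries; the function names are illustrative, and the Bayesian-network methods are not shown.

```python
import random


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def random_impute(values, rng=None):
    """Replace each missing entry with a randomly chosen observed entry."""
    rng = rng or random.Random()
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    return [rng.choice(observed) if v is None else v for v in values]
```

Mean imputation shrinks the variance of the completed column, whereas random imputation preserves the marginal distribution but ignores relationships with other features, which is the gap the network-based methods in the abstract aim to close.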

57

The Relationship between Imputation Error and Statistical Power in Genetic Association Studies  

E-print Network

REPORT The Relationship between Imputation Error and Statistical Power in Genetic Association associations at imputed markers. Here, using a 2 × 3 chi-square test, we describe a relationship between genotype-imputation error rates and the sample-size inflation required for achieving statistical power

Rosenberg, Noah

58

Functional annotation of colon cancer risk SNPs  

PubMed Central

Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with increased risk for CRC. A molecular understanding of the functional consequences of this genetic variation has been complicated because each GWAS SNP is a surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Here we use genomic and epigenomic information to test the hypothesis that the GWAS SNPs and/or correlated SNPs are in elements that regulate gene expression, and identify 23 promoters and 28 enhancers. Using gene expression data from normal and tumour cells, we identify 66 putative target genes of the risk-associated enhancers (10 of which were also identified by promoter SNPs). Employing CRISPR nucleases, we delete one risk-associated enhancer and identify genes showing altered expression. We suggest that similar studies be performed to characterize all CRC risk-associated enhancers. PMID:25268989

Yao, Lijing; Tak, Yu Gyoung; Berman, Benjamin P.; Farnham, Peggy J.

2014-01-01

59

Reference-free detection of isolated SNPs  

PubMed Central

Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism. PMID:25404127

Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre

2015-01-01

60

Imputation of non-genotyped individuals based on genotyped relatives: assessing the imputation accuracy of a real case scenario in dairy cattle  

PubMed Central

Background Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives. Methods Genotypes were simulated for all individuals in the pedigree of a real (historical) dataset of phenotyped dairy cows and with part of the pedigree genotyped. The software AlphaImpute was used for imputation in its standard settings but also without phasing, i.e. using basic inheritance rules and segregation analysis only. Different scenarios were evaluated i.e.: (1) the real data scenario, (2) addition of genotypes of sires and maternal grandsires of the ungenotyped individuals, and (3) addition of one, two, or four genotyped offspring of the ungenotyped individuals to the reference population. Results The imputation accuracy using AlphaImpute in its standard settings was lower than without phasing. Including genotypes of sires and maternal grandsires in the reference population improved imputation accuracy, i.e. the correlation of the true genotypes with the imputed genotype dosages, corrected for mean gene content, across all animals increased from 0.47 (real situation) to 0.60. Including one, two and four genotyped offspring increased the accuracy of imputation across all animals from 0.57 (no offspring) to 0.73, 0.82, and 0.92, respectively. Conclusions At present, the use of basic inheritance rules and segregation analysis appears to be the best imputation method for ungenotyped individuals. Comparison of our empirical animal-specific imputation accuracies to predictions based on selection index theory suggested that not correcting for mean gene content considerably overestimates the true accuracy. 
Imputation of ungenotyped individuals can help to include valuable phenotypes for genome-wide association studies or for genomic prediction, especially when the ungenotyped individuals have genotyped offspring. PMID:24490796

2014-01-01
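The dairy cattle record above defines imputation accuracy as the correlation between true genotypes and imputed dosages after correcting for mean gene content. A minimal sketch of an animal-specific version follows. It assumes genotypes coded 0/1/2, dosages on the same scale, and a per-SNP mean gene content supplied by the caller (for example, twice the allele frequency); the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def animal_accuracy(true_row, imputed_row, mean_gene_content, correct=True):
    """Correlation across SNPs for one animal, optionally centred per SNP."""
    if correct:
        true_row = [t - m for t, m in zip(true_row, mean_gene_content)]
        imputed_row = [d - m for d, m in zip(imputed_row, mean_gene_content)]
    return pearson(true_row, imputed_row)


# Toy data: the dosages track the population mean but not the animal's own deviations.
mean_content = [1.8, 0.2, 1.6, 0.4]
true_genotypes = [2, 0, 1, 1]
dosages = [1.9, 0.1, 1.7, 0.3]
uncorrected = animal_accuracy(true_genotypes, dosages, mean_content, correct=False)
corrected = animal_accuracy(true_genotypes, dosages, mean_content, correct=True)
```

In this toy case the uncorrected correlation is high only because both vectors follow allele frequency across SNPs, which illustrates why the abstract warns that skipping the correction overestimates accuracy.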

61

Meta-analysis and imputation refines the association of 15q25 with smoking quantity  

PubMed Central

Smoking is a leading global cause of disease and mortality. We performed a genomewide meta-analytic association study of smoking-related behavioral traits in a total sample of 41,150 individuals drawn from 20 disease, population, and control cohorts. Our analysis confirmed an effect on smoking quantity (SQ) at a locus on 15q25 (P=9.45e-19) that includes three genes encoding neuronal nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, CHRNB4). We used data from the 1000 Genomes project to investigate the region using imputation, which allowed analysis of virtually all common variants in the region and offered a five-fold increase in coverage over the HapMap. This increased the spectrum of potentially causal single nucleotide polymorphisms (SNPs), which included a novel SNP that showed the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3. PMID:20418889

Liu, Jason Z.; Tozzi, Federica; Waterworth, Dawn M.; Pillai, Sreekumar G.; Muglia, Pierandrea; Middleton, Lefkos; Berrettini, Wade; Knouff, Christopher W.; Yuan, Xin; Waeber, Gérard; Vollenweider, Peter; Preisig, Martin; Wareham, Nicholas J; Zhao, Jing Hua; Loos, Ruth J.F.; Barroso, Inês; Khaw, Kay-Tee; Grundy, Scott; Barter, Philip; Mahley, Robert; Kesaniemi, Antero; McPherson, Ruth; Vincent, John B.; Strauss, John; Kennedy, James L.; Farmer, Anne; McGuffin, Peter; Day, Richard; Matthews, Keith; Bakke, Per; Gulsvik, Amund; Lucae, Susanne; Ising, Marcus; Brueckl, Tanja; Horstmann, Sonja; Wichmann, H.-Erich; Rawal, Rajesh; Dahmen, Norbert; Lamina, Claudia; Polasek, Ozren; Zgaga, Lina; Huffman, Jennifer; Campbell, Susan; Kooner, Jaspal; Chambers, John C; Burnett, Mary Susan; Devaney, Joseph M.; Pichard, Augusto D.; Kent, Kenneth M.; Satler, Lowell; Lindsay, Joseph M.; Waksman, Ron; Epstein, Stephen; Wilson, James F.; Wild, Sarah H.; Campbell, Harry; Vitart, Veronique; Reilly, Muredach P.; Li, Mingyao; Qu, Liming; Wilensky, Robert; Matthai, William; Hakonarson, Hakon H.; Rader, Daniel J.; Franke, Andre; Wittig, Michael; Schäfer, Arne; Uda, Manuela; Terracciano, Antonio; Xiao, Xiangjun; Busonero, Fabio; Scheet, Paul; Schlessinger, David; St Clair, David; Rujescu, Dan; Abecasis, Gonçalo R.; Grabe, Hans Jörgen; Teumer, Alexander; Völzke, Henry; Petersmann, Astrid; John, Ulrich; Rudan, Igor; Hayward, Caroline; Wright, Alan F.; Kolcic, Ivana; Wright, Benjamin J; Thompson, John R; Balmforth, Anthony J.; Hall, Alistair S.; Samani, Nilesh J.; Anderson, Carl A.; Ahmad, Tariq; Mathew, Christopher G.; Parkes, Miles; Satsangi, Jack; Caulfield, Mark; Munroe, Patricia B.; Farrall, Martin; Dominiczak, Anna; Worthington, Jane; Thomson, Wendy; Eyre, Steve; Barton, Anne; Mooser, Vincent; Francks, Clyde; Marchini, Jonathan

2013-01-01

62

Sequence Imputation of HPV16 Genomes for Genetic Association Studies  

E-print Network

Sequence Imputation of HPV16 Genomes for Genetic Association Studies Benjamin Smith1 , Zigui Chen1 type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs

DeSalle, Rob

63

Methods to impute missing genotypes for population data  

E-print Network

ORIGINAL INVESTIGATION Methods to impute missing genotypes for population data Zhaoxia Yu · Daniel For large-scale genotyping studies, it is common for most subjects to have some missing genetic markers, we consider eight methods to infer missing genotypes, including two haplotype reconstruction methods

Yu, Zhaoxia

64

Genotype-Imputation Accuracy across Worldwide Human Populations  

E-print Network

ARTICLE Genotype-Imputation Accuracy across Worldwide Human Populations Lucy Huang,1,2,* Yun Li,1 involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study

Rosenberg, Noah

65

Using Multiple Imputation to Integrate and Disseminate Confidential Microdata  

E-print Network

Using Multiple Imputation to Integrate and Disseminate Confidential Microdata Jerome P. Reiter be complicated by the agencies' need to protect the confidentiality of database subjects, which could be at risk integration and dissemination while protecting data confidentiality. It reviews existing methods for obtaining

Reiter, Jerome P.

66

32 CFR 776.29 - Imputed disqualification: General rule.  

Code of Federal Regulations, 2010 CFR

...attorneys working in the same military law office are not automatically...attorneys working in the same law office. Such representation...co-accused at trial by court-martial. Imputed disqualification rules...proscribe all attorneys from one law office from representing a...

2010-07-01

67

32 CFR 776.29 - Imputed disqualification: General rule.  

Code of Federal Regulations, 2014 CFR

...attorneys working in the same military law office are not automatically...attorneys working in the same law office. Such representation...co-accused at trial by court-martial. Imputed disqualification rules...proscribe all attorneys from one law office from representing a...

2014-07-01

68

32 CFR 776.29 - Imputed disqualification: General rule.  

Code of Federal Regulations, 2013 CFR

...attorneys working in the same military law office are not automatically...attorneys working in the same law office. Such representation...co-accused at trial by court-martial. Imputed disqualification rules...proscribe all attorneys from one law office from representing a...

2013-07-01

69

32 CFR 776.29 - Imputed disqualification: General rule.  

Code of Federal Regulations, 2012 CFR

...attorneys working in the same military law office are not automatically...attorneys working in the same law office. Such representation...co-accused at trial by court-martial. Imputed disqualification rules...proscribe all attorneys from one law office from representing a...

2012-07-01

70

32 CFR 776.29 - Imputed disqualification: General rule.  

Code of Federal Regulations, 2011 CFR

...attorneys working in the same military law office are not automatically...attorneys working in the same law office. Such representation...co-accused at trial by court-martial. Imputed disqualification rules...proscribe all attorneys from one law office from representing a...

2011-07-01

71

A Moment Adjusted Imputation Method for Measurement Error Models  

PubMed Central

Summary Studies of clinical characteristics frequently measure covariates with a single observation. This may be a mis-measured version of the “true” phenomenon due to sources of variability like biological fluctuations and device error. Descriptive analyses and outcome models that are based on mis-measured data generally will not reflect the corresponding analyses based on the “true” covariate. Many statistical methods are available to adjust for measurement error. Imputation methods like regression calibration and moment reconstruction are easily implemented but are not always adequate. Sophisticated methods have been proposed for specific applications like density estimation, logistic regression, and survival analysis. However, it is frequently infeasible for an analyst to adjust each analysis separately, especially in preliminary studies where resources are limited. We propose an imputation approach called Moment Adjusted Imputation (MAI) that is flexible and relatively automatic. Like other imputation methods, it can be used to adjust a variety of analyses quickly, and it performs well under a broad range of circumstances. We illustrate the method via simulation and apply it to a study of systolic blood pressure and health outcomes in patients hospitalized with acute heart failure. PMID:21385161

Thomas, Laine; Stefanski, Leonard; Davidian, Marie

2011-01-01

72

Match Bias in Wage Gap Estimates Due to Earnings Imputation  

Microsoft Academic Search

About 30% of workers in the Current Population Survey have earnings imputed. Wage gap estimates are biased toward zero when the attribute being studied (e.g., union status) is not a criterion used to match donors to nonrespondents. An expression for ...

2004-01-01

73

Novel SNPs in cytochrome P450 oxidoreductase.  

PubMed

Cytochrome P450 oxidoreductase (POR) is the single flavoprotein which donates electrons to the microsomal cytochrome P450 enzymes for oxidation of their substrates. In this study, we sequenced all 15 exons and the surrounding intronic sequences of POR in 100 human liver samples to identify novel and confirm known genetic polymorphisms in POR. Thirty-four single nucleotide polymorphisms (SNPs) were identified including 9 in the coding exons (5 synonymous and 4 nonsynonymous), 20 in the intronic regions, and 5 in the 3'-UTR. Of these, 9 were novel SNPs, including three nonsynonymous SNPs, SNH313003 (817733G>C; K49N), SNH313020 (848661C>A; L420M), and SNH313029 (849577T>C; L577P) with minor allele frequencies of 0.005, 0.045, and 0.020, respectively. We also confirmed a previously reported non-synonymous SNP rs1057868 (A503V) as well as five synonymous SNPs (G5G, T29T, P129P, S485S, and S572S) all with allele frequencies similar to those previously reported. Structurally, these polymorphisms occur in different regions: SNH313003 (K49N) in the amino-terminal tail, SNH313020 (L420M) in the connecting domain, SNH313029 (L577P) in the NADPH-binding domain, and rs1057868 (A503V) in the FAD binding domain. PMID:17827787

Hart, Steven N; Li, Ye; Nakamoto, Kaori; Wesselman, Chris; Zhong, Xiao-bo

2007-08-01

74

Estimation of caries experience by multiple imputation and direct standardization.  

PubMed

Valid estimates of caries experience are needed to monitor oral population health. Obtaining such estimates in practice is often complicated by nonresponse and missing data. The goal of this study was to estimate caries experiences in a population of children aged 5 and 11 years, in the presence of nonresponse and missing data. Four estimation methods are compared. Each method makes implicit assumptions about the processes that caused the nonresponse and the missing data. Three of the four methods are based on unrealistic assumptions about the missing data and underestimate caries experience. Under the missing at random assumption, multiple imputation in combination with direct standardization corrects for the deficiencies of current methodology. In the presence of missing data and nonresponse, we recommend a combination of multiple imputation and direct standardization to obtain correct estimates of caries experience. PMID:24296647

Schuller, A A; van Buuren, S

2014-01-01
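The caries record above recommends combining multiple imputation with direct standardization. A minimal sketch of how the two steps might fit together is given here. The strata, the weights and the function names are hypothetical, and only the pooled point estimate is shown (the average over imputed datasets, as in Rubin's rules), not its variance.

```python
def direct_standardize(stratum_means, standard_weights):
    """Weighted average of stratum-specific means using a standard population.

    stratum_means: {stratum: mean caries score observed in that stratum}
    standard_weights: {stratum: size or share of that stratum in the standard population}
    """
    total = sum(standard_weights.values())
    return sum(stratum_means[s] * w / total for s, w in standard_weights.items())


def pooled_standardized_estimate(imputed_stratum_means, standard_weights):
    """Standardize each imputed dataset, then average the resulting estimates."""
    estimates = [direct_standardize(m, standard_weights) for m in imputed_stratum_means]
    return sum(estimates) / len(estimates)
```

Standardizing within each completed dataset before pooling keeps the nonresponse correction (reweighting to the standard population) separate from the missing-data correction (imputation).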

75

SNPs in forensic genetics: a review on SNP typing methodologies  

Microsoft Academic Search

There is an increasing interest in single nucleotide polymorphism (SNP) typing in the forensic field, not only for the usefulness of SNPs for defining Y chromosome or mtDNA haplogroups or for analyzing the geographical origin of samples, but also for the potential applications of autosomal SNPs. The interest of forensic researchers in autosomal SNPs has been attracted due to the

Beatriz Sobrino; María Brión; Angel Carracedo

2005-01-01

76

Imputation and quality control steps for combining multiple genome-wide datasets  

PubMed Central

The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR. PMID:25566314

Verma, Shefali S.; de Andrade, Mariza; Tromp, Gerard; Kuivaniemi, Helena; Pugh, Elizabeth; Namjou-Khales, Bahram; Mukherjee, Shubhabrata; Jarvik, Gail P.; Kottyan, Leah C.; Burt, Amber; Bradford, Yuki; Armstrong, Gretta D.; Derr, Kimberly; Crawford, Dana C.; Haines, Jonathan L.; Li, Rongling; Crosslin, David; Ritchie, Marylyn D.

2014-01-01

77

rSNPBase: a database for curated regulatory SNPs  

PubMed Central

In recent years, human regulatory SNPs (rSNPs) have been widely studied. Here, we present the database rSNPBase, freely available at http://rsnp.psych.ac.cn/, which provides curated rSNPs and analyses the regulatory features of all SNPs in the human genome with reference to experimentally supported regulatory elements. In contrast with previous SNP functional annotation databases, rSNPBase is characterized by several unique features. (i) To improve reliability, all SNPs in rSNPBase are annotated with reference to experimentally supported regulatory elements. (ii) rSNPBase focuses on rSNPs involved in a wide range of regulation types, including proximal and distal transcriptional regulation and post-transcriptional regulation, and identifies their potentially regulated genes. (iii) Linkage disequilibrium (LD) correlations between SNPs were analysed so that the regulatory feature is annotated to an SNP set rather than a single SNP. (iv) rSNPBase provides the spatio-temporal labels and experimental eQTL labels for SNPs. In summary, rSNPBase provides more reliable, comprehensive and user-friendly regulatory annotations on rSNPs and will assist researchers in selecting candidate SNPs for further genetic studies and in exploring causal SNPs for in-depth molecular mechanisms of complex phenotypes. PMID:24285297

Guo, Liyuan; Du, Yang; Chang, Suhua; Zhang, Kunlin; Wang, Jing

2014-01-01

78

Tuning multiple imputation by predictive mean matching and local residual draws  

PubMed Central

Background Multiple imputation is a commonly used method for handling incomplete covariates as it can provide valid inference when data are missing at random. This depends on being able to correctly specify the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor’s residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified. Methods We review development of PMM and LRD and outline the various forms available, and aim to clarify some choices about how and when they should be used. We compare performance to fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified. Results In using PMM or LRD we strongly caution against using a single donor, the default value in some implementations, and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Among the current MI software there are several poor implementations. Conclusions PMM and LRD may have a role for imputing covariates (i) which are not strongly associated with outcome, and (ii) when the imputation model is thought to be slightly but not grossly misspecified. Researchers should spend efforts on specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work. PMID:24903709

2014-01-01
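The predictive mean matching (PMM) procedure in the record above, including its advice to sample from a pool of around 10 donors instead of a single one, can be sketched for one incomplete variable and one fully observed predictor. This is a simplified illustration: it matches on point-estimate predictions from an ordinary least-squares line and does not draw the regression parameters from their posterior, as proper multiple-imputation implementations do. The names `fit_line` and `pmm_impute` are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, k=10, rng=None):
    """Impute None entries of y by borrowing observed values from nearby donors.

    For each incomplete case, the k observed cases whose predicted means are
    closest form the donor pool, and one donor's observed y is drawn at random.
    """
    rng = rng or random.Random()
    obs = [i for i, v in enumerate(y) if v is not None]
    intercept, slope = fit_line([x[i] for i in obs], [y[i] for i in obs])
    predicted = [intercept + slope * xi for xi in x]
    completed = list(y)
    for i, v in enumerate(y):
        if v is None:
            donors = sorted(obs, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            completed[i] = y[rng.choice(donors)]
    return completed
```

Because every imputed value is an observed value from a donor, the completed data stay within the observed range; setting `k=1` reproduces the single-donor behaviour the abstract cautions against.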

79

TSPYL5 SNPs: Association with Plasma Estradiol Concentrations and Aromatase Expression  

PubMed Central

We performed a discovery genome-wide association study to identify genetic factors associated with variation in plasma estradiol (E2) concentrations using DNA from 772 postmenopausal women with estrogen receptor (ER)-positive breast cancer prior to the initiation of aromatase inhibitor therapy. Association analyses showed that the single nucleotide polymorphism (SNP) with the lowest P value, rs1864729 (P = 3.49E-08), mapped to chromosome 8 near TSPYL5. We also identified 17 imputed SNPs in or near TSPYL5 with P values < 5E-08, one of which, rs2583506, created a functional estrogen response element. We then used a panel of lymphoblastoid cell lines (LCLs) stably transfected with ERα with known genome-wide SNP genotypes to demonstrate that TSPYL5 expression increased after E2 exposure of cells heterozygous for variant TSPYL5 SNP genotypes, but not in those homozygous for wild-type alleles. TSPYL5 knockdown decreased, and overexpression increased aromatase (CYP19A1) expression in MCF-7 cells, LCLs, and adipocytes through the skin/adipose (I.4) promoter. Chromatin immunoprecipitation assay showed that TSPYL5 bound to the CYP19A1 I.4 promoter. A putative TSPYL5 binding motif was identified in 43 genes, and TSPYL5 appeared to function as a transcription factor for most of those genes. In summary, genome-wide significant SNPs in TSPYL5 were associated with elevated plasma E2 in postmenopausal breast cancer patients. SNP rs2583506 created a functional estrogen response element, and LCLs with variant SNP genotypes displayed increased E2-dependent TSPYL5 expression. TSPYL5 induced CYP19A1 expression and that of many other genes. These studies have revealed a novel mechanism for regulating aromatase expression and plasma E2 concentrations in postmenopausal women with ER(+) breast cancer. PMID:23518928

Liu, Mohan; Ingle, James N.; Fridley, Brooke L.; Buzdar, Aman U.; Robson, Mark E.; Kubo, Michiaki; Wang, Liewei; Batzler, Anthony; Jenkins, Gregory D.; Pietrzak, Tracy L.; Carlson, Erin E.; Goetz, Matthew P.; Northfelt, Donald W.; Perez, Edith A.; Williard, Clark V.; Schaid, Daniel J.; Nakamura, Yusuke

2013-01-01

80

BLUP Genotype Imputation for Case-Control Association Testing With Related Individuals and Missing Data  

E-print Network

1 BLUP Genotype Imputation for Case-Control Association Testing With Related Individuals, for each marker tested, some individuals will have missing genotype data. The MQLS method has been proposed missing genotypes are imputed using the best linear unbiased predictor (BLUP) based on relatives' genotype

Gilad, Yoav

81

Estimation of missing rainfall data using spatial interpolation and imputation methods  

NASA Astrophysics Data System (ADS)

This study aims to estimate missing rainfall data at three levels of missingness, namely 5%, 10% and 20%, chosen to represent various cases of missing data. In practice, spatial interpolation methods are usually the first choice for estimating missing data. These methods include the normal ratio (NR), arithmetic average (AA), coefficient of correlation (CC) and inverse distance (ID) weighting methods. The methods consider the distance between the target and the neighbouring stations as well as the correlations between them. An alternative way of handling missing data is imputation, the process of replacing missing data with substituted values. A once-common approach is single imputation, which allows parameter estimation. However, single imputation ignores the variability of the estimates, which leads to underestimated standard errors and confidence intervals. To overcome this underestimation, multiple imputation is used, where each missing value is estimated with a distribution of imputations that reflects the uncertainty about the missing data. In this study, spatial interpolation methods and multiple imputation are compared for estimating missing rainfall data. The performance of the estimation methods is assessed using the similarity index (S-index), mean absolute error (MAE) and coefficient of correlation (R).

Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Azman, Muhammad Az-zuhri

2015-02-01
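Three of the spatial interpolation methods named in the record above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the inverse distance exponent of 2 is a conventional default that the abstract does not specify, and the normal ratio method is written in its textbook form using long-term mean ("normal") rainfall at each station.

```python
import numpy as np

def arithmetic_average(neighbour_rain):
    """AA: plain mean of the neighbouring stations' rainfall."""
    return float(np.mean(neighbour_rain))

def inverse_distance(neighbour_rain, distances, power=2.0):
    """ID: weights proportional to 1 / distance**power (power=2 is an assumed default)."""
    rain = np.asarray(neighbour_rain, dtype=float)
    weights = 1.0 / np.asarray(distances, dtype=float) ** power
    return float(np.sum(weights * rain) / np.sum(weights))

def normal_ratio(neighbour_rain, neighbour_normals, target_normal):
    """NR: each neighbour is rescaled by the ratio of target to neighbour normal rainfall."""
    rain = np.asarray(neighbour_rain, dtype=float)
    normals = np.asarray(neighbour_normals, dtype=float)
    return float(np.mean(target_normal / normals * rain))

# Two neighbouring stations reporting 10 mm and 20 mm on the day with the gap.
print(arithmetic_average([10.0, 20.0]))                      # 15.0
print(inverse_distance([10.0, 20.0], distances=[1.0, 2.0]))  # 12.0
print(normal_ratio([10.0, 20.0], [100.0, 200.0], 150.0))     # 15.0
```

The closer station dominates the inverse distance estimate, while the normal ratio estimate corrects for stations that are systematically wetter or drier than the target.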

82

Imputation of Missing Genotypes From Sparse to High Density Using Long-Range Phasing  

Technology Transfer Automated Retrieval System (TEKTRAN)

Related individuals in a population share long chromosome segments which trace to a common ancestor. We describe a long-range phasing algorithm that makes use of this property to phase whole chromosomes and simultaneously impute a large number of missing markers. We test our method by imputing marke...

83

Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys  

ERIC Educational Resources Information Center

In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

Si, Yajuan; Reiter, Jerome P.

2013-01-01

84

Sequential Regression Multiple Imputation for Incomplete Multivariate Data using Markov Chain Monte Carlo  

Microsoft Academic Search

This paper discusses the theoretical background to handling missing data in a multivariate context. Earlier methods for dealing with item non-response are reviewed, followed by an examination of some of the more modern methods and, in particular, multiple imputation. One such technique, known as sequential regression multivariate imputation, which employs a Markov chain Monte Carlo algorithm, is described and implemented.

Miguel Lacerda; Cally Ardington; Murray Leibbrandt

2007-01-01

85

Gaussianization-based quasi-imputation and expansion strategies for incomplete correlated binary responses  

Microsoft Academic Search

SUMMARY New quasi-imputation and expansion strategies for correlated binary responses are proposed by borrowing ideas from random number generation. The core idea is to convert correlated binary outcomes to multivariate normal outcomes in a sensible way so that re-conversion to the binary scale, after performing multiple imputation, yields the original specified marginal expectations and correlations. This conversion process

Hakan Demirtas; Donald Hedeker

2007-01-01

86

A Simplified Framework for Using Multiple Imputation in Social Work Research  

ERIC Educational Resources Information Center

Missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings. Increasingly, social science researchers are turning to multiple imputation to handle missing data. Multiple imputation, in which missing values are replaced by values repeatedly drawn from…

Rose, Roderick A.; Fraser, Mark W.

2008-01-01

87

Finding Haplotype Tagging SNPs by Use of Principal Components Analysis  

PubMed Central

The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred as “haplotype tagging SNPs” (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision. PMID:15389393

Lin, Zhen; Altman, Russ B.

2004-01-01

88

RECONSTRUCTING DNA COPY NUMBER BY PENALIZED ESTIMATION AND IMPUTATION  

PubMed Central

Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics 9 (2008) 18–29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way. We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost. PMID:21572975

Zhang, Zhongyang; Lange, Kenneth; Ophoff, Roel; Sabatti, Chiara

2011-01-01

89

Approximation Algorithms for the Selection of Robust Tag SNPs  

Microsoft Academic Search

Recent studies have shown that chromosomal recombination takes place only at some narrow hotspots. Within the chromosomal region between these hotspots (called a haplotype block), little or even no recombination occurs, and a small subset of SNPs (called tag SNPs) is sufficient to capture the haplotype pattern of the block. In reality, the tag SNPs may be genotyped as missing

Yao-ting Huang; Kui Zhang; Ting Chen; Kun-mao Chao

2004-01-01

90

Association Studies with Imputed Variants Using Expectation-Maximization Likelihood-Ratio Tests  

PubMed Central

Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, more and more markers are being imputed well, but at the same time even more markers with relatively low minor allele frequency (MAF) are being imputed with low imputation quality. Here, we propose new methods that incorporate imputation uncertainty for downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: I) when posterior probabilities of all potential genotypes are estimated; and II) when only the one-dimensional summary statistic, imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test (EM-LRT) for association based on posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages, and then apply the EM-LRT on the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is protected under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a similar level of statistical power as EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women’s Health Initiative Study, provide further support to the validity and efficiency of our proposed methods. PMID:25383782

Huang, Kuan-Chieh; Sun, Wei; Wu, Ying; Chen, Mengjie; Mohlke, Karen L.; Lange, Leslie A.; Li, Yun

2014-01-01
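The difference between the two scenarios in the record above comes down to how much of the imputation output is retained. The sketch below is not the EM-LRT itself; it only shows the standard definition of imputed dosage as the expected count of the coded allele, and the function name `expected_dosage` is hypothetical.

```python
import numpy as np

def expected_dosage(genotype_probs):
    """Collapse posterior probabilities P(0), P(1), P(2) copies of the coded allele
    into the one-dimensional dosage, i.e. the expected allele count."""
    probs = np.asarray(genotype_probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])

# Two individuals with very different posterior distributions...
probs = np.array([
    [0.25, 0.50, 0.25],   # uncertain, centred on the heterozygote
    [0.50, 0.00, 0.50],   # either homozygote, never heterozygous
])
# ...end up with the same dosage once the probabilities are summarised.
print(expected_dosage(probs))   # [1. 1.]
```

Because distinct probability vectors map to the same dosage, an analysis restricted to dosages (scenario II) has to recover or approximate the information that the full posterior probabilities (scenario I) carry directly.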

91

Biological impact of missing-value imputation on downstream analyses of gene expression profiles  

PubMed Central

Motivation: Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation. Methods: Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure. Results: DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation. Contact: ctseng@pitt.edu; guy.brock@louisville.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21045072

Oh, Sunghee; Kang, Dongwan D.; Brock, Guy N.; Tseng, George C.

2011-01-01

92

Traffic Speed Data Imputation Method Based on Tensor Completion  

PubMed Central

Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

Ran, Bin; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

2015-01-01

93

A model-based approach to selection of tag SNPs  

Microsoft Academic Search

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes

Pierre Nicolas; Fengzhu Sun; Lei M. Li

2006-01-01

94

Rare variant testing of imputed data: an analysis pipeline typified.  

PubMed

Important methodological advancements in rare variant association testing have been made recently, among them collapsing tests, kernel methods and the variable threshold (VT) technique. Typically, rare variants from a region of interest are tested for association as a group ('bin'). Rare variant studies are already routinely performed as whole-exome sequencing studies. As an alternative approach, we propose a pipeline for rare variant analysis of imputed data and develop respective quality control criteria. We provide suggestions for the choice and construction of analysis bins in whole-genome application and support the analysis with implementations of standard burden tests (COLL, CMAT) in our INTERSNP-RARE software. In addition, three rare variant regression tests (REG, FRACREG and COLLREG) are implemented. All tests are accompanied with the VT approach which optimizes the definition of 'rareness'. We integrate kernel tests as implemented in SKAT/SKAT-O into the suggested strategies. Then, we apply our analysis scheme to a genome-wide association study of Alzheimer's disease. Further, we show that our pipeline leads to valid significance testing procedures with controlled type I error rates. Strong association signals surrounding the known APOE locus demonstrate statistical power. In addition, we highlight several suggestive rare variant association findings for follow-up studies, including genomic regions overlapping MCPH1, MED18 and NOTCH3. In summary, we describe and support a straightforward and cost-efficient rare variant analysis pipeline for imputed data and demonstrate its feasibility and validity. The strategy can complement rare variant studies with next generation sequencing data. PMID:25504234

Drichel, Dmitriy; Herold, Christine; Lacour, André; Ramirez, Alfredo; Jessen, Frank; Maier, Wolfgang; Noethen, Markus M; Leber, Markus; Vaitsiakhovich, Tatsiana; Becker, Tim

2014-01-01

95

Differential Network Analysis with Multiply Imputed Lipidomic Data  

PubMed Central

The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to cardiovascular disease (CVD) caused by atherosclerosis. Lipidomics analyses are characterized by a large, yet not huge, number of mutually correlated measured variables, and their associations with outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe for conducting a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: those who had a fatal CVD event and those who remained stable during two-year follow-up. PMID:25822937

Kujala, Maiju; Nevalainen, Jaakko; März, Winfried; Laaksonen, Reijo; Datta, Susmita

2015-01-01

96

Multiple Imputation by Ordered Monotone Blocks with Application to the Anthrax Vaccine Research Program  

E-print Network

Multiple Imputation by Ordered Monotone Blocks with Application to the Anthrax Vaccine Research with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due

West, Mike

97

37 CFR 11.110 - Imputation of conflicts of interest; General rule.  

Code of Federal Regulations, 2013 CFR

...of Professional Conduct Client-Practitioner Relationship § 11.110 Imputation...interest; General rule. (a) While practitioners are associated in a firm, none of...personal interest of the disqualified practitioner and does not present a...

2013-07-01

98

37 CFR 11.110 - Imputation of conflicts of interest; General rule.  

Code of Federal Regulations, 2014 CFR

...of Professional Conduct Client-Practitioner Relationship § 11.110 Imputation...interest; General rule. (a) While practitioners are associated in a firm, none of...personal interest of the disqualified practitioner and does not present a...

2014-07-01

99

Imputation of response rates from means and standard deviations in schizophrenia.  

PubMed

Missing outcome data are a major threat in meta-analytical studies of schizophrenia. Most clinical trials in psychiatry report only continuous outcome measures and express the effect of an intervention as a difference of means. However, these results are difficult for clinicians to interpret. Converting continuous data to binary response rates is one possible solution to the problem. Based on means and standard deviations for a continuous outcome, we examined the performance of an imputation method to define a dichotomous outcome using original individual patients' data from 16 randomized trials (6276 participants) comparing antipsychotic drugs in schizophrenia. We concluded that the imputed values recaptured the observed values to a reasonable degree, providing a simple and practical alternative methodological choice for imputation of missing binary data in schizophrenia trials; nevertheless, the imputation method tended to introduce biases, especially for extreme risks and large treatment differences. PMID:24262679

Samara, Myrto T; Spineli, Loukia M; Furukawa, Toshi A; Engel, Rolf R; Davis, John M; Salanti, Georgia; Leucht, Stefan

2013-12-01
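The abstract above does not spell out the conversion formula. One common way to turn a reported mean and standard deviation into a response rate is to assume that the continuous outcome is normally distributed within each arm and to read off the share of patients beyond a response cutoff. The sketch below illustrates that assumption only; the function name, the direction of the cutoff and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

def imputed_response_rate(mean_improvement, sd_improvement, cutoff):
    """Expected share of patients whose improvement is at least `cutoff`,
    assuming improvement is normally distributed in the trial arm."""
    arm = NormalDist(mu=mean_improvement, sigma=sd_improvement)
    return 1.0 - arm.cdf(cutoff)

# Hypothetical arm: mean improvement of 30 points (SD 20) and a 50-point response cutoff.
rate = imputed_response_rate(mean_improvement=30.0, sd_improvement=20.0, cutoff=50.0)
print(round(rate, 3))
```

The normality assumption fits worst in the tails of the distribution, which is consistent with the abstract's warning that biases appear for extreme risks and large treatment differences.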

100

Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.  

PubMed

When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184

Blue, Elizabeth M; Sun, Lei; Tintle, Nathan L; Wijsman, Ellen M

2014-09-01

101

State based imputation of missing data for robust speech recognition and speech enhancement  

Microsoft Academic Search

Within the context of continuous-density HMM speech recognition in noise, we report on imputation of missing time-frequency regions using emission state probability distributions. Spectral subtraction and local signal-to-noise estimation based criteria are used to separate the present from the missing components. We consider two approaches to the problem of classification with missing data: marginalization and data imputation. A formal-

Ljubomir Josifovski; Martin Cooke; Phil D. Green; Ascension Vizinho

1999-01-01

102

Comparing strategies to fine map the association of common SNPs on chromosome 9p21 to Type 2 Diabetes and Myocardial Infarction  

PubMed Central

Non-coding variants at human chromosome 9p21 near CDKN2A and CDKN2B are associated with type 2 diabetes (T2D) [1-4], myocardial infarction (MI) [5-7], aneurysm [8], vertical cup disc ratio [9], and at least five cancers [10-16]. We compared approaches to more comprehensively assess genetic variation in the region. We performed targeted sequencing at high coverage in 47 individuals and compared the results to pilot data from the 1000 Genomes Project. We imputed variants into T2D and MI cohorts directly from targeted sequencing, from a genotyped reference panel derived from sequencing, and from 1000 Genomes low-coverage data. Common polymorphisms were captured similarly by all strategies. Imputation of intermediate frequency polymorphisms required a higher density of tag SNPs in disease samples than available on first generation Genome Wide Association Study (GWAS) arrays. Association analyses identified more comprehensive sets of variants demonstrating equivalent statistical association to T2D or MI, but did not identify stronger associations than the original GWAS signals. PMID:21775993

Shea, Jessica; Agarwala, Vineeta; Philippakis, Anthony A.; Maguire, Jared; Banks, Eric; DePristo, Mark; Thomson, Brian; Guiducci, Candace; Kathiresan, Sekar; Gabriel, Stacey; Burtt, Noël P; Daly, Mark J.; Groop, Leif; Altshuler, David

2014-01-01

103

Identifying causal regulatory SNPs in ChIP-seq enhancers  

PubMed Central

Thousands of non-coding SNPs have been linked to human diseases in the past. The identification of causal alleles within this pool of disease-associated non-coding SNPs is largely impossible due to the inability to accurately quantify the impact of non-coding variation. To overcome this challenge, we developed a computational model that uses ChIP-seq intensity variation in response to non-coding allelic change as a proxy to the quantification of the biological role of non-coding SNPs. We applied this model to HepG2 enhancers and detected 4796 enhancer SNPs capable of disrupting enhancer activity upon allelic change. These SNPs are significantly over-represented in the binding sites of HNF4 and FOXA families of liver transcription factors and liver eQTLs. In addition, these SNPs are strongly associated with liver GWAS traits, including type I diabetes, and are linked to the abnormal levels of HDL and LDL cholesterol. Our model is directly applicable to any enhancer set for mapping causal regulatory SNPs. PMID:25520196

Huang, Di; Ovcharenko, Ivan

2015-01-01

104

Identifying causal regulatory SNPs in ChIP-seq enhancers.  

PubMed

Thousands of non-coding SNPs have been linked to human diseases in the past. The identification of causal alleles within this pool of disease-associated non-coding SNPs is largely impossible due to the inability to accurately quantify the impact of non-coding variation. To overcome this challenge, we developed a computational model that uses ChIP-seq intensity variation in response to non-coding allelic change as a proxy to the quantification of the biological role of non-coding SNPs. We applied this model to HepG2 enhancers and detected 4796 enhancer SNPs capable of disrupting enhancer activity upon allelic change. These SNPs are significantly over-represented in the binding sites of HNF4 and FOXA families of liver transcription factors and liver eQTLs. In addition, these SNPs are strongly associated with liver GWAS traits, including type I diabetes, and are linked to the abnormal levels of HDL and LDL cholesterol. Our model is directly applicable to any enhancer set for mapping causal regulatory SNPs. PMID:25520196

Huang, Di; Ovcharenko, Ivan

2015-01-01

105

Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.  

PubMed

With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853

Ernst, Jason; Kellis, Manolis

2015-04-01

106

Validation of statistical imputation of allele-level multilocus phased genotypes from ambiguous HLA assignments.  

PubMed

Genetic matching for loci in the human leukocyte antigen (HLA) region between a donor and a patient in hematopoietic stem cell transplantation (HSCT) is critical to outcome; however, methods for HLA genotyping of donors in unrelated stem cell registries often yield results with allelic and phase ambiguity and/or do not query all clinically relevant loci. We present and evaluate a statistical method for in silico imputation of HLA alleles and haplotypes in large ambiguous population data from the Be The Match(®) Registry. Our method builds on haplotype frequencies estimated from registry populations and exploits patterns of linkage disequilibrium (LD) across HLA haplotypes to infer high resolution HLA assignments. We performed validation on simulated and real population data from the Registry with non-trivial ambiguity content. While real population datasets caused some predictions to deviate from expectation, validations still showed high percent recall for imputed results with average recall >76% when imputing HLA alleles from registry data. We simulated ambiguity generated by several HLA genotyping methods to evaluate the imputation performance on several levels of typing resolution. On average, imputation percent recall of allele-level HLA haplotypes was >95% for allele-level typing, >92% for intermediate resolution typing and >58% for serology (low-resolution) typing. Thus, allele-level HLA assignments can be imputed through the application of a set of statistical and population genetics inferences and with knowledge of haplotype frequencies and self-identified race and ethnicities. PMID:25040134

Madbouly, A; Gragert, L; Freeman, J; Leahy, N; Gourraud, P-A; Hollenbach, J A; Kamoun, M; Fernandez-Vina, M; Maiers, M

2014-09-01

107

SNPs selection using support vector regression and genetic algorithms in GWAS  

PubMed Central

Introduction This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in simulated database 1, with additive effects only, and in the real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of this total, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, initially with 50,752 SNPs, the set was reduced to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel eliminated many redundant markers, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332

2014-01-01

108

Accuracy of estimation of genomic breeding values in pigs using low-density genotypes and imputation.  

PubMed

Genomic selection has the potential to increase genetic progress. Genotype imputation of high-density single-nucleotide polymorphism (SNP) genotypes can improve the cost efficiency of genomic breeding value (GEBV) prediction for pig breeding. Consequently, the objectives of this work were to: (1) estimate accuracy of genomic evaluation and GEBV for three traits in a Yorkshire population and (2) quantify the loss of accuracy of genomic evaluation and GEBV when genotypes were imputed under two scenarios: a high-cost, high-accuracy scenario in which only selection candidates were imputed from a low-density platform and a low-cost, low-accuracy scenario in which all animals were imputed using a small reference panel of haplotypes. Phenotypes and genotypes obtained with the PorcineSNP60 BeadChip were available for 983 Yorkshire boars. Genotypes of selection candidates were masked and imputed using tagSNP in the GeneSeek Genomic Profiler (10K). Imputation was performed with BEAGLE using 128 or 1800 haplotypes as reference panels. GEBV were obtained through an animal-centric ridge regression model using de-regressed breeding values as response variables. Accuracy of genomic evaluation was estimated as the correlation between estimated breeding values and GEBV in a 10-fold cross validation design. Accuracy of genomic evaluation using observed genotypes was high for all traits (0.65-0.68). Using genotypes imputed from a large reference panel (accuracy: R² = 0.95) for genomic evaluation did not significantly decrease accuracy, whereas a scenario with genotypes imputed from a small reference panel (R² = 0.88) did show a significant decrease in accuracy. Genomic evaluation based on imputed genotypes in selection candidates can be implemented at a fraction of the cost of a genomic evaluation using observed genotypes and still yield virtually the same accuracy. On the other hand, using a very small reference panel of haplotypes to impute training animals and candidates for selection results in lower accuracy of genomic evaluation. PMID:24531728

Badke, Yvonne M; Bates, Ronald O; Ernst, Catherine W; Fix, Justin; Steibel, Juan P

2014-04-01

109

Multiplex typing with 5 Y-chromosomal SNPs  

Microsoft Academic Search

Many different methods have been established for SNP detection and especially minisequencing has often been used. For our study we have selected 5 Y-chromosomal SNPs (M9, M17, M45, M170, M173) based on the degree of polymorphism. PCR primers were designed with the aim to get amplicon lengths of <200 bp. The Y-SNPs were optimized in singleplex reactions and combined to

D. Schell; R. Klein; E. Miltner; P. Wiegand

2006-01-01

110

A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation  

PubMed Central

Background Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation. Methods An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis. Results Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored. Conclusions The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations. PMID:22462519

2012-01-01

111

Prediction of functional regulatory SNPs in monogenic and complex disease  

PubMed Central

Next-Generation Sequencing (NGS) technologies are yielding ever-higher volumes of human genome sequence data. Given this large amount of data, it has become both a possibility and a priority to determine how disease-causing single nucleotide polymorphisms (SNPs) detected within gene regulatory regions (rSNPs) exert their effects on gene expression. Recently, several studies have explored whether disease-causing polymorphisms have attributes that can distinguish them from those that are neutral, attaining moderate success at discriminating between functional and putatively neutral regulatory SNPs. Here, we have extended this work by assessing the utility of both SNP-based features (those associated only with the polymorphism site and the surrounding DNA) and Gene-based features (those derived from the associated gene in whose regulatory region the SNP lies) in the identification of functional regulatory polymorphisms involved in either monogenic or complex disease. Gene-based features were found to be capable of both augmenting and enhancing the utility of SNP-based features in the prediction of known regulatory mutations. Adopting this approach, we achieved an AUC of 0.903 for predicting regulatory SNPs. Finally, our tool predicted 225 new regulatory SNPs with a high degree of confidence, with 105 of the 225 falling into linkage disequilibrium blocks of reported disease-associated GWAS SNPs. PMID:21796725

Zhao, Yiqiang; Clark, Wyatt T.; Mort, Matthew; Cooper, David N.; Radivojac, Predrag; Mooney, Sean D.

2013-01-01

112

GIGI: An Approach to Effective Imputation of Dense Genotypes on Large Pedigrees  

PubMed Central

Recent emergence of the common-disease-rare-variant hypothesis has renewed interest in the use of large pedigrees for identifying rare causal variants. Genotyping with modern sequencing platforms is increasingly common in the search for such variants but remains expensive and often is limited to only a few subjects per pedigree. In population-based samples, genotype imputation is widely used so that additional genotyping is not needed. We now introduce an analogous approach that enables computationally efficient imputation in large pedigrees. Our approach samples inheritance vectors (IVs) from a Markov Chain Monte Carlo sampler by conditioning on genotypes from a sparse set of framework markers. Missing genotypes are probabilistically inferred from these IVs along with observed dense genotypes that are available on a subset of subjects. We implemented our approach in the Genotype Imputation Given Inheritance (GIGI) program and evaluated the approach on both simulated and real large pedigrees. With a real pedigree, we also compared imputed results obtained from this approach with those from the population-based imputation program BEAGLE. We demonstrated that our pedigree-based approach imputes many alleles with high accuracy. It is much more accurate for calling rare alleles than is population-based imputation and does not require an outside reference sample. We also evaluated the effect of varying other parameters, including the marker type and density of the framework panel, threshold for calling genotypes, and population allele frequencies. By leveraging information from existing genotypes already assayed on large pedigrees, our approach can facilitate cost-effective use of sequence data in the pursuit of rare causal variants. PMID:23561844

Cheung, Charles Y.K.; Thompson, Elizabeth A.; Wijsman, Ellen M.

2013-01-01

113

Comparison of missing value imputation methods in time series: the case of Turkish meteorological data  

NASA Astrophysics Data System (ADS)

This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria including accuracy, robustness, precision, and efficiency for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas multilayer perceptron type neural network and multiple imputation strategy adopted by Monte Carlo Markov Chain based on expectation-maximization (EM-MCMC) are computationally intensive ones. In addition, we propose a modification on the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis which takes spatio-temporal dependencies into account for evaluating imputation performances. Depending on the detailed graphical and quantitative analysis, it can be said that although computational methods, particularly EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation particularly with computational methods since it gives more precise results in meteorological time series.

Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

2013-04-01
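The normal ratio (NR) method named in the abstract above can be illustrated with a short sketch. It uses the textbook form of NR, in which each neighboring station's reading is rescaled by the ratio of long-term means ("normals") before averaging; the function and parameter names are illustrative assumptions and are not taken from the study, whose exact formulation may differ.

```python
def arithmetic_average(neighbor_values):
    """Simple arithmetic average of the neighboring stations' readings."""
    if not neighbor_values:
        raise ValueError("at least one neighbor value is required")
    return sum(neighbor_values) / len(neighbor_values)


def normal_ratio_estimate(target_normal, neighbor_normals, neighbor_values):
    """Textbook normal ratio estimate of a missing value at a target station.

    target_normal    -- long-term mean at the station with the gap (assumed known)
    neighbor_normals -- long-term means at the neighboring stations
    neighbor_values  -- neighbors' readings for the missing period
    """
    if len(neighbor_normals) != len(neighbor_values) or not neighbor_values:
        raise ValueError("neighbor normals and values must be non-empty and aligned")
    # Rescale each neighbor reading by the ratio of normals, then average.
    scaled = [
        target_normal / normal * value
        for normal, value in zip(neighbor_normals, neighbor_values)
    ]
    return sum(scaled) / len(scaled)


# Hypothetical monthly precipitation totals (mm), for illustration only.
estimate = normal_ratio_estimate(100.0, [80.0, 125.0], [40.0, 50.0])
```

The correlation-weighted NR variant mentioned in the abstract would replace the equal weights in the final average with weights derived from inter-station correlations; that variant is not shown here.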

114

Imputation of Truncated p-Values For Meta-Analysis Methods and Its Genomic Application  

PubMed Central

Microarray analysis to monitor expression activities in thousands of genes simultaneously has become routine in biomedical research during the past decade. A tremendous amount of expression profiles are generated and stored in the public domain and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular to obtain increased statistical power and validated findings. Methods that aggregate transformed p-value evidence have been widely used in genomic settings, among which Fisher's and Stouffer's methods are the most popular ones. In practice, raw data and p-values of DE evidence are often not available in genomic studies that are to be combined. Instead, only the detected DE gene lists under a certain p-value threshold (e.g., DE genes with p-value < 0.001) are reported in journal publications. The truncated p-value information makes the aforementioned meta-analysis methods inapplicable and researchers are forced to apply a less efficient vote counting method or naïvely drop the studies with incomplete information. The purpose of this paper is to develop effective meta-analysis methods for such situations with partially censored p-values. We developed and compared three imputation methods—mean imputation, single random imputation and multiple imputation—for a general class of evidence aggregation methods of which Fisher's and Stouffer's methods are special examples. The null distribution of each method was analytically derived and subsequent inference and genomic analysis frameworks were established. Simulations were performed to investigate the type I error, power and the control of false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were applied to several genomic applications in colorectal cancer, pain and liquid association analysis of major depressive disorder (MDD). The results showed that imputation methods outperformed existing naïve approaches. Mean imputation and multiple imputation methods performed the best and are recommended for future applications. PMID:25541588

Tang, Shaowu; Ding, Ying; Sibille, Etienne; Mogil, Jeffrey; Lariviere, William R.; Tseng, George C.

2014-01-01
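For readers unfamiliar with the two evidence aggregation methods named in the abstract above, the following sketch shows the standard Fisher and Stouffer combinations for complete p-values. The helper mean_impute_truncated is a deliberately simplified stand-in, not the paper's procedure: it assumes that an unreported p-value is known only to lie at or above the reporting threshold alpha and replaces it with the midpoint of that interval, which is its conditional mean under a uniform null. The paper derives proper null distributions for its imputed statistics, so the combined p-values computed here after imputation are illustrative only. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of the Fisher statistic
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined_p(p_values):
    """Stouffer's method: sum of probit-transformed p-values, scaled by sqrt(k)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - nd.cdf(z)


def mean_impute_truncated(p_values, alpha):
    """Replace unreported p-values (None), assumed to be >= alpha, by (alpha + 1) / 2.

    Simplifying assumption for illustration; not the paper's exact method.
    """
    fill = (alpha + 1.0) / 2.0
    return [fill if p is None else p for p in p_values]


# Hypothetical example: three studies, one of which did not report the gene.
completed = mean_impute_truncated([0.0004, None, 0.0002], alpha=0.001)
combined_fisher = fisher_combined_p(completed)
combined_stouffer = stouffer_combined_p(completed)
```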

115

Imputation of Variants from the 1000 Genomes Project Modestly Improves Known Associations and Can Identify Low-frequency Variant - Phenotype Associations Undetected by HapMap Based Imputation  

PubMed Central

Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF <5%) and rare variants (<1%)) can enhance previously identified associations and identify novel loci, we selected 93 quantitative circulating factors where data was available from the InCHIANTI population study. These phenotypes included cytokines, binding proteins, hormones, vitamins and ions. We selected these phenotypes because many have known strong genetic associations and are potentially important to help understand disease processes. We performed a genome-wide scan for these 93 phenotypes in InCHIANTI. We identified 21 signals and 33 signals that reached P<5×10⁻⁸ based on HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P<5×10⁻¹¹ respectively. Imputation of 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P<5×10⁻⁸ in both analyses (17 of which represent well replicated signals in the NHGRI catalogue), six were captured by the same index SNP, five were nominally more strongly associated in 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10⁻¹²). Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations. PMID:23696881

Wood, Andrew R.; Perry, John R. B.; Tanaka, Toshiko; Hernandez, Dena G.; Zheng, Hou-Feng; Melzer, David; Gibbs, J. Raphael; Nalls, Michael A.; Weedon, Michael N.; Spector, Tim D.; Richards, J. Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B.; Frayling, Timothy M.

2013-01-01

116

Imputation method for lifetime exposure assessment in air pollution epidemiologic studies  

PubMed Central

Background Environmental epidemiology, when focused on the life course of exposure to a specific pollutant, requires historical exposure estimates that are difficult to obtain for the full time period due to gaps in the historical record, especially in earlier years. We show that these gaps can be filled by applying multiple imputation methods to a formal risk equation that incorporates lifetime exposure. We also address challenges that arise, including choice of imputation method, potential bias in regression coefficients, and uncertainty in age-at-exposure sensitivities. Methods During time periods when parameters needed in the risk equation are missing for an individual, the parameters are filled by an imputation model using group level information or interpolation. A random component is added to match the variance found in the estimates for study subjects not needing imputation. The process is repeated to obtain multiple data sets, whose regressions against health data can be combined statistically to develop confidence limits using Rubin’s rules to account for the uncertainty introduced by the imputations. To test for possible recall bias between cases and controls, which can occur when historical residence location is obtained by interview, and which can lead to misclassification of imputed exposure by disease status, we introduce an “incompleteness index,” equal to the percentage of dose imputed (PDI) for a subject. “Effective doses” can be computed using different functional dependencies of relative risk on age of exposure, allowing intercomparison of different risk models. To illustrate our approach, we quantify lifetime exposure (dose) from traffic air pollution in an established case–control study on Long Island, New York, where considerable in-migration occurred over a period of many decades. Results The major result is the described approach to imputation. The illustrative example revealed potential recall bias, suggesting that regressions against health data should be done as a function of PDI to check for consistency of results. The 1% of study subjects who lived for long durations near heavily trafficked intersections had very high cumulative exposures. Thus, imputation methods must be designed to reproduce non-standard distributions. Conclusions Our approach meets a number of methodological challenges to extending historical exposure reconstruction over a lifetime and shows promise for environmental epidemiology. Application to assessment of breast cancer risks will be reported in a subsequent manuscript. PMID:23919666

2013-01-01
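The step in the abstract above where regressions on the multiple imputed data sets are "combined statistically ... using Rubin's rules" can be sketched as follows. This is the standard pooling of a single scalar coefficient across m imputed data sets; the function name and the example numbers are hypothetical and do not come from the study.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates -- the coefficient estimated on each imputed data set
    variances -- the squared standard error of that coefficient in each data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with aligned variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between  # inflates for imputation uncertainty
    return pooled, total


# Hypothetical regression coefficients from three imputed data sets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
pooled_standard_error = math.sqrt(total_variance)
```

The between-imputation term is what widens the confidence limits relative to analyzing any single completed data set as if it were fully observed.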

117

PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population  

PubMed Central

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. PMID:25735005

Livne, Oren E.; Han, Lide; Alkorta-Aranburu, Gorka; Wentworth-Sheilds, William; Abney, Mark; Ober, Carole; Nicolae, Dan L.

2015-01-01

118

Analysis of longitudinal clinical trials with missing data using multiple imputation in conjunction with robust regression.  

PubMed

In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799-821.) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method, or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113-124.) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration. PMID:22994905

Mehrotra, Devan V; Li, Xiaoming; Liu, Jiajun; Lu, Kaifeng

2012-12-01

119

Multimodal diagnosis of epilepsy using conditional dependence and multiple imputation.  

PubMed

The definitive diagnosis of the type of epilepsy, if it exists, in medication-resistant seizure disorder is based on the efficient combination of clinical information, long-term video-electroencephalography (EEG) and neuroimaging. Diagnoses are reached by a consensus panel that combines these diverse modalities using clinical wisdom and experience. Here we compare two methods of multimodal computer-aided diagnosis, vector concatenation (VC) and conditional dependence (CD), using clinical archive data from 645 patients with medication-resistant seizure disorder, confirmed by video-EEG. CD models the clinical decision process, whereas VC allows for statistical modeling of cross-modality interactions. Due to the nature of clinical data, not all information was available in all patients. To overcome this, we multiply-imputed the missing data. Using a C4.5 decision tree, single modality classifiers achieved 53.1%, 51.5% and 51.1% average accuracy for MRI, clinical information and FDG-PET, respectively, for the discrimination between non-epileptic seizures, temporal lobe epilepsy, other focal epilepsies and generalized-onset epilepsy (vs. chance, p<0.01). Using VC, the average accuracy was significantly lower (39.2%). In contrast, the CD classifier that classified with MRI then clinical information achieved an average accuracy of 58.7% (vs. VC, p<0.01). The decrease in accuracy of VC compared to the MRI classifier illustrates how the addition of more informative features does not improve performance monotonically. The superiority of conditional dependence over vector concatenation suggests that the structure imposed by conditional dependence improved our ability to model the underlying diagnostic trends in the multimodality data. PMID:25311448

Kerr, Wesley T; Hwang, Eric S; Raman, Kaavya R; Barritt, Sarah E; Patel, Akash B; Le, Justine M; Hori, Jessica M; Davis, Emily C; Braesch, Chelsea T; Janio, Emily A; Lau, Edward P; Cho, Andrew Y; Anderson, Ariana; Silverman, Daniel H S; Salamon, Noriko; Engel, Jerome; Stern, John M; Cohen, Mark S

2014-06-01

120

Association analysis identifies Melampsora ×columbiana poplar leaf rust resistance SNPs.  

PubMed

Populus species are currently being domesticated through intensive time- and resource-dependent programs for utilization in phytoremediation, wood and paper products, and conversion to biofuels. Poplar leaf rust disease can greatly reduce wood volume. Genetic resistance is effective in reducing economic losses but major resistance loci have been race-specific and can be readily defeated by the pathogen. Developing durable disease resistance requires the identification of non-race-specific loci. In the presented study, area under the disease progress curve was calculated from natural infection of Melampsora ×columbiana in three consecutive years. Association analysis was performed using 412 P. trichocarpa clones genotyped with 29,355 SNPs covering 3,543 genes. We found 40 SNPs within 26 unique genes significantly associated (permutated P<0.05) with poplar rust severity. Moreover, two SNPs were repeated in all three years suggesting non-race-specificity and three additional SNPs were differentially expressed in other poplar rust interactions. These five SNPs were found in genes that have orthologs in Arabidopsis with functionality in pathogen induced transcriptome reprogramming, Ca²⁺/calmodulin and salicylic acid signaling, and tolerance to reactive oxygen species. The additive effect of non-R gene functional variants may constitute high levels of durable poplar leaf rust resistance. Therefore, these findings are of significance for speeding the genetic improvement of this long-lived, economically important organism. PMID:24236018

La Mantia, Jonathan; Klápště, Jaroslav; El-Kassaby, Yousry A; Azam, Shofiul; Guy, Robert D; Douglas, Carl J; Mansfield, Shawn D; Hamelin, Richard

2013-01-01
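The area under the disease progress curve (AUDPC) used as the phenotype in the abstract above is conventionally computed with the trapezoidal rule over repeated severity assessments. The sketch below shows that conventional calculation; the assessment days and severity scores are hypothetical, and the study's own scoring scale and schedule are not given in the abstract.

```python
def audpc(times, severities):
    """Trapezoidal area under the disease progress curve.

    times      -- assessment times in increasing order (e.g. days)
    severities -- disease severity recorded at each assessment
    """
    if len(times) != len(severities) or len(times) < 2:
        raise ValueError("need at least two aligned assessments")
    area = 0.0
    for i in range(len(times) - 1):
        interval = times[i + 1] - times[i]
        if interval <= 0:
            raise ValueError("times must be strictly increasing")
        # Mean severity over the interval times the interval length.
        area += (severities[i] + severities[i + 1]) / 2.0 * interval
    return area


# Hypothetical assessments at days 0, 7 and 14 with percent leaf area infected.
example_audpc = audpc([0, 7, 14], [0.0, 10.0, 30.0])
```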

121

Multiplex Genotyping of Cytokine Gene SNPs Using Fluorescence Bead Array  

PubMed Central

Single nucleotide polymorphisms (SNPs) of genes that affect cytokine production and function are known to influence the susceptibility and progression of immune-related conditions such as infection, autoimmune diseases, transplantation, and cancer. We established a multiplex genotyping method to analyze the SNPs of cytokine genes by combining the multiplex PCR and bead array platform. Thirteen cytokine gene regions, including 20 SNPs, were amplified, and allele-specific primer extension was performed in a single tube. High-quality allele-specific primers were selected for signals greater than 1000 median fluorescence intensity (MFI) for positive alleles, and less than 500 MFI for negative alleles. To select and improve the extension primers, modifications for the reverse direction, length or refractory were performed. 24 primers in the forward or reverse direction step and 12 primers in length or refractory modifications were selected and showed high concordance with results by nucleotide sequencing. Among the 13 candidate cytokine genes, the SNPs of 12 cytokine genes, including IL-1α, IL-1R, IL-1RA, IL-1β, IL-2, IL-4, IL-4Rα, IL-6, IL-10, IL-12, TGF-β1, and TNF-α, were successfully defined with the selected allele-specific primers in healthy Korean subjects. Our genotyping system provides a fast and accurate detection for SNPs of multiple cytokine genes to investigate their association with immune-related diseases and transplantation outcomes. PMID:25689696

Jang, Jung-Pil; Baek, In-Cheol; Choi, Eun-Jeong; Kim, Tai-Gyu

2015-01-01

122

Association Analysis Identifies Melampsora ×columbiana Poplar Leaf Rust Resistance SNPs  

PubMed Central

Populus species are currently being domesticated through intensive time- and resource-dependent programs for utilization in phytoremediation, wood and paper products, and conversion to biofuels. Poplar leaf rust disease can greatly reduce wood volume. Genetic resistance is effective in reducing economic losses but major resistance loci have been race-specific and can be readily defeated by the pathogen. Developing durable disease resistance requires the identification of non-race-specific loci. In the presented study, area under the disease progress curve was calculated from natural infection of Melampsora ×columbiana in three consecutive years. Association analysis was performed using 412 P. trichocarpa clones genotyped with 29,355 SNPs covering 3,543 genes. We found 40 SNPs within 26 unique genes significantly associated (permutated P<0.05) with poplar rust severity. Moreover, two SNPs were repeated in all three years suggesting non-race-specificity and three additional SNPs were differentially expressed in other poplar rust interactions. These five SNPs were found in genes that have orthologs in Arabidopsis with functionality in pathogen induced transcriptome reprogramming, Ca2+/calmodulin and salicylic acid signaling, and tolerance to reactive oxygen species. The additive effect of non-R gene functional variants may constitute high levels of durable poplar leaf rust resistance. Therefore, these findings are of significance for speeding the genetic improvement of this long-lived, economically important organism. PMID:24236018

La Mantia, Jonathan; Klápště, Jaroslav; El-Kassaby, Yousry A.; Azam, Shofiul; Guy, Robert D.; Douglas, Carl J.; Mansfield, Shawn D.; Hamelin, Richard

2013-01-01

123

Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods  

Microsoft Academic Search

Missing data are often encountered in data sets used to construct software effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. The authors evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern imputation

Ingunn Myrtveit; Erik Stensrud; Ulf H. Olsson

2001-01-01

124

Imputation-Based Genomic Coverage Assessments of Current Human Genotyping Arrays  

PubMed Central

Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. “Genomic coverage” is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low density arrays in all ancestries and 50–80% in high density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compare these array-based power estimates with a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies. PMID:23979933

Nelson, Sarah C.; Doheny, Kimberly F.; Pugh, Elizabeth W.; Romm, Jane M.; Ling, Hua; Laurie, Cecelia A.; Browning, Sharon R.; Weir, Bruce S.; Laurie, Cathy C.

2013-01-01
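The coverage measure described in the abstract above, based on the correlation between imputed and observed allelic dosages, can be sketched as a per-variant squared Pearson correlation followed by a thresholded proportion. The 0.8 cutoff below is an assumption chosen for illustration (the abstract does not state the threshold used), and the function names and dosage values are hypothetical.

```python
def dosage_r2(imputed, observed):
    """Squared Pearson correlation between imputed and observed allelic dosages."""
    n = len(imputed)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two aligned samples")
    mean_i = sum(imputed) / n
    mean_o = sum(observed) / n
    cov = sum((a - mean_i) * (b - mean_o) for a, b in zip(imputed, observed))
    var_i = sum((a - mean_i) ** 2 for a in imputed)
    var_o = sum((b - mean_o) ** 2 for b in observed)
    if var_i == 0 or var_o == 0:
        return 0.0  # a monomorphic variant carries no correlation information
    return cov * cov / (var_i * var_o)


def coverage(r2_values, threshold=0.8):
    """Proportion of variants whose dosage r2 meets the (assumed) threshold."""
    if not r2_values:
        raise ValueError("no variants supplied")
    return sum(r2 >= threshold for r2 in r2_values) / len(r2_values)


# Hypothetical dosages for one variant in five samples (observed are 0, 1 or 2).
r2_example = dosage_r2([0.1, 0.9, 1.8, 0.0, 1.1], [0, 1, 2, 0, 1])
coverage_example = coverage([0.9, 0.5, 0.85, 0.79])
```

Computing the proportion separately within each ancestry group and minor allele frequency bin would mirror the stratified reporting described in the abstract.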

125

Progress toward an efficient panel of SNPs for ancestry inference.  

PubMed

Many panels of ancestry informative single nucleotide polymorphisms have been proposed in recent years for various purposes including detecting stratification in biomedical studies and determining an individual's ancestry in a forensic context. All of the panels have limitations in their generality and efficiency for routine forensic work. Some panels have used only a few populations to validate them. Some panels are based on very large numbers of SNPs thereby limiting the ability of others to test different populations. We have been working toward an efficient and globally useful panel of ancestry informative markers that is comprised of a small number of highly informative SNPs. We have developed a panel of 55 SNPs analyzed on 73 populations from around the world. We present the details of the panel and discuss its strengths and limitations. PMID:24508742

Kidd, Kenneth K; Speed, William C; Pakstis, Andrew J; Furtado, Manohar R; Fang, Rixun; Madbouly, Abeer; Maiers, Martin; Middha, Mridu; Friedlaender, Françoise R; Kidd, Judith R

2014-05-01

126

Multiple imputation of missing fMRI data in whole brain analysis  

PubMed Central

Whole brain fMRI analyses rarely include the entire brain because of missing data that result from data acquisition limits and susceptibility artifact, in particular. This missing data problem is typically addressed by omitting voxels from analysis, which may exclude brain regions that are of theoretical interest and increase the potential for Type II error at cortical boundaries or Type I error when spatial thresholds are used to establish significance. Imputation could significantly expand statistical map coverage, increase power, and enhance interpretations of fMRI results. We examined multiple imputation for group level analyses of missing fMRI data using methods that leverage the spatial information in fMRI datasets for both real and simulated data. Available case analysis, neighbor replacement, and regression based imputation approaches were compared in a general linear model framework to determine the extent to which these methods quantitatively (effect size) and qualitatively (spatial coverage) increased the sensitivity of group analyses. In both real and simulated data analysis, multiple imputation provided 1) variance that was most similar to estimates for voxels with no missing data, 2) fewer false positive errors in comparison to mean replacement, and 3) fewer false negative errors in comparison to available case analysis. Compared to the standard analysis approach of omitting voxels with missing data, imputation methods increased brain coverage in this study by 35% (from 33,323 to 45,071 voxels). In addition, multiple imputation increased the size of significant clusters by 58% and number of significant clusters across statistical thresholds, compared to the standard voxel omission approach. While neighbor replacement produced similar results, we recommend multiple imputation because it uses an informed sampling distribution to deal with missing data across subjects that can include neighbor values and other predictors. 
Multiple imputation is anticipated to be particularly useful for 1) large fMRI data sets with inconsistent missing voxels across subjects and 2) addressing the problem of increased artifact at ultra-high field, which significantly limits the extent of whole brain coverage and interpretations of results. PMID:22500925

Vaden, Kenneth I.; Gebregziabher, Mulugeta; Kuchinsky, Stefanie E.; Eckert, Mark A.

2012-01-01

127

Analysis of mitochondrial transcription factor A SNPs in alcoholic cirrhosis  

PubMed Central

Genetic susceptibility to alcoholic cirrhosis (AC) exists. We previously demonstrated hepatic mitochondrial DNA (mtDNA) damage in patients with AC compared with chronic alcoholics without cirrhosis. Mitochondrial transcription factor A (mtTFA) is central to mtDNA expression regulation and repair; however, it is unclear whether there are specific mtTFA single nucleotide polymorphisms (SNPs) in patients with AC and whether they affect mtDNA repair. In the present study, we screened mtTFA SNPs in patients with AC and analyzed their impact on the copy number of mtDNA in AC. A total of 50 patients with AC, 50 alcoholics without AC and 50 normal subjects were enrolled in the study. SNPs of full-length mtTFA were analyzed using the polymerase chain reaction (PCR) combined with gene sequencing. The hepatic mtTFA mRNA and mtDNA copy numbers were measured using quantitative PCR (qPCR), and mtTFA protein was measured using western blot analysis. A total of 18 mtTFA SNPs specific to patients with AC with frequencies >10% were identified. Two were located in the coding region and 16 were identified in non-coding regions. Conversely, there were five SNPs that were only present in patients with AC and normal subjects and had a frequency >10%. In the AC group, the hepatic mtTFA mRNA and protein levels were significantly lower than those in the other two groups. Moreover, the hepatic mtDNA copy number was significantly lower in the AC group than in the controls and alcoholics without AC. Based on these data, we conclude that AC-specific mtTFA SNPs may be responsible for the observed reductions in mtTFA mRNA, protein levels and mtDNA copy number and they may also increase the susceptibility to AC. PMID:24348767

TANG, CHUN; LIU, HONGMING; TANG, YONGLIANG; GUO, YONG; LIANG, XIANCHUN; GUO, LIPING; PI, RUXIAN; YANG, JUNTAO

2014-01-01

128

SNPs Occur in Regions with Less Genomic Sequence Conservation  

PubMed Central

Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3′ splice sites, 5′ splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the codon third position is dramatically seen when separating exonic SNP rates based on intron phase. At polyadenylation sites, a peak of conservation and low SNP rate occurs from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3′ UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation. PMID:21674007

Castle, John C.

2011-01-01

129

Distance Measures and Smoothing Methodology for Imputing Features of  

E-print Network

relationships among documents, much of it in the context of searching for and filtering pages of the World Wide Web. It includes distinctly statistical contributions; for example, work by Cutting, Karger, Pedersen, Toronto, Ontario M5S 3G3, Canada. Michael Gervers is TKKK, Department of History, University of Toronto

Feuerverger, Andrey

130

Quality assessment parameters for EST-derived SNPs from catfish  

Technology Transfer Automated Retrieval System (TEKTRAN)

Two factors were found to be most significant for validation of EST-derived SNPs: the contig size and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate although the validation rate was reasonably high when the contig sizes were equal to or larger than...

131

The distribution of SNPs in human gene regulatory regions  

Microsoft Academic Search

BACKGROUND: As a result of high-throughput genotyping methods, millions of human genetic variants have been reported in recent years. To efficiently identify those with significant biological functions, a practical strategy is to concentrate on variants located in important sequence regions such as gene regulatory regions. RESULTS: Analysis of the most common type of variant, single nucleotide polymorphisms (SNPs), shows that

Yongjian Guo; D Curtis Jamison

2005-01-01

132

SOYBEAN UNIGENES – USING SNPS TO DETERMINE THEIR GENETIC MAP POSITION  

Technology Transfer Automated Retrieval System (TEKTRAN)

Single Nucleotide Polymorphisms (SNPs) were discovered in more than 1000 soybean unigenes via resequencing. Resequencing determined the presence of polymorphism in two of the five mapping populations used to develop the current simple sequence repeat-based soybean linkage map constructed using Join...

133

Hereditary genes and SNPs associated with breast cancer.  

PubMed

Breast cancer is the most common cancer among women, affecting up to one third of them during their lifespans. Increased expression of some genes due to polymorphisms increases the risk of breast cancer incidence. Since mutations that are recognized to increase breast cancer risk within families are quite rare, identification of these SNPs is very important. The most important loci which include mutations are: BRCA1, BRCA2, PTEN, ATM, TP53, CHEK2, PPM1D, CDH1, MLH1, MRE11, MSH2, MSH6, MUTYH, NBN, PMS1, PMS2, BRIP1, RAD50, RAD51C, STK11 and BARD1. Presence of SNPs in these genes increases the risk of breast cancer, and associated diagnostic markers are among the most reliable for assessing prognosis of breast cancer. In this article we reviewed the hereditary genes of breast cancer and SNPs associated with increasing the risk of breast cancer that were recently reported from candidate gene, meta-analysis and GWAS studies. SNPs of genes associated with breast cancer can be used as a potential tool for improving cancer diagnosis and treatment planning. PMID:23886119

Mahdi, Kooshyar Mohammad; Nassiri, Mohammad Reza; Nasiri, Khadijeh

2013-01-01

134

Association analysis of candidate SNPs on reproductive traits in swine  

Technology Transfer Automated Retrieval System (TEKTRAN)

Being able to identify young females with superior reproduction traits would have a large financial impact on commercial swine producers. Previous studies have discovered SNPs associated with economically important traits such as litter size, growth rate, fat deposition, and feed intake. The objecti...

135

Genotype Error Detection and Imputation using Hidden Markov Models of Haplotype Diversity  

E-print Network

directly due to the limited coverage of current genotyping platforms, imputation of genotypes at untyped (LD) observed in the population under study. With a runtime that scales linearly both in the number locus, where K is a user-specified parameter (typically a small constant, we used K = 7 in our

Mandoiu, Ion

136

The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis  

ERIC Educational Resources Information Center

This Monte Carlo study investigates the beneficial effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

Yoo, Jin Eun

2009-01-01

137

Missing data sensitivity analysis for recurrent event data using controlled imputation.  

PubMed

Statistical analyses of recurrent event data have typically been based on the missing at random assumption. One implication of this is that, if data are collected only when patients are on their randomized treatment, the resulting de jure estimator of treatment effect corresponds to the situation in which the patients adhere to this regime throughout the study. For confirmatory analysis of clinical trials, sensitivity analyses are required to investigate alternative de facto estimands that depart from this assumption. Recent publications have described the use of multiple imputation methods based on pattern mixture models for continuous outcomes, where imputation for the missing data for one treatment arm (e.g. the active arm) is based on the statistical behaviour of outcomes in another arm (e.g. the placebo arm). This has been referred to as controlled imputation or reference-based imputation. In this paper, we use the negative multinomial distribution to apply this approach to analyses of recurrent events and other similar outcomes. The methods are illustrated by a trial in severe asthma where the primary endpoint was rate of exacerbations and the primary analysis was based on the negative binomial model. PMID:24931317

Keene, Oliver N; Roger, James H; Hartley, Benjamin F; Kenward, Michael G

2014-01-01

138

Imputation of missing genotypes from sparse to high density using long-range phasing  

Technology Transfer Automated Retrieval System (TEKTRAN)

Related individuals share potentially long chromosome segments that trace to a common ancestor. A phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations was developed to phase large sections of a chromosome. In addition to phasing, ChromoPhase imputes missing genotyp...

139

Multiple Imputation of Dental Caries Data Using a Zero Inflated Poisson Regression Model  

PubMed Central

Excess zeros exhibited by dental caries data require special attention when multiple imputation is applied to such data. Objective To demonstrate a simple technique using a zero-inflated Poisson (ZIP) regression model, to perform multiple imputation for missing caries data. Methods The technique is demonstrated using data (N=24,403) from a medical office-based preventive dental program in North Carolina, where 27.2% of children (N=6,637) were missing information on physician-identified count of carious teeth. We first estimate a ZIP regression model using the non-missing caries data (N=17,766). The coefficients from the ZIP model are then used to predict the missing caries data. Results This technique results in imputed caries counts that are similar to the non-missing caries data in their distribution, especially with respect to the excess zeros in the non-missing caries data. Conclusion This technique can be easily applied to impute missing dental caries data. PMID:20880027

Pahel, Bhavna T.; Preisser, John S.; Stearns, Sally C.; Rozier, R. Gary

2010-01-01
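
The prediction step in the record above can be sketched as follows, assuming a ZIP model has already been fitted and supplies, for each child with a missing count, a probability of a structural zero and a Poisson mean. The function names and the example predictions are hypothetical. The abstract does not say whether imputed values are random draws or expected counts; this sketch uses random draws, the usual choice for multiple imputation.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_zip(pi_zero, lam, rng):
    """Draw one count from a zero-inflated Poisson distribution.

    With probability pi_zero the count is a structural zero; otherwise
    it is drawn from Poisson(lam).
    """
    if rng.random() < pi_zero:
        return 0
    return draw_poisson(lam, rng)


def impute_counts(predictions, n_imputations=5, seed=0):
    """Return n_imputations lists of imputed counts.

    `predictions` holds one (pi_zero, lam) pair per record with a
    missing count, as predicted by an already fitted ZIP model.
    """
    rng = random.Random(seed)
    return [
        [draw_zip(pi_zero, lam, rng) for pi_zero, lam in predictions]
        for _ in range(n_imputations)
    ]


# Hypothetical predictions for three children with missing caries counts.
predicted = [(0.7, 2.5), (0.2, 1.0), (1.0, 3.0)]
imputed_sets = impute_counts(predicted)
```

Drawing from the full ZIP distribution, rather than plugging in the expected count, is what lets the imputed values reproduce the excess zeros seen in the observed data.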

140

Evaluation of an Imputed Pitch Velocity Model of the Auditory Kappa Effect  

ERIC Educational Resources Information Center

Three experiments evaluated an imputed pitch velocity model of the auditory kappa effect. Listeners heard 3-tone sequences and judged the timing of the middle (target) tone relative to the timing of the 1st and 3rd (bounding) tones. Experiment 1 held pitch constant but varied the time (T) interval between bounding tones (T = 728, 1,000, or 1,600…

Henry, Molly J.; McAuley, J. Devin

2009-01-01

141

AMERICAN JOURNAL OF INDUSTRIAL MEDICINE 49:709–718 (2006) Smoking Imputation and Lung Cancer in  

E-print Network

AMERICAN JOURNAL OF INDUSTRIAL MEDICINE 49:709­718 (2006) Smoking Imputation and Lung Cancer exhaust exposure and lung cancer mortality in a large retrospective cohort study of US railroad workers­1996. Mortality analyses incorporated the effect of smoking on lung cancer risk. Results The smoking adjusted

Reid, Nancy

2006-01-01

142

Time series outlier detection and imputation Hermine N. Akouemo and Richard J. Povinelli  

E-print Network

Time series outlier detection and imputation Hermine N. Akouemo and Richard J. Povinelli of outliers in time series data. An autoregressive integrated moving average with exogenous inputs (ARIMAX) model is used to extract the characteristics of the time series and to find the residuals. The outliers

Povinelli, Richard J.

143

Generating Multiple Imputations for Matrix Sampling Data Analyzed with Item Response Models.  

ERIC Educational Resources Information Center

Describes and assesses missing data methods currently used to analyze data from matrix sampling designs implemented by the National Assessment of Educational Progress. Several improved methods are developed, and these models are evaluated using an EM algorithm to obtain maximum likelihood estimates followed by multiple imputation of complete data…

Thomas, Neal; Gan, Nianci

1997-01-01

144

Missing value estimation for DNA microarray gene expression data: local least squares imputation  

Microsoft Academic Search

Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures

Hyunsoo Kim; Gene H. Golub; Haesun Park

2005-01-01

145

ANALYSIS OF AN ADAPTIVE ITERATIVE LEARNING ALGORITHM FOR FREEWAY RAMP FLOW IMPUTATION  

E-print Network

ANALYSIS OF AN ADAPTIVE ITERATIVE LEARNING ALGORITHM FOR FREEWAY RAMP FLOW IMPUTATION Ajith is specified using link parameters (represented by a fundamental diagram of traffic flow), input demands (along: horowitz@berkeley.edu ABSTRACT We present an adaptive iterative learning based flow imputation algorithm

Horowitz, Roberto

146

Symmetric smoothing filters from global consistency constraints.  

PubMed

Many patch-based image denoising methods can be viewed as data-dependent smoothing filters that carry out a weighted averaging of similar pixels. It has recently been argued that these averaging filters can be improved using their doubly stochastic approximation, which are symmetric and stable smoothing operators. In this paper, we introduce a simple principle of consistency that argues that the relative similarities between pixels as imputed by the averaging matrix should be preserved in the filtered output. The resultant consistency filter has the theoretically desirable properties of being symmetric and stable, and is a generalized doubly stochastic matrix. In addition, we can also interpret our consistency filter as a specific form of Laplacian regularization. Thus, our approach unifies two strands of image denoising methods, i.e., symmetric smoothing filters and spectral graph theory. Our consistency filter provides high-quality image denoising and significantly outperforms the doubly stochastic version. We present a thorough analysis of the properties of our proposed consistency filter and compare its performance with that of other significant methods for image denoising in the literature. PMID:25532176

Haque, Sheikh Mohammadul; Pai, Gautam P; Govindu, Venu Madhav

2015-05-01

147

A NONPARAMETRIC MULTIPLE IMPUTATION APPROACH FOR DATA WITH MISSING COVARIATE VALUES WITH APPLICATION TO COLORECTAL ADENOMA DATA  

PubMed Central

A nearest neighbor-based multiple imputation approach is proposed to recover missing covariate information using the predictive covariates while estimating the association between the outcome and the covariates. To conduct the imputation, two working models are fitted to define an imputing set. This approach is expected to be robust to the underlying distribution of the data. We show in simulation and demonstrate on a colorectal data set that the proposed approach can improve efficiency and reduce bias in a situation with missing at random compared to the complete case analysis and the modified inverse probability weighted method. PMID:24697618

Hsu, Chiu-Hsieh; Long, Qi; Li, Yisheng; Jacobs, Elizabeth

2015-01-01

148

Comparison of SNPs and microsatellites in identifying offtypes of cacao clones from Cameroon  

Technology Transfer Automated Retrieval System (TEKTRAN)

Single Nucleotide Polymorphism (SNP) markers are increasingly being used in crop breeding programs, slowly replacing microsatellites and other markers. SNPs provide many benefits over microsatellites, including ease of analysis and unambiguous results across various platforms. We compare SNPs to m...

149

SNP-VISTA: An Interactive SNPs Visualization Tool  

SciTech Connect

Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L.

2005-07-05

150

Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage.  

PubMed

The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). 
Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while allowing timely incorporation of empirical data (e.g., annual forest inventory). PMID:23305341

Wilson, Barry Tyler; Woodall, Christopher W; Griffith, Douglas M

2013-01-01

151

Identity-by-descent graphs offer a flexible framework for imputation and both linkage and association analyses  

PubMed Central

We demonstrate the flexibility of identity-by-descent (IBD) graphs for genotype imputation and testing relationships between genotype and phenotype. We analyzed chromosome 3 and the first replicate of simulated diastolic blood pressure. IBD graphs were obtained from complete pedigrees and full multipoint marker analysis, facilitating subsequent linkage and other analyses. For rare alleles, pedigree-based imputation using these IBD graphs had a higher call rate than did population-based imputation. Combining the two approaches improved call rates for common alleles. We found it advantageous to incorporate known, rather than estimated, pedigree relationships when testing for association. Replacing missing data with imputed alleles improved association signals as well. Analyses were performed with knowledge of the underlying model. PMID:25519371

2014-01-01

152

Accounting for Dependence Induced by Weighted KNN Imputation in Paired Samples, Motivated by a Colorectal Cancer Study  

PubMed Central

Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control. PMID:25849489

Suyundikov, Anvar; Stevens, John R.; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K.; Slattery, Martha L.

2015-01-01
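
A minimal sketch of weighted KNN imputation may help show where the dependence discussed in the record above comes from: each imputed value is a weighted average of values observed in other samples, so imputed and donor samples are no longer independent. Inverse-distance weights, Euclidean distance, and all names and toy values below are assumptions for illustration; the abstract does not specify the weighting scheme used in the study.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_knn_impute(target_features, donors, k=3, eps=1e-9):
    """Impute one missing value from the k nearest donors.

    `donors` is a list of (features, observed_value) pairs. Weights are
    inverse distances (an assumed scheme); eps avoids division by zero
    when a donor matches the target exactly.
    """
    if not donors:
        raise ValueError("at least one donor is required")
    ranked = sorted(donors, key=lambda d: euclidean(target_features, d[0]))
    nearest = ranked[:k]
    weights = [1.0 / (euclidean(target_features, f) + eps) for f, _ in nearest]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total


# Toy donors: (tumor expression profile, matched normal expression value).
donors = [
    ([1.0, 2.0], 10.0),
    ([1.0, 4.0], 20.0),
    ([9.0, 9.0], 50.0),
]
imputed = weighted_knn_impute([1.0, 3.0], donors, k=2)
```

Here the two nearest donors are equidistant from the target, so the imputed value is simply their average. Because the same donors can contribute to many imputed values, a downstream paired test that treats all samples as independent would misstate the variance.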

153

7 CFR 3017.630 - May the Department of Agriculture impute conduct of one person to another?  

Code of Federal Regulations, 2010 CFR

7 Agriculture 15 2010-01-01 2010-01-01 false May the Department of Agriculture impute conduct of one person to another? 3017.630 Section 3017.630 Agriculture Regulations of the Department of Agriculture...

2010-01-01

154

A FRET-based analysis of SNPs without fluorescent probes  

PubMed Central

Fluorescence resonance energy transfer (FRET) is a simple procedure for detecting specific DNA sequences, and is therefore used in many fields. However, the cost is relatively high, because FRET-based methods usually require fluorescent probes. We have designed a cost-effective way of using FRET, and developed a novel approach for the genotyping of single nucleotide polymorphisms (SNPs) and allele frequency estimation. The key feature of this method is that it uses a DNA-binding fluorogenic molecule, SYBR Green I, as an energy donor for FRET. In this method, single base extension is performed with dideoxynucleotides labeled with an orange dye and a red dye in the presence of SYBR Green I. The dyes incorporated into the extended products accept energy from SYBR Green I and emit fluorescence. We have validated the method with ten SNPs, which were successfully discriminated by end-point measurements of orange and red fluorescence intensity in a microplate fluorescence reader. Using a mixture of homozygous samples, we also confirmed the potential of this method for estimation of allele frequency. Application of this strategy to large-scale studies will reduce the time and cost of genotyping a vast number of SNPs. PMID:15534363

Takatsu, Kyoko; Yokomaku, Toyokazu; Kurata, Shinya; Kanagawa, Takahiro

2004-01-01

155

Mapping Insertions, Deletions and SNPs on Venter's Chromosomes  

PubMed Central

Background The very recent availability of fully sequenced individual human genomes is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements. Results We mapped the insertions, deletions and SNPs (single nucleotide polymorphisms) that are present in Craig Venter's genome, more precisely on chromosomes 17 to 22, and compared them with the human reference genome hg17. Our results show that insertions and deletions are almost absent in L1 and generally scarce in L2 isochore families (GC-poor L1+L2 isochores represent slightly over half of the human genome), whereas they increase in GC-rich isochores, largely paralleling the densities of genes, retroviral integrations and Alu sequences. The distributions of insertions/deletions are in striking contrast with those of SNPs which exhibit almost the same density across all isochore families with, however, a trend for lower concentrations in gene-rich regions. Conclusions Our study strongly suggests that the distribution of insertions/deletions is due to the structure of chromatin which is mostly open in gene-rich, GC-rich isochores, and largely closed in gene-poor, GC-poor isochores. The different distributions of insertions/deletions and SNPs are clearly related to the two different responsible mechanisms, namely recombination and point mutations. PMID:19543403

Costantini, Maria; Bernardi, Giorgio

2009-01-01

156

The estimation and use of predictions for the assessment of model performance using large samples with multiply imputed data  

E-print Network

in data with missing covariate values present challenges. We (Wood et al., 2008; White et al., 2011) and others (Ambler et al., 2007; Vergouwe et al., 2010; Carpenter and Kenward, 2013) have previously described multiple imputation methods to deal... (Marshall et al., 2009). For example, M model performance measures can be estimated from the imputation-specific predictions and then pooled using Rubin’s rules, as previously recommended (Marshall et al., 2009; Vergouwe et al., 2010; White et al., 2011...

Wood, Angela M.; Royston, Patrick; White, Ian R.

2015-01-01
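
The pooling step mentioned in the record above can be sketched with Rubin's rules: the pooled estimate is the mean of the M imputation-specific estimates, and the total variance is the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance. The function name and the example values are hypothetical, and the C-index is only an assumed example of a performance measure.

```python
from math import sqrt


def pool_rubin(estimates, variances):
    """Pool M imputation-specific estimates with Rubin's rules.

    `variances` holds the squared standard error of each estimate.
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical C-index estimates and squared standard errors, M = 4.
estimates = [0.70, 0.72, 0.71, 0.73]
variances = [0.0004, 0.0004, 0.0004, 0.0004]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
```

The pooled standard error exceeds the single-imputation value of 0.02 because it also reflects how much the estimates differ across imputations.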

157

Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees.  

PubMed

In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here. PMID:25132070

Saad, Mohamad; Wijsman, Ellen M

2014-11-01

158

TTF-1 and RET promoter SNPs: regulation of RET transcription in Hirschsprung's disease  

Microsoft Academic Search

Single nucleotide polymorphisms (SNPs) of the coding regions of receptor tyrosine kinase gene (RET) are associated with Hirschsprung's disease (HSCR, aganglionic megacolon). These SNPs, individually or combined, may act as a low penetrance susceptibility locus and/or be in linkage disequilibrium (LD) with another susceptibility locus located in RET regulatory regions. Because two RET promoter SNPs have been found

Raymond W. Ganster; Vincent C. H. Lui; Thomas Y. Y. Leon; Man-Ting So; Anson M. F. Lau; Ming Fu; Mai-Har Sham; Joanne Knight; Maria Stella Zannini; Pak C. Sham; Paul K. H. Tam

2005-01-01

159

Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti  

PubMed Central

Background Genetic markers are widely used to understand the biology and population dynamics of disease vectors, but often markers are limited in the resolution they provide. In particular, the delineation of population structure, fine scale movement and patterns of relatedness are often obscured unless numerous markers are available. To address this issue in the major arbovirus vector, the yellow fever mosquito (Aedes aegypti), we used double digest Restriction-site Associated DNA (ddRAD) sequencing for the discovery of genome-wide single nucleotide polymorphisms (SNPs). We aimed to characterize the new SNP set and to test the resolution against previously described microsatellite markers in detecting broad and fine-scale genetic patterns in Ae. aegypti. Results We developed bioinformatics tools that support the customization of restriction enzyme-based protocols for SNP discovery. We showed that our approach for RAD library construction achieves unbiased genome representation that reflects true evolutionary processes. In Ae. aegypti samples from three continents we identified more than 18,000 putative SNPs. They were widely distributed across the three Ae. aegypti chromosomes, with 47.9% found in intergenic regions and 17.8% in exons of over 2,300 genes. Pattern of their imputed effects in ORFs and UTRs were consistent with those found in a recent transcriptome study. We demonstrated that individual mosquitoes from Indonesia, Australia, Vietnam and Brazil can be assigned with a very high degree of confidence to their region of origin using a large SNP panel. We also showed that familial relatedness of samples from a 0.4 km2 area could be confidently established with a subset of SNPs. Conclusions Using a cost-effective customized RAD sequencing approach supported by our bioinformatics tools, we characterized over 18,000 SNPs in field samples of the dengue fever mosquito Ae. aegypti. The variants were annotated and positioned onto the three Ae. aegypti chromosomes. 
The new SNP set provided much greater resolution in detecting population structure and estimating fine-scale relatedness than a set of polymorphic microsatellites. RAD-based markers demonstrate great potential to advance our understanding of mosquito population processes, critical for implementing new control measures against this major disease vector. PMID:24726019

2014-01-01

160

Comparison of Results from Different Imputation Techniques for Missing Data from an Anti-Obesity Drug Trial  

PubMed Central

Background In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, the ITT analysis requires that missing outcome data have to be imputed. Different imputation techniques may give different results and some may lead to bias. In anti-obesity drug trials, many data are usually missing, and the most used imputation method is last observation carried forward (LOCF). LOCF is generally considered conservative, but there are more reliable methods such as multiple imputation (MI). Objectives To compare four different methods of handling missing data in a 60-week placebo controlled anti-obesity drug trial on topiramate. Methods We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: LOCF, baseline carried forward (BOCF) and MI. Results 561 participants were randomised. Compared to placebo, there was a significantly greater weight loss with topiramate in all analyses: 9.5 kg (SE 1.17) in the complete case analysis (N = 86), 6.8 kg (SE 0.66) using LOCF (N = 561), 6.4 kg (SE 0.90) using MI (N = 561) and 1.5 kg (SE 0.28) using BOCF (N = 561). Conclusions The different imputation methods gave very different results. Contrary to widely stated claims, LOCF did not produce a conservative (i.e., lower) efficacy estimate compared to MI. Also, LOCF had a lower SE than MI. PMID:25409438

Jørgensen, Anders W.; Lundstrøm, Lars H.; Wetterslev, Jørn; Astrup, Arne; Gøtzsche, Peter C.

2014-01-01
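
To make the contrast between the two carried-forward methods in the record above concrete, here is a minimal Python sketch of LOCF and BOCF applied to one participant's weight series. The function names `locf` and `bocf` and the example weights are hypothetical and are not taken from the trial. Multiple imputation is not shown, because the record does not describe its imputation model.

```python
from typing import List, Optional


def locf(weights: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: fill each gap with the most recent observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for w in weights:
        if w is not None:
            last = w
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


def bocf(weights: List[Optional[float]]) -> List[Optional[float]]:
    """Baseline observation carried forward: fill every gap with the first (baseline) value."""
    baseline = weights[0]
    return [baseline if w is None else w for w in weights]


# Hypothetical participant: baseline 100 kg, drops out after the second follow-up visit.
visits = [100.0, 96.0, 93.0, None, None]
print(locf(visits))  # [100.0, 96.0, 93.0, 93.0, 93.0]
print(bocf(visits))  # [100.0, 96.0, 93.0, 100.0, 100.0]
```

In this made-up series LOCF treats the dropout as having kept a 7 kg loss, while BOCF credits no loss at all. That difference may help explain why BOCF gave the smallest estimate in the record.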

161

Comparison of multiple imputation and complete-case in a simulated longitudinal data with missing covariate  

NASA Astrophysics Data System (ADS)

In any continual data-collection process, missing records are a major problem in real applications. They arise from the recorder's carelessness or lack of awareness of the importance of data documentation. In this study, a random-effects analysis with a missing covariate is presented, using data simulated from a proposed algorithm. It is an improved simulation method that involves a first-order autoregressive (AR(1)) process in measuring the correlation between measurements of a subject across two time sequences. Complete-case analysis and multiple imputation are compared as estimation procedures. The study shows that multiple imputation yields estimates that fit the data well not only when data are missing completely at random (MCAR) but also when they are missing at random (MAR). Complete-case analysis, however, yields estimators that fit the data well only when data are MCAR.

Yoke, Chin Wan; Khalid, Zarina Mohd

2014-07-01
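
The record above does not give its simulation algorithm, so the following is only a generic Python sketch of how a stationary AR(1) series can be generated for one subject. The names `simulate_ar1`, `phi`, `sigma` and `seed` are illustrative assumptions, not the authors' notation.

```python
import random
from typing import List


def simulate_ar1(n: int, phi: float, sigma: float = 1.0, seed: int = 0) -> List[float]:
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2), for |phi| < 1."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1 for a stationary series")
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution N(0, sigma^2 / (1 - phi^2)).
    x = rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    series = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series


# Ten repeated measurements for one hypothetical subject.
measurements = simulate_ar1(n=10, phi=0.6, seed=1)
print(len(measurements))  # 10
```

Under this model the correlation between two measurements k steps apart is phi raised to the power k, which is what makes AR(1) a convenient way to build within-subject correlation into simulated longitudinal data.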

162

Inference from Multiple Imputation for Missing Data Using Mixtures of Normals  

PubMed Central

We consider two difficulties with standard multiple imputation methods for missing data based on Rubin's t method for confidence intervals: their often excessive width, and their instability. These problems are present most often when the number of copies is small, as is often the case when a data collection organization is making multiple completed datasets available for analysis. We suggest using mixtures of normals as an alternative to Rubin's t. We also examine the performance of improper imputation methods as an alternative to generating copies from the true posterior distribution for the missing observations. We report the results of simulation studies and analyses of data on health-related quality of life in which the methods suggested here gave narrower confidence intervals and more stable inferences, especially with small numbers of copies or non-normal posterior distributions of parameter estimates. A free R software package called MImix that implements our methods is available from CRAN. PMID:20454634

Steele, Russell J.; Wang, Naisyin; Raftery, Adrian E.

2010-01-01
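
For context on the baseline this record argues against, here is a minimal Python sketch of the standard Rubin combining rules that lead to the t-based interval. It does not implement the mixture-of-normals alternative and does not use the MImix package. The function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool m completed-data estimates; return (pooled estimate, total variance, degrees of freedom)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1.0 + 1.0 / m) * b   # total variance
    if b == 0:
        df = float("inf")         # no between-imputation variability
    else:
        df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, t, df


# Three hypothetical completed datasets.
q_bar, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, round(t, 4), round(df, 5))  # 2.0 1.8333 3.78125
```

With only three copies the degrees of freedom are below 4, so the t quantile is large and the interval is wide. This is the small-number-of-copies situation the record describes.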

163

Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys.  

PubMed

Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally-representative cohort surveys. PMID:24223447

Rendall, Michael S; Ghosh-Dastidar, Bonnie; Weden, Margaret M; Baker, Elizabeth H; Nazarov, Zafar

2013-11-01

164

Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys  

PubMed Central

Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally-representative cohort surveys. PMID:24223447

Rendall, Michael S.; Ghosh-Dastidar, Bonnie; Weden, Margaret M.; Baker, Elizabeth H.; Nazarov, Zafar

2013-01-01

165

Normalization and missing value imputation for label-free LC-MS analysis  

SciTech Connect

Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

Karpievitch, Yuliya; Dabney, Alan R.; Smith, Richard D.

2012-11-05

166

Imputing historical statistics, soils information, and other land-use data to crop area  

NASA Technical Reports Server (NTRS)

In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map that each represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

1982-01-01

167

Missing data imputation of solar radiation data under different atmospheric conditions.  

PubMed

Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37%, compared with 28.19% for MLR and 31.68% for IDW. PMID:25356644

Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

2014-01-01
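
Of the methods compared in the record above, Inverse Distance Weighting is the simplest to illustrate. The Python sketch below estimates a missing reading at one station from its neighbours and scores predictions with a plain RMSE. The function names, the planar coordinates and the `power` parameter are assumptions for illustration. The record does not say how its RMSE was normalised to a percentage, so the helper returns the unnormalised value.

```python
import math
from typing import List, Sequence, Tuple

Station = Tuple[Tuple[float, float], float]  # ((x, y), observed value)


def idw_estimate(target: Tuple[float, float], stations: Sequence[Station], power: float = 2.0) -> float:
    """Weighted mean of neighbouring values with weights 1 / distance**power."""
    num = 0.0
    den = 0.0
    for (x, y), value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # a station at the target location determines the value
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den


def rmse(predicted: List[float], observed: List[float]) -> float:
    """Root mean squared error between paired predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical neighbours at distances 1 and 2 from the sensor with the missing reading.
neighbours = [((1.0, 0.0), 100.0), ((2.0, 0.0), 200.0)]
print(idw_estimate((0.0, 0.0), neighbours))  # 120.0
```

The nearer station gets four times the weight of the farther one here because the weights fall off with the square of the distance.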

168

Missing value imputation on missing completely at random data using multilayer perceptrons.  

PubMed

Data mining is based on data files which usually contain errors in the form of missing values. This paper focuses on a methodological framework for the development of an automated data imputation model based on artificial neural networks. Fifteen real and simulated data sets are exposed to a perturbation experiment, based on the random generation of missing values. These data set sizes range from 47 to 1389 records. A perturbation experiment was performed for each data set where the probability of missing value was set to 0.05. Several architectures and learning algorithms for the multilayer perceptron are tested and compared with three classic imputation procedures: mean/mode imputation, regression and hot-deck. The obtained results, considering different performance measures, not only suggest this approach improves the quality of a database with missing values, but also the best results are clearly obtained using the Multilayer Perceptron model in data sets with categorical variables. Three learning rules (Levenberg-Marquardt, BFGS Quasi-Newton and Conjugate Gradient Fletcher-Reeves Update) and a small number of hidden nodes are recommended. PMID:20875726

Silva-Ramírez, Esther-Lydia; Pino-Mejías, Rafael; López-Coello, Manuel; Cubiles-de-la-Vega, María-Dolores

2011-01-01
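
The perturbation experiment in the record above can be sketched in a few lines of Python: values are deleted independently with a fixed probability (MCAR), then filled in with one of the classic baselines the record names, mean imputation. The function names `mask_mcar` and `mean_impute` and the `seed` parameter are hypothetical. The multilayer perceptron imputer itself is not shown, because the record gives no architecture details beyond the recommended learning rules.

```python
import random
from statistics import mean
from typing import List, Optional


def mask_mcar(values: List[float], p: float = 0.05, seed: int = 0) -> List[Optional[float]]:
    """Delete each value independently with probability p (missing completely at random)."""
    rng = random.Random(seed)
    return [None if rng.random() < p else v for v in values]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical numeric column; 0.05 is the missingness probability used in the record.
column = [float(i) for i in range(100)]
perturbed = mask_mcar(column, p=0.05, seed=42)
completed = mean_impute(perturbed)
print(len(completed), all(v is not None for v in completed))  # 100 True
```

Because the original column is known, the imputed entries can be compared with the deleted true values, which is what makes this kind of perturbation experiment useful for ranking imputation methods.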

169

Imputation of Microsatellite Alleles from Dense SNP Genotypes for Parental Verification  

PubMed Central

Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing four dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS- to SNP-based parental verification. PMID:22912645

McClure, Matthew; Sonstegard, Tad; Wiggans, George; Van Tassell, Curtis P

2012-01-01

170

Impact of non-normal random effects on inference by multiple imputation: A simulation assessment  

PubMed Central

Multivariate extensions of well-known linear mixed-effects models have been increasingly utilized in inference by multiple imputation in the analysis of multilevel incomplete data. The normality assumption for the underlying error terms and random effects plays a crucial role in simulating the posterior predictive distribution from which the multiple imputations are drawn. The plausibility of this normality assumption on the subject-specific random effects is assessed. Specifically, the performance of multiple imputation created under a multivariate linear mixed-effects model is investigated on a diverse set of incomplete data sets simulated under varying distributional characteristics. Under moderate amounts of missing data, the simulation study confirms that the underlying model leads to a well-calibrated procedure with negligible biases and actual coverage rates close to nominal rates in estimates of the regression coefficients. Estimation quality of the random-effect variance and association measures, however, is negatively affected by both the misspecification of the random-effect distribution and the number of incompletely-observed variables. Some of the adverse impacts include lower coverage rates and increased biases. PMID:20526424

Yucel, Recai M.; Demirtas, Hakan

2010-01-01

171

Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions  

PubMed Central

Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37%, compared with 28.19% for MLR and 31.68% for IDW. PMID:25356644

Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

2014-01-01

172

Tracing Cattle Breeds with Principal Components Analysis Ancestry Informative SNPs  

PubMed Central

The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 Single Nucleotide Polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genomewide datasets, we present a decision-tree which can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250-500 carefully selected SNPs suffice in order to achieve close to 100% prediction accuracy of individual ancestry, when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that is becoming increasingly available, have the potential to become a valuable tool and have considerable impact in worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products. PMID:21490966

Lewis, Jamey; Abas, Zafiris; Dadousis, Christos; Lykidis, Dimitrios; Paschou, Peristera; Drineas, Petros

2011-01-01

173

Combined sequence and sequence-structure-based methods for analyzing RAAS gene SNPs: a computational approach.  

PubMed

The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations in the genes that encode components of the RAAS have played a significant role in genetic susceptibility to hypertension and have been intensively scrutinized. The identification of such probably causal mutations not only provides insight into the RAAS; the mutations may also serve as antihypertensive therapeutic targets and diagnostic markers. Analyzing the huge dataset of SNPs, which contains both functional and neutral SNPs, is challenging because an experimental approach would have to test every SNP to determine its biological significance. To explore the functional significance of genetic mutations (SNPs), we adopted a combined sequence and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region and the remainder in the non-coding region. In this study, we report SNPs in the coding region as deleterious only when three or more tools predict them to be deleterious and they have a high RMSD from the native structure. Based on these analyses, we identified two SNPs of the REN gene, eight SNPs of the AGT gene, three SNPs of the ACE gene, two SNPs of the AT1R gene, three SNPs of the CYP11B2 gene and three SNPs of the CMA1 gene in the coding region as deleterious. This type of study will help reduce the cost and time needed to identify potential SNPs and will also help in selecting potential SNPs for experimental study from the SNP pool. PMID:24878201

Singh, Kh Dhanachandra; Karthikeyan, Muthusamy

2014-12-01

174

Transcriptome analysis of the gill of Takifugu rubripes using Illumina sequencing for discovery of SNPs.  

PubMed

Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies in many species. High-throughput sequencing of RNA was developed primarily to analyze global gene expression, but it is also an efficient way to discover SNPs in expressed genes. In this study, we conducted transcriptome sequencing of gill samples of Takifugu rubripes using the Illumina HiSeq 2000 platform to identify gene-associated SNPs. A total of 27,085,235 unique-mapped-reads from 55,061,524 raw data reads were generated. A total of 56,972 putative SNPs were discovered, which were located in 11,327 genes. 35,839 SNPs were transitions (Ts), 21,074 SNPs were transversions (Tv) and 88.1% of 56,972 SNPs were assigned to the 22 chromosomes. The average minor allele frequency (MAF) of the SNPs was 0.26. GO and KEGG pathway analyses were conducted to analyze the genes containing SNPs. Validation of selected SNPs revealed that 63.4% of SNPs (34/52) were true SNPs. RNA-Seq is a cost-effective way to discover gene-associated SNPs. In this study, a large number of SNPs were identified and these data will be useful resources for population genetic study, evolution analysis, resource assessment, genetic linkage analysis and genome-wide association studies. The results of our study can also offer some useful information as molecular markers to help select and cultivate T. rubripes. PMID:24747987

Cui, Jun; Wang, Hongdi; Liu, Shikai; Qiu, Xuemei; Jiang, Zhiqiang; Wang, Xiuli

2014-06-01
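
The transition/transversion split reported in the record above follows from a simple rule: a substitution is a transition when both alleles are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The Python sketch below encodes that rule and computes the Ts/Tv ratio from the counts given in the abstract. The function names are hypothetical and say nothing about the authors' actual pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    return (ref in PURINES) == (alt in PURINES)


def ts_tv_ratio(n_transitions: int, n_transversions: int) -> float:
    """Ratio of transition count to transversion count."""
    return n_transitions / n_transversions


print(is_transition("A", "G"))  # True
print(is_transition("A", "C"))  # False
# Counts as stated in the abstract: 35,839 transitions and 21,074 transversions.
print(round(ts_tv_ratio(35839, 21074), 2))  # 1.7
```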

175

Purposeful Variable Selection and Stratification to Impute Missing FAST Data in Trauma Research  

PubMed Central

Background The Focused Assessment with Sonography for Trauma (FAST) exam is an important variable in many retrospective trauma studies. The purpose of this study was to devise an imputation method to overcome missing data for the FAST exam. Due to variability in patients’ injuries and trauma care, these data are unlikely to be missing completely at random (MCAR), raising concern for validity when analyses exclude patients with missing values. Methods Imputation was conducted under a less restrictive, more plausible missing at random (MAR) assumption. Patients with missing FAST exams had available data on alternate, clinically relevant elements that were strongly associated with FAST results in complete cases, especially when considered jointly. Subjects with missing data (32.7%) were divided into eight mutually exclusive groups based on selected variables that both described the injury and were associated with missing FAST values. Additional variables were selected within each group to classify missing FAST values as positive or negative, and correct FAST exam classification based on these variables was determined for patients with non-missing FAST values. Results Severe head/neck injury (odds ratio, OR=2.04), severe extremity injury (OR=4.03), severe abdominal injury (OR=1.94), no injury (OR=1.94), other abdominal injury (OR=0.47), other head/neck injury (OR=0.57) and other extremity injury (OR=0.45) groups had significant ORs for missing data; the other group odds ratio was not significant (OR=0.84). All 407 missing FAST values were imputed, with 109 classified as positive. Correct classification of non-missing FAST results using the alternate variables was 87.2%. Conclusions Purposeful imputation for missing FAST exams based on interactions among selected variables assessed by simple stratification may be a useful adjunct to sensitivity analysis in the evaluation of imputation strategies under different missing data mechanisms. 
This approach has the potential for widespread application in clinical and translational research and validation is warranted. Level of Evidence Level II Prognostic or Epidemiological PMID:23778515

Fuchs, Paul A.; del Junco, Deborah J.; Fox, Erin E.; Holcomb, John B.; Rahbar, Mohammad H.; Wade, Charles A.; Alarcon, Louis H.; Brasel, Karen J.; Bulger, Eileen M.; Cohen, Mitchell J.; Myers, John G.; Muskat, Peter; Phelan, Herb A.; Schreiber, Martin A.; Cotton, Bryan A.

2013-01-01

176

Lazy collaborative filtering for data sets with missing values.  

PubMed

As one of the biggest challenges in research on recommender systems, the data sparsity issue is mainly caused by the fact that users tend to rate a small proportion of items from the huge number of available items. This issue becomes even more problematic for the neighborhood-based collaborative filtering (CF) methods, as there are even lower numbers of ratings available in the neighborhood of the query item. In this paper, we aim to address the data sparsity issue in the context of neighborhood-based CF. For a given query (user, item), a set of key ratings is first identified by taking the historical information of both the user and the item into account. Then, an auto-adaptive imputation (AutAI) method is proposed to impute the missing values in the set of key ratings. We present a theoretical analysis to show that the proposed imputation method effectively improves the performance of the conventional neighborhood-based CF methods. The experimental results show that our new method of CF with AutAI outperforms six existing recommendation methods in terms of accuracy. PMID:23757575

Ren, Yongli; Li, Gang; Zhang, Jun; Zhou, Wanlei

2013-12-01

177

Disk filter  

DOEpatents

An electric disk filter provides a high efficiency at high temperature. A hollow outer filter of fibrous stainless steel forms the ground electrode. A refractory filter material is placed between the outer electrode and the inner electrically isolated high voltage electrode. Air flows through the outer filter surfaces through the electrified refractory filter media and between the high voltage electrodes and is removed from a space in the high voltage electrode.

Bergman, W.

1985-01-09

178

The operating regimes and basic control principles of SNPS "Topaz". [Cs  

SciTech Connect

The basic operating regimes of the space nuclear power system (SNPS) "Topaz" are considered. These regimes include prelaunch preparation and launch into working orbit, SNPS start-up to obtain the desired electric power, the nominal regime, and SNPS shutdown. The main requirements for the SNPS in the different regimes are given, and the control algorithms that meet these requirements are described. The control algorithms were chosen on the basis of theoretical studies and ground power tests of the SNPS prototypes. The successful ground and flight tests of "Topaz" lead to the conclusion that, for an SNPS of this type, the most expedient control algorithm in the start-up regime is one that provides the required thermal state of the cesium vapor supply system and excludes any possibility of discharge processes in current-conducting elements. In the nominal regime, the required electric power should be provided by maintaining the reactor current and using a fast-acting voltage regulator. A limit on the outlet coolant temperature should also be provided.

Makarov, A.N.; Volberg, M.S.; Grayznov, G.M.; Zhabotinsky, E.E.; Serbin, V.I. (Scientific Production Unification "Krasnaya Zvezda" USSR, Moscow 115230 (SU))

1991-01-05

179

Tools, resources and databases for SNPs and indels in sequences: a review.  

PubMed

A single nucleotide polymorphism (SNP) is a mutation in which a single base in the DNA differs from the usual base at that position. SNPs are the marker of choice in genetic analysis and are also useful in locating genes associated with diseases. SNPs are important and frequently occurring point mutations in genomes and have many practical implications. In the post-genomic era, in silico methods make it easy to study the SNPs that occur in known genomes or sequences of a species of interest. There are many online and stand-alone tools to analyse SNPs. We intend to guide the reader through software details such as algorithmic background, file requirements, operating system specificity and species specificity, if any, for SNP detection tools in plants and animals. We also list many databases and resources available today that describe SNPs in a wide range of organisms. PMID:24794070

Seal, Abhik; Gupta, Arun; Mahalaxmi, M; Aykkal, Riju; Singh, Tiratha Raj; Arunachalam, Vadivel

2014-01-01

180

Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants  

Microsoft Academic Search

We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North

Paul R Burton; David G Clayton; Nick Craddock; Panos Deloukas; Audrey Duncanson; Dominic P Kwiatkowski; Mark I McCarthy; Willem H Ouwehand; Nilesh J Samani; John A Todd; Jeffrey C Barrett; Dan Davison; Peter Donnelly; Doug Easton; Hin-Tak Leung; Jonathan L Marchini; Andrew P Morris; Chris CA Spencer; Martin D Tobin; Antony P Attwood; James P Boorman; Barbara Cant; Ursula Everson; Judith M Hussey; Jennifer D Jolley; Alexandra S Knight; Kerstin Koch; Elizabeth Meech; Sarah Nutland; Christopher V Prowse; Helen E Stevens; Niall C Taylor; Graham R Walters; Neil M Walker; Nicholas A Watkins; Thilo Winzer; Richard W Jones; Wendy L McArdle; Susan M Ring; David P Strachan; Marcus Pembrey; Gerome Breen; David St Clair; Sian Caesar; Katharine Gordon-Smith; Lisa Jones; Christine Fraser; Elaine K Green; Detelina Grozeva; Marian L Hamshere; Peter A Holmans; Ian R Jones; George Kirov; Valentina Moskivina; Ivan Nikolov; Michael C O'Donovan; Michael J Owen; David A Collier; Amanda Elkin; Anne Farmer; Richard Williamson; Peter McGuffin; Allan H Young; I Nicol Ferrier; Stephen G Ball; Anthony J Balmforth; Jennifer H Barrett; Timothy D Bishop; Mark M Iles; Azhar Maqbool; Nadira Yuldasheva; Alistair S Hall; Peter S Braund; Richard J Dixon; Massimo Mangino; Suzanne Stevens; John R Thompson; Francesca Bredin; Mark Tremelling; Miles Parkes; Hazel Drummond; Charles W Lees; Elaine R Nimmo; Jack Satsangi; Sheila A Fisher; Alastair Forbes; Cathryn M Lewis; Clive M Onnie; Natalie J Prescott; Jeremy Sanderson; Christopher G Matthew; Jamie Barbour; M Khalid Mohiuddin; Catherine E Todhunter; John C Mansfield; Tariq Ahmad; Fraser R Cummings; Derek P Jewell; John Webster; Morris J Brown; Mark G Lathrop; John Connell; Anna Dominiczak; Carolina A Braga Marcano; Beverley Burke; Richard Dobson; Johannie Gungadoo; Kate L Lee; Patricia B Munroe; Stephen J Newhouse; Abiodun Onipinla; Chris Wallace; Mingzhan Xue; Mark Caulfield; Martin Farrall; Anne Barton; Ian N Bruce; Hannah Donovan; Steve Eyre; Paul D 
Gilbert; Samantha L Hilder; Anne M Hinks; Sally L John; Catherine Potter; Alan J Silman; Deborah PM Symmons; Wendy Thomson; Jane Worthington; David B Dunger; Barry Widmer; Timothy M Frayling; Rachel M Freathy; Hana Lango; John R B Perry; Beverley M Shields; Michael N Weedon; Andrew T Hattersley; Graham A Hitman; Mark Walker; Kate S Elliott; Christopher J Groves; Cecilia M Lindgren; Nigel W Rayner; Nicolas J Timpson; Eleftheria Zeggini; Melanie Newport; Giorgio Sirugo; Emily Lyons; Fredrik Vannberg; Adrian V S Hill; Linda A Bradbury; Claire Farrar; Jennifer J Pointon; Paul Wordsworth; Matthew A Brown; Jayne A Franklyn; Joanne M Heward; Matthew J Simmonds; Stephen CL Gough; Sheila Seal; Michael R Stratton; Nazneen Rahman; Maria Ban; An Goris; Stephen J Sawcer; Alastair Compston; David Conway; Muminatou Jallow; Kirk A Rockett; Suzannah J Bumpstead; Amy Chaney; Kate Downes; Mohammed JR Ghori; Rhian Gwilliam; Sarah E Hunt; Michael Inouye; Andrew Keniry; Emma King; Ralph McGinnis; Simon Potter; Rathi Ravindrarajah; Pamela Whittaker; Claire Widden; David Withers; Niall J Cardin; Teresa Ferreira; Joanne Pereira-Gale; Ingeleif B Hallgrimsdóttir; Bryan N Howie; Zhan Su; Yik Ying Teo; Damjan Vukcevic; David Bentley; Sarah L Mitchell; Paul R Newby; Oliver J Brand; Jackie Carr-Smith; Simon H S Pearce; Stephen C L Gough; John D Reveille; Xiaodong Zhou; Anne-Marie Sims; Alison Dowling; Jacqueline Taylor; Tracy Doan; John C Davis; Laurie Savage; Michael M Ward; Thomas L Learch; Michael H Weisman; Lon R Cardon; David M Evans

2007-01-01

181

Evaluating model-based imputation methods for missing covariates in regression models with interactions.  

PubMed

Imputation strategies are widely used in settings that involve inference with incomplete data. However, implementation of a particular approach always rests on assumptions, and subtle distinctions between methods can have an impact on subsequent analyses. In this research article, we are concerned with regression models in which the true underlying relationship includes interaction terms. We focus in particular on a linear model with one fully observed continuous predictor, a second partially observed continuous predictor, and their interaction. We derive the conditional distribution of the missing covariate and interaction term given the observed covariate and the outcome variable, and examine the performance of a multiple imputation procedure based on this distribution. We also investigate several alternative procedures that can be implemented by adapting multivariate normal multiple imputation software in ways that might be expected to perform well despite incompatibilities between model assumptions and true underlying relationships among the variables. The methods are compared in terms of bias, coverage, and CI width. As expected, the procedure based on the correct conditional distribution performs well across all scenarios. Just as importantly for general practitioners, several of the approaches based on multivariate normality perform comparably with the correct conditional distribution in a number of circumstances, although interestingly, procedures that seek to preserve the multiplicative relationship between the interaction term and the main-effects are found to be substantially less reliable. For illustration, the various procedures are applied to an analysis of post-traumatic stress disorder symptoms in a study of childhood trauma. Copyright © 2015 John Wiley & Sons, Ltd. PMID:25630757

Kim, Soeun; Sugar, Catherine A; Belin, Thomas R

2015-05-20

182

Imputation Techniques Using SAS Software For Incomplete Data In Diabetes Clinical Trials  

Microsoft Academic Search

ABSTRACT Missing data are common in clinical trials. In longitudinal studies, missing data are mostly related to drop-outs. Some drop-outs appear completely at random. The source for other drop-outs is withdrawal from trials due to lack of efficacy. For the latter case the standard analysis of the actual observed data produces bias. An attractive approach to avoid this problem is to impute (i.e. fill in) the missing data. This paper is

M. Khutoryansky; Won-chin Huang

183

A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes  

USGS Publications Warehouse

Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogeneous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. 
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.

Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

2012-01-01

184

Detection of recombination events, haplotype reconstruction and imputation of sires using half-sib SNP genotypes  

PubMed Central

Background Identifying recombination events and the chromosomal segments that constitute a gamete is useful for a number of applications in genomic analyses. In livestock, genotypic data are commonly available for half-sib families. We propose a straightforward but computationally efficient method to use single nucleotide polymorphism marker genotypes on half-sibs to reconstruct the recombination and segregation events that occurred during meiosis in a sire to form the haplotypes observed in its offspring. These meiosis events determine a block structure in paternal haplotypes of the progeny and this can be used to phase the genotypes of individuals in single half-sib families, to impute haplotypes of the sire if they are not genotyped or to impute the paternal strand of the offspring’s sequence based on sequence data of the sire. Methods The hsphase algorithm exploits information from opposing homozygotes among half-sibs to identify recombination events, and the chromosomal regions from the paternal and maternal strands of the sire (blocks) that were inherited by its progeny. This information is then used to impute the sire’s genotype, which, in turn, is used to phase the half-sib family. Accuracy (defined as R2) and performance of this approach were evaluated by using simulated and real datasets. Phasing results for the half-sibs were benchmarked to other commonly used phasing programs – AlphaPhase, BEAGLE and PedPhase 3. Results Using a simulated dataset with 20 markers per cM, and for a half-sib family size of 4 and 40, the accuracy of block detection, was 0.58 and 0.96, respectively. The accuracy of inferring sire genotypes was 0.75 and 1.00 and the accuracy of phasing was around 0.97, respectively. hsphase was more robust to genotyping errors than PedPhase 3, AlphaPhase and BEAGLE. Computationally, hsphase was much faster than AlphaPhase and BEAGLE. 
Conclusions In half-sib families of size 8 and above, hsphase can accurately detect block structure of paternal haplotypes, impute genotypes of ungenotyped sires and reconstruct haplotypes in progeny. The method is much faster and more accurate than other widely used population-based phasing programs. A program implementing the method is freely available as an R package (hsphase). PMID:24495596

2014-01-01
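
The opposing-homozygote idea mentioned in the abstract above can be illustrated with a minimal sketch. If one half-sib is AA and another is BB at a marker, the shared sire must have passed A to one and B to the other, so the sire is heterozygous there. This is not the hsphase implementation: the function name and the 0/1/2 genotype coding are assumptions made for illustration, and the sketch assumes a correct pedigree and no genotyping errors.

```python
def sire_heterozygous_markers(genotypes):
    """Return marker indices where at least two half-sibs are opposing homozygotes.

    genotypes: one list per half-sib, coded 0 (AA), 1 (AB), 2 (BB); None = missing.
    Hypothetical helper for illustration, not part of the hsphase package.
    """
    n_markers = len(genotypes[0])
    heterozygous = []
    for m in range(n_markers):
        calls = {animal[m] for animal in genotypes}
        # Both homozygous classes observed among offspring of one sire:
        # the sire must carry both alleles at this marker.
        if 0 in calls and 2 in calls:
            heterozygous.append(m)
    return heterozygous


# Toy family of three half-sibs genotyped at four markers.
half_sibs = [
    [0, 1, 2, 0],
    [2, 1, 2, 1],
    [0, 0, 2, 2],
]
het_markers = sire_heterozygous_markers(half_sibs)
```

Markers where no opposing homozygotes occur are simply uninformative in this sketch; they do not imply that the sire is homozygous.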

185

Multiple imputation methods for multivariate one-sided tests with missing data.  

PubMed

Multivariate one-sided hypotheses testing problems arise frequently in practice. Various tests have been developed. In practice, there are often missing values in multivariate data. In this case, standard testing procedures based on complete data may not be applicable or may perform poorly if the missing data are discarded. In this article, we propose several multiple imputation methods for multivariate one-sided testing problem with missing data. Some theoretical results are presented. The proposed methods are evaluated using simulations. A real data example is presented to illustrate the methods. PMID:21466531

Wang, Tao; Wu, Lang

2011-12-01

186

Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values.  

PubMed

Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. Most standard prediction methods cannot handle incomplete samples, so missing data imputation is widely applied to work around this limitation. Consequently, taking into account the characteristics of each breast cancer dataset, a detailed analysis is required to determine the most appropriate imputation and prediction methods in each clinical environment. This research work analyzes a real breast cancer dataset from the Portuguese Institute of Oncology of Porto with a high percentage of unknown categorical information (most clinical data of the patients are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation and 5-year survival prediction from cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure and, according to the obtained results, the best results are provided by the K-Nearest Neighbors algorithm: an accuracy above 81% and an area under the Receiver Operating Characteristic curve above 0.78, which are very good results in this complex scenario. PMID:25725446

García-Laencina, Pedro J; Abreu, Pedro Henriques; Abreu, Miguel Henriques; Afonoso, Noémia

2015-04-01
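
Of the imputation scenarios listed in the abstract above, mode imputation is the simplest to show. The sketch below is a generic illustration for one categorical column, not the authors' pipeline; the function name, the `None` encoding of unknown values, and the example feature are assumptions.

```python
from collections import Counter


def mode_impute(column, missing=None):
    """Replace missing entries with the most frequent observed value.

    Hypothetical helper: `missing` is the sentinel marking unknown entries.
    """
    observed = [value for value in column if value is not missing]
    if not observed:
        # Nothing observed, so there is no mode to fill in.
        return list(column)
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if value is missing else value for value in column]


# Hypothetical categorical feature with unknown entries encoded as None.
grade = ["II", None, "III", "II", None, "I"]
imputed_grade = mode_impute(grade)
```

In a cross-validated evaluation such as the one the abstract describes, the mode would normally be learned on the training folds only and then applied to the held-out fold, to avoid leaking information.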

187

Imputation of systematically missing predictors in an individual participant data meta-analysis: a generalized approach using MICE.  

PubMed

Individual participant data meta-analyses (IPD-MA) are increasingly used for developing and validating multivariable (diagnostic or prognostic) risk prediction models. Unfortunately, some predictors or even outcomes may not have been measured in each study and are thus systematically missing in some individual studies of the IPD-MA. As a consequence, it is no longer possible to evaluate between-study heterogeneity and to estimate study-specific predictor effects, or to include all individual studies, which severely hampers the development and validation of prediction models. Here, we describe a novel approach for imputing systematically missing data and adopt a generalized linear mixed model to allow for between-study heterogeneity. This approach can be viewed as an extension of Resche-Rigon's method (Stat Med 2013), relaxing their assumptions regarding variance components and allowing imputation of linear and nonlinear predictors. We illustrate our approach using a case study with IPD-MA of 13 studies to develop and validate a diagnostic prediction model for the presence of deep venous thrombosis. We compare the results after applying four methods for dealing with systematically missing predictors in one or more individual studies: complete case analysis where studies with systematically missing predictors are removed, traditional multiple imputation ignoring heterogeneity across studies, stratified multiple imputation accounting for heterogeneity in predictor prevalence, and multilevel multiple imputation (MLMI) fully accounting for between-study heterogeneity. We conclude that MLMI may substantially improve the estimation of between-study heterogeneity parameters and allow for imputation of systematically missing predictors in IPD-MA aimed at the development and validation of prediction models. Copyright © 2015 John Wiley & Sons, Ltd. PMID:25663182

Jolani, Shahab; Debray, Thomas P A; Koffijberg, Hendrik; van Buuren, Stef; Moons, Karel G M

2015-05-20

188

De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes  

PubMed Central

Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions Before availability of pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. These and other information have been curated in a database that we have dedicated for pepper project. PMID:23110314

2012-01-01

189

Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.  

PubMed

Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. PMID:25345392

Chaurasia, Ashok; Harel, Ofer

2015-02-10
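
For orientation, the standard complete-data relationship between a partial F-statistic and the coefficients of determination of two nested linear models can be sketched as follows. This is the textbook formula for a single complete dataset only. How the paper combines such quantities across multiply imputed datasets is not described in the abstract and is not attempted here; the function and parameter names are assumptions.

```python
def partial_f(r2_full, r2_reduced, n, p_full, q):
    """Partial F-statistic from the R^2 of nested linear models (complete data).

    r2_full    : R^2 of the full model
    r2_reduced : R^2 of the reduced model (q coefficients set to zero)
    n          : number of observations
    p_full     : number of predictors in the full model, excluding the intercept
    q          : number of coefficients being tested
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    df_denominator = n - p_full - 1
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / df_denominator
    return numerator / denominator


# Hypothetical example: dropping one of two predictors lowers R^2 from 0.5 to 0.4.
f_stat = partial_f(r2_full=0.5, r2_reduced=0.4, n=103, p_full=2, q=1)
```

Under the null hypothesis the statistic is compared with an F distribution with q and n - p_full - 1 degrees of freedom; setting r2_reduced to 0 and q to p_full gives the global F-test.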

190

High quality SNPs/Indels mining and characterization in ginger from ESTs data base  

PubMed Central

Ginger (Zingiber officinale Rosc.) is an important herb of the family Zingiberaceae. It is accepted as a universal cure for a multitude of diseases in Indian systems of medicine, and its rhizomes are equally popular as a spice ingredient throughout Asia. SNPs, the definitive genetic markers, representing the finest resolution of a DNA sequence, are abundantly found in populations having a lower rate of mutation and are used for genomic analysis. Public EST sequences mostly lack quality files, which makes detection of high-quality SNPs more difficult because it must rely exclusively on sequence comparisons. In the present study, the current dbEST database of NCBI was mined, and 38115 ginger EST sequences were obtained and assembled into contigs using the CAP3 program. The software tool QualitySNP was then used to detect 11523 potential SNP sites, 8810 high-quality SNPs and 1008 indel polymorphisms, with a frequency of 1.61 SNPs / 10 kbp. Among the EST libraries generated from three ginger tissues, rhizomes had a frequency of 0.32 SNPs and 0.03 indels per 10 kbp, leaves had a frequency of 2.51 SNPs and 0.23 indels per 10 kbp, and roots had a frequency of 0.76 SNPs and 0.02 indels per 10 kbp. The present analysis provides additional information about the tissue-wise presence of haplotypes (222), the distribution of high-quality exonic (2355) and intronic (6455) SNPs, and singletons (7538), in addition to the transition-to-transversion ratio in contigs (0.57). Across the SNPs detected in all tissues, transversions outnumber transitions. The high-quality SNPs detected in this work can be used as markers for further ginger genetic experiments.

Gaur, Mahendra; Das, Aradhana; Subudhi, Enketeswara

2015-01-01
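
The summary statistics quoted in the abstract above (transition-to-transversion ratio, SNPs per 10 kbp) follow from simple definitions, sketched below. This is a generic illustration, not the QualitySNP workflow; the function names and the toy SNP list are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Label a substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    # Raises ZeroDivisionError if the input contains no transversions.
    return counts["transition"] / counts["transversion"]


def snps_per_10kbp(n_snps, total_bp):
    """SNP frequency normalised to 10,000 base pairs of sequence."""
    return n_snps / total_bp * 10_000


# Toy input: 2 transitions and 5 transversions.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"),
            ("A", "T"), ("C", "G"), ("G", "C")]
ratio = ts_tv_ratio(toy_snps)
```

A ratio below 1, such as the 0.57 reported in the abstract, means transversions outnumber transitions.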

191

Biological Filters.  

ERIC Educational Resources Information Center

Presents the 1978 literature review of wastewater treatment. The review is concerned with biological filters, and it covers: (1) trickling filters; (2) rotating biological contactors; and (3) miscellaneous reactors. A list of 14 references is also presented. (HM)

Klemetson, S. L.

1978-01-01

192

SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects  

PubMed Central

Background High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. Results In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing softwares with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indels events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. 
It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Conclusions Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/. PMID:21545712

2011-01-01
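
Two of the diversity indices named in the abstract above, Pi and Watterson's Theta, can be computed from aligned sequences as in the sketch below. It uses the standard definitions and is not SNiPlay code; the function names and the toy alignment are assumptions, and gaps or missing bases are not handled.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct base (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Pi as the average number of differences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


# Toy alignment of four equal-length sequences.
alignment = ["ACGT", "ACGA", "ATGA", "ACGT"]
pi = mean_pairwise_differences(alignment)
theta_w = watterson_theta(alignment)
```

Both values are per-locus here; dividing by the alignment length gives per-site estimates. Tajima's D is built from the difference between these two estimators, scaled by an estimate of its standard deviation.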

193

A multiple imputation approach for clustered interval-censored survival data.  

PubMed

Multivariate interval-censored failure time data arise commonly in many studies of epidemiology and biomedicine. Analysis of this type of data is more challenging than analysis of right-censored data. We propose a simple multiple imputation strategy to recover the order of occurrences based on the interval-censored event times using a conditional predictive distribution function derived from a parametric gamma random effects model. By imputing the interval-censored failure times, the estimation of the regression and dependence parameters in the context of a gamma frailty proportional hazards model using the well-developed EM algorithm is made possible. A robust estimator for the covariance matrix is suggested to adjust for the possible misspecification of the parametric baseline hazard function. The finite sample properties of the proposed method are investigated via simulation. The performance of the proposed method is highly satisfactory, whereas the computation burden is minimal. The proposed method is also applied to the diabetic retinopathy study (DRS) data for illustration purposes and the estimates are compared with those based on other existing methods for bivariate grouped survival data. PMID:20069624

Lam, K F; Xu, Ying; Cheung, Tak-Lun

2010-03-15

194

Kidney Filtering  

NSDL National Science Digital Library

In this activity, students filter different substances through a plastic window screen, different sized hardware cloth and poultry netting. Their model shows how the thickness of a filter in the kidney is imperative in deciding what will be filtered out and what will stay within the blood stream.

2014-09-18

195

Human Identification by Genotyping Single Nucleotide Polymorphisms (SNPs) Using an APEX Microarray  

Microsoft Academic Search

Interest in utilizing single nucleotide polymorphisms (SNPs) for genetic research is rapidly increasing. Nuclear SNPs are typically biallelic loci that segregate in a Mendelian fashion and are the most common type of genetic variation in humans. They are eminently suited to a broad range of applications that include genome mapping, pharmacogenomics, and genotyping for disease diagnostics and genetic identity. By

Lisa D. White; John M. Shumaker; Jeffrey J. Tollett; Rick W. Staub

196

The role of complementary bipartite visual analytical representations in the analysis of SNPs: a case study  

E-print Network

-nucleotide polymorphisms (SNPs) can help to classify subjects on the basis of their continental origins, with applications. This variation, resulting from millennia of natural selection and random drift, is coded in ~20-30 million specific diseases and SNPs that are highly associated with continental origins. For example, several

Bhavnani, Suresh K.

197

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and  

E-print Network

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden

McCarroll, Steve

198

PATHOTYPING OF SALMONELLA ENTERICA BY ANALYSIS OF SNPS IN CYAA AND FLANKING 23S RIBOSOMAL SEQUENCES  

Technology Transfer Automated Retrieval System (TEKTRAN)

The egg-contaminating phenotype of Salmonella enterica serotype Enteritidis was linked to single-nucleotide polymorphisms (SNPs) occurring in cyaA, which encodes adenylate cyclase that produces cAMP and pyrophosphate from ATP. Ribotyping indicated that SNPs in cyaA were linked to polymorphisms occur...

199

A Critical Review of Strategies for Selecting Haplotype Tag SNPs Ruby Lee  

E-print Network

the block with the maximum m:n ratio, where m is the total number of SNPs in the block, and n is the number of htSNPs in the block, i.e. the minimal number on a given chromosome can be inherited in blocks of haplotypes, and when

200

Thermal state of SNPS "Topaz" units: Calculation basing and experimental confirmation  

SciTech Connect

Keeping the thermal state parameters of thermionic space nuclear power system (SNPS) units within required limits in all operating regimes is a factor that determines SNPS lifetime. The thermal state requirements differ markedly between units, and meeting them requires both an appropriate arrangement of the units in the SNPS power generating module and the use of specific control algorithms and special thermal regulation and protection. Computer codes that determine the thermal transient performance of the liquid metal loop and the main units were developed as the calculation basis for the required thermal state of the SNPS "Topaz" units. The conformity of these parameters to the given requirements is confirmed by the results of autonomous unit tests, tests of mock-ups, power tests of ground SNPS prototypes and flight tests of two SNPS "Topaz".

Bogush, I.P.; Bushinsky, A.V.; Galkin, A.Y.; Serbin, V.I.; Zhabotinsky, E.E. (Scientific-Production Unification "Krasnaya Zvezda" USSR Moscow 115230 (SU))

1991-01-01

201

A SNP Resource for Human Chromosome 22: Extracting Dense Clusters of SNPs From the Genomic Sequence  

PubMed Central

The recent publication of the complete sequence of human chromosome 22 provides a platform from which to investigate genomic sequence variation. We report the identification and characterization of 12,267 potential variants (SNPs and other small insertions/deletions) of human chromosome 22, discovered in the overlaps of 460 clones used for the chromosome sequencing. We found, on average, 1 potential variant every 1.07 kb and approximately 18% of the potential variants involve insertions/deletions. The SNPs have been positioned both relative to each other, and to genes, predicted genes, repeat sequences, other genetic markers, and the 2730 SNPs previously identified on the chromosome. A subset of the SNPs were verified experimentally using either PCR–RFLP or genomic Invader assays. These experiments confirmed 92% of the potential variants in a panel of 92 individuals. [Details of the SNPs and RFLP assays can be found at http://www.sanger.ac.uk and in dbSNP.] PMID:11156626

Dawson, Elisabeth; Chen, Yuan; Hunt, Sarah; Smink, Luc J.; Hunt, Adrienne; Rice, Kate; Livingston, Simon; Bumpstead, Suzannah; Bruskiewich, Richard; Sham, Pak; Ganske, Rocky; Adams, Mark; Kawasaki, Kazuhiko; Shimizu, Nobuyoshi; Minoshima, Shinsei; Roe, Bruce; Bentley, David; Dunham, Ian

2001-01-01

202

Defining, Evaluating, and Removing Bias Induced by Linear Imputation in Longitudinal Clinical Trials with MNAR Missing Data  

PubMed Central

Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of “biased data”, which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X′X)⁻¹X′E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X′X)⁻¹X′E[Y], including for example, a parameter representing a “treatment effect”. These parameters depend explicitly on E[Y], which raises the questions: what is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is “completed” via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β̂) = E[β̂] − β, Bias(θ̂) = E[θ̂] − Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include “Last Observation Carried Forward” (LOCF) and “Baseline Observation Carried Forward” (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical, but very realistic longitudinal analgesic clinical trial. PMID:21390998

Helms, Ronald W.; Helms-Reece, Laura; Helms, Russell W.; Helms, Mary W.

2011-01-01
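
The two linear imputation rules named in the abstract above, LOCF and BOCF, can be sketched for a single subject's visit sequence as follows. This is a minimal illustration of the rules themselves, not of the paper's bias evaluation; the function names, the `None` encoding of post-dropout visits, and the pain scores are assumptions.

```python
def locf(visits):
    """Last Observation Carried Forward: fill each gap with the latest observed value."""
    filled, last_seen = [], None
    for value in visits:
        if value is not None:
            last_seen = value
        # Stays None only if nothing has been observed yet.
        filled.append(last_seen)
    return filled


def bocf(visits):
    """Baseline Observation Carried Forward: fill each gap with the first (baseline) value."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]


# Hypothetical pain scores for one subject who drops out after the third visit.
pain_scores = [7.0, 5.5, 4.0, None, None]
locf_scores = locf(pain_scores)
bocf_scores = bocf(pain_scores)
```

Each imputed value is a copy of an earlier observation from the same subject, which is why such rules count as linear imputation. Here LOCF assumes the subject's improvement persists after dropout, while BOCF assumes a full return to baseline.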

203

A multiple imputation approach to the analysis of interval-censored failure time data with the additive hazards model  

Microsoft Academic Search

This paper discusses regression analysis of interval-censored failure time data, which occur in many fields including demographical, epidemiological, financial, medical, and sociological studies. For the problem, we focus on the situation where the survival time of interest can be described by the additive hazards model and a multiple imputation approach is presented for inference. A major advantage of the approach

Ling Chen; Jianguo Sun

2010-01-01

204

Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.  

PubMed

Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)^(-1)X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)^(-1)X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β̂) = E[β̂] - β, Bias(θ̂) = E[θ̂] - Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial. PMID:21390998

Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

2011-03-01

205

SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data  

SciTech Connect

Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.; Loots, Gabriela G.; Houston, Kathryn A.; Dubchak, Inna; Speed, Terence P.; Rubin, Edward M.

2002-01-01

206

Domain Altering SNPs in the Human Proteome and Their Impact on Signaling Pathways  

PubMed Central

Single nucleotide polymorphisms (SNPs) constitute an important mode of genetic variations observed in the human genome. A small fraction of SNPs, about four thousand out of the ten million, has been associated with genetic disorders and complex diseases. The present study focuses on SNPs that fall on protein domains, 3D structures that facilitate connectivity of proteins in cell signaling and metabolic pathways. We scanned the human proteome using the PROSITE web tool and identified proteins with SNP containing domains. We showed that SNPs that fall on protein domains are highly statistically enriched among SNPs linked to hereditary disorders and complex diseases. Proteins whose domains are dramatically altered by the presence of an SNP are even more likely to be present among proteins linked to hereditary disorders. Proteins with domain-altering SNPs comprise highly connected nodes in cellular pathways such as the focal adhesion, the axon guidance pathway and the autoimmune disease pathways. Statistical enrichment of domain/motif signatures in interacting protein pairs indicates extensive loss of connectivity of cell signaling pathways due to domain-altering SNPs, potentially leading to hereditary disorders. PMID:20886114

Liu, Yichuan; Tozeren, Aydin

2010-01-01

207

Population Genomic Analyses Based on 1 Million SNPs in Commercial Egg Layers  

PubMed Central

Identifying signatures of selection can provide valuable insight about the genes or genomic regions that are or have been under selective pressure, which can lead to a better understanding of genotype-phenotype relationships. A common strategy for selection signature detection is to compare samples from several populations and search for genomic regions with outstanding genetic differentiation. Wright's fixation index, FST, is a useful index for evaluation of genetic differentiation between populations. The aim of this study was to detect selective signatures between different chicken groups based on SNP-wise FST calculation. A total of 96 individuals of three commercial layer breeds and 14 non-commercial fancy breeds were genotyped with three different 600K SNP-chips. After filtering, a total of 1 million SNPs were available for FST calculation. Averages of FST values were calculated for overlapping windows. Comparisons of these were then conducted between commercial egg layers and non-commercial fancy breeds, as well as between white egg layers and brown egg layers. Comparing non-commercial and commercial breeds resulted in the detection of 630 selective signatures, while 656 selective signatures were detected in the comparison between the commercial egg-layer breeds. Annotation of selection signature regions revealed various genes corresponding to production traits, for which layer breeds were selected. Among them were NCOA1, SREBF2 and RALGAPA1 associated with reproductive traits, broodiness and egg production. Furthermore, several of the detected genes were associated with growth and carcass traits, including POMC, PRKAB2, SPP1, IGF2, CAPN1, TGFb2 and IGFBP2. Our approach demonstrates that including different populations with a specific breeding history can provide a unique opportunity for a better understanding of farm animal selection. PMID:24739889

Gholami, Mahmood; Erbe, Malena; Gärke, Christian; Preisinger, Rudolf; Weigend, Annett; Weigend, Steffen; Simianer, Henner

2014-01-01
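
The SNP-wise FST calculation and overlapping-window averaging described in the abstract above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes Wright's basic estimator FST = (HT - HS)/HT for two equally weighted populations, and the function names, window size, and step are hypothetical. The abstract does not state which estimator or window settings were used.

```python
def snp_fst(p1, p2):
    """Wright's FST for one biallelic SNP, given its allele frequency in two populations."""
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        # SNP is monomorphic overall; treat differentiation as zero (assumption).
        return 0.0
    return (h_t - h_s) / h_t


def windowed_mean(values, size, step):
    """Average `values` over overlapping windows of `size` SNPs, advancing by `step` SNPs."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, step)
    ]
```

Choosing `step` smaller than `size` makes consecutive windows overlap, which smooths the per-SNP values before regions of high differentiation are compared between groups.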

208

Moment Adjusted Imputation for Multivariate Measurement Error Data with Applications to Logistic Regression  

PubMed Central

In clinical studies, covariates are often measured with error due to biological fluctuations, device error and other sources. Summary statistics and regression models that are based on mismeasured data will differ from the corresponding analysis based on the “true” covariate. Statistical analysis can be adjusted for measurement error; however, various methods exhibit a tradeoff between convenience and performance. Moment Adjusted Imputation (MAI) is a method for measurement error in a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well. PMID:24072947

Thomas, Laine; Stefanski, Leonard A.; Davidian, Marie

2013-01-01

209

Assessing assay agreement estimation for multiple left-censored data: a multiple imputation approach.  

PubMed

Agreement between two assays is usually based on the concordance correlation coefficient (CCC), estimated from the means, standard deviations, and correlation coefficient of these assays. However, such data will often suffer from left-censoring because of lower limits of detection of these assays. To handle such data, we propose to extend a multiple imputation approach by chained equations (MICE) developed in a close setting of one left-censored assay. The performance of this two-step approach is compared with that of a previously published maximum likelihood estimation through a simulation study. Results show close estimates of the CCC by both methods, although the coverage is improved by our MICE proposal. An application to cytomegalovirus quantification data is provided. PMID:25292387

Lapidus, Nathanael; Chevret, Sylvie; Resche-Rigon, Matthieu

2014-12-30
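
The abstract above says the concordance correlation coefficient is estimated from the means, standard deviations, and correlation coefficient of the two assays. A minimal sketch of that formula (Lin's CCC) is given below. The function name is hypothetical, and the sketch deliberately ignores left-censoring, which is the problem the MICE approach in the abstract is meant to address.

```python
def concordance_correlation(mean_x, mean_y, sd_x, sd_y, rho):
    """Lin's concordance correlation coefficient from summary statistics of two assays."""
    numerator = 2.0 * rho * sd_x * sd_y
    # The squared mean difference penalises systematic shift between the assays.
    denominator = sd_x ** 2 + sd_y ** 2 + (mean_x - mean_y) ** 2
    return numerator / denominator
```

With multiple imputation, one plausible workflow would be to compute this quantity on each completed dataset and then pool the results; the abstract does not spell out the pooling step.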

210

Missing Data Analysis Using Multiple Imputation: Getting to the Heart of the Matter  

PubMed Central

Missing data are a pervasive problem in health investigations. We describe some background of missing data analysis and criticize ad-hoc methods which are prone to serious problems. We then focus on multiple imputation, in which missing cases are first filled in by several sets of plausible values to create multiple completed datasets, then standard complete-data procedures are applied to each completed dataset, and finally the multiple sets of results are combined to yield a single inference. We introduce the basic concepts and general methodology, and provide some guidance for application. For illustration, we use a study assessing the effect of cardiovascular diseases on hospice discussion for late stage lung cancer patients. PMID:20123676

He, Yulei

2010-01-01
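
The final step described in the abstract above, combining the results from the completed datasets into a single inference, can be sketched as follows. The abstract does not name the combining rule; this sketch assumes Rubin's rules, the usual choice, and the function name is hypothetical.

```python
def pool_rubin(estimates, variances):
    """Pool per-imputation point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    # Pooled point estimate: mean of the per-imputation estimates.
    q_bar = sum(estimates) / m
    # Within-imputation variance: mean of the per-imputation variances.
    within = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance adds a finite-m correction to the between component.
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total
```

For example, three imputations with estimates 1.0, 2.0, 3.0 and variance 0.5 each give a pooled estimate of 2.0; the total variance exceeds 0.5 because the estimates disagree across imputations.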

211

Filtering apparatus  

DOEpatents

A vertical vessel having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas.

Haldipur, Gaurang B. (Monroeville, PA); Dilmore, William J. (Murrysville, PA)

1992-01-01

212

Filtering apparatus  

DOEpatents

A vertical vessel is described having a lower inlet and an upper outlet enclosure separated by a main horizontal tube sheet. The inlet enclosure receives the flue gas from a boiler of a power system and the outlet enclosure supplies cleaned gas to the turbines. The inlet enclosure contains a plurality of particulate-removing clusters, each having a plurality of filter units. Each filter unit includes a filter clean-gas chamber defined by a plate and a perforated auxiliary tube sheet with filter tubes suspended from each tube sheet and a tube connected to each chamber for passing cleaned gas to the outlet enclosure. The clusters are suspended from the main tube sheet with their filter units extending vertically and the filter tubes passing through the tube sheet and opening in the outlet enclosure. The flue gas is circulated about the outside surfaces of the filter tubes and the particulate is absorbed in the pores of the filter tubes. Pulses to clean the filter tubes are passed through their inner holes through tubes free of bends which are aligned with the tubes that pass the clean gas. 18 figs.

Haldipur, G.B.; Dilmore, W.J.

1992-09-01

213

Confidence intervals after multiple imputation: combining profile likelihood information from logistic regressions.  

PubMed

In the logistic regression analysis of a small-sized, case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients were distributed normally. Yet, rarely is this assumption tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CI may not be reliable. Thus, RR alternatives have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first order biases and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF_c, allowing confidence limits for a parameter β at level 1 - α to be identified as those β* and β** that satisfy CDF_c(β*) = α/2 and CDF_c(β**) = 1 - α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. PMID:23873477

Heinze, Georg; Ploner, Meinhard; Beyea, Jan

2013-12-20
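
The averaging-and-inversion step described in the abstract above can be illustrated with a short sketch. It is not the logistf implementation: as a stand-in for the penalized-profile-likelihood CDFs, it assumes each imputation contributes a normal CDF defined by a hypothetical (estimate, standard error) pair. Only the combination step follows the abstract: average the CDFs, then find the points where the combined CDF equals α/2 and 1 - α/2.

```python
import math


def normal_cdf(x, mean, se):
    """Normal CDF, used here as a stand-in for a per-imputation posterior CDF."""
    return 0.5 * (1.0 + math.erf((x - mean) / (se * math.sqrt(2.0))))


def combined_cdf(x, fits):
    """Average the per-imputation CDFs at x; `fits` is a list of (mean, se) pairs."""
    return sum(normal_cdf(x, m, s) for m, s in fits) / len(fits)


def invert_cdf(target, fits, tol=1e-10):
    """Solve combined_cdf(x) = target by bisection (the combined CDF is monotone)."""
    lo = min(m - 10.0 * s for m, s in fits)
    hi = max(m + 10.0 * s for m, s in fits)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if combined_cdf(mid, fits) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def confidence_limits(fits, alpha=0.05):
    """Lower and upper limits where the combined CDF equals alpha/2 and 1 - alpha/2."""
    return invert_cdf(alpha / 2.0, fits), invert_cdf(1.0 - alpha / 2.0, fits)
```

With a single fit of (0.0, 1.0) the limits reduce to roughly -1.96 and 1.96. When the per-imputation CDFs are skewed or disagree, the combined limits need not be symmetric about the pooled estimate, which is the property the abstract contrasts with Rubin's rules.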

214

Modeling Filter Bypass: Impact on Filter Efficiency  

Microsoft Academic Search

Current models and test methods for determining filter efficiency ignore filter bypass, the air that circumvents filter media because of gaps around the filter or filter housing. In this paper, we develop a general model to estimate the size-resolved particle removal efficiency, including bypass, of HVAC filters. The model applies the measured pressure drop of the filter to determine the

Matthew Ward; Jeffrey Siegel

215

Filtering Light  

NSDL National Science Digital Library

Students learn how CCD cameras use color filters to create astronomical images in this Moveable Museum unit. The four-page PDF guide includes suggested general background readings for educators, activity notes, and step-by-step directions. Students look at black-and-white photos to understand gray scale and construct simple red and green cellophane filters and observe magazine images through them.

216

Submicron filter  

Microsoft Academic Search

Aluminum hydroxide fibers approximately 2 nanometers in diameter and with surface areas ranging from 200 to 650 m²/g have been found to be highly electropositive. When dispersed in water they are able to attach to and retain electronegative particles. When combined into a composite filter with other fibers or particles they can filter bacteria and nano size particulates such as

Frederick Tepper; Leonid Kaledin

2009-01-01

217

Water Filter  

NSDL National Science Digital Library

In this engineering activity, challenge learners to invent a water filter that cleans dirty water. Learners construct a filter device out of a 2-liter bottle and then experiment with different materials like gravel, sand, and cotton balls to see which is the most effective.
Safety note: An adult's help is needed for this activity.

WGBH Boston

2002-01-01

218

Imputation of the Rare HOXB13 G84E Mutation and Cancer Risk in a Large Population-Based Cohort  

PubMed Central

An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show here that, by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers, we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from the California Men’s Health Study specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers, and this difference increased with age. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²).
The G84E mutation was also suggestively associated with an increase in risk for the following cancer sites by approximately 50% in a pleiotropic manner: breast, non-Hodgkin’s lymphoma, kidney, bladder, melanoma, endometrium, and pancreas (p = 0.042). PMID:25629170

Hoffmann, Thomas J.; Sakoda, Lori C.; Shen, Ling; Jorgenson, Eric; Habel, Laurel A.; Liu, Jinghua; Kvale, Mark N.; Asgari, Maryam M.; Banda, Yambazi; Corley, Douglas; Kushi, Lawrence H.; Quesenberry, Charles P.; Schaefer, Catherine; Van Den Eeden, Stephen K.; Risch, Neil; Witte, John S.

2015-01-01

219

Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.  

PubMed

An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show here that, by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers, we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from the California Men's Health Study specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37-0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers, and this difference increased with age. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²).
The G84E mutation was also suggestively associated with an increase in risk for the following cancer sites by approximately 50% in a pleiotropic manner: breast, non-Hodgkin's lymphoma, kidney, bladder, melanoma, endometrium, and pancreas (p = 0.042). PMID:25629170

Hoffmann, Thomas J; Sakoda, Lori C; Shen, Ling; Jorgenson, Eric; Habel, Laurel A; Liu, Jinghua; Kvale, Mark N; Asgari, Maryam M; Banda, Yambazi; Corley, Douglas; Kushi, Lawrence H; Quesenberry, Charles P; Schaefer, Catherine; Van Den Eeden, Stephen K; Risch, Neil; Witte, John S

2015-01-01

220

Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes  

PubMed Central

Background There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). Methods We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003). Conclusion Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. 
For studies focusing on spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should consider carefully the study aims. PMID:19814809

Hibbert, James D; Liese, Angela D; Lawson, Andrew; Porter, Dwayne E; Puett, Robin C; Standiford, Debra; Liu, Lenna; Dabelea, Dana

2009-01-01
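
The contrast between fixed (deterministic) and random (stochastic) allocation in the abstract above can be sketched as follows. The abstract does not give the exact allocation rules, so this sketch rests on assumptions: fixed allocation sends every case in a ZIP code to the census tract with the largest weight, and random allocation draws a tract with probability proportional to its weight. The tract identifiers and weights are hypothetical.

```python
import random


def fixed_allocation(tract_weights):
    """Deterministic: assign the case to the tract with the largest weight."""
    return max(tract_weights, key=tract_weights.get)


def random_allocation(tract_weights, rng):
    """Stochastic: draw one tract with probability proportional to its weight."""
    tracts = list(tract_weights)
    weights = [tract_weights[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]


# Hypothetical ZIP code overlapping three tracts, weighted by total population.
weights = {"tract_A": 700, "tract_B": 200, "tract_C": 100}
rng = random.Random(0)
fixed_cases = [fixed_allocation(weights) for _ in range(1000)]
random_cases = [random_allocation(weights, rng) for _ in range(1000)]
```

Under these assumptions every fixed-allocation case lands in `tract_A`, which mirrors the artificial clustering the abstract reports, while the random draws spread cases across tracts roughly in proportion to the weights.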

221

Nonreplication of an association of SGIP1 SNPs with alcohol dependence and resting theta EEG power.  

PubMed

A recent study in a sample of Plains Indians showed association between eight single nucleotide polymorphisms (SNPs) located in the SGIP1 gene and resting θ electroencephalogram (EEG) power. This association appeared to generalize to alcohol use disorders, for which EEG power is a potential endophenotype. We analyzed a large, diverse sample for replication of the association of these implicated SGIP1 SNPs (genotyped on the Illumina 1M platform) with alcohol dependence (N=3988) and θ EEG power (N=1066). We found no evidence of association of the earlier implicated SGIP1 SNPs with either alcohol dependence or θ EEG power (all P>0.15) in this sample. The earlier implicated SNPs located in SGIP1 gene showed no association with alcohol dependence or θ EEG power in this sample of individuals with European and/or African ancestry. This failure to replicate may be the result of differences in ancestry between this sample and the original sample. PMID:21317682

Derringer, Jaime; Krueger, Robert F; Manz, Niklas; Porjesz, Bernice; Almasy, Laura; Bookman, Ebony; Edenberg, Howard J; Kramer, John R; Tischfield, Jay A; Bierut, Laura J

2011-10-01

222

SNP-Seek database of SNPs derived from 3000 rice genomes  

PubMed Central

We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of an Oracle database holding close to 60 billion rows of SNP genotypes (20 M SNPs × 3 K rice lines) and a web interface for convenient querying. The database allows quick retrieval of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in the JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L.

2015-01-01

223

SNP-Seek database of SNPs derived from 3000 rice genomes.  

PubMed

We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of an Oracle database holding close to 60 billion rows of SNP genotypes (20 M SNPs × 3 K rice lines) and a web interface for convenient querying. The database allows quick retrieval of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in the JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L

2015-01-01

224

Coding region mitochondrial DNA SNPs: Targeting East Asian and Native American haplogroups  

Microsoft Academic Search

We have developed a single PCR multiplex SNaPshot reaction that consists of 32 coding region SNPs that allows (i) increasing the discrimination power of the mitochondrial DNA (mtDNA) typing in forensic casework, and (ii) haplogroup assignments of mtDNA profiles in both human population studies (e.g. anthropological) and medical research. The selected SNPs target the East Asian phylogeny, including its Native

V. Álvarez-Iglesias; J. C. Jaime; Á. Carracedo; A. Salas

2007-01-01

225

Integrated detection and population-genetic analysis of SNPs and copy number variation  

Microsoft Academic Search

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap

Finny G Kuruvilla; Joshua M Korn; Simon Cawley; James Nemesh; Alec Wysoker; Michael H Shapero; Paul I W de Bakker; Julian B Maller; Andrew Kirby; Amanda L Elliott; Melissa Parkin; Earl Hubbell; Teresa Webster; Rui Mei; James Veitch; Patrick J Collins; Robert Handsaker; Steve Lincoln; Marcia Nizzari; John Blume; Keith W Jones; Rich Rava; Mark J Daly; Stacey B Gabriel; Steven A McCarroll; David Altshuler

2008-01-01

226

Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs  

PubMed Central

Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100

Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

2014-01-01

227

On the performance of multiple imputation based on chained equations in tackling missing data of the African α3.7-globin deletion in a malaria association study.  

PubMed

Multiple imputation based on chained equations (MICE) is an alternative missing genotype method that can use genetic and nongenetic auxiliary data to inform the imputation process. Previously, MICE was successfully tested on strongly linked genetic data. We have now tested it on data of the HBA2 gene which, by the experimental design used in a malaria association study in Tanzania, shows a high missing data percentage and is weakly linked with the remaining genetic markers in the data set. We constructed different imputation models and studied their performance under different missing data conditions. Overall, MICE failed to accurately predict the true genotypes. However, using the best imputation model for the data, we obtained unbiased estimates for the genetic effects, and association signals of the HBA2 gene on malaria positivity. When the whole data set was analyzed with the same imputation model, the association signal increased from 0.80 to 2.70 before and after imputation, respectively. Conversely, postimputation estimates for the genetic effects remained the same in relation to the complete case analysis but showed increased precision. We argue that these postimputation estimates are reasonably unbiased, as a result of a good study design based on matching key socio-environmental factors. PMID:24942080

Sepúlveda, Nuno; Manjurano, Alphaxard; Drakeley, Chris; Clark, Taane G

2014-07-01

228

All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs  

PubMed Central

Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1 - FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621

Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.

2013-01-01
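
The record above contrasts traditional FDR control with stratified FDR (sFDR) at the same 0.05 level. The sketch below illustrates the general idea by applying the Benjamini-Hochberg procedure once to all p-values and once separately within each stratum. It is a simplified illustration, not the authors' linkage disequilibrium-weighted method; the function names, stratum labels, and p-values are hypothetical.

```python
def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg: return rejection flags in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_max = rank              # largest rank passing its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg separately within each stratum."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    reject = [False] * len(pvalues)
    for idxs in groups.values():
        flags = bh_reject([pvalues[i] for i in idxs], alpha)
        for i, flag in zip(idxs, flags):
            reject[i] = flag
    return reject


# Hypothetical p-values: an enriched "coding" stratum and a null-like one.
pvals = [0.001, 0.02, 0.04, 0.03, 0.5, 0.6, 0.7, 0.8]
strata = ["coding"] * 3 + ["intergenic"] * 5
pooled = bh_reject(pvals)
stratified = stratified_bh(pvals, strata)
print(sum(pooled), sum(stratified))
```

In this toy input the enriched stratum has a less stringent effective threshold when tested on its own, which is the mechanism by which stratification can raise the rejection count at a fixed FDR level.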

229

All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.  

PubMed

Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621

Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

2013-04-01

230

Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes.  

PubMed

Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with current availability of next-generation sequencing. Family-based designs are well suited for discovery of rare variants, with large and carefully selected pedigrees enriching for multiple copies of such variants. However, sequencing a large number of samples is still prohibitive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which is able to efficiently handle large pedigrees and accurately impute rare variants. We used burden and kernel association tests, famWS and famSKAT, both of which account for family relationships; famSKAT additionally accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of association tests for pseudosequence data, a subset of sequence data used for imputation, and all subjects sequenced. We also compared, within the pseudosequence data, the power of association tests using best-guess genotypes and allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that the use of allelic dosages results in much higher power than use of best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered. PMID:24243664

Saad, Mohamad; Wijsman, Ellen M

2014-01-01

231

A Bayesian Multiple Imputation Method for Handling Longitudinal Pesticide Data with Values below the Limit of Detection.  

PubMed

Environmental and biomedical research often produces data below the limit of detection (LOD), or left-censored data. Imputing explicit values for values < LOD in a multivariate setting, such as with longitudinal data, is difficult using a likelihood-based approach. A Bayesian multiple imputation (MI) method is introduced to handle left-censored multivariate data. A Gibbs sampler, which uses an iterative process, is employed to simulate the target multivariate distribution within a Bayesian framework. Following convergence, multiple plausible data sets are generated for analysis by standard statistical methods outside of a Bayesian framework. With explicit imputed values available variables can be analyzed as outcomes or predictors. We illustrate a practical application using longitudinal data from the Community Participatory Approach to Measuring Farmworker Pesticide Exposure (PACE3) study to evaluate the association between urinary acephate concentrations (indicating pesticide exposure) and self-reported potential pesticide poisoning symptoms. Additionally, a simulation study is used to evaluate the sampling property of the estimators for distributional parameters as well as regression coefficients estimated with the generalized estimating equation (GEE) approach. Results demonstrated that the Bayesian MI estimates performed well in most settings, and we recommend the use of this valid and feasible approach to analyze multivariate data with values < LOD. PMID:23504271

Chen, Haiying; Quandt, Sara A; Grzywacz, Joseph G; Arcury, Thomas A

2013-03-01
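
The record above imputes explicit values for measurements below the limit of detection (LOD). As a rough sketch of one ingredient of such a scheme, the code below draws a value from a normal distribution truncated above at the LOD by inverse-CDF sampling. It assumes the distribution parameters are already known and the data are on a scale where normality is plausible (for example, log concentration); the authors' Gibbs sampler also updates the parameters, which is not shown. All names and numbers are hypothetical.

```python
import random
from statistics import NormalDist


def impute_below_lod(lod, mu, sigma, rng):
    """Draw one value from Normal(mu, sigma) conditional on being <= lod."""
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(lod)                  # probability mass below the LOD
    u = rng.uniform(0.0, upper)            # uniform draw on (0, F(lod))
    u = min(max(u, 1e-12), 1.0 - 1e-12)    # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)


# Hypothetical example: standard normal scale, LOD at -1.0.
rng = random.Random(0)
draws = [impute_below_lod(-1.0, 0.0, 1.0, rng) for _ in range(200)]
print(max(draws))
```

Repeating the draw for each censored observation in each of several imputed data sets gives explicit values that standard methods can then analyze as outcomes or predictors.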

232

A new classification rule for incomplete doubly multivariate data using mixed effects model with performance comparisons on the imputed data.  

PubMed

A mixed effects model, enhanced by a Kronecker product structure for the residual variance-covariance matrix, is used in conjunction with a discriminant analysis technique, to devise a new statistical classification method on incomplete doubly multivariate data. The proposed method is efficient in small scale clinical trials that use relatively few patients. The new classification method is also applied to multiply imputed data sets. The misclassification error rates (MERs) are compared in order to investigate the effectiveness of the new classification rule on an incomplete data set. The classification method is applied to a real data set. The error rates on the incomplete data set are found to be much less than the median error rate on the multiply imputed data sets. Non-parametric methods, such as kernel method and k-nearest neighbourhood method, are also applied to multiply imputed data sets. Results illustrating the advantages of the new classification method over classic non-parametric classification methods are presented. PMID:16220496

Roy, Anuradha

2006-05-30

233

SNPs for Parentage Testing and Traceability in Globally Diverse Breeds of Sheep  

PubMed Central

DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF ≥ 0.3) in 48±5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent’s genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1 × 10^-39 and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world’s sheep breeds. PMID:24740156

Heaton, Michael P.; Leymaster, Kreg A.; Kalbfleisch, Theodore S.; Kijas, James W.; Clarke, Shannon M.; McEwan, John; Maddox, Jillian F.; Basnayake, Veronica; Petrik, Dustin T.; Simpson, Barry; Smith, Timothy P. L.; Chitko-McKown, Carol G.

2014-01-01
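
The record above reports a combined probability of a coincidental match (PI) across 109 SNPs. As a minimal sketch of how such a figure can be computed, the code below uses the standard per-locus identity formula for a biallelic marker and multiplies across loci. It assumes Hardy-Weinberg proportions, independent loci, and unrelated animals; the allele frequencies are hypothetical and are not the panel's values, so the result is not expected to match the reported median.

```python
import math


def identity_probability(maf):
    """Probability that two random individuals share a genotype at one
    biallelic SNP with minor allele frequency `maf` (Hardy-Weinberg assumed)."""
    p, q = 1.0 - maf, maf
    return p ** 4 + (2 * p * q) ** 2 + q ** 4


def combined_log10_pi(mafs):
    """log10 of the combined PI over independent loci (sum of logs
    avoids floating-point underflow for large panels)."""
    return sum(math.log10(identity_probability(m)) for m in mafs)


# Hypothetical best case: 109 SNPs, all with MAF = 0.5.
single = identity_probability(0.5)
log10_pi = combined_log10_pi([0.5] * 109)
print(single, log10_pi)
```

Per-locus PI is smallest at MAF = 0.5, which is why markers with high minor allele frequency across many breeds are preferred for parentage and traceability panels.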

234

Heritability of submaximal exercise heart rate response to exercise training is accounted for by nine SNPs  

PubMed Central

Endurance training-induced changes in hemodynamic traits are heritable. However, few genes associated with heart rate training responses have been identified. The purpose of our study was to perform a genome-wide association study to uncover DNA sequence variants associated with submaximal exercise heart rate training responses in the HERITAGE Family Study. Heart rate was measured during steady-state exercise at 50 W (HR50) on 2 separate days before and after a 20-wk endurance training program in 483 white subjects from 99 families. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. After quality control procedures, 320,000 single-nucleotide polymorphisms (SNPs) were available for the genome-wide association study analyses, which were performed using the MERLIN software package (single-SNP analyses and conditional heritability tests) and standard regression models (multivariate analyses). The strongest associations for HR50 training response adjusted for age, sex, body mass index, and baseline HR50 were detected with SNPs at the YWHAQ locus on chromosome 2p25 (P = 8.1 × 10^-7), the RBPMS locus on chromosome 8p12 (P = 3.8 × 10^-6), and the CREB1 locus on chromosome 2q34 (P = 1.6 × 10^-5). In addition, 37 other SNPs showed P values <9.9 × 10^-5. After removal of redundant SNPs, the 10 most significant SNPs explained 35.9% of the ΔHR50 variance in a multivariate regression model. Conditional heritability tests showed that nine of these SNPs (all intragenic) accounted for 100% of the ΔHR50 heritability. Our results indicate that SNPs in nine genes related to cardiomyocyte and neuronal functions, as well as cardiac memory formation, fully account for the heritability of the submaximal heart rate training response. PMID:22174390

Sung, Yun Ju; Sarzynski, Mark A.; Rice, Treva K.; Rao, D. C.; Bouchard, Claude

2012-01-01

235

FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease  

PubMed Central

Background Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs. Results We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and proposed candidate genes to sequence for their association with 597 syndromes with unknown molecular basis. Conclusions Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs. PMID:19061490

Chen, Rong; Morgan, Alex A; Dudley, Joel; Deshpande, Tarangini; Li, Li; Kodama, Keiichi; Chiang, Annie P; Butte, Atul J

2008-01-01

236

Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations  

PubMed Central

Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles. PMID:23717212

Band, Gavin; Le, Quang Si; Jostins, Luke; Pirinen, Matti; Kivinen, Katja; Jallow, Muminatou; Sisay-Joof, Fatoumatta; Bojang, Kalifa; Pinder, Margaret; Sirugo, Giorgio; Conway, David J.; Nyirongo, Vysaul; Kachala, David; Molyneux, Malcolm; Taylor, Terrie; Ndila, Carolyne; Peshu, Norbert; Marsh, Kevin; Williams, Thomas N.; Alcock, Daniel; Andrews, Robert; Edkins, Sarah; Gray, Emma; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Schuldt, Kathrin; Clark, Taane G.; Small, Kerrin S.; Teo, Yik Ying; Kwiatkowski, Dominic P.; Rockett, Kirk A.; Barrett, Jeffrey C.; Spencer, Chris C. A.

2013-01-01

237

Recovering incomplete data using Statistical Multiple Imputations (SMI): a case study in environmental chemistry.  

PubMed

This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and limit of detection levels prevent the application of statistics. A working example is taken from an environmental leaching study that was set up to determine if there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative treated wood waste and those containing untreated wood. Fourteen lysimeters were set up and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, due to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data was somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, the statistical tool of Statistical Multiple Imputation (SMI) was applied, and the data was re-analysed successfully. It was demonstrated that using SMI did not affect the variance in the data, but facilitated analysis of the complete dataset. PMID:21962689

Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D

2011-10-15

238

A small number of candidate gene SNPs reveal continental ancestry in African Americans.  

PubMed

Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K = 2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99 ± 0.003; 0.98 ± 0.01; 0.93 ± 0.03; and 0.81 ± 0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably. PMID:23278390

Kodaman, Nuri; Aldrich, Melinda C; Smith, Jeffrey R; Signorello, Lisa B; Bradley, Kevin; Breyer, Joan; Cohen, Sarah S; Long, Jirong; Cai, Qiuyin; Giles, Justin; Bush, William S; Blot, William J; Matthews, Charles E; Williams, Scott M

2013-01-01

239

Identification of a combination of SNPs associated with Graves' disease using swarm intelligence.  

PubMed

Graves' disease, the production of thyroid-stimulating hormone receptor-stimulating antibodies leading to hyperthyroidism, is one of the most common forms of human autoimmune disease. It is widely agreed that complex diseases are not controlled simply by an individual gene or DNA variation but by their combination. Single nucleotide polymorphisms (SNPs), which are the most common form of DNA variation, have great potential as a medical diagnostic tool. In this paper, the P-value is used as a SNP pre-selection criterion, and a wrapper algorithm with binary particle swarm optimization is used to find the rule for discriminating between affected and control subjects. We analyzed the association between combinations of SNPs and Graves' disease by investigating 108 SNPs in 384 cases and 652 controls. We evaluated our method by differentiating between cases and controls in a five-fold cross validation test, and it achieved a 72.9% prediction accuracy with a combination of 17 SNPs. The experimental results showed that SNPs, even those with a high P-value, have a greater effect on Graves' disease when acting in a combination. PMID:21318483

Wei, Bin; Peng, QinKe; Zhang, QuanWei; Li, ChenYao

2011-02-01

240

Improved feature-based prediction of SNPs in human cytochrome P450 enzymes.  

PubMed

Single nucleotide polymorphisms (SNPs) make up the most common form of mutations in the human cytochrome P450 enzyme family, and have the potential to cause different drug responses or specific diseases in individual patients. Here, based on machine learning technology, we aim to explore an effective set of sequence-based features for improving prediction of SNPs by using support vector machine algorithms. The features are derived from the target residues and flanking protein sequences, such as amino acid types, sequence composition, physicochemical properties, position-specific scoring matrix, phylogenetic entropy and the number of possible codons of target residues. In order to deal with the imbalanced data with a majority of non-SNPs and a minority of SNPs, a preprocessing strategy based on fuzzy set theory was applied to the datasets. Our final model achieves the performance of 93.8% in sensitivity, 88.8% in specificity, 91.3% in accuracy and 0.971 of AUC value, which is significantly higher than the previous DNA sequence-based or protein sequence-based methods. Furthermore, our study also suggested the roles of individual features for prediction of SNPs. The most important features consist of the amino acid type, the number of available codons, position-specific scoring matrix and phylogenetic entropy. The improved model will be a promising tool for SNP predictions, and assist in the research of genome mutation and personalized prescriptions. PMID:25792441

Li, Li; Xiong, Yi; Zhang, Zhuo-Yu; Guo, Quan; Xu, Qin; Liow, Hien-Haw; Zhang, Yong-Hong; Wei, Dong-Qing

2015-03-01

241

Thiopurine pharmacogenomics: association of SNPs with clinical response and functional validation of candidate genes  

PubMed Central

Aim We investigated candidate genes associated with thiopurine metabolism and clinical response in childhood acute lymphoblastic leukemia. Materials & methods We performed genome-wide SNP association studies of 6-thioguanine and 6-mercaptopurine cytotoxicity using lymphoblastoid cell lines. We then genotyped the top SNPs associated with lymphoblastoid cell line cytotoxicity, together with tagSNPs for genes in the ‘thiopurine pathway’ (686 total SNPs), in DNA from 589 Caucasian UK ALL97 patients. Functional validation studies were performed by siRNA knockdown in cancer cell lines. Results SNPs in the thiopurine pathway genes ABCC4, ABCC5, IMPDH1, ITPA, SLC28A3 and XDH, and SNPs located within or near ATP6AP2, FRMD4B, GNG2, KCNMA1 and NME1, were associated with clinical response and measures of thiopurine metabolism. Functional validation showed shifts in cytotoxicity for these genes. Conclusion The clinical response to thiopurines may be regulated by variation in known thiopurine pathway genes and additional novel genes outside of the thiopurine pathway. PMID:24624911

Matimba, Alice; Li, Fang; Livshits, Alina; Cartwright, Cher S; Scully, Stephen; Fridley, Brooke L; Jenkins, Gregory; Batzler, Anthony; Wang, Liewei; Weinshilboum, Richard; Lennard, Lynne

2014-01-01

242

Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs  

SciTech Connect

Highlights: (1) Proper dataset partition can improve the prediction of deleterious nsSNPs. (2) Partition according to the original residue type at the nsSNP site is a good criterion. (3) A similar strategy is expected to be promising in other machine learning problems. Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

Yang, Jing; Li, Yuan-Yuan [School of Biotechnology, East China University of Science and Technology, Shanghai 200237 (China); Shanghai Center for Bioinformation Technology, Shanghai 200235 (China)]; Li, Yi-Xue, E-mail: yxli@sibs.ac.cn [School of Biotechnology, East China University of Science and Technology, Shanghai 200237 (China); Shanghai Center for Bioinformation Technology, Shanghai 200235 (China)]; Ye, Zhi-Qiang, E-mail: yezq@pkusz.edu.cn [Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen 518055 (China); Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031 (China)]

2012-03-02

243

Air filter  

SciTech Connect

An air filter is described that has a counter-rotating drum, i.e., the rotation of the drum is opposite the tangential intake of air. The intake air has about 1 lb of rock wool fibers per 107 cu. ft. of air, sometimes at about 100% relative humidity. The fibers are doffed from the drum by suction nozzles, which are adjacent to the drum at the bottom of the filter housing. The drum screen is cleaned by periodically jetting hot dry air at 120 psig through the screen into the suction nozzles.

Jackson, R.E.; Sparks, J.E.

1981-03-03

244

Water Filters  

NASA Technical Reports Server (NTRS)

Seeking to find a more effective method of filtering potable water that was highly contaminated, Mike Pedersen, founder of Western Water International (WWI), learned that NASA had conducted extensive research in methods of purifying water on board manned spacecraft. The key is Aquaspace Compound, a proprietary WWI formula that scientifically blends various types of granular activated charcoal with other active and inert ingredients. Aquaspace systems remove some substances, such as chlorine, by atomic adsorption, other types of organic chemicals by mechanical filtration, and still others by catalytic reaction. Aquaspace filters are finding wide acceptance in industrial, commercial, residential and recreational applications in the U.S. and abroad.

1988-01-01

245

Estimating the proportion of variation in susceptibility to multiple sclerosis captured by common SNPs  

NASA Astrophysics Data System (ADS)

Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.

Watson, Corey T.; Disanto, Giulio; Breden, Felix; Giovannoni, Gavin; Ramagopalan, Sreeram V.

2012-10-01

246

A Bayesian Hierarchical Model for Relating Multiple SNPs within Multiple Genes to Disease Risk  

PubMed Central

A variety of methods have been proposed for studying the association of multiple genes thought to be involved in a common pathway for a particular disease. Here, we present an extension of a Bayesian hierarchical modeling strategy that allows for multiple SNPs within each gene, with external prior information at either the SNP or gene level. The model involves variable selection at the SNP level through latent indicator variables and Bayesian shrinkage at the gene level towards a prior mean vector and covariance matrix that depend on external information. The entire model is fitted using Markov chain Monte Carlo methods. Simulation studies show that the approach is capable of recovering many of the truly causal SNPs and genes, depending upon their frequency and size of their effects. The method is applied to data on 504 SNPs in 38 candidate genes involved in DNA damage response in the WECARE study of second breast cancers in relation to radiotherapy exposure. PMID:24490143

Duan, Lewei; Thomas, Duncan C.

2013-01-01

247

Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data  

PubMed Central

Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach. PMID:23710259

Hu, Bo; Xu, Yaomin

2013-01-01

248

A multiplex allele-specific primer extension assay for forensically informative SNPs distributed throughout the mitochondrial genome  

Microsoft Academic Search

The typing of single nucleotide polymorphisms (SNPs) located throughout the mitochondrial genome (mtGenome) can help resolve individuals with an identical HV1/HV2 mitotype. A set of 11 SNPs selected for distinguishing individuals of the most common Caucasian HV1/HV2 mitotype were incorporated in an allele specific primer extension assay. The assay was optimized for multiplex detection of SNPs at positions 3010, 4793,

Peter M. Vallone; Rebecca S. Just; Michael D. Coble; John M. Butler; Thomas J. Parsons

2004-01-01

249

Defining the contribution of SNPs identified in asthma GWAS to clinical variables in asthmatic children  

PubMed Central

Background Asthma genome-wide association studies (GWAS) have identified several asthma susceptibility genes with confidence; however, the relative contribution of these genetic variants or single nucleotide polymorphisms (SNPs) to clinical endpoints (as opposed to disease diagnosis) remains largely unknown. Thus the aim of this study was to firstly bridge this gap in knowledge and secondly investigate whether these SNPs or those that are in linkage disequilibrium are likely to be functional candidates with respect to regulation of gene expression, using reported data from the ENCODE project. Methods Eleven of the key SNPs identified in eight loci from recent asthma GWAS were evaluated for association with asthma and clinical outcomes, including percent predicted FEV1, bronchial hyperresponsiveness (BHR) to methacholine, severity defined by British Thoracic Society steps and positive response to skin prick test, using the family based association test additive model in a well characterised UK cohort consisting of 370 families with at least two asthmatic children. Results GSDMB SNP rs2305480 (Ser311Pro) was associated with asthma diagnosis (p = 8.9 × 10^-4), BHR (p = 8.2 × 10^-4) and severity (p = 1.5 × 10^-4) with supporting evidence from a second GSDMB SNP rs11078927 (intronic). SNPs evaluated in IL33, IL18R1, IL1RL1, SMAD3, IL2RB, PDE4D, CRB1 and RAD50 did not show association with any phenotype tested when corrected for multiple testing. Analysis using ENCODE data provides further insight into the functional relevance of these SNPs. Conclusions Our results provide further support for the role of GSDMB SNPs in determining multiple asthma related phenotypes in childhood asthma including associations with lung function and disease severity. PMID:24066901

2013-01-01

250

Gaussian Filters for Nonlinear Filtering Problems  

Microsoft Academic Search

In this paper we develop and analyze real-time and accurate filters for nonlinear filtering problems based on the Gaussian distributions. We present the systematic formulation of Gaussian filters and develop efficient and accurate numerical integration of the optimal filter. We also discuss the mixed Gaussian filters in which the conditional probability density is approximated by the sum of Gaussian distributions.

Kazufumi Ito; Kaiqi Xiong

1999-01-01

251

SNaPshot® minisequencing analysis of multiple ancestry-informative Y-SNPs using capillary electrophoresis.  

PubMed

This protocol describes a strategy for analyzing phylogenetic Y-SNPs in a hierarchical multiplex assay by utilizing the SNaPshot® Multiplex System. Step by step, the protocol assists in the appropriate selection of SNPs, the primer design, the setup of PCR/SBE reactions as well as in the analysis of the results. Furthermore, a forensic approach is highlighted, in which the most probable ancestry of an unknown male DNA is inferred by the geographical distribution of the assigned Y-SNP haplogroup. PMID:22139657

Geppert, Maria; Roewer, Lutz

2012-01-01

252

Exonic versus intronic SNPs: contrasting roles in revealing the population genetic differentiation of a widespread bird species.  

PubMed

Recent years have seen considerable progress in applying single nucleotide polymorphisms (SNPs) to population genetics studies. However, relatively few have attempted to use them to study the genetic differentiation of wild bird populations and none have examined possible differences of exonic and intronic SNPs in these studies. Here, using 144 SNPs, we examined population genetic differentiation in the saker falcon (Falco cherrug) across Eurasia. The position of each SNP was verified using the recently sequenced saker genome with 108 SNPs positioned within the introns of 10 fragments and 36 SNPs in the exons of six genes, comprising MHC, MC1R and four others. In contrast to intronic SNPs, both Bayesian clustering and principal component analyses using exonic SNPs consistently revealed two genetic clusters, within which the least admixed individuals were found in Europe/central Asia and Qinghai (China), respectively. Pairwise D analysis for exonic SNPs showed that the two populations were significantly differentiated and between the two clusters the frequencies of five SNP markers were inferred to be influenced by selection. Central Eurasian populations clustered as intermediate between the two main groups, consistent with their geographic position. But the westernmost populations of central Europe showed evidence of demographic isolation. Our work highlights the importance of functional exonic SNPs for studying population genetic pattern in a widespread avian species. PMID:25074575

Zhan, X; Dixon, A; Batbayar, N; Bragin, E; Ayas, Z; Deutschova, L; Chavko, J; Domashevsky, S; Dorosencu, A; Bagyura, J; Gombobaatar, S; Grlica, I D; Levin, A; Milobog, Y; Ming, M; Prommer, M; Purev-Ochir, G; Ragyov, D; Tsurkanu, V; Vetrov, V; Zubkov, N; Bruford, M W

2015-01-01

253

Phosphorus Filter  

USGS Multimedia Gallery

Tom Kehler, fishery biologist at the U.S. Fish and Wildlife Service's Northeast Fishery Center in Lamar, Pennsylvania, checks the flow rate of water leaving a phosphorus filter column. The USGS has pioneered a new use for acid mine drainage residuals that are currently a disposal challenge, usi...

254

hsphase: an R package for pedigree reconstruction, detection of recombination events, phasing and imputation of half-sib family groups  

PubMed Central

Background Identification of recombination events and which chromosomal segments contributed to an individual is useful for a number of applications in genomic analyses including haplotyping, imputation, signatures of selection, and improved estimates of relationship and probability of identity by descent. Genotypic data on half-sib family groups are widely available in livestock genomics. This structure makes it possible to identify recombination events accurately even with only a few individuals and it lends itself well to a range of applications such as parentage assignment and pedigree verification. Results Here we present hsphase, an R package that exploits the genetic structure found in half-sib livestock data to identify and count recombination events, impute and phase un-genotyped sires and phase its offspring. The package also allows reconstruction of family groups (pedigree inference), identification of pedigree errors and parentage assignment. Additional functions in the package allow identification of genomic mapping errors, imputation of paternal high density genotypes from low density genotypes, evaluation of phasing results either from hsphase or from other phasing programs. Various diagnostic plotting functions permit rapid visual inspection of results and evaluation of datasets. Conclusion The hsphase package provides a suite of functions for analysis and visualization of genomic structures in half-sib family groups implemented in the widely used R programming environment. Low level functions were implemented in C++ and parallelized to improve performance. hsphase was primarily designed for use with high density SNP array data but it is fast enough to run directly on sequence data once they become more widely available. The package is available (GPL 3) from the Comprehensive R Archive Network (CRAN) or from http://www-personal.une.edu.au/~cgondro2/hsphase.htm. PMID:24906803

2014-01-01

255

lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse  

PubMed Central

Long non-coding RNAs (lncRNAs) play key roles in various cellular contexts and diseases by diverse mechanisms. With the rapid growth of identified lncRNAs and disease-associated single nucleotide polymorphisms (SNPs), there is a great demand to study SNPs in lncRNAs. Aiming to provide a useful resource about lncRNA SNPs, we systematically identified SNPs in lncRNAs and analyzed their potential impacts on lncRNA structure and function. In total, we identified 495 729 and 777 095 SNPs in more than 30 000 lncRNA transcripts in human and mouse, respectively. A large number of SNPs were predicted with the potential to impact on the miRNA–lncRNA interaction. The experimental evidence and conservation of miRNA–lncRNA interaction, as well as miRNA expressions from TCGA were also integrated to prioritize the miRNA–lncRNA interactions and SNPs on the binding sites. Furthermore, by mapping SNPs to GWAS results, we found that 142 human lncRNA SNPs are GWAS tagSNPs and 197 827 lncRNA SNPs are in the GWAS linkage disequilibrium regions. All these data for human and mouse lncRNAs were imported into lncRNASNP database (http://bioinfo.life.hust.edu.cn/lncRNASNP/), which includes two sub-databases lncRNASNP-human and lncRNASNP-mouse. The lncRNASNP database has a user-friendly interface for searching and browsing through the SNP, lncRNA and miRNA sections. PMID:25332392

Gong, Jing; Liu, Wei; Zhang, Jiayou; Miao, Xiaoping; Guo, An-Yuan

2015-01-01

256

Identification of new SNPs in native South American populations by resequencing the Y chromosome.  

PubMed

The Y-chromosomal genetic landscape of South America is relatively homogenous. The majority of native Amerindian people are assigned to haplogroup Q and only a small percentage belongs to haplogroup C. With the aim of further differentiating the major Q lineages and thus obtaining new insights into the population history of South America, two individuals, both belonging to the sub-haplogroup Q-M3, were analyzed with next-generation sequencing. Several new candidate SNPs were evaluated and four were confirmed to be new, haplogroup Q-specific, and variable. One of the new SNPs, named MG2, identifies a new sub-haplogroup downstream of Q-M3; the other three (MG11, MG13, MG15) are upstream of Q-M3 but downstream of M242, and describe branches at the same phylogenetic positions as previously known SNPs in the samples tested. These four SNPs were typed in 100 individuals belonging to haplogroup Q. PMID:25303787

Geppert, M; Ayub, Q; Xue, Y; Santos, S; Ribeiro-dos-Santos, Â; Baeta, M; Núñez, C; Martínez-Jarreta, B; Tyler-Smith, C; Roewer, L

2015-03-01

257

SNPs and MALDI-TOF MS: Tools for DNA Typing in Forensic Paternity Testing and Anthropology  

Microsoft Academic Search

DNA markers used for individual identification in forensic sciences are based on repeat sequences in nuclear DNA and the mitochondrial DNA hypervariable regions 1 and 2. An alternative to these markers is the use of single nucleotide polymorphisms (SNPs). These have a particular advantage in the analysis of degraded or poor samples, which are often all that is available in

Elizabet Petkovski; Christine Keyser-Tracqui; Rémi Hienne; Bertrand Ludes

2005-01-01

258

Transforming Growth Factor-β1 SNPs: Genetic and Phenotypic Correlations in Progressive Kidney Insufficiency  

Microsoft Academic Search

Associations have been described between polymorphisms of cytokine and growth factor genes and susceptibility to, or progression of, an increasing number of diseases. TGF-β1 plays an important role in the pathogenesis of experimental and clinical glomerulosclerosis and tubulointerstitial fibrosis. In this study, single nucleotide polymorphisms (SNPs) in the TGF-β1 gene were investigated as possible markers for the progression of chronic

M. Salah Khalil; A. M. El Nahas; A. I. F. Blakemore

2005-01-01

259

Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA Segments  

PubMed Central

Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the pummelo species. PMID:24732455

Wu, Bo; Zhong, Guang-yan; Yue, Jian-qiang; Yang, Run-ting; Li, Chong; Li, Yue-jia; Zhong, Yun; Wang, Xuan; Jiang, Bo; Zeng, Ji-wu; Zhang, Li; Yan, Shu-tang; Bei, Xue-jun; Zhou, Dong-guo

2014-01-01

260

Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs  

E-print Network

0 Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs Tamar Barzuza phylogeny model for haplotype evolution has been successfully applied to haplotype resolution from genotype in the design and analysis of genetic studies. We consider a novel type of data, xor-genotypes, which

Shamir, Ron

261

AN ANALYSIS OF NEIGHBORING NUCLEOTIDE EFFECTS ON SNPS IN NUCLEAR DNA FROM MAIZE (ZEA MAYS)  

Technology Transfer Automated Retrieval System (TEKTRAN)

The composition of neighboring nucleotides influences the pattern and rate of mutation in plant organelle DNA and in vertebrate nuclear DNA. Here we study the influence of context on mutations in nuclear DNA from a plant (maize; Zea mays ssp. mays) using a dataset of 10,472 SNPs generated by reseq...

262

118 SNPs of folate-related genes and risks of spina bifida and conotruncal heart defects  

Microsoft Academic Search

BACKGROUND: Folic acid taken in early pregnancy reduces risks for delivering offspring with several congenital anomalies. The mechanism by which folic acid reduces risk is unknown. Investigations into genetic variation that influences transport and metabolism of folate will help fill this data gap. We focused on 118 SNPs involved in folate transport and metabolism. METHODS: Using data from a California

Gary M Shaw; Wei Lu; Huiping Zhu; Wei Yang; Farren BS Briggs; Suzan L Carmichael; Lisa F Barcellos; Edward J Lammer; Richard H Finnell

2009-01-01

263

Phytologia (April 2010) 92(1)68 DISCOVERY AND SNPS ANALYSES OF POPULATIONS OF  

E-print Network

Phytologia (April 2010) 92(1)68 DISCOVERY AND SNPS ANALYSES OF POPULATIONS OF JUNIPERUS MARITIMA 98368 ABSTRACT Trees from two populations of Juniperus commonly identified as J. scopulorum growing of a Pleistocene refugium for J. maritima. Phytologia 92(1): 68-81 (April, 2010). KEY WORDS: Juniperus maritima, J

Adams, Robert P.

264

Genome-wide association studies (GWAS) use dense maps of SNPs that  

E-print Network

-genetics arguments suggested that LD in the general human population would probably be limited to distances below 100Genome-wide association studies (GWAS) use dense maps of SNPs that cover the human genome to look of the lessons that were learnt from the initial crop of GWAS for future studies of human genetic variation

Kruglyak, Leonid

265

Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease,  

E-print Network

Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, UK, 4 Department of Agriculture and Food Systems, University of Melbourne, Melbourne, Australia, 5 Nuffield Department of Obstetrics heritability. We used a linear mixed model to fit all single nucleotide polymorphisms (SNPs) simultaneously

Nyholt, Dale R.

266

Large-scale enrichment and discovery of gene-associated SNPs  

Technology Transfer Automated Retrieval System (TEKTRAN)

With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...

267

SNPs in Transcripts In our analysis, 30% of the aligned reads are allele-specific.  

E-print Network

SNPs in Transcripts In our analysis, 30% of the aligned reads are allele-specific. The figure below of the transcriptomes. 2. Transcript level analysis reveals strong evidence for allele-specific differential expressions shows that 85% of transcripts have at least one SNP between 129 and PWK, which makes the F1 crosses

Whitton, Mary C.

268

Cross-Amplification and Validation of SNPs Conserved over 44 Million Years between Seals and Dogs  

PubMed Central

High-density SNP arrays developed for humans and their companion species provide a rapid and convenient tool for generating SNP data in closely-related non-model organisms, but have not yet been widely applied to phylogenetically divergent taxa. Consequently, we used the CanineHD BeadChip to genotype 24 Antarctic fur seal (Arctocephalus gazella) individuals. Despite seals and dogs having diverged around 44 million years ago, 33,324 out of 173,662 loci (19.2%) could be genotyped, of which 173 were polymorphic and clearly interpretable. Two SNPs were validated using KASP genotyping assays, with the resulting genotypes being 100% concordant with those obtained from the high-density array. Two loci were also confirmed through in silico visualisation after mapping them to the fur seal transcriptome. Polymorphic SNPs were distributed broadly throughout the dog genome and did not differ significantly in proximity to genes from either monomorphic SNPs or those that failed to cross-amplify in seals. However, the nearest genes to polymorphic SNPs were significantly enriched for functional annotations relating to energy metabolism, suggesting a possible bias towards conserved regions of the genome. PMID:23874599

Hoffman, Joseph I.; Thorne, Michael A. S.; McEwing, Rob; Forcada, Jaume; Ogden, Rob

2013-01-01

269

Supplementary Methods: Smoothed genetic map positions: To obtain genetic positions for the SNPs in this study,  

E-print Network

integrated genetic map1 , with the one modification that we inserted "artificial" markers at either endSupplementary Methods: Smoothed genetic map positions: To obtain genetic positions for the SNPs of each chromosome's centromere, with 0.01cM genetic distance separation between them to ensure that very

Reich, David

270

SNPs for parentage testing and traceability in globally diverse breeds of sheep  

Technology Transfer Automated Retrieval System (TEKTRAN)

DNA-based parentage determination accelerates genetic improvement by increasing pedigree accuracy. However, the utility of any “parentage SNP” varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities...

271

Mining SNPs and Indels in Mung Bean (Vigna radiata) by Ecotilling  

Technology Transfer Automated Retrieval System (TEKTRAN)

Ecotilling is a powerful genetic analysis tool. It can provide rapid identification of naturally occurring Single Nucleotide Polymorphisms (SNPs) and small insertion/deletions (indels) in a pool of accessions for a gene of interest. This technique eliminates the time consuming and expensive proced...

272

Expression of DISC1 binding partners is reduced in schizophrenia and associated with DISC1 SNPs  

E-print Network

. INTRODUCTION Schizophrenia is a psychiatric disorder characterized by cognitive impairment, disturbances for polygenic disorders like schizophrenia will likely require demonstration that genetic variation predictsExpression of DISC1 binding partners is reduced in schizophrenia and associated with DISC1 SNPs

Baker, Chris I.

273

Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs that Associate with Disease  

PubMed Central

Complex genetic disorders are a result of a combination of genetic and non-genetic factors, all potentially interacting. Machine learning methods hold the potential to identify multi-locus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of SNPs of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of ADTrees. The algorithm is of order nk², where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of these data using simulated data as well as from the Lupus Large Association Study 1 (7822 SNPs in 3548 individuals). Our results suggest that BADTrees holds promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease. PMID:22851473

Guy, Richard T.; Santago, Peter; Langefeld, Carl D.

2013-01-01

274

Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond  

Microsoft Academic Search

In this self-contained survey/review paper, we systematically investigate the roots of Bayesian filtering as well as its rich leaves in the literature. Stochastic filtering theory is briefly reviewed with emphasis on nonlinear and non-Gaussian filtering. Following the Bayesian statistics, different Bayesian filtering techniques are developed given different scenarios. Under linear quadratic Gaussian circumstance, the celebrated Kalman filter can

ZHE CHEN

275

Water Filter  

NASA Technical Reports Server (NTRS)

A compact, lightweight electrolytic water sterilizer available through Ambassador Marketing, generates silver ions in concentrations of 50 to 100 parts per billion in water flow system. The silver ions serve as an effective bactericide/deodorizer. Tap water passes through filtering element of silver that has been chemically plated onto activated carbon. The silver inhibits bacterial growth and the activated carbon removes objectionable tastes and odors caused by addition of chlorine and other chemicals in municipal water supply. The three models available are a kitchen unit, a "Tourister" unit for portable use while traveling and a refrigerator unit that attaches to the ice cube water line. A filter will treat 5,000 to 10,000 gallons of water.

1982-01-01

276

Plasmonic filters.  

SciTech Connect

Metal films perforated with subwavelength hole arrays have been shown to demonstrate an effect known as Extraordinary Transmission (EOT). In EOT devices, optical transmission passbands arise that can have up to 90% transmission and a bandwidth that is only a few percent of the designed center wavelength. By placing a tunable dielectric in proximity to the EOT mesh, one can tune the center frequency of the passband. We have demonstrated over 1 micron of passive tuning in structures designed for an 11 micron center wavelength. If a suitable midwave (3-5 micron) tunable dielectric (perhaps BaTiO₃) were integrated with an EOT mesh designed for midwave operation, it is possible that a fast, voltage tunable, low temperature filter solution could be demonstrated with a several hundred nanometer passband. Such an element could, for example, replace certain components in a filter wheel solution.

Passmore, Brandon Scott; Shaner, Eric Arthur; Barrick, Todd A.

2009-09-01

277

Drug Filtering  

NSDL National Science Digital Library

This lesson from Illuminations looks at exponential decay. The example of how kidneys filter blood is used. The material asks students to determine the amount of a drug that remains in the body over a period of time. Students will predict behavior by an exponential decay model and graph an exponential set of data. The lesson is appropriate for grades 9-12 and should require 1 class period to complete.
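The decay model this lesson describes can be sketched in a few lines of Python. This is only an illustration, assuming the kidneys remove a constant fraction of the drug in each time period; the function name `drug_remaining`, the 1000 mg dose, and the 25% per-hour filtering rate are hypothetical values chosen for the example and are not taken from the lesson.

```python
def drug_remaining(initial_mg, fraction_filtered, periods):
    """Return the amount of drug left after each period.

    Assumes (hypothetically) that a constant fraction of the drug is
    filtered out per period, which gives exponential decay:
        amount(t) = initial_mg * (1 - fraction_filtered) ** t
    """
    amounts = []
    for t in range(periods + 1):
        amounts.append(initial_mg * (1.0 - fraction_filtered) ** t)
    return amounts


if __name__ == "__main__":
    # Hypothetical dose and filtering rate, for illustration only.
    for hour, mg in enumerate(drug_remaining(1000.0, 0.25, 4)):
        print(f"hour {hour}: {mg:.1f} mg")
```

Plotting the returned list against the period index gives the exponential curve that students are asked to graph.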

278

Common CD36 SNPs reduce protein expression and may contribute to a protective atherogenic profile  

PubMed Central

Membrane CD36 functions in the uptake of fatty acids (FAs), oxidized lipoproteins and in signal transduction after binding these ligands. In rodents, CD36 is implicated in abnormal lipid metabolism, inflammation and atherosclerosis. In humans, CD36 variants have been identified to influence free FA and high-density lipoprotein (HDL) levels and to associate with the risk of the metabolic syndrome, coronary artery disease and stroke. In this study, 15 common lipid-associated CD36 single nucleotide polymorphisms (SNPs) were evaluated for the impact on monocyte CD36 expression (protein and transcript) in 104 African Americans. In a subset of subjects, the SNPs were tested for association with monocyte surface CD36 (n = 65) and platelet total CD36 (n = 57). The relationship between CD36 expression and serum HDL and very low-density lipoproteins (VLDLs) levels was also examined. After a permutation-based correction for multiple tests, four SNPs (rs1761667, rs3211909, rs3211913, rs3211938) influenced monocyte CD36 protein and two (rs3211909, rs3211938) platelet CD36. The effect of the HDL-associated SNPs on CD36 expression inversely related to the impact on serum HDL and potential causality was supported by Mendelian randomization analysis. Consistent with this, monocyte CD36 protein negatively correlated with total HDL and HDL subfractions. In contrast, positive correlations were documented between monocyte CD36 and VLDL lipid, particle number and apolipoprotein B. In conclusion, CD36 variants that reduce protein expression appear to promote a protective metabolic profile. The SNPs in this study may have predictive potential on CD36 expression and disease susceptibility in African Americans. Further studies are warranted to validate and determine whether these findings are population specific. PMID:20935172

Love-Gregory, Latisha; Sherva, Richard; Schappe, Timothy; Qi, Jian-Shen; McCrea, Jennifer; Klein, Samuel; Connelly, Margery A.; Abumrad, Nada A.

2011-01-01

279

Rapid screening of mtDNA coding region SNPs for the identification of west European Caucasian haplogroups  

Microsoft Academic Search

This work presents a selection of 16 SNPs from the coding region of the human mitochondrial DNA. The selected markers are used for the assignment of individuals to one of the nine major European Caucasian mitochondrial haplogroups. The selected SNPs are targeted in two multiplex systems, via the application of the SNaPshot kit, a multiplex method based on the dideoxy

Anita Brandstätter; Thomas J. Parsons; Walther Parson

2003-01-01

280

Identification of Novel Single Nucleotide Polymorphisms (SNPs) in Deer (Odocoileus spp.) Using the BovineSNP50  

E-print Network

Identification of Novel Single Nucleotide Polymorphisms (SNPs) in Deer (Odocoileus spp.) Using) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped

Latch, Emily K.

281

Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis  

PubMed Central

Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses. PMID:22013517

Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

2012-01-01

282

Rocket noise filtering system using digital filters  

NASA Technical Reports Server (NTRS)

A set of digital filters is designed to filter rocket noise to various bandwidths. The filters are designed to have constant group delay and are implemented in software on a general purpose computer. The Parks-McClellan algorithm is used. Preliminary tests are performed to verify the design and implementation. An analog filter which was previously employed is also simulated.
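The constant-group-delay property mentioned in this record can be illustrated with a small sketch. Note the substitution: the report uses the Parks-McClellan algorithm, whereas the code below uses a simpler windowed-sinc design, chosen only because it needs nothing beyond the standard library. The function names, the 31-tap length, and the cutoff value are assumptions made for the example. The point it shows is that symmetric FIR coefficients give linear phase, and therefore a constant delay of (N - 1) / 2 samples.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR (not Parks-McClellan).

    cutoff is a fraction of the sampling rate, 0 < cutoff < 0.5.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; the centre tap is the limit value.
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to reduce ripple from truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def group_delay_samples(taps):
    """Return the constant group delay of a symmetric (linear-phase) FIR."""
    symmetric = all(abs(a - b) < 1e-12 for a, b in zip(taps, reversed(taps)))
    if not symmetric:
        raise ValueError("taps are not symmetric; group delay is not constant")
    return (len(taps) - 1) / 2.0


if __name__ == "__main__":
    taps = lowpass_fir(31, 0.1)
    print("group delay (samples):", group_delay_samples(taps))
```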

Mauritzen, David

1990-01-01

283

Novel SNPs in the bovine ADIPOQ and PPARGC1A genes are associated with carcass traits in Hanwoo (Korean cattle).  

PubMed

Adiponectin (ADIPOQ) modulates several biological processes including energy homeostasis, glucose and lipid metabolism. The bovine ADIPOQ gene was located near the QTL affecting marbling, ribeye muscle area and fat thickness on BTA1. The gene encoding peroxisome proliferator-activated receptor-γ coactivator-1α (PPARGC1A) was located within the QTL region of the traits on BTA6. Moreover, its protein product has various biological functions such as cellular energy homeostasis, including adaptive thermogenesis, adipogenesis and gluconeogenesis. Therefore, the ADIPOQ and PPARGC1A genes are positional and functional candidate genes for carcass traits in beef cattle. The objectives of this study were to identify polymorphisms in the bovine ADIPOQ and PPARGC1A genes and to evaluate their associations with carcass traits in a Hanwoo (Korean cattle) population. We identified nine SNPs in the ADIPOQ gene. Two SNPs (DQ156119: g.1436T > C and DQ156119: g.1454A > G) in the promoter region were recognized as new SNPs identified in Hanwoo. Association analysis indicated that the g.1454A > G SNP genotype was significantly associated with effects on LMA (P = 0.004) and BF (P = 0.021). The ADIPOQ haplotype was also found to have a significant effect on the LMA. In the PPARGC1A gene, we identified 11 SNPs in the two unexplored regions (intron 3 and 5). Among them, seven SNPs were located in intron 3 and four SNPs were located in intron 5. Of these 11 putative novel SNPs, two SNPs (AY839822: g.292C > T and AY839823: g.1064C > T) with minor allele frequency (MAF) > 0.20 were examined for associations with carcass traits. The association analysis revealed that both SNPs in the PPARGC1A gene were significantly associated with LMA (P < 0.05). These findings suggest that the SNPs of the bovine ADIPOQ and PPARGC1A genes may be useful molecular markers for selection of carcass traits in Hanwoo. PMID:23649766

Shin, Sungchul; Chung, Euiryong

2013-07-01

284

Predictive mapping of forest composition and structure with direct gradient analysis and nearest- neighbor imputation in coastal Oregon, U.S.A  

Microsoft Academic Search

Spatially explicit information on the species composition and structure of forest vegetation is needed at broad spatial scales for natural resource policy analysis and ecological research. We present a method for predictive vegetation mapping that applies direct gradient analysis and nearest-neighbor imputation to ascribe detailed ground attributes of vegetation to each pixel in a digital landscape map. The

Janet L. Ohmann; Matthew J. Gregory

2002-01-01

285

An empirical comparison of SNPs and microsatellites for parentage and kinship assignment in a wild sockeye salmon (Oncorhynchus nerka) population.  

PubMed

Because of their high variability, microsatellites are still considered the marker of choice for studies on parentage and kinship in wild populations. Nevertheless, single nucleotide polymorphisms (SNPs) are becoming increasingly popular in many areas of molecular ecology, owing to their high throughput, easy transferability between laboratories and low genotyping error. An ongoing discussion concerns the relative power of SNPs compared to microsatellites, that is, how many SNP loci are needed to replace a panel of microsatellites? Here, we evaluate the assignment power of 80 SNPs (H(E) = 0.30, 80 independent alleles) and 11 microsatellites (H(E) = 0.85, 192 independent alleles) in a wild population of about 400 sockeye salmon with two commonly used software packages (Cervus3, Colony2) and, for SNPs only, a newly developed software (SNPPIT). Assignment success was higher for SNPs than for microsatellites, especially for parent pairs, irrespective of the method used. Colony2 assigned a larger proportion of offspring to at least one parent than the other methods, although Cervus and SNPPIT detected more parent pairs. Identification of full-sib groups without parental information from relatedness measures was possible using both marker systems, although explicit reconstruction of such groups in Colony2 was impossible for SNPs because of computation time. Our results confirm the applicability of SNPs for parentage analyses and refute the predictability of assignment success from the number of independent alleles. PMID:21429171
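The two marker summaries quoted in this abstract, expected heterozygosity H(E) and the number of independent alleles, can be sketched as follows. This assumes the usual definitions (H(E) = 1 minus the sum of squared allele frequencies, and k - 1 independent alleles for a locus with k alleles); the abstract does not spell these out, and the function names are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """H(E) = 1 - sum(p_i^2) for one locus (assumed standard definition)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def independent_alleles(alleles_per_locus):
    """Sum of (k - 1) over loci, where k is the allele count at a locus."""
    return sum(k - 1 for k in alleles_per_locus)


if __name__ == "__main__":
    # A biallelic SNP with equal allele frequencies has the maximum H(E) of 0.5.
    print(expected_heterozygosity([0.5, 0.5]))
    # 80 biallelic SNPs contribute one independent allele each.
    print(independent_alleles([2] * 80))
```

Under these definitions a panel of 80 biallelic SNPs yields 80 independent alleles, which matches the figure given in the abstract.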

Hauser, Lorenz; Baird, Melissa; Hilborn, Ray; Seeb, Lisa W; Seeb, James E

2011-03-01
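
The two marker statistics quoted in the abstract above, expected heterozygosity H(E) and the number of independent alleles, can be illustrated with a minimal sketch. It assumes the textbook definitions (H(E) = 1 minus the sum of squared allele frequencies at a locus; independent alleles = number of alleles minus one, summed over loci), which agree with the abstract's figure of 80 independent alleles for 80 biallelic SNPs. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: H_E = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_freqs)


def independent_alleles(alleles_per_locus):
    """Independent alleles across loci: sum of (k - 1) over loci with k alleles."""
    return sum(k - 1 for k in alleles_per_locus)


# Hypothetical biallelic SNP with allele frequencies 0.5 / 0.5.
print(expected_heterozygosity([0.5, 0.5]))

# 80 biallelic SNPs contribute one independent allele each.
print(independent_alleles([2] * 80))
```

Under these definitions a biallelic SNP can never exceed H(E) = 0.5, which is why many more SNP loci than microsatellite loci are needed to reach a comparable count of independent alleles.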

286

Evaluation of Approaches to Identify Associated SNPs That Explain the Linkage Evidence in Nuclear Families with Affected Siblings  

PubMed Central

Linkage analysis is often followed by association mapping to localize disease variants. In this paper, we evaluate approaches to determine how much of the observed linkage evidence, namely the identity-by-descent (IBD) sharing at the linkage peak, is explained by associated SNPs. We study several methods: Homozygote Sharing Tests (HST), Genotype Identity-by-Descent Sharing Test (GIST), and a permutation approach. We also propose a new approach, HSTMLB, combining HST and the Maximum Likelihood Binomial (MLB) linkage statistic. These methods can identify SNPs partially explaining the linkage peak, but only HST and HSTMLB can identify SNPs that do not fully explain the linkage evidence and be applied to multiple-SNPs. We contrast these methods with the association tests implemented in the software LAMP. In our simulations, GIST is more powerful at finding SNPs that partially explain the linkage peak, while HST and HSTMLB are equally powerful at identifying SNPs that do not fully explain the linkage peak. When applied to the North American Rheumatoid Arthritis Consortium data, HST and HSTMLB identify marker pairs that may fully explain the linkage peak on chromosome 6. In conclusion, HST and HSTMLB provide simple and flexible tools to identify SNPs that explain the IBD sharing at the linkage peak. PMID:19996608

Chen, Ming-Huei; Van Eerdewegh, Paul; Vincent, Quentin B.; Alcais, Alexandre; Abel, Laurent; Dupuis, Josée

2010-01-01

287

Rank and Order: Evaluating the Performance of SNPs for Individual Assignment in a Non-Model Organism  

PubMed Central

Single nucleotide polymorphisms (SNPs) are valuable tools for ecological and evolutionary studies. In non-model species, the use of SNPs has been limited by the number of markers available. However, new technologies and decreasing technology costs have facilitated the discovery of a constantly increasing number of SNPs. With hundreds or thousands of SNPs potentially available, there is interest in comparing and developing methods for evaluating SNPs to create panels of high-throughput assays that are customized for performance, research questions, and resources. Here we use five different methods to rank 43 new SNPs and 71 previously published SNPs for sockeye salmon: FST, informativeness (In), average contribution to principal components (LC), and the locus-ranking programs BELS and WHICHLOCI. We then tested the performance of these different ranking methods by creating 48- and 96-SNP panels of the top-ranked loci for each method and used empirical and simulated data to obtain the probability of assigning individuals to the correct population using each panel. All 96-SNP panels performed similarly and better than the 48-SNP panels except for the 96-SNP BELS panel. Among the 48-SNP panels, panels created from FST, In, and LC ranks performed better than panels formed using the top-ranked loci from the programs BELS and WHICHLOCI. The application of ranking methods to optimize panel performance will become more important as more high-throughput assays become available. PMID:23185290

Storer, Caroline G.; Pascal, Carita E.; Roberts, Steven B.; Templin, William D.; Seeb, Lisa W.; Seeb, James E.

2012-01-01
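
The panel-building step described in the abstract above (rank loci by a per-locus score, then keep the top 48 or 96) can be sketched as follows. This is only an illustration under stated assumptions: the locus names and scores are invented, the score is assumed to be "higher is better" (as with FST), and the function name `top_ranked_panel` is hypothetical rather than part of BELS, WHICHLOCI, or any other program named in the record.

```python
def top_ranked_panel(scores, panel_size):
    """Return the names of the `panel_size` loci with the highest scores.

    `scores` maps a locus name to a ranking statistic where larger is better.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [locus for locus, _ in ranked[:panel_size]]


# Hypothetical per-locus FST values for four loci.
example_scores = {"locus_A": 0.02, "locus_B": 0.15, "locus_C": 0.09, "locus_D": 0.15}
print(top_ranked_panel(example_scores, 2))
```

Comparing panels built from different ranking statistics then amounts to calling the same function with a different `scores` mapping and evaluating each panel's assignment accuracy separately.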

288

Detection of cis-acting regulatory SNPs using allelic expression data  

PubMed Central

Allelic expression (AE) imbalance between the two alleles of a gene can be used to detect cis-acting regulatory SNPs (rSNPs) in individuals heterozygous for a transcribed SNP (tSNP). In this paper, we propose three tests for AE analysis focusing on phase-unknown data and any degree of linkage disequilibrium (LD) between the rSNP and tSNP: a test based on the minimum p-value of a one-sided F and two-sided t tests proposed previously for phase-unknown data, a test that combines these two p-values, and a mixture-model based test. We compare these three tests to the F and t tests and an existing regression-based test for phase-known data. We show that the ranking of the tests based on power depends most strongly on the magnitude of the LD between the rSNP and tSNP. For phase-unknown data we find that under a range of scenarios, our proposed tests have higher power than the F and t tests when LD between the rSNP and tSNP is moderate (~.2 < D'RT < ~.8). We further demonstrate that the presence of a second ungenotyped rSNP almost never invalidates the proposed tests nor substantially changes their power rankings. For detection of cis-acting regulatory SNPs using phase-unknown AE data, we recommend the F test when the rSNP and tSNP are in or near linkage equilibrium (D'RT < .2); the t test when the two SNPs are in strong LD (D'RT > .7); and the mixture-model based test for intermediate LD levels (.2 < D'RT < .7). PMID:21769929

Xiao, Rui; Scott, Laura J.

2014-01-01
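
The recommendation at the end of the abstract above is a simple decision rule on D'RT, and can be written out as a short sketch. The thresholds 0.2 and 0.7 come from the abstract; the handling of values exactly equal to 0.2 or 0.7 (assigned here to the mixture-model test) is an assumption, since the abstract does not say, and the function name is hypothetical.

```python
def recommended_ae_test(d_prime_rt):
    """Pick a test for phase-unknown allelic expression data from |D'| between rSNP and tSNP.

    Thresholds follow the abstract: D' < 0.2 -> F test, D' > 0.7 -> t test,
    otherwise the mixture-model based test. Boundary values are an assumption.
    """
    if not 0.0 <= d_prime_rt <= 1.0:
        raise ValueError("D' is expected as an absolute value between 0 and 1")
    if d_prime_rt < 0.2:
        return "F test"
    if d_prime_rt > 0.7:
        return "t test"
    return "mixture-model test"


for value in (0.1, 0.5, 0.9):
    print(value, recommended_ae_test(value))
```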

289

Association of obesity risk SNPs in PCSK1 with insulin sensitivity and proinsulin conversion  

PubMed Central

Background Prohormone convertase 1 is involved in maturation of peptides. Rare mutations in gene PCSK1, encoding this enzyme, cause childhood obesity and abnormal glucose homeostasis with elevated proinsulin concentrations. Common single nucleotide polymorphisms (SNPs) within this gene, rs6232 and rs6235, are associated with obesity. We studied whether these SNPs influence the prediabetic traits insulin resistance, β-cell dysfunction, or glucose intolerance. Methods We genotyped 1498 German subjects for SNPs rs6232 and rs6235 within PCSK1. The subjects were metabolically characterized by oral glucose tolerance test with glucose, insulin, proinsulin, and C-peptide measurements. A subgroup of 512 subjects underwent a hyperinsulinemic-euglycemic clamp. Results The minor allele frequencies were 25.8% for SNP rs6235 and 6.0% for rs6232. After adjustment for sex and age, we found no association of SNPs rs6235 and rs6232 with BMI or other weight-related traits (all p ≥ 0.07). Both minor alleles, adjusted for sex, age, BMI and insulin sensitivity were associated with elevated AUCproinsulin and AUCproinsulin/AUCinsulin (rs6235: p(additive model) ≤ 0.009, effect sizes 8/8%, rs6232: p(dominant model) ≤ 0.01, effect sizes 10/21%). Insulin secretion was not affected by the variants (different secretion parameters, all p ≥ 0.08). The minor allele of SNP rs6232 was additionally associated with 15% higher OGTT-derived and 19% higher clamp-derived insulin sensitivity (pdom ≤ 0.0047), 4.5% lower HOMA-IR (pdom = 0.02) and 3.5% lower 120-min glucose (pdom = 0.0003) independently of BMI and proinsulin conversion. SNP rs6235 was not associated with parameters of glucose metabolism. Conclusions Like rare mutations in PCSK1, the more common variants tested determine glucose-stimulated proinsulin conversion, but not insulin secretion. In addition, rs6232, encoding the amino acid exchange N221D, influences insulin sensitivity and glucose homeostasis. PMID:20534142

2010-01-01

290

An assessment of the utility of single nucleotide polymorphisms (SNPs) for forensic purposes  

Microsoft Academic Search

This paper assesses the use of single nucleotide polymorphisms (SNPs) for forensic analysis. It demonstrates that relatively small arrays of approx. 50 loci are comparable to existing short tandem repeat (STR) multiplexes. A quantitative test, however, is a prerequisite for mixture interpretation. In addition, as the mixture proportion becomes low, it will be necessary to distinguish between the allele and

Peter Gill

2001-01-01

291

Fishing for SNPs: A Targeted Locus Approach for Single Nucleotide Polymorphism Discovery in Rainbow Trout  

Microsoft Academic Search

The combination of whole-genome sequencing efforts and emerging high-throughput genotyping techniques has made single nucleotide polymorphisms (SNPs) a marker of choice for molecular genetic analyses in model organisms. This class of marker holds great promise for resolving questions of phylogeny, population structure, introgression, and adaptive genetic variation. Fifty-five polymerase chain reaction primer pairs were used to target variable regions of

A. E. Sprowles; M. R. Stephens; N. W. Clipperton; B. P. May

2006-01-01

292

A Reduced Number of mtSNPs Saturates Mitochondrial DNA Haplotype Diversity of Worldwide Population Groups  

Microsoft Academic Search

Background The high levels of variation characterising the mitochondrial DNA (mtDNA) molecule are due ultimately to its high average mutation rate; moreover, mtDNA variation is deeply structured in different populations and ethnic groups. There is growing interest in selecting a reduced number of mtDNA single nucleotide polymorphisms (mtSNPs) that account for the maximum level of discrimination power in a given population.

Antonio Salas; Jorge Amigo; Vincent Macaulay

2010-01-01

293

Genetic Variation and Recent Positive Selection in Worldwide Human Populations: Evidence from Nearly 1 Million SNPs  

PubMed Central

Background Genome-wide scans of hundreds of thousands of single-nucleotide polymorphisms (SNPs) have resulted in the identification of new susceptibility variants to common diseases and are providing new insights into the genetic structure and relationships of human populations. Moreover, genome-wide data can be used to search for signals of recent positive selection, thereby providing new insights into the genetic adaptations that occurred as modern humans spread out of Africa and around the world. Methodology We genotyped approximately 500,000 SNPs in 255 individuals (5 individuals from each of 51 worldwide populations) from the Human Genome Diversity Panel (HGDP-CEPH). When merged with non-overlapping SNPs typed previously in 250 of these same individuals, the resulting data consist of over 950,000 SNPs. We then analyzed the genetic relationships and ancestry of individuals without assigning them to populations, and we also identified candidate regions of recent positive selection at both the population and regional (continental) level. Conclusions Our analyses both confirm and extend previous studies; in particular, we highlight the impact of various dispersals, and the role of substructure in Africa, on human genetic diversity. We also identified several novel candidate regions for recent positive selection, and a gene ontology (GO) analysis identified several GO groups that were significantly enriched for such candidate genes, including immunity and defense related genes, sensory perception genes, membrane proteins, signal receptors, lipid binding/metabolism genes, and genes involved in the nervous system. Among the novel candidate genes identified are two genes involved in the thyroid hormone pathway that show signals of selection in African Pygmies that may be related to their short stature. PMID:19924308

Theunert, Christoph; Pugach, Irina; Li, Jing; Nandineni, Madhusudan R.; Gross, Arnd; Scholz, Markus; Stoneking, Mark

2009-01-01

294

Identification of SNPs in the cystic fibrosis interactome influencing pulmonary progression in cystic fibrosis  

PubMed Central

There is growing evidence that the great phenotypic variability in patients with cystic fibrosis (CF) not only depends on the genotype, but apart from a combination of environmental and stochastic factors predominantly also on modifier gene effects. It has been proposed that genes interacting with CF transmembrane conductance regulator (CFTR) and epithelial sodium channel (ENaC) are potential modifiers. Therefore, we assessed the impact of single-nucleotide polymorphisms (SNPs) of several of these interacters on CF disease outcome. SNPs that potentially alter gene function were genotyped in 95 well-characterized p.Phe508del homozygous CF patients. Linear mixed-effect model analysis was used to assess the relationship between sequence variants and the repeated measurements of lung function parameters. In total, we genotyped 72 SNPs in 10 genes. Twenty-five SNPs were used for statistical analysis, where we found strong associations for one SNP in PPP2R4 with the lung clearance index (P ≤ 0.01), the specific effective airway resistance (P ≤ 0.005) and the forced expiratory volume in 1 s (P ≤ 0.005). In addition, we identified one SNP in SNAP23 to be significantly associated with three lung function parameters as well as one SNP in PPP2R1A and three in KRT19 to show a significant influence on one lung function parameter each. Our findings indicate that direct interacters with CFTR, such as SNAP23, PPP2R4 and PPP2R1A, may modify the residual function of p.Phe508del-CFTR while variants in KRT19 may modulate the amount of p.Phe508del-CFTR at the apical membrane and consequently modify CF disease. PMID:22892532

Gisler, Franziska M; von Kanel, Thomas; Kraemer, Richard; Schaller, André; Gallati, Sabina

2013-01-01

295

Impulsiveness mediates the association between GABRA2 SNPs and lifetime alcohol problems  

PubMed Central

Genetic variants in GABRA2 have previously been shown to be associated with alcohol measures, EEG β waves, and impulsiveness-related traits. Impulsiveness is a behavioral risk factor for alcohol and other substance abuse. Here, we tested association between 11 variants in GABRA2 with NEO-impulsiveness and problem drinking. Our sample of 295 unrelated adult subjects was from a community of families with at least one male with DSM-IV Alcohol use diagnosis, and from a socioeconomically comparable control group. Ten GABRA2 SNPs were associated with the NEO-impulsiveness (p < 0.03). The alleles associated with higher impulsiveness correspond to the minor alleles identified in previous alcohol dependence studies. All ten SNPs are in LD with each other and represent one effect on impulsiveness. Four SNPs and the corresponding haplotype from intron 3 to intron 4 were also associated with Lifetime Alcohol Problems Score (LAPS, p < 0.03) (not corrected for multiple testing). Impulsiveness partially mediates (22.6% average) this relation between GABRA2 and LAPS. Our results suggest that GABRA2 variation in the region between introns 3 and 4 is associated with impulsiveness and this effect partially influences the development of alcohol problems, but a direct effect of GABRA2 on problem drinking remains. A potential functional SNP rs279827, located next to a splice site, is located in the most significant region for both impulsiveness and LAPS. The high degree of LD among nine of these SNPs and the conditional analyses we have performed suggest that all variants represent one signal. PMID:23566244

Villafuerte, Sandra; Strumba, Viktorya; Stoltenberg, Scott F.; Zucker, Robert A.; Burmeister, Margit

2013-01-01

296

Genome-Wide Association Studies Using Haplotypes and Individual SNPs in Simmental Cattle  

PubMed Central

Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components. PMID:25330174

Wu, Yang; Fan, Huizhong; Wang, Yanhui; Zhang, Lupei; Gao, Xue; Chen, Yan; Li, Junya; Ren, HongYan; Gao, Huijiang

2014-01-01

297

Improved Resolution Haplogroup G Phylogeny in the Y Chromosome, Revealed by a Set of Newly Characterized SNPs  

PubMed Central

Background Y-SNP haplogroup G (hgG), defined by Y-SNP marker M201, is relatively uncommon in the United States general population, with only 8 additional sub-markers characterized. Many of the previously described eight sub-markers are either very rare (2–4%) or do not distinguish between major populations within this hg. In fact, prior to the current study, only 2% of our reference Caucasian population belonged to hgG and all of these individuals were in sub-haplogroup G2a, defined by P15. Additional Y-SNPs are needed in order to differentiate between individuals within this haplogroup. Principal Findings In this work we have investigated whether we could differentiate between a population of 63 hgG individuals using previously uncharacterized Y-SNPs. We have designed assays to test these individuals using all known hgG SNPs (n = 9) and an additional 16 unreported/undefined Y-SNPs. Using a combination of DNA sequence and genetic genealogy databases, we have uncovered a total of 15 new hgG SNPs that had been previously reported but not phylogenetically characterized. Ten of the new Y-SNPs are phylogenetically equivalent to M201, one is equivalent to P15 and, interestingly, four create new, separate haplogroups. Three of the latter are more common than many of the previously defined Y-SNPs. Y-STR data from these individuals show that DYS385*12 is present in 70% of G2a3b1-U13 individuals while only 4% of non-G2a3b1-U13 individuals possess the DYS385*12 allele. Conclusions This study uncovered several previously undefined Y-SNPs by using data from several database sources. The new Y-SNPs revealed in this paper will be of importance to those with research interests in population biology and human evolution. PMID:19495413

Sims, Lynn M.; Garvey, Dennis; Ballantyne, Jack

2009-01-01

298

Co-regulated transcripts associated to cooperating eSNPs define Bi-fan motifs in human gene networks.  

PubMed

Associations between the level of single transcripts and single corresponding genetic variants, expression single nucleotide polymorphisms (eSNPs), have been extensively studied and reported. However, most expression traits are complex, involving the cooperative action of multiple SNPs at different loci affecting multiple genes. Finding these cooperating eSNPs by exhaustive search has proven to be statistically challenging. In this paper we utilized availability of sequencing data with transcriptional profiles in the same cohorts to identify two kinds of usual suspects: eSNPs that alter coding sequences or eSNPs within the span of transcription factors (TFs). We utilize a computational framework for considering triplets, each comprised of a SNP and two associated genes. We examine pairs of triplets with such cooperating source eSNPs that are both associated with the same pair of target genes. We characterize such quartets through their genomic, topological and functional properties. We establish that this regulatory structure of cooperating quartets is frequent in real data, but is rarely observed in permutations. eSNP sources are mostly located on different chromosomes and away from their targets. In the majority of quartets, SNPs affect the expression of the two gene targets independently of one another, suggesting a mutually independent rather than a directionally dependent effect. Furthermore, the directions in which the minor allele count of the SNP affects gene expression within quartets are consistent, so that the two source eSNPs either both have the same effect on the target genes or both affect one gene in the opposite direction to the other. Same-effect eSNPs are observed more often than expected by chance. Cooperating quartets reported here in a human system might correspond to bi-fans, a known network motif of four nodes previously described in model organisms. 
Overall, our analysis offers insights regarding the fine motif structure of human regulatory networks. PMID:25210734

Kreimer, Anat; Pe'er, Itsik

2014-09-01

299

Generation of genome-scale gene-associated SNPs in catfish for the construction of a high-density SNP array  

Microsoft Academic Search

BACKGROUND: Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies. In order to provide the best genome coverage for the analysis of performance and production traits, a large number of relatively evenly distributed SNPs are needed. Gene-associated SNPs may fulfill these requirements of large numbers and genome wide distribution. In addition, gene-associated SNPs could themselves

Shikai Liu; Zunchun Zhou; Jianguo Lu; Fanyue Sun; Shaolin Wang; Hong Liu; Yanliang Jiang; Huseyin Kucuktas; Ludmilla Kaltenboeck; Eric Peatman; Zhanjiang Liu

2011-01-01

300

Common SNPs of AmelogeninX (AMELX) and dental caries susceptibility.  

PubMed

Genetic approaches have shown that several genes could modify caries susceptibility; AmelogeninX (AMELX) has been repeatedly designated. Here, we hypothesized that AMELX mutations resulting in discrete changes of enamel microstructure may be found in children with a severe caries phenotype. In parallel, possible AMELX mutations that could explain resistance to caries may be found in caries-free patients. In this study, coding exons of AMELX and exon-intron boundaries were sequenced in 399 individuals with extensive caries (250) or caries-free (149) individuals from nine French hospital groups. No mutation responsible for a direct change of amelogenin function was identified. Seven single-nucleotide polymorphisms (SNPs) were found, 3 presenting a high allele frequency, and 1 being detected for the first time. Three SNPs were located in coding regions, 2 of them being non-synonymous. Both evolutionary and statistical analyses showed that none of these SNPs was associated with caries susceptibility, suggesting that AMELX is not a gene candidate in our studied population. PMID:23525533

Gasse, B; Grabar, S; Lafont, A G; Quinquis, L; Opsahl Vital, S; Davit-Béal, T; Moulis, E; Chabadel, O; Hennequin, M; Courson, F; Droz, D; Vaysse, F; Laboux, O; Tassery, H; Al-Hashimi, N; Boillot, A; Carel, J C; Treluyer, J M; Jeanpierre, M; Beldjord, C; Sire, J Y; Chaussain, C

2013-05-01

301

Genetic association of SNPs in the FTO gene and predisposition to obesity in Malaysian Malays.  

PubMed

The common variants in the fat mass- and obesity-associated (FTO) gene have been previously found to be associated with obesity in various adult populations. The objective of the present study was to investigate whether the single nucleotide polymorphisms (SNPs) and linkage disequilibrium (LD) blocks in various regions of the FTO gene are associated with predisposition to obesity in Malaysian Malays. Thirty-one FTO SNPs were genotyped in 587 (158 obese and 429 non-obese) Malaysian Malay subjects. Obesity traits and lipid profiles were measured and single-marker association testing, LD testing, and haplotype association analysis were performed. LD analysis of the FTO SNPs revealed the presence of 57 regions with complete LD (D' = 1.0). In addition, we detected the association of rs17817288 with low-density lipoprotein cholesterol. The FTO gene may therefore be involved in lipid metabolism in Malaysian Malays. Two haplotype blocks were present in this region of the FTO gene, but no particular haplotype was found to be significantly associated with an increased risk of obesity in Malaysian Malays. PMID:22911346

Apalasamy, Y D; Ming, M F; Rampal, S; Bulgiba, A; Mohamed, Z

2012-12-01
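
The "complete LD (D' = 1.0)" criterion used in the abstract above refers to the normalized linkage disequilibrium coefficient. As a minimal sketch, assuming Lewontin's standard definition and phased haplotype frequencies as input (the study's own software and inputs are not described in the record), D' for two biallelic loci can be computed as follows. The function name and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one locus is monomorphic; D' is undefined, report 0
    return abs(d) / d_max


# Alleles A and B always occur together: complete LD.
print(d_prime(0.5, 0.5, 0.5))

# Haplotype frequency equals the product of allele frequencies: linkage equilibrium.
print(d_prime(0.25, 0.5, 0.5))
```

A value of 1.0 means that at least one of the four possible haplotypes is absent from the sample, which is why small samples or rare alleles can yield D' = 1.0 without a strong correlation between the loci.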

302

PrimerZ: streamlined primer design for promoters, exons and human SNPs.  

PubMed

PrimerZ (http://genepipe.ngc.sinica.edu.tw/primerz/) is a web application dedicated primarily to primer design for genes and human SNPs. PrimerZ accepts genes by gene name or Ensembl accession code, and SNPs by dbSNP rs or AFFY_Probe IDs. The promoter and exon sequence information of all gene transcripts fetched from the Ensembl database (http://www.ensembl.org) is processed before being passed on to Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) for individual primer design. All results returned from Primer3 are organized and integrated in a specially designed web page for easy browsing. Besides the web page presentation, csv text file export is also provided for enhanced user convenience. PrimerZ automates highly standard but tedious gene primer design to improve the success rate of PCR experiments. More than 2000 primers have been designed with PrimerZ at our institute since 2004 and the success rate is over 70%. The addition of several new features has made PrimerZ even more useful to the research community in facilitating primer design for promoters, exons and SNPs. PMID:17537812

Tsai, Ming-Fang; Lin, Yi-Jung; Cheng, Yu-Chang; Lee, Kuo-Hsi; Huang, Cheng-Chih; Chen, Yuan-Tsong; Yao, Adam

2007-07-01

303

Y-chromosomal SNPs in Finno-Ugric-speaking populations analyzed by minisequencing on microarrays.  

PubMed

An increasing number of single nucleotide polymorphisms (SNPs) on the Y chromosome are being identified. To utilize the full potential of the SNP markers in population genetic studies, new genotyping methods with high throughput are required. We describe a microarray system based on the minisequencing single nucleotide primer extension principle for multiplex genotyping of Y-chromosomal SNP markers. The system was applied for screening a panel of 25 Y-chromosomal SNPs in a unique collection of samples representing five Finno-Ugric populations. The specific minisequencing reaction provides 5-fold to infinite discrimination between the Y-chromosomal genotypes, and the microarray format of the system allows parallel and simultaneous analysis of large numbers of SNPs and samples. In addition to the SNP markers, five Y-chromosomal microsatellite loci were typed. Altogether 10,000 genotypes were generated to assess the genetic diversity in these population samples. Six of the 25 SNP markers (M9, Tat, SRY10831, M17, M12, 92R7) were polymorphic in the analyzed populations, yielding six distinct SNP haplotypes. The microsatellite data were used to study the genetic structure of two major SNP haplotypes in the Finns and the Saami in more detail. We found that the most common haplotypes are shared between the Finns and the Saami, and that the SNP haplotypes show regional differences within the Finns and the Saami, which supports the hypothesis of two separate settlement waves to Finland. PMID:11230171

Raitio, M; Lindroos, K; Laukkanen, M; Pastinen, T; Sistonen, P; Sajantila, A; Syvänen, A C

2001-03-01

304

RNAsnp: Efficient Detection of Local RNA Secondary Structure Changes Induced by SNPs  

PubMed Central

Structural characteristics are essential for the functioning of many noncoding RNAs and cis-regulatory elements of mRNAs. SNPs may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into structural effects of SNPs. The global measures employed so far suffer from limited accuracy of folding programs on large RNAs and are computationally too demanding for genome-wide applications. Here, we present a strategy that focuses on the local regions of maximal structural change between mutant and wild-type. These local regions are approximated in a “screening mode” that is intended for genome-wide applications. Furthermore, localized regions are identified as those with maximal discrepancy. The mutation effects are quantified in terms of empirical P values. To this end, the RNAsnp software uses extensive precomputed tables of the distribution of SNP effects as function of length and GC content. RNAsnp thus achieves both a noise reduction and speed-up of several orders of magnitude over shuffling-based approaches. On a data set comprising 501 SNPs associated with human-inherited diseases, we predict 54 to have significant local structural effect in the untranslated region of mRNAs. RNAsnp is available at http://rth.dk/resources/rnasnp. PMID:23315997

Sabarinathan, Radhakrishnan; Tafer, Hakim; Seemann, Stefan E; Hofacker, Ivo L; Stadler, Peter F; Gorodkin, Jan

2013-01-01

305

Filtering Water  

NSDL National Science Digital Library

The first site related to water filtration is from the US Environmental Protection Agency, entitled EPA Environmental Education: Water Filtration (1). The two-page document explains the need for water filtration and the steps water treatment plants take to purify water. To further understand the process, a demonstration project is provided that illustrates these purification steps, which include coagulation, sedimentation, filtration, and disinfection. The second site is an interesting Flash animation called Filtration: How Does it Work (2) provided by Canada's Prairie Farm Rehabilitation Administration. Visitors will learn various types of filtration procedures and systems and the materials that are used such as carbon and sand. Next, from the National Science Foundation is a learning activity called Get Out the Gunk (3). Using just a few simple items from around the house, kids will be able to answer questions like "Does a filter work better with a lot of water rushing through, or a small trickle?" and "Does it make the water cleaner if you pour it through a filter twice?" The fourth Web site, Rapid Sand Filtration (4), is provided by Dottie Schmitt and Christie Shinault of Virginia Tech. The authors describe the process, which involves the flow of water through a bed of granular media, normally following settling basins in conventional water treatment trains to remove any particulate matter left over after flocculation and settling. Along with its thorough description, readers can view illustrations and photographs that further explain the process. The Vegetative Buffer Strips for Improved Surface Water Quality (5) Web site is provided by the Iowa State University Extension office. The document explains what vegetative buffer strips are, how they filter contaminants and sediment from surface water, how effective they are, and more. The sixth offering is a file called Infiltration Basins and Trenches (6) that is offered by the University of Wisconsin Extension. 
These structures are intended to collect water, have it infiltrate into the ground, and have it purified along the way. This document explains how effective they are at removing pollutants, how to install them, design guidelines, maintenance, and more. Next, from a site called Wilderness Survival.net is the Water Filtration Devices (7) page. Visitors read how to make a filtering system out of cloth, sand, crushed rock, charcoal, or a hollow log, although as is stated, the water still has to be purified. The last site, from the US Geological Survey, is called A Visit to a Wastewater-Treatment Plant: Primary Treatment of Wastewater (8). Although geared towards children, the site does a good job of explaining what happens at each stage of the treatment process and how pollutants are removed to help keep water clean. Everything from screening, pumping, aerating, sludge and scum removal, killing bacteria, and what is done with wastewater residuals is covered.

Brieske, Joel A.

2003-01-01

306

Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans.  

PubMed

Adult body height is a quantitative trait for which genome-wide association studies (GWAS) have identified numerous loci, primarily in European populations. These loci, comprising common variants, explain <10% of the phenotypic variance in height. We searched for novel associations between height and common (minor allele frequency, MAF ≥5%) or infrequent (0.5% < MAF < 5%) variants across the exome in African Americans. Using a reference panel of 1692 African Americans and 471 Europeans from the National Heart, Lung, and Blood Institute's (NHLBI) Exome Sequencing Project (ESP), we imputed whole-exome sequence data into 13 719 African Americans with existing array-based GWAS data (discovery). Variants achieving a height-association threshold of P < 5E-06 in the imputed dataset were followed up in an independent sample of 1989 African Americans with whole-exome sequence data (replication). We used P < 2.5E-07 (=0.05/196 779 variants) to define statistically significant associations in meta-analyses combining the discovery and replication sets (N = 15 708). We discovered and replicated three independent loci for association: 5p13.3/C5orf22/rs17410035 (MAF = 0.10, β = 0.64 cm, P = 8.3E-08), 13q14.2/SPRYD7/rs114089985 (MAF = 0.03, β = 1.46 cm, P = 4.8E-10) and 17q23.3/GH2/rs2006123 (MAF = 0.30; β = 0.47 cm; P = 4.7E-09). Conditional analyses suggested 5p13.3 (C5orf22/rs17410035) and 13q14.2 (SPRYD7/rs114089985) may harbor novel height alleles independent of previous GWAS-identified variants (r² with GWAS loci <0.01); whereas 17q23.3/GH2/rs2006123 was correlated with GWAS-identified variants in European and African populations. Notably, 13q14.2/rs114089985 is infrequent in African Americans (MAF = 3%), extremely rare in European Americans (MAF = 0.03%), and monomorphic in Asian populations, suggesting it may be an African-American-specific height allele. 
Our findings demonstrate that whole-exome imputation of sequence variants can identify low-frequency variants and discover novel variants in non-European populations. PMID:25027330

Du, Mengmeng; Auer, Paul L; Jiao, Shuo; Haessler, Jeffrey; Altshuler, David; Boerwinkle, Eric; Carlson, Christopher S; Carty, Cara L; Chen, Yii-Der Ida; Curtis, Keith; Franceschini, Nora; Hsu, Li; Jackson, Rebecca; Lange, Leslie A; Lettre, Guillaume; Monda, Keri L; Nickerson, Deborah A; Reiner, Alex P; Rich, Stephen S; Rosse, Stephanie A; Rotter, Jerome I; Willer, Cristen J; Wilson, James G; North, Kari; Kooperberg, Charles; Heard-Costa, Nancy; Peters, Ulrike

2014-12-15
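
The significance threshold quoted in the record above (P < 2.5E-07 = 0.05/196 779 variants) is a Bonferroni-style correction: the nominal alpha divided by the number of tests. The arithmetic can be sketched as below. The function name is illustrative only, and the P values are the ones quoted in the abstract; nothing here reproduces the authors' analysis pipeline.

```python
# Minimal sketch of a Bonferroni-corrected threshold (alpha / number of tests).
# The function name is hypothetical; the numbers come from the abstract above.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold for n_tests comparisons."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 196_779)
print(f"{threshold:.2e}")  # 2.54e-07

# P values quoted in the abstract for the three replicated loci.
reported = {"rs17410035": 8.3e-08, "rs114089985": 4.8e-10, "rs2006123": 4.7e-09}
for snp, p in reported.items():
    print(snp, p < threshold)
```

All three quoted P values fall below the computed threshold, which is consistent with the abstract describing them as statistically significant.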

307

Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology  

Technology Transfer Automated Retrieval System (TEKTRAN)

The dissection of complex traits of economic importance for the pig industry requires the availability of a significant number of genetic markers, such as SNPs. This study was conducted in order to discover thousands of porcine SNPs using next generation sequencing technologies and use those SNPs, a...

308

Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology  

Microsoft Academic Search

Background: The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to

Antonio M. Ramos; Richard P. M. A. Crooijmans; Nabeel A. Affara; Andreia J. Amaral; H. H. D. Kerstens; H. J. W. C. Megens; M. A. M. Groenen; Carol Churcher; Richard Clark; Patrick Dehais; Mark S. Hansen; Jakob Hedegaard; Zhi-Liang Hu; Andy S. Law; Hendrik-Jan Megens; Denis Milan; Danny J. Nonneman; Gary A. Rohrer; Max F. Rothschild; Tim P. L. Smith; Robert D. Schnabel; Curt P. Van Tassell; Jeremy F. Taylor; Ralph T. Wiedmann; Lawrence B. Schook

2009-01-01

309

Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology  

Microsoft Academic Search

Background: The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design

Antonio M. Ramos; Richard P. M. A. Crooijmans; Nabeel A. Affara; Andreia J. Amaral; Alan L. Archibald; Jonathan E. Beever; Christian Bendixen; Carol Churcher; Richard Clark; Patrick Dehais; Mark S. Hansen; Jakob Hedegaard; Zhi-Liang Hu; Hindrik H. Kerstens; Andy S. Law; Hendrik-Jan Megens; Denis Milan; Danny J. Nonneman; Gary A. Rohrer; Max F. Rothschild; Tim P. L. Smith; Robert D. Schnabel; Curt P. van Tassell; Jeremy F. Taylor; Ralph T. Wiedmann; Lawrence B. Schook; Martien A. M. Groenen; Laszlo Orban

2009-01-01

310

Analyses of porcine public SNPs in coding-gene regions by re-sequencing and phenotypic association studies.  

PubMed

The Porcine SNP database has a huge number of SNPs, but these SNPs are mostly found by computer data-mining procedures and have not been well characterized. We re-sequenced 1,439 porcine public SNPs from four commercial pig breeds and one Korean domestic breed (Korean Native pig, KNP) by using two DNA pools from eight unrelated animals in each breed. These SNPs were from 419 protein-coding genes covering the 18 autosomes, and the re-sequencing in breeds confirmed 690 public SNPs (47.9%) and 226 novel mutations (173 SNPs and 53 insertions/deletions). In total, 916 variations were found in our study. Of the 916 variations, 148 SNPs (16.2%) were found across all five breeds, and 199 SNPs (21.7%) were breed-specific polymorphisms. According to the SNP locations in the gene sequences, these 916 variations were categorized into 802 non-coding SNPs (785 in intron, 17 in 3'-UTR) and 114 coding SNPs (86 synonymous SNPs, 28 non-synonymous SNPs). The nucleotide substitution analyses for these SNPs revealed that 70.2% were from transitions, 20.0% from transversions, and the remaining 5.79% were deletions or insertions. Subsequently, we genotyped 261 SNPs from 180 genes in an experimental KNP × Landrace F2 cross by the Sequenom MassARRAY system. A total of 33 traits including growth, carcass composition and meat quality were analyzed for the phenotypic association tests using the 132 SNPs in 108 genes with minor allele frequency (MAF)>0.2. The association results showed that five marker-trait combinations were significant at the 5% experiment-wise level (ADCK4 for rear leg, MYH3 for rear leg, Hunter B, Loin weight and Shearforce) and four at the 10% experiment-wise level (DHX38 for average daily gain at live weight, LGALS9 for crude lipid, NGEF for front leg and LIFR for pH at 24 h). In addition, 49 SNPs in 44 genes showing significant association with the traits were detected at the 1% comparison-wise level. 
A large number of genes that function as enzymes, transcription factors or signalling molecules were considered as genetic markers for pig growth (RNF103, TSPAN31, DHX38, ABCF1, ABCC10, SCD5, KIAA0999 and FKBP10), muscling (HSPA5, PTPRM, NUP88, ADCK4, PLOD1, DLX1 and GRM8), fatness (PTGIS, IDH3B, RYR2 and NOL4) and meat quality traits (DUSP4, LIFR, NGEF, EWSR1, ACTN2, PLXND1, DLX3, LGALS9, ENO3, EPRS, TRIM29, EHMT2, RBM42, SESN2 and RAB4B). The SNPs or genes reported here may be beneficial to future marker assisted selection breeding in pigs. PMID:21107721

Li, Xiaoping; Kim, Sang-Wook; Do, Kyoung-Tag; Ha, You-Kyoung; Lee, Yun-Mi; Yoon, Suk-Hee; Kim, Hee-Bal; Kim, Jong-Joo; Choi, Bong-Hwan; Kim, Kwan-Suk

2011-08-01

311

Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels.  

PubMed

Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10⁻⁴), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

van Leeuwen, Elisabeth M; Karssen, Lennart C; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J; Huffman, Jennifer E; White, Charles C; Feitosa, Mary F; Bartz, Traci M; Manichaikul, Ani; Joshi, Peter K; Peloso, Gina M; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J; Milaneschi, Yuri; Penninx, Brenda W J H; Francioli, Laurent C; Menelaou, Androniki; Pulit, Sara L; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A; Franco, Oscar H; Mateo Leach, Irene; Beekman, Marian; de Craen, Anton J M; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J; Porteous, David J; Sattar, Naveed; Packard, Chris J; Buckley, Brendan M; Brody, Jennifer A; Bis, Joshua C; Rotter, Jerome I; Mychaleckyj, Josyf C; Campbell, Harry; Duan, Qing; Lange, Leslie A; Wilson, James F; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F; Rich, Stephen S; Psaty, Bruce M; Borecki, Ingrid B; Kearney, Patricia M; Stott, David J; Adrienne Cupples, L; Jukema, J Wouter; van der Harst, Pim; Sijbrands, Eric J; Hottenga, Jouke-Jan; Uitterlinden, Andre G; Swertz, Morris A; van Ommen, Gert-Jan B; de Bakker, Paul I W; Eline Slagboom, P; Boomsma, Dorret I; Wijmenga, Cisca; van Duijn, Cornelia M

2015-01-01

312

Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels  

PubMed Central

Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10⁻⁴), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. 
Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

2015-01-01

313

Three novel SNPs in the coding region of PPARγ gene and their associations with meat quality traits in cattle  

Microsoft Academic Search

The peroxisome proliferator-activated receptor γ (PPARγ) is a nuclear hormone receptor that regulates adipogenesis and many other biological processes. In the present study, we carried out PCR–SSCP and DNA sequencing analyses to examine SNPs in coding region of the PPARγ gene. A total of 660 individuals from five Chinese cattle breeds were genotyped. We identified three SNPs and their associations

Yue Yuan Fan; Lin Sen Zan; Chang Zhen Fu; Wan Qiang Tian; Hong Bao Wang; Yan Yan Liu; Ya Ping Xin

2011-01-01

314

A real-time PCR genotyping assay to detect FAD2A SNPs in peanuts (Arachis hypogaea L.)  

Technology Transfer Automated Retrieval System (TEKTRAN)

The high oleic (C18:1) phenotype in peanuts has been previously demonstrated to result from a homozygous recessive genotype (ol1ol1ol2ol2) in two homeologous fatty acid desaturase genes (FAD2A and FAD2B) with two key SNPs. These mutant SNPs, specifically G448A in FAD2A and 442insA in FAD2B, signifi...

315

Novel SNPs of the mannan-binding lectin 2 gene and their association with production traits in Chinese Holsteins.  

PubMed

The mannan-binding lectin gene (MBL) participates as an opsonin in the innate immune system of mammals, and single nucleotide polymorphisms (SNPs) in MBL cause various immune dysfunctions. In this study, we detected SNPs in MBL2 at exon 1 using polymerase chain reaction single-strand conformation polymorphism analysis and DNA sequencing techniques in 825 Chinese Holstein cows. Four new SNPs with various allele frequencies were also found. The g.1164 G>A SNP was predicted to substitute arginine with glutamine at the N-terminus of the cysteine-rich domain. In the collagen-like domain, SNPs g.1197 C>A and g.1198 G>A changed proline to glutamine, whereas SNP g.1207 T>C was identified as a synonymous mutation. Correlation analysis showed that the g.1197 C>A marker was significantly correlated to somatic cell score (SCS), and the g.1164 G>A locus had significant effects on SCS, fat content, and protein content (P < 0.05), suggesting possible roles of these SNPs in the host response against mastitis. Nine haplotypes and nine haplotype pairs corresponding to the loci of the 4 novel SNPs were found in Chinese Holsteins. Haplotype pairs MM, MN, and BQ were correlated with the lowest SCS; MN with the highest protein yield; MM with the highest protein rate, and MN with the highest 305-day milk yield. Thus, MM, MN, and BQ are possible candidates for marker-assisted selection in dairy cattle breeding programs. PMID:23096694

Zhao, Z L; Wang, C F; Li, Q L; Ju, Z H; Huang, J M; Li, J B; Zhong, J F; Zhang, J B

2012-01-01

316

Computational identification of pathogenic associated nsSNPs and its structural impact in UROD gene: a molecular dynamics approach.  

PubMed

Uroporphyrinogen decarboxylase is a cytosolic enzyme involved in the biosynthetic pathway of heme production. Decreased activity of this enzyme results in porphyria cutanea tarda and hepatoerythropoietic porphyria. Nonsynonymous single nucleotide polymorphisms (nsSNPs) alter protein sequence and can cause disease. Identifying the deleterious nsSNPs that contribute to disease is an important task. We used five different in silico tools, namely SIFT, PANTHER, PolyPhen2, SNPs&GO, and I-mutant3, to identify deleterious nsSNPs in the UROD gene. Further, we used a molecular dynamics (MD) approach to evaluate the impact of deleterious mutations on UROD protein structure. By comparing the results of all five prediction tools, we screened 35 (51.47 %) nsSNPs as highly deleterious. MD analysis results show that all three deleterious variants (L161Q, L282R, and I334T) affected the structural stability and flexibility of the UROD protein. Our findings provide strong evidence on the effect of deleterious nsSNPs in the UROD gene. A detailed MD study provides new insight into the conformational changes that occurred in the mutant structures of the UROD protein. PMID:24777812

Doss, C George Priya; Magesh, R

2014-11-01

317

SNP Mining in Crassostrea gigas EST Data: Transferability to Four Other Crassostrea Species, Phylogenetic Inferences and Outlier SNPs under Selection  

PubMed Central

Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogeneticists. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies. PMID:25238392

Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

2014-01-01
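
The transferability percentages quoted in the record above follow directly from the amplification counts out of 377 primer pairs. A short sketch of that arithmetic is given below; the variable names are illustrative, and the share of SNPs transferable to all four species is derived here from the quoted counts rather than stated in the abstract.

```python
# Recompute the cross-species amplification rates quoted in the abstract.
# Counts are taken from the abstract; names are illustrative only.

TESTED = 377  # primer pairs developed for C. gigas

amplified = {
    "C. sikamea": 311,
    "C. angulata": 353,
    "C. hongkongensis": 254,
    "C. ariakensis": 253,
}

rates = {species: round(100 * n / TESTED, 1) for species, n in amplified.items()}
for species, pct in rates.items():
    print(f"{species}: {pct}%")

# Share of SNPs transferable to all four species (214 of 377); this figure
# is computed here and is not given in the abstract.
shared_pct = round(100 * 214 / TESTED, 1)
print(f"all four species: {shared_pct}%")
```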

318

Connecting the Dots: Potential of Data Integration to Identify Regulatory SNPs in Late-Onset Alzheimer's Disease GWAS Findings  

PubMed Central

Late-onset Alzheimer's disease (LOAD) is a multifactorial disorder with over twenty loci associated with disease risk. Given the number of genome-wide significant variants that fall outside of coding regions, it is possible that some of these variants alter some function of gene expression rather than tagging coding variants that alter protein structure and/or function. RegulomeDB is a database that annotates regulatory functions of genetic variants. In this study, we utilized RegulomeDB to investigate potential regulatory functions of lead single nucleotide polymorphisms (SNPs) identified in five genome-wide association studies (GWAS) of risk and age-at-onset (AAO) of LOAD, as well as SNPs in LD (r² ≥ 0.80) with the lead GWAS SNPs. Of a total 614 SNPs examined, 394 returned RegulomeDB scores of 1–6. Of those 394 variants, 34 showed strong evidence of regulatory function (RegulomeDB score <3), and only 3 of them were genome-wide significant SNPs (ZCWPW1/rs1476679, CLU/rs1532278 and ABCA7/rs3764650). This study further supports the assumption that some of the non-coding GWAS SNPs are true associations rather than tagged associations and demonstrates the application of RegulomeDB to GWAS data. PMID:24743338

Rosenthal, Samantha L.; Barmada, M. Michael; Wang, Xingbin; Demirci, F. Yesim; Kamboh, M. Ilyas

2014-01-01

319

Functional characterization of SNPs in CHRNA3/B4 intergenic region associated with drug behaviors  

PubMed Central

The cluster of human neuronal nicotinic receptor genes (CHRNA5/A3/B4) (15q25.1) has been associated with a variety of smoking and drug-related behaviors, as well as risk for lung cancer. CHRNA3/B4 intergenic single nucleotide polymorphisms (SNPs) rs1948 and rs8023462 have been associated with early initiation of alcohol and tobacco use, and rs6495309 has been associated with nicotine dependence and risk for lung cancer. An in vitro luciferase expression assay was used to determine whether these SNPs and surrounding sequences contribute to differences in gene expression using cell lines either expressing proteins characteristic of neuronal tissue or derived from lung cancers. Electrophoretic mobility shift assays (EMSAs) were performed to investigate whether nuclear proteins from these cell lines bind SNP alleles differentially. Results from expression assays were dependent on cell culture type and haplotype. EMSAs indicated that rs8023462 and rs6495309 bind nuclear proteins in an allele-specific way. Additionally, GATA transcription factors appeared to bind rs8023462 only when the minor/risk allele was present. Much work has been done to describe the rat Chrnb4/a3 intergenic region, but few studies have examined the human intergenic region effects on expression; therefore, these studies greatly aid human genetic research as it relates to observed nicotine phenotypes, lung cancer risk and potential underlying genetic mechanisms. Data from these experiments support the hypothesis that SNPs associated with human addiction-related phenotypes and lung cancer risk can affect gene expression, and are potential therapeutic targets. Additionally, this is the first evidence that rs8023462 interacts with GATA transcription factors to influence gene expression. PMID:23872218

Flora, Amber V; Zambrano, Cristian A; Gallego, Xavier; Miyamoto, Jill H; Johnson, Krista A; Cowan, Katelyn A; Stitzel, Jerry A; Ehringer, Marissa A

2013-01-01

320

Association study of FOXO3A SNPs and aging phenotypes in Danish oldest-old individuals.  

PubMed

FOXO3A variation has repeatedly been reported to associate with human longevity, yet only a few studies have investigated whether FOXO3A variation also associates with aging-related traits. Here, we investigate the association of 15 FOXO3A tagging single nucleotide polymorphisms (SNPs) in 1088 oldest-old Danes (age 92-93) with 4 phenotypes known to predict their survival: cognitive function, hand grip strength, activity of daily living (ADL), and self-rated health. Based on previous studies in humans and foxo animal models, we also explore self-reported diabetes, cancer, cardiovascular disease, osteoporosis, and bone (femur/spine/hip/wrist) fracture. Gene-based testing revealed significant associations of FOXO3A variation with ADL (P = 0.044) and bone fracture (P = 0.006). The single-SNP statistics behind the gene-based analysis indicated increased ADL (decreased disability) and reduced bone fracture risk for carriers of the minor alleles of 8 and 10 SNPs, respectively. These positive directions of effects are in agreement with the positive effects on longevity previously reported for these SNPs. However, when correcting for the test of 9 phenotypes by Bonferroni correction, bone fracture showed borderline significance (P = 0.054), while ADL did not (P = 0.396). Although the single-SNP associations did not formally replicate in another study population of oldest-old Danes (n = 1279, age 94-100), the estimates were of similar direction of effect as observed in the Discovery sample. A pooled analysis of both study populations displayed similar or smaller P-values for most associations, thereby supporting the initial findings. Nevertheless, confirmation in additional study populations is needed. PMID:25470651

Soerensen, Mette; Nygaard, Marianne; Dato, Serena; Stevnsner, Tinna; Bohr, Vilhelm A; Christensen, Kaare; Christiansen, Lene

2015-02-01

321

A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context  

PubMed Central

Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs, found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease-related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Even though GWAS analysis identifies 14 out of 83 of those drug target genes, newly found functionally important KEGG pathways led to the discovery of 25 out of 83 genes known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2 and RA were verified by either OMIM database or by literature retrieved from the NCBI PubMed module. 
With the whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network-oriented analysis and prior knowledge from functional properties of a SNP. PMID:22046267

Bakir-Gungor, Burcu; Sezerman, Osman Ugur

2011-01-01

322

Association between SNPs in genes involved in folate metabolism and preterm birth risk.  

PubMed

We investigated the association between 12 single nucleotide polymorphisms (SNPs) in 11 genes involved in folate metabolism and preterm birth. A subset of SNPs selected from 11 genes/loci involved in the folic acid metabolism pathway were subjected to SNaPshot analysis in a case-control study. Twelve SNPs (CBS-C699T, DHFR-c594+59del19, GST01-C428T, MTHFD-G1958A, MTHFR-C677T, MTHFR-A1298C, MTR-A2756G, MTRR-A66G, NFE2L2-ins1+C11108T, RFC1-G80A, TCN2-C776G, and TYMS-1494del6) were simultaneously tested in 503 DNA samples, which included 315 preterm births and 188 controls. None of the 12 SNP genotype distributions related to the folic acid metabolism pathway showed a significant difference between preterm and term babies. The frequency of the compound mutation genotype of MTHFD-G1958A, MTR-A2756G and RFC1-G80A in preterm babies was 7.3%, which was significantly higher than the 2.7% in term babies. Seven babies carried the compound mutation genotype of MTHFD-G1958A, MTR-A2756G, and CBS-C699T, but this was not observed in term babies. The frequency of the combined wild-type genotype of MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A in preterm babies was 3.17%, which was significantly lower than the 7.4% in term babies. The 12 SNPs screened in this study were not independent risk factors of preterm birth. Compound mutation genotypes, including MTHFD-G1958A, MTR-A2756G, and RFC1-G80A and MTHFD-G1958A, MTR-A2756G, and CBS-C699T, may increase the risk of preterm birth. The combined wild-type genotype MTHFD-G1958A, MTR-A2756G, MTRR-A66G, MTHFR-A1298C, NFE2L2-ins1+C11108T, and RFC1-G80A may decrease the risk of preterm birth. PMID:25730024

Wang, B J; Liu, M J; Wang, Y; Dai, J R; Tao, J Y; Wang, S N; Zhong, N; Chen, Y

2015-01-01

323

Compressed bloom filters  

Microsoft Academic Search

A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications the space savings outweigh this drawback when the probability of an error is sufficiently low. We introduce compressed Bloom filters, which improve performance when the Bloom filter is passed as a

Michael Mitzenmacher

2001-01-01
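
The plain Bloom filter described in the record above (a bit array plus several hash functions, with possible false positives but no false negatives) can be sketched as follows. This is a minimal illustration of the standard structure only, not the compressed variant the paper introduces. The class name, the bit-array size, and the double-hashing scheme derived from SHA-256 are all assumptions chosen for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal standard Bloom filter (illustrative; not the compressed variant)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme: derive k bit positions from one SHA-256 digest
        # using double hashing, position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=3)
for key in ("alpha", "beta", "gamma"):
    bf.add(key)
print("alpha" in bf)  # True: inserted items are never reported absent
```

A membership test on an item that was never added usually returns False, but it may return True with a small probability that depends on the bit-array size, the number of hash functions, and the number of inserted items.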

324

ELECTRET AIR FILTERS  

Microsoft Academic Search

This review summarizes the research progress made so far on electret air filters used for separation of airborne particles from complex air stream. A set of different categories of these filters are delineated and the methods of manufacturing of these filters are described. The principles and mechanisms of filtration and modeling of pressure drop by these filters are analyzed. The

Rashmi Thakur; Dipayan Das; Apurba Das

2012-01-01

325

HEPA filter dissolution process  

DOEpatents

A process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

Brewer, Ken N. (Arco, ID); Murphy, James A. (Idaho Falls, ID)

1994-01-01

326

Recirculating electric air filter  

DOEpatents

An electric air filter cartridge has a cylindrical inner high voltage electrode, a layer of filter material, and an outer ground electrode formed of a plurality of segments moveably connected together. The outer electrode can be easily opened to remove or insert filter material. Air flows through the two electrodes and the filter material and is exhausted from the center of the inner electrode.

Bergman, W.

1985-01-09

327

HEPA filter dissolution process  

SciTech Connect

This invention is comprised of a process for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal.

Brewer, K.N.; Murphy, J.A.

1992-12-31

328

HEPA filter dissolution process  

DOEpatents

A process is described for dissolution of spent high efficiency particulate air (HEPA) filters and then combining the complexed filter solution with other radioactive wastes prior to calcining the mixed and blended waste feed. The process is an alternate to a prior method of acid leaching the spent filters which is an inefficient method of treating spent HEPA filters for disposal. 4 figures.

Brewer, K.N.; Murphy, J.A.

1994-02-22

329

Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests  

PubMed Central

Background Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy in some data sets of moderate size, RF still struggles in GWAS with selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNP group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. Results This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forests. The top 25 SNPs in the Parkinson data set were identified by the proposed model, including four interesting genes associated with neurological disorders. Conclusion The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods. PMID:25708662

2015-01-01
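
The two-stage sampling idea described in the ts-RF abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the cut-off points are chosen, so the thresholds `cutoff` and `strong_cutoff`, the function names, and the `n_strong` parameter are all hypothetical.

```python
import random


def split_snps(p_values, cutoff=0.05, strong_cutoff=0.001):
    """Partition SNP ids into (highly informative, weakly informative, irrelevant).

    Both thresholds are placeholder assumptions; the abstract only states
    that a p-value based cut-off separates the groups.
    """
    strong, weak, irrelevant = [], [], []
    for snp, p in p_values.items():
        if p < strong_cutoff:
            strong.append(snp)
        elif p < cutoff:
            weak.append(snp)
        else:
            irrelevant.append(snp)
    return strong, weak, irrelevant


def sample_subspace(strong, weak, size, n_strong, rng):
    """Draw a feature subspace from the informative SNPs only.

    n_strong SNPs come from the highly informative group (so the subspace
    always contains some of them); the remainder come from the weakly
    informative group. Irrelevant SNPs are never sampled.
    """
    n_strong = min(n_strong, len(strong), size)
    n_weak = min(size - n_strong, len(weak))
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


# Toy p-values for six made-up SNP ids.
p_values = {"rs1": 0.0001, "rs2": 0.0005, "rs3": 0.01,
            "rs4": 0.03, "rs5": 0.5, "rs6": 0.9}
strong, weak, irrelevant = split_snps(p_values)
subspace = sample_subspace(strong, weak, size=3, n_strong=1, rng=random.Random(0))
```

In a full random forest, a subspace like `subspace` would be redrawn at each node split; that part is omitted here.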

330

Properties of multilayer filters  

NASA Technical Reports Server (NTRS)

New methods were investigated of using optical interference coatings to produce bandpass filters for the spectral region 110 nm to 200 nm. The types of filter are: triple cavity metal dielectric filters; all dielectric reflection filters; and all dielectric Fabry Perot type filters. The latter two types use thorium fluoride and either cryolite films or magnesium fluoride films in the stacks. The optical properties of the thorium fluoride were also measured.

Baumeister, P. W.

1973-01-01

331

Comprehensive analysis of the impact of SNPs and CNVs on human microRNAs and their regulatory genes  

PubMed Central

Human microRNAs (miRNAs) are potent regulators of gene expression and thus involved in a broad range of biological processes. The objective of this study was to update the properties of human miRNAs and to search for SNPs and CNVs with potential effects on them. Based on the latest miRBase 13.0 database, we identified 380 (53.9%) precursor miRNAs (pre-miRNAs) embedded in gene loci that are enriched in biological processes such as “Neuronal activities”, “Cell Cycle” and “Protein phosphorylation” (Bonferroni p < 0.05). Gene lengths of the pre-miRNA host genes are significantly larger than other genes in the genome (p < 2.2E-16). Using data mining public resources, we performed a genome-scale search for the regulatory polymorphisms in the loci of pre-miRNAs and their related genes. Altogether, we found 187 SNPs in the pre-miRNAs, 497 consensus SNPs in the seed-matching untranslated regions of target genes, 385 CNVs harboring pre-miRNA precursors and 9 CNVs covering important miRNA processing genes. We also noticed that minimum free energy changed by pre-miRNA-residing SNPs could be ranked by the order from low to high as the SNPs in the loop domain, the SNPs in the adjacent stem and basal stem domains, and the SNPs in mature miRNA and its complementary sequence domains (p = 0.0065). With a full list of miRNA-related polymorphisms, this study will facilitate future association studies between the genetic polymorphisms in miRNA targets or pre-miRNAs and the disease susceptibility or therapeutic outcome. PMID:19458495

Duan, Shiwei; Mi, Shuangli; Zhang, Wei; Dolan, M. Eileen

2009-01-01

332

snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing  

PubMed Central

Background A typical bacterial pathogen genome mapping project can identify thousands of single nucleotide polymorphisms (SNP). Interpreting SNP data is complex and it is difficult to conceptualise the data contained within the large flat files that are the typical output from most SNP calling algorithms. One solution to this problem is to construct a database that can be queried using simple commands so that SNP interrogation and output is both easy and comprehensible. Results Here we present snp-search, a tool that manages SNP data and allows for manipulation and searching of SNP data. After creation of a SNP database from a VCF file, snp-search can be used to convert the selected SNP data into FASTA sequences, construct phylogenies, look for unique SNPs, and output contextual information about each SNP. The FASTA output from snp-search is particularly useful for the generation of robust phylogenetic trees that are based on SNP differences across the conserved positions in whole genomes. Queries can be designed to answer critical genomic questions such as the association of SNPs with particular phenotypes. Conclusions snp-search is a tool that manages SNP data and outputs useful information which can be used to test important biological hypotheses. PMID:24246037

2013-01-01
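
The VCF-to-FASTA conversion mentioned in the snp-search abstract above can be illustrated with a short sketch. This is not snp-search's own code or interface; the function name `vcf_to_snp_fasta` is hypothetical, and the sketch assumes haploid genotype calls (as is typical for bacterial data), a `GT` field in first position, and a tab-separated multi-sample VCF.

```python
def vcf_to_snp_fasta(vcf_text):
    """Build one concatenated SNP sequence per sample from VCF text.

    Assumptions (not taken from the abstract): genotypes are haploid or
    the first allele is representative, indels are skipped, and missing
    calls are written as 'N'.
    """
    samples, seqs = [], {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF followed by ALT alleles
        if any(len(a) != 1 for a in alleles):
            continue  # keep single-base substitutions only
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]
            idx = gt.replace("|", "/").split("/")[0]
            seqs[sample].append("N" if idx == "." else alleles[int(idx)])
    return "".join(f">{s}\n{''.join(seqs[s])}\n" for s in samples)


# Tiny example: two SNPs and one indel (skipped) across two samples.
example = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "s1", "s2"]),
    "\t".join(["chr", "10", ".", "A", "T", "50", "PASS", ".", "GT", "0", "1"]),
    "\t".join(["chr", "20", ".", "G", "GA", "50", "PASS", ".", "GT", "0", "1"]),
    "\t".join(["chr", "30", ".", "C", "G", "50", "PASS", ".", "GT", "1", "."]),
])
fasta = vcf_to_snp_fasta(example)
```

A sequence alignment built this way contains only the variable positions, which is the kind of input SNP-based phylogeny tools usually expect.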

333

No Observed Association for Mitochondrial SNPs with Preterm Delivery and Related Outcomes  

PubMed Central

Background Preterm delivery (PTD) is the leading cause of neonatal morbidity and mortality. Epidemiologic studies indicate recurrence of PTD is maternally inherited creating a strong possibility that mitochondrial variants contribute to its etiology. This study examines the association between mitochondrial genotypes with PTD and related outcomes. Methods This study combined, through meta-analysis, two case-control, genome-wide association studies (GWAS); one from the Danish National Birth Cohort (DNBC) Study and one from the Norwegian Mother and Child Cohort Study (MoBa) conducted by the Norwegian Institute of Public Health. The outcomes of PTD (≤36 weeks), very PTD (≤32 weeks) and preterm prelabor rupture of membranes (PPROM) were examined. 135 individual SNP associations were tested using the combined genome from mothers and neonates (case vs. control) in each population and then pooled via meta-analysis. Results After meta-analysis there were four SNPs for the outcome of PTD below p≤0.10, and two below p≤0.05. For the additional outcomes of very PTD and PPROM there were three and four SNPs respectively below p≤0.10. Conclusion Given the number of tests no single SNP reached study wide significance (p=0.0006). Our study does not support the hypothesis that mitochondrial genetics contributes to the maternal transmission of PTD and related outcomes. PMID:22902432

Alleman, Brandon W.; Myking, Solveig; Ryckman, Kelli K.; Myhre, Ronny; Feingold, Eleanor; Feenstra, Bjarke; Geller, Frank; Boyd, Heather A.; Shaffer, John R.; Zhang, Qi; Begum, Ferdouse; Crosslin, David; Doheny, Kim; Pugh, Elizabeth; Pay, Aase Serine Devold; Østensen, Ingrid H.G.; Morken, Nils-Halvdan; Magnus, Per; Marazita, Mary L.; Jacobsson, Bo; Melbye, Mads; Murray, Jeffrey C.

2013-01-01

334

Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs  

PubMed Central

Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17–29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn’s disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

2013-01-01

335

New separation-free assay technique for SNPs using two-photon excitation fluorometry.  

PubMed

A new separation-free method for detection of single nucleotide polymorphisms (SNPs) is described. The method is based on the single base extension principle, fluorescently labeled dideoxy nucleotides and two-photon fluorescence excitation technology, known as ArcDia™ TPX technology. In this assay technique, template-directed single base extension is carried out for primers which have been immobilized on polymer microparticles. Depending on the sequence of the template DNA, the primers are extended either with a labeled or with a non-labeled nucleotide. The genotype of the sample is determined on the basis of two-photon excited fluorescence of individual microparticles. The effect of various assay condition parameters on the performance of the assay method is studied. The performance of the new assay method is demonstrated by genotyping the SNPs of human individuals using double-stranded PCR amplicons as samples. The results show that the new SNP assay method provides sensitivity and reliability comparable to the state-of-the-art SNaPshot™ assay method. Applicability of the new method in routine laboratory use is discussed with respect to alternative assay techniques. PMID:15263064

Vaarno, Jonne; Ylikoski, Emmi; Meltola, Niko J; Soini, Juhani T; Hänninen, Pekka; Lahesmaa, Riitta; Soini, Aleksi E

2004-01-01

336

Investigation of MC1R SNPs and Their Relationships with Plumage Colors in Korean Native Chicken  

PubMed Central

The melanocortin 1 receptor (MC1R) gene is related to the plumage color variations in chicken. Initially, the MC1R gene from 30 individuals was sequenced and nine polymorphisms were obtained. Of these, three and six single nucleotide polymorphisms (SNPs) were confirmed as synonymous and nonsynonymous mutations, respectively. Among these, three selected SNPs were genotyped using the restriction fragment length polymorphism (RFLP) method in 150 individuals from five chicken breeds, which identified the plumage color responding alleles. The neighbor-joining phylogenetic tree using MC1R gene sequences indicated three well-differentiated plumage pigmentations (eumelanin, pheomelanin and albino). Also, the genotype analyses indicated that the TT, AA and GG genotypes corresponded to the eumelanin, pheomelanin and albino plumage pigmentations at nucleotide positions 69, 376 and 427, respectively. In contrast, high allele frequencies with T, A and G alleles corresponded to black, red/yellow and white plumage color at nucleotide positions 69, 376 and 427, respectively. Also, amino acid changes at positions Asn23Asn, Val126Ile and Thr143Ala were observed in melanin synthesis with identified possible alleles, respectively. In addition, high haplotype frequencies in TGA, CGG and CAA haplotypes were well discriminated based on the plumage pigmentation in chicken breeds. The results obtained in this study can be used for designing proper breeding and conservation strategies for the Korean native chicken breeds, as well as for developing breed identification markers in chicken. PMID:25049831

Hoque, M. R.; Jin, S.; Heo, K. N.; Kang, B. S.; Jo, C.; Lee, J. H.

2013-01-01

337

A North American Yersinia pestis Draft Genome Sequence: SNPs and Phylogenetic Analysis  

PubMed Central

Background Yersinia pestis, the causative agent of plague, is responsible for some of the greatest epidemic scourges of mankind. It is widespread in the western United States, although it has only been present there for just over 100 years. As a result, there has been very little time for diversity to accumulate in this region. Much of the diversity that has been detected among North American isolates is at loci that mutate too quickly to accurately reconstruct large-scale phylogenetic patterns. Slowly-evolving but stable markers such as SNPs could be useful for this purpose, but are difficult to identify due to the monomorphic nature of North American isolates. Methodology/Principal Findings To identify SNPs that are polymorphic among North American populations of Y. pestis, a gapped genome sequence of Y. pestis strain FV-1 was generated. Sequence comparison of FV-1 with another North American strain, CO92, identified 19 new SNP loci that differ among North American isolates. Conclusions/Significance The 19 SNP loci identified in this study should facilitate additional studies of the genetic population structure of Y. pestis across North America. PMID:17311096

Hao, Jicheng; Mastrian, Stephen D.; Shah, Maulik K.; Vogler, Amy J.; Allender, Christopher J.; Clark, Erin A.; Benitez, Debbie S.; Youngkin, David J.; Girard, Jessica M.; Auerbach, Raymond K.; Beckstrom-Sternberg, Stephen M.; Keim, Paul

2007-01-01

338

Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing  

USGS Publications Warehouse

Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.

Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

2015-01-01

339

[Detecting selection signatures on X chromosome in pig through high density SNPs].  

PubMed

In the process of domestic pig breeding, many important economic traits were subject to strong artificial selection pressure. With the availability of high density single nucleotide polymorphism (SNP) markers in farm animals, selection occurring in those traits could be traced by detecting selection signatures on the genome, and the genes experiencing selection can also be further mined based on selection signatures. Due to the special characteristics of the X chromosome, many approaches of genetic analysis suited to autosomes are not applicable to the X chromosome. Fortunately, detecting selection signatures provides an effective tool to address this situation. In this study, the Cross Population Extended Haplotype Homozygosity Test (XP-EHH) was implemented to identify selection signatures on chromosome X in three pig breeds (Landrace, Songliao, and Yorkshire) using high density SNPs, and the genes located within selection signature regions were revealed through bioinformatic analysis. In total, 29, 13, and 15 selection signature regions, with 3.59, 4.92, and 4.07 SNPs on average in each region, were identified in Landrace, Songliao, and Yorkshire, respectively. Some overlaps of selection signature regions were observed between Songliao and Landrace, and between Landrace and Yorkshire, while no overlaps between Yorkshire and Songliao were found. Bioinformatic analysis revealed that many genes in the selection signature regions were related to reproduction and immune traits, and some of them have not been reported in pigs, which might serve as important candidate genes in future study. PMID:23099781

Ma, Yun-Long; Zhang, Qin; Ding, Xiang-Dong

2012-10-01

340

SNPs Previously Associated with Dupuytren’s Disease Replicated in a North American Cohort  

PubMed Central

Objective Dupuytren’s disease is a progressive fibrosis of the hand that often results in debilitating flexion contractures. Its etiology is not completely understood but likely involves both genetic and environmental factors. A recent study performed in Europe identified DNA variants that associate with Dupuytren’s disease. Given the likelihood for genetic variation among populations, we planned to validate the genetic variants identified by this study in a North American population. Methods In the Marshfield Clinic’s Personalized Medicine Research Project, 296 cases with Dupuytren’s disease were identified and matched 3-to-1 to controls without Dupuytren’s disease. Clinical data were abstracted from the electronic medical record. The top 12 single nucleotide polymorphisms (SNPs) from the European study were selected and tested in a multiplex assay using the MassArray Analyzer 4 (Sequenom, Inc., San Diego, CA). Differences in allele frequency were determined, and variants with a P value of <0.004 were considered significant. Results We replicated 5 of the 12 SNPs previously reported to be associated with Dupuytren’s disease. Conclusion Our findings support a role for the Wnt signaling pathway in the development of Dupuytren’s disease, and suggest that further study of this pathway may result in early diagnosis and non-surgical treatments for Dupuytren’s disease. PMID:24573701

Anderson, Eric R.; Ye, Zhan; Caldwell, Michael D.; Burmester, James K.

2014-01-01
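
The significance cut-off quoted in the abstract above (P < 0.004 for 12 SNPs) is numerically consistent with a Bonferroni correction at an overall alpha of 0.05. The abstract does not name the correction used, so treating it as Bonferroni is an assumption; the short sketch below only shows the arithmetic.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 12 SNPs tested; an overall alpha of 0.05 is assumed, not stated in the abstract.
threshold = bonferroni_threshold(0.05, 12)
```

Here `threshold` is 0.05 / 12, roughly 0.0042, which rounds to the 0.004 cut-off reported.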

341

Male lineage strata of Brazilian population disclosed by the simultaneous analysis of STRs and SNPs.  

PubMed

Brazil has a large territory divided into five geographical regions harboring highly diverse populations that resulted from different degrees and modes of admixture between Native Americans, Europeans and Africans. In this study, a sample of 605 unrelated males was genotyped for 17 Y-STRs and 46 Y-SNPs, aiming at a deep characterization of the male gene pool of Rio de Janeiro and its comparison with other Brazilian populations. High values of Y-STR haplotype diversity (0.9999±0.0001) and Y-SNP haplogroup diversity (0.7589±0.0171) were observed. Population comparisons at both haplotype and haplogroup levels showed significant differences between Brazilian South Eastern and Northern populations that can be explained by differences in the proportion of African and Native American Y chromosomes. Statistically significant differences between admixed urban samples from the five regions of Brazil were not previously detected at haplotype level based on smaller size samples from South East, which emphasizes the importance of sample size to detect population stratification for an accurate interpretation of profile matches in kinship and forensic casework. Although not having an intra-population discrimination power as high as the Y-STRs, the Y-SNPs are more powerful to disclose differences in admixed populations. In this study, the combined analysis of these two types of markers proved to be a good strategy to predict population sub-structure, which should be taken into account when delineating forensic database strategies for Y chromosome haplotypes. PMID:25259770

Oliveira, Andréa M; Domingues, Patricia M; Gomes, Verónica; Amorim, António; Jannuzzi, Juliana; de Carvalho, Elizeu F; Gusmão, Leonor

2014-11-01
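
The haplotype and haplogroup diversity values reported in the abstract above are the kind of quantity usually computed with Nei's unbiased diversity estimator, H = n/(n-1) · (1 − Σ pᵢ²). Whether the authors used exactly this estimator is an assumption; the sketch below, with a hypothetical function name and toy data, only illustrates the calculation.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


# Toy samples: every haplotype unique versus two haplotypes shared equally.
all_unique = haplotype_diversity(["h1", "h2", "h3", "h4"])
two_shared = haplotype_diversity(["h1", "h1", "h2", "h2"])
```

A value close to 1, like the Y-STR figure in the abstract, means almost every sampled chromosome carries a distinct haplotype; fewer, more common categories (as with haplogroups) pull the value down.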

342

Genetic Diversity and Demographic History of Cajanus spp. Illustrated from Genome-Wide SNPs  

PubMed Central

Understanding the genetic structure of Cajanus spp. is essential for achieving genetic improvement by quantitative trait loci (QTL) mapping or association studies and use of selected markers through genomic assisted breeding and genomic selection. After developing a comprehensive set of 1,616 single nucleotide polymorphisms (SNPs) and their conversion into cost effective KASPar assays for pigeonpea (Cajanus cajan), we studied levels of genetic variability both within and between a diverse set of Cajanus lines including 56 breeding lines, 21 landraces and 107 accessions from 18 wild species. These results revealed a high frequency of polymorphic SNPs and a relatively high level of cross-species transferability. Indeed, 75.8% of successful SNP assays revealed polymorphism, and more than 95% of these assays could be successfully transferred to related wild species. To show regional patterns of variation, we used STRUCTURE and Analysis of Molecular Variance (AMOVA) to partition variance among hierarchical sets of landraces and wild species at either the continental scale or within India. STRUCTURE separated most of the domesticated germplasm from wild ecotypes, and separated Australian and Asian wild species as has been found previously. Among Indian regions and states within regions, we found 36% of the variation between regions, and 64% within landraces or wilds within states. The highest level of polymorphism in wild relatives and landraces was found in Madhya Pradesh and Andhra Pradesh provinces of India representing the centre of origin and domestication of pigeonpea respectively. PMID:24533111

Saxena, Rachit K.; von Wettberg, Eric; Upadhyaya, Hari D.; Sanchez, Vanessa; Songok, Serah; Saxena, Kulbhushan; Kimurto, Paul; Varshney, Rajeev K.

2014-01-01

343

Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.  

PubMed

Most psychiatric disorders are moderately to highly heritable. The degree to which genetic variation is unique to individual disorders or shared across disorders is unclear. To examine shared genetic etiology, we use genome-wide genotype data from the Psychiatric Genomics Consortium (PGC) for cases and controls in schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorders (ASD) and attention-deficit/hyperactivity disorder (ADHD). We apply univariate and bivariate methods for the estimation of genetic variation within and covariation between disorders. SNPs explained 17-29% of the variance in liability. The genetic correlation calculated using common SNPs was high between schizophrenia and bipolar disorder (0.68 ± 0.04 s.e.), moderate between schizophrenia and major depressive disorder (0.43 ± 0.06 s.e.), bipolar disorder and major depressive disorder (0.47 ± 0.06 s.e.), and ADHD and major depressive disorder (0.32 ± 0.07 s.e.), low between schizophrenia and ASD (0.16 ± 0.06 s.e.) and non-significant for other pairs of disorders as well as between psychiatric disorders and the negative control of Crohn's disease. This empirical evidence of shared genetic etiology for psychiatric disorders can inform nosology and encourages the investigation of common pathophysiologies for related disorders. PMID:23933821

Lee, S Hong; Ripke, Stephan; Neale, Benjamin M; Faraone, Stephen V; Purcell, Shaun M; Perlis, Roy H; Mowry, Bryan J; Thapar, Anita; Goddard, Michael E; Witte, John S; Absher, Devin; Agartz, Ingrid; Akil, Huda; Amin, Farooq; Andreassen, Ole A; Anjorin, Adebayo; Anney, Richard; Anttila, Verneri; Arking, Dan E; Asherson, Philip; Azevedo, Maria H; Backlund, Lena; Badner, Judith A; Bailey, Anthony J; Banaschewski, Tobias; Barchas, Jack D; Barnes, Michael R; Barrett, Thomas B; Bass, Nicholas; Battaglia, Agatino; Bauer, Michael; Bayés, Mònica; Bellivier, Frank; Bergen, Sarah E; Berrettini, Wade; Betancur, Catalina; Bettecken, Thomas; Biederman, Joseph; Binder, Elisabeth B; Black, Donald W; Blackwood, Douglas H R; Bloss, Cinnamon S; Boehnke, Michael; Boomsma, Dorret I; Breen, Gerome; Breuer, René; Bruggeman, Richard; Cormican, Paul; Buccola, Nancy G; Buitelaar, Jan K; Bunney, William E; Buxbaum, Joseph D; Byerley, William F; Byrne, Enda M; Caesar, Sian; Cahn, Wiepke; Cantor, Rita M; Casas, Miguel; Chakravarti, Aravinda; Chambert, Kimberly; Choudhury, Khalid; Cichon, Sven; Cloninger, C Robert; Collier, David A; Cook, Edwin H; Coon, Hilary; Cormand, Bru; Corvin, Aiden; Coryell, William H; Craig, David W; Craig, Ian W; Crosbie, Jennifer; Cuccaro, Michael L; Curtis, David; Czamara, Darina; Datta, Susmita; Dawson, Geraldine; Day, Richard; De Geus, Eco J; Degenhardt, Franziska; Djurovic, Srdjan; Donohoe, Gary J; Doyle, Alysa E; Duan, Jubao; Dudbridge, Frank; Duketis, Eftichia; Ebstein, Richard P; Edenberg, Howard J; Elia, Josephine; Ennis, Sean; Etain, Bruno; Fanous, Ayman; Farmer, Anne E; Ferrier, I Nicol; Flickinger, Matthew; Fombonne, Eric; Foroud, Tatiana; Frank, Josef; Franke, Barbara; Fraser, Christine; Freedman, Robert; Freimer, Nelson B; Freitag, Christine M; Friedl, Marion; Frisén, Louise; Gallagher, Louise; Gejman, Pablo V; Georgieva, Lyudmila; Gershon, Elliot S; Geschwind, Daniel H; Giegling, Ina; Gill, Michael; Gordon, Scott D; Gordon-Smith, Katherine; Green, Elaine K; Greenwood, Tiffany A; Grice, Dorothy E; Gross, Magdalena; Grozeva, Detelina; Guan, Weihua; Gurling, Hugh; De Haan, Lieuwe; Haines, Jonathan L; Hakonarson, Hakon; Hallmayer, Joachim; Hamilton, Steven P; Hamshere, Marian L; Hansen, Thomas F; Hartmann, Annette M; Hautzinger, Martin; Heath, Andrew C; Henders, Anjali K; Herms, Stefan; Hickie, Ian B; Hipolito, Maria; Hoefels, Susanne; Holmans, Peter A; Holsboer, Florian; Hoogendijk, Witte J; Hottenga, Jouke-Jan; Hultman, Christina M; Hus, Vanessa; Ingason, Andrés; Ising, Marcus; Jamain, Stéphane; Jones, Edward G; Jones, Ian; Jones, Lisa; Tzeng, Jung-Ying; Kähler, Anna K; Kahn, René S; Kandaswamy, Radhika; Keller, Matthew C; Kennedy, James L; Kenny, Elaine; Kent, Lindsey; Kim, Yunjung; Kirov, George K; Klauck, Sabine M; Klei, Lambertus; Knowles, James A; Kohli, Martin A; Koller, Daniel L; Konte, Bettina; Korszun, Ania; Krabbendam, Lydia; Krasucki, Robert; Kuntsi, Jonna; Kwan, Phoenix; Landén, Mikael; Långström, Niklas; Lathrop, Mark; Lawrence, Jacob; Lawson, William B; Leboyer, Marion; Ledbetter, David H; Lee, Phil H; Lencz, Todd; Lesch, Klaus-Peter; Levinson, Douglas F; Lewis, Cathryn M; Li, Jun; Lichtenstein, Paul; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Liu, Chunyu; Lohoff, Falk W; Loo, Sandra K; Lord, Catherine; Lowe, Jennifer K; Lucae, Susanne; MacIntyre, Donald J; Madden, Pamela A F; Maestrini, Elena; Magnusson, Patrik K E; Mahon, Pamela B; Maier, Wolfgang; Malhotra, Anil K; Mane, Shrikant M; Martin, Christa L; Martin, Nicholas G; Mattheisen, Manuel; Matthews, Keith; Mattingsdal, Morten; McCarroll, Steven A; McGhee, Kevin A; McGough, James J; McGrath, Patrick J; McGuffin, Peter; McInnis, Melvin G; McIntosh, Andrew; McKinney, Rebecca; McLean, Alan W; McMahon, Francis J; McMahon, William M; McQuillin, Andrew; Medeiros, Helena; Medland, Sarah E; Meier, Sandra; Melle, Ingrid; Meng, Fan; Meyer, Jobst; Middeldorp, Christel M; Middleton, Lefkos; Milanova, Vihra; Miranda, Ana; Monaco, Anthony P; Montgomery, Grant W; Moran, Jennifer L; Moreno-De-Luca, Daniel; Morken, Gunnar; Morris, Derek W; Morrow, Eric M; Moskvina, Valentina; Muglia, Pierandrea; Mühleisen, Thomas W; Muir, Walter J; Müller-Myhsok, Bertram; Murtha, Michael; Myers, Richard M; Myin-Germeys, Inez; Neale, Michael C; Nelson, Stan F; Nievergelt, Caroline M; Nikolov, Ivan; Nimgaonkar, Vishwajit; Nolen, Willem A; Nöthen, Markus M; Nurnberger, John I; Nwulia, Evaristus A; Nyholt, Dale R; O'Dushlaine, Colm; Oades, Robert D; Olincy, Ann; Oliveira, Guiomar; Olsen, Line; Ophoff, Roel A; Osby, Urban; Owen, Michael J; Palotie, Aarno; Parr, Jeremy R

2013-09-01

344

Identification of novel single nucleotide polymorphisms (SNPs) in deer (Odocoileus spp.) using the BovineSNP50 BeadChip.  

PubMed

Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the F(ST)-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1-30.1 million years before present). PMID:22590559

Haynes, Gwilym D; Latch, Emily K

2012-01-01

345

Mapping the genetic variation of regional brain volumes as explained by all common SNPs from the ADNI study.  

PubMed

Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs. PMID:24015190

Bryant, Christopher; Giovanello, Kelly S; Ibrahim, Joseph G; Chang, Jing; Shen, Dinggang; Peterson, Bradley S; Zhu, Hongtu

2013-01-01

346

Identification of Pyrus single nucleotide polymorphisms (SNPs) and evaluation for genetic mapping in European pear and interspecific Pyrus hybrids.  

PubMed

We have used new generation sequencing (NGS) technologies to identify single nucleotide polymorphism (SNP) markers from three European pear (Pyrus communis L.) cultivars and subsequently developed a subset of 1096 pear SNPs into high throughput markers by combining them with the set of 7692 apple SNPs on the IRSC apple Infinium® II 8K array. We then evaluated this apple and pear Infinium® II 9K SNP array for large-scale genotyping in pear across several species, using both pear and apple SNPs. The segregating populations employed for array validation included a segregating population of European pear ('Old Home'×'Louise Bon Jersey') and four interspecific breeding families derived from Asian (P. pyrifolia Nakai and P. bretschneideri Rehd.) and European pear pedigrees. In total, we mapped 857 polymorphic pear markers to construct the first SNP-based genetic maps for pear, comprising 78% of the total pear SNPs included in the array. In addition, 1031 SNP markers derived from apple (13% of the total apple SNPs included in the array) were polymorphic and were mapped in one or more of the pear populations. These results are the first to demonstrate SNP transferability across the genera Malus and Pyrus. Our construction of high density SNP-based and gene-based genetic maps in pear represents an important step towards the identification of chromosomal regions associated with a range of horticultural characters, such as pest and disease resistance, orchard yield and fruit quality. PMID:24155917
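The proportions quoted in this abstract can be rechecked from the counts it gives. The following is a minimal sketch that only redoes that arithmetic; the variable and function names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above; names are illustrative only.
PEAR_SNPS_ON_ARRAY = 1096
APPLE_SNPS_ON_ARRAY = 7692
PEAR_SNPS_MAPPED = 857
APPLE_SNPS_MAPPED_IN_PEAR = 1031


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Total markers on the combined apple and pear array (reported as "9K").
total_markers = PEAR_SNPS_ON_ARRAY + APPLE_SNPS_ON_ARRAY

# Share of pear SNPs that were mapped (reported as 78%).
pear_mapped_pct = percent(PEAR_SNPS_MAPPED, PEAR_SNPS_ON_ARRAY)

# Share of apple SNPs that were polymorphic and mapped in pear (reported as 13%).
apple_transfer_pct = percent(APPLE_SNPS_MAPPED_IN_PEAR, APPLE_SNPS_ON_ARRAY)

print(total_markers, round(pear_mapped_pct, 1), round(apple_transfer_pct, 1))
```

Both rounded percentages agree with the figures stated in the abstract.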

Montanari, Sara; Saeed, Munazza; Knäbel, Mareike; Kim, YoonKyeong; Troggio, Michela; Malnoy, Mickael; Velasco, Riccardo; Fontana, Paolo; Won, KyungHo; Durel, Charles-Eric; Perchepied, Laure; Schaffer, Robert; Wiedow, Claudia; Bus, Vincent; Brewer, Lester; Gardiner, Susan E; Crowhurst, Ross N; Chagné, David

2013-01-01

347

Genotypes, haplotypes and diplotypes of IGF-II SNPs and their association with growth traits in largemouth bass (Micropterus salmoides).  

PubMed

Insulin-like growth factor II (IGF-II) is involved in the regulation of somatic growth and metabolism in many fishes. IGF-II is an important candidate gene for growth traits in fishes, and its polymorphisms have been associated with growth traits. The aim of this study is to screen single nucleotide polymorphisms (SNPs) of the largemouth bass (Micropterus salmoides) IGF-II gene and to analyze potential association between IGF-II gene polymorphisms and growth traits in largemouth bass. Four SNPs (C127T, T1012G, C1836T and C1861T) were detected and verified by DNA sequencing in the largemouth bass IGF-II gene. These SNPs were found to organize into seven haplotypes, which formed 13 diplotypes (haplotype pairs). Association analysis showed that the four individual SNPs were not significantly associated with growth traits. Significant associations were, however, noted between diplotypes and growth traits (P < 0.05). The fish with H1H3 (CTCC/CGCC) and H1H5 (CTCC/TTTT) had greater body weight than those with H1H1 (CTCC/CTCC), H1H2 (CTCC/TGTT) and H4H4 (TGCT/TGCT). Our data suggest a significant association between genetic variations in the largemouth bass IGF-II gene and growth traits. IGF-II SNPs could be used as potential genetic markers in future breeding programs of largemouth bass. PMID:21894518
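To make the haplotype and diplotype terminology concrete, here is a minimal sketch of how unordered haplotype pairs can be enumerated and labelled. It assumes the H1 to H5 allele strings as they can be read off the diplotypes named in the abstract; H6 and H7 are not spelled out there and are deliberately left out. The function names are hypothetical and are not the authors' method.

```python
from itertools import combinations_with_replacement

# Haplotype strings recoverable from the abstract, in SNP order
# C127T, T1012G, C1836T, C1861T. H6 and H7 are not given in the abstract.
HAPLOTYPES = {"H1": "CTCC", "H2": "TGTT", "H3": "CGCC", "H4": "TGCT", "H5": "TTTT"}


def diplotype_label(hap_a, hap_b):
    """Return an order-independent label such as 'H1H3' for two haplotype names."""
    first, second = sorted((hap_a, hap_b), key=lambda name: int(name[1:]))
    return first + second


def possible_diplotypes(n_haplotypes):
    """List every unordered pair (homozygous pairs included) of n haplotypes."""
    names = [f"H{i}" for i in range(1, n_haplotypes + 1)]
    return [diplotype_label(a, b) for a, b in combinations_with_replacement(names, 2)]


# Seven haplotypes allow n * (n + 1) / 2 possible diplotypes.
all_pairs = possible_diplotypes(7)
print(len(all_pairs))

# Alleles carried by one of the diplotypes named in the abstract.
print(diplotype_label("H3", "H1"), HAPLOTYPES["H1"] + "/" + HAPLOTYPES["H3"])
```

Seven haplotypes allow 28 possible diplotypes, so the 13 reported in the abstract are a subset of the combinations that could occur.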

Li, Xiaohui; Bai, Junjie; Hu, Yinchang; Ye, Xing; Li, Shengjie; Yu, Lingyun

2012-04-01

348

High-throughput identification, database storage and analysis of SNPs in EST sequences.  

PubMed

Single nucleotide polymorphisms (SNPs) are the most frequent form of DNA variation and disease-causing mutations in many genes. Due to their abundance and slow mutation rate within generations, they are thought to be the next generation of genetic markers that can be used in a myriad of important biological, genetic, pharmacological, and medical applications. There are several strategies, both experimental and in-silico, for SNP discovery and mapping. Experimental SNP discovery consists of a number of laborious steps that make this process complex and expensive. In-silico discovery has been proposed as an alternative discovery method that takes advantage of large data sets with potential SNP information that were generated for other purposes and have not yet been used as a source of SNP information. However, in order to successfully apply the in-silico method to large data sets, the following challenges need to be addressed: First, it is necessary to build an integrated SNP pipeline that handles data processing steps smoothly from the beginning (collecting sequence information) to the end (SNPs in the database). Also, SNP detection tool parameters have to be optimized to satisfy the specific goals of the project. Finally, SNP data cannot be fully used until the in-silico method is validated experimentally. In this paper we present the design and implementation of an in-silico SNP detection software pipeline that exploits the existence of large EST (expressed sequence tag) data sets and effectively addresses the above challenges. First, the pipeline allows for smooth data transition between its different components by implementing data interfaces that translate the data formats of the different tools in the different stages. Second, we optimized PolyBayes parameters for SNP detection in maize ESTs. Finally, we implemented a user interface that, along with the database structure created, allows the scientist to perform preliminary analysis of the data and to perform basic statistics on the SNP data prior to experimental validation. The pipeline works with two different types of sequence assemblers (PHRAP (http://www.phrap.org/) and CAT from DoubleTwist (http://www.doubletwist.com/)). It uses a Bayesian engine for SNP detection (PolyBayes) and selects relevant polymorphism information, which is then uploaded into a database. We detected 2439 SNPs and 822 insertion deletions (INDELs) with a PolyBayes probability higher than 0.99 on the public set of 68,000 maize ESTs. The user interface allowed us to analyze the polymorphism information right after discovery in several ways, which gave us insight into the distribution and significance of the newly acquired data. PMID:11791238

Useche, F J; Gao, G; Harafey, M; Rafalski, A

2001-01-01

349

Cordierite silicon nitride filters  

SciTech Connect

The objective of this project was to develop a silicon nitride based crossflow filter. This report summarizes the findings and results of the project. The project was phased with Phase I consisting of filter material development and crossflow filter design. Phase II involved filter manufacturing, filter testing under simulated conditions and reporting the results. In Phase I, Cordierite Silicon Nitride (CSN) was developed and tested for permeability and strength. Target values for each of these parameters were established early in the program. The values were met by the material development effort in Phase I. The crossflow filter design effort proceeded by developing a macroscopic design based on required surface area and estimated stresses. Then the thermal and pressure stresses were estimated using finite element analysis. In Phase II of this program, the filter manufacturing technique was developed, and the manufactured filters were tested. The technique developed involved press-bonding extruded tiles to form a filter, producing a monolithic filter after sintering. Filters manufactured using this technique were tested at Acurex and at the Westinghouse Science and Technology Center. The filters did not delaminate during testing and operated with high collection efficiency and good cleanability. Further development in areas of sintering and filter design is recommended.

Sawyer, J.; Buchan, B. (Acurex Environmental Corp., Mountain View, CA (United States)); Duiven, R.; Berger, M. (Aerotherm Corp., Mountain View, CA (United States)); Cleveland, J.; Ferri, J. (GTE Products Corp., Towanda, PA (United States))

1992-02-01

350

Filter type gas sampler with filter consolidation  

DOEpatents

Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant.

Miley, Harry S. (219 Rockwood Dr., Richland, WA 99352); Thompson, Robert C. (5313 Phoebe La., West Richland, WA 99352); Hubbard, Charles W. (1900 Stevens, Apt. 526, Richland, WA 99352); Perkins, Richard W. (1413 Sunset, Richland, WA 99352)

1997-01-01

351

Filter type gas sampler with filter consolidation  

DOEpatents

Disclosed is an apparatus for automatically consolidating a filter or, more specifically, an apparatus for drawing a volume of gas through a plurality of sections of a filter, whereafter the sections are subsequently combined for the purpose of simultaneously interrogating the sections to detect the presence of a contaminant. 5 figs.

Miley, H.S.; Thompson, R.C.; Hubbard, C.W.; Perkins, R.W.

1997-03-25

352

The identification of trans-associations between prostate cancer GWAS SNPs and RNA expression differences in tumor-adjacent stroma  

PubMed Central

Here we tested the hypothesis that SNPs associated with prostate cancer risk might differentially affect RNA expression in prostate cancer stroma. The most significant 35 SNP loci were selected from Genome Wide Association (GWA) studies of ~40,000 patients. We also selected 4030 transcripts previously associated with prostate cancer diagnosis and prognosis. eQTL analysis was carried out by a modified BAYES method to analyze the associations between the risk variants and expressed transcripts jointly in a single model. We observed 47 significant associations between eight risk variants and the expression patterns of 46 genes. This is the first study to identify associations between multiple SNPs and multiple in trans gene expression differences in cancer stroma. Potentially, a combination of SNPs and associated expression differences in prostate stroma may increase the power of risk assessment for individuals, and for cancer progression. PMID:25638161

Chen, Xin; McClelland, Michael; Jia, Zhenyu; Rahmatpanah, Farah B.; Sawyers, Anne; Trent, Jeffrey; Duggan, David; Mercola, Dan

2015-01-01

353

Susceptibility to coronary artery disease and diabetes is encoded by distinct, tightly linked SNPs in the ANRIL locus on chromosome 9p  

Microsoft Academic Search

erature') and tagging SNPs. We replicated the literature SNPs (P = 8 × 10^-13; OR = 1.29; 95% CI: 1.20-1.38) and showed that the strong consistent association detected by these SNPs is a consequence of a 'yin-yang' haplotype pattern spanning 53 kb. There was no evidence of additional CAD susceptibility alleles over the major risk haplotype. CAD patients without myocardial infarction (MI) showed

Helen M. Broadbent; John F. Peden; Stefan Lorkowski; Anuj Goel; Halit Ongen; Fiona Green; Robert Clarke; Rory Collins; Maria Grazia Franzosi; Gianni Tognoni; Udo Seedorf; Stephan Rust; Per Eriksson; Anders Hamsten; Martin Farrall; Hugh Watkins

2007-01-01

354

Earth Water Filter  

NSDL National Science Digital Library

In this video segment adapted from ZOOM, cast members try to make the most effective water filter. They experiment with filtering dirty, salty water through different combinations of sand, gravel, and a cotton bandana.

2005-12-17

355

Lymphotoxin-alpha and galectin-2 SNPs are not associated with myocardial infarction in two different German populations.  

PubMed

Recent data provided strong evidence for the association of single nucleotide polymorphisms (SNPs) in the lymphotoxin-alpha (LTA) and galectin-2 (LGALS2) genes with myocardial infarction (MI) in a Japanese population. For populations of other genetic background, the relevance of these polymorphisms in the pathogenesis of MI remains controversial. We aimed to define the role of LTA and LGALS2 SNPs in two German MI populations with markedly different ascertainment strategies. Two different MI populations were studied. In the first population, MI patients were ascertained by a strong family history of MI (n = 1214). Controls were unrelated disease-free participants of the study (n = 1080). The second population included patients suffering from sporadic (nonfamilial) MI from the German KORA register (n = 607). The control group consisted of participants of the WHO MONICA survey in Germany (n = 1492). TaqMan assays were used to determine the genotypes of 4 SNPs in the LTA genomic region and 1 SNP in the LGALS2 gene. Single SNPs in both genomic regions as well as haplotypes in the LTA genomic region were tested for association in various models of inheritance. No association with MI could be found for any of the examined SNPs in the LTA genomic region and LGALS2 gene, or for haplotypes spanning the LTA genomic region. In two MI populations of European descent with markedly different ascertainment strategies, we were not able to identify a significant association of SNPs in the LTA genomic region or the LGALS2 gene with MI. These variants are unlikely to play a significant role in populations of European origin. PMID:17497114

Sedlacek, Kamil; Neureuther, Katharina; Mueller, Jakob C; Stark, Klaus; Fischer, Marcus; Baessler, Andrea; Reinhard, Wibke; Broeckel, Ulrich; Lieb, Wolfgang; Erdmann, Jeanette; Schunkert, Heribert; Riegger, Günter; Illig, Thomas; Meitinger, Thomas; Hengstenberg, Christian

2007-09-01

356

Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease, multiple sclerosis and endometriosis.  

PubMed

Common diseases such as endometriosis (ED), Alzheimer's disease (AD) and multiple sclerosis (MS) account for a significant proportion of the health care burden in many countries. Genome-wide association studies (GWASs) for these diseases have identified a number of individual genetic variants contributing to the risk of those diseases. However, the effect size for most variants is small and collectively the known variants explain only a small proportion of the estimated heritability. We used a linear mixed model to fit all single nucleotide polymorphisms (SNPs) simultaneously, and estimated genetic variances on the liability scale using SNPs from GWASs in unrelated individuals for these three diseases. For each of the three diseases, case and control samples were not all genotyped in the same laboratory. We demonstrate that a careful analysis can obtain robust estimates, but also that insufficient quality control (QC) of SNPs can lead to spurious results and that too stringent QC is likely to remove real genetic signals. Our estimates show that common SNPs on commercially available genotyping chips capture significant variation contributing to liability for all three diseases. The estimated proportion of total variation tagged by all SNPs was 0.26 (SE 0.04) for ED, 0.24 (SE 0.03) for AD and 0.30 (SE 0.03) for MS. Further, we partitioned the genetic variance explained into five categories by a minor allele frequency (MAF), by chromosomes and gene annotation. We provide strong evidence that a substantial proportion of variation in liability is explained by common SNPs, and thereby give insights into the genetic architecture of the diseases. PMID:23193196

Lee, S Hong; Harold, Denise; Nyholt, Dale R; Goddard, Michael E; Zondervan, Krina T; Williams, Julie; Montgomery, Grant W; Wray, Naomi R; Visscher, Peter M

2013-02-15

357

SIRT3 SNPs validation in 640 individuals, functional analyses and new insights into SIRT3 stability.  

PubMed

Sirtuins are critical players within multiple cellular pathways such as stress response, apoptosis and energy metabolism. They are associated with metabolic and degenerative diseases, the pathogenesis of cancer and are key elements in the regulation of cellular life span. From within the 7 known human sirtuins, SIRT3 recently stepped out of the shadow of SIRT1 showing strong effects on stress response, apoptosis, cell cycle and energy metabolism, mimicking effects of caloric restriction. We have identified two non-synonymous human SIRT3 SNPs and evaluated their impact on SIRT3 activity and stability. We assessed their influence on cellular energy metabolism in relation to SIRT1 and identified SIRT3 to increase cellular respiration by 80% when compared to SIRT1, which increased cellular respiration by only 30%. PMID:20198340

Dransfeld, Christian-Lars; Alborzinia, Hamed; Wölfl, Stefan; Mahlknecht, Ulrich

2010-04-01

358

Survey of digital filtering  

NASA Technical Reports Server (NTRS)

A three part survey is made of the state-of-the-art in digital filtering. Part one presents background material including sampled data transformations and the discrete Fourier transform. Part two, digital filter theory, gives an in-depth coverage of filter categories, transfer function synthesis, quantization and other nonlinear errors, filter structures and computer aided design. Part three presents hardware mechanization techniques. Implementations by general purpose, mini-, and special-purpose computers are presented.

Nagle, H. T., Jr.

1972-01-01

359

A genome-wide study of common SNPs and CNVs in cognitive performance in the CANTAB.  

PubMed

Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in approximately 750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics. PMID:19734545

Need, Anna C; Attix, Deborah K; McEvoy, Jill M; Cirulli, Elizabeth T; Linney, Kristen L; Hunt, Priscilla; Ge, Dongliang; Heinzen, Erin L; Maia, Jessica M; Shianna, Kevin V; Weale, Michael E; Cherkas, Lynn F; Clement, Gail; Spector, Tim D; Gibson, Greg; Goldstein, David B

2009-12-01

360

Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties.  

PubMed

Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs) account for the majority of human inherited diseases. It is important to distinguish the deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe the better rationale for deleterious SAP prediction should be: If a SAP lies in the protein with important functions and it can change the protein sequence and structure severely, it is more likely related to disease. So we established a method to predict deleterious SAPs based on both protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set and the prediction model was Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested in an independent dataset and the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be most important for accurate prediction and can significantly improve the prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association. This research will facilitate the post genome-wide association studies. PMID:20689580
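The evaluation scheme named in this abstract, jackknife (leave-one-out) cross-validation of a nearest-neighbor classifier, can be illustrated with a short sketch. It is not the authors' implementation: the Euclidean distance, the two-dimensional toy feature vectors and the function names are assumptions made only for illustration, whereas the study used 263 selected features per SAP.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training sample closest to query.

    train is a list of (features, label) pairs. Euclidean distance is an
    assumption of this sketch, not necessarily the metric used in the paper.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from the rest, and score."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_predict(rest, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: two well-separated clusters of feature vectors.
toy_samples = [
    ((0.0, 0.1), "neutral"),
    ((0.1, 0.0), "neutral"),
    ((0.2, 0.2), "neutral"),
    ((1.0, 0.9), "deleterious"),
    ((0.9, 1.0), "deleterious"),
    ((0.8, 0.8), "deleterious"),
]

accuracy = jackknife_accuracy(toy_samples)
print(accuracy)
```

Because every sample is predicted from a model that never saw it, the resulting accuracy is an out-of-sample estimate, which is what makes the reported 83.27% comparable with the independent-dataset figure of 80.00%.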

Huang, Tao; Wang, Ping; Ye, Zhi-Qiang; Xu, Heng; He, Zhisong; Feng, Kai-Yan; Hu, Lele; Cui, Weiren; Wang, Kai; Dong, Xiao; Xie, Lu; Kong, Xiangyin; Cai, Yu-Dong; Li, Yixue

2010-01-01

361

Prediction of Deleterious Non-Synonymous SNPs Based on Protein Interaction Network and Hybrid Properties  

PubMed Central

Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs) account for the majority of human inherited diseases. It is important to distinguish the deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe the better rationale for deleterious SAP prediction should be: If a SAP lies in the protein with important functions and it can change the protein sequence and structure severely, it is more likely related to disease. So we established a method to predict deleterious SAPs based on both protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set and the prediction model was Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested in an independent dataset and the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be most important for accurate prediction and can significantly improve the prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association. This research will facilitate the post genome-wide association studies. PMID:20689580

Xu, Heng; He, Zhisong; Feng, Kai-Yan; Hu, LeLe; Cui, WeiRen; Wang, Kai; Dong, Xiao; Xie, Lu; Kong, Xiangyin; Cai, Yu-Dong; Li, Yixue

2010-01-01

362

Association of OCT derived drusen measurements with AMD associated-genotypic SNPs in Amish population  

PubMed Central

Purpose To investigate the association of OCT derived drusen measures in Amish age-related macular degeneration (AMD) patients with known loci for macular degeneration. Methods Members of the Old Order Amish community in Pennsylvania ages 50 and older were assessed for drusen area, volume and regions of retinal pigment epithelium (RPE) atrophy using a Cirrus High-Definition-OCT. Measurements were obtained in the macula region within a central circle (CC) of 3 mm diameter and a surrounding perifoveal ring (PR) of 3 to 5 mm diameter using the Cirrus OCT RPE analysis software. Other demographic information including age, gender and smoking status were collected. Study subjects were further genotyped to determine their risk for the AMD associated SNPs in SYN3, LIPC, ARMS2, C3, CFB, CETP, CFI and CFH genes using TaqMan genotyping assays. The association of genotypes with OCT measures were assessed using linear trend p-values calculated from univariate and multivariate generalized linear models. Results 432 eyes were included in the analysis. Multivariate analysis (adjusted by age, gender and smoking status) confirmed the known significant association between AMD and macular drusen with the number of CFH risk alleles for drusen area (area increased 0.12 mm2 for a risk allele increase, p<0.01), drusen volume (volume increased 0.01 mm3 for a risk allele increase, p≤0.05) and area of RPE atrophy (area increased 0.43 mm2 for a risk allele increase, p=0.003). SYN3 risk allele G is significantly associated with larger area PR (area increased 0.09 mm2 for a risk allele increase, p=0.03) and larger drusen volume in central circle (volume increased 0.01 mm3 for a risk allele increase, p=0.04). Conclusion Among the genotyped SNPs tested, the CFH risk genotype appears to play a major role in determining the drusen phenotype in the Amish AMD population.

Chavali, Venkata Ramana Murthy; Diniz, Bruno; Huang, Jiayan; Ying, Gui-shuang; Sadda, SriniVas R.; Stambolian, Dwight

2015-01-01

363

Genetic Associations between Neuregulin-1 SNPs and Neurocognitive Function in Multigenerational, Multiplex Schizophrenia Families  

PubMed Central

Objectives Recent work shows promising associations between schizophrenia and polymorphisms in Neuregulin-1 (NRG1) and a large literature also finds strong familial relationships between schizophrenia and cognitive deficits. Given the role of NRG1 in glutamate regulation and glutamate’s effect on cognition, we hypothesized that cognitive deficits may be related to variation within NRG1, providing a possible mechanism to increase risk for schizophrenia. Method This study examined the associations between NRG1, cognition, and schizophrenia using a multigenerational multiplex family sample (total N = 419, 40 families), including 58 affected participants (schizophrenia or schizoaffective disorder-depressed type) and their 361 unaffected relatives. Participants were genotyped for 40 NRG1 single nucleotide polymorphisms (SNPs), chosen largely based on previous associations with schizophrenia. All participants completed structured diagnostic interviews and a computerized neurocognitive battery assessing eight cognitive domains. Variance component quantitative trait analyses tested for associations between individual NRG1 SNPs and cognitive performance in the total sample, a subsample of healthy participants with no DSM diagnosis, and using general intelligence as a covariate. Results Effect sizes (within-family beta coefficients) ranged from 0.08 to 0.73, and 61 of these associations were nominally significant (p≤.05), with 12 associations at p≤.01, although none achieved the modified Bonferroni significance threshold of p<.0003. Attention was the most frequently nominally associated domain and rs10503929, a non-synonymous SNP, was the most frequently nominally associated SNP. Conclusions Although not significant experiment-wise, these findings suggest that further study of the associations between variation in NRG1 and cognition may be productive. PMID:22183611

Yokley, Jessica L.; Prasad, Konasale M.; Chowdari, Kodavali V.; Talkowski, Michael E.; Wood, Joel; Gur, Ruben C.; Gur, Raquel E.; Almasy, Laura; Nimgaonkar, Vishwajit L.; Pogue-Geile, Michael F.

2011-01-01

364

The evolutionary history of Afrocanarian blue tits inferred from genomewide SNPs.  

PubMed

A common challenge in phylogenetic reconstruction is to find enough suitable genomic markers to reliably trace splitting events with short internodes. Here, we present phylogenetic analyses based on genomewide single-nucleotide polymorphisms (SNPs) of an enigmatic avian radiation, the subspecies complex of Afrocanarian blue tits (Cyanistes teneriffae). The two sister species, the Eurasian blue tit (Cyanistes caeruleus) and the azure tit (Cyanistes cyanus), constituted the out-group. We generated a large data set of SNPs for analysis of population structure and phylogeny. We also adapted our protocol to utilize degraded DNA from old museum skins from Libya. We found strong population structuring that largely confirmed subspecies monophyly and constructed a coalescent-based phylogeny with full support at all major nodes. The results are consistent with a recent hypothesis that La Palma and Libya are relic populations of an ancient Afrocanarian blue tit, although a small data set for Libya could not resolve its position relative to La Palma. The birds on the eastern islands of Fuerteventura and Lanzarote are similar to those in Morocco. Together they constitute the sister group to the clade containing the other Canary Islands (except La Palma), in which El Hierro is sister to the three central islands. Hence, extant Canary Islands populations seem to originate from multiple independent colonization events. We also found population divergences in a key reproductive trait, viz. sperm length, which may constitute reproductive barriers between certain populations. We recommend a taxonomic revision of this polytypic species, where several subspecies should qualify for species rank. PMID:25407440

Gohli, Jostein; Leder, Erica H; Garcia-Del-Rey, Eduardo; Johannessen, Lars Erik; Johnsen, Arild; Laskemoen, Terje; Popp, Magnus; Lifjeld, Jan T

2015-01-01

365

Disrupted-in-Schizophrenia-1 SNPs and Susceptibility to Schizophrenia: Evidence from Malaysia  

PubMed Central

Objective Even though the role of the DISC1 gene as a risk factor for schizophrenia is still unclear, there is substantial evidence from functional and cell biology studies that supports the connection of the gene with schizophrenia. The studies associating the DISC1 gene with schizophrenia in Asian populations are limited to East-Asian populations. Our study examined several DISC1 markers of schizophrenia that were identified in the Caucasian and East-Asian populations in Malaysia and assessed the role of rs2509382, which is located at 11q14.3, the mutual translocation region of the famous DISC1 translocation [t (1; 11) (p42.1; q14.3)]. Methods We genotyped eleven single-nucleotide polymorphisms (SNPs) within or related to DISC1 (rs821597, rs821616, rs4658971, rs1538979, rs843979, rs2812385, rs1407599, rs4658890, and rs2509382) using the PCR-RFLP methods. Results In all, there were 575 participants (225 schizophrenic patients and 350 healthy controls) of either Malay or Chinese ethnicity. The case-control analyses found two SNPs that were associated with schizophrenia [rs4658971 (p=0.030; OR=1.43 (1.35-1.99)) and rs1538979 (p=0.036; OR=1.35 (1.02-1.80))], and rs2509382 was associated with susceptibility among male schizophrenic patients [p=0.0082; OR=2.16 (1.22-3.81)]. This is similar to the meta-analysis findings for the Caucasian populations. Conclusion The study supports the notion that the DISC1 gene is a marker of schizophrenia susceptibility and that rs2509382 in the mutual DISC1 translocation region is a susceptibility marker for schizophrenia among males in Malaysia. However, the finding of the study is limited due to possible genetic stratification and the small sample size. PMID:25670952

Kartini, Abdullah; Norsidah, Kuzaifah; Ramli, Musa; Tariq, Abdul Razak; Wan Rohani, Wan Taib

2015-01-01
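The odds ratios with confidence intervals quoted in the record above are the usual summary of a case-control comparison. As a minimal sketch, assuming a plain 2x2 allele-count table and the standard log-odds (Woolf) approximation for the 95% interval, the calculation could look like the following. The function name and the counts in the usage line are hypothetical and are not taken from the study, which does not state how its intervals were computed.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: cases carrying the risk allele      b: cases without it
    c: controls carrying the risk allele   d: controls without it
    All four counts must be positive (hypothetical inputs, not study data).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
```

Because the interval is built symmetrically on the log scale, the point estimate is the geometric mean of the two bounds, which gives a quick sanity check on any reported OR and interval.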

366

A genome-wide study of common SNPs and CNVs in cognitive performance in the CANTAB  

PubMed Central

Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in ~750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics. PMID:19734545

Need, Anna C.; Attix, Deborah K.; McEvoy, Jill M.; Cirulli, Elizabeth T.; Linney, Kristen L.; Hunt, Priscilla; Ge, Dongliang; Heinzen, Erin L.; Maia, Jessica M.; Shianna, Kevin V.; Weale, Michael E.; Cherkas, Lynn F.; Clement, Gail; Spector, Tim D.; Gibson, Greg; Goldstein, David B.

2009-01-01

367

Novel Backup Filter Device for Candle Filters  

SciTech Connect

The currently preferred means of particulate removal from process or combustion gas generated by advanced coal-based power production processes is filtration with candle filters. However, candle filters have not shown the requisite reliability to be commercially viable for hot gas clean up for either integrated gasifier combined cycle (IGCC) or pressurized fluid bed combustion (PFBC) processes. Even a single candle failure can lead to unacceptable ash breakthrough, which can result in (a) damage to highly sensitive and expensive downstream equipment, (b) unacceptably low system on-stream factor, and (c) unplanned outages. The U.S. Department of Energy (DOE) has recognized the need to have fail-safe devices installed within or downstream from candle filters. In addition to CeraMem, DOE has contracted with Siemens-Westinghouse, the Energy & Environmental Research Center (EERC) at the University of North Dakota, and the Southern Research Institute (SRI) to develop novel fail-safe devices. Siemens-Westinghouse is evaluating honeycomb-based filter devices on the clean-side of the candle filter that can operate up to 870 C. The EERC is developing a highly porous ceramic disk with a sticky yet temperature-stable coating that will trap dust in the event of filter failure. SRI is developing the Full-Flow Mechanical Safeguard Device that provides a positive seal for the candle filter. Operation of the SRI device is triggered by the higher-than-normal gas flow from a broken candle. The CeraMem approach is similar to that of Siemens-Westinghouse and involves the development of honeycomb-based filters that operate on the clean-side of a candle filter. The overall objective of this project is to fabricate and test silicon carbide-based honeycomb failsafe filters for protection of downstream equipment in advanced coal conversion processes. 
The fail-safe filter, installed directly downstream of a candle filter, should have the capability for stopping essentially all particulate bypassing a broken or leaking candle while having a low enough pressure drop to allow the candle to be backpulse-regenerated. Forward-flow pressure drop should increase by no more than 20% because of incorporation of the fail-safe filter.

Bishop, B.; Goldsmith, R.; Dunham, G.; Henderson, A.

2002-09-18

368

Filter service system  

DOEpatents

According to an exemplary embodiment of the present disclosure, a system for removing matter from a filtering device includes a gas pressurization assembly. An element of the assembly is removably attachable to a first orifice of the filtering device. The system also includes a vacuum source fluidly connected to a second orifice of the filtering device.

Sellers, Cheryl L. (Peoria, IL); Nordyke, Daniel S. (Arlington Heights, IL); Crandell, Richard A. (Morton, IL); Tomlins, Gregory (Peoria, IL); Fei, Dong (Peoria, IL); Panov, Alexander (Dunlap, IL); Lane, William H. (Chillicothe, IL); Habeger, Craig F. (Chillicothe, IL)

2008-12-09

369

Practical Active Capacitor Filter  

NASA Technical Reports Server (NTRS)

A method and apparatus is described that filters an electrical signal. The filtering uses a capacitor multiplier circuit where the capacitor multiplier circuit uses at least one amplifier circuit and at least one capacitor. A filtered electrical signal results from a direct connection from an output of the at least one amplifier circuit.

Shuler, Robert L., Jr. (Inventor)

2005-01-01

370

Guided image filtering.  

PubMed

In this paper, we propose a novel explicit image filter called guided filter. Derived from a local linear model, the guided filter computes the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter [1], but it has better behaviors near edges. The guided filter is also a more generic concept beyond smoothing: It can transfer the structures of the guidance image to the filtering output, enabling new filtering applications like dehazing and guided feathering. Moreover, the guided filter naturally has a fast and nonapproximate linear time algorithm, regardless of the kernel size and the intensity range. Currently, it is one of the fastest edge-preserving filters. Experiments show that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications, including edge-aware smoothing, detail enhancement, HDR compression, image matting/feathering, dehazing, joint upsampling, etc. PMID:23599054

He, Kaiming; Sun, Jian; Tang, Xiaoou

2013-06-01
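The local linear model behind the guided filter can be illustrated on a one-dimensional signal. The sketch below is a simplified pure-Python version, assuming a box window of radius `r` and a regularization constant `eps`. The function names and the 1-D restriction are illustrative choices, not the authors' code. Window means are computed from prefix sums, so the cost per sample does not depend on the window size.

```python
def box_mean(x, r):
    """Mean of x over the window [i - r, i + r], clipped at the borders."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    n = len(x)
    out = []
    for i in range(n):
        lo = max(0, i - r)
        hi = min(n, i + r + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def guided_filter_1d(guide, src, r, eps):
    """Filter src using guide: in each window, output ~ a * guide + b."""
    mean_g = box_mean(guide, r)
    mean_s = box_mean(src, r)
    corr_gg = box_mean([g * g for g in guide], r)
    corr_gs = box_mean([g * s for g, s in zip(guide, src)], r)
    a, b = [], []
    for mg, ms, cgg, cgs in zip(mean_g, mean_s, corr_gg, corr_gs):
        var_g = cgg - mg * mg    # variance of the guide in the window
        cov_gs = cgs - mg * ms   # covariance of guide and source
        ak = cov_gs / (var_g + eps)
        a.append(ak)
        b.append(ms - ak * mg)
    # Average the coefficients of all windows that cover each sample.
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]


# Self-guided filtering of a step edge (illustrative data).
step = [0.0] * 5 + [1.0] * 5
smoothed = guided_filter_1d(step, step, r=1, eps=1e-6)
```

With a small `eps`, windows that straddle the step get a coefficient `a` close to 1 and so pass the edge through, while flat windows get `a = 0` and return the local mean. This is the edge-preserving behavior the abstract describes.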

371

Nonlinear Attitude Filtering Methods  

NASA Technical Reports Server (NTRS)

This paper provides a survey of modern nonlinear filtering methods for attitude estimation. Early applications relied mostly on the extended Kalman filter for attitude estimation. Since these applications, several new approaches have been developed that have proven to be superior to the extended Kalman filter. Several of these approaches maintain the basic structure of the extended Kalman filter, but employ various modifications in order to provide better convergence or improve other performance characteristics. Examples of such approaches include: filter QUEST, extended QUEST, the super-iterated extended Kalman filter, the interlaced extended Kalman filter, and the second-order Kalman filter. Filters that propagate and update a discrete set of sigma points rather than using linearized equations for the mean and covariance are also reviewed. A two-step approach is discussed with a first-step state that linearizes the measurement model and an iterative second step to recover the desired attitude states. These approaches are all based on the Gaussian assumption that the probability density function is adequately specified by its mean and covariance. Other approaches that do not require this assumption are reviewed, including particle filters and a Bayesian filter based on a non-Gaussian, finite-parameter probability density function on SO(3). Finally, the predictive filter, nonlinear observers and adaptive approaches are shown. The strengths and weaknesses of the various approaches are discussed.

Markley, F. Landis; Crassidis, John L.; Cheng, Yang

2005-01-01

372

HEPA filter encapsulation  

DOEpatents

A low viscosity resin is delivered into a spent HEPA filter or other waste. The resin is introduced into the filter or other waste using a vacuum to assist in the mass transfer of the resin through the filter media or other waste.

Gates-Anderson, Dianne D. (Union City, CA); Kidd, Scott D. (Brentwood, CA); Bowers, John S. (Manteca, CA); Attebery, Ronald W. (San Lorenzo, CA)

2003-01-01

373

Regenerative particulate filter development  

NASA Technical Reports Server (NTRS)

Development, design, and fabrication of a prototype filter regeneration unit for regenerating clean fluid particle filter elements by using a backflush/jet impingement technique are reported. Development tests were also conducted on a vortex particle separator designed for use in zero gravity environment. A maintainable filter was designed, fabricated and tested that allows filter element replacement without any leakage or spillage of system fluid. Also described are spacecraft fluid system design and filter maintenance techniques with respect to inflight maintenance for the space shuttle and space station.

Descamp, V. A.; Boex, M. W.; Hussey, M. W.; Larson, T. P.

1972-01-01

374

Compact planar microwave blocking filters  

NASA Technical Reports Server (NTRS)

A compact planar microwave blocking filter includes a dielectric substrate and a plurality of filter unit elements disposed on the substrate. The filter unit elements are interconnected in a symmetrical series cascade with filter unit elements being organized in the series based on physical size. In the filter, a first filter unit element of the plurality of filter unit elements includes a low impedance open-ended line configured to reduce the shunt capacitance of the filter.

U-Yen, Kongpop (Inventor); Wollack, Edward J. (Inventor)

2012-01-01

375

Ceramic fiber filter technology  

SciTech Connect

Fibrous filters have been used for centuries to protect individuals from dust, disease, smoke, and other gases or particulates. In the 1970s and 1980s ceramic filters were developed for filtration of hot exhaust gases from diesel engines. Tubular, or candle, filters have been made to remove particles from gases in pressurized fluidized-bed combustion and gasification-combined-cycle power plants. Very efficient filtration is necessary in power plants to protect the turbine blades. The limited lifespan of ceramic candle filters has been a major obstacle in their development. The present work is focused on forming fibrous ceramic filters using a papermaking technique. These filters are highly porous and therefore very lightweight. The papermaking process consists of filtering a slurry of ceramic fibers through a steel screen to form paper. Papermaking and the selection of materials will be discussed, as well as preliminary results describing the geometry of papers and relative strengths.

Holmes, B.L.; Janney, M.A.

1996-06-01

376

Functional nsSNPs from carcinogenesis-related genes expressed in breast tissue: Potential breast cancer risk alleles and their distribution across human populations  

PubMed Central

Although highly penetrant alleles of BRCA1 and BRCA2 have been shown to predispose to breast cancer, the majority of breast cancer cases are assumed to result from the presence of low-moderate penetrant alleles and environmental carcinogens. Non-synonymous single nucleotide polymorphisms (nsSNPs) are hypothesised to contribute to disease susceptibility and approximately 30 per cent of them are predicted to have a biological significance. In this study, we have applied a bioinformatics-based strategy to identify breast cancer-related nsSNPs from 981 carcinogenesis-related genes expressed in breast tissue. Our results revealed a total of 367 validated nsSNPs, 109 (29.7 per cent) of which are predicted to affect the protein function (functional nsSNPs), suggesting that these nsSNPs are likely to influence the development and homeostasis of breast tissue and hence contribute to breast cancer susceptibility. Sixty-seven of the functional nsSNPs presented as commonly occurring nsSNPs (minor allele frequencies ≥ 5 per cent), representing excellent candidates for breast cancer susceptibility. Additionally, a non-uniform distribution of the common functional nsSNPs among different human populations was observed: 15 nsSNPs were reported to be present in all populations analysed, whereas another set of 15 nsSNPs was specific to particular population(s). We propose that the nsSNPs analysed in this study constitute a unique resource of potential genetic factors for breast cancer susceptibility. Furthermore, the variations in functional nsSNP allele frequencies across major population backgrounds may point to the potential variability of the molecular basis of breast cancer predisposition and treatment response among different human populations. PMID:16595073

2006-01-01

377

Simultaneous analysis of hundreds of Y-chromosomal SNPs for high-resolution paternal lineage classification using targeted semiconductor sequencing.  

PubMed

SNPs from the non-recombining part of the human Y chromosome (Y-SNPs) are informative to classify paternal lineages in forensic, genealogical, anthropological, and evolutionary studies. Although thousands of Y-SNPs were identified thus far, previous Y-SNP multiplex tools target only dozens of markers simultaneously, thereby restricting the provided Y-haplogroup resolution and limiting their applications. Here, we overcome this shortcoming by introducing a high-resolution multiplex tool for parallel genotyping-by-sequencing of 530 Y-SNPs using the Ion Torrent PGM platform, which allows classification of 432 worldwide Y haplogroups. Contrary to previous Y-SNP multiplex tools, our approach covers branches of the entire Y tree, thereby maximizing the paternal lineage classification obtainable. We used a default DNA input amount of 10 ng per reaction but preliminary sensitivity testing revealed positive results from as little as 100 pg input DNA. Furthermore, we demonstrate that sample pooling using barcodes is feasible, allowing increased throughput for lower per-sample costs. In addition to the wetlab protocol, we provide a software tool for automated data quality control and haplogroup classification. The unique combination of ultra-high marker density and high sensitivity achievable from low amounts of potentially degraded DNA makes this new multiplex tool suitable for a wide range of Y-chromosome applications. PMID:25338970

Ralf, Arwin; van Oven, Mannis; Zhong, Kaiyin; Kayser, Manfred

2015-01-01

378

Analysis of artificially degraded DNA using STRs and SNPs—results of a collaborative European (EDNAP) exercise  

Microsoft Academic Search

Recently, there has been much debate about what kinds of genetic markers should be implemented as new core loci that constitute national DNA databases. The choices lie between conventional STRs, ranging in size from 100 to 450bp; mini-STRs, with amplicon sizes less than 200bp; and single nucleotide polymorphisms (SNPs). There is general agreement by the European DNA Profiling Group (EDNAP)

L. A. Dixon; A. E. Dobbins; H. K. Pulker; J. M. Butler; P. M. Vallone; M. D. Coble; W. Parson; B. Berger; P. Grubwieser; H. S. Mogensen; N. Morling; K. Nielsen; J. J. Sanchez; E. Petkovski; A. Carracedo; P. Sanchez-Diz; E. Ramos-Luis; M. Brión; J. A. Irwin; R. S. Just; O. Loreille; T. J. Parsons; D. Syndercombe-Court; H. Schmitter; B. Stradmann-Bellinghausen; K. Bender; P. Gill

2006-01-01

379

IN SILICO DISCOVERY, MAPPING, AND GENOTYPING OF 1,039 CATTLE SNPS ON A PANEL OF EIGHTEEN BREEDS  

Technology Transfer Automated Retrieval System (TEKTRAN)

To contribute to cattle haplotype map construction we discovered ~3,000 putative single nucleotide polymorphisms (SNPs) by comparison of repeat-masked BAC-end sequences (BESs) from the cattle RPCI-42 BAC library with the cattle whole-genome shotgun (WGS) contigs. For the sequence alignment, the Time...

380

Non-replication of an association of SGIP1 SNPs with alcohol dependence and resting theta EEG power  

PubMed Central

OBJECTIVE A recent study in a sample of Plains Indians showed association between eight SNPs located in the SGIP1 gene and resting theta electroencephalogram (EEG) power (Hodgkinson et al., 2010). This association appeared to generalize to alcohol use disorders, for which EEG power is a potential endophenotype. METHODS We analyzed a large, diverse sample for replication of the association of these implicated SGIP1 SNPs (genotyped on the Illumina 1M platform) with alcohol dependence (N = 3988) and theta EEG power (N = 1066). RESULTS We found no evidence of association of the previously implicated SGIP1 SNPs with either alcohol dependence or theta EEG power (all p > 0.15) in the current sample. CONCLUSIONS The previously implicated SNPs located in SGIP1 showed no association with alcohol dependence or theta EEG power in the present sample of individuals with European and/or African ancestry. This failure to replicate may be the result of differences in ancestry between the current and original samples. PMID:21317682

Derringer, Jaime; Krueger, Robert F.; Manz, Niklas; Porjesz, Bernice; Almasy, Laura; Bookman, Ebony; Edenberg, Howard J.; Kramer, John R.; Tischfield, Jay A.; Bierut, Laura J.

2011-01-01

381

Assignment of chromosomal locations for unassigned SNPs/scaffolds based on pair-wise linkage disequilibrium estimates  

Microsoft Academic Search

Background: Recent developments of high-density SNP chips across a number of species require accurate genetic maps. Despite rapid advances in genome sequence assembly and availability of a number of tools for creating genetic maps, the exact genome location for a number of SNPs from these SNP chips still remains unknown. We have developed a locus ordering procedure based on linkage

Mehar S. Khatkar; Matthew Hobbs; Markus Neuditschko; Johann Sölkner; Frank W. Nicholas; Herman W. Raadsma

2010-01-01
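Pair-wise linkage disequilibrium between two biallelic loci is commonly summarized by the r² statistic. The record above is truncated and does not say which estimator the authors used, so the following is only a minimal sketch of the textbook definition, assuming phased haplotypes coded as 0/1 pairs. The function name and sample data are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) pairs,
    each allele coded 0 or 1. Both loci must be polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at B
    p_ab = sum(a * b for a, b in haplotypes) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Illustrative haplotypes: the two loci always carry the same allele.
linked = ld_r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])
```

An unplaced SNP would then be expected to show its highest r² values with markers that lie physically close to it, which is the intuition behind using LD to assign chromosomal locations.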

382

In Silico Screening, Genotyping, Molecular Dynamics Simulation and Activity Studies of SNPs in Pyruvate Kinase M2  

PubMed Central

The roles of 29 non-synonymous, 15 intronic and 3 near-UTR single nucleotide polymorphisms (SNPs) and 2 mutations of human pyruvate kinase (PK) M2 were investigated by in-silico and in-vitro functional studies. Prediction of deleterious substitutions using sequence-homology-based and structure-based servers (SIFT, PANTHER, SNPs&GO, PhD-SNP, SNAP and PolyPhen) showed that 19% were common to all of these programs. SNPeffect and HOPE showed three substitutions (C31F, Q310P and S437Y) in-silico as deleterious and functionally important. In-vitro activity assays showed that the C31F and S437Y variants of PKM2 had reduced activity, while the Q310P variant was catalytically inactive. The allosteric activation due to binding of fructose 1,6-bisphosphate (FBP) was compromised in the S437Y nsSNP variant protein. This was corroborated through a molecular dynamics (MD) simulation study, which was also carried out on the other two variant proteins. The 5 intronic SNPs of PKM2 associated with sporadic breast cancer in a case-control study, when subjected to different computational analyses, indicated that 3 SNPs (rs2856929, rs8192381 and rs8192431) could generate an alternative transcript by influencing splicing factor binding to PKM2. We propose that these potentially functional and important variations, both within exons and introns, could have a bearing on cancer metabolism, since PKM2 has been implicated in cancer in the recent past. PMID:25768091

Kalaiarasan, Ponnusamy; Kumar, Bhupender; Chopra, Rupali; Gupta, Vibhor; Subbarao, Naidu; Bamezai, Rameshwar N. K.

2015-01-01

383

Application of SNPs in forensic casework: Identification of pathological and autoptical specimens due to sample mix-up  

Microsoft Academic Search

The analysis of single nucleotide polymorphisms (SNPs) together with conventional short tandem repeat (STR) and mitochondrial DNA (mtDNA) typing provide a forensic genetic approach for the identification of pathological and autoptical specimens in cases where the average length of DNA fragments is shorter than 150bp in highly degraded samples. We applied a forensic genetic approach to digesta accidentally left after

Yuzo Takada; Tomoharu Tokutomi; Jun Kanetake; Masahiro Mukaida

2009-01-01

384

A Brief and Preliminary Look at SNPs Data for some Bering-Chukchi-Beaufort Seas Bowhead Whales  

Microsoft Academic Search

We present preliminary analyses of 18 single nucleotide polymorphism markers (SNPs) for 106 bowhead whales from Barrow and St. Lawrence Island. We find no evidence for disequilibrium, population substructure, or genetic variation associated with temporal spacing of whales in the migration. We analyzed data for 18 SNP loci from 106 bowhead whales. The samples were collected from three sites: Barrow

Geof Givens; Marisa Williams; Phillip A. Morin; Brittany Hancock; J. Craig George

385

Association between IL-10a SNPs and resistance to cyprinid herpesvirus-3 infection in common carp (Cyprinus carpio)  

Technology Transfer Automated Retrieval System (TEKTRAN)

Analysis of gene polymorphisms and disease association is essential for assessing putative candidate genes affecting susceptibility or resistance to disease. In this paper, we report the results of an association analysis between SNPs in common carp innate immune response genes and resistance to Cy...

386

Association of SNPs in GHSR rs292216 and rs509035 on dietary intake in Indonesian obese female adolescents  

PubMed Central

Background: Obesity has been linked to high dietary intake and low physical activity. Studies have shown that these factors are regulated not only by environment but also by genetics. However, the relationship is less well understood in obese children and adolescents. Objective: The objective of this study was to examine the role of SNPs in GHSR rs292216 and rs509035 on dietary intake in obese female adolescents. Methods: This is an observational study with a cross-sectional design. Respondents were obese female adolescents enrolled from obesity screening done in six junior high schools in Yogyakarta. Dietary intake was measured using 24-hour dietary recalls on 6 non-consecutive days. Genotyping of 2 SNPs from GHSR was done using RFLP-PCR. Results: A total of 78 obese female adolescents joined this study. We found no significant association between the GHSR SNPs and dietary intake (p < 0.05). In addition, a SNP-SNP interaction analysis showed no difference in dietary intake between combinations of GHSR rs292216 and rs509035 (p < 0.05). Conclusion: We concluded that SNPs on GHSR rs292216 and rs509035 were not related to dietary intake in Indonesian obese female adolescents. Further study is necessary to investigate the effect of those genes on dietary intake in the broader population.

Luglio, Harry Freitag; Inggriyani, Cut Gina; Huriyati, Emy; Julia, Madarina; Susilowati, Rina

2014-01-01

387

Imputation of Exome Sequence Variants into Population-Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project  

PubMed Central

Researchers have successfully applied exome sequencing to discover causal variants in selected individuals with familial, highly penetrant disorders. We demonstrate the utility of exome sequencing followed by imputation for discovering low-frequency variants associated with complex quantitative traits. We performed exome sequencing in a reference panel of 761 African Americans and then imputed newly discovered variants into a larger sample of more than 13,000 African Americans for association testing with the blood cell traits hemoglobin, hematocrit, white blood count, and platelet count. First, we illustrate the feasibility of our approach by demonstrating genome-wide-significant associations for variants that are not covered by conventional genotyping arrays; for example, one such association is that between higher platelet count and an MPL c.117G>T (p.Lys39Asn) variant encoding a p.Lys39Asn amino acid substitution of the thrombopoietin receptor gene (p = 1.5 × 10(-11)). Second, we identified an association between missense variants of LCT and higher white blood count (p = 4 × 10(-13)). Third, we identified low-frequency coding variants that might account for allelic heterogeneity at several known blood cell-associated loci: MPL c.754T>C (p.Tyr252His) was associated with higher platelet count; CD36 c.975T>G (p.Tyr325*) was associated with lower platelet count; and several missense variants at the ?-globin gene locus were associated with lower hemoglobin. By identifying low-frequency missense variants associated with blood cell traits not previously reported by genome-wide association studies, we establish that exome sequencing followed by imputation is a powerful approach to dissecting complex, genetically heterogeneous traits in large population-based studies. PMID:23103231

Auer, Paul L.; Johnsen, Jill M.; Johnson, Andrew D.; Logsdon, Benjamin A.; Lange, Leslie A.; Nalls, Michael A.; Zhang, Guosheng; Franceschini, Nora; Fox, Keolu; Lange, Ethan M.; Rich, Stephen S.; O’Donnell, Christopher J.; Jackson, Rebecca D.; Wallace, Robert B.; Chen, Zhao; Graubert, Timothy A.; Wilson, James G.; Tang, Hua; Lettre, Guillaume; Reiner, Alex P.; Ganesh, Santhi K.; Li, Yun

2012-01-01

388

Combining information from two data sources with misreporting and incompleteness to assess hospice-use among cancer patients: a multiple imputation approach.  

PubMed

Combining information from multiple data sources can enhance estimates of health-related measures by using one source to supply information that is lacking in another, assuming the former has accurate and complete data. However, there is little research conducted on combining methods when each source might be imperfect, for example, subject to measurement errors and/or missing data. In a multisite study of hospice-use by late-stage cancer patients, this variable was available from patients' abstracted medical records, which may be considerably underreported because of incomplete acquisition of these records. Therefore, data for Medicare-eligible patients were supplemented with their Medicare claims that contained information on hospice-use, which may also be subject to underreporting yet to a lesser degree. In addition, both sources suffered from missing data because of unit nonresponse from medical record abstraction and sample undercoverage for Medicare claims. We treat the true hospice-use status from these patients as a latent variable and propose to multiply impute it using information from both data sources, borrowing the strength from each. We characterize the complete-data model as a product of an 'outcome' model for the probability of hospice-use and a 'reporting' model for the probability of underreporting from both sources, adjusting for other covariates. Assuming the reports of hospice-use from both sources are missing at random and the underreporting are conditionally independent, we develop a Bayesian multiple imputation algorithm and conduct multiple imputation analyses of patient hospice-use in demographic and clinical subgroups. The proposed approach yields more sensible results than alternative methods in our example. Our model is also related to dual system estimation in population censuses and dual exposure assessment in epidemiology. PMID:24804628

He, Yulei; Landrum, Mary Beth; Zaslavsky, Alan M

2014-09-20

389

Cacao single-nucleotide polymorphism (SNP) markers: A discovery strategy to identify SNPs for genotyping, genetic mapping and genome wide association studies (GWAS)  

Technology Transfer Automated Retrieval System (TEKTRAN)

Single-nucleotide polymorphisms (SNPs) are the most common genetic markers in Theobroma cacao, occurring approximately once in every 200 nucleotides. SNPs, like microsatellites, are co-dominant and PCR-based, but they have several advantages over microsatellites. They are unambiguous, so that a SN...

390

Single nucleotide polymorphisms (SNPs) in a set of expressed-sequence tag (EST) and conserved ortholog set II (COSII) markers in cultivated tomato (Solanum lycopersicum L.)  

Technology Transfer Automated Retrieval System (TEKTRAN)

Single nucleotide polymorphisms (SNPs) are the fundamental unit of genetic variation and are applied as molecular tools for genetic mapping, breeding, germplasm characterization, taxonomy, and evaluation of distinctness, uniformity and stability (DUS). We report 29 novel SNPs in 10 EST and COSII ma...

391

CDH13 promoter SNPs with pleiotropic effect on cardiometabolic parameters represent methylation QTLs.  

PubMed

CDH13 encodes T-cadherin, a receptor for high molecular weight (HMW) adiponectin and low-density lipoprotein, promoting proliferation and migration of endothelial cells. Genome-wide association studies have mapped multiple variants in CDH13 associated with cardiometabolic traits (CMT) with variable effects across studies. We hypothesized that this heterogeneity might reflect interplay with DNA methylation within the region. Resequencing and EpiTYPER™ assay were applied for the HYPertension in ESTonia/Coronary Artery Disease in Czech (HYPEST/CADCZ; n = 358) samples to identify CDH13 promoter SNPs acting as methylation Quantitative Trait Loci (meQTLs) and to investigate their associations with CMT. In silico data were extracted from genome-wide DNA methylation and genotype datasets of the population-based sample Estonian Genome Center of the University of Tartu (EGCUT; n = 165). HYPEST-CADCZ meta-analysis identified a rare variant rs113460564 as highly significant meQTL for a 134-bp distant CpG site (P = 5.90 × 10(-6); β = 3.19 %). Four common SNPs (rs12443878, rs12444338, rs62040565, rs8060301) exhibited effect on methylation level of up to 3 neighboring CpG sites in both datasets. The strongest association was detected in EGCUT between rs8060301 and cg09415485 (false discovery rate corrected P value = 1.89 × 10(-30)). Simultaneously, rs8060301 showed association with diastolic blood pressure, serum high-density lipoprotein and HMW adiponectin (P < 0.005). Novel strong associations were identified between rare CDH13 promoter meQTLs (minor allele frequency <5 %) and HMW adiponectin: rs2239857 (P = 5.50 × 10(-5), β = -1,841.9 ng/mL) and rs77068073 (P = 2.67 × 10(-4), β = -2,484.4 ng/mL). Our study shows conclusively that CDH13 promoter harbors meQTLs associated with CMTs. It paves the way to deeper understanding of the interplay between DNA variation and methylation in susceptibility to common diseases. PMID:25543204

Putku, Margus; Kals, Mart; Inno, Rain; Kasela, Silva; Org, Elin; Kožich, Viktor; Milani, Lili; Laan, Maris

2015-03-01

392

Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae)  

PubMed Central

Background Cucurbita pepo belongs to the Cucurbitaceae family. The "Zucchini" types rank among the highest-valued vegetables worldwide, and other C. pepo and related Cucurbita spp. are food staples and rich sources of fat and vitamins. A broad range of genomic tools are today available for other cucurbits that have become models for the study of different metabolic processes. However, these tools are still lacking in the Cucurbita genus, thus limiting gene discovery and the process of breeding. Results We report the generation of a total of 512,751 C. pepo EST sequences, using 454 GS FLX Titanium technology. ESTs were obtained from normalized cDNA libraries (root, leaves, and flower tissue) prepared using two varieties with contrasting phenotypes for plant, flowering and fruit traits, representing the two C. pepo subspecies: subsp. pepo cv. Zucchini and subsp. ovifera cv. Scallop. De novo assembly was performed to generate a collection of 49,610 Cucurbita unigenes (average length of 626 bp) that represent the first transcriptome of the species. Over 60% of the unigenes were functionally annotated and assigned to one or more Gene Ontology terms. The distributions of Cucurbita unigenes followed tendencies similar to those reported for Arabidopsis or melon, suggesting that the dataset may represent the whole Cucurbita transcriptome. About 34% of the unigenes were found to have known orthologs in Arabidopsis or melon, including genes potentially involved in disease resistance, flowering and fruit quality. Furthermore, a set of 1,882 unigenes with SSR motifs and 9,043 high confidence SNPs between Zucchini and Scallop were identified, of which 3,538 SNPs met criteria for use with high throughput genotyping platforms, and 144 could be detected as CAPS. A set of markers was validated, 80% of them being polymorphic in a set of variable C. pepo and C. moschata accessions. Conclusion We present the first broad survey of gene sequences and allelic variation in C. pepo, where limited prior genomic information existed. The transcriptome provides an invaluable new tool for biological research. The developed molecular markers are the basis for future genetic linkage and quantitative trait loci analysis, and will be essential to speed up the process of breeding new and better adapted squash varieties. PMID:21310031

2011-01-01

393

Generic Kalman Filter Software  

NASA Technical Reports Server (NTRS)

The Generic Kalman Filter (GKF) software provides a standard basis for the development of application-specific Kalman-filter programs. Historically, Kalman filters have been implemented by customized programs that must be written, coded, and debugged anew for each unique application, then tested and tuned with simulated or actual measurement data. Total development times for typical Kalman-filter application programs have ranged from weeks to months. The GKF software can simplify the development process and reduce the development time by eliminating the need to re-create the fundamental implementation of the Kalman filter for each new application. The GKF software is written in the ANSI C programming language. It contains a generic Kalman-filter-development directory that, in turn, contains code for a generic Kalman filter function; more specifically, it contains a generically designed and generically coded implementation of linear, linearized, and extended Kalman filtering algorithms, including algorithms for state- and covariance-update and -propagation functions. The mathematical theory that underlies the algorithms is well known and has been reported extensively in the open technical literature. Also contained in the directory are a header file that defines generic Kalman-filter data structures and prototype functions and template versions of application-specific subfunction and calling navigation/estimation routine code and headers. Once the user has provided a calling routine and the required application-specific subfunctions, the application-specific Kalman-filter software can be compiled and executed immediately. During execution, the generic Kalman-filter function is called from a higher-level navigation or estimation routine that preprocesses measurement data and post-processes output data. The generic Kalman-filter function uses the aforementioned data structures and five implementation-specific subfunctions, which have been developed by the user on the basis of the aforementioned templates. The GKF software can be used to develop many different types of unfactorized Kalman filters. A developer can choose to implement either a linearized or an extended Kalman filter algorithm, without having to modify the GKF software. Control dynamics can be taken into account or neglected in the filter-dynamics model. Filter programs developed by use of the GKF software can be made to propagate equations of motion for linear or nonlinear dynamical systems that are deterministic or stochastic. In addition, filter programs can be made to operate in user-selectable "covariance analysis" and "propagation-only" modes that are useful in design and development stages.
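
The state and covariance propagation and update steps named in the abstract can be illustrated with a scalar linear Kalman filter. The sketch below is not the GKF code (which is written in ANSI C and is not reproduced in this record); the class name ScalarKalman, its field names, and the sample values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional linear Kalman filter (illustrative, not the GKF API)."""
    x: float          # state estimate
    p: float          # variance of the state estimate
    f: float = 1.0    # state-transition coefficient
    q: float = 0.0    # process-noise variance
    h: float = 1.0    # measurement-model coefficient
    r: float = 1.0    # measurement-noise variance

    def propagate(self) -> None:
        """Propagate the state and its variance one step forward in time."""
        self.x = self.f * self.x
        self.p = self.f * self.p * self.f + self.q

    def update(self, z: float) -> None:
        """Update the state and its variance with a measurement z."""
        s = self.h * self.p * self.h + self.r   # innovation variance
        k = self.p * self.h / s                 # Kalman gain
        self.x = self.x + k * (z - self.h * self.x)
        self.p = (1.0 - k * self.h) * self.p


if __name__ == "__main__":
    kf = ScalarKalman(x=0.0, p=1.0)
    for z in (2.0, 1.8, 2.1):   # hypothetical measurements
        kf.propagate()
        kf.update(z)
        print(kf.x, kf.p)
```

Calling propagate() repeatedly without update() gives behavior loosely comparable to a propagation-only run: the estimate follows the dynamics model while its variance grows by q at each step.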

Lisano, Michael E., II; Crues, Edwin Z.

2005-01-01

394

Filtering separators having filter cleaning apparatus  

SciTech Connect

This invention relates to filtering separators of the kind having a housing which is subdivided by a partition, provided with parallel rows of holes or slots, into a dust-laden gas space for receiving filter elements positioned in parallel rows and being impinged upon by dust-laden gas from the outside towards the inside, and a clean gas space. In addition, the housing is provided with a chamber for cleansing the filter element surfaces of a row by counterflow action while covering at the same time the partition holes or slots leading to the adjacent rows of filter elements. The chamber is arranged for the supply of compressed air to at least one injector arranged to feed compressed air and secondary air to the row of filter elements to be cleansed. The chamber is also reciprocatingly displaceable along the partition in periodic and intermittent manner. According to the invention, a surface of the chamber facing towards the partition covers at least two of the rows of holes or slots of the partition, and the chamber is closed upon itself with respect to the clean gas space, and is connected to a compressed air reservoir via a distributor pipe and a control valve. At least one of the rows of holes or slots of the partition and the respective row of filter elements in flow communication therewith are in flow communication with the discharge side of at least one injector acted upon with compressed air. At least one other row of the rows of holes or slots of the partition and the respective row of filter elements is in flow communication with the suction side of the injector.

Margraf, A.

1984-08-28

395

Contactor/filter improvements  

DOEpatents

A contactor/filter arrangement for removing particulate contaminants from a gaseous stream is described. The filter includes a housing having a substantially vertically oriented granular material retention member with upstream and downstream faces, a substantially vertically oriented microporous gas filter element, wherein the retention member and the filter element are spaced apart to provide a zone for the passage of granular material therethrough. A gaseous stream containing particulate contaminants passes through the gas inlet means as well as through the upstream face of the granular material retention member, passing through the retention member, the body of granular material, the microporous gas filter element, exiting out of the gas outlet means. A cover screen isolates the filter element from contact with the moving granular bed. In one embodiment, the granular material is comprised of porous alumina impregnated with CuO, with the cover screen cleaned by the action of the moving granular material as well as by backflow pressure pulses. 6 figs.

Stelman, D.

1988-06-30

396

Hybrid Filter Membrane  

NASA Technical Reports Server (NTRS)

Cabin environmental control is an important issue for a successful Moon mission. Due to the unique environment of the Moon, lunar dust control is one of the main problems that significantly diminishes the air quality inside spacecraft cabins. Therefore, this innovation was motivated by NASA's need to minimize the negative health impact that air-suspended lunar dust particles have on astronauts in spacecraft cabins. It is based on fabrication of a hybrid filter comprising nanofiber nonwoven layers coated on porous polymer membranes with uniform cylindrical pores. This design results in a high-efficiency gas particulate filter with low pressure drop and the ability to be easily regenerated to restore filtration performance. A hybrid filter was developed consisting of a porous membrane with uniform, micron-sized, cylindrical pore channels coated with a thin nanofiber layer. Compared to conventional filter media such as a high-efficiency particulate air (HEPA) filter, this filter is designed to provide high particle efficiency, low pressure drop, and the ability to be regenerated. These membranes have well-defined micron-sized pores and can be used independently as air filters with discrete particle size cut-off, or coated with nanofiber layers for filtration of ultrafine nanoscale particles. The filter has a thin design intended to facilitate filter regeneration by localized air pulsing. The main feature of this invention is the combination of a micro-engineered straight-pore membrane with nanofibers. The micro-engineered straight-pore membrane can be prepared with extremely high precision. Because the resulting membrane pores are straight and not tortuous like those found in conventional filters, the pressure drop across the filter is significantly reduced. The nanofiber layer is applied as a very thin coating to enhance filtration efficiency for fine nanoscale particles. Additionally, the thin nanofiber coating is designed to promote capture of dust particles on the filter surface and to facilitate dust removal with pulse or back airflow.

Laicer, Castro; Rasimick, Brian; Green, Zachary

2012-01-01

397

Y-SNPs Do Not Indicate Hybridisation between European Aurochs and Domestic Cattle  

PubMed Central

Background Previous genetic studies of modern and ancient mitochondrial DNA have confirmed the Near Eastern origin of early European domestic cattle. However, these studies were not able to test whether hybridisation with male aurochs occurred post-domestication. To address this issue, Götherström and colleagues (2005) investigated the frequencies of two Y-chromosomal haplotypes in extant bulls. They found a significant influence of wild aurochs males on domestic populations thus challenging the common view on early domestication and Neolithic stock-rearing. To test their hypothesis, we applied these Y-markers on Neolithic bone specimens from various European archaeological sites. Methods and Findings Here, we have analysed the ancient DNA of 59 Neolithic skeletal samples. After initial molecular sexing, two segregating Y-SNPs were identified in 13 bulls. Strikingly, our results do not support the hypothesis that these markers distinguish European aurochs from domesticated cattle. Conclusions The model of a rapid introduction of domestic cattle into Central Europe without significant crossbreeding with local wild cattle remains unchallenged. PMID:18852900

Bollongino, Ruth; Elsner, Julia; Vigne, Jean-Denis; Burger, Joachim

2008-01-01

398

A new ALF from Litopenaeus vannamei and its SNPs related to WSSV resistance  

NASA Astrophysics Data System (ADS)

Anti-lipopolysaccharide factors (ALFs) are basic components of the crustacean immune system that defend against a range of pathogens. The cDNA sequence of a new ALF, designated nLvALF2, with an open reading frame encoding 132 amino acids was cloned. Its deduced amino acid sequence contained the conserved functional domain of ALFs, the LPS binding domain (LBD). Its genomic sequence consisted of three exons and four introns. nLvALF2 was mainly expressed in the Oka organ and gills of shrimps. The transcriptional level of nLvALF2 increased significantly after white spot syndrome virus (WSSV) infection, suggesting that it plays an important role in protecting shrimps from WSSV. Single nucleotide polymorphisms (SNPs) were found in the genomic sequence of nLvALF2, of which 38 were analyzed for associations with the susceptibility/resistance of shrimps to WSSV. The loci g.2422 A>G, g.2466 T>C, and g.2529 G>A were significantly associated with resistance to WSSV (P<0.05). These SNP loci could be developed as markers for selection of WSSV-resistant varieties of Litopenaeus vannamei.

Liu, Jingwen; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

2014-11-01

399

Externalizing Behaviors are associated with SNPs in the CHRNA5/CHRNA3/CHRNB4 gene cluster  

PubMed Central

There is strong evidence for shared genetic factors contributing to childhood externalizing disorders and substance abuse. Externalizing disorders often precede early substance experimentation, leading to the idea that individuals inherit a genetic vulnerability to generalized disinhibitory psychopathology. Genetic variation in the CHRNA5/CHRNA3/CHRNB4 gene cluster has been associated with early substance experimentation, nicotine dependence, and other drug behaviors. This study examines whether the CHRNA5/CHRNA3/CHRNB4 locus is also correlated with externalizing behaviors in three independent longitudinally assessed adolescent samples. We developed a common externalizing behavior phenotype from the available measures in the three samples, and tested for association with 10 SNPs in the gene cluster. Significant results were detected in two of the samples, including rs8040868, which remained significant after controlling for smoking quantity. These results expand on previous work focused mainly on drug behaviors, and support the hypothesis that variation in the CHRNA5/CHRNA3/CHRNB4 locus is associated with early externalizing behaviors. PMID:22042234

Stephens, Sarah H.; Hoft, Nicole R.; Schlaepfer, Isabel R.; Young, Susan E.; Corley, Robin C.; McQueen, Matthew B.; Hopfer, Christian; Crowley, Thomas; Stallings, Michael; Hewitt, John; Ehringer, Marissa A.

2012-01-01

400

Association of ESR1 gene tagging SNPs with breast cancer risk  

PubMed Central

We have conducted a three-stage, comprehensive single nucleotide polymorphism (SNP)-tagging association study of ESR1 gene variants (SNPs) in more than 55 000 breast cancer cases and controls from studies within the Breast Cancer Association Consortium (BCAC). No large risks or highly significant associations were revealed. SNP rs3020314, tagging a region of ESR1 intron 4, is associated with an increase in breast cancer susceptibility with a dominant mode of action in European populations. Carriers of the c-allele have an odds ratio (OR) of 1.05 [95% Confidence Intervals (CI) 1.02–1.09] relative to t-allele homozygotes, P = 0.004. There is significant heterogeneity between studies, P = 0.002. The increased risk appears largely confined to oestrogen receptor-positive tumour risk. The region tagged by SNP rs3020314 contains sequence that is more highly conserved across mammalian species than the rest of intron 4, and it may subtly alter the ratio of two mRNA splice forms. PMID:19126777
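
The reported odds ratio, confidence interval, and P value can be checked against one another with a short calculation. The sketch below assumes a normal approximation on the log-odds scale and works from the rounded figures printed in the abstract, so it is only an approximate consistency check; the function name and its parameters are illustrative and do not come from the study.

```python
import math


def p_from_or_ci(or_est, ci_low, ci_high, z_crit=1.959964):
    """Back-compute the standard error, z score, and two-sided P value
    from an odds ratio and its 95% confidence interval, assuming the
    log odds ratio is approximately normally distributed."""
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(or_est) / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


if __name__ == "__main__":
    se, z, p = p_from_or_ci(1.05, 1.02, 1.09)
    print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, P = {p:.4f}")
```

With the values from the abstract (OR 1.05, CI 1.02 to 1.09), the back-computed P value comes out close to the reported P = 0.004, which suggests the three figures are mutually consistent.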

Dunning, Alison M.; Healey, Catherine S.; Baynes, Caroline; Maia, Ana-Teresa; Scollen, Serena; Vega, Ana; Rodríguez, Raquel; Barbosa-Morais, Nuno L.; Ponder, Bruce A.J.; Low, Yen-Ling; Bingham, Sheila; Haiman, Christopher A.; Le Marchand, Loic; Broeks, Annegien; Schmidt, Marjanka K.; Hopper, John; Southey, Melissa; Beckmann, Matthias W.; Fasching, Peter A.; Peto, Julian; Johnson, Nichola; Bojesen, Stig E.; Nordestgaard, Børge; Milne, Roger L.; Benitez, Javier; Hamann, Ute; Ko, Yon; Schmutzler, Rita K.; Burwinkel, Barbara; Schürmann, Peter; Dörk, Thilo; Heikkinen, Tuomas; Nevanlinna, Heli; Lindblom, Annika; Margolin, Sara; Mannermaa, Arto; Kosma, Veli-Matti; Chen, Xiaoqing; Spurdle, Amanda; Chang-Claude, Jenny; Flesch-Janys, Dieter; Couch, Fergus J.; Olson, Janet E.; Severi, Gianluca; Baglietto, Laura; Børresen-Dale, Anne-Lise; Kristensen, Vessela; Hunter, David J.; Hankinson, Susan E.; Devilee, Peter; Vreeswijk, Maaike; Lissowska, Jolanta; Brinton, Louise; Liu, Jianjun; Hall, Per; Kang, Daehee; Yoo, Keun-Young; Shen, Chen-Yang; Yu, Jyh-Cherng; Anton-Culver, Hoda; Ziogas, Argyrios; Sigurdson, Alice; Struewing, Jeff; Easton, Douglas F.; Garcia-Closas, Montserrat; Humphreys, Manjeet K.; Morrison, Jonathan; Pharoah, Paul D.P.; Pooley, Karen A.; Chenevix-Trench, Georgia

2009-01-01

401

Use of Long Term Molecular Dynamics Simulation in Predicting Cancer Associated SNPs  

PubMed Central

Computational prediction of cancer-associated SNPs from large SNP datasets is now used as a tool for detecting probable oncogenes, which are then examined in wet-lab experiments. Limited prediction accuracy has been a major hurdle in relying on computational results obtained by combining multiple tools, platforms and algorithms for cancer-associated SNP prediction. Our results from the initial computational compilations suggest a strong chance that the Aurora-A G325W mutation (rs11539196) causes hepatocellular carcinoma. Molecular dynamics simulation (MDS) approaches have significantly aided in raising the prediction accuracy of these results, but measuring the difference in the convergence time of mutant protein structures has been a challenging task when setting the simulation timescale. The convergence time of most protein structures may vary from 10 ns to 100 ns or more, depending upon their size. Thus, in this work we ran 200 ns of MDS to support the final results obtained from the computational SNP prediction technique. The MDS results explain the atomic alterations related to the mutant protein and are useful in elaborating the changes in structural conformation coupled with the computationally predicted cancer-associated mutation. With further advancements in computational techniques, it will become much easier to predict such mutations with higher accuracy. PMID:24722014

Kumar, Ambuj; Purohit, Rituraj

2014-01-01

402

SNPs3D: Candidate gene and SNP selection for association studies  

PubMed Central

Background The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level. Description The resource has three primary modules. One module identifies which genes are candidates for involvement in a specified disease. A second module provides information about the relationships between sets of candidate genes. The third module analyzes the likely impact of non-synonymous SNPs on protein function. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using simple but effective text profiling. SNP/protein function relationships are derived by two methods, one using principles of protein structure and stability, the other based on sequence conservation. Entries for each gene include a number of links to other data, such as expression profiles, pathway context, mouse knockout information and papers. Gene-gene interactions are presented in an interactive graphical interface, providing rapid access to the underlying information, as well as convenient navigation through the network. Use of the resource is illustrated with aspects of the inflammatory response and hypertension. Conclusion The combination of SNP impact analysis, a knowledge-based network of gene relationships and candidate genes, and access to a wide range of data and literature allows a user to quickly assimilate available information, and so develop models of gene-pathway-disease interaction. PMID:16551372

Yue, Peng; Melamud, Eugene; Moult, John

2006-01-01

403

Au-nanoprobes for detection of SNPs associated with antibiotic resistance in Mycobacterium tuberculosis  

NASA Astrophysics Data System (ADS)

Tuberculosis (TB) is one of the leading causes of infection in humans, causing high morbidity and mortality all over the world. The rate of new cases of multidrug resistant tuberculosis (MDRTB) continues to increase, and since these infections are very difficult to manage, they constitute a serious health problem. In most cases, drug resistance in Mycobacterium tuberculosis has been related to mutations in several loci within the pathogen's genome. The development of fast, cheap and simple screening methodologies would be of paramount relevance for the early detection of these mutations, essential for the timely and effective diagnosis and management of MDRTB patients. The use of gold nanoparticles derivatized with thiol-modified oligonucleotides (Au-nanoprobes) has led to new approaches in molecular diagnostics. Based on the differential non-cross-linking aggregation of Au-nanoprobes, we were able to develop a colorimetric method for the detection of specific sequences and to apply this approach to pathogen identification and to discrimination of single base mutations/single nucleotide polymorphisms (SNPs). Here we report on the development of Au-nanoprobes for the specific identification of SNPs within the beta subunit of the RNA polymerase (rpoB locus), responsible for resistance to rifampicin in over 95% of rifampicin resistant M. tuberculosis strains.

Veigas, Bruno; Machado, Diana; Perdigão, João; Portugal, Isabel; Couto, Isabel; Viveiros, Miguel; Baptista, Pedro V.

2010-10-01

404

Novel SNPs of the bovine LEPR gene and their association with growth traits.  

PubMed

In this study, polymorphism in the bovine LEPR gene exon 4 was detected by PCR-SSCP and DNA sequencing methods in 653 individuals from five Chinese cattle breeds. Two haplotypes (M and N), three observed genotypes (MM, MN, and NN), and five single nucleotide polymorphisms (SNPs) (NC_007301:g.26767T>C, NC_007301:g.26805C>T, NC_007301:g.27050A>G, NC_007301:g.27063G>A, NC_007301:g.27079G>A) were detected. The frequencies of haplotypes M and N in the five breeds were 0.661-0.747 and 0.253-0.339, respectively. The SNP locus was in Hardy-Weinberg equilibrium in Nanyang, Jiaxian red, Angus, and Jinnan cattle (P > 0.05) and was in Hardy-Weinberg disequilibrium in Qinchuan cattle (P < 0.05). Polymorphism of the LEPR gene was shown to be associated with growth traits in the Nanyang breed. The SNP in the bovine LEPR gene had significant effects on body height, body length, body weight, heart girth, and average daily gain at 6 and 12 months old (P < 0.01 or P < 0.05). Therefore, these results suggest that the LEPR gene is a strong candidate gene that affects growth traits in cattle. PMID:18807168
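
The Hardy-Weinberg comparison mentioned in the abstract is commonly carried out as a chi-square goodness-of-fit test on the three genotype counts. The sketch below shows that standard test, not necessarily the exact procedure the authors used; the genotype counts are hypothetical (the abstract reports only frequency ranges) and the function name is illustrative.

```python
import math


def hwe_chi_square(n_mm, n_mn, n_nn):
    """Chi-square test for Hardy-Weinberg equilibrium at a two-allele locus.

    Returns the frequency of allele M, the chi-square statistic, and the
    P value for one degree of freedom."""
    n = n_mm + n_mn + n_nn
    p = (2 * n_mm + n_mn) / (2 * n)   # frequency of M
    q = 1.0 - p                       # frequency of N
    observed = (n_mm, n_mn, n_nn)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation (a monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts of MM, MN, and NN animals in one breed.
    freq_m, chi2, p_value = hwe_chi_square(60, 20, 20)
    print(f"freq(M) = {freq_m:.3f}, chi2 = {chi2:.2f}, P = {p_value:.2g}")
```

A P value above 0.05 would be read as consistent with equilibrium, as reported for four of the breeds, and a value below 0.05 as a departure from it, as reported for Qinchuan cattle.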

Guo, Yikun; Chen, Hong; Lan, Xianyong; Zhang, Bao; Pan, Chuanying; Zhang, Liangzhi; Zhang, Cunfang; Zhao, Miao

2008-12-01

405

Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs.  

PubMed

Integrative analysis of multiple data types can take advantage of their complementary information and therefore may provide higher power to identify potential biomarkers that would be missed using individual data analysis. Because data modalities differ in nature, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) using weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample-large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim of identifying biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by ten-fold cross validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared it with traditional statistical methods for univariate data analysis (Chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis. PMID:24530838
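
The ten-fold cross validation step can be pictured as a partition of the 208 subjects into ten nearly equal test folds. The sketch below shows only that index bookkeeping; it does not implement the SRVS algorithm or the classifier, and it assumes contiguous, unshuffled, unstratified folds because the abstract does not describe how subjects were assigned. The function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs; the first
    n_samples % k folds receive one extra sample."""
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(208, 10), 1):
        print(fold_number, len(train), len(test))
```

For 208 subjects and ten folds, eight folds hold 21 test subjects and two hold 20, so every subject is used for testing exactly once.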

Cao, Hongbao; Duan, Junbo; Lin, Dongdong; Shugart, Yin Yao; Calhoun, Vince; Wang, Yu-Ping

2014-11-15

406

Linear phase compressive filter  

DOEpatents

A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters, with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode to ground, which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmission line.

McEwan, Thomas E. (Livermore, CA)

1995-01-01

407

Linear phase compressive filter  

DOEpatents

A phase linear filter for soliton suppression is in the form of a laddered series of stages of non-commensurate low pass filters, with each low pass filter having a series coupled inductance (L) and a reverse biased, voltage dependent varactor diode to ground, which acts as a variable capacitance (C). L and C values are set to levels which correspond to a linear or conventional phase linear filter. Inductance is mapped directly from that of an equivalent nonlinear transmission line and capacitance is mapped from the linear case using a large signal equivalent of a nonlinear transmiss