Sample records for genome scanning method

  1. An empirical Bayes method for updating inferences in analysis of quantitative trait loci using information from related genome scans.

    PubMed

    Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B

    2006-08-01

    Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their scan most merit confidence and follow-up via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase confidence in objectivity would be a benefit. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used for the estimation of QTL location and effect. We evaluate the performance of our method by using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than results from any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.
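
    The idea of updating one scan's linkage statistic with other scans as prior information can be illustrated with a simple normal-normal shrinkage model. This is only a sketch under stated assumptions, not the authors' method: the function name `empirical_bayes_update`, the normal model, and the crude prior-variance estimate are all hypothetical choices made for illustration.

```python
from statistics import mean, pvariance


def empirical_bayes_update(z_own, se_own, z_others):
    """Shrink one study's statistic toward the mean of related studies.

    Assumed model (illustrative only): the study's true value is drawn
    from a normal prior whose mean and variance are estimated from the
    statistics reported by the other scans at the same position.
    """
    prior_mean = mean(z_others)
    # Crude between-study variance estimate, floored to stay positive.
    prior_var = max(pvariance(z_others), 1e-8)
    # Weight placed on the study's own estimate.
    weight = prior_var / (prior_var + se_own ** 2)
    post_mean = weight * z_own + (1.0 - weight) * prior_mean
    post_var = weight * se_own ** 2  # equals 1 / (1/prior_var + 1/se_own**2)
    return post_mean, post_var


post_mean, post_var = empirical_bayes_update(3.0, 1.0, [1.0, 2.0, 3.0])
```

    In this toy model the posterior variance is never larger than the study's own sampling variance, which mirrors the narrower intervals described in the abstract. Large between-study variance pushes the weight toward the study's own data.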

  2. Navigating the Interface Between Landscape Genetics and Landscape Genomics.

    PubMed

    Storfer, Andrew; Patton, Austin; Fraik, Alexandra K

    2018-01-01

    As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. 
    To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used.

  3. Navigating the Interface Between Landscape Genetics and Landscape Genomics

    PubMed Central

    Storfer, Andrew; Patton, Austin; Fraik, Alexandra K.

    2018-01-01

    As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. 
    To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used. PMID:29593776

  4. Finding the Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions.

    PubMed

    Hoban, Sean; Kelley, Joanna L; Lotterhos, Katie E; Antolin, Michael F; Bradburd, Gideon; Lowry, David B; Poss, Mary L; Reed, Laura K; Storfer, Andrew; Whitlock, Michael C

    2016-10-01

    Uncovering the genetic and evolutionary basis of local adaptation is a major focus of evolutionary biology. The recent development of cost-effective methods for obtaining high-quality genome-scale data makes it possible to identify some of the loci responsible for adaptive differences among populations. Two basic approaches for identifying putatively locally adaptive loci have been developed and are broadly used: one that identifies loci with unusually high genetic differentiation among populations (differentiation outlier methods) and one that searches for correlations between local population allele frequencies and local environments (genetic-environment association methods). Here, we review the promises and challenges of these genome scan methods, including correcting for the confounding influence of a species' demographic history, biases caused by missing aspects of the genome, matching scales of environmental data with population structure, and other statistical considerations. In each case, we make suggestions for best practices for maximizing the accuracy and efficiency of genome scans to detect the underlying genetic basis of local adaptation. With attention to their current limitations, genome scan methods can be an important tool in finding the genetic basis of adaptive evolutionary change.
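
    A differentiation outlier scan can be sketched in its most naive form as follows. The sketch assumes biallelic loci, equally weighted populations, and a fixed empirical cutoff. The names `wright_fst` and `flag_outliers` and the `top_fraction` parameter are hypothetical. It does not correct for demographic history, which the abstract identifies as a major confounder, so it should not be read as any published method.

```python
def wright_fst(freqs):
    """Per-locus FST from population allele frequencies: Var(p) / (p_bar * (1 - p_bar))."""
    p_bar = sum(freqs) / len(freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # monomorphic locus carries no differentiation signal
    var_p = sum((p - p_bar) ** 2 for p in freqs) / len(freqs)
    return var_p / (p_bar * (1.0 - p_bar))


def flag_outliers(locus_freqs, top_fraction=0.05):
    """Return per-locus FST and the loci in the top fraction of the FST ranking."""
    fst = {name: wright_fst(freqs) for name, freqs in locus_freqs.items()}
    n_flag = max(1, round(top_fraction * len(fst)))
    ranked = sorted(fst, key=fst.get, reverse=True)
    return fst, ranked[:n_flag]


# Hypothetical allele frequencies in two populations for three loci.
example = {"locA": [0.5, 0.5], "locB": [0.1, 0.9], "locC": [0.4, 0.6]}
fst_values, outliers = flag_outliers(example, top_fraction=0.34)
```

    A fixed quantile always flags some loci, even when none is under selection. This is one reason the methods reviewed above model the neutral distribution explicitly.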

  5. RADseq provides unprecedented insights into molecular ecology and evolutionary genetics: comment on Breaking RAD by Lowry et al. (2016).

    PubMed

    McKinney, Garrett J; Larson, Wesley A; Seeb, Lisa W; Seeb, James E

    2017-05-01

    In their recently corrected manuscript, "Breaking RAD: An evaluation of the utility of restriction site associated DNA sequencing for genome scans of adaptation", Lowry et al. argue that genome scans using RADseq will miss many loci under selection due to a combination of sparse marker density and low levels of linkage disequilibrium in most species. We agree that marker density and levels of LD are important considerations when designing a RADseq study; however, we dispute that RAD-based genome scans are as prone to failure as Lowry et al. suggest. Their arguments ignore the flexible nature of RADseq; the availability of different restriction enzymes and capacity for combining restriction enzymes ensures that a well-designed study should be able to generate enough markers for efficient genome coverage. We further believe that simplifying assumptions about linkage disequilibrium in their simulations are invalid in many species. Finally, it is important to note that the alternative methods proposed by Lowry et al. have limitations equal to or greater than RADseq. The wealth of studies with positive impactful findings that have used RAD genome scans instead supports the argument that properly conducted RAD genome scans are an effective method for gaining insight into ecology and evolution, particularly for non-model organisms and those with large or complex genomes. © 2016 John Wiley & Sons Ltd.

  6. Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

    PubMed

    Manel, S; Perrier, C; Pratlong, M; Abi-Rached, L; Paganini, J; Pontarotti, P; Aurelle, D

    2016-01-01

    Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.

  7. cisprimertool: software to implement a comparative genomics strategy for the development of conserved intron scanning (CIS) markers.

    PubMed

    Jayashree, B; Jagadeesh, V T; Hoisington, D

    2008-05-01

    The availability of complete, annotated genomic sequence information in model organisms is a rich resource that can be extended to understudied orphan crops through comparative genomic approaches. We report here a software tool (cisprimertool) for the identification of conserved intron scanning regions using expressed sequence tag alignments to a completely sequenced model crop genome. The method used is based on earlier studies reporting the assessment of conserved intron scanning primers (called CISP) within relatively conserved exons located near exon-intron boundaries from onion, banana, sorghum and pearl millet alignments with rice. The tool is freely available to academic users at http://www.icrisat.org/gt-bt/CISPTool.htm. © 2007 ICRISAT.

  8. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  9. A note on generalized Genome Scan Meta-Analysis statistics

    PubMed Central

    Koziol, James A; Feng, Anne C

    2005-01-01

    Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic, and, we examine the limiting distribution of the GSMA statistics under the order statistic formulation, and quantify the relevance of the pairwise correlations of the GSMA statistics across different bins on this limiting distribution. We also remark on aggregate criteria and multiple testing for determining significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930
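
    The weighted, rank-based statistic described above can be sketched as a weighted sum of within-study bin ranks. The sketch assumes that the bin with the strongest linkage evidence receives the highest rank and that ties receive average ranks; the function names are hypothetical. It computes only the summed ranks, not their null distribution or significance.

```python
def rank_bins(scores):
    """Rank bins within one study; the largest score gets the largest rank, ties averaged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def weighted_summed_ranks(studies, weights=None):
    """Sum each bin's within-study rank across studies, with optional study weights."""
    if weights is None:
        weights = [1.0] * len(studies)
    totals = [0.0] * len(studies[0])
    for scores, weight in zip(studies, weights):
        for bin_index, rank in enumerate(rank_bins(scores)):
            totals[bin_index] += weight * rank
    return totals


# Two hypothetical studies, three bins each (maximum linkage score per bin).
studies = [[0.5, 2.0, 1.0], [0.1, 3.0, 0.2]]
unweighted = weighted_summed_ranks(studies)
weighted = weighted_summed_ranks(studies, weights=[1.0, 2.0])
```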

  10. Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA

    USDA-ARS's Scientific Manuscript database

    High-throughput next-generation sequencing was used to scan the genome and generate reliable sequence of high copy number regions. Using this method, we examined whole plastid genomes as well as nearly 6000 bases of nuclear ribosomal DNA sequences for nine genotypes of Theobroma cacao and an indivi...

  11. Global methylation screening in the Arabidopsis thaliana and Mus musculus genome: applications of virtual image restriction landmark genomic scanning (Vi-RLGS)

    PubMed Central

    Matsuyama, Tomoki; Kimura, Makoto T.; Koike, Kuniaki; Abe, Tomoko; Nakano, Takeshi; Asami, Tadao; Ebisuzaki, Toshikazu; Held, William A.; Yoshida, Shigeo; Nagase, Hiroki

    2003-01-01

    Understanding the role of ‘epigenetic’ changes such as DNA methylation and chromatin remodeling has now become critical in understanding many biological processes. In order to delineate the global methylation pattern in a given genomic DNA, computer software has been developed to create a virtual image of restriction landmark genomic scanning (Vi-RLGS). When using a methylation-sensitive enzyme such as NotI as the restriction landmark, the comparison between real and in silico RLGS profiles of the genome provides a methylation map of genomic NotI sites. A methylation map of the Arabidopsis genome was created that could be confirmed by a methylation-sensitive PCR assay. The method has also been applied to the mouse genome. Although a complete methylation map has not been completed, a region of methylation difference between two tissues has been tested and confirmed by bisulfite sequencing. Vi-RLGS in conjunction with real RLGS will make it possible to develop a more complete map of genomic sites that are methylated or demethylated as a consequence of normal or abnormal development. PMID:12888509

  12. Enhancer scanning to locate regulatory regions in genomic loci

    PubMed Central

    Buckley, Melissa; Gjyshi, Anxhela; Mendoza-Fandiño, Gustavo; Baskin, Rebekah; Carvalho, Renato S.; Carvalho, Marcelo A.; Woods, Nicholas T.; Monteiro, Alvaro N.A.

    2016-01-01

    The present protocol provides a rapid, streamlined and scalable strategy to systematically scan genomic regions for the presence of transcriptional regulatory regions active in a specific cell type. It creates genomic tiles spanning a region of interest that are subsequently cloned by recombination into a luciferase reporter vector containing the Simian Virus 40 promoter. Tiling clones are transfected into specific cell types to test for the presence of transcriptional regulatory regions. The protocol includes testing of different SNP (single nucleotide polymorphism) alleles to determine their effect on regulatory activity. This procedure provides a systematic framework to identify candidate functional SNPs within a locus during functional analysis of genome-wide association studies. This protocol adapts and combines previous well-established molecular biology methods to provide a streamlined strategy, based on automated primer design and recombinational cloning to rapidly go from a genomic locus to a set of candidate functional SNPs in eight weeks. PMID:26658467

  13. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites

    PubMed Central

    Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming

    2016-01-01

    Identification of loci or genes under natural selection is important for both understanding the genetic basis of local adaptation and practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeats markers (SSRs) were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400

  14. Two Quantitative Trait Loci Influence Whipworm (Trichuris trichiura) Infection in a Nepalese Population

    PubMed Central

    Williams-Blangero, Sarah; VandeBerg, John L.; Subedi, Janardan; Jha, Bharat; Dyer, T.D.; Blangero, John

    2014-01-01

    Background Whipworm (Trichuris trichiura) is a soil-transmitted helminth which infects over a billion people. It is a serious public health problem in many developing countries and can result in deficits in growth and cognitive development. In a follow-up study of a significant heritability for whipworm infection, we conducted the first genome scan for susceptibility to this important parasitic disease. Methods We assessed whipworm eggs per gram of feces in 1253 members of the Jirel population of eastern Nepal. All sampled individuals belonged to a single pedigree containing over 26,000 relative pairs that are informative for genetic analysis. Results Linkage analysis of genome scan data generated for the pedigree provided unambiguous evidence for two quantitative trait loci influencing susceptibility to whipworm infection, one located on chromosome 9 (LOD = 3.35, genome-wide p = 0.0138) and the other located on chromosome 18 (LOD = 3.29, genome-wide p = 0.0159). There was also suggestive evidence for two loci located on chromosomes 12 and 13 influencing whipworm infection. Conclusion The results of this first genome scan for susceptibility to whipworm infection may ultimately lead to the identification of novel targets for vaccine and drug development efforts. PMID:18462166
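
    For readers relating LOD scores to p-values, the following sketch converts a LOD score to a pointwise (single-position) p-value. It assumes the likelihood-ratio statistic 2 ln(10) x LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom, a common assumption for variance-components linkage tests. The function name is hypothetical. The genome-wide p-values quoted in the abstract account for testing across the whole genome and are therefore much larger than the pointwise values this sketch returns.

```python
import math


def lod_to_pointwise_p(lod):
    """Pointwise p-value for a LOD score under an assumed 50:50 chi-square mixture."""
    chi2 = 2.0 * math.log(10.0) * lod  # likelihood-ratio statistic
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


p_chr9 = lod_to_pointwise_p(3.35)   # LOD reported for chromosome 9
p_chr18 = lod_to_pointwise_p(3.29)  # LOD reported for chromosome 18
```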

  15. The detection of large deletions or duplications in genomic DNA.

    PubMed

    Armour, J A L; Barton, D E; Cockburn, D J; Taylor, G R

    2002-11-01

    While methods for the detection of point mutations and small insertions or deletions in genomic DNA are well established, the detection of larger (>100 bp) genomic duplications or deletions can be more difficult. Most mutation scanning methods use PCR as a first step, but the subsequent analyses are usually qualitative rather than quantitative. Gene dosage methods based on PCR need to be quantitative (i.e., they should report molar quantities of starting material) or semi-quantitative (i.e., they should report gene dosage relative to an internal standard). Without some sort of quantitation, heterozygous deletions and duplications may be overlooked and therefore be under-ascertained. Gene dosage methods provide the additional benefit of reporting allele drop-out in the PCR. This could impact on SNP surveys, where large-scale genotyping may miss null alleles. Here we review recent developments in techniques for the detection of this type of mutation and compare their relative strengths and weaknesses. We emphasize that comprehensive mutation analysis should include scanning for large insertions and deletions and duplications. Copyright 2002 Wiley-Liss, Inc.

  16. A genome scan for selection signatures comparing farmed Atlantic salmon with two wild populations: Testing colocalization among outlier markers, candidate genes, and quantitative trait loci for production traits.

    PubMed

    Liu, Lei; Ang, Keng Pee; Elliott, J A K; Kent, Matthew Peter; Lien, Sigbjørn; MacDonald, Danielle; Boulding, Elizabeth Grace

    2017-03-01

    Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.

  17. GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

    PubMed

    Issac, Biju; Raghava, G P S

    2002-09-01

    Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.

  18. Scanning the Effects of Ethyl Methanesulfonate on the Whole Genome of Lotus japonicus Using Second-Generation Sequencing Analysis

    PubMed Central

    Mohd-Yusoff, Nur Fatihah; Ruperao, Pradeep; Tomoyoshi, Nurain Emylia; Edwards, David; Gresshoff, Peter M.; Biswas, Bandana; Batley, Jacqueline

    2015-01-01

    Genetic structure can be altered by chemical mutagenesis, which is a common method applied in molecular biology and genetics. Second-generation sequencing provides a platform to reveal base alterations occurring in the whole genome due to mutagenesis. A model legume, Lotus japonicus ecotype Miyakojima, was chemically mutated with alkylating ethyl methanesulfonate (EMS) for the scanning of DNA lesions throughout the genome. Using second-generation sequencing, two individually mutated third-generation progeny (M3, named AM and AS) were sequenced and analyzed to identify single nucleotide polymorphisms and reveal the effects of EMS on nucleotide sequences in these mutant genomes. Single-nucleotide polymorphisms were found in every 208 kb (AS) and 202 kb (AM) with a bias mutation of G/C-to-A/T changes at low percentage. Most mutations were intergenic. The mutation spectrum of the genomes was comparable in their individual chromosomes; however, each mutated genome has unique alterations, which are useful to identify causal mutations for their phenotypic changes. The data obtained demonstrate that whole genomic sequencing is applicable as a high-throughput tool to investigate genomic changes due to mutagenesis. The identification of these single-point mutations will facilitate the identification of phenotypically causative mutations in EMS-mutated germplasm. PMID:25660167

  19. A Genome-Wide Breast Cancer Scan in African Americans

    DTIC Science & Technology

    2010-06-01

    SNPs from the African American breast cancer scan to COGs, a European collaborative study which has designed a SNP array that will be genotyped... Award Number: W81XWH-08-1-0383. Principal Investigator: Christopher A...

  20. Ten Years of Landscape Genomics: Challenges and Opportunities.

    PubMed

    Li, Yong; Zhang, Xue-Xia; Mao, Run-Li; Yang, Jie; Miao, Cai-Yun; Li, Zhuo; Qiu, Ying-Xiong

    2017-01-01

    Landscape genomics is a relatively new discipline that aims to reveal the relationship between adaptive genetic imprints in genomes and environmental heterogeneity among natural populations. Although the interest in landscape genomics has increased since this term was coined, studies on this topic remain scarce. Landscape genomics has become a powerful method to scan and determine the genes responsible for the complex adaptive evolution of species at population (mostly) and individual (more rarely) level. This review outlines the sampling strategies, molecular marker types and research categories in 37 articles published during the first 10 years of this field (i.e., 2007-2016). We also address major challenges and future directions for landscape genomics. This review aims to promote interest in conducting additional studies in landscape genomics.

  1. Beyond an AFLP genome scan towards the identification of immune genes involved in plague resistance in Rattus rattus from Madagascar.

    PubMed

    Tollenaere, C; Jacquet, S; Ivanova, S; Loiseau, A; Duplantier, J-M; Streiff, R; Brouat, C

    2013-01-01

    Genome scans using amplified fragment length polymorphism (AFLP) markers became popular in nonmodel species within the last 10 years, but few studies have tried to characterize the anonymous outliers identified. This study follows on from an AFLP genome scan in the black rat (Rattus rattus), the reservoir of plague (Yersinia pestis infection) in Madagascar. We successfully sequenced 17 of the 22 markers previously shown to be potentially affected by plague-mediated selection and associated with a plague resistance phenotype. Searching these sequences in the genome of the closely related species Rattus norvegicus assigned them to 14 genomic regions, revealing a random distribution of outliers in the genome (no clustering). We compared these results with those of an in silico AFLP study of the R. norvegicus genome, which showed that outlier sequences could not have been inferred by this method in R. rattus (only four of the 15 sequences were predicted). However, in silico analysis allowed the prediction of AFLP markers distribution and the estimation of homoplasy rates, confirming its potential utility for designing AFLP studies in nonmodel species. The 14 genomic regions surrounding AFLP outliers (less than 300 kb from the marker) contained 75 genes encoding proteins of known function, including nine involved in immune function and pathogen defence. We identified the two interleukin 1 genes (Il1a and Il1b) that share homology with an antigen of Y. pestis, as the best candidates for genes subject to plague-mediated natural selection. At least six other genes known to be involved in proinflammatory pathways may also be affected by plague-mediated selection. © 2012 Blackwell Publishing Ltd.

  2. Population structure of eleven Spanish ovine breeds and detection of selective sweeps with BayeScan and hapFLK

    PubMed Central

    Manunza, A.; Cardoso, T. F.; Noce, A.; Martínez, A.; Pons, A.; Bermejo, L. A.; Landi, V.; Sànchez, A.; Jordana, J.; Delgado, J. V.; Adán, S.; Capote, J.; Vidal, O.; Ugarte, E.; Arranz, J. J.; Calvo, J. H.; Casellas, J.; Amills, M.

    2016-01-01

    The goals of the current work were to analyse the population structure of 11 Spanish ovine breeds and to detect genomic regions that may have been targeted by selection. A total of 141 individuals were genotyped with the Infinium 50 K Ovine SNP BeadChip (Illumina). We combined this dataset with Spanish ovine data previously reported by the International Sheep Genomics Consortium (N = 229). Multidimensional scaling and Admixture analyses revealed that Canaria de Pelo and, to a lesser extent, Roja Mallorquina, Latxa and Churra are clearly differentiated populations, while the remaining seven breeds (Ojalada, Castellana, Gallega, Xisqueta, Ripollesa, Rasa Aragonesa and Segureña) share a similar genetic background. Performance of a genome scan with BayeScan and hapFLK allowed us to identify three genomic regions that are consistently detected with both methods, i.e. Oar3 (150–154 Mb), Oar6 (4–49 Mb) and Oar13 (68–74 Mb). Neighbor-joining trees based on polymorphisms mapping to these three selective sweeps did not show a clustering of breeds according to their predominant productive specialization (except the local tree based on Oar13 SNPs). Such cryptic signatures of selection have also been found in the bovine genome, posing a considerable challenge to understanding the biological consequences of artificial selection. PMID:27272025

  3. Population structure of eleven Spanish ovine breeds and detection of selective sweeps with BayeScan and hapFLK.

    PubMed

    Manunza, A; Cardoso, T F; Noce, A; Martínez, A; Pons, A; Bermejo, L A; Landi, V; Sànchez, A; Jordana, J; Delgado, J V; Adán, S; Capote, J; Vidal, O; Ugarte, E; Arranz, J J; Calvo, J H; Casellas, J; Amills, M

    2016-06-07

    The goals of the current work were to analyse the population structure of 11 Spanish ovine breeds and to detect genomic regions that may have been targeted by selection. A total of 141 individuals were genotyped with the Infinium 50 K Ovine SNP BeadChip (Illumina). We combined this dataset with Spanish ovine data previously reported by the International Sheep Genomics Consortium (N = 229). Multidimensional scaling and Admixture analyses revealed that Canaria de Pelo and, to a lesser extent, Roja Mallorquina, Latxa and Churra are clearly differentiated populations, while the remaining seven breeds (Ojalada, Castellana, Gallega, Xisqueta, Ripollesa, Rasa Aragonesa and Segureña) share a similar genetic background. Performance of a genome scan with BayeScan and hapFLK allowed us to identify three genomic regions that are consistently detected with both methods, i.e. Oar3 (150-154 Mb), Oar6 (4-49 Mb) and Oar13 (68-74 Mb). Neighbor-joining trees based on polymorphisms mapping to these three selective sweeps did not show a clustering of breeds according to their predominant productive specialization (except the local tree based on Oar13 SNPs). Such cryptic signatures of selection have also been found in the bovine genome, posing a considerable challenge to understanding the biological consequences of artificial selection.

  4. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    PubMed

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. Multiple approaches to detect outliers in a genome scan for selection in ocellated lizards (Lacerta lepida) along an environmental gradient.

    PubMed

    Nunes, Vera L; Beaumont, Mark A; Butlin, Roger K; Paulo, Octávio S

    2011-01-01

    Identification of loci with adaptive importance is a key step to understand the speciation process in natural populations, because those loci are responsible for phenotypic variation that affects fitness in different environments. We conducted an AFLP genome scan in populations of ocellated lizards (Lacerta lepida) to search for candidate loci influenced by selection along an environmental gradient in the Iberian Peninsula. This gradient is strongly influenced by climatic variables, and two subspecies can be recognized at the opposite extremes: L. lepida iberica in the northwest and L. lepida nevadensis in the southeast. Both subspecies show substantial morphological differences that may be involved in their local adaptation to the climatic extremes. To investigate how the use of a particular outlier detection method can influence the results, a frequentist method, DFDIST, and a Bayesian method, BayeScan, were used to search for outliers influenced by selection. Additionally, the spatial analysis method was used to test for associations of AFLP marker band frequencies with 54 climatic variables by logistic regression. Results obtained with each method highlight differences in their sensitivity. DFDIST and BayeScan detected a similar proportion of outliers (3-4%), but only a few loci were simultaneously detected by both methods. Several loci detected as outliers were also associated with temperature, insolation or precipitation according to spatial analysis method. These results are in accordance with reported data in the literature about morphological and life-history variation of L. lepida subspecies along the environmental gradient. © 2010 Blackwell Publishing Ltd.

  6. Genome-wide scans of genetic variants for psychophysiological endophenotypes: a methodological overview.

    PubMed

    Iacono, William G; Malone, Stephen M; Vaidyanathan, Uma; Vrieze, Scott I

    2014-12-01

    This article provides an introductory overview of the investigative strategy employed to evaluate the genetic basis of 17 endophenotypes examined as part of a 20-year data collection effort from the Minnesota Center for Twin and Family Research. Included are characterization of the study samples, descriptive statistics for key properties of the psychophysiological measures, and rationale behind the steps taken in the molecular genetic study design. The statistical approach included (a) biometric analysis of twin and family data, (b) heritability analysis using 527,829 single nucleotide polymorphisms (SNPs), (c) genome-wide association analysis of these SNPs and 17,601 autosomal genes, (d) follow-up analyses of candidate SNPs and genes hypothesized to have an association with each endophenotype, (e) rare variant analysis of nonsynonymous SNPs in the exome, and (f) whole genome sequencing association analysis using 27 million genetic variants. These methods were used in the accompanying empirical articles comprising this special issue, Genome-Wide Scans of Genetic Variants for Psychophysiological Endophenotypes. Copyright © 2014 Society for Psychophysiological Research.

  7. Automated Processing of 2-D Gel Electrophoretograms of Genomic DNA for Hunting Pathogenic DNA Molecular Changes.

    PubMed

    Takahashi; Nakazawa; Watanabe; Konagaya

    1999-01-01

    We have developed the automated processing algorithms for 2-dimensional (2-D) electrophoretograms of genomic DNA based on RLGS (Restriction Landmark Genomic Scanning) method, which scans the restriction enzyme recognition sites as the landmark and maps them onto a 2-D electrophoresis gel. Our powerful processing algorithms realize the automated spot recognition from RLGS electrophoretograms and the automated comparison of a huge number of such images. In the final stage of the automated processing, a master spot pattern, on which all the spots in the RLGS images are mapped at once, can be obtained. The spot pattern variations which seemed to be specific to the pathogenic DNA molecular changes can be easily detected by simply looking over the master spot pattern. When we applied our algorithms to the analysis of 33 RLGS images derived from human colon tissues, we successfully detected several colon tumor specific spot pattern changes.

  8. Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.

    PubMed

    McCallum, Kenneth J; Ionita-Laza, Iuliana

    2015-12-01

    Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.

  9. Personal genomics and individual identities: motivations and moral imperatives of early users

    PubMed Central

    McGowan, Michelle L.; Fishman, Jennifer R.; Lambrix, Marcie A.

    2010-01-01

    Since 2007, consumer genomics companies have marketed personal genome scanning services to assess users’ genetic predispositions to a variety of complex diseases and traits. This study investigates early users’ reasons for utilizing personal genome services, their evaluation of the technology, how they interpret the results, and how they incorporate the results into health-related decision-making. The analysis contextualizes early users’ relationships to the technology, the knowledge generated by it, and how it mediates their relationship to their own health and to biomedicine more broadly. The results reveal that early users approach personal genome scanning with both optimism for genomic research and scepticism about the technology’s current capabilities, which runs contrary to concerns that consumers may be ill equipped to interpret and understand genome scan results. These findings provide important qualitative insight into early users’ conceptualizations of personal genomic risk assessment and illuminate their involvement in configuring this technology in the making. PMID:21076647

  10. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  11. Identification of polymorphic inversions from genotypes

    PubMed Central

    2012-01-01

    Background: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences between linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results: We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability to current genome-wide association studies (GWAS). Conclusions: We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomic inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package inveRsion. PMID:22321652

  12. Screening of duplicated loci reveals hidden divergence patterns in a complex salmonid genome

    USGS Publications Warehouse

    Limborg, Morten T.; Larson, Wesley; Seeb, Lisa W.; Seeb, James E.

    2017-01-01

    A whole-genome duplication (WGD) doubles the entire genomic content of a species and is thought to have catalysed adaptive radiation in some polyploid-origin lineages. However, little is known about general consequences of a WGD because gene duplicates (i.e., paralogs) are commonly filtered in genomic studies; such filtering may remove substantial portions of the genome in data sets from polyploid-origin species. We demonstrate a new method that enables genome-wide scans for signatures of selection at both nonduplicated and duplicated loci by taking locus-specific copy number into account. We apply this method to RAD sequence data from different ecotypes of a polyploid-origin salmonid (Oncorhynchus nerka) and reveal signatures of divergent selection that would have been missed if duplicated loci were filtered. We also find conserved signatures of elevated divergence at pairs of homeologous chromosomes with residual tetrasomic inheritance, suggesting that joint evolution of some nondiverged gene duplicates may affect the adaptive potential of these genes. These findings illustrate that including duplicated loci in genomic analyses enables novel insights into the evolutionary consequences of WGDs and local segmental gene duplications.

  13. High-throughput physical mapping of chromosomes using automated in situ hybridization.

    PubMed

    George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V

    2012-06-28

    Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquito species is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.

  14. SINE_scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets.

    PubMed

    Mao, Hongliang; Wang, Hao

    2017-03-01

    Short Interspersed Nuclear Elements (SINEs) are transposable elements (TEs) that amplify through a copy-and-paste mode via RNA intermediates. The computational identification of new SINEs is challenging because of their weak structural signals and rapid diversification in sequences. Here we report SINE_Scan, a highly efficient program to predict SINE elements in genomic DNA sequences. SINE_Scan integrates hallmarks of SINE transposition, copy number and structural signals to identify a SINE element. SINE_Scan outperforms the previously published de novo SINE discovery program. It shows high sensitivity and specificity in 19 plant and animal genome assemblies, whose sizes vary from 120 Mb to 3.5 Gb. It identifies numerous new families and substantially increases the estimation of the abundance of SINEs in these genomes. The code of SINE_Scan is freely available at http://github.com/maohlzj/SINE_Scan, implemented in PERL and supported on Linux. Contact: wangh8@fudan.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  15. SINE_scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets

    PubMed Central

    Mao, Hongliang

    2017-01-01

    Abstract Motivation: Short Interspersed Nuclear Elements (SINEs) are transposable elements (TEs) that amplify through a copy-and-paste mode via RNA intermediates. The computational identification of new SINEs is challenging because of their weak structural signals and rapid diversification in sequences. Results: Here we report SINE_Scan, a highly efficient program to predict SINE elements in genomic DNA sequences. SINE_Scan integrates hallmarks of SINE transposition, copy number and structural signals to identify a SINE element. SINE_Scan outperforms the previously published de novo SINE discovery program. It shows high sensitivity and specificity in 19 plant and animal genome assemblies, whose sizes vary from 120 Mb to 3.5 Gb. It identifies numerous new families and substantially increases the estimation of the abundance of SINEs in these genomes. Availability and Implementation: The code of SINE_Scan is freely available at http://github.com/maohlzj/SINE_Scan, implemented in PERL and supported on Linux. Contact: wangh8@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28062442

  16. Length polymorphism scanning is an efficient approach for revealing chloroplast DNA variation.

    Treesearch

    Matthew E. Horning; Richard C. Cronn

    2006-01-01

    Phylogeographic and population genetic screens of chloroplast DNA (cpDNA) provide insights into seed-based gene flow in angiosperms, yet studies are frequently hampered by the low mutation rate of this genome. Detection methods for intraspecific variation can be either direct (DNA sequencing) or indirect (PCR-RFLP), although no single method incorporates the best...

  17. A High Resolution Genome-Wide Scan for Significant Selective Sweeps: An Application to Pooled Sequence Data in Laying Chickens

    PubMed Central

    Qanbari, Saber; Strom, Tim M.; Haberer, Georg; Weigend, Steffen; Gheyas, Almas A.; Turner, Frances; Burt, David W.; Preisinger, Rudolf; Gianola, Daniel; Simianer, Henner

    2012-01-01

    In most studies aimed at localizing footprints of past selection, outliers at tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only, and suggest to call this a “creeping window” strategy. The approach confirmed selective sweeps in the region of previously described candidate genes, i.e. TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1 when used as positive controls. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value<0.001), including genes known to be associated with eggshell structure and immune system such as CALB1 and GAL cluster, respectively. A substantial proportion of signals was found in poor gene content regions including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes. PMID:23209582
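    The "creeping window" idea described above (overlapping windows advanced one SNP at a time, with sequence gaps taken into account, and an empirical p-value obtained by permutation) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name its test statistic, so a pooled heterozygosity of the form 2·n_maj·n_min/(n_maj+n_min)² is assumed here; the gap rule (skip windows that span a large gap), the shuffling scheme and all function names are likewise assumptions.

```python
import random


def pooled_heterozygosity(counts):
    """counts: list of (major, minor) allele read counts for the SNPs in a window."""
    n_maj = sum(c[0] for c in counts)
    n_min = sum(c[1] for c in counts)
    total = n_maj + n_min
    return 2.0 * n_maj * n_min / (total * total)


def creeping_windows(positions, counts, size, max_gap):
    """Yield (start, end, Hp) for windows of `size` SNPs, stepping one SNP at a time.

    Windows in which two adjacent SNPs lie more than max_gap bp apart are
    skipped (an assumed way of accounting for gaps in the sequence).
    """
    for i in range(len(positions) - size + 1):
        pos = positions[i:i + size]
        if any(b - a > max_gap for a, b in zip(pos, pos[1:])):
            continue
        yield pos[0], pos[-1], pooled_heterozygosity(counts[i:i + size])


def empirical_p(observed_hp, positions, counts, size, max_gap, n_perm=500, seed=0):
    """Share of permutations whose minimum window Hp is <= observed_hp.

    SNP counts are shuffled over the fixed, observed SNP positions, so the
    real spacing of SNPs is preserved. Assumes at least one valid window.
    """
    rng = random.Random(seed)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_min = min(h for _, _, h in
                       creeping_windows(positions, shuffled, size, max_gap))
        if null_min <= observed_hp:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: six SNPs, with a large gap between the fourth and fifth.
positions = [100, 200, 300, 400, 10_000, 10_100]
counts = [(10, 0), (10, 0), (10, 0), (5, 5), (5, 5), (5, 5)]
windows = list(creeping_windows(positions, counts, size=3, max_gap=1_000))
p_value = empirical_p(windows[0][2], positions, counts, size=3, max_gap=1_000)
```

    Taking the minimum over all windows in each permutation yields a genome-wide rather than per-window p-value, which is one plausible reading of the "genome-wide p-value" threshold mentioned in the abstract.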

  18. Measurement and genetics of human subcortical and hippocampal asymmetries in large datasets.

    PubMed

    Guadalupe, Tulio; Zwiers, Marcel P; Teumer, Alexander; Wittfeld, Katharina; Vasquez, Alejandro Arias; Hoogman, Martine; Hagoort, Peter; Fernandez, Guillen; Buitelaar, Jan; Hegenscheid, Katrin; Völzke, Henry; Franke, Barbara; Fisher, Simon E; Grabe, Hans J; Francks, Clyde

    2014-07-01

    Functional and anatomical asymmetries are prevalent features of the human brain, linked to gender, handedness, and cognition. However, little is known about the neurodevelopmental processes involved. In zebrafish, asymmetries arise in the diencephalon before extending within the central nervous system. We aimed to identify genes involved in the development of subtle, left-right volumetric asymmetries of human subcortical structures using large datasets. We first tested the feasibility of measuring left-right volume differences in such large-scale samples, as assessed by two automated methods of subcortical segmentation (FSL|FIRST and FreeSurfer), using data from 235 subjects who had undergone MRI twice. We tested the agreement between the first and second scan, and the agreement between the segmentation methods, for measures of bilateral volumes of six subcortical structures and the hippocampus, and their volumetric asymmetries. We also tested whether there were biases introduced by left-right differences in the regional atlases used by the methods, by analyzing left-right flipped images. While many bilateral volumes were measured well (scan-rescan r = 0.6-0.8), most asymmetries, with the exception of the caudate nucleus, showed lower repeatabilities. We meta-analyzed genome-wide association scan results for caudate nucleus asymmetry in a combined sample of 3,028 adult subjects but did not detect associations at genome-wide significance (P < 5 × 10^-8). There was no enrichment of genetic association in genes involved in left-right patterning of the viscera. Our results provide important information for researchers who are currently aiming to carry out large-scale genome-wide studies of subcortical and hippocampal volumes, and their asymmetries. Copyright © 2013 Wiley Periodicals, Inc.

  19. Meta-Analysis of Genome-Wide Scans Provides Evidence for Sex- and Site-Specific Regulation of Bone Mass

    PubMed Central

    Sham, Pak C; Zintzaras, Elias; Lewis, Cathryn M; Deng, Hong-Wen; Econs, Michael J; Karasik, David; Devoto, Marcella; Kammerer, Candace M; Spector, Tim; Andrew, Toby; Cupples, L Adrienne; Duncan, Emma L; Foroud, Tatiana; Kiel, Douglas P; Koller, Daniel; Langdahl, Bente; Mitchell, Braxton D; Peacock, Munro; Recker, Robert; Shen, Hui; Sol-Church, Katia; Spotila, Loretta D; Uitterlinden, Andre G; Wilson, Scott G; Kung, Annie WC; Ralston, Stuart H

    2014-01-01

    Several genome-wide scans have been performed to detect loci that regulate BMD, but these have yielded inconsistent results, with limited replication of linkage peaks in different studies. In an effort to improve statistical power for detection of these loci, we performed a meta-analysis of genome-wide scans in which spine or hip BMD were studied. Evidence was gained to suggest that several chromosomal loci regulate BMD in a site-specific and sex-specific manner. Introduction: BMD is a heritable trait and an important predictor of osteoporotic fracture risk. Several genome-wide scans have been performed in an attempt to detect loci that regulate BMD, but there has been limited replication of linkage peaks between studies. In an attempt to resolve these inconsistencies, we conducted a collaborative meta-analysis of genome-wide linkage scans in which femoral neck BMD (FN-BMD) or lumbar spine BMD (LS-BMD) had been studied. Materials and Methods: Data were accumulated from nine genome-wide scans involving 11,842 subjects. Data were analyzed separately for LS-BMD and FN-BMD and by sex. For each study, genomic bins of 30 cM were defined and ranked according to the maximum LOD score they contained. While various densitometers were used in different studies, the ranking approach that we used means that the results are not confounded by the fact that different measurement devices were used. Significance for high average rank and heterogeneity was obtained through Monte Carlo testing. Results: For LS-BMD, the quantitative trait locus (QTL) with greatest significance was on chromosome 1p13.3-q23.3 (p = 0.004), but this exhibited high heterogeneity and the effect was specific for women. Other significant LS-BMD QTLs were on chromosomes 12q24.31-qter, 3p25.3-p22.1, 11p12-q13.3, and 1q32-q42.3, including one on 18p11-q12.3 that had not been detected by individual studies. For FN-BMD, the strongest QTL was on chromosome 9q31.1-q33.3 (p = 0.002). Other significant QTLs were identified on chromosomes 17p12-q21.33, 14q13.1-q24.1, 9q21.32-q31.1, and 5q14.3-q23.2. There was no correlation in average ranks of bins between men and women and the loci that regulated BMD in men and women and at different sites were largely distinct. Conclusions: This large-scale meta-analysis provided evidence for replication of several QTLs identified in previous studies and also identified a QTL on chromosome 18p11-q12.3, which had not been detected by individual studies. However, despite the large sample size, none of the individual loci identified reached genome-wide significance. PMID:17228994
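    The rank-based procedure in the Materials and Methods above (rank each study's genomic bins by their maximum LOD score, average the ranks across studies, and assess high average ranks by Monte Carlo testing) can be sketched roughly as below. This is a simplified, unweighted illustration and not the consortium's code: study weighting and the heterogeneity test are omitted, ties are broken by bin order, and the function names and toy LOD values are hypothetical.

```python
import random


def ranks_within_study(max_lods):
    """Rank bins within one study; the bin with the highest maximum LOD gets the highest rank."""
    order = sorted(range(len(max_lods)), key=lambda i: max_lods[i])
    ranks = [0] * len(max_lods)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def average_ranks(studies):
    """studies: one list of per-bin maximum LOD scores per study (same bins in each)."""
    per_study = [ranks_within_study(s) for s in studies]
    n = len(studies)
    return [sum(col) / n for col in zip(*per_study)]


def monte_carlo_p(studies, n_sim=2000, seed=0):
    """Per-bin p-value: share of simulations whose average rank is >= the observed one.

    Under the null, each study's ranks are a random permutation of 1..n_bins.
    """
    rng = random.Random(seed)
    observed = average_ranks(studies)
    n_bins, n_studies = len(observed), len(studies)
    exceed = [0] * n_bins
    for _ in range(n_sim):
        totals = [0] * n_bins
        for _ in range(n_studies):
            perm = list(range(1, n_bins + 1))
            rng.shuffle(perm)
            for b in range(n_bins):
                totals[b] += perm[b]
        for b in range(n_bins):
            if totals[b] / n_studies >= observed[b]:
                exceed[b] += 1
    return [(e + 1) / (n_sim + 1) for e in exceed]


# Toy input: three studies, four bins (made-up LOD scores).
toy_studies = [
    [0.1, 2.5, 0.3, 1.0],
    [0.2, 3.0, 0.1, 0.5],
    [0.0, 1.8, 0.4, 0.9],
]
avg = average_ranks(toy_studies)
p_values = monte_carlo_p(toy_studies)
```

    Because only within-study ranks enter the statistic, differences in measurement scale between studies (for example, different densitometers) drop out, which is the property the abstract highlights.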

  20. Evaluation of redundancy analysis to identify signatures of local adaptation.

    PubMed

    Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric

    2018-05-26

    Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved.

  1. Recovery of known T-cell epitopes by computational scanning of a viral genome

    NASA Astrophysics Data System (ADS)

    Logean, Antoine; Rognan, Didier

    2002-04-01

    A new computational method (EpiDock) is proposed for predicting peptide binding to class I MHC proteins, from the amino acid sequence of any protein of immunological interest. Starting from the primary structure of the target protein, individual three-dimensional structures of all possible MHC-peptide (8-, 9- and 10-mers) complexes are obtained by homology modelling. A free energy scoring function (Fresno) is then used to predict the absolute binding free energy of all possible peptides to the class I MHC restriction protein. Assuming that immunodominant epitopes are usually found among the top MHC binders, the method can thus be applied to predict the location of immunogenic peptides on the sequence of the protein target. When applied to the prediction of HLA-A*0201-restricted T-cell epitopes from the Hepatitis B virus, EpiDock was able to recover 92% of known high affinity binders and 80% of known epitopes within a filtered subset of all possible nonapeptides corresponding to about one tenth of the full theoretical list. The proposed method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture. As it requires very few starting experimental data, EpiDock can be used: (i) to predict potential T-cell epitopes from viral genomes (ii) to roughly predict still unknown peptide binding motifs for novel class I MHC alleles.
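
    The enumeration step described above (every possible 8-, 9- and 10-mer of the target protein, then a filter that keeps roughly the best tenth) can be sketched as follows. This is a minimal illustration, not EpiDock itself: the homology modelling and the Fresno scoring function are not reproduced, the function names are hypothetical, and the scores passed to `top_fraction` are assumed to come from some external scorer where a lower value means stronger predicted binding.

```python
def enumerate_peptides(protein, lengths=(8, 9, 10)):
    """Yield (start, peptide) for every window of the given lengths."""
    for k in lengths:
        for start in range(len(protein) - k + 1):
            yield start, protein[start:start + k]


def top_fraction(scored, fraction=0.1):
    """Keep the best-scoring fraction of (peptide, score) pairs.

    Assumption: a lower score means stronger predicted binding, as with a
    binding free energy. At least one pair is always kept.
    """
    ranked = sorted(scored, key=lambda item: item[1])
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

    For a protein of length n, the number of candidate k-mers is n - k + 1 for each k, so the candidate list grows linearly with sequence length.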

  2. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    PubMed

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.

  3. The truth about mouse, human, worms and yeast

    PubMed Central

    2004-01-01

    Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takifugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years. PMID:15601543

  4. The truth about mouse, human, worms and yeast.

    PubMed

    Nelson, David R; Nebert, Daniel W

    2004-01-01

    Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takifugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years.

  5. QTL Mapping and CRISPR/Cas9 Editing to Identify a Drug Resistance Gene in Toxoplasma gondii.

    PubMed

    Shen, Bang; Powell, Robin H; Behnke, Michael S

    2017-06-22

    Scientific knowledge is intrinsically linked to available technologies and methods. This article will present two methods that allowed for the identification and verification of a drug resistance gene in the Apicomplexan parasite Toxoplasma gondii: the method of Quantitative Trait Locus (QTL) mapping using a Whole Genome Sequence (WGS)-based genetic map and the method of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9-based gene editing. The approach of QTL mapping allows one to test if there is a correlation between a genomic region(s) and a phenotype. Two datasets are required to run a QTL scan: a genetic map based on the progeny of a recombinant cross and a quantifiable phenotype assessed in each of the progeny of that cross. These datasets are then formatted to be compatible with R/qtl software that generates a QTL scan to identify significant loci correlated with the phenotype. Although this can greatly narrow the search window of possible candidates, QTLs span regions containing a number of genes from which the causal gene needs to be identified. Having WGS of the progeny was critical to identify the causal drug resistance mutation at the gene level. Once identified, the candidate mutation can be verified by genetic manipulation of drug sensitive parasites. The most facile and efficient method to genetically modify T. gondii is the CRISPR/Cas9 system. This system comprises just 2 components, both encoded on a single plasmid: a single guide RNA (gRNA) containing a 20 bp sequence complementary to the genomic target and the Cas9 endonuclease that generates a double-strand DNA break (DSB) at the target, repair of which allows for insertion or deletion of sequences around the break site. This article provides detailed protocols to use CRISPR/Cas9 based genome editing tools to verify the gene responsible for sinefungin resistance and to construct transgenic parasites.

  6. QTL Mapping and CRISPR/Cas9 Editing to Identify a Drug Resistance Gene in Toxoplasma gondii

    PubMed Central

    Shen, Bang; Powell, Robin H.; Behnke, Michael S.

    2017-01-01

    Scientific knowledge is intrinsically linked to available technologies and methods. This article will present two methods that allowed for the identification and verification of a drug resistance gene in the Apicomplexan parasite Toxoplasma gondii: the method of Quantitative Trait Locus (QTL) mapping using a Whole Genome Sequence (WGS)-based genetic map and the method of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9-based gene editing. The approach of QTL mapping allows one to test if there is a correlation between a genomic region(s) and a phenotype. Two datasets are required to run a QTL scan: a genetic map based on the progeny of a recombinant cross and a quantifiable phenotype assessed in each of the progeny of that cross. These datasets are then formatted to be compatible with R/qtl software that generates a QTL scan to identify significant loci correlated with the phenotype. Although this can greatly narrow the search window of possible candidates, QTLs span regions containing a number of genes from which the causal gene needs to be identified. Having WGS of the progeny was critical to identify the causal drug resistance mutation at the gene level. Once identified, the candidate mutation can be verified by genetic manipulation of drug sensitive parasites. The most facile and efficient method to genetically modify T. gondii is the CRISPR/Cas9 system. This system comprises just 2 components, both encoded on a single plasmid: a single guide RNA (gRNA) containing a 20 bp sequence complementary to the genomic target and the Cas9 endonuclease that generates a double-strand DNA break (DSB) at the target, repair of which allows for insertion or deletion of sequences around the break site. This article provides detailed protocols to use CRISPR/Cas9 based genome editing tools to verify the gene responsible for sinefungin resistance and to construct transgenic parasites. PMID:28671645

  7. ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly.

    PubMed

    Yang, Rendong; Nelson, Andrew C; Henzler, Christine; Thyagarajan, Bharat; Silverstein, Kevin A T

    2015-12-07

    Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.

  8. Using Pool-seq to Search for Genomic Regions Affected by Hybrid Inviability in the copepod T. californicus.

    PubMed

    Lima, Thiago G; Willett, Christopher S

    2018-05-11

    The formation of reproductive barriers between allopatric populations involves the accumulation of incompatibilities that lead to intrinsic postzygotic isolation. The evolution of these incompatibilities is usually explained by the Dobzhansky-Muller model, where epistatic interactions that arise within the diverging populations lead to deleterious interactions when they come together in a hybrid genome. These incompatibilities can lead to hybrid inviability, killing individuals with certain genotypic combinations, and causing the population's allele frequency to deviate from Mendelian expectations. Traditionally, hybrid inviability loci have been detected by genotyping individuals at different loci across the genome. However, this method becomes time consuming and expensive as the number of markers or individuals increases. Here, we test if a Pool-seq method can be used to scan the genome of F2 hybrids to detect genomic regions that are affected by hybrid inviability. We survey the genome of hybrids between 2 populations of the copepod Tigriopus californicus, and show that this method has enough power to detect even small changes in allele frequency caused by hybrid inviability. We show that allele frequency estimates in Pool-seq can be affected by the sampling of alleles from the pool of DNA during the library preparation and sequencing steps and that special considerations must be taken when aligning hybrid reads to a reference when the populations/species are divergent.
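
    The idea of flagging sites whose pooled allele frequency departs from the Mendelian expectation can be illustrated with a simple goodness-of-fit test. This is a sketch under stated assumptions, not the authors' pipeline: it assumes an expected frequency of 0.5 for each parental allele in F2 hybrids, treats every read as an independent draw, and uses a hypothetical function name. Because the abstract notes that library preparation and sequencing add sampling noise, a naive test like this one is likely to overstate significance on real Pool-seq data.

```python
import math


def allele_freq_deviation(count_a, count_b, expected=0.5):
    """Chi-square test (1 df) of read counts against an expected frequency.

    Returns (observed frequency of allele A, chi-square statistic, p-value).
    Assumption: reads are independent draws from the pool.
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("no reads at this site")
    exp_a = total * expected
    exp_b = total * (1.0 - expected)
    chi2 = (count_a - exp_a) ** 2 / exp_a + (count_b - exp_b) ** 2 / exp_b
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return count_a / total, chi2, p_value
```

    A genome scan would apply this per site or per window and then correct for the number of tests performed.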

  9. SigHunt: horizontal gene transfer finder optimized for eukaryotic genomes.

    PubMed

    Jaron, Kamil S; Moravec, Jiří C; Martínková, Natália

    2014-04-15

    Genomic islands (GIs) are DNA fragments incorporated into a genome through horizontal gene transfer (also called lateral gene transfer), often with functions novel for a given organism. While methods for their detection are well researched in prokaryotes, the complexity of eukaryotic genomes makes direct utilization of these methods unreliable, and so labour-intensive phylogenetic searches are used instead. We present a surrogate method that investigates nucleotide base composition of the DNA sequence in a eukaryotic genome and identifies putative GIs. We calculate a genomic signature as a vector of tetranucleotide (4-mer) frequencies using a sliding window approach. Extending the neighbourhood of the sliding window, we establish a local kernel density estimate of the 4-mer frequency. We score the number of 4-mer frequencies in the sliding window that deviate from the credibility interval of their local genomic density using a newly developed discrete interval accumulative score (DIAS). To further improve the effectiveness of DIAS, we select informative 4-mers in a range of organisms using the tetranucleotide quality score developed herein. We show that the SigHunt method is computationally efficient and able to detect GIs in eukaryotic genomes that represent non-ameliorated integration. Thus, it is suited to scanning for change in organisms with different DNA composition. Source code and scripts, implemented in C and R and platform-independent, are freely available for download at http://www.iba.muni.cz/index-en.php?pg=research-data-analysis-tools-sighunt. © The Author 2013. Published by Oxford University Press. All rights reserved.
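
    The first step described above, a genomic signature computed as a vector of 4-mer frequencies in a sliding window, can be sketched as follows. This is an illustration only: it is not the SigHunt code (which is in C and R), it does not implement the kernel density estimate or the DIAS score, and the window and step sizes and the function names are assumptions chosen for the example.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so every signature is comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_signature(seq):
    """Return the 256 relative 4-mer frequencies of a DNA sequence.

    4-mers containing characters other than A, C, G, T (such as N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def sliding_signatures(seq, window=5000, step=1000):
    """Yield (start, signature) for each window; sizes are illustrative."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, tetranucleotide_signature(seq[start:start + window])
```

    Windows whose signature differs strongly from that of their neighbourhood are the candidates a composition-based scan would examine further.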

  10. Signatures of selection in the Iberian honey bee (Apis mellifera iberiensis) revealed by a genome scan analysis of single nucleotide polymorphisms.

    PubMed

    Chávez-Galarza, Julio; Henriques, Dora; Johnston, J Spencer; Azevedo, João C; Patton, John C; Muñoz, Irene; De la Rúa, Pilar; Pinto, M Alice

    2013-12-01

    Understanding the genetic mechanisms of adaptive population divergence is one of the most fundamental endeavours in evolutionary biology and is becoming increasingly important as it will allow predictions about how organisms will respond to global environmental crisis. This is particularly important for the honey bee, a species of unquestionable ecological and economical importance that has been exposed to increasing human-mediated selection pressures. Here, we conducted a single nucleotide polymorphism (SNP)-based genome scan in honey bees collected across an environmental gradient in Iberia and used four FST -based outlier tests to identify genomic regions exhibiting signatures of selection. Additionally, we analysed associations between genetic and environmental data for the identification of factors that might be correlated or act as selective pressures. With these approaches, 4.4% (17 of 383) of outlier loci were cross-validated by four FST -based methods, and 8.9% (34 of 383) were cross-validated by at least three methods. Of the 34 outliers, 15 were found to be strongly associated with one or more environmental variables. Further support for selection, provided by functional genomic information, was particularly compelling for SNP outliers mapped to different genes putatively involved in the same function such as vision, xenobiotic detoxification and innate immune response. This study enabled a more rigorous consideration of selection as the underlying cause of diversity patterns in Iberian honey bees, representing an important first step towards the identification of polymorphisms implicated in local adaptation and possibly in response to recent human-mediated environmental changes. © 2013 John Wiley & Sons Ltd.

  11. Position-based scanning for comparative genomics and identification of genetic islands in Haemophilus influenzae type b.

    PubMed

    Bergman, Nicholas H; Akerley, Brian J

    2003-03-01

    Bacteria exhibit extensive genetic heterogeneity within species. In many cases, these differences account for virulence properties unique to specific strains. Several such loci have been discovered in the genome of the type b serotype of Haemophilus influenzae, a human pathogen able to cause meningitis, pneumonia, and septicemia. Here we report application of a PCR-based scanning procedure to compare the genome of a virulent type b (Hib) strain with that of the laboratory-passaged Rd KW20 strain for which a complete genome sequence is available. We have identified seven DNA segments or H. influenzae genetic islands (HiGIs) present in the type b genome and absent from the Rd genome. These segments vary in size and content and show signs of horizontal gene transfer in that their percent G+C content differs from that of the rest of the H. influenzae genome, they contain genes similar to those found on phages or other mobile elements, or they are flanked by DNA repeats. Several of these loci represent potential pathogenicity islands, because they contain genes likely to mediate interactions with the host. These newly identified genetic islands provide areas of investigation into both the evolution and pathogenesis of H. influenzae. In addition, the genome scanning approach developed to identify these islands provides a rapid means to compare the genomes of phenotypically diverse bacterial strains once the genome sequence of one representative strain has been determined.

  12. Identification of endometrial cancer methylation features using combined methylation analysis methods

    PubMed Central

    Trimarchi, Michael P.; Yan, Pearlly; Groden, Joanna; Bundschuh, Ralf; Goodfellow, Paul J.

    2017-01-01

    Background: DNA methylation is a stable epigenetic mark that is frequently altered in tumors. DNA methylation features are attractive biomarkers for disease states given the stability of DNA methylation in living cells and in biologic specimens typically available for analysis. Widespread accumulation of methylation in regulatory elements in some cancers (specifically the CpG island methylator phenotype, CIMP) can play an important role in tumorigenesis. High resolution assessment of CIMP for the entire genome, however, remains cost prohibitive and requires quantities of DNA not available for many tissue samples of interest. Genome-wide scans of methylation have been undertaken for large numbers of tumors, and higher resolution analyses for a limited number of cancer specimens. Methods for analyzing such large datasets and integrating findings from different studies continue to evolve. An approach for comparison of findings from a genome-wide assessment of the methylated component of tumor DNA and more widely applied methylation scans was developed. Methods: Methylomes for 76 primary endometrial cancer and 12 normal endometrial samples were generated using methylated fragment capture and second generation sequencing, MethylCap-seq. Publicly available Infinium HumanMethylation 450 data from The Cancer Genome Atlas (TCGA) were compared to MethylCap-seq data. Results: Analysis of methylation in promoter CpG islands (CGIs) identified a subset of tumors with a methylator phenotype. We used a two-stage approach to develop a 13-region methylation signature associated with a “hypermethylator state.” High level methylation for the 13-region methylation signatures was associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration in the TCGA test set. In addition, the signature devised showed good agreement with previously described methylation clusters devised by TCGA. Conclusion: We identified a methylation signature for a “hypermethylator phenotype” in endometrial cancer and developed methods that may prove useful for identifying extreme methylation phenotypes in other cancers. PMID:28278225

  13. NMR-based investigations into target DNA search processes of proteins.

    PubMed

    Iwahara, Junji; Zandarashvili, Levani; Kemme, Catherine A; Esadze, Alexandre

    2018-05-10

    To perform their function, transcription factors and DNA-repair/modifying enzymes must first locate their targets in the vast presence of nonspecific, but structurally similar sites on genomic DNA. Before reaching their targets, these proteins stochastically scan DNA and dynamically move from one site to another on DNA. Solution NMR spectroscopy provides unique atomic-level insights into the dynamic DNA-scanning processes, which are difficult to gain by any other experimental means. In this review, we provide an introductory overview on the NMR methods for the structural, dynamic, and kinetic investigations of target DNA search by proteins. We also discuss advantages and disadvantages of these NMR methods over other methods such as single-molecule techniques and biochemical approaches. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Detecting the Population Structure and Scanning for Signatures of Selection in Horses (Equus caballus) From Whole-Genome Sequencing Data

    PubMed Central

    Zhang, Cheng; Ni, Pan; Ahmad, Hafiz Ishfaq; Gemingguli, M; Baizilaitibei, A; Gulibaheti, D; Fang, Yaping; Wang, Haiyang; Asif, Akhtar Rasool; Xiao, Changyi; Chen, Jianhai; Ma, Yunlong; Liu, Xiangdong; Du, Xiaoyong; Zhao, Shuhong

    2018-01-01

    Animal domestication gives rise to gradual changes at the genomic level through selection in populations. Selective sweeps have been traced in the genomes of many animal species, including humans, cattle, and dogs. However, little is known regarding positional candidate genes and genomic regions that exhibit signatures of selection in domestic horses. In addition, an understanding of the genetic processes underlying horse domestication, especially the origin of Chinese native populations, is still lacking. In our study, we generated whole genome sequences from 4 Chinese native horses and combined them with 48 publicly available full genome sequences, from which 15 341 213 high-quality unique single-nucleotide polymorphism variants were identified. Kazakh and Lichuan horses are 2 typical Asian native breeds that were formed in Kazakh or Northwest China and South China, respectively. We detected 1390 loss-of-function (LoF) variants in protein-coding genes, and gene ontology (GO) enrichment analysis revealed that some LoF-affected genes were overrepresented in GO terms related to the immune response. Bayesian clustering, distance analysis, and principal component analysis demonstrated that the population structure of these breeds largely reflected weak geographic patterns. Kazakh and Lichuan horses were assigned to the same lineage with other Asian native breeds, in agreement with previous studies on the genetic origin of Chinese domestic horses. We applied the composite likelihood ratio method to scan for genomic regions showing signals of recent selection in the horse genome. A total of 1052 genomic windows of 10 kB, corresponding to 933 distinct core regions, significantly exceeded neutral simulations. The GO enrichment analysis revealed that the genes under selective sweeps were overrepresented with GO terms, including “negative regulation of canonical Wnt signaling pathway,” “muscle contraction,” and “axon guidance.” Frequent exercise training in domestic horses may have resulted in changes in the expression of genes related to metabolism, muscle structure, and the nervous system.

  15. Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce

    PubMed Central

    Namroud, Marie-Claire; Beaulieu, Jean; Juge, Nicolas; Laroche, Jérôme; Bousquet, Jean

    2008-01-01

    Conifers are characterized by a large genome size and a rapid decay of linkage disequilibrium, most often within gene limits. Genome scans based on noncoding markers are less likely to detect molecular adaptation linked to genes in these species. In this study, we assessed the effectiveness of a genome-wide single nucleotide polymorphism (SNP) scan focused on expressed genes in detecting local adaptation in a conifer species. Samples were collected from six natural populations of white spruce (Picea glauca) moderately differentiated for several quantitative characters. A total of 534 SNPs representing 345 expressed genes were analysed. Genes potentially under natural selection were identified by estimating the differentiation in SNP frequencies among populations (FST) and identifying outliers, and by estimating local differentiation using a Bayesian approach. Both average expected heterozygosity and population differentiation estimates (HE = 0.270 and FST = 0.006) were comparable to those obtained with other genetic markers. Of all genes, 5.5% were identified as outliers with FST at the 95% confidence level, while 14% were identified as candidates for local adaptation with the Bayesian method. There was some overlap between the two gene sets. More than half of the candidate genes for local adaptation were specific to the warmest population, about 20% to the most arid population, and 15% to the coldest and most humid higher altitude population. These adaptive trends were consistent with the genes’ putative functions and the divergence in quantitative traits noted among the populations. The results suggest that an approach separating the locus and population effects is useful to identify genes potentially under selection. These candidates are worth exploring in more details at the physiological and ecological levels. PMID:18662225
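
    The per-locus differentiation statistic underlying such outlier scans can be illustrated with a basic heterozygosity-based FST, computed as (HT - HS) / HT from population allele frequencies. This is a simplified sketch with a hypothetical function name, not the estimator used in the study: it assumes biallelic SNPs, weights populations equally, and applies no sample-size correction.

```python
def fst_per_locus(freqs):
    """Heterozygosity-based FST for one biallelic SNP.

    freqs: frequency of one allele in each population (values in [0, 1]).
    Assumptions: equal population weights, no sample-size correction.
    """
    n = len(freqs)
    if n == 0:
        raise ValueError("at least one population is required")
    p_bar = sum(freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # locus is monomorphic across all populations
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / n  # mean within-population
    return (h_total - h_within) / h_total
```

    Outlier detection then compares each locus's value with the distribution expected under neutrality; how that neutral distribution is obtained is the part that differs most between methods.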

  16. Refining genome-wide linkage intervals using a meta-analysis of genome-wide association studies identifies loci influencing personality dimensions

    PubMed Central

    Amin, Najaf; Hottenga, Jouke-Jan; Hansell, Narelle K; Janssens, A Cecile JW; de Moor, Marleen HM; Madden, Pamela AF; Zorkoltseva, Irina V; Penninx, Brenda W; Terracciano, Antonio; Uda, Manuela; Tanaka, Toshiko; Esko, Tonu; Realo, Anu; Ferrucci, Luigi; Luciano, Michelle; Davies, Gail; Metspalu, Andres; Abecasis, Goncalo R; Deary, Ian J; Raikkonen, Katri; Bierut, Laura J; Costa, Paul T; Saviouk, Viatcheslav; Zhu, Gu; Kirichenko, Anatoly V; Isaacs, Aaron; Aulchenko, Yurii S; Willemsen, Gonneke; Heath, Andrew C; Pergadia, Michele L; Medland, Sarah E; Axenovich, Tatiana I; de Geus, Eco; Montgomery, Grant W; Wright, Margaret J; Oostra, Ben A; Martin, Nicholas G; Boomsma, Dorret I; van Duijn, Cornelia M

    2013-01-01

    Personality traits are complex phenotypes related to psychosomatic health. Individually, various gene finding methods have not achieved much success in finding genetic variants associated with personality traits. We performed a meta-analysis of four genome-wide linkage scans (N=6149 subjects) of five basic personality traits assessed with the NEO Five-Factor Inventory. We compared the significant regions from the meta-analysis of linkage scans with the results of a meta-analysis of genome-wide association studies (GWAS) (N∼17 000). We found significant evidence of linkage of neuroticism to chromosome 3p14 (rs1490265, LOD=4.67) and to chromosome 19q13 (rs628604, LOD=3.55); of extraversion to 14q32 (ATGG002, LOD=3.3); and of agreeableness to 3p25 (rs709160, LOD=3.67) and to two adjacent regions on chromosome 15, including 15q13 (rs970408, LOD=4.07) and 15q14 (rs1055356, LOD=3.52) in the individual scans. In the meta-analysis, we found strong evidence of linkage of extraversion to 4q34, 9q34, 10q24 and 11q22, openness to 2p25, 3q26, 9p21, 11q24, 15q26 and 19q13 and agreeableness to 4q34 and 19p13. Significant evidence of association in the GWAS was detected between openness and rs677035 at 11q24 (P-value=2.6 × 10−06, KCNJ1). The findings of our linkage meta-analysis and those of the GWAS suggest that 11q24 is a susceptible locus for openness, with KCNJ1 as the possible candidate gene. PMID:23211697

  17. Genetic dissection of Al tolerance QTLs in the maize genome by high density SNP scan

    USDA-ARS's Scientific Manuscript database

    Aluminum (Al) toxicity is an important limitation to food security in the tropical and subtropical regions. High Al saturation in acid soils limits root development and its ability to uptake water and nutrients. In this study, we present a genome scan for Al tolerance loci with over 50,000 GBS-based...

  18. Analysis tools for the interplay between genome layout and regulation.

    PubMed

    Bouyioukos, Costas; Elati, Mohamed; Képès, François

    2016-06-06

    Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species, has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets which tend to be incomplete, and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying on one hand the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate the capabilities to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.

  19. Discovery of rare, diagnostic AluYb8/9 elements in diverse human populations.

    PubMed

    Feusier, Julie; Witherspoon, David J; Scott Watkins, W; Goubert, Clément; Sasani, Thomas A; Jorde, Lynn B

    2017-01-01

    Polymorphic human Alu elements are excellent tools for assessing population structure, and new retrotransposition events can contribute to disease. Next-generation sequencing has greatly increased the potential to discover Alu elements in human populations, and various sequencing and bioinformatics methods have been designed to tackle the problem of detecting these highly repetitive elements. However, current techniques for Alu discovery may miss rare, polymorphic Alu elements. Combining multiple discovery approaches may provide a better profile of the polymorphic Alu mobilome. Alu Yb8/9 elements have been a focus of our recent studies as they are young subfamilies (~2.3 million years old) that contribute ~30% of recent polymorphic Alu retrotransposition events. Here, we update our ME-Scan methods for detecting Alu elements and apply these methods to discover new insertions in a large set of individuals with diverse ancestral backgrounds. We identified 5,288 putative Alu insertion events, including several hundred novel Alu Yb8/9 elements from 213 individuals from 18 diverse human populations. Hundreds of these loci were specific to continental populations, and 23 non-reference population-specific loci were validated by PCR. We provide high-quality sequence information for 68 rare Alu Yb8/9 elements, of which 11 have hallmarks of an active source element. Our subfamily distribution of rare Alu Yb8/9 elements is consistent with previous datasets, and may be representative of rare loci. We also find that while ME-Scan and low-coverage, whole-genome sequencing (WGS) detect different Alu elements in 41 individuals from the 1000 Genomes Project, the two methods yield similar population structure results. Current in-silico methods for Alu discovery may miss rare, polymorphic Alu elements. Therefore, using multiple techniques can provide a more accurate profile of Alu elements in individuals and populations. We improved our false-negative rate as an indicator of sample quality for future ME-Scan experiments. In conclusion, we demonstrate that ME-Scan is a good supplement for next-generation sequencing methods and is well-suited for population-level analyses.

  20. Scanning the human genome at kilobase resolution.

    PubMed

    Chen, Jun; Kim, Yeong C; Jung, Yong-Chul; Xuan, Zhenyu; Dworkin, Geoff; Zhang, Yanming; Zhang, Michael Q; Wang, San Ming

    2008-05-01

    Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of frequently cutting restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from the two ends of a given DNA fragment to form a ditag to represent the fragment; (3) application of the 454 sequencing system to reach a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study by using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormality including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.
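
    Features (1), (2) and (4) of DGS can be illustrated in silico. The following is a minimal Python sketch, not the authors' pipeline: the recognition site `CATG`, the tag length of 4, the toy sequence and the function names `digest`, `ditag` and `reference_ditags` are all assumptions chosen for illustration, and the cut is placed immediately before each site rather than at an enzyme-specific position.

```python
def digest(sequence, site):
    """Split `sequence` immediately before every occurrence of `site`.

    Simplification (assumption): real enzymes cut at an enzyme-specific
    offset within or near the recognition site.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        if pos > 0:
            cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def ditag(fragment, tag_len):
    """Join the first and last `tag_len` bases of a fragment.

    Returns None when the fragment is too short to yield two
    non-overlapping tags.
    """
    if len(fragment) < 2 * tag_len:
        return None
    return fragment[:tag_len] + fragment[-tag_len:]


def reference_ditags(sequence, site, tag_len):
    """Map each reference ditag to the (start, end) of its fragment."""
    index = {}
    start = 0
    for fragment in digest(sequence, site):
        tag = ditag(fragment, tag_len)
        if tag is not None:
            index.setdefault(tag, []).append((start, start + len(fragment)))
        start += len(fragment)
    return index


toy_genome = "AAACATGTTTTGGGCATGCCCCAAA"  # hypothetical sequence
fragments = digest(toy_genome, "CATG")
index = reference_ditags(toy_genome, "CATG", 4)
```

    An observed ditag can then be looked up in `index` to recover its fragment of origin. In this toy setting, a ditag absent from the index would be a candidate for the kinds of structural change the abstract lists.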

  1. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences.

    PubMed

    Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques

    2008-01-01

    This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
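
    The binomial significance mentioned above can be illustrated on a toy example. This is a minimal Python sketch under stated assumptions, not the RSAT implementation: the sequences are hypothetical, the prior word probability assumes equiprobable and independent nucleotides, overlapping occurrences are counted as if independent, and the names `count_word` and `binomial_upper_tail` are invented for illustration.

```python
from math import exp, lgamma, log


def count_word(sequences, word):
    """Return (occurrences, positions scanned) for `word`, counting overlaps."""
    k = len(word)
    hits = 0
    positions = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            positions += 1
            if seq[i:i + k] == word:
                hits += 1
    return hits, positions


def binomial_upper_tail(hits, positions, prior):
    """P(X >= hits) for X ~ Binomial(positions, prior), with 0 < prior < 1."""
    total = 0.0
    for j in range(hits, positions + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (lgamma(positions + 1) - lgamma(j + 1)
                    - lgamma(positions - j + 1)
                    + j * log(prior) + (positions - j) * log(1.0 - prior))
        total += exp(log_term)
    return min(total, 1.0)


promoters = ["ACGTACGTAC", "TTACGTGG"]  # hypothetical toy sequences
hits, positions = count_word(promoters, "ACGT")
prior = 0.25 ** 4  # assumed equiprobable, independent nucleotides
p_value = binomial_upper_tail(hits, positions, prior)
```

    A small upper-tail probability flags the word as over-represented relative to the assumed background. Summing the lower tail instead (from 0 to `hits`) would give the analogous test for under-representation.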

  2. Genome-wide scan for visceral leishmaniasis in mixed-breed dogs identifies candidate genes involved in T helper cells and macrophage signaling

    USDA-ARS's Scientific Manuscript database

    We conducted a genome-wide scan for visceral leishmaniasis in mixed-breed dogs from a highly endemic area in Brazil using 149,648 single nucleotide polymorphism (SNP) markers genotyped in 20 cases and 28 controls. Using a mixed model approach, we found two candidate loci on canine autosomes 1 and 2....

  3. Non-random mate choice in humans: insights from a genome scan.

    PubMed

    Laurent, R; Toupance, B; Chaix, R

    2012-02-01

    Little is known about the genetic factors influencing mate choice in humans. Still, there is evidence for non-random mate choice with respect to physical traits. In addition, some studies suggest that the Major Histocompatibility Complex may affect pair formation. Nowadays, the availability of high density genomic data sets gives the opportunity to scan the genome for signatures of non-random mate choice without prior assumptions on which genes may be involved, while taking into account socio-demographic factors. Here, we performed a genome scan to detect extreme patterns of similarity or dissimilarity among spouses throughout the genome in three populations of African, European American, and Mexican origins from the HapMap 3 database. Our analyses identified genes and biological functions that may affect pair formation in humans, including genes involved in skin appearance, morphogenesis, immunity and behaviour. We found little overlap between the three populations, suggesting that the biological functions potentially influencing mate choice are population specific, in other words are culturally driven. Moreover, whenever the same functional category of genes showed a significant signal in two populations, different genes were actually involved, which suggests the possibility of evolutionary convergences. © 2011 Blackwell Publishing Ltd.

  4. A genome-wide linkage scan for quantitative trait loci underlying obesity related phenotypes in 434 Caucasian families.

    PubMed

    Zhao, Lan-Juan; Xiao, Peng; Liu, Yong-Jun; Xiong, Dong-Hai; Shen, Hui; Recker, Robert R; Deng, Hong-Wen

    2007-03-01

    To identify quantitative trait loci (QTLs) that contribute to obesity, we performed a large-scale whole genome linkage scan (WGS) involving 4,102 individuals from 434 Caucasian families. The most pronounced linkage evidence was found at the genomic region 20p11-12 for fat mass (LOD = 3.31) and percentage fat mass (PFM) (LOD = 2.92). We also identified several regions showing suggestive linkage signals (threshold LOD = 1.9) for obesity phenotypes, including 5q35, 8q13, 10p12, and 17q11.

  5. Genome-wide Linkage Scan of Antisocial Behavior, Depression and Impulsive Substance Use in the UCSF Family Alcoholism Study

    PubMed Central

    Gizer, Ian R.; Ehlers, Cindy L.; Vieten, Cassandra; Feiler, Heidi S.; Gilder, David A.; Wilhelmsen, Kirk C.

    2012-01-01

    OBJECTIVE: Epidemiological and clinical studies suggest that rates of antisocial behavior, depression, and impulsive substance use are increased among individuals diagnosed with alcohol dependence relative to those who are not. Thus, the present study conducted genome-wide linkage scans of antisocial behavior, depression, and impulsive substance use in the University of California at San Francisco Family Alcoholism Study. METHODS: Antisocial behavior, depressive symptoms, and impulsive substance use were assessed using three scales from the MMPI-2, the Antisocial Practices content scale (ASP), the Depression content scale (DEP), and the revised MacAndrew Alcoholism scale (MAC-R). Linkage analyses were conducted using a variance components approach. RESULTS: Suggestive evidence of linkage to three genomic regions independent of alcohol and cannabis dependence diagnostic status was observed: the ASP scale showed evidence of linkage to chromosome 13 at 11 cM, the MAC-R scale showed evidence of linkage to chromosome 15 at 47 cM, and all 3 scales showed evidence of linkage to chromosome 17 at 57–58 cM. CONCLUSIONS: Each of these regions has shown prior evidence of linkage and association to substance dependence as well as other psychiatric disorders such as mood and anxiety disorders, ADHD, and schizophrenia, thus suggesting potentially broad relations between these regions and psychopathology. PMID:22517380

  6. Parent-Of-Origin Effects in Autism Identified through Genome-Wide Linkage Analysis of 16,000 SNPs

    PubMed Central

    Fradin, Delphine; Cheslack-Postava, Keely; Ladd-Acosta, Christine; Newschaffer, Craig; Chakravarti, Aravinda; Arking, Dan E.; Feinberg, Andrew; Fallin, M. Daniele

    2010-01-01

    Background: Autism is a common heritable neurodevelopmental disorder with complex etiology. Several genome-wide linkage and association scans have been carried out to identify regions harboring genes related to autism or autism spectrum disorders, with mixed results. Given the overlap in autism features with genetic abnormalities known to be associated with imprinting, one possible reason for lack of consistency would be the influence of parent-of-origin effects that may mask the ability to detect linkage and association. Methods and Findings: We have performed a genome-wide linkage scan that accounts for potential parent-of-origin effects using 16,311 SNPs among families from the Autism Genetic Resource Exchange (AGRE) and the National Institute of Mental Health (NIMH) autism repository. We report parametric (GH, Genehunter) and allele-sharing linkage (Aspex) results using a broad spectrum disorder case definition. Paternal-origin genome-wide statistically significant linkage was observed on chromosomes 4 (LOD(GH) = 3.79, empirical p<0.005 and LOD(Aspex) = 2.96, p = 0.008), 15 (LOD(GH) = 3.09, empirical p<0.005 and LOD(Aspex) = 3.62, empirical p = 0.003) and 20 (LOD(GH) = 3.36, empirical p<0.005 and LOD(Aspex) = 3.38, empirical p = 0.006). Conclusions: These regions may harbor imprinted sites associated with the development of autism and offer fruitful domains for molecular investigation into the role of epigenetic mechanisms in autism. PMID:20824079

  7. Recent advances in understanding the role of nutrition in human genome evolution.

    PubMed

    Ye, Kaixiong; Gu, Zhenglong

    2011-11-01

    Dietary transitions in human history have been suggested to play important roles in the evolution of mankind. Genetic variations caused by adaptation to diet during human evolution could have important health consequences in current society. The advance of sequencing technologies and the rapid accumulation of genome information provide an unprecedented opportunity to comprehensively characterize genetic variations in human populations and unravel the genetic basis of human evolution. Series of selection detection methods, based on various theoretical models and exploiting different aspects of selection signatures, have been developed. Their applications at the species and population levels have respectively led to the identification of human specific selection events that distinguish human from nonhuman primates and local adaptation events that contribute to human diversity. Scrutiny of candidate genes has revealed paradigms of adaptations to specific nutritional components and genome-wide selection scans have verified the prevalence of diet-related selection events and provided many more candidates awaiting further investigation. Understanding the role of diet in human evolution is fundamental for the development of evidence-based, genome-informed nutritional practices in the era of personal genomics.

  8. A Genome-Wide Search for Linkage of Estimated Glomerular Filtration Rate (eGFR) in the Family Investigation of Nephropathy and Diabetes (FIND)

    PubMed Central

    Thameem, Farook; Igo, Robert P.; Freedman, Barry I.; Langefeld, Carl; Hanson, Robert L.; Schelling, Jeffrey R.; Elston, Robert C.; Duggirala, Ravindranath; Nicholas, Susanne B.; Goddard, Katrina A. B.; Divers, Jasmin; Guo, Xiuqing; Ipp, Eli; Kimmel, Paul L.; Meoni, Lucy A.; Shah, Vallabh O.; Smith, Michael W.; Winkler, Cheryl A.; Zager, Philip G.; Knowler, William C.; Nelson, Robert G.; Pahl, Madeline V.; Parekh, Rulan S.; Kao, W. H. Linda; Rasooly, Rebekah S.; Adler, Sharon G.; Abboud, Hanna E.; Iyengar, Sudha K.; Sedor, John R.

    2013-01-01

    Objective: Estimated glomerular filtration rate (eGFR), a measure of kidney function, is heritable, suggesting that genes influence renal function. Genes that influence eGFR have been identified through genome-wide association studies. However, family-based linkage approaches may identify loci that explain a larger proportion of the heritability. This study used genome-wide linkage and association scans to identify quantitative trait loci (QTL) that influence eGFR. Methods: Genome-wide linkage and sparse association scans of eGFR were performed in families ascertained by probands with advanced diabetic nephropathy (DN) from the multi-ethnic Family Investigation of Nephropathy and Diabetes (FIND) study. This study included 954 African Americans (AA), 781 American Indians (AI), 614 European Americans (EA) and 1,611 Mexican Americans (MA). A total of 3,960 FIND participants were genotyped for 6,000 single nucleotide polymorphisms (SNPs) using the Illumina Linkage IVb panel. GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula. Results: The non-parametric linkage analysis, accounting for the effects of diabetes duration and BMI, identified the strongest evidence for linkage of eGFR on chromosome 20q11 (log of the odds [LOD] = 3.34; P = 4.4×10⁻⁵) in MA and chromosome 15q12 (LOD = 2.84; P = 1.5×10⁻⁴) in EA. In all subjects, the strongest linkage signal for eGFR was detected on chromosome 10p12 (P = 5.5×10⁻⁴) at 44 cM near marker rs1339048. A subsequent association scan in both ancestry-specific groups and the entire population identified several SNPs significantly associated with eGFR across the genome. Conclusion: The present study describes the localization of QTL influencing eGFR on 20q11 in MA, 15q21 in EA and 10p12 in the combined ethnic groups participating in the FIND study. Identification of causal genes/variants influencing eGFR, within these linkage and association loci, will open new avenues for functional analyses and development of novel diagnostic markers for DN. PMID:24358131
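
    The LOD scores and P values quoted in the Results can be related by a standard conversion. The sketch below is in Python and assumes that 2·ln(10)·LOD follows, under the null, a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. Whether the authors used exactly this conversion is an assumption, and the function name `lod_to_pointwise_p` is invented for illustration.

```python
from math import erfc, log, sqrt


def lod_to_pointwise_p(lod):
    """One-sided pointwise p-value for a non-negative LOD score.

    Assumes 2*ln(10)*LOD is a 50:50 mixture of a point mass at zero and
    a chi-square variable with 1 degree of freedom under the null.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative for this conversion")
    chi_square = 2.0 * log(10.0) * lod
    # Upper tail of chi-square(1 df) at x equals erfc(sqrt(x / 2)).
    return 0.5 * erfc(sqrt(chi_square / 2.0))


p_ma = lod_to_pointwise_p(3.34)  # LOD reported on 20q11
p_ea = lod_to_pointwise_p(2.84)  # LOD reported on 15q12
```

    Under that assumption the two calls give approximately 4.4×10⁻⁵ and 1.5×10⁻⁴, in line with the P values quoted in the abstract.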

  9. A DNA microarray-based methylation-sensitive (MS)-AFLP hybridization method for genetic and epigenetic analyses.

    PubMed

    Yamamoto, F; Yamamoto, M

    2004-07-01

    We previously developed a PCR-based DNA fingerprinting technique named the Methylation Sensitive (MS)-AFLP method, which permits comparative genome-wide scanning of methylation status with a manageable number of fingerprinting experiments. The technique uses the methylation sensitive restriction enzyme NotI in the context of the existing Amplified Fragment Length Polymorphism (AFLP) method. Here we report the successful conversion of this gel electrophoresis-based DNA fingerprinting technique into a DNA microarray hybridization technique (DNA Microarray MS-AFLP). By performing a total of 30 (15 x 2 reciprocal labeling) DNA Microarray MS-AFLP hybridization experiments on genomic DNA from two breast and three prostate cancer cell lines in all pairwise combinations, and Southern hybridization experiments using more than 100 different probes, we have demonstrated that the DNA Microarray MS-AFLP is a reliable method for genetic and epigenetic analyses. No statistically significant differences were observed in the number of differences between the breast-prostate hybridization experiments and the breast-breast or prostate-prostate comparisons.

  10. Quantification of AAV particle titers by infrared fluorescence scanning of coomassie-stained sodium dodecyl sulfate-polyacrylamide gels.

    PubMed

    Kohlbrenner, Erik; Henckaerts, Els; Rapti, Kleopatra; Gordon, Ronald E; Linden, R Michael; Hajjar, Roger J; Weber, Thomas

    2012-06-01

    Adeno-associated virus (AAV)-based vectors have gained increasing attention as gene delivery vehicles in basic and preclinical studies as well as in human gene therapy trials. Especially for the latter two, for both safety and therapeutic efficacy reasons, a detailed characterization of all relevant parameters of the vector preparation is essential. Two important parameters that are routinely used to analyze recombinant AAV vectors are (1) the titer of viral particles containing a (recombinant) viral genome and (2) the purity of the vector preparation, most commonly assessed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) followed by silver staining. An important third parameter, the titer of total viral particles, that is, the combined titer of both genome-containing and empty viral capsids, is rarely determined. Here, we describe a simple and inexpensive method that allows the simultaneous assessment of both vector purity and the determination of the total viral particle titer. This method, which was validated by comparison with established methods to determine viral particle titers, is based on the fact that Coomassie Brilliant Blue, when bound to proteins, fluoresces in the infrared spectrum. Viral samples are separated by SDS-PAGE followed by Coomassie Brilliant Blue staining and gel analysis with an infrared laser-scanning device. In combination with a protein standard, our method allows the rapid and accurate determination of viral particle titers simultaneously with the assessment of vector purity.

  11. Detection of selection signatures in Piemontese and Marchigiana cattle, two breeds with similar production aptitudes but different selection histories.

    PubMed

    Sorbolini, Silvia; Marras, Gabriele; Gaspa, Giustino; Dimauro, Corrado; Cellesi, Massimo; Valentini, Alessio; Macciotta, Nicolò Pp

    2015-06-23

    Domestication and selection are processes that alter the pattern of within- and between-population genetic variability. They can be investigated at the genomic level by tracing the so-called selection signatures. Recently, sequence polymorphisms at the genome-wide level have been investigated in a wide range of animals. A common approach to detect selection signatures is to compare breeds that have been selected for different breeding goals (i.e. dairy and beef cattle). However, genetic variations in different breeds with similar production aptitudes and similar phenotypes can be related to differences in their selection history. In this study, we investigated selection signatures between two Italian beef cattle breeds, Piemontese and Marchigiana, using genotyping data that was obtained with the Illumina BovineSNP50 BeadChip. The comparison was based on the fixation index (Fst), combined with a locally weighted scatterplot smoothing (LOWESS) regression and a control chart approach. In addition, analyses of Fst were carried out to confirm candidate genes. In particular, data were processed using the varLD method, which compares the regional variation of linkage disequilibrium between populations. Genome scans confirmed the presence of selective sweeps in the genomic regions that harbour candidate genes that are known to affect productive traits in cattle such as DGAT1, ABCG2, CAPN3, MSTN and FTO. In addition, several new putative candidate genes (for example ALAS1, ABCB8, ACADS and SOD1) were detected. This study provided evidence on the different selection histories of two cattle breeds and the usefulness of genomic scans to detect selective sweeps even in cattle breeds that are bred for similar production aptitudes.
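
    The fixation index on which the comparison rests can be sketched per SNP. The Python example below uses the textbook form Fst = (Ht − Hs) / Ht with equal population weights. It is an illustration only: the study's actual estimator, the LOWESS smoothing and the control chart step are not reproduced, and the allele frequencies and the function names `fst_two_populations` and `fst_scan` are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst at one biallelic SNP from two allele frequencies.

    Uses Fst = (Ht - Hs) / Ht with equal population weights (assumption).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def fst_scan(freqs_a, freqs_b):
    """Per-SNP Fst along two equally long lists of allele frequencies."""
    return [fst_two_populations(a, b) for a, b in zip(freqs_a, freqs_b)]


# Hypothetical allele frequencies at four SNPs in two breeds.
breed_a = [0.50, 0.80, 1.00, 0.10]
breed_b = [0.50, 0.20, 0.00, 0.10]
scan = fst_scan(breed_a, breed_b)
```

    SNPs with identical frequencies in both breeds score 0, and a SNP fixed for different alleles scores 1. In a genome scan, runs of high values along a chromosome are the kind of signal that smoothing is meant to bring out.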

  12. Polymorphisms of the Tissue Inhibitor of Metalloproteinase 3 Gene Are Associated with Resistance to High-Altitude Pulmonary Edema (HAPE) in a Japanese Population: A Case Control Study Using Polymorphic Microsatellite Markers

    PubMed Central

    Kobayashi, Nobumitsu; Hanaoka, Masayuki; Droma, Yunden; Ito, Michiko; Katsuyama, Yoshihiko; Kubo, Keishi; Ota, Masao

    2013-01-01

    Introduction: High-altitude pulmonary edema (HAPE) is a hypoxia-induced, life-threatening, high permeability type of edema attributable to pulmonary capillary stress failure. Genome-wide association analysis is necessary to better understand how genetics influence the outcome of HAPE. Materials and Methods: DNA samples were collected from 53 subjects susceptible to HAPE (HAPE-s) and 67 elite Alpinists resistant to HAPE (HAPE-r). The genome scan was carried out using 400 polymorphic microsatellite markers throughout the whole genome in all subjects. In addition, six single nucleotide polymorphisms (SNPs) of the gene encoding the tissue inhibitor of metalloproteinase 3 (TIMP3) were genotyped by Taqman® SNP Genotyping Assays. Results: The results were analyzed using case-control comparisons. Whole genome scanning revealed that allele frequencies in nine markers were statistically different between HAPE-s and HAPE-r subjects. The SNP genotyping of the TIMP3 gene revealed that the derived allele C of rs130293 was associated with resistance to HAPE [odds ratio (OR) = 0.21, P = 0.0012] and recessive inheritance of the phenotype of HAPE-s (P = 0.0012). A haplotype CAC carrying allele C of rs130293 was associated with resistance to HAPE. Discussion: This genome-wide association study revealed several novel candidate genes associated with susceptibility or resistance to HAPE in a Japanese population. Among those, the minor allele C of rs130293 (C/T) in the TIMP3 gene was linked to resistance to HAPE, while the ancestral allele T was associated with susceptibility to HAPE. PMID:23991023

  13. Maternal noncoding transcripts antagonize the targeting of DNA elimination by scanRNAs in Paramecium tetraurelia

    PubMed Central

    Lepère, Gersende; Bétermier, Mireille; Meyer, Eric; Duharcourt, Sandra

    2008-01-01

    The germline genome of ciliates is extensively rearranged during the development of a new somatic macronucleus from the germline micronucleus, after sexual events. In Paramecium tetraurelia, single-copy internal eliminated sequences (IESs) are precisely excised from coding sequences and intergenic regions. For a subset of IESs, introduction of the IES sequence into the maternal macronucleus specifically inhibits excision of the homologous IES in the developing zygotic macronucleus, suggesting that epigenetic regulation of excision involves a global comparison of germline and somatic genomes. ScanRNAs (scnRNAs) produced during micronuclear meiosis by a developmentally regulated RNAi pathway have been proposed to mediate this transnuclear cross-talk. In this study, microinjection experiments provide direct evidence that 25-nucleotide (nt) scnRNAs promote IES excision. We further show that noncoding RNAs are produced from the somatic maternal genome, both during vegetative growth and during sexual events. Maternal inhibition of IES excision is abolished when maternal somatic transcripts containing an IES are targeted for degradation by a distinct RNAi pathway involving 23-nt siRNAs. The results strongly support a scnRNA/macronuclear RNA scanning model in which a natural genomic subtraction, occurring during meiosis between deletion-inducing scnRNAs and antagonistic transcripts from the maternal macronucleus, regulates rearrangements of the zygotic genome. PMID:18519642

  14. Limited Evidence for Classic Selective Sweeps in African Populations

    PubMed Central

    Granka, Julie M.; Henn, Brenna M.; Gignoux, Christopher R.; Kidd, Jeffrey M.; Bustamante, Carlos D.; Feldman, Marcus W.

    2012-01-01

    While hundreds of loci have been identified as reflecting strong-positive selection in human populations, connections between candidate loci and specific selective pressures often remain obscure. This study investigates broader patterns of selection in African populations, which are underrepresented despite their potential to offer key insights into human adaptation. We scan for hard selective sweeps using several haplotype and allele-frequency statistics with a data set of nearly 500,000 genome-wide single-nucleotide polymorphisms in 12 highly diverged African populations that span a range of environments and subsistence strategies. We find that positive selection does not appear to be a strong determinant of allele-frequency differentiation among these African populations. Haplotype statistics do identify putatively selected regions that are shared across African populations. However, as assessed by extensive simulations, patterns of haplotype sharing between African populations follow neutral expectations and suggest that tails of the empirical distributions contain false-positive signals. After highlighting several genomic regions where positive selection can be inferred with higher confidence, we use a novel method to identify biological functions enriched among populations’ empirical tail genomic windows, such as immune response in agricultural groups. In general, however, it seems that current methods for selection scans are poorly suited to populations that, like the African populations in this study, are affected by ascertainment bias and have low levels of linkage disequilibrium, possibly old selective sweeps, and potentially reduced phasing accuracy. Additionally, population history can confound the interpretation of selection statistics, suggesting that greater care is needed in attributing broad genetic patterns to human adaptation. PMID:22960214

  15. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267

  16. Genetic characterization of a core collection of flax (Linum usitatissimum L.) suitable for association mapping studies and evidence of divergent selection between fiber and linseed types

    PubMed Central

    2013-01-01

    Background: Flax is valued for its fiber, seed oil and nutraceuticals. Recently, the fiber industry has invested in the development of products made from linseed stems, making it a dual purpose crop. Simultaneous targeting of genomic regions controlling stem fiber and seed quality traits could enable the development of dual purpose cultivars. However, the genetic diversity, population structure and linkage disequilibrium (LD) patterns necessary for association mapping (AM) have not yet been assessed in flax because genomic resources have only recently been developed. We characterized 407 globally distributed flax accessions using 448 microsatellite markers. The data was analyzed to assess the suitability of this core collection for AM. Genomic scans to identify candidate genes selected during the divergent breeding process of fiber flax and linseed were conducted using the whole genome shotgun sequence of flax. Results: Combined genetic structure analysis assigned all accessions to two major groups with six sub-groups. Population differentiation was weak between the major groups (F(ST) = 0.094) and for most of the pairwise comparisons among sub-groups. The molecular coancestry analysis indicated weak relatedness (mean = 0.287) for most individual pairs. Abundant genetic diversity was observed in the total panel (5.32 alleles per locus), and some sub-groups showed a high proportion of private alleles. The average genome-wide LD (r²) was 0.036, with a relatively fast decay of 1.5 cM. Genomic scans between fiber flax and linseed identified candidate genes involved in cell-wall biogenesis/modification, xylem identity and fatty acid biosynthesis congruent with genes previously identified in flax and other plant species. Conclusions: Based on the abundant genetic diversity, weak population structure and relatedness and relatively fast LD decay, we concluded that this core collection is suitable for AM studies targeting multiple agronomic and quality traits aiming at the improvement of flax as a true dual purpose crop. Our genomic scans provide the first insights into candidate regions affected by divergent selection in flax. In combination with AM, genomic scans have the ability to increase the power to detect loci influencing complex traits. PMID:23647851

  17. Genetic characterization of a core collection of flax (Linum usitatissimum L.) suitable for association mapping studies and evidence of divergent selection between fiber and linseed types.

    PubMed

    Soto-Cerda, Braulio J; Diederichsen, Axel; Ragupathy, Raja; Cloutier, Sylvie

    2013-05-06

    Flax is valued for its fiber, seed oil and nutraceuticals. Recently, the fiber industry has invested in the development of products made from linseed stems, making it a dual purpose crop. Simultaneous targeting of genomic regions controlling stem fiber and seed quality traits could enable the development of dual purpose cultivars. However, the genetic diversity, population structure and linkage disequilibrium (LD) patterns necessary for association mapping (AM) have not yet been assessed in flax because genomic resources have only recently been developed. We characterized 407 globally distributed flax accessions using 448 microsatellite markers. The data was analyzed to assess the suitability of this core collection for AM. Genomic scans to identify candidate genes selected during the divergent breeding process of fiber flax and linseed were conducted using the whole genome shotgun sequence of flax. Combined genetic structure analysis assigned all accessions to two major groups with six sub-groups. Population differentiation was weak between the major groups (F(ST) = 0.094) and for most of the pairwise comparisons among sub-groups. The molecular coancestry analysis indicated weak relatedness (mean = 0.287) for most individual pairs. Abundant genetic diversity was observed in the total panel (5.32 alleles per locus), and some sub-groups showed a high proportion of private alleles. The average genome-wide LD (r²) was 0.036, with a relatively fast decay of 1.5 cM. Genomic scans between fiber flax and linseed identified candidate genes involved in cell-wall biogenesis/modification, xylem identity and fatty acid biosynthesis congruent with genes previously identified in flax and other plant species. 
Based on the abundant genetic diversity, weak population structure and relatedness and relatively fast LD decay, we concluded that this core collection is suitable for AM studies targeting multiple agronomic and quality traits aiming at the improvement of flax as a true dual purpose crop. Our genomic scans provide the first insights into candidate regions affected by divergent selection in flax. In combination with AM, genomic scans have the ability to increase the power to detect loci influencing complex traits.
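
    The r² statistic quoted above can be illustrated with a minimal sketch. It assumes phased haplotypes at two biallelic loci coded 0/1; the study itself used multi-allelic microsatellites and the abstract does not state its exact estimator, so the function name and the data layout here are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes given as (a, b) pairs of 0/1 alleles."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                          # undefined when a locus is monomorphic
    return d * d / denom
```

    Values near 0, such as the reported genome-wide average, indicate that alleles at the two loci are close to independent; values near 1 indicate that one locus almost fully predicts the other.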

  18. Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA.

    PubMed

    Kane, Nolan; Sveinsson, Saemundur; Dempewolf, Hannes; Yang, Ji Yong; Zhang, Dapeng; Engels, Johannes M M; Cronk, Quentin

    2012-02-01

    To reliably identify lineages below the species level such as subspecies or varieties, we propose an extension to DNA-barcoding using next-generation sequencing to produce whole organellar genomes and substantial nuclear ribosomal sequence. Because this method uses much longer versions of the traditional DNA-barcoding loci in the plastid and ribosomal DNA, we call our approach ultra-barcoding (UBC). We used high-throughput next-generation sequencing to scan the genome and generate reliable sequence of high copy number regions. Using this method, we examined whole plastid genomes as well as nearly 6000 bases of nuclear ribosomal DNA sequences for nine genotypes of Theobroma cacao and an individual of the related species T. grandiflorum, as well as an additional publicly available whole plastid genome of T. cacao. All individuals of T. cacao examined were uniquely distinguished, and evidence of reticulation and gene flow was observed. Sequence variation was observed in some of the canonical barcoding regions between species, but other regions of the chloroplast were more variable both within species and between species, as were ribosomal spacers. Furthermore, no single region provides the level of data available using the complete plastid genome and rDNA. Our data demonstrate that UBC is a viable, increasingly cost-effective approach for reliably distinguishing varieties and even individual genotypes of T. cacao. This approach shows great promise for applications where very closely related or interbreeding taxa must be distinguished.

  19. Genome-wide Association Study Identifies African-Specific Susceptibility Loci in African Americans with Inflammatory Bowel Disease

    PubMed Central

    Brant, Steven R.; Okou, David T.; Simpson, Claire L.; Cutler, David J.; Haritunians, Talin; Bradfield, Jonathan P.; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W.; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J.; Klapproth, Jan-Micheal A.; Quiros, Antonio J.; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S.; Baldassano, Robert N.; Dudley-Brown, Sharon; Cross, Raymond K.; Dassopoulos, Themistocles; Denson, Lee A.; Dhere, Tanvi A.; Dryden, Gerald W.; Hanson, John S.; Hou, Jason K.; Hussain, Sunny Z.; Hyams, Jeffrey S.; Isaacs, Kim L.; Kader, Howard; Kappelman, Michael D.; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S.; Kuemmerle, John F.; Kwon, John H.; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E.; Newberry, Rodney D.; Osuntokun, Bankole O.; Patel, Ashish S.; Saeed, Shehzad A.; Targan, Stephan R.; Valentine, John F.; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D.; Duerr, Richard H.; Silverberg, Mark S.; Cho, Judy H.; Hakonarson, Hakon; Zwick, Michael E.; McGovern, Dermot P.B.; Kugathasan, Subra

    2016-01-01

    Background & Aims The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn’s disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. Methods We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified [IBD-U]) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P<5.0×10⁻⁸ in meta-analysis with nominal evidence (P<.05) in each scan were considered to have genome-wide significance. Results We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P<1.6×10⁻⁶): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. Conclusions We performed a genome-wide association study of African Americans with IBD and identified loci associated with CD and UC in only this population; we also replicated loci identified in European populations.
The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry. PMID:27693347

  20. Horizon scanning for new genomic tests.

    PubMed

    Gwinn, Marta; Grossniklaus, Daurice A; Yu, Wei; Melillo, Stephanie; Wulf, Anja; Flome, Jennifer; Dotson, W David; Khoury, Muin J

    2011-02-01

    The development of health-related genomic tests is decentralized and dynamic, involving government, academic, and commercial entities. Consequently, it is not easy to determine which tests are in development, currently available, or discontinued. We developed and assessed the usefulness of a systematic approach to identifying new genomic tests on the Internet. We devised targeted queries of Web pages, newspaper articles, and blogs (Google Alerts) to identify new genomic tests. We finalized search and review procedures during a pilot phase that ended in March 2010. Queries continue to run daily and are compiled weekly; selected data are indexed in an online database, the Genomic Applications in Practice and Prevention Finder. After the pilot phase, our scan detected approximately two to three new genomic tests per week. Nearly two thirds of all tests (122/188, 65%) were related to cancer; only 6% were related to hereditary disorders. Although 88 (47%) of the tests, including 2 marketed directly to consumers, were commercially available, only 12 (6%) claimed United States Food and Drug Administration licensure. Systematic surveillance of the Internet provides information about genomic tests that can be used in combination with other resources to evaluate genomic tests. The Genomic Applications in Practice and Prevention Finder makes this information accessible to a wide group of stakeholders.

  1. Parasitism drives host genome evolution: Insights from the Pasteuria ramosa-Daphnia magna system.

    PubMed

    Bourgeois, Yann; Roulin, Anne C; Müller, Kristina; Ebert, Dieter

    2017-04-01

    Because parasitism is thought to play a major role in shaping host genomes, it has been predicted that genomic regions associated with resistance to parasites should stand out in genome scans, revealing signals of selection above the genomic background. To test whether parasitism is indeed such a major factor in host evolution and to better understand host-parasite interaction at the molecular level, we studied genome-wide polymorphisms in 97 genotypes of the planktonic crustacean Daphnia magna originating from three localities across Europe. Daphnia magna is known to coevolve with the bacterial pathogen Pasteuria ramosa for which host genotypes (clonal lines) are either resistant or susceptible. Using association mapping, we identified two genomic regions involved in resistance to P. ramosa, one of which was already known from a previous QTL analysis. We then performed a naïve genome scan to test for signatures of positive selection and found that the two regions identified with the association mapping further stood out as outliers. Several other regions with evidence for selection were also found, but no link between these regions and phenotypic variation could be established. Our results are consistent with the hypothesis that parasitism is driving host genome evolution. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.

  2. Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system.

    PubMed

    Huang, Jie; Li, Yu-Zhi; Du, Lian-Ming; Yang, Bo; Shen, Fu-Jun; Zhang, He-Min; Zhang, Zhi-He; Zhang, Xiu-Yue; Yue, Bi-Song

    2015-02-07

    The giant panda (Ailuropoda melanoleuca) is a critically endangered species endemic to China. Microsatellites are among the most popular molecular markers and have proven effective for estimating population size, testing paternity, and assessing genetic diversity in critically endangered species. The availability of the complete giant panda genome sequence provided the opportunity to carry out genome-wide scans for all types of microsatellite markers, opening the way for the analysis and development of microsatellites in giant panda. By screening the whole genome sequence of giant panda through in silico mining, we identified microsatellites in the genome and analyzed their frequency and distribution in different genomic regions. Based on our search criteria, a repertoire of 855,058 SSRs was detected, with mononucleotides being the most abundant. SSRs were found in all genomic regions and were more abundant in non-coding regions than in coding regions. A total of 160 primer pairs were designed to screen for polymorphic microsatellites using selected tetranucleotide microsatellite sequences. Fifty-one novel polymorphic tetranucleotide microsatellite loci were discovered by genotyping blood DNA from 22 captive giant pandas. Finally, 15 markers that showed good polymorphism, stability, and repeatability in faecal samples were used to establish a novel microsatellite marker system for giant panda. A genotyping database for Chengdu captive giant pandas (n = 57) was set up using this standardized system. In addition, a universal individual identification method was established and genetic diversity was analysed as applications of this marker system. The microsatellite abundance and diversity were characterized in the giant panda genome. A total of 154,677 tetranucleotide microsatellites were identified, and 15 of them were found to be polymorphic and stable loci.
    The individual identification method and the genetic diversity analysis method in this study provide adequate material for future study of giant panda.
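
    In silico microsatellite mining of the kind described above can be sketched with a regular expression that looks for perfect tandem repeats. The abstract does not give the authors' search criteria or software, so the function name, the minimum repeat count, and the restriction to perfect repeats are all assumptions made for illustration.

```python
import re

def find_ssrs(seq, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for perfect tandem repeats of one motif length."""
    # A motif of motif_len bases followed by at least (min_repeats - 1) further copies
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        # Skip motifs that are themselves repeats of a shorter unit (e.g. "AAAA", "ATAT")
        if any(motif == motif[:k] * (motif_len // k)
               for k in range(1, motif_len) if motif_len % k == 0):
            continue
        hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits
```

    Calling `find_ssrs(sequence, 4, 5)` would list tetranucleotide repeats with at least five copies (a hypothetical threshold); a real pipeline would also handle imperfect repeats and report flanking sequence for primer design.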

  3. High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast

    PubMed Central

    Liachko, Ivan; Youngblood, Rachel A.; Keich, Uri; Dunham, Maitreya J.

    2013-01-01

    DNA replication origins are necessary for the duplication of genomes. In addition, plasmid-based expression systems require DNA replication origins to maintain plasmids efficiently. The yeast autonomously replicating sequence (ARS) assay has been a valuable tool in dissecting replication origin structure and function. However, the dearth of information on origins in diverse yeasts limits the availability of efficient replication origin modules to only a handful of species and restricts our understanding of origin function and evolution. To enable rapid study of origins, we have developed a sequencing-based suite of methods for comprehensively mapping and characterizing ARSs within a yeast genome. Our approach finely maps genomic inserts capable of supporting plasmid replication and uses massively parallel deep mutational scanning to define molecular determinants of ARS function with single-nucleotide resolution. In addition to providing unprecedented detail into origin structure, our data have allowed us to design short, synthetic DNA sequences that retain maximal ARS function. These methods can be readily applied to understand and modulate ARS function in diverse systems. PMID:23241746

  4. Family-Based Genome-Wide Association Scan of Attention-Deficit/Hyperactivity Disorder

    ERIC Educational Resources Information Center

    Mick, Eric; Todorov, Alexandre; Smalley, Susan; Hu, Xiaolan; Loo, Sandra; Todd, Richard D.; Biederman, Joseph; Byrne, Deirdre; Dechairo, Bryan; Guiney, Allan; McCracken, James; McGough, James; Nelson, Stanley F.; Reiersen, Angela M.; Wilens, Timothy E.; Wozniak, Janet; Neale, Benjamin M.; Faraone, Stephen V.

    2010-01-01

    Objective: Genes likely play a substantial role in the etiology of attention-deficit/hyperactivity disorder (ADHD). However, the genetic architecture of the disorder is unknown, and prior genome-wide association studies (GWAS) have not identified a genome-wide significant association. We have conducted a third, independent, multisite GWAS of…

  5. A multivariate prediction model for Rho-dependent termination of transcription.

    PubMed

    Nadiras, Cédric; Eveno, Eric; Schwartz, Annie; Figueroa-Bossi, Nara; Boudvillain, Marc

    2018-06-21

    Bacterial transcription termination proceeds via two main mechanisms triggered either by simple, well-conserved (intrinsic) nucleic acid motifs or by the motor protein Rho. Although bacterial genomes can harbor hundreds of termination signals of either type, only intrinsic terminators are reliably predicted. Computational tools to detect the more complex and diversiform Rho-dependent terminators are lacking. To tackle this issue, we devised a prediction method based on Orthogonal Projections to Latent Structures Discriminant Analysis [OPLS-DA] of a large set of in vitro termination data. Using previously uncharacterized genomic sequences for biochemical evaluation and OPLS-DA, we identified new Rho-dependent signals and quantitative sequence descriptors with significant predictive value. Most relevant descriptors specify features of transcript C>G skewness, secondary structure, and richness in regularly-spaced 5'CC/UC dinucleotides that are consistent with known principles for Rho-RNA interaction. Descriptors collectively warrant OPLS-DA predictions of Rho-dependent termination with a ∼85% success rate. Scanning of the Escherichia coli genome with the OPLS-DA model identifies significantly more termination-competent regions than anticipated from transcriptomics and predicts that regions intrinsically refractory to Rho are primarily located in open reading frames. Altogether, this work delineates features important for Rho activity and describes the first method able to predict Rho-dependent terminators in bacterial genomes.

  6. In Silico Genome Mismatch Scanning to Map Breast Cancer Genes in Extended Pedigrees

    DTIC Science & Technology

    2009-07-01

    contiguous window of loci. The current graphical model is applied to both the paternal and maternal haplotypes for each observed individual. The values at... paternal and maternal haplotypes of each individual in the sample in parallel. These affect the observed genotypes shown as white squares. The states... individuals, but also, by listing the parents as zero, allows for samples of unrelated individuals as required by these methods. IntervalLD will treat

  7. Interaction Analysis through Proteomic Phage Display

    PubMed Central

    2014-01-01

    Phage display is a powerful technique for profiling specificities of peptide binding domains. The method is suited for the identification of high-affinity ligands with inhibitor potential when using highly diverse combinatorial peptide phage libraries. Such experiments further provide consensus motifs for genome-wide scanning of ligands of potential biological relevance. A complementary but considerably less explored approach is to display expression products of genomic DNA, cDNA, open reading frames (ORFs), or oligonucleotide libraries designed to encode defined regions of a target proteome on phage particles. One of the main applications of such proteomic libraries has been the elucidation of antibody epitopes. This review is focused on the use of proteomic phage display to uncover protein-protein interactions of potential relevance for cellular function. The method is particularly suited for the discovery of interactions between peptide binding domains and their targets. We discuss the largely unexplored potential of this method in the discovery of domain-motif interactions of potential biological relevance. PMID:25295249

  8. Investigation of inversion polymorphisms in the human genome using principal components analysis.

    PubMed

    Ma, Jianzhong; Amos, Christopher I

    2012-01-01

    Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct "populations" of inversion homozygotes of different orientations and their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA-approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility of diseases.
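
    The substructure argument above can be illustrated with a small simulation, offered as a sketch rather than the authors' implementation. It assumes independent SNPs, hypothetical allele frequencies that differ between the two orientations, and NumPy's SVD as the PCA engine; all names and parameter values are illustrative.

```python
import numpy as np

def local_pca_pc1(genotypes):
    """First principal component scores for an individuals x SNPs matrix of 0/1/2 genotype counts."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def simulate_inversion_region(n_per_class=30, n_snps=200, seed=0):
    """Simulate unphased genotypes inside an inversion; labels count inverted copies (0, 1, 2)."""
    rng = np.random.default_rng(seed)
    # Hypothetical allele frequencies on each orientation; suppressed recombination lets them diverge
    freq_std = rng.uniform(0.05, 0.95, n_snps)
    freq_inv = rng.uniform(0.05, 0.95, n_snps)

    def draw(freq):
        return (rng.random(n_snps) < freq).astype(int)

    labels = np.repeat([0, 1, 2], n_per_class)
    rows = [draw(freq_inv if k >= 1 else freq_std) + draw(freq_inv if k == 2 else freq_std)
            for k in labels]
    return np.array(rows), labels
```

    Plotting the scores from `local_pca_pc1` for the simulated matrix should show three clusters along the first component, with heterozygotes lying between the two homozygote groups, which is the pattern the method exploits to genotype inversions from unphased data.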

  9. Genome-wide scans for loci under selection in humans

    PubMed Central

    2005-01-01

    Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (i.e., local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection. PMID:16004726

  10. How are information seeking, scanning, and processing related to beliefs about the roles of genetics and behavior in cancer causation?

    PubMed Central

    Waters, Erika A.; Wheeler, Courtney; Hamilton, Jada G.

    2016-01-01

    Background Understanding that cancer is caused by both genetic and behavioral risk factors is an important component of genomic literacy. However, a considerable percentage of people in the U.S. do not endorse such multifactorial beliefs. Methods Using nationally representative cross-sectional data from the U.S. Health Information National Trends Survey (N=2,529), we examined how information seeking, information scanning, and key information processing characteristics were associated with endorsing a multifactorial model of cancer causation. Results Multifactorial beliefs about cancer were more common among respondents who engaged in cancer information scanning (p=.001), were motivated to process health information (p=.005), and who reported a family history of cancer (p=.0002). Respondents who reported having previous negative information seeking experiences had lower odds of endorsing multifactorial beliefs (p=.01). Multifactorial beliefs were not associated with cancer information seeking, trusting cancer information obtained from the Internet, trusting cancer information from a physician, self-efficacy for obtaining cancer information, numeracy, or being aware of direct-to-consumer genetic testing (ps>.05). Conclusion Gaining additional understanding of how people access, process, and use health information will be critical for the continued development and dissemination of effective health communication interventions and for the further translation of genomics research to public health and clinical practice. PMID:27661291

  11. Genomic scan as a tool for assessing the genetic component of phenotypic variance in wild populations.

    PubMed

    Herrera, Carlos M

    2012-01-01

    Methods for estimating quantitative trait heritability in wild populations have been developed in recent years which take advantage of the increased availability of genetic markers to reconstruct pedigrees or estimate relatedness between individuals, but their application to real-world data is not exempt from difficulties. This chapter describes a recent marker-based technique which, by adopting a genomic scan approach and focusing on the relationship between phenotypes and genotypes at the individual level, avoids the problems inherent to marker-based estimators of relatedness. This method allows the quantification of the genetic component of phenotypic variance ("degree of genetic determination" or "heritability in the broad sense") in wild populations and is applicable whenever phenotypic trait values and multilocus data for a large number of genetic markers (e.g., amplified fragment length polymorphisms, AFLPs) are simultaneously available for a sample of individuals from the same population. The method proceeds by first identifying those markers whose variation across individuals is significantly correlated with individual phenotypic differences ("adaptive loci"). The proportion of phenotypic variance in the sample that is statistically accounted for by individual differences in adaptive loci is then estimated by fitting a linear model to the data, with trait value as the dependent variable and scores of adaptive loci as independent ones. The method can be easily extended to accommodate quantitative or qualitative information on biologically relevant features of the environment experienced by each sampled individual, in which case estimates of the environmental and genotype × environment components of phenotypic variance can also be obtained.
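
    The two steps described above (select markers correlated with the trait, then fit a linear model on them) can be sketched as follows. This is an illustration under stated assumptions, not the chapter's procedure: a fixed correlation cutoff `r_threshold` stands in for the significance test, markers are scored 0/1, and the function name is hypothetical.

```python
import numpy as np

def genomic_scan_r2(markers, trait, r_threshold=0.3):
    """markers: individuals x loci matrix of 0/1 scores; trait: per-individual phenotype values."""
    m = markers - markers.mean(axis=0)
    t = trait - trait.mean()
    denom = np.sqrt((m ** 2).sum(axis=0) * (t ** 2).sum())
    # Pearson correlation of each marker with the trait (0 for monomorphic markers)
    r = np.divide(m.T @ t, denom, out=np.zeros(markers.shape[1]), where=denom > 0)
    selected = np.flatnonzero(np.abs(r) >= r_threshold)   # stand-in for "adaptive loci"
    if selected.size == 0:
        return selected, 0.0
    # Ordinary least squares: trait ~ intercept + scores of the selected loci
    X = np.column_stack([np.ones(len(trait)), markers[:, selected]])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ coef
    r2 = 1.0 - (resid ** 2).sum() / (t ** 2).sum()
    return selected, r2
```

    The returned R² is the proportion of phenotypic variance statistically accounted for by the selected loci. Because selection and fitting use the same sample in this sketch, the estimate is likely to be optimistic when many markers are screened.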

  12. Distinguishing noise from signal in patterns of genomic divergence in a highly polymorphic avian radiation.

    PubMed

    Campagna, Leonardo; Gronau, Ilan; Silveira, Luís Fábio; Siepel, Adam; Lovette, Irby J

    2015-08-01

    Recently diverged taxa provide the opportunity to search for the genetic basis of the phenotypes that distinguish them. Genomic scans aim to identify loci that are diverged with respect to an otherwise weakly differentiated genetic background. These loci are candidates for being past targets of selection because they behave differently from the rest of the genome that has either not yet differentiated or that may cross species barriers through introgressive hybridization. Here we use a reduced-representation genomic approach to explore divergence among six species of southern capuchino seedeaters, a group of recently radiated sympatric passerine birds in the genus Sporophila. For the first time in these taxa, we discovered a small proportion of markers that appeared differentiated among species. However, when assessing the significance of these signatures of divergence, we found that similar patterns can also be recovered from random grouping of individuals representing different species. A detailed demographic inference indicates that genetic differences among Sporophila species could be the consequence of neutral processes, which include a very large ancestral effective population size that accentuates the effects of incomplete lineage sorting. As these neutral phenomena can generate genomic scan patterns that mimic those of markers involved in speciation and phenotypic differentiation, they highlight the need for caution when ascertaining and interpreting differentiated markers between species, especially when large numbers of markers are surveyed. Our study provides new insights into the demography of the southern capuchino radiation and proposes controls to distinguish signal from noise in similar genomic scans. © 2015 John Wiley & Sons Ltd.

  13. Genomic estimation of additive and dominance effects and impact of accounting for dominance on accuracy of genomic evaluation in sheep populations.

    PubMed

    Moghaddar, N; van der Werf, J H J

    2017-12-01

    The objectives of this study were to estimate the additive and dominance variance component of several weight and ultrasound scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes and then to investigate the effect of fitting additive and dominance effects on accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on "average information restricted maximum likelihood" using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitting model in the combined cross-bred population. This study showed a substantial dominance genetic variance for weight and ultrasound scanned body composition traits particularly in cross-bred population; however, improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.

  14. Genome scans on experimentally evolved populations reveal candidate regions for adaptation to plant resistance in the potato cyst nematode Globodera pallida.

    PubMed

    Eoche-Bosy, D; Gautier, M; Esquibet, M; Legeai, F; Bretaudeau, A; Bouchez, O; Fournet, S; Grenier, E; Montarry, J

    2017-09-01

    Improving resistance durability requires being able to predict the adaptation speed of pathogen populations. Identifying the genetic bases of pathogen adaptation to plant resistances is a useful step to better understand and anticipate this phenomenon. Globodera pallida is a major pest of potato crops for which a resistance QTL, GpaV vrn, has been identified in Solanum vernei. However, its durability is threatened as G. pallida populations are able to adapt to the resistance in a few generations. The aim of this study was to investigate the genomic regions involved in the resistance breakdown by coupling experimental evolution and a high-density genome scan. We performed whole-genome resequencing of pools of individuals (Pool-Seq) belonging to G. pallida lineages derived from two independent populations having experimentally evolved on susceptible and resistant potato cultivars. About 1.6 million SNPs were used to perform the genome scan using a recent model testing for adaptive differentiation and association to population-specific covariables. We identified 275 outliers, and 31 of them, which also showed a significant reduction in diversity in adapted lineages, were investigated for their genic environment. Some candidate genomic regions contained genes putatively encoding effectors and were enriched in SPRYSECs, known in cyst nematodes to be involved in pathogenicity and in (a)virulence. Validated candidate SNPs will provide a useful molecular tool to follow frequencies of virulence alleles in natural G. pallida populations and define efficient strategies for using potato resistances that maximize their durability. © 2017 John Wiley & Sons Ltd.

  15. In silico prediction of splice-altering single nucleotide variants in the human genome.

    PubMed

    Jian, Xueqiu; Boerwinkle, Eric; Liu, Xiaoming

    2014-12-16

    In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
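
    A position weight matrix (PWM) score, and the change in that score caused by a variant, can be sketched as below. The aligned sites, the pseudocount, and the uniform background are toy assumptions for illustration; they are not the matrix or parameters evaluated in the study.

```python
import math

def build_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM (one dict per position) from equal-length aligned sequences."""
    pwm = []
    for i in range(len(aligned_sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in aligned_sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2(counts[base] / total / background) for base in "ACGT"})
    return pwm

def pwm_score(pwm, seq):
    """Sum of per-position log-odds for a sequence of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))

def variant_delta(pwm, ref_seq, pos, alt):
    """Score change when the base at 0-based position pos is replaced by alt."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return pwm_score(pwm, alt_seq) - pwm_score(pwm, ref_seq)
```

    A strongly negative `variant_delta` at a conserved position suggests that the variant weakens the splicing signal, which is the kind of directly interpretable change the abstract notes many tools do not report.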

  16. Splice-site mutations identified in PDE6A responsible for retinitis pigmentosa in consanguineous Pakistani families

    PubMed Central

    Khan, Shahid Y.; Ali, Shahbaz; Naeem, Muhammad Asif; Khan, Shaheen N.; Husnain, Tayyab; Butt, Nadeem H.; Qazi, Zaheeruddin A.; Akram, Javed; Riazuddin, Sheikh; Ayyagari, Radha; Hejtmancik, J. Fielding

    2015-01-01

    Purpose This study was conducted to localize and identify causal mutations associated with autosomal recessive retinitis pigmentosa (RP) in consanguineous familial cases of Pakistani origin. Methods Ophthalmic examinations that included funduscopy and electroretinography (ERG) were performed to confirm the affectation status. Blood samples were collected from all participating individuals, and genomic DNA was extracted. A genome-wide scan was performed, and two-point logarithm of odds (LOD) scores were calculated. Sanger sequencing was performed to identify the causative variants. Subsequently, we performed whole exome sequencing to rule out the possibility of a second causal variant within the linkage interval. Sequence conservation was performed with alignment analyses of PDE6A orthologs, and in silico splicing analysis was completed with Human Splicing Finder version 2.4.1. Results A large multigenerational consanguineous family diagnosed with early-onset RP was ascertained. An ophthalmic clinical examination consisting of fundus photography and electroretinography confirmed the diagnosis of RP. A genome-wide scan was performed, and suggestive two-point LOD scores were observed with markers on chromosome 5q. Haplotype analyses identified the region; however, the region did not segregate with the disease phenotype in the family. Subsequently, we performed a second genome-wide scan that excluded the entire genome except the chromosome 5q region harboring PDE6A. Next-generation whole exome sequencing identified a splice acceptor site mutation in intron 16: c.2028–1G>A, which was completely conserved in PDE6A orthologs and was absent in ethnically matched 350 control chromosomes, the 1000 Genomes database, and the NHLBI Exome Sequencing Project. Subsequently, we investigated our entire cohort of RP familial cases and identified a second family who harbored a splice acceptor site mutation in intron 10: c.1408–2A>G. 
In silico analysis suggested that these mutations will result in the elimination of wild-type splice acceptor sites that would result in either skipping of the respective exon or the creation of a new cryptic splice acceptor site; both possibilities would result in retinal photoreceptor cells that lack PDE6A wild-type protein. Conclusions We report two splice acceptor site variations in PDE6A in consanguineous Pakistani families who manifested cardinal symptoms of RP. Taken together with our previously published work, our data suggest that mutations in PDE6A account for about 2% of the total genetic load of RP in our cohort and possibly in the Pakistani population as well. PMID:26321862
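
    The two-point LOD scores mentioned in this abstract can be illustrated with the textbook simplification for phase-known, fully informative meioses. This is a sketch under that assumption only; the abstract does not give the likelihood model, and real pedigree analyses rely on dedicated linkage software. The function name and the counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD = log10 of L(theta) / L(0.5) for phase-known meioses.

    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants)
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # any recombinant excludes theta = 0
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical family: 10 informative meioses, no recombinants, evaluated at theta = 0.
lod_no_recombinants = two_point_lod(0, 10, 0.0)  # equals 10 * log10(2), about 3.01
```

    Under this simplification each non-recombinant meiosis adds log10(2), about 0.3, to the LOD score at theta = 0, which is why large consanguineous families are informative for linkage.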

  17. Improving tRNAscan-SE Annotation Results via Ensemble Classifiers.

    PubMed

    Zou, Quan; Guo, Jiasheng; Ju, Ying; Wu, Meihong; Zeng, Xiangxiang; Hong, Zhiling

    2015-11-01

    tRNAscan-SE is a tRNA detection program that is widely used for tRNA annotation; however, the false positive rate of tRNAscan-SE is unacceptable for large sequences. Here, we used a machine learning method to try to improve the tRNAscan-SE results. A new predictor, tRNA-Predict, was designed. We obtained real and pseudo-tRNA sequences as training data sets using tRNAscan-SE and constructed three different tRNA feature sets. We then set up an ensemble classifier, LibMutil, to predict tRNAs from the training data. The positive data set of 623 tRNA sequences was obtained from tRNAdb 2009 and the negative data set was the false positive tRNAs predicted by tRNAscan-SE. Our in silico experiments revealed a prediction accuracy rate of 95.1% for tRNA-Predict using 10-fold cross-validation. tRNA-Predict was developed to distinguish functional tRNAs from pseudo-tRNAs rather than to predict tRNAs from a genome-wide scan. However, tRNA-Predict can work with the output of tRNAscan-SE, which is a genome-wide scanning method, to improve the tRNAscan-SE annotation results. The tRNA-Predict web server is accessible at http://datamining.xmu.edu.cn/∼gjs/tRNA-Predict. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Genomic signatures of positive selection in humans and the limits of outlier approaches.

    PubMed

    Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M

    2006-08-01

    Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
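
    The simple outlier approach described in this abstract can be sketched as ranking loci by a summary statistic and flagging the extreme tail of the empirical genome-wide distribution. The statistic, the 1% cutoff, and the gene names below are hypothetical; the abstract does not state the thresholds the authors used.

```python
def empirical_outliers(stat_by_locus, tail_fraction=0.01):
    """Return the loci in the top tail of the empirical distribution.

    stat_by_locus: dict mapping a locus name to a statistic where larger
    values are more extreme (for example a differentiation measure).
    """
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must lie in (0, 1]")
    ranked = sorted(stat_by_locus.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, int(len(ranked) * tail_fraction))
    return [locus for locus, _ in ranked[:n_keep]]


# Hypothetical data: 200 background genes plus one strongly differentiated gene.
toy_stats = {"gene%d" % i: i / 1000.0 for i in range(200)}
toy_stats["geneX"] = 0.9
toy_outliers = empirical_outliers(toy_stats, tail_fraction=0.01)
```

    A fixed fraction of loci is flagged by construction, whether or not any locus is under selection, which is one reason the power and false-positive behaviour of such scans need separate evaluation.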

  19. In silico genomic analyses reveal three distinct lineages of Escherichia coli O157:H7, one of which is associated with hyper-virulence.

    PubMed

    Laing, Chad R; Buchanan, Cody; Taboada, Eduardo N; Zhang, Yongxiang; Karmali, Mohamed A; Thomas, James E; Gannon, Victor Pj

    2009-06-29

    Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. 
Future genotyping methods should provide data that can be stored centrally and accessed locally in an easily transferable, informative and extensible format based on comparative genomic analyses.

  20. pKWmEB: integration of Kruskal-Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study.

    PubMed

    Ren, Wen-Long; Wen, Yang-Jun; Dunwell, Jim M; Zhang, Yuan-Ming

    2018-03-01

    Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of the polygenic matrix K and environmental noise. Using the transformed model, the Kruskal-Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into a multi-locus model, their effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled the false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of the Kruskal-Wallis test and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
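
    As a rough illustration of the single-marker screening step only, the sketch below computes the Kruskal-Wallis H statistic for phenotype values grouped by genotype class at one marker. It does not implement the model transformation, least angle regression, empirical Bayes estimation, or likelihood ratio test described in the abstract; the function name and the phenotype values are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.

    groups: list of lists, one list of phenotype values per genotype class.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their mean rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        i = j
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are identical; H is undefined")
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_sum_term - 3.0 * (n + 1)
    return h / (1.0 - tie_term / float(n ** 3 - n))


# Hypothetical phenotypes for three genotype classes at one marker.
toy_h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # rank sums 6, 15, 24
```

    Under the null hypothesis H is approximately chi-square distributed with (number of groups minus 1) degrees of freedom, so markers with large H would pass to the later multi-locus stage.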

  1. Genome scan for nonadditive heterotic trait loci reveals mainly underdominant effects in Saccharomyces cerevisiae.

    PubMed

    Laiba, Efrat; Glikaite, Ilana; Levy, Yael; Pasternak, Zohar; Fridman, Eyal

    2016-04-01

    The overdominant model of heterosis explains the superior phenotype of hybrids by synergistic allelic interaction within heterozygous loci. To map such genetic variation in yeast, we used a population doubling time dataset of a Saccharomyces cerevisiae 16 × 16 diallel and searched for major contributing heterotic trait loci (HTL). Heterosis was observed for the majority of hybrids, as they surpassed their best parent growth rate. However, most of the local heterozygous loci identified by genome scan were surprisingly underdominant, i.e., they reduced growth. We speculated that in these loci adverse effects on growth resulted from incompatible allelic interactions. To test this assumption, we eliminated these allelic interactions by creating hybrids with local hemizygosity for the underdominant HTLs, as well as for control random loci. Growth was indeed elevated for most hybrids hemizygous for HTL genes but not for those hemizygous for control genes, hence validating the results of our genome scan. Assessing the consequences of local heterozygosity by reciprocal hemizygosity and allele replacement assays revealed the influence of genetic background on the underdominant effects of HTLs. Overall, this genome-wide study on a multi-parental hybrid population provides a strong argument against single gene overdominance as a major contributor to heterosis, and favors the dominance complementation model.

  2. A genome-wide association scan for acute insulin response to glucose in Hispanic Americans: The IRAS Family Study

    PubMed Central

    Rich, S. S.; Goodarzi, M. O.; Palmer, N. D.; Langefeld, C. D.; Ziegler, J.; Haffner, S. M.; Bryer-Ash, M.; Norris, J. M.; Taylor, K. D.; Haritunians, T.; Rotter, J. I.; Chen, Y-D. I.; Wagenknecht, L. E.; Bowden, D. W.; Bergman, R. N.

    2009-01-01

    Aims/Hypothesis The goal of this study was to identify genes and regions in the human genome that are associated with the acute insulin response to glucose (AIRg), an important predictor of type 2 diabetes, in Hispanic-American participants from the Insulin Resistance Atherosclerosis Family Study (IRAS FS). Methods A two-stage genome-wide association scan (GWAS) was performed in IRAS FS Hispanic-American samples. In the first stage, 318K single nucleotide polymorphisms (SNPs) were assessed in 229 Hispanic-American DNA samples (from 34 families) from San Antonio, TX. SNPs with the most significant associations with AIRg were genotyped in the entire set of IRAS FS Hispanic-American samples (n = 1190). In chromosomal regions with evidence of association, additional SNPs were genotyped to capture variation in genes. Results No individual SNP achieved genome-wide levels of significance (P < 5 × 10⁻⁷); however, two regions (chromosomes 6p21 and 20p11) had multiple highly-ranked SNPs that were associated with AIRg. Additional genotyping in these regions supported the initial evidence for variants contributing to variation in AIRg. One region resides in a gene desert between PXT1 and KCTD20 on 6p21 while the region on 20p11 has several viable candidate genes (ENTPD6, PYGB, GINS1 and R4-691N24.1). Conclusions/Interpretation A GWAS in Hispanic-American samples identified several candidate genes and loci that may be associated with AIRg. These associations explain a small component of variation in AIRg. The genes identified are involved in phosphorylation and ion transport and provide preliminary evidence that these processes have importance in beta cell response. PMID:19430760

  3. Detecting genotypic changes associated with selective mortality at sea in Atlantic salmon: polygenic multilocus analysis surpasses genome scan.

    PubMed

    Bourret, Vincent; Dionne, Mélanie; Bernatchez, Louis

    2014-09-01

    Wild populations of Atlantic salmon have declined worldwide. While the causes for this decline may be complex and numerous, increased mortality at sea is predicted to be one of the major contributing factors. Examining the potential changes occurring in the genome-wide composition of populations during this migration has the potential to tease apart some of the factors influencing marine mortality. Here, we genotyped 5568 SNPs in Atlantic salmon populations representing two distinct regional genetic groups and across two cohorts to test for differential allelic and genotypic frequencies between juveniles (smolts) migrating to sea and adults (grilses) returning to freshwater after 1 year at sea. Given the complexity of the traits potentially associated with sea mortality, we contrasted the outcomes of a single-locus F(ST) based genome scan method with a new multilocus framework to test for genetically based differential mortality at sea. While numerous outliers were identified by the single-locus analysis, no evidence for parallel, temporally repeated selection was found. In contrast, the multilocus approach detected repeated patterns of selection for a multilocus group of 34 covarying SNPs in one of the two populations. No significant pattern of selective mortality was detected in the other population, suggesting different causes of mortality among populations. These results first support the hypothesis that selection mainly causes small changes in allele frequencies among many covarying loci rather than a small number of changes in loci with large effects. They also point out that moving away from a strict 'selective sweep paradigm' towards a multilocus genetics framework may be a more useful approach for studying the genomic signatures of natural selection on complex traits in wild populations. © 2014 John Wiley & Sons Ltd.
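
    The single-locus F(ST) statistic that underlies this kind of genome scan can be sketched with the basic heterozygosity-based definition, F(ST) = (H_T - H_S) / H_T, for one biallelic SNP and two equally weighted groups (for instance smolts and returning adults). This is an assumption made for illustration; the abstract does not say which estimator was used, and sample-size-corrected estimators differ. The function name and allele frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Heterozygosity-based F_ST for one biallelic SNP in two groups.

    p1, p2: frequency of the same allele in group 1 and group 2.
    Groups are weighted equally; no sample-size correction is applied.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic SNP carries no differentiation signal
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies before and after one year at sea.
toy_fst = fst_biallelic(0.2, 0.8)  # H_T = 0.5, H_S = 0.32, so F_ST = 0.36
```

    A single-locus scan repeats this per SNP and looks for extreme values, whereas the multilocus framework contrasted in the abstract looks for coordinated small shifts across covarying SNPs.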

  4. Vitamin D receptor variants in 192 patients with schizophrenia and other psychiatric diseases.

    PubMed

    Yan, Jin; Feng, Jinong; Craddock, Nick; Jones, Ian R; Cook, Edwin H; Goldman, David; Heston, Leonard L; Chen, Jiesheng; Burkhart, Patricia; Li, Wenyan; Shibayama, Akane; Sommer, Steve S

    Intriguing parallels have been noted previously between the biology of Vitamin D and the epidemiology of schizophrenia. We have scanned the Vitamin D receptor (VDR) gene by DOVAM-S (Detection of Virtually All Mutations-SSCP), a robotically enhanced multiplexed scanning method. In total, 100 patients with schizophrenia (86 Caucasians and 14 African-Americans) were scanned. In addition, pilot experiments were performed in patients with bipolar disorder (BPD) (24), autism (24), attention deficit hyperactivity disorder (ADHD) (24), and alcoholism (20). A total of 762 kb of the VDR genomic sequence was scanned. R208N and V339I were each found in one African-American patient, while absent in 35 African-American controls without schizophrenia (2/14 versus 0/35, P=0.08). Within the power of the study (≥1.6-fold relative risk), the common M1T variant is not associated with schizophrenia. In the 92 scanned patients with other psychiatric diseases, R173S was found in a single patient with bipolar disorder. In conclusion, we describe three novel structural variants of the Vitamin D receptor. Further study is required to clarify their role, if any, in psychiatric disease.
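
    The comparison "2/14 versus 0/35, P=0.08" can be checked with a short worked example. The abstract does not name the test; the sketch below assumes a two-sided Fisher's exact test on the 2 × 2 table of carriers and non-carriers, which gives a value consistent with the reported one. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1.0 + 1e-9))


# Carriers / non-carriers: 2 / 12 among patients, 0 / 35 among controls.
vdr_p = fisher_exact_two_sided(2, 12, 0, 35)  # 91 / 1176, about 0.077
```

    With only two carriers in total, the most extreme possible table (both carriers among the 14 patients) has probability C(14,2) / C(49,2) = 91 / 1176, so no arrangement of these data could reach conventional significance.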

  5. Pathway-based analysis of GWAs data identifies association of sex determination genes with susceptibility to testicular germ cell tumors.

    PubMed

    Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A

    2014-11-15

    Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10(-5), FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10(-6), FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10(-5), FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT. © The Author 2014. Published by Oxford University Press. All rights reserved. 

  6. Genome-wide significant loci for addiction and anxiety.

    PubMed

    Hodgson, K; Almasy, L; Knowles, E E M; Kent, J W; Curran, J E; Dyer, T D; Göring, H H H; Olvera, R L; Fox, P T; Pearlson, G D; Krystal, J H; Duggirala, R; Blangero, J; Glahn, D C

    2016-08-01

    Psychiatric comorbidity is common among individuals with addictive disorders, with patients frequently suffering from anxiety disorders. While the genetic architecture of comorbid addictive and anxiety disorders remains unclear, elucidating the genes involved could provide important insights into the underlying etiology. Here we examine a sample of 1284 Mexican-Americans from randomly selected extended pedigrees. Variance decomposition methods were used to examine the role of genetics in addiction phenotypes (lifetime history of alcohol dependence, drug dependence or chronic smoking) and various forms of clinically relevant anxiety. Genome-wide univariate and bivariate linkage scans were conducted to localize the chromosomal regions influencing these traits. Addiction phenotypes and anxiety were shown to be heritable, and univariate genome-wide linkage scans revealed significant quantitative trait loci for drug dependence (14q13.2-q21.2, LOD=3.322) and a broad anxiety phenotype (12q24.32-q24.33, LOD=2.918). Significant positive genetic correlations were observed between anxiety and each of the addiction subtypes (ρg=0.550-0.655), and further investigation with bivariate linkage analyses identified significant pleiotropic signals for alcohol dependence-anxiety (9q33.1-q33.2, LOD=3.054) and drug dependence-anxiety (18p11.23-p11.22, LOD=3.425). This study confirms the shared genetic underpinnings of addiction and anxiety and identifies genomic loci involved in the etiology of these comorbid disorders. The linkage signal for anxiety on 12q24 spans the location of TMEM132D, an emerging gene of interest from previous GWAS of anxiety traits, whilst the peak of the bivariate linkage signal identified for anxiety-alcohol on 9q33 coincides with a region where rare CNVs have been associated with psychiatric disorders. Other signals identified implicate novel regions of the genome in addiction genetics. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  7. Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures

    PubMed Central

    Holmes, Avram J.; Hollinshead, Marisa O.; O’Keefe, Timothy M.; Petrov, Victor I.; Fariello, Gabriele R.; Wald, Lawrence L.; Fischl, Bruce; Rosen, Bruce R.; Mair, Ross W.; Roffman, Joshua L.; Smoller, Jordan W.; Buckner, Randy L.

    2015-01-01

    The goal of the Brain Genomics Superstruct Project (GSP) is to enable large-scale exploration of the links between brain function, behavior, and ultimately genetic variation. To provide the broader scientific community data to probe these associations, a repository of structural and functional magnetic resonance imaging (MRI) scans linked to genetic information was constructed from a sample of healthy individuals. The initial release, detailed in the present manuscript, encompasses quality screened cross-sectional data from 1,570 participants ages 18 to 35 years who were scanned with MRI and completed demographic and health questionnaires. Personality and cognitive measures were obtained on a subset of participants. Each dataset contains a T1-weighted structural MRI scan and either one (n=1,570) or two (n=1,139) resting state functional MRI scans. Test-retest reliability datasets are included from 69 participants scanned within six months of their initial visit. For the majority of participants self-report behavioral and cognitive measures are included (n=926 and n=892 respectively). Analyses of data quality, structure, function, personality, and cognition are presented to demonstrate the dataset’s utility. PMID:26175908

  8. Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures.

    PubMed

    Holmes, Avram J; Hollinshead, Marisa O; O'Keefe, Timothy M; Petrov, Victor I; Fariello, Gabriele R; Wald, Lawrence L; Fischl, Bruce; Rosen, Bruce R; Mair, Ross W; Roffman, Joshua L; Smoller, Jordan W; Buckner, Randy L

    2015-01-01

    The goal of the Brain Genomics Superstruct Project (GSP) is to enable large-scale exploration of the links between brain function, behavior, and ultimately genetic variation. To provide the broader scientific community data to probe these associations, a repository of structural and functional magnetic resonance imaging (MRI) scans linked to genetic information was constructed from a sample of healthy individuals. The initial release, detailed in the present manuscript, encompasses quality screened cross-sectional data from 1,570 participants ages 18 to 35 years who were scanned with MRI and completed demographic and health questionnaires. Personality and cognitive measures were obtained on a subset of participants. Each dataset contains a T1-weighted structural MRI scan and either one (n=1,570) or two (n=1,139) resting state functional MRI scans. Test-retest reliability datasets are included from 69 participants scanned within six months of their initial visit. For the majority of participants self-report behavioral and cognitive measures are included (n=926 and n=892 respectively). Analyses of data quality, structure, function, personality, and cognition are presented to demonstrate the dataset's utility.

  9. Detecting Signatures of Positive Selection along Defined Branches of a Population Tree Using LSD.

    PubMed

    Librado, Pablo; Orlando, Ludovic

    2018-06-01

    Identifying the genomic basis underlying local adaptation is paramount to evolutionary biology, and bears many applications in the fields of conservation biology, crop, and animal breeding, as well as personalized medicine. Although many approaches have been developed to detect signatures of positive selection within single populations and population pairs, the increasing wealth of high-throughput sequencing data requires improved methods capable of handling multiple, and ideally large number of, populations in a single analysis. In this study, we introduce LSD (levels of exclusively shared differences), a fast and flexible framework to perform genome-wide selection scans, along the internal and external branches of a given population tree. We use forward simulations to demonstrate that LSD can identify branches targeted by positive selection with remarkable sensitivity and specificity. We illustrate a range of potential applications by analyzing data from the 1000 Genomes Project and uncover a list of adaptive candidates accompanying the expansion of anatomically modern humans out of Africa and their spread to Europe.

  10. A genome-wide scan for selection signatures in Nelore cattle

    USDA-ARS's Scientific Manuscript database

    Brazilian Nelore cattle have been selected for growth traits over more than four decades. In recent years, reproductive and meat quality traits have become more important because of increasing consumption, exports and consumer demand. The identification of genomic regions altered by artificial selec...

  11. A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features

    PubMed Central

    Adhikari, Kaustubh; Fontanil, Tania; Cal, Santiago; Mendoza-Revilla, Javier; Fuentes-Guajardo, Macarena; Chacón-Duque, Juan-Camilo; Al-Saadi, Farah; Johansson, Jeanette A.; Quinto-Sanchez, Mirsha; Acuña-Alonzo, Victor; Jaramillo, Claudia; Arias, William; Barquera Lozano, Rodrigo; Macín Pérez, Gastón; Gómez-Valdés, Jorge; Villamil-Ramírez, Hugo; Hunemeier, Tábita; Ramallo, Virginia; Silva de Cerqueira, Caio C.; Hurtado, Malena; Villegas, Valeria; Granja, Vanessa; Gallo, Carla; Poletti, Giovanni; Schuler-Faccini, Lavinia; Salzano, Francisco M.; Bortolini, Maria-Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Bedoya, Gabriel; Gonzalez-José, Rolando; Headon, Denis; López-Otín, Carlos; Tobin, Desmond J.; Balding, David; Ruiz-Linares, Andrés

    2016-01-01

    We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 × 10⁻⁸ to 3 × 10⁻¹¹⁹), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of human hair. PMID:26926045

  12. In Silico Genome Mismatch Scanning to Map Breast Cancer Genes in Extended Pedigrees

    DTIC Science & Technology

    2008-07-01

    University College London. Annals of Human Genetics (2008) 72, 279–287. A. Thomas et al. Methods: IBD sharing in pedigrees. There is considerable literature...is sufficient to maintain interest in the region. © 2007 The Authors. Journal compilation © 2007...for observed IBS instead of IBD, and for sporadic cases reducing the number of meioses, pedigrees with meiosis count d in the 25 to 30 range are

  13. Genome-wide detection of selection signatures in Chinese indigenous Laiwu pigs revealed candidate genes regulating fat deposition in muscle.

    PubMed

    Chen, Minhui; Wang, Jiying; Wang, Yanping; Wu, Ying; Fu, Jinluan; Liu, Jian-Feng

    2018-05-18

    Genome-wide scans for positive selection signatures have so far been carried out mainly in commercial breeds; few studies have focused on the selection footprints of indigenous breeds. The Laiwu pig is an invaluable Chinese indigenous pig breed with an extremely high proportion of intramuscular fat (IMF), and an excellent model for detecting footprints left by natural and artificial selection for fat deposition in muscle. In this study, based on GeneSeek Genomic Profiler Porcine HD data, three complementary methods, FST, iHS (integrated haplotype homozygosity score) and CLR (composite likelihood ratio), were implemented to detect selection signatures in the whole genome of Laiwu pigs. In total, 175 candidate selected regions were identified by at least two of the three methods; they covered 43.75 Mb and corresponded to 1.79% of the genome sequence. Gene annotation of the selected regions revealed a list of functionally important genes for feed intake and fat deposition, reproduction, and immune response. In particular, in accordance with the phenotypic features of Laiwu pigs, we identified several candidate genes, NPY1R, NPY5R, PIK3R1 and JAKMIP1, involved in the actions of two sets of neurons that are central regulators in maintaining the balance between food intake and energy expenditure. Our results identified a number of regions showing signatures of selection, as well as a list of functional candidate genes with potential effects on phenotypic traits, especially fat deposition in muscle. Our findings provide insights into the mechanisms of artificial selection for fat deposition and will facilitate follow-up functional studies.
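
    The rule of keeping regions detected "by at least two of the three methods" can be sketched as a simple vote over per-method candidate sets. The window identifiers and the per-method calls below are hypothetical, and the sketch assumes candidates have already been mapped to a shared set of genomic windows; the abstract does not describe how regions were defined or merged.

```python
from collections import Counter


def supported_by_at_least(calls_by_method, min_methods=2):
    """Return windows flagged by at least `min_methods` of the methods.

    calls_by_method: dict mapping a method name to an iterable of window IDs.
    """
    counts = Counter(
        window
        for windows in calls_by_method.values()
        for window in set(windows)  # count each method at most once per window
    )
    return sorted(w for w, c in counts.items() if c >= min_methods)


# Hypothetical candidate windows from three selection-scan statistics.
toy_calls = {
    "FST": ["chr1:10", "chr2:5", "chr7:3"],
    "iHS": ["chr1:10", "chr7:3", "chr9:1"],
    "CLR": ["chr2:5", "chr7:3"],
}
toy_consensus = supported_by_at_least(toy_calls, min_methods=2)
```

    Requiring agreement between statistics that respond to different signals (differentiation, haplotype structure, site frequency spectrum) trades some sensitivity for fewer method-specific false positives.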

  14. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A.

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.

  15. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity.

    PubMed

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A; Awosika, Joy; Briska, Adam; Ptashkin, Ryan N; Wagner, Trevor; Rajanna, Chythanya; Tsang, Hsinyi; Johnson, Shannon L; Mokashi, Vishwesh P; Chain, Patrick S G; Sozhamannan, Shanmuga

    2015-01-01

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non-O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non-O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.

  16. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity

    DOE PAGES

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A.; ...

    2015-03-20

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non-O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non-O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.

  17. PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix.

    PubMed

    Ambrosini, Giovanna; Groux, Romain; Bucher, Philipp

    2018-03-05

    Transcription factors (TFs) regulate gene expression by binding to specific short DNA sequences of 5 to 20 bp to regulate the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or TF binding site model from a public database. The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively. Contact: giovanna.ambrosini@epfl.ch. Supplementary data are available at Bioinformatics online.
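
    As an illustration of what scanning a sequence with a position-specific weight matrix involves, the sketch below slides a log-odds matrix along a sequence and reports windows that score at or above a cutoff. It is not PWMScan's implementation: the count matrix, the uniform background, the pseudocount, the threshold, and the function names are all hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical counts for a 4-bp motif whose consensus is TATA.
EXAMPLE_COUNTS = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
EXAMPLE_PWM = log_odds_pwm(EXAMPLE_COUNTS)
```

    Calling scan("GGTATACC", EXAMPLE_PWM, 6.0) reports a single hit, starting at offset 2. A genome-scale tool would also scan the reverse complement and use a far faster matching strategy than this per-window loop.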

  18. Genomewide Linkage Scan for Diabetic Renal Failure and Albuminuria: The FIND Study

    PubMed Central

    Igo, Robert P.; Iyengar, Sudha K.; Nicholas, Susanne B.; Goddard, Katrina A.B.; Langefeld, Carl D.; Hanson, Robert L.; Duggirala, Ravindranath; Divers, Jasmin; Abboud, Hanna; Adler, Sharon G.; Arar, Nedal H.; Horvath, Amanda; Elston, Robert C.; Bowden, Donald W.; Guo, Xiuqing; Ipp, Eli; Kao, W.H. Linda; Kimmel, Paul L.; Knowler, William C.; Meoni, Lucy A.; Molineros, Julio; Nelson, Robert G.; Pahl, Madeline V.; Parekh, Rulan S.; Rasooly, Rebekah S.; Schelling, Jeffrey R.; Shah, Vallabh O.; Smith, Michael W.; Winkler, Cheryl A.; Zager, Philip G.; Sedor, John R.; Freedman, Barry I.

    2011-01-01

    Background: Diabetic nephropathy (DN) is a leading cause of mortality and morbidity in patients with type 1 and type 2 diabetes. The multicenter FIND consortium aims to identify genes for DN and its associated quantitative traits, e.g. the urine albumin:creatinine ratio (ACR). Herein, the results of whole-genome linkage analysis and a sparse association scan for ACR and a dichotomous DN phenotype are reported in diabetic individuals. Methods: A genomewide scan comprising more than 5,500 autosomal single nucleotide polymorphism markers (average spacing of 0.6 cM) was performed on 1,235 nuclear and extended pedigrees (3,972 diabetic participants) ascertained for DN from African-American (AA), American-Indian (AI), European-American (EA) and Mexican-American (MA) populations. Results: Strong evidence for linkage to DN was detected on chromosome 6p (p = 8.0 × 10^-5, LOD = 3.09) in EA families as well as suggestive evidence for linkage to chromosome 7p in AI families. Regions on chromosomes 3p in AA, 7q in EA, 16q in AA and 22q in MA displayed suggestive evidence of linkage for urine ACR. The linkage peak on chromosome 22q overlaps the MYH9/APOL1 gene region, previously implicated in AA diabetic and nondiabetic nephropathies. Conclusion: These results strengthen the evidence for previously identified genomic regions and implicate several novel loci potentially involved in the pathogenesis of DN. PMID:21454968
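
    The LOD score and pointwise p-value quoted for chromosome 6p can be related through the usual asymptotic conversion: 2 ln(10) × LOD is treated as a chi-square statistic and the tail probability is halved for a one-sided test. The sketch below assumes one degree of freedom; whether the authors used exactly this conversion is an assumption, and the function name is hypothetical.

```python
import math


def lod_to_pvalue(lod):
    """One-sided asymptotic p-value for a LOD score, assuming 1 degree of freedom."""
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    chi_square = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    upper_tail = math.erfc(math.sqrt(chi_square / 2.0))
    return 0.5 * upper_tail
```

    Under these assumptions, lod_to_pvalue(3.09) comes out close to the 8.0 × 10^-5 reported in the abstract.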

  19. Mutation scanning analysis of genetic variation within and among Echinococcus species: implications and future prospects.

    PubMed

    Jabbar, Abdul; Gasser, Robin B

    2013-07-01

    Adult tapeworms of the genus Echinococcus (family Taeniidae) occur in the small intestines of carnivorous definitive hosts and are transmitted to particular intermediate mammalian hosts, in which they develop as fluid-filled larvae (cysts) in internal organs (usually lung and liver), causing the disease echinococcosis. Echinococcus species are of major medical importance and also cause losses to the meat and livestock industries, mainly due to the condemnation of infected offal. Decisions regarding the treatment and control of echinococcosis rely on the accurate identification of species and population variants (strains). Conventional, phenetic methods for specific identification have some significant limitations. Despite advances in the development of molecular tools, there has been limited application of mutation scanning methods to species of Echinococcus. Here, we briefly review key genetic markers used for the identification of Echinococcus species and techniques for the analysis of genetic variation within and among populations, and the diagnosis of echinococcosis. We also discuss the benefits of utilizing mutation scanning approaches to elucidate the population genetics and epidemiology of Echinococcus species. These benefits are likely to become more evident following the complete characterization of the genomes of E. granulosus and E. multilocularis.

  20. InterProScan 5: genome-scale protein function classification

    PubMed Central

    Jones, Philip; Binns, David; Chang, Hsin-Yu; Fraser, Matthew; Li, Weizhong; McAnulla, Craig; McWilliam, Hamish; Maslen, John; Mitchell, Alex; Nuka, Gift; Pesseat, Sebastien; Quinn, Antony F.; Sangrador-Vegas, Amaia; Scheremetjew, Maxim; Yong, Siew-Yit; Lopez, Rodrigo; Hunter, Sarah

    2014-01-01

    Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code. Availability and implementation: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/. Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk or mitchell@ebi.ac.uk PMID:24451626

  1. Genome-Wide Variation Patterns Uncover the Origin and Selection in Cultivated Ginseng (Panax ginseng Meyer)

    PubMed Central

    Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili

    2017-01-01

    Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses identified diverse bioactive components from Chinese ginseng. However, basic biological attributes including domestication and selection of the ginseng plant remain under-investigated. Here, we presented a genome-wide view of the domestication and selection of cultivated ginseng based on the whole genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of the 30 wild and cultivated ginseng accessions. In complement, the 45S rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggested that the SCG and NCG groups have distinct demographic histories. Candidate genes identified are useful for future molecular breeding of cultivated ginseng. PMID:28922794

  2. Genome evolution in Reptilia, the sister group of mammals.

    PubMed

    Janes, Daniel E; Organ, Christopher L; Fujita, Matthew K; Shedlock, Andrew M; Edwards, Scott V

    2010-01-01

    The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and the evidence for great diversity in genome size, karyotype, and sex chromosome diversity, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome--predicted to be large and to possess a diverse repeat landscape on par with mammals and a birdlike sex chromosome system--to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates of different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.

  3. A genomic scan for selection reveals candidates for genes involved in the evolution of cultivated sunflower (Helianthus annuus).

    PubMed

    Chapman, Mark A; Pashley, Catherine H; Wenzler, Jessica; Hvala, John; Tang, Shunxue; Knapp, Steven J; Burke, John M

    2008-11-01

    Genomic scans for selection are a useful tool for identifying genes underlying phenotypic transitions. In this article, we describe the results of a genome scan designed to identify candidates for genes targeted by selection during the evolution of cultivated sunflower. This work involved screening 492 loci derived from ESTs on a large panel of wild, primitive (i.e., landrace), and improved sunflower (Helianthus annuus) lines. This sampling strategy allowed us to identify candidates for selectively important genes and investigate the likely timing of selection. Thirty-six genes showed evidence of selection during either domestication or improvement based on multiple criteria, and a sequence-based test of selection on a subset of these loci confirmed this result. In view of what is known about the structure of linkage disequilibrium across the sunflower genome, these genes are themselves likely to have been targeted by selection, rather than being merely linked to the actual targets. While the selection candidates showed a broad range of putative functions, they were enriched for genes involved in amino acid synthesis and protein catabolism. Given that a similar pattern has been detected in maize (Zea mays), this finding suggests that selection on amino acid composition may be a general feature of the evolution of crop plants. In terms of genomic locations, the selection candidates were significantly clustered near quantitative trait loci (QTL) that contribute to phenotypic differences between wild and cultivated sunflower, and specific instances of QTL colocalization provide some clues as to the roles that these genes may have played during sunflower evolution.

  4. fastBMA: scalable network inference and transitive reduction.

    PubMed

    Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee

    2017-10-01

    Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/). © The Authors 2017. Published by Oxford University Press.
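
    One way to read "mapping the transitive reduction to a shortest-path problem" is sketched below. It assumes each directed edge carries a posterior probability, uses -log(probability) as the edge cost, and drops a direct edge when some indirect path between the same two genes is at least as cheap. This is an illustration under those assumptions, not fastBMA's code; the function names and the example probabilities are hypothetical, and probabilities are assumed to lie in (0, 1].

```python
import heapq
import math


def indirect_cost(graph, src, dst):
    """Cheapest src-to-dst path cost (Dijkstra) that avoids the direct edge src->dst."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, {}).items():
            if node == src and nxt == dst:
                continue  # ignore the direct edge under evaluation
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return math.inf


def prune_indirect_edges(edge_probs):
    """Keep an edge only if no indirect path is at least as probable as the edge itself."""
    graph = {}
    for (u, v), prob in edge_probs.items():
        graph.setdefault(u, {})[v] = -math.log(prob)
    return {
        (u, v): prob
        for (u, v), prob in edge_probs.items()
        if indirect_cost(graph, u, v) > -math.log(prob)
    }
```

    For example, with hypothetical edges A→B (0.9), B→C (0.9) and A→C (0.5), the path through B has probability 0.81, so the weaker direct edge A→C is removed while the other two are kept.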

  5. Genome-wide scans reveal variants at EDAR predominantly affecting hair straightness in Han Chinese and Uyghur populations.

    PubMed

    Wu, Sijie; Tan, Jingze; Yang, Yajun; Peng, Qianqian; Zhang, Manfei; Li, Jinxi; Lu, Dongsheng; Liu, Yu; Lou, Haiyi; Feng, Qidi; Lu, Yan; Guan, Yaqun; Zhang, Zhaoxia; Jiao, Yi; Sabeti, Pardis; Krutmann, Jean; Tang, Kun; Jin, Li; Xu, Shuhua; Wang, Sijia

    2016-11-01

    Hair straightness/curliness is one of the most conspicuous features of human variation and is particularly diverse among populations. A recent genome-wide scan found common variants in the Trichohyalin (TCHH) gene that are associated with hair straightness in Europeans, but different genes might affect this phenotype in other populations. By sampling 2899 Han Chinese, we performed the first genome-wide scan of hair straightness in East Asians, and found EDAR (rs3827760) as the predominant gene (P = 4.67 × 10^-16), accounting for 3.66% of the total variance. The candidate gene approach did not find further significant associations, suggesting that hair straightness may be affected by a large number of genes with subtle effects. Notably, genetic variants associated with hair straightness in Europeans are generally low in frequency in Han Chinese, and vice versa. To evaluate the relative contribution of these variants, we performed a second genome-wide scan in 709 samples from the Uyghur, an admixed population with both eastern and western Eurasian ancestries. In Uyghurs, both EDAR (rs3827760: P = 1.92 × 10^-12) and TCHH (rs11803731: P = 1.46 × 10^-3) are associated with hair straightness, but EDAR (OR 0.415) has a greater effect than TCHH (OR 0.575). We found no significant interaction between EDAR and TCHH (P = 0.645), suggesting that these two genes affect hair straightness through different mechanisms. Furthermore, haplotype analysis indicates that TCHH is not subject to selection. While EDAR is under strong selection in East Asia, it does not appear to be subject to selection after the admixture in Uyghurs. These results suggest that hair straightness is unlikely to be a trait under selection.

  6. High-resolution genome-wide scan of genes, gene-networks and cellular systems impacting the yeast ionome

    USDA-ARS's Scientific Manuscript database

    To balance the demand for uptake of essential elements with their potential toxicity living cells have complex regulatory mechanisms. Here, we describe a genome-wide screen to identify genes that impact the elemental composition (‘ionome’) of yeast Saccharomyces cerevisiae. Using inductively coupled...

  7. Scanning genomic areas under selection sweep and association mapping as tools to identify horticultural important genes in watermelon

    USDA-ARS's Scientific Manuscript database

    Watermelon (Citrullus lanatus var. lanatus) contains 88% water, sugars, and several important health-related compounds, including lycopene, citrulline, arginine, and glutathione. The current genetic diversity study uses microsatellites with known map positions to identify genomic regions that under...

  8. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan

    USDA-ARS's Scientific Manuscript database

    Misassembly signatures, created by shuffling the order of sequences while assembling a genome, can be easily seen by analyzing the unexpected behaviour of the linkage disequilibrium (LD) decay. A heuristic process was proposed to identify those misassembly signatures and presented the ones found in ...

  9. The complete genome sequence of a second distinct betabaculovirus from the true armyworm, Mythimna unipuncta

    USDA-ARS's Scientific Manuscript database

    The betabaculovirus Pseudaletia (Mythimna) sp. granulovirus #8 (MyspGV#8) was examined by electron microscopy, host barcoding PCR, and determination of the nucleotide sequence of its genome. Scanning and transmission electron microscopy revealed that the occlusion bodies of MyspGV#8 possessed the c...

  10. Micro computed tomography (CT) scanned anatomical gateway to insect pest bioinformatics

    USDA-ARS's Scientific Manuscript database

    An international collaboration to establish an interactive Digital Video Library for a Systems Biology Approach to study the Asian citrus Psyllid and psyllid genomics/proteomics interactions is demonstrated. Advances in micro-CT, digital computed tomography (CT) scan uses X-rays to make detailed pic...

  11. Genomic futures of prenatal screening: ethical reflection.

    PubMed

    Dondorp, W J; Page-Christiaens, G C M L; de Wert, G M W R

    2016-05-01

    The practice of prenatal screening is undergoing important changes as a result of the introduction of genomic testing technologies at different stages of the screening trajectory. It is expected that eventually it will become possible to routinely obtain a comprehensive 'genome scan' of all fetuses. Although this will still take several years, there are clear continuities between present developments and this future scenario. As this review shows, behind the still limited scope of screening for common aneuploidies, a rapid widening of the range of conditions tested for is already taking shape at the invasive testing stage. But the continuities are not just technical; they are also ethical. If screening for Down's syndrome is a matter of providing autonomous reproductive choice, then why would providing the choice to have a full fetal genome scan be something entirely different? There is a clear need for a sustainable normative framework that will have to answer three challenges: the indeterminateness of the autonomy paradigm, the need to acknowledge the future child as an interested stakeholder, and the prospect of broad-scope genomic prenatal screening with a double purpose: autonomy and prevention. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  12. Recent Coselection in Human Populations Revealed by Protein–Protein Interaction Network

    PubMed Central

    Qian, Wei; Zhou, Hang; Tang, Kun

    2015-01-01

    Genome-wide scans for signals of natural selection in human populations have identified a large number of candidate loci that underlie local adaptations. This is surprising given the relatively short evolutionary time since the divergence of the human population. One hypothesis that has not been formally examined is whether and how the recent human evolution may have been shaped by coselection in the context of complex molecular interactome. In this study, genome-wide signals of selection were scanned in East Asians, Europeans, and Africans using 1000 Genome data, and subsequently mapped onto the protein–protein interaction (PPI) network. We found that the candidate genes of recent positive selection localized significantly closer to each other on the PPI network than expected, revealing substantial clustering of selected genes. Furthermore, gene pairs of shorter PPI network distances showed higher similarities of their recent evolutionary paths than those further apart. Last, subnetworks enriched with recent coselection signals were identified, which are substantially overrepresented in biological pathways related to signal transduction, neurogenesis, and immune function. These results provide the first genome-wide evidence for association of recent selection signals with the PPI network, shedding light on the potential mechanisms of recent coselection in the human genome. PMID:25532814

  13. Large-scale mapping of mutations affecting zebrafish development.

    PubMed

    Geisler, Robert; Rauch, Gerd-Jörg; Geiger-Rudolph, Silke; Albrecht, Andrea; van Bebber, Frauke; Berger, Andrea; Busch-Nentwich, Elisabeth; Dahm, Ralf; Dekens, Marcus P S; Dooley, Christopher; Elli, Alexandra F; Gehring, Ines; Geiger, Horst; Geisler, Maria; Glaser, Stefanie; Holley, Scott; Huber, Matthias; Kerr, Andy; Kirn, Anette; Knirsch, Martina; Konantz, Martina; Küchler, Axel M; Maderspacher, Florian; Neuhauss, Stephan C; Nicolson, Teresa; Ober, Elke A; Praeg, Elke; Ray, Russell; Rentzsch, Brit; Rick, Jens M; Rief, Eva; Schauerte, Heike E; Schepp, Carsten P; Schönberger, Ulrike; Schonthaler, Helia B; Seiler, Christoph; Sidi, Samuel; Söllner, Christian; Wehner, Anja; Weiler, Christian; Nüsslein-Volhard, Christiane

    2007-01-09

    Large-scale mutagenesis screens in the zebrafish employing the mutagen ENU have isolated several hundred mutant loci that represent putative developmental control genes. In order to realize the potential of such screens, systematic genetic mapping of the mutations is necessary. Here we report on a large-scale effort to map the mutations generated in mutagenesis screening at the Max Planck Institute for Developmental Biology by genome scanning with microsatellite markers. We have selected a set of microsatellite markers and developed methods and scoring criteria suitable for efficient, high-throughput genome scanning. We have used these methods to successfully obtain a rough map position for 319 mutant loci from the Tübingen I mutagenesis screen and subsequent screening of the mutant collection. For 277 of these the corresponding gene is not yet identified. Mapping was successful for 80 % of the tested loci. By comparing 21 mutation and gene positions of cloned mutations we have validated the correctness of our linkage group assignments and estimated the standard error of our map positions to be approximately 6 cM. By obtaining rough map positions for over 300 zebrafish loci with developmental phenotypes, we have generated a dataset that will be useful not only for cloning of the affected genes, but also to suggest allelism of mutations with similar phenotypes that will be identified in future screens. Furthermore this work validates the usefulness of our methodology for rapid, systematic and inexpensive microsatellite mapping of zebrafish mutations.

  14. A novel fibrinogen variant--Liberec: dysfibrinogenaemia associated with gamma Tyr262Cys substitution.

    PubMed

    Kotlín, Roman; Sobotková, Alzbeta; Suttnar, Jirí; Salaj, Peter; Walterová, Lenka; Riedel, Tomás; Reicheltová, Zuzana; Dyr, Jan Evangelista

    2008-08-01

    A 22-yr-old woman had abnormal preoperative coagulation test results and congenital dysfibrinogenaemia was suspected. The patient from Liberec (Czech Republic) had a low fibrinogen plasma level as determined by the Clauss method, a normal fibrinogen level as determined by the immunoturbidimetric method, and a prolonged thrombin time. To identify the genetic mutation responsible for this dysfibrinogen, genomic DNA extracted from the blood was analysed. Fibrin polymerisation measurement, kinetics of fibrinopeptide release, fibrinogen clottability measurement and scanning electron microscopy were performed. DNA sequencing showed the heterozygous fibrinogen gamma Y262C mutation. Kinetics of fibrinopeptide release was normal; however, fibrin polymerisation was impaired. Fibrinogen clottability measurement showed that only about 45% of fibrinogen molecules are involved in clot formation. Scanning electron microscopy revealed thicker fibres, which were significantly different from the normal control. A case of dysfibrinogenaemia, found by routine coagulation testing, was genetically identified as a novel fibrinogen variant (gamma Y262C) that has been named Liberec.

  15. Investigation of Inversion Polymorphisms in the Human Genome Using Principal Components Analysis

    PubMed Central

    Ma, Jianzhong; Amos, Christopher I.

    2012-01-01

    Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct “populations” of inversion homozygotes of different orientations and their 1∶1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA-approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility of diseases. PMID:22808122

  16. Gene finding in metatranscriptomic sequences.

    PubMed

    Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

    2014-01-01

    Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.

  17. A Genome-Wide Breast Cancer Scan in African Americans

    DTIC Science & Technology

    2011-06-01

    cancer in women of African ancestry. 13 References 1. Easton DF, P.K., Dunning AM, Pharoah PDP, Thompson D, Ballinger DG, et al . Genome...M, Hankinson, SE, et al . A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer...Millikan, R.C. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. Jama 295, 2492-502 ( 2006 ). 16 17. Huo, D., Ikpatt

  18. Two quantitative trait loci influence whipworm (Trichuris trichiura) infection in a Nepalese population.

    PubMed

    Williams-Blangero, Sarah; Vandeberg, John L; Subedi, Janardan; Jha, Bharat; Dyer, Tom D; Blangero, John

    2008-04-15

    Whipworm (Trichuris trichiura) infection is a soil-transmitted helminth infection that affects >1 billion people. It is a serious public health problem in many developing countries and can result in deficits in growth and cognitive development. In a follow-up study of significant heritability for whipworm infection, we conducted the first genome scan for quantitative trait loci (QTL) influencing the heritability of susceptibility to this important parasitic disease. Whipworm egg counts were determined for 1,253 members of the Jirel population of eastern Nepal. All individuals in the study sample belonged to a single pedigree including >26,000 pairs of relatives that are informative for genetic analysis. Linkage analysis of genome scan data generated for the pedigree provided unambiguous evidence for 2 QTL influencing susceptibility to whipworm infection, one located on chromosome 9 (logarithm of the odds ratio [LOD] score, 3.35; genomewide P = .0138) and the other located on chromosome 18 (LOD score, 3.29; genomewide P = .0159). There was also suggestive evidence that 2 loci located on chromosomes 12 and 13 influenced whipworm infection. The results of this first genome scan for T. trichiura egg counts provide new information on the determinants of genetic predisposition to whipworm infection.
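
    A LOD score is a base-10 log likelihood ratio, so it can be converted to a pointwise P value under a standard large-sample approximation. The sketch below assumes that 2 ln(10) × LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom, as is commonly assumed for a single variance component; the function name is hypothetical. The genomewide P values quoted in the abstract also account for testing along the whole genome and cannot be reproduced from this formula.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate pointwise P value for a LOD score.

    Assumes 2*ln(10)*LOD ~ 0.5*chi2(0) + 0.5*chi2(1) under the null, so the
    upper tail is half of the chi-square(1) tail, which equals erfc(sqrt(x/2)).
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    chi2 = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    for lod in (3.35, 3.29):
        print(f"LOD {lod}: pointwise P ~ {lod_to_pointwise_p(lod):.2e}")
```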

  19. StarScan: a web server for scanning small RNA targets from degradome sequencing data.

    PubMed

    Liu, Shun; Li, Jun-Hao; Wu, Jie; Zhou, Ke-Ren; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2015-07-01

    Endogenous small non-coding RNAs (sRNAs), including microRNAs, PIWI-interacting RNAs and small interfering RNAs, play important gene regulatory roles in animals and plants by pairing to the protein-coding and non-coding transcripts. However, computationally assigning these various sRNAs to their regulatory target genes remains technically challenging. Recently, a high-throughput degradome sequencing method was applied to identify biologically relevant sRNA cleavage sites. In this study, an integrated web-based tool, StarScan (sRNA target Scan), was developed for scanning sRNA targets using degradome sequencing data from 20 species. Given an sRNA sequence from plants or animals, our web server performs an ultrafast and exhaustive search for potential sRNA-target interactions in annotated and unannotated genomic regions. The interactions between small RNAs and target transcripts were further evaluated using a novel tool, alignScore. A novel tool, degradomeBinomTest, was developed to quantify the abundance of degradome fragments located at the 9-11th nucleotide from the sRNA 5' end. This is the first web server for discovering potential sRNA-mediated RNA cleavage events in plants and animals, which affords mechanistic insights into the regulatory roles of sRNAs. The StarScan web server is available at http://mirlab.sysu.edu.cn/starscan/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Quality control and quality assurance in genotypic data for genome-wide association studies

    PubMed Central

    Laurie, Cathy C.; Doheny, Kimberly F.; Mirel, Daniel B.; Pugh, Elizabeth W.; Bierut, Laura J.; Bhangale, Tushar; Boehm, Frederick; Caporaso, Neil E.; Cornelis, Marilyn C.; Edenberg, Howard J.; Gabriel, Stacy B.; Harris, Emily L.; Hu, Frank B.; Jacobs, Kevin; Kraft, Peter; Landi, Maria Teresa; Lumley, Thomas; Manolio, Teri A.; McHugh, Caitlin; Painter, Ian; Paschall, Justin; Rice, John P.; Rice, Kenneth M.; Zheng, Xiuwen; Weir, Bruce S.

    2011-01-01

    Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies. This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium (HWE) test p-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis (PCA) to SNP selection. The methods are illustrated with examples from the ‘Gene Environment Association Studies’ (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of genome-wide association studies. PMID:20718045
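
    Item (5) in the abstract relies on per-SNP Hardy-Weinberg equilibrium tests. A minimal sketch of one such test, the 1-degree-of-freedom chi-square goodness-of-fit test on genotype counts, follows. This is a textbook test chosen for illustration, not necessarily the test used in the GENEVA pipeline, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (frequency of allele A, chi-square statistic, P value).
    The P value uses the chi-square(1) tail, which equals erfc(sqrt(x/2)).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # genotype counts in exact HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete absence of heterozygotes
```

    Plotting the resulting P values against allele frequency across SNPs is one way to look for the frequency-dependent artifacts the abstract mentions.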

  1. Genome-Wide Association Analysis to Identify Loci for Milk Yield in Gyr Breed

    USDA-ARS's Scientific Manuscript database

    A genome scan was conducted to identify QTL affecting milk yield in a Brazilian Gyr population of progeny test bulls (N=319). Data used in this study were derived from traditional genetic evaluation records computed by the Embrapa Dairy Cattle and released in May/2009 (http://www.cnpgl.embrapa.br/nova...

  2. Genome Wide Scan for Loci influencing Warner Bratzler Shear Force in Five Bos taurus Breeds

    USDA-ARS's Scientific Manuscript database

    Genetic tests for beef tenderness are currently limited to single nucleotide polymorphisms (SNPs) within µ-calpain (CAPN1) and calpastatin (CAST) and explain little of the phenotypic variation in Warner-Bratzler shear force (WBSF). We performed a genome-wide association study for WBSF by genotyping...

  3. The Siblings With Ischemic Stroke Study (SWISS) Protocol

    PubMed Central

    Meschia, James F; Brown, Robert D; Brott, Thomas G; Chukwudelunzu, Felix E; Hardy, John; Rich, Stephen S

    2002-01-01

    Background: Family history and twin studies suggest an inherited component to ischemic stroke risk. Candidate gene association studies have been performed but have limited capacity to identify novel risk factor genes. The Siblings With Ischemic Stroke Study (SWISS) aims to conduct a genome-wide scan in sibling pairs concordant or discordant for ischemic stroke to identify novel genetic risk factors through linkage analysis. Methods: Screening at multiple clinical centers identifies patients (probands) with radiographically confirmed ischemic stroke and a family history of at least 1 living full sibling with stroke. After giving informed consent, without violating privacy among other family members, the proband invites siblings concordant and discordant for stroke to participate. Siblings then contact the study coordinating center. The diagnosis of ischemic stroke in potentially concordant siblings is confirmed by systematic centralized review of medical records. The stroke-free status of potentially discordant siblings is confirmed by validated structured telephone interview. Blood samples for DNA analysis are taken from concordant sibling pairs and, if applicable, from 1 discordant sibling. Epstein-Barr virus-transformed lymphoblastoid cell lines are created, and a scan of the human genome is planned. Discussion: Conducting adequately powered genomics studies of stroke in humans is challenging because of the heterogeneity of the stroke phenotype and the difficulty of obtaining DNA samples from clinically well-characterized members of a cohort of stroke pedigrees. The multicentered design of this study is intended to efficiently assemble a cohort of ischemic stroke pedigrees without invoking community consent or using cold-calling of pedigree members. PMID:11882254

  4. Genome-wide scan of healthy human connectome discovers SPON1 gene variant influencing dementia severity

    PubMed Central

    Jahanshad, Neda; Rajagopalan, Priya; Hua, Xue; Hibar, Derrek P.; Nir, Talia M.; Toga, Arthur W.; Jack, Clifford R.; Saykin, Andrew J.; Green, Robert C.; Weiner, Michael W.; Medland, Sarah E.; Montgomery, Grant W.; Hansell, Narelle K.; McMahon, Katie L.; de Zubicaray, Greig I.; Martin, Nicholas G.; Wright, Margaret J.; Thompson, Paul M.; Weiner, Michael; Aisen, Paul; Weiner, Michael; Aisen, Paul; Petersen, Ronald; Jack, Clifford R.; Jagust, William; Trojanowski, John Q.; Toga, Arthur W.; Beckett, Laurel; Green, Robert C.; Saykin, Andrew J.; Morris, John; Liu, Enchi; Green, Robert C.; Montine, Tom; Petersen, Ronald; Aisen, Paul; Gamst, Anthony; Thomas, Ronald G.; Donohue, Michael; Walter, Sarah; Gessert, Devon; Sather, Tamie; Beckett, Laurel; Harvey, Danielle; Gamst, Anthony; Donohue, Michael; Kornak, John; Jack, Clifford R.; Dale, Anders; Bernstein, Matthew; Felmlee, Joel; Fox, Nick; Thompson, Paul; Schuff, Norbert; Alexander, Gene; DeCarli, Charles; Jagust, William; Bandy, Dan; Koeppe, Robert A.; Foster, Norm; Reiman, Eric M.; Chen, Kewei; Mathis, Chet; Morris, John; Cairns, Nigel J.; Taylor-Reinwald, Lisa; Trojanowki, J.Q.; Shaw, Les; Lee, Virginia M.Y.; Korecka, Magdalena; Toga, Arthur W.; Crawford, Karen; Neu, Scott; Saykin, Andrew J.; Foroud, Tatiana M.; Potkin, Steven; Shen, Li; Khachaturian, Zaven; Frank, Richard; Snyder, Peter J.; Molchan, Susan; Kaye, Jeffrey; Quinn, Joseph; Lind, Betty; Dolen, Sara; Schneider, Lon S.; Pawluczyk, Sonia; Spann, Bryan M.; Brewer, James; Vanderswag, Helen; Heidebrink, Judith L.; Lord, Joanne L.; Petersen, Ronald; Johnson, Kris; Doody, Rachelle S.; Villanueva-Meyer, Javier; Chowdhury, Munir; Stern, Yaakov; Honig, Lawrence S.; Bell, Karen L.; Morris, John C.; Ances, Beau; Carroll, Maria; Leon, Sue; Mintun, Mark A.; Schneider, Stacy; Marson, Daniel; Griffith, Randall; Clark, David; Grossman, Hillel; Mitsis, Effie; Romirowsky, Aliza; deToledo-Morrell, Leyla; Shah, Raj C.; Duara, Ranjan; Varon, Daniel; Roberts, Peggy; Albert, 
Marilyn; Onyike, Chiadi; Kielb, Stephanie; Rusinek, Henry; de Leon, Mony J.; Glodzik, Lidia; De Santi, Susan; Doraiswamy, P. Murali; Petrella, Jeffrey R.; Coleman, R. Edward; Arnold, Steven E.; Karlawish, Jason H.; Wolk, David; Smith, Charles D.; Jicha, Greg; Hardy, Peter; Lopez, Oscar L.; Oakley, MaryAnn; Simpson, Donna M.; Porsteinsson, Anton P.; Goldstein, Bonnie S.; Martin, Kim; Makino, Kelly M.; Ismail, M. Saleem; Brand, Connie; Mulnard, Ruth A.; Thai, Gaby; Mc-Adams-Ortiz, Catherine; Womack, Kyle; Mathews, Dana; Quiceno, Mary; Diaz-Arrastia, Ramon; King, Richard; Weiner, Myron; Martin-Cook, Kristen; DeVous, Michael; Levey, Allan I.; Lah, James J.; Cellar, Janet S.; Burns, Jeffrey M.; Anderson, Heather S.; Swerdlow, Russell H.; Apostolova, Liana; Lu, Po H.; Bartzokis, George; Silverman, Daniel H.S.; Graff-Radford, Neill R.; Parfitt, Francine; Johnson, Heather; Farlow, Martin R.; Hake, Ann Marie; Matthews, Brandy R.; Herring, Scott; van Dyck, Christopher H.; Carson, Richard E.; MacAvoy, Martha G.; Chertkow, Howard; Bergman, Howard; Hosein, Chris; Black, Sandra; Stefanovic, Bojana; Caldwell, Curtis; Hsiung, Ging-Yuek Robin; Feldman, Howard; Mudge, Benita; Assaly, Michele; Kertesz, Andrew; Rogers, John; Trost, Dick; Bernick, Charles; Munic, Donna; Kerwin, Diana; Mesulam, Marek-Marsel; Lipowski, Kristina; Wu, Chuang-Kuo; Johnson, Nancy; Sadowsky, Carl; Martinez, Walter; Villena, Teresa; Turner, Raymond Scott; Johnson, Kathleen; Reynolds, Brigid; Sperling, Reisa A.; Johnson, Keith A.; Marshall, Gad; Frey, Meghan; Yesavage, Jerome; Taylor, Joy L.; Lane, Barton; Rosen, Allyson; Tinklenberg, Jared; Sabbagh, Marwan; Belden, Christine; Jacobson, Sandra; Kowall, Neil; Killiany, Ronald; Budson, Andrew E.; Norbash, Alexander; Johnson, Patricia Lynn; Obisesan, Thomas O.; Wolday, Saba; Bwayo, Salome K.; Lerner, Alan; Hudson, Leon; Ogrocki, Paula; Fletcher, Evan; Carmichael, Owen; Olichney, John; DeCarli, Charles; Kittur, Smita; Borrie, Michael; Lee, T.-Y.; Bartha, Rob; 
Johnson, Sterling; Asthana, Sanjay; Carlsson, Cynthia M.; Potkin, Steven G.; Preda, Adrian; Nguyen, Dana; Tariot, Pierre; Fleisher, Adam; Reeder, Stephanie; Bates, Vernice; Capote, Horacio; Rainka, Michelle; Scharre, Douglas W.; Kataki, Maria; Zimmerman, Earl A.; Celmins, Dzintra; Brown, Alice D.; Pearlson, Godfrey D.; Blank, Karen; Anderson, Karen; Saykin, Andrew J.; Santulli, Robert B.; Schwartz, Eben S.; Sink, Kaycee M.; Williamson, Jeff D.; Garg, Pradeep; Watkins, Franklin; Ott, Brian R.; Querfurth, Henry; Tremont, Geoffrey; Salloway, Stephen; Malloy, Paul; Correia, Stephen; Rosen, Howard J.; Miller, Bruce L.; Mintzer, Jacobo; Longmire, Crystal Flynn; Spicer, Kenneth; Finger, Elizabeth; Rachinsky, Irina; Rogers, John; Kertesz, Andrew; Drost, Dick

    2013-01-01

    Aberrant connectivity is implicated in many neurological and psychiatric disorders, including Alzheimer’s disease and schizophrenia. However, other than a few disease-associated candidate genes, we know little about the degree to which genetics play a role in the brain networks; we know even less about specific genes that influence brain connections. Twin and family-based studies can generate estimates of overall genetic influences on a trait, but genome-wide association scans (GWASs) can screen the genome for specific variants influencing the brain or risk for disease. To identify the heritability of various brain connections, we scanned healthy young adult twins with high-field, high-angular resolution diffusion MRI. We adapted GWASs to screen the brain’s connectivity pattern, allowing us to discover genetic variants that affect the human brain’s wiring. The association of connectivity with the SPON1 variant at rs2618516 on chromosome 11 (11p15.2) reached connectome-wide, genome-wide significance after stringent statistical corrections were enforced, and it was replicated in an independent subsample. rs2618516 was shown to affect brain structure in an elderly population with varying degrees of dementia. Older people who carried the connectivity variant had significantly milder clinical dementia scores and lower risk of Alzheimer’s disease. As a posthoc analysis, we conducted GWASs on several organizational and topological network measures derived from the matrices to discover variants in and around genes associated with autism (MACROD2), development (NEDD4), and mental retardation (UBE2A) significantly associated with connectivity. Connectome-wide, genome-wide screening offers substantial promise to discover genes affecting brain connectivity and risk for brain diseases. PMID:23471985

  5. Genome-wide scan of healthy human connectome discovers SPON1 gene variant influencing dementia severity.

    PubMed

    Jahanshad, Neda; Rajagopalan, Priya; Hua, Xue; Hibar, Derrek P; Nir, Talia M; Toga, Arthur W; Jack, Clifford R; Saykin, Andrew J; Green, Robert C; Weiner, Michael W; Medland, Sarah E; Montgomery, Grant W; Hansell, Narelle K; McMahon, Katie L; de Zubicaray, Greig I; Martin, Nicholas G; Wright, Margaret J; Thompson, Paul M

    2013-03-19

    Aberrant connectivity is implicated in many neurological and psychiatric disorders, including Alzheimer's disease and schizophrenia. However, other than a few disease-associated candidate genes, we know little about the degree to which genetics play a role in the brain networks; we know even less about specific genes that influence brain connections. Twin and family-based studies can generate estimates of overall genetic influences on a trait, but genome-wide association scans (GWASs) can screen the genome for specific variants influencing the brain or risk for disease. To identify the heritability of various brain connections, we scanned healthy young adult twins with high-field, high-angular resolution diffusion MRI. We adapted GWASs to screen the brain's connectivity pattern, allowing us to discover genetic variants that affect the human brain's wiring. The association of connectivity with the SPON1 variant at rs2618516 on chromosome 11 (11p15.2) reached connectome-wide, genome-wide significance after stringent statistical corrections were enforced, and it was replicated in an independent subsample. rs2618516 was shown to affect brain structure in an elderly population with varying degrees of dementia. Older people who carried the connectivity variant had significantly milder clinical dementia scores and lower risk of Alzheimer's disease. As a posthoc analysis, we conducted GWASs on several organizational and topological network measures derived from the matrices to discover variants in and around genes associated with autism (MACROD2), development (NEDD4), and mental retardation (UBE2A) significantly associated with connectivity. Connectome-wide, genome-wide screening offers substantial promise to discover genes affecting brain connectivity and risk for brain diseases.

  6. Genome-Wide Variation Patterns Uncover the Origin and Selection in Cultivated Ginseng (Panax ginseng Meyer).

    PubMed

    Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili; Liu, Bao; Li, Lin-Feng

    2017-09-01

    Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses identified diverse bioactive components from Chinese ginseng. However, basic biological attributes including domestication and selection of the ginseng plant remain under-investigated. Here, we presented a genome-wide view of the domestication and selection of cultivated ginseng based on the whole genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of the 30 wild and cultivated ginseng accessions. In complement, the 45s rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggested that the SCG and NCG groups have distinct demographic histories. Candidate genes identified are useful for future molecular breeding of cultivated ginseng. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Identification of high-efficiency 3'GG gRNA motifs in indexed FASTA files with ngg2.

    PubMed

    Roberson, Elisha D O

    CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3'GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, the report's authors highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3'GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3'GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3'GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3'GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3'GG editing sites in any species with an available genome sequence.
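
    The kind of motif search described here can be sketched with a regular expression. The sketch does not reproduce ngg2's code or command-line interface. It assumes a 20-nt protospacer whose last two bases are GG, followed immediately by an NGG PAM, and it scans both strands of an in-memory sequence; the function names and the output tuple layout are hypothetical.

```python
import re

# Lookahead so that overlapping sites are all reported:
# group 1 = 20-nt protospacer ending in GG, group 2 = NGG PAM.
_SITE = re.compile(r"(?=([ACGT]{18}GG)([ACGT]GG))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_SITE_LENGTH = 23  # 20-nt protospacer + 3-nt PAM


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (N is left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_3gg_sites(seq):
    """Return (start, strand, protospacer, pam) tuples for 3'GG gRNA sites.

    start is the 0-based forward-strand coordinate of the leftmost base of
    the 23-nt site, whichever strand the site lies on.
    """
    seq = seq.upper()
    sites = []
    for match in _SITE.finditer(seq):
        sites.append((match.start(), "+", match.group(1), match.group(2)))
    for match in _SITE.finditer(reverse_complement(seq)):
        start = len(seq) - (match.start() + _SITE_LENGTH)
        sites.append((start, "-", match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    example = "TT" + "A" * 18 + "GG" + "TGG" + "CC"
    for site in find_3gg_sites(example):
        print(site)
```

    A genome-scale tool would additionally stream sequences from an indexed FASTA file and check each protospacer for uniqueness, which this sketch leaves out.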

  8. CERAMIC: Case-Control Association Testing in Samples with Related Individuals, Based on Retrospective Mixed Model Analysis with Adjustment for Covariates

    PubMed Central

    Zhong, Sheng; McPeek, Mary Sara

    2016-01-01

    We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan. PMID:27695091

  9. Defining the role of the MADS-box gene, Zea agamous like1, in maize domestication

    USDA-ARS's Scientific Manuscript database

    Genomic scans for genes that show the signature of past selection have been widely applied to a number of species and have identified a large number of selection candidate genes. In cultivated maize (Zea mays ssp. mays) selection scans have identified several hundred candidate domestication genes...

  10. Mutation Scanning in Wheat by Exon Capture and Next-Generation Sequencing.

    PubMed

    King, Robert; Bird, Nicholas; Ramirez-Gonzalez, Ricardo; Coghill, Jane A; Patil, Archana; Hassani-Pak, Keywan; Uauy, Cristobal; Phillips, Andrew L

    2015-01-01

    Targeted Induced Local Lesions in Genomes (TILLING) is a reverse genetics approach to identify novel sequence variation in genomes, with the aims of investigating gene function and/or developing useful alleles for breeding. Despite recent advances in wheat genomics, most current TILLING methods are low to medium in throughput, being based on PCR amplification of the target genes. We performed a pilot-scale evaluation of TILLING in wheat by next-generation sequencing through exon capture. An oligonucleotide-based enrichment array covering ~2 Mbp of wheat coding sequence was used to carry out exon capture and sequencing on three mutagenised lines of wheat containing previously-identified mutations in the TaGA20ox1 homoeologous genes. After testing different mapping algorithms and settings, candidate SNPs were identified by mapping to the IWGSC wheat Chromosome Survey Sequences. Where sequence data for all three homoeologues were found in the reference, mutant calls were unambiguous; however, where the reference lacked one or two of the homoeologues, captured reads from these genes were mis-mapped to other homoeologues, resulting either in dilution of the variant allele frequency or assignment of mutations to the wrong homoeologue. Competitive PCR assays were used to validate the putative SNPs and estimate cut-off levels for SNP filtering. At least 464 high-confidence SNPs were detected across the three mutagenized lines, including the three known alleles in TaGA20ox1, indicating a mutation rate of ~35 SNPs per Mb, similar to that estimated by PCR-based TILLING. This demonstrates the feasibility of using exon capture for genome re-sequencing as a method of mutation detection in polyploid wheat, but accurate mutation calling will require an improved genomic reference with more comprehensive coverage of homoeologues.

  11. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds.

    PubMed

    Yang, Songbai; Li, Xiuling; Li, Kui; Fan, Bin; Tang, Zhonglin

    2014-01-15

    Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implicated in economic traits between Chinese indigenous and commercial pigs have been poorly understood. We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. In this study, we first identified the genomic regions that showed evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs.
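
    The outlier approach described above, per-SNP Fst followed by a cut on the empirical genome-wide distribution, can be sketched as follows. The abstract does not state which Fst estimator or cutoff was used, so the heterozygosity-based estimator with equally weighted populations, the 1% default tail, and the function names are all assumptions made for illustration.

```python
import numpy as np


def per_snp_fst(freqs):
    """Per-SNP Fst from a (n_populations, n_snps) allele-frequency array.

    Uses Fst = (H_T - H_S) / H_T with equally weighted populations, where
    H_T is the expected heterozygosity at the mean frequency and H_S is the
    mean within-population expected heterozygosity.
    """
    freqs = np.asarray(freqs, dtype=float)
    p_bar = freqs.mean(axis=0)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = (2.0 * freqs * (1.0 - freqs)).mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # SNPs that are monomorphic overall get Fst = 0 by convention here.
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)


def high_fst_outliers(fst, top_fraction=0.01):
    """Indices of SNPs in the upper tail of the empirical Fst distribution."""
    cutoff = np.quantile(fst, 1.0 - top_fraction)
    return np.flatnonzero(fst >= cutoff), cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical data: 9 breeds, 1,000 SNPs with modest drift around a base frequency.
    base = rng.uniform(0.1, 0.9, 1000)
    freqs = np.clip(base + rng.normal(0.0, 0.05, (9, 1000)), 0.0, 1.0)
    fst = per_snp_fst(freqs)
    idx, cutoff = high_fst_outliers(fst)
    print(len(idx), "outlier SNPs above Fst cutoff", round(float(cutoff), 3))
```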

  12. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds

    PubMed Central

    2014-01-01

    Background: Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implicated in economic traits between Chinese indigenous and commercial pigs have been poorly understood. Results: We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions: In this study, we first identified the genomic regions that showed evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs. PMID:24422716

  13. Genome-wide and fine-resolution association analysis of malaria in West Africa.

    PubMed

    Jallow, Muminatou; Teo, Yik Ying; Small, Kerrin S; Rockett, Kirk A; Deloukas, Panos; Clark, Taane G; Kivinen, Katja; Bojang, Kalifa A; Conway, David J; Pinder, Margaret; Sirugo, Giorgio; Sisay-Joof, Fatou; Usen, Stanley; Auburn, Sarah; Bumpstead, Suzannah J; Campino, Susana; Coffey, Alison; Dunham, Andrew; Fry, Andrew E; Green, Angela; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Jeffreys, Anna E; Mendy, Alieu; Palotie, Aarno; Potter, Simon; Ragoussis, Jiannis; Rogers, Jane; Rowlands, Kate; Somaskantharajah, Elilan; Whittaker, Pamela; Widden, Claire; Donnelly, Peter; Howie, Bryan; Marchini, Jonathan; Morris, Andrew; SanJoaquin, Miguel; Achidi, Eric Akum; Agbenyega, Tsiri; Allen, Angela; Amodu, Olukemi; Corran, Patrick; Djimde, Abdoulaye; Dolo, Amagana; Doumbo, Ogobara K; Drakeley, Chris; Dunstan, Sarah; Evans, Jennifer; Farrar, Jeremy; Fernando, Deepika; Hien, Tran Tinh; Horstmann, Rolf D; Ibrahim, Muntaser; Karunaweera, Nadira; Kokwaro, Gilbert; Koram, Kwadwo A; Lemnge, Martha; Makani, Julie; Marsh, Kevin; Michon, Pascal; Modiano, David; Molyneux, Malcolm E; Mueller, Ivo; Parker, Michael; Peshu, Norbert; Plowe, Christopher V; Puijalon, Odile; Reeder, John; Reyburn, Hugh; Riley, Eleanor M; Sakuntabhai, Anavaj; Singhasivanon, Pratap; Sirima, Sodiomon; Tall, Adama; Taylor, Terrie E; Thera, Mahamadou; Troye-Blomberg, Marita; Williams, Thomas N; Wilson, Michael; Kwiatkowski, Dominic P

    2009-06-01

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^-7 to P = 4 × 10^-14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

  14. Applications of CRISPR/Cas System to Bacterial Metabolic Engineering.

    PubMed

    Cho, Suhyung; Shin, Jongoh; Cho, Byung-Kwan

    2018-04-05

    The clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) adaptive immune system has been extensively used for gene editing, including gene deletion, insertion, and replacement in bacterial and eukaryotic cells owing to its simple, rapid, and efficient activities in unprecedented resolution. Furthermore, the CRISPR interference (CRISPRi) system including deactivated Cas9 (dCas9) with inactivated endonuclease activity has been further investigated for regulation of the target gene transiently or constitutively, avoiding cell death by disruption of genome. This review discusses the applications of CRISPR/Cas for genome editing in various bacterial systems and their applications. In particular, CRISPR technology has been used for the production of metabolites of high industrial significance, including biochemical, biofuel, and pharmaceutical products/precursors in bacteria. Here, we focus on methods to increase the productivity and yield/titer scan by controlling metabolic flux through individual or combinatorial use of CRISPR/Cas and CRISPRi systems with introduction of synthetic pathway in industrially common bacteria including Escherichia coli. Further, we discuss additional useful applications of the CRISPR/Cas system, including its use in functional genomics.

  15. A Genome-Wide Scan of Selective Sweeps and Association Mapping of Fruit Traits Using Microsatellite Markers in Watermelon

    PubMed Central

    Reddy, Umesh K.; Abburi, Lavanya; Abburi, Venkata Lakshmi; Saminathan, Thangasamy; Cantrell, Robert; Vajja, Venkata Gopinath; Reddy, Rishi; Tomason, Yan R.; Levi, Amnon; Wehner, Todd C.; Nimmakayala, Padma

    2015-01-01

    Our genetic diversity study uses microsatellites of known map position to estimate genome level population structure and linkage disequilibrium, and to identify genomic regions that have undergone selection during watermelon domestication and improvement. Thirty regions that showed evidence of selective sweep were scanned for the presence of candidate genes using the watermelon genome browser (www.icugi.org). We localized selective sweeps in intergenic regions, close to the promoters, and within the exons and introns of various genes. This study provided evidence of convergent evolution for the presence of diverse ecotypes with special reference to American and European ecotypes. Our search for location of linked markers in the whole-genome draft sequence revealed that BVWS00358, a GA repeat microsatellite, is the GAGA type transcription factor located in the 5′ untranslated regions of a structure and insertion element that expresses a Cys2His2 Zinc finger motif, with presumed biological processes related to chitin response and transcriptional regulation. In addition, BVWS01708, an ATT repeat microsatellite, located in the promoter of a DTW domain-containing protein (Cla002761); and 2 other simple sequence repeats that association mapping link to fruit length and rind thickness. PMID:25425675

  16. RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry.

    PubMed

    Horesh, Yair; Wexler, Ydo; Lebenthal, Ilana; Ziv-Ukelson, Michal; Unger, Ron

    2009-03-04

    Scanning large genomes with a sliding window in search of locally stable RNA structures is a well motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL^3) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL^2) solution for this problem has been described. Here, we describe and implement an O(NL psi(L)) engine for the consecutive windows folding problem, where psi(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed. Using this tool, we note an intriguing directionality (5'-3' vs. 3'-5') folding bias, i.e. that the minimal free energy (MFE) of folding is higher in the native direction of the DNA than in the reverse direction of various genomic regions in several organisms including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias which affects the MFE, however we see some variations in the folding bias in the different genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of a mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome. The efficient consecutive windows folding engine described in this paper allows for genome wide scans for ncRNA molecules as well as large-scale statistics. This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.
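
    The naive baseline mentioned in this abstract (fold every window independently with a cubic-time algorithm) can be sketched as follows. This is only an illustration and not the RNAslider engine: as a simplifying assumption, the score is the maximum number of nested base pairs (a Nussinov-style recurrence) rather than a thermodynamic MFE, and the names max_pairs and scan_windows are hypothetical.

```python
# Assumption: base-pair maximization stands in for a real MFE energy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Nussinov-style O(L^3) dynamic program for one window.

    Returns the maximum number of nested base pairs, requiring at least
    `min_loop` unpaired bases inside every hairpin.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def scan_windows(seq, window):
    """Score every window start position independently (the naive scan)."""
    return [max_pairs(seq[s:s + window]) for s in range(len(seq) - window + 1)]


if __name__ == "__main__":
    print(scan_windows("GGGAAACCCA", 9))
```

    Each window is recomputed from scratch here, which is exactly the redundancy that faster consecutive-window engines avoid by reusing work between overlapping windows.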

  17. Identification and Characterization of Genomic Amplifications in Ovarian Serous Carcinoma

    DTIC Science & Technology

    2006-01-01

    Wang (2005) Exploring cancer genome using innovative technologies. Curr Opin Oncol, 17:33-38. • G Singer, R Stohr, L Cope, R Dehari, A Hartmann, D -F...tions/plate × 6 plates/ d ). This high-throughput platform permits a systemic scan of cancer genome at the nucleotide level in a short time [35]. This...Carter D, Foellmer HG, et al.: Neu proto-oncogene amplification and expression in ovarian adenocarcinoma cell lines. Am J Pathol 1992, 140:23–31. 12

  18. Lessons learned from the dog genome.

    PubMed

    Wayne, Robert K; Ostrander, Elaine A

    2007-11-01

    Extensive genetic resources and a high-quality genome sequence position the dog as an important model species for understanding genome evolution, population genetics and genes underlying complex phenotypic traits. Newly developed genomic resources have expanded our understanding of canine evolutionary history and dog origins. Domestication involved genetic contributions from multiple populations of gray wolves probably through backcrossing. More recently, the advent of controlled breeding practices has segregated genetic variability into distinct dog breeds that possess specific phenotypic traits. Consequently, genome-wide association and selective sweep scans now allow the discovery of genes underlying breed-specific characteristics. The dog is finally emerging as a novel resource for studying the genetic basis of complex traits, including behavior.

  19. Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle

    PubMed Central

    2011-01-01

    Background: 'Selection signatures' delimit regions of the genome that are, or have been, functionally important and have therefore been under either natural or artificial selection. In this study, two different and complementary methods--integrated Haplotype Homozygosity Score (|iHS|) and population differentiation index (FST)--were applied to identify traces of decades of intensive artificial selection for traits of economic importance in modern cattle. Results: We scanned the genome of a diverse set of dairy and beef breeds from Germany, Canada and Australia genotyped with a 50 K SNP panel. Across breeds, a total of 109 extreme |iHS| values exceeded the empirical threshold level of 5% with 19, 27, 9, 10 and 17 outliers in Holstein, Brown Swiss, Australian Angus, Hereford and Simmental, respectively. Annotating the regions harboring clustered |iHS| signals revealed a panel of interesting candidate genes like SPATA17, MGAT1, PGRMC2 and ACTC1, COL23A1, MATN2, respectively, in the context of reproduction and muscle formation. In a further step, a new Bayesian FST-based approach was applied with a set of geographically separated populations including Holstein, Brown Swiss, Simmental, North American Angus and Piedmontese for detecting differentiated loci. In total, 127 regions exceeding the 2.5 per cent threshold of the empirical posterior distribution were identified as extremely differentiated. In a substantial number (56 out of 127 cases) the extreme FST values were found to be positioned in poor gene content regions which deviated significantly (p < 0.05) from the expectation assuming a random distribution. However, significant FST values were found in regions of some relevant genes such as SMCP and FGF1. Conclusions: Overall, 236 regions putatively subject to recent positive selection in the cattle genome were detected. Both |iHS| and FST suggested selection in the vicinity of the Sialic acid binding Ig-like lectin 5 gene on BTA18. This region was recently reported to be a major QTL with strong effects on productive life and fertility traits in Holstein cattle. We conclude that high-resolution genome scans of selection signatures can be used to identify genomic regions contributing to within- and inter-breed phenotypic variation. PMID:21679429
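
    For orientation, the basic idea behind an FST outlier scan can be sketched as below. This is not the Bayesian FST approach used in the study: it assumes the textbook heterozygosity-based estimator FST = (HT - HS) / HT for a biallelic SNP with equally weighted populations, and the function names and the top-fraction cutoff rule are illustrative assumptions.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST for one biallelic SNP.

    `freqs` holds the frequency of one allele in each population;
    populations are weighted equally (a simplifying assumption).
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def empirical_outliers(scores, fraction):
    """Indices of the SNPs whose score falls in the top `fraction`."""
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_top])


if __name__ == "__main__":
    # Hypothetical allele frequencies for four SNPs in two breeds.
    snps = [[0.5, 0.5], [0.2, 0.8], [0.0, 1.0], [0.4, 0.6]]
    scores = [fst_biallelic(f) for f in snps]
    print(scores, empirical_outliers(scores, 0.25))
```

    An empirical cutoff of this kind only ranks loci relative to the rest of the genome; it does not by itself give a significance level.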

  20. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features

    PubMed Central

    Bakas, Spyridon; Akbari, Hamed; Sotiras, Aristeidis; Bilello, Michel; Rozycki, Martin; Kirby, Justin S.; Freymann, John B.; Farahani, Keyvan; Davatzikos, Christos

    2017-01-01

    Gliomas belong to a group of central nervous system tumors, and consist of various sub-regions. Gold standard labeling of these sub-regions in radiographic imaging is essential for both clinical and computational studies, including radiomic and radiogenomic analyses. Towards this end, we release segmentation labels and radiomic features for all pre-operative multimodal magnetic resonance imaging (MRI) (n=243) of the multi-institutional glioma collections of The Cancer Genome Atlas (TCGA), publicly available in The Cancer Imaging Archive (TCIA). Pre-operative scans were identified in both glioblastoma (TCGA-GBM, n=135) and low-grade-glioma (TCGA-LGG, n=108) collections via radiological assessment. The glioma sub-region labels were produced by an automated state-of-the-art method and manually revised by an expert board-certified neuroradiologist. An extensive panel of radiomic features was extracted based on the manually-revised labels. This set of labels and features should enable i) direct utilization of the TCGA/TCIA glioma collections towards repeatable, reproducible and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments, as well as ii) performance evaluation of computer-aided segmentation methods, and comparison to our state-of-the-art method. PMID:28872634

  1. Heritable DNA methylation marks associated with susceptibility to breast cancer.

    PubMed

    Joo, Jihoon E; Dowty, James G; Milne, Roger L; Wong, Ee Ming; Dugué, Pierre-Antoine; English, Dallas; Hopper, John L; Goldgar, David E; Giles, Graham G; Southey, Melissa C

    2018-02-28

    Mendelian-like inheritance of germline DNA methylation in cancer susceptibility genes has been previously reported. We aimed to scan the genome for heritable methylation marks associated with breast cancer susceptibility by studying 25 Australian multiple-case breast cancer families. Here we report genome-wide DNA methylation measured in 210 peripheral blood DNA samples provided by family members using the Infinium HumanMethylation450. We develop and apply a new statistical method to identify heritable methylation marks based on complex segregation analysis. We estimate carrier probabilities for the 1000 most heritable methylation marks based on family structure, and we use Cox proportional hazards survival analysis to identify 24 methylation marks with corresponding carrier probabilities significantly associated with breast cancer. We replicate an association with breast cancer risk for four of the 24 marks using an independent nested case-control study. Here, we report a novel approach for identifying heritable DNA methylation marks associated with breast cancer risk.

  2. Genic rather than genome-wide differences between sexually deceptive Ophrys orchids with different pollinators.

    PubMed

    Sedeek, Khalid E M; Scopece, Giovanni; Staedler, Yannick M; Schönenberger, Jürg; Cozzolino, Salvatore; Schiestl, Florian P; Schlüter, Philipp M

    2014-12-01

    High pollinator specificity and the potential for simple genetic changes to affect pollinator attraction make sexually deceptive orchids an ideal system for the study of ecological speciation, in which change of flower odour is likely important. This study surveys reproductive barriers and differences in floral phenotypes in a group of four closely related, coflowering sympatric Ophrys species and uses a genotyping-by-sequencing (GBS) approach to obtain information on the proportion of the genome that is differentiated between species. Ophrys species were found to effectively lack postpollination barriers, but are strongly isolated by their different pollinators (floral isolation) and, to a smaller extent, by shifts in flowering time (temporal isolation). Although flower morphology and perhaps labellum coloration may contribute to floral isolation, reproductive barriers may largely be due to differences in flower odour chemistry. GBS revealed shared polymorphism throughout the Ophrys genome, with very little population structure between species. Genome scans for FST outliers identified few markers that are highly differentiated between species and repeatable in several populations. These genome scans also revealed highly differentiated polymorphisms in genes with putative involvement in floral odour production, including a previously identified candidate gene thought to be involved in the biosynthesis of pseudo-pheromones by the orchid flowers. Taken together, these data suggest that ecological speciation associated with different pollinators in sexually deceptive orchids has a genic rather than a genomic basis, placing these species at an early phase of genomic divergence within the 'speciation continuum'. © 2014 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.

  3. [Improvement of butanol production by Escherichia coli via Tn5 transposon mediated mutagenesis].

    PubMed

    Lin, Zhao; Dong, Hongjun; Li, Yin

    2015-12-01

    To engineer an efficient butanol-producing Escherichia coli strain, much effort has been devoted to known genes and pathways based on current knowledge. However, many other genes in the genome could also contribute to butanol production in unexpected ways. In this work, we used the Tn5 transposon to construct a mutant library of 1,196 strains in a previously engineered butanol-producing E. coli strain. To screen for strains with improved butanol titer, we developed a high-throughput method for pyruvate detection based on the dinitrophenylhydrazine reaction using a 96-well microplate reader, because pyruvate is the precursor of butanol and its concentration is inversely correlated with butanol in the fermentation broth. Using this method, we successfully identified three mutants with increased butanol titer. Inverse PCR and sequencing showed that the Tn5 insertion sites were in the ORFs of pykA, tdk, and cadC. These genes could be efficient targets for further strain improvement, and the genome scanning strategy described here should be helpful for constructing other microbial cell factories.

  4. Schizophrenia proteomics: biomarkers on the path to laboratory medicine?

    PubMed Central

    Lakhan, Shaheen Emmanuel

    2006-01-01

    Over two million Americans are afflicted with schizophrenia, a debilitating mental health disorder with a unique symptomatic and epidemiological profile. Genomics studies have hinted towards candidate schizophrenia susceptibility chromosomal loci and genes. Modern proteomic tools, particularly mass spectrometry and expression scanning, aim to identify both pathogenic-revealing and diagnostically significant biomarkers. Only a few studies on basic proteomics have been conducted for psychiatric disorders relative to the plethora of cancer specific experiments. One such proteomic utility enables the discovery of proteins and biological marker fingerprinting profiling techniques (SELDI-TOF-MS), and then subjects them to tandem mass spectrometric fragmentation and de novo protein sequencing (MALDI-TOF/TOF-MS) for the accurate identification and characterization of the proteins. Such utilities can explain the pathogenesis of neuro-psychiatric disease, provide more objective testing methods, and further demonstrate a biological basis to mental illness. Although clinical proteomics in schizophrenia have yet to reveal a biomarker with diagnostic specificity, methods that better characterize the disorder using endophenotypes can advance findings. Schizophrenia biomarkers could potentially revolutionize its psychopharmacology, changing it into a more hypothesis and genomic/proteomic-driven science. PMID:16846510

  5. Atomic force microscopy of biological samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doktycz, Mitchel John

    2010-01-01

    The ability to evaluate structural-functional relationships in real time has allowed scanning probe microscopy (SPM) to assume a prominent role in post genomic biological research. In this mini-review, we highlight the development of imaging and ancillary techniques that have allowed SPM to permeate many key areas of contemporary research. We begin by examining the invention of the scanning tunneling microscope (STM) by Binnig and Rohrer in 1982 and discuss how it served to team biologists with physicists to integrate high-resolution microscopy into biological science. We point to the problems of imaging nonconductive biological samples with the STM and relate how this led to the evolution of the atomic force microscope (AFM) developed by Binnig, Quate, and Gerber, in 1986. Commercialization in the late 1980s established SPM as a powerful research tool in the biological research community. Contact mode AFM imaging was soon complemented by the development of non-contact imaging modes. These non-contact modes eventually became the primary focus for further new applications including the development of fast scanning methods. The extreme sensitivity of the AFM cantilever was recognized and has been developed into applications for measuring forces required for indenting biological surfaces and breaking bonds between biomolecules. Further functional augmentation to the cantilever tip allowed development of new and emerging techniques including scanning ion-conductance microscopy (SICM), scanning electrochemical microscope (SECM), Kelvin force microscopy (KFM) and scanning near field ultrasonic holography (SNFUH).

  6. In Search of Genes Associated with Risk for Psychopathic Tendencies in Children: A Two-Stage Genome-Wide Association Study of Pooled DNA

    ERIC Educational Resources Information Center

    Viding, Essi; Hanscombe, Ken B.; Curtis, Charles J. C.; Davis, Oliver S. P.; Meaburn, Emma L.; Plomin, Robert

    2010-01-01

    Background: Quantitative genetic data from our group indicates that antisocial behaviour (AB) is strongly heritable when coupled with psychopathic, callous-unemotional (CU) personality traits. We have also demonstrated that the genetic influences for AB and CU overlap considerably. We conducted a genome-wide association scan that capitalises on…

  7. Genome-wide scan for seed composition provides insights into the improvement of soybean quality and the impacts of domestication and modern breeding

    USDA-ARS's Scientific Manuscript database

    Soybean (Glycine max (L.) Merrill) is a world-widely grown major crop rich in both protein and oil. Improvement of seed nutrients has long been one of the most important breeding objectives in soybean. To better understand the genetic architecture of the traits for improvement, we conducted genome-w...

  8. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
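
    Matrix-based genome scanning of the kind described here can be illustrated with a small position weight matrix scan. The sketch below is not RSAT code and does not reproduce its programs or options: it assumes a log-odds matrix built from hypothetical site counts against a Bernoulli (independent-base) background, and all function names are made up for the example.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background, pseudo=1.0):
    """Convert per-column base counts to log-odds weights.

    `counts` is a list of dicts (one per motif column); the pseudocount is
    distributed according to the background frequencies.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudo
        weights = {}
        for b in BASES:
            freq = (column.get(b, 0) + pseudo * background[b]) / total
            weights[b] = math.log(freq / background[b])
        matrix.append(weights)
    return matrix


def scan(seq, matrix, threshold):
    """Slide the matrix along `seq`; return (start, score) for each hit."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    # Hypothetical counts from 10 aligned sites of a TATA-like motif.
    counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
    pwm = log_odds_matrix(counts, uniform)
    print(scan("GGTATAGG", pwm, threshold=4.0))
```

    A fixed score threshold is used here for brevity; the abstract describes P-value estimation against richer background models, which this sketch does not attempt.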

  9. RSAT 2018: regulatory sequence analysis tools 20th anniversary.

    PubMed

    Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

    2018-05-02

    RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  10. Revealing phenotype-associated functional differences by genome-wide scan of ancient haplotype blocks

    PubMed Central

    Onuki, Ritsuko; Yamaguchi, Rui; Shibuya, Tetsuo; Kanehisa, Minoru; Goto, Susumu

    2017-01-01

    Genome-wide scans for positive selection have become important for genomic medicine, and many studies aim to find genomic regions affected by positive selection that are associated with risk allele variations among populations. Most such studies are designed to detect recent positive selection. However, we hypothesize that ancient positive selection is also important for adaptation to pathogens, and has affected current immune-mediated common diseases. Based on this hypothesis, we developed a novel linkage disequilibrium-based pipeline, which aims to detect regions associated with ancient positive selection across populations from single nucleotide polymorphism (SNP) data. By applying this pipeline to the genotypes in the International HapMap project database, we show that genes in the detected regions are enriched in pathways related to the immune system and infectious diseases. The detected regions also contain SNPs reported to be associated with cancers and metabolic diseases, obesity-related traits, type 2 diabetes, and allergic sensitization. These SNPs were further mapped to biological pathways to determine the associations between phenotypes and molecular functions. Assessments of candidate regions to identify functions associated with variations in incidence rates of these diseases are needed in the future. PMID:28445522

  11. Recent history of artificial outcrossing facilitates whole-genome association mapping in elite inbred crop varieties

    PubMed Central

    Rostoks, Nils; Ramsay, Luke; MacKenzie, Katrin; Cardle, Linda; Bhat, Prasanna R.; Roose, Mikeal L.; Svensson, Jan T.; Stein, Nils; Varshney, Rajeev K.; Marshall, David F.; Graner, Andreas; Close, Timothy J.; Waugh, Robbie

    2006-01-01

    Genomewide association studies depend on the extent of linkage disequilibrium (LD), the number and distribution of markers, and the underlying structure in populations under study. Outbreeding species generally exhibit limited LD, and consequently, a very large number of markers are required for effective whole-genome association genetic scans. In contrast, several of the world's major food crops are self-fertilizing inbreeding species with narrow genetic bases and theoretically extensive LD. Together these are predicted to result in a combination of low resolution and a high frequency of spurious associations in LD-based studies. However, inbred elite plant varieties represent a unique human-induced pseudooutbreeding population that has been subjected to strong selection for advantageous alleles. By assaying 1,524 genomewide SNPs we demonstrate that, after accounting for population substructure, the level of LD exhibited in elite northwest European barley, a typical inbred cereal crop, can be effectively exploited to map traits by using whole-genome association scans with several hundred to thousands of biallelic SNPs. PMID:17085595
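
    Since the argument in this abstract turns on the extent of LD between markers, a minimal sketch of the usual pairwise measure may help. It assumes the standard r^2 statistic computed from haplotypes coded 0/1 (for fully inbred lines, one homozygous genotype per line can be treated as a haplotype); the function name and the example data are illustrative and not taken from the study.

```python
def ld_r2(hap_a, hap_b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    `hap_a` and `hap_b` are equal-length lists of 0/1 alleles, one entry
    per haplotype (or per inbred line).
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    return d * d / denom


if __name__ == "__main__":
    snp1 = [0, 0, 1, 1]
    snp2 = [0, 0, 1, 1]  # perfectly correlated with snp1
    snp3 = [0, 1, 0, 1]  # independent of snp1 in this sample
    print(ld_r2(snp1, snp2), ld_r2(snp1, snp3))
```

    How quickly r^2 decays with distance between markers determines how many SNPs a whole-genome association scan needs, which is the trade-off the abstract discusses.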

  12. Simultaneous mutation detection of three homoeologous genes in wheat by High Resolution Melting analysis and Mutation Surveyor.

    PubMed

    Dong, Chongmei; Vincent, Kate; Sharp, Peter

    2009-12-04

    TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful tool for reverse genetics, combining traditional chemical mutagenesis with high-throughput PCR-based mutation detection to discover induced mutations that alter protein function. The most popular mutation detection method for TILLING is a mismatch cleavage assay using the endonuclease CelI. For this method, locus-specific PCR is essential. Most wheat genes are present as three similar sequences with high homology in exons and low homology in introns. Locus-specific primers can usually be designed in introns. However, it is sometimes difficult to design locus-specific PCR primers in a conserved region with high homology among the three homoeologous genes, or in a gene lacking introns, or if information on introns is not available. Here we describe a mutation detection method which combines High Resolution Melting (HRM) analysis of mixed PCR amplicons containing three homoeologous gene fragments and sequence analysis using Mutation Surveyor software, aimed at simultaneous detection of mutations in three homoeologous genes. We demonstrate that High Resolution Melting (HRM) analysis can be used in mutation scans in mixed PCR amplicons containing three homoeologous gene fragments. Combining HRM scanning with sequence analysis using Mutation Surveyor is sensitive enough to detect a single nucleotide mutation in the heterozygous state in a mixed PCR amplicon containing three homoeoloci. The method was tested and validated in an EMS (ethylmethane sulfonate)-treated wheat TILLING population, screening mutations in the carboxyl terminal domain of the Starch Synthase II (SSII) gene. Selected identified mutations of interest can be further analysed by cloning to confirm the mutation and determine the genomic origin of the mutation. Polyploidy is common in plants. Conserved regions of a gene often represent functional domains and have high sequence similarity between homoeologous loci. The method described here is a useful alternative to locus-specific based methods for screening mutations in conserved functional domains of homoeologous genes. This method can also be used for SNP (single nucleotide polymorphism) marker development and eco-TILLING in polyploid species.

  13. Fluorescence-labeled methylation-sensitive amplified fragment length polymorphism (FL-MS-AFLP) analysis for quantitative determination of DNA methylation and demethylation status.

    PubMed

    Kageyama, Shinji; Shinmura, Kazuya; Yamamoto, Hiroko; Goto, Masanori; Suzuki, Koichi; Tanioka, Fumihiko; Tsuneyoshi, Toshihiro; Sugimura, Haruhiko

    2008-04-01

    The PCR-based DNA fingerprinting method called the methylation-sensitive amplified fragment length polymorphism (MS-AFLP) analysis is used for genome-wide scanning of methylation status. In this study, we developed a method of fluorescence-labeled MS-AFLP (FL-MS-AFLP) analysis by applying a fluorescence-labeled primer and fluorescence-detecting electrophoresis apparatus to the existing method of MS-AFLP analysis. The FL-MS-AFLP analysis enables quantitative evaluation of more than 350 random CpG loci per run. It was shown to allow evaluation of the differences in methylation level of blood DNA of gastric cancer patients and evaluation of hypermethylation and hypomethylation in DNA from gastric cancer tissue in comparison with adjacent non-cancerous tissue.

  14. Revealing the first uridyl peptide antibiotic biosynthetic gene cluster and probing pacidamycin biosynthesis.

    PubMed

    Rackham, Emma J; Grüschow, Sabine; Goss, Rebecca J M

    2011-01-01

    There is an urgent need for new antibiotics with resistance continuing to emerge toward existing classes. The pacidamycin antibiotics possess a novel scaffold and exhibit unexploited bioactivity rendering them attractive research targets. We recently reported the first identification of a biosynthetic cluster encoding uridyl peptide antibiotic assembly and the engineering of pacidamycin biosynthesis into a heterologous host. We report here our methods toward identifying the biosynthetic cluster. Our initial experiments employed conventional methods of probing a cosmid library using PCR and Southern blotting; however, it became necessary to adopt a state-of-the-art genome scanning and in silico hybridization approach to pinpoint the cluster. Here we describe our "real" and "virtual" probing methods and contrast the benefits and pitfalls of each approach.

  15. DETECTING SELECTION IN NATURAL POPULATIONS: MAKING SENSE OF GENOME SCANS AND TOWARDS ALTERNATIVE SOLUTIONS

    PubMed Central

    Haasl, Ryan J.; Payseur, Bret A.

    2016-01-01

    Genomewide scans for natural selection (GWSS) have become increasingly common over the last 15 years due to increased availability of genome-scale genetic data. Here, we report a representative survey of GWSS from 1999 to present and find that (i) between 1999 and 2009, 35 of 49 (71%) GWSS focused on human, while from 2010 to present, only 38 of 83 (46%) of GWSS focused on human, indicating increased focus on nonmodel organisms; (ii) the large majority of GWSS incorporate interpopulation or interspecific comparisons using, for example FST, cross-population extended haplotype homozygosity or the ratio of nonsynonymous to synonymous substitutions; (iii) most GWSS focus on detection of directional selection rather than other modes such as balancing selection; and (iv) in human GWSS, there is a clear shift after 2004 from microsatellite markers to dense SNP data. A survey of GWSS meant to identify loci positively selected in response to severe hypoxic conditions supports an approach to GWSS in which a list of a priori candidate genes based on potential selective pressures is used to filter the list of significant hits a posteriori. We also discuss four frequently ignored determinants of genomic heterogeneity that complicate GWSS: mutation, recombination, selection and the genetic architecture of adaptive traits. We recommend that GWSS methodology should better incorporate aspects of genomewide heterogeneity using empirical estimates of relevant parameters and/or realistic, whole-chromosome simulations to improve interpretation of GWSS results. Finally, we argue that knowledge of potential selective agents improves interpretation of GWSS results and that new methods focused on correlations between environmental variables and genetic variation can help automate this approach. PMID:26224644

  16. Hush puppy: a new mouse mutant with pinna, ossicle, and inner ear defects.

    PubMed

    Pau, Henry; Fuchs, Helmut; de Angelis, Martin Hrabé; Steel, Karen P

    2005-01-01

    Deafness can be associated with abnormalities of the pinna, ossicles, and cochlea. The authors studied a newly generated mouse mutant with pinna defects and asked whether these defects are associated with peripheral auditory or facial skeletal abnormalities, or both. Furthermore, the authors investigated where the mutation responsible for these defects was located in the mouse genome. The hearing of hush puppy mutants was assessed by Preyer reflex and electrophysiological measurement. The morphological features of their middle and inner ears were investigated by microdissection, paint-filling of the labyrinth, and scanning electron microscopy. Skeletal staining of skulls was performed to assess the craniofacial dimensions. Genome scanning was performed using microsatellite markers to localize the mutation to a chromosomal region. Some hush puppy mutants showed early onset of hearing impairment. They had small, bat-like pinnae and normal malleus but abnormal incus and stapes. Some mutants had asymmetrical defects and showed reduced penetrance of the ear abnormalities. Paint-filling of newborns' inner ears revealed no morphological abnormality, although half of the mice studied were expected to carry the mutation. Reduced numbers of outer hair cells were demonstrated in mutants' cochlea on scanning electron microscopy. Skeletal staining showed that the mutants have significantly shorter snouts and mandibles. Genome scan revealed that the mutation lies on chromosome 8 between markers D8Mit58 and D8Mit289. The study results indicate developmental problems of the first and second branchial arches and otocyst as a result of a single gene mutation. Similar defects are found in humans, and hush puppy provides a mouse model for investigation of such defects.

  17. A Meta-Assembly of Selection Signatures in Cattle

    PubMed Central

    Randhawa, Imtiaz A. S.; Khatkar, Mehar S.; Thomson, Peter C.; Raadsma, Herman W.

    2016-01-01

    Since domestication, significant genetic improvement has been achieved for many traits of commercial importance in cattle, including adaptation, appearance and production. In response to such intense selection pressures, the bovine genome has undergone changes at the underlying regions of functional genetic variants, which are termed “selection signatures”. This article reviews 64 recent (2009–2015) investigations testing genomic diversity for departure from neutrality in worldwide cattle populations. In particular, we constructed a meta-assembly of 16,158 selection signatures for individual breeds and their archetype groups (European, African, Zebu and composite) from 56 genome-wide scans representing 70,743 animals of 90 pure and crossbred cattle breeds. Meta-selection-scores (MSS) were computed by combining published results at every given locus, within a sliding window span. MSS were adjusted for common samples across studies and were weighted for significance thresholds across and within studies. Published selection signatures show extensive coverage across the bovine genome, however, the meta-assembly provides a consensus profile of 263 genomic regions of which 141 were unique (113 were breed-specific) and 122 were shared across cattle archetypes. The most prominent peaks of MSS represent regions under selection across multiple populations and harboured genes of known major effects (coat color, polledness and muscle hypertrophy) and genes known to influence polygenic traits (stature, adaptation, feed efficiency, immunity, behaviour, reproduction, beef and dairy production). As the first meta-assembly of selection signatures, it offers novel insights about the hotspots of selective sweeps in the bovine genome, and this method could equally be applied to other species. PMID:27045296

  18. Recent coselection in human populations revealed by protein-protein interaction network.

    PubMed

    Qian, Wei; Zhou, Hang; Tang, Kun

    2014-12-21

    Genome-wide scans for signals of natural selection in human populations have identified a large number of candidate loci that underlie local adaptations. This is surprising given the relatively short evolutionary time since the divergence of human populations. One hypothesis that has not been formally examined is whether and how recent human evolution may have been shaped by coselection in the context of the complex molecular interactome. In this study, genome-wide signals of selection were scanned in East Asians, Europeans, and Africans using 1000 Genomes data, and subsequently mapped onto the protein-protein interaction (PPI) network. We found that the candidate genes of recent positive selection localized significantly closer to each other on the PPI network than expected, revealing substantial clustering of selected genes. Furthermore, gene pairs of shorter PPI network distances showed higher similarities of their recent evolutionary paths than those further apart. Last, subnetworks enriched with recent coselection signals were identified, which are substantially overrepresented in biological pathways related to signal transduction, neurogenesis, and immune function. These results provide the first genome-wide evidence for association of recent selection signals with the PPI network, shedding light on the potential mechanisms of recent coselection in the human genome. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
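
    The clustering claim in this abstract (candidate genes sitting closer together on the PPI network than expected) can be illustrated with a permutation test. This is a minimal sketch under stated assumptions, not the authors' procedure: the adjacency-dict graph format, the function names, the use of mean pairwise shortest-path distance, and the uniform random resampling of gene sets are all hypothetical choices made for illustration.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(graph, genes):
    """Mean shortest-path distance over all connected pairs in `genes`."""
    dist_from = {g: bfs_distances(graph, g) for g in genes}
    lengths = [dist_from[a][b] for a, b in combinations(genes, 2)
               if b in dist_from[a]]  # pairs with no path are ignored
    return sum(lengths) / len(lengths) if lengths else float("inf")


def clustering_p_value(graph, candidates, n_perm=1000, seed=0):
    """Fraction of random gene sets at least as tightly clustered as `candidates`.

    Assumption: the null draws gene sets uniformly from all network nodes.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, candidates)
    nodes = sorted(graph)
    as_close = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(candidates))
        if mean_pairwise_distance(graph, sample) <= observed:
            as_close += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (as_close + 1) / (n_perm + 1)
```

    A real analysis would probably need a null that matches random genes to candidates on properties such as node degree; the uniform draw here is only the simplest option.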

  19. Scanning of Transposable Elements and Analyzing Expression of Transposase Genes of Sweet Potato [Ipomoea batatas

    PubMed Central

    Tao, Xiang; Lai, Xian-Jun; Zhang, Yi-Zheng; Tan, Xue-Mei; Wang, Haiyan

    2014-01-01

    Background: Transposable elements (TEs) are the most abundant genomic components in eukaryotes and affect the genome by their replications and movements to generate genetic plasticity. Sweet potato generally reproduces asexually, and the TEs may be an important genetic factor for genome reorganization. Complete identification of TEs is essential for the study of genome evolution. However, the TEs of sweet potato are still poorly understood because of its complex hexaploid genome and difficulty in genome sequencing. The recent availability of the sweet potato transcriptome databases provides an opportunity for discovering and characterizing the expressed TEs. Methodology/Principal Findings: We first established the integrated-transcriptome database by de novo assembling four published sweet potato transcriptome databases from three cultivars in China. Using sequence-similarity search and analysis, a total of 1,405 TEs including 883 retrotransposons and 522 DNA transposons were predicted and categorized. By mapping sets of raw RNA-Seq short reads to the predicted TEs, we compared the quantities, classifications and expression activities of TEs among and within cultivars. Moreover, the differential expression of TEs in seven tissues of the Xushu 18 cultivar was analyzed by using Illumina digital gene expression (DGE) tag profiling. It was found that 417 TEs were expressed in one or more tissues and 107 in all seven tissues. Furthermore, the copy number of 11 transposase genes was determined to be 1–3 copies in the genome of sweet potato by real-time PCR-based absolute quantification. Conclusions/Significance: Our results provide a new method for TE searching in species with transcriptome sequences but lacking genome information. The searching, identification and expression analysis of TEs will provide useful TE information in sweet potato, which is valuable for further studies of TE-mediated gene mutation and optimization in asexual reproduction. It contributes to elucidating the roles of TEs in genome evolution. PMID:24608103

  20. Development and Evaluation of a Genome-Wide 6K SNP Array for Diploid Sweet Cherry and Tetraploid Sour Cherry

    PubMed Central

    Peace, Cameron; Bassil, Nahla; Main, Dorrie; Ficklin, Stephen; Rosyara, Umesh R.; Stegmeir, Travis; Sebolt, Audrey; Gilmore, Barbara; Lawley, Cindy; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Iezzoni, Amy

    2012-01-01

    High-throughput genome scans are important tools for genetic studies and breeding applications. Here, a 6K SNP array for use with the Illumina Infinium® system was developed for diploid sweet cherry (Prunus avium) and allotetraploid sour cherry (P. cerasus). This effort was led by RosBREED, a community initiative to enable marker-assisted breeding for rosaceous crops. Next-generation sequencing in diverse breeding germplasm provided 25 billion basepairs (Gb) of cherry DNA sequence from which were identified genome-wide SNPs for sweet cherry and for the two sour cherry subgenomes derived from sweet cherry (avium subgenome) and P. fruticosa (fruticosa subgenome). Anchoring to the peach genome sequence, recently released by the International Peach Genome Initiative, predicted relative physical locations of the 1.9 million putative SNPs detected, preliminarily filtered to 368,943 SNPs. Further filtering was guided by results of a 144-SNP subset examined with the Illumina GoldenGate® assay on 160 accessions. A 6K Infinium® II array was designed with SNPs evenly spaced genetically across the sweet and sour cherry genomes. SNPs were developed for each sour cherry subgenome by using minor allele frequency in the sour cherry detection panel to enrich for subgenome-specific SNPs followed by targeting to either subgenome according to alleles observed in sweet cherry. The array was evaluated using panels of sweet (n = 269) and sour (n = 330) cherry breeding germplasm. Approximately one third of array SNPs were informative for each crop. A total of 1825 polymorphic SNPs were verified in sweet cherry, 13% of these originally developed for sour cherry. Allele dosage was resolved for 2058 polymorphic SNPs in sour cherry, one third of these being originally developed for sweet cherry. This publicly available genomics resource represents a significant advance in cherry genome-scanning capability that will accelerate marker-locus-trait association discovery, genome structure investigation, and genetic diversity assessment in this diploid-tetraploid crop group. PMID:23284615

  1. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239
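
    The idea of calling change-points in mean from a local scan statistic can be sketched as follows. This is an illustrative simplification, not the authors' method: the abstract does not give the statistic or the adaptive combination rule, so the window half-width `h`, the threshold, the function names, and the pooling of samples by a plain sum of squared local differences are all assumptions.

```python
def local_diff(y, h):
    """D[x] = mean(y[x:x+h]) - mean(y[x-h:x]); 0 where a full window is missing."""
    n = len(y)
    d = [0.0] * n
    for x in range(h, n - h + 1):
        left = sum(y[x - h:x]) / h
        right = sum(y[x:x + h]) / h
        d[x] = right - left
    return d


def combined_stat(samples, h):
    """Pool samples by summing squared local differences at each position.

    Assumption: a simple, non-adaptive pooling chosen for illustration only.
    """
    n = len(samples[0])
    total = [0.0] * n
    for y in samples:
        for x, d in enumerate(local_diff(y, h)):
            total[x] += d * d
    return total


def local_maximizers(stat, h, threshold):
    """Screening and ranking step: keep positions that exceed the threshold
    and are the largest value within h positions on either side."""
    n = len(stat)
    hits = []
    for x in range(n):
        lo, hi = max(0, x - h), min(n, x + h + 1)
        if stat[x] > threshold and stat[x] == max(stat[lo:hi]):
            hits.append(x)
    return hits
```

    Each position is visited once with a fixed-width window, so the cost grows linearly with sequence length for a given `h`, which is the kind of efficiency the abstract attributes to this family of methods.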

  2. A Parallel Genetic Algorithm to Discover Patterns in Genetic Markers that Indicate Predisposition to Multifactorial Disease

    PubMed Central

    Rausch, Tobias; Thomas, Alun; Camp, Nicola J.; Cannon-Albright, Lisa A.; Facelli, Julio C.

    2008-01-01

    This paper describes a novel algorithm to analyze genetic linkage data using pattern recognition techniques and genetic algorithms (GA). The method allows a search for regions of the chromosome that may contain genetic variations that jointly predispose individuals to a particular disease. The method uses correlation analysis, filtering theory and GA to achieve this goal. Because current genome scans use from hundreds to hundreds of thousands of markers, two versions of the method have been implemented. The first is an exhaustive analysis version that can be used to visualize, explore, and analyze small genetic data sets for two-marker correlations; the second is a GA version, which uses a parallel implementation allowing searches of higher-order correlations in large data sets. Results on simulated data sets indicate that the method can be informative in the identification of major disease loci and gene-gene interactions in genome-wide linkage data and that further exploration of these techniques is justified. The results presented for both variants of the method show that it can help genetic epidemiologists to identify promising combinations of genetic factors that might predispose to complex disorders. In particular, the correlation analysis of IBD expression patterns might hint at possible gene-gene interactions and the filtering might be a fruitful approach to distinguish true correlation signals from noise. PMID:18547558

  3. A second locus for very-late-onset Alzheimer disease: a genome scan reveals linkage to 20p and epistasis between 20p and the amyloid precursor protein region.

    PubMed

    Olson, Jane M; Goddard, Katrina A B; Dudek, Doreen M

    2002-07-01

    We used a covariate-based linkage method to reanalyze genome scan data from affected sibships collected by the Alzheimer Disease (AD) Genetics Initiative of the National Institute of Mental Health. As reported in an earlier article, the amyloid-beta precursor protein (APP) region is strongly linked to affected sib pairs of the oldest current age (i.e., age either at last exam or at death) who lack E4 alleles at the apolipoprotein E (ApoE) locus. We now report that a region on 20p shows the same pattern. A model that includes current age and the number of E2 alleles as covariates gives a LOD score of 4.1. The signal on 20p is near the location of the gene coding for cystatin-C, previously shown to be associated with late-onset AD and to codeposit with APP in the brains of patients with AD. Two-locus analysis provides evidence of strong epistasis between 20p and the APP region, limited to the oldest age group and to those lacking ApoE4 alleles. We speculate that high-risk polymorphisms in both regions produce a biological interaction between these two proteins that increases susceptibility to a very-late-onset form of AD.
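
    For readers unfamiliar with the scale of the reported statistic, a LOD score is the base-10 logarithm of a likelihood ratio. The symbols below are notation introduced here for illustration: L_1 is the maximized likelihood under the linkage model (in this study, one that includes the covariates) and L_0 is the likelihood under the null hypothesis of no linkage.

```latex
\mathrm{LOD} = \log_{10}\frac{L_1}{L_0},
\qquad
\mathrm{LOD} = 4.1 \;\Rightarrow\; \frac{L_1}{L_0} = 10^{4.1} \approx 1.26 \times 10^{4}
```

    A LOD of 4.1 therefore means the data are roughly 12,600 times more likely under the fitted linkage model than under no linkage, before any adjustment for the added covariate parameters.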

  4. Development of surrogate endpoint biomarkers for clinical trials of cancer chemopreventive agents: relationships to fundamental properties of preinvasive (intraepithelial) neoplasia.

    PubMed

    Boone, C W; Kelloff, G J

    1994-01-01

    The tissue changes offering the greatest immediate potential for development as surrogate endpoint biomarkers (SEBs) to be used in Phase II trials of cancer chemopreventive agents are those derived from the microscopic tissue changes pathologists use to make the diagnosis of preinvasive (intraepithelial) neoplasia. These changes comprise four categories: proliferative index, ploidy, nuclear morphometry (size, shape, texture, and pleomorphism), and nucleolar morphometry (number, size, shape, position, and pleomorphism). Computer-assisted image analysis (CIA) permits dozens of additional morphometric parameters to be developed. Other categories of candidate SEBs are: DNA and chromosomal structural changes associated with genomic instability, activation of oncogenes and inactivation of tumor suppressor genes, structural changes in differentiated molecules, and aberrations of growth factor/receptor structure and function. Self-perpetuating DNA breakage with secondary mutator mutations in genomic stability genes is a major mechanism by which the genomic instability characteristic of neoplasia occurs, and from which stem other basic neoplastic properties, including clonal evolution, along multiple pathways of genetic variation that are stochastically determined, continuously increasing proliferation, rate and extent of phenotypic heterogeneity. SEBs resulting from genomic instability include homogeneously staining regions, double minute chromosomes, micronuclei, dicentrics, gene amplification, loss of heterozygosity, and alterations in chromosome number. Newly developed assays for detecting genomic instability include comparative genomic hybridization using fluorescence in situ hybridization on > 20 micron-thick sections monitored by confocal laser scanning microscopy, assays for microsatellite instability, and restriction landmark genomic scanning. These assays offer promise for detecting the earliest molecular changes of neoplasia in normal-appearing epithelium prior to the onset of the dysplastic phase of intraepithelial neoplasia.

  5. A 2cM genome-wide scan of European Holstein cattle affected by classical BSE

    PubMed Central

    2010-01-01

    Background: Classical bovine spongiform encephalopathy (BSE) is an acquired prion disease that is invariably fatal in cattle and has been implicated as a significant human health risk. Polymorphisms that alter the prion protein of sheep or humans have been associated with variations in transmissible spongiform encephalopathy susceptibility or resistance. In contrast, there is no strong evidence that non-synonymous mutations in the bovine prion gene (PRNP) are associated with classical BSE disease susceptibility. However, two bovine PRNP insertion/deletion polymorphisms, one within the promoter region and the other in intron 1, have been associated with susceptibility to classical BSE. These associations do not explain the full extent of BSE susceptibility, and loci outside of PRNP appear to be associated with disease incidence in some cattle populations. To test for associations with BSE susceptibility, we conducted a genome-wide scan using a panel of 3,072 single nucleotide polymorphism (SNP) markers on 814 animals representing case and control Holstein cattle from the United Kingdom BSE epidemic. Results: Two sets of BSE-affected Holstein cattle were analyzed in this study, one set with known family relationships and the second set of paired cases with controls. The family set comprises half-sibling progeny from six sires. The progeny from four of these sires had previously been scanned with microsatellite markers. The current analysis of the family set yielded both supporting and new results compared with those obtained in the earlier study. The results revealed 27 SNPs representing 18 chromosomes associated with incidence of BSE disease. These results confirm a region previously reported on chromosome 20, and identify additional regions on chromosomes 2, 14, 16, 21 and 28. This study did not identify a significant association near the PRNP in the family sample set. The only association found in the PRNP region was in the case-control sample set and this was not significant after multiple test correction. The genome scan of the case-control animals did not identify any associations that passed a stringent genome-wide significance threshold. Conclusions: Several regions of the genome are statistically associated with the incidence of classical BSE in European Holstein cattle. Further investigation of loci on chromosomes 2, 14, 16, 20, 21 and 28 will be required to uncover any biological significance underlying these marker associations. PMID:20350325

  6. Meta-analysis of 32 genome-wide linkage studies of schizophrenia

    PubMed Central

    Ng, MYM; Levinson, DF; Faraone, SV; Suarez, BK; DeLisi, LE; Arinami, T; Riley, B; Paunio, T; Pulver, AE; Irmansyah; Holmans, PA; Escamilla, M; Wildenauer, DB; Williams, NM; Laurent, C; Mowry, BJ; Brzustowicz, LM; Maziade, M; Sklar, P; Garver, DL; Abecasis, GR; Lerer, B; Fallin, MD; Gurling, HMD; Gejman, PV; Lindholm, E; Moises, HW; Byerley, W; Wijsman, EM; Forabosco, P; Tsuang, MT; Hwu, H-G; Okazaki, Y; Kendler, KS; Wormley, B; Fanous, A; Walsh, D; O’Neill, FA; Peltonen, L; Nestadt, G; Lasseter, VK; Liang, KY; Papadimitriou, GM; Dikeos, DG; Schwab, SG; Owen, MJ; O’Donovan, MC; Norton, N; Hare, E; Raventos, H; Nicolini, H; Albus, M; Maier, W; Nimgaonkar, VL; Terenius, L; Mallet, J; Jay, M; Godard, S; Nertney, D; Alexander, M; Crowe, RR; Silverman, JM; Bassett, AS; Roy, M-A; Mérette, C; Pato, CN; Pato, MT; Roos, J Louw; Kohn, Y; Amann-Zalcenstein, D; Kalsi, G; McQuillin, A; Curtis, D; Brynjolfson, J; Sigmundsson, T; Petursson, H; Sanders, AR; Duan, J; Jazin, E; Myles-Worsley, M; Karayiorgou, M; Lewis, CM

    2009-01-01

    A genome scan meta-analysis (GSMA) was carried out on 32 independent genome-wide linkage scan analyses that included 3255 pedigrees with 7413 genotyped cases affected with schizophrenia (SCZ) or related disorders. The primary GSMA divided the autosomes into 120 bins, rank-ordered the bins within each study according to the most positive linkage result in each bin, summed these ranks (weighted for study size) for each bin across studies and determined the empirical probability of a given summed rank (PSR) by simulation. Suggestive evidence for linkage was observed in two single bins, on chromosomes 5q (142-168 Mb) and 2q (103-134 Mb). Genome-wide evidence for linkage was detected on chromosome 2q (119-152 Mb) when bin boundaries were shifted to the middle of the previous bins. The primary analysis met empirical criteria for ‘aggregate’ genome-wide significance, indicating that some or all of 10 bins are likely to contain loci linked to SCZ, including regions of chromosomes 1, 2q, 3q, 4q, 5q, 8p and 10q. In a secondary analysis of 22 studies of European-ancestry samples, suggestive evidence for linkage was observed on chromosome 8p (16-33 Mb). Although the newer genome-wide association methodology has greater power to detect weak associations to single common DNA sequence variants, linkage analysis can detect diverse genetic effects that segregate in families, including multiple rare variants within one locus or several weakly associated loci in the same region. Therefore, the regions supported by this meta-analysis deserve close attention in future studies. PMID:19349958
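
    The rank-based procedure described in this abstract (rank bins within each study, sum weighted ranks across studies, then obtain an empirical probability by simulation) can be sketched as follows. This is a minimal illustration, not the published implementation: the function names, the average-rank handling of ties, the caller-supplied weights, and the null model that shuffles each study's ranks independently across bins are assumptions.

```python
import random


def rank_bins(scores):
    """Rank bins within one study; the strongest linkage score gets the
    highest rank, and tied bins share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def summed_ranks(study_scores, weights):
    """Weighted sum of within-study ranks for each bin."""
    totals = [0.0] * len(study_scores[0])
    for scores, w in zip(study_scores, weights):
        for b, r in enumerate(rank_bins(scores)):
            totals[b] += w * r
    return totals


def empirical_psr(study_scores, weights, n_sim=2000, seed=0):
    """Empirical probability of each bin's summed rank under random placement.

    Assumption: the null shuffles each study's ranks independently across bins.
    """
    rng = random.Random(seed)
    observed = summed_ranks(study_scores, weights)
    per_study = [rank_bins(s) for s in study_scores]
    n_bins = len(observed)
    exceed = [0] * n_bins
    for _ in range(n_sim):
        sim = [0.0] * n_bins
        for ranks, w in zip(per_study, weights):
            shuffled = ranks[:]
            rng.shuffle(shuffled)
            for b in range(n_bins):
                sim[b] += w * shuffled[b]
        for b in range(n_bins):
            if sim[b] >= observed[b]:
                exceed[b] += 1
    return [(e + 1) / (n_sim + 1) for e in exceed]
```

    In this sketch each inner list of `study_scores` holds one study's most positive linkage result per bin, so a 120-bin analysis of 32 studies would pass 32 lists of 120 values together with 32 study-size weights.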

  7. CID-miRNA: A web server for prediction of novel miRNA precursors in human genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tyagi, Sonika; Vaz, Candida; Gupta, Vipin

    2008-08-08

    MicroRNAs (miRNAs) are a class of non-protein coding functional RNAs that are thought to regulate expression of target genes by direct interaction with mRNAs. miRNAs have been identified through both experimental and computational methods in a variety of eukaryotic organisms. Though these approaches have been partially successful, there is a need to develop more tools for detection of these RNAs as they are also thought to be present in abundance in many genomes. In this report we describe a tool and a web server, named CID-miRNA, for identification of miRNA precursors in a given DNA sequence, utilising secondary structure-based filtering systems and an algorithm based on stochastic context free grammar trained on human miRNAs. CID-miRNA analyses a given sequence using a web interface, for presence of putative miRNA precursors and the generated output lists all the potential regions that can form miRNA-like structures. It can also scan large genomic sequences for the presence of potential miRNA precursors in its stand-alone form. The web server can be accessed at (http://mirna.jnu.ac.in/cidmirna/).

  8. Genome-Wide Association Scan in HIV-1-Infected Individuals Identifying Variants Influencing Disease Course

    PubMed Central

    van Manen, Daniëlle; Delaneau, Olivier; Kootstra, Neeltje A.; Boeser-Nunnink, Brigitte D.; Limou, Sophie; Bol, Sebastiaan M.; Burger, Judith A.; Zwinderman, Aeilko H.; Moerland, Perry D.; van 't Slot, Ruben; Zagury, Jean-François; van 't Wout, Angélique B.; Schuitemaker, Hanneke

    2011-01-01

    Background: AIDS develops typically after 7–11 years of untreated HIV-1 infection, with extremes of very rapid disease progression (<2 years) and long-term non-progression (>15 years). To reveal additional host genetic factors that may impact on the clinical course of HIV-1 infection, we designed a genome-wide association study (GWAS) in 404 participants of the Amsterdam Cohort Studies on HIV-1 infection and AIDS. Methods: The association of SNP genotypes with the clinical course of HIV-1 infection was tested in Cox regression survival analyses using AIDS-diagnosis and AIDS-related death as endpoints. Results: Multiple previously unidentified SNPs were found to be strongly associated with disease progression after HIV-1 infection, albeit not at genome-wide significance. However, three independent SNPs in the top ten associations between SNP genotypes and time between seroconversion and AIDS-diagnosis, and one from the top ten associations between SNP genotypes and time between seroconversion and AIDS-related death, had P-values smaller than 0.05 in the French Genomics of Resistance to Immunodeficiency Virus cohort on disease progression. Conclusions: Our study emphasizes that the use of different phenotypes in GWAS may be useful to unravel the full spectrum of host genetic factors that may be associated with the clinical course of HIV-1 infection. PMID:21811574

  9. Lack of association between arterial stiffness and genetic variants by genome-wide association scan.

    PubMed

    Park, Sungha; Lee, Ji-Young; Kim, Byeong-Keuk; Lee, Sang-Hak; Chang, Hyuk-Jae; Choi, DongHoon; Jang, Yangsoo

    2015-01-01

    Arterial stiffness is an independent predictor of cardiovascular disease risk. However, whether genetic risk variants are associated with arterial stiffness measures, such as pulse-wave velocity (PWV), is largely unknown. Therefore, we performed a genome-wide association study (GWAS) to identify single-nucleotide polymorphisms (SNPs) associated with PWV in a Korean population. Study participants consisted of 402 patients in the Yonsei cardiovascular genome center cohort. Arterial stiffness was measured as brachial-ankle pulse-wave velocity (baPWV). Genotyping was performed in 402 subjects with the Axiom Genome-Wide ASI 1 Array Plate containing more than 600,000 SNP markers. The findings were tested for replication in independent subjects from a community-based cohort of 1206 individuals, using a TaqMan assay for two candidate SNPs. Associations with PWV were evaluated using an additive genetic model that included age, gender, systolic blood pressure and diastolic blood pressure as covariates. GWAS and replication analyses were conducted using the measured genotype method implemented in PLINK and SAS. We observed two candidate SNPs associated with baPWV in GWAS: rs7271920 (p = 7.15 × 10^-9) and rs10125157 (p = 8.25 × 10^-7). However, neither of these was significant in the replication cohort. In summary, we did not identify any common genetic variants associated with baPWV in cardiovascular patients.

  10. Genome-wide molecular dissection of serotype M3 group A Streptococcus strains causing two epidemics of invasive infections.

    PubMed

    Beres, Stephen B; Sylva, Gail L; Sturdevant, Daniel E; Granville, Chanel N; Liu, Mengyao; Ricklefs, Stacy M; Whitney, Adeline R; Parkins, Larye D; Hoe, Nancy P; Adams, Gerald J; Low, Donald E; DeLeo, Frank R; McGeer, Allison; Musser, James M

    2004-08-10

    Molecular factors that contribute to the emergence of new virulent bacterial subclones and epidemics are poorly understood. We hypothesized that analysis of a population-based strain sample of serotype M3 group A Streptococcus (GAS) recovered from patients with invasive infection by using genome-wide investigative methods would provide new insight into this fundamental infectious disease problem. Serotype M3 GAS strains (n = 255) cultured from patients in Ontario, Canada, over 11 years and representing two distinct infection peaks were studied. Genetic diversity was indexed by pulsed-field gel electrophoresis, DNA-DNA microarray, whole-genome PCR scanning, prophage genotyping, targeted gene sequencing, and single-nucleotide polymorphism genotyping. All variation in gene content was attributable to acquisition or loss of prophages, a molecular process that generated unique combinations of proven or putative virulence genes. Distinct serotype M3 genotypes experienced rapid population expansion and caused infections that differed significantly in character and severity. Molecular genetic analysis, combined with immunologic studies, implicated a 4-aa duplication in the extreme N terminus of M protein as a factor contributing to an epidemic wave of serotype M3 invasive infections. This finding has implications for GAS vaccine research. Genome-wide analysis of population-based strain samples cultured from clinically well defined patients is crucial for understanding the molecular events underlying bacterial epidemics.

  11. Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits.

    PubMed

    Varshney, Rajeev K; Saxena, Rachit K; Upadhyaya, Hari D; Khan, Aamir W; Yu, Yue; Kim, Changhoon; Rathore, Abhishek; Kim, Dongseon; Kim, Jihun; An, Shaun; Kumar, Vinay; Anuradha, Ghanta; Yamini, Kalinati Narasimhan; Zhang, Wei; Muniswamy, Sonnappa; Kim, Jong-So; Penmetsa, R Varma; von Wettberg, Eric; Datta, Swapan K

    2017-07-01

    Pigeonpea (Cajanus cajan), a tropical grain legume with low input requirements, is expected to continue to have an important role in supplying food and nutritional security in developing countries in Asia, Africa and the tropical Americas. From whole-genome resequencing of 292 Cajanus accessions encompassing breeding lines, landraces and wild species, we characterize genome-wide variation. On the basis of a scan for selective sweeps, we find several genomic regions that were likely targets of domestication and breeding. Using genome-wide association analysis, we identify associations between several candidate genes and agronomically important traits. Candidate genes for these traits in pigeonpea have sequence similarity to genes functionally characterized in other plants for flowering time control, seed development and pod dehiscence. Our findings will allow acceleration of genetic gains for key traits to improve yield and sustainability in pigeonpea.

  12. High-resolution mapping reveals hundreds of genetic incompatibilities in hybridizing fish species.

    PubMed

    Schumer, Molly; Cui, Rongfeng; Powell, Daniel L; Dresner, Rebecca; Rosenthal, Gil G; Andolfatto, Peter

    2014-06-04

    Hybridization is increasingly being recognized as a common process in both animal and plant species. Negative epistatic interactions between genes from different parental genomes decrease the fitness of hybrids and can limit gene flow between species. However, little is known about the number and genome-wide distribution of genetic incompatibilities separating species. To detect interacting genes, we perform a high-resolution genome scan for linkage disequilibrium between unlinked genomic regions in naturally occurring hybrid populations of swordtail fish. We estimate that hundreds of pairs of genomic regions contribute to reproductive isolation between these species, despite them being recently diverged. Many of these incompatibilities are likely the result of natural or sexual selection on hybrids, since intrinsic isolation is known to be weak. Patterns of genomic divergence at these regions imply that genetic incompatibilities play a significant role in limiting gene flow even in young species.
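
    A scan for linkage disequilibrium between unlinked regions can be illustrated by correlating genotypes at every pair of loci on different chromosomes. This is a rough sketch under stated assumptions rather than the study's pipeline: the 0/1/2 genotype coding (copies of one parental species' allele), the `(name, chromosome, genotypes)` tuple layout, the function names, and the fixed r-squared cutoff are hypothetical, and the abstract does not describe the actual statistic or significance procedure.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return sxy / sqrt(sxx * syy)


def interchromosomal_ld(loci, min_r2):
    """Return (locus1, locus2, r2) for pairs on different chromosomes
    whose squared genotype correlation is at least min_r2."""
    hits = []
    for (n1, c1, g1), (n2, c2, g2) in combinations(loci, 2):
        if c1 == c2:
            continue  # physically linked pairs are not of interest here
        r2 = pearson_r(g1, g2) ** 2
        if r2 >= min_r2:
            hits.append((n1, n2, r2))
    return hits
```

    The number of pairs grows quadratically with the number of loci, so a high-resolution scan would likely need thinning of markers or a vectorized implementation, and a cutoff calibrated against the genome-wide background rather than a fixed value.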

  13. Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival.

    PubMed

    Kim, Sangkyu; Welsh, David A; Myers, Leann; Cherry, Katie E; Wyckoff, Jennifer; Jazwinski, S Michal

    2015-02-28

    We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13-14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity.

  14. Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival

    PubMed Central

    Kim, Sangkyu; Welsh, David A.; Myers, Leann; Cherry, Katie E.; Wyckoff, Jennifer; Jazwinski, S. Michal

    2015-01-01

    We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13–14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity. PMID:25682868

  15. Genome-Wide Linkage and Positional Association Analyses Identify Associations of Novel AFF3 and NTM Genes with Triglycerides: The GenSalt Study

    PubMed Central

    Li, Changwei; Bazzano, Lydia A.L.; Rao, Dabeeru C.; Hixson, James E.; He, Jiang; Gu, Dongfeng; Gu, Charles C.; Shimmin, Lawrence C.; Jaquish, Cashell E.; Schwander, Karen; Liu, De-Pei; Huang, Jianfeng; Lu, Fanghong; Cao, Jie; Chong, Shen; Lu, Xiangfeng; Kelly, Tanika N.

    2016-01-01

    We conducted a genome-wide linkage scan and positional association study to identify genes and variants influencing blood lipid levels among participants of the Genetic Epidemiology Network of Salt-Sensitivity (GenSalt) study. The GenSalt study was conducted among 1906 participants from 633 Han Chinese families. Lipids were measured from overnight fasting blood samples using standard methods. Multipoint quantitative trait genome-wide linkage scans were performed on the high-density lipoprotein, low-density lipoprotein, and log-transformed triglyceride phenotypes. Using dense panels of single nucleotide polymorphisms (SNPs), single-marker and gene-based association analyses were conducted to follow up on promising linkage signals. Additive associations between each SNP and lipid phenotypes were tested using mixed linear regression models. Gene-based analyses were performed by combining P-values from single-marker analyses within each gene using the truncated product method (TPM). Significant associations were assessed for replication among 777 Asian participants of the Multi-ethnic Study of Atherosclerosis (MESA). Bonferroni correction was used to adjust for multiple testing. In the GenSalt study, suggestive linkage signals were identified at 2p11.2–2q12.1 [maximum multipoint LOD score (MML) = 2.18 at 2q11.2] and 11q24.3–11q25 (MML = 2.29 at 11q25) for the log-transformed triglyceride phenotype. Follow-up analyses of these two regions revealed gene-based associations of charged multivesicular body protein 3 (CHMP3), ring finger protein 103 (RNF103), AF4/FMR2 family, member 3 (AFF3), and neurotrimin (NTM) with triglycerides (P = 4 × 10^−4, 1.00 × 10^−5, 2.00 × 10^−5, and 1.00 × 10^−7, respectively). Both the AFF3 and NTM triglyceride associations were replicated among MESA study participants (P = 1.00 × 10^−7 and 8.00 × 10^−5, respectively). Furthermore, NTM explained the linkage signal on chromosome 11. In conclusion, we identified novel genes associated with lipid phenotypes in linkage regions on chromosomes 2 and 11. PMID:25819087
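
    The truncated product method mentioned above combines only those single-marker P-values that fall at or below a truncation point. The sketch below is one way to illustrate the idea, with significance estimated by Monte Carlo simulation. The truncation point `tau=0.05`, the function names, and the assumption that the SNP tests are independent are all illustrative choices not stated in the abstract. SNPs within a gene are usually correlated through linkage disequilibrium, so a real analysis would need a null distribution that respects that correlation.

```python
import math
import random

def truncated_product(p_values, tau=0.05):
    """Log of the product of all p-values <= tau (0.0 if none qualify)."""
    return sum(math.log(p) for p in p_values if p <= tau)

def tpm_p_value(p_values, tau=0.05, n_sim=20000, seed=0):
    """Monte Carlo gene-level p-value, assuming independent tests.

    Input p-values must be strictly greater than 0.
    """
    rng = random.Random(seed)
    observed = truncated_product(p_values, tau)
    k = len(p_values)
    hits = 0
    for _ in range(n_sim):
        # 1.0 - random() lies in (0, 1], so log() is always defined
        null = [1.0 - rng.random() for _ in range(k)]
        if truncated_product(null, tau) <= observed:
            hits += 1
    # add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_sim + 1)
```

    A gene-level p-value obtained this way would then be compared against a Bonferroni threshold of alpha divided by the number of genes tested.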

  16. Population Genomics of Sub-Saharan Drosophila melanogaster: African Diversity and Non-African Admixture

    PubMed Central

    Pool, John E.; Corbett-Detig, Russell B.; Sugino, Ryuichi P.; Stevens, Kristian A.; Cardeno, Charis M.; Crepeau, Marc W.; Duchen, Pablo; Emerson, J. J.; Saelao, Perot; Begun, David J.; Langley, Charles H.

    2012-01-01

    Drosophila melanogaster has played a pivotal role in the development of modern population genetics. However, many basic questions regarding the demographic and adaptive history of this species remain unresolved. We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection within an African population, between African populations, and between European and African populations. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. Outliers for Europe-Africa FST were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations. PMID:23284287

  17. GenomeVIP: a cloud platform for genomic variant discovery and interpretation

    PubMed Central

    Mashl, R. Jay; Scott, Adam D.; Huang, Kuan-lin; Wyczalkowski, Matthew A.; Yoon, Christopher J.; Niu, Beifang; DeNardo, Erin; Yellapantula, Venkata D.; Handsaker, Robert E.; Chen, Ken; Koboldt, Daniel C.; Ye, Kai; Fenyö, David; Raphael, Benjamin J.; Wendl, Michael C.; Ding, Li

    2017-01-01

    Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets. PMID:28522612

  18. Relative extended haplotype homozygosity signals across breeds reveal dairy and beef specific signatures of selection.

    PubMed

    Bomba, Lorenzo; Nicolazzi, Ezequiel L; Milanesi, Marco; Negrini, Riccardo; Mancini, Giordano; Biscarini, Filippo; Stella, Alessandra; Valentini, Alessio; Ajmone-Marsan, Paolo

    2015-04-02

    A number of methods are available to scan a genome for selection signatures by evaluating patterns of diversity within and between breeds. Among these, "extended haplotype homozygosity" (EHH) is a reliable approach to detect genome regions under recent selective pressure. The objective of this study was to use this approach to identify regions that are under recent positive selection and shared by the most representative Italian dairy and beef cattle breeds. A total of 3220 animals from Italian Holstein (2179), Italian Brown (775), Simmental (493), Marchigiana (485) and Piedmontese (379) breeds were genotyped with the Illumina BovineSNP50 BeadChip v.1. After standard quality control procedures, genotypes were phased and core haplotypes were identified. The decay of linkage disequilibrium (LD) for each core haplotype was assessed by measuring the EHH. Since accurate estimates of local recombination rates were not available, relative EHH (rEHH) was calculated for each core haplotype. Genomic regions that carry frequent core haplotypes and with significant rEHH values were considered as candidates for recent positive selection. Candidate regions were aligned across breeds to identify signals shared by dairy or beef cattle breeds. Overall, 82 and 87 common regions were detected among dairy and beef cattle breeds, respectively. Bioinformatic analysis identified 244 and 232 genes in these common genomic regions. Gene annotation and pathway analysis showed that these genes are involved in molecular functions that are biologically related to milk or meat production. Our results suggest that a multi-breed approach can lead to the identification of genomic signatures in breeds of cattle that are selected for the same production goal and thus to the localisation of genomic regions of interest in dairy and beef production.
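
    EHH, as used above, is the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over an interval extending from the core. The sketch below illustrates that definition under simplifying assumptions: the core is a single marker rather than a multi-SNP core haplotype, haplotypes are phased strings of equal length, and distance is counted in markers. The function name `ehh` and its parameters are hypothetical. Relative EHH would then compare this value with the EHH of the other core haplotypes at the same locus, which is not shown here.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, distance):
    """EHH among chromosomes carrying `core_allele` at `core_index`.

    haplotypes: list of equal-length strings, one per phased chromosome.
    distance:   markers to extend from the core; positive extends to the
                right, negative to the left.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined for fewer than two carriers
    if distance >= 0:
        lo, hi = core_index, core_index + distance + 1
    else:
        lo, hi = max(core_index + distance, 0), core_index + 1
    # group carriers by their extended haplotype over the interval
    classes = Counter(h[lo:hi] for h in carriers)
    identical_pairs = sum(comb(c, 2) for c in classes.values())
    return identical_pairs / comb(n, 2)
```

    Plotting this value against increasing distance gives the LD decay curve for a core haplotype. A slow decay on a frequent haplotype is the pattern taken as evidence of recent positive selection.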

  19. Detection of selective sweeps in structured populations: a comparison of recent methods.

    PubMed

    Vatsiou, Alexandra I; Bazin, Eric; Gaggiotti, Oscar E

    2016-01-01

    Identifying genomic regions targeted by positive selection has been a long-standing interest of evolutionary biologists. This objective was difficult to achieve until the recent emergence of next-generation sequencing, which is fostering the development of large-scale catalogues of genetic variation for an increasing number of species. Several statistical methods have been recently developed to analyse these rich data sets, but there is still a poor understanding of the conditions under which these methods produce reliable results. This study aims at filling this gap by assessing the performance of genome-scan methods that explicitly consider the physical linkage among SNPs surrounding a selected variant. Our study compares the performance of seven recent methods for the detection of selective sweeps (iHS, nSL, EHHST, xp-EHH, XP-EHHST, XPCLR and hapFLK). We use an individual-based simulation approach to investigate the power and accuracy of these methods under a wide range of population models under both hard and soft sweeps. Our results indicate that XPCLR and hapFLK perform best and can detect soft sweeps under simple population structure scenarios if migration rate is low. All methods perform poorly with moderate-to-high migration rates or with weak selection, and very poorly under a hierarchical population structure. Finally, no single method is able to detect both starting and nearly completed selective sweeps. However, combining several methods (XPCLR or hapFLK with iHS or nSL) can greatly increase the power to pinpoint the selected region. © 2015 John Wiley & Sons Ltd.

  20. Laser Stimulated Genomic Exchange in Stem Cells. Laser Non-cloning Techniques

    NASA Astrophysics Data System (ADS)

    Stefan, V. Alexander

    2012-02-01

    I propose a novel technique for pluripotent stem cell generation. Genomic exchange is stimulated by the beat-wave free electron laser (B-W FEL), frequency matching with the frequencies of the DNA [J. D. Watson and F. H. C. Crick, Nature 171, 737-738 (1953)] eigen-oscillations. B-W FEL-1 [V. Stefan, B. I. Cohen, C. Joshi, Science 243, 4890 (Jan 27, 1989); Stefan et al., Bull. APS 32, No. 9, 1713 (1987); Stefan, APS March-2011, #S1.143; APS March-2009, #K1.276] scans the entire stem cell; B-W FEL-2 probes the chromosomes. The scanning and probing lasers: 300-500 nm and 100-300 nm, respectively; irradiances: the order of 10s of mW/cm^2 (above the threshold value for a particular gene structure); repetition rate of a few 100s of Hz. A variety of genetic-matching conditions can be arranged. Genomic glitches (the cell nucleus transfer [Scott Noggle et al., Nature 478, 70-75 (06 October 2011)]) can be hedged by the use of lasers.

  1. Genome-wide transposon insertion scanning of environmental survival functions in the polycyclic aromatic hydrocarbon degrading bacterium Sphingomonas wittichii RW1.

    PubMed

    Roggo, Clémence; Coronado, Edith; Moreno-Forero, Silvia K; Harshman, Keith; Weber, Johann; van der Meer, Jan Roelof

    2013-10-01

    Sphingomonas wittichii RW1 is a dibenzofuran and dibenzodioxin-degrading bacterium with potentially interesting properties for bioaugmentation of contaminated sites. In order to understand the capacity of the microorganism to survive in the environment we used a genome-wide transposon scanning approach. RW1 transposon libraries were generated with around 22,000 independent insertions. Libraries were grown for an average of 50 generations (five successive passages in batch liquid medium) with salicylate as sole carbon and energy source in the presence or absence of salt stress at -1.5 MPa. Alternatively, libraries were grown in sand with salicylate, at 50% water holding capacity, for 4 and 10 days (equivalent to 7 generations). Library DNA was recovered from the different growth conditions and scanned by ultrahigh throughput sequencing for the positions and numbers of the inserted transposed kanamycin resistance gene. No transposon reads were recovered in 579 genes (10% of all annotated genes in the RW1 genome) in any of the libraries, suggesting those to be essential for survival under the conditions used. Libraries recovered from sand differed strongly from those incubated in liquid batch medium. In particular, important functions for short-term survival of cells in sand concerned nutrient scavenging, energy metabolism and motility. In contrast, fatty acid metabolism and oxidative stress response were essential for longer-term survival of cells in sand. Comparison to transcriptome data suggested important functions in sand for flagellar movement, pili synthesis, trehalose and polysaccharide synthesis and putative cell surface antigen proteins. Interestingly, a variety of genes were also identified whose interruption caused a significant increase in fitness during growth on salicylate. One of these was an Lrp family transcription regulator, and mutants in this gene covered more than 90% of the total library after 50 generations of growth on salicylate. Our results demonstrate the power of genome-wide transposon scanning approaches for analysis of complex traits. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.
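
    The inference above, that genes with no recovered insertions in any library are candidates for essentiality, amounts to an interval lookup. The sketch below shows one way to do that lookup, assuming a single replicon, inclusive gene coordinates, and insertion positions already pooled across all libraries. The function name and data layout are hypothetical and do not reflect the authors' software.

```python
import bisect

def genes_without_insertions(genes, insertion_positions):
    """Return names of genes that contain no transposon insertion.

    genes:               dict mapping gene name -> (start, end), inclusive.
    insertion_positions: iterable of integer coordinates, pooled over libraries.
    """
    positions = sorted(insertion_positions)
    missing = []
    for name, (start, end) in genes.items():
        # index of the first insertion at or after the gene start
        i = bisect.bisect_left(positions, start)
        if i == len(positions) or positions[i] > end:
            missing.append(name)
    return missing
```

    Short genes can lack insertions by chance alone, so in practice a list produced this way is a set of candidates and not a definitive call.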

  2. Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci.

    PubMed

    Yap, John Stephen; Fan, Jianqing; Wu, Rongling

    2009-12-01

    Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L(2) penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.
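
    For readers unfamiliar with the modified Cholesky decomposition named above, the standard construction can be written as follows. The notation (m time points, centered responses y_t, coefficients \phi_{tj}, innovation variances \sigma_t^2) is chosen here for illustration and is not taken from the abstract.

```latex
% Each response is regressed on its predecessors:
y_t = \sum_{j=1}^{t-1} \phi_{tj}\, y_j + \varepsilon_t,
\qquad \operatorname{Var}(\varepsilon_t) = \sigma_t^2,
\qquad t = 1, \dots, m .

% Collecting the coefficients in a unit lower-triangular matrix T,
% with (t, j) entry -\phi_{tj} for j < t, diagonalizes the covariance:
T \Sigma T^{\top} = D,
\qquad D = \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2),
\qquad \Sigma^{-1} = T^{\top} D^{-1} T .
```

    Because the \phi_{tj} and \sigma_t^2 are unconstrained regression parameters, any estimate of them yields a positive-definite \Sigma. An L2 penalty on the \phi_{tj} would be one natural way to obtain the regularized estimator the abstract describes, although the exact penalty form is not given there.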

  3. Construction of a combinatorial pipeline using two somatic variant calling methods for whole exome sequence data of gastric cancer.

    PubMed

    Kohmoto, Tomohiro; Masuda, Kiyoshi; Naruto, Takuya; Tange, Shoichiro; Shoda, Katsutoshi; Hamada, Junichi; Saito, Masako; Ichikawa, Daisuke; Tajima, Atsushi; Otsuji, Eigo; Imoto, Issei

    2017-01-01

    High-throughput next-generation sequencing is a powerful tool to identify the genotypic landscapes of somatic variants and therapeutic targets in various cancers including gastric cancer, forming the basis for personalized medicine in the clinical setting. Although the advent of many computational algorithms leads to higher accuracy in somatic variant calling, no standard method exists due to the limitations of each method. Here, we constructed a new pipeline. We combined two different somatic variant callers with different algorithms, Strelka and VarScan 2, and evaluated performance using whole exome sequencing data obtained from 19 Japanese cases with gastric cancer (GC); then, we characterized these tumors based on identified driver molecular alterations. More single nucleotide variants (SNVs) and small insertions/deletions were detected by Strelka and VarScan 2, respectively. SNVs detected by both tools showed higher accuracy for estimating somatic variants compared with those detected by only one of the two tools and accurately showed the mutation signature and mutations of driver genes reported for GC. Our combinatorial pipeline may have an advantage in detection of somatic mutations in GC and may be useful for further genomic characterization of Japanese patients with GC to improve the efficacy of GC treatments. J. Med. Invest. 64: 233-240, August, 2017.
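
    The comparison above between variants detected by both callers and those detected by only one can be pictured as set operations on call lists. The sketch below assumes each call has already been reduced to a normalized (chrom, pos, ref, alt) tuple. The function name and the key names are hypothetical, and parsing of actual Strelka or VarScan 2 output is deliberately left out.

```python
def consensus_calls(calls_a, calls_b):
    """Split two call sets into shared and caller-specific variants.

    Each call is a (chrom, pos, ref, alt) tuple. Indels must be normalized
    to the same representation beforehand for the comparison to be fair.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    return {
        "both": sorted(set_a & set_b),      # higher-confidence consensus
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }
```

    Keeping the caller-specific sets alongside the intersection makes it possible to review what the stricter consensus rule discards.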

  4. Automated detection system of single nucleotide polymorphisms using two kinds of functional magnetic nanoparticles

    NASA Astrophysics Data System (ADS)

    Liu, Hongna; Li, Song; Wang, Zhifei; Li, Zhiyang; Deng, Yan; Wang, Hua; Shi, Zhiyang; He, Nongyue

    2008-11-01

    Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation in the human genome. Large-scale identification of codominant SNPs, especially those associated with complex diseases, has therefore created a need for fully high-throughput and automated SNP genotyping methods. Herein, we present an automated SNP detection system based on two kinds of functional magnetic nanoparticles (MNPs) and dual-color hybridization. The amido-modified MNPs (NH2-MNPs), modified with APTES, were used to extract DNA directly from whole blood by electrostatic reaction, and the subsequent PCR was performed successfully. Furthermore, biotinylated PCR products were captured on the streptavidin-coated MNPs (SA-MNPs) and interrogated by hybridization with a pair of dual-color probes to determine the SNP; the genotype of each sample could then be identified simultaneously by scanning the microarray printed with the denatured fluorescent probes. This system provides a rapid, sensitive and highly versatile automated procedure that will greatly facilitate the analysis of different known SNPs in the human genome.

  5. Genetic Influences on Conduct Disorder

    PubMed Central

    Salvatore, Jessica E.; Dick, Danielle M.

    2016-01-01

    Conduct disorder (CD) is a moderately heritable psychiatric disorder of childhood and adolescence characterized by aggression toward people and animals, destruction of property, deceitfulness or theft, and serious violation of rules. Genome-wide scans using linkage and association methods have identified a number of suggestive genomic regions that are pending replication. A small number of candidate genes (e.g., GABRA2, MAOA, SLC6A4, AVPR1A) are associated with CD related phenotypes across independent studies; however, failures to replicate also exist. Studies of gene-environment interplay show that CD genetic predispositions also contribute to selection into higher-risk environments, and that environmental factors can alter the importance of CD genetic factors and differentially methylate CD candidate genes. The field’s understanding of CD etiology will benefit from larger, adequately powered studies in gene identification efforts; the incorporation of polygenic approaches in gene-environment interplay studies; attention to the mechanisms of risk from genes to brain to behavior; and the use of genetically informative data to test quasi-causal hypotheses about purported risk factors. PMID:27350097

  6. TEAM: efficient two-locus epistasis tests in human genome-wide association study.

    PubMed

    Zhang, Xiang; Huang, Shunping; Zou, Fei; Wang, Wei

    2010-06-15

    As a promising tool for identifying genetic markers underlying phenotypic differences, the genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene-gene interaction) is preferable to single-locus analysis since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection at the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can be up to thousands. Thus, existing methods are not readily applicable to human datasets. In this article, we propose an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e. it does not ignore any epistatic interaction. Utilizing the minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large-sample studies. It supports any statistical test that is based on contingency tables, and enables control of both the family-wise error rate and the false discovery rate. Extensive experiments show that our algorithm only needs to examine a small portion of the individuals to update the contingency tables, and it achieves at least an order of magnitude speed-up over the brute force approach.
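
    To make concrete what a contingency-table-based two-locus test looks like, the sketch below builds the table for one SNP pair by scanning every individual and computes a Pearson chi-square statistic. This is the brute-force baseline that the abstract contrasts against, not the TEAM algorithm itself. The minimum-spanning-tree incremental update is not reproduced here, and the function name and 0/1/2 genotype coding are assumptions.

```python
from collections import Counter

def two_locus_chi2(geno_a, geno_b, phenotype):
    """Pearson chi-square for a (genotype pair) x (case/control) table.

    geno_a, geno_b: genotype codes (0, 1 or 2) per individual at two SNPs.
    phenotype:      0 (control) or 1 (case) per individual.
    """
    n = len(phenotype)
    # cell counts keyed by ((genotype at A, genotype at B), status)
    table = Counter(zip(zip(geno_a, geno_b), phenotype))
    row_tot, col_tot = Counter(), Counter()
    for (combo, status), count in table.items():
        row_tot[combo] += count
        col_tot[status] += count
    chi2 = 0.0
    for combo in row_tot:
        for status in col_tot:
            expected = row_tot[combo] * col_tot[status] / n
            observed = table[(combo, status)]  # Counter gives 0 if absent
            chi2 += (observed - expected) ** 2 / expected
    return chi2
```

    Repeating this full scan for every SNP pair is what makes exhaustive search expensive, which is the cost an incremental table update is meant to avoid.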

  7. Genomic scan of selective sweeps in thin and fat tail sheep breeds for identifying of candidate regions associated with fat deposition

    PubMed Central

    2012-01-01

    Background: Identification of genomic regions that have been targets of selection for phenotypic traits is one of the most important and challenging areas of research in animal genetics. However, currently there are relatively few genomic regions identified that have been subject to positive selection. In this study, a genome-wide scan using ~50,000 Single Nucleotide Polymorphisms (SNPs) was performed in an attempt to identify genomic regions associated with fat deposition in fat-tail breeds. This trait and its modification are very important in those countries grazing these breeds. Results: Two independent experiments using either Iranian or Ovine HapMap genotyping data contrasted thin and fat tail breeds. Population differentiation using FST in Iranian thin and fat tail breeds revealed seven genomic regions. Almost all of these regions overlapped with QTLs that had previously been identified as affecting fat and carcass yield traits in beef and dairy cattle. Study of selection sweep signatures using FST in thin and fat tail breeds sampled from the Ovine HapMap project confirmed three of these regions located on Chromosomes 5, 7 and X. We found increased homozygosity in these regions in favour of fat tail breeds on chromosome 5 and X and in favour of thin tail breeds on chromosome 7. Conclusions: In this study, we were able to identify three novel regions associated with fat deposition in thin and fat tail sheep breeds. Two of these were associated with an increase of homozygosity in the fat tail breeds which would be consistent with selection for mutations affecting fat tail size several thousand years after domestication. PMID:22364287
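
    The population differentiation statistic FST used above can be illustrated for a single biallelic SNP and two populations. The sketch below uses the basic heterozygosity-based definition, (H_T - H_S) / H_T, with equal population weights. The abstract does not say which estimator the study used, so this particular form and the function name are assumptions.

```python
def fst_two_pops(p1, p2):
    """Heterozygosity-based FST at one biallelic SNP for two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                          # SNP is fixed in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t
```

    In a genome scan, per-SNP values like this are typically averaged in windows along each chromosome, and windows with unusually high differentiation between the contrasted breeds are flagged as candidate regions.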

  8. A Genome Scan Conducted in a Multigenerational Pedigree with Convergent Strabismus Supports a Complex Genetic Determinism

    PubMed Central

    Georges, Anouk; Cambisano, Nadine; Ahariz, Naïma; Karim, Latifa; Georges, Michel

    2013-01-01

    A genome-wide linkage scan was conducted in a Northern-European multigenerational pedigree with nine of 40 related members affected with concomitant strabismus. Twenty-seven members of the pedigree including all affected individuals were genotyped using a SNP array interrogating > 300,000 common SNPs. We conducted parametric and non-parametric linkage analyses assuming segregation of an autosomal dominant mutation, yet allowing for incomplete penetrance and phenocopies. We detected two chromosome regions with near-suggestive evidence for linkage, respectively on chromosomes 8 and 18. The chromosome 8 linkage implied a penetrance of 0.80 and a rate of phenocopy of 0.11, while the chromosome 18 linkage implied a penetrance of 0.64 and a rate of phenocopy of 0. Our analysis excludes a simple genetic determinism of strabismus in this pedigree. PMID:24376720

  9. A genome scan conducted in a multigenerational pedigree with convergent strabismus supports a complex genetic determinism.

    PubMed

    Georges, Anouk; Cambisano, Nadine; Ahariz, Naïma; Karim, Latifa; Georges, Michel

    2013-01-01

    A genome-wide linkage scan was conducted in a Northern-European multigenerational pedigree with nine of 40 related members affected with concomitant strabismus. Twenty-seven members of the pedigree including all affected individuals were genotyped using a SNP array interrogating > 300,000 common SNPs. We conducted parametric and non-parametric linkage analyses assuming segregation of an autosomal dominant mutation, yet allowing for incomplete penetrance and phenocopies. We detected two chromosome regions with near-suggestive evidence for linkage, respectively on chromosomes 8 and 18. The chromosome 8 linkage implied a penetrance of 0.80 and a rate of phenocopy of 0.11, while the chromosome 18 linkage implied a penetrance of 0.64 and a rate of phenocopy of 0. Our analysis excludes a simple genetic determinism of strabismus in this pedigree.

  10. Whole-genome scan identifies quantitative trait loci for chronic pastern dermatitis in German draft horses.

    PubMed

    Mittmann, E Henrike; Mömke, Stefanie; Distl, Ottmar

    2010-02-01

    Chronic pastern dermatitis (CPD), also known as chronic progressive lymphedema (CPL), is a skin disease that affects draft horses. This disease causes painful lower-leg swelling, nodule formation, and skin ulceration, interfering with movement. The aim of this whole-genome scan was to identify quantitative trait loci (QTL) for CPD in German draft horses. We recorded clinical data for CPD in 917 German draft horses and collected blood samples from these horses. Of these 917 horses, 31 paternal half-sib families comprising 378 horses from the breeds Rhenish German, Schleswig, Saxon-Thuringian, and South German were chosen for genotyping. Each half-sib family was constituted by only one draft horse breed. Genotyping was done for 318 polymorphic microsatellites evenly distributed on all equine autosomes and the X chromosome with a mean distance of 7.5 Mb. An across-breed multipoint linkage analysis revealed chromosome-wide significant QTL on horse chromosomes (ECA) 1, 9, 16, and 17. Analyses by breed confirmed the QTL on ECA1 in South German and the QTL on ECA9, 16, and 17 in Saxon-Thuringian draft horses. For the Rhenish German and Schleswig draft horses, additional QTL on ECA4 and 10 and for the South German draft horses an additional QTL on ECA7 were found. This is the first whole-genome scan for CPD in draft horses and it is an important step toward the identification of candidate genes.

  11. Genomic analysis of thermophilic Bacillus coagulans strains: efficient producers for platform bio-chemicals.

    PubMed

    Su, Fei; Xu, Ping

    2014-01-29

    Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species.

  12. Genomic analysis of thermophilic Bacillus coagulans strains: efficient producers for platform bio-chemicals

    PubMed Central

    Su, Fei; Xu, Ping

    2014-01-01

    Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species. PMID:24473268

  13. A genomic perspective on the generation and maintenance of genetic diversity in herbivorous insects

    PubMed Central

    Gloss, Andrew D.; Groen, Simon C.; Whiteman, Noah K.

    2017-01-01

    Understanding the processes that generate and maintain genetic variation within populations is a central goal in evolutionary biology. Theory predicts that some of this variation is maintained as a consequence of adapting to variable habitats. Studies in herbivorous insects have played a key role in confirming this prediction. Here, we highlight theoretical and conceptual models for the maintenance of genetic diversity in herbivorous insects, empirical genomic studies testing these models, and pressing questions within the realm of evolutionary and functional genomic studies. To address key gaps, we propose an integrative approach combining population genomic scans for adaptation, genome-wide characterization of targets of selection through experimental manipulations, mapping the genetic architecture of traits influencing fitness, and functional studies. We also stress the importance of studying the maintenance of genetic variation across biological scales—from variation within populations to divergence among populations—to form a comprehensive view of adaptation in herbivorous insects. PMID:28736510

  14. A genome-wide search for linkage of estimated glomerular filtration rate (eGFR) in the Family Investigation of Nephropathy and Diabetes (FIND).

    PubMed

    Thameem, Farook; Igo, Robert P; Freedman, Barry I; Langefeld, Carl; Hanson, Robert L; Schelling, Jeffrey R; Elston, Robert C; Duggirala, Ravindranath; Nicholas, Susanne B; Goddard, Katrina A B; Divers, Jasmin; Guo, Xiuqing; Ipp, Eli; Kimmel, Paul L; Meoni, Lucy A; Shah, Vallabh O; Smith, Michael W; Winkler, Cheryl A; Zager, Philip G; Knowler, William C; Nelson, Robert G; Pahl, Madeline V; Parekh, Rulan S; Kao, W H Linda; Rasooly, Rebekah S; Adler, Sharon G; Abboud, Hanna E; Iyengar, Sudha K; Sedor, John R

    2013-01-01

    Estimated glomerular filtration rate (eGFR), a measure of kidney function, is heritable, suggesting that genes influence renal function. Genes that influence eGFR have been identified through genome-wide association studies. However, family-based linkage approaches may identify loci that explain a larger proportion of the heritability. This study used genome-wide linkage and association scans to identify quantitative trait loci (QTL) that influence eGFR. Genome-wide linkage and sparse association scans of eGFR were performed in families ascertained by probands with advanced diabetic nephropathy (DN) from the multi-ethnic Family Investigation of Nephropathy and Diabetes (FIND) study. This study included 954 African Americans (AA), 781 American Indians (AI), 614 European Americans (EA) and 1,611 Mexican Americans (MA). A total of 3,960 FIND participants were genotyped for 6,000 single nucleotide polymorphisms (SNPs) using the Illumina Linkage IVb panel. GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula. The non-parametric linkage analysis, accounting for the effects of diabetes duration and BMI, identified the strongest evidence for linkage of eGFR on chromosome 20q11 (log of the odds [LOD] = 3.34; P = 4.4 × 10^-5) in MA and chromosome 15q12 (LOD = 2.84; P = 1.5 × 10^-4) in EA. In all subjects, the strongest linkage signal for eGFR was detected on chromosome 10p12 (P = 5.5 × 10^-4) at 44 cM near marker rs1339048. A subsequent association scan in both ancestry-specific groups and the entire population identified several SNPs significantly associated with eGFR across the genome. The present study describes the localization of QTL influencing eGFR on 20q11 in MA, 15q21 in EA and 10p12 in the combined ethnic groups participating in the FIND study. Identification of causal genes/variants influencing eGFR, within these linkage and association loci, will open new avenues for functional analyses and development of novel diagnostic markers for DN.
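
    The relationship between the LOD scores and P values quoted in this record can be illustrated with a short sketch. It assumes the common convention that 2 ln(10) × LOD is referred to a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom; the abstract does not state which conversion the authors used, so the function name and the convention are assumptions made for illustration.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Convert a LOD score to a one-sided pointwise p-value.

    Assumption (not taken from the abstract): the likelihood-ratio
    statistic 2*ln(10)*LOD follows a 50:50 mixture of a point mass at
    zero and a chi-square distribution with 1 degree of freedom.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # The upper tail of chi-square(1 df) is erfc(sqrt(x / 2));
    # halving it accounts for the 50:50 mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# LOD scores quoted in the abstract above.
for label, lod in [("20q11 (MA)", 3.34), ("15q12 (EA)", 2.84)]:
    print(f"{label}: LOD = {lod:.2f} -> p ~ {lod_to_pointwise_p(lod):.1e}")
```

    Under this convention the two LOD scores map to roughly 4.4 × 10^-5 and 1.5 × 10^-4, in line with the values reported in the abstract.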

  15. Divergence hitchhiking and the spread of genomic isolation during ecological speciation-with-gene-flow

    PubMed Central

    Via, Sara

    2012-01-01

    In allopatric populations, geographical separation simultaneously isolates the entire genome, allowing genetic divergence to accumulate virtually anywhere in the genome. In sympatric populations, however, the strong divergent selection required to overcome migration produces a genetic mosaic of divergent and non-divergent genomic regions. In some recent genome scans, each divergent genomic region has been interpreted as an independent incidence of migration/selection balance, such that the reduction of gene exchange is restricted to a few kilobases around each divergently selected gene. I propose an alternative mechanism, ‘divergence hitchhiking’ (DH), in which divergent selection can reduce gene exchange for several megabases around a gene under strong divergent selection. Not all genes/markers within a DH region are divergently selected, yet the entire region is protected to some degree from gene exchange, permitting genetic divergence from mechanisms other than divergent selection to accumulate secondarily. After contrasting DH and multilocus migration/selection balance (MM/SB), I outline a model in which genomic isolation at a given genomic location is jointly determined by DH and genome-wide effects of the progressive reduction in realized migration, then illustrate DH using data from several pairs of incipient species in the wild. PMID:22201174

  16. Tracking genes of ecological relevance using a genome scan in two independent regional population samples of Arabis alpina.

    PubMed

    Poncet, Bénédicte N; Herrmann, Doris; Gugerli, Felix; Taberlet, Pierre; Holderegger, Rolf; Gielly, Ludovic; Rioux, Delphine; Thuiller, Wilfried; Aubert, Serge; Manel, Stéphanie

    2010-07-01

    Understanding the genetic basis of adaptation in response to environmental variation is fundamental as adaptation plays a key role in the extension of ecological niches to marginal habitats and in ecological speciation. Based on the assumption that some genomic markers are correlated to environmental variables, we aimed to detect loci of ecological relevance in the alpine plant Arabis alpina L. sampled in two regions, the French (99 locations) and the Swiss (109 locations) Alps. We used an unusually large genome scan [825 amplified fragment length polymorphism loci (AFLPs)] and four environmental variables related to temperature, precipitation and topography. We detected linkage disequilibrium among only 3.5% of the considered AFLP loci. A population structure analysis identified no admixture in the study regions, and the French and Swiss Alps were differentiated and therefore could be considered as two independent regions. We applied generalized estimating equations (GEE) to detect ecologically relevant loci separately in the French and Swiss Alps. We identified 78 loci of ecological relevance (9%), which were mainly related to mean annual minimum temperature. Only four of these loci were common across the French and Swiss Alps. Finally, we discuss that the genomic characterization of these ecologically relevant loci, as identified in this study, opens up new perspectives for studying functional ecology in A. alpina, its relatives and other alpine plant species.

  17. Genome scanning of Amazonian Plasmodium falciparum shows subtelomeric instability and clindamycin-resistant parasites

    PubMed Central

    Dharia, Neekesh V.; Plouffe, David; Bopp, Selina E.R.; González-Páez, Gonzalo E.; Lucas, Carmen; Salas, Carola; Soberon, Valeria; Bursulaya, Badry; Kochel, Tadeusz J.; Bacon, David J.; Winzeler, Elizabeth A.

    2010-01-01

    Here, we fully characterize the genomes of 14 Plasmodium falciparum patient isolates taken recently from the Iquitos region using genome scanning, a microarray-based technique that delineates the majority of single-base changes, indels, and copy number variants distinguishing the coding regions of two clones. We show that the parasite population in the Peruvian Amazon bears a limited number of genotypes and low recombination frequencies. Despite the essentially clonal nature of some isolates, we see high frequencies of mutations in subtelomeric highly variable genes and internal var genes, indicating mutations arising during self-mating or mitotic replication. The data also reveal that one or two meioses separate different isolates, showing that P. falciparum clones isolated from different individuals in defined geographical regions could be useful in linkage analyses or quantitative trait locus studies. Through pairwise comparisons of different isolates we discovered point mutations in the apicoplast genome that are close to known mutations that confer clindamycin resistance in other species, but which were hitherto unknown in malaria parasites. Subsequent drug sensitivity testing revealed over 100-fold increase of clindamycin EC50 in strains harboring one of these mutations. This evidence of clindamycin-resistant parasites in the Amazon suggests that a shift should be made in health policy away from quinine + clindamycin therapy for malaria in pregnant women and infants, and that the development of new lincosamide antibiotics for malaria should be reconsidered. PMID:20829224

  18. Genomic islands of divergence are not affected by geography of speciation in sunflowers.

    PubMed

    Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

    2013-01-01

    Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.

  19. A knowledge base for tracking the impact of genomics on population health.

    PubMed

    Yu, Wei; Gwinn, Marta; Dotson, W David; Green, Ridgely Fisk; Clyne, Mindy; Wulf, Anja; Bowen, Scott; Kolor, Katherine; Khoury, Muin J

    2016-12-01

    We created an online knowledge base (the Public Health Genomics Knowledge Base (PHGKB)) to provide systematically curated and updated information that bridges population-based research on genomics with clinical and public health applications. Weekly horizon scanning of a wide variety of online resources is used to retrieve relevant scientific publications, guidelines, and commentaries. After curation by domain experts, links are deposited into Web-based databases. PHGKB currently consists of nine component databases. Users can search the entire knowledge base or search one or more component databases directly and choose options for customizing the display of their search results. PHGKB offers researchers, policy makers, practitioners, and the general public a way to find information they need to understand the complicated landscape of genomics and population health. Genet Med 18(12), 1312-1314.

  20. Genomic scan for genes predisposing to schizophrenia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coon, H.; Jensen, S.; Holik, J.

    1994-03-15

    We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping nearby D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons are not past the age of risk. 90 refs., 3 tabs.

  1. Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis.

    PubMed

    Cho, Seoae; Kim, Haseong; Oh, Sohee; Kim, Kyunga; Park, Taesung

    2009-12-15

    The current trend in genome-wide association studies is to identify regions where the true disease-causing genes may lie by evaluating thousands of single-nucleotide polymorphisms (SNPs) across the whole genome. However, many challenges exist in detecting disease-causing genes among the thousands of SNPs. Examples include multicollinearity and multiple testing issues, especially when a large number of correlated SNPs are simultaneously tested. Multicollinearity can often occur when predictor variables in a multiple regression model are highly correlated, and can cause imprecise estimation of association. In this study, we propose a simple stepwise procedure that identifies disease-causing SNPs simultaneously by employing elastic-net regularization, a variable selection method that allows one to address multicollinearity. At Step 1, the single-marker association analysis was conducted to screen SNPs. At Step 2, the multiple-marker association was scanned based on the elastic-net regularization. The proposed approach was applied to the rheumatoid arthritis (RA) case-control data set of Genetic Analysis Workshop 16. While the selected SNPs at the screening step are located mostly on chromosome 6, the elastic-net approach identified putative RA-related SNPs on other chromosomes in an increased proportion. For some of those putative RA-related SNPs, we identified the interactions with sex, a well known factor affecting RA susceptibility.
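
    The two-step idea described in this record (single-marker screening, then elastic-net regularization on the retained markers) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses a quantitative trait with squared-error loss rather than the case-control setting of the study, the screening rule is a simple covariance cutoff, and every function name, parameter value and genotype below is hypothetical.

```python
def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(value, threshold):
    """Shrink a value toward zero by `threshold` (the lasso part)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def screen_markers(columns, y, min_abs_cov):
    """Step 1: keep markers whose marginal covariance with the trait is large."""
    n = len(y)
    return [j for j, col in enumerate(columns)
            if abs(sum(a * b for a, b in zip(col, y)) / n) >= min_abs_cov]


def elastic_net(columns, y, lam, alpha, n_passes=500):
    """Step 2: coordinate descent for the objective
    (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1 - alpha)/2*|b|_2^2)."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_passes):
        for j, col in enumerate(columns):
            z = sum(x * x for x in col) / n
            # Covariance of marker j with the partial residual (marker j added back).
            rho = sum(x * r for x, r in zip(col, resid)) / n + z * beta[j]
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            resid = [r - x * (new - beta[j]) for r, x in zip(resid, col)]
            beta[j] = new
    return beta


# Hypothetical genotypes (0/1/2 allele counts): markers 0 and 1 are in
# perfect linkage disequilibrium, marker 2 is unrelated to the trait.
genotypes = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 1, 1]]
trait = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]

cols = [center(g) for g in genotypes]
y = center(trait)
kept = screen_markers(cols, y, min_abs_cov=0.1)
beta = elastic_net([cols[j] for j in kept], y, lam=0.1, alpha=0.5)
print(kept, [round(b, 3) for b in beta])
```

    In this toy data set the unrelated marker is dropped at the screening step, and the two perfectly correlated markers end up sharing the effect, which is the behaviour that makes elastic-net regularization attractive when SNPs are collinear.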

  2. Landscape genomics of Sphaeralcea ambigua in the Mojave Desert: a multivariate, spatially-explicit approach to guide ecological restoration

    USGS Publications Warehouse

    Shryock, Daniel F.; Havrilla, Caroline A.; DeFalco, Lesley; Esque, Todd C.; Custer, Nathan; Wood, Troy E.

    2015-01-01

    Local adaptation influences plant species’ responses to climate change and their performance in ecological restoration. Fine-scale physiological or phenological adaptations that direct demographic processes may drive intraspecific variability when baseline environmental conditions change. Landscape genomics characterize adaptive differentiation by identifying environmental drivers of adaptive genetic variability and mapping the associated landscape patterns. We applied such an approach to Sphaeralcea ambigua, an important restoration plant in the arid southwestern United States, by analyzing variation at 153 amplified fragment length polymorphism loci in the context of environmental gradients separating 47 Mojave Desert populations. We identified 37 potentially adaptive loci through a combination of genome scan approaches. We then used a generalized dissimilarity model (GDM) to relate variability in potentially adaptive loci with spatial gradients in temperature, precipitation, and topography. We identified non-linear thresholds in loci frequencies driven by summer maximum temperature and water stress, along with continuous variation corresponding to temperature seasonality. Two GDM-based approaches for mapping predicted patterns of local adaptation are compared. Additionally, we assess uncertainty in spatial interpolations through a novel spatial bootstrapping approach. Our study presents robust, accessible methods for deriving spatially-explicit models of adaptive genetic variability in non-model species that will inform climate change modelling and ecological restoration.

  3. A genome-wide scan for signatures of selection in Azeri and Khuzestani buffalo breeds.

    PubMed

    Mokhber, Mahdi; Moradi-Shahrbabak, Mohammad; Sadeghi, Mostafa; Moradi-Shahrbabak, Hossein; Stella, Alessandra; Nicolzzi, Ezequiel; Rahmaninia, Javad; Williams, John L

    2018-06-11

    Identification of genomic regions that have been targets of selection may shed light on the genetic history of livestock populations and help to identify variation controlling commercially important phenotypes. The Azeri and Khuzestani buffalos are the most common indigenous Iranian breeds which have been subjected to divergent selection and are well adapted to completely different regions. Examining the genetic structure of these populations may identify genomic regions associated with adaptation to the different environments and production goals. A set of 385 water buffalo samples from Azeri (N = 262) and Khuzestani (N = 123) breeds were genotyped using the Axiom® Buffalo Genotyping 90 K Array. The unbiased fixation index method (FST) was used to detect signatures of selection. In total, 13 regions with outlier FST values (0.1%) were identified. Annotation of these regions using the UMD3.1 Bos taurus Genome Assembly was performed to find putative candidate genes and QTLs within the selected regions. Putative candidate genes identified include FBXO9, NDFIP1, ACTR3, ARHGAP26, SERPINF2, BOLA-DRB3, BOLA-DQB, CLN8, and MYOM2. Candidate genes identified in regions potentially under selection were associated with physiological pathways including milk production, cytoskeleton organization, growth, metabolic function, apoptosis and domestication-related changes including immune and nervous system development. The QTL identified are involved in economically important traits in buffalo related to milk composition, udder structure, somatic cell count, meat quality, and carcass and body weight.
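
    An outlier scan of the kind described in this record can be sketched as follows. The sketch computes the basic Wright fixation index for each SNP from the allele frequencies in two populations and keeps the top fraction of the distribution. It deliberately omits the sample-size corrections of the unbiased estimator named in the abstract, and the SNP names and allele frequencies are hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP, two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # the SNP is monomorphic in both populations
    return (h_total - h_within) / h_total


def outlier_snps(freqs, top_fraction):
    """Return the SNP ids in the top `top_fraction` of FST values, plus all scores."""
    scores = {snp: wright_fst(p1, p2) for snp, (p1, p2) in freqs.items()}
    n_keep = max(1, int(len(scores) * top_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep], scores


# Hypothetical allele frequencies (population A, population B), illustration only.
toy = {
    "snp1": (0.50, 0.52),
    "snp2": (0.10, 0.90),
    "snp3": (0.30, 0.35),
    "snp4": (0.80, 0.75),
}
outliers, scores = outlier_snps(toy, top_fraction=0.25)
print(outliers, round(scores["snp2"], 3))
```

    A real scan would use a much smaller threshold over tens of thousands of markers and would usually average FST over windows of adjacent SNPs before calling outlier regions.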

  4. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    PubMed

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Copyright © 2016 Teng et al.

  5. Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome

    PubMed Central

    Shedlock, Andrew M.; Botka, Christopher W.; Zhao, Shaying; Shetty, Jyoti; Zhang, Tingting; Liu, Jun S.; Deschavanne, Patrick J.; Edwards, Scott V.

    2007-01-01

    We report results of a megabase-scale phylogenomic analysis of the Reptilia, the sister group of mammals. Large-scale end-sequence scanning of genomic clones of a turtle, alligator, and lizard reveals diverse, mammal-like landscapes of retroelements and simple sequence repeats (SSRs) not found in the chicken. Several global genomic traits, including distinctive phylogenetic lineages of CR1-like long interspersed elements (LINEs) and a paucity of A-T rich SSRs, characterize turtles and archosaur genomes, whereas higher frequencies of tandem repeats and a lower global GC content reveal mammal-like features in Anolis. Nonavian reptile genomes also possess a high frequency of diverse and novel 50-bp unit tandem duplications not found in chicken or mammals. The frequency distributions of ≈65,000 8-mer oligonucleotides suggest that rates of DNA-word frequency change are an order of magnitude slower in reptiles than in mammals. These results suggest a diverse array of interspersed and SSRs in the common ancestor of amniotes and a genomic conservatism and gradual loss of retroelements in reptiles that culminated in the minimalist chicken genome. PMID:17307883
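
    The "≈65,000 8-mer oligonucleotides" in this record correspond to the 4^8 = 65,536 possible DNA words of length eight. A minimal sketch of how such word-frequency distributions can be tabulated is given below; the function name and the toy sequence are hypothetical, and windows containing ambiguous bases are simply skipped, which is an assumption rather than the authors' stated handling.

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int = 8) -> dict:
    """Relative frequency of each k-mer consisting only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows with N or other codes
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


print(4 ** 8)  # number of possible 8-mers: 65536

# Toy sequence with one ambiguous base, using k = 4 to keep the output small.
freqs = kmer_frequencies("ACGTACGTACGTNACGT", k=4)
print(sorted(freqs.items()))
```

    Comparing such frequency vectors between genomes, for example with a distance measure of the reader's choice, is one way to quantify how quickly DNA-word usage diverges between lineages.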

  6. A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation

    PubMed Central

    Adhikari, Kaustubh; Fuentes-Guajardo, Macarena; Quinto-Sánchez, Mirsha; Mendoza-Revilla, Javier; Camilo Chacón-Duque, Juan; Acuña-Alonzo, Victor; Jaramillo, Claudia; Arias, William; Lozano, Rodrigo Barquera; Pérez, Gastón Macín; Gómez-Valdés, Jorge; Villamil-Ramírez, Hugo; Hunemeier, Tábita; Ramallo, Virginia; Silva de Cerqueira, Caio C.; Hurtado, Malena; Villegas, Valeria; Granja, Vanessa; Gallo, Carla; Poletti, Giovanni; Schuler-Faccini, Lavinia; Salzano, Francisco M.; Bortolini, Maria-Cátira; Canizales-Quinteros, Samuel; Cheeseman, Michael; Rosique, Javier; Bedoya, Gabriel; Rothhammer, Francisco; Headon, Denis; González-José, Rolando; Balding, David; Ruiz-Linares, Andrés

    2016-01-01

    We report a genome-wide association scan for facial features in ∼6,000 Latin Americans. We evaluated 14 traits on an ordinal scale and found significant association (P values < 5 × 10^-8) at single-nucleotide polymorphisms (SNPs) in four genomic regions for three nose-related traits: columella inclination (4q31), nose bridge breadth (6p21) and nose wing breadth (7p13 and 20p11). In a subsample of ∼3,000 individuals we obtained quantitative traits related to 9 of the ordinal phenotypes and, also, a measure of nasion position. Quantitative analyses confirmed the ordinal-based associations, identified SNPs in 2q12 associated with chin protrusion, and replicated the reported association of nasion position with SNPs in PAX3. Strongest association in 2q12, 4q31, 6p21 and 7p13 was observed for SNPs in the EDAR, DCHS2, RUNX2 and GLI3 genes, respectively. Associated SNPs in 20p11 extend to PAX1. Consistent with the effect of EDAR on chin protrusion, we documented alterations of mandible length in mice with modified Edar function. PMID:27193062

  7. A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation.

    PubMed

    Adhikari, Kaustubh; Fuentes-Guajardo, Macarena; Quinto-Sánchez, Mirsha; Mendoza-Revilla, Javier; Camilo Chacón-Duque, Juan; Acuña-Alonzo, Victor; Jaramillo, Claudia; Arias, William; Lozano, Rodrigo Barquera; Pérez, Gastón Macín; Gómez-Valdés, Jorge; Villamil-Ramírez, Hugo; Hunemeier, Tábita; Ramallo, Virginia; Silva de Cerqueira, Caio C; Hurtado, Malena; Villegas, Valeria; Granja, Vanessa; Gallo, Carla; Poletti, Giovanni; Schuler-Faccini, Lavinia; Salzano, Francisco M; Bortolini, Maria-Cátira; Canizales-Quinteros, Samuel; Cheeseman, Michael; Rosique, Javier; Bedoya, Gabriel; Rothhammer, Francisco; Headon, Denis; González-José, Rolando; Balding, David; Ruiz-Linares, Andrés

    2016-05-19

    We report a genome-wide association scan for facial features in ∼6,000 Latin Americans. We evaluated 14 traits on an ordinal scale and found significant association (P values < 5 × 10^-8) at single-nucleotide polymorphisms (SNPs) in four genomic regions for three nose-related traits: columella inclination (4q31), nose bridge breadth (6p21) and nose wing breadth (7p13 and 20p11). In a subsample of ∼3,000 individuals we obtained quantitative traits related to 9 of the ordinal phenotypes and, also, a measure of nasion position. Quantitative analyses confirmed the ordinal-based associations, identified SNPs in 2q12 associated with chin protrusion, and replicated the reported association of nasion position with SNPs in PAX3. Strongest association in 2q12, 4q31, 6p21 and 7p13 was observed for SNPs in the EDAR, DCHS2, RUNX2 and GLI3 genes, respectively. Associated SNPs in 20p11 extend to PAX1. Consistent with the effect of EDAR on chin protrusion, we documented alterations of mandible length in mice with modified Edar function.

  8. Genetic Predisposition to Self-Curing Infection with the Protozoan Leishmania chagasi: A Genome Wide Scan

    PubMed Central

    Jeronimo, Selma M. B.; Duggal, Priya; Ettinger, Nicholas A.; Nascimento, Eliana T.; Monteiro, Gloria R.; Cabral, Angela P.; Pontes, Núbia N.; Lacerda, Hênio G.; Queiroz, Paula V.; Maia, Carlos G.; Pearson, Richard D.; Blackwell, Jenefer M.; Beaty, Terri H.; Wilson, Mary E.

    2008-01-01

    The protozoan Leishmania chagasi can cause disseminated, fatal visceral leishmaniasis (VL) or asymptomatic human infection. We hypothesized that genetic factors contribute to this variable response to infection. A family study was performed in endemic neighborhoods near Natal, northeast Brazil. Subjects were assessed for VL or asymptomatic infection, defined as a positive delayed type hypersensitivity (DTH) skin test response to Leishmania antigen without disease symptoms. A genome scan of 405 microsatellite markers in 1254 subjects was analyzed for regions of linkage. The results indicated loci of potential linkage to DTH response on chromosomes 2, 13, 15 and 19, and a novel region of potential interest for VL on chromosome 9. An understanding of the genetic factors determining whether an individual will develop symptomatic or asymptomatic infection with L. chagasi may illuminate proteins essential for immune protection against this parasitic disease; findings could reveal strategies for immunotherapy or prevention. PMID:17955446

  9. Scanning the Human Genome for Novel Therapeutic Targets for Breast Cancer

    DTIC Science & Technology

    2005-04-01

    colon cancer genome, in sum representing only 34 annotated genes (Figure 3A). Consistent with its role in the pathogenesis of human cancers (Ruas and...high-confidence list includes two previously established tumor suppressors, p16INK4A and TGFβRII (Derynck et al., 2001; Ruas and Peters, 1998; Siegel...cancer. Nat Rev Cancer 4, 118-132. Chong, J. A., Tapia-Ramirez, J., Kim, S., Toledo-Aral, J. J., Zheng, Y., Boutros, M. C., Altshuller, Y. M., Frohman

  10. Positive Selection Driving Cytoplasmic Genome Evolution of the Medicinally Important Ginseng Plant Genus Panax

    PubMed Central

    Jiang, Peng; Shi, Feng-Xue; Li, Ming-Rui; Liu, Bao; Wen, Jun; Xiao, Hong-Xing; Li, Lin-Feng

    2018-01-01

    Panax L. (the ginseng genus) is a shade-demanding group within the family Araliaceae and all of its species are of crucial significance in traditional Chinese medicine. Phylogenetic and biogeographic analyses demonstrated that two rounds of whole genome duplications accompanied by geographic and ecological isolation promoted the diversification of Panax species. However, contributions of the cytoplasmic genomes to the adaptive evolution of Panax species remained largely uninvestigated. In this study, we sequenced the chloroplast and mitochondrial genomes of 11 accessions belonging to seven Panax species. Our results show that heterogeneity in nucleotide substitution rate is abundant in both cytoplasmic genomes, with the mitochondrial genome possessing more variants at the total level but the chloroplast showing higher sequence polymorphisms at the genic regions. Genome-wide scanning of positive selection identified five and 12 genes from the chloroplast and mitochondrial genomes, respectively. Functional analyses further revealed that these selected genes play important roles in plant development, cellular metabolism and adaptation. We therefore conclude that positive selection might be one of the potential evolutionary forces that shaped the nucleotide variation pattern of these Panax species. In particular, the mitochondrial genes evolved under stronger selective pressure compared to the chloroplast genes. PMID:29670636

  11. Positive Selection Driving Cytoplasmic Genome Evolution of the Medicinally Important Ginseng Plant Genus Panax.

    PubMed

    Jiang, Peng; Shi, Feng-Xue; Li, Ming-Rui; Liu, Bao; Wen, Jun; Xiao, Hong-Xing; Li, Lin-Feng

    2018-01-01

    Panax L. (the ginseng genus) is a shade-demanding group within the family Araliaceae and all of its species are of crucial significance in traditional Chinese medicine. Phylogenetic and biogeographic analyses demonstrated that two rounds of whole genome duplications accompanied by geographic and ecological isolation promoted the diversification of Panax species. However, contributions of the cytoplasmic genomes to the adaptive evolution of Panax species remained largely uninvestigated. In this study, we sequenced the chloroplast and mitochondrial genomes of 11 accessions belonging to seven Panax species. Our results show that heterogeneity in nucleotide substitution rate is abundant in both cytoplasmic genomes, with the mitochondrial genome possessing more variants at the total level but the chloroplast showing higher sequence polymorphisms at the genic regions. Genome-wide scanning of positive selection identified five and 12 genes from the chloroplast and mitochondrial genomes, respectively. Functional analyses further revealed that these selected genes play important roles in plant development, cellular metabolism and adaptation. We therefore conclude that positive selection might be one of the potential evolutionary forces that shaped the nucleotide variation pattern of these Panax species. In particular, the mitochondrial genes evolved under stronger selective pressure compared to the chloroplast genes.

  12. PrionScan: an online database of predicted prion domains in complete proteomes.

    PubMed

    Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier

    2014-02-05

    Prions are a particular type of amyloid related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information on proteins bearing prion-like domains. The database is updated regularly alongside UniprotKB and in its present version contains approximately 28000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries allow retrieval of a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific search patterns. In the second mode, users can submit and analyze their own sequences. It is expected that this database will provide relevant insights on prion functions and regulation from a genome-wide perspective, assisting researchers performing cross-species prion biology studies. Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.

  13. Genome Scan Meta-Analysis of Schizophrenia and Bipolar Disorder, Part II: Schizophrenia

    PubMed Central

    Lewis, Cathryn M.; Levinson, Douglas F.; Wise, Lesley H.; DeLisi, Lynn E.; Straub, Richard E.; Hovatta, Iiris; Williams, Nigel M.; Schwab, Sibylle G.; Pulver, Ann E.; Faraone, Stephen V.; Brzustowicz, Linda M.; Kaufmann, Charles A.; Garver, David L.; Gurling, Hugh M. D.; Lindholm, Eva; Coon, Hilary; Moises, Hans W.; Byerley, William; Shaw, Sarah H.; Mesen, Andrea; Sherrington, Robin; O’Neill, F. Anthony; Walsh, Dermot; Kendler, Kenneth S.; Ekelund, Jesper; Paunio, Tiina; Lönnqvist, Jouko; Peltonen, Leena; O’Donovan, Michael C.; Owen, Michael J.; Wildenauer, Dieter B.; Maier, Wolfgang; Nestadt, Gerald; Blouin, Jean-Louis; Antonarakis, Stylianos E.; Mowry, Bryan J.; Silverman, Jeremy M.; Crowe, Raymond R.; Cloninger, C. Robert; Tsuang, Ming T.; Malaspina, Dolores; Harkavy-Friedman, Jill M.; Svrakic, Dragan M.; Bassett, Anne S.; Holcomb, Jennifer; Kalsi, Gursharan; McQuillin, Andrew; Brynjolfson, Jon; Sigmundsson, Thordur; Petursson, Hannes; Jazin, Elena; Zoëga, Tomas; Helgason, Tomas

    2003-01-01

    Schizophrenia is a common disorder with high heritability and a 10-fold increase in risk to siblings of probands. Replication has been inconsistent for reports of significant genetic linkage. To assess evidence for linkage across studies, rank-based genome scan meta-analysis (GSMA) was applied to data from 20 schizophrenia genome scans. Each marker for each scan was assigned to 1 of 120 30-cM bins, with the bins ranked by linkage scores (1 = most significant) and the ranks averaged across studies (Ravg) and then weighted for sample size ($\sqrt{N[\text{affected cases}]}$). A permutation test was used to compute the probability of observing, by chance, each bin’s average rank (PAvgRnk) or of observing it for a bin with the same place (first, second, etc.) in the order of average ranks in each permutation (Pord). The GSMA produced significant genomewide evidence for linkage on chromosome 2q (PAvgRnk<.000417). Two aggregate criteria for linkage were also met (clusters of nominally significant P values that did not occur in 1,000 replicates of the entire data set with no linkage present): 12 consecutive bins with both PAvgRnk and Pord<.05, including regions of chromosomes 5q, 3p, 11q, 6p, 1q, 22q, 8p, 20q, and 14p, and 19 consecutive bins with Pord<.05, additionally including regions of chromosomes 16q, 18q, 10p, 15q, 6q, and 17q. There is greater consistency of linkage results across studies than has been previously recognized. The results suggest that some or all of these regions contain loci that increase susceptibility to schizophrenia in diverse populations. PMID:12802786
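
    The rank-averaging and permutation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (rank_bins, weighted_avg_ranks, p_avg_rank) are hypothetical, ties are broken by bin order, ranks are permuted independently within each study (an assumption about the null model), and only the average-rank probability is computed, not the ordered statistic Pord.

```python
import math
import random


def rank_bins(scores):
    """Rank bins within one study; rank 1 goes to the strongest linkage score.

    Ties are broken by bin order, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def weighted_avg_ranks(rank_table, n_affected):
    """Average each bin's rank across studies, weighting by sqrt(N affected)."""
    weights = [math.sqrt(n) for n in n_affected]
    total = sum(weights)
    n_bins = len(rank_table[0])
    return [
        sum(w * ranks[b] for w, ranks in zip(weights, rank_table)) / total
        for b in range(n_bins)
    ]


def p_avg_rank(rank_table, n_affected, n_perm=1000, seed=0):
    """Estimate, per bin, how often a permuted average rank is as low as observed.

    Assumed null model: ranks are shuffled independently within each study.
    """
    rng = random.Random(seed)
    observed = weighted_avg_ranks(rank_table, n_affected)
    hits = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = [rng.sample(ranks, len(ranks)) for ranks in rank_table]
        permuted = weighted_avg_ranks(shuffled, n_affected)
        for b, obs in enumerate(observed):
            if permuted[b] <= obs:
                hits[b] += 1
    return [h / n_perm for h in hits]


if __name__ == "__main__":
    # Toy data: three studies, four bins, made-up linkage scores.
    scores = [[3.1, 0.4, 1.2, 0.1], [2.5, 0.9, 1.0, 0.2], [2.8, 0.3, 0.5, 1.1]]
    table = [rank_bins(s) for s in scores]
    print(weighted_avg_ranks(table, [50, 80, 120]))
    print(p_avg_rank(table, [50, 80, 120]))
```

    A low permutation probability for a bin indicates that its ranks are more consistently near the top across studies than chance would predict, which is the sense in which the abstract reports genomewide evidence for a region.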

  14. A novel nonsense mutation in the NDP gene in a Chinese family with Norrie disease

    PubMed Central

    Liu, Deyuan; Hu, Zhengmao; Peng, Yu; Yu, Changhong; Liu, Yalan; Mo, Xiaoyun; Li, Xiaoping; Lu, Lina; Xu, Xiaojuan; Su, Wei; Pan, Qian

    2010-01-01

    Purpose Norrie disease (ND), a rare X-linked recessive disorder, is characterized by congenital blindness and, occasionally, mental retardation and hearing loss. ND is caused by the Norrie Disease Protein gene (NDP), which codes for norrin, a cysteine-rich protein involved in ocular vascular development. Here, we report a novel mutation of NDP that was identified in a Chinese family in which three members displayed typical ND symptoms and other complex phenotypes, such as cerebellar atrophy, motor disorders, and mental disorders. Methods We conducted an extensive clinical examination of the proband and performed a computed tomography (CT) scan of his brain. Additionally, we performed ophthalmic examinations, haplotype analyses, and NDP DNA sequencing for 26 individuals from the proband’s extended family. Results The proband’s computed tomography scan, in which the fifth ventricle could be observed, indicated cerebellar atrophy. Genome scans and haplotype analyses traced the disease to chromosome Xp21.1-p11.22. Mutation screening of the NDP gene identified a novel nonsense mutation, c.343C>T, in this region. Conclusions Although recent research has shown that multiple different mutations can be responsible for the ND phenotype, additional research is needed to understand the mechanism responsible for the diverse phenotypes caused by mutations in the NDP gene. PMID:21179243

  15. Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

    PubMed

    Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

    2012-10-01

    In biological research, establishing the prior art by searching and collecting information already present in the domain is as important as the experiments done. To obtain a complete overview of the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms and integrated in the NER platform ProMiner, which is successfully used in the human genomics domain. Based on the performance tests done, the present system achieved a fair performance with precision 0.64, recall 0.74, and F1 measure of 0.69 in a test scenario based on cattle literature.
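
    The three performance figures quoted in this abstract are mutually consistent, which a short worked check shows. The sketch assumes the standard definition of the F1 measure as the harmonic mean of precision and recall; the function name f1_score is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the cattle-literature test scenario.
f1 = f1_score(0.64, 0.74)
print(round(f1, 2))  # 2 * 0.64 * 0.74 / 1.38 = 0.686..., which rounds to 0.69
```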

  16. Soul on Silicon.

    ERIC Educational Resources Information Center

    Kurzweil, Raymond C.

    1994-01-01

    Summarizes recent advances in computer simulation and "reverse engineering" technologies, highlighting the Human Genome Project to scan the human genetic code; artificial retina chips to copy the human retina's neural organization; high-speed, high-resolution Magnetic Resonance Imaging scanners; and the virtual book. Discusses…

  17. Compartmental genomics in living cells revealed by single-cell nanobiopsy.

    PubMed

    Actis, Paolo; Maalouf, Michelle M; Kim, Hyunsung John; Lohith, Akshar; Vilozny, Boaz; Seger, R Adam; Pourmand, Nader

    2014-01-28

    The ability to study the molecular biology of living single cells in heterogeneous cell populations is essential for next generation analysis of cellular circuitry and function. Here, we developed a single-cell nanobiopsy platform based on scanning ion conductance microscopy (SICM) for continuous sampling of intracellular content from individual cells. The nanobiopsy platform uses electrowetting within a nanopipette to extract cellular material from living cells with minimal disruption of the cellular milieu. We demonstrate the subcellular resolution of the nanobiopsy platform by isolating small subpopulations of mitochondria from single living cells, and quantify mutant mitochondrial genomes in those single cells with high throughput sequencing technology. These findings may provide the foundation for dynamic subcellular genomic analysis.

  18. Copy Number Variation across European Populations

    PubMed Central

    Chen, Wanting; Hayward, Caroline; Wright, Alan F.; Hicks, Andrew A.; Vitart, Veronique; Knott, Sara; Wild, Sarah H.; Pramstaller, Peter P.; Wilson, James F.; Rudan, Igor; Porteous, David J.

    2011-01-01

    Genome analysis provides a powerful approach to test for evidence of genetic variation within and between geographical regions and local populations. Copy number variants which comprise insertions, deletions and duplications of genomic sequence provide one such convenient and informative source. Here, we investigate copy number variants from genome wide scans of single nucleotide polymorphisms in three European population isolates, the island of Vis in Croatia, the islands of Orkney in Scotland and the South Tyrol in Italy. We show that whereas the overall copy number variant frequencies are similar between populations, their distribution is highly specific to the population of origin, a finding which is supported by evidence for increased kinship correlation for specific copy number variants within populations. PMID:21829696

  19. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    PubMed

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.
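
    The k-mer composition features mentioned in this abstract can be sketched as follows. This is only an illustration of the featurization idea, not the GI-SVM code: the function names, window size, step, and value of k are all assumptions, and the one-class SVM step itself is left out (the resulting vectors could be passed to any one-class classifier).

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return the normalised k-mer frequency vector of seq.

    The vector follows the lexicographic order of k-mers over ACGT.
    K-mers containing other symbols (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def window_features(genome, window=1000, step=500, k=2):
    """Slide a window along the genome and featurise each window by k-mer content.

    Returns a list of (start_position, frequency_vector) pairs.
    """
    features = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        features.append((start, kmer_frequencies(genome[start:start + window], k)))
    return features


if __name__ == "__main__":
    toy_genome = "ACGT" * 5  # made-up sequence for illustration
    for start, vec in window_features(toy_genome, window=8, step=4, k=1):
        print(start, vec)
```

    Windows whose composition vectors deviate strongly from the bulk of the genome are the natural candidates for laterally acquired regions under this kind of approach.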

  20. Genomic Physics. Multiple Laser Beam Treatment of Alzheimer's Disease

    NASA Astrophysics Data System (ADS)

    Stefan, V. Alexander

    2014-03-01

    The synapses affected by Alzheimer's disease can be rejuvenated by the multiple ultrashort wavelength laser beams.[2] The guiding lasers scan the whole area to detect the amyloid plaques based on the laser scattering technique. The scanning lasers pinpoint the areas with plaques and eliminate them. Laser interaction is highly efficient, because of the focusing capabilities and possibility for the identification of the damaging proteins by matching the protein oscillation eigen-frequency with laser frequency.[3] Supported by Nikola Tesla Labs, La Jolla, California, USA.

  1. New technologies provide insights into genetic basis of psychiatric disorders and explain their co-morbidity.

    PubMed

    Rudan, Igor

    2010-06-01

    The completion of the Human Genome Project and the "HapMap" project was followed by translational activities from companies within the private sector. This led to the introduction of genome-wide scans based on hundreds of thousands of single nucleotide polymorphisms (SNPs). These scans were based on common genetic variants in human populations. This new and powerful technology was then applied to the existing DNA-based datasets with information on psychiatric disorders. As a result, an unprecedented amount of novel scientific insights related to the underlying biology and genetics of psychiatric disorders was obtained. The dominant design of these studies, so called "genome-wide association studies" (GWAS), used statistical methods which minimized the risk of false positive reports and provided much greater power to detect genotype-phenotype associations. All findings were entirely data-driven rather than hypothesis-driven, which often made it difficult for researchers to understand or interpret the findings. Interestingly, this work in genetics is indicating how non-specific some genes are for psychiatric disorders, having associations in common for schizophrenia, bipolar disorder and autism. This suggests that the earlier stages of psychiatric disorders may be multi-valent and that early detection, coupled with a clearer understanding of the environmental factors, may allow prevention. At the present time, the rich "harvest" from GWAS still has very limited power to predict the variation in psychiatric disease status at individual level, typically explaining less than 5% of the total risk variance. The most recent studies of common genetic variation implicated the role of major histocompatibility complex in schizophrenia and other disorders. They also provided molecular evidence for a substantial polygenic component to the risk of psychiatric diseases, involving thousands of common alleles of very small effect. The studies of structural genetic variation, such as copy number variants (CNV), coupled with the efforts targeting rare genetic variation (using the emerging whole-genome "deep" sequencing technologies) will become the area of the greatest interest in the field of genetic epidemiology. This will be complemented by the studies of epigenetic phenomena, changes of expression at a large scale and understanding gene-gene interactions in complex networks using systems biology approaches. A deeper understanding of the underlying biology of psychiatric disorders is essential to improve diagnoses and therapies of these diseases. New technologies - genome-wide association studies, imaging and the optical manipulation of neural circuits - are promising to provide novel insights and lead to new treatments.

  2. Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping

    PubMed Central

    Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H.; Hansen, Mark S. T.; Lawley, Cindy T.; Karlsson, Elinor K.; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Åke; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T.

    2011-01-01

    The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease. PMID:22022279

  3. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping.

    PubMed

    Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H; Hansen, Mark S T; Lawley, Cindy T; Karlsson, Elinor K; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Ake; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T

    2011-10-01

    The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.

  4. PEPIS: A Pipeline for Estimating Epistatic Effects in Quantitative Trait Locus Mapping and Genome-Wide Association Studies.

    PubMed

    Zhang, Wenchao; Dai, Xinbin; Wang, Qishan; Xu, Shizhong; Zhao, Patrick X

    2016-05-01

    The term epistasis refers to interactions between multiple genetic loci. Genetic epistasis is important in regulating biological function and is considered to explain part of the 'missing heritability,' which involves marginal genetic effects that cannot be accounted for in genome-wide association studies. Thus, the study of epistasis is of great interest to geneticists. However, estimating epistatic effects for quantitative traits is challenging due to the large number of interaction effects that must be estimated, thus significantly increasing computing demands. Here, we present a new web server-based tool, the Pipeline for estimating EPIStatic genetic effects (PEPIS), for analyzing polygenic epistatic effects. The PEPIS software package is based on a new linear mixed model that has been used to predict the performance of hybrid rice. The PEPIS includes two main sub-pipelines: the first for kinship matrix calculation, and the second for polygenic component analyses and genome scanning for main and epistatic effects. To accommodate the demand for high-performance computation, the PEPIS utilizes C/C++ for mathematical matrix computing. In addition, the modules for kinship matrix calculations and main and epistatic-effect genome scanning employ parallel computing technology that effectively utilizes multiple computer nodes across our networked cluster, thus significantly improving the computational speed. For example, when analyzing the same immortalized F2 rice population genotypic data examined in a previous study, the PEPIS returned results identical to those of the original prototype R code at each analysis step, but the computational time was reduced from more than one month to about five minutes. These advances will help overcome the bottleneck frequently encountered in genome wide epistatic genetic effect analysis and enable accommodation of the high computational demand. The PEPIS is publicly available at http://bioinfo.noble.org/PolyGenic_QTL/.

  5. Genome wide scan for quantitative trait loci affecting tick resistance in cattle (Bos taurus × Bos indicus)

    PubMed Central

    2010-01-01

    Background In tropical countries, losses caused by bovine tick Rhipicephalus (Boophilus) microplus infestation have a tremendous economic impact on cattle production systems. Genetic variation between Bos taurus and Bos indicus to tick resistance and molecular biology tools might allow for the identification of molecular markers linked to resistance traits that could be used as an auxiliary tool in selection programs. The objective of this work was to identify QTL associated with tick resistance/susceptibility in a bovine F2 population derived from the Gyr (Bos indicus) × Holstein (Bos taurus) cross. Results Through a whole genome scan with microsatellite markers, we were able to map six genomic regions associated with bovine tick resistance. For most QTL, we have found that depending on the tick evaluation season (dry and rainy) different sets of genes could be involved in the resistance mechanism. We identified dry season specific QTL on BTA 2 and 10, rainy season specific QTL on BTA 5, 11 and 27. We also found a highly significant genome wide QTL for both dry and rainy seasons in the central region of BTA 23. Conclusions The experimental F2 population derived from Gyr × Holstein cross successfully allowed the identification of six highly significant QTL associated with tick resistance in cattle. QTL located on BTA 23 might be related with the bovine histocompatibility complex. Further investigation of these QTL will help to isolate candidate genes involved with tick resistance in cattle. PMID:20433753

  6. A whole genome Bayesian scan for adaptive genetic divergence in West African cattle

    PubMed Central

    2009-01-01

    Background The recent settlement of cattle in West Africa after several waves of migration from remote centres of domestication has imposed dramatic changes in their environmental conditions, in particular through exposure to new pathogens. West African cattle populations thus represent an appealing model to unravel the genome response to adaptation to tropical conditions. The purpose of this study was to identify footprints of adaptive selection at the whole genome level in a newly collected data set comprising 36,320 SNPs genotyped in 9 West African cattle populations. Results After a detailed analysis of population structure, we performed a scan for SNP differentiation via a previously proposed Bayesian procedure including extensions to improve the detection of loci under selection. Based on these results we identified 53 genomic regions and 42 strong candidate genes. Their physiological functions were mainly related to immune response (MHC region which was found under strong balancing selection, CD79A, CXCR4, DLK1, RFX3, SEMA4A, TICAM1 and TRIM21), nervous system (NEUROD6, OLFM2, MAGI1, SEMA4A and HTR4) and skin and hair properties (EDNRB, TRSP1 and KRTAP8-1). Conclusion The main possible underlying selective pressures may be related to climatic conditions but also to the host response to pathogens such as Trypanosoma sp. Overall, these results might open the way towards the identification of important variants involved in adaptation to tropical conditions and in particular to resistance to tropical infectious diseases. PMID:19930592

  7. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with great divergence over a large size range. We define this as systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.

  8. DRD4 and DAT1 in ADHD: Functional neurobiology to pharmacogenetics

    PubMed Central

    Turic, Darko; Swanson, James; Sonuga-Barke, Edmund

    2010-01-01

    Attention deficit/hyperactivity disorder (ADHD) is a common and potentially very impairing neuropsychiatric disorder of childhood. Statistical genetic studies of twins have shown ADHD to be highly heritable, with the combination of genes and gene by environment interactions accounting for around 80% of phenotypic variance. The initial molecular genetic studies where candidates were selected because of the efficacy of dopaminergic compounds in the treatment of ADHD were remarkably successful and provided strong evidence for the role of DRD4 and DAT1 variants in the pathogenesis of ADHD. However, the recent application of non-candidate gene strategies (eg, genome-wide association scans) has failed to identify additional genes with substantial genetic main effects, and the effects for DRD4 and DAT1 have not been replicated. This is the usual pattern observed for most other physical and mental disorders evaluated with current state-of-the-art methods. In this paper we discuss future strategies for genetic studies in ADHD, highlighting both the pitfalls and possible solutions relating to candidate gene studies, genome-wide studies, defining the phenotype, and statistical approaches. PMID:23226043

  9. Cytological techniques to analyze meiosis in Arabidopsis arenosa for investigating adaptation to polyploidy.

    PubMed

    Higgins, James D; Wright, Kevin M; Bomblies, Kirsten; Franklin, F Chris H

    2014-01-01

    Arabidopsis arenosa is a close relative of the model plant A. thaliana, and exists in nature as stable diploid and autotetraploid populations. Natural tetraploids have adapted to whole genome duplication and do not commonly show meiotic errors such as multivalent and univalent formation, which can lead to chromosome non-disjunction and reduced fertility. A genome scan for genes strongly differentiated between diploid and autotetraploid A. arenosa identified a subset of meiotic genes that may be responsible for adaptation to polyploid meiosis. To investigate the mechanisms by which A. arenosa adapted to its polyploid state, and the functionality of the identified potentially adaptive polymorphisms, a thorough cytological analysis is required. Therefore, in this chapter we describe methods and techniques to analyze male meiosis in A. arenosa, including optimum plant growth conditions, and immunocytological and cytological approaches developed with the specific purpose of understanding meiotic adaptation in an autotetraploid. In addition we present a meiotic cytological atlas to be used as a reference for particular stages and discuss observations arising from a comparison of meiosis between diploid and autotetraploid A. arenosa.

  10. Cytological techniques to analyze meiosis in Arabidopsis arenosa for investigating adaptation to polyploidy

    PubMed Central

    Higgins, James D.; Wright, Kevin M.; Bomblies, Kirsten; Franklin, F. Chris H.

    2014-01-01

    Arabidopsis arenosa is a close relative of the model plant A. thaliana, and exists in nature as stable diploid and autotetraploid populations. Natural tetraploids have adapted to whole genome duplication and do not commonly show meiotic errors such as multivalent and univalent formation, which can lead to chromosome non-disjunction and reduced fertility. A genome scan for genes strongly differentiated between diploid and autotetraploid A. arenosa identified a subset of meiotic genes that may be responsible for adaptation to polyploid meiosis. To investigate the mechanisms by which A. arenosa adapted to its polyploid state, and the functionality of the identified potentially adaptive polymorphisms, a thorough cytological analysis is required. Therefore, in this chapter we describe methods and techniques to analyze male meiosis in A. arenosa, including optimum plant growth conditions, and immunocytological and cytological approaches developed with the specific purpose of understanding meiotic adaptation in an autotetraploid. In addition we present a meiotic cytological atlas to be used as a reference for particular stages and discuss observations arising from a comparison of meiosis between diploid and autotetraploid A. arenosa. PMID:24427164

  11. Measuring specific receptor binding of a PET radioligand in human brain without pharmacological blockade: The genomic plot.

    PubMed

    Veronese, Mattia; Zanotti-Fregonara, Paolo; Rizzo, Gaia; Bertoldo, Alessandra; Innis, Robert B; Turkheimer, Federico E

    2016-04-15

    PET studies allow in vivo imaging of the density of brain receptor species. The PET signal, however, is the sum of the fraction of radioligand that is specifically bound to the target receptor and the non-displaceable fraction (i.e. the non-specifically bound radioligand plus the free ligand in tissue). Therefore, measuring the non-displaceable fraction, which is generally assumed to be constant across the brain, is a necessary step to obtain regional estimates of the specific fractions. The nondisplaceable binding can be directly measured if a reference region, i.e. a region devoid of any specific binding, is available. Many receptors are however widely expressed across the brain, and a true reference region is rarely available. In these cases, the nonspecific binding can be obtained after competitive pharmacological blockade, which is often contraindicated in humans. In this work we introduce the genomic plot for estimating the nondisplaceable fraction using baseline scans only. The genomic plot is a transformation of the Lassen graphical method in which the brain maps of mRNA transcripts of the target receptor obtained from the Allen brain atlas are used as a surrogate measure of the specific binding. Thus, the genomic plot allows the calculation of the specific and nondisplaceable components of radioligand uptake without the need of pharmacological blockade. We first assessed the statistical properties of the method with computer simulations. Then we sought ground-truth validation using human PET datasets of seven different neuroreceptor radioligands, where nonspecific fractions were either obtained separately using drug displacement or available from a true reference region. The population nondisplaceable fractions estimated by the genomic plot were very close to those measured by actual human blocking studies (mean relative difference between 2% and 7%). However, these estimates were valid only when mRNA expressions were predictive of protein levels (i.e. there were no significant post-transcriptional changes). This condition can be readily established a priori by assessing the correlation between PET and mRNA expression. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Measuring specific receptor binding of a PET radioligand in human brain without pharmacological blockade: The genomic plot

    PubMed Central

    Veronese, Mattia; Zanotti-Fregonara, Paolo; Rizzo, Gaia; Bertoldo, Alessandra; Innis, Robert B.; Turkheimer, Federico E.

    2016-01-01

    PET studies allow in vivo imaging of the density of brain receptor species. The PET signal, however, is the sum of the fraction of radioligand that is specifically bound to the target receptor and the non-displaceable fraction (i.e. the non-specifically bound radioligand plus the free ligand in tissue). Therefore, measuring the non-displaceable fraction, which is generally assumed to be constant across the brain, is a necessary step to obtain regional estimates of the specific fractions. The nondisplaceable binding can be directly measured if a reference region, i.e. a region devoid of any specific binding, is available. Many receptors are however widely expressed across the brain, and a true reference region is rarely available. In these cases, the nonspecific binding can be obtained after competitive pharmacological blockade, which is often contraindicated in humans. In this work we introduce the genomic plot for estimating the nondisplaceable fraction using baseline scans only. The genomic plot is a transformation of the Lassen graphical method in which the brain maps of mRNA transcripts of the target receptor obtained from the Allen brain atlas are used as a surrogate measure of the specific binding. Thus, the genomic plot allows the calculation of the specific and nondisplaceable components of radioligand uptake without the need of pharmacological blockade. We first assessed the statistical properties of the method with computer simulations. Then we sought ground-truth validation using human PET datasets of seven different neuroreceptor radioligands, where nonspecific fractions were either obtained separately using drug displacement or available from a true reference region. The population nondisplaceable fractions estimated by the genomic plot were very close to those measured by actual human blocking studies (mean relative difference between 2% and 7%). However, these estimates were valid only when mRNA expressions were predictive of protein levels (i.e. there were no significant post-transcriptional changes). This condition can be readily established a priori by assessing the correlation between PET and mRNA expression. PMID:26850512
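
    The idea in the abstract above can be illustrated with a small sketch. It assumes one plausible reading of the method: regional mRNA expression is regressed on the regional total distribution volume (V_T), and the x-intercept of the fitted line (the V_T at which specific binding would be zero) is taken as the nondisplaceable volume. The function name, the variable names, and the noise-free synthetic data are all hypothetical; this is not the authors' implementation.

```python
import numpy as np

def genomic_plot_vnd(vt, mrna):
    """Estimate the nondisplaceable volume as the x-intercept of mRNA vs. V_T.

    Assumption: mRNA expression is proportional to specific binding, so that
    mrna ~ slope * (vt - v_nd) across brain regions.
    """
    vt = np.asarray(vt, dtype=float)
    mrna = np.asarray(mrna, dtype=float)
    slope, intercept = np.polyfit(vt, mrna, 1)  # fit mrna = slope * vt + intercept
    if slope == 0:
        raise ValueError("mRNA does not vary with V_T; intercept is undefined")
    return -intercept / slope  # V_T at which the fitted mRNA level is zero

# Hypothetical, noise-free regional values used only to exercise the function.
true_vnd = 2.0
mrna = np.array([0.5, 1.0, 2.0, 3.5, 5.0])
vt = true_vnd + 0.8 * mrna
vnd_hat = genomic_plot_vnd(vt, mrna)
```

    With real data the fit would be noisy, and, as the abstract notes, it is only meaningful when mRNA expression is predictive of protein levels.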

  13. Leveraging the rice genome sequence for monocot comparative and translational genomics.

    PubMed

    Lohithaswa, H C; Feltus, F A; Singh, H P; Bacon, C D; Bailey, C D; Paterson, A H

    2007-07-01

    Common genome anchor points across many taxa greatly facilitate translational and comparative genomics and will improve our understanding of the Tree of Life. To add to the repertoire of genomic tools applicable to the study of monocotyledonous plants in general, we aligned Allium and Musa ESTs to Oryza BAC sequences and identified candidate Allium-Oryza and Musa-Oryza conserved intron-scanning primers (CISPs). A random sampling of 96 CISP primer pairs, representing loci from 11 of the 12 chromosomes in rice, were tested on seven members of the order Poales and on representatives of the Arecales, Asparagales, and Zingiberales monocot orders. The single-copy amplification success rates of Allium (31.3%), Cynodon (31.4%), Hordeum (30.2%), Musa (37.5%), Oryza (61.5%), Pennisetum (33.3%), Sorghum (47.9%), Zea (33.3%), Triticum (30.2%), and representatives of the palm family (32.3%) suggest that subsets of these primers will provide DNA markers suitable for comparative and translational genomics in orphan crops, as well as for applications in conservation biology, ecology, invasion biology, population biology, systematic biology, and related fields.

  14. Continental-level population differentiation and environmental adaptation in the mushroom Suillus brevipes

    PubMed Central

    Branco, Sara; Bi, Ke; Liao, Hui-Ling; Gladieux, Pierre; Badouin, Hélène; Ellison, Christopher E.; Nguyen, Nhu H.; Vilgalys, Rytas; Peay, Kabir G.; Taylor, John W.; Bruns, Thomas D.

    2016-01-01

    Recent advancements in sequencing technology allowed researchers to better address the patterns and mechanisms involved in microbial environmental adaptation at large spatial scales. Here we investigated the genomic basis of adaptation to climate at the continental scale in Suillus brevipes, an ectomycorrhizal fungus symbiotically associated with the roots of pine trees. We used genomic data from 55 individuals in seven locations across North America to perform genome scans to detect signatures of positive selection and assess whether temperature and precipitation were associated with genetic differentiation. We found that S. brevipes exhibited overall strong population differentiation, with potential admixture in Canadian populations. This species also displayed genomic signatures of positive selection as well as genomic sites significantly associated with distinct climatic regimes and abiotic environmental parameters. These genomic regions included genes involved in transmembrane transport of substances and helicase activity potentially involved in cold stress response. Our study sheds light on large-scale environmental adaptation in fungi by identifying putative adaptive genes and providing a framework to further investigate the genetic basis of fungal adaptation. PMID:27761941

  15. Copy Number Variations in Tilapia Genomes.

    PubMed

    Li, Bi Jun; Li, Hong Lian; Meng, Zining; Zhang, Yong; Lin, Haoran; Yue, Gen Hua; Xia, Jun Hong

    2017-02-01

    Discovering the nature and pattern of genome variation is fundamental in understanding phenotypic diversity among populations. Although several millions of single nucleotide polymorphisms (SNPs) have been discovered in tilapia, the genome-wide characterization of larger structural variants, such as copy number variation (CNV) regions, has not been carried out yet. We conducted a genome-wide scan for CNVs in 47 individuals from three tilapia populations. Based on 254 Gb of high-quality paired-end sequencing reads, we identified 4642 distinct high-confidence CNVs. These CNVs account for 1.9% (12.411 Mb) of the used Nile tilapia reference genome. A total of 1100 predicted CNVs were found overlapping with exon regions of protein genes. Further association analysis based on linear model regression found 85 CNVs ranging between 300 and 27,000 base pairs significantly associated with population types (R² > 0.9 and P > 0.001). Our study provides the first insights into genome-wide CNVs in tilapia. These CNVs among and within tilapia populations may have functional effects on phenotypes and specific adaptation to particular environments.

  16. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  17. Genomic analyses of Northern snakehead (Channa argus) populations in North America

    PubMed Central

    Resh, Carlee A.; Galaska, Matthew P.

    2018-01-01

    Background The introduction of northern snakehead (Channa argus; Anabantiformes: Channidae) and their subsequent expansion is one of many problematic biological invasions in the United States. This harmful aquatic invasive species has become established in various parts of the eastern United States, including the Potomac River basin, and has recently become established in the Mississippi River basin in Arkansas. Effective management of C. argus and prevention of its further spread depends upon knowledge of current population structure in the United States. Methods Novel methods for invasive species using whole genomic scans provide unprecedented levels of data, which can be used to investigate fine scale differences between and within populations of organisms. In this study, we utilize 2b-RAD genomic sequencing to recover 1,007 single-nucleotide polymorphism (SNP) loci from genomic DNA extracted from 165 C. argus individuals: 147 individuals sampled along the East Coast of the United States and 18 individuals sampled throughout Arkansas. Results Analysis of those SNP loci helps to resolve existing population structure and recover five genetically distinct populations of C. argus in the United States. Additionally, information from the SNP loci enables us to begin to calculate the long-term effective population size ranges of this harmful aquatic invasive species. We estimate long-term Ne to be 1,840,000–18,400,000 for the Upper Hudson River basin, 4,537,500–45,375,000 for the Lower Hudson River basin, 3,422,500–34,225,000 for the Potomac River basin, 2,715,000–7,150,000 for Philadelphia, and 2,580,000–25,800,000 for Arkansas populations. Discussion and Conclusions This work provides evidence for the presence of more genetic populations than previously estimated and estimates population size, showing the invasive potential of C. argus in the United States. The valuable information gained from this study will allow effective management of the existing populations to avoid expansion and possibly enable future eradication efforts. PMID:29637024

  18. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    PubMed

    Muley, Vijaykumar Yogesh; Ranjan, Akash

    2012-01-01

    Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand, a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
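
    A minimal sketch of phylogenetic profiling, one of the genomic context methods named in the abstract above, may help make the role of the reference genome set concrete. Each protein is represented by a presence/absence vector over reference genomes, and pairs with similar vectors are predicted to interact. The Jaccard index used here is an illustrative choice of similarity score, and the gene names and profiles are hypothetical; the abstract does not specify the scoring used in the study.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

def restrict(profile, genome_indices):
    """Keep only the columns for a chosen subset of reference genomes."""
    return [profile[i] for i in genome_indices]

# Hypothetical profiles: 1 = homolog present in that reference genome.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

score_ab = jaccard(profiles["geneA"], profiles["geneB"])
score_ac = jaccard(profiles["geneA"], profiles["geneC"])

# The same pair scored on a smaller reference set (genomes 0-3 only).
subset = [0, 1, 2, 3]
score_ab_subset = jaccard(restrict(profiles["geneA"], subset),
                          restrict(profiles["geneB"], subset))
```

    The score for the same pair changes when the reference set changes, which is the kind of dependence on reference genome selection that the study evaluates.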

  19. Detection of genomic signatures of recent selection in commercial broiler chickens.

    PubMed

    Fu, Weixuan; Lee, William R; Abasht, Behnam

    2016-08-26

    Identification of the genomic signatures of recent selection may help uncover causal polymorphisms controlling traits relevant to recent decades of selective breeding in livestock. In this study, we aimed at detecting signatures of recent selection in commercial broiler chickens using genotype information from single nucleotide polymorphisms (SNPs). A total of 565 chickens from five commercial purebred lines, including three broiler sire (male) lines and two broiler dam (female) lines, were genotyped using the 60K SNP Illumina iSelect chicken array. To detect genomic signatures of recent selection, we applied two methods based on population comparison, cross-population extended haplotype homozygosity (XP-EHH) and cross-population composite likelihood ratio (XP-CLR), and further analyzed the results to find genomic regions under recent selection in multiple purebred lines. A total of 321 candidate selection regions spanning approximately 1.45% of the chicken genome in each line were detected by consensus of results of both XP-EHH and XP-CLR methods. To minimize false discovery due to genetic drift, only 42 of the candidate selection regions that were shared by 2 or more purebred lines were considered as high-confidence selection regions in the study. Of these 42 regions, 20 were 50 kb or less while 4 regions were larger than 0.5 Mb. In total, 91 genes could be found in the 42 regions, among which 19 regions contained only 1 or 2 genes, and 9 regions were located at gene deserts. Our results provide a genome-wide scan of recent selection signatures in five purebred lines of commercial broiler chickens. We found several candidate genes for recent selection in multiple lines, such as SOX6 (Sex Determining Region Y-Box 6) and cTR (Thyroid hormone receptor beta). These genes may have been under recent selection due to their essential roles in growth, development and reproduction in chickens. Furthermore, our results suggest that in some candidate regions, the same or opposite alleles have been under recent selection in multiple lines. Most of the candidate genes in the selection regions are novel, and as such they should be of great interest for future research into the genetic architecture of traits relevant to modern broiler breeding.
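
    The two filtering steps described in the abstract above (consensus of two methods within a line, then sharing across lines) can be sketched as set operations. This is a simplified illustration that assumes the genome has already been divided into fixed windows identified by integer indices and that each method has already produced a set of significant windows per line; the line names, window indices, and function names are hypothetical, and the study's actual thresholds and region definitions are not given in the abstract.

```python
from collections import Counter

def candidate_windows(xpehh_hits, xpclr_hits):
    """Windows flagged by both methods within one line (consensus)."""
    return set(xpehh_hits) & set(xpclr_hits)

def shared_windows(per_line_candidates, min_lines=2):
    """Candidate windows that occur in at least `min_lines` lines."""
    counts = Counter(w for cands in per_line_candidates.values() for w in cands)
    return {w for w, n in counts.items() if n >= min_lines}

# Hypothetical significant windows: (XP-EHH hits, XP-CLR hits) per line.
hits = {
    "sire1": ({3, 7, 12}, {7, 12, 20}),
    "sire2": ({7, 15}, {7, 15, 30}),
    "dam1": ({12, 40}, {12, 41}),
}

candidates = {line: candidate_windows(a, b) for line, (a, b) in hits.items()}
high_confidence = shared_windows(candidates, min_lines=2)
```

    Requiring agreement between methods and recurrence across lines trades sensitivity for a lower chance that a window is flagged because of drift in a single line.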

  20. Whole-Genome Sequence Variation among Multiple Isolates of Pseudomonas aeruginosa

    PubMed Central

    Spencer, David H.; Kas, Arnold; Smith, Eric E.; Raymond, Christopher K.; Sims, Elizabeth H.; Hastings, Michele; Burns, Jane L.; Kaul, Rajinder; Olson, Maynard V.

    2003-01-01

    Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, ∼10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel. PMID:12562802

  1. Structural Basis for the Lesion-scanning Mechanism of the MutY DNA Glycosylase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Lan; Chakravarthy, Srinivas; Verdine, Gregory L.

    The highly mutagenic A:8-oxoguanine (oxoG) base pair is generated mainly by misreplication of the C:oxoG base pair, the oxidation product of the C:G base pair. The A:oxoG base pair is particularly insidious because neither base in it carries faithful information to direct the repair of the other. The bacterial MutY (MUTYH in humans) adenine DNA glycosylase is able to initiate the repair of A:oxoG by selectively cleaving the A base from the A:oxoG base pair. The difference between faithful repair and wreaking mutagenic havoc on the genome lies in the accurate discrimination between two structurally similar base pairs: A:oxoG and A:T. Here we present two crystal structures of the MutY N-terminal domain in complex with either undamaged DNA or DNA containing an intrahelical lesion. These structures have captured for the first time a DNA glycosylase scanning the genome for a damaged base in the very first stage of lesion recognition and the base extrusion pathway. The mode of interaction observed here has suggested a common lesion-scanning mechanism across the entire helix-hairpin-helix superfamily to which MutY belongs. In addition, small angle X-ray scattering studies together with accompanying biochemical assays have suggested a possible role played by the C-terminal oxoG-recognition domain of MutY in lesion scanning.

  2. Microsatellites as targets of natural selection.

    PubMed

    Haasl, Ryan J; Payseur, Bret A

    2013-02-01

    The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich's ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants.

  3. Microsatellites as Targets of Natural Selection

    PubMed Central

    Haasl, Ryan J.; Payseur, Bret A.

    2013-01-01

    The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich’s ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants. PMID:23104080

  4. Genetic Analysis of Ligation-Induced Neointima Formation in an F2 Intercross of C57BL/6 and FVB/N Inbred Mouse Strains

    PubMed Central

    Östergren, Caroline; Shim, Jeong; Larsen, Jens Vinther; Nielsen, Lars Bo; Bentzon, Jacob F.

    2015-01-01

    Objective Proliferation and migration of vascular smooth muscle cells (SMCs) are central for arterial diseases including atherosclerosis and restenosis. We hypothesized that the underlying mechanisms may be modeled by carotid ligation in mice. In FVB/N inbred mice, ligation leads to abundant neointima formation with proliferating media-derived SMCs, whereas in C57BL/6 mice hardly any neointima is formed. In the present study, we aimed to identify the chromosomal location of the causative gene variants in an F2 intercross between these two mouse strains. Methods and Results The neointimal cross-sectional area was significantly different between FVB/N, C57BL/6 and F1 female mice 4 weeks after ligation. Carotid artery ligation and a genome scan using 800 informative SNP markers were then performed in 157 female F2 mice. Using quantitative trait loci (QTL) analysis, we identified suggestive, but no genome-wide significant, QTLs on chromosomes 7 and 12 for neointimal cross-sectional area and on chromosome 14 for media area. Further analysis of the cross revealed 4 QTLs for plasma cholesterol, which combined explained 69% of the variation among F2 mice. Conclusions We identified suggestive QTLs for neointima and media area after carotid ligation in an intercross of FVB/N and C57BL/6 mice, but none that reached genome-wide significance indicating a complex genetic architecture of the traits. Genome-wide significant QTLs for total cholesterol levels were identified on chromosomes 1, 3, 9, and 12. PMID:25875831

  5. Transcriptome profiling analysis of cultivar-specific apple fruit ripening and texture attributes

    USDA-ARS's Scientific Manuscript database

    Molecular events regulating cultivar-specific apple fruit ripening and sensory quality are largely unknown. Such knowledge is essential for genomic-assisted apple breeding and postharvest quality management. In this study, transcriptome profile analysis, scanning electron microscopic examination an...

  6. A novel fibrinogen variant--Praha I: hypofibrinogenemia associated with gamma Gly351Ser substitution.

    PubMed

    Kotlín, Roman; Chytilová, Martina; Suttnar, Jirí; Salaj, Peter; Riedel, Tomás; Santrůcek, Jirí; Klener, Pavel; Dyr, Jan Evangelista

    2007-05-01

    A 25-yr-old man from Prague had abnormal bleeding after several surgical operations and a low fibrinogen level, and hypofibrinogenemia was suspected. The patient had a low fibrinogen concentration as determined by the thrombin time and immunoturbidimetric methods. His 48-yr-old mother presented with normal coagulation tests, normal fibrinogen level and reported no history of bleeding. To identify the genetic mutation responsible for this hypofibrinogenemia, genomic DNA extracted from the blood was analyzed. Fibrin polymerization measurement, kinetics of fibrinopeptide release, fibrinogen clottability measurement, mass spectroscopy, and scanning electron microscopy were performed. DNA sequencing showed heterogeneous fibrinogen gammaG351S mutation in the propositus. The mutant chain was found not to be expressed in the circulation by matrix-assisted laser desorption/ionization time of flight mass spectrometry. Scanning electron micrographs of the patient's fibrin clot as well as kinetics of fibrinopeptide release and fibrin polymerization were found to be normal. A case of hypofibrinogenemia gammaG351S was found by routine coagulation testing and was genetically identified.

  7. Endogenous hepadnaviruses, bornaviruses and circoviruses in snakes

    PubMed Central

    Gilbert, C.; Meik, J. M.; Dashevsky, D.; Card, D. C.; Castoe, T. A.; Schaack, S.

    2014-01-01

    We report the discovery of endogenous viral elements (EVEs) from Hepadnaviridae, Bornaviridae and Circoviridae in the speckled rattlesnake, Crotalus mitchellii, the first viperid snake for which a draft whole genome sequence assembly is available. Analysis of the draft assembly reveals genome fragments from the three virus families were inserted into the genome of this snake over the past 50 Myr. Cross-species PCR screening of orthologous loci and computational scanning of the python and king cobra genomes reveals that circoviruses integrated most recently (within the last approx. 10 Myr), whereas bornaviruses and hepadnaviruses integrated at least approximately 13 and approximately 50 Ma, respectively. This is, to our knowledge, the first report of circo-, borna- and hepadnaviruses in snakes and the first characterization of non-retroviral EVEs in non-avian reptiles. Our study provides a window into the historical dynamics of viruses in these host lineages and shows that their evolution involved multiple host-switches between mammals and reptiles. PMID:25080342

  8. Data on the genome-wide identification of CNL R-genes in Setaria italica (L.) P. Beauv.

    PubMed

    Andersen, Ethan J; Nepal, Madhav P

    2017-08-01

    We report data associated with the identification of 242 disease resistance genes (R-genes) in the genome of Setaria italica as presented in "Genetic diversity of disease resistance genes in foxtail millet (Setaria italica L.)" (Andersen and Nepal, 2017) [1]. Our data describe the structure and evolution of the Coiled-coil, Nucleotide-binding site, Leucine-rich repeat (CNL) R-genes in foxtail millet. The CNL genes were identified through rigorous extraction and analysis of recently available plant genome sequences using cutting-edge analytical software. Data visualization includes gene structure diagrams, chromosomal syntenic maps, a chromosomal density plot, and a maximum-likelihood phylogenetic tree comparing Sorghum bicolor, Panicum virgatum, Setaria italica, and Arabidopsis thaliana. Compilation of InterProScan annotations, Gene Ontology (GO) annotations, and Basic Local Alignment Search Tool (BLAST) results for the 242 R-genes identified in the foxtail millet genome are also included in tabular format.

  9. Genome-Wide Association Studies with a Genomic Relationship Matrix: A Case Study with Wheat and Arabidopsis

    PubMed Central

    Gianola, Daniel; Fariello, Maria I.; Naya, Hugo; Schön, Chris-Carolin

    2016-01-01

    Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed, to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the marker(s) tested is(are) used for building G, provided variance components are unaffected by exclusion of such marker(s) from G. The result is arrived at by using a matrix expression such that one can find many inverses of genomic relationship, or of phenotypic covariance matrices, stemming from removing markers tested as fixed, but carrying out a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. Concepts were illustrated using genomic data on 599 wheat inbred lines, with grain yield as target trait, and on close to 200 Arabidopsis thaliana accessions. PMID:27520956

  10. Genome-wide Scan of 29,141 African Americans Finds No Evidence of Directional Selection since Admixture

    PubMed Central

    Bhatia, Gaurav; Tandon, Arti; Patterson, Nick; Aldrich, Melinda C.; Ambrosone, Christine B.; Amos, Christopher; Bandera, Elisa V.; Berndt, Sonja I.; Bernstein, Leslie; Blot, William J.; Bock, Cathryn H.; Caporaso, Neil; Casey, Graham; Deming, Sandra L.; Diver, W. Ryan; Gapstur, Susan M.; Gillanders, Elizabeth M.; Harris, Curtis C.; Henderson, Brian E.; Ingles, Sue A.; Isaacs, William; De Jager, Phillip L.; John, Esther M.; Kittles, Rick A.; Larkin, Emma; McNeill, Lorna H.; Millikan, Robert C.; Murphy, Adam; Neslund-Dudas, Christine; Nyante, Sarah; Press, Michael F.; Rodriguez-Gil, Jorge L.; Rybicki, Benjamin A.; Schwartz, Ann G.; Signorello, Lisa B.; Spitz, Margaret; Strom, Sara S.; Tucker, Margaret A.; Wiencke, John K.; Witte, John S.; Wu, Xifeng; Yamamura, Yuko; Zanetti, Krista A.; Zheng, Wei; Ziegler, Regina G.; Chanock, Stephen J.; Haiman, Christopher A.; Reich, David; Price, Alkes L.

    2014-01-01

    The extent of recent selection in admixed populations is currently an unresolved question. We scanned the genomes of 29,141 African Americans and failed to find any genome-wide-significant deviations in local ancestry, indicating no evidence of selection influencing ancestry after admixture. A recent analysis of data from 1,890 African Americans reported that there was evidence of selection in African Americans after their ancestors left Africa, both before and after admixture. Selection after admixture was reported on the basis of deviations in local ancestry, and selection before admixture was reported on the basis of allele-frequency differences between African Americans and African populations. The local-ancestry deviations reported by the previous study did not replicate in our very large sample, and we show that such deviations were expected purely by chance, given the number of hypotheses tested. We further show that the previous study’s conclusion of selection in African Americans before admixture is also subject to doubt. This is because the FST statistics they used were inflated and because true signals of unusual allele-frequency differences between African Americans and African populations would be best explained by selection that occurred in Africa prior to migration to the Americas. PMID:25242497

  11. Compartmental Genomics in Living Cells Revealed by Single-Cell Nanobiopsy

    PubMed Central

    Actis, Paolo; Maalouf, Michelle; Kim, Hyunsung John; Lohith, Akshar; Vilozny, Boaz; Seger, R. Adam; Pourmand, Nader

    2014-01-01

    The ability to study the molecular biology of living single cells in heterogeneous cell populations is essential for next generation analysis of cellular circuitry and function. Here, we developed a single-cell nanobiopsy platform based on scanning ion conductance microscopy (SICM) for continuous sampling of intracellular content from individual cells. The nanobiopsy platform uses electrowetting within a nanopipette to extract cellular material from living cells with minimal disruption of the cellular milieu. We demonstrate the subcellular resolution of the nanobiopsy platform by isolating small subpopulations of mitochondria from single living cells, and quantify mutant mitochondrial genomes in those single cells with high throughput sequencing technology. These findings may provide the foundation for dynamic subcellular genomic analysis. PMID:24279711

  12. No genes for intelligence in the fluid genome.

    PubMed

    Ho, Mae-Wan

    2013-01-01

    Revolution is brewing belatedly within the heartlands of the genetic determinist establishment still in denial about the fluid genome that makes identifying genes even for common disease well-nigh impossible. The fruitless hunt for intelligence genes serves to expose the poverty of an obsolete paradigm that is obstructing knowledge and preventing fruitful policies from being widely implemented. Genome-wide scans using state-of-the-art technologies on extensive databases have failed to find a single gene for intelligence; instead, environment and maternal effects may account for most, if not all, correlation among relatives, while identical twins diverge genetically and epigenetically throughout life. Abundant evidence points to the enormous potential for improving intellectual abilities (and health) through simple environmental and social interventions.

  13. Genome-Wide Association Study Implicates HLA-C*01:02 as a Risk Factor at the Major Histocompatibility Complex Locus in Schizophrenia

    PubMed Central

    2012-01-01

    Background We performed a genome-wide association study (GWAS) to identify common risk variants for schizophrenia. Methods The discovery scan included 1606 patients and 1794 controls from Ireland, using 6,212,339 directly genotyped or imputed single nucleotide polymorphisms (SNPs). A subset of this sample (270 cases and 860 controls) was subsequently included in the Psychiatric GWAS Consortium-schizophrenia GWAS meta-analysis. Results One hundred eight SNPs were taken forward for replication in an independent sample of 13,195 cases and 31,021 control subjects. The most significant associations, in discovery corrected for genomic inflation (rs204999, p combined = 1.34 × 10^-9) and in combined samples (rs2523722, p combined = 2.88 × 10^-16), mapped to the major histocompatibility complex (MHC) region. We imputed classical human leukocyte antigen (HLA) alleles at the locus; the most significant finding was with HLA-C*01:02. This association was distinct from the top SNP signal. The HLA alleles DRB1*03:01 and B*08:01 were protective, replicating a previous study. Conclusions This study provides further support for involvement of MHC class I molecules in schizophrenia. We found evidence of association with previously reported risk alleles at the TCF4, VRK2, and ZNF804A loci. PMID:22883433

  14. Candidate Loci for Insulin Sensitivity and Disposition Index from a Genome Wide Association Analysis of Hispanics in the IRAS Family Study

    PubMed Central

    Palmer, N. D.; Langefeld, C. D.; Ziegler, J. T.; Hsu, F.; Haffner, S. M.; Fingerlin, T.; Norris, J. M.; Chen, Y. I.; Rich, S. S.; Haritunians, T.; Taylor, K. D.; Bergman, R. N.; Rotter, J. I.; Bowden, D. W.

    2009-01-01

    Aims/Hypothesis: The majority of type 2 diabetes Genome Wide Association Studies (GWAS) to date have been performed in European-derived populations and have identified few variants that mediate their effect through insulin resistance. The aim of this study was to evaluate two quantitative, directly assessed measures of insulin resistance (SI and DI) in Hispanic Americans using an agnostic, high-density SNP scan and validate these findings in additional samples. Methods: A two-stage GWAS was performed in IRAS-FS Hispanic-American samples. In Stage 1, 317K single nucleotide polymorphisms (SNPs) were assessed in 229 DNA samples. SNPs with evidence of association with glucose homeostasis and adiposity traits were then genotyped on the entire set of Hispanic-American samples (n=1190). This report focuses on the glucose homeostasis traits: insulin sensitivity index (SI) and disposition index (DI). Results: Although evidence of association did not reach genome-wide significance (P=5×10^-7), in the combined analysis SNPs had admixture-adjusted P_ADD=0.00010–0.0020 with 8–41% differences in genotypic means for SI and DI. Conclusions/Interpretation: Several candidate loci have been identified which are nominally associated with SI and/or DI in Hispanic Americans. Replication of these findings in independent cohorts and additional focused analysis of these loci is warranted. PMID:19902172

  15. Identification of genetic markers associated with Gilles de la Tourette syndrome in an Afrikaner population.

    PubMed Central

    Simonic, I; Gericke, G S; Ott, J; Weber, J L

    1998-01-01

    Because gene-mapping efforts, using large kindreds and parametric methods of analysis, for the neurologic disorder Tourette syndrome have failed, efforts are being redirected toward association studies in young, genetically isolated populations. The availability of dense marker maps makes it feasible to search for association throughout the entire genome. We report the results of such a genome scan using DNA samples from Tourette patients and unaffected control subjects from the South African Afrikaner population. To optimize mapping efficiency, we chose a two-step strategy. First, we screened pools of DNA samples from both affected and control individuals, using a dense collection of 1,167 short tandem-repeat polymorphisms distributed throughout the genome. Second, we typed those markers displaying evidence of allele frequency-distribution shifts, along with additional tightly linked markers, using DNA from each affected and unaffected individual. To reduce false positives, we tested two independent groups of case and control subjects. The strongest evidence for association (P values 10^-2 to 10^-5) was obtained for markers within chromosomal regions encompassing D2S1790 near the chromosome 2 centromere, D6S477 on distal 6p, D8S257 on 8q, D11S933 on 11q, D14S1003 on proximal 14q, D20S1085 on distal 20q, and D21S1252 on 21q. PMID:9718333

  16. Identification of genetic markers associated with Gilles de la Tourette syndrome in an Afrikaner population.

    PubMed

    Simonic, I; Gericke, G S; Ott, J; Weber, J L

    1998-09-01

    Because gene-mapping efforts, using large kindreds and parametric methods of analysis, for the neurologic disorder Tourette syndrome have failed, efforts are being redirected toward association studies in young, genetically isolated populations. The availability of dense marker maps makes it feasible to search for association throughout the entire genome. We report the results of such a genome scan using DNA samples from Tourette patients and unaffected control subjects from the South African Afrikaner population. To optimize mapping efficiency, we chose a two-step strategy. First, we screened pools of DNA samples from both affected and control individuals, using a dense collection of 1,167 short tandem-repeat polymorphisms distributed throughout the genome. Second, we typed those markers displaying evidence of allele frequency-distribution shifts, along with additional tightly linked markers, using DNA from each affected and unaffected individual. To reduce false positives, we tested two independent groups of case and control subjects. The strongest evidence for association (P values 10^-2 to 10^-5) was obtained for markers within chromosomal regions encompassing D2S1790 near the chromosome 2 centromere, D6S477 on distal 6p, D8S257 on 8q, D11S933 on 11q, D14S1003 on proximal 14q, D20S1085 on distal 20q, and D21S1252 on 21q.

  17. Genome-scan analysis for quantitative trait loci in an F2 tilapia hybrid.

    PubMed

    Cnaani, A; Zilberman, N; Tinman, S; Hulata, G; Ron, M

    2004-09-01

    We searched for genetic linkage between DNA markers and quantitative trait loci (QTLs) for innate immunity, response to stress, biochemical parameters of blood, and fish size in an F2 population derived from an interspecific tilapia hybrid (Oreochromis mossambicus × O. aureus). A family of 114 fish was scanned for 40 polymorphic microsatellite DNA markers and two polymorphic genes, covering approximately 80% of the tilapia genome. These fish had previously been phenotyped for seven immune-response traits and six blood parameters. Critical values for significance were P < 0.05 with the false discovery rate (FDR) controlled at 40%. The genome-scan analysis resulted in 35 significant marker-trait associations, involving 26 markers in 16 linkage groups. In a second experiment, nine markers were re-sampled in a second family of 79 fish of the same species hybrid. Seven markers (GM180, GM553, MHC-I, UNH848, UNH868, UNH898 and UNH925) in five linkage groups (LG 1, 3, 4, 22 and 23) were associated with stress response traits. An additional six markers (GM47, GM552, UNH208, UNH881, UNH952, UNH998) in five linkage groups (LG 4, 16, 19, 20 and 23) were verified for their associations with immune response traits, by linkage to several different traits. The portion of variance explained by each QTL was 11% on average, with a maximum of 29%. The average additive effect of QTLs was 0.2 standard deviation units of stress response traits and fish size, with a maximum of 0.33. In three linkage groups (LG 1, 3 and 23) markers were associated with stress response, body weight and sex determination, confirming the location of QTLs reported by several other studies.
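
    The abstract above reports an FDR controlled at 40% but does not say which procedure was used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, the selection of marker-trait tests might look like the following. The function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, fdr=0.40):
    """Return a list of booleans marking which tests pass the BH step-up rule."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for five marker-trait tests (illustration only).
example_p = [0.001, 0.20, 0.03, 0.04, 0.60]
flags = benjamini_hochberg(example_p, fdr=0.40)
print(flags)
```

    A permissive FDR such as 40% keeps more candidate associations for follow-up, at the cost of a larger expected share of false positives among them.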

  18. Accurate and exact CNV identification from targeted high-throughput sequence data.

    PubMed

    Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom

    2011-04-12

    Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
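
    The first step described above, scanning for gains and losses by comparing normalized coverage between samples, can be illustrated with a short sketch. This is not the authors' implementation: the function names, the ratio cutoffs (0.7 and 1.3) and the depth values are assumptions chosen for illustration, and the breakpoint-confirmation step is not shown.

```python
def coverage_ratios(sample_depth, reference_depth):
    """Per-target ratio of normalized sample depth to normalized reference depth.

    Each depth list is normalized by its own total so that samples sequenced
    to different overall depths become comparable. All depths are assumed > 0.
    """
    sample_total = sum(sample_depth)
    reference_total = sum(reference_depth)
    ratios = []
    for s, r in zip(sample_depth, reference_depth):
        ratios.append((s / sample_total) / (r / reference_total))
    return ratios


def call_gains_losses(ratios, loss_below=0.7, gain_above=1.3):
    """Label each target as 'loss', 'gain' or 'neutral' using assumed cutoffs."""
    calls = []
    for ratio in ratios:
        if ratio < loss_below:
            calls.append("loss")
        elif ratio > gain_above:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


# Hypothetical read depths over five targeted regions (illustration only).
sample = [100, 100, 50, 100, 150]
reference = [100, 100, 100, 100, 100]
calls = call_gains_losses(coverage_ratios(sample, reference))
print(calls)
```

    In practice the reference would more plausibly be a summary over many samples than a single one, and candidate calls would still need the breakpoint-spanning read check the abstract describes.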

  19. A Genomewide Scan for Loci Predisposing to Type 2 Diabetes in a U.K. Population (The Diabetes UK Warren 2 Repository): Analysis of 573 Pedigrees Provides Independent Replication of a Susceptibility Locus on Chromosome 1q

    PubMed Central

    Wiltshire, Steven; Hattersley, Andrew T.; Hitman, Graham A.; Walker, Mark; Levy, Jonathan C.; Sampson, Michael; O’Rahilly, Stephen; Frayling, Timothy M.; Bell, John I.; Lathrop, G. Mark; Bennett, Amanda; Dhillon, Ranjit; Fletcher, Christopher; Groves, Christopher J.; Jones, Elizabeth; Prestwich, Philip; Simecek, Nikol; Rao, Pamidighantam V. Subba; Wishart, Marie; Foxon, Richard; Howell, Simon; Smedley, Damian; Cardon, Lon R.; Menzel, Stephan; McCarthy, Mark I.

    2001-01-01

    Improved molecular understanding of the pathogenesis of type 2 diabetes is essential if current therapeutic and preventative options are to be extended. To identify diabetes-susceptibility genes, we have completed a primary (418-marker, 9-cM) autosomal-genome scan of 743 sib pairs (573 pedigrees) with type 2 diabetes who are from the Diabetes UK Warren 2 repository. Nonparametric linkage analysis of the entire data set identified seven regions showing evidence for linkage, with allele-sharing LOD scores ⩾1.18 (P⩽.01). The strongest evidence was seen on chromosomes 8p21-22 (near D8S258 [LOD score 2.55]) and 10q23.3 (near D10S1765 [LOD score 1.99]), both coinciding with regions identified in previous scans in European subjects. This was also true of two lesser regions identified, on chromosomes 5q13 (D5S647 [LOD score 1.22]) and 5q32 (D5S436 [LOD score 1.22]). Loci on 7p15.3 (LOD score 1.31) and 8q24.2 (LOD score 1.41) are novel. The final region showing evidence for linkage, on chromosome 1q24-25 (near D1S218 [LOD score 1.50]), colocalizes with evidence for linkage to diabetes found in Utah, French, and Pima families and in the GK rat. After dense-map genotyping (mean marker spacing 4.4 cM), evidence for linkage to this region increased to a LOD score of 1.98. Conditional analyses revealed nominally significant interactions between this locus and the regions on chromosomes 10q23.3 (P=.01) and 5q32 (P=.02). These data, derived from one of the largest genome scans undertaken in this condition, confirm that individual susceptibility-gene effects for type 2 diabetes are likely to be modest in size. Taken with genome scans in other populations, they provide both replication of previous evidence indicating the presence of a diabetes-susceptibility locus on chromosome 1q24-25 and support for the existence of additional loci on chromosomes 5, 8, and 10. These data should accelerate positional cloning efforts in these regions of interest. PMID:11484155
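
    The pairing of LOD ⩾1.18 with P⩽.01 in the abstract above matches a standard conversion, sketched here as an assumption rather than as the authors' stated method: 2 ln(10) × LOD is treated as a chi-square statistic with 1 degree of freedom, and the tail probability is halved because the linkage test is one-sided. The function name is hypothetical.

```python
import math


def lod_to_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes 2 * ln(10) * LOD follows a 50:50 mixture of a point mass at zero
    and a chi-square distribution with 1 degree of freedom.
    """
    chi_sq = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi_sq / 2.0))


for lod in (1.18, 1.50, 2.55):
    print(f"LOD {lod:.2f} -> p ~ {lod_to_p(lod):.5f}")
```

    Under this assumption a LOD of 1.18 falls just under P = .01, which is consistent with the threshold quoted in the abstract. These are pointwise values and are not corrected for testing across the whole genome.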

  20. Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population

    PubMed Central

    2012-01-01

    Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16 traits in the Nordic Holstein population. Methods The data consisted of de-regressed proofs (DRP) for 5,214 genotyped and 9,374 non-genotyped bulls. The bulls were divided into a training and a validation population by birth date, October 1, 2001. Five approaches for genomic prediction were used: 1) a simple GBLUP method, 2) a GBLUP method with a polygenic effect, 3) an adjusted GBLUP method with a polygenic effect, 4) a single-step blending method, and 5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted for the difference of scale between the genomic and the pedigree relationship matrices. A set of weights on the pedigree relationship matrix (ranging from 0.05 to 0.40) was used to build the combined relationship matrix in the single-step blending method and the GBLUP method with a polygenic effect. Results Averaged over the 16 traits, reliabilities of genomic breeding values predicted using the GBLUP method with a polygenic effect (relative weight of 0.20) were 0.3% higher than reliabilities from the simple GBLUP method (without a polygenic effect). The adjusted single-step blending and original single-step blending methods (relative weight of 0.20) had average reliabilities that were 2.1% and 1.8% higher than the simple GBLUP method, respectively. In addition, the GBLUP method with a polygenic effect led to less bias of genomic predictions than the simple GBLUP method, and both single-step blending methods yielded less bias of predictions than all GBLUP methods. Conclusions The single-step blending method is an appealing approach for practical genomic prediction in dairy cattle. Genomic prediction from the single-step blending method can be improved by adjusting the scale of the genomic relationship matrix. PMID:22455934
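
    The weighting of the pedigree relationship matrix mentioned above can be sketched as a convex combination. The form G_w = (1 - w) G + w A is a common formulation and is assumed here, since the abstract does not spell out the formula; A is taken to be the pedigree relationship matrix restricted to the genotyped animals. The function name and the matrix values are hypothetical.

```python
def blend_relationship(G, A, w):
    """Return (1 - w) * G + w * A for two square matrices of the same size.

    G: genomic relationship matrix (list of lists).
    A: pedigree relationship matrix for the same animals (list of lists).
    w: weight on the pedigree matrix; the abstract reports 0.05 to 0.40.
    """
    n = len(G)
    return [
        [(1.0 - w) * G[i][j] + w * A[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-animal matrices (illustration only).
G = [[1.02, 0.30], [0.30, 0.98]]
A = [[1.00, 0.25], [0.25, 1.00]]
G_w = blend_relationship(G, A, w=0.20)
print(G_w)
```

    The scale adjustment of G that the abstract compares would be applied before this blending step and is not shown here.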

  1. Statistical Methods in Integrative Genomics

    PubMed Central

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531

  2. A genetic contribution to circulating cytokines and obesity in children

    USDA-ARS's Scientific Manuscript database

    Cytokines are considered to be involved in obesity-related metabolic diseases. Study objectives are to determine the heritability of circulating cytokine levels, to investigate pleiotropy between cytokines and obesity traits, and to present genome scan results for cytokines in 1030 Hispanic children...

  3. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    PubMed Central

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  4. [Ethical issues raised by direct-to-consumer personal genome analysis and whole body scans: discussion and contextualisation of a report by the Nuffield Council on Bioethics].

    PubMed

    Buyx, Alena M; Strech, Daniel; Schmidt, Harald

    2012-01-01

    The paradigm of personalised medicine has many different facets, further to the application of pharmacogenetics. We examine here (direct-to-consumer) personal genome analysis and whole body scans and summarise findings from the Nuffield Council on Bioethics' recent report "Medical profiling and online medicine: the ethics of 'personalised healthcare' in a consumer age". We describe the current situation in Germany with regard to access to such services, and contextualise the Nuffield Council's report with summaries of position statements by German professional bodies. We conclude with three points that merit examination further to the analyses of the Nuffield Council's report and the German professional bodies. These concern the role of indirect evidence in considering restrictive policies, the question of whether regulations should require commercial providers to contribute to the generation of better evidence, and the option of using data from evaluations in combination with indirect evidence in justifying restrictive policies. Copyright © 2011. Published by Elsevier GmbH.

  5. Genome-Wide Association Study Identifies African-Specific Susceptibility Loci in African Americans With Inflammatory Bowel Disease.

    PubMed

    Brant, Steven R; Okou, David T; Simpson, Claire L; Cutler, David J; Haritunians, Talin; Bradfield, Jonathan P; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J; Klapproth, Jan-Micheal A; Quiros, Antonio J; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S; Baldassano, Robert N; Dudley-Brown, Sharon; Cross, Raymond K; Dassopoulos, Themistocles; Denson, Lee A; Dhere, Tanvi A; Dryden, Gerald W; Hanson, John S; Hou, Jason K; Hussain, Sunny Z; Hyams, Jeffrey S; Isaacs, Kim L; Kader, Howard; Kappelman, Michael D; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S; Kuemmerle, John F; Kwon, John H; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E; Newberry, Rodney D; Osuntokun, Bankole O; Patel, Ashish S; Saeed, Shehzad A; Targan, Stephan R; Valentine, John F; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D; Duerr, Richard H; Silverberg, Mark S; Cho, Judy H; Hakonarson, Hakon; Zwick, Michael E; McGovern, Dermot P B; Kugathasan, Subra

    2017-01-01

    The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn's disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10^-8 in meta-analysis with nominal evidence (P < .05) in each scan were considered to have genome-wide significance. We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10^-6): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. We performed a genome-wide association study of African Americans with IBD and identified loci associated with UC in only this population; we also replicated IBD, CD, and UC loci identified in European populations. The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
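
    The significance rule stated in the abstract above combines a meta-analysis threshold with nominal evidence in each contributing scan. A minimal sketch of that decision rule follows. The function name and the example p-values are hypothetical; only the two thresholds come from the abstract.

```python
def is_genome_wide_significant(meta_p, scan_p_values,
                               meta_threshold=5.0e-8,
                               nominal_threshold=0.05):
    """Apply the two-part rule: the meta-analysis p-value must be below
    meta_threshold and every individual scan must be below nominal_threshold."""
    return meta_p < meta_threshold and all(
        p < nominal_threshold for p in scan_p_values
    )


# Hypothetical SNPs (illustration only): meta-analysis p, then per-scan p-values.
print(is_genome_wide_significant(1e-9, [0.01, 0.03]))    # passes both parts
print(is_genome_wide_significant(1e-9, [0.01, 0.20]))    # one scan lacks nominal evidence
print(is_genome_wide_significant(1e-6, [0.001, 0.001]))  # meta-analysis p too large
```

    Requiring nominal evidence in each scan guards against a meta-analysis signal that is driven by only one of the two data sets.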

  6. Prediction of Ras-effector interactions using position energy matrices.

    PubMed

    Kiel, Christina; Serrano, Luis

    2007-09-01

    One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to determine large-scale cellular protein interaction networks quickly. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
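
    The scanning step described above, summing pre-calculated per-position energies and comparing the sum with a calibrated threshold, can be sketched as follows. Everything concrete here is hypothetical: the three-position matrix, its values, the threshold and the function names are invented for illustration and are not taken from the ADAN database. The direction of the comparison (above the threshold means potential binder) follows the wording of the abstract.

```python
def energy_sum(sequence, matrix):
    """Sum the per-position energy contribution of each residue in an aligned sequence.

    matrix is a list with one dict per position, mapping residue letter to energy.
    """
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the number of matrix positions")
    return sum(position[residue] for position, residue in zip(matrix, sequence))


def is_candidate(sequence, matrix, threshold):
    """Flag a sequence as a potential binding domain if its energy sum exceeds the threshold."""
    return energy_sum(sequence, matrix) > threshold


# Hypothetical three-position matrix over a three-letter alphabet (illustration only).
toy_matrix = [
    {"A": 0.0, "K": 1.5, "E": -0.5},
    {"A": 0.0, "K": 0.5, "E": -1.0},
    {"A": 0.0, "K": 2.0, "E": 0.5},
]
THRESHOLD = 1.0  # assumed value; the abstract calibrates this on experimental binding data

for seq in ("KKK", "EEA"):
    print(seq, energy_sum(seq, toy_matrix), is_candidate(seq, toy_matrix, THRESHOLD))
```

    Because each sequence needs only one table lookup per position, a scan of this kind is far cheaper than building a homology model for every candidate, which fits the abstract's suggestion of reserving modelling for sequences that pass the threshold.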

  7. Evolution of genome size and complexity in the Rhabdoviridae.

    PubMed

    Walker, Peter J; Firth, Cadhla; Widen, Steven G; Blasdell, Kim R; Guzman, Hilda; Wood, Thomas G; Paradkar, Prasad N; Holmes, Edward C; Tesh, Robert B; Vasilakis, Nikos

    2015-02-01

    RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3' to 5' direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae.

  8. Evolution of Genome Size and Complexity in the Rhabdoviridae

    PubMed Central

    Walker, Peter J.; Firth, Cadhla; Widen, Steven G.; Blasdell, Kim R.; Guzman, Hilda; Wood, Thomas G.; Paradkar, Prasad N.; Holmes, Edward C.; Tesh, Robert B.; Vasilakis, Nikos

    2015-01-01

    RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3’ to 5’ direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae. PMID:25679389

  9. Contemporary evolution of a Lepidopteran species, Heliothis virescens, in response to modern agricultural practices.

    PubMed

    Fritz, Megan L; DeYonke, Alexandra M; Papanicolaou, Alexie; Micinski, Stephen; Westbrook, John; Gould, Fred

    2018-01-01

    Adaptation to human-induced environmental change has the potential to profoundly influence the genomic architecture of affected species. This is particularly true in agricultural ecosystems, where anthropogenic selection pressure is strong. Heliothis virescens primarily feeds on cotton in its larval stages, and US populations have been declining since the widespread planting of transgenic cotton, which endogenously expresses proteins derived from Bacillus thuringiensis (Bt). No physiological adaptation to Bt toxin has been found in the field, so adaptation in this altered environment could involve (i) shifts in host plant selection mechanisms to avoid cotton, (ii) changes in detoxification mechanisms required for cotton-feeding vs. feeding on other hosts or (iii) loss of resistance to previously used management practices including insecticides. Here, we begin to address whether such changes occurred in H. virescens populations between 1997 and 2012, as Bt-cotton cultivation spread through the agricultural landscape. For our study, we produced an H. virescens genome assembly and used this in concert with a ddRAD-seq-enabled genome scan to identify loci with significant allele frequency changes over the 15-year period. Genetic changes at a previously described H. virescens insecticide target of selection were detectable in our genome scan and increased our confidence in this methodology. Additional loci were also detected as being under selection, and we quantified the selection strength required to elicit observed allele frequency changes at each locus. Potential contributions of genes near loci under selection to adaptive phenotypes in the H. virescens cotton system are discussed. © 2017 John Wiley & Sons Ltd.
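
    The abstract does not say how selection strength was quantified. As a rough illustration only, the sketch below assumes a deterministic genic (haploid) selection model, in which the odds p/(1-p) of the favoured allele are multiplied by (1+s) each generation, and solves for s from two allele frequencies. The function names are hypothetical, the number of generations between samples must be supplied by the user, and drift, dominance and migration are ignored.

```python
def next_frequency(p, s):
    """One generation of genic selection: the favoured allele has fitness 1 + s."""
    return p * (1 + s) / (1 + p * s)


def selection_coefficient(p0, pt, generations):
    """Solve for s from the odds ratio of two allele frequencies.

    Under the assumed model the odds p / (1 - p) grow by a factor (1 + s)
    per generation, so (1 + s) ** generations == odds_t / odds_0.
    """
    if not (0 < p0 < 1 and 0 < pt < 1) or generations <= 0:
        raise ValueError("frequencies must lie in (0, 1) and generations must be positive")
    odds_ratio = (pt / (1 - pt)) / (p0 / (1 - p0))
    return odds_ratio ** (1 / generations) - 1


# Round trip on made-up numbers: simulate 10 generations with s = 0.1,
# then recover s from the start and end frequencies.
p = 0.2
for _ in range(10):
    p = next_frequency(p, 0.1)
s_hat = selection_coefficient(0.2, p, 10)
```

    The round trip only checks that the formula inverts the assumed model; it says nothing about whether that model suits the H. virescens data.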

  10. Functional annotation from the genome sequence of the giant panda.

    PubMed

    Huo, Tong; Zhang, Yinjie; Lin, Jianping

    2012-08-01

    The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

  11. Approaches to Fungal Genome Annotation

    PubMed Central

    Haas, Brian J.; Zeng, Qiandong; Pearson, Matthew D.; Cuomo, Christina A.; Wortman, Jennifer R.

    2011-01-01

    Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center’s production genome annotation environment. PMID:22059117

  12. The EMBL-EBI bioinformatics web and programmatic tools framework.

    PubMed

    Li, Weizhong; Cowley, Andrew; Uludag, Mahmut; Gur, Tamer; McWilliam, Hamish; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Lopez, Rodrigo

    2015-07-01

    Since 2009 the EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications. These include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST, FASTA and PSI-Search, multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega, MAFFT and T-Coffee, and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan. Through these services users can search mainstream sequence databases such as ENA, UniProt and Ensembl Genomes, utilising a uniform web interface or systematically through Web Services interfaces (https://www.ebi.ac.uk/Tools/webservices/) using common programming languages, and obtain enriched results with novel visualisations. Integration with EBI Search (https://www.ebi.ac.uk/ebisearch/) and the dbfetch retrieval service (https://www.ebi.ac.uk/Tools/dbfetch/) further expands the usefulness of the framework. New tools and updates such as NCBI BLAST+, InterProScan 5 and PfamScan, new categories such as RNA analysis tools (https://www.ebi.ac.uk/Tools/rna/), new databases such as ENA non-coding, WormBase ParaSite, Pfam and Rfam, and new workflow methods, together with the retirement of deprecated services, ensure that the framework remains relevant to today's biological community. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Multiple plant hormones and cell wall metabolism regulate apple fruit maturation patterns and texture attributes

    USDA-ARS's Scientific Manuscript database

    Molecular events regulating apple fruit ripening and sensory quality are largely unknown. Such knowledge is essential for genomic-assisted apple breeding and postharvest quality management. In this study, a parallel transcriptome profile analysis, scanning electron microscopic (SEM) examination and...

  14. Development and application of a novel genome-wide SNP array reveals domestication history in soybean

    PubMed Central

    Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue

    2016-01-01

    Domestication of soybeans occurred under intense human-directed selection aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied by a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean. PMID:26856884

  15. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  16. Development and application of a novel genome-wide SNP array reveals domestication history in soybean.

    PubMed

    Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue

    2016-02-09

    Domestication of soybeans occurred under intense human-directed selection aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied by a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean.

  17. Cracking the genomic piggy bank: identifying secrets of the pig genome.

    PubMed

    Mote, B E; Rothschild, M F

    2006-01-01

    Though researchers are uncovering valuable information about the pig genome at unprecedented speed, the porcine genome community is barely scratching the surface as to understanding interactions of the biological code. The pig genetic linkage map has nearly 5,000 loci comprised of genes, microsatellites, and amplified fragment length polymorphism markers. Likewise, the physical map is becoming denser with nearly 6,000 markers. The long awaited sequencing efforts are providing multidimensional benefits with sequence available for comparative genomics and identifying single nucleotide polymorphisms for use in linkage and trait association studies. Scientists are using exotic and commercial breeds for quantitative trait loci scans. Additionally, candidate gene studies continue to identify chromosomal regions or genes associated with economically important traits such as growth rate, leanness, feed intake, meat quality, litter size, and disease resistance. The commercial pig industry is actively incorporating these markers in marker-assisted selection along with traditional performance information to improve said traits. Researchers are utilizing novel tools including pig microarrays along with advanced bioinformatics to identify new candidate genes, understand gene function, and piece together gene networks involved in important biological processes. Advances in pig genomics and implications to the pork industry as well as human health are reviewed.

  18. Alignment-free genome tree inference by learning group-specific distance metrics.

    PubMed

    Patil, Kaustubh R; McHardy, Alice C

    2013-01-01

    Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, but they are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods: 9 alignment-free and 1 alignment-based.
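
    To make the two ingredients named in this abstract concrete, the sketch below computes a simple genome signature and a Mahalanobis distance between two signatures. It assumes the signature is a vector of relative k-mer frequencies, which the abstract does not specify, and it takes the metric matrix M as given: the learning step guided by a reference taxonomy is not shown. All function names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_signature(seq, k=2):
    """Relative frequency of every DNA k-mer, in fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[w] / total for w in kmers]


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M (list of rows)."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    return sqrt(max(0.0, sum(d_i * v for d_i, v in zip(d, Md))))


# With the identity matrix the distance reduces to the Euclidean distance.
identity = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
dist = mahalanobis(kmer_signature("ACGTACGTAC"), kmer_signature("AACCGGTTAA"), identity)
```

    A learned, group-specific M would replace the identity matrix here, weighting the k-mer dimensions that best separate taxa within the group.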

  19. Integrating Multiple Correlated Phenotypes for Genetic Association Analysis by Maximizing Heritability

    PubMed Central

    Zhou, Jin J.; Cho, Michael H.; Lange, Christoph; Lutz, Sharon; Silverman, Edwin K.; Laird, Nan M.

    2015-01-01

    Many correlated disease variables are analyzed jointly in genetic studies in the hope of increasing power to detect causal genetic variants. One approach involves assessing the relationship between each phenotype and each single nucleotide polymorphism (SNP) individually and using a Bonferroni correction for the effective number of tests conducted. Alternatively, one can apply a multivariate regression or a dimension reduction technique, such as principal component analysis (PCA), and test for the association with the principal components (PC) of the phenotypes rather than the individual phenotypes. Inspired by the previous approaches of combining phenotypes to maximize heritability at individual SNPs, in this paper, we propose to construct a maximally heritable phenotype (MaxH) by taking advantage of the estimated total heritability and co-heritability. The heritability and co-heritability only need to be estimated once, therefore our method is applicable to genome-wide scans. MaxH phenotype is a linear combination of the individual phenotypes with increased heritability and power over the phenotypes being combined. Simulations show that the heritability and power achieved agree well with the theory for large samples and two phenotypes. We compare our approach with commonly used methods and assess both the heritability and the power of the MaxH phenotype. Moreover we provide suggestions for how to choose the phenotypes for combination. An application of our approach to a COPD genome-wide association study shows the practical relevance. PMID:26111731
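
    One common way to formalise a maximally heritable linear combination is to maximise the ratio w'Gw / w'Pw, where G is the genetic and P the phenotypic covariance matrix of the traits. The maximiser is the leading eigenvector of P^-1 G. Whether MaxH is computed exactly this way is an assumption, as the abstract gives no formula. The sketch handles only two phenotypes, and its function names and example matrices are hypothetical.

```python
from math import sqrt


def heritability(w, G, P):
    """Ratio w'Gw / w'Pw for a weight vector w and symmetric 2x2 matrices."""
    def quad(M):
        return sum(w[i] * M[i][j] * w[j] for i in range(2) for j in range(2))
    return quad(G) / quad(P)


def max_heritability_2x2(G, P):
    """Largest eigenvalue and eigenvector of P^-1 G for symmetric 2x2 G and P."""
    (g11, g12), (_, g22) = G
    (p11, p12), (_, p22) = P
    det_p = p11 * p22 - p12 * p12
    # Entries of A = P^-1 G, written out explicitly.
    a = (p22 * g11 - p12 * g12) / det_p
    b = (p22 * g12 - p12 * g22) / det_p
    c = (-p12 * g11 + p11 * g12) / det_p
    d = (-p12 * g12 + p11 * g22) / det_p
    tr, det = a + d, a * d - b * c
    lam = (tr + sqrt(max(0.0, tr * tr - 4 * det))) / 2
    if abs(b) > 1e-12:
        w = (b, lam - a)
    elif abs(c) > 1e-12:
        w = (lam - d, c)
    else:  # A is diagonal: pick the trait with the larger ratio
        w = (1.0, 0.0) if a >= d else (0.0, 1.0)
    return lam, w


# Made-up covariances: trait heritabilities are 0.4 and 0.5 on their own.
G = [[0.4, 0.3], [0.3, 0.5]]
P = [[1.0, 0.2], [0.2, 1.0]]
best_h2, weights = max_heritability_2x2(G, P)
```

    In this toy example the combined phenotype reaches a heritability of about 0.63, higher than either trait alone, which is the behaviour the abstract describes.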

  20. Genome-wide patterns of genetic distances reveal candidate Loci contributing to human population-specific traits.

    PubMed

    de Magalhães, João Pedro; Matsuda, Alex

    2012-03-01

    Modern humans originated in Africa before migrating across the world with founder effects and adaptations to new environments contributing to their present phenotypic diversity. Determining the genetic basis of differences between populations may provide clues about our evolutionary history and may have clinical implications. Herein, we develop a method to detect genes and biological processes in which populations most differ by calculating the genetic distance between modern populations and a hypothetical ancestral population. We apply our method to large-scale single nucleotide polymorphism (SNP) data from human populations of African, European and Asian origin. As expected, ancestral alleles were more conserved in the African populations and we found evidence of high divergence in genes previously suggested as targets of selection related to skin pigmentation, immune response, senses and dietary adaptations. Our genome-wide scan also reveals novel candidates for contributing to population-specific traits. These include genes related to neuronal development and behavior that may have been influenced by cultural processes. Moreover, in the African populations, we found a high divergence in genes related to UV protection and to the male reproductive system. Taken together, these results confirm and expand previous findings, providing new clues about the evolution and genetics of human phenotypic diversity. © 2011 The Authors Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London.

  1. The Identification of Microdeletion and Reciprocal Microduplication in 22q11.2 Using High-Resolution CMA Technology

    PubMed Central

    Leite, Ana Julia Cunha; Pinto, Irene Plaza; Cunha, Damiana Mirian da Cruz e; Ribeiro, Cristiano Luiz; da Silva, Claudio Carlos; da Cruz, Aparecido Divino; Minasi, Lysa Bernardes

    2016-01-01

    The chromosome 22q11.2 region has long been implicated in genomic diseases. Some genomic regions contain numerous low copy repeats (LCRs) with high sequence identity, which increase genomic instability and mediate deletions and duplications in many disorders. DiGeorge Syndrome is the most common deletion syndrome, and reciprocal duplications may occur at half the frequency of microdeletions. We describe five patients with phenotypic variability who carry deletions or reciprocal duplications at 22q11.2 detected by Chromosomal Microarray Analysis. The CytoScan HD technology was used to detect copy number variation in the genomes of patients who had a clinical indication of global developmental delay and a normal karyotype. We observed three microdeletions and two microduplications in the 22q11.2 region, with variable intervals containing known genes and unstudied transcripts as well as the LCRs that often flank and lie within this genomic rearrangement. The identification of these variants is of particular interest because it may provide insight into genes or genomic regions that are crucial for specific phenotypic manifestations and may help to clarify the mechanisms underlying genomic deletions and duplications. PMID:27123452

  2. Signatures of selection in the three-spined stickleback along a small-scale brackish water - freshwater transition zone.

    PubMed

    Konijnendijk, Nellie; Shikano, Takahito; Daneels, Dorien; Volckaert, Filip A M; Raeymaekers, Joost A M

    2015-09-01

    Local adaptation is often obvious when gene flow is impeded, such as observed at large spatial scales and across strong ecological contrasts. However, it becomes less certain at small scales such as between adjacent populations or across weak ecological contrasts, when gene flow is strong. While studies on genomic adaptation tend to focus on the former, less is known about the genomic targets of natural selection in the latter situation. In this study, we investigate genomic adaptation in populations of the three-spined stickleback Gasterosteus aculeatus L. across a small-scale ecological transition with salinities ranging from brackish to fresh. Adaptation to salinity has been repeatedly demonstrated in this species. A genome scan based on 87 microsatellite markers revealed only few signatures of selection, likely owing to the constraints that homogenizing gene flow puts on adaptive divergence. However, the detected loci appear repeatedly as targets of selection in similar studies of genomic adaptation in the three-spined stickleback. We conclude that the signature of genomic selection in the face of strong gene flow is weak, yet detectable. We argue that the range of studies of genomic divergence should be extended to include more systems characterized by limited geographical and ecological isolation, which is often a realistic setting in nature.

  3. Variation block-based genomics method for crop plants.

    PubMed

    Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

    2014-06-15

    In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
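
    The splitting step described above can be sketched on toy data. The sketch assumes each cultivar has already been reduced to a list of half-open (start, end) intervals marking its variation-containing sequence blocks. This representation, the function names and the coordinates are hypothetical, and the variant detection that precedes this step is not shown.

```python
def variation_blocks(genome_length, cultivar_blocks):
    """Split [0, genome_length) at every block boundary seen in any cultivar.

    Block boundaries are treated as assumed recombination sites, and the
    regions between consecutive sites are the variation blocks.
    """
    sites = {0, genome_length}
    for blocks in cultivar_blocks.values():
        for start, end in blocks:
            sites.update((start, end))
    cuts = sorted(s for s in sites if 0 <= s <= genome_length)
    return list(zip(cuts, cuts[1:]))


def block_profile(segments, blocks):
    """For each variation block, True if it lies inside a variation-containing block."""
    return [any(s <= a and b <= e for s, e in blocks) for a, b in segments]


# Two made-up cultivars on a 100-bp toy genome.
cultivars = {"cv1": [(10, 40)], "cv2": [(30, 70)]}
segments = variation_blocks(100, cultivars)
profiles = {name: block_profile(segments, blocks) for name, blocks in cultivars.items()}
```

    Because every cultivar is described over the same set of variation blocks, their profiles can be compared block by block.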

  4. A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation.

    PubMed

    Aubrey, Wayne; Riley, Michael C; Young, Michael; King, Ross D; Oliver, Stephen G; Clare, Amanda

    2015-01-01

    Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method's primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome.

  5. Fifteen years of genomewide scans for selection: trends, lessons and unaddressed genetic sources of complication.

    PubMed

    Haasl, Ryan J; Payseur, Bret A

    2016-01-01

    Genomewide scans for natural selection (GWSS) have become increasingly common over the last 15 years due to increased availability of genome-scale genetic data. Here, we report a representative survey of GWSS from 1999 to present and find that (i) between 1999 and 2009, 35 of 49 (71%) GWSS focused on human, while from 2010 to present, only 38 of 83 (46%) of GWSS focused on human, indicating increased focus on nonmodel organisms; (ii) the large majority of GWSS incorporate interpopulation or interspecific comparisons using, for example F(ST), cross-population extended haplotype homozygosity or the ratio of nonsynonymous to synonymous substitutions; (iii) most GWSS focus on detection of directional selection rather than other modes such as balancing selection; and (iv) in human GWSS, there is a clear shift after 2004 from microsatellite markers to dense SNP data. A survey of GWSS meant to identify loci positively selected in response to severe hypoxic conditions supports an approach to GWSS in which a list of a priori candidate genes based on potential selective pressures is used to filter the list of significant hits a posteriori. We also discuss four frequently ignored determinants of genomic heterogeneity that complicate GWSS: mutation, recombination, selection and the genetic architecture of adaptive traits. We recommend that GWSS methodology should better incorporate aspects of genomewide heterogeneity using empirical estimates of relevant parameters and/or realistic, whole-chromosome simulations to improve interpretation of GWSS results. Finally, we argue that knowledge of potential selective agents improves interpretation of GWSS results and that new methods focused on correlations between environmental variables and genetic variation can help automate this approach. © 2015 John Wiley & Sons Ltd.

  6. Bone Mineral Density Variation in Men is influenced by Sex-Specific and Non Sex-Specific Quantitative Trait Loci

    PubMed Central

    Peacock, Munro; Koller, Daniel L.; Lai, Dongbing; Hui, Siu; Foroud, Tatiana; Econs, Michael J.

    2009-01-01

    Introduction: A major predictor of age-related osteoporotic fracture is peak areal bone mineral density (aBMD) which is a highly heritable trait. However, few linkage and association studies have been performed in men to identify the genes contributing to normal variation in aBMD. The aim of this study was to perform a genome wide scan in healthy men to identify quantitative trait loci (QTL) that were significantly linked to aBMD and to test whether any of these might be sex-specific. Methods: aBMD at the spine and hip were measured in 515 pairs of brothers, aged 18-61 (405 white pairs, 110 black pairs). Linkage analysis in the brother sample was compared with results in a previously published sample of 774 sister pairs to identify sex-specific quantitative trait loci (QTL). Results: A genome wide scan identified significant QTL (LOD>3.6) for aBMD on chromosomes 4q21 (hip), 7q34 (spine), 14q32 (hip), 19p13 (hip), 21q21 (hip), and 22q13 (hip). Analysis suggested that the QTL on chromosome 7q34, 14q32, and 21q21 were male-specific whereas the others were not sex-specific. Conclusions: This study demonstrates that six QTL were significantly linked with aBMD in men. One was linked to spine and five were linked to hip. When compared to published data in women from the same geographical region, the QTL on chromosomes 7, 14 and 21 were male-specific. The occurrence of sex-specific genes in humans for aBMD has important implications for the pathogenesis and treatment of osteoporosis. PMID:19427925

  7. Genome-wide linkage scan for loci of musical aptitude in Finnish families: evidence for a major locus at 4q22

    PubMed Central

    Pulli, K; Karma, K; Norio, R; Sistonen, P; Göring, H H H; Järvelä, I

    2008-01-01

    Background: Music perception and performance are comprehensive human cognitive functions and thus provide an excellent model system for studying human behaviour and brain function. However, the molecules involved in mediating music perception and performance are so far uncharacterised. Objective: To unravel the biological background of music perception, using molecular and statistical genetic approaches. Methods: 15 Finnish multigenerational families (with a total of 234 family members) were recruited via a nationwide search. The phenotype of all family members was determined using three tests used in defining musical aptitude: a test for auditory structuring ability (Karma Music test; KMT) commonly used in Finland, and the Seashore pitch and time discrimination subtests (SP and ST respectively) used internationally. We calculated heritabilities and performed a genome-wide variance components-based linkage scan using genotype data for 1113 microsatellite markers. Results: The heritability estimates were 42% for KMT, 57% for SP, 21% for ST and 48% for the combined music test scores. Significant evidence of linkage was obtained on chromosome 4q22 (LOD 3.33) and suggestive evidence of linkage at 8q13-21 (LOD 2.29) with the combined music test scores, using variance component linkage analyses. The major contribution of the 4q22 locus was obtained for the KMT (LOD 2.91). Interestingly, a positive LOD score of 1.69 was shown at 18q, a region previously linked to dyslexia (DYX6) using combined music test scores. Conclusion: Our results show that there is a genetic contribution to musical aptitude that is likely to be regulated by several predisposing genes or variants. PMID:18424507

  8. Comparison of demons deformable registration-based methods for texture analysis of serial thoracic CT scans

    NASA Astrophysics Data System (ADS)

    Cunliffe, Alexandra R.; Al-Hallaq, Hania A.; Fei, Xianhan M.; Tuohy, Rachel E.; Armato, Samuel G.

    2013-02-01

    To determine how 19 image texture features may be altered by three image registration methods, "normal" baseline and follow-up computed tomography (CT) scans from 27 patients were analyzed. Nineteen texture feature values were calculated in over 1,000 32x32-pixel regions of interest (ROIs) randomly placed in each baseline scan. All three methods used demons registration to map baseline scan ROIs to anatomically matched locations in the corresponding transformed follow-up scan. For the first method, the follow-up scan transformation was subsampled to achieve a voxel size identical to that of the baseline scan. For the second method, the follow-up scan was transformed through affine registration to achieve global alignment with the baseline scan. For the third method, the follow-up scan was directly deformed to the baseline scan using demons deformable registration. Feature values in matched ROIs were compared using Bland-Altman 95% limits of agreement. For each feature, the range spanned by the 95% limits was normalized to the mean feature value to obtain the normalized range of agreement, nRoA. Wilcoxon signed-rank tests were used to compare nRoA values across features for the three methods. Significance for individual tests was adjusted using the Bonferroni method. nRoA was significantly smaller for affine-registered scans than for the resampled scans (p=0.003), indicating lower feature value variability between baseline and follow-up scan ROIs using this method. For both of these methods, however, nRoA was significantly higher than when feature values were calculated directly on demons-deformed follow-up scans (p<0.001). Across features and methods, nRoA values remained below 26%.
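
    The agreement measure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the conventional Bland-Altman limits (mean difference plus or minus 1.96 standard deviations of the paired differences) and assumes that "the mean feature value" is the mean over both scans. The function names and the sample values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman_limits(baseline, followup):
    """Return (bias, lower, upper): the mean paired difference and the
    conventional 95% limits of agreement (bias +/- 1.96 * SD of differences)."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width

def normalized_range_of_agreement(baseline, followup):
    """Range spanned by the 95% limits divided by the mean feature value.
    Assumption: the mean is taken over baseline and follow-up values together."""
    _, lower, upper = bland_altman_limits(baseline, followup)
    grand_mean = mean(list(baseline) + list(followup))
    return (upper - lower) / abs(grand_mean)

# Hypothetical texture-feature values in five matched ROIs.
baseline = [10.0, 12.0, 11.0, 13.0, 9.0]
followup = [10.5, 11.5, 11.5, 12.5, 9.5]
bias, lower, upper = bland_altman_limits(baseline, followup)
nroa = normalized_range_of_agreement(baseline, followup)
```

    A smaller nRoA means the feature changes less between matched ROIs, which is how the abstract ranks the three registration methods.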

  9. Comparative genomics of 3 farm canids in relation to the dog.

    PubMed

    Switonski, M; Szczerbal, I; Nowacka-Woszuk, J

    2009-01-01

    There are 3 canids besides the dog (Canis familiaris): the red fox (Vulpes vulpes), arctic fox (Alopex lagopus) and Chinese raccoon dog (Nyctereutes procyonoides procyonoides), which have been extensively studied with the use of cytogenetic and molecular genetics techniques. These 3 species are considered as farm fur-bearing animals. In addition, they are also useful models in comparative genomic studies of the canids. In this review genome organization, karyotype evolution, comparative marker maps, DNA polymorphism and similarity of selected gene sequences of the 3 farm species are discussed in relation to the dog. Also the nature and variability of the B chromosomes, present in the red fox and the Chinese raccoon dog, were considered. These comparative analyses showed that among the studied canids the Chinese raccoon dog is phylogenetically the closest species to the dog. On the other hand, the most advanced linkage and cytogenetic marker maps of the red fox genome facilitate genome scanning studies with the aim to search for chromosome locations of QTL regions for behavior and production traits. (c) 2009 S. Karger AG, Basel.

  10. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop.

    PubMed

    Hazzouri, Khaled M; Flowers, Jonathan M; Visser, Hendrik J; Khierallah, Hussam S M; Rosas, Ulises; Pham, Gina M; Meyer, Rachel S; Johansen, Caryn K; Fresquez, Zoë A; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A; Thirkhill, Deborah; Markhand, Ghulam S; Krueger, Robert R; Zaid, Abdelouahhab; Purugganan, Michael D

    2015-11-09

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop.

  12. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop

    PubMed Central

    Hazzouri, Khaled M.; Flowers, Jonathan M.; Visser, Hendrik J.; Khierallah, Hussam S. M.; Rosas, Ulises; Pham, Gina M.; Meyer, Rachel S.; Johansen, Caryn K.; Fresquez, Zoë A.; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A.; Thirkhill, Deborah; Markhand, Ghulam S.; Krueger, Robert R.; Zaid, Abdelouahhab; Purugganan, Michael D.

    2015-01-01

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop. PMID:26549859

  13. Combinations of SNP genotypes from the Wellcome Trust Case Control Study of bipolar patients.

    PubMed

    Mellerup, Erling; Jørgensen, Martin Balslev; Dam, Henrik; Møller, Gert Lykke

    2018-04-01

    Combinations of genetic variants are the basis for polygenic disorders. We examined combinations of SNP genotypes taken from the 446 729 SNPs in The Wellcome Trust Case Control Study of bipolar patients. Parallel computing by graphics processing units, cloud computing, and data mining tools were used to scan The Wellcome Trust data set for combinations. Two clusters of combinations were significantly associated with bipolar disorder. One cluster contained 68 combinations, each of which included five SNP genotypes. Of the 1998 patients, 305 had combinations from this cluster in their genome, but none of the 1500 controls had any of these combinations in their genome. The other cluster contained six combinations, each of which included five SNP genotypes. Of the 1998 patients, 515 had combinations from the cluster in their genome, but none of the 1500 controls had any of these combinations in their genome. Clusters of combinations of genetic variants can be considered general risk factors for polygenic disorders, whereas accumulation of combinations from the clusters in the genome of a patient can be considered a personal risk factor.

  14. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry

    PubMed Central

    2013-01-01

    Background Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima’s D, Fay and Wu’s H and Fu and Li’s D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Results Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. Conclusions We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation. PMID:23848512
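
    The empirical-outlier step in this abstract (rank every window by a statistic, then flag the top 0.1% or top 1%) can be sketched as below. This is only an illustration under stated assumptions: the function name, the window labels and the scores are hypothetical, and which tail counts as "extreme" depends on the statistic, so it is exposed here as a parameter rather than taken from the paper.

```python
def empirical_outliers(window_scores, top_fraction, high_is_extreme=True):
    """Return the window ids in the most extreme `top_fraction` of the
    empirical distribution of one statistic across all windows."""
    ranked = sorted(window_scores.items(), key=lambda kv: kv[1],
                    reverse=high_is_extreme)
    n_top = max(1, round(len(ranked) * top_fraction))
    return [window for window, _ in ranked[:n_top]]

# Hypothetical scores for 1000 windows (e.g. one value per 25 kb window).
scores = {f"win{i}": float(i) for i in range(1000)}
top_0_1_percent = empirical_outliers(scores, 0.001)
top_1_percent = empirical_outliers(scores, 0.01)
```

    A gene would then be reported as an outlier if any window overlapping it appears in the returned list for at least one statistic.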

  15. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry.

    PubMed

    Hider, Jessica L; Gittelman, Rachel M; Shah, Tapan; Edwards, Melissa; Rosenbloom, Arnold; Akey, Joshua M; Parra, Esteban J

    2013-07-12

    Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima's D, Fay and Wu's H and Fu and Li's D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation.
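
    Of the site-frequency-spectrum tests named in this abstract, Tajima's D is the simplest to write down. The sketch below uses the standard published formula for one window; the input encoding (equal-length strings of "0"/"1" per haplotype), the function name and the toy data are assumptions made for illustration and do not reflect the authors' pipeline.

```python
from math import sqrt

def tajimas_d(haplotypes):
    """Tajima's D for one window. `haplotypes` is a list of equal-length
    strings of '0' (one allele) and '1' (the other allele).
    Returns None when D is undefined (no segregating sites or zero variance)."""
    n = len(haplotypes)
    counts = [sum(h[j] == "1" for h in haplotypes)
              for j in range(len(haplotypes[0]))]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)
    if s == 0:
        return None
    # Mean number of pairwise differences, summed over segregating sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in segregating)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    variance = (c1 / a1) * s + (c2 / (a1 ** 2 + a2)) * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / sqrt(variance)

# Toy windows: an excess of rare variants versus intermediate-frequency variants.
d_rare = tajimas_d(["1000", "0100", "0010", "0001"])
d_intermediate = tajimas_d(["1100", "1100", "0011", "0011"])
```

    Negative values indicate an excess of rare variants and positive values an excess of intermediate-frequency variants, relative to the neutral expectation.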

  16. A genome-wide SNP scan accelerates trait-regulatory genomic loci identification in chickpea

    PubMed Central

    Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C.L.L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

    2015-01-01

    We identified 44844 high-quality SNPs by sequencing 92 diverse chickpea accessions belonging to a seed and pod trait-specific association panel using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays. A GWAS (genome-wide association study) in an association panel of 211, including the 92 sequenced accessions, identified 22 major genomic loci showing significant association (explaining 23–47% phenotypic variation) with pod and seed number/plant and 100-seed weight. Eighteen trait-regulatory major genomic loci underlying 13 robust QTLs were validated and mapped on an intra-specific genetic linkage map by QTL mapping. A combinatorial approach of GWAS, QTL mapping and gene haplotype-specific LD mapping and transcript profiling uncovered one superior haplotype and favourable natural allelic variants in the upstream regulatory region of a CesA-type cellulose synthase (Ca_Kabuli_CesA3) gene regulating high pod and seed number/plant (explaining 47% phenotypic variation) in chickpea. The up-regulation of this superior gene haplotype correlated with increased transcript expression of Ca_Kabuli_CesA3 gene in the pollen and pod of high pod/seed number accession, resulting in higher cellulose accumulation for normal pollen and pollen tube growth. A rapid combinatorial genome-wide SNP genotyping-based approach has potential to dissect complex quantitative agronomic traits and delineate trait-regulatory genomic loci (candidate genes) for genetic enhancement in crop plants, including chickpea. PMID:26058368

  17. Genome-Wide Association Studies with a Genomic Relationship Matrix: A Case Study with Wheat and Arabidopsis.

    PubMed

    Gianola, Daniel; Fariello, Maria I; Naya, Hugo; Schön, Chris-Carolin

    2016-10-13

    Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed, to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the marker(s) tested is(are) used for building G, provided variance components are unaffected by exclusion of such marker(s) from G. The result is arrived at by using a matrix expression such that one can find many inverses of genomic relationship, or of phenotypic covariance matrices, stemming from removing markers tested as fixed, but carrying out a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. Concepts were illustrated using genomic data on 599 wheat inbred lines, with grain yield as target trait, and on close to 200 Arabidopsis thaliana accessions. Copyright © 2016 Gianola et al.
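
    For readers unfamiliar with the matrix G, the sketch below builds one common version of it. The abstract does not say which formula the authors used; this example assumes the widely used centred-and-scaled form (often attributed to VanRaden), with genotypes coded 0/1/2 and allele frequencies estimated from the same sample. The function name and the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum_j p_j (1 - p_j)) from an individuals-by-markers
    list of allele counts (0, 1 or 2), where Z holds counts centred by 2 * p_j."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    centred = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / scale
             for k in range(n)]
            for i in range(n)]

# Toy data: three individuals genotyped at four markers.
genotypes = [[0, 1, 2, 0],
             [1, 1, 0, 2],
             [2, 0, 1, 1]]
G = genomic_relationship_matrix(genotypes)
```

    Dropping a tested marker from G in this construction amounts to removing its column from the centred matrix and its term from the scaling sum, which is the kind of exclusion the abstract's invariance result concerns.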

  18. Genomic Basis of Adaptive Evolution: The Survival of Amur Ide (Leuciscus waleckii) in an Extremely Alkaline Environment

    PubMed Central

    Xu, Jian; Li, Jiong-Tang; Jiang, Yanliang; Peng, Wenzhu; Yao, Zongli; Chen, Baohua; Jiang, Likun; Feng, Jingyan; Ji, Peifeng; Liu, Guiming; Liu, Zhanjiang; Tai, Ruyu; Dong, Chuanju; Sun, Xiaoqing; Zhao, Zi-Xia; Zhang, Yan; Wang, Jian; Li, Shangqi; Zhao, Yunfeng; Yang, Jiuhui; Sun, Xiaowen; Xu, Peng

    2017-01-01

    The Amur ide (Leuciscus waleckii) is a cyprinid fish that is widely distributed in Northeast Asia. The Lake Dali Nur population inhabits one of the most extreme aquatic environments on Earth, with an alkalinity up to 50 mmol/L (pH 9.6), thus providing an exceptional model with which to characterize the mechanisms of genomic evolution underlying adaptation to extreme environments. Here, we developed the reference genome assembly for L. waleckii from Lake Dali Nur. Intriguingly, we identified unusual expanded long terminal repeats (LTRs) with higher nucleotide substitution rates than in many other teleosts, suggesting their more recent insertion into the L. waleckii genome. We also identified expansions in genes encoding egg coat proteins and natriuretic peptide receptors, possibly underlying the adaptation to extreme environmental stress. We further sequenced the genomes of 10 additional individuals from freshwater and 18 from Lake Dali Nur populations, and we detected a total of 7.6 million SNPs from both populations. In a genome scan and comparison of these two populations, we identified a set of genomic regions under selective sweeps that harbor genes involved in ion homoeostasis, acid-base regulation, unfolded protein response, reactive oxygen species elimination, and urea excretion. Our findings provide comprehensive insight into the genomic mechanisms of teleost fish that underlie their adaptation to extreme alkaline environments. PMID:28007977

  19. DNA fingerprinting of Shiga-toxin producing Escherichia coli O157 based on Multiple-Locus Variable-Number Tandem-Repeats Analysis (MLVA)

    PubMed Central

    Lindstedt, Bjørn-Arne; Heir, Even; Gjernes, Elisabet; Vardund, Traute; Kapperud, Georg

    2003-01-01

    Background The ability to react early to possible outbreaks of Escherichia coli O157:H7 and to trace possible sources relies on the availability of highly discriminatory and reliable techniques. The development of methods that are fast and have the potential for complete automation is needed for this important pathogen. Methods In all, 73 isolates of shiga-toxin producing E. coli O157 (STEC) were used in this study. The two available fully sequenced STEC genomes were scanned for tandem repeated stretches of DNA, which were evaluated as polymorphic markers for isolate identification. Results The 73 E. coli isolates displayed 47 distinct patterns and the MLVA assay was capable of high discrimination between the E. coli O157 strains. The assay was fast and all the steps can be automated. Conclusion The findings demonstrate a novel high discriminatory molecular typing method for the important pathogen E. coli O157 that is fast, robust and offers many advantages compared to current methods. PMID:14664722

  20. Accuracy of tree diameter estimation from terrestrial laser scanning by circle-fitting methods

    NASA Astrophysics Data System (ADS)

    Koreň, Milan; Mokroš, Martin; Bucha, Tomáš

    2017-12-01

    This study compares the accuracies of diameter at breast height (DBH) estimations by three initial (minimum bounding box, centroid, and maximum distance) and two refining (Monte Carlo and optimal circle) circle-fitting methods. The circle-fitting algorithms were evaluated in multi-scan mode and a simulated single-scan mode on 157 European beech trees (Fagus sylvatica L.). DBH measured by a calliper was used as reference data. Most of the studied circle-fitting algorithms significantly underestimated the mean DBH in both scanning modes. Only the Monte Carlo method in the single-scan mode significantly overestimated the mean DBH. The centroid method proved to be the least suitable and showed significantly different results from the other circle-fitting methods in both scanning modes. In multi-scan mode, the accuracy of the minimum bounding box method was not significantly different from the accuracies of the refining methods. The accuracy of the maximum distance method was significantly different from the accuracies of the refining methods in both scanning modes. The accuracy of the Monte Carlo method was significantly different from the accuracy of the optimal circle method in only single-scan mode. The optimal circle method proved to be the most accurate circle-fitting method for DBH estimation from point clouds in both scanning modes.

  1. The complete mitogenome of the whale shark parasitic copepod Pandarus rhincodonicus norman, Newbound & Knott (Crustacea; Siphonostomatoida; Pandaridae)--a new gene order for the copepoda.

    PubMed

    Austin, Christopher M; Tan, Mun Hua; Lee, Yin Peng; Croft, Laurence J; Meekan, Mark G; Pierce, Simon J; Gan, Han Ming

    2016-01-01

    The complete mitochondrial genome of the parasitic copepod Pandarus rhincodonicus was obtained from a partial genome scan using the HiSeq sequencing system. The Pandarus rhincodonicus mitogenome has 14,480 base pairs (62% A+T content) made up of 12 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a putative 384 bp non-coding AT-rich region. This Pandarus mitogenome sequence is the first for the family Pandaridae, the second for the order Siphonostomatoida and the sixth for the Copepoda.

  2. Detection and validation of QTL affecting bacterial cold water disease resistance in rainbow trout using restriction-site associated DNA sequencing

    USDA-ARS's Scientific Manuscript database

    Bacterial cold water disease (BCWD) causes significant economic loss in salmonid aquaculture. Using microsatellites genome scan we have previously detected significant and suggestive QTL with major effects on the phenotypic variation of survival following challenge with Flavobacterium psychrophilum...

  3. Genome-Wide scans for carcass and meat traits in nellore cattle

    USDA-ARS's Scientific Manuscript database

    Beef cattle industry is one of the main highlights of Brazilian agribusiness, however the standardization of meat products is still an issue. The lack of standardization of quality characteristics as fat thickness and tenderness, and the burden and time spent on collecting and evaluating large numbe...

  4. MADS-box out of the black box

    USDA-ARS's Scientific Manuscript database

    The compelling elegance of using genome-wide scans to detect the signature of selection is difficult to resist, but is countered by the low demonstrated efficacy of pinpointing the actual genes and traits that are the targets of selection in non-model species. While the difficulty of going from a s...

  5. Relationships among calpastatin single nucleotide polymorphisms, calpastatin expression and tenderness in pork longissimus

    USDA-ARS's Scientific Manuscript database

    Genome scans in the pig have identified a region on chromosome 2 (SSC2) associated with tenderness. Calpastatin is a likely positional candidate gene in this region because of its inhibitory role in the calpain system that is involved in postmortem tenderization. Novel single nucleotide polymorphism...

  6. Allele-Specific Transcription Factor Binding in Pig Calpastatin Promoter Regions

    USDA-ARS's Scientific Manuscript database

    The identification of predictive DNA markers for pork quality would allow U.S. pork producers and breeders to more quickly and efficiently select genetically superior animals for production of consistent, high quality meat. Genome scans have identified QTL for tenderness on pig chromosome 2 which ha...

  7. Predictive markers in calpastatin for tenderness in commercial pig populations

    USDA-ARS's Scientific Manuscript database

    The identification of predictive DNA markers for pork quality would allow U.S. pork producers and breeders to more quickly and efficiently select genetically superior animals for production of consistent, high quality meat. Genome scans have identified QTL for tenderness on pig chromosome 2 which ha...

  8. Association of Functional SNPs in Pig Calpastatin Regulatory Regions with Tenderness

    USDA-ARS's Scientific Manuscript database

    The identification of predictive DNA markers for pork quality would allow U.S. pork producers and breeders to more quickly and efficiently select genetically superior animals for production of consistent, high quality meat. Genome scans have identified QTL for tenderness on pig chromosome 2 which ha...

  9. Genetic diversity and genome-wide association analysis of cooking time in dry bean (Phaseolus vulgaris L.).

    PubMed

    Cichy, Karen A; Wiesinger, Jason A; Mendoza, Fernando A

    2015-08-01

    Fivefold diversity for cooking time found in a panel of 206 Phaseolus vulgaris accessions. Fastest accession cooks nearly 20 min faster than average. SNPs associated with cooking time on Pv02, 03, and 06. Dry beans (Phaseolus vulgaris L.) are a nutrient dense food and a dietary staple in parts of Africa and Latin America. One of the major factors that limits greater utilization of beans is their long cooking times compared to other foods. Cooking time is an important trait with implications for gender equity, nutritional value of diets, and energy utilization. Very little is known about the genetic diversity and genomic regions involved in determining cooking time. The objective of this research was to assess cooking time on a panel of 206 P. vulgaris accessions, use genome-wide association analysis (GWAS) to identify genomic regions influencing this trait, and to test the ability to predict cooking time by raw seed characteristics. In this study 5.5-fold variation for cooking time was found and five bean accessions were identified which cook in less than 27 min across 2 years, where the average cooking time was 37 min. One accession, ADP0367 cooked nearly 20 min faster than average. Four of these five accessions showed close phylogenetic relationship based on a NJ tree developed with ~5000 SNP markers, suggesting a potentially similar underlying genetic mechanism. GWAS revealed regions on chromosomes Pv02, Pv03, and Pv06 associated with cooking time. Vis/NIR scanning of raw seed explained 68 % of the phenotypic variation for cooking time, suggesting with additional experimentation, it may be possible to use this spectroscopy method to non-destructively identify fast cooking lines as part of a breeding program.

  10. The role of DNA repair in herpesvirus pathogenesis.

    PubMed

    Brown, Jay C

    2014-10-01

    In cells latently infected with a herpesvirus, the viral DNA is present in the cell nucleus, but it is not extensively replicated or transcribed. In this suppressed state the virus DNA is vulnerable to mutagenic events that affect the host cell and have the potential to destroy the virus' genetic integrity. Despite the potential for genetic damage, however, herpesvirus sequences are well conserved after reactivation from latency. To account for this apparent paradox, I have tested the idea that host cell-encoded mechanisms of DNA repair are able to control genetic damage to latent herpesviruses. Studies were focused on homologous recombination-dependent DNA repair (HR). Methods of DNA sequence analysis were employed to scan herpesvirus genomes for DNA features able to activate HR. Analyses were carried out with a total of 39 herpesvirus DNA sequences, a group that included viruses from the alpha-, beta- and gamma-subfamilies. The results showed that all 39 genome sequences were enriched in two or more of the eight recombination-initiating features examined. The results were interpreted to indicate that HR can stabilize latent herpesvirus genomes. The results also showed, unexpectedly, that repair-initiating DNA features differed in alpha- compared to gamma-herpesviruses. Whereas inverted and tandem repeats predominated in alpha-herpesviruses, gamma-herpesviruses were enriched in short, GC-rich initiation sequences such as CCCAG and depleted in repeats. In alpha-herpesviruses, repair-initiating repeat sequences were found to be concentrated in a specific region (the S segment) of the genome while repair-initiating short sequences were distributed more uniformly in gamma-herpesviruses. The results suggest that repair pathways are activated differently in alpha- compared to gamma-herpesviruses. Copyright © 2014. Published by Elsevier Inc.
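
    Scanning a genome for a short initiation sequence such as CCCAG, and asking whether it is enriched, can be sketched as below. The abstract does not describe the counting rules or the null model used, so this example makes its own assumptions: occurrences are counted on both strands with overlaps allowed, and the expected count assumes independent bases drawn at the sequence's own composition. The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_motif(seq, motif):
    """Count overlapping occurrences of `motif` on either strand."""
    seq = seq.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))

def expected_count(seq, motif):
    """Expected count under independent bases at the observed composition
    (an assumed null model, chosen here only for illustration)."""
    seq = seq.upper()
    freq = {base: seq.count(base) / len(seq) for base in "ACGT"}
    def word_probability(word):
        p = 1.0
        for base in word:
            p *= freq[base]
        return p
    targets = {motif, reverse_complement(motif)}
    positions = len(seq) - len(motif) + 1
    return positions * sum(word_probability(w) for w in targets)

# Toy sequence containing CCCAG once on each strand.
sequence = "AACCCAGTTCTGGGAA"
observed = count_motif(sequence, "CCCAG")
enrichment = observed / expected_count(sequence, "CCCAG")
```

    An enrichment ratio well above 1 across a whole genome would be the kind of signal the abstract describes for gamma-herpesviruses, although a real analysis would need a significance test as well.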

  11. African Ancestry Analysis and Admixture Genetic Mapping for Proliferative Diabetic Retinopathy in African Americans

    PubMed Central

    Tandon, Arti; Chen, Ching J.; Penman, Alan; Hancock, Heather; James, Maurice; Husain, Deeba; Andreoli, Christopher; Li, Xiaohui; Kuo, Jane Z.; Idowu, Omolola; Riche, Daniel; Papavasilieou, Evangelia; Brauner, Stacey; Smith, Sataria O.; Hoadley, Suzanne; Richardson, Cole; Kieser, Troy; Vazquez, Vanessa; Chi, Cheryl; Fernandez, Marlene; Harden, Maegan; Cotch, Mary Frances; Siscovick, David; Taylor, Herman A.; Wilson, James G.; Reich, David; Wong, Tien Y.; Klein, Ronald; Klein, Barbara E. K.; Rotter, Jerome I.; Patterson, Nick; Sobrin, Lucia

    2015-01-01

    Purpose. To examine the relationship between proportion of African ancestry (PAA) and proliferative diabetic retinopathy (PDR) and to identify genetic loci associated with PDR using admixture mapping in African Americans with type 2 diabetes (T2D). Methods. Between 1993 and 2013, 1440 participants enrolled in four different studies had fundus photographs graded using the Early Treatment Diabetic Retinopathy Study scale. Cases (n = 305) had PDR while controls (n = 1135) had nonproliferative diabetic retinopathy (DR) or no DR. Covariates included diabetes duration, hemoglobin A1C, systolic blood pressure, income, and education. Genotyping was performed on the Affymetrix platform. The association between PAA and PDR was evaluated using logistic regression. Genome-wide admixture scanning was performed using ANCESTRYMAP software. Results. In the univariate analysis, PDR was associated with increased PAA (odds ratio [OR] = 1.36, 95% confidence interval [CI] = 1.16–1.59, P = 0.0002). In multivariate regression adjusting for traditional DR risk factors, income and education, the association between PAA and PDR was attenuated and no longer significant (OR = 1.21, 95% CI = 0.59–2.47, P = 0.61). For the admixture analyses, the maximum genome-wide score was 1.44 on chromosome 1. Conclusions. In this largest study of PDR in African Americans with T2D to date, an association between PAA and PDR is not present after adjustment for clinical, demographic, and socioeconomic factors. No genome-wide significant locus (defined as having a locus-genome statistic > 5) was identified with admixture analysis. Further analyses with even larger sample sizes are needed to definitively assess if any admixture signal for DR is present. PMID:26098467

  12. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
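
    As a rough illustration of what an alignment-free comparison looks like, the sketch below computes a Jaccard distance between the k-mer sets of two sequences. This is only one simple k-mer based measure chosen for illustration; the abstract does not name the nine AF methods assessed, and the function names and the default k are assumptions.

```python
def kmers(seq, k):
    """Return the set of all overlapping k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=3):
    """Alignment-free distance: 1 - |shared k-mers| / |all k-mers|.

    Hypothetical helper for illustration; no alignment is computed,
    only k-mer presence/absence is compared.
    """
    set_a, set_b = kmers(seq_a, k), kmers(seq_b, k)
    union = set_a | set_b
    if not union:
        raise ValueError("sequences must be at least k characters long")
    return 1.0 - len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Identical sequences share every k-mer; disjoint ones share none.
    print(jaccard_distance("ACGTACGT", "ACGTACGT"))
    print(jaccard_distance("AAAA", "CCCC"))
```

    A pairwise matrix of such distances could then be passed to a distance-based tree method; because k-mer content ignores gene order, a measure of this kind would be expected to be less affected by rearrangement than by sequence divergence, in line with the trend reported above.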

  13. A survey of genome-wide single nucleotide polymorphisms through genome resequencing in the Périgord black truffle (Tuber melanosporum Vittad.).

    PubMed

    Payen, Thibaut; Murat, Claude; Gigant, Anaïs; Morin, Emmanuelle; De Mita, Stéphane; Martin, Francis

    2015-09-01

    The Périgord black truffle (Tuber melanosporum Vittad.), considered a gastronomic delicacy worldwide, is an ectomycorrhizal filamentous fungus that is ecologically important in Mediterranean French, Italian and Spanish woodlands. In this study, we developed a novel resource of single nucleotide polymorphisms (SNPs) for T. melanosporum using Illumina high-throughput resequencing. The genome from six T. melanosporum geographical accessions was sequenced to a depth of approximately 20×. These geographical accessions were selected from different populations within the northern and southern regions of the geographical species distribution. Approximately 80% of the reads for each of the six resequenced geographical accessions mapped against the reference T. melanosporum genome assembly, estimating the core genome size of this organism to be approximately 110 Mbp. A total of 442 326 SNPs corresponding to 3540 SNPs/Mbp were identified as being included in all seven genomes. The SNPs occurred more frequently in repeated sequences (85%), although 4501 SNPs were also identified in the coding regions of 2587 genes. Using the ratio of nonsynonymous mutations per nonsynonymous site (pN) to synonymous mutations per synonymous site (pS) and Tajima's D index scanning the whole genome, we were able to identify genomic regions and genes potentially subjected to positive or purifying selection. The SNPs identified represent a valuable resource for future population genetics and genomics studies. © 2015 John Wiley & Sons Ltd.
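
    For readers unfamiliar with the Tajima's D statistic mentioned above, the following is a minimal sketch of the standard formula (Tajima 1989) for a single window, given the sample size, the number of segregating sites and the mean number of pairwise differences. The function name and argument names are illustrative assumptions; how the authors defined windows or handled the seven genomes is not stated in the abstract.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites (S),
    and pi = mean number of pairwise differences.

    Requires n >= 4, because the variance term is zero for n < 4.
    """
    if n < 4 or seg_sites < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: difference between the two estimators of theta.
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    print(tajimas_d(n=7, seg_sites=30, pi=10.0))
```

    Negative values indicate an excess of rare variants (as can follow a selective sweep or population expansion), whereas positive values indicate an excess of intermediate-frequency variants (as under balancing selection).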

  14. Improved Analysis of Nanopore Sequence Data and Scanning Nanopore Techniques

    NASA Astrophysics Data System (ADS)

    Szalay, Tamas

    The field of nanopore research has been driven by the need to inexpensively and rapidly sequence DNA. In order to help realize this goal, this thesis describes the PoreSeq algorithm that identifies and corrects errors in real-world nanopore sequencing data and improves the accuracy of de novo genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA advances through the nanopore and then using this model to find the sequence that best explains multiple reads of the same region of DNA. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85% to 99% at 100X coverage. We also use the algorithm to assemble E. coli with 30X coverage and the lambda genome at a range of coverages from 3X to 50X. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods. This thesis also reports preliminary progress towards controlling the motion of DNA using two nanopores instead of one. The speed at which the DNA travels through the nanopore needs to be carefully controlled to facilitate the detection of individual bases. A second nanopore in close proximity to the first could be used to slow or stop the motion of the DNA in order to enable a more accurate readout. The fabrication process for a new pyramidal nanopore geometry was developed in order to facilitate the positioning of the nanopores. This thesis demonstrates that two of them can be placed close enough to interact with a single molecule of DNA, which is a prerequisite for being able to use the driving force of the pores to exert fine control over the motion of the DNA. Another strategy for reading the DNA is to trap it completely with one pore and to move the second nanopore instead. 
    To that end, this thesis also shows that a single strand of immobilized DNA can be captured in a scanning nanopore and examined for a full hour, with data from many scans at many different voltages obtained in order to detect a bound protein placed partway along the molecule.

  15. Genome-wide scan for selection signatures in six cattle breeds in South Africa.

    PubMed

    Makina, Sithembile O; Muchadeyi, Farai C; van Marle-Köster, Este; Taylor, Jerry F; Makgahlela, Mahlako L; Maiwashe, Azwihangwisi

    2015-11-26

    The detection of selection signatures in breeds of livestock species can contribute to the identification of regions of the genome that are, or have been, functionally important and, as a consequence, have been targeted by selection. This study used two approaches to detect signatures of selection within and between six cattle breeds in South Africa, including Afrikaner (n = 44), Nguni (n = 54), Drakensberger (n = 47), Bonsmara (n = 44), Angus (n = 31) and Holstein (n = 29). The first approach was based on the detection of genomic regions in which haplotypes have been driven towards complete fixation within breeds. The second approach identified regions of the genome that had very different allele frequencies between populations (FST). Forty-seven candidate genomic regions were identified as harbouring putative signatures of selection using both methods. Twelve of these candidate selected regions were shared among the breeds and ten were validated by previous studies. Thirty-three of these regions were successfully annotated and candidate genes were identified. Among these genes the keratin genes (KRT222, KRT24, KRT25, KRT26, and KRT27) and one heat shock protein gene (HSPB9) on chromosome 19 between 42,896,570 and 42,897,840 bp were detected for the Nguni breed. These genes were previously associated with adaptation to tropical environments in Zebu cattle. In addition, a number of candidate genes associated with the nervous system (WNT5B, FMOD, PRELP, and ATP2B), immune response (CYM, CDC6, and CDK10), production (MTPN, IGFBP4, TGFB1, and AJAP1) and reproductive performance (ADIPOR2, OVOS2, and RBBP8) were also detected as being under selection. The results presented here provide a foundation for detecting mutations that underlie genetic variation of traits that have economic importance for cattle breeds in South Africa.

  16. Genome-wide association study of type 2 diabetes in a sample from Mexico City and a meta-analysis of a Mexican-American sample from Starr County, Texas

    PubMed Central

    Parra, E. J.; Below, J. E.; Krithika, S.; Valladares, A.; Barta, J. L.; Cox, N. J.; Hanis, C. L.; Wacher, N.; Garcia-Mena, J.; Hu, P.; Shriver, M. D.; Kumate, J.; McKeigue, P. M.; Escobedo, J.; Cruz, M.

    2013-01-01

    Aims/hypothesis: We report a genome-wide association study of type 2 diabetes in an admixed sample from Mexico City and describe the results of a meta-analysis of this study and another genome-wide scan in a Mexican-American sample from Starr County, TX, USA. The top signals observed in this meta-analysis were followed up in the Diabetes Genetics Replication and Meta-analysis Consortium (DIAGRAM) and DIAGRAM+ datasets. Methods: We analysed 967 cases and 343 normoglycaemic controls. The samples were genotyped with the Affymetrix Genome-wide Human SNP array 5.0. Associations of genotyped and imputed markers with type 2 diabetes were tested using a missing data likelihood score test. A fixed-effects meta-analysis including 1,804 cases and 780 normoglycaemic controls was carried out by weighting the effect estimates by their inverse variances. Results: In the meta-analysis of the two Hispanic studies, markers showing suggestive associations (p<10⁻⁵) were identified in two known diabetes genes, HNF1A and KCNQ1, as well as in several additional regions. Meta-analysis of the two Hispanic studies and the recent DIAGRAM+ dataset identified genome-wide significant signals (p<5×10⁻⁸) within or near the genes HNF1A and CDKN2A/CDKN2B, as well as suggestive associations in three additional regions, IGF2BP2, KCNQ1 and the previously unreported C14orf70. Conclusions/interpretation: We observed numerous regions with suggestive associations with type 2 diabetes. Some of these signals correspond to regions described in previous studies. However, many of these regions could not be replicated in the DIAGRAM datasets. It is critical to carry out additional studies in Hispanic and American Indian populations, which have a high prevalence of type 2 diabetes. PMID:21573907
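
    The fixed-effects meta-analysis described above weights each study's effect estimate by its inverse variance. A minimal sketch of that calculation for a single marker follows; the function name and the example numbers are hypothetical and are not taken from the study.

```python
import math


def fixed_effects_meta(betas, std_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    std_errors: their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Illustrative values only: two studies, log odds ratios and SEs.
    print(fixed_effects_meta([0.20, 0.35], [0.08, 0.12]))
```

    Because the weights depend only on the standard errors, the more precise study dominates the pooled estimate; a fixed-effects model of this kind assumes a single shared true effect across studies.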

  17. Novel nuclei isolation buffer for flow cytometric genome size estimation of Zingiberaceae: a comparison with common isolation buffers

    PubMed Central

    Sadhu, Abhishek; Bhadra, Sreetama; Bandyopadhyay, Maumita

    2016-01-01

    Background and Aims: Cytological parameters such as chromosome numbers and genome sizes of plants are used routinely for studying evolutionary aspects of polyploid plants. Members of Zingiberaceae show a wide range of inter- and intrageneric variation in their reproductive habits and ploidy levels. Conventional cytological study in this group of plants is severely hampered by the presence of diverse secondary metabolites, which also affect their genome size estimation using flow cytometry. None of the several nuclei isolation buffers used in flow cytometry could be used very successfully for members of Zingiberaceae to isolate good quality nuclei from both shoot and root tissues. Methods: The competency of eight nuclei isolation buffers was compared with a newly formulated buffer, MB01, in six different genera of Zingiberaceae based on the fluorescence intensity of propidium iodide-stained nuclei using flow cytometric parameters, namely coefficient of variation of the G0/G1 peak, debris factor and nuclei yield factor. Isolated nuclei were studied using fluorescence microscopy and bio-scanning electron microscopy to analyse stain–nuclei interaction and nuclei topology, respectively. Genome contents of 21 species belonging to these six genera were determined using MB01. Key Results: Flow cytometric parameters showed significant differences among the analysed buffers. MB01 exhibited the best combination of analysed parameters; photomicrographs obtained from fluorescence and electron microscopy supported the superiority of MB01 buffer over other buffers. Among the 21 species studied, nuclear DNA contents of 14 species are reported for the first time. Conclusions: Results of the present study substantiate the enhanced efficacy of MB01, compared to other buffers tested, in the generation of acceptable cytograms from all species of Zingiberaceae studied. Our study facilitates new ways of sample preparation for further flow cytometric analysis of genome size of other members belonging to this highly complex polyploid family. PMID:27594649

  18. Whole genome annotation and comparative genomic analyses of bio-control fungus Purpureocillium lilacinum.

    PubMed

    Prasad, Pushplata; Varshney, Deepti; Adholeya, Alok

    2015-11-25

    The fungus Purpureocillium lilacinum is widely known as a biological control agent against plant parasitic nematodes. This research article consists of genomic annotation of the first draft of whole genome sequence of P. lilacinum. The study aims to decipher the putative genetic components of the fungus involved in nematode pathogenesis by performing comparative genomic analysis with nine closely related fungal species in Hypocreales. De novo genomic assembly was done and a total of 301 scaffolds were constructed for P. lilacinum genomic DNA. By employing structural genome prediction models, 13,266 genes coding for proteins were predicted in the genome. Approximately 73% of the predicted genes were functionally annotated using Blastp, InterProScan and Gene Ontology. A 14.7% fraction of the predicted genes shared significant homology with genes in the Pathogen Host Interactions (PHI) database. The phylogenomic analysis carried out using maximum likelihood RAxML algorithm provided insight into the evolutionary relationship of P. lilacinum. In congruence with other closely related species in the Hypocreales namely, Metarhizium spp., Pochonia chlamydosporia, Cordyceps militaris, Trichoderma reesei and Fusarium spp., P. lilacinum has large gene sets coding for G-protein coupled receptors (GPCRs), proteases, glycoside hydrolases and carbohydrate esterases that are required for degradation of nematode-egg shell components. Screening of the genome by Antibiotics & Secondary Metabolite Analysis Shell (AntiSMASH) pipeline indicated that the genome potentially codes for a variety of secondary metabolites, possibly required for adaptation to heterogeneous lifestyles reported for P. lilacinum. Significant up-regulation of subtilisin-like serine protease genes in presence of nematode eggs in quantitative real-time analyses suggested potential role of serine proteases in nematode pathogenesis. 
    The data offer a better understanding of the Purpureocillium lilacinum genome and will enhance our understanding of the molecular mechanism involved in nematophagy.

  19. Genome-wide scan of 29,141 African Americans finds no evidence of directional selection since admixture.

    PubMed

    Bhatia, Gaurav; Tandon, Arti; Patterson, Nick; Aldrich, Melinda C; Ambrosone, Christine B; Amos, Christopher; Bandera, Elisa V; Berndt, Sonja I; Bernstein, Leslie; Blot, William J; Bock, Cathryn H; Caporaso, Neil; Casey, Graham; Deming, Sandra L; Diver, W Ryan; Gapstur, Susan M; Gillanders, Elizabeth M; Harris, Curtis C; Henderson, Brian E; Ingles, Sue A; Isaacs, William; De Jager, Phillip L; John, Esther M; Kittles, Rick A; Larkin, Emma; McNeill, Lorna H; Millikan, Robert C; Murphy, Adam; Neslund-Dudas, Christine; Nyante, Sarah; Press, Michael F; Rodriguez-Gil, Jorge L; Rybicki, Benjamin A; Schwartz, Ann G; Signorello, Lisa B; Spitz, Margaret; Strom, Sara S; Tucker, Margaret A; Wiencke, John K; Witte, John S; Wu, Xifeng; Yamamura, Yuko; Zanetti, Krista A; Zheng, Wei; Ziegler, Regina G; Chanock, Stephen J; Haiman, Christopher A; Reich, David; Price, Alkes L

    2014-10-02

    The extent of recent selection in admixed populations is currently an unresolved question. We scanned the genomes of 29,141 African Americans and failed to find any genome-wide-significant deviations in local ancestry, indicating no evidence of selection influencing ancestry after admixture. A recent analysis of data from 1,890 African Americans reported that there was evidence of selection in African Americans after their ancestors left Africa, both before and after admixture. Selection after admixture was reported on the basis of deviations in local ancestry, and selection before admixture was reported on the basis of allele-frequency differences between African Americans and African populations. The local-ancestry deviations reported by the previous study did not replicate in our very large sample, and we show that such deviations were expected purely by chance, given the number of hypotheses tested. We further show that the previous study's conclusion of selection in African Americans before admixture is also subject to doubt. This is because the FST statistics they used were inflated and because true signals of unusual allele-frequency differences between African Americans and African populations would be best explained by selection that occurred in Africa prior to migration to the Americas. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  20. A Genome-Wide Scan for Evidence of Selection in a Maize Population Under Long-Term Artificial Selection for Ear Number

    PubMed Central

    Beissinger, Timothy M.; Hirsch, Candice N.; Vaillancourt, Brieanne; Deshpande, Shweta; Barry, Kerrie; Buell, C. Robin; Kaeppler, Shawn M.; Gianola, Daniel; de Leon, Natalia

    2014-01-01

    A genome-wide scan to detect evidence of selection was conducted in the Golden Glow maize long-term selection population. The population had been subjected to selection for increased number of ears per plant for 30 generations, with an empirically estimated effective population size ranging from 384 to 667 individuals and an increase of more than threefold in the number of ears per plant. Allele frequencies at >1.2 million single-nucleotide polymorphism loci were estimated from pooled whole-genome resequencing data, and FST values across sliding windows were employed to assess divergence between the population preselection and the population postselection. Twenty-eight highly divergent regions were identified, with half of these regions providing gene-level resolution on potentially selected variants. Approximately 93% of the divergent regions do not demonstrate a significant decrease in heterozygosity, which suggests that they are not approaching fixation. Also, most regions display a pattern consistent with a soft-sweep model as opposed to a hard-sweep model, suggesting that selection mostly operated on standing genetic variation. For at least 25% of the regions, results suggest that selection operated on variants located outside of currently annotated coding regions. These results provide insights into the underlying genetic effects of long-term artificial selection and identification of putative genetic elements underlying number of ears per plant in maize. PMID:24381334
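
    To make the sliding-window FST idea above concrete, the sketch below computes a simple two-population FST, (HT − HS)/HT, summed over the SNPs in each window. It is an illustration under stated assumptions: windows are defined by SNP count rather than physical distance, the estimator ignores sampling and pooling variance, and the function names are hypothetical. The abstract does not specify which FST estimator or window definition the authors used.

```python
def fst_components(p1, p2):
    """Return (HT - HS, HT) for one biallelic SNP.

    p1, p2: frequency of the same allele in the two populations
    (e.g. before and after selection).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return h_total - h_within, h_total


def windowed_fst(freqs_pre, freqs_post, window, step):
    """Slide a window of `window` SNPs in increments of `step` SNPs.

    Returns a list of (start_index, fst) using a ratio of sums, so
    uninformative SNPs (HT = 0) do not cause division by zero alone.
    """
    if len(freqs_pre) != len(freqs_post):
        raise ValueError("frequency lists must have the same length")
    results = []
    for start in range(0, len(freqs_pre) - window + 1, step):
        numerator = denominator = 0.0
        pairs = zip(freqs_pre[start:start + window],
                    freqs_post[start:start + window])
        for p1, p2 in pairs:
            num, den = fst_components(p1, p2)
            numerator += num
            denominator += den
        fst = numerator / denominator if denominator > 0 else float("nan")
        results.append((start, fst))
    return results


if __name__ == "__main__":
    pre = [0.10, 0.50, 0.45, 0.50, 0.30]
    post = [0.90, 0.55, 0.50, 0.50, 0.30]
    for start, fst in windowed_fst(pre, post, window=2, step=1):
        print(start, round(fst, 4))
```

    Windows with unusually high values relative to the genome-wide distribution would be the candidates for "highly divergent regions"; the threshold for calling a window divergent is a separate choice not shown here.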

  1. mQTL-seq delineates functionally relevant candidate gene harbouring a major QTL regulating pod number in chickpea

    PubMed Central

    Das, Shouvik; Singh, Mohar; Srivastava, Rishi; Bajaj, Deepak; Saxena, Maneesha S.; Rana, Jai C.; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

    2016-01-01

    The present study used a whole-genome, NGS resequencing-based mQTL-seq (multiple QTL-seq) strategy in two inter-specific mapping populations (Pusa 1103 × ILWC 46 and Pusa 256 × ILWC 46) to scan the major genomic region(s) underlying QTL(s) governing the pod number trait in chickpea. Essentially, the whole-genome resequencing of low and high pod number-containing parental accessions and homozygous individuals (constituting bulks) from each of these two mapping populations discovered >8 million high-quality homozygous SNPs with respect to the reference kabuli chickpea. The functional significance of the physically mapped SNPs was apparent from the identified 2,264 non-synonymous and 23,550 regulatory SNPs, with 8–10% of these SNPs-carrying genes corresponding to transcription factors and disease resistance-related proteins. The utilization of these mined SNPs in Δ (SNP index)-led QTL-seq analysis and their correlation between two mapping populations based on mQTL-seq, narrowed down two (CaqaPN4.1: 867.8 kb and CaqaPN4.2: 1.8 Mb) major genomic regions harbouring robust pod number QTLs into the high-resolution short QTL intervals (CaqbPN4.1: 637.5 kb and CaqbPN4.2: 1.28 Mb) on chickpea chromosome 4. The integration of mQTL-seq-derived one novel robust QTL with QTL region-specific association analysis delineated the regulatory (C/T) and coding (C/A) SNPs-containing one pentatricopeptide repeat (PPR) gene at a major QTL region regulating pod number in chickpea. This target gene exhibited anther, mature pollen and pod-specific expression, including pronounced higher up-regulated (∼3.5-fold) transcript expression in high pod number-containing parental accessions and homozygous individuals of two mapping populations especially during pollen and pod development. 
    The proposed mQTL-seq-driven combinatorial strategy has profound efficacy in rapid genome-wide scanning of potential candidate gene(s) underlying trait-associated high-resolution robust QTL(s), thereby expediting genomics-assisted breeding and genetic enhancement of crop plants, including chickpea. PMID:26685680
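
    The Δ (SNP index) used in QTL-seq analyses can be sketched as follows: the SNP index of a bulk is the fraction of reads carrying the non-reference allele at a SNP, and Δ is the difference between the two bulks. The function names, the sign convention (high bulk minus low bulk) and the read counts below are assumptions for illustration, not values or code from the study.

```python
def snp_index(alt_reads, depth):
    """Fraction of reads at a SNP that carry the non-reference allele."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return alt_reads / depth


def delta_snp_index(high_bulk, low_bulk):
    """Delta SNP index for one SNP.

    high_bulk, low_bulk: (alt_reads, depth) tuples for the high and
    low pod number bulks. Sign convention is an assumption.
    """
    return snp_index(*high_bulk) - snp_index(*low_bulk)


if __name__ == "__main__":
    # Hypothetical read counts at a single SNP.
    print(delta_snp_index(high_bulk=(18, 20), low_bulk=(4, 20)))
```

    In practice the per-SNP values are usually averaged in sliding windows along each chromosome, and regions where the averaged Δ departs strongly from zero are taken as candidate QTL intervals.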

  2. Tapping the promise of genomics in species with complex, nonmodel genomes.

    PubMed

    Hirsch, Candice N; Buell, C Robin

    2013-01-01

    Genomics is enabling a renaissance in all disciplines of plant biology. However, many plant genomes are complex and remain recalcitrant to current genomic technologies. The complexities of these nonmodel plant genomes are attributable to gene and genome duplication, heterozygosity, ploidy, and/or repetitive sequences. Methods are available to simplify the genome and reduce these barriers, including inbreeding and genome reduction, making these species amenable to current sequencing and assembly methods. Some, but not all, of the complexities in nonmodel genomes can be bypassed by sequencing the transcriptome rather than the genome. Additionally, comparative genomics approaches, which leverage phylogenetic relatedness, can aid in the interpretation of complex genomes. Although there are limitations in accessing complex nonmodel plant genomes using current sequencing technologies, genome manipulation and resourceful analyses can allow access to even the most recalcitrant plant genomes.

  3. Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    PubMed

    Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

    2015-02-14

    Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in Perl and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/, and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. 
    With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.

  4. Endogenous hepadnaviruses, bornaviruses and circoviruses in snakes.

    PubMed

    Gilbert, C; Meik, J M; Dashevsky, D; Card, D C; Castoe, T A; Schaack, S

    2014-09-22

    We report the discovery of endogenous viral elements (EVEs) from Hepadnaviridae, Bornaviridae and Circoviridae in the speckled rattlesnake, Crotalus mitchellii, the first viperid snake for which a draft whole genome sequence assembly is available. Analysis of the draft assembly reveals that genome fragments from the three virus families were inserted into the genome of this snake over the past 50 Myr. Cross-species PCR screening of orthologous loci and computational scanning of the python and king cobra genomes reveal that circoviruses integrated most recently (within the last approx. 10 Myr), whereas bornaviruses and hepadnaviruses integrated at least approximately 13 and approximately 50 Ma, respectively. This is, to our knowledge, the first report of circo-, borna- and hepadnaviruses in snakes and the first characterization of non-retroviral EVEs in non-avian reptiles. Our study provides a window into the historical dynamics of viruses in these host lineages and shows that their evolution involved multiple host-switches between mammals and reptiles. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  5. The pig genome project has plenty to squeal about.

    PubMed

    Fan, B; Gorbach, D M; Rothschild, M F

    2011-01-01

    Significant progress on pig genetics and genomics research has been witnessed in recent years due to the integration of advanced molecular biology techniques, bioinformatics and computational biology, and the collaborative efforts of researchers in the swine genomics community. Progress on expanding the linkage map has slowed down, but the efforts have created a higher-resolution physical map integrating the clone map and BAC end sequence. The number of QTL mapped is still growing and most of the updated QTL mapping results are available through PigQTLdb. Additionally, expression studies using high-throughput microarrays and other gene expression techniques have made significant advancements. The number of identified non-coding RNAs is rapidly increasing and their exact regulatory functions are being explored. A publishable draft (build 10) of the swine genome sequence was available for the pig genomics community by the end of December 2010. Build 9 of the porcine genome is currently available with Ensembl annotation; manual annotation is ongoing. These drafts provide useful tools for such endeavors as comparative genomics and SNP scans for fine QTL mapping. A recent community-wide effort to create a 60K porcine SNP chip has greatly facilitated whole-genome association analyses, haplotype block construction and linkage disequilibrium mapping, which can contribute to whole-genome selection. The future 'systems biology' that integrates and optimizes the information from all research levels can enhance the pig community's understanding of the full complexity of the porcine genome. These recent technological advances and where they may lead are reviewed. Copyright © 2011 S. Karger AG, Basel.

  6. How Are Information Seeking, Scanning, and Processing Related to Beliefs About the Roles of Genetics and Behavior in Cancer Causation?

    PubMed

    Waters, Erika A; Wheeler, Courtney; Hamilton, Jada G

    2016-01-01

    Understanding that cancer is caused by both genetic and behavioral risk factors is an important component of genomic literacy. However, a considerable percentage of people in the United States do not endorse such multifactorial beliefs. Using nationally representative cross-sectional data from the U.S. Health Information National Trends Survey (N = 2,529), we examined how information seeking, information scanning, and key information-processing characteristics were associated with endorsing a multifactorial model of cancer causation. Multifactorial beliefs about cancer were more common among respondents who engaged in cancer information scanning (p = .001), were motivated to process health information (p = .005), and reported a family history of cancer (p = .0002). Respondents who reported having previous negative information-seeking experiences had lower odds of endorsing multifactorial beliefs (p = .01). Multifactorial beliefs were not associated with cancer information seeking, trusting cancer information obtained from the Internet, trusting cancer information from a physician, self-efficacy for obtaining cancer information, numeracy, or being aware of direct-to-consumer genetic testing (ps > .05). Gaining additional understanding of how people access, process, and use health information will be critical for the continued development and dissemination of effective health communication interventions and for the further translation of genomics research to public health and clinical practice.

  7. High degree of genetic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus).

    PubMed

    Defaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Merilä, Juha

    2013-09-01

    Populations of widespread marine organisms are typically characterized by a low degree of genetic differentiation in neutral genetic markers, but much less is known about differentiation in genes whose functional roles are associated with specific selection regimes. To uncover possible adaptive population divergence and heterogeneous genomic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus), we used a candidate gene-based genome-scan approach to analyse variability in 138 microsatellite loci located within/close to (<6 kb) functionally important genes in samples collected from ten geographic locations. The degree of genetic differentiation in markers classified as neutral or under balancing selection (as determined with several outlier detection methods) was low (FST = 0.033 or 0.011, respectively), whereas average FST for directionally selected markers was significantly higher (FST = 0.097). Clustering analyses provided support for genomic and geographic heterogeneity in selection: six genetic clusters were identified based on allele frequency differences in the directionally selected loci, whereas four were identified with the neutral loci. Allelic variation in several loci exhibited significant associations with environmental variables, supporting the conjecture that temperature and salinity, but not optic conditions, are important drivers of adaptive divergence among populations. In general, these results suggest that in spite of the high degree of physical connectivity and gene flow as inferred from neutral marker genes, marine stickleback populations are strongly genetically structured in loci associated with functionally relevant genes. © 2013 John Wiley & Sons Ltd.

  8. A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation

    PubMed Central

    Aubrey, Wayne; Riley, Michael C.; Young, Michael; King, Ross D.; Oliver, Stephen G.; Clare, Amanda

    2015-01-01

    Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method’s primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc. in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from the S. pombe genome, and 85% of the ORFs from the L. lactis genome. PMID:26630677

  9. A genome-wide scan study identifies a single nucleotide substitution in ASIP associated with white versus non-white coat-colour variation in sheep (Ovis aries)

    PubMed Central

    Li, M-H; Tiirikka, T; Kantanen, J

    2014-01-01

    In sheep, coat colour (and pattern) is a trait of great biological, economic and social importance. However, the genetics of sheep coat colour has not yet been fully clarified. We conducted a genome-wide association study of sheep coat colours by genotyping 47 303 single-nucleotide polymorphisms (SNPs) in the Finnsheep population in Finland. We identified 35 SNPs associated with all the coat colours studied, which cover genomic regions encompassing three known pigmentation genes (TYRP1, ASIP and MITF) in sheep. Eighteen of these associations were confirmed in further tests between white versus non-white individuals, but none of the 35 associations were significant in the analysis of only non-white colours. Across the tests, the s66432.1 in ASIP showed significant association (P=4.2 × 10^-11 for all the colours; P=2.3 × 10^-11 for white versus non-white colours) with the variation in coat colours and strong linkage disequilibrium with other significant variants surrounding the ASIP gene. The signals detected around the ASIP gene were explained by differences in white versus non-white alleles. Further, a genome scan for selection for white coat pigmentation identified a strong and striking selection signal spanning ASIP. Our study identified the main candidate gene for the coat colour variation between white and non-white as ASIP, an autosomal gene that has been directly implicated in the pathway regulating melanogenesis. Together with ASIP, the two other newly identified genes (TYRP1 and MITF) in the Finnsheep, bordering associated SNPs, represent a new resource for enriching sheep coat-colour genetics and breeding. PMID:24022497

  10. A genome-wide association scan for acute insulin response to glucose in Hispanic-Americans: the Insulin Resistance Atherosclerosis Family Study (IRAS FS).

    PubMed

    Rich, S S; Goodarzi, M O; Palmer, N D; Langefeld, C D; Ziegler, J; Haffner, S M; Bryer-Ash, M; Norris, J M; Taylor, K D; Haritunians, T; Rotter, J I; Chen, Y-D I; Wagenknecht, L E; Bowden, D W; Bergman, R N

    2009-07-01

    This study sought to identify genes and regions in the human genome that are associated with the acute insulin response to glucose (AIRg), an important predictor of type 2 diabetes, in Hispanic-American participants from the Insulin Resistance Atherosclerosis Family Study (IRAS FS). A two-stage genome-wide association scan (GWAS) was performed in IRAS FS Hispanic-American samples. In the first stage, 317K single nucleotide polymorphisms (SNPs) were assessed in 229 Hispanic-American DNA samples from 34 families from San Antonio, TX, USA. SNPs with the most significant associations with AIRg were genotyped in the entire set of IRAS FS Hispanic-American samples (n = 1,190). In chromosomal regions with evidence of association, additional SNPs were genotyped to capture variation in genes. No individual SNP achieved genome-wide levels of significance (p < 5 x 10(-7)); however, two regions (chromosomes 6p21 and 20p11) had multiple highly ranked SNPs that were associated with AIRg. Additional genotyping in these regions supported the initial evidence of variants contributing to variation in AIRg. One region resides in a gene desert between PXT1 and KCTD20 on 6p21, while the region on 20p11 has several viable candidate genes (ENTPD6, PYGB, GINS1 and RP4-691N24.1). A GWAS in Hispanic-American samples identified several candidate genes and loci that may be associated with AIRg. These associations explain a small component of variation in AIRg. The genes identified are involved in phosphorylation and ion transport, and provide preliminary evidence that these processes are important in beta cell response.

  11. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.

    PubMed

    Excoffier, Laurent; Lischer, Heidi E L

    2010-05-01

    We present here a new version of the Arlequin program available under three different forms: a Windows graphical version (Winarl35), a console version of Arlequin (arlecore), and a specific console version to compute summary statistics (arlsumstat). The command-line versions run under both Linux and Windows. The main innovations of the new version include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans. Command-line versions are designed to handle large series of files, and arlsumstat can be used to generate summary statistics from simulated data sets within an Approximate Bayesian Computation framework. © 2010 Blackwell Publishing Ltd.

  12. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequence used to generate 3D structure of proteins and the template accession via SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  13. Cell-free circulating tumour DNA as a liquid biopsy in breast cancer.

    PubMed

    De Mattos-Arruda, Leticia; Caldas, Carlos

    2016-03-01

    Recent developments in massively parallel sequencing and digital genomic techniques support the clinical validity of cell-free circulating tumour DNA (ctDNA) as a 'liquid biopsy' in human cancer. In breast cancer, ctDNA detected in plasma can be used to non-invasively scan tumour genomes and quantify tumour burden. The applications for ctDNA in plasma include identifying actionable genomic alterations, monitoring treatment responses, unravelling therapeutic resistance, and potentially detecting disease progression before clinical and radiological confirmation. ctDNA may be used to characterise tumour heterogeneity and metastasis-specific mutations providing information to adapt the therapeutic management of patients. In this article, we review the current status of ctDNA as a 'liquid biopsy' in breast cancer. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  14. A High-Density Admixture Map for Disease Gene Discovery in African Americans

    PubMed Central

    Smith, Michael W.; Patterson, Nick; Lautenberger, James A.; Truelove, Ann L.; McDonald, Gavin J.; Waliszewska, Alicja; Kessing, Bailey D.; Malasky, Michael J.; Scafe, Charles; Le, Ernest; De Jager, Philip L.; Mignault, Andre A.; Yi, Zeng; de Thé, Guy; Essex, Myron; Sankalé, Jean-Louis; Moore, Jason H.; Poku, Kwabena; Phair, John P.; Goedert, James J.; Vlahov, David; Williams, Scott M.; Tishkoff, Sarah A.; Winkler, Cheryl A.; De La Vega, Francisco M.; Woodage, Trevor; Sninsky, John J.; Hafler, David A.; Altshuler, David; Gilbert, Dennis A.; O’Brien, Stephen J.; Reich, David

    2004-01-01

    Admixture mapping (also known as “mapping by admixture linkage disequilibrium,” or MALD) provides a way of localizing genes that cause disease, in admixed ethnic groups such as African Americans, with ∼100 times fewer markers than are required for whole-genome haplotype scans. However, it has not been possible to perform powerful scans with admixture mapping because the method requires a dense map of validated markers known to have large frequency differences between Europeans and Africans. To create such a map, we screened through databases containing ∼450,000 single-nucleotide polymorphisms (SNPs) for which frequencies had been estimated in African and European population samples. We experimentally confirmed the frequencies of the most promising SNPs in a multiethnic panel of unrelated samples and identified 3,011 as a MALD map (1.2 cM average spacing). We estimate that this map is ∼70% informative in differentiating African versus European origins of chromosomal segments. This map provides a practical and powerful tool, which is freely available without restriction, for screening for disease genes in African American patient cohorts. The map is especially appropriate for those diseases that differ in incidence between the parental African and European populations. PMID:15088270

  15. Identification of Tumor Suppressor Genes by Genetic and Epigenetic Genome-Scanning

    DTIC Science & Technology

    2008-04-01


  16. A Coordinated Approach to Peach SNP Discovery in RosBREED

    USDA-ARS's Scientific Manuscript database

    In the USDA-funded multi-institutional and trans-disciplinary project, “RosBREED”, crop-specific SNP genome scan platforms are being developed for peach, apple, strawberry, and cherry at a resolution of at least one polymorphic SNP marker every 5 cM in any random cross, for use in Pedigree-Based Ana...

  17. Invasion Science: A Horizon Scan of Emerging Challenges and Opportunities

    Treesearch

    Anthony Ricciardi; Tim M. Blackburn; James T. Carlton; Jaimie T.A. Dick; Philip E. Hulme; Josephine C. Iacarella; Jonathan M. Jeschke; Andrew M. Liebhold; Julie L. Lockwood; Hugh J. MacIsaac; Petr Pyšek; David M. Richardson; Gregory M. Ruiz; Daniel Simberloff; William J. Sutherland; David A. Wardle; David C. Aldridge

    2017-01-01

    We identified emerging scientific, technological, and sociopolitical issues likely to affect how biological invasions are studied and managed over the next two decades. Issues were ranked according to their probability of emergence, pervasiveness, potential impact, and novelty. Top-ranked issues include the application of genomic modification tools to control invasions...

  18. affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling.

    PubMed

    Hernandez-Ferrer, Carles; Quintela Garcia, Ines; Danielski, Katharina; Carracedo, Ángel; Pérez-Jurado, Luis A; González, Juan R

    2015-05-20

    The well-known Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they were not able to explain the full heritability of complex diseases. Now, other structural variants like copy number variants or DNA inversions, either germ-line or in mosaicism events, are being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (also Genome-Wide SNP 5.0/6.0 and Axiom) in structural variant studies. We illustrate the capabilities of affy2sv using two different complete pipelines on real data: the first performs a GWAS and a mosaic alteration detection study, and the other detects CNVs and performs inversion calling. Both examples presented in the article show how affy2sv can be used as part of more complex pipelines aimed at analyzing Affymetrix SNP array data in genetic association studies, where different types of structural variants are considered.

  19. MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets

    PubMed Central

    Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    2016-01-01

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder. PMID:27684958

  20. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets.

    PubMed

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.

  1. Genomic and Transcriptomic Associations Identify a New Insecticide Resistance Phenotype for the Selective Sweep at the Cyp6g1 Locus of Drosophila melanogaster.

    PubMed

    Battlay, Paul; Schmidt, Joshua M; Fournier-Level, Alexandre; Robin, Charles

    2016-08-09

    Scans of the Drosophila melanogaster genome have identified organophosphate resistance loci among those with the most pronounced signature of positive selection. In this study, the molecular basis of resistance to the organophosphate insecticide azinphos-methyl was investigated using the Drosophila Genetic Reference Panel, and genome-wide association. Recently released full transcriptome data were used to extend the utility of the Drosophila Genetic Reference Panel resource beyond traditional genome-wide association studies to allow systems genetics analyses of phenotypes. We found that both genomic and transcriptomic associations independently identified Cyp6g1, a gene involved in resistance to DDT and neonicotinoid insecticides, as the top candidate for azinphos-methyl resistance. This was verified by transgenically overexpressing Cyp6g1 using natural regulatory elements from a resistant allele, resulting in a 6.5-fold increase in resistance. We also identified four novel candidate genes associated with azinphos-methyl resistance, all of which are involved in either regulation of fat storage, or nervous system development. In Cyp6g1, we find a demonstrable resistance locus, a verification that transcriptome data can be used to identify variants associated with insecticide resistance, and an overlap between peaks of a genome-wide association study, and a genome-wide selective sweep analysis. Copyright © 2016 Battlay et al.

  2. Huffman and linear scanning methods with statistical language models.

    PubMed

    Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris

    2015-03-01

    Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.
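
    The abstract does not spell out how Huffman scanning assigns switch sequences to symbols. A minimal sketch of the underlying idea follows, assuming that a language model supplies next-symbol probabilities and that each bit of a Huffman codeword corresponds to one binary switch decision. The function names and the probabilities are hypothetical and are not taken from the paper.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a Huffman code {symbol: bitstring} from {symbol: probability}."""
    tiebreak = count()  # unique counter so the heap never compares the dict payloads
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one decision to be selected.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, codes1 = heapq.heappop(heap)
        merged = {sym: "0" + bits for sym, bits in codes0.items()}
        merged.update({sym: "1" + bits for sym, bits in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def expected_decisions(probs, codes):
    """Average number of binary decisions needed per symbol."""
    return sum(p * len(codes[sym]) for sym, p in probs.items())


# Hypothetical next-letter probabilities from a language model.
probs = {"e": 0.4, "t": 0.3, "a": 0.2, "q": 0.1}
codes = huffman_code(probs)
average = expected_decisions(probs, codes)
```

    Likely symbols receive short codewords, so they are reached in fewer switch decisions than in a fixed row/column layout. With the toy distribution above, "e" needs one decision while "q" needs three.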

  3. Coordinates and intervals in graph-based reference genomes.

    PubMed

    Rand, Knut D; Grytten, Ivar; Nederbragt, Alexander J; Storvik, Geir O; Glad, Ingrid K; Sandve, Geir K

    2017-05-18

    It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
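
    To make the notion of an offset-based coordinate concrete, here is a small sketch in which a position is a (block, offset) pair and an interval additionally records the ordered blocks it passes through. This is an illustrative data model only: the class and function names are assumptions and do not reproduce the API of the offsetbasedgraph package, and the block names and lengths are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    block: str    # identifier of a sequence block (node) in the graph
    offset: int   # 0-based offset within that block


@dataclass(frozen=True)
class Interval:
    start: Position
    end: Position    # end offset is exclusive
    blocks: tuple    # ordered block ids from start.block to end.block


def is_valid(interval, block_lengths, edges):
    """Check that the interval follows existing edges and stays inside its blocks."""
    blocks = interval.blocks
    if not blocks or blocks[0] != interval.start.block or blocks[-1] != interval.end.block:
        return False
    if any((a, b) not in edges for a, b in zip(blocks, blocks[1:])):
        return False
    if not 0 <= interval.start.offset < block_lengths[blocks[0]]:
        return False
    if not 0 < interval.end.offset <= block_lengths[blocks[-1]]:
        return False
    if len(blocks) == 1 and interval.end.offset <= interval.start.offset:
        return False
    return True


def interval_length(interval, block_lengths):
    """Number of bases covered along the chosen path."""
    blocks = interval.blocks
    if len(blocks) == 1:
        return interval.end.offset - interval.start.offset
    first = block_lengths[blocks[0]] - interval.start.offset
    middle = sum(block_lengths[b] for b in blocks[1:-1])
    return first + middle + interval.end.offset


# Hypothetical graph: a main path with one alternative locus.
block_lengths = {"chr1_a": 100, "main1": 25, "alt1": 30, "chr1_b": 200}
edges = {("chr1_a", "main1"), ("chr1_a", "alt1"), ("main1", "chr1_b"), ("alt1", "chr1_b")}

gene_on_alt = Interval(Position("chr1_a", 90), Position("chr1_b", 10),
                       ("chr1_a", "alt1", "chr1_b"))
gene_on_main = Interval(Position("chr1_a", 90), Position("chr1_b", 10),
                        ("chr1_a", "main1", "chr1_b"))
```

    The same start and end positions describe intervals of different lengths depending on the path, which is why an interval on a graph has to carry its block list and not just two coordinates.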

  4. [Analysis of cis-regulatory element distribution in gene promoters of Gossypium raimondii and Arabidopsis thaliana].

    PubMed

    Sun, Gao-Fei; He, Shou-Pu; Du, Xiong-Ming

    2013-10-01

    Cotton genomic studies have boomed since the release of the Gossypium raimondii draft genome. In this study, cis-regulatory elements (CREs) in the 1 kb sequence upstream of the 5' UTR of annotated genes were selected and scanned in the Arabidopsis thaliana (At) and Gossypium raimondii (Gr) genomes, based on the PLACE database (Plant cis-acting Regulatory DNA Elements). According to the definition used in this study, 44 (12.3%) and 57 (15.5%) CREs presented a "peak-like" distribution in the 1 kb selected sequences of the two genomes, respectively. Thirty-four of them were peak-like distributed in both genomes and could be further categorized into 4 types based on their core sequences. The coincidence of the TATABOX peak position with its actual position (-30 bp) indicated that the position of a common CRE is conserved across different genes, which suggested that the peak position of these CREs is likely the actual binding position of transcription factors. The position of a common CRE also differed between the two genomes due to stronger length variation of the 5' UTR in Gr than in At. Furthermore, most of the peak-like CREs were located in the region from -110 bp to 0 bp, which suggested that a concentrated distribution might be conducive to the interaction of transcription factors, which in turn regulates the expression of downstream genes.

  5. CFGP: a web-based, comparative fungal genomics platform

    PubMed Central

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F.; Blair, Jaime E.; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the ‘fill-in-the-form-and-press-SUBMIT’ user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI. PMID:17947331

  6. Genome-wide high-throughput SNP discovery and genotyping for understanding natural (functional) allelic diversity and domestication patterns in wild chickpea

    PubMed Central

    Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

    2015-01-01

    We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool are more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313

  7. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking.

    PubMed

    Daetwyler, Hans D; Calus, Mario P L; Pong-Wong, Ricardo; de Los Campos, Gustavo; Hickey, John M

    2013-02-01

    The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
    In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
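
    The abstract names accuracy and bias as the indicators to report but does not define them. A common convention, assumed here, takes accuracy as the Pearson correlation between predicted and observed (or simulated true) values in the validation set, and assesses bias through the slope of the regression of observed on predicted values, where a slope of 1 indicates no bias. The function name and the toy numbers below are illustrative.

```python
def accuracy_and_bias(observed, predicted):
    """Return (Pearson correlation, slope of observed regressed on predicted)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    accuracy = cov / (var_obs * var_pred) ** 0.5
    slope = cov / var_pred   # 1.0 means predictions are on the right scale
    return accuracy, slope


# Hypothetical validation set: predictions rank individuals perfectly
# but are spread twice as widely as the observed values.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [0.5, 1.0, 1.5, 2.0]
accuracy, slope = accuracy_and_bias(observed, predicted)
```

    In this toy case the correlation is 1 while the slope is 0.5, showing why the two indicators are reported separately: a method can rank candidates well and still inflate the scale of its predictions.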

  8. Genomic Prediction in Animals and Plants: Simulation of Data, Validation, Reporting, and Benchmarking

    PubMed Central

    Daetwyler, Hans D.; Calus, Mario P. L.; Pong-Wong, Ricardo; de los Campos, Gustavo; Hickey, John M.

    2013-01-01

    The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
    In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals. PMID:23222650

  9. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
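
    The phylogenetic profile idea described above can be sketched in a few lines: each protein gets a presence/absence vector across genomes, and proteins with matching vectors are treated as candidate functional links. The mismatch threshold, the function names and the toy data are assumptions for illustration and are not taken from the patent.

```python
from itertools import combinations


def build_profiles(presence, genomes):
    """Turn {protein: set of genomes with a homolog} into 0/1 tuples ordered by `genomes`."""
    return {prot: tuple(int(g in found) for g in genomes)
            for prot, found in presence.items()}


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_mismatches=0):
    """Protein pairs whose profiles differ in at most `max_mismatches` genomes."""
    return sorted((p, q) for p, q in combinations(sorted(profiles), 2)
                  if hamming(profiles[p], profiles[q]) <= max_mismatches)


# Hypothetical homolog presence data for four proteins in four genomes.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3"},
}
profiles = build_profiles(presence, genomes)
exact = linked_pairs(profiles)                       # identical profiles only
relaxed = linked_pairs(profiles, max_mismatches=1)   # tolerate one disagreement
```

    Here P1 and P2 share the profile (1, 0, 1, 0) and are reported as linked, while P3 is never paired. In practice, profiles that are present in every genome or absent from every genome carry no signal and would normally be filtered out first.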

  10. Approximation of reliability of direct genomic breeding values

    USDA-ARS's Scientific Manuscript database

    Two methods to efficiently approximate theoretical genomic reliabilities are presented. The first method is based on the direct inverse of the left hand side (LHS) of mixed model equations. It uses the genomic relationship matrix for a small subset of individuals with the highest genomic relationshi...

  11. Estimation of the genome sizes of the chigger mites Leptotrombidium pallidum and Leptotrombidium scutellare based on quantitative PCR and k-mer analysis

    PubMed Central

    2014-01-01

    Background: Leptotrombidium pallidum and Leptotrombidium scutellare are the major vector mites for Orientia tsutsugamushi, the causative agent of scrub typhus. Before these organisms can be subjected to whole-genome sequencing, it is necessary to estimate their genome sizes to obtain basic information for establishing the strategies that should be used for genome sequencing and assembly. Method: The genome sizes of L. pallidum and L. scutellare were estimated by a method based on quantitative real-time PCR. In addition, a k-mer analysis of the whole-genome sequences obtained through Illumina sequencing was conducted to verify the mutual compatibility and reliability of the results. Results: The genome sizes estimated using qPCR were 191 ± 7 Mb for L. pallidum and 262 ± 13 Mb for L. scutellare. The k-mer analysis-based genome lengths were estimated to be 175 Mb for L. pallidum and 286 Mb for L. scutellare. The estimates from these two independent methods were mutually complementary and within a similar range to those of other Acariform mites. Conclusions: The estimation method based on qPCR appears to be a useful alternative when the standard methods, such as flow cytometry, are impractical. The relatively small estimated genome sizes should facilitate whole-genome analysis, which could contribute to our understanding of Arachnida genome evolution and provide key information for scrub typhus prevention and mite vector competence. PMID:24947244
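
    The abstract does not give the formula behind its k-mer estimates. One widely used approximation, assumed here, divides the total number of k-mers observed in the reads by the depth at the main peak of the k-mer frequency histogram, after discarding low-depth k-mers that mostly come from sequencing errors. The function names, the depth cutoff and the synthetic counts are illustrative.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    k-mers seen fewer than `min_depth` times are treated as likely
    sequencing errors and ignored (an assumption of this sketch).
    """
    histogram = Counter(counts.values())   # depth -> number of distinct k-mers
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)   # depth shared by the most k-mers
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Synthetic counts: 1000 single-copy k-mers at depth 20, 10 two-copy
# repeat k-mers at depth 40, and 50 error k-mers seen once.
synthetic = Counter()
for i in range(1000):
    synthetic[f"unique{i}"] = 20
for i in range(10):
    synthetic[f"repeat{i}"] = 40
for i in range(50):
    synthetic[f"error{i}"] = 1

size = estimate_genome_size(synthetic)
```

    For the synthetic counts the estimate is 1020 positions: 1000 single-copy k-mers plus 10 repeat k-mers counted twice each, with the error k-mers excluded.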

  12. MutScan: fast detection and visualization of target mutations by scanning FASTQ data.

    PubMed

    Chen, Shifu; Huang, Tanxiao; Wen, Tiexiang; Li, Hong; Xu, Mingyan; Gu, Jia

    2018-01-22

    Some types of clinical genetic tests, such as cancer testing using circulating tumor DNA (ctDNA), require sensitive detection of known target mutations. However, conventional next-generation sequencing (NGS) data analysis pipelines typically involve different steps of filtering, which may cause missed detection of key mutations with low frequencies. Variant validation is also indicated for key mutations detected by bioinformatics pipelines. Typically, this process can be executed using alignment visualization tools such as IGV or GenomeBrowse. However, these tools are too heavy and therefore unsuitable for validating mutations in ultra-deep sequencing data. We developed MutScan to address problems of sensitive detection and efficient validation for target mutations. MutScan involves highly optimized string-searching algorithms, which can scan input FASTQ files to grab all reads that support target mutations. The collected supporting reads for each target mutation will be piled up and visualized using web technologies such as HTML and JavaScript. Algorithms such as rolling hash and bloom filter are applied to accelerate scanning and allow MutScan to detect or visualize target mutations very quickly. MutScan is a tool for the detection and visualization of target mutations by scanning FASTQ raw data directly. Compared to conventional pipelines, this offers very high performance, executing about 20 times faster, and offering maximal sensitivity since it can grab mutations with even one single supporting read. MutScan visualizes detected mutations by generating interactive pile-ups using web technologies. These can serve to validate target mutations, thus avoiding false positives. Furthermore, MutScan can visualize all mutation records in a VCF file to HTML pages for cloud-friendly VCF validation. MutScan is an open source tool available at GitHub: https://github.com/OpenGene/MutScan.
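
    The rolling-hash idea mentioned above can be sketched as follows. This is a generic Rabin-Karp style scan, not MutScan's actual implementation: the function names are hypothetical, the target is assumed to contain only A, C, G and T, reads containing other symbols are simply skipped, and no bloom filter is included.

```python
BASE = 4
MOD = (1 << 61) - 1
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def _hash(seq):
    """Polynomial hash of a sequence made of A/C/G/T."""
    h = 0
    for ch in seq:
        h = (h * BASE + CODE[ch]) % MOD
    return h


def reads_supporting(target, reads):
    """Return the reads that contain the target sequence exactly."""
    k = len(target)
    target_hash = _hash(target)
    top = pow(BASE, k - 1, MOD)  # weight of the leading base in a window
    hits = []
    for read in reads:
        if len(read) < k or any(ch not in CODE for ch in read):
            continue  # simplification: skip short reads and reads with N
        h = _hash(read[:k])
        for i in range(len(read) - k + 1):
            # Compare strings only when hashes agree, to rule out collisions.
            if h == target_hash and read[i:i + k] == target:
                hits.append(read)
                break
            if i + k < len(read):
                # Slide the window: drop read[i], append read[i + k].
                h = ((h - CODE[read[i]] * top) * BASE + CODE[read[i + k]]) % MOD
    return hits


example_reads = ["ACGTACGT", "TTTTGGGG", "GGACGTAA", "ACNNACGTAC"]
supporting = reads_supporting("CGTA", example_reads)
```

    Each window hash is updated in constant time, so a read is scanned in time proportional to its length rather than its length times the target length.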

  13. Data-Independent Mass Spectrometry Approach for Screening and Identification of DNA Adducts.

    PubMed

    Guo, Jingshu; Villalta, Peter W; Turesky, Robert J

    2017-11-07

    Long-term exposures to environmental toxicants and endogenous electrophiles are causative factors for human diseases including cancer. DNA adducts reflect the internal exposure to genotoxicants and can serve as biomarkers for risk assessment. Liquid chromatography-multistage mass spectrometry (LC-MS^n) is the most common method for biomonitoring DNA adducts, generally targeting single exposures and measuring up to several adducts. However, the data often provide limited evidence for a role of a chemical in the etiology of cancer. An "untargeted" method is required that captures global exposures to chemicals by simultaneously detecting their DNA adducts in the genome, some of which may induce cancer-causing mutations. We established a wide selected ion monitoring tandem mass spectrometry (wide-SIM/MS^2) screening method utilizing ultraperformance-LC nanoelectrospray ionization Orbitrap MS^n with online trapping to enrich bulky, nonpolar adducts. Wide-SIM scan events are followed by MS^2 scans to screen for modified nucleosides by coeluting peaks containing precursor and fragment ions differing by -116.0473 Da, attributed to the neutral loss of deoxyribose. Wide-SIM/MS^2 was shown to be superior in sensitivity, specificity, and breadth of adduct coverage to other tested adductomic methods, with detection possible at adduct levels as low as 4 per 10^9 nucleotides. Wide-SIM/MS^2 data can be analyzed in a "targeted" fashion by generation of extracted ion chromatograms or in an "untargeted" fashion where a chromatographic peak-picking algorithm can be used to detect putative DNA adducts. Wide-SIM/MS^2 successfully detected DNA adducts, derived from chemicals in the diet and traditional medicines and from lipid peroxidation products, in human prostate and renal specimens.
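
    The neutral-loss screen described above can be illustrated with a short sketch. It assumes singly charged ions and a ppm mass tolerance; the function name, the 5 ppm default and the toy m/z values are hypothetical, and only the 116.0473 Da difference comes from the abstract.

```python
DEOXYRIBOSE_LOSS = 116.0473  # Da, neutral loss stated in the abstract


def neutral_loss_pairs(precursors, fragments, tol_ppm=5.0):
    """Pair precursor and fragment m/z values that differ by the deoxyribose loss.

    Assumes singly charged, coeluting ions; tol_ppm is an illustrative default.
    """
    pairs = []
    for p in precursors:
        expected = p - DEOXYRIBOSE_LOSS
        tol = expected * tol_ppm * 1e-6
        for f in fragments:
            if abs(f - expected) <= tol:
                pairs.append((p, f))
    return pairs


# Toy values chosen so that the first precursor has a matching fragment.
toy_pairs = neutral_loss_pairs([268.1040, 300.0000], [152.0567, 200.0000])
```

    In practice the pairing would also require the two ions to share a chromatographic peak, which is not modeled here.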

  14. Multiple Hotspot Mutations Scanning by Single Droplet Digital PCR.

    PubMed

    Decraene, Charles; Silveira, Amanda B; Bidard, François-Clément; Vallée, Audrey; Michel, Marc; Melaabi, Samia; Vincent-Salomon, Anne; Saliou, Adrien; Houy, Alexandre; Milder, Maud; Lantz, Olivier; Ychou, Marc; Denis, Marc G; Pierga, Jean-Yves; Stern, Marc-Henri; Proudhon, Charlotte

    2018-02-01

    Progress in the liquid biopsy field, combined with the development of droplet digital PCR (ddPCR), has enabled noninvasive monitoring of mutations with high detection accuracy. However, current assays detect a restricted number of mutations per reaction. ddPCR is a recognized method for detecting alterations previously characterized in tumor tissues, but its use as a discovery tool when the mutation is unknown a priori remains limited. We established 2 ddPCR assays detecting all genomic alterations within KRAS exon 2 and EGFR exon 19 mutation hotspots, which are of clinical importance in colorectal and lung cancer, with use of a unique pair of TaqMan® oligoprobes. The KRAS assay scanned for the 7 most common mutations in codons 12/13 but also all other mutations found in that region. The EGFR assay screened for all in-frame deletions of exon 19, which are frequent EGFR-activating events. The KRAS and EGFR assays were highly specific and both reached a limit of detection of <0.1% in mutant allele frequency. We further validated their performance on multiple plasma and formalin-fixed and paraffin-embedded tumor samples harboring a panel of different KRAS or EGFR mutations. This method presents the advantage of detecting a higher number of mutations with single-reaction ddPCRs while consuming a minimum of patient sample. This is particularly useful in the context of liquid biopsy because the amount of circulating tumor DNA is often low. This method should be useful as a discovery tool when the tumor tissue is unavailable or to monitor disease during therapy. © 2017 American Association for Clinical Chemistry.

  15. IPRStats: visualization of the functional potential of an InterProScan run.

    PubMed

    Kelly, Ryan J; Vincent, David E; Friedberg, Iddo

    2010-12-21

    InterPro is a collection of protein signatures for the classification and automated annotation of proteins. InterProScan is a software tool that scans protein sequences against InterPro member databases using a variety of profile-based, hidden Markov model and position-specific scoring matrix methods. It not only combines a set of analysis tools, but also performs data look-up from various sources, as well as some redundancy removal. InterProScan is robust and scalable, able to perform on any machine from a netbook to a large cluster. However, when performing whole-genome or metagenome analysis, there is a need for a fast statistical visualization of the results to gain a good initial grasp of the functional potential of the sequences in the analyzed data set. This is especially important when analyzing and comparing metagenomic or metaproteomic data sets. IPRStats is a tool for the visualization of InterProScan results. InterProScan results are parsed from the InterProScan XML or EBIXML file into an SQLite or MySQL database. The results for each signature database scan are read and displayed as pie charts or bar charts as summary statistics. A table is also provided, where each entry is a signature (e.g. a Pfam entry) accompanied by one or more Gene Ontology terms, if InterProScan was run using the Gene Ontology option. We present a platform-independent, open source licensed tool that is useful for InterProScan users who wish to view the summary of their results in a rapid and concise fashion.
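
    The store-then-summarize workflow described above can be sketched with Python's built-in sqlite3 module. This is not IPRStats code: the table layout, the function name and the example accessions are hypothetical, and XML parsing is left out because the abstract does not describe the file schema.

```python
import sqlite3


def summarize_hits(hits):
    """Load (member_db, signature) hits into SQLite and count them per signature."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE hits (member_db TEXT, signature TEXT)")
    con.executemany("INSERT INTO hits VALUES (?, ?)", hits)
    rows = con.execute(
        "SELECT member_db, signature, COUNT(*) AS n FROM hits "
        "GROUP BY member_db, signature "
        "ORDER BY n DESC, member_db, signature"
    ).fetchall()
    con.close()
    return rows


# Hypothetical hits, as they might look after parsing a results file.
example_hits = [
    ("Pfam", "PF00069"),
    ("Pfam", "PF00069"),
    ("PRINTS", "PR00109"),
    ("Pfam", "PF07714"),
]
summary = summarize_hits(example_hits)
```

    The resulting counts are the kind of summary statistics that could feed a pie chart or bar chart per signature database.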

  16. An efficient approach to BAC based assembly of complex genomes.

    PubMed

    Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

    2016-01-01

    There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

  17. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
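
    The phylogenetic-profile method can be illustrated with a minimal sketch. It assumes the simplest linking rule, namely that proteins with identical presence/absence profiles are candidates for a functional link; the function names and the toy data are hypothetical and the patents may use a different similarity criterion.

```python
from collections import defaultdict


def phylogenetic_profiles(presence, genomes):
    """Build a 0/1 presence vector over the genomes for each protein."""
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def linked_groups(profiles):
    """Group proteins that share an identical profile (groups of two or more)."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]


genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2"},
    "P4": {"g1", "g2", "g3", "g4"},
}
profiles = phylogenetic_profiles(presence, genomes)
candidate_links = linked_groups(profiles)
```

    A looser rule, for example allowing profiles to differ in one genome, would trade specificity for sensitivity.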

  18. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.

  19. A dictionary based informational genome analysis

    PubMed Central

    2012-01-01

    Background In the post-genomic era several methods of computational genomics are emerging to understand how the whole information is structured within genomes. Literature of last five years accounts for several alignment-free methods, arisen as alternative metrics for dissimilarity of biological sequences. Among the others, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying a methodology here outlined carried out some computations on genomic data. We computed informational indexes, built the genomic dictionaries with different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of vivid interest in biology (for example to compute excessively represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many investigation lines, namely exported in other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
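
    A genomic dictionary and one possible informational index can be sketched as follows. The abstract does not define its indexes, so the use of Shannon entropy of the empirical k-mer distribution is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter


def genomic_dictionary(genome, k):
    """Count every word of length k (k-mer) occurring in the genome string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(dictionary):
    """Shannon entropy (bits) of the empirical k-mer frequency distribution."""
    total = sum(dictionary.values())
    return -sum((n / total) * math.log2(n / total) for n in dictionary.values())


words = genomic_dictionary("ACGTACGT", 2)
uniform_entropy = empirical_entropy(genomic_dictionary("ACGT", 1))
```

    Comparing such an index across values of k, or across genomes, gives an alignment-free summary of how word usage is distributed.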

  20. Dissection of Genomewide-Scan Data in Extended Families Reveals a Major Locus and Oligogenic Susceptibility for Age-Related Macular Degeneration

    PubMed Central

    Iyengar, Sudha K.; Song, Danhong; Klein, Barbara E. K.; Klein, Ronald; Schick, James H.; Humphrey, Jennifer; Millard, Christopher; Liptak, Rachel; Russo, Karlie; Jun, Gyungah; Lee, Kristine E.; Fijal, Bonnie; Elston, Robert C.

    2004-01-01

    To examine the genetic basis of age-related macular degeneration (ARMD), a degenerative disease of the retinal pigment epithelium and neurosensory retina, we conducted a genomewide scan in 34 extended families (297 individuals, 349 sib pairs) ascertained through index cases with neovascular disease or geographic atrophy. Family and medical history was obtained from index cases and family members. Fundus photographs were taken of all participating family members, and these were graded for severity by use of a quantitative scale. Model-free linkage analysis was performed, and tests of heterogeneity and epistasis were conducted. We have evidence of a major locus on chromosome 15q (GATA50C03 multipoint P=1.98×10^-7; empirical P⩽1.0×10^-5; single-point P=3.6×10^-7). This locus was present as a weak linkage signal in our previous genome scan for ARMD, in the Beaver Dam Eye Study sample (D15S659, multipoint P=.047), but is otherwise novel. In this genome scan, we observed a total of 13 regions on 11 chromosomes (1q31, 2p21, 4p16, 5q34, 9p24, 9q31, 10q26, 12q13, 12q23, 15q21, 16p12, 18p11, and 20q13), with a nominal multipoint significance level of P⩽.01 or LOD ⩾1.18. Family-by-family analysis of the data, performed using model-free linkage methods, suggests that there is evidence of heterogeneity in these families. For example, a single family (family 460) individually shows linkage evidence at 8 loci, at the level of P<.0001. We conducted tests for heterogeneity, which suggest that ARMD susceptibility loci on chromosomes 9p24, 10q26, and 15q21 are not present in all families. We tested for mutations in linked families and examined SNPs in two candidate genes, hemicentin-1 and EFEMP1, in subsamples (145 and 189 sib pairs, respectively) of the data. Mutations were not observed in any of the 11 exons of EFEMP1 nor in exon 104 of hemicentin-1. The SNP analysis for hemicentin-1 on 1q31 suggests that variants within or in very close proximity to this gene cause ARMD pathogenesis. In summary, we have evidence for a major ARMD locus on 15q21, which, coupled with numerous other loci segregating in these families, suggests complex oligogenic patterns of inheritance for ARMD. PMID:14691731
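
    The pairing of P⩽.01 with LOD ⩾1.18 quoted above is consistent with a standard large-sample conversion, sketched below. The sketch assumes a one-sided test in which 2·ln(10)·LOD follows a chi-square distribution with 1 degree of freedom; the abstract does not state which conversion the authors used, and the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Convert a LOD score to a one-sided pointwise P value.

    Assumes 2*ln(10)*LOD is chi-square distributed with 1 degree of freedom,
    halved because linkage is tested in one direction only.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, the chi-square tail equals erfc(sqrt(chi2 / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


p_at_threshold = lod_to_pointwise_p(1.18)
```

    Under these assumptions a LOD of 1.18 maps to a pointwise P value just under .01, in line with the thresholds reported in the abstract.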

  1. Genome-wide variation within and between wild and domestic yak.

    PubMed

    Wang, Kun; Hu, Quanjun; Ma, Hui; Wang, Lizhong; Yang, Yongzhi; Luo, Wenchun; Qiu, Qiang

    2014-07-01

    The yak is one of the few animals that can thrive in the harsh environment of the Qinghai-Tibetan Plateau and adjacent Alpine regions. Yak provides essential resources allowing Tibetans to live at high altitudes. However, genetic variation within and between wild and domestic yak remains unknown. Here, we present a genome-wide study of the genetic variation within and between wild and domestic yak. Using next-generation sequencing technology, we resequenced three wild and three domestic yak with a mean of fivefold coverage using our published domestic yak genome as a reference. We identified a total of 8.38 million SNPs (7.14 million novel), 383,241 InDels and 126,352 structural variants among the six yak. We observed higher linkage disequilibrium in domestic yak than in wild yak and a modest but distinct genetic divergence between these two groups. We further identified more than a thousand potential selected regions (PSRs) for the three domestic yak by scanning the whole genome. These genomic resources can be further used to study genetic diversity and select superior breeds of yak and other bovid species. © 2014 John Wiley & Sons Ltd.

  2. Exploration of the Drosophila buzzatii transposable element content suggests underestimation of repeats in Drosophila genomes.

    PubMed

    Rius, Nuria; Guillén, Yolanda; Delprat, Alejandra; Kapusta, Aurélie; Feschotte, Cédric; Ruiz, Alfredo

    2016-05-10

    Many new Drosophila genomes have been sequenced in recent years using new-generation sequencing platforms and assembly methods. Transposable elements (TEs), being repetitive sequences, are often misassembled, especially in the genomes sequenced with short reads. Consequently, the mobile fraction of many of the new genomes has not been analyzed in detail or compared with that of other genomes sequenced with different methods, which could shed light on genome and TE evolution. Here we compare the TE content of three genomes: D. buzzatii st-1, j-19, and D. mojavensis. We have sequenced a new D. buzzatii genome (j-19) that complements the D. buzzatii reference genome (st-1) already published, and compared their TE contents with that of D. mojavensis. We found an underestimation of TE sequences in Drosophila genus NGS-genomes when compared to Sanger-genomes. To be able to compare genomes sequenced with different technologies, we developed a coverage-based method and applied it to the D. buzzatii st-1 and j-19 genomes. Between 10.85 and 11.16 % of the D. buzzatii st-1 genome is made up of TEs, between 7 and 7.5 % of the D. buzzatii j-19 genome, while TEs represent 15.35 % of the D. mojavensis genome. Helitrons are the most abundant order in the three genomes. TEs in D. buzzatii are less abundant than in D. mojavensis, as expected according to the positive correlation between genome size and TE content. However, TEs alone do not explain the genome size difference. TEs accumulate in the dot chromosomes and proximal regions of D. buzzatii and D. mojavensis chromosomes. We also report a significantly higher TE density in D. buzzatii and D. mojavensis X chromosomes, which is not expected under the current models. Our easy-to-use correction method allowed us to identify recently active families in D. buzzatii st-1 belonging to the LTR-retrotransposon superfamily Gypsy.

  3. Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants.

    PubMed

    Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J

    2016-03-22

    The Nordic Red Cattle, consisting of three different populations from Finland, Sweden and Denmark, are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wise significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting these genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common for fat yield and fertility, thus linking these two traits via biological networks. This is the first time that whole genome sequence data have been used to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium makes it difficult to pinpoint the causative genes and variations. One solution to overcome these difficulties is the identification of the functional gene networks and pathways to reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.

  4. Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods and Challenges

    PubMed Central

    Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai

    2017-01-01

    Technological advances have led to an explosive growth of high-throughput functional genomic data. Exploiting the correlation among different data types, it is possible to predict one functional genomic data type from other data types. Prediction tools are valuable in understanding the relationship among different functional genomic signals. They also provide a cost-efficient solution to inferring the unknown functional genomic profiles when experimental data are unavailable due to resource or technological constraints. The predicted data may be used for generating hypotheses, prioritizing targets, interpreting disease variants, facilitating data integration, quality control, and many other purposes. This article reviews various applications of prediction methods in functional genomics, discusses analytical challenges, and highlights some common and effective strategies used to develop prediction methods for functional genomic data. PMID:28076869

  5. [Genome editing of industrial microorganism].

    PubMed

    Zhu, Linjiang; Li, Qi

    2015-03-01

    Genome editing is defined as the highly effective and precise modification of a cellular genome on a large scale. In recent years, genome-editing methods have developed rapidly in the field of industrial strain improvement. These fast-evolving methods are replacing the old, inefficient mode of genetic modification, namely "one modification, one selection marker, and one target site". Highly effective modes of genome editing have been developed, including simultaneous modification of multiple genes; efficient insertion, replacement, and deletion of target genes at the genome scale; and cut-and-paste of large DNA fragments. These new tools for microbial genome editing are expected to be applied widely, to increase the efficiency of industrial strain improvement, and to promote both the transformation of the traditional fermentation industry and the rapid development of novel industrial biotechnology such as the production of biofuels and biomaterials. This review summarizes the technological principles of these genome-editing methods and their applications, which can benefit the engineering and construction of industrial microorganisms.

  6. Electron beams scanning: A novel method

    NASA Astrophysics Data System (ADS)

    Askarbioki, M.; Zarandi, M. B.; Khakshournia, S.; Shirmardi, S. P.; Sharifian, M.

    2018-06-01

    In this research, a spatial electron beam scanning method is reported. There are various methods for ion and electron beam scanning. The best known of these is wire scanning, in which the beam parameters are measured by one or more conductive wires. This article proposes a novel method for e-beam scanning that avoids the errors of conventional wire scanning. In this method, techniques of atomic physics are applied so that a knife edge acts as the scanner and the wires act as detectors. The 2D e-beam profile can be determined readily once the positions of the scanner and detectors are specified.

  7. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves.

    PubMed

    Schweizer, Rena M; Robinson, Jacqueline; Harrigan, Ryan; Silva, Pedro; Galverni, Marco; Musiani, Marco; Green, Richard E; Novembre, John; Wayne, Robert K

    2016-01-01

    In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphisms (SNPs) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlight their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations. © 2015 John Wiley & Sons Ltd.

  8. Genome-wide scans for microalbuminuria in Mexican Americans: the San Antonio Family Heart Study.

    PubMed

    Arar, Nedal; Nath, Subrata; Thameem, Farook; Bauer, Richard; Voruganti, Saroja; Comuzzie, Anthony; Cole, Shelley; Blangero, John; MacCluer, Jean; Abboud, Hanna

    2007-02-01

    Microalbuminuria, defined as urine albumin-to-creatinine ratio of 0.03 to 0.299 mg/mg, is a major risk factor for cardiovascular disease. Several genetic epidemiological studies have established that microalbuminuria clusters in families, suggesting a genetic predisposition. We estimated heritability of microalbuminuria and performed a genome-wide linkage analysis to identify chromosomal regions influencing urine albumin-to-creatinine ratio in 486 Mexican Americans from 26 multiplex families. Significant heritability was demonstrated for urine albumin-to-creatinine ratio (h^2 = 24%, P < 0.003) after accounting for age, sex, body mass index, triglycerides, and hypertension. Genome scan revealed significant evidence of linkage of urine albumin-to-creatinine ratio to a region on chromosome 20q12 (LOD score of 3.5, P < 0.001) near marker D20S481. This region also exhibited a LOD score of 2.8 with diabetes status as a covariate and 3.0 with hypertension status as a covariate, suggesting that the effect of this locus on urine albumin-to-creatinine ratio is largely independent of diabetes and hypertension. Findings indicate that there is a gene or genes located on human chromosome 20q12 that may have functional relevance to albumin excretion in Mexican Americans. Identifying and understanding the role of the genes that determine albumin excretion would lead to the development of novel therapeutic strategies targeted at high-risk individuals in whom intensive preventive measures may be most beneficial.

  9. Genome-wide survey of single-nucleotide polymorphisms reveals fine-scale population structure and signs of selection in the threatened Caribbean elkhorn coral, Acropora palmata

    PubMed Central

    2017-01-01

    The advent of next-generation sequencing tools has made it possible to conduct fine-scale surveys of population differentiation and genome-wide scans for signatures of selection in non-model organisms. Such surveys are of particular importance in sharply declining coral species, since knowledge of population boundaries and signs of local adaptation can inform restoration and conservation efforts. Here, we use genome-wide surveys of single-nucleotide polymorphisms in the threatened Caribbean elkhorn coral, Acropora palmata, to reveal fine-scale population structure and infer the major barrier to gene flow that separates the eastern and western Caribbean populations between the Bahamas and Puerto Rico. The exact location of this break had been subject to discussion because two previous studies based on microsatellite data had come to differing conclusions. We investigated this contradiction by analyzing an extended set of 11 microsatellite markers, including the five previously employed, and discovered that one of the original microsatellite loci is apparently under selection. Exclusion of this locus reconciles the results from the SNP and the microsatellite datasets. Scans for outlier loci in the SNP data detected 13 candidate loci under positive selection; however, there was no correlation between available environmental parameters and genetic distance. Together, these results suggest that reef restoration efforts should use local sources and utilize existing functional variation among geographic regions in ex situ crossing experiments to improve stress resistance of this species. PMID:29181279

  10. An automated image-collection system for crystallization experiments using SBS standard microplates.

    PubMed

    Brostromer, Erik; Nan, Jie; Su, Xiao Dong

    2007-02-01

    As part of a structural genomics platform in a university laboratory, a low-cost in-house-developed automated imaging system for SBS microplate experiments has been designed and constructed. The imaging system can scan a microplate in 2-6 min for a 96-well plate depending on the plate layout and scanning options. A web-based crystallization database system has been developed, enabling users to follow their crystallization experiments from a web browser. As the system has been designed and built by students and crystallographers using commercially available parts, this report is aimed to serve as a do-it-yourself example for laboratory robotics.

  11. Genomics meets ethology: a new route to understanding domestication, behavior, and sustainability in animal breeding.

    PubMed

    Jensen, Per; Andersson, Leif

    2005-06-01

    Animal behavior is a central part of animal welfare, a keystone in sustainable animal breeding. During domestication, animals have adapted with respect to behavior and an array of other traits. We compared the behavior of junglefowl and White Leghorn layers, selected for egg production (and indirectly for growth). Junglefowl were more active in social, exploratory, anti-predatory, and feeding tests. A genome scan for Quantitative Trait Loci (QTLs) in a junglefowl × White Leghorn intercross revealed several significant or suggestive QTLs for different traits. Some production QTLs coincided with QTLs for behavior, suggesting that pleiotropic effects may be important for the development of domestication phenotypes. One gene has been located that has a strong effect on the risk of being a victim of feather pecking, a detrimental behavior disorder. Modern genomics paired with analysis of behavior may help in designing more sustainable and robust breeding in the future.

  12. Genome-wide patterns of selection in 230 ancient Eurasians

    PubMed Central

    Mathieson, Iain; Lazaridis, Iosif; Rohland, Nadin; Mallick, Swapan; Patterson, Nick; Roodenberg, Songül Alpaslan; Harney, Eadaoin; Stewardson, Kristin; Fernandes, Daniel; Novak, Mario; Sirak, Kendra; Gamba, Cristina; Jones, Eppie R.; Llamas, Bastien; Dryomov, Stanislav; Pickrel, Joseph; Arsuaga, Juan Luís; de Castro, José María Bermúdez; Carbonell, Eudald; Gerritsen, Fokke; Khokhlov, Aleksandr; Kuznetsov, Pavel; Lozano, Marina; Meller, Harald; Mochalov, Oleg; Moiseyev, Vayacheslav; Rojo Guerra, Manuel A.; Roodenberg, Jacob; Vergès, Josep Maria; Krause, Johannes; Cooper, Alan; Alt, Kurt W.; Brown, Dorcas; Anthony, David; Lalueza-Fox, Carles; Haak, Wolfgang; Pinhasi, Ron; Reich, David

    2016-01-01

    Ancient DNA makes it possible to directly witness natural selection by analyzing samples from populations before, during and after adaptation events. Here we report the first scan for selection using ancient DNA, capitalizing on the largest genome-wide dataset yet assembled: 230 West Eurasians dating to between 6500 and 1000 BCE, including 163 with newly reported data. The new samples include the first genome-wide data from the Anatolian Neolithic culture whose genetic material we extracted from the DNA-rich petrous bone and who we show were members of the population that was the source of Europe’s first farmers. We also report a complete transect of the steppe region in Samara between 5500 and 1200 BCE that allows us to recognize admixture from at least two external sources into steppe populations during this period. We detect selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height. PMID:26595274

  13. Heritability and molecular genetic basis of acoustic startle eye blink and affectively modulated startle response: A genome-wide association study

    PubMed Central

    VAIDYANATHAN, UMA; MALONE, STEPHEN M.; MILLER, MICHAEL B.; McGUE, MATT; IACONO, WILLIAM G.

    2014-01-01

    Acoustic startle responses have been studied extensively in relation to individual differences and psychopathology. We examined three indices of the blink response in a picture-viewing paradigm—overall startle magnitude across all picture types, and aversive and pleasant modulation scores—in 3,323 twins and parents. Biometric models and molecular genetic analyses showed that half the variance in overall startle was due to additive genetic effects. No single nucleotide polymorphism was genome-wide significant, but GRIK3 did produce a significant effect when examined as part of a candidate gene set. In contrast, emotion modulation scores showed little evidence of heritability in either biometric or molecular genetic analyses. However, in a genome-wide scan, PARP14 did produce a significant effect for aversive modulation. We conclude that, although overall startle retains potential as an endophenotype, emotion-modulated startle does not. PMID:25387708

  14. Genetics of climate change adaptation.

    PubMed

    Franks, Steven J; Hoffmann, Ary A

    2012-01-01

    The rapid rate of current global climate change is having strong effects on many species and, at least in some cases, is driving evolution, particularly when changes in conditions alter patterns of selection. Climate change thus provides an opportunity for the study of the genetic basis of adaptation. Such studies include a variety of observational and experimental approaches, such as sampling across clines, artificial evolution experiments, and resurrection studies. These approaches can be combined with a number of techniques in genetics and genomics, including association and mapping analyses, genome scans, and transcription profiling. Recent research has revealed a number of candidate genes potentially involved in climate change adaptation and has also illustrated that genetic regulatory networks and epigenetic effects may be particularly relevant for evolution driven by climate change. Although genetic and genomic data are rapidly accumulating, we still have much to learn about the genetic architecture of climate change adaptation.

  15. Fish genome manipulation and directional breeding.

    PubMed

    Ye, Ding; Zhu, ZuoYan; Sun, YongHua

    2015-02-01

    Aquaculture is one of the fastest developing agricultural industries worldwide. One of the most important factors for sustainable aquaculture is the development of high performing culture strains. Genome manipulation offers a powerful method to achieve rapid and directional breeding in fish. We review the history of fish breeding methods based on classical genome manipulation, including polyploidy breeding and nuclear transfer. Then, we discuss the advances and applications of fish directional breeding based on transgenic technology and recently developed genome editing technologies. These methods offer increased efficiency, precision and predictability in genetic improvement over traditional methods.

  16. Common variants of FUT2 are associated with plasma vitamin B12 levels

    USDA-ARS's Scientific Manuscript database

    A genome-wide scan is a way to distinguish small differences in the genetic makeup of individuals. It is also a way to determine whether a mutation in any particular gene is widespread, that is, "polymorphic." The value of these analyses lies in the identification of genes that could influence a th...

  17. Huffman scanning: using language models within fixed-grid keyboard emulation

    PubMed Central

    Roark, Brian; Beckley, Russell; Gibbons, Chris; Fried-Oken, Melanie

    2012-01-01

    Individuals with severe motor impairments commonly enter text using a single binary switch and symbol scanning methods. We present a new scanning method – Huffman scanning – which uses Huffman coding to select the symbols to highlight during scanning, thus minimizing the expected bits per symbol. With our method, the user can select the intended symbol even after switch activation errors. We describe two varieties of Huffman scanning – synchronous and asynchronous – and present experimental results, demonstrating speedups over row/column and linear scanning. PMID:24244070
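
    The role of Huffman coding in the abstract above can be illustrated with a minimal sketch. It builds a Huffman code from symbol probabilities and reports the expected bits per symbol. The probabilities, the function names, and the reading of each code bit as one switch decision are assumptions made for illustration; this is not the authors' implementation, which draws its probabilities from a language model.

```python
import heapq
from itertools import count


def huffman_codes(probs):
    """Return {symbol: bitstring} for a dict of symbol probabilities."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit to be selectable.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def expected_bits(probs, codes):
    """Expected code length in bits per symbol."""
    return sum(p * len(codes[s]) for s, p in probs.items())


# Hypothetical four-symbol alphabet with made-up probabilities.
probs = {"e": 0.5, "t": 0.25, "a": 0.125, "o": 0.125}
codes = huffman_codes(probs)
avg_bits = expected_bits(probs, codes)
```

    Under this reading, frequent symbols get short codes, so they are reached in fewer switch decisions than they would be in a fixed row/column sweep.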

  18. A new strategy for genome assembly using short sequence reads and reduced representation libraries.

    PubMed

    Young, Andrew L; Abaan, Hatice Ozel; Zerbino, Daniel; Mullikin, James C; Birney, Ewan; Margulies, Elliott H

    2010-02-01

    We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need of a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the currently available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.

  19. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

    PubMed Central

    Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.; Hugenholtz, Philip; Tyson, Gene W.

    2015-01-01

    Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. PMID:25977477
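
    As a rough illustration of how marker-gene counts can be turned into completeness and contamination estimates, the sketch below averages over groups of collocated markers. It is a simplified stand-in, not CheckM's code or interface: the function name, the marker names, and the exact form of the two scores are assumptions, and the lineage-specific marker selection described in the abstract is omitted.

```python
def marker_set_quality(marker_sets, copy_counts):
    """Estimate (completeness, contamination) as fractions.

    marker_sets: list of lists of marker-gene names (collocated groups).
    copy_counts: dict mapping marker name to copies found in the genome.
    """
    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        # Markers seen at least once count toward completeness.
        present = sum(1 for g in markers if copy_counts.get(g, 0) >= 1)
        # Every copy beyond the first counts toward contamination.
        extra = sum(max(copy_counts.get(g, 0) - 1, 0) for g in markers)
        completeness += present / len(markers)
        contamination += extra / len(markers)
    n_sets = len(marker_sets)
    return completeness / n_sets, contamination / n_sets


# Hypothetical marker sets and counts for one draft genome.
marker_sets = [["m1", "m2"], ["m3"], ["m4", "m5"]]
copy_counts = {"m1": 1, "m2": 1, "m3": 2, "m5": 1}
completeness, contamination = marker_set_quality(marker_sets, copy_counts)
```

    Averaging within sets first means that a cluster of neighbouring markers lost or duplicated together is not counted many times over.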

  20. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

    PubMed

    Parks, Donovan H; Imelfort, Michael; Skennerton, Connor T; Hugenholtz, Philip; Tyson, Gene W

    2015-07-01

    Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of "marker" genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. © 2015 Parks et al.; Published by Cold Spring Harbor Laboratory Press.

  1. Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints.

    PubMed

    Glusman, Gustavo; Mauldin, Denise E; Hood, Leroy E; Robinson, Max

    2017-01-01

    We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into "genome fingerprints" via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.
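
    The abstract does not describe the fingerprinting scheme itself, so the sketch below substitutes MinHash, a common locality-sensitive hashing family, to show the general idea of comparing compact signatures instead of full variant lists. The variant string format, the signature length, and all function names are assumptions; this is not the authors' algorithm.

```python
import hashlib


def minhash_signature(variants, num_hashes=64):
    """Return a MinHash signature (list of ints) for a non-empty set of strings."""
    signature = []
    for i in range(num_hashes):
        salt = str(i).encode()  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(v.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for v in variants
        ))
    return signature


def signature_similarity(sig_a, sig_b):
    """Fraction of matching positions; this estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Synthetic variant sets standing in for three personal genomes.
genome_a = {f"chr1:{i}:A>G" for i in range(200)}
genome_b = {f"chr1:{i}:A>G" for i in range(50, 250)}      # shares 150 variants with a
genome_c = {f"chr1:{i}:A>G" for i in range(1000, 1200)}   # shares none with a

sig_a = minhash_signature(genome_a)
sig_b = minhash_signature(genome_b)
sig_c = minhash_signature(genome_c)
sim_ab = signature_similarity(sig_a, sig_b)
sim_ac = signature_similarity(sig_a, sig_c)
```

    Comparing two signatures touches only 64 integers regardless of how many variants each genome carries, and a signature of minima does not let one list the original variants.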

  2. Sizing up arthropod genomes: an evaluation of the impact of environmental variation on genome size estimates by flow cytometry and the use of qPCR as a method of estimation.

    PubMed

    Gregory, T Ryan; Nathwani, Paula; Bonnett, Tiffany R; Huber, Dezene P W

    2013-09-01

    A study was undertaken to evaluate both a pre-existing method and a newly proposed approach for the estimation of nuclear genome sizes in arthropods. First, concerns regarding the reliability of the well-established method of flow cytometry relating to impacts of rearing conditions on genome size estimates were examined. Contrary to previous reports, a more carefully controlled test found negligible environmental effects on genome size estimates in the fly Drosophila melanogaster. Second, a more recently touted method based on quantitative real-time PCR (qPCR) was examined in terms of ease of use, efficiency, and (most importantly) accuracy using four test species: the flies Drosophila melanogaster and Musca domestica and the beetles Tribolium castaneum and Dendroctonus ponderosa. The results of this analysis demonstrated that qPCR has the tendency to produce substantially different genome size estimates from other established techniques while also being far less efficient than existing methods.

  3. Searching for disease-susceptibility loci by testing for Hardy-Weinberg disequilibrium in a gene bank of affected individuals.

    PubMed

    Lee, Wen-Chung

    2003-09-01

    The future of genetic studies of complex human diseases will rely more and more on the epidemiologic association paradigm. The author proposes to scan the genome for disease-susceptibility gene(s) by testing for deviation from Hardy-Weinberg equilibrium in a gene bank of affected individuals. A power formula is presented, which is very accurate as revealed by Monte Carlo simulations. If the disease-susceptibility gene is recessive with an allele frequency of ≤ 0.5 or dominant with an allele frequency of ≥ 0.5, the number of subjects needed by the present method is smaller than that needed by using a case-parents design (using either the transmission/disequilibrium test or the 2-df likelihood ratio test). However, the method cannot detect genes with a multiplicative mode of inheritance, and the validity of the method relies on the assumption that the source population from which the cases arise is in Hardy-Weinberg equilibrium. Thus, it is prone to produce false positive and false negative results. Nevertheless, the method enables rapid gene hunting in an existing gene bank of affected individuals with no extra effort beyond simple calculations.
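
    The "simple calculations" mentioned above can be illustrated with the textbook Pearson goodness-of-fit test for Hardy-Weinberg proportions at one biallelic locus. This is a generic sketch, not the author's test statistic or power formula; the function name and the genotype counts are made up for illustration.

```python
import math


def hwe_chi_square(n_hom_a, n_het, n_hom_b):
    """Return (chi2, p_value) for a 1-df test of Hardy-Weinberg proportions."""
    n = n_hom_a + n_het + n_hom_b
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_a + n_het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_a, n_het, n_hom_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts among 100 affected individuals.
chi2_eq, p_eq = hwe_chi_square(25, 50, 25)     # exactly in equilibrium
chi2_dev, p_dev = hwe_chi_square(40, 20, 40)   # strong heterozygote deficit
```

    In a genome scan the same calculation would be repeated at every marker, so the per-marker significance threshold would need a multiple-testing correction.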

  4. [Genome-wide screening of predicted sugar transporters in Neurospora crassa and the application in hexose fermentation by Saccharomyces cerevisiae].

    PubMed

    Gao, Jingfang; Wang, Bang; Han, Xiaoyun; Tian, Chaoguang

    2017-01-25

    The lignocellulolytic filamentous fungus Neurospora crassa is able to assimilate various mono- and oligo-saccharides. However, more than half of predicted sugar transporters in the genome are still waiting for functional elucidation. In this study, systematic analysis of substrate spectra of predicted sugar transporters in N. crassa was performed at the genome-wide level. NCU01868 and NCU08152 can take up various hexoses and were named NcHXT-1 and NcHXT-2, respectively. Their transport activities for glucose were further confirmed by fluorescence resonance energy transfer analysis. Over-expression of either NcHXT-1 or NcHXT-2 in the null-hexose-transporter yeast EBY.VW4000 restored the growth and ethanol fermentation under submerged fermentation with glucose, galactose, or mannose as the sole carbon source. NcHXT-1/-2 homologues were found in a variety of cellulolytic fungi. Functional identification of two filamentous fungal-conserved hexose transporters NcHXT-1/-2 via genome scanning would represent novel targets for ongoing efforts in engineering cellulolytic fungi and hexose fermentation in yeast.

  5. snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome.

    PubMed

    Yang, Jian-Hua; Zhang, Xiao-Chen; Huang, Zhan-Peng; Zhou, Hui; Huang, Mian-Bo; Zhang, Shu; Chen, Yue-Qin; Qu, Liang-Hu

    2006-01-01

    Small nucleolar RNAs (snoRNAs) represent an abundant group of non-coding RNAs in eukaryotes. They can be divided into guide and orphan snoRNAs according to the presence or absence of antisense sequence to rRNAs or snRNAs. Current snoRNA-searching programs, which are essentially based on sequence complementarity to rRNAs or snRNAs, exist only for the screening of guide snoRNAs. In this study, we have developed an advanced computational package, snoSeeker, which includes CDseeker and ACAseeker programs, for the highly efficient and specific screening of both guide and orphan snoRNA genes in mammalian genomes. By using these programs, we have systematically scanned four human-mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes. Eighteen novel snoRNAs were further experimentally confirmed with four snoRNAs exhibiting a tissue-specific or restricted expression pattern. The results of this study provide the most comprehensive listing of two families of snoRNA genes in the human genome to date.

  6. Genome sequence, population history, and pelage genetics of the endangered African wild dog (Lycaon pictus).

    PubMed

    Campana, Michael G; Parker, Lillian D; Hawkins, Melissa T R; Young, Hillary S; Helgen, Kristofer M; Szykman Gunther, Micaela; Woodroffe, Rosie; Maldonado, Jesús E; Fleischer, Robert C

    2016-12-09

    The African wild dog (Lycaon pictus) is an endangered African canid threatened by severe habitat fragmentation, human-wildlife conflict, and infectious disease. A highly specialized carnivore, it is distinguished by its social structure, dental morphology, absence of dewclaws, and colorful pelage. We sequenced the genomes of two individuals from populations representing two distinct ecological histories (Laikipia County, Kenya and KwaZulu-Natal Province, South Africa). We reconstructed population demographic histories for the two individuals and scanned the genomes for evidence of selection. We show that the African wild dog has undergone at least two effective population size reductions in the last 1,000,000 years. We found evidence of Lycaon individual-specific regions of low diversity, suggestive of inbreeding or population-specific selection. Further research is needed to clarify whether these population reductions and low diversity regions are characteristic of the species as a whole. We documented positive selection on the Lycaon mitochondrial genome. Finally, we identified several candidate genes (ASIP, MITF, MLPH, PMEL) that may play a role in the characteristic Lycaon pelage.

  7. DMINDA: an integrated web server for DNA motif identification and analyses

    PubMed Central

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-01-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419

  8. A Two-stage Improvement Method for Robot Based 3D Surface Scanning

    NASA Astrophysics Data System (ADS)

    He, F. B.; Liang, Y. D.; Wang, R. F.; Lin, Y. S.

    2018-03-01

    Because the surface of an unknown object is difficult to measure or recognize precisely, 3D laser scanning technology has been introduced for surface reconstruction. Usually, a slower scanning speed gives better scanning quality, while a faster speed gives worse quality. This paper therefore presents a new two-stage scanning method that pursues surface-scanning quality at a faster speed. The first stage is a rough scan to obtain general point cloud data of the object’s surface, and the second stage is a specific scan to repair missing regions, which are determined by a chord length discrete method. A system containing a robotic manipulator and a handy scanner was also developed to implement the two-stage scanning method, and the relevant paths were planned according to minimum enclosing ball and regional coverage theories.

  9. BAC sequencing using pooled methods.

    PubMed

    Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

    2015-01-01

    Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically yield highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.

  10. De novo assembly of a haplotype-resolved human genome.

    PubMed

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
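
    The N50 figure quoted above is a length-weighted median. A minimal sketch of the standard calculation follows; the abstract applies it to haplotype-resolved sequence, whereas the function name and the example lengths here are purely illustrative.

```python
def n50(lengths):
    """Return the N50 of a list of positive sequence or block lengths.

    N50 is the largest length L such that pieces of length >= L together
    cover at least half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length


# Hypothetical block lengths in kb; the total is 290, so half is 145.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```

    In the example the two longest blocks (80 + 70 = 150) are the first to cover half of the total, so the N50 is 70.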

  11. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.

  12. A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops

    PubMed Central

    Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

    2006-01-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031

  13. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.

    PubMed

    Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel

    2018-06-01

    The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using an XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  14. PSP: rapid identification of orthologous coding genes under positive selection across multiple closely related prokaryotic genomes.

    PubMed

    Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping

    2013-12-27

    With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to continuously shift in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes with significant positive selection may play key roles in bacterial adaptation to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open access web server, designated PSP (Positive Selection analysis for Prokaryotic genomes), for performing evolutionary analysis on orthologous coding genes, specially designed for rapid comparison of dozens of closely related prokaryotic genomes. Remarkably, PSP facilitates functional exploration at multiple levels by assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes, which play key roles in human infection and antibiotic resistance, show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to do genome-scale analysis for evolutionary selection across multiple prokaryotic genomes rapidly and easily, and identify the genes undergoing positive selection, which may play key roles in host-pathogen interactions and/or environmental adaptation.

  15. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  16. Plasmodium copy number variation scan: gene copy numbers evaluation in haploid genomes.

    PubMed

    Beghain, Johann; Langlois, Anne-Claire; Legrand, Eric; Grange, Laura; Khim, Nimol; Witkowski, Benoit; Duru, Valentine; Ma, Laurence; Bouchier, Christiane; Ménard, Didier; Paul, Richard E; Ariey, Frédéric

    2016-04-12

    In eukaryotic genomes, deletion or amplification rates have been estimated to be a thousand times more frequent than single nucleotide variation. In Plasmodium falciparum, relatively few transcription factors have been identified, and the regulation of transcription is seemingly largely influenced by gene amplification events. Thus copy number variation (CNV) is a major mechanism enabling parasite genomes to adapt to new environmental changes. Currently, the detection of CNVs is based on quantitative PCR (qPCR), which is significantly limited by the relatively small number of genes that can be analysed at any one time. Technological advances that facilitate whole-genome sequencing, such as next generation sequencing (NGS), enable deeper analyses of the genomic variation to be performed. Because the characteristics of Plasmodium CNVs need special consideration in algorithms and strategies for which classical CNV detection programs are not suited, a dedicated algorithm to detect CNVs across the entire exome of P. falciparum was developed. This algorithm, called PlasmoCNVScan, is based on a custom read depth strategy applied to NGS data. The analysis of CNV identification on three genes known to have different levels of amplification and which are located either in the nuclear, apicoplast or mitochondrial genomes is presented. The results are correlated with the qPCR experiments, usually used for identification of locus specific amplification/deletion. This tool will facilitate the study of P. falciparum genomic adaptation in response to ecological changes: drug pressure, decreased transmission, reduction of the parasite population size (transition to pre-elimination endemic area).
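
    The read depth idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the PlasmoCNVScan algorithm, whose details the abstract does not give; it only assumes that, in a haploid genome, a gene's mean read depth divided by the median depth across genes approximates its copy number. The function names, gene labels and the 1.5 / 0.5 thresholds are hypothetical.

```python
from statistics import median


def depth_ratios(gene_depths):
    """Divide each gene's mean read depth by the median depth across genes.

    In a haploid genome a ratio near 1 suggests a single copy (assumption
    of this sketch; real data also need GC and mappability corrections).
    """
    baseline = median(gene_depths.values())
    if baseline <= 0:
        raise ValueError("median read depth must be positive")
    return {gene: depth / baseline for gene, depth in gene_depths.items()}


def call_cnv(ratios, gain=1.5, loss=0.5):
    """Label genes using hypothetical ratio thresholds (not from the paper)."""
    calls = {}
    for gene, ratio in ratios.items():
        if ratio >= gain:
            calls[gene] = "amplification"
        elif ratio <= loss:
            calls[gene] = "deletion"
        else:
            calls[gene] = "single-copy"
    return calls


# Hypothetical per-gene mean depths, for illustration only.
example_depths = {"geneA": 100, "geneB": 100, "geneC": 200, "geneD": 10, "geneE": 100}
example_calls = call_cnv(depth_ratios(example_depths))
```

    Using the median rather than the mean as the baseline keeps a few highly amplified genes from shifting the reference depth.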

  17. Personal genomics services: whose genomes?

    PubMed

    Gurwitz, David; Bregman-Eschet, Yael

    2009-07-01

    New companies offering personal whole-genome information services over the internet are dynamic and highly visible players in the personal genomics field. For fees currently ranging from US$399 to US$2500 and a vial of saliva, individuals can now purchase online access to their individual genetic information regarding susceptibility to a range of chronic diseases and phenotypic traits based on a genome-wide SNP scan. Most of the companies offering such services are based in the United States, but their clients may come from nearly anywhere in the world. Although the scientific validity, clinical utility and potential future implications of such services are being hotly debated, several ethical and regulatory questions related to direct-to-consumer (DTC) marketing strategies of genetic tests have not yet received sufficient attention. For example, how can we minimize the risk of unauthorized third parties submitting other people's DNA for testing? Another pressing question concerns the ownership of (genotypic and phenotypic) information, as well as the unclear legal status of customers regarding their own personal information. Current legislation in the US and Europe falls short of providing clear answers to these questions. Until the regulation of personal genomics services catches up with the technology, we call upon commercial providers to self-regulate and coordinate their activities to minimize potential risks to individual privacy. We also point out some specific steps, along the trustee model, that providers of DTC personal genomics services as well as regulators and policy makers could consider for addressing some of the concerns raised below.

  18. A genome-wide search for genes affecting circulating fibrinogen levels in the Framingham Heart Study.

    PubMed

    Yang, Qiong; Tofler, Geoffrey H; Cupples, L Adrienne; Larson, Martin G; Feng, DaLi; Lindpaintner, Klaus; Levy, Daniel; D'Agostino, Ralph B; O'Donnell, Christopher J

    2003-04-15

    Circulating levels of fibrinogen are associated with atherosclerosis and predict future coronary heart disease and stroke. Levels of fibrinogen are correlated among family members, suggesting a heritable component. Variants of the beta-fibrinogen gene subunit on 4q28 are associated with fibrinogen levels but explain only a small proportion of the total genetic variability. It remains unknown what role, if any, is played by other genetic variants in the inter-individual variability in levels of fibrinogen in the general population. We conducted a 10-cM spaced genome-wide scan using 402 original cohort subjects and 1193 offspring subjects from 330 extended families of the Framingham Heart Study. Heritability and linkage analyses were carried out using variance component methods. Regression analyses were performed to adjust for traditional risk factors and HindIII beta-148 genotypes. The total heritability was estimated as 0.24. The highest and second highest LOD scores of linkage were found on chromosomes 2 (LOD=1.5 at 243 cM) and 10 (LOD=2.4 at 87 cM) using only offspring subjects in the analysis, and on chromosomes 2 (LOD=2.1 at 242 cM) and 10 (LOD=1.4 at 86 cM), 17 (LOD=1.4 at 96 cM) and 20 (LOD=1.4 at 80 cM) using both original cohort and offspring. These results suggest that there may be influential genetic regions on these chromosomes. While no linkage with genome-wide significance was detected, further research to confirm our findings is warranted.
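
    For readers unfamiliar with the LOD scale used above, the following sketch shows the standard definition: the base-10 logarithm of the likelihood ratio between a model with linkage at a position and a model without. The function name and the log-likelihood values are hypothetical; the abstract does not report the underlying likelihoods.

```python
from math import log


def lod_score(loglik_null, loglik_linkage):
    """Convert two natural-log likelihoods into a LOD score.

    LOD = log10(L_linkage / L_null) = (lnL_linkage - lnL_null) / ln(10).
    """
    return (loglik_linkage - loglik_null) / log(10)


# Hypothetical log-likelihoods chosen so the LOD equals 2.4.
example_lod = lod_score(-1000.0, -1000.0 + 2.4 * log(10))
```

    Equivalently, the LOD is the likelihood ratio test statistic divided by 2 ln(10), so a LOD of 2.4 corresponds to a statistic of about 11.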

  19. Signatures of co-evolutionary host-pathogen interactions in the genome of the entomopathogenic nematode Steinernema carpocapsae.

    PubMed

    Flores-Ponce, Mitzi; Vallebueno-Estrada, Miguel; González-Orozco, Eduardo; Ramos-Aboites, Hilda E; García-Chávez, J Noé; Simões, Nelson; Montiel, Rafael

    2017-04-26

    The entomopathogenic nematode Steinernema carpocapsae has been used worldwide as a biocontrol agent for insect pests, making it an interesting model for understanding parasite-host interactions. Two models propose that these interactions are co-evolutionary processes in such a way that equilibrium is never reached. In one model, known as "arms race", new alleles in relevant genes are fixed in both host and pathogens by directional positive selection, producing recurrent and alternating selective sweeps. In the other model, known as "trench warfare", persistent dynamic fluctuations in allele frequencies are sustained by balancing selection. There are some examples of genes evolving according to both models; however, it is not clear to what extent these interactions might alter genome-level evolutionary patterns and intraspecific diversity. Here we investigate some of these aspects by studying genomic variation in S. carpocapsae and other pathogenic and free-living nematodes from phylogenetic clades IV and V. To look for signatures of an arms-race dynamic, we conducted massive scans to detect directional positive selection in interspecific data. In free-living nematodes, we detected a significantly higher proportion of genes with sites under positive selection than in parasitic nematodes. However, in these genes, we found more enriched Gene Ontology terms in parasites. To detect possible effects of dynamic polymorphisms interactions we looked for signatures of balancing selection in intraspecific genomic data. The observed distribution of Tajima's D values in S. carpocapsae was more skewed to positive values and significantly different from the observed distribution in the free-living Caenorhabditis briggsae. Also, the proportion of significant positive values of Tajima's D was elevated in genes that were differentially expressed after induction with insect tissues as compared to both non-differentially expressed genes and the global scan. Our study provides a first portrait of the effects that lifestyle might have in shaping the patterns of selection at the genomic level. An arms race between hosts and pathogens seems to be affecting specific genetic functions but not necessarily increasing the number of positively selected genes. Trench warfare dynamics seem to be acting more generally in the genome, likely focusing on genes responding to the interaction, rather than targeting specific genetic functions.
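
    Tajima's D, the statistic used above to look for balancing selection, compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. The sketch below implements the standard formula from Tajima (1989) for a set of aligned, equal-length sequences; it is an illustration, not the authors' pipeline, and the function name and toy alignments are hypothetical. Positive values indicate an excess of intermediate-frequency variants, as expected under balancing selection.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return Tajima's D for aligned sequences, or None if it is undefined."""
    n = len(seqs)
    if n < 4:
        return None  # the variance term is zero for n < 4
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments: balanced alleles give D > 0, singletons give D < 0.
d_balanced = tajimas_d(["TTTT", "TTTT", "AAAA", "AAAA"])
d_singletons = tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"])
```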

  20. Micro/nano moire methods

    NASA Astrophysics Data System (ADS)

    Asundi, Anand K.; Shang, Haixia; Xie, Huimin; Li, Biao

    2003-10-01

    Two novel micro/nano moire methods, the SEM scanning moire and AFM scanning moire techniques, are discussed in this paper. The principle and applications of the two scanning moire methods are described in detail. The residual deformation in a polysilicon MEMS cantilever structure with a 5000 lines/mm grating after removing the SiO2 sacrificial layer is accurately measured by the SEM scanning moire method, while the AFM scanning moire method is used to detect thermal deformation of electronic package components and the formation of nano-moire on a freshly cleaved mica crystal. Experimental results demonstrate the feasibility of these two moire methods, and also show they are effective methods to measure deformation from micron to nano-scales.

  1. Genomic Confirmation of Hybridisation and Recent Inbreeding in a Vector-Isolated Leishmania Population

    PubMed Central

    Smith, Barbara A.; Imamura, Hideo; Sanders, Mandy; Svobodova, Milena; Volf, Petr; Berriman, Matthew; Cotton, James A.; Smith, Deborah F.

    2014-01-01

    Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild still remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate in a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain as well as genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle. PMID:24453988

  2. A BAC clone fingerprinting approach to the detection of human genome rearrangements

    PubMed Central

    Krzywinski, Martin; Bosdet, Ian; Mathewson, Carrie; Wye, Natasja; Brebner, Jay; Chiu, Readman; Corbett, Richard; Field, Matthew; Lee, Darlene; Pugh, Trevor; Volik, Stas; Siddiqui, Asim; Jones, Steven; Schein, Jacquie; Collins, Collin; Marra, Marco

    2007-01-01

    We present a method, called fingerprint profiling (FPP), that uses restriction digest fingerprints of bacterial artificial chromosome clones to detect and classify rearrangements in the human genome. The approach uses alignment of experimental fingerprint patterns to in silico digests of the sequence assembly and is capable of detecting micro-deletions (1-5 kb) and balanced rearrangements. Our method has compelling potential for use as a whole-genome method for the identification and characterization of human genome rearrangements. PMID:17953769

  3. Invited review: Inbreeding in the genomics era: Inbreeding, inbreeding depression, and management of genomic variability.

    PubMed

    Howard, Jeremy T; Pryce, Jennie E; Baes, Christine; Maltecca, Christian

    2017-08-01

    Traditionally, pedigree-based relationship coefficients have been used to manage the inbreeding and degree of inbreeding depression that exists within a population. The widespread incorporation of genomic information in dairy cattle genetic evaluations allows for the opportunity to develop and implement methods to manage populations at the genomic level. As a result, the realized proportion of the genome that 2 individuals share can be more accurately estimated instead of using pedigree information to estimate the expected proportion of shared alleles. Furthermore, genomic information allows genome-wide relationship or inbreeding estimates to be augmented to characterize relationships for specific regions of the genome. Region-specific stretches can be used to more effectively manage areas of low genetic diversity or areas that, when homozygous, result in reduced performance across economically important traits. The use of region-specific metrics should allow breeders to more precisely manage the trade-off between the genetic value of the progeny and undesirable side effects associated with inbreeding. Methods tailored toward more effectively identifying regions affected by inbreeding and their associated use to manage the genome at the herd level, however, still need to be developed. We have reviewed topics related to inbreeding, measures of relatedness, genetic diversity and methods to manage populations at the genomic level, and we discuss future challenges related to managing populations through implementing genomic methods at the herd and population levels. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  4. TU-F-18A-06: Dual Energy CT Using One Full Scan and a Second Scan with Very Few Projections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, T; Zhu, L

    Purpose: The conventional dual energy CT (DECT) requires two full CT scans at different energy levels, resulting in dose increase as well as imaging errors from patient motion between the two scans. To shorten the scan time of DECT and thus overcome these drawbacks, we propose a new DECT algorithm using one full scan and a second scan with very few projections by preserving structural information. Methods: We first reconstruct a CT image on the full scan using a standard filtered-backprojection (FBP) algorithm. We then use a compressed sensing (CS) based iterative algorithm on the second scan for reconstruction from very few projections. The edges extracted from the first scan are used as weights in the objective function of the CS-based reconstruction to substantially improve the image quality of CT reconstruction. The basis material images are then obtained by an iterative image-domain decomposition method and an electron density map is finally calculated. The proposed method is evaluated on phantoms. Results: On the Catphan 600 phantom, the CT reconstruction mean error using the proposed method on 20 and 5 projections are 4.76% and 5.02%, respectively. Compared with conventional iterative reconstruction, the proposed edge weighting preserves object structures and achieves a better spatial resolution. With basis materials of Iodine and Teflon, our method on 20 projections obtains similar quality of decomposed material images compared with FBP on a full scan and the mean error of electron density in the selected regions of interest is 0.29%. Conclusion: We propose an effective method for reducing projections and therefore scan time in DECT. We show that a full scan plus a 20-projection scan are sufficient to provide DECT images and electron density with similar quality compared with two full scans. Our future work includes more phantom studies to validate the performance of our method.

  5. Pdsg1 and Pdsg2, Novel Proteins Involved in Developmental Genome Remodelling in Paramecium

    PubMed Central

    Hoehener, Cristina; Singh, Aditi; Swart, Estienne C.; Nowacki, Mariusz

    2014-01-01

    The epigenetic influence of maternal cells on the development of their progeny has long been studied in various eukaryotes. Multicellular organisms usually provide their zygotes not only with nutrients but also with functional elements required for proper development, such as coding and non-coding RNAs. These maternally deposited RNAs exhibit a variety of functions, from regulating gene expression to assuring genome integrity. In ciliates, such as Paramecium these RNAs participate in the programming of large-scale genome reorganization during development, distinguishing germline-limited DNA, which is excised, from somatic-destined DNA. Only a handful of proteins playing roles in this process have been identified so far, including typical RNAi-derived factors such as Dicer-like and Piwi proteins. Here we report and characterize two novel proteins, Pdsg1 and Pdsg2 (Paramecium protein involved in Development of the Somatic Genome 1 and 2), involved in Paramecium genome reorganization. We show that these proteins are necessary for the excision of germline-limited DNA during development and the survival of sexual progeny. Knockdown of PDSG1 and PDSG2 genes affects the populations of small RNAs known to be involved in the programming of DNA elimination (scanRNAs and iesRNAs) and chromatin modification patterns during development. Our results suggest an association between RNA-mediated trans-generational epigenetic signal and chromatin modifications in the process of Paramecium genome reorganization. PMID:25397898

  6. Pdsg1 and Pdsg2, novel proteins involved in developmental genome remodelling in Paramecium.

    PubMed

    Arambasic, Miroslav; Sandoval, Pamela Y; Hoehener, Cristina; Singh, Aditi; Swart, Estienne C; Nowacki, Mariusz

    2014-01-01

    The epigenetic influence of maternal cells on the development of their progeny has long been studied in various eukaryotes. Multicellular organisms usually provide their zygotes not only with nutrients but also with functional elements required for proper development, such as coding and non-coding RNAs. These maternally deposited RNAs exhibit a variety of functions, from regulating gene expression to assuring genome integrity. In ciliates, such as Paramecium these RNAs participate in the programming of large-scale genome reorganization during development, distinguishing germline-limited DNA, which is excised, from somatic-destined DNA. Only a handful of proteins playing roles in this process have been identified so far, including typical RNAi-derived factors such as Dicer-like and Piwi proteins. Here we report and characterize two novel proteins, Pdsg1 and Pdsg2 (Paramecium protein involved in Development of the Somatic Genome 1 and 2), involved in Paramecium genome reorganization. We show that these proteins are necessary for the excision of germline-limited DNA during development and the survival of sexual progeny. Knockdown of PDSG1 and PDSG2 genes affects the populations of small RNAs known to be involved in the programming of DNA elimination (scanRNAs and iesRNAs) and chromatin modification patterns during development. Our results suggest an association between RNA-mediated trans-generational epigenetic signal and chromatin modifications in the process of Paramecium genome reorganization.

  7. New method for scanning spacecraft and balloon-borne/space-based experiments

    NASA Technical Reports Server (NTRS)

    Polites, Michael E.

    1991-01-01

    A new method is presented for scanning balloon-borne experiments, free-flying spacecraft, and gimballed experiments mounted to the space shuttle or the space station. It uses rotating-unbalanced-mass (RUM) devices for generating circular, line, or raster scan patterns and an auxiliary control system for target acquisition, keeping the scan centered on the target, and producing complementary motion for raster scanning. It is ideal for applications where the only possible way to accomplish the required scan is to physically scan the entire experiment or spacecraft as in X-ray and gamma ray experiments. In such cases, this new method should have advantages over prior methods in terms of either power, weight, cost, performance, stability, or a combination of these.

  8. A general method to correct PET data for tissue metabolites using a dual-scan approach.

    PubMed

    Gunn, R N; Yap, J T; Wells, P; Osman, S; Price, P; Jones, T; Cunningham, V J

    2000-04-01

    This article presents and analyses a general method of correcting for the presence of radiolabeled metabolites from a parent radiotracer in tissue during PET scanning. The method is based on a dual-scan approach, i.e., parent scan together with an independent supplementary scan in which the radiolabeled metabolite of interest itself is administered. The method corrects for the presence of systemically derived radiolabeled metabolite delivered to the tissues of interest through the blood. Data from the supplementary scan are analyzed to obtain the tissue impulse response function for the metabolite. The time course of the radiolabeled metabolite in plasma in the parent scan is convolved with its tissue impulse response function to derive a correction term. This is not a simple subtraction technique but one that takes account of the different time-activity curves of the radiolabeled metabolite in the two scans. The method, its implications, and its limitations are discussed with respect to [11C]thymidine and its principal metabolite 11CO2. The general method, based on a dual-scan approach, can be used to correct for radiolabeled metabolites in tissues of interest during PET scanning. The correction accounts for radiolabeled metabolites that are derived systemically and delivered to the tissues of interest through the blood.
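
    The convolution step described in this abstract can be sketched as a discrete sum. The code below assumes both curves are sampled on the same uniform time grid with spacing dt, which is a simplification for illustration; the function name and the sample values are hypothetical and are not taken from the study.

```python
def metabolite_correction(plasma_metabolite, impulse_response, dt):
    """Convolve the plasma metabolite curve with the tissue impulse response.

    correction[t] = dt * sum_k plasma_metabolite[k] * impulse_response[t - k]
    Both inputs are lists sampled at uniform spacing dt (an assumption of
    this sketch); the result has the same length as plasma_metabolite.
    """
    correction = []
    for t in range(len(plasma_metabolite)):
        total = 0.0
        for k in range(t + 1):
            lag = t - k
            if lag < len(impulse_response):
                total += plasma_metabolite[k] * impulse_response[lag]
        correction.append(total * dt)
    return correction


# Hypothetical curves: a unit pulse in plasma reproduces the impulse response.
example_correction = metabolite_correction([1.0, 0.0, 0.0], [0.5, 0.25, 0.125], 1.0)
```

    Because the plasma metabolite curve comes from the parent scan and the impulse response from the supplementary scan, the correction term follows the metabolite time course actually seen in the parent scan, which is the point the abstract makes about this not being a simple subtraction.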

  9. Construction of a genomic DNA library with a TA vector and its application in cloning of the phytoene synthase gene from the cyanobacterium Spirulina platensis M-135

    NASA Astrophysics Data System (ADS)

    Yoshikazu, Kawata; Shin-Ichi, Yano; Hiroyuki, Kojima

    1998-03-01

    An efficient and simple method for constructing a genomic DNA library using a TA cloning vector is presented. It is based on the sonicative cleavage of genomic DNA and modification of fragment ends with Taq DNA polymerase, followed by ligation using a TA vector. This method was applied for cloning of the phytoene synthase gene crt B from Spirulina platensis. This method is useful when genomic DNA cannot be efficiently digested with restriction enzymes, a problem often encountered during the construction of a genomic DNA library of cyanobacteria.

  10. An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes.

    PubMed

    Shao, Mingfu; Lin, Yu; Moret, Bernard M E

    2015-05-01

    Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR.
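
    As context for the linear-time case mentioned above, the sketch below computes the DCJ distance for two genomes without duplicate genes, under the further simplifying assumption that each genome is a single circular chromosome with the same gene content. In that setting the distance is the number of genes minus the number of cycles formed by alternating between the adjacencies of the two genomes. This is not the ILP formulation proposed in the article, and the function names are hypothetical.

```python
def adjacency_map(genome):
    """Map each gene extremity to its neighbour on one circular chromosome.

    genome is a list of signed integers such as [1, -2, 3]; a negative
    sign means the gene is read in reverse orientation.
    """
    def left_end(gene):
        return (abs(gene), "t") if gene > 0 else (abs(gene), "h")

    def right_end(gene):
        return (abs(gene), "h") if gene > 0 else (abs(gene), "t")

    partner = {}
    for i, gene in enumerate(genome):
        following = genome[(i + 1) % len(genome)]  # wrap around: circular
        a, b = right_end(gene), left_end(following)
        partner[a] = b
        partner[b] = a
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance for duplicate-free, single-chromosome circular genomes."""
    genes_a = sorted(abs(g) for g in genome_a)
    genes_b = sorted(abs(g) for g in genome_b)
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("genomes must share the same duplicate-free gene set")

    adj_a = adjacency_map(genome_a)
    adj_b = adjacency_map(genome_b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        current = start
        while current not in seen:
            # Follow one adjacency of genome A, then one of genome B.
            seen.add(current)
            via_a = adj_a[current]
            seen.add(via_a)
            current = adj_b[via_a]
    return len(genome_a) - cycles


# One inversion of gene 2 is a single DCJ operation.
example_distance = dcj_distance([1, 2, 3], [1, -2, 3])
```

    Each extremity is visited once, so the running time is linear in the number of genes. With duplicate genes the matching between gene copies is no longer fixed, which is what makes the problem NP-hard.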

  11. Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.

    PubMed

    Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C

    2009-11-24

    Computational methods for determining the function of genes in newly sequenced genomes have been traditionally based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context and relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.

  12. Damaging Effect of Low Energy N+ Implantation on Aspergillus niger Spores

    NASA Astrophysics Data System (ADS)

    Wang, Lisheng; Cai, Kezhou; Cheng, Maoji; Chen, Lijuan; Liu, Xuelan; Zhang, Shuqing; Yu, Zengliang

    2007-06-01

    The mutagenic effects of a keV range nitrogen ion (N+) beam on enzyme-producing probiotics were studied, particularly with regard to the induction in the genome. The electron spin resonance (ESR) results showed that the signal of ESR spectrum existed in both implanted and non-implanted spores, and the yields of free radicals increased in a dose-dependent manner. The ionic etching and dilapidation of the cell wall could be observed distinctly through the scanning electron microscope (SEM). The mutagenic effect on the genome indicated that N+ implantation could induce base mutations. This study provided an insight into the roles low-energy ions might play in inducing mutagenesis of micro-organisms.

  13. Multi-InDel Analysis for Ancestry Inference of Sub-Populations in China

    PubMed Central

    Sun, Kuan; Ye, Yi; Luo, Tao; Hou, Yiping

    2016-01-01

    Ancestry inference is of great interest in diverse areas of scientific research, including forensic biology, medical genetics and anthropology. Various methods have been published for distinguishing populations. However, few reports refer to sub-populations (like ethnic groups) within Asian populations because of the limitation of markers. We treated several InDel loci located very close together physically as a single marker, termed a multi-InDel. The multi-InDel shows potential as an Ancestry Inference Marker (AIM). In this study, we performed a genome-wide scan for multi-InDels as AIMs. After examining the FST distributions in the 1000 Genomes Database, 12 candidates were selected and validated for eastern Asian populations. A multiplexed assay was developed as a panel to genotype 12 multi-InDel markers simultaneously. Ancestry component analysis with STRUCTURE and principal component analysis (PCA) were employed to estimate its capability for ancestry inference. Furthermore, ancestry assignments of trial individuals were conducted. It proved to be very effective when 210 samples from Han and Tibetan individuals in China were tested. The panel consisting of multi-InDel markers exhibited considerable potency in ancestry inference, and was suggested to be applied in forensic practices and genetic population studies. PMID:28004788
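
    The FST screening step mentioned above can be illustrated with a basic heterozygosity-based estimate for a multi-allelic marker in two populations, FST = (HT - HS) / HT. The abstract does not say which estimator the authors used, so this is only a sketch; it assumes equal population weights, and the function name and allele frequencies are hypothetical.

```python
def fst_two_populations(freqs_1, freqs_2):
    """Estimate FST for one marker from allele frequencies in two populations.

    freqs_1 and freqs_2 list the frequency of each allele (or multi-InDel
    haplotype) in the same order and each sum to 1. Populations are
    weighted equally, which is an assumption of this sketch.
    """
    if len(freqs_1) != len(freqs_2):
        raise ValueError("both populations must list the same alleles")

    def expected_heterozygosity(freqs):
        return 1.0 - sum(p * p for p in freqs)

    # HS: mean within-population expected heterozygosity.
    h_s = (expected_heterozygosity(freqs_1) + expected_heterozygosity(freqs_2)) / 2
    # HT: expected heterozygosity of the pooled population.
    pooled = [(p + q) / 2 for p, q in zip(freqs_1, freqs_2)]
    h_t = expected_heterozygosity(pooled)
    if h_t == 0:
        return 0.0  # marker is monomorphic in both populations
    return (h_t - h_s) / h_t


# Hypothetical frequencies: fully differentiated populations give FST = 1.
example_fst = fst_two_populations([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

    Markers with high FST between the target populations are the natural candidates for an ancestry panel, since their alleles differ most in frequency between groups.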

  14. Molecular cytogenetics: an indispensable tool for cancer diagnosis.

    PubMed

    Wan, Thomas Sk; Ma, Edmond Sk

    2012-01-01

    Cytogenetic aberrations may escape detection or recognition in traditional karyotyping. The past decade has seen an explosion of methodological advances in molecular cytogenetics technology. These cytogenetics techniques add color to the black and white world of conventional banding. Fluorescence in-situ hybridization (FISH) study has emerged as an indispensable tool for both basic and clinical research, as well as diagnostics, in leukemia and cancers. FISH can be used to identify chromosomal abnormalities through fluorescent labeled DNA probes that target specific DNA sequences. Subsequently, FISH-based tests such as multicolor karyotyping, comparative genomic hybridization (CGH) and array CGH have been used in emerging clinical applications as they enable resolution of complex karyotypic aberrations and whole global scanning of genomic imbalances. More recently, cross-species array CGH analysis has also been employed in cancer gene identification. The clinical impact of FISH is pivotal, especially in the diagnosis, prognosis and treatment decisions for hematological diseases, all of which facilitate the practice of personalized medicine. This review summarizes the methodology and current utilization of these FISH techniques in unraveling chromosomal changes and highlights how the field is moving away from conventional methods towards molecular cytogenetics approaches. In addition, the potential of the more recently developed FISH tests in contributing information to genetic abnormalities is illustrated.

  15. A genome scan revealed significant associations of growth traits with a major QTL and GHR2 in tilapia

    PubMed Central

    Liu, Feng; Sun, Fei; Xia, Jun Hong; Li, Jian; Fu, Gui Hong; Lin, Grace; Tu, Rong Jian; Wan, Zi Yi; Quek, Delia; Yue, Gen Hua

    2014-01-01

    Growth is an important trait in animal breeding. However, the genetic effects underpinning fish growth variability are still poorly understood. QTL mapping and analysis of candidate genes are effective methods to address this issue. We conducted a genome-wide QTL analysis for growth in tilapia. A total of 10, 7 and 8 significant QTLs were identified for body weight, total length and standard length at 140 dph, respectively. The majority of these QTLs were sex-specific. One major QTL for growth traits was identified in the sex-determining locus in LG1, explaining 71.7%, 67.2% and 64.9% of the phenotypic variation (PV) of body weight, total length and standard length, respectively. In addition, a candidate gene GHR2 in a QTL was significantly associated with body weight, explaining 13.1% of PV. Real-time qPCR revealed that different genotypes at the GHR2 locus influenced the IGF-1 expression level. The markers located in the major QTL for growth traits could be used in marker-assisted selection of tilapia. The associations between GHR2 variants and growth traits suggest that the GHR2 gene should be an important gene that explains the difference in growth among tilapia species. PMID:25435025

  16. A new concept for risk assessment of the hazards of non-genotoxic chemicals--electronmicroscopic studies of the cell surface. Evidence for the action of lipophilic chemicals on the Ca2+ signaling system.

    PubMed

    Gartzke, J; Lange, K; Brandt, U; Bergmann, J

    1997-06-20

    Recently, we presented evidence for the localization of components of the cellular Ca2+ signaling pathway in microvilli. On stimulation of this pathway, microvilli undergo characteristic morphological changes which can be detected by scanning electron microscopy (SEM) of the cell surface. Here we show that both receptor-mediated (vasopressin) and unspecific stimulation of the Ca2+ signaling system by the lipophilic tumor promoters thapsigargin (TG) and phorbolmyristateacetate (PMA) are accompanied by the same type of morphological changes of the cell surface. Since stimulated cell proliferation accelerates tumor development and sustained elevation of the intracellular Ca2+ concentrations is a precondition for stimulated cell proliferation, activated Ca2+ signaling is one possible mechanism of non-genomic tumor promotion. Using isolated rat hepatocytes, we show that all tested lipophilic chemicals with known tumor promoter action caused characteristic microvillar shape changes. On the other hand, lipophilic solvents that were used as differentiating agents in cell cultures, such as dimethylsulfoxide (DMSO) and dimethylformamide, failed to change the microvillar shapes. Instead, DMSO stabilized the original appearance of microvilli. The technique used provides a convenient method for the evaluation of non-genomic carcinogenicity of chemicals prior to their industrial application.

  17. A Distance Measure for Genome Phylogenetic Analysis

    NASA Astrophysics Data System (ADS)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
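
    The idea of a compression-based distance can be illustrated with a minimal sketch. It does not use the expert model named in the abstract; as a stand-in it assumes the general-purpose zlib compressor and the normalized compression distance (NCD), and the three toy sequences are randomly generated for illustration only.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compression level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps to compress y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy data: a random sequence, a close relative, and an unrelated one.
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(5000)).encode()
# Close relative: a copy of genome_a in which about 1% of positions are redrawn.
genome_b = bytes(
    rng.choice(b"ACGT") if rng.random() < 0.01 else base for base in genome_a
)
genome_c = "".join(rng.choice("ACGT") for _ in range(5000)).encode()

print("related  :", ncd(genome_a, genome_b))
print("unrelated:", ncd(genome_a, genome_c))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree builder. A general-purpose compressor is a weak model of DNA, so this sketch only shows the principle.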

  18. Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.

    PubMed

    Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V

    2018-02-01

    Many proteins need to recognize specific DNA sequences in order to function. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome from the actual frequencies of certain of its subsequences. The most popular are methods based on Markov chain models. However, no rigorous comparison of the methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses the geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
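
    The first of the three compared approaches can be sketched as follows. This is a minimal illustration, assuming the usual maximum-order Markov estimator, in which the expected count of a word is the product of the counts of its prefix and suffix divided by the count of the shared middle part. Counting is done on one strand only, and the function names and the toy genome are hypothetical.

```python
def count_overlapping(genome: str, word: str) -> int:
    """Count occurrences of word in genome, allowing overlapping matches."""
    count, start = 0, genome.find(word)
    while start != -1:
        count += 1
        start = genome.find(word, start + 1)
    return count


def expected_count_max_order(genome: str, word: str) -> float:
    """Expected count of word under a Markov model of order len(word) - 2."""
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix = count_overlapping(genome, word[:-1])  # word without its last letter
    suffix = count_overlapping(genome, word[1:])   # word without its first letter
    core = count_overlapping(genome, word[1:-1])   # shared middle part
    return prefix * suffix / core if core else 0.0


def contrast_ratio(genome: str, word: str) -> float:
    """Observed / expected; values well below 1 suggest under-representation."""
    expected = expected_count_max_order(genome, word)
    return count_overlapping(genome, word) / expected if expected else float("nan")


toy_genome = "AGCAGTTGC"  # made-up sequence, for illustration only
print(expected_count_max_order(toy_genome, "AGC"), contrast_ratio(toy_genome, "AGC"))
```

    For a real restriction site one would also count the reverse complement strand and attach a significance measure to the ratio; both are left out of this sketch.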

  19. Multi-level scanning method for defect inspection

    DOEpatents

    Bokor, Jeffrey; Jeong, Seongtae

    2002-01-01

    A method for performing scanned defect inspection of a collection of contiguous areas using a specified false-alarm-rate and capture-rate within an inspection system that has characteristic seek times between inspection locations. The multi-stage method involves setting an increased false-alarm-rate for a first stage of scanning, wherein subsequent stages of scanning inspect only the detected areas of probable defects at lowered values for the false-alarm-rate. For scanning inspection operations wherein the seek time and area uncertainty is favorable, the method can substantially increase inspection throughput.

  20. [Detection of the introgression of genome elements of Aegilops cylindrica Host. into Triticum aestivum L. genome with ISSR-analysis].

    PubMed

    Galaev, A V; Babaiants, L T; Sivolap, Iu M

    2003-01-01

    Comparative analysis of introgressive and parental forms of wheat was carried out to reveal the sites of donor genome with new loci of resistance to fungal diseases. By ISSR-method 124 ISSR-loci were detected in the genomes of 18 individual plants of introgressive line 5/20-91; 17 of them have been related to introgressive fragments of Ae. cylindrica genome in T. aestivum. It was shown that ISSR-method is effective for detection of the variability caused by introgression of alien genetic material to T. aestivum genome.

  1. Optimization of an In Situ Cellular ELISA Performed on Influenza A Virus-Infected Monolayers for Screening of Antiviral Agents

    DTIC Science & Technology

    1999-02-04

    level absorbance (data not shown). Based on these During this phase the virus genome is exposed to data. we chose 24 h as the optimal, post-infection...Scanning Watanabe, W., Sudo, K.. Asawa. S.. Konno, K.. Yokota, T., Electron Microscopy, AMF Ohare, IL, pp. 595-600. Shigeta, S., 1995. Use of lactate

  2. A Limousin specific myostatin allele affects longissimus muscle area and fatty acid profiles in a Wagyu-Limousin F2 population

    USDA-ARS's Scientific Manuscript database

    A microsatellite-based genome scan of a Wagyu x Limousin F2 cross population previously demonstrated QTL affecting longissimus muscle area (LMA) and fatty acid composition were present in regions near the centromere of BTA 2. In this study we used 70 SNP markers to examine the centromeric 20 megabas...

  3. Identification of quantitative trait loci affecting resistance to gastro-intestinal parasites in a double backcross population of Red Maasai and Dorper sheep

    USDA-ARS's Scientific Manuscript database

    A genome-wide scan for quantitative trait loci (QTL) affecting gastrointestinal (GI) nematode resistance was completed using a double backcross sheep population derived from Red Maasai and Dorper ewes bred to F1 rams. These breeds were chosen, because Red Maasai sheep are known to be more tolerant ...

  4. Investigation of a functional role for Titin in the bovine ovary based on the results of an initial whole genome scan for antral follicle count

    USDA-ARS's Scientific Manuscript database

    A world-wide food shortage is predicted by the year 2050, and biotechnologies are needed to improve production efficiency in agriculture. Biotechnologies that improve reproductive efficiency in domestic farm species will improve the availability and price of food for the growing world population. ...

  5. Golden angle based scanning for robust corneal topography with OCT

    PubMed Central

    Wagner, Joerg; Goldblum, David; Cattin, Philippe C.

    2017-01-01

    Corneal topography allows the assessment of the cornea’s refractive power which is crucial for diagnostics and surgical planning. The use of optical coherence tomography (OCT) for corneal topography is still limited. One limitation is the susceptibility to disturbances like blinking of the eye. This can result in partially corrupted scans that cannot be evaluated using common methods. We present a new scanning method for reliable corneal topography from partial scans. Based on the golden angle, the method features a balanced scan point distribution which refines over measurement time and remains balanced when part of the scan is removed. The performance of the method is assessed numerically and by measurements of test surfaces. The results confirm that the method enables numerically well-conditioned and reliable corneal topography from partially corrupted scans and reduces the need for repeated measurements in case of abrupt disturbances. PMID:28270961
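
    Why a golden-angle ordering stays balanced when part of a scan is lost can be shown with a small sketch. The abstract does not give the geometry of the scan pattern, so the following assumes radial scan lines through the corneal apex, with each new line rotated by 180°/φ (about 111.25°, where φ is the golden ratio). The line counts and the "blink" interval are hypothetical.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2
# Assumed increment for lines through the center (a line repeats every 180 degrees).
GOLDEN_ANGLE_DEG = 180.0 / GOLDEN_RATIO


def golden_scan_angles(n: int) -> list[float]:
    """Angles (degrees, in [0, 180)) of the first n radial scan lines."""
    return [(k * GOLDEN_ANGLE_DEG) % 180.0 for k in range(n)]


def largest_gap(angles: list[float]) -> float:
    """Largest angular gap (degrees) between neighboring lines, wrapping at 180."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(ordered[0] + 180.0 - ordered[-1])
    return max(gaps)


n_lines = 50
golden = golden_scan_angles(n_lines)
sequential = [k * 180.0 / n_lines for k in range(n_lines)]

# Simulate a disturbance that corrupts ten consecutive acquisitions.
golden_partial = golden[:20] + golden[30:]
sequential_partial = sequential[:20] + sequential[30:]

print("golden, partial    :", largest_gap(golden_partial))
print("sequential, partial:", largest_gap(sequential_partial))
```

    With sequential ordering the lost acquisitions leave one wide uncovered sector, whereas with the golden-angle ordering the missing lines are spread over the whole angular range.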

  6. Current and future developments in patents for quantitative trait loci in dairy cattle.

    PubMed

    Weller, Joel I

    2007-01-01

    Many studies have proposed that rates of genetic gain in dairy cattle can be increased by direct selection on the individual quantitative loci responsible for the genetic variation in these traits, or selection on linked genetic markers. The development of DNA-level genetic markers has made detection of QTL nearly routine in all major livestock species. The studies that attempted to detect genes affecting quantitative traits can be divided into two categories: analysis of candidate genes, and genome scans based on within-family genetic linkage. To date, 12 patent cooperative treaty (PCT) and US patents have been registered for DNA sequences claimed to be associated with effects on economic traits in dairy cattle. All claim effects on milk production, but other traits are also included in some of the claims. Most of the sequences found by the candidate gene approach are of dubious validity, and have been repeated in only very few independent studies. The two missense mutations on chromosomes 6 and 14 affecting milk concentration derived from genome scans are more solidly based, but the claims are also disputed. A few PCT in dairy cattle are commercialized as genetic tests where commercial dairy farmers are the target market.

  7. Genome-wide scan of IQ finds significant linkage to a quantitative trait locus on 2q.

    PubMed

    Luciano, M; Wright, M J; Duffy, D L; Wainwright, M A; Zhu, G; Evans, D M; Geffen, G M; Montgomery, G W; Martin, N G

    2006-01-01

    A genome-wide linkage scan of 795 microsatellite markers (761 autosomal, 34 X chromosome) was performed on Multidimensional Aptitude Battery subtests and verbal, performance and full scale scores, the WAIS-R Digit Symbol subtest, and two word-recognition tests (Schonell Graded Word Reading Test, Cambridge Contextual Reading Test) highly predictive of IQ. The sample included 361 families comprising 2-5 siblings who ranged in age from 15.7 to 22.2 years; genotype, but not phenotype, data were available for 81% of parents. A variance components analysis which controlled for age and sex effects showed significant linkage for the Cambridge reading test and performance IQ to the same region on chromosome 2, with respective LOD scores of 4.15 and 3.68. Suggestive linkage (LOD score>2.2) for various measures was further supported on chromosomes 6, 7, 11, 14, 21 and 22. Where location of linkage peaks converged for IQ subtests within the same scale, the overall scale score provided increased evidence for linkage to that region over any individual subtest. Association studies of candidate genes, particularly those involved in neural transmission and development, will be directed to genes located under the linkage peaks identified in this study.

  8. Convergent linkage evidence from two Latin-American population isolates supports the presence of a susceptibility locus for bipolar disorder in 5q31-34.

    PubMed

    Herzberg, Ibi; Jasinska, Anna; García, Jenny; Jawaheer, Damini; Service, Susan; Kremeyer, Barbara; Duque, Constanza; Parra, María V; Vega, Jorge; Ortiz, Daniel; Carvajal, Luis; Polanco, Guadalupe; Restrepo, Gabriel J; López, Carlos; Palacio, Carlos; Levinson, Matthew; Aldana, Ileana; Mathews, Carol; Davanzo, Pablo; Molina, Julio; Fournier, Eduardo; Bejarano, Julio; Ramírez, Magui; Ortiz, Carmen Araya; Araya, Xinia; Sabatti, Chiara; Reus, Victor; Macaya, Gabriel; Bedoya, Gabriel; Ospina, Jorge; Freimer, Nelson; Ruiz-Linares, Andrés

    2006-11-01

    We performed a whole genome microsatellite marker scan in six multiplex families with bipolar (BP) mood disorder ascertained in Antioquia, a historically isolated population from North West Colombia. These families were characterized clinically using the approach employed in independent ongoing studies of BP in the closely related population of the Central Valley of Costa Rica. The most consistent linkage results from parametric and non-parametric analyses of the Colombian scan involved markers on 5q31-33, a region implicated by the previous studies of BP in Costa Rica. Because of these concordant results, a follow-up study with additional markers was undertaken in an expanded set of Colombian and Costa Rican families; this provided genome-wide significant evidence of linkage of BPI to a candidate region of approximately 10 cM in 5q31-33 (maximum non-parametric linkage score=4.395, P<0.00004). Interestingly, this region has been implicated in several previous genetic studies of schizophrenia and psychosis, including disease association with variants of the enthoprotin and gamma-aminobutyric acid receptor genes.

  9. Genomewide Scan Reveals Amplification of mdr1 as a Common Denominator of Resistance to Mefloquine, Lumefantrine, and Artemisinin in Plasmodium chabaudi Malaria Parasites

    PubMed Central

    Borges, Sofia; Cravo, Pedro; Creasey, Alison; Fawcett, Richard; Modrzynska, Katarzyna; Rodrigues, Louise; Martinelli, Axel; Hunt, Paul

    2011-01-01

    Multidrug-resistant Plasmodium falciparum malaria parasites pose a threat to effective drug control, even to artemisinin-based combination therapies (ACTs). Here we used linkage group selection and Solexa whole-genome resequencing to investigate the genetic basis of resistance to component drugs of ACTs. Using the rodent malaria parasite P. chabaudi, we analyzed the uncloned progeny of a genetic backcross between the mefloquine-, lumefantrine-, and artemisinin-resistant mutant AS-15MF and a genetically distinct sensitive clone, AJ, following drug treatment. Genomewide scans of selection showed that parasites surviving each drug treatment bore a duplication of a segment of chromosome 12 (translocated to chromosome 04) present in AS-15MF. Whole-genome resequencing identified the size of the duplicated segment and its position on chromosome 4. The duplicated fragment extends for ∼393 kbp and contains over 100 genes, including mdr1, encoding the multidrug resistance P-glycoprotein homologue 1. We therefore show that resistance to chemically distinct components of ACTs is mediated by the same genetic mutation, highlighting a possible limitation of these therapies. PMID:21709099

  10. Fast ancestral gene order reconstruction of genomes with unequal gene content.

    PubMed

    Feijão, Pedro; Araujo, Eloi

    2016-11-11

    During evolution, genomes are modified by large scale structural events, such as rearrangements, deletions or insertions of large blocks of DNA. Of particular interest, in order to better understand how this type of genomic evolution happens, is the reconstruction of ancestral genomes, given a phylogenetic tree with extant genomes at its leaves. One way of solving this problem is to assume a rearrangement model, such as Double Cut and Join (DCJ), and find a set of ancestral genomes that minimizes the number of events on the input tree. Since this problem is NP-hard for most rearrangement models, exact solutions are practical only for small instances, and heuristics have to be used for larger datasets. This type of approach can be called event-based. Another common approach is based on finding conserved structures between the input genomes, such as adjacencies between genes, possibly also assigning weights that indicate a measure of confidence or probability that this particular structure is present on each ancestral genome, and then finding a set of non conflicting adjacencies that optimize some given function, usually trying to maximize total weight and minimizing character changes in the tree. We call this type of methods homology-based. In previous work, we proposed an ancestral reconstruction method that combines homology- and event-based ideas, using the concept of intermediate genomes, that arise in DCJ rearrangement scenarios. This method showed better rate of correctly reconstructed adjacencies than other methods, while also being faster, since the use of intermediate genomes greatly reduces the search space. Here, we generalize the intermediate genome concept to genomes with unequal gene content, extending our method to account for gene insertions and deletions of any length. 
In many of the simulated datasets, our proposed method had better results than MLGO and MGRA, two state-of-the-art algorithms for ancestral reconstruction with unequal gene content, while running much faster, making it more scalable to larger datasets. Studying ancestral reconstruction problems under a new light, using the concept of intermediate genomes, allows the design of very fast algorithms by greatly reducing the solution search space, while also giving very good results. The algorithms introduced in this paper were implemented in open-source software called RINGO (ancestral Reconstruction with INtermediate GenOmes), available at https://github.com/pedrofeijao/RINGO .

  11. Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints

    PubMed Central

    Glusman, Gustavo; Mauldin, Denise E.; Hood, Leroy E.; Robinson, Max

    2017-01-01

    We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into “genome fingerprints” via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics. PMID:29018478
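
    The timing figures quoted above can be checked with a short calculation. The sketch assumes that "all-against-all" means every unordered pair of distinct genomes is compared once; the variable names are illustrative.

```python
n_genomes = 2504
seconds_per_comparison = 21e-6  # 21 microseconds, as quoted in the abstract

# Number of unordered pairs of distinct genomes: n * (n - 1) / 2.
n_pairs = n_genomes * (n_genomes - 1) // 2

# Total time implied by the quoted per-comparison cost.
estimated_seconds = n_pairs * seconds_per_comparison

# Per-comparison cost implied by the quoted 67 s total.
implied_microseconds = 67 / n_pairs * 1e6

print(n_pairs, round(estimated_seconds, 1), round(implied_microseconds, 2))
```

    Under that assumption there are about 3.1 million comparisons, which at 21 μs each take roughly 66 s, consistent with the reported 67 s once rounding of the per-comparison time is taken into account.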

  12. Anonymizing patient genomic data for public sharing association studies.

    PubMed

    Fernandez-Lozano, Carlos; Lopez-Campos, Guillermo; Seoane, Jose A; Lopez-Alonso, Victoria; Dorado, Julian; Martín-Sanchez, Fernando; Pazos, Alejandro

    2013-01-01

    The development of personalized medicine is tightly linked with the correct exploitation of molecular data, especially those associated with the genome sequence. Along with the use of genomic data, there is an increasing demand to share these data for research purposes. The transition of clinical data to research is based on the anonymization of these data so that the patient cannot be identified; the use of genomic data poses a great challenge because of its inherently identifying nature. In this work we have analyzed current methods for genome anonymization and propose a one-way encryption method that may enable genomic data sharing by giving access only to certain regions of genomes for research purposes.

  13. A Common Ancestral Mutation in CRYBB3 Identified in Multiple Consanguineous Families with Congenital Cataracts

    PubMed Central

    Irum, Bushra; Khan, Arif O.; Wang, Qiwei; Li, David; Khan, Asma A.; Husnain, Tayyab; Akram, Javed; Riazuddin, Sheikh

    2016-01-01

    Purpose This study was performed to investigate the genetic determinants of autosomal recessive congenital cataracts in large consanguineous families. Methods Affected individuals underwent a detailed ophthalmological examination and slit-lamp photographs of the cataractous lenses were obtained. An aliquot of blood was collected from all participating family members and genomic DNA was extracted from white blood cells. Initially, a genome-wide scan was performed with genomic DNAs of family PKCC025 followed by exclusion analysis of our familial cohort of congenital cataracts. Protein-coding exons of CRYBB1, CRYBB2, CRYBB3, and CRYBA4 were sequenced bidirectionally. A haplotype was constructed with SNPs flanking the causal mutation for affected individuals in all four families, while the probability that the four familial cases have a common founder was estimated using EM and CHM-based algorithms. The expression of Crybb3 in the developing murine lens was investigated using TaqMan assays. Results The clinical and ophthalmological examinations suggested that all affected individuals had nuclear cataracts. Genome-wide linkage analysis localized the causal phenotype in family PKCC025 to chromosome 22q with statistically significant two-point logarithm of odds (LOD) scores. Subsequently, we localized three additional families, PKCC063, PKCC131, and PKCC168 to chromosome 22q. Bidirectional Sanger sequencing identified a missense variation: c.493G>C (p.Gly165Arg) in CRYBB3 that segregated with the disease phenotype in all four familial cases. This variation was not found in ethnically matched control chromosomes, the NHLBI exome variant server, or the 1000 Genomes or dbSNP databases. Interestingly, all four families harbor a unique disease haplotype that strongly suggests a common founder of the causal mutation (p<1.64E-10). 
We observed expression of Crybb3 in the mouse lens as early as embryonic day 15 (E15), and expression remained relatively steady throughout development. Conclusion Here, we report a common ancestral mutation in CRYBB3 associated with autosomal recessive congenital cataracts identified in four familial cases of Pakistani origin. PMID:27326458

  14. Use of a wiki as an interactive teaching tool in pathology residency education: Experience with a genomics, research, and informatics in pathology course

    PubMed Central

    Park, Seung; Parwani, Anil; MacPherson, Trevor; Pantanowitz, Liron

    2012-01-01

    Background: The need for informatics and genomics training in pathology is critical, yet limited resources for such training are available. In this study we sought to critically test the hypothesis that the incorporation of a wiki (a collaborative writing and publication tool with roots in “Web 2.0”) in a combined informatics and genomics course could both (1) serve as an interactive, collaborative educational resource and reference and (2) actively engage trainees by requiring the creation and sharing of educational materials. Materials and Methods: A 2-week full-time course at our institution covering genomics, research, and pathology informatics (GRIP) was taught by 36 faculty to 18 second- and third-year pathology residents. The course content included didactic lectures and hands-on demonstrations of technology (e.g., whole-slide scanning, telepathology, and statistics software). Attendees were given pre- and posttests. Residents were trained to use wiki technology (MediaWiki) and requested to construct a wiki about the GRIP course by writing comprehensive online review articles on assigned lectures. To gauge effectiveness, pretest and posttest scores for our course were compared with scores from the previous 7 years from the predecessor course (limited to informatics) given at our institution that did not utilize wikis. Results: Residents constructed 59 peer-reviewed collaborative wiki articles. This group showed a 25% improvement (standard deviation 12%) in test scores, which was greater than the 16% delta recorded in the prior 7 years of our predecessor course (P = 0.006). Conclusions: Our use of wiki technology provided a wiki containing high-quality content that will form the basis of future pathology informatics and genomics courses and proved to be an effective teaching tool, as evidenced by the significant rise in our resident posttest scores. 
Data from this project provide support for the notion that active participation in content creation is an effective mechanism for mastery of content. Future residents taking this course will continue to build on this wiki, keeping content current, and thereby benefit from this collaborative teaching tool. PMID:23024891

  15. Methods of Genomic Competency Integration in Practice

    PubMed Central

    Jenkins, Jean; Calzone, Kathleen A.; Caskey, Sarah; Culp, Stacey; Weiner, Marsha; Badzek, Laurie

    2015-01-01

    Purpose Genomics is increasingly relevant to health care, necessitating support for nurses to incorporate genomic competencies into practice. The primary aim of this project was to develop, implement, and evaluate a year-long genomic education intervention that trained, supported, and supervised institutional administrator and educator champion dyads to increase nursing capacity to integrate genomics through assessments of program satisfaction and institutional achieved outcomes. Design Longitudinal study of 23 Magnet Recognition Program® Hospitals (21 intervention, 2 controls) participating in a 1-year new competency integration effort aimed at increasing genomic nursing competency and overcoming barriers to genomics integration in practice. Methods Champion dyads underwent genomic training consisting of one in-person kick-off training meeting followed by monthly education webinars. Champion dyads designed institution-specific action plans detailing objectives, methods or strategies used to engage and educate nursing staff, timeline for implementation, and outcomes achieved. Action plans focused on a minimum of seven genomic priority areas: champion dyad personal development; practice assessment; policy content assessment; staff knowledge needs assessment; staff development; plans for integration; and anticipated obstacles and challenges. Action plans were updated quarterly, outlining progress made as well as inclusion of new methods or strategies. Progress was validated through virtual site visits with the champion dyads and chief nursing officers. Descriptive data were collected on all strategies or methods utilized, and timeline for achievement. Descriptive data were analyzed using content analysis. Findings The complexity of the competency content and the uniqueness of social systems and infrastructure resulted in a significant variation of champion dyad interventions. 
Conclusions Nursing champions can facilitate change in genomic nursing capacity through varied strategies but require substantial training in order to design and implement interventions. Clinical Relevance Genomics is critical to the practice of all nurses. There is a great opportunity and interest to address genomic knowledge deficits in the practicing nurse workforce as a strategy to improve patient outcomes. Exemplars of champion dyad interventions designed to increase nursing capacity focus on improving education, policy, and healthcare services. PMID:25808828

  16. Error analysis of motion correction method for laser scanning of moving objects

    NASA Astrophysics Data System (ADS)

    Goel, S.; Lohani, B.

    2014-05-01

    The limitation of conventional laser scanning methods is that the objects being scanned should be static. The need to scan moving objects has resulted in the development of new methods capable of generating correct 3D geometry of moving objects. The limited literature available describes very few methods that address the problem of object motion during scanning. All the existing methods utilize their own models or sensors. Studies on error modelling or analysis of any of the motion correction methods are lacking in the literature. In this paper, we develop the error budget and present the analysis of one such "motion correction" method. This method assumes availability of position and orientation information of the moving object, which in general can be obtained by installing a POS system on board or by use of some tracking devices. It then uses this information along with laser scanner data to apply a correction to the laser data, thus resulting in correct geometry despite the object being mobile during scanning. The major application of this method lies in the shipping industry, to scan ships either moving or parked in the sea, and to scan other objects like hot air balloons or aerostats. It is to be noted that the other methods of "motion correction" explained in the literature cannot be applied to scan the objects mentioned here, making the chosen method quite unique. This paper presents some interesting insights into the functioning of the "motion correction" method as well as a detailed account of the behavior and variation of the error due to different sensor components alone and in combination with each other. The analysis can be used to obtain insights into optimal utilization of available components for achieving the best results.

  17. A Genome-Wide Scan for Breast Cancer Risk Haplotypes among African American Women

    PubMed Central

    Song, Chi; Chen, Gary K.; Millikan, Robert C.; Ambrosone, Christine B.; John, Esther M.; Bernstein, Leslie; Zheng, Wei; Hu, Jennifer J.; Ziegler, Regina G.; Nyante, Sarah; Bandera, Elisa V.; Ingles, Sue A.; Press, Michael F.; Deming, Sandra L.; Rodriguez-Gil, Jorge L.; Chanock, Stephen J.; Wan, Peggy; Sheng, Xin; Pooler, Loreall C.; Van Den Berg, David J.; Le Marchand, Loic; Kolonel, Laurence N.; Henderson, Brian E.; Haiman, Chris A.; Stram, Daniel O.

    2013-01-01

    Genome-wide association studies (GWAS) simultaneously investigating hundreds of thousands of single nucleotide polymorphisms (SNP) have become a powerful tool in the investigation of new disease susceptibility loci. Haplotypes are sometimes thought to be superior to SNPs and are promising in genetic association analyses. The application of genome-wide haplotype analysis, however, is hindered by the complexity of haplotypes themselves and sophistication in computation. We systematically analyzed the haplotype effects for breast cancer risk among 5,761 African American women (3,016 cases and 2,745 controls) using a sliding window approach on the genome-wide scale. Three regions on chromosomes 1, 4 and 18 exhibited moderate haplotype effects. Furthermore, among 21 breast cancer susceptibility loci previously established in European populations, 10p15 and 14q24 are likely to harbor novel haplotype effects. We also proposed a heuristic of determining the significance level and the effective number of independent tests by the permutation analysis on chromosome 22 data. It suggests that the effective number was approximately half of the total (7,794 out of 15,645), thus the half number could serve as a quick reference to evaluating genome-wide significance if a similar sliding window approach of haplotype analysis is adopted in similar populations using similar genotype density. PMID:23468962
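    The arithmetic behind the "half number" heuristic can be sketched as follows. The window counts come from the abstract. Dividing a family-wise error rate by the effective number of tests (a Bonferroni-style correction) and the value alpha = 0.05 are assumptions made here for illustration, since the abstract does not state how the threshold is derived. The function name is hypothetical.

```python
def per_test_threshold(alpha, n_effective_tests):
    """Bonferroni-style per-test significance level (an assumed correction)."""
    return alpha / n_effective_tests


# Chromosome 22 figures reported in the abstract.
total_windows = 15645
effective_tests = 7794

ratio = effective_tests / total_windows
threshold = per_test_threshold(0.05, effective_tests)

print(f"effective / total = {ratio:.3f}")
print(f"per-test threshold = {threshold:.2e}")
```

    Under these assumptions, treating roughly half of the sliding windows as independent tests about doubles the per-test threshold compared with correcting for every window.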

  18. A GENOME-WIDE LINKAGE AND ASSOCIATION SCAN REVEALS NOVEL LOCI FOR AUTISM

    PubMed Central

    Weiss, Lauren A.; Arking, Dan E.

    2009-01-01

    Summary Although autism is a highly heritable neurodevelopmental disorder, attempts to identify specific susceptibility genes have thus far met with limited success. Genome-wide association studies (GWAS) using half a million or more markers, particularly those with very large sample sizes achieved through meta-analysis, have shown great success in mapping genes for other complex genetic traits (http://www.genome.gov/26525384). Consequently, we initiated a linkage and association mapping study using half a million genome-wide SNPs in a common set of 1,031 multiplex autism families (1,553 affected offspring). We identified regions of suggestive and significant linkage on chromosomes 6q27 and 20p13, respectively. Initial analysis did not yield genome-wide significant associations; however, genotyping of top hits in additional families revealed a SNP on chromosome 5p15 (between SEMA5A and TAS2R1) that was significantly associated with autism (P = 2 × 10^-7). We also demonstrated that expression of SEMA5A is reduced in brains from autistic patients, further implicating SEMA5A as an autism susceptibility gene. The linkage regions reported here provide targets for rare variation screening while the discovery of a single novel association demonstrates the action of common variants. PMID:19812673

  19. Signatures of Diversifying Selection in European Pig Breeds

    PubMed Central

    Wilkinson, Samantha; Lu, Zen H.; Megens, Hendrik-Jan; Archibald, Alan L.; Haley, Chris; Jackson, Ian J.; Groenen, Martien A. M.; Crooijmans, Richard P. M. A.; Ogden, Rob; Wiener, Pamela

    2013-01-01

    Following domestication, livestock breeds have experienced intense selection pressures for the development of desirable traits. This has resulted in a large diversity of breeds that display variation in many phenotypic traits, such as coat colour, muscle composition, early maturity, growth rate, body size, reproduction, and behaviour. To better understand the relationship between genomic composition and phenotypic diversity arising from breed development, the genomes of 13 traditional and commercial European pig breeds were scanned for signatures of diversifying selection using the Porcine60K SNP chip, applying a between-population (differentiation) approach. Signatures of diversifying selection between breeds were found in genomic regions associated with traits related to breed standard criteria, such as coat colour and ear morphology. Amino acid differences in the EDNRB gene appear to be associated with one of these signatures, and variation in the KITLG gene may be associated with another. Other selection signals were found in genomic regions including QTLs and genes associated with production traits such as reproduction, growth, and fat deposition. Some selection signatures were associated with regions showing evidence of introgression from Asian breeds. When the European breeds were compared with wild boar, genomic regions with high levels of differentiation harboured genes related to bone formation, growth, and fat deposition. PMID:23637623

  20. A tailing genome walking method suitable for genomes with high local GC content.

    PubMed

    Liu, Taian; Fang, Yongxiang; Yao, Wenjuan; Guan, Qisai; Bai, Gang; Jing, Zhizhong

    2013-10-15

    The tailing genome walking strategies are simple and efficient. However, they sometimes can be restricted due to the low stringency of homo-oligomeric primers. Here we modified their conventional tailing step by adding polythymidine and polyguanine to the target single-stranded DNA (ssDNA). The tailed ssDNA was then amplified exponentially with a specific primer in the known region and a primer comprising 5' polycytosine and 3' polyadenosine. The successful application of this novel method for identifying integration sites mediated by φC31 integrase in goat genome indicates that the method is more suitable for genomes with high complexity and local GC content. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. Simultaneous gene finding in multiple genomes.

    PubMed

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/ CONTACT: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions

    PubMed Central

    Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize

    2017-01-01

    Background The phylogenetic relationships among the genera of Lemnoideae, a group of small aquatic monocotyledonous plants, have not been well resolved using either morphological characters or traditional markers. Because the rich genetic information in chloroplast genomes makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods DNAs were sequenced with next-generation sequencing. The duckweed chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results Eight complete duckweed chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion This study demonstrates the potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate phylogenies that have not yet been resolved well by traditional methods, even in plants other than duckweeds. PMID:29302399

  3. Genome-Wide Profiling of DNA Double-Strand Breaks by the BLESS and BLISS Methods.

    PubMed

    Mirzazadeh, Reza; Kallas, Tomasz; Bienko, Magda; Crosetto, Nicola

    2018-01-01

    DNA double-strand breaks (DSBs) are major DNA lesions that are constantly formed during physiological processes such as DNA replication, transcription, and recombination, or as a result of exogenous agents such as ionizing radiation, radiomimetic drugs, and genome editing nucleases. Unrepaired DSBs threaten genomic stability by leading to the formation of potentially oncogenic rearrangements such as translocations. In the past few years, several methods based on next-generation sequencing (NGS) have been developed to study the genome-wide distribution of DSBs or their conversion to translocation events. We developed Breaks Labeling, Enrichment on Streptavidin, and Sequencing (BLESS), which was the first method for direct labeling of DSBs in situ followed by their genome-wide mapping at nucleotide resolution (Crosetto et al., Nat Methods 10:361-365, 2013). Recently, we have further expanded the quantitative nature, applicability, and scalability of BLESS by developing Breaks Labeling In Situ and Sequencing (BLISS) (Yan et al., Nat Commun 8:15058, 2017). Here, we first present an overview of existing methods for genome-wide localization of DSBs, and then focus on the BLESS and BLISS methods, discussing different assay design options depending on the sample type and application.

  4. HapMap scanning of novel human minor histocompatibility antigens.

    PubMed

    Kamei, Michi; Nannya, Yasuhito; Torikai, Hiroki; Kawase, Takakazu; Taura, Kenjiro; Inamoto, Yoshihiro; Takahashi, Taro; Yazaki, Makoto; Morishima, Satoko; Tsujimura, Kunio; Miyamura, Koichi; Ito, Tetsuya; Togari, Hajime; Riddell, Stanley R; Kodera, Yoshihisa; Morishima, Yasuo; Takahashi, Toshitada; Kuzushima, Kiyotaka; Ogawa, Seishi; Akatsuka, Yoshiki

    2009-05-21

    Minor histocompatibility antigens (mHags) are molecular targets of allo-immunity associated with hematopoietic stem cell transplantation (HSCT) and involved in graft-versus-host disease, but they also have beneficial antitumor activity. mHags are typically defined by host SNPs that are not shared by the donor and are immunologically recognized by cytotoxic T cells isolated from post-HSCT patients. However, the number of molecularly identified mHags is still too small to allow prospective studies of their clinical importance in transplantation medicine, mostly due to the lack of an efficient method for isolation. Here we show that when combined with conventional immunologic assays, the large data set from the International HapMap Project can be directly used for genetic mapping of novel mHags. Based on the immunologically determined mHag status in HapMap panels, a target mHag locus can be uniquely mapped through whole genome association scanning taking advantage of the unprecedented resolution and power obtained with more than 3 000 000 markers. The feasibility of our approach could be supported by extensive simulations and further confirmed by actually isolating 2 novel mHags as well as 1 previously identified example. The HapMap data set represents an invaluable resource for investigating human variation, with obvious applications in genetic mapping of clinically relevant human traits.

  5. Genomic analyses of Northern snakehead (Channa argus) populations in North America.

    PubMed

    Resh, Carlee A; Galaska, Matthew P; Mahon, Andrew R

    2018-01-01

    The introduction of northern snakehead (Channa argus; Anabantiformes: Channidae) and their subsequent expansion is one of many problematic biological invasions in the United States. This harmful aquatic invasive species has become established in various parts of the eastern United States, including the Potomac River basin, and has recently become established in the Mississippi River basin in Arkansas. Effective management of C. argus and prevention of its further spread depends upon knowledge of current population structure in the United States. Novel methods for studying invasive species using whole-genome scans provide unprecedented amounts of data, which can reveal fine-scale differences between and within populations of organisms. In this study, we utilize 2b-RAD genomic sequencing to recover 1,007 single-nucleotide polymorphism (SNP) loci from genomic DNA extracted from 165 C. argus individuals: 147 individuals sampled along the East Coast of the United States and 18 individuals sampled throughout Arkansas. Analysis of those SNP loci helps to resolve existing population structure and recover five genetically distinct populations of C. argus in the United States. Additionally, information from the SNP loci enables us to begin to calculate the long-term effective population size ranges of this harmful aquatic invasive species. We estimate long-term Ne to be 1,840,000-18,400,000 for the Upper Hudson River basin, 4,537,500-45,375,000 for the Lower Hudson River basin, 3,422,500-34,225,000 for the Potomac River basin, 2,715,000-7,150,000 for Philadelphia, and 2,580,000-25,800,000 for Arkansas populations. This work provides evidence for the presence of more genetic populations than previously estimated and estimates population size, showing the invasive potential of C. argus in the United States. The valuable information gained from this study will allow effective management of the existing populations to avoid expansion and possibly enable future eradication efforts.

  6. Genome-wide linkage scan for maximum and length-dependent knee muscle strength in young men: significant evidence for linkage at chromosome 14q24.3

    PubMed Central

    De Mars, G; Windelinckx, A; Huygens, W; Peeters, M W; Beunen, G P; Aerssens, J; Vlietinck, R; Thomis, M A I

    2008-01-01

    Background: Maintenance of high muscular fitness is positively related to bone health, functionality in daily life and increasing insulin sensitivity, and negatively related to falls and fractures, morbidity and mortality. Heritability of muscle strength phenotypes ranges between 31% and 95%, but little is known about the identity of the genes underlying this complex trait. As a first attempt, this genome-wide linkage study aimed to identify chromosomal regions linked to muscle and bone cross-sectional area, isometric knee flexion and extension torque, and torque–length relationship for knee flexors and extensors. Methods: In total, 283 informative male siblings (17–36 years old), belonging to 105 families, were used to conduct a genome-wide SNP-based multipoint linkage analysis. Results: The strongest evidence for linkage was found for the torque–length relationship of the knee flexors at 14q24.3 (LOD = 4.09; p < 10^-5). Suggestive evidence for linkage was found at 14q32.2 (LOD = 3.00; p = 0.005) for muscle and bone cross-sectional area, at 2p24.2 (LOD = 2.57; p = 0.01) for isometric knee torque at 30° flexion, at 1q21.3, 2p23.3 and 18q11.2 (LOD = 2.33, 2.69 and 2.21; p < 10^-4 for all) for the torque–length relationship of the knee extensors and at 18p11.31 (LOD = 2.39; p = 0.0004) for muscle-mass adjusted isometric knee extension torque. Conclusions: We conclude that many small contributing genes rather than a few important genes are involved in causing variation in different underlying phenotypes of muscle strength. Furthermore, some overlap in promising genomic regions was identified among different strength phenotypes. PMID:18178634
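    For readers unfamiliar with LOD scores, the sketch below converts a LOD score to an approximate pointwise p-value using the standard asymptotic relation: 2·ln(10)·LOD is treated as a chi-square statistic with one degree of freedom and tested one-sided. This is a generic textbook approximation, not the procedure used in the study, so it need not reproduce every p-value reported in the abstract. The function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes the asymptotic approximation in which 2 * ln(10) * LOD follows a
    chi-square distribution with 1 degree of freedom under no linkage.
    """
    chi_square = 2.0 * math.log(10.0) * lod
    # Survival function of chi-square(1 df) is erfc(sqrt(x / 2));
    # halve it because linkage is tested one-sided.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


p_top = lod_to_pointwise_p(4.09)  # strongest signal in the abstract
print(f"LOD 4.09 -> p ~ {p_top:.1e}")
```

    Under this approximation the top signal falls below 10^-5, in line with the threshold quoted for 14q24.3.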

  7. Genome-Wide Association Study of the Genetic Determinants of Emphysema Distribution

    PubMed Central

    Boueiz, Adel; Lutz, Sharon M.; Cho, Michael H.; Hersh, Craig P.; Bowler, Russell P.; Washko, George R.; Halper-Stromberg, Eitan; Bakke, Per; Gulsvik, Amund; Laird, Nan M.; Beaty, Terri H.; Coxson, Harvey O.; Crapo, James D.; Silverman, Edwin K.; Castaldi, Peter J.

    2017-01-01

    Rationale: Emphysema has considerable variability in the severity and distribution of parenchymal destruction throughout the lungs. Upper lobe–predominant emphysema has emerged as an important predictor of response to lung volume reduction surgery. Yet, aside from alpha-1 antitrypsin deficiency, the genetic determinants of emphysema distribution remain largely unknown. Objectives: To identify the genetic influences of emphysema distribution in non–alpha-1 antitrypsin–deficient smokers. Methods: A total of 11,532 subjects with complete genotype and computed tomography densitometry data in the COPDGene (Genetic Epidemiology of Chronic Obstructive Pulmonary Disease [COPD]; non-Hispanic white and African American), ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints), and GenKOLS (Genetics of Chronic Obstructive Lung Disease) studies were analyzed. Two computed tomography scan emphysema distribution measures (difference between upper-third and lower-third emphysema; ratio of upper-third to lower-third emphysema) were tested for genetic associations in all study subjects. Separate analyses in each study population were followed by a fixed effect metaanalysis. Single-nucleotide polymorphism–, gene-, and pathway-based approaches were used. In silico functional evaluation was also performed. Measurements and Main Results: We identified five loci associated with emphysema distribution at genome-wide significance. These loci included two previously reported associations with COPD susceptibility (4q31 near HHIP and 15q25 near CHRNA5) and three new associations near SOWAHB, TRAPPC9, and KIAA1462. Gene set analysis and in silico functional evaluation revealed pathways and cell types that may potentially contribute to the pathogenesis of emphysema distribution. Conclusions: This multicohort genome-wide association study identified new genomic loci associated with differential emphysematous destruction throughout the lungs. These findings may point to new biologic pathways on which to expand diagnostic and therapeutic approaches in chronic obstructive pulmonary disease. Clinical trial registered with www.clinicaltrials.gov (NCT 00608764). PMID:27669027

  8. Linkage disequilibrium compared between five populations of domestic sheep.

    PubMed

    Meadows, Jennifer R S; Chan, Eva K F; Kijas, James W

    2008-09-30

    The success of genome-wide scans depends on the strength and magnitude of linkage disequilibrium (LD) present within the populations under investigation. High density SNP arrays are currently in development for the sheep genome; however, little is known about the behaviour of LD in this livestock species. This study examined the behaviour of LD within five sheep populations using two LD metrics, D' and χ²'. Four economically important Australian sheep flocks, three pure breeds (White Faced Suffolk, Poll Dorset, Merino) and a crossbred population (Merino x Border Leicester), along with an inbred Australian Merino museum flock were analysed. Short range LD (0 - 5 cM) was observed in all five populations; however, the persistence with increasing distance and magnitude of LD varied considerably between populations. Average LD (χ²') for markers spaced up to 20 cM exceeded the non-syntenic average within the White Faced Suffolk, Poll Dorset and Macarthur Merino. LD decayed faster within the Merino and Merino x Border Leicester, with LD below or consistent with observed background levels. Using marker-marker LD as a guide to the behaviour of marker-QTL LD, estimates of minimum marker spacing were made. For a 95% probability of detecting QTL, a microsatellite marker would be required every 0.1 - 2.5 centimorgans, depending on the population used. Sheep populations were selected which were inbred (Macarthur Merino), highly heterogeneous (Merino) or intermediate between these two extremes. This facilitated analysis and comparison of LD (χ²') between populations. The strength and magnitude of LD were found to differ markedly between breeds and aligned closely with both observed levels of genetic diversity and expectations based on breed history. This confirmed that breed specific information is likely to be important for genome wide selection and during the design of successful genome scans where tens of thousands of markers will be required.
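    As a rough illustration of what pairwise LD metrics measure, the sketch below computes D, D' and r² for two biallelic loci from haplotype and allele frequencies. This is a simplification: the study analysed multiallelic markers with D' and a chi-square-based metric, whose multiallelic forms are not reproduced here. The function name and inputs are hypothetical.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at locus 1 and
    allele B at locus 2; p_a and p_b are the frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum attainable value given the allele
    # frequencies, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    variance = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / variance if variance > 0 else 0.0
    return d, d_prime, r_squared


complete_ld = ld_metrics(0.5, 0.5, 0.5)   # A always travels with B
no_ld = ld_metrics(0.25, 0.5, 0.5)        # alleles combine at random
print(complete_ld, no_ld)
```

    Complete association gives D' = r² = 1, while random association gives 0 for both, which is the baseline against which the decay of LD with marker distance is judged.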

  9. The current state of funded NIH grants in implementation science in genomic medicine: a portfolio analysis.

    PubMed

    Roberts, Megan C; Clyne, Mindy; Kennedy, Amy E; Chambers, David A; Khoury, Muin J

    2017-10-26

    Purpose: Implementation science offers methods to evaluate the translation of genomic medicine research into practice. The extent to which the National Institutes of Health (NIH) human genomics grant portfolio includes implementation science is unknown. This brief report's objective is to describe recently funded implementation science studies in genomic medicine in the NIH grant portfolio, and identify remaining gaps. Methods: We identified investigator-initiated NIH research grants on implementation science in genomic medicine (funding initiated 2012-2016). A codebook was adapted from the literature, three authors coded grants, and descriptive statistics were calculated for each code. Results: Forty-two grants fit the inclusion criteria (~1.75% of investigator-initiated genomics grants). The majority of included grants proposed qualitative and/or quantitative methods with cross-sectional study designs, and described clinical settings and primarily white, non-Hispanic study populations. Most grants were in oncology and examined genetic testing for risk assessment. Finally, grants lacked the use of implementation science frameworks, and most examined uptake of genomic medicine and/or assessed patient-centeredness. Conclusion: We identified large gaps in implementation science studies in genomic medicine in the funded NIH portfolio over the past 5 years. To move the genomics field forward, investigator-initiated research grants should employ rigorous implementation science methods within diverse settings and populations. Genetics in Medicine advance online publication, 26 October 2017; doi:10.1038/gim.2017.180.

  10. A genome-wide 3C-method for characterizing the three-dimensional architectures of genomes.

    PubMed

    Duan, Zhijun; Andronescu, Mirela; Schutz, Kevin; Lee, Choli; Shendure, Jay; Fields, Stanley; Noble, William S; Anthony Blau, C

    2012-11-01

    Accumulating evidence demonstrates that the three-dimensional (3D) organization of chromosomes within the eukaryotic nucleus reflects and influences genomic activities, including transcription, DNA replication, recombination and DNA repair. In order to uncover structure-function relationships, it is necessary first to understand the principles underlying the folding and the 3D arrangement of chromosomes. Chromosome conformation capture (3C) provides a powerful tool for detecting interactions within and between chromosomes. A high throughput derivative of 3C, chromosome conformation capture on chip (4C), executes a genome-wide interrogation of interaction partners for a given locus. We recently developed a new method, a derivative of 3C and 4C, which, similar to Hi-C, is capable of comprehensively identifying long-range chromosome interactions throughout a genome in an unbiased fashion. Hence, our method can be applied to decipher the 3D architectures of genomes. Here, we provide a detailed protocol for this method. Published by Elsevier Inc.

  11. Genetics of gout.

    PubMed

    Choi, Hyon K; Zhu, Yanyan; Mount, David B

    2010-03-01

    This review provides an update on recent findings with regards to the genetics of hyperuricemia and gout, including recent data from genome-wide association studies (GWAS). Five GWAS around the same time reported that genetic variants of SLC2A9/GLUT9 were associated with lower serum uric acid (SUA) levels and the effects were stronger among women (e.g. SUA level difference per copy of a minor allele, -0.46 mg/dl in women vs. -0.22 mg/dl in men). One study involving four cohorts and one meta-analysis of 14 genome-wide scans found that genetic variants of ABCG2 were associated with higher SUA concentrations and these effects were stronger among men (e.g. uric acid level difference per copy of the minor allele, 0.32 mg/dl in men vs. 0.18 mg/dl in women). Limited data indicate that these associations likely translate into those with the risk of gout. Functional determination that GLUT9 and ABCG2 can transport urate at the apical border of proximal tubules implicates them as substantial players in the renal excretion of urate. Furthermore, five novel genetic loci have been reported in the meta-analysis of 14 genome-wide scans. Combined with their activities as urate transporters and their strong associations with serum uric acid concentrations, GLUT9 and ABCG2 appeared to be important modulators of uric acid levels and likely of the risk of gout. Together with a growing list of environmental risk factors, these genetic data add considerably to our understanding of the pathogenesis of hyperuricemia and gout.

  12. Does parental expressed emotion moderate genetic effects in ADHD? An exploration using a genome wide association scan.

    PubMed

    Sonuga-Barke, Edmund J S; Lasky-Su, Jessica; Neale, Benjamin M; Oades, Robert; Chen, Wai; Franke, Barbara; Buitelaar, Jan; Banaschewski, Tobias; Ebstein, Richard; Gill, Michael; Anney, Richard; Miranda, Ana; Mulas, Fernando; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph; Steinhausen, Hans Christoph; Thompson, Margaret; Asherson, Philip; Faraone, Stephen V

    2008-12-05

    Studies of gene x environment (G x E) interaction in ADHD have previously focused on known risk genes for ADHD and environmentally mediated biological risk. Here we use G x E analysis in the context of a genome-wide association scan to identify novel genes whose effects on ADHD symptoms and comorbid conduct disorder are moderated by high maternal expressed emotion (EE). SNPs (600,000) were genotyped in 958 ADHD proband-parent trios. After applying data cleaning procedures we examined 429,981 autosomal SNPs in 909 family trios. ADHD symptom severity and comorbid conduct disorder were measured using the Parental Account of Childhood Symptoms interview. Maternal criticism and warmth (i.e., EE) were coded by independent observers on comments made during the interview. No G x E interactions reached genome-wide significance. Nominal effects were found both with and without genetic main effects. For those with genetic main effects, 36 uncorrected interaction P-values were <10^-5, implicating both novel genes as well as some previously supported candidates. These were found equally often for all of the interactions being investigated. The observed interactions in SLC1A1 and NRG3 SNPs represent reasonable candidate genes for further investigation given their previous association with several psychiatric illnesses. We find evidence for the role of EE in moderating the effects of genes on ADHD severity and comorbid conduct disorder, implicating both novel and established candidates. These findings need replicating in larger independent samples. Copyright 2008 Wiley-Liss, Inc.

  13. A Genome-wide Admixture Scan for Ancestry-linked Genes Predisposing to Sarcoidosis in African Americans

    PubMed Central

    Rybicki, Benjamin A.; Levin, Albert M.; McKeigue, Paul; Datta, Indrani; Gray-McGuire, Courtney; Colombo, Marco; Reich, David; Burke, Robert R.; Iannuzzi, Michael C.

    2010-01-01

    Genome-wide linkage and association studies have uncovered variants associated with sarcoidosis, a multi-organ granulomatous inflammatory disease. African ancestry may influence disease pathogenesis since African Americans are more commonly affected by sarcoidosis. Therefore, we conducted the first sarcoidosis genome-wide ancestry scan using a map of 1,384 highly ancestry informative single nucleotide polymorphisms genotyped on 1,357 sarcoidosis cases and 703 unaffected controls self-identified as African American. The most significant ancestry association was at marker rs11966463 on chromosome 6p22.3 (ancestry association risk ratio (aRR)= 1.90; p=0.0002). When we restricted the analysis to biopsy-confirmed cases, the aRR for this marker increased to 2.01; p=0.00007. Among the eight other markers that demonstrated suggestive ancestry associations with sarcoidosis were rs1462906 on chromosome 8p12 which had the most significant association with European ancestry (aRR=0.65; p=0.002), and markers on chromosomes 5p13 (aRR=1.46; p=0.005) and 5q31 (aRR=0.67; p=0.005), which correspond to regions we previously identified through sib pair linkage analyses. Overall, the most significant ancestry association for Scadding stage IV cases was to marker rs7919137 on chromosome 10p11.22 (aRR=0.27; p=2×10^-5), a region not associated with disease susceptibility. In summary, through admixture mapping of sarcoidosis we have confirmed previous genetic linkages and identified several novel putative candidate loci for sarcoidosis. PMID:21179114

  14. Differentially Methylated Region-Representational Difference Analysis (DMR-RDA): A Powerful Method to Identify DMRs in Uncharacterized Genomes.

    PubMed

    Sasheva, Pavlina; Grossniklaus, Ueli

    2017-01-01

    Over the last years, it has become increasingly clear that environmental influences can affect the epigenomic landscape and that some epigenetic variants can have heritable, phenotypic effects. While there are a variety of methods to perform genome-wide analyses of DNA methylation in model organisms, this is still a challenging task for non-model organisms without a reference genome. Differentially methylated region-representational difference analysis (DMR-RDA) is a sensitive and powerful PCR-based technique that isolates DNA fragments that are differentially methylated between two otherwise identical genomes. The technique does not require special equipment and is independent of prior knowledge about the genome. It is even applicable to genomes that have high complexity and a large size, being the method of choice for the analysis of plant non-model systems.

  15. Internal scanning method as unique imaging method of optical vortex scanning microscope

    NASA Astrophysics Data System (ADS)

    Popiołek-Masajada, Agnieszka; Masajada, Jan; Szatkowski, Mateusz

    2018-06-01

    The internal scanning method is specific to the optical vortex microscope. It allows the vortex point to be moved inside the focused vortex beam with nanometer resolution while the whole beam stays in place. Thus the sample illuminated by the focused vortex beam can be scanned by the vortex point alone. We show that this method enables high-resolution imaging. The paper presents preliminary experimental results obtained with a first, basic image recovery procedure. The prospect of developing more powerful tools for topography recovery with the optical vortex scanning microscope is discussed briefly.

  16. GStream: Improving SNP and CNV Coverage on Genome-Wide Association Studies

    PubMed Central

    Alonso, Arnald; Marsal, Sara; Tortosa, Raül; Canela-Xandri, Oriol; Julià, Antonio

    2013-01-01

    We present GStream, a method that combines genome-wide SNP and CNV genotyping in the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs, previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method. PMID:23844243

  17. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

    PubMed Central

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-01-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048

  18. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

    PubMed

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-03-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets.

  19. Strategies and tools for whole genome alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

  20. WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures

    PubMed Central

    2010-01-01

    Background: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. Methods: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. Results: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. Conclusion: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data. PMID:21210985
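
    The two worker steps named in this abstract (enumerate words, then score them against a Markov chain model) can be illustrated on a single machine. The sketch below is not WordSeeker's code: the function names are hypothetical, the model is assumed to be a first-order Markov chain estimated from the sequence itself, and the score is assumed to be a simple observed/expected ratio.

```python
from collections import Counter


def word_counts(seq, w):
    """Count every overlapping word of length w that occurs in seq."""
    return Counter(seq[i:i + w] for i in range(len(seq) - w + 1))


def markov1_probability(word, base_freq, trans_prob):
    """Probability of a word under a first-order Markov chain."""
    p = base_freq[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans_prob[a].get(b, 0.0)
    return p


def score_words(seq, w):
    """Return observed/expected ratios for all words of length w in seq.

    Assumption: the background model is fitted to seq itself.
    """
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    pair_counts = Counter(zip(seq, seq[1:]))
    out_total = Counter(seq[:-1])  # how often each base starts a pair
    trans_prob = {a: {} for a in base_freq}
    for (a, b), c in pair_counts.items():
        trans_prob[a][b] = c / out_total[a]
    scores = {}
    for word, observed in word_counts(seq, w).items():
        expected = (n - w + 1) * markov1_probability(word, base_freq, trans_prob)
        scores[word] = observed / expected
    return scores


if __name__ == "__main__":
    ranked = sorted(score_words("ACGTACGTTTGACGTA", 3).items(),
                    key=lambda kv: kv[1], reverse=True)
    for word, ratio in ranked[:5]:
        print(word, round(ratio, 2))
```

    Ratios well above 1 mark words seen more often than the background model predicts. In a distributed setting, each worker would presumably run the counting step on its own slice of the word space.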

  1. Genome scan of hybridizing sunflowers from Texas (Helianthus annuus and H. debilis) reveals asymmetric patterns of introgression and small islands of genomic differentiation.

    PubMed

    Scascitelli, M; Whitney, K D; Randell, R A; King, Matthew; Buerkle, C A; Rieseberg, L H

    2010-02-01

    Although the sexual transfer of genetic material between species (i.e. introgression) has been documented in many groups of plants and animals, genome-wide patterns of introgression are poorly understood. Is most of the genome permeable to interspecific gene flow, or is introgression typically restricted to a handful of genomic regions? Here, we assess the genomic extent and direction of introgression between three sunflowers from the south-central USA: the common sunflower, Helianthus annuus ssp. annuus; a near-endemic to Texas, Helianthus debilis ssp. cucumerifolius; and their putative hybrid derivative, thought to have recently colonized Texas, H. annuus ssp. texanus. Analyses of variation at 88 genetically mapped microsatellite loci revealed that long-term migration rates were high, genome-wide and asymmetric, with higher migration rates from H. annuus texanus into the two parental taxa than vice versa. These results imply a longer history of intermittent contact between H. debilis and H. annuus than previously believed, and that H. annuus texanus may serve as a bridge for the transfer of alleles between its parental taxa. They also contradict recent theory suggesting that introgression should predominantly be in the direction of the colonizing species. As in previous studies of hybridizing sunflower species, regions of genetic differentiation appear small, whether estimated in terms of FST or unidirectional migration rates. Estimates of recent immigration and admixture were inconsistent, depending on the type of analysis. At the individual locus level, one marker showed striking asymmetry in migration rates, a pattern consistent with tight linkage to a Bateson-Dobzhansky-Muller incompatibility.

  2. Landscape genomics in Atlantic salmon (Salmo salar): searching for gene-environment interactions driving local adaptation.

    PubMed

    Vincent, Bourret; Dionne, Mélanie; Kent, Matthew P; Lien, Sigbjørn; Bernatchez, Louis

    2013-12-01

    A growing number of studies are examining the factors driving historical and contemporary evolution in wild populations. By combining surveys of genomic variation with a comprehensive assessment of environmental parameters, such studies can increase our understanding of the genomic and geographical extent of local adaptation in wild populations. We used a large-scale landscape genomics approach to examine adaptive and neutral differentiation across 54 North American populations of Atlantic salmon representing seven previously defined genetically distinct regional groups. Over 5500 genome-wide single nucleotide polymorphisms were genotyped in 641 individuals and 28 bulk assays of 25 pooled individuals each. Genome scans, linkage map, and 49 environmental variables were combined to conduct an innovative landscape genomic analysis. Our results provide valuable insight into the links between environmental variation and both neutral and potentially adaptive genetic divergence. In particular, we identified markers potentially under divergent selection, as well as associated selective environmental factors and biological functions with the observed adaptive divergence. Multivariate landscape genetic analysis revealed strong associations of both genetic and environmental structures. We found an enrichment of growth-related functions among outlier markers. Climate (temperature-precipitation) and geological characteristics were significantly associated with both potentially adaptive and neutral genetic divergence and should be considered as candidate loci involved in adaptation at the regional scale in Atlantic salmon. Hence, this study significantly contributes to the improvement of tools used in modern conservation and management schemes of Atlantic salmon wild populations. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.

  3. The physical size of transcription factors is key to transcriptional regulation in chromatin domains

    NASA Astrophysics Data System (ADS)

    Maeshima, Kazuhiro; Kaizu, Kazunari; Tamura, Sachiko; Nozaki, Tadasu; Kokubo, Tetsuro; Takahashi, Koichi

    2015-02-01

    Genetic information, which is stored in the long strand of genomic DNA as chromatin, must be scanned and read out by various transcription factors. First, gene-specific transcription factors, which are relatively small (˜50 kDa), scan the genome and bind regulatory elements. Such factors then recruit general transcription factors, Mediators, RNA polymerases, nucleosome remodellers, and histone modifiers, most of which are large protein complexes of 1-3 MDa in size. Here, we propose a new model for the functional significance of the size of transcription factors (or complexes) for gene regulation of chromatin domains. Recent findings suggest that chromatin consists of irregularly folded nucleosome fibres (10 nm fibres) and forms numerous condensed domains (e.g., topologically associating domains). Although the flexibility and dynamics of chromatin allow repositioning of genes within the condensed domains, the size exclusion effect of the domain may limit accessibility of DNA sequences by transcription factors. We used Monte Carlo computer simulations to determine the physical size limit of transcription factors that can enter condensed chromatin domains. Small gene-specific transcription factors can penetrate into the chromatin domains and search their target sequences, whereas large transcription complexes cannot enter the domain. Due to this property, once a large complex binds its target site via gene-specific factors it can act as a ‘buoy’ to keep the target region on the surface of the condensed domain and maintain transcriptional competency. This size-dependent specialization of target-scanning and surface-tethering functions could provide novel insight into the mechanisms of various DNA transactions, such as DNA replication and repair/recombination.

  4. Genetic subdivision and candidate genes under selection in North American grey wolves.

    PubMed

    Schweizer, Rena M; vonHoldt, Bridgett M; Harrigan, Ryan; Knowles, James C; Musiani, Marco; Coltman, David; Novembre, John; Wayne, Robert K

    2016-01-01

    Previous genetic studies of the highly mobile grey wolf (Canis lupus) found population structure that coincides with habitat and phenotype differences. We hypothesized that these ecologically distinct populations (ecotypes) should exhibit signatures of selection in genes related to morphology, coat colour and metabolism. To test these predictions, we quantified population structure related to habitat using a genotyping array to assess variation in 42 036 single-nucleotide polymorphisms (SNPs) in 111 North American grey wolves. Using these SNP data and individual-level measurements of 12 environmental variables, we identified six ecotypes: West Forest, Boreal Forest, Arctic, High Arctic, British Columbia and Atlantic Forest. Next, we explored signals of selection across these wolf ecotypes through the use of three complementary methods to detect selection: FST/haplotype homozygosity bivariate percentile, bayescan, and environmentally correlated directional selection with bayenv. Across all methods, we found consistent signals of selection on genes related to morphology, coat coloration, metabolism, as predicted, as well as vision and hearing. In several high-ranking candidate genes, including LEPR, TYR and SLC14A2, we found variation in allele frequencies that follow environmental changes in temperature and precipitation, a result that is consistent with local adaptation rather than genetic drift. Our findings show that local adaptation can occur despite gene flow in a highly mobile species and can be detected through a moderately dense genomic scan. These patterns of local adaptation revealed by SNP genotyping likely reflect high fidelity to natal habitats of dispersing wolves, strong ecological divergence among habitats, and moderate levels of linkage in the wolf genome. © 2015 John Wiley & Sons Ltd.
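
    To illustrate the FST-based part of such a scan, the sketch below computes a per-locus FST from population allele frequencies and keeps the loci in the top empirical tail. It is a simplified stand-in, not the procedure used in the study: it assumes biallelic loci, equally weighted populations, the heterozygosity form FST = (H_T - H_S) / H_T, and a hypothetical `fraction` cutoff. It does not reproduce the haplotype homozygosity statistic, bayescan, or bayenv.

```python
import math


def fst_biallelic(freqs):
    """FST for one biallelic locus from per-population allele frequencies.

    Uses FST = (H_T - H_S) / H_T with equal population weights (assumption).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                         # locus is fixed everywhere
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pop
    return (h_t - h_s) / h_t


def top_outliers(fst_by_locus, fraction):
    """Return the loci in the top `fraction` of the empirical FST ranking."""
    n_keep = max(1, math.ceil(fraction * len(fst_by_locus)))
    ranked = sorted(fst_by_locus, key=fst_by_locus.get, reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Hypothetical allele frequencies in two populations.
    freqs = {"snp1": [0.5, 0.5], "snp2": [0.2, 0.8], "snp3": [0.0, 1.0]}
    fst = {locus: fst_biallelic(p) for locus, p in freqs.items()}
    print(top_outliers(fst, 0.3))
```

    An empirical cutoff of this kind only ranks loci. It does not by itself distinguish selection from drift or demography, which is why the abstract combines several complementary methods.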

  5. An efficient scan diagnosis methodology according to scan failure mode for yield enhancement

    NASA Astrophysics Data System (ADS)

    Kim, Jung-Tae; Seo, Nam-Sik; Oh, Ghil-Geun; Kim, Dae-Gue; Lee, Kyu-Taek; Choi, Chi-Young; Kim, InSoo; Min, Hyoung Bok

    2008-12-01

    Yield has always been a driving consideration in modern semiconductor fabrication. Statistically, the largest portion of wafer yield loss is attributable to scan failures. This paper presents efficient failure analysis methods, based on scan diagnosis, for both initial yield ramp-up and ongoing products. Our analysis shows that more than 60% of the dies with scan failures fall into the shift-mode category in very deep submicron (VDSM) devices. However, localizing a scan shift-mode failure is much more difficult than localizing a capture-mode failure because it is caused by a malfunction of the scan chain. To address this challenge, we propose the most suitable analysis method for each scan failure mode (capture or shift) for yield enhancement. For capture-mode failures, the paper describes a method that integrates the scan diagnosis flow with backside probing technology to obtain more accurate candidates. We also describe several unique techniques, such as a bulk back-grinding solution, efficient backside probing, and a signal analysis method. Lastly, we introduce a blocked chain analysis algorithm for efficient analysis of shift-mode failures. The combination of the two methods contributes to yield enhancement. We confirm the failure candidates with physical failure analysis (PFA). Direct feedback from defect visualization helps bring devices to mass production in a shorter time. Experimental data on mass-produced devices show that our method yields an average reduction of 13.7% in defective SCAN & SRAM-BIST failure rates and of 18.2% in wafer yield rates.

  6. Whole-genome alignment.

    PubMed

    Dewey, Colin N

    2012-01-01

    Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.

  7. Efficient engineering of a bacteriophage genome using the type I-E CRISPR-Cas system.

    PubMed

    Kiro, Ruth; Shitrit, Dror; Qimron, Udi

    2014-01-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) system has recently been used to engineer genomes of various organisms, but surprisingly, not those of bacteriophages (phages). Here we present a method to genetically engineer the Escherichia coli phage T7 using the type I-E CRISPR-Cas system. T7 phage genome is edited by homologous recombination with a DNA sequence flanked by sequences homologous to the desired location. Non-edited genomes are targeted by the CRISPR-Cas system, thus enabling isolation of the desired recombinant phages. This method broadens CRISPR Cas-based editing to phages and uses a CRISPR-Cas type other than type II. The method may be adjusted to genetically engineer any bacteriophage genome.

  8. A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation

    PubMed Central

    2012-01-01

    Background: Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation. Methods: An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis. Results: Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored. Conclusions: The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations. PMID:22462519

  9. Comparison of different methods for isolation of bacterial DNA from retail oyster tissues

    USDA-ARS's Scientific Manuscript database

    Oysters are filter-feeders that bio-accumulate bacteria in water while feeding. To evaluate the bacterial genomic DNA extracted from retail oyster tissues, including the gills and digestive glands, four isolation methods were used. Genomic DNA extraction was performed using the Allmag™ Blood Genomic...

  10. Repeat-aware modeling and correction of short read errors.

    PubMed

    Yang, Xiao; Aluru, Srinivas; Dorman, Karin S

    2011-02-15

    High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of kmers in reads and validating those with frequencies exceeding a threshold. In case of genomes with high repeat content, an erroneous kmer may be frequently observed if it has few nucleotide differences with valid kmers with multiple occurrences in the genome. Error detection and correction were mostly applied to genomes with low repeat content and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of kmers from their observed frequencies by analyzing the misread relationships among observed kmers. We also propose a method to estimate the threshold useful for validating kmers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id=redeem".
We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
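
The baseline that this abstract describes as typical (count kmers across reads, then trust those whose frequency exceeds a threshold) can be sketched in a few lines. This is only that conventional baseline, not the repeat-aware model the authors propose, and the function names, toy reads, and threshold value are illustrative assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping kmer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_threshold(counts, threshold):
    """Kmers seen more than `threshold` times are trusted; the rest are suspect."""
    trusted = {kmer for kmer, n in counts.items() if n > threshold}
    suspect = set(counts) - trusted
    return trusted, suspect


if __name__ == "__main__":
    # Three identical reads plus one read carrying a single substitution.
    reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
    counts = count_kmers(reads, k=4)
    trusted, suspect = split_by_threshold(counts, threshold=1)
    print(sorted(trusted))   # ['ACGT', 'CGTA', 'GTAC']
    print(sorted(suspect))   # ['ACGA', 'CGAA', 'GAAC']
```

The weakness the abstract points to is visible in this scheme: in a repeat-rich genome, an erroneous kmer that sits one substitution away from a high-copy valid kmer can itself exceed the threshold and be trusted.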

  11. Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

    PubMed Central

    Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

    2014-01-01

    Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computer power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific to the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries-active or caries-free dentitions cannot be differentiated based on the presence or absence of specific genetic elements.
Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool, with a user-friendly JAVA graphical interface. PMID:24291226
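
    The idea of subtracting one genome from another can be illustrated without any alignment tool. The sketch below is not the MGC tool and does not use blastn: it swaps in exact k-mer matching as a crude stand-in for alignment, and the function name, the value of k, and the toy sequences are all assumptions made for illustration.

```python
def subtract_genome(target, reference, k):
    """Return (start, sequence) for each stretch of `target` that is not
    covered by any k-mer also present in `reference`.

    Exact k-mer sharing is a simplification; an alignment-based method
    would tolerate mismatches and gaps.
    """
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    covered = [False] * len(target)
    for i in range(len(target) - k + 1):
        if target[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                covered[j] = True

    segments = []
    start = None
    for pos, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = pos                      # open an uncovered stretch
        elif is_covered and start is not None:
            segments.append((start, target[start:pos]))
            start = None
    if start is not None:                    # stretch runs to the end
        segments.append((start, target[start:]))
    return segments


if __name__ == "__main__":
    reference = "ACGTTGCA"
    target = "ACGTTGCAGGGGGG"                # reference plus a unique tail
    print(subtract_genome(target, reference, k=4))  # [(8, 'GGGGGG')]
```

    Repeating the subtraction against every genome in a comparison group, and keeping only what survives all of them, gives the kind of group-specific elements the abstract looks for.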

  12. COMPLEXO: identifying the missing heritability of breast cancer via next generation collaboration.

    PubMed

    Southey, Melissa C; Park, Daniel J; Nguyen-Dumont, Tu; Campbell, Ian; Thompson, Ella; Trainer, Alison H; Chenevix-Trench, Georgia; Simard, Jacques; Dumont, Martine; Soucy, Penny; Thomassen, Mads; Jønson, Lars; Pedersen, Inge S; Hansen, Thomas Vo; Nevanlinna, Heli; Khan, Sofia; Sinilnikova, Olga; Mazoyer, Sylvie; Lesueur, Fabienne; Damiola, Francesca; Schmutzler, Rita; Meindl, Alfons; Hahnen, Eric; Dufault, Michael R; Chris Chan, Tl; Kwong, Ava; Barkardóttir, Rosa; Radice, Paolo; Peterlongo, Paolo; Devilee, Peter; Hilbers, Florentine; Benitez, Javier; Kvist, Anders; Törngren, Therese; Easton, Douglas; Hunter, David; Lindstrom, Sara; Kraft, Peter; Zheng, Wei; Gao, Yu-Tang; Long, Jirong; Ramus, Susan; Feng, Bing-Jian; Weitzel, Jeffrey N; Nathanson, Katherine; Offit, Kenneth; Joseph, Vijai; Robson, Mark; Schrader, Kasmintan; Wang, San; Kim, Yeong C; Lynch, Henry; Snyder, Carrie; Tavtigian, Sean; Neuhausen, Susan; Couch, Fergus J; Goldgar, David E

    2013-06-21

    Linkage analysis, positional cloning, candidate gene mutation scanning and genome-wide association study approaches have all contributed significantly to our understanding of the underlying genetic architecture of breast cancer. Taken together, these approaches have identified genetic variation that explains approximately 30% of the overall familial risk of breast cancer, implying that more, and likely rarer, genetic susceptibility alleles remain to be discovered.

  13. Toward the automated generation of genome-scale metabolic networks in the SEED.

    PubMed

    DeJongh, Matthew; Formsma, Kevin; Boillot, Paul; Gould, John; Rycenga, Matthew; Best, Aaron

    2007-04-26

    Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. 
Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.

  14. The coupling hypothesis: why genome scans may fail to map local adaptation genes.

    PubMed

    Bierne, Nicolas; Welch, John; Loire, Etienne; Bonhomme, François; David, Patrice

    2011-05-01

    Genomic scans of multiple populations often reveal marker loci with greatly increased differentiation between populations. Often this differentiation coincides in space with contrasts in ecological factors, forming a genetic-environment association (GEA). GEAs imply a role for local adaptation, and so it is tempting to conclude that the strongly differentiated markers are themselves under ecologically based divergent selection, or are closely linked to loci under such selection. Here, we highlight an alternative and neglected explanation: intrinsic (i.e. environment-independent) pre- or post-zygotic genetic incompatibilities rather than local adaptation can be responsible for increased differentiation. Intrinsic genetic incompatibilities create endogenous barriers to gene flow, also known as tension zones, whose location can shift over time. However, tension zones have a tendency to become trapped by, and therefore to coincide with, exogenous barriers due to ecological selection. This coupling of endogenous and exogenous barriers can occur easily in spatially subdivided populations, even if the loci involved are unlinked. The result is that local adaptation explains where genetic breaks are positioned, but not necessarily their existence, which can be best explained by endogenous incompatibilities. More precisely, we show that (i) the coupling of endogenous and exogenous barriers can easily occur even when ecological selection is weak; (ii) when environmental heterogeneity is fine-grained, GEAs can emerge at incompatibility loci, but only locally, in places where habitats and gene pools are sufficiently intermingled to maintain linkage disequilibria between genetic incompatibilities, local-adaptation genes and neutral loci. Furthermore, the association between the locally adapted and intrinsically incompatible alleles (i.e. the sign of linkage disequilibrium between endogenous and exogenous loci) is arbitrary and can form in either direction. 
Reviewing results from the literature, we find that many predictions of our model are supported, including endogenous genetic barriers that coincide with environmental boundaries, local GEA in mosaic hybrid zones, and inverted or modified GEAs at distant locations. We argue that endogenous genetic barriers are often more likely than local adaptation to explain the majority of Fst-outlying loci observed in genome scan approaches - even when these are correlated to environmental variables. © 2011 Blackwell Publishing Ltd.
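
    As a rough illustration of the Fst statistic on which such outlier scans rely, the sketch below computes Fst for a single biallelic locus in two equally weighted populations, using the textbook definition Fst = (Ht - Hs) / Ht. The function name and the allele frequencies are hypothetical and are not taken from the study.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        # The locus is monomorphic overall, so Fst is undefined; report 0.
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies for three loci.
    for p1, p2 in [(0.5, 0.5), (0.6, 0.4), (0.9, 0.1)]:
        print(p1, p2, round(fst_two_pops(p1, p2), 3))
```

    A high value at one locus shows only that the two populations differ there; as the abstract argues, it does not by itself distinguish local adaptation from an endogenous barrier.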

  15. Optimized gene editing technology for Drosophila melanogaster using germ line-specific Cas9.

    PubMed

    Ren, Xingjie; Sun, Jin; Housden, Benjamin E; Hu, Yanhui; Roesel, Charles; Lin, Shuailiang; Liu, Lu-Ping; Yang, Zhihao; Mao, Decai; Sun, Lingzhu; Wu, Qujie; Ji, Jun-Yuan; Xi, Jianzhong; Mohr, Stephanie E; Xu, Jiang; Perrimon, Norbert; Ni, Jian-Quan

    2013-11-19

    The ability to engineer genomes in a specific, systematic, and cost-effective way is critical for functional genomic studies. Recent advances using the CRISPR-associated single-guide RNA system (Cas9/sgRNA) illustrate the potential of this simple system for genome engineering in a number of organisms. Here we report an effective and inexpensive method for genome DNA editing in Drosophila melanogaster whereby plasmid DNAs encoding short sgRNAs under the control of the U6b promoter are injected into transgenic flies in which Cas9 is specifically expressed in the germ line via the nanos promoter. We evaluate the off-targets associated with the method and establish a Web-based resource, along with a searchable, genome-wide database of predicted sgRNAs appropriate for genome engineering in flies. Finally, we discuss the advantages of our method in comparison with other recently published approaches.
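
    To make the idea of a database of predicted sgRNAs more concrete, here is a minimal sketch of how candidate target sites might be enumerated. It assumes a 20-nt protospacer followed by an NGG PAM, the motif commonly associated with Streptococcus pyogenes Cas9; the abstract does not state these parameters. It scans the forward strand only and does no off-target scoring. The function name is hypothetical and this is not the authors' pipeline.

```python
import re


def find_candidate_targets(seq: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) tuples for forward-strand NGG sites.

    Assumption: the target is `protospacer_len` bases immediately followed
    by a 3-base PAM of the form NGG.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    demo = "ttgacctgaaggctagctaggatccgatcgtagctagcgg"
    for start, protospacer, pam in find_candidate_targets(demo):
        print(start, protospacer, pam)
```

    A real resource would also scan the reverse complement and filter candidates by predicted off-target matches, which this sketch omits.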

  16. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGES

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
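
    The idea that contigs from the same genome show covarying coverage across datasets can be sketched as follows. This is a deliberately simplified greedy grouping by Pearson correlation, not any of the binning tools surveyed in the article; the contig names, coverage values, function names, and the 0.95 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a flat profile carries no covariation signal
    return cov / (sx * sy)


def greedy_bins(coverage, threshold=0.95):
    """Assign each contig to the first bin whose seed contig it correlates with."""
    bins = []
    for name, profile in coverage.items():
        for members in bins:
            seed_profile = coverage[members[0]]
            if pearson(profile, seed_profile) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])  # no bin matched, so start a new one
    return bins


if __name__ == "__main__":
    # Hypothetical per-sample mean coverage for three contigs.
    coverage = {
        "contig_1": [10.0, 20.0, 30.0],
        "contig_2": [11.0, 19.0, 31.0],
        "contig_3": [30.0, 20.0, 10.0],
    }
    print(greedy_bins(coverage))
```

    With only a few samples, unrelated contigs can correlate by chance, which is one reason additional datasets help.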

  17. Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

    PubMed Central

    Farreras, Montse

    2014-01-01

    The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. The methodologies required to analyze these data have also become more complex every day, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3 GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675
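
    The reported efficiencies can be converted into speedups with the standard definition efficiency = speedup / processors. The small sketch below does that arithmetic; the function name is hypothetical, and the abstract does not say which baseline run the efficiencies were measured against.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup implied by a parallel efficiency, using E = S / p."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return efficiency * processors


if __name__ == "__main__":
    # Efficiency and processor counts as reported in the abstract.
    for efficiency, processors in [(0.90, 36), (0.55, 172)]:
        speedup = speedup_from_efficiency(efficiency, processors)
        print(f"{processors} processors: speedup of about {speedup:.1f}x")
```

    Under this definition the figures correspond to speedups of roughly 32x on 36 processors and 95x on 172 processors, so adding processors still reduces run time even though efficiency falls.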

  18. Recovering complete and draft population genomes from metagenome datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

  19. Fast and Accurate Approximation to Significance Tests in Genome-Wide Association Studies

    PubMed Central

    Zhang, Yu; Liu, Jun S.

    2011-01-01

    Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate the false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online. PMID:22140288
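
    For context, the Bonferroni baseline that the abstract describes as too conservative can be written in a few lines. The sketch below shows only that baseline, not the authors' Poisson de-clumping method; the function names and the example numbers are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error <= alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # With one million SNP tests, alpha = 0.05 gives a very strict cutoff.
    print(bonferroni_threshold(0.05, 1_000_000))
    # Hypothetical raw p-values for three tests.
    print(bonferroni_adjust([0.01, 0.5, 0.001]))
```

    Because the correction treats every test as independent, SNPs in strong LD are penalized as though each one were a separate comparison, which is the source of the conservativeness noted above.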

  20. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells.

    PubMed

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. © 2018 Han et al.; Published by Cold Spring Harbor Laboratory Press.
